# How to find Censoring Rectangles

From the [r/Place dataset post](https://www.reddit.com/r/place/comments/txvk2d/rplace_datasets_april_fools_2022/):

> _Inside the dataset there are instances of moderators using a rectangle drawing tool to handle inappropriate content. These rows differ in the coordinate tuple which contain four values instead of two–“x1,y1,x2,y2” corresponding to the upper left x1, y1 coordinate and the lower right x2, y2 coordinate of the moderation rect. These events apply the specified color to all tiles within those two points, inclusive._
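
To make the two coordinate formats concrete, here is a minimal sketch of how a coordinate string could be classified and parsed. The `Rect` type and the `parse_rect` helper are hypothetical names introduced for illustration, and the sample coordinates are made up rather than taken from the dataset.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """A moderation rectangle: upper-left (x1, y1), lower-right (x2, y2)."""

    x1: int
    y1: int
    x2: int
    y2: int

    def tile_count(self) -> int:
        # Both corners are inclusive, hence the +1 on each axis.
        return (self.x2 - self.x1 + 1) * (self.y2 - self.y1 + 1)


def parse_rect(coordinate: str) -> Optional[Rect]:
    """Return a Rect for "x1,y1,x2,y2", or None for an ordinary "x,y" placement."""
    parts = coordinate.split(",")
    if len(parts) != 4:
        return None
    return Rect(*(int(part) for part in parts))


rect = parse_rect("10,20,12,21")  # made-up rectangle, not from the dataset
print(rect, rect.tile_count())
print(parse_rect("42,42"))  # an ordinary single-tile placement
```

Because the rectangle is inclusive on both ends, a 3 by 2 rectangle such as the one above covers six tiles, not two.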

It may be useful to find all the times that the moderators used the rectangle tool to censor content. For example, you might want to make an NSFW compilation of the last moments of each artwork right before it gets censored.

We can do this by loading the original gzipped CSV file in chunks, keeping only the rows whose coordinate field has four values, and storing them in a dataframe.


In [3]:
import pandas as pd

csv_iterator = pd.read_csv(
    "../../data/2022_place_canvas_history.csv.gzip",
    chunksize=100_000,
    compression="gzip",
)

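# Collect the matching rows from each chunk and concatenate once at the end.
# Appending to a DataFrame one row at a time is very slow on a file this size.
rect_chunks = []

for chunk in csv_iterator:
    # Rectangle rows hold four comma-separated values ("x1,y1,x2,y2"), i.e.
    # three commas; ordinary placements ("x,y") have only one.
    is_rect = chunk["coordinate"].str.count(",") == 3
    rect_chunks.append(chunk[is_rect])

df = pd.concat(rect_chunks, ignore_index=True)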

df = df.sort_values(by=["timestamp"], ignore_index=True)
df


|   | timestamp | user_id | pixel_color | coordinate |
|---|---|---|---|---|
| 0 | 2022-04-01 14:44:08.158 UTC | UY7pWDspuiKPKlsEMYNjkKYnoLkwPW0/ezIl++dHkTpFQ5... | #898D90 | 862,540,868,544 |
| 1 | 2022-04-01 14:46:23.652 UTC | yVC1UzwQLjSlhK6lV9kzwwlj0nzEfZ31d1EnR4enIpDpRO... | #898D90 | 862,540,873,545 |
| 2 | 2022-04-01 14:46:39.702 UTC | 2ES5f0pi/aKJosil5Q8+t4zrjOhkOgZKcvIOCVLW6djCrc... | #898D90 | 871,546,878,550 |
| 3 | 2022-04-03 23:03:29.93 UTC | q/Dk6lmcXm8bcDbNIhDglz7kFuCmX6zkca9UPivDix5WWi... | #FFB470 | 298,1770,334,1803 |
| 4 | 2022-04-03 23:05:04.703 UTC | m8NEcPbf5XRV5ppeuZ3KLIYAG8GuHkNIxOEsCD06Ey8I1E... | #FFB470 | 298,1805,329,1839 |
| 5 | 2022-04-03 23:08:50.038 UTC | LKS2u3QL2N3Olv7rnUCWry4KJ5K4Ea+/9qKyadTNl01apE... | #FFB470 | 257,1736,296,1780 |
| 6 | 2022-04-03 23:10:36.803 UTC | HkR0yRQUJ1wsjh4Zo4VdKE43IctIGMFS9VuVm9IyCFcPOA... | #FFF8B8 | 251,1805,296,1812 |
| 7 | 2022-04-03 23:12:51.382 UTC | 7JiQyrONpFJphvBEPVUGyxjBsdvU8fuiSVzpMuTkxFDjqt... | #FFF8B8 | 271,1835,296,1859 |
| 8 | 2022-04-03 23:29:52.139 UTC | gS0DWvPgaiQkHvG4NsHveLvpn8uf50t+sOY3nIykDvkyEd... | #FFB470 | 297,1750,364,1813 |
| 9 | 2022-04-04 01:22:50.891 UTC | q+XjkQ6WRx0aBLtb2xRGWBrsHALXejJxEE2hs6sDJHfN2L... | #000000 | 1349,1718,1424,1752 |
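
As a cross-check that does not depend on pandas, the same filter can be sketched with the standard library alone. This is only an illustration: `rectangle_rows` is a hypothetical helper, the sample rows are made up, and it assumes the coordinate field is quoted in the CSV so that its commas are not treated as column separators.

```python
import csv
import io

# Made-up sample using the same column layout as the dataset.
SAMPLE = '''timestamp,user_id,pixel_color,coordinate
2022-04-01 12:44:10.315 UTC,user_a,#FFFFFF,"42,42"
2022-04-01 12:45:00.000 UTC,user_b,#000000,"10,20,12,21"
2022-04-01 12:46:00 UTC,user_c,#FF4500,"7,8"
'''


def rectangle_rows(lines):
    """Yield only the rows whose coordinate field holds four values."""
    for row in csv.DictReader(lines):
        # "x1,y1,x2,y2" contains three commas; "x,y" contains one.
        if row["coordinate"].count(",") == 3:
            yield row


rows = list(rectangle_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["timestamp"], row["pixel_color"], row["coordinate"])
```

For the real file, the `io.StringIO(SAMPLE)` argument could be swapped for a handle from `gzip.open(path, "rt", newline="")`, which streams the file line by line and keeps memory use flat.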


It might be more useful to have the timestamp in milliseconds, like in the Parquet dataset we used to create the color and age maps.

In [4]:
from datetime import datetime, timezone

# Milliseconds between 1970-01-01T00:00:00.000 UTC and the first pixel placed
# in r/Place 2022 (2022-04-01 12:44:10.315 UTC).
START_TIME = 1648817050315


def parse_timestamp(timestamp):
    """Convert a "YYYY-MM-DD HH:MM:SS.SSS UTC" timestamp to milliseconds after the start of r/Place 2022."""
    date_format = "%Y-%m-%d %H:%M:%S.%f"
    # Drop the trailing " UTC" suffix before parsing.
    timestamp = timestamp[:-4]
    try:
        parsed = datetime.strptime(timestamp, date_format)
    except ValueError:
        # The timestamp is exactly on the second, so there is no decimal (%f).
        # This happens 1/1000 of the time.
        parsed = datetime.strptime(timestamp, date_format[:-3])

    # strptime returns a naive datetime, and .timestamp() would interpret it in
    # the machine's local timezone. Mark it as UTC explicitly so the result
    # does not depend on where the code runs.
    parsed = parsed.replace(tzinfo=timezone.utc)

    # Convert from a float in seconds to an int in milliseconds. Round rather
    # than truncate so floating-point error cannot drop a millisecond.
    milliseconds = round(parsed.timestamp() * 1000)

    # Subtract the time of the first pixel to get the time in milliseconds
    # since the beginning of the experiment.
    return milliseconds - START_TIME


df = df.assign(timestamp=df.timestamp.apply(parse_timestamp))
df

|   | timestamp | user_id | pixel_color | coordinate |
|---|---|---|---|---|
| 0 | 7197843 | UY7pWDspuiKPKlsEMYNjkKYnoLkwPW0/ezIl++dHkTpFQ5... | #898D90 | 862,540,868,544 |
| 1 | 7333337 | yVC1UzwQLjSlhK6lV9kzwwlj0nzEfZ31d1EnR4enIpDpRO... | #898D90 | 862,540,873,545 |
| 2 | 7349387 | 2ES5f0pi/aKJosil5Q8+t4zrjOhkOgZKcvIOCVLW6djCrc... | #898D90 | 871,546,878,550 |
| 3 | 209959615 | q/Dk6lmcXm8bcDbNIhDglz7kFuCmX6zkca9UPivDix5WWi... | #FFB470 | 298,1770,334,1803 |
| 4 | 210054388 | m8NEcPbf5XRV5ppeuZ3KLIYAG8GuHkNIxOEsCD06Ey8I1E... | #FFB470 | 298,1805,329,1839 |
| 5 | 210279723 | LKS2u3QL2N3Olv7rnUCWry4KJ5K4Ea+/9qKyadTNl01apE... | #FFB470 | 257,1736,296,1780 |
| 6 | 210386488 | HkR0yRQUJ1wsjh4Zo4VdKE43IctIGMFS9VuVm9IyCFcPOA... | #FFF8B8 | 251,1805,296,1812 |
| 7 | 210521067 | 7JiQyrONpFJphvBEPVUGyxjBsdvU8fuiSVzpMuTkxFDjqt... | #FFF8B8 | 271,1835,296,1859 |
| 8 | 211541824 | gS0DWvPgaiQkHvG4NsHveLvpn8uf50t+sOY3nIykDvkyEd... | #FFB470 | 297,1750,364,1813 |
| 9 | 218320576 | q+XjkQ6WRx0aBLtb2xRGWBrsHALXejJxEE2hs6sDJHfN2L... | #000000 | 1349,1718,1424,1752 |
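
When checking results by hand, it can help to turn a millisecond offset back into a wall-clock time. The sketch below is one way to do that. `FIRST_PIXEL` and `offset_to_utc` are hypothetical names, and the start time of 2022-04-01 12:44:10.315 UTC is an assumption inferred from the tables above (row 0 is 7197843 ms after the start and was recorded at 14:44:08.158 UTC).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the first pixel was placed at 2022-04-01 12:44:10.315 UTC,
# inferred from the offset and original timestamp of row 0 above.
FIRST_PIXEL = datetime(2022, 4, 1, 12, 44, 10, 315000, tzinfo=timezone.utc)


def offset_to_utc(offset_ms: int) -> datetime:
    """Convert milliseconds since the first pixel into an aware UTC datetime."""
    return FIRST_PIXEL + timedelta(milliseconds=offset_ms)


# Row 0 of the table above.
print(offset_to_utc(7197843).isoformat(timespec="milliseconds"))
```

Using a timezone-aware `datetime` here keeps the round trip independent of the local timezone of the machine running the notebook.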
