---
# Compute the Dependent Variable 'aggressiveLanguage'

The original file for the askscience subreddit has 4 parts:<br>
Part 1: read the data<br>
Part 2: compute the 'aggressiveLanguage' variables for every comment<br>
Part 3: compute the 'aggressiveLanguage' variables for each unique user pair<br>
Part 4: update the 'aggressiveLanguage' variables for each comment with the average from its user pair<br>

This notebook covers only parts 1 and 2.

**In addition to these steps, each row is given a timestamp value; this is done after parts 1 and 2.**
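
For orientation, parts 3 and 4 (not run in this notebook) amount to a grouped mean that is written back onto every comment of the pair. The following is a minimal sketch on made-up data, assuming a user pair is identified by ('author', 'parent_comment_author') and using 'insult_prob' as the example score; the column name 'insult_prob_pair_mean' is hypothetical.

```python
import pandas as pd

# Toy comments; column names mirror the notebook, values are made up.
toy = pd.DataFrame({
    "author": ["a", "a", "b", "c"],
    "parent_comment_author": ["b", "b", "a", "a"],
    "insult_prob": [0.10, 0.30, 0.20, 0.40],
})

pair_cols = ["author", "parent_comment_author"]

# Part 3: one average per unique (author, parent author) pair.
pair_means = toy.groupby(pair_cols)["insult_prob"].mean()

# Part 4: broadcast each pair's average back onto its comments.
# 'insult_prob_pair_mean' is a hypothetical column name.
toy["insult_prob_pair_mean"] = (
    toy.groupby(pair_cols)["insult_prob"].transform("mean")
)
print(toy)
```

Using `transform` keeps the data at the comment level, so the row count does not change.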


---

OUTPUT FILE:
1. 'data_result_askscience_subreddits_aggressive_language.csv': Contains the threat, insult, and toxicity scores for each comment<br>


Additional details about the data:<br>
The unprocessed askscience data has 26605 comments.<br>
13270 comments remain after removing rows whose author is '[deleted]' or whose body is '[removed]'.<br>
There are 7478 user pairs for which there is a network similarity.<br>
There are 7478 user pairs for which there is a cultural similarity.<br>
Thus there should be 7478 unique user pairs.<br>
Thus 7952 comments have a network similarity, a cultural similarity, and a parent comment. They cover 7478 unique user pairs, so 474 comments repeat an existing user pair (474/7952, about 6.0 percent repeats).


References for Detoxify:<br>
https://www.kaggle.com/code/renokan/toxic-comments-using-detoxify-model/notebook?scriptVersionId=87256021 <br>
https://github.com/unitaryai/detoxify <br>




---
# **Part 1: Reading the data**

In this section, I read the file 'data_similarity_askscience_subreddits.csv', the output of the Python notebook 'Comment_Level_Network_Cultural_Similarity_askscience.ipynb'. This file contains all the comments for which both network similarity and cultural similarity were computed.

---

Check if cuda is being used

In [1]:
import torch
if torch.cuda.is_available():
    device_name = torch.device("cuda")
else:
    device_name = torch.device('cpu')
print("Using {}.".format(device_name))

Using cuda.


Connect to drive

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


Read the file containing the comments for which network similarity, cultural similarity, and the parent author were all computed.

In [3]:
import pandas as pd
import numpy as np
data_similarity_askscience_subreddits = pd.read_csv('/content/gdrive/MyDrive/Colab Notebooks/askscience/data_similarity_askscience_subreddits.csv', low_memory=False, index_col = 0)
print(len(data_similarity_askscience_subreddits)) #length of data = 7952
print(len(pd.unique(data_similarity_askscience_subreddits['subreddit']))) #number of subreddits considered = 1
print(len(pd.unique(data_similarity_askscience_subreddits['id']))) #unique number of comments = 7952, the data is at the comment level
print(len(pd.unique(data_similarity_askscience_subreddits['parent_id']))) #number of parent nodes = 4612
print(len(pd.unique(data_similarity_askscience_subreddits['link_id']))) #number of submissions = 368
print(len(pd.unique(data_similarity_askscience_subreddits['author']))) #number of authors = 5002
print(len(data_similarity_askscience_subreddits.columns)) #number of columns = 15

7952
1
7952
4612
368
5002
15


In [4]:
data_similarity_askscience_subreddits.head(3)

Unnamed: 0,id,subreddit,body,author,score,gilded,created_utc,parent_id,link_id,retrieved_on,controversiality,is_submitter,network_similarity,cultural_similarity,parent_comment_author
0,iqker6l,askscience,No it does not imply that. “We don’t yet know”...,omniskeptic,2,0,1664582942,iqkee0k,xs73nx,1664960533,0,False,0.99421,0.318494,chop1n
3,iqkfl8j,askscience,Pasteurization works by heating (generally a l...,jeweledjuniper,11,0,1664583360,iqke0xc,xs1k1y,1664960508,0,False,0.947459,0.642043,feitingen
4,iqkfmj9,askscience,"It *absolutely* implies an expectation, even i...",chop1n,3,0,1664583378,iqker6l,xs73nx,1664960507,0,False,0.99421,0.421561,omniskeptic


In [5]:
d1 = data_similarity_askscience_subreddits[~data_similarity_askscience_subreddits['parent_comment_author'].isna()]
print(len(d1))

7952


Note: this means that 7952 comments have a cultural similarity, a network similarity, and a parent comment.

In [6]:
print(data_similarity_askscience_subreddits.columns)

Index(['id', 'subreddit', 'body', 'author', 'score', 'gilded', 'created_utc',
       'parent_id', 'link_id', 'retrieved_on', 'controversiality',
       'is_submitter', 'network_similarity', 'cultural_similarity',
       'parent_comment_author'],
      dtype='object')


confirm that the following columns do not have missing values

In [7]:
print(data_similarity_askscience_subreddits['network_similarity'].isna().sum())
print(data_similarity_askscience_subreddits['cultural_similarity'].isna().sum())
print(data_similarity_askscience_subreddits['parent_comment_author'].isna().sum())
print(data_similarity_askscience_subreddits['body'].isna().sum())

0
0
0
0


check the unique user-parent value counts

In [8]:
print(len(data_similarity_askscience_subreddits[['author', 'parent_comment_author']].value_counts()))

7478


These counts match the summary at the top of the notebook: of the 26605 unprocessed askscience comments, 13270 remain after removing rows whose author is '[deleted]' or whose body is '[removed]', and 7952 of those have a network similarity, a cultural similarity, and a parent comment. These 7952 comments cover 7478 unique user pairs, so 474 comments repeat an existing user pair (474/7952, about 6.0 percent repeats).
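
The repeat count follows directly from the two numbers printed above (comments minus unique pairs). A small sketch of that arithmetic, first on a made-up frame and then on the notebook's reported totals; the toy rows are illustrative only.

```python
import pandas as pd

# Made-up author / parent-author rows; only the column names match the notebook.
toy = pd.DataFrame({
    "author": ["a", "a", "b", "c", "a"],
    "parent_comment_author": ["b", "b", "a", "a", "c"],
})

pair_cols = ["author", "parent_comment_author"]
n_comments = len(toy)
n_unique_pairs = len(toy[pair_cols].drop_duplicates())
n_repeats = n_comments - n_unique_pairs   # comments that reuse an existing pair
repeat_share = n_repeats / n_comments
print(n_comments, n_unique_pairs, n_repeats, repeat_share)

# Same arithmetic with the totals reported in this notebook.
notebook_repeats = 7952 - 7478
notebook_share_pct = round(100 * notebook_repeats / 7952, 2)
print(notebook_repeats, notebook_share_pct)
```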

---
# **Part 2: Compute the 'aggressiveLanguage' variables for every comment**


---
Compute the threat, insult, and toxicity scores for every comment. They are stored in the columns 'threat_prob', 'insult_prob', and 'toxicity_prob'.

In [9]:
pip install detoxify

Collecting detoxify
  Downloading detoxify-0.5.1-py3-none-any.whl (12 kB)
Collecting transformers==4.22.1 (from detoxify)
  Downloading transformers-4.22.1-py3-none-any.whl (4.9 MB)
Collecting sentencepiece>=0.1.94 (from detoxify)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1 (from transformers==4.22.1->detoxify)
  Downloading tokenizers-0.12.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Installing collected packages: tokenizers, sentencepiece, transformers, detoxify
  Attempting uninstall: tokenizers
    Found 

In [10]:
from detoxify import Detoxify

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


Moving 0 files to the new cache system


0it [00:00, ?it/s]

In [11]:
detoxify_model = Detoxify(
    model_type='original',
    device='cuda'
)

Downloading: "https://github.com/unitaryai/detoxify/releases/download/v0.1-alpha/toxic_original-c1212f89.ckpt" to /root/.cache/torch/hub/checkpoints/toxic_original-c1212f89.ckpt
100%|██████████| 418M/418M [00:02<00:00, 193MB/s]



In [12]:
predicts_dict = detoxify_model.predict("sample sentence")
print(len(predicts_dict))
print(type(predicts_dict))
print(predicts_dict)
print(predicts_dict['threat'])

6
<class 'dict'>
{'toxicity': 0.0006331979, 'severe_toxicity': 0.00012220243, 'obscene': 0.0001922156, 'threat': 0.000112057016, 'insult': 0.00018101634, 'identity_attack': 0.00014015858}
0.000112057016


In [13]:
print(len(data_similarity_askscience_subreddits))
data_similarity_askscience_subreddits.head(3)

7952


Unnamed: 0,id,subreddit,body,author,score,gilded,created_utc,parent_id,link_id,retrieved_on,controversiality,is_submitter,network_similarity,cultural_similarity,parent_comment_author
0,iqker6l,askscience,No it does not imply that. “We don’t yet know”...,omniskeptic,2,0,1664582942,iqkee0k,xs73nx,1664960533,0,False,0.99421,0.318494,chop1n
3,iqkfl8j,askscience,Pasteurization works by heating (generally a l...,jeweledjuniper,11,0,1664583360,iqke0xc,xs1k1y,1664960508,0,False,0.947459,0.642043,feitingen
4,iqkfmj9,askscience,"It *absolutely* implies an expectation, even i...",chop1n,3,0,1664583378,iqker6l,xs73nx,1664960507,0,False,0.99421,0.421561,omniskeptic


check that no row has a missing body

In [14]:
data_similarity_askscience_subreddits['body'].isnull().sum()

0

In [15]:
def aggressiveLanguage_mapper(input_data):
  """Score every comment body with Detoxify and store the insult, toxicity, and threat probabilities."""

  input_data['insult_prob'] = np.nan
  input_data['toxicity_prob'] = np.nan
  input_data['threat_prob'] = np.nan

  # no comment is skipped in this version, so this counter stays at 0
  ignore_comments_counter = 0
  j = 0

  for ind, row in input_data.iterrows():
    j += 1
    if j % 1000 == 0:
      print('finished comment '+str(j)+'/'+str(len(input_data)))

    # no subreddit id is needed here as there is only one subreddit, askscience
    curr_comment_body = row['body']

    # check the aggressive language probabilities in the comment body
    predicts_dict = detoxify_model.predict(curr_comment_body)
    input_data.at[ind,'insult_prob'] = predicts_dict["insult"]
    input_data.at[ind,'toxicity_prob'] = predicts_dict["toxicity"]
    input_data.at[ind,'threat_prob'] = predicts_dict["threat"]

  print('total number of comments ignored: ' +str(ignore_comments_counter))
  return input_data
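
The loop above calls the model once per comment. A possible alternative is to score the bodies in batches. The sketch below assumes that the prediction function, like Detoxify's `predict` in its README example, accepts a list of strings and returns a dict mapping each label to a list of scores. The names `score_in_batches`, `predict_fn`, `batch_size`, and the stand-in `fake_predict` are hypothetical, and the stand-in is used only so the sketch runs without downloading the model.

```python
import numpy as np
import pandas as pd

def score_in_batches(input_data, predict_fn, batch_size=32):
    """Return a copy of input_data with insult/toxicity/threat columns added.

    predict_fn is assumed to take a list of strings and return a dict
    mapping label -> list of scores (one score per input string).
    """
    bodies = input_data["body"].astype(str).tolist()
    scores = {"insult": [], "toxicity": [], "threat": []}
    for start in range(0, len(bodies), batch_size):
        batch = bodies[start:start + batch_size]
        predicted = predict_fn(batch)
        for label in scores:
            scores[label].extend(predicted[label])
    out = input_data.copy()
    for label, values in scores.items():
        # Positional assignment, so a non-default index is preserved.
        out[label + "_prob"] = np.asarray(values, dtype=float)
    return out

# Stand-in for the real model: returns length-based fake scores per label.
def fake_predict(texts):
    labels = ("toxicity", "severe_toxicity", "obscene",
              "threat", "insult", "identity_attack")
    return {label: [len(t) / 100.0 for t in texts] for label in labels}

toy = pd.DataFrame({"body": ["abc", "hello", "x"]}, index=[0, 3, 4])
scored = score_in_batches(toy, fake_predict, batch_size=2)
print(scored)
```

With the model loaded, the call would look like `score_in_batches(data_similarity_askscience_subreddits, detoxify_model.predict)`. Unlike the loop above, this sketch returns a new frame instead of modifying its input, and it prints no progress messages.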


In [16]:
data_similarity_askscience_subreddits = aggressiveLanguage_mapper(data_similarity_askscience_subreddits)
print(len(data_similarity_askscience_subreddits)) #length of data = 7952
print(len(pd.unique(data_similarity_askscience_subreddits['subreddit']))) #number of subreddits considered = 1
print(len(pd.unique(data_similarity_askscience_subreddits['id']))) #unique number of comments = 7952, the data is at the comment level
print(len(pd.unique(data_similarity_askscience_subreddits['parent_id']))) #number of parent nodes = 4612
print(len(pd.unique(data_similarity_askscience_subreddits['link_id']))) #number of submissions = 368
print(len(data_similarity_askscience_subreddits[['author', 'parent_comment_author']].value_counts())) #number of unique speaker-receiver pairs = 7478
print(len(data_similarity_askscience_subreddits.columns)) #number of columns = 18 (15 original + 3 scores)

finished comment 1000/7952
finished comment 2000/7952
finished comment 3000/7952
finished comment 4000/7952
finished comment 5000/7952
finished comment 6000/7952
finished comment 7000/7952
total number of comments ignored: 0
7952
1
7952
4612
368
7478
18


In [17]:
print(len(pd.unique(data_similarity_askscience_subreddits['author']))) #number of authors = 5002
data_similarity_askscience_subreddits.head(3)

5002


Unnamed: 0,id,subreddit,body,author,score,gilded,created_utc,parent_id,link_id,retrieved_on,controversiality,is_submitter,network_similarity,cultural_similarity,parent_comment_author,insult_prob,toxicity_prob,threat_prob
0,iqker6l,askscience,No it does not imply that. “We don’t yet know”...,omniskeptic,2,0,1664582942,iqkee0k,xs73nx,1664960533,0,False,0.99421,0.318494,chop1n,0.000171,0.000725,0.000118
3,iqkfl8j,askscience,Pasteurization works by heating (generally a l...,jeweledjuniper,11,0,1664583360,iqke0xc,xs1k1y,1664960508,0,False,0.947459,0.642043,feitingen,0.000198,0.000734,0.000142
4,iqkfmj9,askscience,"It *absolutely* implies an expectation, even i...",chop1n,3,0,1664583378,iqker6l,xs73nx,1664960507,0,False,0.99421,0.421561,omniskeptic,0.000175,0.000864,0.000121


Confirm that there are no missing values in the 'insult_prob', 'threat_prob', and 'toxicity_prob' columns.

In [18]:
print(data_similarity_askscience_subreddits['insult_prob'].isnull().sum())
print(data_similarity_askscience_subreddits['threat_prob'].isnull().sum())
print(data_similarity_askscience_subreddits['toxicity_prob'].isnull().sum())

0
0
0


In [19]:
data_similarity_askscience_subreddits.loc[data_similarity_askscience_subreddits.author == '[deleted]', 'author'].count()

0

This is just a check that all comments by '[deleted]' authors were removed at the beginning.

In [20]:
data_similarity_askscience_subreddits.head(3)

Unnamed: 0,id,subreddit,body,author,score,gilded,created_utc,parent_id,link_id,retrieved_on,controversiality,is_submitter,network_similarity,cultural_similarity,parent_comment_author,insult_prob,toxicity_prob,threat_prob
0,iqker6l,askscience,No it does not imply that. “We don’t yet know”...,omniskeptic,2,0,1664582942,iqkee0k,xs73nx,1664960533,0,False,0.99421,0.318494,chop1n,0.000171,0.000725,0.000118
3,iqkfl8j,askscience,Pasteurization works by heating (generally a l...,jeweledjuniper,11,0,1664583360,iqke0xc,xs1k1y,1664960508,0,False,0.947459,0.642043,feitingen,0.000198,0.000734,0.000142
4,iqkfmj9,askscience,"It *absolutely* implies an expectation, even i...",chop1n,3,0,1664583378,iqker6l,xs73nx,1664960507,0,False,0.99421,0.421561,omniskeptic,0.000175,0.000864,0.000121


find number of unique values of the 'created_utc'

In [21]:
print(len(pd.unique(data_similarity_askscience_subreddits['created_utc'])))

7927


In [22]:
print(len(pd.unique(data_similarity_askscience_subreddits['retrieved_on'])))

7647


code snippet to convert 'created_utc' timestamp to a readable form

In [24]:
from datetime import datetime
ts = 1664960502
utcDate = datetime.utcfromtimestamp(ts)
print(utcDate.strftime('%Y-%m-%d %H:%M:%S'))
print(utcDate.strftime('%H:%M:%S'))
print(utcDate.strftime('%Y-%m-%d'))
print(utcDate.strftime('%Y-%m-%d:%H'))

2022-10-05 09:01:42
09:01:42
2022-10-05
2022-10-05:09


Double check that no row has a missing 'created_utc' timestamp.

In [None]:
print(data_similarity_askscience_subreddits['created_utc'].isnull().sum())

0


In [39]:
def assign_comment_timestamp(input_data):
  """Add UTC date/time string columns derived from 'created_utc' (Unix epoch seconds)."""

  # initialise with None (object dtype) rather than np.nan: these columns hold
  # strings, and recent pandas versions warn when strings are written into a float column
  for col in ['date_time', 'date', 'time', 'date_hour', 'date_hour_min']:
    input_data[col] = None
  j = 0

  for ind, row in input_data.iterrows():

    j += 1
    if j % 1000 == 0:
      print('finished comment '+str(j)+'/'+str(len(input_data)))

    ts = row['created_utc']
    utcDate = datetime.utcfromtimestamp(ts)
    input_data.at[ind,'date_time'] = utcDate.strftime('%Y-%m-%d_%H:%M:%S')
    input_data.at[ind,'date'] = utcDate.strftime('%Y-%m-%d')
    input_data.at[ind,'time'] = utcDate.strftime('%H:%M:%S')
    input_data.at[ind,'date_hour'] = utcDate.strftime('%Y-%m-%d_%H')
    input_data.at[ind,'date_hour_min'] = utcDate.strftime('%Y-%m-%d_%H:%M')

  return input_data
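
The same five columns can also be produced without a row loop. This is a minimal sketch, assuming 'created_utc' holds Unix epoch seconds as in this data; the function name `assign_comment_timestamp_vectorized` is hypothetical, and the two toy timestamps are the first two 'created_utc' values shown in the table above.

```python
import pandas as pd

def assign_comment_timestamp_vectorized(input_data):
    """Return a copy with the five UTC string columns added, no row loop."""
    out = input_data.copy()
    # Parse epoch seconds as timezone-aware UTC timestamps.
    stamps = pd.to_datetime(out["created_utc"], unit="s", utc=True)
    out["date_time"] = stamps.dt.strftime("%Y-%m-%d_%H:%M:%S")
    out["date"] = stamps.dt.strftime("%Y-%m-%d")
    out["time"] = stamps.dt.strftime("%H:%M:%S")
    out["date_hour"] = stamps.dt.strftime("%Y-%m-%d_%H")
    out["date_hour_min"] = stamps.dt.strftime("%Y-%m-%d_%H:%M")
    return out

toy = pd.DataFrame({"created_utc": [1664582942, 1664583360]}, index=[0, 3])
stamped = assign_comment_timestamp_vectorized(toy)
print(stamped)
```

This version returns a new frame and prints no progress messages, so it is not a drop-in replacement for the cell above.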



In [45]:
data_similarity_askscience_subreddits = assign_comment_timestamp(data_similarity_askscience_subreddits)
print(len(data_similarity_askscience_subreddits)) #length of data = 7952
print(len(pd.unique(data_similarity_askscience_subreddits['subreddit']))) #number of subreddits considered = 1
print(len(pd.unique(data_similarity_askscience_subreddits['id']))) #unique number of comments = 7952, the data is at the comment level
print(len(pd.unique(data_similarity_askscience_subreddits['author']))) #number of authors = 5002
print(len(pd.unique(data_similarity_askscience_subreddits['parent_id']))) #number of parent nodes = 4612
print(len(pd.unique(data_similarity_askscience_subreddits['link_id']))) #number of submissions = 368
print(len(data_similarity_askscience_subreddits[['author', 'parent_comment_author']].value_counts())) #number of unique speaker-receiver pairs = 7478
print(len(data_similarity_askscience_subreddits.columns)) #number of columns = 23 (18 + 5 timestamp columns)

finished comment 1000/7952
finished comment 2000/7952
finished comment 3000/7952
finished comment 4000/7952
finished comment 5000/7952
finished comment 6000/7952
finished comment 7000/7952
7952
1
7952
5002
4612
368
7478
23


In [46]:
data_similarity_askscience_subreddits.head(30)

Unnamed: 0,id,subreddit,body,author,score,gilded,created_utc,parent_id,link_id,retrieved_on,...,cultural_similarity,parent_comment_author,insult_prob,toxicity_prob,threat_prob,date_time,date,time,date_hour,date_hour_min
0,iqker6l,askscience,No it does not imply that. “We don’t yet know”...,omniskeptic,2,0,1664582942,iqkee0k,xs73nx,1664960533,...,0.318494,chop1n,0.000171,0.000725,0.000118,2022-10-01_00:09:02,2022-10-01,00:09:02,2022-10-01_00,2022-10-01_00:09
3,iqkfl8j,askscience,Pasteurization works by heating (generally a l...,jeweledjuniper,11,0,1664583360,iqke0xc,xs1k1y,1664960508,...,0.642043,feitingen,0.000198,0.000734,0.000142,2022-10-01_00:16:00,2022-10-01,00:16:00,2022-10-01_00,2022-10-01_00:16
4,iqkfmj9,askscience,"It *absolutely* implies an expectation, even i...",chop1n,3,0,1664583378,iqker6l,xs73nx,1664960507,...,0.421561,omniskeptic,0.000175,0.000864,0.000121,2022-10-01_00:16:18,2022-10-01,00:16:18,2022-10-01_00,2022-10-01_00:16
38,iqkrd5j,askscience,Thats also what I remember. There was speculat...,greese007,2,0,1664589335,iqklvbl,xs4rhf,1664960145,...,0.20336,blscratch,0.000195,0.000915,0.000128,2022-10-01_01:55:35,2022-10-01,01:55:35,2022-10-01_01,2022-10-01_01:55
39,iqkre4b,askscience,"Not sure if you’re writing only about insects,...",viciousfishous08,34,0,1664589349,iqke7g3,xs9pjy,1664960143,...,0.032065,thelogicalghost,0.000226,0.00161,0.000116,2022-10-01_01:55:49,2022-10-01,01:55:49,2022-10-01_01,2022-10-01_01:55
52,iqkt0ym,askscience,"Ahaha, so, short version is, its a scifi story...",thelogicalghost,34,0,1664590185,iqkre4b,xs9pjy,1664960094,...,0.236366,viciousfishous08,0.00018,0.000896,0.000108,2022-10-01_02:09:45,2022-10-01,02:09:45,2022-10-01_02,2022-10-01_02:09
56,iqkww4r,askscience,Also your hearing recognizes the tones as cert...,tin_man6328,4,0,1664592247,iqkfsgy,xs73nx,1664959974,...,0.430613,moewind420,0.000179,0.000659,0.000124,2022-10-01_02:44:07,2022-10-01,02:44:07,2022-10-01_02,2022-10-01_02:44
67,iqkyg8o,askscience,This is why you feel blinded when youre drivin...,yeswehavenotomatoes,34,0,1664593112,iqke5ya,xs73nx,1664959927,...,0.239771,balazer,0.001518,0.027269,0.0004,2022-10-01_02:58:32,2022-10-01,02:58:32,2022-10-01_02,2022-10-01_02:58
76,iqkz2yw,askscience,this sounds very interesting! would love to se...,mib_sum1ls,13,0,1664593459,iqkt0ym,xs9pjy,1664959907,...,0.310097,thelogicalghost,0.000253,0.001617,0.000111,2022-10-01_03:04:19,2022-10-01,03:04:19,2022-10-01_03,2022-10-01_03:04
86,iql02a4,askscience,Give me multiverse version of Starship Trooper...,glomgore,4,0,1664594017,iqkz2yw,xs9pjy,1664959877,...,0.320632,mib_sum1ls,0.000183,0.000962,0.000116,2022-10-01_03:13:37,2022-10-01,03:13:37,2022-10-01_03,2022-10-01_03:13


Learn more about the time period of the data. We can see that it is a one-month snapshot (October 2022). Since 'date' has 31 distinct values, it might be a good variable for the fixed effect. Alternatively, at the hourly level ('date_hour') there are 720 distinct values.
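
One common way to use 'date' as a fixed effect is to encode it as indicator columns, one per day, with one day left out as the baseline. The sketch below shows this on made-up rows; it is only an illustration of the encoding, not the model used in this project, and the name `design` is hypothetical.

```python
import pandas as pd

# Made-up rows; only the column names match the notebook.
toy = pd.DataFrame({
    "date": ["2022-10-01", "2022-10-01", "2022-10-02", "2022-10-03"],
    "toxicity_prob": [0.001, 0.002, 0.027, 0.001],
})

# One indicator column per day; drop_first keeps the first day as baseline.
date_dummies = pd.get_dummies(toy["date"], prefix="date", drop_first=True)
design = pd.concat([toy[["toxicity_prob"]], date_dummies], axis=1)
print(design.columns.tolist())
```

With 31 distinct dates this encoding adds 30 columns; with 720 distinct 'date_hour' values it would add 719.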

In [47]:
print(len(pd.unique(data_similarity_askscience_subreddits['date_time'])))
print(pd.unique(data_similarity_askscience_subreddits['date_time']))
print(len(pd.unique(data_similarity_askscience_subreddits['date'])))
print(pd.unique(data_similarity_askscience_subreddits['date']))
print(len(pd.unique(data_similarity_askscience_subreddits['time'])))
print(pd.unique(data_similarity_askscience_subreddits['time']))
print(len(pd.unique(data_similarity_askscience_subreddits['date_hour'])))
print(pd.unique(data_similarity_askscience_subreddits['date_hour']))
print(len(pd.unique(data_similarity_askscience_subreddits['date_hour_min'])))
print(pd.unique(data_similarity_askscience_subreddits['date_hour_min']))

7927
['2022-10-01_00:09:02' '2022-10-01_00:16:00' '2022-10-01_00:16:18' ...
 '2022-10-31_23:25:52' '2022-10-31_23:50:29' '2022-10-31_23:42:20']
31
['2022-10-01' '2022-10-30' '2022-10-29' '2022-10-02' '2022-10-03'
 '2022-10-04' '2022-10-05' '2022-10-06' '2022-10-31' '2022-10-07'
 '2022-10-08' '2022-10-12' '2022-10-09' '2022-10-10' '2022-10-11'
 '2022-10-13' '2022-10-14' '2022-10-15' '2022-10-16' '2022-10-17'
 '2022-10-18' '2022-10-19' '2022-10-20' '2022-10-21' '2022-10-22'
 '2022-10-23' '2022-10-24' '2022-10-25' '2022-10-26' '2022-10-27'
 '2022-10-28']
7566
['00:09:02' '00:16:00' '00:16:18' ... '23:25:52' '23:50:29' '23:42:20']
720
['2022-10-01_00' '2022-10-01_01' '2022-10-01_02' '2022-10-01_03'
 '2022-10-01_04' '2022-10-01_05' '2022-10-01_07' '2022-10-01_08'
 '2022-10-01_09' '2022-10-01_10' '2022-10-01_11' '2022-10-01_12'
 '2022-10-30_15' '2022-10-01_14' '2022-10-01_15' '2022-10-29_18'
 '2022-10-01_16' '2022-10-01_17' '2022-10-01_18' '2022-10-01_19'
 '2022-10-01_20' '2022-10-01_21' '20

In [48]:
data_similarity_askscience_subreddits.to_csv('/content/gdrive/MyDrive/Colab Notebooks/askscience/data_result_askscience_subreddits_aggressive_language.csv')

ignore from here

In [None]:
import pandas as pd
data_similarity_askscience_subreddits = pd.read_csv('/content/gdrive/MyDrive/Colab Notebooks/askscience/data_result_askscience_subreddits_aggressive_language.csv',encoding_errors='ignore', index_col=0)

In [None]:
import pandas as pd
d = pd.read_json('/content/gdrive/MyDrive/Colab Notebooks/askscience/askcience_processed_1.ndjson',encoding_errors='ignore')

In [None]:
d.head(10)

Unnamed: 0,id,subreddit,body,author,score,gilded,created_utc,parent_id,link_id,retrieved_on,controversiality,is_submitter
0,iqker6l,askscience,No it does not imply that. “We don’t yet know”...,omniskeptic,2,0,1664582942,iqkee0k,xs73nx,1664960533,0,False
1,iqkewq0,askscience,while insect muscle might be similar to ours s...,regular_modern_girl,452,0,1664583016,iqjssf5,xs9pjy,1664960528,0,False
2,iqkfdmz,askscience,[removed],[deleted],1,0,1664583252,iqkb49u,xs9pjy,1664960514,0,False
3,iqkfl8j,askscience,Pasteurization works by heating (generally a l...,jeweledjuniper,11,0,1664583360,iqke0xc,xs1k1y,1664960508,0,False
4,iqkfmj9,askscience,"It *absolutely* implies an expectation, even i...",chop1n,3,0,1664583378,iqker6l,xs73nx,1664960507,0,False
5,iqkfrm5,askscience,"PhD in yeast genetics here, so I’ve streaked t...",smallwhitedog,4,0,1664583450,xs1k1y,xs1k1y,1664960502,0,False
6,iqkfsgy,askscience,Others have given great reasons for why our si...,moewind420,6,0,1664583462,xs73nx,xs73nx,1664960501,0,False
7,iqkft4v,askscience,[removed],[deleted],1,0,1664583472,xs1k1y,xs1k1y,1664960501,0,False
8,iqkfzvn,askscience,Inside a living human isn’t lightless dark. Li...,sovietamerican,7,0,1664583564,iqk1nsq,xs4rhf,1664960495,0,False
9,iqkg3r0,askscience,Wordy is good. your explanation is helping me...,tonytoews,2,0,1664583615,iqk8u6o,xs73nx,1664960492,0,False


In [None]:
print(d['retrieved_on'].unique())

[1664960533 1664960528 1664960514 ... 1667844629 1667844624 1667844621]


In [None]:
print(len(d['retrieved_on'].unique()))

23480
