Racism in public education has been a problem in the United States for years, with marginalized communities receiving unequal access to quality education. To understand the issue better, this [post](https://www.reddit.com/r/politics/comments/11wh15i/not_just_florida_the_entire_gop_is_waging_a/) with 1,376 comments was chosen. This script shows the web scraping that was performed on the post.

In [None]:
#import packages
!pip install praw
import requests
import pandas as pd
import praw



In [None]:
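# Pass Reddit API credentials to PRAW to allow scraping.
# Credentials are read from environment variables rather than hardcoded in the notebook;
# the variable names REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET are placeholders.
import os
reddit = praw.Reddit(client_id=os.environ['REDDIT_CLIENT_ID'], client_secret=os.environ['REDDIT_CLIENT_SECRET'], user_agent='WebScrappingAndSentimentAnalysis')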

In [None]:
#Get the submission for the URL and check the post has more than 1000 comments
PostUrl = 'https://www.reddit.com/r/politics/comments/11wh15i/not_just_florida_the_entire_gop_is_waging_a/'
SocialIssue = reddit.submission(url=PostUrl)
SocialIssue.num_comments 

It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.



1376

In [None]:
# Get comments (including nested comments) from the post
# Create an empty list
SocialIssueComments = []  
# Initiate a "for" loop to obtain desired properties and add them to the empty list
SocialIssue.comments.replace_more(limit = None)
for comment in SocialIssue.comments.list():
  SocialIssueComments.append([comment.body, comment.id, comment.score, comment.created]) 
# Create a dataframe with the comments and desired properties
SocialIssuesCommentsDF = pd.DataFrame(SocialIssueComments, columns=['Body','ID', 'Score','Date Created'])
print(SocialIssuesCommentsDF)

                                                   Body       ID  Score  \
0     \nAs a reminder, this subreddit [is for civil ...  jcxulbs      1   
1     They've been waging a war against Public Educa...  jcxwnz4   2442   
2     >Earlier this month, the Washington Post repor...  jcy2u23   1039   
3     An educated electorate is the GOP's worst poli...  jcxumpx   1856   
4     Who is coordinating this? My guess is ALEC. Am...  jcxx1jj    330   
...                                                 ...      ...    ...   
1242                                          [deleted]  jd9yiiq      1   
1243  >Yes.\n\nLmao \n\n>Idk anything about any Saud...  jdfus47      0   
1244  Fact check the post. Or just continue vaguely ...  jd9yvsv      1   
1245  > You skipped right past Saudi Arabia, a terro...  jdjwjdi      0   
1246  >Addressed right there. You even quoted it bac...  jdk31q7      0   

      Date Created  
0     1.679314e+09  
1     1.679315e+09  
2     1.679318e+09  
3     1.679314e
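
The row-building step in the scraping loop can also be factored into a small helper, which makes it possible to check the DataFrame layout without network access. This is only a sketch: `comment_to_row`, `comments_to_frame`, and the stand-in comment objects are hypothetical names and made-up values, and it assumes the `created_utc` attribute that PRAW exposes on comments rather than the `created` attribute used in the cell above.

```python
from types import SimpleNamespace

import pandas as pd

COLUMNS = ["Body", "ID", "Score", "Date Created"]


def comment_to_row(comment):
    """Return the four fields kept for each comment (hypothetical helper)."""
    # Assumption: created_utc is used here in place of comment.created.
    return [comment.body, comment.id, comment.score, comment.created_utc]


def comments_to_frame(comments):
    """Build a DataFrame from any iterable of comment-like objects."""
    rows = [comment_to_row(c) for c in comments]
    return pd.DataFrame(rows, columns=COLUMNS)


# Stand-in objects with made-up values so the sketch runs offline.
fake_comments = [
    SimpleNamespace(body="first", id="aaa", score=3, created_utc=1679314000.0),
    SimpleNamespace(body="[deleted]", id="bbb", score=1, created_utc=1679315000.0),
]

demo_df = comments_to_frame(fake_comments)
print(demo_df.shape)
```

With a real submission, the same helper could be called as `comments_to_frame(SocialIssue.comments.list())` after `replace_more` has run.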

In [None]:
SocialIssuesCommentsDF.shape

(1247, 5)

Of the 1,376 comments reported for the post, only 1,247 were scraped, which is still sufficient for analysis.

Now that the comments have been scraped and stored in a dataset with other relevant properties, we see that the date created appears as a raw timestamp in seconds rather than a readable date, so it needs to be converted.

# **Changing the Date Created Into an Understandable Format**

In [None]:
# Convert the "Date Created" column to a datetime column.
# pd.to_datetime() with unit='s' treats the values as timestamps in seconds.
SocialIssuesCommentsDF['Date/Time'] = pd.to_datetime(SocialIssuesCommentsDF['Date Created'], unit='s')
# Extract the date component from the "Date/Time" column and store it in a new "Date" column.
# The "dt" accessor exposes the datetime properties of the column, and
# strftime() formats each value as a string in the form '%Y-%m-%d'.
SocialIssuesCommentsDF['Date'] = SocialIssuesCommentsDF['Date/Time'].dt.strftime('%Y-%m-%d')
SocialIssuesCommentsDF.head()

Unnamed: 0,Body,ID,Score,Date Created,Date/Time,Date
0,"\nAs a reminder, this subreddit [is for civil ...",jcxulbs,1,1679314000.0,2023-03-20 12:03:02,2023-03-20
1,They've been waging a war against Public Educa...,jcxwnz4,2442,1679315000.0,2023-03-20 12:23:28,2023-03-20
2,">Earlier this month, the Washington Post repor...",jcy2u23,1039,1679318000.0,2023-03-20 13:18:11,2023-03-20
3,An educated electorate is the GOP's worst poli...,jcxumpx,1856,1679314000.0,2023-03-20 12:03:26,2023-03-20
4,Who is coordinating this? My guess is ALEC. Am...,jcxx1jj,330,1679315000.0,2023-03-20 12:27:15,2023-03-20
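
As a sanity check on the conversion, a single timestamp can be converted on its own. The sketch below uses an illustrative value of the same magnitude as the first rows (not an exact value from the dataset) and also shows the optional `utc=True` argument, which makes the result timezone-aware instead of naive.

```python
import pandas as pd

# Illustrative timestamp in seconds, similar in size to the rows above.
epoch_seconds = 1679314000

# Without utc=True the result is a naive Timestamp (no timezone attached).
naive = pd.to_datetime(epoch_seconds, unit="s")

# With utc=True the same instant is labelled explicitly as UTC.
aware = pd.to_datetime(epoch_seconds, unit="s", utc=True)

print(naive)
print(aware)
print(aware.strftime("%Y-%m-%d"))
```

Keeping the timezone explicit can help later if comments are grouped by local day rather than UTC day.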


Now that all our data is readable, we need to ensure that the dataset contains no repeated comments, as these would skew the analysis.

# **Removing Duplicate Rows**

In [None]:
# Remove rows whose comment text duplicates another row, keeping the last occurrence.
# drop_duplicates() returns a new DataFrame, so the original DataFrame is left unchanged.
SocialIssuesCommentsDFNoDuplicates = SocialIssuesCommentsDF.drop_duplicates(subset='Body', keep='last')
SocialIssuesCommentsDFNoDuplicates.shape

(1217, 6)

1,217 comments remain after removing the duplicates.
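
Before dropping rows, it can help to see which comment texts actually repeat. The sketch below uses a small made-up DataFrame (not the scraped data) to show how `value_counts()` surfaces repeated `Body` values and which rows `drop_duplicates(subset='Body', keep='last')` retains. Note that placeholder text shared by many comments, such as `[deleted]`, collapses to a single row under this rule.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Body": ["[deleted]", "Agreed.", "[deleted]", "Agreed.", "A unique reply"],
    "ID": ["a1", "a2", "a3", "a4", "a5"],
})

# Count how often each comment text appears and keep only the repeated ones.
body_counts = toy["Body"].value_counts()
repeated = body_counts[body_counts > 1]

# keep='last' retains the final occurrence of each repeated text.
deduped = toy.drop_duplicates(subset="Body", keep="last")

print(repeated)
print(deduped)
```

If two different users posting the same short reply should both count, deduplicating on `ID` instead of `Body` would be the alternative to consider.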

# **Saving File As CSV**

In [None]:
SocialIssuesCommentsDFNoDuplicates.to_csv('SocialIssue.csv', index=True, header=True)

Now that the file has been saved as a CSV, it can be used for sentiment analysis.
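
When the CSV is loaded again, the saved index comes back as an extra `Unnamed: 0` column unless it is read as the index. The sketch below demonstrates this on a small made-up DataFrame written to a temporary file; the file name `sample.csv` and the sample values are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up data with a non-contiguous index, as left behind by dropping rows.
sample = pd.DataFrame(
    {
        "Body": ["a", "b"],
        "Date/Time": pd.to_datetime([1679314000, 1679315000], unit="s"),
    },
    index=[1, 3],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    sample.to_csv(path, index=True, header=True)

    # A plain read turns the saved index into a column named "Unnamed: 0".
    plain = pd.read_csv(path)

    # index_col=0 restores the index; parse_dates restores the datetime dtype.
    restored = pd.read_csv(path, index_col=0, parse_dates=["Date/Time"])

print(plain.columns.tolist())
print(restored.dtypes)
```

The same `index_col=0` and `parse_dates` arguments should apply when reading `SocialIssue.csv` in the sentiment analysis step.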