# 1. Reddit Scraping

1.1 Import of all relevant libraries

In [1]:
!pip install psaw



In [2]:
import pandas as pd
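# Use the full option names; the short form 'max_columns' is ambiguous in recent pandas versions.
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.max_columns', 50)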


In [3]:
from psaw import PushshiftAPI

# Initialize PushShift
api = PushshiftAPI()

In [4]:
import datetime as dt

1.2 To limit the collection to comments that are most likely written by Chileans themselves, we scrape the subreddit 'chile'.

In [5]:
api_request_generator = api.search_submissions(subreddit='chile')


1.3 Next, we filter comments from the subreddit 'chile' from 16 November up to 24 December 2021 on the keywords [ellección, ellecion, Kast, Boric, Gabriel, José]. The query in this notebook covers the Kast-related keywords (José, Kast).

In [23]:
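# Collection window as Unix timestamps (naive datetimes are interpreted in local time).
start_epoch_1 = int(dt.datetime(2021, 11, 16).timestamp())
end_epoch_1 = int(dt.datetime(2021, 12, 24).timestamp())

# Search comments in the subreddit 'chile' that match 'José' or 'Kast' within the window.
api_request_generator = api.search_comments(q='(José)|(Kast)',
                                            subreddit='chile',
                                            after=start_epoch_1, before=end_epoch_1)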
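
Because `dt.datetime(2021, 11, 16).timestamp()` treats the naive datetime as local time, the window shifts with the time zone of the machine. The comments dated 2021-11-15 23:0x UTC in the final output of this notebook are consistent with a machine running at UTC+1. A minimal sketch of a time-zone-independent window, assuming the window is meant to be in UTC (the `_utc` names are illustrative and not part of the original notebook):

```python
import datetime as dt

# Assumption: the collection window is defined in UTC rather than in local time.
start_epoch_utc = int(dt.datetime(2021, 11, 16, tzinfo=dt.timezone.utc).timestamp())
end_epoch_utc = int(dt.datetime(2021, 12, 24, tzinfo=dt.timezone.utc).timestamp())

# The window spans 38 full days.
print(start_epoch_utc, end_epoch_utc, (end_epoch_utc - start_epoch_utc) // 86400)
```

These values could be passed as `after` and `before` in place of `start_epoch_1` and `end_epoch_1`.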

1.4 Next, we store the collected results in a dataframe called 'chile_comments'. 

In [24]:
chile_comments = pd.DataFrame([comment.d_ for comment in api_request_generator])



In [25]:
chile_comments.shape

(15188, 53)

1.5 We clean the dataframe by converting the 'created_utc' column into a new 'date' column in datetime64 format. We also remove any columns that we don't intend to use in our analysis.

In [26]:
chile_comments['date'] = pd.to_datetime(chile_comments['created_utc'], utc=True, unit='s')
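
To illustrate what this conversion does, here is a small sketch on two 'created_utc' values that appear in the collected comments. The frame name `sample` is only for illustration.

```python
import pandas as pd

# Two 'created_utc' values taken from the comments collected in this notebook.
sample = pd.DataFrame({"created_utc": [1640300103, 1637017706]})

# unit='s' reads the integers as seconds since the Unix epoch;
# utc=True makes the resulting timestamps timezone-aware (UTC).
sample["date"] = pd.to_datetime(sample["created_utc"], unit="s", utc=True)

print(sample)
```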

In [27]:
chile_comments.columns

Index(['all_awardings', 'archived', 'associated_award', 'author',
       'author_flair_background_color', 'author_flair_css_class',
       'author_flair_richtext', 'author_flair_template_id',
       'author_flair_text', 'author_flair_text_color', 'author_flair_type',
       'author_fullname', 'author_patreon_flair', 'author_premium', 'body',
       'body_sha1', 'can_gild', 'collapsed', 'collapsed_because_crowd_control',
       'collapsed_reason', 'collapsed_reason_code', 'comment_type',
       'controversiality', 'created_utc', 'distinguished', 'gilded',
       'gildings', 'id', 'is_submitter', 'link_id', 'locked', 'no_follow',
       'parent_id', 'permalink', 'retrieved_utc', 'score', 'score_hidden',
       'send_replies', 'stickied', 'subreddit', 'subreddit_id',
       'subreddit_name_prefixed', 'subreddit_type', 'top_awarded_type',
       'total_awards_received', 'treatment_tags', 'unrepliable_reason',
       'created', 'author_cakeday', 'media_metadata', 'awarders',
       'retriev

In [28]:
chile_comments = chile_comments.drop(columns=['all_awardings', 'archived', 'associated_award',
       'author_flair_background_color', 'author_flair_css_class',
       'author_flair_richtext', 'author_flair_template_id',
       'author_flair_text', 'author_flair_text_color', 'author_flair_type',
       'author_fullname', 'author_patreon_flair', 'author_premium',
       'body_sha1', 'can_gild', 'collapsed', 'collapsed_because_crowd_control',
       'collapsed_reason', 'collapsed_reason_code', 'comment_type',
       'controversiality', 'distinguished', 'gilded',
       'gildings', 'id', 'is_submitter', 'link_id', 'locked', 'no_follow',
       'parent_id', 'permalink', 'retrieved_utc', 'score', 'score_hidden',
       'send_replies', 'stickied', 'subreddit', 'subreddit_id',
       'subreddit_name_prefixed', 'subreddit_type', 'top_awarded_type',
       'total_awards_received', 'treatment_tags', 'unrepliable_reason',
       'created', 'author_cakeday', 'awarders', 'retrieved_on'])
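
An alternative to listing every column to drop is to list the few columns to keep. The sketch below shows the idea on a toy frame; `toy`, `keep` and `toy_small` are illustrative names, and the toy data is made up.

```python
import pandas as pd

# Toy frame standing in for the scraped comments; 'score' and 'id' represent unwanted columns.
toy = pd.DataFrame({
    "author": ["a", "b"],
    "body": ["hola", "chao"],
    "created_utc": [1640300103, 1637017706],
    "score": [1, 2],
    "id": ["x1", "x2"],
})

# The columns that remain after the drop step in this notebook.
keep = ["author", "body", "created_utc", "media_metadata", "edited", "date"]

# Keep only the wanted columns that are actually present, in the order given by `keep`.
toy_small = toy[[col for col in keep if col in toy.columns]]

print(toy_small.columns.tolist())
```

A keep-list is shorter and does not fail when the API returns a column that the drop-list does not mention, or omits one that it does.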

In [29]:
dictionary = chile_comments['body'].to_dict()
# Printing all 15188 comments exceeds the notebook's IOPub data rate limit,
# so we only preview the first few entries.
print(list(dictionary.items())[:5])



# 2. Sentiment Analysis. Source: https://pypi.org/project/sentiment-analysis-spanish/

2.1 The first step is to install and import the relevant libraries.

In [30]:
pip install sentiment-analysis-spanish

Note: you may need to restart the kernel to use updated packages.


In [31]:
pip install keras tensorflow

Note: you may need to restart the kernel to use updated packages.


In [32]:
from sentiment_analysis_spanish import sentiment_analysis

In [33]:
sentiment = sentiment_analysis.SentimentAnalysisSpanish()



2.2 We create a dictionary in which we store the sentiment score of each individual comment.

In [34]:
dictionary2 = {"Sentiment_score" : []}

In [35]:
# Score every comment body; only the values of the dictionary are needed.
for value in dictionary.values():
    dictionary2['Sentiment_score'].append(sentiment.sentiment(value))

print(dictionary2)

{'Sentiment_score': [0.49789225920557484, 5.41884560885998e-08, 2.5986532360540637e-27, 0.5150746189765654, 1.47988058045008e-55, 0.1670618913119924, 0.0278901690180084, 0.09704562195345641, 4.5052938659410544e-08, 1.0833453429486635e-12, 9.10729831938328e-30, 0.04900898792819789, 0.004457962567953445, 0.036643951695517686, 7.764887977738993e-11, 0.04328281596472852, 8.871545159444672e-06, 1.7165254247172467e-13, 0.3385252644667176, 4.902502850865687e-06, 6.333844553258356e-13, 2.6363273160703595e-07, 0.1325711227119672, 0.00012967867875418688, 1.311912443229272e-15, 1.899501153665078e-12, 0.00030124931006017964, 2.0496614705119323e-25, 0.0008482322187759382, 1.4678972860689571e-27, 1.220791169608325e-37, 2.0104665470514345e-26, 3.183744822710551e-31, 2.2853326349474405e-14, 1.0266098234581688e-17, 0.7996978960941313, 1.4834963209178015e-12, 6.912191884414632e-13, 2.8394847791694184e-18, 6.516672644339928e-05, 1.9106148025943914e-10, 0.01133216452803198, 1.3518029910823242e-12, 8.30406
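
The same scoring step can also be written directly on the 'body' column, which keeps the scores attached to the index of the comments. This is only a sketch: `score_comments` and `fake_scorer` are hypothetical helpers, and `fake_scorer` merely stands in for `sentiment.sentiment` so that the example runs without the model installed.

```python
import pandas as pd

def score_comments(bodies, scorer):
    """Apply `scorer` to every comment and return the scores as a Series
    that shares its index with `bodies`."""
    return bodies.map(scorer).rename("Sentiment_score")

# Hypothetical stand-in for the real model; in the notebook the scorer
# would be `sentiment.sentiment`.
def fake_scorer(text):
    return 1.0 if "bueno" in text else 0.0

bodies = pd.Series(["muy bueno", "muy malo"], index=[10, 11])
scores = score_comments(bodies, fake_scorer)

print(scores)
```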

2.3 Next, we create a dataframe from this dictionary and join it with the original dataframe 'chile_comments' on the row index.

In [36]:
dfSentiment_score = pd.DataFrame(dictionary2)
print(dfSentiment_score)

       Sentiment_score
0         4.978923e-01
1         5.418846e-08
2         2.598653e-27
3         5.150746e-01
4         1.479881e-55
...                ...
15183     9.514428e-04
15184     5.543139e-01
15185     2.111674e-02
15186     4.433163e-02
15187     1.587786e-02

[15188 rows x 1 columns]


In [37]:
chile_comments1 = chile_comments.join(dfSentiment_score)
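
Note that `join` matches rows on the index, so it only pairs each comment with its own score when both dataframes share the same index. A small sketch on toy data (all names and values here are made up for illustration):

```python
import pandas as pd

comments = pd.DataFrame({"body": ["uno", "dos", "tres"]})
scores = pd.DataFrame({"Sentiment_score": [0.1, 0.5, 0.9]})

# Both frames carry the default 0..n-1 index, so the join lines up row by row.
assert comments.index.equals(scores.index)
merged = comments.join(scores)

# After filtering, the left index has a gap (0, 2) while freshly built
# scores are indexed 0, 1: row 2 finds no partner and gets NaN.
filtered = comments[comments["body"] != "dos"]
misaligned = filtered.join(pd.DataFrame({"Sentiment_score": [0.1, 0.9]}))

print(merged)
print(misaligned)
```

In this notebook no rows are removed between scraping and scoring, so the indexes match.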

In [38]:
chile_comments1

Unnamed: 0,author,body,created_utc,media_metadata,edited,date,Sentiment_score
0,MariaJoseBlanchester,Oye.. y Kast??,1640300103,,,2021-12-23 22:55:03+00:00,4.978923e-01
1,ketoske,"ctm nooo!, que tellier se ponga full totalitarista pa que la carol cariola con la camila vallejos tengan que reunir las bolas de lenin, stalin, marx y 1 de trotsky para invocar al shenlong comunista y revivir a allende que seria el unico que puede salvarnos, siguiente temporada viendo que allende revivio los hermanos kast junto al negro piñera juntan las bolas de kissinger, nixon, smith y 1 de jaime guzman pa invocar a satanas y revivir a pinocho quien quiere puro arrancar del infierno pk ll...",1640299090,,,2021-12-23 22:38:10+00:00,5.418846e-08
2,JesusM3R,"Yo salí de chile poco antes del estallido. Y, si bien me gusta mucho mucho Chile, de fuera se ve que hay cosas que no funcionan. Me da mucha pena que la gente, por ejemplo, haya votado por Kast pensando que Boric es comunista… sin embargo, al menos en el primer mundo, el fascismo es el que está penado y condenado.\nDicho esto, a pesar de que hay cosas muy buenas en chile, instituciones que valen la pena y mucho progreso, en mi opinión, lo más terrible es el clasismo tan intrínseco en la soci...",1640297055,,,2021-12-23 22:04:15+00:00,2.598653e-27
3,way2menace,"*y es el shileno, el shileno, el shileno,* ***José Marcelo Salas***",1640296246,,,2021-12-23 21:50:46+00:00,5.150746e-01
4,Daigonik,"A diferencia de otras épocas ahora veo una gran ansía de participación. La gente no quiere elegir políticos y sentarse a esperar que hagan cambios, quieren colaborar con el proceso y hacer todo lo posible para que al gobierno le vaya bien.\n\nLa transición despolitizo a la ciudadanía, y el estallido la ha vuelto a politizar.\n\nSi Boric es capaz de canalizar esas ansias de ser incluidos en la política en cosas concretas, aunque no logre cumplir todo lo que prometió, siento que la gente no va...",1640295664,,,2021-12-23 21:41:04+00:00,1.479881e-55
...,...,...,...,...,...,...,...
15183,darthdeckard,Me imagino viendo el capitulo 1000 de One Piece y al terminar enterarme que JA Kast no pasó a segunda vuelta...,1637017706,,,2021-11-15 23:08:26+00:00,9.514428e-04
15184,pan_con_queso_gouda,"ZppingTV tiene una encuesta, Kast va ganando :O",1637017651,,,2021-11-15 23:07:31+00:00,5.543139e-01
15185,gattopardista,Espero que sichel le pegue a jose kast,1637017508,,,2021-11-15 23:05:08+00:00,2.111674e-02
15186,Strange_River_2747,cara de curado que tiene kast,1637017389,,,2021-11-15 23:03:09+00:00,4.433163e-02
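
As far as we understand the package description cited above, the score is the estimated probability that a text is positive, so values near 0 indicate negative comments and values near 1 positive ones. A minimal sketch of turning scores into labels, assuming cut-offs of 0.4 and 0.6 (these thresholds are our own illustrative choice, not part of the package or of this analysis):

```python
import pandas as pd

# Three scores taken from the sentiment output in section 2.2.
scores = pd.Series([0.49789225920557484, 5.41884560885998e-08, 0.7996978960941313])

# Hypothetical cut-offs: [0, 0.4] negative, (0.4, 0.6] neutral, (0.6, 1] positive.
labels = pd.cut(
    scores,
    bins=[0.0, 0.4, 0.6, 1.0],
    labels=["negative", "neutral", "positive"],
    include_lowest=True,  # make the first bin include a score of exactly 0
)

print(labels.tolist())
```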


2.4 We store the merged dataframe in a CSV file, which we will use for further analysis in our second Python document.

In [39]:
chile_comments1.to_csv("C:/Users/alexa/OneDrive/Documenten/Collecting Data & Tools and Methods/Chilean_election_sentiment_Kast.csv")
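
When the file is read back, the saved row index shows up as an extra 'Unnamed: 0' column unless it is left out on export. A small round-trip sketch on toy data, assuming a temporary directory as output location (the path and the `toy` frame are illustrative, not the path used in this project):

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({
    "author": ["a", "b"],
    "body": ["primer comentario", "segundo comentario"],
    "Sentiment_score": [0.5, 0.04],
})

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = Path(tempfile.mkdtemp()) / "Chilean_election_sentiment_Kast.csv"

# index=False leaves the row index out of the file,
# so no 'Unnamed: 0' column appears when the file is read back.
toy.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)

print(reloaded.columns.tolist())
```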