<a href="https://colab.research.google.com/github/PKpacheco/final_project_neural_networks/blob/main/Paola_Pacheco_Final_project_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks and Deep Learning:
#### Paola Katherine Pacheco
#### Elections

# Executive Summary

This project takes as input a dataset of Twitter posts about the two candidates in the most recent United States presidential election, Donald Trump and Joe Biden.
The main objective is to determine whether posts express a positive or negative sentiment towards each candidate, and to consider how that sentiment might relate to the outcome of the election.

Throughout, we work with the content of what was published about the candidates, taken from a previously created dataset available on Kaggle.
* [Kaggle link](https://statics.teams.cdn.office.net/evergreen-assets/safelinks/1/atp-safelinks.html)


We first pre-process the data by cleaning it, which removes links and special characters. We then tokenize the text so that each post can be assigned a sentiment: positive, negative or neutral.


Furthermore, we will explore more advanced methods such as stacked long short-term memory networks (LSTMs), gated recurrent units (GRUs) and bidirectional recurrent neural networks (RNNs), seeking to improve the accuracy of the results.


This analysis seeks to provide metrics for understanding the relationship between social media sentiment and the actual election results.

In [23]:
!pip install autocorrect

Collecting autocorrect
  Downloading autocorrect-2.6.1.tar.gz (622 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: autocorrect
  Building wheel for autocorrect (setup.py) ... done
  Created wheel for autocorrect: filename=autocorrect-2.6.1-py3-none-any.whl size=622363 sha256=41624ef1f66aede23e93169490dd12b1a28754d6a72a7ebd6aa4ae58d10fe0c8
  Stored in directory: /root/.cache/pip/wheels/b5/7b/6d/b76b29ce11ff8e2521c8c7dd0e5bfee4fb1789d76193124343
Successfully built autocorrect
Installing collected packages: autocorrect
Successfully installed autocorrec

In [24]:
# import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import re
import string

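import nltk
from autocorrect import Speller
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('stopwords')  # word lists used by stopwords.words('english')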

# import seaborn as sns

# from mpl_toolkits.mplot3d import Axes3D
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.metrics import accuracy_score, confusion_matrix

# from sklearn.model_selection import (
#     train_test_split,
#     cross_val_score,
#     GridSearchCV
#     )
# from sklearn.ensemble import StackingClassifier, VotingClassifier
# from sklearn.neighbors import KNeighborsClassifier
# from sklearn.svm import SVC
# from sklearn.model_selection import GridSearchCV
# from sklearn.tree import DecisionTreeClassifier
# from tabulate import tabulate





[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [3]:
# URLs of the raw CSV files
biden_url = 'https://raw.githubusercontent.com/PKpacheco/final_project_neural_networks/main/Bidenall2.csv'
trump_url = 'https://raw.githubusercontent.com/PKpacheco/final_project_neural_networks/main/Trumpall2.csv'

In [12]:
# read files
df_biden = pd.read_csv(biden_url)
df_trump = pd.read_csv(trump_url)

In [13]:
df_biden.head()

Unnamed: 0,user,text
0,MarkHodder3,@JoeBiden And we’ll find out who won in 2026...
1,K87327961G,@JoeBiden Your Democratic Nazi Party cannot be...
2,OldlaceA,@JoeBiden So did Lying Barr
3,penblogger,@JoeBiden It's clear you didnt compose this tw...
4,Aquarian0264,@JoeBiden I will vote in person thank you.


In [6]:
df_trump.head()

Unnamed: 0,user,text
0,manny_rosen,@sanofi please tell us how many shares the Cr...
1,osi_abdul,"https://t.co/atM98CpqF7 Like, comment, RT #P..."
2,Patsyrw,Your AG Barr is as useless &amp; corrupt as y...
3,seyedebrahimi_m,Mr. Trump! Wake Up! Most of the comments bel...
4,James09254677,After 4 years you think you would have figure...


As we can see, the first column holds the Twitter user ID and the second column holds the text of the tweet.

In [7]:
# Checking for NaN values
df_trump.isna().sum()

user    0
text    0
dtype: int64

In [14]:
# Checking for NaN values
df_biden.isna().sum()

user    0
text    0
dtype: int64

In [11]:
# clean text
def clean_text(text):
    # Remove URLs
    text = re.sub(r'http\S+', '', text)

    # Remove special characters and symbols
    text = re.sub(r'[^\w\s]', '', text)

    # Convert to lowercase
    text = text.lower()

    return text
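
To see what `clean_text` does in practice, the sketch below applies the same three steps to a tweet modelled on the Trump sample shown earlier. It also includes `clean_text_unescaped`, which is not part of this notebook: it is a hypothetical variant that first decodes HTML entities with the standard library's `html.unescape`, on the assumption that the stray `amp` tokens in the cleaned text come from `&amp;` in the raw tweets.

```python
import html
import re


def clean_text(text):
    """Same steps as the notebook's function: drop URLs, drop punctuation, lowercase."""
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'[^\w\s]', '', text)
    return text.lower()


def clean_text_unescaped(text):
    """Hypothetical variant: decode HTML entities (e.g. &amp;) before cleaning."""
    return clean_text(html.unescape(text))


sample = "Your AG Barr is as useless &amp; corrupt https://t.co/atM98CpqF7"
print(clean_text(sample).split())            # still contains the token 'amp'
print(clean_text_unescaped(sample).split())  # 'amp' no longer appears
```

Whether the extra step is worthwhile depends on how often entities occur in the data. Note that both versions keep the text of `@` mentions and hashtags, with only the symbol itself removed.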

In [18]:
# clean the text with the clean_text function defined above
df_trump['text'] = df_trump['text'].apply(clean_text)
df_biden['text'] = df_biden['text'].apply(clean_text)

In [25]:
spell = Speller()
def correct_spelling(text):
    corrected_text = [spell(word) for word in text.split()]
    return ' '.join(corrected_text)
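
Correcting every word of every tweet can be slow, and tweets tend to repeat many of the same words. One possible speed-up, sketched here as an assumption and not as something this notebook does, is to cache per-word corrections so that each distinct word is corrected only once. `make_cached_corrector` and `toy_speller` are hypothetical names; in the notebook, the `spell` object created above would be passed in place of the toy function.

```python
from functools import lru_cache


def make_cached_corrector(spell_word, maxsize=100_000):
    """Wrap a word-level corrector so each distinct word is corrected once."""
    cached = lru_cache(maxsize=maxsize)(spell_word)

    def correct_spelling(text):
        return ' '.join(cached(word) for word in text.split())

    return correct_spelling


# Stand-in for autocorrect's Speller, used only to keep this sketch self-contained.
calls = []


def toy_speller(word):
    calls.append(word)  # record every real (uncached) correction
    return {'teh': 'the', 'vot': 'vote'}.get(word, word)


correct_spelling = make_cached_corrector(toy_speller)
print(correct_spelling('teh people vot and teh people win'))
print(len(calls))  # number of distinct words actually passed to the speller
```

The cache trades memory for time, so `maxsize` may need tuning for a larger corpus.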

In [None]:
df_trump['text'] = df_trump['text'].apply(correct_spelling)
df_biden['text'] = df_biden['text'].apply(correct_spelling)

In [19]:
# tokenize
df_trump['tokens'] = df_trump['text'].apply(word_tokenize)
df_biden['tokens'] = df_biden['text'].apply(word_tokenize)

In [None]:
# create function to remove stop words
stop_words = set(stopwords.words('english'))

def remove_stopwords(tokens):
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    return filtered_tokens
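
Because `clean_text` strips apostrophes before tokenization, contractions reach this step as `didnt` or `youll`, which will not match stop-word entries written with an apostrophe, such as `didn't`. A possible workaround, shown as a sketch and not as the notebook's method, is to apply the same punctuation stripping to the stop-word list. The small `stop_words_raw` set is a stand-in for `stopwords.words('english')` so the example runs without NLTK data, and `normalize_stopwords` is a hypothetical helper.

```python
import re

# Stand-in for stopwords.words('english'); the real list is much longer.
stop_words_raw = {'i', 'you', 'will', 'in', 'this', "didn't", "you'll", "it's"}


def normalize_stopwords(words):
    """Strip punctuation from stop words so they match already-cleaned tokens."""
    return {re.sub(r'[^\w\s]', '', w) for w in words}


def remove_stopwords(tokens, stop_words):
    """Keep only the tokens that are not stop words (case-insensitive)."""
    return [word for word in tokens if word.lower() not in stop_words]


tokens = ['joebiden', 'its', 'clear', 'you', 'didnt', 'compose', 'this']
print(remove_stopwords(tokens, stop_words_raw))
print(remove_stopwords(tokens, normalize_stopwords(stop_words_raw)))
```

With the raw set, `its` and `didnt` survive filtering; with the normalized set they are removed along with `you` and `this`.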

In [None]:
# apply stop-word removal
df_trump['tokens'] = df_trump['tokens'].apply(remove_stopwords)
df_biden['tokens'] = df_biden['tokens'].apply(remove_stopwords)


In [21]:
df_biden

Unnamed: 0,user,text,tokens
0,MarkHodder3,joebiden and well find out who won in 2026,"[joebiden, and, well, find, out, who, won, in,..."
1,K87327961G,joebiden your democratic nazi party cannot be ...,"[joebiden, your, democratic, nazi, party, can,..."
2,OldlaceA,joebiden so did lying barr,"[joebiden, so, did, lying, barr]"
3,penblogger,joebiden its clear you didnt compose this twee...,"[joebiden, its, clear, you, didnt, compose, th..."
4,Aquarian0264,joebiden i will vote in person thank you,"[joebiden, i, will, vote, in, person, thank, you]"
...,...,...,...
2535,meryn1977,joebiden youll just try to calm those waters a...,"[joebiden, youll, just, try, to, calm, those, ..."
2536,BSNelson114,joebiden 96 days 96 dias votejoebiden2020 vot...,"[joebiden, 96, days, 96, dias, votejoebiden202..."
2537,KenCapel,joebiden you think you can do that you cant re...,"[joebiden, you, think, you, can, do, that, you..."
2538,LeslyeHale,joebiden trump wants our children back at scho...,"[joebiden, trump, wants, our, children, back, ..."


In [20]:
df_trump

Unnamed: 0,user,text,tokens
0,manny_rosen,sanofi please tell us how many shares the cri...,"[sanofi, please, tell, us, how, many, shares, ..."
1,osi_abdul,like comment rt prayer4tachantitans prayer4...,"[like, comment, rt, prayer4tachantitans, praye..."
2,Patsyrw,your ag barr is as useless amp corrupt as you...,"[your, ag, barr, is, as, useless, amp, corrupt..."
3,seyedebrahimi_m,mr trump wake up most of the comments below ...,"[mr, trump, wake, up, most, of, the, comments,..."
4,James09254677,after 4 years you think you would have figure...,"[after, 4, years, you, think, you, would, have..."
...,...,...,...
2783,4diva63,realdonaldtrump for the 1100 time absentee bal...,"[realdonaldtrump, for, the, 1100, time, absent..."
2784,hidge826,realdonaldtrump if youre so scared of losing r...,"[realdonaldtrump, if, youre, so, scared, of, l..."
2785,SpencerRossy,realdonaldtrump i rarely get involved with for...,"[realdonaldtrump, i, rarely, get, involved, wi..."
2786,ScoobyMcpherson,realdonaldtrump this is the moment when trump ...,"[realdonaldtrump, this, is, the, moment, when,..."
