# "I Feel" Analysis

In this project, I find the tokens that most frequently follow the phrase "I feel" in a corpus of tweets. I have often described the webpage that inspired this project as the most beautiful thing I have ever seen, and I would like to replicate it on a smaller scale.

In [130]:
import kagglehub
import pandas as pd
import plotly.express as px
from emotionextractor.emotionextractor import EmotionExtractor

In [131]:
# Download the Sentiment140 dataset from Kaggle
handle = "kazanova/sentiment140"
path = kagglehub.dataset_download(handle)

print("Path to dataset files:", path)

Path to dataset files: /home/shroom/.cache/kagglehub/datasets/kazanova/sentiment140/versions/2


In [132]:
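# Load the tweets. The CSV has six fields but only five names are given,
# so pandas uses the leading field (the 0/4 sentiment label) as the index.
df = pd.read_csv(path + '/training.1600000.processed.noemoticon.csv',
                 encoding='ISO-8859-1',
                 header=None,
                 names=['id', 'time', 'query', 'username', 'text'])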
df.head(10)

| | id | time | query | username | text |
|---|---|---|---|---|---|
| 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |
| 0 | 1467811372 | Mon Apr 06 22:20:00 PDT 2009 | NO_QUERY | joy_wolf | @Kwesidei not the whole crew |
| 0 | 1467811592 | Mon Apr 06 22:20:03 PDT 2009 | NO_QUERY | mybirch | Need a hug |
| 0 | 1467811594 | Mon Apr 06 22:20:03 PDT 2009 | NO_QUERY | coZZ | @LOLTrish hey long time no see! Yes.. Rains a... |
| 0 | 1467811795 | Mon Apr 06 22:20:05 PDT 2009 | NO_QUERY | 2Hood4Hollywood | @Tatiana_K nope they didn't have it |
| 0 | 1467812025 | Mon Apr 06 22:20:09 PDT 2009 | NO_QUERY | mimismo | @twittera que me muera ? |
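
The repeated 0 in the index of the preview is a side effect of passing fewer column names than the file has fields. A minimal sketch of that behaviour on two made-up rows follows; the sample data and the column name `target` are assumptions for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Two toy rows with three fields each (made-up data).
raw = "0,1467810369,first tweet\n4,1467822272,second tweet\n"

# Only two names for three fields: the leading field becomes the index.
implicit = pd.read_csv(io.StringIO(raw), header=None, names=["id", "text"])
print(implicit.index.tolist())

# Naming every field keeps the leading value as an ordinary column.
# "target" is a hypothetical name chosen for this sketch.
explicit = pd.read_csv(io.StringIO(raw), header=None,
                       names=["target", "id", "text"])
print(explicit.columns.tolist())
```

Either form works for the analysis here, since only the `text` column is used afterwards; naming all fields simply makes the index unique.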


In [133]:
# Preprocess the text: lowercase everything so that "I feel" and "i feel" both match
df['text'] = df['text'].str.lower()
df['text']

0    @switchfoot http://twitpic.com/2y1zl - awww, t...
0    is upset that he can't update his facebook by ...
0    @kenichan i dived many times for the ball. man...
0      my whole body feels itchy and like its on fire 
0    @nationwideclass no, it's not behaving at all....
                           ...                        
4    just woke up. having no school is the best fee...
4    thewdb.com - very cool to hear old walt interv...
4    are you ready for your mojo makeover? ask me f...
4    happy 38th birthday to my boo of alll time!!! ...
4    happy #charitytuesday @thenspcc @sparkscharity...
Name: text, Length: 1600000, dtype: object

In [135]:
# Capture the word that follows each "i feel", then keep only tweets with at least one match
i_feel = df['text'].str.findall(r"i feel\s+(\w+)")
i_feel = i_feel[i_feel.map(len) > 0]
i_feel

0           [bad]
0          [like]
0    [unbearable]
0     [miserable]
0          [good]
         ...     
4        [better]
4          [bleh]
4            [so]
4         [sorry]
4         [today]
Name: text, Length: 13648, dtype: object
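
To make the extraction step concrete, here is a small sketch of the same pattern applied to three made-up tweets (the sample strings are assumptions, not rows from the dataset). With a single capture group, `str.findall` returns a list of the captured words for each row, and rows without a match yield an empty list.

```python
import pandas as pd

# Made-up, already lowercased tweets.
tweets = pd.Series([
    "i feel bad today",
    "i feel so tired but i feel happy too",
    "nothing to see here",
])

# One list of captured words per tweet.
matches = tweets.str.findall(r"i feel\s+(\w+)")

# Drop tweets where the phrase never occurs.
matches = matches[matches.map(len) > 0]
print(matches.tolist())
```

Note that the pattern is not anchored at a word boundary, so a string such as "hi feel good" would also match; prefixing the pattern with `\b` is one way to rule that out if it matters.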

In [136]:
# Explode so that a tweet with several matches contributes one row per word
i_feel = i_feel.explode()
i_feel

0           bad
0          like
0    unbearable
0     miserable
0          good
        ...    
4        better
4          bleh
4            so
4         sorry
4         today
Name: text, Length: 13778, dtype: object
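
The length grows from 13648 to 13778 because some tweets contain the phrase more than once. A toy sketch of what `explode` does to such rows (the values are made up for illustration):

```python
import pandas as pd

# Two rows of match lists; the second row holds two words.
lists = pd.Series([["bad"], ["so", "happy"]], index=[0, 4])

# Each list element becomes its own row; the original index label is repeated.
flat = lists.explode()
print(flat.tolist())
print(flat.index.tolist())
```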

In [137]:
# Count instances of each word
i_feel_counts = i_feel.value_counts().to_frame('Count').reset_index(names='Word')
i_feel_counts

| | Word | Count |
|---|---|---|
| 0 | like | 3676 |
| 1 | so | 1787 |
| 2 | bad | 763 |
| 3 | sick | 484 |
| 4 | really | 323 |
| ... | ... | ... |
| 966 | indy | 1 |
| 967 | unbelievably | 1 |
| 968 | flaked | 1 |
| 969 | agitated | 1 |

In [139]:
# Keep only emotion words
ee = EmotionExtractor()
i_feel_counts['is_emotion'] = i_feel_counts['Word'].apply(lambda word: len(ee.extract_emotion([word])) > 0)
i_feel_counts = i_feel_counts.loc[i_feel_counts['is_emotion']]
i_feel_counts

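| | Word | Count | is_emotion |
|---|---|---|---|
| 2 | bad | 763 | True |
| 3 | sick | 484 | True |
| 8 | better | 266 | True |
| 10 | sorry | 173 | True |
| 13 | sad | 138 | True |
| ... | ... | ... | ... |
| 954 | grand | 1 | True |
| 958 | heartless | 1 | True |
| 959 | unappreciated | 1 | True |
| 960 | swine | 1 | True |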
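
The filtering pattern itself (flag each word, then keep the flagged rows) does not depend on how a word is judged to be an emotion. The sketch below swaps `EmotionExtractor` for a tiny hand-written word set, so it is not the method used above; the set `emotion_lexicon` and the sample counts are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in lexicon; NOT the word list used by EmotionExtractor.
emotion_lexicon = {"bad", "sad", "sorry", "happy"}

# Made-up counts in the same shape as i_feel_counts.
counts = pd.DataFrame({
    "Word": ["like", "so", "bad", "sad"],
    "Count": [5, 3, 2, 1],
})

# Flag emotion words, then keep only the flagged rows.
counts["is_emotion"] = counts["Word"].isin(emotion_lexicon)
emotion_counts = counts.loc[counts["is_emotion"]]
print(emotion_counts)
```

As in the real output, the surviving rows keep their original index labels, which is why the filtered table above starts at 2 rather than 0.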


In [140]:
# Bar chart of the ten most frequent emotion words
fig = px.bar(
    data_frame=i_feel_counts.iloc[:10],
    x='Word',
    y='Count',
    color='Word'
)
fig.update_layout(showlegend=False)
fig.show()