# Emoji Sentiment

Are popular emojis generally associated with positive or negative sentiments?

The file `emoji-sentiment.csv` provides data on the sentiment associated with various emojis.

Researchers examined 1.6 million tweets across 13 European languages. Each tweet was labeled by annotators as positive (+1), negative (-1), or neutral (0). About 4% of these tweets included emojis.

Columns include:
- `Occurrences [5...max]`: Number of times the emoji appears in the dataset (minimum 5).
- `Position [0...1]`: Average position of the emoji in tweets, from start (0) to end (1).
- `Neg [0...1]`: Fraction of tweets containing the emoji that were labeled negative.
- `Neut [0...1]`: Fraction of tweets containing the emoji that were labeled neutral.
- `Pos [0...1]`: Fraction of tweets containing the emoji that were labeled positive.
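
Since every tweet carries exactly one of the three labels, the three fraction columns should add up to roughly 1 for each emoji. The sketch below checks this on three rows copied from the preview output; the short column names and the 0.002 rounding tolerance are assumptions chosen for illustration.

```python
import pandas as pd

# Three rows copied from the notebook's preview output (rows 0, 1 and 4).
rows = pd.DataFrame(
    {
        "neg": [0.247, 0.044, 0.436],
        "neut": [0.285, 0.166, 0.220],
        "pos": [0.468, 0.790, 0.343],
    }
)

# Sum the three fractions per row and allow for rounding to three decimals.
total = rows[["neg", "neut", "pos"]].sum(axis=1)
sums_to_one = (total - 1).abs() <= 0.002  # assumed tolerance

print(sums_to_one.all())
```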



In [50]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'emoji-sentiment.csv'.

# from google.colab import files
# uploaded = files.upload()

In [51]:
import pandas as pd
df = pd.read_csv('emoji-sentiment.csv')
df.head(3)

Unnamed: 0,Char,Image [twemoji],Unicode codepoint,Occurrences [5...max],Position [0...1],Neg [0...1],Neut [0...1],Pos [0...1],Sentiment bar (c.i. 95%),Unicode name,Unicode block
0,😂,😂,0x1f602,14622,0.805,0.247,0.285,0.468,,FACE WITH TEARS OF JOY,Emoticons
1,❤,❤,0x2764,8050,0.747,0.044,0.166,0.79,,HEAVY BLACK HEART,Dingbats
2,♥,♥,0x2665,7144,0.754,0.035,0.272,0.693,,BLACK HEART SUIT,Miscellaneous Symbols


### Project Ideas:

Data Cleaning: 
- Remove unnecessary columns that are not useful for your analysis.

- Rename the remaining columns using `snake_case` (all lowercase letters with underscores between words).
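
One possible way to do the renaming without typing each new name by hand is a small helper passed to `DataFrame.rename`. This is only a sketch: `to_snake_case` is a hypothetical helper, not part of pandas, and it assumes the bracketed range annotations such as `[0...1]` should be dropped.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Hypothetical helper: drop '[...]' annotations, then lowercase with underscores."""
    name = re.sub(r"\s*\[.*?\]", "", name)      # remove bracketed ranges like " [0...1]"
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)  # runs of other characters become "_"
    return name.strip("_").lower()


# Column names taken from the dataset; the frame itself is empty for the demo.
cols = [
    "Occurrences [5...max]",
    "Position [0...1]",
    "Neg [0...1]",
    "Neut [0...1]",
    "Pos [0...1]",
    "Unicode name",
]
demo = pd.DataFrame(columns=cols).rename(columns=to_snake_case)
print(list(demo.columns))
```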

New Variables:
- Add a new column called `sentiment`, where sentiment = (fraction of positive tweets) - (fraction of negative tweets), i.e. `Pos - Neg`.

- Add a `positive_flag` column that is `True` if `sentiment > 0` (or above a set threshold), otherwise `False`.
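
The two new variables can be sketched on a few rows as follows. The rows are copied from the preview output, the lowercase column names assume the renaming step has been done, and `THRESHOLD` is a hypothetical cutoff you would choose yourself.

```python
import pandas as pd

THRESHOLD = 0.0  # hypothetical cutoff; raise it to require a clearer positive lean

# Neg/Pos values copied from rows 0, 4 and 748 of the preview output.
demo = pd.DataFrame({"neg": [0.247, 0.436, 0.125], "pos": [0.468, 0.343, 0.125]})

# Sentiment is the positive fraction minus the negative fraction.
demo["sentiment"] = demo["pos"] - demo["neg"]

# Flag rows whose sentiment is strictly above the threshold.
demo["positive_flag"] = demo["sentiment"] > THRESHOLD

print(demo)
```

Note that an emoji with equal positive and negative fractions has a sentiment of exactly 0 and is therefore not flagged as positive under a strict `>` comparison.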

Types of questions you can now answer more easily:
- What percentage of emojis in the dataset have a positive sentiment?

- What percentage of the top 20 most popular emojis are positive?

- Which emoji (with more than 500 mentions) is the most positive?

- Which emoji (with more than 500 mentions) is the most negative?

- Where in the tweets are most emojis located (i.e. at the beginning or the end)?

- Is there a difference in the placement of positive versus negative emojis within a tweet?

In [None]:
# YOUR CODE HERE (add additional cells as needed)
df.info()
df.describe()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 751 entries, 0 to 750
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Char                      751 non-null    object 
 1   Image [twemoji]           751 non-null    object 
 2   Unicode codepoint         751 non-null    object 
 3   Occurrences [5...max]     751 non-null    int64  
 4   Position [0...1]          751 non-null    float64
 5   Neg [0...1]               751 non-null    float64
 6   Neut [0...1]              751 non-null    float64
 7   Pos [0...1]               751 non-null    float64
 8   Sentiment bar (c.i. 95%)  0 non-null      float64
 9   Unicode name              751 non-null    object 
 10  Unicode block             751 non-null    object 
dtypes: float64(5), int64(1), object(5)
memory usage: 64.7+ KB


Unnamed: 0,Occurrences [5...max],Position [0...1],Neg [0...1],Neut [0...1],Pos [0...1],Sentiment bar (c.i. 95%)
count,751.0,751.0,751.0,751.0,751.0,0.0
mean,208.331558,0.665549,0.163784,0.388993,0.447237,
std,804.865155,0.16445,0.137368,0.18169,0.186525,
min,5.0,0.012,0.006,0.014,0.007,
25%,12.0,0.575,0.0695,0.254,0.313,
50%,33.0,0.688,0.121,0.349,0.447,
75%,115.0,0.7895,0.209,0.5,0.5955,
max,14622.0,0.994,0.778,0.987,0.972,


In [59]:
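# Keep only the numeric columns; .copy() avoids a SettingWithCopyWarning when columns are added later
emoji = df[['Occurrences [5...max]','Position [0...1]','Neg [0...1]','Neut [0...1]','Pos [0...1]']].copy()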
emoji

Unnamed: 0,Occurrences [5...max],Position [0...1],Neg [0...1],Neut [0...1],Pos [0...1]
0,14622,0.805,0.247,0.285,0.468
1,8050,0.747,0.044,0.166,0.790
2,7144,0.754,0.035,0.272,0.693
3,6359,0.765,0.052,0.219,0.729
4,5526,0.803,0.436,0.220,0.343
...,...,...,...,...,...
746,5,0.937,0.125,0.625,0.250
747,5,0.977,0.375,0.375,0.250
748,5,0.971,0.125,0.750,0.125
749,5,0.435,0.125,0.750,0.125


In [61]:
# Map the original column names to shorter ones without the bracketed ranges
new_name = {
    'Occurrences [5...max]': 'Occurrences',
    'Position [0...1]': 'Position',
    'Neg [0...1]': 'Neg',
    'Neut [0...1]': 'Neut',
    'Pos [0...1]': 'Pos',
}

emoji = emoji.rename(columns=new_name)
emoji

Unnamed: 0,Occurrences,Position,Neg,Neut,Pos
0,14622,0.805,0.247,0.285,0.468
1,8050,0.747,0.044,0.166,0.790
2,7144,0.754,0.035,0.272,0.693
3,6359,0.765,0.052,0.219,0.729
4,5526,0.803,0.436,0.220,0.343
...,...,...,...,...,...
746,5,0.937,0.125,0.625,0.250
747,5,0.977,0.375,0.375,0.250
748,5,0.971,0.125,0.750,0.125
749,5,0.435,0.125,0.750,0.125


In [65]:
# Sentiment = fraction of positive tweets minus fraction of negative tweets
emoji['Sentiment'] = emoji.eval('Pos - Neg')
emoji

Unnamed: 0,Occurrences,Position,Neg,Neut,Pos,Sentiment
0,14622,0.805,0.247,0.285,0.468,0.221
1,8050,0.747,0.044,0.166,0.790,0.746
2,7144,0.754,0.035,0.272,0.693,0.658
3,6359,0.765,0.052,0.219,0.729,0.677
4,5526,0.803,0.436,0.220,0.343,-0.093
...,...,...,...,...,...,...
746,5,0.937,0.125,0.625,0.250,0.125
747,5,0.977,0.375,0.375,0.250,-0.125
748,5,0.971,0.125,0.750,0.125,0.000
749,5,0.435,0.125,0.750,0.125,0.000


In [69]:
# True when the emoji leans positive (sentiment strictly above 0)
emoji['positive_flag'] = emoji['Sentiment'] > 0
emoji

Unnamed: 0,Occurrences,Position,Neg,Neut,Pos,Sentiment,positive_flag
0,14622,0.805,0.247,0.285,0.468,0.221,True
1,8050,0.747,0.044,0.166,0.790,0.746,True
2,7144,0.754,0.035,0.272,0.693,0.658,True
3,6359,0.765,0.052,0.219,0.729,0.677,True
4,5526,0.803,0.436,0.220,0.343,-0.093,False
...,...,...,...,...,...,...,...
746,5,0.937,0.125,0.625,0.250,0.125,True
747,5,0.977,0.375,0.375,0.250,-0.125,False
748,5,0.971,0.125,0.750,0.125,0.000,False
749,5,0.435,0.125,0.750,0.125,0.000,False


In [73]:
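# Percentage of emojis with positive sentiment (the mean of a boolean column is the share of True values)
per_pos = emoji['positive_flag'].mean() * 100
per_pos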

82.42343541944075

In [81]:
# Percentage of the 20 most frequent emojis that have positive sentiment
emoji = emoji.sort_values('Occurrences', ascending=False)
print(emoji['positive_flag'].head(20).mean() * 100)

90.0
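
The remaining questions (most positive and most negative emoji with more than 500 mentions, and placement of positive versus negative emojis) could be approached along the following lines. This is a sketch on seven rows copied from the preview output, not a result for the full dataset; `sample` is a hypothetical stand-in for `emoji`, and rows are identified by their original row number because the `Char` column was dropped earlier.

```python
import pandas as pd

# Rows 0-4, 746 and 747 copied from the preview output (index = original row number).
sample = pd.DataFrame(
    {
        "Occurrences": [14622, 8050, 7144, 6359, 5526, 5, 5],
        "Position": [0.805, 0.747, 0.754, 0.765, 0.803, 0.937, 0.977],
        "Neg": [0.247, 0.044, 0.035, 0.052, 0.436, 0.125, 0.375],
        "Pos": [0.468, 0.790, 0.693, 0.729, 0.343, 0.250, 0.250],
    },
    index=[0, 1, 2, 3, 4, 746, 747],
)
sample["Sentiment"] = sample["Pos"] - sample["Neg"]
sample["positive_flag"] = sample["Sentiment"] > 0

# Restrict to emojis with more than 500 mentions, then find the extremes.
popular = sample[sample["Occurrences"] > 500]
most_positive = popular["Sentiment"].idxmax()  # row label of the highest sentiment
most_negative = popular["Sentiment"].idxmin()  # row label of the lowest sentiment

# Compare average placement of positive versus non-positive emojis.
mean_position_by_flag = sample.groupby("positive_flag")["Position"].mean()

print(most_positive, most_negative)
print(mean_position_by_flag)
```

On the full data, keeping `Char` or `Unicode name` in the frame would let the row label be translated back into the emoji itself. A mean `Position` well above 0.5 would suggest that emojis tend to appear toward the end of tweets.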
