# Emoji Sentiment

Are popular emojis generally associated with positive or negative sentiments?

The file `"emoji-sentiment.csv"` provides data on the sentiment associated with various emojis.

Researchers examined 1.6 million tweets across 13 European languages. Each tweet was labeled by annotators as positive (+1), negative (-1), or neutral (0). About 4% of these tweets included emojis.

Columns include:
- `Occurrences [5...max]`: Number of times the emoji appears in the dataset.
- `Position [0...1]`: Average position of the emoji in tweets, from start (0) to end (1).
- `Neg [0...1]`: Proportion of tweets with the emoji that are labeled negative.
- `Neut [0...1]`: Proportion of tweets with the emoji that are labeled neutral.
- `Pos [0...1]`: Proportion of tweets with the emoji that are labeled positive.



In [66]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'emoji-sentiment.csv'.

# from google.colab import files
# uploaded = files.upload()

In [67]:
import pandas as pd
df = pd.read_csv('emoji-sentiment.csv')
df.head(3)

Unnamed: 0,Char,Image [twemoji],Unicode codepoint,Occurrences [5...max],Position [0...1],Neg [0...1],Neut [0...1],Pos [0...1],Sentiment bar (c.i. 95%),Unicode name,Unicode block
0,😂,😂,0x1f602,14622,0.805,0.247,0.285,0.468,,FACE WITH TEARS OF JOY,Emoticons
1,❤,❤,0x2764,8050,0.747,0.044,0.166,0.79,,HEAVY BLACK HEART,Dingbats
2,♥,♥,0x2665,7144,0.754,0.035,0.272,0.693,,BLACK HEART SUIT,Miscellaneous Symbols
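
Since every tweet is labeled negative, neutral, or positive, the three share columns should add up to roughly 1 for each emoji. The sketch below checks this on the three preview rows only, retyped by hand; the tolerance of 0.005 is an assumption that allows for rounding to three decimals.

```python
import pandas as pd

# The three preview rows shown above, retyped by hand for illustration.
preview = pd.DataFrame({
    "Char": ["😂", "❤", "♥"],
    "Neg [0...1]": [0.247, 0.044, 0.035],
    "Neut [0...1]": [0.285, 0.166, 0.272],
    "Pos [0...1]": [0.468, 0.790, 0.693],
})

share_columns = ["Neg [0...1]", "Neut [0...1]", "Pos [0...1]"]
row_totals = preview[share_columns].sum(axis=1)

# Assumed tolerance: the shares are rounded, so allow a small deviation from 1.
shares_sum_to_one = (row_totals - 1).abs() < 0.005
print(shares_sum_to_one.all())
```

The same check could be run on the full `df` to spot rows that need a closer look.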


### Project Ideas:

Data Cleaning: 
- Remove unnecessary columns that are not useful for your analysis.

- Rename the remaining columns using `snake_case` (all lowercase letters with underscores between words).
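
One possible way to derive `snake_case` names automatically, rather than typing each one, is sketched below. `to_snake_case` is a hypothetical helper, not part of pandas; it assumes the bracketed suffixes such as `[0...1]` can be dropped.

```python
import re

def to_snake_case(name: str) -> str:
    """Hypothetical helper: drop bracketed suffixes, then snake_case the rest."""
    # Remove annotations such as "[0...1]" or "(c.i. 95%)".
    name = re.sub(r"[\[(].*?[\])]", "", name)
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return name.strip("_").lower()

columns = [
    "Occurrences [5...max]",
    "Position [0...1]",
    "Unicode codepoint",
    "Sentiment bar (c.i. 95%)",
]
renamed = {column: to_snake_case(column) for column in columns}
print(renamed)
```

Because `DataFrame.rename` accepts a function for `columns`, the helper could be applied as `df.rename(columns=to_snake_case)`.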

New Variables:
- Add a new column called `sentiment`, where sentiment = (proportion of positive tweets) - (proportion of negative tweets). The result ranges from -1 (all negative) to +1 (all positive).

- Add a `positive_flag` column that is `True` if `sentiment > 0` (or above a set threshold), otherwise `False`.
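
A minimal sketch of both new variables, using four rows from the dataset retyped by hand. The column names `emoji`, `neg`, and `pos` and the `THRESHOLD` constant are assumptions for illustration.

```python
import pandas as pd

# Four rows from the dataset, retyped by hand; column names are illustrative.
sample = pd.DataFrame({
    "emoji": ["😂", "❤", "♥", "😭"],
    "neg": [0.247, 0.044, 0.035, 0.436],
    "pos": [0.468, 0.790, 0.693, 0.343],
})

THRESHOLD = 0.0  # assumed cut-off; raise it to require a clearer positive lean

# assign() evaluates the keyword arguments in order, so positive_flag
# can refer to the sentiment column created just before it.
sample = sample.assign(
    sentiment=lambda d: d["pos"] - d["neg"],
    positive_flag=lambda d: d["sentiment"] > THRESHOLD,
)
print(sample)
```

Note that an emoji with a sentiment of exactly 0 is flagged `False` under this definition.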

Types of questions you can now answer more easily:
- What percentage of emojis in the dataset have a positive sentiment?

- What percentage of the top 20 most popular emojis are positive?

- Which emoji (with more than 500 mentions) is the most positive?

- Which emoji (with more than 500 mentions) is the most negative?

- Where in the tweets are most emojis located (i.e. at the beginning or the end)?

- Is there a difference in the placement of positive versus negative emojis within a tweet?
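
As a rough sketch of how the ranking questions above might be approached, the example below uses nine rows from the dataset, retyped by hand. The column names and the choice of N=3 are assumptions for this tiny sample; on the full data the questions ask for the top 20 instead.

```python
import pandas as pd

# Nine rows from the dataset, retyped by hand; column names are illustrative.
sample = pd.DataFrame({
    "emoji": ["😂", "❤", "♥", "😍", "😭", "♮", "🅾", "🔄", "☄"],
    "occurrences": [14622, 8050, 7144, 6359, 5526, 5, 5, 5, 5],
    "neg": [0.247, 0.044, 0.035, 0.052, 0.436, 0.125, 0.375, 0.125, 0.125],
    "pos": [0.468, 0.790, 0.693, 0.729, 0.343, 0.250, 0.250, 0.125, 0.125],
})
sample["sentiment"] = sample["pos"] - sample["neg"]

# Restrict to frequently used emojis before ranking by sentiment.
frequent = sample[sample["occurrences"] > 500]
most_positive = frequent.loc[frequent["sentiment"].idxmax(), "emoji"]
most_negative = frequent.loc[frequent["sentiment"].idxmin(), "emoji"]

# Share of positive emojis among the N most frequent ones (N=3 here).
top_n = sample.nlargest(3, "occurrences")
top_n_positive_pct = (top_n["sentiment"] > 0).mean() * 100

print(most_positive, most_negative, top_n_positive_pct)
```

Ranking on the `sentiment` column with `idxmax` and `idxmin` picks the row with the extreme value, whereas calling `.max()` on the emoji column would only compare character code points.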

In [68]:
# YOUR CODE HERE (add additional cells as needed)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 751 entries, 0 to 750
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Char                      751 non-null    object 
 1   Image [twemoji]           751 non-null    object 
 2   Unicode codepoint         751 non-null    object 
 3   Occurrences [5...max]     751 non-null    int64  
 4   Position [0...1]          751 non-null    float64
 5   Neg [0...1]               751 non-null    float64
 6   Neut [0...1]              751 non-null    float64
 7   Pos [0...1]               751 non-null    float64
 8   Sentiment bar (c.i. 95%)  0 non-null      float64
 9   Unicode name              751 non-null    object 
 10  Unicode block             751 non-null    object 
dtypes: float64(5), int64(1), object(5)
memory usage: 64.7+ KB


In [69]:
df.columns

Index(['Char', 'Image [twemoji]', 'Unicode codepoint', 'Occurrences [5...max]',
       'Position [0...1]', 'Neg [0...1]', 'Neut [0...1]', 'Pos [0...1]',
       'Sentiment bar (c.i. 95%)', 'Unicode name', 'Unicode block'],
      dtype='object')

In [70]:
# Columns kept for the analysis, mapped to their snake_case names.
analyze_columns = ['Image [twemoji]', 'Occurrences [5...max]', 'Neg [0...1]',
                   'Neut [0...1]', 'Pos [0...1]', 'Position [0...1]']
columns_renamed = {'Image [twemoji]': 'image_icones',
                   'Occurrences [5...max]': 'occurences',
                   'Neg [0...1]': 'emotion_neg',
                   'Neut [0...1]': 'emotion_neut',
                   'Pos [0...1]': 'emotion_pos',
                   'Position [0...1]': 'position'}

# rename() without inplace=True returns a new DataFrame, so later
# column assignments do not act on a slice of df.
analyze_data = df[analyze_columns].rename(columns=columns_renamed)
analyze_data

Unnamed: 0,image_icones,occurences,emotion_neg,emotion_neut,emotion_pos,position
0,😂,14622,0.247,0.285,0.468,0.805
1,❤,8050,0.044,0.166,0.790,0.747
2,♥,7144,0.035,0.272,0.693,0.754
3,😍,6359,0.052,0.219,0.729,0.765
4,😭,5526,0.436,0.220,0.343,0.803
...,...,...,...,...,...,...
746,♮,5,0.125,0.625,0.250,0.937
747,🅾,5,0.375,0.375,0.250,0.977
748,🔄,5,0.125,0.750,0.125,0.971
749,☄,5,0.125,0.750,0.125,0.435


In [71]:
# assign() returns a new DataFrame instead of writing into a slice of df.
analyze_data = analyze_data.assign(
    # Net sentiment: share of positive tweets minus share of negative tweets.
    sentiment=lambda d: d['emotion_pos'] - d['emotion_neg'],
    # True only when positive tweets outnumber negative ones.
    positive_flag=lambda d: d['sentiment'] > 0,
)
analyze_data

Unnamed: 0,image_icones,occurences,emotion_neg,emotion_neut,emotion_pos,position,sentiment,positive_flag
0,😂,14622,0.247,0.285,0.468,0.805,0.221,True
1,❤,8050,0.044,0.166,0.790,0.747,0.746,True
2,♥,7144,0.035,0.272,0.693,0.754,0.658,True
3,😍,6359,0.052,0.219,0.729,0.765,0.677,True
4,😭,5526,0.436,0.220,0.343,0.803,-0.093,False
...,...,...,...,...,...,...,...,...
746,♮,5,0.125,0.625,0.250,0.937,0.125,True
747,🅾,5,0.375,0.375,0.250,0.977,-0.125,False
748,🔄,5,0.125,0.750,0.125,0.971,0.000,False
749,☄,5,0.125,0.750,0.125,0.435,0.000,False


In [72]:
analyze_data['positive_flag'].mean()*100

np.float64(82.42343541944075)

In [73]:
top_20=analyze_data.sort_values(by='occurences',ascending=False).head(20)
top_20['positive_flag'].mean()*100

np.float64(90.0)

In [75]:
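# Keep only emojis with more than 500 occurrences.
most_occurences = analyze_data[analyze_data['occurences'] > 500]

# Column-wise .max()/.min() would compare emoji code points, not sentiment,
# so select the rows with the highest and lowest sentiment instead.
display(most_occurences.loc[most_occurences['sentiment'].idxmax(), ['image_icones', 'sentiment']])
display(most_occurences.loc[most_occurences['sentiment'].idxmin(), ['image_icones', 'sentiment']])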

In [None]:
analyze_data['position'].mean()

np.float64(0.6655486018641811)

In [76]:
analyze_data.loc[analyze_data['positive_flag'], 'position'].mean()

np.float64(0.662248788368336)

In [77]:
# positive_flag is False when sentiment <= 0, so this group also contains neutral emojis.
analyze_data.loc[~analyze_data['positive_flag'], 'position'].mean()

np.float64(0.6810227272727273)
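
The two filtered means above could also be computed in one pass with `groupby`, which additionally reports how many emojis fall into each group. The sketch below uses nine rows from the dataset, retyped by hand; the `group` column and its labels are assumptions for illustration.

```python
import pandas as pd

# Position and sentiment for nine rows from the dataset, retyped by hand.
sample = pd.DataFrame({
    "position": [0.805, 0.747, 0.754, 0.765, 0.803, 0.937, 0.977, 0.971, 0.435],
    "sentiment": [0.221, 0.746, 0.658, 0.677, -0.093, 0.125, -0.125, 0.000, 0.000],
})

# Label each emoji; "non_positive" covers both negative and neutral sentiment.
sample["group"] = sample["sentiment"].gt(0).map(
    {True: "positive", False: "non_positive"}
)

# Mean position and group size, side by side.
position_by_group = sample.groupby("group")["position"].agg(["mean", "count"])
print(position_by_group)
```

With only nine rows the difference between groups is not meaningful; on the full dataset the same call yields the two means shown above.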