In [1]:
import csv
import pandas as pd

In [2]:
df = pd.read_csv('EmojiData.csv', delimiter=',')

In [3]:
df

Unnamed: 0,Name,icon,"EmojiXpress, mln","Instagram, mln","Twitter, mln"
0,Grinning,![image](https://pictures.s3.yandex.net/resour...,2.26,1.02,87.3
1,Beaming,![image](https://pictures.s3.yandex.net/resour...,19.1,1.69,150.0
2,ROFL,![image](https://pictures.s3.yandex.net/resour...,25.6,0.774,0.0
3,Tears of Joy,![image](https://pictures.s3.yandex.net/resour...,233.0,7.31,2270.0
4,Winking,![image](https://pictures.s3.yandex.net/resour...,15.2,2.36,264.0
5,Happy,![image](https://pictures.s3.yandex.net/resour...,22.7,4.26,565.0
6,Heart Eyes,![image](https://pictures.s3.yandex.net/resour...,64.6,11.2,834.0
7,Kissing,![image](https://pictures.s3.yandex.net/resour...,87.5,5.13,432.0
8,Thinking,![image](https://pictures.s3.yandex.net/resour...,6.81,0.636,0.0
9,Unamused,![image](https://pictures.s3.yandex.net/resour...,6.0,0.236,478.0


In [4]:
top_EmojiXpress = df[['Name', 'EmojiXpress, mln']].sort_values('EmojiXpress, mln', ascending=False).head()

In [5]:
top_Instagram = df[['Name', 'Instagram, mln']].sort_values('Instagram, mln', ascending=False).head()

In [6]:
top_Twitter = df[['Name', 'Twitter, mln']].sort_values('Twitter, mln', ascending=False).head()

In [7]:
top_EmojiXpress

Unnamed: 0,Name,"EmojiXpress, mln"
3,Tears of Joy,233.0
14,Heart,118.0
7,Kissing,87.5
6,Heart Eyes,64.6
2,ROFL,25.6


In [8]:
top_Instagram

Unnamed: 0,Name,"Instagram, mln"
14,Heart,26.0
6,Heart Eyes,11.2
3,Tears of Joy,7.31
13,Two Hearts,5.69
7,Kissing,5.13


In [9]:
top_Twitter

Unnamed: 0,Name,"Twitter, mln"
3,Tears of Joy,2270.0
14,Heart,1080.0
19,Recycle,932.0
6,Heart Eyes,834.0
15,Heart Suit,697.0



Sorting by the Twitter data alone is unreliable: the data contains an artifact.
<p>'Recycle' looks suspicious here. Sorting by the other platforms is risky too: they may hide similar problems.</p>
<p>We need a more reliable criterion for emoji popularity.</p>


In [10]:
with open('EmojiData.csv') as f:
    df1 = list(csv.DictReader(f))

In [11]:
df1[0]

{'Name': 'Grinning',
 'icon': '![image](https://pictures.s3.yandex.net/resources/grinning_1548433261.png)',
 'EmojiXpress, mln': '2.26',
 'Instagram, mln': '1.02',
 'Twitter, mln': '87.3'}

In [25]:
print('{: <15} | {: >25}'.format('Emoji name', 'Twitter/Instagram ratio'))
print('-'*43)
for emoji in df1:
    print('{: <15} | {: >25.2f}'.format(emoji['Name'], 
                                        float(emoji['Twitter, mln'])/float(emoji['Instagram, mln']))
         )

Emoji name      |   Twitter/Instagram ratio
-------------------------------------------
Grinning        |                     85.59
Beaming         |                     88.76
ROFL            |                      0.00
Tears of Joy    |                    310.53
Winking         |                    111.86
Happy           |                    132.63
Heart Eyes      |                     74.46
Kissing         |                     84.21
Thinking        |                      0.00
Unamused        |                   2025.42
Sunglasses      |                     50.38
Loudly Crying   |                    484.44
Kiss Mark       |                     34.39
Two Hearts      |                     78.21
Heart           |                     41.54
Heart Suit      |                    382.97
Thumbs Up       |                     60.53
Shrugging       |                      0.00
Fire            |                     60.24
Recycle         |                  16642.86


<p>Conclusions:</p>
<p>The Twitter/Instagram ratios span several orders of magnitude: most lie between about 34 and 485, and three are 0.00.
The most striking value belongs to the 'Recycle' emoji, which is implausibly popular
on Twitter (more than 16 thousand times its Instagram usage).
I investigated this and found that posts with the 'Recycle' emoji
are presumably generated automatically and carry a completely
different meaning (more details can be found here:
https://medium.com/@mroth/why-the-emoji-recycling-symbol-is-taking-over-twitter-65ad4b18b04b).
</p>
<p>Emoji statistics have a heavy tail: people love variety.
Different platforms have different emoji preferences.
There are also very odd anomalies, such as the 'Recycle' symbol.
</p>
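<p>To put a number on how far these values stand out, here is a minimal sketch that compares each ratio with the median ratio. The ratios are copied from the output above. The cutoff of 10 times the median and the names <code>ratios</code>, <code>THRESHOLD</code> and <code>outliers</code> are assumptions made for illustration, not part of the original analysis.</p>

```python
from statistics import median

# Twitter/Instagram ratios copied from the output above
ratios = {
    'Grinning': 85.59, 'Beaming': 88.76, 'ROFL': 0.00,
    'Tears of Joy': 310.53, 'Winking': 111.86, 'Happy': 132.63,
    'Heart Eyes': 74.46, 'Kissing': 84.21, 'Thinking': 0.00,
    'Unamused': 2025.42, 'Sunglasses': 50.38, 'Loudly Crying': 484.44,
    'Kiss Mark': 34.39, 'Two Hearts': 78.21, 'Heart': 41.54,
    'Heart Suit': 382.97, 'Thumbs Up': 60.53, 'Shrugging': 0.00,
    'Fire': 60.24, 'Recycle': 16642.86,
}

# Assumed cutoff: a ratio counts as an outlier above 10x the median
THRESHOLD = 10

# The median is far less sensitive to extreme values than the mean
typical = median(ratios.values())
outliers = [name for name, ratio in ratios.items()
            if ratio > THRESHOLD * typical]

print('median ratio: {:.2f}'.format(typical))
print('outliers:', outliers)
```

<p>With this cutoff only 'Unamused' and 'Recycle' are flagged. A different cutoff would change the list, so treat the result as a rough screen rather than a formal test.</p>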

In [12]:
sum_use = 0
sum_use_all = []

In [13]:
for emoji in df1:
    name = emoji['Name']
    # csv.DictReader yields strings, so convert each count to float before summing
    sum_use = round(float(emoji['EmojiXpress, mln'])
                    + float(emoji['Instagram, mln'])
                    + float(emoji['Twitter, mln']), 2)
    sum_use_all.append([name, sum_use])


In [14]:
sum_use_all

[['Grinning', 90.58],
 ['Beaming', 170.79],
 ['ROFL', 26.37],
 ['Tears of Joy', 2510.31],
 ['Winking', 281.56],
 ['Happy', 591.96],
 ['Heart Eyes', 909.8],
 ['Kissing', 524.63],
 ['Thinking', 7.45],
 ['Unamused', 484.24],
 ['Sunglasses', 206.65],
 ['Loudly Crying', 680.05],
 ['Kiss Mark', 123.27],
 ['Two Hearts', 460.69],
 ['Heart', 1224.0],
 ['Heart Suit', 702.13],
 ['Thumbs Up', 253.85],
 ['Shrugging', 1.85],
 ['Fire', 156.99],
 ['Recycle', 932.09]]

In [15]:
sum_use_all.sort(key = lambda n: n[1], reverse = True)

In [16]:
sum_use_all[:5]

[['Tears of Joy', 2510.31],
 ['Heart', 1224.0],
 ['Recycle', 932.09],
 ['Heart Eyes', 909.8],
 ['Heart Suit', 702.13]]

Recycle is still in the top 5.
<p>Let's compute a new measure of emoji popularity:
the scale of each column is estimated as its mean value;
each value in the column is normalized (divided by that scale);
the normalized values are summed for each emoji.</p>
<p>We call this sum the “usage index”.</p>
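<p>The same steps can also be sketched with pandas. This is an illustration, not the notebook's own method. It uses only four rows taken from the table shown earlier, so the column means and the resulting index values differ from those computed on the full file. The names <code>sample</code>, <code>platforms</code> and <code>usage_index</code> are assumptions.</p>

```python
import pandas as pd

# Four rows copied from the table above; the full file has more rows
sample = pd.DataFrame({
    'Name': ['Grinning', 'Tears of Joy', 'Heart Eyes', 'Kissing'],
    'EmojiXpress, mln': [2.26, 233.0, 64.6, 87.5],
    'Instagram, mln': [1.02, 7.31, 11.2, 5.13],
    'Twitter, mln': [87.3, 2270.0, 834.0, 432.0],
})
platforms = ['EmojiXpress, mln', 'Instagram, mln', 'Twitter, mln']

# Divide every column by its own mean, so each platform has scale 1
normalized = sample[platforms] / sample[platforms].mean()

# The usage index is the row-wise sum of the normalized values
sample['usage_index'] = normalized.sum(axis=1)

ranking = sample[['Name', 'usage_index']].sort_values(
    'usage_index', ascending=False)
print(ranking)
```

<p>Because each normalized column averages to 1, no single platform can dominate the index just by having larger raw counts.</p>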

In [17]:
emojixpress_sum = 0
instagram_sum = 0
twitter_sum = 0
for row in df1:
    emojixpress_sum += float(row['EmojiXpress, mln'])
    instagram_sum += float(row['Instagram, mln'])
    twitter_sum += float(row['Twitter, mln'])
    
emojixpress_mean = emojixpress_sum / len(df1)
instagram_mean = instagram_sum / len(df1)
twitter_mean = twitter_sum / len(df1)

In [18]:
data_norm = []
for row in df1:
    emojixpress_normalized = float(row['EmojiXpress, mln']) / emojixpress_mean
    instagram_normalized = float(row['Instagram, mln']) / instagram_mean
    twitter_normalized = float(row['Twitter, mln']) / twitter_mean
    index = round((emojixpress_normalized + instagram_normalized + twitter_normalized), 2)
    data_norm.append([row['Name'], index])

In [19]:
data_norm.sort(key=lambda x: x[1], reverse=True)

In [20]:
data_norm

[['Tears of Joy', 13.23],
 ['Heart', 11.95],
 ['Heart Eyes', 6.31],
 ['Kissing', 4.66],
 ['Happy', 2.87],
 ['Two Hearts', 2.6],
 ['Loudly Crying', 2.41],
 ['Thumbs Up', 2.05],
 ['Heart Suit', 1.99],
 ['Recycle', 1.96],
 ['Winking', 1.56],
 ['Kiss Mark', 1.53],
 ['Sunglasses', 1.5],
 ['Beaming', 1.27],
 ['Unamused', 1.23],
 ['Fire', 1.05],
 ['ROFL', 0.92],
 ['Grinning', 0.49],
 ['Thinking', 0.35],
 ['Shrugging', 0.08]]

<p>- The most popular emojis are associated with positive emotions: fun and love.</p>
<p>- The order and selection of emojis differ slightly between platforms,
but the most popular emojis are the same everywhere.</p>
<p>- Individual platforms have quirks that produce statistical outliers,
as with the recycling symbol on Twitter. Such artifacts can be suppressed
by combining values from all platforms and normalizing them.</p>