# Extract emoji symbols

In [1]:
import pandas as pd
import emoji

pd.set_option('display.max_colwidth', None)

In [2]:
train_dataset = pd.read_csv('../dataset/train.tsv', sep='\t')
validation_dataset = pd.read_csv('../dataset/valid.tsv', sep='\t')
test_dataset = pd.read_csv('../dataset/test.tsv', sep='\t')
train_dataset.head(3)

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Unnamed: 0.1.1,tweet_no,tweet_text,q1_label,q2_label,q3_label,q4_label,q5_label,q6_label,q7_label,language,tweet_link,tweet_link_count,preprocessed_tweet_text,emojis,translated_emojis
0,0,0,0,1,For the average American the best way to tell if you have covid-19 is to cough in a rich person’s face and wait for their test results,no,,,,,no,no,en,[],0,For the average American the best way to tell if you have covid-19 is to cough in a rich person’s face and wait for their test results,,
1,1,1,1,2,this is fucking bullshit,no,,,,,no,no,en,[],0,this is fucking bullshit,,
2,2,2,2,3,Can y’all please just follow the government’s instructions so we can knock this COVID-19 out and be done?! I feel like a kindergartner that keeps losing more recess time because one or two kids can’t follow directions.,no,,,,,no,no,en,[],0,Can y’all please just follow the government’s instructions so we can knock this COVID-19 out and be done?! I feel like a kindergartner that keeps losing more recess time because one or two kids can’t follow directions.,,
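
The `Unnamed: ...` columns in the output above look like index columns left over from earlier saves. A minimal sketch of dropping them after loading, assuming every such column is disposable; the `loaded` frame is a toy stand-in for the real data:

```python
import pandas as pd

# Toy stand-in for a frame loaded from one of the TSV files (illustrative values).
loaded = pd.DataFrame({
    'Unnamed: 0.1': [0, 1],
    'Unnamed: 0': [0, 1],
    'tweet_no': [1, 2],
    'tweet_text': ['first', 'second'],
})

# Keep only the columns whose name does not start with 'Unnamed:'.
cleaned = loaded.loc[:, ~loaded.columns.str.startswith('Unnamed:')]
print(list(cleaned.columns))
```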


## Extracting emojis and translating them into words

In [3]:
text = '🤔🙈 Well, this is interesting 😌'
# Replace each emoji character with its :name: plus a trailing space; keep all other characters.
result = ''.join(emoji.demojize(c) + ' ' if c in emoji.UNICODE_EMOJI['en'] else c for c in text)
result

':thinking_face: :see-no-evil_monkey:  Well, this is interesting :relieved_face: '

In [4]:
def preprocess_tweet_text_with_emojis(dataframe):
    # Note: emoji.UNICODE_EMOJI is not available in emoji 2.0 and later.
    known = emoji.UNICODE_EMOJI['en']
    tweets = dataframe['tweet_text']
    # Tweet text with every emoji character replaced by its space-padded :name:
    dataframe['preprocessed_tweet_text'] = tweets.apply(lambda t: ''.join(' ' + emoji.demojize(c) + ' ' if c in known else c for c in t))
    # Emoji characters only, then their :name: translations only
    dataframe['emojis'] = tweets.apply(lambda t: ''.join(c for c in t if c in known))
    dataframe['translated_emojis'] = tweets.apply(lambda t: ''.join(' ' + emoji.demojize(c) + ' ' for c in t if c in known))
    return dataframe

In [5]:
train_dataset = preprocess_tweet_text_with_emojis(train_dataset)
validation_dataset = preprocess_tweet_text_with_emojis(validation_dataset)
test_dataset = preprocess_tweet_text_with_emojis(test_dataset)

In [6]:
train_dataset.loc[7:8,['tweet_text', 'preprocessed_tweet_text', 'emojis', 'translated_emojis']]

Unnamed: 0,tweet_text,preprocessed_tweet_text,emojis,translated_emojis
7,This is Dr. Usama Riaz. He spent past weeks screening and treating patients with Corona Virus in Pakistan. He knew there was no PPE. He persisted anyways. Today he lost his own battle with coronavirus but he gave life and hope to so many more. KNOW HIS NAME 😭❤ URL,This is Dr. Usama Riaz. He spent past weeks screening and treating patients with Corona Virus in Pakistan. He knew there was no PPE. He persisted anyways. Today he lost his own battle with coronavirus but he gave life and hope to so many more. KNOW HIS NAME :loudly_crying_face: :red_heart: URL,😭❤,:loudly_crying_face: :red_heart:
8,fun fact: its tradition for europeans to spread a potentially fatal disease to every other country not fully inhabited by white people &amp; not take accountability for the subsequent devastation &amp; lives lost,fun fact: its tradition for europeans to spread a potentially fatal disease to every other country not fully inhabited by white people &amp; not take accountability for the subsequent devastation &amp; lives lost,,


In [7]:
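# index=False keeps the row index out of the file, so re-reading adds no extra 'Unnamed: 0' column.
train_dataset.to_csv('../dataset/train.tsv', sep='\t', index=False)
validation_dataset.to_csv('../dataset/valid.tsv', sep='\t', index=False)
test_dataset.to_csv('../dataset/test.tsv', sep='\t', index=False)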
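
Saving with the default `index=True` and reading the file back adds one unnamed index column per round trip, which is the likely origin of the `Unnamed: 0.*` columns seen when the data is first loaded. A small in-memory sketch of the effect; the `round_trip` helper and the toy frame are hypothetical and only illustrate the behaviour:

```python
import io
import pandas as pd

# Toy frame standing in for the real dataset (illustrative values).
df = pd.DataFrame({'tweet_no': [1, 2], 'tweet_text': ['first', 'second']})

def round_trip(frame, **to_csv_kwargs):
    """Write `frame` as TSV to memory and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, sep='\t', **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer, sep='\t')

# Two save/load cycles with the index written each time.
with_index = round_trip(round_trip(df))
# Two save/load cycles with the index left out.
without_index = round_trip(round_trip(df, index=False), index=False)

print(len(with_index.columns), len(without_index.columns))
```

The exact names pandas gives the extra columns depend on the pandas version, so the sketch only compares column counts.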

## Q1 analysis on emojis
**Verifiable Factual Claim: Does the tweet contain a verifiable factual claim?**

A verifiable factual claim is a sentence claiming that something is true and that can be checked against factual information such as statistics, specific examples, or personal testimony.

In [8]:
train_dataset[train_dataset['q1_label'] == 'no']['emojis'].value_counts()

       1682
👉        13
😂        10
🤔         7
👇         7
       ... 
😅😂        1
😈         1
📷         1
🏡📱        1
🔴🎥➡       1
Name: emojis, Length: 210, dtype: int64

In [9]:
train_dataset[train_dataset['q1_label'] == 'yes']['emojis'].value_counts()

            4059
🔴             39
👇             16
👉             12
➡              9
            ... 
🤓😃🎁😷👩⚕👨⚕       1
⤵              1
🤩🎉🎉            1
📧              1
😀👏             1
Name: emojis, Length: 219, dtype: int64
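
The counts above treat each tweet's whole emoji string as one value, so `😅😂` and `😂` are counted separately. A hedged sketch of counting individual emoji characters per label instead; `emoji_counts` is a hypothetical helper and `toy` is an illustrative frame, not the real dataset:

```python
import pandas as pd

# Toy stand-in for train_dataset (illustrative values only).
toy = pd.DataFrame({
    'q1_label': ['yes', 'yes', 'no', 'no', 'yes'],
    'emojis': ['🔴', '🔴👇', '', '😂😂', ''],
})

def emoji_counts(frame, label_column, label):
    """Count single emoji characters over all rows with the given label."""
    subset = frame.loc[frame[label_column] == label, 'emojis']
    # Split each string into characters, one row per character;
    # rows without emojis become NaN and are dropped.
    chars = subset.apply(list).explode().dropna()
    return chars.value_counts()

yes_counts = emoji_counts(toy, 'q1_label', 'yes')
print(yes_counts)
```

Splitting by character also separates skin-tone modifiers from their base emoji, so this is only a rough per-symbol view.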

In [10]:
train_dataset[train_dataset['emojis'] == '💉💉🗓'][['tweet_text', 'preprocessed_tweet_text', 'translated_emojis']]

Unnamed: 0,tweet_text,preprocessed_tweet_text,translated_emojis
5172,"The Brazilian government @jairbolsonaro signs an agreement with AstraZeneca @AstraZeneca to transfer technology and manufacture the 💉AZD1222 vaccine (Oxford vaccination) The agreement guarantees Brazil the right to manufacture 💉100 million doses, and the future of the pharmaceutical industry 🗓The start of manufacturing the vaccine is expected in December 2020 https://t.co/uIzOQY3782 (277) https://t.co/9zgjx1vUDe","The Brazilian government @jairbolsonaro signs an agreement with AstraZeneca @AstraZeneca to transfer technology and manufacture the :syringe: AZD1222 vaccine (Oxford vaccination) The agreement guarantees Brazil the right to manufacture :syringe: 100 million doses, and the future of the pharmaceutical industry :spiral_calendar: The start of manufacturing the vaccine is expected in December 2020 https://t.co/uIzOQY3782 (277) https://t.co/9zgjx1vUDe",:syringe: :syringe: :spiral_calendar:


In [11]:
train_dataset[train_dataset['emojis'] == '🔴'][['tweet_text', 'preprocessed_tweet_text', 'translated_emojis']].head(5)

Unnamed: 0,tweet_text,preprocessed_tweet_text,translated_emojis
1028,"🔴 # Coronavirus: After a successful limited test in Marseille, Sanofi launches a national trial in # France of the malaria drug #Plaquenil, which has shown encouraging results against # COVID19 https://t.co/lpZFi0gkaR",":red_circle: # Coronavirus: After a successful limited test in Marseille, Sanofi launches a national trial in # France of the malaria drug #Plaquenil, which has shown encouraging results against # COVID19 https://t.co/lpZFi0gkaR",:red_circle:
3871,Vote 🔴. . Do you think that the (curfew) implemented by some countries prevents the spread of the Corona virus .. and why? . . Yes = retweet. no = favourite,Vote :red_circle: . . Do you think that the (curfew) implemented by some countries prevents the spread of the Corona virus .. and why? . . Yes = retweet. no = favourite,:red_circle:
3910,"#video 🔴 . . The Saudi journalist “Ahmed Al-Maliki” sends a strongly worded message to the #banks: The rest of you, where are your initiatives for the homeland and the citizen?, stressing, “If money is your greatest concern, then not all citizens will forget your betrayal.”","#video :red_circle: . . The Saudi journalist “Ahmed Al-Maliki” sends a strongly worded message to the #banks: The rest of you, where are your initiatives for the homeland and the citizen?, stressing, “If money is your greatest concern, then not all citizens will forget your betrayal.”",:red_circle:
3916,Urgent 🔴. . A spokesman for the Ministry of Health: The coming period will witness a rise in cases infected with the Corona virus worldwide... and we may resort to a curfew if we detect a complacency in the precautionary measures. . . #Saudi,Urgent :red_circle: . . A spokesman for the Ministry of Health: The coming period will witness a rise in cases infected with the Corona virus worldwide... and we may resort to a curfew if we detect a complacency in the precautionary measures. . . #Saudi,:red_circle:
3934,"Video 🔴. . With words from a sincere heart... he witnessed a security man giving parental advice to a group of young men who went on a road trip, ignoring repeated warnings. . . Corona Virus . URL","Video :red_circle: . . With words from a sincere heart... he witnessed a security man giving parental advice to a group of young men who went on a road trip, ignoring repeated warnings. . . Corona Virus . URL",:red_circle:
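
The filter `emojis == '🔴'` matches only tweets whose emoji string is exactly one red circle, so rows such as `🔴🔴` or `🔴‼` are left out. A minimal sketch of a containment filter, shown on a toy frame with illustrative values:

```python
import pandas as pd

# Toy stand-in for the 'emojis' column (illustrative values only).
toy = pd.DataFrame({'emojis': ['🔴', '🔴🔴', '🔴‼', '👇', '']})

# Exact match: the emoji string is the single red circle.
exact = toy[toy['emojis'] == '🔴']

# Containment: the red circle appears anywhere in the emoji string.
# regex=False treats the pattern literally; na=False skips missing values.
containing = toy[toy['emojis'].str.contains('🔴', regex=False, na=False)]

print(len(exact), len(containing))
```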


## Q2 analysis on emojis 
**False Information: To what extent does the tweet appear to contain false information?**

The stated claim may contain false information. This question labels tweets with the categories mentioned below. False information appears on social media platforms, blogs, and news articles to deliberately misinform or deceive readers.

In [12]:
train_dataset[train_dataset['q2_label'] == 'no']['emojis'].value_counts()

       3607
🔴        33
👉        12
➡         9
👇         9
       ... 
📧         1
💉😮        1
🤔         1
📲💻👉       1
‼🔴🕛       1
Name: emojis, Length: 181, dtype: int64

In [13]:
train_dataset[train_dataset['q2_label'] == 'yes']['emojis'].value_counts()

                            408
👇                             7
🔴                             5
🔻                             3
👍🏻                            2
🚨                             2
‼                             2
✋🤔                            2
🛑                             2
🔴‼                            2
⚠                             2
😏                             1
🔴🔴🔴                           1
❤🌍                            1
⚠☠                            1
🤧🤧👇                           1
📌📌📌                           1
💰                             1
⤵                             1
⁉                             1
😭                             1
👇🏼                            1
💉💉💉🗓👥🧠💉                       1
🤔👇                            1
🤷🏻♂                           1
🛑♦                            1
⛔📍                            1
🤝                             1
❣❤💉💉                          1
🤚                             1
🔴🔴                            1
😂       

In [14]:
train_dataset[train_dataset['emojis'] == '💉💉💉🗓👥🧠💉'][['tweet_text', 'preprocessed_tweet_text', 'translated_emojis']]

Unnamed: 0,tweet_text,preprocessed_tweet_text,translated_emojis
4903,"The US government is booking 💉100 million doses of the Johnson &amp; Johnson vaccination 💉Ad26.COV2.S💉 and the possibility of buying an additional 200 million doses https://t.co/NaqT1jmWuf 🗓 In September, the third phase clinical trials will begin on 👥60 thousand people in 🇺🇸🇧🇷🇲🇽🇿🇦🇺 🇦🇵🇪🇵🇭🇨🇱🇨🇴 https://t.co/p1aEGY5e72 🧠 Vaccination: Single dose💉 (258) https://t.co/fEvUBLe1Ck","The US government is booking :syringe: 100 million doses of the Johnson &amp; Johnson vaccination :syringe: Ad26.COV2.S :syringe: and the possibility of buying an additional 200 million doses https://t.co/NaqT1jmWuf :spiral_calendar: In September, the third phase clinical trials will begin on :busts_in_silhouette: 60 thousand people in 🇺🇸🇧🇷🇲🇽🇿🇦🇺 🇦🇵🇪🇵🇭🇨🇱🇨🇴 https://t.co/p1aEGY5e72 :brain: Vaccination: Single dose :syringe: (258) https://t.co/fEvUBLe1Ck",:syringe: :syringe: :syringe: :spiral_calendar: :busts_in_silhouette: :brain: :syringe:
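
The flag emojis in the row above remain untranslated in `preprocessed_tweet_text`. One plausible reason, given that the preprocessing checks one character at a time, is that flags and some other emojis are sequences of several code points, so no single character stands for the whole emoji. A standard-library sketch of what such sequences are made of; the two sample strings are illustrative and not taken from the dataset:

```python
import unicodedata

flag = '\U0001F1FA\U0001F1F8'            # US flag: two regional indicator symbols
doctor = '\U0001F469\u200D\u2695\uFE0F'  # woman health worker: a ZWJ sequence

def code_points(s):
    """Return the Unicode name of every code point in `s`."""
    return [unicodedata.name(c, f'U+{ord(c):04X}') for c in s]

flag_parts = code_points(flag)
doctor_parts = code_points(doctor)

print(len(flag), flag_parts)
print(len(doctor), doctor_parts)
```

Iterating over such a string yields the parts separately, which would explain both the untouched flags and the emoji strings in which a person and a symbol appear side by side.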


## Q3 analysis on emojis
**Interest to General Public: Will the tweet have an effect on or be of interest to the general public?**

Most claims people make are not interesting and can be verified from general knowledge. For example, "The sky is blue" is a claim, but it is not interesting to the general public. In general, topics such as healthcare, political news and findings, and current events are of higher interest to the general public. The labels are defined below using a five-point Likert scale.

In [15]:
train_dataset[train_dataset['q3_label'] == 'no']['emojis'].value_counts()

           116
😭            1
🤲🏼           1
🙂            1
👴🥰           1
😂😂😂          1
💜💜💜💜💜💜💜      1
😷            1
📆🚀🔜          1
©            1
🎖🏆🏆🏆💷📰✍      1
😊            1
💔            1
Name: emojis, dtype: int64

In [16]:
train_dataset[train_dataset['q3_label'] == 'yes']['emojis'].value_counts()

            3957
🔴             39
👇             16
👉             13
➡              9
            ... 
🤓😃🎁😷👩⚕👨⚕       1
⤵              1
🤩🎉🎉            1
📧              1
😀👏             1
Name: emojis, Length: 210, dtype: int64

## Q4 analysis on emojis
**Harmfulness: To what extent is the tweet harmful to the society/person(s)/company(s)/product(s)?**

The purpose of this question is to determine whether the content of the tweet is intended to, and can, negatively affect society as a whole, specific person(s), company(s), or product(s), or spread rumors about them. The content intends to harm or weaponize the information. A rumor is a statement whose veracity is not quickly, or ever, confirmed.

In [17]:
train_dataset[train_dataset['q4_label'] == 'no']['emojis'].value_counts()

       3443
🔴        33
👉        13
👇        10
➡         9
       ... 
🤩🎉🎉       1
📧         1
💉😮        1
⛔📍        1
🔵🔵        1
Name: emojis, Length: 185, dtype: int64

In [18]:
train_dataset[train_dataset['q4_label'] == 'yes']['emojis'].value_counts()

           623
👇            6
🔴            4
⚠            2
🔴‼           2
✋🤔           2
🔻            2
‼            2
©            2
👀            1
💵            1
🤔👇           1
🌐🌐           1
👈            1
🤦🏿♂          1
❤🌍           1
🤧🤧👇          1
💶            1
💉👇           1
⁉            1
🌐            1
🤷🏻♂          1
🙈            1
🤬            1
❌❌❌❌         1
😏            1
❌👇           1
🤝            1
🦠🧪🎤          1
⚠👇🏻👇🏻👇🏻      1
🔴🔴           1
😂            1
👁            1
⚠☠           1
📌            1
🌍🌍🌍☀🌍🌍🌍      1
📹            1
🚨            1
💡            1
❗❗           1
😂😂😂          1
⭕            1
😀            1
🔴🔴👇🏻         1
🔴🔴‼🌹         1
📽            1
😏🌚           1
🧪            1
🔴👇🏻          1
😊            1
👇🏼           1
😔            1
Name: emojis, dtype: int64
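
Raw counts are hard to compare across labels because the label groups differ in size. A hedged sketch of comparing the share of tweets that contain at least one emoji per label, using a toy frame with illustrative values rather than the real dataset:

```python
import pandas as pd

# Toy stand-in for train_dataset (illustrative values only).
toy = pd.DataFrame({
    'q4_label': ['yes', 'yes', 'no', 'no', 'no', 'no'],
    'emojis': ['👇', '', '', '', '🔴', ''],
})

# A tweet "has emoji" when its extracted emoji string is non-empty;
# missing values compare as False and so count as "no emoji".
has_emoji = toy['emojis'].str.len() > 0

# Fraction of tweets with at least one emoji, per label.
share = toy.assign(has_emoji=has_emoji).groupby('q4_label')['has_emoji'].mean()
print(share)
```

The same pattern applies to any of the `q1_label` to `q7_label` columns by changing the grouping column.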