In [1]:
import numpy as np
import pandas as pd
import nltk
import re
import spacy
import string

# Set the chained assignment mode to None to avoid pandas warnings
pd.options.mode.chained_assignment = None

In [2]:
# !pip install spacy 

In [3]:
# Import the dataset, keep its first 500 rows and the 'Tweet' column, and cast that column to string

df = pd.read_csv(r'C:\Users\HP\Downloads\archive (4)\twitter_validation.csv', nrows=500)

df = df[['Tweet']]

df["Tweet"] = df["Tweet"].astype(str)  # Make sure every value in the "Tweet" column is a string

df

Unnamed: 0,Tweet
0,I mentioned on Facebook that I was struggling ...
1,BBC News - Amazon boss Jeff Bezos rejects clai...
2,@Microsoft Why do I pay for WORD when it funct...
3,"CSGO matchmaking is so full of closet hacking,..."
4,Now the President is slapping Americans in the...
...,...
495,special shoutouts to microsoft excel 2013
496,Dumb Lucky☘️ (Fortnite Montage) youtu.be/psW...
497,Dang there goes my birthday present but maybe ...
498,It was ab fab seeing the 6 bungalows built in ...


In [4]:
df.shape

(500, 1)

* **Tweet is the main variable, but raw text has many problems, so we do not feed this variable directly into the model.**


* **For example, the @ symbol carries no useful meaning, and many tweets contain numbers that are not important, so we remove digits and special characters from the text. Stray special characters sometimes appear because the data is not compatible with the database or was not stored properly: they are introduced when the data is stored and later fetched back. This commonly happens with Hadoop. Therefore, we need to remove these special characters.**


* **Tweets also contain links and spelling errors, so we need to remove the links and correct the spelling.**
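The cleanup described in the points above can be sketched with a few regular expressions. This is a minimal illustration, not part of the original notebook: the function name `basic_clean` and the three patterns are assumptions, the URL pattern only catches links that start with `http://`, `https://` or `www.`, and spelling correction is not covered.

```python
import re

# Hypothetical patterns for illustration; tune them for your own data
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links with an explicit scheme or www prefix
MENTION_RE = re.compile(r"@\w+")                # @username mentions
DIGIT_RE = re.compile(r"\d+")                   # runs of digits

def basic_clean(text):
    """Remove URLs, @mentions and digits, then collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    return " ".join(text.split())

print(basic_clean("@Microsoft fix Word 2013 please https://example.com/help"))
```

Replacing each match with a space and collapsing whitespace afterwards avoids gluing neighbouring words together.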

# 1) Lower Casing

Lower casing is a common text preprocessing technique. The idea is to convert the input text into a single casing so that 'text', 'Text' and 'TEXT' are treated the same way.

In [5]:
df['Tweet_lower'] = df['Tweet'].str.lower()
df.head()

Unnamed: 0,Tweet,Tweet_lower
0,I mentioned on Facebook that I was struggling ...,i mentioned on facebook that i was struggling ...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,bbc news - amazon boss jeff bezos rejects clai...
2,@Microsoft Why do I pay for WORD when it funct...,@microsoft why do i pay for word when it funct...
3,"CSGO matchmaking is so full of closet hacking,...","csgo matchmaking is so full of closet hacking,..."
4,Now the President is slapping Americans in the...,now the president is slapping americans in the...


# 2) Removal of Punctuations


* Another common text preprocessing technique is to remove punctuation from the text data. This is again a text standardization step that helps treat 'hurray' and 'hurray!' in the same way.


* We also need to choose the list of punctuation marks to exclude carefully, depending on the use case. For example, `string.punctuation` in Python contains the following symbols:


* `` !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ ``


* We can add or remove punctuation marks as the use case requires.

In [6]:
df.drop(['Tweet_lower'], axis=1, inplace=True)

In [7]:
PUNCT_TO_REMOVE = string.punctuation  # Punctuation characters to remove

def remove_punctuation(text):
    """Custom function to remove punctuation from the text."""
    return text.translate(str.maketrans('', '', PUNCT_TO_REMOVE))


df["text_wo_punct"] = df["Tweet"].apply(lambda text: remove_punctuation(text)) # Apply the custom function to remove punctuation and create a new column "text_wo_punct"
df.head(15)

Unnamed: 0,Tweet,text_wo_punct
0,I mentioned on Facebook that I was struggling ...,I mentioned on Facebook that I was struggling ...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,BBC News Amazon boss Jeff Bezos rejects claim...
2,@Microsoft Why do I pay for WORD when it funct...,Microsoft Why do I pay for WORD when it functi...
3,"CSGO matchmaking is so full of closet hacking,...",CSGO matchmaking is so full of closet hacking ...
4,Now the President is slapping Americans in the...,Now the President is slapping Americans in the...
5,Hi @EAHelp I’ve had Madeleine McCann in my cel...,Hi EAHelp I’ve had Madeleine McCann in my cell...
6,Thank you @EAMaddenNFL!! \n\nNew TE Austin Hoo...,Thank you EAMaddenNFL \n\nNew TE Austin Hooper...
7,"Rocket League, Sea of Thieves or Rainbow Six: ...",Rocket League Sea of Thieves or Rainbow Six Si...
8,my ass still knee-deep in Assassins Creed Odys...,my ass still kneedeep in Assassins Creed Odyss...
9,FIX IT JESUS ! Please FIX IT ! What In the wor...,FIX IT JESUS Please FIX IT What In the world...


**If \n still appears after removing punctuation marks, ignore it: \n is the newline character, not a punctuation mark.**
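If the leftover newlines do get in the way, one option is to collapse all whitespace into single spaces. The helper below is a small sketch; `normalize_whitespace` is a hypothetical name, not something defined elsewhere in this notebook.

```python
def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    return " ".join(text.split())

print(normalize_whitespace("Thank you EAMaddenNFL \n\nNew TE Austin Hooper"))
```

The stopword step later in the notebook has the same side effect, because it also splits on whitespace and rejoins with single spaces.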

In [8]:
from string import punctuation
type(punctuation)

str

In [9]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

**Here is a list of punctuation marks. You can see that the apostrophe (') is included in the list. If you do not want to remove this, please see below.**

In [10]:
my_punctuation = punctuation.replace("'", "")
my_punctuation

'!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

**As you can see, the apostrophe (') is no longer in the list.**

In [11]:
"It's right, Isn't it?".translate(str.maketrans("", "", my_punctuation))

"It's right Isn't it"

# 3) Removal of stopwords

* Stop words are commonly occurring words in a language, like 'the', 'a' and so on. They can be removed from the text most of the time, as they don't provide valuable information for downstream analysis. In cases like part-of-speech (POS) tagging, we should not remove them, as they provide very valuable information about the POS.


* Stop word lists are already compiled for different languages, and we can use them directly. For example, the stop word list for English from the nltk package can be seen below.

In [12]:
from nltk.corpus import stopwords

', '.join(stopwords.words('english'))

"i, me, my, myself, we, our, ours, ourselves, you, you're, you've, you'll, you'd, your, yours, yourself, yourselves, he, him, his, himself, she, she's, her, hers, herself, it, it's, its, itself, they, them, their, theirs, themselves, what, which, who, whom, this, that, that'll, these, those, am, is, are, was, were, be, been, being, have, has, had, having, do, does, did, doing, a, an, the, and, but, if, or, because, as, until, while, of, at, by, for, with, about, against, between, into, through, during, before, after, above, below, to, from, up, down, in, out, on, off, over, under, again, further, then, once, here, there, when, where, why, how, all, any, both, each, few, more, most, other, some, such, no, nor, not, only, own, same, so, than, too, very, s, t, can, will, just, don, don't, should, should've, now, d, ll, m, o, re, ve, y, ain, aren, aren't, couldn, couldn't, didn, didn't, doesn, doesn't, hadn, hadn't, hasn, hasn't, haven, haven't, isn, isn't, ma, mightn, mightn't, mustn, mus

**List of stop words in NLTK library**

In [13]:
STOPWORDS = set(stopwords.words("english"))  # Get the set of stopwords for the English language

def remove_stopwords(text):
    """Custom function to remove stopwords from the text."""
    return " ".join((word for word in str(text).split() if word.lower() not in STOPWORDS))

df["text_wo_stop"] = df["text_wo_punct"].apply(lambda text: remove_stopwords(text))

df.head(10)

Unnamed: 0,Tweet,text_wo_punct,text_wo_stop
0,I mentioned on Facebook that I was struggling ...,I mentioned on Facebook that I was struggling ...,mentioned Facebook struggling motivation go ru...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,BBC News Amazon boss Jeff Bezos rejects claim...,BBC News Amazon boss Jeff Bezos rejects claims...
2,@Microsoft Why do I pay for WORD when it funct...,Microsoft Why do I pay for WORD when it functi...,Microsoft pay WORD functions poorly SamsungUS ...
3,"CSGO matchmaking is so full of closet hacking,...",CSGO matchmaking is so full of closet hacking ...,CSGO matchmaking full closet hacking truly awf...
4,Now the President is slapping Americans in the...,Now the President is slapping Americans in the...,President slapping Americans face really commi...
5,Hi @EAHelp I’ve had Madeleine McCann in my cel...,Hi EAHelp I’ve had Madeleine McCann in my cell...,Hi EAHelp I’ve Madeleine McCann cellar past 13...
6,Thank you @EAMaddenNFL!! \n\nNew TE Austin Hoo...,Thank you EAMaddenNFL \n\nNew TE Austin Hooper...,Thank EAMaddenNFL New TE Austin Hooper ORANGE ...
7,"Rocket League, Sea of Thieves or Rainbow Six: ...",Rocket League Sea of Thieves or Rainbow Six Si...,Rocket League Sea Thieves Rainbow Six Siege🤔 l...
8,my ass still knee-deep in Assassins Creed Odys...,my ass still kneedeep in Assassins Creed Odyss...,ass still kneedeep Assassins Creed Odyssey way...
9,FIX IT JESUS ! Please FIX IT ! What In the wor...,FIX IT JESUS Please FIX IT What In the world...,FIX JESUS Please FIX world going PlayStation A...


# 4) Removal of Frequent words

* In the previous preprocessing step, we removed stop words based on language information. But if we have a domain-specific corpus, it may also contain frequent words that are of little importance to us.


* So this step removes the most frequent words in the given corpus. If we use something like TF-IDF, such words are automatically down-weighted.


* Let us get the most common words and then remove them in the next step.
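To see why TF-IDF takes care of corpus-wide frequent words, here is a small hand-rolled sketch of the inverse document frequency part on a made-up three-document corpus. It uses the plain textbook form log(N / df); library implementations typically use smoothed variants, so their numbers will differ. The corpus and the `idf` helper are assumptions for illustration only.

```python
import math
from collections import Counter

# Toy corpus, invented for this example
docs = [
    "game crashed again",
    "love this game",
    "game update today",
]

# Document frequency: number of documents that contain each word
doc_freq = Counter(word for doc in docs for word in set(doc.split()))

def idf(word, n_docs=len(docs)):
    """Plain inverse document frequency: log(N / df)."""
    return math.log(n_docs / doc_freq[word])

print("game:", idf("game"))  # present in every document
print("love:", idf("love"))  # present in a single document
```

A word that occurs in every document gets an IDF of zero under this formula, so it contributes nothing to the final weights, while a word found in only one document gets the highest weight.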

In [14]:
from collections import Counter

cnt = Counter()

for text in df["text_wo_stop"].values:  # Iterate through the "text_wo_stop" column and count word occurrences
    for word in text.split():
        cnt[word] += 1

cnt.most_common(10) # Get the 10 most common words and their counts

[('game', 47),
 ('like', 35),
 ('Johnson', 34),
 ('get', 30),
 ('2', 29),
 ('love', 23),
 ('playing', 22),
 ('best', 21),
 ('Im', 20),
 ('time', 20)]

In [15]:
# Code for removing most frequent words 

FREQWORDS = set([w for (w, wc) in cnt.most_common(10)])

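def remove_freqwords(text):
    """Custom function to remove the frequent words."""
    # Compare case-sensitively: cnt was built from the original casing,
    # so FREQWORDS contains entries such as 'Johnson' and 'Im'
    return " ".join([word for word in str(text).split() if word not in FREQWORDS])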

df["text_wo_stopfreq"] = df["text_wo_stop"].apply(lambda text: remove_freqwords(text))
df.head()

Unnamed: 0,Tweet,text_wo_punct,text_wo_stop,text_wo_stopfreq
0,I mentioned on Facebook that I was struggling ...,I mentioned on Facebook that I was struggling ...,mentioned Facebook struggling motivation go ru...,mentioned Facebook struggling motivation go ru...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,BBC News Amazon boss Jeff Bezos rejects claim...,BBC News Amazon boss Jeff Bezos rejects claims...,BBC News Amazon boss Jeff Bezos rejects claims...
2,@Microsoft Why do I pay for WORD when it funct...,Microsoft Why do I pay for WORD when it functi...,Microsoft pay WORD functions poorly SamsungUS ...,Microsoft pay WORD functions poorly SamsungUS ...
3,"CSGO matchmaking is so full of closet hacking,...",CSGO matchmaking is so full of closet hacking ...,CSGO matchmaking full closet hacking truly awf...,CSGO matchmaking full closet hacking truly awful
4,Now the President is slapping Americans in the...,Now the President is slapping Americans in the...,President slapping Americans face really commi...,President slapping Americans face really commi...


# 5) Removal of Rare words

This is very similar to the previous preprocessing step, but here we remove the rare words from the corpus.

In [16]:
# Drop the two columns that are no longer needed

df.drop(["text_wo_punct", "text_wo_stop"], axis=1, inplace=True)

In [17]:
# Code for removing rare words

n_rare_words = 10
RAREWORDS = set([w for (w, wc) in cnt.most_common()[:-n_rare_words-1:-1]])

def remove_rarewords(text):
    """Custom function to remove rare words"""
    return " ".join([word for word in str(text).split() if word not in RAREWORDS])

df["text_wo_stopfregrare"] = df["text_wo_stopfreq"].apply(lambda text: remove_rarewords(text))

df.head()

Unnamed: 0,Tweet,text_wo_stopfreq,text_wo_stopfregrare
0,I mentioned on Facebook that I was struggling ...,mentioned Facebook struggling motivation go ru...,mentioned Facebook struggling motivation go ru...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,BBC News Amazon boss Jeff Bezos rejects claims...,BBC News Amazon boss Jeff Bezos rejects claims...
2,@Microsoft Why do I pay for WORD when it funct...,Microsoft pay WORD functions poorly SamsungUS ...,Microsoft pay WORD functions poorly SamsungUS ...
3,"CSGO matchmaking is so full of closet hacking,...",CSGO matchmaking full closet hacking truly awful,CSGO matchmaking full closet hacking truly awful
4,Now the President is slapping Americans in the...,President slapping Americans face really commi...,President slapping Americans face really commi...


**Rare words occur too infrequently for a model to learn anything reliable from them, which is why we remove them.**

In [18]:
RAREWORDS

{'17',
 'CommLedHousing',
 'Trustees',
 'beyond',
 'colours',
 'minecraft',
 'pictwittercomNsBHo8i85Z',
 'pictwittercomxKQnayVdHk',
 'spade',
 'viewers'}
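Taking the last ten entries of `most_common()` picks ten words out of the many that occur only once, so the selection depends on the order in which words were counted. A possible alternative, sketched below under the assumption that a count threshold suits the task, is to drop every word seen fewer than `min_count` times. The function name and the toy texts are made up for illustration.

```python
from collections import Counter

def remove_below_min_count(texts, min_count=2):
    """Drop every word that occurs fewer than min_count times across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    rare = {word for word, count in counts.items() if count < min_count}
    cleaned = [" ".join(w for w in text.split() if w not in rare) for text in texts]
    return cleaned, rare

# Toy input, invented for this example
texts = ["game update today", "love game", "game crashed today"]
cleaned, rare = remove_below_min_count(texts)
print(cleaned)
print(sorted(rare))
```

On a real corpus the threshold needs tuning, since a large share of the vocabulary usually appears only once.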

# 6) Stemming

* Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.


* For example, if the corpus contains the two words walks and walking, stemming strips the suffix from each and turns both into walk. In another example, with the two words console and consoling, the stemmer removes the suffix and turns both into consol, which is not a proper English word.
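The idea of suffix stripping can be illustrated with a deliberately naive toy function. This is not the Porter algorithm, which applies a much longer sequence of conditional rules; the suffix list and the minimum stem length below are assumptions chosen only to reproduce the two examples above.

```python
def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in ("ing", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["walks", "walking", "console", "consoling"]:
    print(w, "->", toy_stem(w))
```

Because the rules only look at the spelling and never consult a dictionary, nothing stops the result from being a non-word such as consol.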

In [19]:
# Drop the two intermediate columns, which are no longer needed

df.drop(["text_wo_stopfreq", "text_wo_stopfregrare"], axis=1, inplace=True)

In [22]:
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def stem_words(text):
    return " ".join([stemmer.stem(word) for word in text.split()])

df["Tweet_stemmed"] = df["Tweet"].apply(lambda text: stem_words(text))

df.head(6)

Unnamed: 0,Tweet,Tweet_stemmed
0,I mentioned on Facebook that I was struggling ...,i mention on facebook that i wa struggl for mo...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,bbc new - amazon boss jeff bezo reject claim c...
2,@Microsoft Why do I pay for WORD when it funct...,@microsoft whi do i pay for word when it funct...
3,"CSGO matchmaking is so full of closet hacking,...","csgo matchmak is so full of closet hacking, it..."
4,Now the President is slapping Americans in the...,now the presid is slap american in the face th...
5,Hi @EAHelp I’ve had Madeleine McCann in my cel...,hi @eahelp i’v had madelein mccann in my cella...


* We can see that a word like I’ve has the e at the end chopped off by stemming (it becomes i’v). This is not intended. What can we do about that? We can use lemmatization in such cases.


* Also, the Porter stemmer is for the English language. If we are working with other languages, we can use the Snowball stemmer. The languages supported by the Snowball stemmer are:

In [23]:
from nltk.stem.snowball import SnowballStemmer
SnowballStemmer.languages

('arabic',
 'danish',
 'dutch',
 'english',
 'finnish',
 'french',
 'german',
 'hungarian',
 'italian',
 'norwegian',
 'porter',
 'portuguese',
 'romanian',
 'russian',
 'spanish',
 'swedish')

# 7) Lemmatization

* Lemmatization is similar to stemming in that it reduces inflected words to a base form, but it differs in making sure that the root word (also called the lemma) belongs to the language.


* As a result, it is generally slower than stemming. So, depending on the speed requirement, we can choose either stemming or lemmatization.


* Let us use the WordNetLemmatizer in nltk to lemmatize our sentences.

In [26]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def lemmatize_words(text):
    return " ".join([lemmatizer.lemmatize(word) for word in text.split()])

df["Tweet_lemmatized"] = df["Tweet"].apply(lambda text: lemmatize_words(text))

df.head(6)

Unnamed: 0,Tweet,Tweet_stemmed,Tweet_lemmatized
0,I mentioned on Facebook that I was struggling ...,i mention on facebook that i wa struggl for mo...,I mentioned on Facebook that I wa struggling f...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,bbc new - amazon boss jeff bezo reject claim c...,BBC News - Amazon bos Jeff Bezos reject claim ...
2,@Microsoft Why do I pay for WORD when it funct...,@microsoft whi do i pay for word when it funct...,@Microsoft Why do I pay for WORD when it funct...
3,"CSGO matchmaking is so full of closet hacking,...","csgo matchmak is so full of closet hacking, it...","CSGO matchmaking is so full of closet hacking,..."
4,Now the President is slapping Americans in the...,now the presid is slap american in the face th...,Now the President is slapping Americans in the...
5,Hi @EAHelp I’ve had Madeleine McCann in my cel...,hi @eahelp i’v had madelein mccann in my cella...,Hi @EAHelp I’ve had Madeleine McCann in my cel...


**We can see that the trailing e in I’ve is retained when we use lemmatization, unlike stemming.**

**Wait, there is one more thing about lemmatization. Let us try to lemmatize running now.**



In [27]:
lemmatizer.lemmatize("running")

'running'

**It returned running as such without converting it to the root form run. This is because the lemmatization process depends on the POS tag to come up with the correct lemma. Now let us lemmatize again by providing the POS tag for the word.**

In [29]:
lemmatizer.lemmatize("running", "v") # v for verb

'run'

**Now we are getting the root form run. So we also need to provide the POS tag of the word along with the word for lemmatizer in nltk. Depending on the POS, the lemmatizer may return different results.**

**Let us take the word stripes as an example and check its lemma both as a verb and as a noun.**

In [30]:
print("Word is: stripes")

print("Lemma result for verb:", lemmatizer.lemmatize("stripes", 'v'))
print("Lemma result for noun:", lemmatizer.lemmatize("stripes", 'n'))

Word is: stripes
Lemma result for verb: strip
Lemma result for noun: stripe


**Now let us redo the lemmatization process for our dataset.**

In [33]:
import nltk
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [35]:
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

wordnet_map = {
    "N": wordnet.NOUN,
    "V": wordnet.VERB,
    "J": wordnet.ADJ,  # Penn Treebank adjective tags start with "J" (JJ, JJR, JJS)
    "R": wordnet.ADV
}

def lemmatize_words(text):
    pos_tagged_text = nltk.pos_tag(text.split())
    return " ".join([lemmatizer.lemmatize(word, wordnet_map.get(pos[0], wordnet.NOUN)) for word, pos in pos_tagged_text])

df["Tweet_lemmatized"] = df["Tweet"].apply(lambda text: lemmatize_words(text))
df.head()

Unnamed: 0,Tweet,Tweet_stemmed,Tweet_lemmatized
0,I mentioned on Facebook that I was struggling ...,i mention on facebook that i wa struggl for mo...,I mention on Facebook that I be struggle for m...
1,BBC News - Amazon boss Jeff Bezos rejects clai...,bbc new - amazon boss jeff bezo reject claim c...,BBC News - Amazon bos Jeff Bezos reject claim ...
2,@Microsoft Why do I pay for WORD when it funct...,@microsoft whi do i pay for word when it funct...,@Microsoft Why do I pay for WORD when it funct...
3,"CSGO matchmaking is so full of closet hacking,...","csgo matchmak is so full of closet hacking, it...","CSGO matchmaking be so full of closet hacking,..."
4,Now the President is slapping Americans in the...,now the presid is slap american in the face th...,Now the President be slap Americans in the fac...
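The lookup in `lemmatize_words` relies on the first letter of each Penn Treebank tag returned by `nltk.pos_tag`. The sketch below isolates that mapping using plain strings, which are the values behind `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ` and `wordnet.ADV`. The names `PENN_PREFIX_TO_WORDNET` and `penn_to_wordnet` are hypothetical and exist only for this illustration.

```python
# First letter of a Penn Treebank tag -> WordNet POS code
# Adjective tags are JJ, JJR and JJS, so the key for adjectives is "J"
PENN_PREFIX_TO_WORDNET = {"N": "n", "V": "v", "J": "a", "R": "r"}

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code, defaulting to noun."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:1], "n")

for tag in ["VBG", "JJ", "NNS", "RB", "IN"]:
    print(tag, "->", penn_to_wordnet(tag))
```

Matching on the first letter is a rough shortcut: every tag without a WordNet counterpart, such as prepositions (IN), falls back to noun, and the particle tag RP is treated as an adverb because it also starts with R.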


# 8) Removal of Emojis
With the growing use of social media platforms, there has been an explosion in the use of emojis in day-to-day life as well. We may need to remove these emojis for some of our textual analysis.

In [36]:
def remove_emoji(text):
    """Remove emoji characters from the text."""
    # The parameter is named text so that it does not shadow the imported string module
    # Emoji pattern (including emoticons, symbols, flags, etc.)
    emoji_pattern = re.compile("["
                               u"\U0001F600-\U0001F64F"  # emoticons
                               u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                               u"\U0001F680-\U0001F6FF"  # transport & map symbols
                               u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                               u"\U00002702-\U000027B0"  # dingbats
                               u"\U000024C2-\U0001F251"  # broad catch-all range; also matches non-emoji characters such as CJK text
                               "]+", flags=re.UNICODE)
    return emoji_pattern.sub('', text)

remove_emoji("game is on 🔥🎉")

'game is on '

In [39]:
remove_emoji('Hilarious 😀')

'Hilarious '

**This code replaces each emoji with an empty string, so the space that preceded it is left in place.**

# 9) Removal of Emoticons

* Isn't this what we did in the last step? No. In the last step we removed emojis, not emoticons. There is a small difference between the two.

* According to Grammarist.com, an emoticon is built from keyboard characters that, put together in a certain way, represent a facial expression, whereas an emoji is an actual image.

:) is an emoticon

😀 is an emoji
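Because emoticons are ordinary keyboard characters, they can be removed with a regular expression built from a list of known emoticons. The sketch below uses a tiny hypothetical list, `SAMPLE_EMOTICONS`, rather than a full dictionary. Each entry is escaped with `re.escape` because characters such as `(`, `)` and `|` have a special meaning in regular expressions.

```python
import re

# Tiny illustrative subset; a real list would be much longer
SAMPLE_EMOTICONS = [":)", ":-)", ":(", ":D", ";-)"]

# Try longer emoticons first so that a short one never matches part of a longer one
emoticon_pattern = re.compile(
    "|".join(re.escape(e) for e in sorted(SAMPLE_EMOTICONS, key=len, reverse=True))
)

def remove_emoticons(text):
    """Replace known emoticons with a space, then collapse whitespace."""
    return " ".join(emoticon_pattern.sub(" ", text).split())

print(remove_emoticons("Great match :-) see you :D"))
```

Short entries can also match inside ordinary text, so an emoticon list should be checked against the data before it is applied to a whole corpus.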

In [70]:
EMOTICONS = {
    ":-)": "Happy face or smiley",
    ":D": "Big smile",
    ":')": "Tears of joy",
    ":(": "Sad face",
    ":'(": "Crying",
    ":-(": "Frowning",
    ";-)": "Winking",
    ":P": "Sticking out tongue",
    ":O": "Surprised",
    ":/": "Confused",
    ">.<": "Annoyed",
    "X(": "Angry",
    ":*": "Kiss",
    ":@": "Angry face",
    ":|": "Neutral face",
    ":\\": "Skeptical or unsure",
    ":s": "Confused",
    ":$": "Embarrassed",
    ":x": "Love-struck",
    "O:)": "Angel",
    "3:)": "Devil",
    ":!": "Surprised",
    ":C": "Unhappy",
    ";(": "Crying",
    "8-)": "Cool",
    ">:)": "Evil grin",
    ":B": "Nerd or geek",
    ":&": "Embarrassed or blushing",
    ":#": "Hashtag",
    ":^)": "Upbeat or optimistic",
    ":/\\": "Doubtful",
    ":.)": "Happy face or smiley",
    "=^.^=": "Cat face",
    ":{)": "Smiling with a Santa hat",
    ":~)": "Nasal congestion or cold",
    ":|]": "Robot",
    ":-)~": "Drunk or tipsy",
    ":-@": "Screaming or yelling",
    "~:~": "Frosty the Snowman",
    ":'-)": "Blushing or shy",
    ":!:)": "Clown",
    "=D>": "Applause",
    "<(:)": "Party",
    ":--)": "Bandit",
    "8-|": "Rolling eyes",
    "<:-|": "Dunce",
    ":@)": "Pig",
    "0:)": "Innocent",
    ":9": "Licking lips",
    "^_^": "Kawaii face or cute",
    "<3<3<3": "Triple heart",
    "=:o": "Surprised",
    "8-o": "Surprised",
    "8o": "Surprised",
    ":%)": "Confused",
    ":O:)": "Angel",
    "^5": "High five",
    ":*D": "Laughing",
    ":'(": "Upside-down smile",
    ":Dx": "Biting lip",
    ":S": "Confused or sad",
    ":&D": "Laughing",
    ":#)": "Witch",
    "<><": "Fish",
    "8D": "Evil grin",
    "(:|": "Monkey",
    ":-))": "Very happy",
    "*-)": "Cyclops",
    ":)>-": "Alien",
    "(-:": "Left-handed smile",
    "O:-)": "Innocent or angelic",
    ";P": "Tongue sticking out in jest",
    "[:|]": "Robot",
    ":')(": "Tears of sadness",
    "=)": "Happy face or smiley",
    ":--)~": "Drunk or tipsy",
    "O.o": "Confused or unsure",
    "=;": "Cool or winking",
    ":-?": "Confused",
    "@:-)": "Turban",
    ">:[": "Angry face",
    ":-|": "Indifferent or skeptical",
    ":-3": "Curly lips or affectionate",
    "(*)": "Starstruck",
    ">:(": "Angry",
    ":+)": "Overjoyed",
    "-:-)": "Zebra",
    ":/\\/\\/": "Pacman",
    ":o)": "Clown",
    ">:-)": "Evil grin",
    ":-)o": "Smiling with an oversized nose",
    ":-?": "Skeptical",
    "<(-_-)>": "Kirby",
    ":{": "Wry smile",
    ":-*": "Kiss",
    "V.V": "Crying",
    ":/3": "Disapproval or confused",
    "^_^;": "Sweating or nervous",
    "VV": "Pacman",
    "<3": "Heart",
    ":|]": "Robot",
    "()": "Hug",
    ":-)8": "Wearing a bow tie",
    "(:": "Happy",
    ":<": "Sad",
    "=3": "Kissy face",
    "O:-D": "Laughing",
    ":]": "Smiley or happy",
    "=(": "Sad",
    ":-[": "Sad",
    ":{D": "Laughing",
    ":--": "Sad",
    ":-]": "Happy",
    ":->": "Grinning",
    ":-\"": "Smoking",
    ":(": "Sad",
    ":-))": "Very happy",
    ":#": "Hashtag",
    "8)": "Sunglasses",
    ":@": "Angry face",
    ":-$": "Embarrassed",
    "():": "Angry",
    ":-D": "Big smile",
    ":*)": "Blushing",
    ":-P": "Tongue sticking out",
    ":o": "Surprised",
    ">.<": "Annoyed",
    ":-*": "Kiss",
    "X(": "Angry",
    ":o)": "Clown",
    ":@)": "Pig",
    "<:)": "Party",
    ":-((:": "Crying",
    ":>)": "Evil grin",
    ":&": "Embarrassed or blushing",
    ":p": "Sticking out tongue",
    ":C": "Unhappy",
    ":.)": "Blushing",
    ";(": "Crying",
    ":Dx": "Biting lip",
    ":%": "Confused",
    "(:|": "Monkey",
    "O:)": "Angel",
    ":})": "Smirking",
    ":])": "Evil grin",
    ":-((:": "Crying",
    "=p": "Sticking out tongue",
    "=D": "Big smile",
    "=]": "Smiley or happy",
    "=(": "Sad",
    "='(": "Crying",
    ":o)": "Clown",
    ":0)": "Clown",
    ":O": "Surprised",
    ":o": "Surprised",
    ":-o": "Surprised",
    "=o": "Surprised",
    "=O": "Surprised",
    ":*(": "Sick",
    ":-*": "Kiss",
    ":*": "Kiss",
    ";-*": "Wink and kiss",
    ":;*": "Wink and kiss",
    ";-(": "Sad",
    ":-@": "Screaming or yelling",
    ":'-)": "Blushing or shy",
    ":'-(": "Crying",
    ":^D": "Big grin",
    "o:)": "Angel",
    ":*D": "Laughing",
    ":S": "Confused or sad",
    ":&D": "Laughing",
    ":#)": "Witch",
    ":})": "Smirking",
    ":>)": "Evil grin",
    ":])": "Evil grin",
    ":-))": "Very happy",
    ":))))": "Very happy",
    ":]]]": "Very happy",
    "=*": "Kiss",
    ">;]": "Evil grin",
    ">;)": "Wink and evil grin",
    ">:)": "Evil grin",
    ">:-)": "Evil grin",
    ">>:-)": "Evil grin",
    ">;->": "Evil grin",
    ">;)~": "Evil grin",
    ">:-D": "Evil grin",
    ">:D": "Evil grin",
    ">=(": "Angry",
    ">=)": "Happy",
    ">:-)": "Devilish",
    "|-)": "Sleepy",
    "|-o": "Sleepy",
    "|-D": "Sleepy",
    "|O": "Sleepy",
    "|o": "Sleepy",
    "|(": "Sleepy",
    "|:-)": "Sleepy",
    "|)": "Sleepy",
    "|-)": "Sleepy",
    "|3": "Sleepy",
    "|]": "Robot",
    "|<": "Robot",
    "|-|": "Robot",
    "|-*": "Robot",
    "|=\\": "Robot",
    "|=>": "Robot",
    "|?": "Robot",
    "|C": "Unhappy",
    "|P": "Playful",
    "|~(": "Upset",
    "|~)": "Happy",
    "|C": "Unhappy",
    "|c": "Unhappy",
    "|n": "Unhappy",
    "|D": "Smiling",
    "|o": "Scared",
    "|3": "Love",
    "|;": "Wink",
    "|=(": "Unhappy",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|;-)": "Winking",
    "|O": "Surprised",
    "|o": "Surprised",
    "|O": "Amazed",
    "|o": "Amazed",
    "|]": "Evil grin",
    "|^": "Confused",
    "|#": "Hashtag",
    "|D": "Big smile",
    "|@": "Angry face",
    "|*": "Kiss",
    "|P": "Sticking out tongue",
    "|(": "Sad face",
    "|s": "Confused",
    "|x": "Love-struck",
    "|-:": "Disappointed",
    "|-|": "Sleeping",
    "|-D": "Big grin",
    "|-o": "Yawning",
    "|-p": "Yuck",
    "|->": "Disappointed",
    "|-P": "Tongue sticking out",
    "|-$": "Embarrassed",
    "|:@": "Angry face",
    "|:<": "Sad face",
    "|B": "Nerd or geek",
    "|O": "Surprised",
    "|o": "Surprised",
    "|=D": "Big smile",
    "|=]": "Smiley or happy",
    "|=(": "Sad",
    "|='(": "Crying",
    "|o)": "Clown",
    "|@)": "Pig",
    "|<)": "Party",
    "|=(": "Crying",
    "|>)": "Evil grin",
    "|&": "Embarrassed or blushing",
    "|p": "Sticking out tongue",
    "|C": "Unhappy",
    "|.)": "Blushing",
    "|Dx": "Biting lip",
    "|C": "Unhappy",
    "|O": "Surprised",
    "|o": "Surprised",
    "|-*": "Kiss",
    "|-?": "Skeptical",
    "|-@": "Screaming or yelling",
    "|O": "Amazed",
    "|o": "Amazed",
    "|;)": "Flirting",
    "|<3": "Heart",
    "|{": "Smirking",
    "|*(": "Sick",
    "|?)": "Confused",
    "|;-*": "Wink and kiss",
    "|;D": "Wink and big grin",
    "|;p": "Wink and sticking out tongue",
    "|;]": "Wink and evil grin",
    "|;{": "Wink and smirk",
    "|;^)": "Wink and smirk",
    "|=]": "Happy",
    "|:(": "Unhappy",
    "|:)": "Happy",
    "|:|": "Neutral",
    "|:{": "Smirking",
    "|:}": "Smirking",
    "|:$": "Embarrassed",
    "|:o": "Surprised",
    "|:O": "Surprised",
    "|:P": "Sticking out tongue",
    "|;(": "Crying",
    "|:-#": "Angry",
    "|:-&": "Sick",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|:-*": "Kiss",
    "|:-/": "Confused",
    "|:-?": "Confused",
    "|:-@": "Screaming or yelling",
    "|:-C": "Unhappy",
    "|:-D": "Big smile",
    "|:-O": "Surprised",
    "|:-P": "Tongue sticking out",
    "|:-S": "Confused or sad",
    "|:-|": "Neutral",
    "|:->": "Grinning",
    "|:-[": "Sad",
    "|:-{": "Smirking",
    "|:-}": "Smirking",
    "|:-|": "Indifferent",
    "|:=": "Smirking",
    "|:~": "Nasal congestion or cold",
    "|;=": "Cool or winking",
    "|<": "Sad",
    "|<(": "Crying",
    "|<)": "Party",
    "|<?>": "Confused",
    "|<!>": "Angry",
    "|<3": "Heart",
    "|='(": "Crying",
    "|>": "Happy",
    "|>()": "Evil grin",
    "|>:-)": "Evil grin",
    "|>:-D": "Evil grin",
    "|>=(": "Angry",
    "|>=)": "Happy",
    "|>=D": "Evil grin",
    "|>_<": "Angry",
    "|?": "Confused",
    "|??": "Confused",
    "|?>": "Confused",
    "|?|": "Confused",
    "|@": "Angry face",
    "|@)": "Pig",
    "|@|": "Angry face",
    "|B": "Nerd or geek",
    "|C": "Unhappy",
    "|C:": "Unhappy",
    "|C;": "Unhappy",
    "|D": "Big smile",
    "|D:": "Big grin",
    "|D;": "Big grin",
    "|O": "Surprised",
    "|o": "Surprised",
    "|P": "Sticking out tongue",
    "|o": "Surprised",
    "|p": "Sticking out tongue",
    "|o": "Surprised",
    "|q": "Sad",
    "|r": "Confused",
    "|s": "Confused",
    "|t": "Indifferent",
    "|u": "Unhappy",
    "|v": "Shouting",
    "|w": "Cool",
    "|x": "Love-struck",
    "|y": "Cool",
    "|z": "Sleepy",
    "|\-D": "Big grin",
    "|\-o": "Yawning",
    "|\-p": "Yuck",
    "|\->": "Disappointed",
    "|\-P": "Tongue sticking out",
    "|\-$": "Embarrassed",
    "|\-@": "Screaming or yelling",
    "|\O": "Amazed",
    "|\o": "Amazed",
    "|\;)": "Flirting",
    "|\|": "Neutral",
    "|\|D": "Smiling",
    "|\|P": "Sticking out tongue",
    "|\|S": "Confused or sad",
    "|\|]": "Evil grin",
    "|\|^": "Confused",
    "|\/|": "Pacman",
    "|\/3": "Disapproval or confused",
    "|\/V": "Pacman",
    "|\/v": "Pacman",
    "|\/|": "Pacman",
    "|O": "Surprised",
    "|o": "Surprised",
    "|O": "Amazed",
    "|o": "Amazed",
    "|]": "Evil grin",
    "|^": "Confused",
    "|#": "Hashtag",
    "|D": "Big smile",
    "|@": "Angry face",
    "|*": "Kiss",
    "|P": "Sticking out tongue",
    "|(": "Sad face",
    "|s": "Confused",
    "|x": "Love-struck",
    "|-:": "Disappointed",
    "|-|": "Sleeping",
    "|-D": "Big grin",
    "|-o": "Yawning",
    "|-p": "Yuck",
    "|->": "Disappointed",
    "|-P": "Tongue sticking out",
    "|-$": "Embarrassed",
    "|:@": "Angry face",
    "|:<": "Sad face",
    "|B": "Nerd or geek",
    "|O": "Surprised",
    "|o": "Surprised",
    "|=D": "Big smile",
    "|=]": "Smiley or happy",
    "|=(": "Sad",
    "|='(": "Crying",
    "|o)": "Clown",
    "|@)": "Pig",
    "|<)": "Party",
    "|=(": "Crying",
    "|>)": "Evil grin",
    "|&": "Embarrassed or blushing",
    "|p": "Sticking out tongue",
    "|C": "Unhappy",
    "|.)": "Blushing",
    "|Dx": "Biting lip",
    "|C": "Unhappy",
    "|O": "Surprised",
    "|o": "Surprised",
    "|-*": "Kiss",
    "|-?": "Skeptical",
    "|-@": "Screaming or yelling",
    "|O": "Amazed",
    "|o": "Amazed",
    "|;)": "Flirting",
    "|<3": "Heart",
    "|{": "Smirking",
    "|*(": "Sick",
    "|?)": "Confused",
    "|;-*": "Wink and kiss",
    "|;D": "Wink and big grin",
    "|;p": "Wink and sticking out tongue",
    "|;]": "Wink and evil grin",
    "|;{": "Wink and smirk",
    "|;^)": "Wink and smirk",
    "|=]": "Happy",
    "|:(": "Unhappy",
    "|:)": "Happy",
    "|:|": "Neutral",
    "|:{": "Smirking",
    "|:}": "Smirking",
    "|:$": "Embarrassed",
    "|:o": "Surprised",
    "|:O": "Surprised",
    "|:P": "Sticking out tongue",
    "|;(": "Crying",
    "|:-#": "Angry",
    "|:-&": "Sick",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|:-*": "Kiss",
    "|:-/": "Confused",
    "|:-?": "Confused",
    "|:-@": "Screaming or yelling",
    "|:-C": "Unhappy",
    "|:-D": "Big smile",
    "|:-O": "Surprised",
    "|:-P": "Tongue sticking out",
    "|:-S": "Confused or sad",
    "|:-|": "Neutral",
    "|:->": "Grinning",
    "|:-[": "Sad",
    "|:-{": "Smirking",
    "|:-}": "Smirking",
    "|:-|": "Indifferent",
    "|:=": "Smirking",
    "|:~": "Nasal congestion or cold",
    "|;=": "Cool or winking",
    "|<": "Sad",
    "|<(": "Crying",
    "|<)": "Party",
    "|<?>": "Confused",
    "|<!>": "Angry",
    "|<3": "Heart",
    "|='(": "Crying",
    "|>": "Happy",
    "|>()": "Evil grin",
    "|>:-)": "Evil grin",
    "|>:-D": "Evil grin",
    "|>=(": "Angry",
    "|>=)": "Happy",
    "|>=D": "Evil grin",
    "|>_<": "Angry",
    "|?": "Confused",
    "|??": "Confused",
    "|?>": "Confused",
    "|?|": "Confused",
    "|@": "Angry face",
    "|@)": "Pig",
    "|@|": "Angry face",
    "|B": "Nerd or geek",
    "|C": "Unhappy",
    "|C:": "Unhappy",
    "|C;": "Unhappy",
    "|D": "Big smile",
    "|D:": "Big grin",
    "|D;": "Big grin",
    "|O": "Surprised",
    "|o": "Surprised",
    "|P": "Sticking out tongue",
    "|o": "Surprised",
    "|p": "Sticking out tongue",
    "|o": "Surprised",
    "|q": "Sad",
    "|r": "Confused",
    "|s": "Confused",
    "|t": "Indifferent",
    "|u": "Unhappy",
    "|v": "Shouting",
    "|w": "Cool",
    "|x": "Love-struck",
    "|y": "Cool",
    "|z": "Sleepy",
    "|\-D": "Big grin",
    "|\-o": "Yawning",
    "|\-p": "Yuck",
    "|\->": "Disappointed",
    "|\-P": "Tongue sticking out",
    "|\-$": "Embarrassed",
    "|\-@": "Screaming or yelling",
    "|\O": "Amazed",
    "|\o": "Amazed",
    "|\;)": "Flirting",
    "|\|": "Neutral",
    "|\|D": "Smiling",
    "|\|P": "Sticking out tongue",
    "|\|S": "Confused or sad",
    "|\|]": "Evil grin",
    "|\|^": "Confused",
    "|\/|": "Pacman",
    "|\/3": "Disapproval or confused",
    "|\/V": "Pacman",
    "|\/v": "Pacman",
    "|\/|": "Pacman",
    "|O": "Surprised",
    "|o": "Surprised",
    "|O": "Amazed",
    "|o": "Amazed",
    "|]": "Evil grin",
    "|^": "Confused",
    "|#": "Hashtag",
    "|D": "Big smile",
    "|@": "Angry face",
    "|*": "Kiss",
    "|P": "Sticking out tongue",
    "|(": "Sad face",
    "|s": "Confused",
    "|x": "Love-struck",
    "|-:": "Disappointed",
    "|-|": "Sleeping",
    "|-D": "Big grin",
    "|-o": "Yawning",
    "|-p": "Yuck",
    "|->": "Disappointed",
    "|-P": "Tongue sticking out",
    "|-$": "Embarrassed",
    "|:@": "Angry face",
    "|:<": "Sad face",
    "|B": "Nerd or geek",
    "|O": "Surprised",
    "|o": "Surprised",
    "|=D": "Big smile",
    "|=]": "Smiley or happy",
    "|=(": "Sad",
    "|='(": "Crying",
    "|o)": "Clown",
    "|@)": "Pig",
    "|<)": "Party",
    "|=(": "Crying",
    "|>)": "Evil grin",
    "|&": "Embarrassed or blushing",
    "|p": "Sticking out tongue",
    "|C": "Unhappy",
    "|.)": "Blushing",
    "|Dx": "Biting lip",
    "|C": "Unhappy",
    "|O": "Surprised",
    "|o": "Surprised",
    "|-*": "Kiss",
    "|-?": "Skeptical",
    "|-@": "Screaming or yelling",
    "|O": "Amazed",
    "|o": "Amazed",
    "|;)": "Flirting",
    "|<3": "Heart",
    "|{": "Smirking",
    "|*(": "Sick",
    "|?)": "Confused",
    "|;-*": "Wink and kiss",
    "|;D": "Wink and big grin",
    "|;p": "Wink and sticking out tongue",
    "|;]": "Wink and evil grin",
    "|;{": "Wink and smirk",
    "|;^)": "Wink and smirk",
    "|=]": "Happy",
    "|:(": "Unhappy",
    "|:)": "Happy",
    "|:|": "Neutral",
    "|:{": "Smirking",
    "|:}": "Smirking",
    "|:$": "Embarrassed",
    "|:o": "Surprised",
    "|:O": "Surprised",
    "|:P": "Sticking out tongue",
    "|;(": "Crying",
    "|:-#": "Angry",
    "|:-&": "Sick",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|:-*": "Kiss",
    "|:-/": "Confused",
    "|:-?": "Confused",
    "|:-@": "Screaming or yelling",
    "|:-C": "Unhappy",
    "|:-D": "Big smile",
    "|:-O": "Surprised",
    "|:-P": "Tongue sticking out",
    "|:-S": "Confused or sad",
    "|:-|": "Neutral",
    "|:->": "Grinning",
    "|:-[": "Sad",
    "|:-{": "Smirking",
    "|:-}": "Smirking",
    "|:-|": "Indifferent",
    "|:=": "Smirking",
    "|:~": "Nasal congestion or cold",
    "|;=": "Cool or winking",
    "|<": "Sad",
    "|<(": "Crying",
    "|<)": "Party",
    "|<?>": "Confused",
    "|<!>": "Angry",
    "|<3": "Heart",
    "|='(": "Crying",
    "|>": "Happy",
    "|>()": "Evil grin",
    "|>:-)": "Evil grin",
    "|>:-D": "Evil grin",
    "|>=(": "Angry",
    "|>=)": "Happy",
    "|>=D": "Evil grin",
    "|>_<": "Angry",
    "|?": "Confused",
    "|??": "Confused",
    "|?>": "Confused",
    "|?|": "Confused",
    "|@": "Angry face",
    "|@)": "Pig",
    "|@|": "Angry face",
    "|B": "Nerd or geek",
    "|C": "Unhappy",
    "|C:": "Unhappy",
    "|C;": "Unhappy",
    "|D": "Big smile",
    "|D:": "Big grin",
    "|D;": "Big grin",
    "|O": "Surprised",
    "|o": "Surprised",
    "|P": "Sticking out tongue",
    "|o": "Surprised",
    "|p": "Sticking out tongue",
    "|o": "Surprised",
    "|q": "Sad",
    "|r": "Confused",
    "|s": "Confused",
    "|t": "Indifferent",
    "|u": "Unhappy",
    "|v": "Shouting",
    "|w": "Cool",
    "|x": "Love-struck",
    "|y": "Cool",
    "|z": "Sleepy",
    "|\-D": "Big grin",
    "|\-o": "Yawning",
    "|\-p": "Yuck",
    "|\->": "Disappointed",
    "|\-P": "Tongue sticking out",
    "|\-$": "Embarrassed",
    "|\-@": "Screaming or yelling",
    "|\O": "Amazed",
    "|\o": "Amazed",
    "|\;)": "Flirting",
    "|\|": "Neutral",
    "|\|D": "Smiling",
    "|\|P": "Sticking out tongue",
    "|\|S": "Confused or sad",
    "|\|]": "Evil grin",
    "|\|^": "Confused",
    "|\/|": "Pacman",
    "|\/3": "Disapproval or confused",
    "|\/V": "Pacman",
    "|\/v": "Pacman",
    "|\/|": "Pacman",
    "|O": "Surprised",
    "|o": "Surprised",
    "|O": "Amazed",
    "|o": "Amazed",
    "|]": "Evil grin",
    "|^": "Confused",
    "|#": "Hashtag",
    "|D": "Big smile",
    "|@": "Angry face",
    "|*": "Kiss",
    "|P": "Sticking out tongue",
    "|(": "Sad face",
    "|s": "Confused",
    "|x": "Love-struck",
    "|-:": "Disappointed",
    "|-|": "Sleeping",
    "|-D": "Big grin",
    "|-o": "Yawning",
    "|-p": "Yuck",
    "|->": "Disappointed",
    "|-P": "Tongue sticking out",
    "|-$": "Embarrassed",
    "|:@": "Angry face",
    "|:<": "Sad face",
    "|B": "Nerd or geek",
    "|O": "Surprised",
    "|o": "Surprised",
    "|=D": "Big smile",
    "|=]": "Smiley or happy",
    "|=(": "Sad",
    "|='(": "Crying",
    "|o)": "Clown",
    "|@)": "Pig",
    "|<)": "Party",
    "|=(": "Crying",
    "|>)": "Evil grin",
    "|&": "Embarrassed or blushing",
    "|p": "Sticking out tongue",
    "|C": "Unhappy",
    "|.)": "Blushing",
    "|Dx": "Biting lip",
    "|C": "Unhappy",
    "|O": "Surprised",
    "|o": "Surprised",
    "|-*": "Kiss",
    "|-?": "Skeptical",
    "|-@": "Screaming or yelling",
    "|O": "Amazed",
    "|o": "Amazed",
    "|;)": "Flirting",
    "|<3": "Heart",
    "|{": "Smirking",
    "|*(": "Sick",
    "|?)": "Confused",
    "|;-*": "Wink and kiss",
    "|;D": "Wink and big grin",
    "|;p": "Wink and sticking out tongue",
    "|;]": "Wink and evil grin",
    "|;{": "Wink and smirk",
    "|;^)": "Wink and smirk",
    "|=]": "Happy",
    "|:(": "Unhappy",
    "|:)": "Happy",
    "|:|": "Neutral",
    "|:{": "Smirking",
    "|:}": "Smirking",
    "|:$": "Embarrassed",
    "|:o": "Surprised",
    "|:O": "Surprised",
    "|:P": "Sticking out tongue",
    "|;(": "Crying",
    "|:-#": "Angry",
    "|:-&": "Sick",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|:-*": "Kiss",
    "|:-/": "Confused",
    "|:-?": "Confused",
    "|:-@": "Screaming or yelling",
    "|:-C": "Unhappy",
    "|:-D": "Big smile",
    "|:-O": "Surprised",
    "|:-P": "Tongue sticking out",
    "|:-S": "Confused or sad",
    "|:-|": "Neutral",
    "|:->": "Grinning",
    "|:-[": "Sad",
    "|:-{": "Smirking",
    "|:-}": "Smirking",
    "|:-|": "Indifferent",
    "|:=": "Smirking",
    "|:~": "Nasal congestion or cold",
    "|;=": "Cool or winking",
    "|<": "Sad",
    "|<(": "Crying",
    "|<)": "Party",
    "|<?>": "Confused",
    "|<!>": "Angry",
    "|<3": "Heart",
    "|='(": "Crying",
    "|>": "Happy",
    "|>()": "Evil grin",
    "|>:-)": "Evil grin",
    "|>:-D": "Evil grin",
    "|>=(": "Angry",
    "|>=)": "Happy",
    "|>=D": "Evil grin",
    "|>_<": "Angry",
    "|?": "Confused",
    "|??": "Confused",
    "|?>": "Confused",
    "|?|": "Confused",
    "|@": "Angry face",
    "|@)": "Pig",
    "|@|": "Angry face",
    "|B": "Nerd or geek",
    "|C": "Unhappy",
    "|C:": "Unhappy",
    "|C;": "Unhappy",
    "|D": "Big smile",
    "|D:": "Big grin",
    "|D;": "Big grin",
    "|O": "Surprised",
    "|o": "Surprised",
    "|P": "Sticking out tongue",
    "|o": "Surprised",
    "|p": "Sticking out tongue",
    "|o": "Surprised",
    "|q": "Sad",
    "|r": "Confused",
    "|s": "Confused",
    "|t": "Indifferent",
    "|u": "Unhappy",
    "|v": "Shouting",
    "|w": "Cool",
    "|x": "Love-struck",
    "|y": "Cool",
    "|z": "Sleepy",
    "|\-D": "Big grin",
    "|\-o": "Yawning",
    "|\-p": "Yuck",
    "|\->": "Disappointed",
    "|\-P": "Tongue sticking out",
    "|\-$": "Embarrassed",
    "|\-@": "Screaming or yelling",
    "|\O": "Amazed",
    "|\o": "Amazed",
    "|\;)": "Flirting",
    "|\|": "Neutral",
    "|\|D": "Smiling",
    "|\|P": "Sticking out tongue",
    "|\|S": "Confused or sad",
    "|\|]": "Evil grin",
    "|\|^": "Confused",
    "|\/|": "Pacman",
    "|\/3": "Disapproval or confused",
    "|\/V": "Pacman",
    "|\/v": "Pacman",
    "|\/|": "Pacman",
    "|O": "Surprised",
    "|o": "Surprised",
    "|O": "Amazed",
    "|o": "Amazed",
    "|]": "Evil grin",
    "|^": "Confused",
    "|#": "Hashtag",
    "|D": "Big smile",
    "|@": "Angry face",
    "|*": "Kiss",
    "|P": "Sticking out tongue",
    "|(": "Sad face",
    "|s": "Confused",
    "|x": "Love-struck",
    "|-:": "Disappointed",
    "|-|": "Sleeping",
    "|-D": "Big grin",
    "|-o": "Yawning",
    "|-p": "Yuck",
    "|->": "Disappointed",
    "|-P": "Tongue sticking out",
    "|-$": "Embarrassed",
    "|:@": "Angry face",
    "|:<": "Sad face",
    "|B": "Nerd or geek",
    "|O": "Surprised",
    "|o": "Surprised",
    "|=D": "Big smile",
    "|=]": "Smiley or happy",
    "|=(": "Sad",
    "|='(": "Crying",
    "|o)": "Clown",
    "|@)": "Pig",
    "|<)": "Party",
    "|=(": "Crying",
    "|>)": "Evil grin",
    "|&": "Embarrassed or blushing",
    "|p": "Sticking out tongue",
    "|C": "Unhappy",
    "|.)": "Blushing",
    "|Dx": "Biting lip",
    "|C": "Unhappy",
    "|O": "Surprised",
    "|o": "Surprised",
    "|-*": "Kiss",
    "|-?": "Skeptical",
    "|-@": "Screaming or yelling",
    "|O": "Amazed",
    "|o": "Amazed",
    "|;)": "Flirting",
    "|<3": "Heart",
    "|{": "Smirking",
    "|*(": "Sick",
    "|?)": "Confused",
    "|;-*": "Wink and kiss",
    "|;D": "Wink and big grin",
    "|;p": "Wink and sticking out tongue",
    "|;]": "Wink and evil grin",
    "|;{": "Wink and smirk",
    "|;^)": "Wink and smirk",
    "|=]": "Happy",
    "|:(": "Unhappy",
    "|:)": "Happy",
    "|:|": "Neutral",
    "|:{": "Smirking",
    "|:}": "Smirking",
    "|:$": "Embarrassed",
    "|:o": "Surprised",
    "|:O": "Surprised",
    "|:P": "Sticking out tongue",
    "|;(": "Crying",
    "|:-#": "Angry",
    "|:-&": "Sick",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|:-*": "Kiss",
    "|:-/": "Confused",
    "|:-?": "Confused",
    "|:-@": "Screaming or yelling",
    "|:-C": "Unhappy",
    "|:-D": "Big smile",
    "|:-O": "Surprised",
    "|:-P": "Tongue sticking out",
    "|:-S": "Confused or sad",
    "|:-|": "Neutral",
    "|:->": "Grinning",
    "|:-[": "Sad",
    "|:-{": "Smirking",
    "|:-}": "Smirking",
    "|:-|": "Indifferent",
    "|:=": "Smirking",
    "|:~": "Nasal congestion or cold",
    "|;=": "Cool or winking",
    "|<": "Sad",
    "|<(": "Crying",
    "|<)": "Party",
    "|<?>": "Confused",
    "|<!>": "Angry",
    "|<3": "Heart",
    "|='(": "Crying",
    "|>": "Happy",
    "|>()": "Evil grin",
    "|>:-)": "Evil grin",
    "|>:-D": "Evil grin",
    "|>=(": "Angry",
    "|>=)": "Happy",
    "|>=D": "Evil grin",
    "|>_<": "Angry",
    "|?": "Confused",
    "|??": "Confused",
    "|?>": "Confused",
    "|?|": "Confused",
    "|@": "Angry face",
    "|@)": "Pig",
    "|@|": "Angry face",
    "|B": "Nerd or geek",
    "|C": "Unhappy",
    "|C:": "Unhappy",
    "|C;": "Unhappy",
    "|D": "Big smile",
    "|D:": "Big grin",
    "|D;": "Big grin",
    "|O": "Surprised",
    "|o": "Surprised",
    "|P": "Sticking out tongue",
    "|o": "Surprised",
    "|p": "Sticking out tongue",
    "|o": "Surprised",
    "|q": "Sad",
    "|r": "Confused",
    "|s": "Confused",
    "|t": "Indifferent",
    "|u": "Unhappy",
    "|v": "Shouting",
    "|w": "Cool",
    "|x": "Love-struck",
    "|y": "Cool",
    "|z": "Sleepy",
    "|\-D": "Big grin",
    "|\-o": "Yawning",
    "|\-p": "Yuck",
    "|\->": "Disappointed",
    "|\-P": "Tongue sticking out",
    "|\-$": "Embarrassed",
    "|\-@": "Screaming or yelling",
    "|\O": "Amazed",
    "|\o": "Amazed",
    "|\;)": "Flirting",
    "|\|": "Neutral",
    "|\|D": "Smiling",
    "|\|P": "Sticking out tongue",
    "|\|S": "Confused or sad",
    "|\|]": "Evil grin",
    "|\|^": "Confused",
    "|\/|": "Pacman",
    "|\/3": "Disapproval or confused",
    "|\/V": "Pacman",
    "|\/v": "Pacman",
    "|\/|": "Pacman",
    "|O": "Surprised",
    "|o": "Surprised",
    "|O": "Amazed",
    "|o": "Amazed",
    "|]": "Evil grin",
    "|^": "Confused",
    "|#": "Hashtag",
    "|D": "Big smile",
    "|@": "Angry face",
    "|*": "Kiss",
    "|P": "Sticking out tongue",
    "|(": "Sad face",
    "|s": "Confused",
    "|x": "Love-struck",
    "|-:": "Disappointed",
    "|-|": "Sleeping",
    "|-D": "Big grin",
    "|-o": "Yawning",
    "|-p": "Yuck",
    "|->": "Disappointed",
    "|-P": "Tongue sticking out",
    "|-$": "Embarrassed",
    "|:@": "Angry face",
    "|:<": "Sad face",
    "|B": "Nerd or geek",
    "|O": "Surprised",
    "|o": "Surprised",
    "|=D": "Big smile",
    "|=]": "Smiley or happy",
    "|=(": "Sad",
    "|='(": "Crying",
    "|o)": "Clown",
    "|@)": "Pig",
    "|<)": "Party",
    "|=(": "Crying",
    "|>)": "Evil grin",
    "|&": "Embarrassed or blushing",
    "|p": "Sticking out tongue",
    "|C": "Unhappy",
    "|.)": "Blushing",
    "|Dx": "Biting lip",
    "|C": "Unhappy",
    "|O": "Surprised",
    "|o": "Surprised",
    "|-*": "Kiss",
    "|-?": "Skeptical",
    "|-@": "Screaming or yelling",
    "|O": "Amazed",
    "|o": "Amazed",
    "|;)": "Flirting",
    "|<3": "Heart",
    "|{": "Smirking",
    "|*(": "Sick",
    "|?)": "Confused",
    "|;-*": "Wink and kiss",
    "|;D": "Wink and big grin",
    "|;p": "Wink and sticking out tongue",
    "|;]": "Wink and evil grin",
    "|;{": "Wink and smirk",
    "|;^)": "Wink and smirk",
    "|=]": "Happy",
    "|:(": "Unhappy",
    "|:)": "Happy",
    "|:|": "Neutral",
    "|:{": "Smirking",
    "|:}": "Smirking",
    "|:$": "Embarrassed",
    "|:o": "Surprised",
    "|:O": "Surprised",
    "|:P": "Sticking out tongue",
    "|;(": "Crying",
    "|:-#": "Angry",
    "|:-&": "Sick",
    "|:-(": "Frowning",
    "|:-)": "Smiling",
    "|:-*": "Kiss",
    "|:-/": "Confused",
    "|:-?": "Confused",
    "|:-@": "Screaming or yelling",
    "|:-C": "Unhappy",
    "|:-D": "Big smile",
    "|:-O": "Surprised",
    "|:-P": "Tongue sticking out",
    "|:-S": "Confused or sad",
    "|:-|": "Neutral",
    "|:->": "Grinning",
    "|:-[": "Sad",
    "|:-{": "Smirking",
    "|:-}": "Smirking",
    "|:-|": "Indifferent",
    "|:=": "Smirking",
    "|:~": "Nasal congestion or cold",
    "|;=": "Cool or winking",
    "|<": "Sad",
    "|<(": "Crying",
    "|<)": "Party",
    "|<?>": "Confused",
    "|<!>": "Angry",
    "|<3": "Heart",
    "|='(": "Crying",
    "|>": "Happy",
    "|>()": "Evil grin",
    "|>:-)": "Evil grin",
    "|>:-D": "Evil grin",
    "|>=(": "Angry",
    "|>=)": "Happy",
    "|>=D": "Evil grin",
    "|>_<": "Angry",
    "|?": "Confused",
    "|??": "Confused",
    "|?>": "Confused",
    "|?|": "Confused",
    "|@": "Angry face",
    "|@)": "Pig",
    "|@|": "Angry face",
    "|B": "Nerd or geek",
    "|C": "Unhappy",
    "|C:": "Unhappy",
    "|C;": "Unhappy",
    "|D": "Big smile",
    "|D:": "Big grin",
    "|D;": "Big grin",
    "|O": "Surprised",
    "|o": "Surprised",
    "|P": "Sticking out tongue",
    "|o": "Surprised",
    "|p": "Sticking out tongue",
    "|o": "Surprised",
    "|q": "Sad",
    "|r": "Confused",
    "|s": "Confused",
    "|t": "Indifferent",
    "|u": "Unhappy",
    "|v": "Shouting"
}

In [41]:
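def remove_emoticons(text):
    # Match longer emoticons first so a short one cannot cut a longer one in half.
    keys = sorted(EMOTICONS, key=len, reverse=True)
    emoticon_pattern = re.compile("|".join(re.escape(k) for k in keys))
    return emoticon_pattern.sub("", text)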

remove_emoticons("Hello :-)")

'Hello '
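
One caveat with an alternation pattern like the one above: it matches emoticon characters wherever they occur, including inside URLs or words. The following is a minimal sketch of one way to restrict matches to emoticons that stand alone between whitespace. `SAMPLE_EMOTICONS`, `NAIVE_PATTERN`, `STANDALONE_PATTERN` and `remove_standalone_emoticons` are illustrative names, not part of this notebook, and the tiny dictionary is only a stand-in for the full `EMOTICONS` mapping.

```python
import re

# Hypothetical stand-in for the much larger EMOTICONS dictionary.
SAMPLE_EMOTICONS = {
    ":-)": "Happy face or smiley",
    ":/": "Skeptical or annoyed",
    ":(": "Sad",
}

# Longest keys first, so a longer emoticon is preferred over a shorter one.
_alternatives = "|".join(
    re.escape(k) for k in sorted(SAMPLE_EMOTICONS, key=len, reverse=True)
)

# Naive pattern: matches the emoticon characters anywhere in the text.
NAIVE_PATTERN = re.compile(_alternatives)

# Stricter pattern: the emoticon must not touch a non-whitespace character
# on either side, so the ":/" inside "https://..." is left alone.
STANDALONE_PATTERN = re.compile(r"(?<!\S)(?:" + _alternatives + r")(?!\S)")


def remove_standalone_emoticons(text):
    """Remove only emoticons that appear as whitespace-separated tokens."""
    return STANDALONE_PATTERN.sub("", text)


sample = "Docs at https://example.com :-)"
print(NAIVE_PATTERN.sub("", sample))        # the ":/" inside the URL is removed too
print(remove_standalone_emoticons(sample))  # only the trailing ":-)" is removed
```

The trade-off is that the stricter pattern skips emoticons glued to a word or to punctuation (for example `great:-)`), so whether it is the better choice depends on how the tweets are written and on whether links are removed earlier in the pipeline.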

# 10) Conversion of Emoticon to Words

* In the previous step, we removed the emoticons. For use cases such as sentiment analysis, however, emoticons carry valuable information, so removing them may not be a good solution. What can we do in such cases?


* One way is to convert the emoticons to words so that they can be used in downstream modeling.

In [44]:
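def convert_emoticons(text):
    # Replace each emoticon with its description, one dictionary entry at a time.
    for emot in EMOTICONS:
        # split() + join() collapses any repeated whitespace in the description.
        text = re.sub(re.escape(emot), " ".join(EMOTICONS[emot].split()), text)
    return text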

In [74]:
text = "Hello :-) ;P :("
convert_emoticons(text)

'Hello Happy face or smiley Tongue sticking out in jest Sad'

In [50]:
text = "I am sad :()"
convert_emoticons(text)

'I am sad Frown, sad, angry, or pouting)'
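
The loop in `convert_emoticons` runs one `re.sub` per dictionary entry, in dictionary order, so a short emoticon can be replaced before a longer one that contains it gets a chance to match. As a rough sketch of an alternative, the conversion can be done in a single scan with one compiled pattern and a replacement callback. `SAMPLE_EMOTICONS`, `SAMPLE_PATTERN` and `convert_emoticons_single_pass` are hypothetical names used only for illustration, and the small dictionary stands in for the full `EMOTICONS` mapping.

```python
import re

# Hypothetical stand-in for the notebook's EMOTICONS dictionary.
SAMPLE_EMOTICONS = {
    ":(": "Sad",
    ":-)": "Happy face or smiley",
    ":-))": "Very happy",
    ";P": "Tongue sticking out",
}

# One compiled alternation, longest emoticon first.
SAMPLE_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(SAMPLE_EMOTICONS, key=len, reverse=True))
)


def convert_emoticons_single_pass(text):
    """Replace every emoticon with its description in one scan of the text."""
    # The callback looks up whichever emoticon matched. Returning a plain
    # string from a callback also sidesteps backslash processing that applies
    # to replacement strings passed directly to re.sub.
    return SAMPLE_PATTERN.sub(lambda m: SAMPLE_EMOTICONS[m.group(0)], text)


print(convert_emoticons_single_pass("Hello :-)) ;P :("))
# Hello Very happy Tongue sticking out Sad
```

Because the text is scanned only once, descriptions that have already been inserted are never re-examined by later substitutions, and the cost no longer grows with one full pass over the text per dictionary entry.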

# 11) Conversion of Emojis to Words

In [79]:
EMO_UNICODE = {
    ":1st_place_medal:": "\U0001F947",
    ":2nd_place_medal:": "\U0001F949",
    ":3rd_place_medal:": "\U0001F949",
    ":AB_button_(blood_type):": "\U0001F18E",
    ":smiley:": "\U0001F604",
    ":sad:": "\U0001F61E",
    ":heart:": "\U00002764",
    ":thumbs_up:": "\U0001F44D",
    ":thumbs_down:": "\U0001F44E",
    ":laughing:": "\U0001F602",
    ":angry:": "\U0001F620",
    ":crying:": "\U0001F622",
    ":kiss:": "\U0001F48B",
    ":wink:": "\U0001F609",
    ":confused:": "\U0001F615",
    ":cool:": "\U0001F60E",
    ":sunglasses:": "\U0001F60E",
    ":sleeping:": "\U0001F634",
    ":grinning:": "\U0001F600",
    ":grimacing:": "\U0001F62C",
    ":sweat_smile:": "\U0001F605",
    ":joy:": "\U0001F602",
    ":rofl:": "\U0001F923",
    ":relaxed:": "\U0000263A",
    ":blush:": "\U0001F60A",
    ":innocent:": "\U0001F607",
    ":kissing:": "\U0001F617",
    ":slight_smile:": "\U0001F642",
    ":hugging:": "\U0001F917",
    ":thinking:": "\U0001F914",
    ":neutral_face:": "\U0001F610",
    ":expressionless:": "\U0001F611",
    ":no_mouth:": "\U0001F636",
    ":rolling_eyes:": "\U0001F644",
    ":smirk:": "\U0001F60F",
    ":persevere:": "\U0001F623",
    ":disappointed:": "\U0001F61E",
    ":worried:": "\U0001F61F",
    ":frowning:": "\U0001F626",
    ":anguished:": "\U0001F627",
    ":cry:": "\U0001F622",
    ":sob:": "\U0001F62D",
    ":scream:": "\U0001F631",
    ":confounded:": "\U0001F616",
    ":tired_face:": "\U0001F62B",
    ":yawning:": "\U0001F971",
    ":triumph:": "\U0001F624",
    ":angry_face:": "\U0001F620",
    ":rage:": "\U0001F621",
    ":exploding_head:": "\U0001F92F",
    ":flushed:": "\U0001F633",
    ":hot_face:": "\U0001F975",
    ":cold_face:": "\U0001F976",
    ":scream_in_fear:": "\U0001F631",
    ":astonished:": "\U0001F632",
    ":flushed_face:": "\U0001F633",
    ":zany_face:": "\U0001F92A",
    ":grinning_face_with_sweat:": "\U0001F605",
    ":face_with_raised_eyebrow:": "\U0001F928",
    ":face_with_monocle:": "\U0001F9D0",
    ":nerd_face:": "\U0001F913",
    ":smiling_imp:": "\U0001F608",
    ":imp:": "\U0001F47F",
    ":japanese_ogre:": "\U0001F479",
    ":japanese_goblin:": "\U0001F47A",
    ":skull:": "\U0001F480",
    ":skull_and_crossbones:": "\U00002620",
    ":alien:": "\U0001F47D",
    ":space_invader:": "\U0001F47E",
    ":robot_face:": "\U0001F916",
    ":jack-o-lantern:": "\U0001F383",
    ":clown_face:": "\U0001F921",
    ":ghost:": "\U0001F47B",
    ":santa:": "\U0001F385",
    ":mrs_claus:": "\U0001F936",
    ":angel:": "\U0001F47C",
    ":pregnant_woman:": "\U0001F930",
    ":elf:": "\U0001F9DD",
    ":prince:": "\U0001F934",
    ":princess:": "\U0001F478",
    ":superhero:": "\U0001F9B8",
    ":supervillain:": "\U0001F9B9",
    ":mage:": "\U0001F9D9",
    ":fairy:": "\U0001F9DA",
    ":vampire:": "\U0001F9DB",
    ":merperson:": "\U0001F9DC",
    ":merman:": "\U0001F9DC\u200D\u2642",
    ":mermaid:": "\U0001F9DC\u200D\u2640",
    ":elf:": "\U0001F9DD",
    ":genie:": "\U0001F9DE",
    ":zombie:": "\U0001F9DF",
    ":brain:": "\U0001F9E0",
    ":orange_heart:": "\U0001F9E1",
    ":billed_cap:": "\U0001F9E2",
    ":scarf:": "\U0001F9E3",
    ":gloves:": "\U0001F9E4",
    ":coat:": "\U0001F9E5",
    ":socks:": "\U0001F9E6",
    ":red_envelope:": "\U0001F9E7",
    ":firecracker:": "\U0001F9E8",
    ":jigsaw:": "\U0001F9E9",
    ":test_tube:": "\U0001F9EA",
    ":petri_dish:": "\U0001F9EB",
    ":dna:": "\U0001F9EC",
    ":compass:": "\U0001F9ED",
    ":abacus:": "\U0001F9EE",
    ":fire_extinguisher:": "\U0001F9EF",
    ":toolbox:": "\U0001F9F0",
    ":bricks:": "\U0001F9F1",
    ":magnet:": "\U0001F9F2",
    ":luggage:": "\U0001F9F3",
    ":lotion_bottle:": "\U0001F9F4",
    ":thread:": "\U0001F9F5",
    ":yarn:": "\U0001F9F6",
    ":safety_pin:": "\U0001F9F7",
    ":teddy_bear:": "\U0001F9F8",
    ":broom:": "\U0001F9F9",
    ":basket:": "\U0001F9FA",
    ":roll_of_paper:": "\U0001F9FB",
    ":soap:": "\U0001F9FC",
    ":sponge:": "\U0001F9FD",
    ":receipt:": "\U0001F9FE",
    ":nazar_amulet:": "\U0001F9FF",
    ":barber_pole:": "\U0001F488",
    ":stopwatch:": "\U000023F1",
    ":timer_clock:": "\U000023F2",
    ":alarm_clock:": "\U000023F0",
    ":mantelpiece_clock:": "\U0001F570",
    ":twelve_o’clock:": "\U0001F55B",
    ":twelve-thirty:": "\U0001F567",
    ":one_o’clock:": "\U0001F550",
    ":one-thirty:": "\U0001F55C",
    ":two_o’clock:": "\U0001F551",
    ":two-thirty:": "\U0001F55D",
    ":three_o’clock:": "\U0001F552",
    ":three-thirty:": "\U0001F55E",
    ":four_o’clock:": "\U0001F553",
    ":four-thirty:": "\U0001F55F",
    ":five_o’clock:": "\U0001F554",
    ":five-thirty:": "\U0001F560",
    ":six_o’clock:": "\U0001F555",
    ":six-thirty:": "\U0001F561",
    ":seven_o’clock:": "\U0001F556",
    ":seven-thirty:": "\U0001F562",
    ":eight_o’clock:": "\U0001F557",
    ":eight-thirty:": "\U0001F563",
    ":nine_o’clock:": "\U0001F558",
    ":nine-thirty:": "\U0001F564",
    ":ten_o’clock:": "\U0001F559",
    ":ten-thirty:": "\U0001F565",
    ":eleven_o’clock:": "\U0001F55A",
    ":eleven-thirty:": "\U0001F566",
    ":new_moon:": "\U0001F311",
    ":waxing_crescent_moon:": "\U0001F312",
    ":first_quarter_moon:": "\U0001F313",
    ":waxing_gibbous_moon:": "\U0001F314",
    ":full_moon:": "\U0001F315",
    ":waning_gibbous_moon:": "\U0001F316",
    ":last_quarter_moon:": "\U0001F317",
    ":waning_crescent_moon:": "\U0001F318",
    ":crescent_moon:": "\U0001F319",
    ":new_moon_face:": "\U0001F31A",
    ":first_quarter_moon_face:": "\U0001F31B",
    ":last_quarter_moon_face:": "\U0001F31C",
    ":thermometer:": "\U0001F321",
    ":sun:": "\U00002600",
    ":full_moon_face:": "\U0001F31D",
    ":sun_with_face:": "\U0001F31E",
    ":ringed_planet:": "\U0001FA90",
    ":star:": "\U00002B50",
    ":glowing_star:": "\U0001F31F",
    ":shooting_star:": "\U0001F320",
    ":milky_way:": "\U0001F30C",
    ":cloud:": "\U00002601",
    ":sun_behind_cloud:": "\U0001F325",
    ":cloud_with_lightning_and_rain:": "\U000026C8",
    ":sun_behind_small_cloud:": "\U0001F324",
    ":sun_behind_large_cloud:": "\U0001F325",
    ":sun_behind_rain_cloud:": "\U0001F326",
    ":cloud_with_rain:": "\U0001F327",
    ":cloud_with_snow:": "\U0001F328",
    ":cloud_with_lightning:": "\U0001F329",
    ":tornado:": "\U0001F32A",
    ":fog:": "\U0001F32B",
    ":wind_face:": "\U0001F32C",
    ":cyclone:": "\U0001F300",
    ":rainbow:": "\U0001F308",
    ":closed_umbrella:": "\U0001F302",
    ":umbrella:": "\U00002602",
    ":umbrella_with_rain_drops:": "\U00002614",
    ":umbrella_on_ground:": "\U000026F1",
    ":high_voltage:": "\U000026A1",
    ":snowflake:": "\U00002744",
    ":snowman:": "\U000026C4",
    ":snowman_without_snow:": "\U000026C4\u200D\u02603",
    ":comet:": "\U00002604",
    ":fire:": "\U0001F525",
    ":droplet:": "\U0001F4A7",
    ":water_wave:": "\U0001F30A",
    ":jack_o_lantern:": "\U0001F383",
    ":christmas_tree:": "\U0001F384",
    ":fireworks:": "\U0001F386",
    ":sparkler:": "\U0001F387",
    ":firecracker:": "\U0001F9E8",
    ":sparkles:": "\U00002728",
    ":balloon:": "\U0001F388",
    ":party_popper:": "\U0001F389",
    ":confetti_ball:": "\U0001F38A",
    ":tanabata_tree:": "\U0001F38B",
    ":bamboo:": "\U0001F38D",
    ":dolls:": "\U0001F38E",
    ":flags:": "\U0001F38F",
    ":wind_chime:": "\U0001F390",
    ":rice_scene:": "\U0001F391",
    ":red_envelope:": "\U0001F9E7",
    ":ribbon:": "\U0001F380",
    ":wrapped_gift:": "\U0001F381",
    ":reminder_ribbon:": "\U0001F397",
    ":ticket:": "\U0001F3AB",
    ":admission_tickets:": "\U0001F39F",
    ":ticket:": "\U0001F3AB",
    ":military_medal:": "\U0001F396",
    ":trophy:": "\U0001F3C6",
    ":sports_medal:": "\U0001F3C5",
    ":medal:": "\U0001F947",
    ":first_place_medal:": "\U0001F947",
    ":second_place_medal:": "\U0001F948",
    ":third_place_medal:": "\U0001F949",
    ":soccer_ball:": "\U000026BD",
    ":baseball:": "\U000026BE",
    ":softball:": "\U0001F94E",
    ":basketball:": "\U0001F3C0",
    ":volleyball:": "\U0001F3D0",
    ":american_football:": "\U0001F3C8",
    ":rugby_football:": "\U0001F3C9",
    ":tennis:": "\U0001F3BE",
    ":flying_disc:": "\U0001F94F",
    ":bowling:": "\U0001F3B3",
    ":cricket_game:": "\U0001F3CF",
    ":field_hockey:": "\U0001F3D1",
    ":ice_hockey:": "\U0001F3D2",
    ":lacrosse:": "\U0001F94D",
    ":ping_pong:": "\U0001F3D3",
    ":badminton:": "\U0001F3F8",
    ":boxing_glove:": "\U0001F94A",
    ":martial_arts_uniform:": "\U0001F94B",
    ":goal_net:": "\U0001F945",
    ":dart:": "\U0001F3AF",
    ":kite:": "\U0001FA81",
    ":yo-yo:": "\U0001FA80",
    ":bow_and_arrow:": "\U0001F3F9",
    ":amusement_park:": "\U0001F3A1",
    ":carousel_horse:": "\U0001F3A0",
    ":ferris_wheel:": "\U0001F3A1",
    ":roller_coaster:": "\U0001F3A2",
    ":fishing_pole:": "\U0001F3A3",
    ":diving_mask:": "\U0001F93F",
    ":running_shirt:": "\U0001F3BD",
    ":climbing:": "\U0001F9D7",
    ":boxing_gloves:": "\U0001F94A",
    ":martial_arts_uniform:": "\U0001F94B",
    ":ice_skate:": "\U000026F8",
    ":sled:": "\U0001F6F7",
    ":curling_stone:": "\U0001F94C",
    ":bullseye:": "\U0001F3AF",
    ":boomerang:": "\U0001FA83",
    ":lacrosse_stick_and_ball:": "\U0001F94D",
    ":ballet_shoes:": "\U0001FA70",
    ":ice_skate:": "\U000026F8",
    ":skateboard:": "\U0001F6F9",
    ":bicycle:": "\U0001F6B2",
    ":scooter:": "\U0001F6F4",
    ":motor_scooter:": "\U0001F6F5",
    ":auto_rickshaw:": "\U0001F6FA",
    ":taxi:": "\U0001F695",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bike:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus_stop:": "\U0001F68F",
    ":motorway:": "\U0001F6E3",
    ":railway_track:": "\U0001F6E4",
    ":fuel_pump:": "\U000026FD",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":manual_wheelchair:": "\U0001F9BD",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
    ":kick_scooter:": "\U0001F6F4",
    ":skateboard:": "\U0001F6F9",
    ":roller_skate:": "\U0001F6FC",
    ":bus:": "\U0001F68C",
    ":oncoming_bus:": "\U0001F68D",
    ":trolleybus:": "\U0001F68E",
    ":minibus:": "\U0001F690",
    ":ambulance:": "\U0001F691",
    ":fire_engine:": "\U0001F692",
    ":police_car:": "\U0001F693",
    ":oncoming_police_car:": "\U0001F694",
    ":taxi:": "\U0001F695",
    ":oncoming_taxi:": "\U0001F696",
    ":automobile:": "\U0001F697",
    ":oncoming_automobile:": "\U0001F698",
    ":blue_car:": "\U0001F699",
    ":pickup_truck:": "\U0001F6FB",
    ":truck:": "\U0001F69A",
    ":articulated_lorry:": "\U0001F69B",
    ":tractor:": "\U0001F69C",
    ":racing_car:": "\U0001F3CE",
    ":racing_motorcycle:": "\U0001F3CD",
    ":motorcycle:": "\U0001F3CD",
    ":motor_scooter:": "\U0001F6F5",
    ":manual_wheelchair:": "\U0001F9BD",
    ":motorized_wheelchair:": "\U0001F9BC",
    ":auto_rickshaw:": "\U0001F6FA",
    ":bicycle:": "\U0001F6B2",
}

EMO_UNICODE = {v: k for k, v in EMO_UNICODE.items()}

In [83]:
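def convert_emojis(text):
    # Replace each emoji (or run of the same emoji) with its ":name:" form
    for emot in EMO_UNICODE:
        # re.escape guards against emoji that contain regex metacharacters
        pattern = r'(?:{})+'.format(re.escape(emot))
        text = re.sub(pattern, ' '.join(EMO_UNICODE[emot].replace(', ', '').split()), text)
    return text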

text = "game is on 🔥"

convert_emojis(text)

'game is on :fire:'

In [85]:
text = "Hilarious  😊"

convert_emojis(text)

'Hilarious  :blush:'
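Looping over every emoji and calling `re.sub` each time rescans the text once per dictionary entry. A minimal sketch of a single-pass alternative is shown here, assuming a small stand-in mapping (`EMOJI_TO_NAME` and `convert_emojis_once` are illustrative names, not part of the notebook):

```python
import re

# Tiny illustrative emoji -> name mapping (assumption: stands in for the full inverted dictionary)
EMOJI_TO_NAME = {
    "\U0001F525": ":fire:",
    "\U0001F60A": ":blush:",
}

# One alternation pattern, compiled once; \1* swallows immediate repeats of the same emoji
EMOJI_PATTERN = re.compile(
    "(" + "|".join(re.escape(e) for e in EMOJI_TO_NAME) + r")\1*"
)

def convert_emojis_once(text):
    # Look up the matched emoji and substitute its name in a single pass over the text
    return EMOJI_PATTERN.sub(lambda m: EMOJI_TO_NAME[m.group(1)], text)

print(convert_emojis_once("game is on \U0001F525"))
```

With a dictionary of several thousand emoji, compiling the pattern once is likely to be noticeably faster than one substitution per entry, though this has not been measured here.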

# 12) Removal of URLs

The next preprocessing step is to remove any URLs present in the data. For example, in a Twitter analysis there is a good chance that a tweet contains a URL, and we will probably want to remove it before further analysis.

In [93]:
def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    return url_pattern.sub(r'', text)

**Let us take an https link and check the code.**

In [94]:
text = "Driverless AI NLP blog post on https://www.h2o.ai/blog/detecting-sarcasm-is-difficult-but-ai-may-have-an-answer/"

remove_urls(text)

'Driverless AI NLP blog post on '

**Now let us take an http URL and check the code.**

In [95]:
text = "Please refer to link http://lnkd.in/ecntSyc for the paper."

remove_urls(text)

'Please refer to link  for the paper.'

In [96]:
text = "want to know more. Checkout www.h20.ai for additional Information."
remove_urls(text)

'want to know more. Checkout  for additional Information.'
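Removing a URL from the middle of a sentence leaves a double space behind, as the outputs above show. The sketch below reuses the same pattern and then collapses the leftover whitespace. The name `remove_urls_and_tidy` is an assumption for illustration. Note that the pattern only matches links starting with `http://`, `https://`, or `www.`, so scheme-less links such as `youtu.be/...` are left untouched.

```python
import re

# Same pattern as remove_urls above
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_urls_and_tidy(text):
    # Drop the URLs first, then collapse the runs of whitespace they leave behind
    without_urls = URL_PATTERN.sub('', text)
    return re.sub(r'\s+', ' ', without_urls).strip()

print(remove_urls_and_tidy("Please refer to link http://lnkd.in/ecntSyc for the paper."))
```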

# 13) Removal of HTML Tags

* Another common preprocessing technique that comes in handy in many places is the removal of HTML tags. It is especially useful when we scrape data from websites, where we may end up with HTML strings as part of our text.


* First, let us try to remove the HTML tags using regular expressions.

### (i) Remove HTML using Regex 

In [102]:
def remove_html(text):
    html_pattern = re.compile(r'<.*?>')
    return html_pattern.sub('', text)

text = """<div>
<h1> H2O</h1>
<p> AutoML</p>
<a href="https://www.h2o.ai/products/h20-driverless-ai/"> DriverlessAI</a>
</div>"""

remove_html(text)
print(remove_html(text))


 H2O
 AutoML
 DriverlessAI
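The regex above strips tags but leaves HTML entities such as `&amp;` in the text. A minimal sketch that also decodes entities with the standard library's `html.unescape` follows; the function name `remove_html_and_unescape` is an assumption for illustration.

```python
import html
import re

TAG_PATTERN = re.compile(r'<.*?>')

def remove_html_and_unescape(text):
    # Strip tags first, then decode entities, so that an escaped "&lt;b&gt;"
    # is kept as literal text instead of being removed as a tag
    no_tags = TAG_PATTERN.sub('', text)
    return html.unescape(no_tags)

print(remove_html_and_unescape("<p>Q&amp;A &lt;live&gt;</p>"))
```

The order matters: decoding entities before stripping tags would turn `&lt;live&gt;` into `<live>` and the regex would then delete it.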



**We can also use the BeautifulSoup package to extract the text from an HTML document in a more elegant way.**

### (ii) Remove HTML using Beautiful Soup Library

In [104]:
from bs4 import BeautifulSoup

def remove_html(text):
    soup = BeautifulSoup(text, 'html.parser')
    return soup.get_text()

text = """<div>
<h1> H2O</h1>
<p> AutoML</p>
<a href="https://www.h2o.ai/products/h20-driverless-ai/"> DriverlessAI</a>
</div>"""

print(remove_html(text))


 H2O
 AutoML
 DriverlessAI



# 13) Chat Words Conversion

This is an important preprocessing step when we are dealing with chat data. People use many abbreviations in chat, so expanding them can help our analysis.

In [105]:
chat_words_str = """
AFAIK=As Far As I Know
AFK=Away From Keyboard
ASAP=As Soon As Possible
ATK=At The Keyboard
BRB=Be Right Back
BTW=By The Way
BFF=Best Friends Forever
DM=Direct Message
FB=Facebook
FYI=For Your Information
GR8=Great
IDK=I Don't Know
IMO=In My Opinion
IMHO=In My Humble Opinion
IRL=In Real Life
JK=Just Kidding
LOL=Laughing Out Loud
LMK=Let Me Know
LMAO=Laughing My Ass Off
NVM=Never Mind
OMG=Oh My God
OMW=On My Way
ROFL=Rolling On the Floor Laughing
TMI=Too Much Information
TTYL=Talk To You Later
WB=Welcome Back
"""

In [106]:
def chat_words_conversion(text):
    chat_words_map = dict()
    chat_words_list = []
    for line in chat_words_str.split("\n"):
        if line == "":
            continue
        cw, cw_expanded = line.split("=")
        chat_words_map[cw.strip()] = cw_expanded.strip()
        chat_words_list.append(cw.strip())

    chat_words_list = set(chat_words_list)

    new_text = []
    for w in text.split():
        if w.upper() in chat_words_list:
            new_text.append(chat_words_map[w.upper()])
        else:
            new_text.append(w)

    return ' '.join(new_text)

In [107]:
chat_words_conversion("one minute BRB")

'one minute Be Right Back'

In [108]:
chat_words_conversion("imo this is awesome")

'In My Opinion this is awesome'
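Splitting on whitespace means an abbreviation with attached punctuation, such as `brb!`, is not recognized. The sketch below is one way to handle that with a word-boundary regex built once, assuming a small illustrative subset of the abbreviation list (`CHAT_WORDS` and `expand_chat_words` are hypothetical names):

```python
import re

# Small illustrative subset of the abbreviation list above
CHAT_WORDS = {
    "BRB": "Be Right Back",
    "IMO": "In My Opinion",
    "ASAP": "As Soon As Possible",
}

# Compiled once; \b lets "brb!" match while leaving longer words like "brbx" alone
CHAT_PATTERN = re.compile(
    r'\b(' + '|'.join(re.escape(w) for w in CHAT_WORDS) + r')\b',
    flags=re.IGNORECASE,
)

def expand_chat_words(text):
    # Upper-case the match so the lookup is case-insensitive
    return CHAT_PATTERN.sub(lambda m: CHAT_WORDS[m.group(1).upper()], text)

print(expand_chat_words("one minute, brb!"))
```

Unlike the split-and-join version, this keeps the original spacing and punctuation of the text.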

# 14) Spelling Correction

Another important text preprocessing step is spelling correction. Typos are common in text data, and we might want to correct them before we do our analysis.

In [110]:
# !pip install pyspellchecker

In [111]:
from spellchecker import SpellChecker

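def correct_spellings(text):
    spell = SpellChecker()
    corrected_text = []
    words = text.split()
    # Words that are not in the checker's dictionary
    misspelled_words = spell.unknown(words)
    for word in words:
        if word in misspelled_words:
            # In recent pyspellchecker versions correction() can return None
            # when no candidate is found; keep the original word in that case
            corrected_text.append(spell.correction(word) or word)
        else:
            corrected_text.append(word)

    return " ".join(corrected_text)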

In [112]:
text = "speling correctin"
correct_spellings(text)

'spelling correcting'

In [113]:
text = "thinks for readin the noteboo"
correct_spellings(text)

'thinks for reading the notebook'

**The spell checker replaces each unknown word with the candidate it considers most likely, which may not be the word the author intended. For example, 'Idia' could become 'India' or 'Idea', depending on which candidate the checker ranks higher.**
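To make the ambiguity concrete, the sketch below ranks candidate corrections by string similarity using the standard library's `difflib`. This is a stand-in for illustration only: pyspellchecker uses its own ranking, and `VOCABULARY` and `rank_candidates` are hypothetical names.

```python
import difflib

# Hypothetical mini-vocabulary; a real checker would use a full dictionary
VOCABULARY = ["india", "idea", "notebook", "spelling"]

def rank_candidates(word, vocabulary=VOCABULARY, limit=3):
    # Return the closest vocabulary words, most similar first
    return difflib.get_close_matches(word.lower(), vocabulary, n=limit, cutoff=0.6)

print(rank_candidates("Idia"))
```

Both 'india' and 'idea' come back as plausible candidates, so whichever one the ranking places first becomes the "correction".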