# Introduction to Natural Language Processing

## Lesson 1. Text preprocessing

**Notes**:
- the dictionaries `apostrophe_dict`, `short_word_dict`, and `emoticon_dict` live in hw01_helper.py
- the CSV files live in the "../data" directory
- text processing uses pandas.Series.str wherever possible, as the more vectorized option

In [1]:
import numpy as np
import pandas as pd
import nltk

In [2]:
pd.set_option('max_colwidth', 150)

### Helper dictionaries

In [3]:
from hw01_helper import apostrophe_dict as apostrophe_dict_lib, short_word_dict, emoticon_dict

In [4]:
# Values in apostrophe_dict list alternatives, for example: 'am not / are not'
# That is redundant for expanding contractions, so keep only the first alternative.
apostrophe_dict = {k: v.split('/')[0].strip() for k, v in apostrophe_dict_lib.items()}

### Data

In [5]:
TRAIN_PATH = "../data/train_tweets.csv"
TEST_PATH = "../data/test_tweets.csv"
PROCESSED_PATH = "../data/tweets.pkl.gz"

In [6]:
train_df = pd.read_csv(TRAIN_PATH)
test_df = pd.read_csv(TEST_PATH)
# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
combine_df = pd.concat([train_df, test_df], ignore_index=True, sort=False)
combine_df.shape

(49159, 3)

### Processing

In [7]:
df00 = combine_df.assign(text=lambda x: x.tweet)

---
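1. Remove @user mentions from all tweets. The assignment suggests the pattern "@[\w]*"; the code below uses r'@\w+', which additionally requires at least one word character after "@".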

In [8]:
df01 = df00.assign(text=lambda x: x.text.str.replace(r'@\w+', '', regex=True))
df01.head(5)

Unnamed: 0,id,label,tweet,text
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked
2,3,0.0,bihday your majesty,bihday your majesty
3,4,0.0,#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦,#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦
4,5,0.0,factsguide: society now #motivation,factsguide: society now #motivation
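
The difference between the two patterns can be seen on a small made-up tweet. This is a minimal sketch using only the standard `re` module; the sample string is an assumption for illustration and does not come from the dataset.

```python
import re

# Made-up sample: two mentions plus a bare "@" that is not a mention
tweet = "@user @user thanks for #lyft credit @ 5pm"

# Pattern from the assignment: "@" followed by zero or more word characters,
# so the bare "@" is removed as well
star = re.sub(r"@[\w]*", "", tweet)

# Pattern used in the notebook: "@" followed by at least one word character,
# so the bare "@" is left in place
plus = re.sub(r"@\w+", "", tweet)

print(repr(star))
print(repr(plus))
```

Both variants leave the spaces that surrounded the mentions. In this notebook those are normalized later, because `dictionary_replace` in step 3 rebuilds each tweet with `split()` and `' '.join(...)`.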


---
2. Convert the tweets to lowercase with .lower()

In [9]:
df02 = df01.assign(text=lambda x: x.text.str.lower())

---
3. Expand contractions with apostrophes (for example: ain't, can't) using apostrophe_dict.

In [10]:
# these contractions are also known as "short/contracted forms of the modal verbs"
def dictionary_replace(string, lookup):
    return ' '.join(lookup.get(word,word) for word in string.split())

In [11]:
df03 = df02.assign(text=lambda x: x.text.apply(dictionary_replace, lookup=apostrophe_dict))
df03.head(3)

Unnamed: 0,id,label,tweet,text
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for #lyft credit i cannot use cause they do not offer wheelchair vans in pdx. #disapointed #getthanked
2,3,0.0,bihday your majesty,bihday your majesty
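
To make the behavior of `dictionary_replace` concrete, here is a small sketch with a two-entry stand-in dictionary (hypothetical; the real `apostrophe_dict` comes from hw01_helper.py). It shows that only whole whitespace-separated tokens are looked up, and that runs of whitespace collapse as a side effect.

```python
# Hypothetical two-entry stand-in for apostrophe_dict from hw01_helper.py
lookup = {"can't": "cannot", "don't": "do not"}

def dictionary_replace(string, lookup):
    # Same logic as in the notebook: look up each whitespace-separated token
    return ' '.join(lookup.get(word, word) for word in string.split())

# A token that matches a key exactly is replaced
plain = dictionary_replace("i can't use it", lookup)

# A token with punctuation glued to it ("can't!") is not a key, so it stays as is
glued = dictionary_replace("no, i can't!", lookup)

# split() + join() also strips leading/trailing spaces and collapses repeated ones
spaced = dictionary_replace("  they   don't  offer ", lookup)

print(plain, glued, spaced, sep="\n")
```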


---
4. Replace abbreviations with their full forms using short_word_dict, reusing the dictionary_replace function from the previous step.

In [12]:
df04 = df03.assign(text=lambda x: x.text.apply(dictionary_replace, lookup=short_word_dict))
# This simple approach does not replace everything. For example, "omg!!!" is not replaced with "oh my god!!!"
df04.iloc[[94, 4426]]

Unnamed: 0,id,label,tweet,text
94,95,0.0,omg!!! loving this station!!! way to jam out at work!!! while getting work done of course!!!! #memories @user,omg!!! loving this station!!! way to jam out at work!!! while getting work done of course!!!! #memories
4426,4427,0.0,thanx ð @user #love #couple #cute #adorable #hugs #romance #forever #girlfriend #smile #beautiful,thanks ð #love #couple #cute #adorable #hugs #romance #forever #girlfriend #smile #beautiful
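
One possible way to also catch tokens such as "omg!!!" is to match dictionary keys as whole words with a regular expression instead of comparing whitespace-separated tokens. The sketch below is an alternative, not the approach used in this notebook; `short_words` is a hypothetical three-entry excerpt and `replace_short_words` is a name introduced here for illustration.

```python
import re

# Hypothetical excerpt; the real short_word_dict lives in hw01_helper.py
short_words = {"omg": "oh my god", "thanx": "thanks", "u": "you"}

# Alternation of all keys (longest first), allowed to match only when the key
# is not preceded or followed by another word character
keys = sorted(short_words, key=len, reverse=True)
pattern = re.compile(r"(?<!\w)(" + "|".join(map(re.escape, keys)) + r")(?!\w)")

def replace_short_words(text):
    # Replace each matched key with its full form, leaving punctuation untouched
    return pattern.sub(lambda m: short_words[m.group(1)], text)

print(replace_short_words("omg!!! loving this station"))
print(replace_short_words("thanx u, see the menu in ur app"))
```

The lookarounds keep "u" inside "menu" and "ur" untouched. Unlike the token-based version, this one also rewrites keys that directly follow "#", which may or may not be desirable for hashtags.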


---
5. Replace emoticons (for example: ":)" = "happy") using emoticon_dict, again reusing the dictionary_replace function.

In [13]:
df05 = df04.assign(text=lambda x: x.text.apply(dictionary_replace, lookup=emoticon_dict))
df05.iloc[[63, 128]]

Unnamed: 0,id,label,tweet,text
63,64,0.0,you've really hu my feelings :(,you have really hu my feelings sad
128,129,0.0,yeah! new buttons in the mail for me ð they are so pretty! :) #jewelrymaking #buttons,yeah! new buttons in the mail for me ð they are so pretty! happy #jewelrymaking #buttons


---
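6. Replace punctuation with spaces using the pattern r'[^\w\s]'. The assignment calls for re.sub(); the code applies the same pattern through pandas Series.str.replace with regex=True.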

In [14]:
df06 = df05.assign(text=lambda x: x.text.str.replace(r'[^\w\s]', ' ', regex=True))
df06.head(4)

Unnamed: 0,id,label,tweet,text
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when a father is dysfunctional and is so selfish he drags his kids into his dysfunction run
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit i cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked
2,3,0.0,bihday your majesty,bihday your majesty
3,4,0.0,#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦,model i love you take with you all the time in urð ð ð ð ð ð ð ð
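
The leftover "ð" characters in row 3 are expected: in Python 3, `\w` is Unicode-aware for `str` patterns, so non-ASCII letters count as word characters and survive this step. A minimal sketch with a made-up string (the sample text is an assumption, not taken from the dataset):

```python
import re

# Made-up sample containing two non-ASCII letters: "é" (U+00E9) and "ð" (U+00F0)
text = "caf\u00e9 is #great!!! ur\u00f0"

# Step 6 pattern: punctuation goes away, Unicode letters stay
step6 = re.sub(r"[^\w\s]", " ", text)

# Step 7 pattern: anything outside ASCII letters and digits goes away
step7 = re.sub(r"[^a-zA-Z0-9]", " ", step6)

# With re.ASCII, \w and \s are restricted to ASCII, so one pass removes both kinds
ascii_only = re.sub(r"[^\w\s]", " ", text, flags=re.ASCII)

print(repr(step6))
print(repr(step7))
print(repr(ascii_only))
```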


---
7. Replace special characters with spaces using the pattern r'[^a-zA-Z0-9]' (the assignment calls for re.sub(); the code uses Series.str.replace with regex=True).

In [15]:
df07 = df06.assign(text=lambda x: x.text.str.replace(r'[^a-zA-Z0-9]', repl=' ', regex=True))
df07.iloc[48003:48005]

Unnamed: 0,id,label,tweet,text
48003,48004,,@user me &amp; @user off to @user fri 22/7 @user to see @user @user @user plus otheâ¦,me amp off to fri 22 7 to see plus othe
48004,48005,,"run when you can, walk if you have to, crawl if u must; just never give up~! ð«ð«âºï¸ #sundayâ¦",run when you can walk if you have to crawl if you must just never give up sunday


---
8. Replace digits with spaces using the pattern r'[^a-zA-Z]' (the assignment calls for re.sub(); the code uses Series.str.replace with regex=True).

In [16]:
df08 = df07.assign(text=lambda x: x.text.str.replace(r'[^a-zA-Z]', repl=' ', regex=True))
df08.iloc[[50, 48003]]

Unnamed: 0,id,label,tweet,text
50,51,0.0,#abc2020 getting ready 2 remove the victums frm #pulseclub #prayfororlando,abc getting ready remove the victums frm pulseclub prayfororlando
48003,48004,,@user me &amp; @user off to @user fri 22/7 @user to see @user @user @user plus otheâ¦,me amp off to fri to see plus othe
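
Since every pass in steps 6 to 8 replaces a character with exactly one space, and r'[^a-zA-Z]' matches everything the two earlier patterns match, the three passes should give the same result as the last pattern applied alone. The sketch below checks this on a few made-up strings with plain `re`; the function names are introduced here for illustration.

```python
import re

def three_passes(text):
    # Steps 6, 7 and 8 applied one after another
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"[^a-zA-Z0-9]", " ", text)
    return re.sub(r"[^a-zA-Z]", " ", text)

def one_pass(text):
    # Step 8 pattern alone: every non-letter becomes a single space
    return re.sub(r"[^a-zA-Z]", " ", text)

samples = [
    "me &amp; off to fri 22/7",
    "#abc2020 getting ready 2 remove",
    "snake_case\tand caf\u00e9",
]
for sample in samples:
    assert three_passes(sample) == one_pass(sample)
    print(repr(one_pass(sample)))
```

Keeping the steps separate, as the notebook does, makes each intermediate result easy to inspect; merging them mainly saves two passes over the column.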


---
9. Remove one-character words from the text; the assignment suggests ' '.join([w for w in x.split() if len(w)>1])

In [17]:
# A slightly more general implementation
def filter_words(words, function=None):
    if isinstance(words, str):
        return ' '.join(filter(function, words.split()))
    return list(filter(function, words))

In [18]:
df09 = df08.assign(text=lambda x: x.text.apply(filter_words, function=lambda x: len(x)>1))
df09.head(2)

Unnamed: 0,id,label,tweet,text
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked
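
A short usage sketch of `filter_words` on its two supported input types, a string and a list of tokens. The inputs are made up for illustration; the function body is copied from the cell above.

```python
def filter_words(words, function=None):
    # Strings are split on whitespace, filtered, and joined back
    if isinstance(words, str):
        return ' '.join(filter(function, words.split()))
    # Any other iterable is filtered and returned as a list
    return list(filter(function, words))

longer_than_one = lambda w: len(w) > 1

as_text = filter_words("when a father is so selfish", longer_than_one)
as_list = filter_words(["i", "love", "u"], longer_than_one)

# With function=None, filter() keeps only truthy items, i.e. drops empty strings
no_empty = filter_words(["a", "", "bc"])

print(as_text, as_list, no_empty, sep="\n")
```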


---
10. Split the tweets into tokens with nltk.tokenize.word_tokenize, creating a new column 'tweet_token'.

In [19]:
%%time
df10 = df09.assign(tweet_token=lambda x: x.text.apply(nltk.tokenize.word_tokenize))
df10.head(2)

Wall time: 10.3 s


Unnamed: 0,id,label,tweet,text,tweet_token
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]"


---
11. Remove stop words from the tokens using nltk.corpus.stopwords. Create the column 'tweet_token_filtered' without stop words.

In [20]:
stopwords = set(nltk.corpus.stopwords.words("english"))

In [21]:
df11 = df10.assign(tweet_token_filtered=lambda x: x.tweet_token.apply(filter_words, 
                                                                      function=lambda x: x not in stopwords))
df11.head(2)

Unnamed: 0,id,label,tweet,text,tweet_token,tweet_token_filtered
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]","[father, dysfunctional, selfish, drags, kids, dysfunction, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]","[thanks, lyft, credit, use, cause, offer, wheelchair, vans, pdx, disapointed, getthanked]"
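
Row 1 shows a side effect worth keeping in mind: "can", "not" and "do" are all stop words, so "cannot use" and "do not offer" lose their negation. If that matters for a downstream task, one option is to exclude negation words from the stop-word set. The sketch below is only an illustration: `stop_words` is a tiny hypothetical list built from the words removed in row 1, and the `negations` set is an assumption.

```python
# Hypothetical mini stop-word list (NLTK's English list is much longer)
stop_words = {"for", "can", "not", "they", "do", "in"}

# Assumed set of negation words to keep
negations = {"not", "no", "never"}

tokens = ["thanks", "for", "lyft", "credit", "can", "not", "use", "cause",
          "they", "do", "not", "offer", "wheelchair", "vans", "in", "pdx"]

# Plain stop-word removal, as in the notebook
filtered = [t for t in tokens if t not in stop_words]

# Same filter, but negation words are no longer treated as stop words
filtered_keep_negation = [t for t in tokens if t not in stop_words - negations]

print(filtered)
print(filtered_keep_negation)
```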


---
12. Apply stemming to the tokens with nltk.stem.PorterStemmer. Create the column 'tweet_stemmed' with the stemmed tokens.

In [22]:
stemmer = nltk.stem.PorterStemmer()

In [23]:
def transform_words(words, function):
    if isinstance(words, str):
        return ' '.join(map(function, words.split()))
    return list(map(function, words))

In [24]:
%%time
df12 = df11.assign(tweet_stemmed=lambda x: x.tweet_token_filtered.apply(transform_words,
                                                                        function=stemmer.stem))
df12.head(2)

Wall time: 14 s


Unnamed: 0,id,label,tweet,text,tweet_token,tweet_token_filtered,tweet_stemmed
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]","[father, dysfunctional, selfish, drags, kids, dysfunction, run]","[father, dysfunct, selfish, drag, kid, dysfunct, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]","[thanks, lyft, credit, use, cause, offer, wheelchair, vans, pdx, disapointed, getthanked]","[thank, lyft, credit, use, caus, offer, wheelchair, van, pdx, disapoint, getthank]"


In [25]:
# Faster implementation with memory cache
def transform_series_words(series, function):
    unique_words = set(series.explode(ignore_index=True))
    cache = {w:function(w) for w in unique_words if isinstance(w, str)}
    return series.apply(lambda words: [cache[w] for w in words if w in cache])

In [26]:
%%time
df12 = df11.assign(tweet_stemmed=lambda x: transform_series_words(x.tweet_token_filtered,
                                                                  function=stemmer.stem))
df12.head(2)

Wall time: 2.02 s


Unnamed: 0,id,label,tweet,text,tweet_token,tweet_token_filtered,tweet_stemmed
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]","[father, dysfunctional, selfish, drags, kids, dysfunction, run]","[father, dysfunct, selfish, drag, kid, dysfunct, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]","[thanks, lyft, credit, use, cause, offer, wheelchair, vans, pdx, disapointed, getthanked]","[thank, lyft, credit, use, caus, offer, wheelchair, van, pdx, disapoint, getthank]"
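
The speed-up comes from calling the stemmer once per unique word instead of once per token. A similar effect can be sketched with `functools.lru_cache` from the standard library. In the sketch below `slow_stem` is a hypothetical stand-in for `stemmer.stem` (it only strips a trailing "s"), so the example runs without NLTK.

```python
from functools import lru_cache

def slow_stem(word):
    # Hypothetical stand-in for stemmer.stem: strips one trailing "s"
    return word[:-1] if word.endswith("s") else word

# Memoized wrapper: each distinct word is processed only once
cached_stem = lru_cache(maxsize=None)(slow_stem)

rows = [["kids", "drags", "run"], ["kids", "vans"], ["run", "kids"]]
stemmed = [[cached_stem(w) for w in row] for row in rows]

info = cached_stem.cache_info()
print(stemmed)
print(info.misses, "real calls,", info.hits, "served from the cache")
```

Compared with this sketch, `transform_series_words` builds the whole cache up front from the exploded series and, because of the `if w in cache` check, silently skips any token that is not a string.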


---
13. Apply lemmatization to the tokens with nltk.stem.wordnet.WordNetLemmatizer. Create the column 'tweet_lemmatized' with the lemmatized tokens.

In [27]:
lemmatizer = nltk.stem.wordnet.WordNetLemmatizer()

In [28]:
%%time
df13 = df12.assign(tweet_lemmatized=lambda x: transform_series_words(x.tweet_token_filtered,
                                                                     function=lemmatizer.lemmatize))
df13.head(2)

Wall time: 4.11 s


Unnamed: 0,id,label,tweet,text,tweet_token,tweet_token_filtered,tweet_stemmed,tweet_lemmatized
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]","[father, dysfunctional, selfish, drags, kids, dysfunction, run]","[father, dysfunct, selfish, drag, kid, dysfunct, run]","[father, dysfunctional, selfish, drag, kid, dysfunction, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]","[thanks, lyft, credit, use, cause, offer, wheelchair, vans, pdx, disapointed, getthanked]","[thank, lyft, credit, use, caus, offer, wheelchair, van, pdx, disapoint, getthank]","[thanks, lyft, credit, use, cause, offer, wheelchair, van, pdx, disapointed, getthanked]"


### Processing as a single chain of pandas operations

In [29]:
%%time
df = (
    combine_df
    .assign(text=lambda x: x.tweet)
    .assign(text=lambda x: x.text.str.replace(r'@\w+', '', regex=True))
    .assign(text=lambda x: x.text.str.lower())
    .assign(text=lambda x: x.text.apply(dictionary_replace, lookup=apostrophe_dict))
    .assign(text=lambda x: x.text.apply(dictionary_replace, lookup=short_word_dict))
    .assign(text=lambda x: x.text.apply(dictionary_replace, lookup=emoticon_dict))
    .assign(text=lambda x: x.text.str.replace(r'[^\w\s]', ' ', regex=True))
    .assign(text=lambda x: x.text.str.replace(r'[^a-zA-Z0-9]', repl=' ', regex=True))
    .assign(text=lambda x: x.text.str.replace(r'[^a-zA-Z]', repl=' ', regex=True))
    .assign(text=lambda x: x.text.apply(filter_words, function=lambda x: len(x)>1))
    .assign(tweet_token=lambda x: x.text.apply(nltk.tokenize.word_tokenize))
    .assign(tweet_token_filtered=lambda x: x.tweet_token.apply(filter_words,
                                                               function=lambda x: x not in stopwords))
    .assign(tweet_stemmed=lambda x: transform_series_words(x.tweet_token_filtered,
                                                           function=stemmer.stem))
    .assign(tweet_lemmatized=lambda x: transform_series_words(x.tweet_token_filtered,
                                                              function=lemmatizer.lemmatize))
)
df

Wall time: 15.7 s


Unnamed: 0,id,label,tweet,text,tweet_token,tweet_token_filtered,tweet_stemmed,tweet_lemmatized
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]","[father, dysfunctional, selfish, drags, kids, dysfunction, run]","[father, dysfunct, selfish, drag, kid, dysfunct, run]","[father, dysfunctional, selfish, drag, kid, dysfunction, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]","[thanks, lyft, credit, use, cause, offer, wheelchair, vans, pdx, disapointed, getthanked]","[thank, lyft, credit, use, caus, offer, wheelchair, van, pdx, disapoint, getthank]","[thanks, lyft, credit, use, cause, offer, wheelchair, van, pdx, disapointed, getthanked]"
2,3,0.0,bihday your majesty,bihday your majesty,"[bihday, your, majesty]","[bihday, majesty]","[bihday, majesti]","[bihday, majesty]"
3,4,0.0,#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦,model love you take with you all the time in ur,"[model, love, you, take, with, you, all, the, time, in, ur]","[model, love, take, time, ur]","[model, love, take, time, ur]","[model, love, take, time, ur]"
4,5,0.0,factsguide: society now #motivation,factsguide society now motivation,"[factsguide, society, now, motivation]","[factsguide, society, motivation]","[factsguid, societi, motiv]","[factsguide, society, motivation]"
...,...,...,...,...,...,...,...,...
49154,49155,,thought factory: left-right polarisation! #trump #uselections2016 #leadership #politics #brexit #blm &gt;3,thought factory left right polarisation trump uselections leadership politics brexit blm gt,"[thought, factory, left, right, polarisation, trump, uselections, leadership, politics, brexit, blm, gt]","[thought, factory, left, right, polarisation, trump, uselections, leadership, politics, brexit, blm, gt]","[thought, factori, left, right, polaris, trump, uselect, leadership, polit, brexit, blm, gt]","[thought, factory, left, right, polarisation, trump, uselections, leadership, politics, brexit, blm, gt]"
49155,49156,,feeling like a mermaid ð #hairflip #neverready #formal #wedding #gown #dresses #mermaid â¦,feeling like mermaid hairflip neverready formal wedding gown dresses mermaid,"[feeling, like, mermaid, hairflip, neverready, formal, wedding, gown, dresses, mermaid]","[feeling, like, mermaid, hairflip, neverready, formal, wedding, gown, dresses, mermaid]","[feel, like, mermaid, hairflip, neverreadi, formal, wed, gown, dress, mermaid]","[feeling, like, mermaid, hairflip, neverready, formal, wedding, gown, dress, mermaid]"
49156,49157,,"#hillary #campaigned today in #ohio((omg)) &amp; used words like ""assets&amp;liability"" never once did #clinton say thee(word) #radicalization",hillary campaigned today in ohio omg amp used words like assets amp liability never once did clinton say thee word radicalization,"[hillary, campaigned, today, in, ohio, omg, amp, used, words, like, assets, amp, liability, never, once, did, clinton, say, thee, word, radicaliza...","[hillary, campaigned, today, ohio, omg, amp, used, words, like, assets, amp, liability, never, clinton, say, thee, word, radicalization]","[hillari, campaign, today, ohio, omg, amp, use, word, like, asset, amp, liabil, never, clinton, say, thee, word, radic]","[hillary, campaigned, today, ohio, omg, amp, used, word, like, asset, amp, liability, never, clinton, say, thee, word, radicalization]"
49157,49158,,"happy, at work conference: right mindset leads to culture-of-development organizations #work #mindset",happy at work conference right mindset leads to culture of development organizations work mindset,"[happy, at, work, conference, right, mindset, leads, to, culture, of, development, organizations, work, mindset]","[happy, work, conference, right, mindset, leads, culture, development, organizations, work, mindset]","[happi, work, confer, right, mindset, lead, cultur, develop, organ, work, mindset]","[happy, work, conference, right, mindset, lead, culture, development, organization, work, mindset]"


14. Save the preprocessing result to a pickle file.

In [30]:
%%time
df.to_pickle(PROCESSED_PATH)

Wall time: 9 s


In [31]:
# Sanity check: load the pickle back
%time df_from_pickle = pd.read_pickle(PROCESSED_PATH)

Wall time: 1.13 s


In [32]:
df_from_pickle

Unnamed: 0,id,label,tweet,text,tweet_token,tweet_token_filtered,tweet_stemmed,tweet_lemmatized
0,1,0.0,@user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run,when father is dysfunctional and is so selfish he drags his kids into his dysfunction run,"[when, father, is, dysfunctional, and, is, so, selfish, he, drags, his, kids, into, his, dysfunction, run]","[father, dysfunctional, selfish, drags, kids, dysfunction, run]","[father, dysfunct, selfish, drag, kid, dysfunct, run]","[father, dysfunctional, selfish, drag, kid, dysfunction, run]"
1,2,0.0,@user @user thanks for #lyft credit i can't use cause they don't offer wheelchair vans in pdx. #disapointed #getthanked,thanks for lyft credit cannot use cause they do not offer wheelchair vans in pdx disapointed getthanked,"[thanks, for, lyft, credit, can, not, use, cause, they, do, not, offer, wheelchair, vans, in, pdx, disapointed, getthanked]","[thanks, lyft, credit, use, cause, offer, wheelchair, vans, pdx, disapointed, getthanked]","[thank, lyft, credit, use, caus, offer, wheelchair, van, pdx, disapoint, getthank]","[thanks, lyft, credit, use, cause, offer, wheelchair, van, pdx, disapointed, getthanked]"
2,3,0.0,bihday your majesty,bihday your majesty,"[bihday, your, majesty]","[bihday, majesty]","[bihday, majesti]","[bihday, majesty]"
3,4,0.0,#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦,model love you take with you all the time in ur,"[model, love, you, take, with, you, all, the, time, in, ur]","[model, love, take, time, ur]","[model, love, take, time, ur]","[model, love, take, time, ur]"
4,5,0.0,factsguide: society now #motivation,factsguide society now motivation,"[factsguide, society, now, motivation]","[factsguide, society, motivation]","[factsguid, societi, motiv]","[factsguide, society, motivation]"
...,...,...,...,...,...,...,...,...
49154,49155,,thought factory: left-right polarisation! #trump #uselections2016 #leadership #politics #brexit #blm &gt;3,thought factory left right polarisation trump uselections leadership politics brexit blm gt,"[thought, factory, left, right, polarisation, trump, uselections, leadership, politics, brexit, blm, gt]","[thought, factory, left, right, polarisation, trump, uselections, leadership, politics, brexit, blm, gt]","[thought, factori, left, right, polaris, trump, uselect, leadership, polit, brexit, blm, gt]","[thought, factory, left, right, polarisation, trump, uselections, leadership, politics, brexit, blm, gt]"
49155,49156,,feeling like a mermaid ð #hairflip #neverready #formal #wedding #gown #dresses #mermaid â¦,feeling like mermaid hairflip neverready formal wedding gown dresses mermaid,"[feeling, like, mermaid, hairflip, neverready, formal, wedding, gown, dresses, mermaid]","[feeling, like, mermaid, hairflip, neverready, formal, wedding, gown, dresses, mermaid]","[feel, like, mermaid, hairflip, neverreadi, formal, wed, gown, dress, mermaid]","[feeling, like, mermaid, hairflip, neverready, formal, wedding, gown, dress, mermaid]"
49156,49157,,"#hillary #campaigned today in #ohio((omg)) &amp; used words like ""assets&amp;liability"" never once did #clinton say thee(word) #radicalization",hillary campaigned today in ohio omg amp used words like assets amp liability never once did clinton say thee word radicalization,"[hillary, campaigned, today, in, ohio, omg, amp, used, words, like, assets, amp, liability, never, once, did, clinton, say, thee, word, radicaliza...","[hillary, campaigned, today, ohio, omg, amp, used, words, like, assets, amp, liability, never, clinton, say, thee, word, radicalization]","[hillari, campaign, today, ohio, omg, amp, use, word, like, asset, amp, liabil, never, clinton, say, thee, word, radic]","[hillary, campaigned, today, ohio, omg, amp, used, word, like, asset, amp, liability, never, clinton, say, thee, word, radicalization]"
49157,49158,,"happy, at work conference: right mindset leads to culture-of-development organizations #work #mindset",happy at work conference right mindset leads to culture of development organizations work mindset,"[happy, at, work, conference, right, mindset, leads, to, culture, of, development, organizations, work, mindset]","[happy, work, conference, right, mindset, leads, culture, development, organizations, work, mindset]","[happi, work, confer, right, mindset, lead, cultur, develop, organ, work, mindset]","[happy, work, conference, right, mindset, lead, culture, development, organization, work, mindset]"
