> * Text analysis is an essential part of social network analysis.
> * Imagine you have a social media dataset of tweets. You want to analyze the text to identify nodes and edges (for example, who mentions or retweets whom) as well as the topics being discussed.
> * You can use text analysis to find the most frequent words, hashtags, and mentions.

In [3]:
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords
from nltk.util import ngrams

> * We are going to use the `nltk` library to analyze text.
> * `from` names the module to import from (here, a submodule of the `nltk` library such as `nltk.tokenize`).
> * `import` then brings a specific function or class from that module (such as `word_tokenize`) into the notebook.
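> * As a small illustration using only the standard library (the `collections` module is a stand-in here, not part of `nltk`), both import styles give access to the same object:

```python
import collections               # import the whole module; names are used as collections.<name>
from collections import Counter  # import one specific name directly

# Both expressions refer to the same class
same_object = collections.Counter is Counter
word_counts = Counter(['rt', 'today', 'rt'])
```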

In [2]:
data=pd.read_csv('../week3/Political-media-DFE.csv', encoding='latin1')

> * Let's import the data from week 3.

In [3]:
data.columns

Index(['_unit_id', '_golden', '_unit_state', '_trusted_judgments',
       '_last_judgment_at', 'audience', 'audience:confidence', 'bias',
       'bias:confidence', 'message', 'message:confidence', 'orig__golden',
       'audience_gold', 'bias_gold', 'bioid', 'embed', 'id', 'label',
       'message_gold', 'source', 'text'],
      dtype='object')

In [4]:
data.dtypes

_unit_id                 int64
_golden                   bool
_unit_state             object
_trusted_judgments       int64
_last_judgment_at       object
audience                object
audience:confidence    float64
bias                    object
bias:confidence        float64
message                 object
message:confidence     float64
orig__golden           float64
audience_gold          float64
bias_gold              float64
bioid                   object
embed                   object
id                      object
label                   object
message_gold           float64
source                  object
text                    object
dtype: object

> * `.dtypes` is used to check the data type of each column.

> * Let's subset the data to keep three columns: who posted (`label`), where they posted (`source`, the social media platform), and what they posted (`text`).

In [5]:
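# .copy() creates an independent DataFrame, so adding columns later does not raise a SettingWithCopyWarning
content=data[['label', 'source', 'text']].copy()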

In [6]:
content

| | label | source | text |
|---|---|---|---|
| 0 | From: Trey Radel (Representative from Florida) | twitter | RT @nowthisnews: Rep. Trey Radel (R- #FL) slam... |
| 1 | From: Mitch McConnell (Senator from Kentucky) | twitter | VIDEO - #Obamacare: Full of Higher Costs and ... |
| 2 | From: Kurt Schrader (Representative from Oregon) | twitter | Please join me today in remembering our fallen... |
| 3 | From: Michael Crapo (Senator from Idaho) | twitter | RT @SenatorLeahy: 1st step toward Senate debat... |
| 4 | From: Mark Udall (Senator from Colorado) | twitter | .@amazon delivery #drones show need to update ... |
| ... | ... | ... | ... |
| 4995 | From: Ted Yoho (Representative from Florida) | facebook | I applaud Governor PerryÛªs recent decision t... |
| 4996 | From: Ted Yoho (Representative from Florida) | facebook | Today, I voted in favor of H.R. 5016 - Financi... |
| 4997 | From: Ted Yoho (Representative from Florida) | facebook | (Taken from posted WOKV interview) Congressm... |
| 4998 | From: Ted Yoho (Representative from Florida) | facebook | Join me next week for a town hall in Ocala! I'... |


> * The data contains posts from both Twitter and Facebook, and the `text` column holds the content of each post.

In [7]:
content['source'].unique()

array(['twitter', 'facebook'], dtype=object)

> * Let's print out some of the contents.

In [8]:
content['text'].iloc[0]

'RT @nowthisnews: Rep. Trey Radel (R- #FL) slams #Obamacare. #politics https://t.co/zvywMG8yIH'

In [9]:
content['text'].iloc[1]

'VIDEO - #Obamacare:  Full of Higher Costs and Broken Promises: http://t.co/dn3vzqIrWF'

In [10]:
content['text'].iloc[2]

'Please join me today in remembering our fallen heroes and honoring the men and women currently in military service for their sacrifices.'

> * Q. What additional information do you see in the contents?

```YOUR ANSWER HERE```

> * Yes, we see hashtags, retweets, mentions, and URLs.
> * The third row is an example of a post that contains only plain text.

> * In text analysis, it is important to clean the text (remove unnecessary words, symbols, etc.) and to make it uniform (lowercase, no punctuation, etc.).
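> * A minimal sketch of why uniform casing matters when counting (the token list is made up for illustration):

```python
from collections import Counter

# Hypothetical tokens that differ only in capitalization
tokens = ['#Obamacare', '#obamacare', 'Obamacare', 'obamacare']

raw_counts = Counter(tokens)                       # each spelling is counted separately
lower_counts = Counter(t.lower() for t in tokens)  # spellings are merged after lowercasing
```

> * Without lowercasing there are four distinct entries with a count of 1 each. After lowercasing there are two entries with a count of 2 each.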

> * Lowercase all the text.

In [11]:
content['text-lower']=content['text'].str.lower()


In [12]:
content['text-lower'].iloc[2]

'please join me today in remembering our fallen heroes and honoring the men and women currently in military service for their sacrifices.'

> * We can separate the contents into tokens (words, hashtags, mentions, etc.).
> * Separating the contents into tokens is called tokenization.
> * We can use the `word_tokenize` function from the `nltk` library to tokenize the contents.
> * There is also a `TweetTokenizer` class in the `nltk` library that is designed specifically for tweets.
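> * To see why a tweet-aware tokenizer matters, here is a rough standard-library sketch. The two regex tokenizers below are hypothetical stand-ins, not NLTK's actual implementations. They only mimic the difference in how `@` and `#` are handled.

```python
import re

text = 'rt @nowthisnews: rep. trey radel (r- #fl) slams #obamacare.'

def simple_tokenize(s):
    # Words and single punctuation marks become separate tokens,
    # so '@' and '#' are split off from the word that follows
    return re.findall(r'\w+|[^\w\s]', s)

def tweet_aware_tokenize(s):
    # Mentions and hashtags are tried first, so '@user' and '#tag' stay whole
    return re.findall(r'[@#]\w+|\w+|[^\w\s]', s)

simple = simple_tokenize(text)
tweet_aware = tweet_aware_tokenize(text)
```

> * In `simple`, the mention becomes the two tokens `'@'` and `'nowthisnews'`. In `tweet_aware`, it stays as the single token `'@nowthisnews'`.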

> * `.apply` applies a function to every value in a column, so you don't need to write a for loop over the rows.

In [13]:
content['tokenized_unigrams']=content['text-lower'].apply(word_tokenize)


In [14]:
iterated_unigrams=[]
for idx, row in content.iterrows():
    iterated_unigrams.append(word_tokenize(row['text-lower']))

In [15]:
content['iterated_unigrams']=iterated_unigrams 


> * Iterating through each row with a for loop and using `.apply` with `word_tokenize` give identical results: a list of tokens for each row.

In [16]:
content.loc[0,'iterated_unigrams'] == content.loc[0,'tokenized_unigrams']

True

> * A single token is called a unigram.
> * We can also look at bigrams (two consecutive tokens) and trigrams (three consecutive tokens).
> * In general, a sequence of n consecutive tokens is called an n-gram.
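> * A minimal sketch of how n-grams are built by sliding a window over a token list. `make_ngrams` is a hypothetical helper written for illustration. For this simple case (no padding) it should give the same tuples as `nltk.util.ngrams`.

```python
def make_ngrams(tokens, n):
    # Pair each token with the n-1 tokens that follow it
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = ['please', 'join', 'me', 'today']
bigrams = make_ngrams(tokens, 2)
trigrams = make_ngrams(tokens, 3)
```

> * Four tokens yield three bigrams and two trigrams, because each window must fit entirely inside the list.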


In [17]:
#bigrams with .apply and a lambda function
content['tokenized_bigrams']=content['text-lower'].apply(TweetTokenizer().tokenize).apply(lambda x: list(ngrams(x, 2)))
#It first tokenizes the text using TweetTokenizer (which keeps @mentions and #hashtags as single tokens), then creates bigrams using the ngrams function


In [18]:
#bigrams by iteration, built from the word_tokenize unigrams (so '@' and '#' are separate tokens here)
iterated_bigrams=[]
for idx, row in content.iterrows():
    iterated_bigrams.append(list(ngrams(row['tokenized_unigrams'], 2)))

In [19]:
content['iterated_bigrams']=iterated_bigrams


In [20]:
from collections import Counter

In [21]:
content['tokenized_bigrams']

0       [(rt, @nowthisnews), (@nowthisnews, :), (:, re...
1       [(video, -), (-, #obamacare), (#obamacare, :),...
2       [(please, join), (join, me), (me, today), (tod...
3       [(rt, @senatorleahy), (@senatorleahy, :), (:, ...
4       [(., @amazon), (@amazon, delivery), (delivery,...
                              ...                        
4995    [(i, applaud), (applaud, governor), (governor,...
4996    [(today, ,), (,, i), (i, voted), (voted, in), ...
4997    [((, taken), (taken, from), (from, posted), (p...
4998    [(join, me), (me, next), (next, week), (week, ...
4999    [(foreign, affairs), (affairs, committee), (co...
Name: tokenized_bigrams, Length: 5000, dtype: object

In [22]:
content['tokenized_unigrams']

0       [rt, @, nowthisnews, :, rep., trey, radel, (, ...
1       [video, -, #, obamacare, :, full, of, higher, ...
2       [please, join, me, today, in, remembering, our...
3       [rt, @, senatorleahy, :, 1st, step, toward, se...
4       [., @, amazon, delivery, #, drones, show, need...
                              ...                        
4995    [i, applaud, governor, perryûªs, recent, deci...
4996    [today, ,, i, voted, in, favor, of, h.r, ., 50...
4997    [(, taken, from, posted, wokv, interview, ), c...
4998    [join, me, next, week, for, a, town, hall, in,...
4999    [foreign, affairs, committee, hearing, on, syr...
Name: tokenized_unigrams, Length: 5000, dtype: object

In [23]:
Counter([item for row in content['tokenized_unigrams'] for item in row]).most_common(10)

[('the', 7668),
 ('.', 6855),
 ('to', 5913),
 (',', 4927),
 (':', 3763),
 ('and', 3678),
 ('of', 3371),
 ('in', 2892),
 ('#', 2518),
 ('a', 2508)]

In [24]:
Counter([item for row in content['tokenized_bigrams'] for item in row]).most_common(10)

[(('of', 'the'), 766),
 (('\x89', 'ûªs'), 684),
 (('in', 'the'), 547),
 ((',', 'and'), 547),
 (('on', 'the'), 416),
 (('to', 'the'), 398),
 (('.', 'i'), 328),
 (('for', 'the'), 307),
 (('\x89', 'û'), 305),
 (('at', 'the'), 305)]

> * However, the most frequent tokens are function words (such as "the", "and", and "of") and punctuation.
> * We can remove function words and punctuation to find the content words (nouns, verbs, adjectives, etc.).
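> * A small standard-library sketch of the idea. The `stop` set below is a made-up mini list for illustration only, and the two token lists are invented.

```python
from collections import Counter

# Hypothetical mini stopword list (far shorter than a real one)
stop = {'the', 'of', 'and', 'to', 'in', 'a', 'is'}

rows = [
    ['the', 'house', 'passed', 'the', 'bill', '.'],
    ['a', 'vote', 'on', 'the', 'bill', 'is', 'today', '!'],
]

# Flatten the list of token lists into one list of tokens
tokens = [tok for row in rows for tok in row]
# Keep only tokens that are neither stopwords nor punctuation
kept = [tok for tok in tokens if tok not in stop and tok.isalnum()]

raw_top = Counter(tokens).most_common(1)
clean_top = Counter(kept).most_common(1)
```

> * Before filtering, the top token is the function word `'the'`. After filtering, it is the content word `'bill'`.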

In [25]:
content.head(2)

| | label | source | text | text-lower | tokenized_unigrams | iterated_unigrams | tokenized_bigrams | iterated_bigrams |
|---|---|---|---|---|---|---|---|---|
| 0 | From: Trey Radel (Representative from Florida) | twitter | RT @nowthisnews: Rep. Trey Radel (R- #FL) slam... | rt @nowthisnews: rep. trey radel (r- #fl) slam... | [rt, @, nowthisnews, :, rep., trey, radel, (, ... | [rt, @, nowthisnews, :, rep., trey, radel, (, ... | [(rt, @nowthisnews), (@nowthisnews, :), (:, re... | [(rt, @), (@, nowthisnews), (nowthisnews, :), ... |
| 1 | From: Mitch McConnell (Senator from Kentucky) | twitter | VIDEO - #Obamacare: Full of Higher Costs and ... | video - #obamacare: full of higher costs and ... | [video, -, #, obamacare, :, full, of, higher, ... | [video, -, #, obamacare, :, full, of, higher, ... | [(video, -), (-, #obamacare), (#obamacare, :),... | [(video, -), (-, #), (#, obamacare), (obamacar... |


In [26]:
stop=stopwords.words('english')

In [27]:
content['stopword']=content['text-lower'].apply(lambda x: ' '.join([word for word in x.split() if word not in stop]))
    


In [28]:
content['stopword']

0       rt @nowthisnews: rep. trey radel (r- #fl) slam...
1       video - #obamacare: full higher costs broken p...
2       please join today remembering fallen heroes ho...
3       rt @senatorleahy: 1st step toward senate debat...
4       .@amazon delivery #drones show need update law...
                              ...                        
4995    applaud governor perryûªs recent decision dep...
4996    today, voted favor h.r. 5016 - financial servi...
4997    (taken posted wokv interview) congressman yoho...
4998    join next week town hall ocala! i'll answer qu...
4999    foreign affairs committee hearing syria. remai...
Name: stopword, Length: 5000, dtype: object

In [29]:
content['text-lower']

0       rt @nowthisnews: rep. trey radel (r- #fl) slam...
1       video - #obamacare:  full of higher costs and ...
2       please join me today in remembering our fallen...
3       rt @senatorleahy: 1st step toward senate debat...
4       .@amazon delivery #drones show need to update ...
                              ...                        
4995    i applaud governor perryûªs recent decision t...
4996    today, i voted in favor of h.r. 5016 - financi...
4997    (taken from posted wokv interview)   congressm...
4998    join me next week for a town hall in ocala! i'...
4999    foreign affairs committee hearing on syria. i ...
Name: text-lower, Length: 5000, dtype: object

> * We'll get unigrams for the text after we remove function words.

In [30]:
content['stop_tokenized_unigrams']=content['stopword'].apply(word_tokenize)


> * We'll get bigrams for the text after we remove function words.

In [31]:
content['stop_tokenized_bigrams']=content['stopword'].apply(TweetTokenizer().tokenize).apply(lambda x: list(ngrams(x, 2)))


> * Here are the most frequent unigrams and bigrams after removing function words.

In [32]:
Counter([item for row in content['stop_tokenized_unigrams'] for item in row]).most_common(10)

[('.', 6861),
 (',', 4927),
 (':', 3763),
 ('#', 2518),
 ('http', 2162),
 ('@', 1877),
 ('!', 995),
 ("'s", 812),
 ('today', 784),
 ('&', 589)]

In [33]:
Counter([item for row in content['stop_tokenized_bigrams'] for item in row]).most_common(10)

[(('\x89', 'ûªs'), 684),
 (('\x89', 'û'), 305),
 (('û', '\x9d'), 249),
 (('.', '\x89'), 198),
 (('s', '.'), 193),
 (('u', '.'), 192),
 (('.', 's'), 190),
 (('here', ':'), 183),
 (('.', '"'), 154),
 (('\x89', 'ûò'), 149)]

> * We still see irrelevant punctuation. Let's get rid of it by keeping only alphanumeric tokens.
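> * A quick sketch of what `str.isalnum()` keeps and drops (the token list is made up for illustration). Note that it also drops tokens that mix letters with punctuation, such as abbreviations.

```python
# Hypothetical tokens of the kind a tokenizer might produce
tokens = ['today', ',', 'h.r', '.', '5016', "'s", '#', 'obamacare', 'u.s.']

# isalnum() is True only when every character is a letter or a digit
kept = [tok for tok in tokens if tok.isalnum()]
dropped = [tok for tok in tokens if not tok.isalnum()]
```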


In [34]:
content['punct_tokenized_unigrams']=content['stop_tokenized_unigrams'].apply(lambda x: [word for word in x if word.isalnum()])


In [35]:
punct_iterated_bigrams=[]
for idx, row in content.iterrows():
    punct_iterated_bigrams.append(list(ngrams(row['punct_tokenized_unigrams'], 2)))

In [36]:
content['punct_tokenized_bigrams']=punct_iterated_bigrams


> * Let's count the most frequent unigrams and bigrams after removing punctuation.

In [37]:
Counter([item for row in content['punct_tokenized_unigrams'] for item in row]).most_common(10)

[('http', 2162),
 ('today', 784),
 ('house', 435),
 ('amp', 431),
 ('great', 396),
 ('new', 361),
 ('bill', 324),
 ('president', 317),
 ('act', 294),
 ('congress', 289)]

In [38]:
Counter([item for row in content['punct_tokenized_bigrams'] for item in row]).most_common(10)

[(('here', 'http'), 159),
 (('health', 'care'), 97),
 (('president', 'obama'), 83),
 (('united', 'states'), 75),
 (('town', 'hall'), 67),
 (('high', 'school'), 66),
 (('immigration', 'reform'), 61),
 (('small', 'business'), 47),
 (('last', 'night'), 46),
 (('make', 'sure'), 46)]

> * But we want to do additional cleaning. 
> * When counting the most frequent words, different inflected forms of the same word (for example, different tenses) are counted as different words.
> * For example, "run" and "running" are counted as different words.
> * We can use lemmatization to convert words to their base form.

In [39]:
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

lemmatizer=WordNetLemmatizer() #Initialize lemmatizer

In [40]:
content['lemma']=content['punct_tokenized_unigrams'].apply(lambda x: [lemmatizer.lemmatize(word) for word in x])


In [41]:
content['lemma_str']=content['lemma'].apply(lambda x: ' '.join(x))


> * Let's compare the text with and without lemmatization.

In [42]:
content.loc[5, 'lemma_str']

'called usdotfra release info inspection casseltonderailment review quality rail'

In [43]:
content.loc[5, 'text-lower']

'called on the @usdotfra to release info about inspections before the #casseltonderailment to review quality of rails. (1/2)'

> * Q. Do you see any different results? What tokens have changed after lemmatization?

`YOUR ANSWER`

In [44]:
lemmatizer.lemmatize('better')

'better'

In [45]:
lemmatizer.lemmatize('better', pos=wordnet.ADJ)

'good'

In [46]:
lemmatizer.lemmatize('cars')

'car'

In [47]:
lemmatizer.lemmatize('cars', pos=wordnet.VERB)

'cars'

> * NLTK's `WordNetLemmatizer` is not perfect.
> * By default, it treats every word as a noun, so verbs and adjectives are often left unchanged.
> * Therefore, we need to specify the part of speech (POS) for each token.
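> * A rough sketch of the mapping idea using plain strings only. It assumes Penn Treebank-style tags (the kind `nltk.pos_tag` returns by default for English) and uses the one-letter WordNet POS codes directly. `penn_to_wordnet` is a hypothetical helper for illustration.

```python
# WordNet part-of-speech codes as plain strings
# (the values of wordnet.ADJ, wordnet.VERB, wordnet.NOUN, wordnet.ADV)
ADJ, VERB, NOUN, ADV = 'a', 'v', 'n', 'r'

def penn_to_wordnet(tag):
    # The first letter of a Penn Treebank tag identifies the coarse word class;
    # tags outside these four classes map to None
    return {'J': ADJ, 'V': VERB, 'N': NOUN, 'R': ADV}.get(tag[:1])

# JJR = comparative adjective, VBD = past-tense verb, NNS = plural noun,
# RB = adverb, IN = preposition
examples = {tag: penn_to_wordnet(tag) for tag in ['JJR', 'VBD', 'NNS', 'RB', 'IN']}
```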

In [48]:
def nltk_tag_to_wordnet_tag(nltk_tag):
    if nltk_tag.startswith('J'): #ADJECTIVE
        return wordnet.ADJ
    elif nltk_tag.startswith('V'): #VERB
        return wordnet.VERB
    elif nltk_tag.startswith('N'): #NOUN        
        return wordnet.NOUN
    elif nltk_tag.startswith('R'): #ADVERB
        return wordnet.ADV
    else:          
        return None

In [49]:
def lemmatize_sentence(sentence):
    # Tokenize the sentence and find the POS tag for each token
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  
    # Tuple of (token, wordnet_tag)
    wordnet_tagged = map(lambda x: (x[0], nltk_tag_to_wordnet_tag(x[1])), nltk_tagged) 
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is None:
            # If no tag was found, then use the word as is
            lemmatized_sentence.append(word)
        else:        
            # Else use the tag to lemmatize the word
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return " ".join(lemmatized_sentence)

In [50]:
content['lemmatizer_str']=content['lemma'].apply(lambda x: lemmatize_sentence(' '.join(x)))


In [51]:
content.loc[5, 'lemma_str']

'called usdotfra release info inspection casseltonderailment review quality rail'

In [52]:
content.loc[5, 'lemmatizer_str']

'call usdotfra release info inspection casseltonderailment review quality rail'

> * Let's extract hashtags and mentions from the contents.
> * We can use regular expressions (regex) to extract hashtags and mentions.

In [53]:
import re

> * There are four useful regex functions in Python: `findall`, `search`, `split`, and `sub`.

> * `findall` is used to find all matches of a pattern in a string.
> * `search` is used to find the first match of a pattern in a string.
> * `split` is used to split a string by a pattern.
> * `sub` is used to replace a pattern in a string.

> * In this class, we are going to use `findall` to extract hashtags and mentions.

> * The result of `findall` is a list of strings.
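> * A short sketch of the four functions side by side. The sample string is adapted from the first post printed earlier, and the patterns are illustrative choices.

```python
import re

text = 'rt @nowthisnews: rep. trey radel slams #obamacare. #politics'

all_tags = re.findall(r'#\w+', text)                  # every hashtag, as a list
first_tag = re.search(r'#\w+', text).group()          # only the first hashtag
parts = re.split(r'\s+', 'full  of higher   costs')   # split on runs of whitespace
no_tags = re.sub(r'#\w+', '', text)                   # replace hashtags with nothing
```

> * `search` returns a match object (or `None` when nothing matches), so `.group()` is used to get the matched text.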

> * Basic regex patterns:
> * `.` matches any character except a newline.
> * `*` matches 0 or more repetitions of the preceding regex pattern.
> * `+` matches 1 or more repetitions of the preceding regex pattern.
> * `?` matches 0 or 1 repetition of the preceding regex pattern.
> * `^` matches the start of a string.
> * `$` matches the end of a string.
> * `[]` matches any one of the characters inside the square brackets.
> * `\` is used to escape special characters.
> * `|` is used to match either the regex pattern on the left or the right.

> * `[a-z]` matches any lowercase letter.
> * `[A-Z]` matches any uppercase letter.
> * `[0-9]` matches any digit.
> * `\d` matches any digit.
> * `\D` matches any non-digit.
> * `\w` matches any word character (alphanumeric and underscore).
> * `\W` matches any non-word character.
> * `\s` matches any whitespace character.
> * `\S` matches any non-whitespace character.

> * `[a-zA-Z]` matches any alphabet character.
> * `[a-zA-Z0-9]` matches any alphanumeric character.
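> * A few of these patterns in action. The sample strings are made up for illustration.

```python
import re

sample = 'H.R. 5016 passed 228-195 on July 16'

digits = re.findall(r'\d+', sample)              # runs of one or more digits
words = re.findall(r'[a-zA-Z]+', sample)         # runs of one or more letters
starts_with_h = bool(re.search(r'^H', sample))   # ^ anchors the match at the start
ends_with_digit = bool(re.search(r'\d$', sample))  # $ anchors the match at the end
optional_u = re.findall(r'colou?r', 'color and colour')  # ? makes the 'u' optional
either = re.findall(r'cat|dog', 'a cat and a dog')       # | matches either side
```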


<img src="../week4/Regex-Cheat-Sheet.png" width=500px height=800px />

In [54]:
pattern = re.compile(r'\d+') # \d matches any digit (0-9); + means one or more times
sample1='I have 2 dogs and 3 cats'
result=re.findall(pattern, sample1)
print(result)

['2', '3']


In [55]:
pattern = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}') 
#Before the @: one or more alphanumeric characters, dots, underscores, percent signs, plus signs, or hyphens.
#After the @: a domain of alphanumeric characters, dots, or hyphens, then a literal dot and at least two letters (e.g., .edu).
sample2='My primary email is is324socialnetworkanalysis@illinois.edu. Please contact me! If I am not replying, you can contact me at jaihyunpark@illinois.edu'
result=re.findall(pattern, sample2)
print(result)

['is324socialnetworkanalysis@illinois.edu', 'jaihyunpark@illinois.edu']


In [56]:
pattern = re.compile(r'https?://\S+')
#Matches http or https, followed by ://, followed by any non-whitespace characters.
sample3='Look at what is happening at Washington! https://is324.com'
result=re.findall(pattern, sample3)
print(result)

['https://is324.com']


In [57]:
pattern = re.compile(r'#[a-zA-Z0-9]+')
#Matches hashtags that start with #, followed by one or more alphanumeric characters.
sample4='I am so excited for the class #IS324'
result=re.findall(pattern, sample4)
print(result)

['#IS324']


In [58]:
pattern = re.compile(r'@[a-zA-Z0-9]+')
#Matches mentions that start with @, followed by one or more alphanumeric characters.
sample5='@jaihyunpark is the instructor for #IS324'
result=re.findall(pattern, sample5)
print(result)

['@jaihyunpark']


In [59]:
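pattern = re.compile(r'rt\s+@[a-zA-Z0-9]+|@[a-zA-Z0-9]+')
#Matches a retweet marker (rt followed by a mention) or a plain mention.
#Spaces inside a pattern are matched literally, so there are none around the | operator.
sample6='rt @jaihyunpark: I am so excited for the class #IS324 @socialmedia'
result=re.findall(pattern, sample6)
print(result)

['rt @jaihyunpark', '@socialmedia']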


> * Let's extract hashtags and mentions from the content DataFrame.
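> * Before working on the DataFrame, here is a sketch of the idea on a plain Python list, using the three posts printed earlier. The patterns use `\w+`, which also allows underscores. That is an assumption about what counts as a valid hashtag or handle, not a rule taken from the data.

```python
import re
from collections import Counter

raw_posts = [
    'RT @nowthisnews: Rep. Trey Radel (R- #FL) slams #Obamacare. #politics https://t.co/zvywMG8yIH',
    'VIDEO - #Obamacare:  Full of Higher Costs and Broken Promises: http://t.co/dn3vzqIrWF',
    'Please join me today in remembering our fallen heroes and honoring the men and women currently in military service for their sacrifices.',
]
posts = [p.lower() for p in raw_posts]

hashtag_pattern = re.compile(r'#\w+')
mention_pattern = re.compile(r'@\w+')

# One list of matches per post; posts without matches give an empty list
hashtags = [hashtag_pattern.findall(p) for p in posts]
mentions = [mention_pattern.findall(p) for p in posts]

# Count hashtags across all posts
hashtag_counts = Counter(tag for tags in hashtags for tag in tags)
```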

In [60]:
content['text-lower']

0       rt @nowthisnews: rep. trey radel (r- #fl) slam...
1       video - #obamacare:  full of higher costs and ...
2       please join me today in remembering our fallen...
3       rt @senatorleahy: 1st step toward senate debat...
4       .@amazon delivery #drones show need to update ...
                              ...                        
4995    i applaud governor perryûªs recent decision t...
4996    today, i voted in favor of h.r. 5016 - financi...
4997    (taken from posted wokv interview)   congressm...
4998    join me next week for a town hall in ocala! i'...
4999    foreign affairs committee hearing on syria. i ...
Name: text-lower, Length: 5000, dtype: object