# Markov Chain

## Trump Twitter Generator
Using old Donald Trump tweets to generate text with Markovify.

### Loading the tweets

In [1]:
import pandas as pd

df = pd.read_csv('trumptweets.csv')
df.head()

Unnamed: 0,id,link,content,date,retweets,favorites,mentions,hashtags,geo
0,1698308935,https://twitter.com/realDonaldTrump/status/169...,Be sure to tune in and watch Donald Trump on L...,2009-05-04 20:54:25,500,868,,,
1,1701461182,https://twitter.com/realDonaldTrump/status/170...,Donald Trump will be appearing on The View tom...,2009-05-05 03:00:10,33,273,,,
2,1737479987,https://twitter.com/realDonaldTrump/status/173...,Donald Trump reads Top Ten Financial Tips on L...,2009-05-08 15:38:08,12,18,,,
3,1741160716,https://twitter.com/realDonaldTrump/status/174...,New Blog Post: Celebrity Apprentice Finale and...,2009-05-08 22:40:15,11,24,,,
4,1773561338,https://twitter.com/realDonaldTrump/status/177...,"""My persona will never be that of a wallflower...",2009-05-12 16:07:28,1399,1965,,,


In [2]:
# We only need the text from the tweet
tweets = df[['content']].copy()
tweets.head()

Unnamed: 0,content
0,Be sure to tune in and watch Donald Trump on L...
1,Donald Trump will be appearing on The View tom...
2,Donald Trump reads Top Ten Financial Tips on L...
3,New Blog Post: Celebrity Apprentice Finale and...
4,"""My persona will never be that of a wallflower..."


In [3]:
# Rename the 'content' column to 'text'
tweets = tweets.rename(columns={'content': 'text'})

In [4]:
# Make sure the text column has type str
tweets['text'] = tweets.text.astype(str)
tweets.head()

Unnamed: 0,text
0,Be sure to tune in and watch Donald Trump on L...
1,Donald Trump will be appearing on The View tom...
2,Donald Trump reads Top Ten Financial Tips on L...
3,New Blog Post: Celebrity Apprentice Finale and...
4,"""My persona will never be that of a wallflower..."


### Removing all links and special characters from the text

In [5]:
# Remove all URLs.
url_regex = r"""https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,}"""
tweets['text'] = tweets.text.str.replace(url_regex, '', regex=True)

# Fix `&amp;`.
tweets['text'] = tweets.text.str.replace(r'&amp;', '&')

# Replace curly double quotes with straight ones.
tweets['text'] = tweets.text.str.replace('“', '"')
tweets['text'] = tweets.text.str.replace('”', '"')

# Replace en and em dashes with a plain hyphen.
tweets['text'] = tweets.text.str.replace('–', '-')
tweets['text'] = tweets.text.str.replace('—', '-')

# Replace curly apostrophes (and the mis-decoded Windows-1252 one) with a straight apostrophe.
tweets['text'] = tweets.text.str.replace('’', "'")
tweets['text'] = tweets.text.str.replace('‘', "'")
tweets['text'] = tweets.text.str.replace('\x92', "'")

# Replace non-breaking spaces with regular spaces.
tweets['text'] = tweets.text.str.replace('\xa0', ' ')
# Replace zero-width spaces with regular spaces.
tweets['text'] = tweets.text.str.replace('\u200b', ' ')

# Remove left-to-right and right-to-left marks.
tweets['text'] = tweets.text.str.replace('\u200e', '')
tweets['text'] = tweets.text.str.replace('\u200f', '')

# Remove a stray private-use code point.
tweets['text'] = tweets.text.str.replace('\U0010fc00', '')
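
The same clean-up can be tried on a single string with only the standard library, which makes it easy to check what each step does. This is a minimal sketch, not part of the original notebook: `clean_tweet`, `LITERAL_REPLACEMENTS` and the sample tweet are made-up names and data, and only the URL pattern is taken from the cell above.

```python
import re

# Same URL pattern as in the notebook cell above.
URL_REGEX = re.compile(
    r"""https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,}"""
)

# Hypothetical helper table: literal (non-regex) substitutions applied in order.
LITERAL_REPLACEMENTS = {
    "&amp;": "&",
    "\u201c": '"', "\u201d": '"',                    # curly double quotes
    "\u2013": "-", "\u2014": "-",                    # en and em dashes
    "\u2019": "'", "\u2018": "'", "\x92": "'",       # curly apostrophes
    "\xa0": " ", "\u200b": " ",                      # non-breaking / zero-width space
    "\u200e": "", "\u200f": "",                      # direction marks
}

def clean_tweet(text):
    """Strip URLs, then apply the literal character replacements."""
    text = URL_REGEX.sub("", text)
    for old, new in LITERAL_REPLACEMENTS.items():
        text = text.replace(old, new)
    return text

# Made-up sample tweet, for illustration only.
sample = "Thank you \u201cIowa\u201d &amp; Ohio\u2014see https://example.com/rally today"
cleaned = clean_tweet(sample)
print(repr(cleaned))
```

Removing a URL leaves the spaces around it in place, so the cleaned text can contain a double space where the link used to be.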



In [6]:
# Look at one tweet to check if it is correct
print(tweets['text'].loc[22])

"If you don't have problems, you're pretending or you don't run your own business." -Donald J. Trump 


In [7]:
# Join all lines for the model.
tweets_text = '\n'.join(tweets.text.values)

In [16]:
# Make a Text model using markovify.
import markovify
trump_model_2 = markovify.Text(tweets_text, state_size=2)

In [17]:
# Print 10 generated tweets of at most 100 characters each
for i in range(10):
    print('{}: {}'.format(i, trump_model_2.make_short_sentence(100)))

0: … Real @ FoxNews HAPPY EASTER!
1: America is the truth.
2: Good jobs are being led to food stamps went to hell!
3: I wonder if it were up to the wonderful and powerful words on @ FoxNews Just another Witch Hunt.
4: The voters in Mississippi - GREAT EPISODE!
5: Our thoughts & prayers go out to the United States of America - & the # USVI.
6: What separates the winners of the Arab League paying for them.
7: Governor @ DougDucey as the Republican Party.
8: Can you believe that @ ralphreed's Faith and Freedom chapters are at new historic lows.
9: Claims for unemployment are at crazy levels--fire Obama!


## Test with a different number of words the probability of the next word depends on
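
To make the role of the state size concrete, here is a small pure-Python sketch of a word-level Markov chain. It is an illustration under simplifying assumptions, not markovify's actual implementation: `build_chain`, `generate`, the sentinel tokens and the toy corpus are all made up for this example.

```python
import random
from collections import defaultdict

BEGIN, END = "<BEGIN>", "<END>"  # hypothetical sentinel tokens

def build_chain(sentences, state_size):
    """Map every run of `state_size` words to the list of words seen right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return chain

def generate(chain, state_size, rng, max_words=50):
    """Walk the chain from the begin state until END (or max_words) is reached."""
    state = (BEGIN,) * state_size
    output = []
    while len(output) < max_words:
        next_word = rng.choice(chain[state])
        if next_word == END:
            break
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)

# Toy corpus, made up for illustration.
toy_corpus = [
    "we will win big",
    "we will bring jobs back",
    "they will bring jobs home",
]

chain_1 = build_chain(toy_corpus, state_size=1)
chain_2 = build_chain(toy_corpus, state_size=2)

# With one word of context, "will" can be followed by "win" or "bring".
print(sorted(set(chain_1[("will",)])))
# With two words of context, "they will" was only ever followed by "bring".
print(chain_2[("they", "will")])

print(generate(chain_2, state_size=2, rng=random.Random(0)))
```

A larger state size gives each step fewer possible continuations, so the output stays closer to the source text.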

In [13]:
trump_model_3 = markovify.Text(tweets_text, state_size=3)

In [14]:
# Print 10 generated tweets of at most 100 characters each
for i in range(10):
    print('{}: {}'.format(i, trump_model_3.make_short_sentence(100)))

0: When a car is sent to Crooked Pols.
1: # LESM # Trump2016pic.twitter.com/SuH1jfOQR4 . @ heytana, great job - so far, no contest!
2: Unfortunately, Washington is incapable of working on either.
3: Just returned from New Hampshire at 7:00 P.M. Eastern, Montoursville, Pennsylvania!
4: It is by far the best & most beautiful place to get married.
5: I am pleased to announce my nomination of Judge Brett Kavanaugh.
6: Without the con it's over We are going to seek to Impeach me over NOTHING.
7: @ FoxNews should be ashamed of himself.
8: Obama's war on coal, and will continue to stand with Presidents for # OneAmericaAppeal.
9: You have my vote.We need more people like you in this tragic hour, and we will stop these also.


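The model with a state size of 3 gives good results: most sentences are more coherent than those from the model with a state size of 2.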
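
One caveat with a larger state size is that the model may simply repeat source tweets. A rough way to check this is to count how many generated sentences occur word for word in the source text. The sketch below assumes the generated sentences and source tweets are available as plain lists of strings; `verbatim_share` and the sample data are hypothetical.

```python
def verbatim_share(generated, source_texts):
    """Fraction of generated sentences that occur word for word inside some source text."""
    if not generated:
        return 0.0
    copied = sum(any(g in src for src in source_texts) for g in generated)
    return copied / len(generated)

# Made-up data for illustration.
source = ["We will bring jobs back!", "Thank you Arizona!"]
generated = ["We will bring jobs back!", "Thank you jobs back!"]

share = verbatim_share(generated, source)
print(share)
```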


# Combine Tweets from Hillary Clinton and Donald Trump

## Load new dataset

In [18]:
df = pd.read_csv('clinton_trump_tweets.csv')
df.head()

Unnamed: 0,id,handle,text,is_retweet,original_author,time,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_user_id,is_quote_status,...,place_type,place_country_code,place_country,place_contained_within,place_attributes,place_bounding_box,source_url,truncated,entities,extended_entities
0,780925634159796224,HillaryClinton,The question in this election: Who can put the...,False,,2016-09-28T00:22:34,,,,False,...,,,,,,,https://studio.twitter.com,False,{'media': [{'display_url': 'pic.twitter.com/Xr...,{'media': [{'display_url': 'pic.twitter.com/Xr...
1,780916180899037184,HillaryClinton,"Last night, Donald Trump said not paying taxes...",True,timkaine,2016-09-27T23:45:00,,,,False,...,,,,,,,http://twitter.com,False,{'media': [{'display_url': 'pic.twitter.com/t0...,{'media': [{'display_url': 'pic.twitter.com/t0...
2,780911564857761793,HillaryClinton,Couldn't be more proud of @HillaryClinton. Her...,True,POTUS,2016-09-27T23:26:40,,,,False,...,,,,,,,https://about.twitter.com/products/tweetdeck,False,"{'user_mentions': [{'id_str': '1536791610', 'n...",
3,780907038650068994,HillaryClinton,"If we stand together, there's nothing we can't...",False,,2016-09-27T23:08:41,,,,False,...,,,,,,,https://studio.twitter.com,False,{'media': [{'display_url': 'pic.twitter.com/Q3...,{'media': [{'display_url': 'pic.twitter.com/Q3...
4,780897419462602752,HillaryClinton,Both candidates were asked about how they'd co...,False,,2016-09-27T22:30:27,,,,False,...,,,,,,,https://about.twitter.com/products/tweetdeck,False,"{'user_mentions': [], 'symbols': [], 'urls': [...",


In [21]:
df['handle'].value_counts()

HillaryClinton     3226
realDonaldTrump    3218
Name: handle, dtype: int64

The dataset is well balanced.

In [22]:
# Copy the text to a new DataFrame
tweets = df[['text']].copy()
tweets.head()

Unnamed: 0,text
0,The question in this election: Who can put the...
1,"Last night, Donald Trump said not paying taxes..."
2,Couldn't be more proud of @HillaryClinton. Her...
3,"If we stand together, there's nothing we can't..."
4,Both candidates were asked about how they'd co...


In [24]:
# Make sure the text column has type str
tweets['text'] = tweets.text.astype(str)
tweets.head()

Unnamed: 0,text
0,The question in this election: Who can put the...
1,"Last night, Donald Trump said not paying taxes..."
2,Couldn't be more proud of @HillaryClinton. Her...
3,"If we stand together, there's nothing we can't..."
4,Both candidates were asked about how they'd co...


### Removing all links and special characters from the text

In [25]:
# Remove all URLs.
url_regex = r"""https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,}"""
tweets['text'] = tweets.text.str.replace(url_regex, '', regex=True)

# Fix `&amp;`.
tweets['text'] = tweets.text.str.replace(r'&amp;', '&')

# Replace curly double quotes with straight ones.
tweets['text'] = tweets.text.str.replace('“', '"')
tweets['text'] = tweets.text.str.replace('”', '"')

# Replace en and em dashes with a plain hyphen.
tweets['text'] = tweets.text.str.replace('–', '-')
tweets['text'] = tweets.text.str.replace('—', '-')

# Replace curly apostrophes (and the mis-decoded Windows-1252 one) with a straight apostrophe.
tweets['text'] = tweets.text.str.replace('’', "'")
tweets['text'] = tweets.text.str.replace('‘', "'")
tweets['text'] = tweets.text.str.replace('\x92', "'")

# Replace non-breaking spaces with regular spaces.
tweets['text'] = tweets.text.str.replace('\xa0', ' ')
# Replace zero-width spaces with regular spaces.
tweets['text'] = tweets.text.str.replace('\u200b', ' ')

# Remove left-to-right and right-to-left marks.
tweets['text'] = tweets.text.str.replace('\u200e', '')
tweets['text'] = tweets.text.str.replace('\u200f', '')

# Remove a stray private-use code point.
tweets['text'] = tweets.text.str.replace('\U0010fc00', '')



In [26]:
# Join all lines for the model.
tweets_text = '\n'.join(tweets.text.values)

In [27]:
# Make a Text model using markovify.
import markovify
hill_trump_2 = markovify.Text(tweets_text, state_size=2)

In [33]:
for i in range(10):
    print('{}: {}'.format(i, hill_trump_2.make_short_sentence(100)))

0: I am now off to Iowa for an exclusive look at the top at the Civic Center.
1: #Debates2016 New national Bloomberg poll just hit 49% for Trump.
2: Here are 5 reasons he's unfit to be our next president!
3: We are going to do his failing @NYDailyNews will I at least be given some credit?
4: Too many young black men and women in the 400m freestyle.
5: Text WHERE to 47246 to tell @WellsFargo that Wall Street and the entire U.S. Senate.
6: I was a vehicle for @realDonaldTrump who was brief & gracious.
7: Bernie Sanders was very well in South Carolina needs strength as illegals and Syrians pour in.
8: For the first time that they will not be ignoring!
9: We will never win.


## Model with a different state size

In [38]:
hill_trump_3 = markovify.Text(tweets_text, state_size=3)

In [47]:
for i in range(10):
    print('{}: {}'.format(i, hill_trump_3.make_short_sentence(100)))

0: Full speech transcript: Hillary Clinton only knows how to criticize, but not how to lead.
1: #WeMadeHistory It took 240 years but 2016 will be the destruction of civilization as we know it!
2: #VoteTrump #ImWithYou Only one candidate in this election who's ready to be Commander-in-Chief.
3: We will bring jobs back!
4: Excited to be back on the campaign trail by President Obama and Crooked Hillary.
5: #MakeAmericaGreatAgain #Trump2016 THANK YOU ARIZONA!
6: I will be on Face the Nation with John Dickerson on CBS this morning, was unable to respond.
7: Pocahontas is at it again.He could not have worked out better.
8: Glad to see that the Justice Department for refusing to admit it.
9: The Wall Street Journal/NBC Poll is a total phony and dishonest guy.
