# The Realest POTUS Tweets

This project will be split into 2 parts (given the limitations of web scraping Twitter, I will postpone the scraping part and add it later on). For now, I'll use the dataset found at https://www.kaggle.com/austinreese/trump-tweets:
* 1. *NLP* - using the scraped dataset, I'll build an NLP model that generates fake tweets
* 2. *Basic UI* - I'll write a Python file that wraps the functions from the NLP model in a more usable (and more fun) interface.

### Part 1: NLP model

In [1]:
import numpy as np
import pandas as pd 
import tensorflow

In [2]:
tweets = pd.read_csv("realdonaldtrump.csv")

In [3]:
tweets.head()

,id,link,content,date,retweets,favorites,mentions,hashtags
0,1698308935,https://twitter.com/realDonaldTrump/status/169...,Be sure to tune in and watch Donald Trump on L...,2009-05-04 13:54:25,510,917,,
1,1701461182,https://twitter.com/realDonaldTrump/status/170...,Donald Trump will be appearing on The View tom...,2009-05-04 20:00:10,34,267,,
2,1737479987,https://twitter.com/realDonaldTrump/status/173...,Donald Trump reads Top Ten Financial Tips on L...,2009-05-08 08:38:08,13,19,,
3,1741160716,https://twitter.com/realDonaldTrump/status/174...,New Blog Post: Celebrity Apprentice Finale and...,2009-05-08 15:40:15,11,26,,
4,1773561338,https://twitter.com/realDonaldTrump/status/177...,"""My persona will never be that of a wallflower...",2009-05-12 09:07:28,1375,1945,,


### Let's keep only tweets from 2015 onwards, since that's more of the Trump we know (and 'love')

In [4]:
tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43352 entries, 0 to 43351
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         43352 non-null  int64 
 1   link       43352 non-null  object
 2   content    43352 non-null  object
 3   date       43352 non-null  object
 4   retweets   43352 non-null  int64 
 5   favorites  43352 non-null  int64 
 6   mentions   20386 non-null  object
 7   hashtags   5583 non-null   object
dtypes: int64(3), object(5)
memory usage: 2.6+ MB


In [5]:
#convert the date column from strings to datetime64 so we can filter by year
tweets['date'] = pd.to_datetime(tweets['date'])

In [6]:
#voila
tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43352 entries, 0 to 43351
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   id         43352 non-null  int64         
 1   link       43352 non-null  object        
 2   content    43352 non-null  object        
 3   date       43352 non-null  datetime64[ns]
 4   retweets   43352 non-null  int64         
 5   favorites  43352 non-null  int64         
 6   mentions   20386 non-null  object        
 7   hashtags   5583 non-null   object        
dtypes: datetime64[ns](1), int64(3), object(4)
memory usage: 2.6+ MB


In [7]:
president_twts = tweets[tweets['date'].dt.year.between(2015, 2020)].copy()  #.copy() gives an independent DataFrame, so editing it below does not raise SettingWithCopyWarning

In [8]:
#removing hyperlinks in the tweets (raw-string regex; regex=True made explicit since pandas 2.0 changed the default to False)
president_twts['content'] = president_twts['content'].str.replace(r'http\S+|www\.\S+|pic\.twitter\S+', '', case=False, regex=True)


In [9]:
president_twts['content'].iloc[150]

'" @ JustSoldCom: @ realDonaldTrump SO HAPPY # CelebrityApprentice is back on! Thanks Donald for keeping it real!"'
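The link-stripping regex can be checked in isolation with Python's `re` module. This is a minimal sketch on made-up sample strings (not taken from the dataset); the pattern is the same alternation as the cleaning cell, with the dots escaped and `re.IGNORECASE` standing in for `case=False`. Note that e-mail addresses are not matched, which is why some `.com` text survives the cleaning.

```python
import re

# Same alternation as the cleaning cell: http..., www...., pic.twitter...
LINK_PATTERN = re.compile(r'http\S+|www\.\S+|pic\.twitter\S+', flags=re.IGNORECASE)

def strip_links(text: str) -> str:
    """Remove URLs and pic.twitter.com links; all other text is left untouched."""
    return LINK_PATTERN.sub('', text)

# Made-up examples, not real tweets
samples = [
    "Great rally tonight! https://t.co/abc123",
    "Watch here: www.example.com/video and pic.twitter.com/xyz",
    "Email vets@example.com to share your story",
]
for s in samples:
    print(repr(strip_links(s)))
```

The removed links leave their surrounding spaces behind, so a final `.strip()` (or collapsing repeated spaces) may be worth adding.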

In [10]:
#spot-check: print up to 20 tweets that still contain '.com' after link removal
i = 0
for row in president_twts['content']:
    if '.com' in row:
        print(row)
        i += 1
        if i == 20:
            break

" @ about_life: @ realDonaldTrump I Luv America 4 Trump 2016 - Let’s Make America Great Again! - iluvamerica4trump2016@gmail.com"
" @ about_life: @ realDonaldTrump Silent Warriors 4 Trump 2016 - Vets! Let’s Take America Back! - silentwarriors4trump2016@gmail.com"
" @ about_life: Politician Have Been Running(Ruining) America For Years; Lets Change That Now -Trump 2016 - iluvamerica4trump2016@gmail.com"
Veterans, please call 855- VETS- 352 or email address veterans@donaldtrump.com to share your stories about the need to reform the VA.
...completely avoided if you buy from a non-Tariffed Country, or you buy the product inside the USA (the best idea). That’s Zero Tariffs. Many Tariffed companies will be leaving China for Vietnam and other such countries in Asia. That’s why China wants to make a deal so badly!...
....companies to come to the USA and to get companies that have left us for other lands to COME BACK HOME. We stupidly lost 30% of our auto business to Mexico. If the Tariffs went 
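The manual counter above can also be expressed with pandas' vectorised string methods. A sketch on a small made-up Series standing in for `president_twts['content']`:

```python
import pandas as pd

# Made-up stand-in for president_twts['content']
content = pd.Series([
    "Call 855-VETS or email veterans@example.com",
    "No links here",
    "Contact us at info@example.com",
    "Nothing to see",
])

# Boolean mask: rows whose text contains the literal substring '.com'
has_dot_com = content.str.contains('.com', regex=False)

# head(20) replaces the manual counter
for row in content[has_dot_com].head(20):
    print(row)
```

`regex=False` matters here: as a regex, `.` would match any character.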

In [11]:
president_twts['content']

19465         "@flicka__: @ realDonaldTrump for president"
19466    The Mar-a-Lago Club was amazing tonight. Every...
19467    " @ archangeljf12: ;, @ realDonaldTrump for Pr...
19468    "@TalentlessCook: @ realDonaldTrump You're onl...
19469    " @ yankeejayman: @ realDonaldTrump @flicka__ ...
                               ...                        
43347    Joe Biden was a TOTAL FAILURE in Government. H...
43348    Will be interviewed on @ seanhannity tonight a...
43349                                                     
43350                                                     
43351                                                     
Name: content, Length: 23887, dtype: object
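The last rows above are empty strings, presumably tweets that consisted only of a link, and they would still add blank lines to `text_data`. One way to drop them before joining, sketched on made-up rows rather than the real data:

```python
import pandas as pd

# Made-up rows standing in for president_twts['content'] after link removal
content = pd.Series([
    "Great rally tonight!",
    "",        # e.g. a tweet that was only a link
    "   ",     # whitespace left behind by the regex
    "Will be interviewed tonight",
])

# Trim whitespace, then keep only rows that still contain text
cleaned = content.str.strip()
cleaned = cleaned[cleaned.str.len() > 0]

# Same separator as the notebook's join
text_data = '\n '.join(cleaned)
print(repr(text_data))
```

Stripping also trims leading and trailing spaces from the remaining tweets, which is usually harmless for a character model.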

In [12]:
text_data = '\n '.join([row for row in president_twts['content']])

In [13]:
text_data[:300]

'"@flicka__: @ realDonaldTrump for president"\n The Mar-a-Lago Club was amazing tonight. Everybody was there, the biggest and the hottest. Palm Beach is so lucky to have best club in world\n " @ archangeljf12: ;, @ realDonaldTrump for President of the United States! @ SenTedCruz Vice President ;A # Win'

In [14]:
#the first model will be quite basic and character-based
vocab = sorted(set(text_data))  #every unique character in the corpus
char_to_ind = {char: ind for ind, char in enumerate(vocab)}  #e.g. char_to_ind['H'] == 41
ind_to_char = np.array(vocab)  #e.g. ind_to_char[41] == 'H'
encoded_text = np.array([char_to_ind[c] for c in text_data])  #the whole corpus as integer ids
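To see what the encoding produces, here is the same vocabulary and lookup-table construction on a toy string instead of `text_data`. `sorted(set(...))` orders characters by code point, so punctuation and spaces come before capitals, which come before lowercase letters:

```python
import numpy as np

text = "Make it great"  # toy corpus standing in for text_data

vocab = sorted(set(text))                                    # unique characters, sorted by code point
char_to_ind = {char: ind for ind, char in enumerate(vocab)}  # character -> id
ind_to_char = np.array(vocab)                                # id -> character

encoded = np.array([char_to_ind[c] for c in text])
decoded = ''.join(ind_to_char[encoded])  # fancy indexing maps every id back at once

print(vocab)
print(encoded)
print(decoded == text)
```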

In [15]:
vocab

['\n',
 ' ',
 '!',
 '"',
 '#',
 '$',
 '%',
 '&',
 "'",
 '(',
 ')',
 '*',
 '+',
 ',',
 '-',
 '.',
 '/',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 '<',
 '=',
 '>',
 '?',
 '@',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z',
 '[',
 ']',
 '_',
 '`',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 '{',
 '|',
 '}',
 '~',
 '«',
 '´',
 '»',
 '½',
 'É',
 'á',
 'â',
 'è',
 'é',
 'í',
 'ï',
 'ñ',
 'ò',
 'ó',
 'ô',
 'ö',
 'ø',
 'ú',
 'ğ',
 'ı',
 'ĺ',
 'ō',
 'א',
 'ב',
 'ג',
 'ד',
 'ה',
 'ו',
 'ז',
 'ח',
 'ט',
 'י',
 'ך',
 'כ',
 'ל',
 'ם',
 'מ',
 'ן',
 'נ',
 'ס',
 'ע',
 'צ',
 'ק',
 'ר',
 'ש',
 'ת',
 '،',
 'ء',
 'آ',
 'أ',
 'ؤ',
 'ا',
 'ب',
 'ة',
 'ت',
 'ج',
 'ح',
 'خ',
 'د',
 'ذ',
 'ر',
 'ز',
 'س',
 'ش',
 'ص',
 'ض',
 'ط',
 'ظ',
 'ع',
 'ف',
 'ق',
 'ك',
 'ل

In [16]:
#after encoding the text
seq_len = 125  #characters per training sequence (not the batch size)
char_dataset = tensorflow.data.Dataset.from_tensor_slices(encoded_text)
#creates a tf.data Dataset that yields one character id at a time
sequences = char_dataset.batch(seq_len + 1, drop_remainder=True)
#groups the ids into chunks of seq_len + 1 characters

def create_seq_targets(seq):
    """Split a chunk into (input, target): the target is the input shifted by one character,
    so the model learns to predict the next character at every position."""
    input_txt = seq[:-1]  #Hello my nam
    target_txt = seq[1:]  #ello my name
    return input_txt, target_txt

dataset = sequences.map(create_seq_targets)

batch_size = 128
buffer_size = 100000  #tf.data shuffles within a buffer of this many sequences instead of the whole dataset at once
dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
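The chunking and input/target shift can be mirrored in plain NumPy to see exactly what each training pair looks like. This sketch uses a toy id sequence and `seq_len = 4` instead of 125, and leaves out shuffling and batching:

```python
import numpy as np

encoded_text = np.arange(11)  # toy stand-in for the encoded corpus: ids 0..10
seq_len = 4                   # toy value; the notebook uses 125

# Chunks of seq_len + 1 ids, dropping the incomplete tail (like drop_remainder=True)
chunk = seq_len + 1
n_chunks = len(encoded_text) // chunk
sequences = encoded_text[: n_chunks * chunk].reshape(n_chunks, chunk)

# Input drops the last id, target drops the first: position t of the target
# is the character that follows position t of the input
inputs, targets = sequences[:, :-1], sequences[:, 1:]
print(inputs)
print(targets)
```

The trailing id `10` is discarded because it cannot fill a whole chunk, exactly as `drop_remainder=True` does.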

In [17]:
vocab_size = len(vocab)
embed_dim = 64
rnn_neurons = 1026

from tensorflow.keras.losses import sparse_categorical_crossentropy
def sparse_cat_loss(y_true, y_pred):
    #the Dense layer outputs raw logits, hence from_logits=True
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim,
                        batch_input_shape=[batch_size, None]))
    #stateful=True: the last state for each sample in a batch is used as the
    #initial state for the corresponding sample in the following batch
    model.add(GRU(rnn_neurons, return_sequences=True,
                  stateful=True, recurrent_initializer='glorot_uniform'))
    model.add(Dense(vocab_size))
    model.compile(optimizer='adam', loss=sparse_cat_loss)
    return model

model = create_model(vocab_size=vocab_size,
                     embed_dim=embed_dim,
                     rnn_neurons=rnn_neurons,
                     batch_size=batch_size)

model.fit(dataset, epochs=30)

Train for 203 steps
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x2448e879488>

In [30]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (128, None, 64)           15872     
_________________________________________________________________
gru (GRU)                    (128, None, 1026)         3361176   
_________________________________________________________________
dense (Dense)                (128, None, 248)          254696    
Total params: 3,631,744
Trainable params: 3,631,744
Non-trainable params: 0
_________________________________________________________________
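The parameter counts in the summary can be reproduced by hand. A sketch of the arithmetic, assuming the tf.keras 2.x default `reset_after=True` for the GRU (which gives each of its three gates two bias vectors); `vocab_size = 248` is read off the Dense output shape:

```python
def embedding_params(vocab_size, embed_dim):
    # one embed_dim vector per vocabulary entry
    return vocab_size * embed_dim

def gru_params(input_dim, units, reset_after=True):
    # 3 gates (update, reset, candidate), each with an input kernel, a recurrent
    # kernel and bias; reset_after=True doubles the bias vectors
    biases = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + biases)

def dense_params(input_dim, output_dim):
    # weight matrix plus one bias per output
    return input_dim * output_dim + output_dim

vocab_size, embed_dim, rnn_neurons = 248, 64, 1026
counts = [embedding_params(vocab_size, embed_dim),
          gru_params(embed_dim, rnn_neurons),
          dense_params(rnn_neurons, vocab_size)]
print(counts, sum(counts))
```

Almost 93% of the parameters sit in the GRU's recurrent kernel and input kernel, so `rnn_neurons` is the main lever on model size.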


In [19]:
model.save_weights('D:/data_science/GIT/Trump-Twit-Generator/')

In [22]:
tensorflow.keras.models.save_model(model, 'D:/data_science/GIT/Trump-Twit-Generator/my_model.h5')

In [27]:
#after training NLP model
def generate_text(model, start_seed, gen_size=500, temp=1.0):
    input_eval = [char_to_ind[s] for s in start_seed]  #map the seed string to character ids
    input_eval = tensorflow.expand_dims(input_eval, 0)  #add a batch dimension: shape (1, len(seed))
    text_generated = []
    model.reset_states()  #clear the stateful GRU's memory from previous calls

    for i in range(gen_size):
        predictions = model(input_eval)  #shape (1, seq, vocab_size), raw logits
        predictions = tensorflow.squeeze(predictions, 0)  #undo the expand_dims
        predictions = predictions / temp  #temp < 1 sharpens, temp > 1 flattens the distribution
        predicted_id = tensorflow.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        #sample one id per position, keep only the last one, as a plain integer
        input_eval = tensorflow.expand_dims([predicted_id], 0)  #feed the sampled character back in
        text_generated.append(ind_to_char[predicted_id])
    return start_seed + "".join(text_generated)
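The effect of `temp` can be seen without the model. This NumPy sketch applies the same idea as `generate_text` (divide the logits by the temperature, then sample) to three made-up logits; NumPy's `Generator.choice` stands in for `tensorflow.random.categorical`, so it is an illustration rather than the notebook's exact sampling code:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_with_temperature(logits, temp, rng):
    """Divide logits by temp, then draw one index from the resulting distribution."""
    probs = softmax(np.asarray(logits, dtype=float) / temp)
    return rng.choice(len(probs), p=probs), probs

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for a 3-character vocabulary
rng = np.random.default_rng(0)
for temp in (0.5, 1.0, 2.0):
    idx, probs = sample_with_temperature(logits, temp, rng)
    print(temp, np.round(probs, 3), idx)
```

Lower temperatures concentrate probability on the most likely character (more repetitive, more "on-script" tweets); higher temperatures spread it out (more creative spelling, as in the samples below).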

In [36]:
from tensorflow.keras.models import load_model

In [37]:
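#rebuild with batch_size=1: a stateful GRU has a fixed batch size, and generation feeds one sequence at a time
model = create_model(vocab_size, embed_dim, rnn_neurons, batch_size=1)
model.load_weights('my_model.h5')  #relative path: assumes the working directory is where my_model.h5 was saved
model.build(tensorflow.TensorShape([1, None]))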

In [38]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 64)             15872     
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1026)           3361176   
_________________________________________________________________
dense_1 (Dense)              (1, None, 248)            254696    
Total params: 3,631,744
Trainable params: 3,631,744
Non-trainable params: 0
_________________________________________________________________


In [56]:
#what does Donald think about Hillary
for i in range(0,20):
    print(generate_text(model, "Hillary", gen_size=125))

Hillary for Pennsylvania! Looks like Monda on # Trump16 that's only one won—much to fight back!!"
 " @ LutorontFounHe: @ realDonaldT
Hillary facts go over the loo: 
 THANK YOU DOWITH in over 500,000 for the Washington Poll - out to @ realDonaldTrump and be the secu
Hillary Clinton 49% # Trump2016 
 "@Bdendenk978: @ realDonaldTrump @ bluestar vetor now!
 Legal immigrants tapped. Remember, they co
Hillary Clinton cannot Trump on Military, our Vets and the 2018 @ TeamCavuto @toulp keep down! # MakeAmericaGreatAgain"
 " @ DhuedRV
Hillary Clinton is a good thing, not a bad thing. The unemployment rate for life, to be a fine jumpt rominand the illegal immigratio
Hillary’s Veterans! Love tell Trump he's in rare. My great honor!
 The great people of Puerto Rico as possible, I never seen bowh in
Hillary Clinton ‘smart & very bright Investigation Hear), think Her...
 ...it was a little conflict along Mini Mike credibility much
Hillary" on the same things. Save of deals has put a folult we can ne

In [57]:
#what does Donald think about Obama
for i in range(0,20):
    print(generate_text(model, "Obama", gen_size=125))

Obama Coant Down is dead. San to work for our stand. Congressman Megyn.
 " @ bigor03: Peloni 60% @ slunting @ realDonaldTrump Caug
Obama from the Great State of Independence. The reason tour own help fronting the truth. Mike will ifly using all of you!
 Great j
Obama of ABC, SCMPAST PITI- MARCED Decision was hitchungred. I only seeks a 
 " @ ilwing123: I'm very talented reportership was a 
Obama 'House's showed really things what is going on in this Whole Widdwhere muden trying to do, and we should try to get out over
Obama/and KNY toarch sutcess of Leftalicts yet, but spould go up deficies. # Trump2016
 # VoteTrumpMI, # Trump2016
 THANK YOU NORT
Obama voters have to use media outlets, no sources, there was NO COLLUSION and NATO people. Everybody and they refuse to see that 
Obama last year.
 Bad lia, it was Lushed 7.7% in Washington Antressive caring out the now disgraceful showed a great Congressiona 
Obama jobs on now management doesn't she doesn't want to go down to warmive butthe 

In [58]:
#what does Donald think about Coronavirus
for i in range(0,20):
    print(generate_text(model, "corona", gen_size=125))

coronaVATrump just slong me and only Steve Hife” Transcript of the popular India shipping immediately!
 Sad to lose the drive of Pr
coronaVIG: Donald Trump: April in @ TrumpDoral being better reporting in the hopents people like stephing up the House to be on the
coronaVirus Moting over” but they don’t know what we have a n this mon. We should also dismusted for the U.S., doing an insult to a
corona dropped authorities to bobbing air lowest the totally biased in talks to be moving point. He didn't they real change.
 Entre
coronaVirus: 
 # SmallBeNnieThow trought that gives upeace and what most of the problem. Because it is far greater than any other c
coronaVY vood vicious and totally good to determine to both, love that Spying on Bronnjust delivered live nomination.
 Entry rebuon
coronastructuary as a Rocky two great Space. Here idease FBI Director in the poll done, big trial!
 The Fake News Media hates to sa
coronavers. Prevaile must stop what @ realDonaldTrump does best. Amond make 

### Part 2: Making a usable UI in Python

#### 1. Make a function that lets the user input a word
#### 2. Randomise where the input word appears in the sentence
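The two steps above could be sketched as follows. Everything here is hypothetical: `place_word_randomly`, `fake_tweet` and `stub_generate` are illustrative names, and the stub replaces the trained model so the sketch runs on its own. In the notebook, `generate` could be something like `lambda seed: generate_text(model, seed, gen_size=125)`.

```python
import random
from typing import Callable

def place_word_randomly(word: str, sentence: str, rng: random.Random) -> str:
    """Hypothetical helper: insert `word` at a random word boundary (start and end included)."""
    words = sentence.split()
    position = rng.randint(0, len(words))  # randint is inclusive on both ends
    words.insert(position, word)
    return ' '.join(words)

def fake_tweet(word: str, generate: Callable[[str], str], rng: random.Random) -> str:
    """Hypothetical Part 2 helper: seed the generator with the user's word,
    then move that word to a random position in the generated text."""
    generated = generate(word)
    # generate_text returns the seed followed by its continuation, so drop the seed prefix
    continuation = generated[len(word):]
    return place_word_randomly(word, continuation, rng)

# Stub standing in for the trained model
def stub_generate(seed: str) -> str:
    return seed + " is doing a tremendous job, believe me"

rng = random.Random(42)
print(fake_tweet("Congress", stub_generate, rng))
```

One caveat: the character model often continues the seed mid-word (e.g. "corona" becoming "coronaVirus"), so dropping the seed prefix can leave a word fragment at the start; splitting on the first space after the seed would be one way around it.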