### TrumpTweet Notebook
This notebook runs the code that generates a Trump-style tweet. It covers:
- Processing the Trump Tweet archive to create a clean file of Trump's tweets
- Defining and training a recurrent neural network built from GRU (gated recurrent unit) cells
- Feeding seed text into the trained model to generate a novel tweet

#### Data Prep

In [59]:
# Disable warnings
import warnings
warnings.filterwarnings('ignore')

In [118]:
# Run the helpers script, which defines the data and model helper classes
%run scripts/helpers
#from scripts import helpers

In [3]:
# Verify GPU support
tf.config.list_physical_devices('GPU')


[]

##### Create a datahelper and process raw data

In [65]:
# Create a datahelper object and designate the input file
dh = DataHelper(file_name='tweets_12-29-2020.csv')

# Prep the raw data to create the tweet file
dh.prep_raw_data(start_date='2020-06-01', end_date='2020-12-29')

# Print the number of tweets
print('The number of Tweets sent by Trump during the period is {}'.format(dh.num_tweets))


Data processing complete.
The number of Tweets sent by Trump during the period is 3888


##### Tokenize the text and create the dataset for model training

In [43]:
# Tokenize the text and create the dataset
dataset, tokenizer = dh.create_tokenizer('inputdata/clean_tweet.txt')
print('The number of unique characters is {0:,} and the dataset size is {1:,} document(s).' \
      ' The number of windows in the dataset for processing is {2:,}.'.format(dh.num_unique_chars,dh.dataset_size,dh.num_data_windows))


Dataset and tokenizer creation complete.
The number of unique characters is 101 and the dataset size is 1 document(s). The number of windows in the dataset for processing is 481,083.
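
The internals of `DataHelper.create_tokenizer` are not shown in this notebook. As a rough illustration of what character-level tokenization and windowing typically involve, the sketch below uses plain Python. The function names `build_vocab` and `make_windows`, the window length, and the 1-based ids are assumptions made for illustration, not the helper's actual code.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (ids start at 1)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars, start=1)}


def make_windows(ids, n_steps):
    """Slide a window of n_steps + 1 ids over the sequence, one step at a time.

    Each window yields an input (the first n_steps ids) and a target
    (the same ids shifted one position to the right).
    """
    windows = []
    for start in range(len(ids) - n_steps):
        chunk = ids[start:start + n_steps + 1]
        windows.append((chunk[:-1], chunk[1:]))
    return windows


text = "make america great"          # stand-in for the cleaned tweet file
vocab = build_vocab(text)
ids = [vocab[ch] for ch in text]
pairs = make_windows(ids, n_steps=5)  # n_steps=5 is an arbitrary choice
print(len(vocab), len(pairs))
```

With a stride of one, a text of N characters yields N - n_steps windows in this sketch, which is how a single document can produce hundreds of thousands of training windows.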


##### Create the Model and Train It
The model is a stateless RNN made up of GRU cells.
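
For readers unfamiliar with GRU cells, here is a minimal sketch of a single GRU update on scalar values, in plain Python. It is not the code inside `ModelHelper.create_model`, which is not shown here. The parameter names and values are hypothetical, and the update follows the convention used by Keras (`h = z * h_prev + (1 - z) * candidate`). "Stateless" is taken to mean that the hidden state starts from zero for every input window instead of being carried over between batches.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update for a scalar input x and a scalar hidden state h_prev."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * candidate


def run_stateless(sequence, p):
    """Process one window, always starting from a zero hidden state."""
    h = 0.0
    for x in sequence:
        h = gru_step(x, h, p)
    return h


# Hypothetical parameter values, chosen only for illustration.
params = {"wz": 0.5, "uz": -0.3, "bz": 0.0,
          "wr": 0.8, "ur": 0.1, "br": 0.0,
          "wh": 1.2, "uh": 0.7, "bh": 0.0}

h_a = run_stateless([0.1, 0.4, -0.2], params)
h_b = run_stateless([0.1, 0.4, -0.2], params)
print(h_a == h_b)  # the same window gives the same result: no state is carried over
```

A real GRU layer applies the same update with weight matrices and vector-valued states, and learns the parameters during training.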

In [6]:
# Create the modelhelper object
mh = ModelHelper(epochs=20)

# Create the model
model = mh.create_model(tokenizer)

# Compile the model
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')

# Save a checkpoint after every epoch
EPOCHS = 20
checkpoint_filepath = 'checkpoints/weights.{epoch:02d}.hdf5'
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    save_freq='epoch',
    monitor='val_loss',   # not used while save_best_only=False
    mode='min',           # not used while save_best_only=False
    save_best_only=False)

# Fit the model
history = model.fit(dataset,epochs=EPOCHS,callbacks=[model_checkpoint])


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


##### Start Training Model from Saved Checkpoint
Training the model takes a long time. Because a checkpoint is saved after every epoch, the code below lets you resume training from any saved checkpoint.

In [None]:
# Restart training from a saved checkpoint
new_model = tf.keras.models.load_model('checkpoints/weights.09.hdf5')

# Save a checkpoint after every epoch
EPOCHS = 1
checkpoint_filepath = 'checkpoints/weights.{epoch:02d}.hdf5'
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    save_freq='epoch',
    monitor='val_loss',
    mode='min',
    save_best_only=False)

# Fit the model
history = new_model.fit(dataset,epochs=EPOCHS,callbacks=[model_checkpoint])
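
One thing to watch when resuming: Keras fills the `{epoch:02d}` placeholder with a 1-based epoch number, and `fit` starts counting from its `initial_epoch` argument (default 0). The plain-Python sketch below mimics that naming so the effect is visible without training anything. `checkpoint_names` is a hypothetical helper, not part of Keras or of `scripts/helpers`.

```python
def checkpoint_names(template, initial_epoch, epochs):
    """List the files a per-epoch checkpoint callback would write.

    Mimics training epochs initial_epoch..epochs-1 (0-based) and
    formatting the file name with the 1-based epoch number.
    """
    return [template.format(epoch=e + 1) for e in range(initial_epoch, epochs)]


template = 'checkpoints/weights.{epoch:02d}.hdf5'

# Resuming as a fresh one-epoch run reuses the name of the first checkpoint.
print(checkpoint_names(template, initial_epoch=0, epochs=1))

# Continuing the numbering after nine completed epochs.
print(checkpoint_names(template, initial_epoch=9, epochs=10))
```

Under these assumptions, passing `initial_epoch=9, epochs=10` to `fit` after loading `weights.09.hdf5` would train one more epoch and save it as `weights.10.hdf5`, instead of overwriting `weights.01.hdf5`.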


In [None]:
# Save model
model.save('model.h5')

#### Load a saved model and generate a Tweet

In [120]:
# Create a datahelper object and designate the input file
dh = DataHelper(file_name='tweets_12-29-2020.csv')

# Prep the raw data to create the tweet file
dh.prep_raw_data(start_date='2020-06-01', end_date='2020-12-29')

# Print the number of tweets
print('The number of Tweets sent by Trump during the period is {}'.format(dh.num_tweets))

Data processing complete.
The number of Tweets sent by Trump during the period is 3888


In [121]:
# Re-create the tokenizer and the dataset
dataset, tok = dh.create_tokenizer('inputdata/clean_tweet.txt')
print('The number of unique characters is {0:,} and the dataset size is {1:,} document(s).' \
      ' The number of windows in the dataset for processing is {2:,}.'.format(dh.num_unique_chars,dh.dataset_size,dh.num_data_windows))

# Restore the saved model for inference
mh = ModelHelper(epochs=1, tokenizer=tok)
new_model = mh.restore_model('weights.50.hdf5')

Dataset and tokenizer creation complete.
The number of unique characters is 101 and the dataset size is 1 document(s). The number of windows in the dataset for processing is 481,083.


##### Generate some tweets

In [94]:
# Create a long sequence of text
print(mh.create_tweet(text='Mo Brooks: House Members Joined Trump, Meadows in Electoral Vote Challenge Call', model=new_model, n_chars=140, temperature=0.02))

Mo Brooks: House Members Joined Trump, Meadows in Electoral Vote Challenge Call (in poll) thing to comport many people $2000, rather than the measly $600 that is now in the bill. also, stop the billions of dollars in “p
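
The `temperature` argument controls how adventurous the sampling is. The implementation of `create_tweet` is not shown, so the following is only a sketch of the usual technique: divide the log-probabilities by the temperature, renormalize with a softmax, and sample from the result. The function names and the example probabilities are hypothetical, and the sketch assumes every probability is strictly positive.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability distribution; a low temperature sharpens it."""
    logits = [math.log(p) / temperature for p in probs]
    top = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(probs, temperature, rng=random):
    """Draw one index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(range(len(probs)), weights=adjusted, k=1)[0]


probs = [0.5, 0.3, 0.2]                     # made-up next-character probabilities
print(apply_temperature(probs, 1.0))        # unchanged, up to rounding
print(apply_temperature(probs, 0.02))       # nearly all mass on the first entry
print(sample_with_temperature(probs, 0.02, random.Random(0)))
```

At `temperature=0.02`, the value used in this notebook, sampling of this kind almost always picks the most likely character, which may explain why the generated text reads close to the training tweets.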


#### Generate Tweets Based on Current News Headlines

In [124]:
# Get some headlines
th = TweetHelper()
headlines = th.get_headlines(news_source='nytimes.com', num_headlines=3)
print(headlines)


['Ben Sasse Slams Republican Effort to Challenge Election', 'Why Coronavirus Vaccine Distribution is Taking Longer Than Expected', 'Photos From Fashion’s Uncertain Year']


In [128]:
# Create some Tweets
for headline in headlines:
    tweet = mh.create_tweet(text=headline, model=new_model, n_chars=140, temperature=0.02)
    tweet = tweet[len(headline):]
    print("Headline: " + headline)
    print("Trump Tweet: " + tweet)
    print("-"*25)


Headline: Ben Sasse Slams Republican Effort to Challenge Election
Trump Tweet: . now they (almost all) sit back and watch me fight against a crooked and vicious foe, the radical left democrats. i will never forget!"
"me
-------------------------
Headline: Why Coronavirus Vaccine Distribution is Taking Longer Than Expected
Trump Tweet: . this was the most corrupt election in the history of our country, and it must be closely examined!"




"rush is the greatest! 

"more tha
-------------------------
Headline: Photos From Fashion’s Uncertain Year
Trump Tweet: s in the approval of numerous great new vaccines, it is still a big, old, slow turtle. get the dam vaccines out now, dr. hahn @stevefda. sto
-------------------------


