
<div align="left">
<img src="./assets/trump-tweet-sketch.jpg">
</div>
<br>
<br>

> **"Now they (almost all) sit back and watch me fight against a crooked and vicious foe, the radical left democrats. i will never forget!"**<br>
> *-TrumpTweet Bot*

### TrumpTweet
TrumpTweet is an exploratory project in artificial text generation. Irrespective of one's politics, America's president is an avid and provocative user of Twitter, having Tweeted more than 50,000 times as of December 2020! Applying artificial intelligence to model Trump's Tweets offers some insight into both Trump's topics of concern and use of language, and into how an AI system learns them in order to generate novel Tweets. In this project, a neural network learns from Trump's past Tweets and then generates novel Tweets in response to current headlines from the news source of your choice.

For example, in response to the New York Times headline of January 4, 2021, "Ben Sasse Slams Republican Effort to Challenge Election", the AI Trump responded with the Tweet "Now they (almost all) sit back and watch me fight against a crooked and vicious foe, the radical left democrats. i will never forget!".

As another example, Breitbart News ran the headline "Georgia Election ‘Is All About Divided Government’" to which the artificial Trump responded "The presidency of the usa’s 2020 election. ours, with its millions and millions of corrupt mail-in ballots"

#### TrumpTweet Architecture
The TrumpTweet project ingests Tweets from TrumpTwitterArchive.com using both the historical archive and the latest 1000 Tweets from Trump. It then uses TensorFlow and Keras to tokenize the Tweets and train a GRU RNN model. A simple set of functions based on Aurélien Géron's novel text generation example is used to create new Tweets. Finally, Python's urllib library and BeautifulSoup are used to scrape Google News for headlines from a specified news source. The headlines are then fed to the Tweet generation routine to create novel Tweets in response to the headlines.

This project is not intended as a live web app, though it could certainly be made so if someone wanted to support the cost of the infrastructure. Instead it uses a JupyterLab notebook to orchestrate data ingestion and prep, model training, web scraping, and Tweet creation. The notebook is supported by a Python script with several helper functions for data prep, modeling, and Tweeting.

#### TrumpTweet Limitations and Possible Enhancements
Ideally, the TrumpTweet project would be used live to monitor both Trump's current Tweets and current headlines, and to generate an ongoing stream of Tweets in response. However, this would entail infrastructure costs. Training the GRU model requires GPU hardware to complete in a reasonable amount of time. This could be done for free on Google Colab by someone willing to rework the code for a Colab notebook/py script combination, or it could be done on an Azure or AWS machine. The model included took 33 hours to train on Microsoft's most minimal GPU machine with TensorFlow/cuDNN support at a cost of $1.31 per hour, or roughly $43 in total. It should be noted that the model was relatively simple and was only trained for 50 epochs. Due to cost, no effort was made to optimize the model complexity or training depth for more robust responses and realistic Tweets.

Some thoughts, therefore, about possible enhancements to the project:

- **Model Optimization:** Experimentation with the model may yield more realistic results. The model was trained using only the prior 6 months of Tweets, so additional training data, a more complex/deeper model, and additional epochs could all be tried in order to improve results. A stateful RNN could be used as well.
- **Online Model:** The current architecture does support the ingestion of new data from a secondary API of recent Trump Tweets found at TrumpTwitterArchive.com, and it does support updating the model and restarting training from a saved model. However, there is no facility for automating this. An improvement would be to create an automated job, such as an Azure function, to monitor either Twitter directly or the TrumpTwitterArchive and then update the model on a regular basis to include new Tweets.
- **Web App:** Ideally the TrumpTweet project would be published as a web app which would both monitor news feeds and create Tweets in response to headlines, or allow a user to enter their own "headline" to which Trump would respond. I had also considered creating a companion BidenTweet model and then having the two of them generate novel Tweets in response to one another, effectively having a "debate" over Twitter.
- **Tweet Trimming:** The current implementation simply generates the next 140 characters. This sometimes cuts off a word, so a feature could be created to trim the output back to the last complete word in order to end the Tweet cleanly.
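
The Tweet Trimming idea above could be sketched roughly as follows. This is only an illustration and not part of the project's helper script; the function name `trim_to_word` and its `max_chars` parameter are hypothetical.

```python
def trim_to_word(text: str, max_chars: int = 140) -> str:
    """Cut text to at most max_chars characters without ending mid-word."""
    if len(text) <= max_chars:
        return text.rstrip()
    clipped = text[:max_chars]
    # If the first dropped character is whitespace, the last kept word is complete.
    if text[max_chars].isspace():
        return clipped.rstrip()
    # Otherwise drop the partial trailing word, provided there is an earlier break.
    parts = clipped.rsplit(None, 1)
    return parts[0].rstrip() if len(parts) == 2 else clipped
```

A single unbroken run of characters longer than `max_chars` is returned clipped as-is, since there is no word boundary to fall back to.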



#### Data Prep
Run the following cells to prep the environment. The first cell merely disables the warnings that arise because the model is set up for two different environments (CPU and GPU). The second cell is critical, as it imports all of the supporting scripts that do the work. Finally, the third cell runs a quick check to see whether any GPUs are available. If not, an empty list is returned.

In [59]:
# Disable warnings just to keep the screen clean
import warnings
warnings.filterwarnings('ignore')

In [136]:
# Run the helpers scripts with the data and model helper objects
%run scripts/helpers


In [3]:
# Verify GPU support or not
tf.config.list_physical_devices('GPU')


[]

##### Create a datahelper and process raw data
Run the following cell to update the designated archive file with the most recent 1000 Trump Tweets.

In [138]:
# Update the archive file if desired. This will retrieve the latest 1000 Tweets and append any not already present to the archive file.
# It will then create a new archive file with current date in the name
dh = DataHelper(file_name='tweets_12-29-2020.csv')  # Current archive. This is used to establish the DH object
dh.update_archive_file(current_archive_file='tweets_01-04-2021.csv')  # File that new tweets will be added to when creating new archive


Archive file./inputdata/tweets_01-05-2021.csv has been created


Once the archive file is created, designate the time range on which you want to train the model. Note that if you already created the dh object in the cell above, you will want to update the first line with the new archive file name. This code preps the data and creates a clean file with URLs removed from the Tweets, since we don't want to train on the URLs.

In [65]:
# Create a datahelper object and designate the input file
dh = DataHelper(file_name='tweets_12-29-2020.csv')

# Prep the raw data to create the tweet file
dh.prep_raw_data(start_date='2020-06-01', end_date='2020-12-29')

# Print the number of tweets in the time range
print('The number of Tweets sent by Trump during the period is {}'.format(dh.num_tweets))


Data processing complete.
The number of Tweets sent by Trump during the period is 3888
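
The URL removal described above might look something like the sketch below. The actual logic lives in the helper script and is not shown here, so the pattern and the function name `strip_urls` are assumptions made purely for illustration.

```python
import re

# Matches http(s) links up to the next whitespace; an illustrative pattern only.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(tweet: str) -> str:
    """Remove links and collapse the runs of spaces they leave behind."""
    without_urls = URL_PATTERN.sub("", tweet)
    return re.sub(r"[ \t]{2,}", " ", without_urls).strip()
```

Newlines are left untouched here so that any line structure inside a Tweet survives the cleanup.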


##### Tokenize the text and create the dataset for model training
In order to do the text analysis, you'll need to tokenize the text. The following cell creates a Keras tokenizer object. Note that some of the tokenizer's defaults have been changed to better handle Tweets. For the specifics of the tokenizer, look at the Python helper script.

In [43]:
# Tokenize the text and create the dataset
dataset, tokenizer = dh.create_tokenizer('inputdata/clean_tweet.txt')
print('The number of unique characters is {0:,} and the dataset size is {1:,} document(s).' \
      ' The number of windows in the dataset for processing is {2:,}.'.format(dh.num_unique_chars,dh.dataset_size,dh.num_data_windows))


Dataset and tokenizer creation complete.
The number of unique characters is 101 and the dataset size is 1 document(s). The number of windows in the dataset for processing is 481,083.
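
To make the "unique characters" and "windows" figures more concrete, here is a plain-Python sketch of character-level encoding and sliding windows. It does not reproduce the Keras tokenizer or the `tf.data` pipeline used by the helper script; the function name, the `n_steps` parameter, and the zero-based IDs are all assumptions for illustration.

```python
def make_char_windows(text: str, n_steps: int):
    """Return (char_to_id, windows) for character-level next-character training."""
    # Assign each distinct character an integer ID (sorted for reproducibility).
    char_to_id = {ch: i for i, ch in enumerate(sorted(set(text)))}
    encoded = [char_to_id[ch] for ch in text]

    windows = []
    # Each window holds n_steps inputs plus one extra character for the shifted target.
    for start in range(len(encoded) - n_steps):
        chunk = encoded[start:start + n_steps + 1]
        windows.append((chunk[:-1], chunk[1:]))
    return char_to_id, windows


char_to_id, windows = make_char_windows("fight on!", n_steps=4)
```

Each pair is an input sequence and the same sequence shifted one character ahead, which is what lets the network learn to predict the next character at every position.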


##### Create the Model and Train It
Create the model and train it. Depending on your hardware and how much data you include, this could take a significant amount of time. On my CPU machine, training on 3 months of Tweets would take more than a month. The GPU machine cut this to a little more than a day.

Note that you can adjust some options here like the number of epochs. Also, the model will be saved as a full checkpoint after each epoch in case training gets interrupted.

In [None]:
# Create the modelhelper object
mh = ModelHelper(epochs=20)

# Create the model
model = mh.create_model(tokenizer)

# Compile the model
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')

# Save a checkpoint after every epoch
EPOCHS = 20
checkpoint_filepath = 'checkpoints/weights.{epoch:02d}.hdf5'
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    save_freq='epoch',
    monitor='val_loss',
    mode='min',
    save_best_only=False)

# Fit the model
history = model.fit(dataset,epochs=EPOCHS,callbacks=[model_checkpoint])


##### Start Training Model from Saved Checkpoint
Training the model takes quite some time. A checkpoint is saved after every epoch, so the code below can be used to resume training from a saved checkpoint if training is interrupted.

In [None]:
# Restart training from a saved checkpoint
new_model = load_model('checkpoints/weights.09.hdf5')

# Save a checkpoint after every epoch
EPOCHS = 1
checkpoint_filepath = 'checkpoints/weights.{epoch:02d}.hdf5'
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=False,
    save_freq='epoch',
    monitor='val_loss',
    mode='min',
    save_best_only=False)

# Fit the model
history = new_model.fit(dataset,epochs=EPOCHS,callbacks=[model_checkpoint])
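
One thing to watch when resuming: with `EPOCHS = 1` and no starting epoch given, Keras numbers the resumed epoch from 1, so the new checkpoint would appear to be written to `weights.01.hdf5` rather than continuing after `weights.09.hdf5`. A possible workaround, sketched under that assumption, is to read the epoch number back out of the checkpoint file name and pass it to `fit` as `initial_epoch`. The helper `epoch_from_checkpoint` and the `extra_epochs` variable are hypothetical.

```python
import re


def epoch_from_checkpoint(path: str) -> int:
    """Extract the epoch number from a name like 'checkpoints/weights.09.hdf5'."""
    match = re.search(r"weights\.(\d+)\.hdf5$", path)
    if match is None:
        raise ValueError(f"Unrecognised checkpoint name: {path}")
    return int(match.group(1))


completed = epoch_from_checkpoint("checkpoints/weights.09.hdf5")
extra_epochs = 1  # hypothetical: how many more epochs to train

# These values could then be passed to Keras as, for example:
#   new_model.fit(dataset, initial_epoch=completed,
#                 epochs=completed + extra_epochs, callbacks=[model_checkpoint])
```

Note that in Keras the `epochs` argument is the index of the final epoch, not the number of additional epochs, which is why the two values are added together.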


#### Load a saved model and generate a Tweet
Run the code below to reload both the data and a saved model in order to generate some Tweets.

In [120]:
# Create a datahelper object and designate the input file
dh = DataHelper(file_name='tweets_12-29-2020.csv')

# Prep the raw data to create the tweet file
dh.prep_raw_data(start_date='2020-06-01', end_date='2020-12-29')

# Print the number of tweets
print('The number of Tweets sent by Trump during the period is {}'.format(dh.num_tweets))

Data processing complete.
The number of Tweets sent by Trump during the period is 3888


In [121]:
# re-create the tokenizer
# Tokenize the text and create the dataset
dataset, tok = dh.create_tokenizer('inputdata/clean_tweet.txt')
print('The number of unique characters is {0:,} and the dataset size is {1:,} document(s).' \
      ' The number of windows in the dataset for processing is {2:,}.'.format(dh.num_unique_chars,dh.dataset_size,dh.num_data_windows))

# restore saved model for inferencing
mh = ModelHelper(epochs=1, tokenizer=tok)
new_model = mh.restore_model('weights.50.hdf5')

Dataset and tokenizer creation complete.
The number of unique characters is 101 and the dataset size is 1 document(s). The number of windows in the dataset for processing is 481,083.


##### Generate some tweets
The cell below generates a Tweet that starts from the seed string passed as the 'text' argument.

In [139]:
# Generate 140 characters of text starting from the seed 'Kamala'
print(mh.create_tweet(text='Kamala', model=new_model, n_chars=140, temperature=0.02))

Kamaland and for the presidency, like the democrats would do if they had actually won. the proof is irrefutable! massive late night mail-in ballot
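
The `temperature` argument controls how adventurous the character sampling is. The sketch below shows the general idea in plain Python, assuming the helper follows the usual approach of dividing the model's scores by the temperature before sampling; the helper's actual implementation is not shown in this document, and the function names and scores here are made up.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng=random):
    """Draw one index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate characters
cold = softmax_with_temperature(logits, 0.02)  # nearly all mass on index 0
warm = softmax_with_temperature(logits, 1.0)   # mass spread across all three
```

At a temperature as low as 0.02 the most likely character is chosen almost every time, which keeps the output readable but repetitive; values closer to 1 give more varied and more error-prone text.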


#### Generate Tweets Based on Current News Headlines
If you'd like to see how the artificial Trump would respond to current news headlines, you can run the two cells below. The first cell scrapes some headlines from Google News for the news source entered and returns them as a list. The second cell loops over the list and calls the Tweet generator once for each headline.

In [140]:
# Get some headlines
th = TweetHelper()
headlines = th.get_headlines(news_source='nytimes.com', num_headlines=5)
print(headlines)


['Citrus and Persimmon Salad Recipe - NYT Cooking', 'What Scientists Know About How the Coronavirus Variant Spreads', 'Stimulus Money Should Have Gone to the Jobless, Economists Say', 'Becky Hammon Becomes First Woman to Serve as Head Coach in N.B.A. Game', 'Pope Francis to Skip New Year Services Because of Leg Condition']
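
For readers who would rather not scrape HTML, headlines can also be pulled from an RSS feed using only the standard library. This is a different technique from the urllib and BeautifulSoup approach used by `get_headlines`, and the sketch assumes you already have the feed's XML as a string; the function name and the sample feed are placeholders, not real headlines.

```python
import xml.etree.ElementTree as ET


def headlines_from_rss(rss_xml: str, num_headlines: int = 5):
    """Return up to num_headlines item titles from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    titles = []
    # RSS 2.0 nests each story as channel/item with a <title> child.
    for item in root.iterfind("./channel/item"):
        if len(titles) >= num_headlines:
            break
        title = item.findtext("title")
        if title:
            titles.append(title.strip())
    return titles


# Tiny hand-written feed with placeholder titles, used only to exercise the parser.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>Placeholder headline one</title></item>
  <item><title>Placeholder headline two</title></item>
  <item><title>Placeholder headline three</title></item>
</channel></rss>"""

sample_headlines = headlines_from_rss(SAMPLE_FEED, num_headlines=2)
```

The resulting list has the same shape as the output of `get_headlines`, so it could be fed to the Tweet generation loop in the same way.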


In [141]:
# Create some Tweets
for headline in headlines:
    tweet = mh.create_tweet(text=headline, model=new_model, n_chars=140, temperature=0.02)
    tweet = tweet[len(headline):]
    print("Headline: " + headline)
    print("Trump Tweet: " + tweet)
    print("-"*25)


Headline: Citrus and Persimmon Salad Recipe - NYT Cooking
Trump Tweet:  at the many reasons it was brought. a rigged election, fight on!"
"so, you’re the president of the united states

full video:  
"they are s
-------------------------
Headline: What Scientists Know About How the Coronavirus Variant Spreads
Trump Tweet:  and allow for removal of military from far away, and very unappreciative, lands. thank you! 
"just released data shows many thousands of no
-------------------------
Headline: Stimulus Money Should Have Gone to the Jobless, Economists Say
Trump Tweet:  to “people where are the proof of the people, florida &amp; others are open &amp; doing well. common sense please!"
"time for republican se
-------------------------
Headline: Becky Hammon Becomes First Woman to Serve as Head Coach in N.B.A. Game
Trump Tweet:  fight for it. don’t let them take it away!"
"oh, they do this to me every day. when will they apologize? 
"governor @briankempga and his pu
---------------------