# Twitter Bot with GPT-2

  [Twitter Account link](https://twitter.com/ai_telling) <br>
[Twitter Developer link](https://developer.twitter.com/en/apps/16374336)

## Background
In this Jupyter notebook you can play around with the small and medium versions (117M/345M) of **OpenAI's GPT-2** model from the paper *[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).*

According to the authors, GPT-2 was trained on the task of *language modeling* (predicting the next word in a given piece of text) by ingesting huge numbers of articles, blogs, and websites. Using just this data, it achieved state-of-the-art scores on a number of language tests it was never explicitly trained for, an achievement known as *zero-shot learning.* It can also perform other writing-related tasks, like translating text from one language to another, summarizing long articles, and answering trivia questions.

OpenAI decided not to release the dataset, training code, or the full GPT-2 model weights because of concerns about large language models being used to generate deceptive, biased, or abusive language at scale. Some examples of malicious applications of these models are:
* Generate misleading news articles
* Impersonate others online
* Automate the production of abusive or faked content to post on social media
* Automate the production of spam/phishing content

As one can imagine, this, combined with recent advances in the generation of synthetic imagery, audio, and video, implies that it has never been easier to create fake content and spread disinformation at scale. The public at large will need to become more skeptical of the content they consume online.

----

**PRs are welcome!**


----


## Steps
Before starting, it is recommended to set *Runtime Type* to *GPU* in the top menu bar.


### 1. Installation
Download the model data and install the Python libraries:


In [0]:
!git clone https://github.com/ShaneZhong/gpt-2-twitter-bot/
import os
os.chdir('gpt-2-twitter-bot')
!python download_model.py 117M
!python download_model.py 345M
!pip3 install -r requirements.txt

Cloning into 'gpt-2-twitter-bot'...
remote: Enumerating objects: 27, done.

### 2. Test with the interactive model

There are a few flags available, each with a default value:
- `seed = None`  ||  a random seed is generated unless specified. Set a specific integer if you want to reproduce the same results later.
- `nsamples = 1`  ||  the number of samples to print.
- `length = None`  ||  the number of tokens to generate per sample (a token is a word or a piece of a word). If `None`, the length is determined by the model hyperparameters.
- `batch_size = 1`  ||  how many samples to process simultaneously. It only affects speed and memory use, not the results, and must divide `nsamples`.
- `temperature = 1`  ||  scales the logits before the softmax. Lower values give less random completions; higher values give more random ones.
- `top_k = 0`  ||  restricts sampling to the `k` tokens with the highest logits. `1` means only one token is considered at each step.
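
To make `temperature` and `top_k` more concrete, here is a minimal, standard-library sketch of how the two flags could shape the choice of the next token. It is not the repository's implementation: the function name `sample_next_token` and the toy logits are made up for illustration, and it assumes that `top_k = 0` means no truncation is applied.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, rng=random):
    """Pick one token index from raw logits (illustrative sketch only)."""
    # Temperature divides the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    if top_k > 0:
        # Keep only the top_k largest logits; the rest get probability zero.
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Softmax weights, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy logits for a four-token vocabulary (hypothetical values).
toy_logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(toy_logits, temperature=0.8, top_k=2))
```

With `top_k=1` the sketch always returns the index of the largest logit, which mirrors the deterministic, repetitive behaviour described in the help text below.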


In [0]:
!python3 src/interactive_conditional_samples.py -- --help

Type:        function
String form: <function interact_model at 0x7fca17d17d08>
File:        /content/gpt-2-twitter-bot/gpt-2-twitter-bot/src/interactive_conditional_samples.py
Line:        11
Docstring:   Interactively run the model
:model_name=117M : String, which model to use
:seed=None : Integer seed for random number generators, fix seed to reproduce
 results
:nsamples=1 : Number of samples to return total
:batch_size=1 : Number of batches (only affects speed/memory).  Must divide nsamples.
:length=None : Number of tokens in generated text, if None (default), is
 determined by model hyperparameters
:temperature=1 : Float value controlling randomness in boltzmann
 distribution. Lower temperature results in less random completions. As the
 temperature approaches zero, the model will become deterministic and
 repetitive. Higher temperature results in more random completions.
:top_k=0 : Integer value controlling diversity. 1 means only 1 word is
 considered for each step (token), resul

### 3. Load Twitter Credentials

In [0]:
# Enter your Twitter API keys and access tokens here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
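
If you would rather not paste secrets into a notebook cell, one option is to read them from environment variables. The sketch below is only a suggestion: the function `load_credentials` and the `TWITTER_*` variable names are hypothetical and not part of this repository.

```python
import os

# Hypothetical environment variable names; use whatever names you export.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or fail loudly if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

After exporting the variables, `creds = load_credentials()` followed by `consumer_key = creds["consumer_key"]` (and likewise for the other three) would fill in the same names used by the cells in this notebook.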

### 4. Twitter API connection

In [0]:
import tweepy

# Authenticate with the credentials entered above
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# verify_credentials() returns the authenticated user (api.me() was removed in Tweepy 4)
user = api.verify_credentials()
print("This is my user name: " + user.name)

This is my user name: AI Story Telling


### Using the GPT-2 model to create tweets

In [0]:
# Copy all the files from src to pwd
!cp -a ./src/. .

In [0]:
# Load the interact model
from interactive_conditional_colab import interact_model

### Provide input text
To be automated later, based on trending topics.

In [0]:
# set up input
input_text = "Once upon a time,"
end_text = '#TweetCreatedByAI'

In [0]:
import re
import numpy as np

# Run the model
interact_model(input_text=input_text,
               temperature=0.8)

# Clean the output
with open("GPT-2_output.txt", "r") as f:
  context = f.read()

tweet_raw = context.split("=========Start==========")[0]
# Split into sentences on "! " or "."
tweet_split = re.split(r'! |\.', tweet_raw)
# +1 approximates the separator removed by the split
tweet_split_len = [len(sentence) + 1 for sentence in tweet_split]
tweet_split_len_cum = np.cumsum(tweet_split_len)

# The maximum tweet length is 280 characters
tweet_len = 0  # initialise
tweet_len_limit = 280 - len(end_text) - 4
for length in tweet_split_len_cum:
  if length <= tweet_len_limit:
    # This many characters still fit, so keep extending the tweet
    tweet_len = length
  else:
    print("Maximum tweet length: " + str(tweet_len))
    break

# The first sentence is the prompt; the rest is the model's continuation
gpt2_input = tweet_raw[:tweet_split_len_cum[0]] + "\n"
gpt2_output = tweet_raw[tweet_split_len_cum[0]+1:tweet_len] + "\n"

tweet_final = gpt2_input + gpt2_output + end_text
print("="*30)
print(tweet_final)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/345M/model.ckpt
Once upon a time,
 the energy at the heart of modern democracy was embodied in the young boys who had assembled at the nation's capital to hear FDR deliver the Great Society speech in 1981. The Age of Reagan, for instance, should have included an account of the Clinton administration's role in running down the economy and privatizing Social Security. There was simply no reason to include this so-called "tough on crime" agenda in the Clinton administration's history, which Bill Clinton later tried to paint as a paragon of anti-crime initiatives, and which ended up leading to the deaths of countless innocent people.

But we have come a long way in as a nation 
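
The trimming step in the cell above can also be written as a small helper that keeps only whole sentences. This is a sketch of an alternative, not the notebook's own method: the function name `trim_to_tweet` is hypothetical, it splits on `.`, `!`, or `?` followed by whitespace, and it assumes the 280-character limit applies to the plain character count.

```python
import re


def trim_to_tweet(text, suffix="", limit=280):
    """Keep leading whole sentences of text so that text + suffix fits in limit."""
    # Reserve room for the suffix and the newline that separates it from the text.
    budget = limit - len(suffix) - (1 if suffix else 0)
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = ""
    for sentence in sentences:
        candidate = (kept + " " + sentence).strip()
        if len(candidate) > budget:
            break  # adding this sentence would exceed the limit
        kept = candidate
    # If even the first sentence is too long, kept stays empty.
    return kept + "\n" + suffix if suffix else kept


sample = "Once upon a time. " + "The model kept writing! " * 30
print(trim_to_tweet(sample, suffix="#TweetCreatedByAI"))
```

Because the text is cut only at sentence boundaries, the result can be noticeably shorter than the limit when the next sentence is long.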

In [0]:
# Post the above tweet online
api.update_status(tweet_final)

Status(_api=<tweepy.api.API object at 0x7f32fa96ccc0>, _json={'created_at': 'Sun Jun 09 08:48:04 +0000 2019', 'id': 1137642505737056259, 'id_str': '1137642505737056259', 'text': 'Once upon a time, the energy at the heart of modern democracy was embodied in the young boys who had assembled at t… https://t.co/G5M1Or5ZyI', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/G5M1Or5ZyI', 'expanded_url': 'https://twitter.com/i/web/status/1137642505737056259', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, 'source': '<a href="https://github.com/openai/gpt-2" rel="nofollow">AI Story Telling</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1128859106607976448, 'id_str': '1128859106607976448', 'name': 'AI Story Telling', 'screen_name': 'ai_telling', 'location': 'Sydney, New Sou

--------------
# END OF THE SCRIPT