
GPT-2 Twitter Kubernetes


A FastAPI app and Python scripts designed to run a GPT-2-powered Twitter bot on a schedule, using a Kubernetes deployment, CronJobs, and MongoDB.

  • OpenAI's state-of-the-art GPT-2 language model is fine-tuned on a user's tweets.

  • Because AI-generated text is still unpredictable, tweets are generated in batches and curated by a human.

  • A centralized MongoDB database hosted on Kubernetes houses the curated tweets.

  • FastAPI endpoints are used to populate the database with newly generated tweets.

  • The endpoints and Kubernetes CronJobs tweet out the generated tweets from a specified user account using Tweepy.

Installation

git clone git@github.com:AaronGrainer/gpt2-twitter-kubernetes.git

conda create -n [ENV_NAME] python=3.8

conda activate [ENV_NAME]

pip install -r requirements.txt

Downloading User Tweets

Twitter's API currently limits users to retrieving only the latest 3,200 tweets from a given account, which is not nearly enough data for training. The Python package twint is therefore used to bypass this limitation.

To download the tweets of any given user, call download_tweets.py. For example:

python -m src.scripts.download_tweets --username=karpathy
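Under the hood, the script can be as simple as a twint configuration; a minimal sketch (assuming twint's standard Config API — the project's actual script may differ):

import twint

# Scrape all tweets from a single user, bypassing the 3,200-tweet API limit.
# Username and output path are illustrative, not the project's actual values.
c = twint.Config()
c.Username = "karpathy"
c.Output = "data/karpathy_tweets.txt"
c.Hide_output = True  # don't echo every scraped tweet to stdout
twint.run.Search(c)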

Training GPT-2 on User Tweets

Given the downloaded user tweets, the Hugging Face Trainer API is used to fine-tune a GPT-2 model to generate tweets in the style of that user.

Train the model on the downloaded tweets by calling trainer.py; the provided filename should match the downloaded tweets file. For example:

python -m src.ml.trainer --dataset=data/karpathy_tweets.txt
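Internally, this follows the standard Hugging Face fine-tuning recipe; a minimal sketch (file paths and hyperparameters here are illustrative assumptions, not the project's actual values):

from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, TextDataset, Trainer,
                          TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Chunk the raw tweets file into fixed-length language-modeling samples
dataset = TextDataset(tokenizer=tokenizer,
                      file_path="data/karpathy_tweets.txt",
                      block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="model",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()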

Training will produce an output tweet file that looks like this:

My poor brain. I'm so tempted to plug this in and see if I can't decipher it as a self-supervised basic arithmetic problem. But no, I can't. It's too painful. Too much loss. Something must give. Please, gods above, let this be true.
====================
The 2D appearance of an image can be deceivingly complex. My head may be incomplete, but I assure you it's not incomplete.
====================
While coding, almost everything I write is saved. I don't want this.
====================
Spending extra time tuning my baselines, thinking about how much incentive I have not to, & how this is the reason we can't have nice things
====================
SpaceX Falcon 9 launch in ~20 minutes! + Another attempt at first stage recovery coming   #soexcite
====================

You can now curate the tweets above, perhaps removing any non-funny ones. Just keep the general format intact.

Environment Variables

  1. Create a new .env file, using the .env_reference file as a reference, and populate it with the desired app settings. This file provides all the environment variables needed to deploy the app (a sketch of how the app might load them is shown below).
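Since the app is built on FastAPI, a natural way to load these variables is pydantic's settings support; a sketch (field names here are assumptions and should mirror your .env file, pydantic v1 style):

from pydantic import BaseSettings

# Field names are illustrative; match them to your .env file
class Settings(BaseSettings):
    consumer_key: str
    consumer_secret: str
    access_key: str
    access_secret: str
    mongo_host: str = "localhost"
    mongo_port: int = 27017

    class Config:
        env_file = ".env"  # load values from the .env file

settings = Settings()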

Setting up Twitter Bot

To populate the .env file and run a Twitter bot, the CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, and ACCESS_SECRET values are required.

  1. Create a regular Twitter account and apply for a Twitter developer app.

  2. The four Twitter variables can be obtained from the Twitter developer app page.
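With the four credentials in place, posting a tweet through Tweepy looks roughly like this (a sketch using Tweepy's OAuth 1.0a flow; reading the values from environment variables is an assumption):

import os
import tweepy

# Authenticate with the four credentials from the developer app page
auth = tweepy.OAuthHandler(os.environ["CONSUMER_KEY"],
                           os.environ["CONSUMER_SECRET"])
auth.set_access_token(os.environ["ACCESS_KEY"], os.environ["ACCESS_SECRET"])
api = tweepy.API(auth)

# Post a single tweet from the authenticated account
api.update_status("Hello from the GPT-2 Twitter bot!")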

Trying the app out Locally / Minikube / Skaffold

To run the app locally, first deploy the Kubernetes MongoDB instance, then use Skaffold to deploy the app and job.

# Startup local minikube cluster
minikube start
kubectl create namespace gpt2-twitter
kubectl config set-context minikube --namespace=gpt2-twitter
minikube dashboard
# Deploy Kubernetes MongoDB and open a port-forward for development
kubectl apply -f kubernetes/mongo-volume.yaml
kubectl apply -f kubernetes/mongo.yaml

kubectl port-forward svc/mongo 4321:27017

Populating the server

Once the MongoDB service is running, visit [URL]/docs. You will be greeted by FastAPI's automatic interactive documentation, powered by Swagger UI.

  1. Select the /add_tweets_file/ endpoint, enter the x-token, and upload the curated tweets file generated during the model training phase.

  2. You can connect to the port-forwarded MongoDB endpoint (host:port) to verify that the database has been populated with the curated tweets (see the sketch after this list).
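Both steps can also be scripted; a sketch (the file path, token value, upload field name, and database/collection names are all assumptions):

import requests
from pymongo import MongoClient

# Upload the curated tweets file through the FastAPI endpoint
with open("data/karpathy_tweets_curated.txt", "rb") as f:
    resp = requests.post("http://localhost:8000/add_tweets_file/",
                         headers={"x-token": "YOUR_X_TOKEN"},
                         files={"file": f})
print(resp.json())

# Verify through the port-forwarded MongoDB service (localhost:4321)
client = MongoClient("localhost", 4321)
print(client["gpt2_twitter"]["tweets"].count_documents({}))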

Running the tweet job

You can either run the Kubernetes CronJob to schedule routine tweets, or hit the /post_tweet/ API endpoint to post a single tweet.

# Start up either the FastAPI app container
skaffold dev

# or the CronJob container
skaffold dev -f skaffold-post.yaml
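For a one-off tweet without waiting for the CronJob schedule, call the endpoint directly; a sketch (the app URL and token value are assumptions, matching the examples above):

import requests

# Ask the running FastAPI app to post one tweet from the database
resp = requests.post("http://localhost:8000/post_tweet/",
                     headers={"x-token": "YOUR_X_TOKEN"})
print(resp.status_code, resp.text)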
