A FastAPI app and Python scripts designed to run a GPT-2 powered Twitter bot on a schedule using a Kubernetes deployment, CronJobs, and MongoDB.
-
OpenAI's GPT-2 language model is fine-tuned on a user's tweets.
-
Because the output of AI language models is still unpredictable, tweets are generated in batches and curated by a human.
-
A centralized MongoDB database hosted on Kubernetes stores the curated tweets.
-
FastAPI endpoints are used to populate the database with newly generated tweets.
-
The endpoints and Kubernetes CronJobs are used to post the generated tweets from a specified user account using Tweepy.
git clone git@github.com:AaronGrainer/gpt2-twitter-kubernetes.git
conda create -n [ENV_NAME] python=3.8
conda activate [ENV_NAME]
pip install -r requirements.txt
Twitter's API currently limits users to retrieving only the latest 3,200 tweets from a given user, which is not nearly enough input data for training. The Python package twint is therefore used to work around this limit.
To download the tweets of any given user, run download_tweets.py. For example:
python -m src.scripts.download_tweets --username=karpathy
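As a rough illustration of the download step, the sketch below fetches a user's tweets with twint and tidies the text before saving it. It is not the contents of download_tweets.py: the function names, the URL-stripping step, and the use of twint's shared `tweets_list` are assumptions made for this example.

```python
import re
from typing import List, Optional

# Assumption: links add little to a text-only training set, so drop them.
URL_PATTERN = re.compile(r"https?://\S+")


def clean_tweet(text: str) -> str:
    """Remove URLs and collapse runs of whitespace into single spaces."""
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.split())


def fetch_tweets(username: str, limit: Optional[int] = None) -> List[str]:
    """Fetch tweet texts for `username` with twint (hypothetical helper)."""
    import twint  # imported lazily so clean_tweet works without twint installed

    config = twint.Config()
    config.Username = username
    config.Limit = limit
    config.Store_object = True   # keep results in memory
    config.Hide_output = True    # do not print every tweet to stdout
    twint.run.Search(config)
    return [clean_tweet(t.tweet) for t in twint.output.tweets_list]
```

For instance, `clean_tweet("Check this https://t.co/abc   out")` returns `"Check this out"`.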
Given the downloaded tweets, the Hugging Face Trainer API is used to fine-tune a GPT-2 model to generate tweets in the user's tweeting style.
Train the model on the downloaded tweets by running trainer.py. The filename provided must match the downloaded tweets file. For example:
python -m src.ml.trainer --dataset=data/karpathy_tweets.txt
This will train the GPT-2 model and generate an output tweet file that looks like this:
My poor brain. I'm so tempted to plug this in and see if I can't decipher it as a self-supervised basic arithmetic problem. But no, I can't. It's too painful. Too much loss. Something must give. Please, gods above, let this be true.
====================
The 2D appearance of an image can be deceivingly complex. My head may be incomplete, but I assure you it's not incomplete.
====================
While coding, almost everything I write is saved. I don't want this.
====================
Spending extra time tuning my baselines, thinking about how much incentive I have not to, & how this is the reason we can't have nice things
====================
SpaceX Falcon 9 launch in ~20 minutes! + Another attempt at first stage recovery coming #soexcite
====================
You can now curate the tweets above, for example by removing any that are not funny. Please keep the general format intact.
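To see why the format matters, here is a minimal sketch of how a file in the layout above could be split back into individual tweets. The function name is hypothetical and the project's own parsing may differ. The only thing taken from the sample output is the delimiter line of 20 `=` characters.

```python
from typing import List

# The sample output separates tweets with a line of 20 "=" characters.
DELIMITER = "=" * 20


def parse_tweets(raw: str) -> List[str]:
    """Split the generated file contents into a list of non-empty tweets."""
    tweets = []
    for chunk in raw.split(DELIMITER):
        tweet = chunk.strip()
        if tweet:  # skip empty chunks, e.g. after the final delimiter
            tweets.append(tweet)
    return tweets
```

If a delimiter line is deleted during curation, two tweets merge into one entry, which is why the separators should stay in place.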
- Create a new .env file using the .env_reference file as reference. Populate the .env file with the desired app settings. This file serves to provide all the environment variables necessary for deploying the app.
To populate the .env file and run a Twitter bot, the CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY and ACCESS_SECRET values are required.
-
Create a regular Twitter account and apply for a Twitter developer app.
-
The four Twitter variables can be obtained from the Twitter developer app page.
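A small check like the following can catch a missing credential before the bot starts. This is a sketch only: it assumes the four values have already been exported into the process environment (for example from the .env file), and the function name is hypothetical.

```python
import os
from typing import Dict, Mapping

# The four Twitter settings named in this README.
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def load_twitter_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the four credentials, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Twitter settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Passing the mapping in as a parameter keeps the check easy to exercise with a plain dictionary instead of real secrets.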
To run the app locally, first deploy the Kubernetes MongoDB instance, then use Skaffold to deploy the app and job.
# Start up a local minikube cluster
minikube start
kubectl create namespace gpt2-twitter
kubectl config set-context minikube --namespace=gpt2-twitter
minikube dashboard
# Deploy Kubernetes MongoDB and open a port-forward for development
kubectl apply -f kubernetes/mongo-volume.yaml
kubectl apply -f kubernetes/mongo.yaml
kubectl port-forward svc/mongo 4321:27017
Once the MongoDB service is running, visit [URL]/docs. You will be greeted with FastAPI's automatic interactive documentation powered by Swagger UI.
-
Select the /add_tweets_file/ endpoint, enter the x-token, and upload the curated tweets file generated during the model training phase.
-
You can connect to the port-forwarded MongoDB endpoint (host:port) to verify that the database has been populated with the curated tweets.
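One way to do that verification from Python is sketched below with pymongo. The port 4321 comes from the port-forward command earlier in this README. The database name `gpt2_twitter` and collection name `tweets` are placeholders, not the names the app actually uses, so substitute the real ones.

```python
def connect_collection(host="localhost", port=4321,
                       db_name="gpt2_twitter", collection_name="tweets"):
    """Open the port-forwarded MongoDB and return one collection.

    db_name and collection_name are hypothetical placeholders.
    """
    from pymongo import MongoClient  # imported lazily; requires pymongo

    client = MongoClient(host, port)
    return client[db_name][collection_name]


def count_tweets(collection) -> int:
    """Count every document stored in the given collection."""
    return collection.count_documents({})
```

With the port-forward running, `count_tweets(connect_collection())` should report a non-zero number once the upload has succeeded.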
You can either run the Kubernetes CronJob to post tweets on a schedule, or call the API endpoint /post_tweet/ to post a tweet.
# Start either the FastAPI app container
skaffold dev
# or the cronjob container
skaffold dev -f skaffold-post.yaml
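For orientation, posting a single tweet with Tweepy might look roughly like the sketch below. It is an illustration under assumptions rather than the code behind /post_tweet/: the helper names are hypothetical, and it uses Tweepy's OAuth 1 handler together with `API.update_status`.

```python
MAX_TWEET_LENGTH = 280  # Twitter's character limit for a standard tweet


def build_api(credentials):
    """Create a Tweepy API client from the four .env values."""
    import tweepy  # imported lazily; requires the tweepy package

    auth = tweepy.OAuthHandler(credentials["CONSUMER_KEY"],
                               credentials["CONSUMER_SECRET"])
    auth.set_access_token(credentials["ACCESS_KEY"],
                          credentials["ACCESS_SECRET"])
    return tweepy.API(auth)


def post_tweet(text, api):
    """Validate the text, then post it through the given API client."""
    if not text or len(text) > MAX_TWEET_LENGTH:
        raise ValueError("Tweet must be between 1 and 280 characters")
    api.update_status(text)
```

Checking the length before calling the API means an over-long generated tweet fails locally instead of producing an error response from Twitter.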