GenAI with Airflow - RAG + Fine-tuning GPT-3.5 pipeline for content generation

Welcome! This project is a simple but functional blueprint for a RAG + fine-tuning pipeline with Apache Airflow. Fork this project to create your own content generation pipelines!

Tools used:

Apache Airflow, run with the Astro CLI as a local instance in Docker
OpenAI
Weaviate - running as a local instance in Docker
Streamlit - running as a local instance in Docker
LangChain for chunking
tiktoken for token counting
Matplotlib for plotting
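Chunking splits each knowledge-base document into smaller overlapping pieces before they are embedded and stored in Weaviate. The project uses LangChain for this, and LangChain's splitters offer more options than shown here. The plain-Python sketch below only illustrates the basic idea of fixed-size character chunks with overlap. The function name and default sizes are hypothetical, not the values this pipeline uses.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical stand-in for a LangChain text splitter, for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)` yields windows that each share one character with their neighbor, so context at chunk boundaries is not lost entirely.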

How to use this repository

Setting up

Option 1: Use GitHub Codespaces

Run this Airflow project without installing anything locally.

Fork this repository.
Create a new GitHub Codespaces project on your fork. Make sure it uses at least 4 cores!
Inside of Codespaces, copy the .env_example file contents into a new .env file and provide your OpenAI API key in both the OPENAI_API_KEY and AIRFLOW_CONN_WEAVIATE_DEFAULT fields.
Run astro dev start to start up all necessary Airflow components as well as the Streamlit and Weaviate containers. This can take a few minutes.
Once the Airflow project has started, access the Airflow UI by clicking on the Ports tab and opening the forwarded URL for port 8080. The Streamlit app will be available on port 8501.
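Because the OpenAI API key has to appear in two places in .env, it is easy to update one and forget the other. The sketch below is one way to sanity-check the file before running astro dev start. It assumes the .env file uses simple KEY=VALUE lines; it does not assume any particular JSON structure for AIRFLOW_CONN_WEAVIATE_DEFAULT and only checks that the key string occurs somewhere in it. The function names are hypothetical helpers, not part of this repository.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop one layer of surrounding quotes, e.g. KEY='{"a": 1}'
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_openai_key(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    key = env.get("OPENAI_API_KEY", "")
    weaviate_conn = env.get("AIRFLOW_CONN_WEAVIATE_DEFAULT", "")
    if not key:
        problems.append("OPENAI_API_KEY is missing or empty")
    if not weaviate_conn:
        problems.append("AIRFLOW_CONN_WEAVIATE_DEFAULT is missing or empty")
    elif key and key not in weaviate_conn:
        problems.append("OPENAI_API_KEY value not found in AIRFLOW_CONN_WEAVIATE_DEFAULT")
    return problems
```

Usage: `check_openai_key(parse_env_text(Path(".env").read_text()))` (with `from pathlib import Path`) returns an empty list when both fields carry the same key.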

Option 2: Use the Astro CLI

Download the Astro CLI to run Airflow locally in Docker. astro is the only package you will need to install.

Run git clone https://github.com/astronomer/gen-ai-fine-tune-rag-use-case.git on your computer to create a local clone of this repository.
Install the Astro CLI by following the steps in the Astro CLI documentation. The main prerequisite is Docker Desktop/Docker Engine, but no Docker knowledge is needed to run Airflow with the Astro CLI.
Copy the .env_example file contents into a new .env file and provide your OpenAI API key in both the OPENAI_API_KEY and AIRFLOW_CONN_WEAVIATE_DEFAULT fields.
Run astro dev start in your cloned repository.
After your Astro project has started, view the Airflow UI at localhost:8080. The Streamlit app will be available on localhost:8501.
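Since astro dev start can take a few minutes, a small polling helper can tell you when the UIs are actually reachable. This is a standard-library sketch, not part of the project. The function name is hypothetical, and the Airflow `/health` path is an assumption based on the Airflow 2.x webserver; polling the root URL also works because redirects are followed.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not up yet: connection refused, HTTP error status, or timeout
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

Usage: `wait_for_http("http://localhost:8080/health")` for Airflow and `wait_for_http("http://localhost:8501")` for Streamlit. In Codespaces, use the forwarded URLs instead.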

Run the project

Unpause all DAGs, starting from the top, by clicking the toggle on their left-hand side. Once the 📚 Ingest Knowledge Base DAG is unpaused, it will run once, starting the RAG part of the pipeline.
Kick off the fine-tuning part of the pipeline by running the 🚀 0 - Start Fine-Tuning Pipeline DAG manually.
Watch the DAGs run according to their dependencies, which have been set using Datasets. The 🤖 Fine-tune DAG will take approximately 15 minutes to run.
After the last DAG in the pipeline, ✨ Champion vs Challenger, has completed, open the Streamlit app at localhost:8501 (in Codespaces, use the forwarded URL for port 8501).
In the Streamlit app, click Generate post! to generate new content using the fine-tuned model.
Click Generate picture! to get an image generated by DALL-E based on the new content.
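If you prefer scripting the unpause and manual-trigger steps instead of clicking in the UI, the Airflow 2 stable REST API exposes `PATCH /api/v1/dags/{dag_id}` (to change `is_paused`) and `POST /api/v1/dags/{dag_id}/dagRuns` (to create a run). The sketch below only builds those requests with the standard library. Several things are assumptions: that basic auth is enabled for the API in your local project, that the credentials are admin/admin, and the `dag_id` values, which are placeholders. The names shown in the UI (with emoji) are not necessarily the DAG IDs, so look those up in the Airflow UI.

```python
import base64
import json
import urllib.parse
import urllib.request

AIRFLOW_API = "http://localhost:8080/api/v1"  # local Astro project from this README


def _headers(user: str, password: str) -> dict[str, str]:
    # Basic auth credentials are an assumption; adjust to your deployment.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}", "Content-Type": "application/json"}


def unpause_request(dag_id: str, user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a PATCH request that sets is_paused to false for one DAG."""
    url = f"{AIRFLOW_API}/dags/{urllib.parse.quote(dag_id)}?update_mask=is_paused"
    body = json.dumps({"is_paused": False}).encode()
    return urllib.request.Request(url, data=body, headers=_headers(user, password), method="PATCH")


def trigger_request(dag_id: str, user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a POST request that creates a new DAG run (a manual trigger)."""
    url = f"{AIRFLOW_API}/dags/{urllib.parse.quote(dag_id)}/dagRuns"
    body = json.dumps({"conf": {}}).encode()
    return urllib.request.Request(url, data=body, headers=_headers(user, password), method="POST")
```

Usage: `urllib.request.urlopen(unpause_request("your_dag_id"))` sends the request. Keeping request construction separate from sending makes the calls easy to inspect before they touch a running Airflow instance.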

Repository contents

.astro
.devcontainer
api_scripts
dags
include
plugins
tests/dags
.dockerignore
.env_example
.gitignore
Dockerfile
README.md
__init__.py
docker-compose.override.yml
packages.txt
requirements.txt