astronomer/gen-ai-fine-tune-rag-use-case

GenAI with Airflow - RAG + Fine-tuning GPT-3.5 pipeline for content generation

Welcome! This project is a simple but functional blueprint for a RAG + fine-tuning pipeline with Apache Airflow. Fork this project to create your own content generation pipelines!

Tools used:

How to use this repository

Setting up

Option 1: Use GitHub Codespaces

Run this Airflow project without installing anything locally.

  1. Fork this repository.
  2. Create a new GitHub Codespaces project on your fork. Make sure it uses at least 4 cores.
  3. Inside of Codespaces, copy the .env_example file contents into a new .env file and provide your OpenAI API key in both the OPENAI_API_KEY and AIRFLOW_CONN_WEAVIATE_DEFAULT fields.
  4. Run astro dev start to start all necessary Airflow components as well as the Streamlit and Weaviate containers. This can take a few minutes.
  5. Once the Airflow project has started, access the Airflow UI by clicking on the Ports tab and opening the forwarded URL for port 8080. The Streamlit app will be available on port 8501.
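
Step 3 asks for the OpenAI API key in two places, and it is easy to fill in only one of them. The following is a minimal sketch of a check you could run before astro dev start. It assumes the .env file uses plain KEY=value lines. The helper names parse_env and missing_keys are hypothetical and not part of this repository. The check only tests that both fields are non-empty; it does not validate the key or detect placeholder values left over from .env_example.

```python
from pathlib import Path

# The two fields named in the setup steps.
REQUIRED_KEYS = ("OPENAI_API_KEY", "AIRFLOW_CONN_WEAVIATE_DEFAULT")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env_path, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the file."""
    values = parse_env(Path(env_path).read_text(encoding="utf-8"))
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env file found; copy .env_example to .env first.")
    else:
        missing = missing_keys(env_file)
        print("Missing or empty:", missing if missing else "none")
```

The same check applies to Option 2, which uses the same .env file.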

Option 2: Use the Astro CLI

Download the Astro CLI to run Airflow locally in Docker. astro is the only package you will need to install.

  1. Run git clone https://github.com/astronomer/gen-ai-fine-tune-rag-use-case.git on your computer to create a local clone of this repository.
  2. Install the Astro CLI by following the steps in the Astro CLI documentation. The main prerequisite is Docker Desktop or Docker Engine, but no Docker knowledge is needed to run Airflow with the Astro CLI.
  3. Copy the .env_example file contents into a new .env file and provide your OpenAI API key in both the OPENAI_API_KEY and AIRFLOW_CONN_WEAVIATE_DEFAULT fields.
  4. Run astro dev start in your cloned repository.
  5. After your Astro project has started, view the Airflow UI at localhost:8080. The Streamlit app will be available on localhost:8501.
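
Because the containers can take a few minutes to come up, it can help to poll the two local ports rather than refreshing the browser. Below is a small sketch using only the Python standard library. The function names port_is_open and wait_for_ports are hypothetical, and the retry count and delay are arbitrary defaults. An open TCP port only shows that something is listening. It does not guarantee that the Airflow UI or the Streamlit app has finished initializing.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(ports, host="localhost", attempts=30, delay=5.0):
    """Poll until every port accepts connections or attempts run out."""
    remaining = set(ports)
    for _ in range(attempts):
        # Keep only the ports that are still closed.
        remaining = {p for p in remaining if not port_is_open(host, p)}
        if not remaining:
            return True
        time.sleep(delay)
    return False


# Example: wait_for_ports([8080, 8501]) checks the Airflow UI and Streamlit ports.
```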

Run the project

  1. Unpause all DAGs, from top to bottom, by clicking the toggle on their left-hand side. Once the 📚 Ingest Knowledge Base DAG is unpaused, it will run once, starting the RAG part of the pipeline.
  2. Kick off the fine-tuning part of the pipeline by running the 🚀 0 - Start Fine-Tuning Pipeline DAG manually.
  3. Watch the DAGs run according to their dependencies, which have been set using Datasets. The 🤖 Fine-tune DAG will take approximately 15 minutes to run.
  4. After the last DAG in the pipeline, ✨ Champion vs Challenger, has completed, open the Streamlit app at localhost:8501 (or the forwarded port 8501 in Codespaces).
  5. In the Streamlit app, click Generate post! to generate new content using the fine-tuned model.
  6. Click Generate picture! to get an image generated by DALL·E about the new content.
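
To make the Dataset-based dependencies in step 3 more concrete, here is a toy model in plain Python. It is not Airflow code and not this repository's actual wiring. The DAG identifiers loosely follow the DAG names above, while the dataset names and the edges between DAGs are assumptions made purely for illustration. The sketch models one rule of data-aware scheduling: a DAG scheduled on a set of Datasets runs once every one of them has been updated since its last run.

```python
from collections import deque

# Hypothetical wiring: dataset names and edges are illustrative assumptions,
# not taken from the repository.
PIPELINE = {
    "start_fine_tuning": {"waits_on": set(), "produces": {"training_examples"}},
    "fine_tune": {"waits_on": {"training_examples"}, "produces": {"fine_tuned_model"}},
    "champion_vs_challenger": {"waits_on": {"fine_tuned_model"}, "produces": set()},
}


def run_order(pipeline, manual_start):
    """Return the order in which DAGs run after one manual trigger."""
    # Datasets updated since each DAG's last run.
    pending = {name: set() for name in pipeline}
    queue = deque([manual_start])
    order = []
    while queue:
        dag = queue.popleft()
        order.append(dag)
        for dataset in sorted(pipeline[dag]["produces"]):
            for consumer, spec in pipeline.items():
                if dataset in spec["waits_on"]:
                    pending[consumer].add(dataset)
                    # Trigger only once all awaited datasets have been updated.
                    if pending[consumer] == spec["waits_on"]:
                        pending[consumer] = set()
                        queue.append(consumer)
    return order


if __name__ == "__main__":
    print(run_order(PIPELINE, "start_fine_tuning"))
```

In this model only the first DAG needs a manual trigger, which mirrors step 2. Every downstream DAG starts on its own once the data it waits on has been updated, provided it is unpaused.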
