# Example notebooks for the Llama 2 7B model on Databricks

This folder contains the following examples for Llama 2 models; short code sketches for several of these workflows follow the table:

| File | Description | Model Used | GPU Minimum Requirement |
| --- | --- | --- | --- |
| `01_load_inference` | Environment setup and suggested configurations for running inference with Llama 2 models on Databricks. | Llama-2-7b-chat-hf | 1xA10-24GB |
| `02_mlflow_logging_inference` | Save, register, and load Llama 2 models with MLflow, and create a Databricks model serving endpoint. | Llama-2-7b-chat-hf | 1xA10-24GB |
| `02_[chat]_mlflow_logging_inference` | Save, register, and load Llama 2 models with MLflow, and create a Databricks model serving endpoint for chat completion. | Llama-2-7b-chat-hf | 1xA10-24GB |
| `03_serve_driver_proxy` | Serve Llama 2 models on the cluster driver node using Flask. | Llama-2-7b-chat-hf | 1xA10-24GB |
| `03_[chat]_serve_driver_proxy` | Serve Llama 2 models for chat completion on the cluster driver node using Flask. | Llama-2-7b-chat-hf | 1xA10-24GB |
| `04_langchain` | Integrate a serving endpoint or cluster driver proxy app with LangChain and query it. | N/A | N/A |
| `04_[chat]_langchain` | Integrate a serving endpoint and set up a LangChain chat model. | N/A | N/A |
| `05_fine_tune_deepspeed` | Fine-tune Llama 2 base models with DeepSpeed. | Llama-2-7b-hf | 4xA10 or 2xA100-80GB |
| `06_fine_tune_qlora` | Fine-tune Llama 2 base models with QLoRA. | Llama-2-7b-hf | 1xA10 |
| `07_ai_gateway` | Manage an MLflow AI Gateway route that accesses a Databricks model serving endpoint. | N/A | N/A |
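
The notebooks cover these workflows end to end; the sketches below only illustrate the core API calls. For `01_load_inference`, a minimal sketch of loading Llama-2-7b-chat-hf with Hugging Face `transformers`, assuming a single-GPU cluster (e.g. 1xA10-24GB) and a Hugging Face token with access to the gated `meta-llama` repository:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half precision so the 7B model fits on a 24GB A10
    device_map="auto",
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("What is Databricks?", max_new_tokens=128)[0]["generated_text"])
```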
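
For `02_mlflow_logging_inference`, a sketch of logging and registering the model with MLflow and loading it back for inference. It reuses the `generator` pipeline from the previous sketch; the registered model name and artifact path are illustrative:

```python
import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=generator,             # pipeline from the sketch above
        artifact_path="model",
        registered_model_name="llama-2-7b-chat",  # illustrative registry name
    )

# Load the registered model back as a generic pyfunc model for inference.
loaded = mlflow.pyfunc.load_model("models:/llama-2-7b-chat/1")
print(loaded.predict(["What is MLflow?"]))
```

The notebook goes further and creates a Databricks model serving endpoint from the registered model.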
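
For `03_serve_driver_proxy`, a sketch of exposing the same pipeline behind a small Flask app running on the cluster driver node; the route and port are illustrative:

```python
from flask import Flask, jsonify, request

app = Flask("llama-2-7b-chat")

@app.route("/", methods=["POST"])
def generate():
    # Expects a JSON payload such as {"prompt": "What is Databricks?"}.
    prompt = request.json["prompt"]
    output = generator(prompt, max_new_tokens=128)[0]["generated_text"]
    return jsonify({"text": output})

app.run(host="0.0.0.0", port=7777)
```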
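
For `04_langchain`, a sketch of querying a Databricks model serving endpoint through LangChain's `Databricks` LLM wrapper; the endpoint name is illustrative:

```python
from langchain.llms import Databricks

# Point the wrapper at a model serving endpoint in the same workspace.
llm = Databricks(endpoint_name="llama-2-7b-chat")

# For the 03_serve_driver_proxy pattern, the wrapper can instead target the
# driver proxy port, e.g. Databricks(cluster_driver_port=7777).
print(llm("What is a lakehouse?"))
```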
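
For `06_fine_tune_qlora`, a sketch of the QLoRA setup: the base model is loaded in 4-bit with bitsandbytes and wrapped with a LoRA adapter from `peft`. The hyperparameters are illustrative, and the training loop (e.g. with `transformers.Trainer`) is omitted:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # confirms only the LoRA weights are trainable
```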