In the previous module, we used Gemini 1.5 Flash via the Google API. It's a very convenient way to use an LLM, but you have to pay for usage, and you don't have control over which model you get to use.
In this module, we'll look at using open-source LLMs instead.
YouTube Class: 2.1 - Introduction to Open-Source
- Open-Source LLMs
- Replacing the LLM box in the RAG flow
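Conceptually, the only part of the RAG flow that changes in this module is the LLM "box": retrieval and prompt building stay the same, and we swap the API-based model for an open-source one. A minimal sketch, where `search` and `build_prompt` are placeholders for the pieces built in the previous module:

```python
def rag(query, llm):
    # Retrieval and prompt building are unchanged from the previous module;
    # search() and build_prompt() stand in for the code built there.
    results = search(query)
    prompt = build_prompt(query, results)
    # Only the LLM box is swapped: llm can call the Google API,
    # a HuggingFace model, or a local Ollama model.
    return llm(prompt)
```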
YouTube Class: 2.2 - Using SaturnCloud for GPU Notebooks
- Registering in Saturn Cloud
- Configuring secrets and git
- Creating an instance with a GPU
Bonus: Using Google Colab for GPU Notebooks
This is my personal choice!
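Whichever GPU notebook environment you pick, it's worth verifying that the notebook actually sees the GPU before loading a model. A quick check with PyTorch:

```python
import torch

# Should print True and the GPU name (e.g. a T4 on the Colab free tier)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```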
YouTube Class: 2.3 - HuggingFace and Google FLAN T5
- HuggingFace Model: google/flan-t5-xl
- Jupyter Notebook: Model_FlanT5.ipynb
- Reference: huggingface.co/google/flan-t5-xl
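Loading FLAN-T5 follows the standard seq2seq pattern from the model card. A minimal sketch, assuming `accelerate` is installed for `device_map="auto"` (the prompt is illustrative):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto")

# FLAN-T5 is an encoder-decoder model: encode a prompt, then generate
input_ids = tokenizer(
    "translate English to German: How old are you?", return_tensors="pt"
).input_ids.to(model.device)
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```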
YouTube Class: 2.4 - Phi 3 Mini
- HuggingFace Model: microsoft/Phi-3-mini-128k-instruct
- Reference: huggingface.co/microsoft/Phi-3-mini-128k-instruct
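Phi-3 Mini is an instruct model, so the model card uses the chat-style text-generation pipeline. Roughly, following the card (the question is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-128k-instruct"
# trust_remote_code is required for this model's custom code
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [{"role": "user", "content": "What is retrieval-augmented generation?"}]
output = pipe(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```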
YouTube Class: 2.5 - Mistral-7B and HuggingFace Hub Authentication
- HuggingFace Model: mistralai/Mistral-7B-v0.1
- Reference: huggingface.co/mistralai/Mistral-7B-v0.1
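Mistral-7B is a gated model on the Hub, so you need to accept the terms on the model page and authenticate before downloading. A sketch using a `huggingface_hub` login with a token read from an `HF_TOKEN` environment variable (the variable name is a common convention, not required):

```python
import os
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

# Authenticate with a HuggingFace access token (created in account settings)
login(token=os.environ["HF_TOKEN"])

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", device_map="auto", torch_dtype="auto"
)

# Mistral-7B-v0.1 is a base (non-instruct) model, so it completes text
inputs = tokenizer("The key idea of RAG is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```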
YouTube Class: 2.6 - Exploring Open-Source Models
YouTube Class: 2.7 - Running LLMs Locally without a GPU with Ollama
Install and start Ollama:

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
```
Pull a model (only once) and run it locally:

```bash
ollama pull phi3
ollama run phi3
```
This opens an interactive chat with the model in the command-line interface.
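Ollama also exposes an OpenAI-compatible API on localhost:11434, so you can query the local model programmatically with the `openai` Python client instead of the interactive CLI. A minimal sketch:

```python
from openai import OpenAI

# Point the OpenAI client at the local Ollama server
client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="phi3",
    messages=[{"role": "user", "content": "What is retrieval-augmented generation?"}],
)
print(response.choices[0].message.content)
```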
- Intro_RAG.ipynb: RAG system using the local phi3 model
- Pros: open-source models and code, completely free
- Cons: runs very slowly and is probably not scalable