<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_f17Rr1bxQ6aHJT_z2Bpg9pUFbPdj5fM#scrollTo=yT9i6zObtGnM)

## 📚🔍 **LlamaIndex**: Enhancing Language Models with Intelligent Data Integration
LlamaIndex is a Python library for connecting language models to external data. It lets developers load, index, and query structured or unstructured data sources, which is the basis of retrieval-augmented generation (RAG) workflows. By converting data into an index that can be searched at query time, LlamaIndex simplifies building applications such as chatbots, knowledge retrieval systems, and search tools. It integrates with databases, APIs, and documents, so an LLM can answer questions about data it was never trained on.
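
To make the index-and-query idea concrete, here is a minimal sketch of the retrieval half of RAG in plain Python. It is not how LlamaIndex is implemented: the word-count "embedding", the `ToyIndex` class, and the sample passages are hypothetical stand-ins for a real embedding model, a vector store, and real documents.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index that stores (vector, text) pairs."""

    def __init__(self, passages):
        self.entries = [(embed(p), p) for p in passages]

    def retrieve(self, question, top_k=2):
        """Return the top_k passages most similar to the question."""
        query_vec = embed(question)
        ranked = sorted(
            self.entries, key=lambda e: cosine(query_vec, e[0]), reverse=True
        )
        return [text for _, text in ranked[:top_k]]


# Made-up sample passages, used only for illustration.
passages = [
    "Revenue grew because of more rides and deliveries.",
    "The company leases office space in several cities.",
    "Drivers are classified as independent contractors.",
]
index = ToyIndex(passages)
question = "Why did revenue grow?"
context = index.retrieve(question, top_k=1)

# In a real RAG system this prompt would be sent to the LLM.
prompt = f"Context: {context[0]}\nQuestion: {question}"
print(prompt)
```

The retrieved passage is placed in the prompt so the model answers from the supplied context rather than from memory alone.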



## 🚀📖 **Building a RAG System with Mistral and LlamaIndex**  

### **Install Required Libraries**


In [None]:
!pip install llama-index-llms-mistralai llama_index

### **Configure Mistral API**

In [3]:
import os
from google.colab import userdata
import openai

# Read both keys from Colab secrets: Mistral serves the LLM, OpenAI the embeddings.
os.environ["MISTRAL_API_KEY"] = userdata.get("MISTRAL_API_KEY")
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
openai.api_key = os.getenv("OPENAI_API_KEY")
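
Outside Colab, `google.colab.userdata` is not available. The sketch below shows one possible fallback to ordinary environment variables; `get_api_key` is a hypothetical helper, not part of any library.

```python
import os


def get_api_key(name):
    """Return the secret `name` from Colab secrets if available, else from os.environ."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    value = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            value = None  # secret missing or notebook access not granted

    value = value or os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it as a Colab secret or an environment variable"
        )
    return value


# Usage (hypothetical):
# os.environ["MISTRAL_API_KEY"] = get_api_key("MISTRAL_API_KEY")
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first API call.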


In [4]:
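import nest_asyncio
from llama_index.core import Settings
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Allow nested event loops, which notebooks need for LlamaIndex's async calls.
nest_asyncio.apply()

# Mistral generates the answers; OpenAI embeds the documents and queries.
llm = MistralAI(model="open-mixtral-8x22b", temperature=0.1)
embed_model = OpenAIEmbedding(model_name="text-embedding-ada-002")

# Register both as global defaults so later indexes and query engines use them.
Settings.llm = llm
Settings.embed_model = embed_model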

### **📥 Downloading the Dataset**

In [5]:
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './lyft_2021.pdf'

--2024-12-18 17:35:30--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘./uber_2021.pdf’


2024-12-18 17:35:30 (24.3 MB/s) - ‘./uber_2021.pdf’ saved [1880483/1880483]

--2024-12-18 17:35:30--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [application/octet-

### 📂 **Loading the Datasets**


In [6]:
from llama_index.core import SimpleDirectoryReader

uber_docs = SimpleDirectoryReader(input_files=["./uber_2021.pdf"]).load_data()
lyft_docs = SimpleDirectoryReader(input_files=["./lyft_2021.pdf"]).load_data()

### 🏗️ **Building VectorStore Indexes for the Datasets**


In [7]:
from llama_index.core import VectorStoreIndex

uber_index = VectorStoreIndex.from_documents(uber_docs)
uber_query_engine = uber_index.as_query_engine(similarity_top_k=5)

lyft_index = VectorStoreIndex.from_documents(lyft_docs)
lyft_query_engine = lyft_index.as_query_engine(similarity_top_k=5)

### 🤖 **Querying the Dataset**

In [None]:
response = uber_query_engine.query("What was Uber's revenue in 2021?")
print(response)

In [None]:
response = lyft_query_engine.query("What were Lyft's investments in 2021?")
print(response)
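
To see which passages an answer was grounded in, the response returned by `query` exposes the retrieved chunks as `source_nodes`, each carrying a similarity `score` and a wrapped `node`. The helper below is a sketch: `summarize_sources` is a hypothetical name, and the `page_label` metadata key is an assumption about what the PDF reader records, so it is read defensively.

```python
def summarize_sources(response, max_chars=80):
    """Return one line per retrieved chunk: score, page label (if any), text preview."""
    lines = []
    for item in response.source_nodes:
        # "page_label" is an assumed metadata key; fall back to "?" if absent.
        page = item.node.metadata.get("page_label", "?")
        preview = item.node.get_content()[:max_chars].replace("\n", " ")
        score = f"{item.score:.3f}" if item.score is not None else "n/a"
        lines.append(f"score={score} page={page} {preview}")
    return lines


# Usage, assuming `response` came from one of the query engines above:
# for line in summarize_sources(response):
#     print(line)
```

With `similarity_top_k=5`, this should list up to five chunks per query, which helps when checking whether an answer is supported by the filing.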

## 🖼️ **Querying Images with a Multi-Modal LLM**


### **Install Required Libraries**


In [None]:
!pip install llama-index-multi-modal-llms-openai
!pip install llama-index-vector-stores-qdrant
!pip install llama_index ftfy regex tqdm
!pip install llama-index-embeddings-clip
!pip install git+https://github.com/openai/CLIP.git
!pip install matplotlib scikit-image

### 🖼️ **Loading Image for Multi-Modal Query**


In [None]:
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    "https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
]

# Wrap each URL in an image document that the multi-modal LLM can consume.
image_documents = load_image_urls(image_urls)

### 🔄 **Initializing OpenAI Multi-Modal LLM**


In [None]:
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4o-mini", max_new_tokens=300
)

### 🖼️ **Generating Description for the Image**

In [None]:
response = openai_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)

print(response)

### 🖼️ **Querying Multiple Images**

In [None]:
from pathlib import Path

input_image_path = Path("input_images")
# Create the folder if it does not exist yet.
input_image_path.mkdir(exist_ok=True)

In [None]:
!wget "https://docs.google.com/uc?export=download&id=1nUhsBRiSWxcVQv8t8Cvvro8HJZ88LCzj" -O ./input_images/long_range_spec.png
!wget "https://docs.google.com/uc?export=download&id=19pLwx0nVqsop7lo0ubUSYTzQfMtKJJtJ" -O ./input_images/model_y.png
!wget "https://docs.google.com/uc?export=download&id=1utu3iD9XEgR5Sb7PrbtMf1qw8T1WdNmF" -O ./input_images/performance_spec.png
!wget "https://docs.google.com/uc?export=download&id=1dpUakWMqaXR4Jjn1kHuZfB0pAXvjn2-i" -O ./input_images/price.png
!wget "https://docs.google.com/uc?export=download&id=1qNeT201QAesnAP5va1ty0Ky5Q_jKkguV" -O ./input_images/real_wheel_spec.png
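
Downloads from Google Drive links can occasionally return an HTML page instead of the file, which only surfaces later as an image-decoding error. As a sketch of an early check, the hypothetical helpers below test each file for the 8-byte PNG signature; they assume the files really are PNGs, as their names suggest.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every valid PNG file


def is_png(path):
    """Return True if the file at `path` starts with the PNG signature."""
    with open(path, "rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


def find_bad_downloads(folder):
    """List the names of .png files in `folder` that do not look like PNGs."""
    return sorted(p.name for p in Path(folder).glob("*.png") if not is_png(p))


# Usage: an empty list means every download looks like a real PNG.
# print(find_bad_downloads("./input_images"))
```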

In [None]:
from PIL import Image
import matplotlib.pyplot as plt
import os


def plot_images(image_paths):
    """Show up to six images in a 2x3 grid."""
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)

            plt.subplot(2, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])

            images_shown += 1
            # A 2x3 grid holds six images; a seventh subplot would raise an error.
            if images_shown >= 6:
                break

In [None]:
image_paths = [
    os.path.join("./input_images", img_path)
    for img_path in os.listdir("./input_images")
]
plot_images(image_paths)
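
The fixed 2-by-3 grid in `plot_images` suits the five images used here. For a different number of images, the grid could be derived from the count, as in this sketch; `grid_shape` is a hypothetical helper, not a matplotlib function.

```python
import math


def grid_shape(n_images, max_cols=3):
    """Return (rows, cols) for a grid that fits n_images with at most max_cols columns."""
    if n_images <= 0:
        return (0, 0)
    cols = min(n_images, max_cols)
    rows = math.ceil(n_images / cols)
    return (rows, cols)


print(grid_shape(5))  # (2, 3)
print(grid_shape(7))  # (3, 3)
```

The returned pair could replace the hard-coded values in `plt.subplot(2, 3, ...)`, so that every image gets a cell.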

In [None]:
from llama_index.core import SimpleDirectoryReader

# Load every image in the folder created above as an image document.
image_documents = SimpleDirectoryReader("./input_images").load_data()

In [None]:
response = openai_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)

print(response)