# Welcome to the Tutorial for RAG

#### Requirements
- Python 3.12

#### Create a virtual environment
Before we start, create a venv so that this tutorial only uses the libraries selected for it and does not mix them up with the libraries you normally use:

`python -m venv path_to_the_library\.venv`

Activate the venv and install uv with:
```bash
.venv\Scripts\activate
pip install uv
```

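To use uv, check that `pyproject.toml` is present in the project folder, then run the commands below in your terminal. `uv lock` resolves the dependencies listed in `pyproject.toml` into a lockfile, and `uv sync` installs them into the environment.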
```bash
uv lock
uv sync
```
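
Once the environment is synced, a quick sanity check can confirm that the notebook runs on the expected interpreter. The sketch below uses only the standard library. The helper name `check_environment` is made up for this tutorial, and the 3.12 minimum is taken from the Requirements above.

```python
import sys


def check_environment(min_version=(3, 12)):
    """Return (version_ok, in_venv) for the running interpreter."""
    version_ok = sys.version_info[:2] >= min_version
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


version_ok, in_venv = check_environment()
print(f"Python >= 3.12: {version_ok}, running inside a venv: {in_venv}")
```

If either value is `False`, activate the venv again before running the cells that follow.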

In [1]:
from google import genai
from pydantic import BaseModel
import os

from dotenv import load_dotenv

In [None]:
load_dotenv()

class Response(BaseModel):
    capital: str
    area: float

client = genai.Client(
    api_key=os.getenv("GEMINI_API_KEY"),
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the capital of Japan, and what is its area in square meters?",
    config={
        "response_mime_type": "application/json",
        "response_schema": Response,
    },
)

print(response.text)

{"capital": "Tokyo", "area": 2194000000}


In [3]:
import json

# Parse the JSON answer once instead of once per field
answer = json.loads(response.text)
print(f"The capital of Japan is {answer['capital']} with an area of {answer['area']} square meters.")

The capital of Japan is Tokyo with an area of 2194000000 square meters.
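
The model returns its structured answer as a JSON string, so it is worth checking that the expected fields are present before using them. Here is a minimal sketch that works on a hard-coded copy of the response text shown above instead of a live API call. `format_answer` is a hypothetical helper, not part of the google-genai SDK.

```python
import json


def format_answer(response_text: str) -> str:
    """Parse the JSON answer once and turn it into a sentence."""
    data = json.loads(response_text)
    # Fail early with a clear message if the model omitted a field.
    missing = {"capital", "area"} - data.keys()
    if missing:
        raise ValueError(f"Response is missing fields: {sorted(missing)}")
    return f"The capital is {data['capital']} with an area of {data['area']} square meters."


# Sample text copied from the recorded output above; no API call is made here.
sample_text = '{"capital": "Tokyo", "area": 2194000000}'
print(format_answer(sample_text))
```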


In [4]:
from markitdown import MarkItDown
from chonkie import RecursiveChunker, Visualizer


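# Convert the PDF to Markdown text
md = MarkItDown()

source_file = "sample pdf.pdf"
result = md.convert(source_file)

markdown_doc = result.text_content

# Split the Markdown into chunks using Chonkie's "markdown" recipe
chunker = RecursiveChunker.from_recipe("markdown", lang="en")
chunks = chunker.chunk(markdown_doc)

# Save an HTML page that highlights the chunk boundaries
viz = Visualizer()
viz.save("chonkie.html", chunks)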

HTML visualization saved to: file://d:\DOWNLOAD\Training Binus\chonkie.html


In [5]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
    separators=["\n\n", "\n", " ", ""] # The default hierarchy
)

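# Load the PDF; with the default settings PyPDFLoader returns one Document per page
loader = PyPDFLoader("sample pdf.pdf")
pages = loader.load()

# Split the pages into overlapping chunks
chunks = text_splitter.split_documents(pages)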

print(f"Total pages: {len(pages)}")
print(f"Total chunks generated: {len(chunks)}")
print("-- Example Chunk --")
print(chunks[0].page_content)
print("-- Metadata --")
print(chunks[0].metadata)

Total pages: 3
Total chunks generated: 7
-- Example Chunk --
THE HIDDEN DANGERS OF ALCOHOL INTOXICATION 
Understanding the Short-Term and Long-Term Risks to Your Health and Safety 
1. Introduction 
While alcohol is socially accepted in many cultures, "getting drunk" (intoxication) 
places the body under immense stress. Alcohol is a central nervous system depressant 
that rapidly impairs brain function, physical coordination, and judgment. This document 
outlines the critical dangers associated with excessive alcohol consumption, ranging 
from immediate physical threats to life-altering long-term diseases. 
 
2. Immediate Dangers (The "Here and Now") 
When you get drunk, your Blood Alcohol Concentration (BAC) rises, leading to 
immediate physiological threats. 
Alcohol Poisoning (Medical Emergency) 
Consuming a large amount of alcohol in a short time (binge drinking) can overwhelm the 
liver's ability to process it. 
 Symptoms: Confusion, vomiting, seizures, slow breathing (less than 8
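
To see what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch: a fixed-width sliding window over characters. It is not LangChain's algorithm (`RecursiveCharacterTextSplitter` also tries the separators in order so that chunks end on paragraph or line breaks where possible). `sliding_window_chunks` is a hypothetical helper for illustration only.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(demo, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed chunk starts with the last four characters of the previous one. That repetition is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk.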

In [26]:
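# Optional cleanup when re-running the notebook. `client` must be the chromadb
# client created in the next cell; the Gemini client has no delete_collection method.
client.delete_collection(name="gemini_demo")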

In [29]:
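import chromadb
import chromadb.utils.embedding_functions as embedding_functions

# Create the Chroma client first. Note that this rebinds `client`, which
# previously held the Gemini client.
client = chromadb.Client()

# Remove any collection left over from an earlier run so that
# create_collection() below does not fail on a duplicate name.
try:
    client.delete_collection(name="gemini_demo")
except Exception:
    pass  # nothing to delete on the first run

# Embed documents and queries with a Gemini embedding model
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(
    api_key=os.getenv("GEMINI_API_KEY"),
    model_name="gemini-embedding-001",  # or "models/text-embedding-004"
)

collection = client.create_collection(
    name="gemini_demo",
    embedding_function=google_ef,
)

# Store every chunk with a unique id and its source metadata
collection.add(
    ids=[f"chunk_{i}" for i in range(len(chunks))],
    documents=[doc.page_content for doc in chunks],
    metadatas=[doc.metadata for doc in chunks],
)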
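
Conceptually, the collection stores one embedding vector per chunk, and a query is answered by embedding the question and returning the chunks whose vectors are closest to it. The sketch below illustrates that ranking with tiny hand-made vectors and cosine similarity. The vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration. Real embeddings have far more dimensions, and the distance metric Chroma uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda chunk_id: cosine_similarity(query_vec, stored[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings", one per chunk id
toy_store = {
    "chunk_0": [0.9, 0.1, 0.0],
    "chunk_1": [0.0, 1.0, 0.0],
    "chunk_2": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
```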

In [36]:
collection = client.get_collection("gemini_demo", embedding_function=google_ef)

query = "What are the dangers of alcohol?"

results = collection.query(
    query_texts=[query],
    n_results=3,
)

print(results['documents'][0][0])

THE HIDDEN DANGERS OF ALCOHOL INTOXICATION 
Understanding the Short-Term and Long-Term Risks to Your Health and Safety 
1. Introduction 
While alcohol is socially accepted in many cultures, "getting drunk" (intoxication) 
places the body under immense stress. Alcohol is a central nervous system depressant 
that rapidly impairs brain function, physical coordination, and judgment. This document 
outlines the critical dangers associated with excessive alcohol consumption, ranging 
from immediate physical threats to life-altering long-term diseases. 
 
2. Immediate Dangers (The "Here and Now") 
When you get drunk, your Blood Alcohol Concentration (BAC) rises, leading to 
immediate physiological threats. 
Alcohol Poisoning (Medical Emergency) 
Consuming a large amount of alcohol in a short time (binge drinking) can overwhelm the 
liver's ability to process it. 
 Symptoms: Confusion, vomiting, seizures, slow breathing (less than 8 breaths a 
minute), blue-tinged skin, and unconsciousness.
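
`collection.query` returns a dictionary of parallel lists with one inner list per query text, which is why the cell above indexes `results['documents'][0][0]` to reach the best hit. The sketch below walks over all hits for one query. The `fake_results` dictionary is hand-written to mimic that shape: its ids follow the `chunk_{i}` pattern used earlier, while the distances and texts are placeholders, not real output. `iter_hits` is a hypothetical helper.

```python
def iter_hits(results: dict, query_index: int = 0):
    """Yield (id, distance, document) for every hit of one query."""
    ids = results["ids"][query_index]
    distances = results["distances"][query_index]
    documents = results["documents"][query_index]
    yield from zip(ids, distances, documents)


# Placeholder data in the same nested-list shape as a query result
fake_results = {
    "ids": [["chunk_0", "chunk_3"]],
    "distances": [[0.21, 0.35]],
    "documents": [["placeholder text of the first hit", "placeholder text of the second hit"]],
}

for chunk_id, distance, document in iter_hits(fake_results):
    print(f"{chunk_id}  distance={distance:.2f}  {document[:40]!r}")
```

Printing the distances next to the text makes it easier to judge whether the lower-ranked chunks are still relevant enough to include in the prompt.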


In [42]:
# Format the context from the search results
context = "\n".join(results["documents"][0])

# Create the prompt with the context and query
prompt = f"""Use the following context to answer the question. If you cannot answer the question based on the context, say "I cannot answer this based on the provided context."

Context:
{context}

Question: {query}"""

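# `client` was rebound to the Chroma client earlier, so create the Gemini client again
client = genai.Client(
    api_key=os.getenv("GEMINI_API_KEY"),
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
    config={
        "system_instruction": "You are a helpful assistant that answers questions based on the provided context.",
        "temperature": 0.0,  # a number, not the string "0"; keeps answers as repeatable as possible
    },
)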

print(response.text)

The dangers of alcohol intoxication include both immediate and long-term risks:

**Immediate Dangers:**
*   **Alcohol Poisoning:** Can lead to confusion, vomiting, seizures, slow breathing, blue-tinged skin, unconsciousness, choking on vomit, or stopping the heart or breathing.
*   **Accidents and Physical Injury:** Increases the risk of motor vehicle crashes, falls, drownings, and accidental burns due to slowed reaction times and impaired balance.
*   **Impaired Judgment and Risky Behavior:** Can fuel aggression and violence, and increase the likelihood of unprotected sex, leading to sexually transmitted infections (STIs) or unintended pregnancies.

**Long-Term Health Risks:**
*   **Organ Damage:**
    *   **Liver:** Can cause fatty liver, alcoholic hepatitis, and cirrhosis.
    *   **Pancreas:** Can lead to pancreatitis.
    *   **Brain:** Can result in brain damage, affecting memory, learning, and cognitive function.
*   **Cancers:** Increases the risk of liver cancer, breast cancer
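
The prompt-building step from the last cell can be wrapped in a small helper so the same template is reused for every question. This is a minimal sketch: `build_prompt` and `FALLBACK` are hypothetical names, and joining chunks with a blank line is a choice made here, not something the libraries require.

```python
FALLBACK = "I cannot answer this based on the provided context."


def build_prompt(context_chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and wrap them in the RAG prompt template."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Use the following context to answer the question. "
        f'If you cannot answer the question based on the context, say "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt(["Alcohol is a central nervous system depressant."], "What does alcohol do?"))
```

With the notebook's variables, the call would be `build_prompt(results["documents"][0], query)`.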