<a href="https://colab.research.google.com/github/asiopta/LO17-RAG/blob/main/LO17_Projet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Dependencies: LangChain, Chroma, Gemini

In [None]:
!pip install langchain-core langchain langchain-google-genai -U langchain-community chromadb



### Libraries for the project

In [None]:
# Import from langchain_core / langchain_community directly:
# the older langchain.* paths for these names are deprecated.
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, format_document
from langchain_core.runnables import RunnablePassthrough



### Import the Google API key

In [None]:
import os
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

### To make the cell above work, follow the tutorial at https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb
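
Outside Colab, `google.colab` cannot be imported, so the cell above fails. A minimal sketch of a fallback, assuming the key is also exported as the `GOOGLE_API_KEY` environment variable; `load_api_key` is a hypothetical helper, not part of any library:

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        # Only importable when running inside Colab.
        from google.colab import userdata
        key = userdata.get(name)
    except ImportError:
        # Assumption: outside Colab, the key lives in an environment variable.
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key
```

The returned value could then be assigned to `os.environ["GOOGLE_API_KEY"]` exactly as in the cell above.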

### Library for interacting with Google's AI services

In [None]:
pip install -qU 'google-genai>=1.0.0'

### API configuration

In [None]:
from google import genai
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API_KEY)

In [None]:
MODEL_ID = "gemini-2.5-flash-preview-05-20" # @param ["gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-flash-preview-05-20","gemini-2.5-pro-preview-05-06"] {"allow-input":true, isTemplate: true}

### The example below checks that everything is configured correctly:

In [None]:
from IPython.display import Markdown

response = client.models.generate_content(model=MODEL_ID, contents="Please give me python code to sort a list.")

display(Markdown(response.text))

Python provides very straightforward ways to sort lists. There are two primary methods:

1.  **`list.sort()`**: This method sorts the list **in-place**, meaning it modifies the original list and returns `None`.
2.  **`sorted()` (built-in function)**: This function returns a **new sorted list**, leaving the original list unchanged. It can sort any iterable (not just lists).

Let's look at examples for both, along with common customizations.

---

## 1. Using `list.sort()` (In-Place Sorting)

This method is called directly on a list object.

```python
# --- Basic Ascending Sort ---
my_list = [3, 1, 4, 1, 5, 9, 2, 6]
print("Original list:", my_list)

my_list.sort() # Sorts the list in-place
print("Sorted in-place (ascending):", my_list)

# --- Descending Sort ---
another_list = [7, 0, 8, 4, 2, 1]
print("\nOriginal list (another):", another_list)

another_list.sort(reverse=True) # Sorts in descending order
print("Sorted in-place (descending):", another_list)

# --- Important Note: it returns None ---
# sorted_result = my_list.sort() # DON'T DO THIS if you expect the sorted list back
# print(sorted_result) # This will print None
```

**Output for `list.sort()`:**

```
Original list: [3, 1, 4, 1, 5, 9, 2, 6]
Sorted in-place (ascending): [1, 1, 2, 3, 4, 5, 6, 9]

Original list (another): [7, 0, 8, 4, 2, 1]
Sorted in-place (descending): [8, 7, 4, 2, 1, 0]
```

---

## 2. Using `sorted()` (Returns a New List)

This built-in function takes an iterable (like a list) as an argument and returns a brand new sorted list.

```python
# --- Basic Ascending Sort ---
original_numbers = [3, 1, 4, 1, 5, 9, 2, 6]
print("Original numbers:", original_numbers)

new_sorted_numbers = sorted(original_numbers) # Returns a NEW sorted list
print("New sorted numbers (ascending):", new_sorted_numbers)
print("Original numbers after sorted():", original_numbers) # Original is unchanged

# --- Descending Sort ---
data_points = [7, 0, 8, 4, 2, 1]
print("\nOriginal data points:", data_points)

new_descending_data = sorted(data_points, reverse=True) # Returns new list in descending order
print("New sorted data (descending):", new_descending_data)
print("Original data points after sorted():", data_points) # Original is unchanged
```

**Output for `sorted()`:**

```
Original numbers: [3, 1, 4, 1, 5, 9, 2, 6]
New sorted numbers (ascending): [1, 1, 2, 3, 4, 5, 6, 9]
Original numbers after sorted(): [3, 1, 4, 1, 5, 9, 2, 6]

Original data points: [7, 0, 8, 4, 2, 1]
New sorted data (descending): [8, 7, 4, 2, 1, 0]
Original data points after sorted(): [7, 0, 8, 4, 2, 1]
```

---

## 3. Custom Sorting with the `key` Argument

Both `list.sort()` and `sorted()` accept a `key` argument. This argument should be a function that takes one argument (an element from the list) and returns a value to be used for comparison.

```python
# --- Sort a list of strings by their length ---
words = ["apple", "banana", "kiwi", "grapefruit", "date"]
print("\nOriginal words:", words)
sorted_by_length = sorted(words, key=len)
print("Sorted by length:", sorted_by_length)

# --- Case-insensitive sort for strings ---
names = ["Alice", "bob", "Charlie", "David"]
print("\nOriginal names:", names)
sorted_case_insensitive = sorted(names, key=str.lower)
print("Sorted case-insensitive:", sorted_case_insensitive)

# --- Sort a list of tuples by the second element ---
# We use a lambda function here for a concise way to define a small, anonymous function
items = [("itemA", 10), ("itemB", 5), ("itemC", 12), ("itemD", 8)]
print("\nOriginal items:", items)
sorted_by_second_element = sorted(items, key=lambda item: item[1])
print("Sorted by second element:", sorted_by_second_element)

# Sort by second element, then by first (if second elements are equal)
sorted_by_second_then_first = sorted(items, key=lambda item: (item[1], item[0]))
print("Sorted by second then first:", sorted_by_second_then_first)

# --- Sort a list of dictionaries by a specific key ---
users = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 30}
]
print("\nOriginal users:", users)
sorted_by_age = sorted(users, key=lambda user: user["age"])
print("Sorted users by age:", sorted_by_age)

# Sort by age, then by name (for ties in age)
sorted_by_age_then_name = sorted(users, key=lambda user: (user["age"], user["name"]))
print("Sorted users by age then name:", sorted_by_age_then_name)
```

**Output for `key` examples:**

```
Original words: ['apple', 'banana', 'kiwi', 'grapefruit', 'date']
Sorted by length: ['kiwi', 'date', 'apple', 'banana', 'grapefruit']

Original names: ['Alice', 'bob', 'Charlie', 'David']
Sorted case-insensitive: ['Alice', 'bob', 'Charlie', 'David']

Original items: [('itemA', 10), ('itemB', 5), ('itemC', 12), ('itemD', 8)]
Sorted by second element: [('itemB', 5), ('itemD', 8), ('itemA', 10), ('itemC', 12)]
Sorted by second then first: [('itemB', 5), ('itemD', 8), ('itemA', 10), ('itemC', 12)]

Original users: [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}, {'name': 'Charlie', 'age': 30}]
Sorted users by age: [{'name': 'Bob', 'age': 25}, {'name': 'Alice', 'age': 30}, {'name': 'Charlie', 'age': 30}]
Sorted users by age then name: [{'name': 'Bob', 'age': 25}, {'name': 'Alice', 'age': 30}, {'name': 'Charlie', 'age': 30}]
```

---

## When to use which?

*   Use `list.sort()` when you **don't need to keep the original order** of the list and want to save some memory (as it doesn't create a new list).
*   Use `sorted()` when you **need to preserve the original list** or when you want to sort an iterable that isn't a list (like a tuple, set, or dictionary items). It's generally safer if you're unsure, as it avoids unintended side effects.

### The loader: parses the document base (website, PDF documents, CSV, etc.). Adapt it to the document base in use.

In [None]:
loader = WebBaseLoader("https://blog.google/technology/ai/google-gemini-ai/") # To be defined
docs = loader.load()

#### Put code here to handle our base of documents/files/data
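
One step that could go here is splitting long pages into smaller chunks before embedding, so that the retriever returns focused passages rather than a whole page. A minimal sketch in plain Python; `split_text`, the chunk size and the overlap are illustrative assumptions, not values from this project:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk could then be wrapped in a `Document(page_content=chunk)` and passed to `Chroma.from_documents` in place of `docs`.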

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [None]:
vectorstore = Chroma.from_documents(
    documents=docs,                  # The loaded documents
    embedding=gemini_embeddings,     # Embedding model
    persist_directory="./chroma_db"  # Location of the database on disk
)

# Reload the persisted store from disk
vectorstore_disk = Chroma(
    persist_directory="./chroma_db",      # Directory of the database
    embedding_function=gemini_embeddings  # Embedding model
)



In [None]:
retriever = vectorstore_disk.as_retriever(search_kwargs={"k": 1})
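
With `k` set to 1, the retriever returns only the single closest chunk for each query. The sketch below illustrates the idea of a top-k similarity search on toy vectors. It assumes cosine similarity and a brute-force scan, which is not necessarily how Chroma indexes or scores vectors; `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 1) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings": document 1 points in the same direction as the query.
docs_vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], docs_vectors, k=2))
```

Raising `k` gives the model more context to work with, at the cost of a longer prompt.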

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

In [None]:
# Prompt template to query Gemini
llm_prompt_template = """You are an assistant for question-answering tasks.
Use the following context to answer the question.
If you don't know the answer, just say that you don't know.
Use five sentences maximum and keep the answer concise.\n
Question: {question} \nContext: {context} \nAnswer:"""

llm_prompt = PromptTemplate.from_template(llm_prompt_template)

print(llm_prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} template="You are an assistant for question-answering tasks.\nUse the following context to answer the question.\nIf you don't know the answer, just say that you don't know.\nUse five sentences maximum and keep the answer concise.\n\nQuestion: {question} \nContext: {context} \nAnswer:"


In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_prompt
    | llm
    | StrOutputParser()
)
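
The chain above retrieves documents, joins them into one context string, fills the prompt, calls the model and parses the reply into a plain string. A rough plain-Python equivalent of that data flow, with stand-in functions replacing the retriever and the LLM; `make_rag`, `fake_retrieve` and `fake_generate` are hypothetical, and the prompt is shortened:

```python
from typing import Callable

# Shortened version of the prompt template, for illustration only.
PROMPT = "Question: {question} \nContext: {context} \nAnswer:"


def join_texts(texts: list[str]) -> str:
    """Same idea as format_docs, but on plain strings instead of Document objects."""
    return "\n\n".join(texts)


def make_rag(
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> Callable[[str], str]:
    """Compose retrieval, prompt filling and generation into one function."""
    def answer(question: str) -> str:
        context = join_texts(retrieve(question))                     # retriever | format_docs
        prompt = PROMPT.format(question=question, context=context)  # llm_prompt
        return generate(prompt)                                      # llm | StrOutputParser()
    return answer


def fake_retrieve(question: str) -> list[str]:
    """Stand-in retriever: always returns the same two passages."""
    return ["Gemini is a multimodal model.", "It comes in three sizes."]


def fake_generate(prompt: str) -> str:
    """Stand-in LLM: echoes the prompt so the filled template is visible."""
    return prompt


rag = make_rag(fake_retrieve, fake_generate)
print(rag("What is Gemini?"))
```

Printing the echoed prompt this way is a quick check of what the model actually receives before spending API calls.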

In [None]:
rag_chain.invoke("What is Gemini?")

"Gemini is Google's most capable and general AI model. It was built from the ground up to be multimodal. This means it can generalize and seamlessly understand, operate across, and combine different types of information. This includes text, code, audio, image, and video. The first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro, and Nano."

### Streamlit application

Install Streamlit, Node.js, npm and localtunnel

In [None]:
!pip install streamlit -q && apt-get install -y nodejs npm && npm install -g localtunnel

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  gyp javascript-common libc-ares2 libjs-events libjs-highlight.js
  libjs-inherits libjs-is-typedarray libjs-psl libjs-source-map
  libjs-sprintf-js libjs-typedarray-to-buffer libnode-dev libnode72
  libnotify-bin libnotify4 libuv1-dev node-abab node-abbrev node-agent-base
  node-ansi-regex node-ansi-styles node-ansistyles node-aproba node-archy
  node-are-we-there-yet node-argparse node-arrify node-asap node-asynckit
  node-balanced-match node-brace-expansion node-builtins node-cacache
  node-chalk node-chownr node-clean-yaml-object node-cli-table node-clone
  node-color-convert node-color-name node-colors node-columnify
  node-combined-stream node-commander node-console-control-strings
  node-copy-concurrently node-core-util-is node-coveralls node-cssom
  node-cssstyle node-debug node-decompress-response node-defaults
  node-delayed-st

#### Parameter to supply when launching the Streamlit app (next cell)

In [None]:
!wget -q -O - ipv4.icanhazip.com

104.196.27.92


#### Streamlit app (see the app.py file for modifications)

In [None]:
! streamlit run app.py & npx localtunnel --port 8501


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://104.196.27.92:8501

your url is: https://brown-grapes-battle.loca.lt
  Stopping...
^C
