<a href="https://colab.research.google.com/github/Adeeba-Yusuf/RAG-Project/blob/main/RAG_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -q sentence-transformers==2.2.2
!pip install -q wikipedia-api
!pip install -q numpy
!pip install -q scipy

In [22]:
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 maps each input text to a 384-dimensional embedding
model = SentenceTransformer("all-MiniLM-L6-v2")

In [23]:
from wikipediaapi import Wikipedia
wiki = Wikipedia('RAGDemo/0.1', 'en')
doc = wiki.page('Masashi_Kishimoto').text
paragraphs = doc.split('\n\n')  # chunk the article: one chunk per paragraph

# Display every chunk, wrapped to 100 columns
import textwrap
for paragraph in paragraphs:
    wrapped_text = textwrap.fill(paragraph, width=100)
    print("-" * 100)
    print(wrapped_text)
    print("-" * 100)

----------------------------------------------------------------------------------------------------
Masashi Kishimoto (岸本 斉史, Kishimoto Masashi; born November 8, 1974) is a Japanese manga artist. His
manga series, Naruto, which was in serialization from 1999 to 2014, has sold over 250 million copies
worldwide in 46 countries as of May 2019. The series has been adapted into two anime and multiple
films, video games, and related media. Besides the Naruto manga, Kishimoto also personally
supervised the two anime films, The Last: Naruto the Movie and Boruto: Naruto the Movie, and has
written several one-shot stories. In 2019, Kishimoto wrote Samurai 8: The Tale of Hachimaru which
ended in March 2020. From May 2016 through October 2020 he supervised the Boruto: Naruto Next
Generations manga written by Ukyō Kodachi and illustrated by Mikio Ikemoto. In November 2020 it was
announced that he had taken over as writer on the series, replacing Kodachi. A reader of manga from
a young age, Kishimo
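
Splitting on `'\n\n'` can leave empty or whitespace-only chunks when the text contains runs of blank lines. The following is a minimal sketch of a slightly more defensive splitter; the helper name `split_paragraphs` and the `min_chars` parameter are illustrative assumptions, not part of the notebook.

```python
def split_paragraphs(text, min_chars=1):
    """Split text on blank lines and drop chunks shorter than min_chars."""
    # Same separator as the notebook cell, plus whitespace trimming
    chunks = [p.strip() for p in text.split("\n\n")]
    # Discard empty (or very short) chunks so they are never embedded
    return [p for p in chunks if len(p) >= min_chars]


sample = "First paragraph.\n\n\n\nSecond paragraph.\n\n"
print(split_paragraphs(sample))
```

Raising `min_chars` would also drop bare section titles, at the cost of losing very short but legitimate paragraphs.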

In [24]:
# Encode every chunk; the resulting array acts as a simple in-memory vector store
embedded_docs = model.encode(paragraphs, normalize_embeddings=True)
n_chunks, dim = embedded_docs.shape
print(f"Number of chunks: {n_chunks} Embedding Dimension: {dim}")
embedded_docs[0]

Number of chunks: 22 Embedding Dimension: 384


array([-5.53158447e-02, -4.99413833e-02,  3.43819857e-02,  1.95842069e-02,
       -1.96090490e-02,  3.83331329e-02, -3.16791758e-02,  8.08909535e-02,
       -7.53725122e-04,  7.62919942e-03,  1.58671290e-02,  8.16302821e-02,
        3.87491584e-02,  2.34714057e-02, -2.65551452e-02, -3.60773467e-02,
        3.01194820e-03, -9.82884225e-03,  3.89203280e-02, -6.49283230e-02,
        8.78713951e-02,  2.10801046e-02, -3.84004461e-03, -2.78141275e-02,
        7.29862899e-02, -1.86516047e-02,  4.76359613e-02, -3.69367786e-02,
       -1.06311321e-01, -1.05689326e-02, -1.33707235e-02, -1.67998187e-02,
       -1.66092045e-03, -4.73022163e-02,  6.26750365e-02,  2.19995081e-02,
       -6.12417143e-03,  1.86930802e-02, -3.35102156e-02, -3.44479047e-02,
       -1.68973207e-02,  4.10825349e-02,  3.90486456e-02, -1.01769101e-02,
        1.23618595e-01, -1.43524632e-02,  1.56365447e-02, -5.46976291e-02,
        6.70884550e-03,  8.22690353e-02, -1.07482858e-01, -3.70520577e-02,
        6.96235448e-02, -

In [25]:
query = "What is the most famous creation of Masashi Kishimoto"
embedded_query = model.encode(query, normalize_embeddings=True)
embedded_query.shape

(384,)

In [26]:
import numpy as np
# Both sides are L2-normalized, so the dot product equals the cosine similarity
similarities = np.dot(embedded_docs, embedded_query)
similarities

array([0.56709886, 0.5801046 , 0.55513394, 0.4215477 , 0.442426  ,
       0.3671359 , 0.35548228, 0.36079842, 0.36545494, 0.3421911 ,
       0.3189158 , 0.171488  , 0.3135594 , 0.29851466, 0.28013492,
       0.25392485, 0.23996013, 0.28460288, 0.26549658, 0.49762896,
       0.62300783, 0.5791848 ], dtype=float32)
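
Because the embeddings were produced with `normalize_embeddings=True`, a plain dot product already yields cosine similarity. The sketch below checks that equivalence with NumPy only; it uses random vectors as stand-ins for real embeddings (an assumption made so the example runs without downloading a model).

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for 5 document embeddings and 1 query embedding
docs = rng.normal(size=(5, 384)).astype(np.float32)
query = rng.normal(size=384).astype(np.float32)

# Scale every vector to unit length (L2 normalization)
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# Dot product of unit vectors
dot_scores = docs_unit @ query_unit

# Cosine similarity computed from the unnormalized vectors
cosine_scores = (docs @ query) / (
    np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
)

print(np.allclose(dot_scores, cosine_scores, atol=1e-5))
```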

In [27]:
# Indices of the three highest-scoring chunks, best match first
top_3_idx = np.argsort(similarities)[-3:][::-1].tolist()
top_3_idx

[20, 1, 21]
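
The top-3 selection above sorts the full score array. As a rough sketch, the same idea can be wrapped in a reusable helper that uses `np.argpartition`, which avoids a full sort when there are many chunks. The function name `top_k_indices` is a hypothetical choice; the scores are copied from the output above.

```python
import numpy as np


def top_k_indices(scores, k=3):
    """Return the indices of the k largest scores, highest first."""
    scores = np.asarray(scores)
    k = min(k, scores.size)
    if k <= 0:
        return []
    # argpartition places the k largest entries in the last k slots (unordered)
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort only those k candidates by score, descending
    order = np.argsort(scores[candidates])[::-1]
    return candidates[order].tolist()


# Similarity scores taken from the notebook output
scores = [0.56709886, 0.5801046, 0.55513394, 0.4215477, 0.442426,
          0.3671359, 0.35548228, 0.36079842, 0.36545494, 0.3421911,
          0.3189158, 0.171488, 0.3135594, 0.29851466, 0.28013492,
          0.25392485, 0.23996013, 0.28460288, 0.26549658, 0.49762896,
          0.62300783, 0.5791848]
print(top_k_indices(scores))  # [20, 1, 21]
```

For 22 chunks the difference is negligible; the partition-based version mainly pays off on much larger collections.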

In [28]:
CONTEXT = ""
for idx in top_3_idx:
    wrapped_text = textwrap.fill(paragraphs[idx], width=100)
    print("-" * 100)
    print(wrapped_text)
    print("-" * 100)
    CONTEXT += wrapped_text + "\n\n"

----------------------------------------------------------------------------------------------------
Influences and style While Kishimoto enjoyed reading manga as a child, he was inspired to write one
after seeing a promotional image for the film Akira. This made him analyze the artwork of Akira's
original author, Katsuhiro Otomo, as well as Akira Toriyama, another artist he admired. Realizing
both had their own style regarding the designs, Kishimoto decided to draw manga while crafting his
own images. While attending art school, Kishimoto was also an avid reader of Hiroaki Samura's Blade
of the Immortal, and extensively studied Samura's page layouts, action sequences, and anatomical
techniques. When Kishimoto was originally creating the Naruto series, he looked to other shōnen
manga for influences while attempting to make his characters as unique as possible. Kishimoto cites
Akira Toriyama's Dragon Ball series as one of his influences, noting that Goku, the protagonist, was
a key fact
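
The loop above concatenates every selected chunk regardless of length. If prompt size matters, one option is to cap the context at a character budget. This is a minimal sketch under that assumption; `build_context` and `max_chars` are hypothetical names, and a token-based budget would be more precise than characters.

```python
def build_context(chunks, indices, max_chars=4000):
    """Join the selected chunks in ranked order until the budget is reached."""
    parts = []
    used = 0
    for idx in indices:
        chunk = chunks[idx]
        # Stop before adding a chunk that would exceed the budget
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)


demo_chunks = ["alpha " * 10, "beta " * 10, "gamma " * 10]
# Highest-ranked chunk first, as produced by the top-k step
print(build_context(demo_chunks, [2, 0, 1], max_chars=130))
```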

In [29]:
prompt = f"""
Use the following CONTEXT to answer the QUESTION at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

CONTEXT: {CONTEXT}
QUESTION: {query}

"""

In [30]:
!pip install -q -U google-generativeai

In [34]:
import google.generativeai as genai
# Used to securely store your API key
from google.colab import userdata

# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)

# List the models available to this API key along with their supported generation methods.
for m in genai.list_models():
    print(m.name, m.supported_generation_methods)

# Note: this rebinds `model`, which previously held the SentenceTransformer
model = genai.GenerativeModel('gemini-1.5-flash')
# Count the tokens in the prompt before sending it
model.count_tokens(prompt)

models/embedding-gecko-001 ['embedText', 'countTextTokens']
models/gemini-1.5-pro-latest ['generateContent', 'countTokens']
models/gemini-1.5-pro-002 ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-pro ['generateContent', 'countTokens']
models/gemini-1.5-flash-latest ['generateContent', 'countTokens']
models/gemini-1.5-flash ['generateContent', 'countTokens']
models/gemini-1.5-flash-002 ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-flash-8b ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-1.5-flash-8b-001 ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-1.5-flash-8b-latest ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-2.5-pro-preview-03-25 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']
models/gemini-2.5-flash-preview-05-20 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']
models/gemini-2.5-fl

total_tokens: 1713
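
The cell above reads the key from Colab's `userdata` and mentions `os.getenv` as an alternative. A possible way to support both is sketched below; the helper name `load_api_key` is an assumption, and the broad `except` is deliberate so that a missing `google.colab` module or a missing secret both fall through to the environment variable.

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from Colab secrets, else from the environment."""
    key = None
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or inaccessible
        key = None
    # Fall back to an environment variable of the same name
    key = key or os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} was not found in Colab secrets or the environment")
    return key
```

The returned value could then be passed to `genai.configure(api_key=...)` exactly as in the cell above.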

In [35]:
response = model.generate_content(prompt)
print(response.text)

The provided text states that Masashi Kishimoto created the *Naruto* series.

