## Content-based Q&A chatbot using RAG and Llama 2, running locally

In [12]:
# Installations
!pip install gpt4all -q # generate text embeddings

In [19]:
!pip install faiss-cpu -q # faiss for similarity search (fast vector retrieval)

In [20]:
# Imports
import numpy as np
import json
import requests
from bs4 import BeautifulSoup
import re
import faiss
from gpt4all import GPT4All, Embed4All

## Step 1: Get the content

In [1]:
response = requests.get(
    "https://www.deeplearning.ai/the-batch/a-roadmap-explores-how-ai-can-detect-and-mitigate-greenhouse-gases/"
)
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("div", re.compile("^prose--styled"))
text = tag.text
print(text)

How can AI help to fight climate change? A new report evaluates progress so far and explores options for the future.What’s new: The Innovation for Cool Earth Forum, a conference of climate researchers hosted by Japan, published a roadmap for the use of data science, computer vision, and AI-driven simulation to reduce greenhouse gas emissions. The roadmap evaluates existing approaches and suggests ways to scale them up.How it works: The roadmap identifies 6 “high-potential opportunities”: activities in which AI systems can make a significant difference based on the size of the opportunity, real-world results, and validated research. The authors emphasize the need for data, technical and scientific talent, computing power, funding, and leadership to take advantage of these opportunities.Monitoring emissions. AI systems analyze data from satellites, drones, and ground sensors to measure greenhouse gas emissions. The European Union uses them to measure methane emissions, environmental orga

In [2]:
# Save the content as a text file (explicit UTF-8, since the article contains non-ASCII characters)
file_name = "AI_greenhouse_gas.txt"
with open(file_name, 'w', encoding='utf-8') as file:
    file.write(text)

## Step 2: Chunking ~ divide text into chunks

In [5]:
# Chunking
chunk_size = 512
chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

In [6]:
len(chunks)

8
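Fixed-size slicing cuts wherever the character count lands, so a sentence or even a word can be split between two chunks. One common mitigation is to let neighboring chunks overlap. The following is a minimal sketch of that idea, not part of the original notebook: the helper name `chunk_text` and the `overlap` parameter are hypothetical, and the sample string is stand-in data.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into character chunks; neighbors share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "abcdefghij" * 10  # 100 characters of stand-in text
plain = chunk_text(sample, chunk_size=40, overlap=0)
overlapping = chunk_text(sample, chunk_size=40, overlap=10)
print(len(plain), len(overlapping))
```

With `overlap=0` the helper produces the same chunks as the list comprehension above; a larger overlap trades some redundancy for a better chance that a relevant sentence appears whole in at least one chunk.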

## Step 3: Convert chunks into embeddings

In [8]:
# Example: embed one chunk locally (no API key required)
sample_text = chunks[0]  # separate name so the full article in `text` is not overwritten
embedder = Embed4All()
output = embedder.embed(sample_text)
print(output)


Downloading: 100%|██████████| 45.9M/45.9M [00:07<00:00, 5.96MiB/s]
Verifying: 100%|██████████| 45.9M/45.9M [00:00<00:00, 558MiB/s]


[-0.018320398405194283, 0.031668659299612045, 0.0773177221417427, 0.05653655529022217, 0.07377296686172485, -0.012319106608629227, -0.037801001220941544, -0.02757413126528263, -0.010017039254307747, -0.0032897284254431725, -0.10266296565532684, -0.10423977673053741, 0.041975993663072586, 0.04529893025755882, -0.0004000934131909162, 0.0842086672782898, -0.06712917238473892, -0.03083486109972, -0.04806686192750931, -0.09034476429224014, -0.03866979852318764, 0.044794049113988876, -0.0201836246997118, -0.03864751383662224, -0.007165068294852972, 0.1141832023859024, -0.0014168836642056704, -0.007619751151651144, -0.07528069615364075, 0.08937923610210419, 0.025685830041766167, -0.005953062791377306, 0.010119503363966942, 0.03668002411723137, -0.009986447170376778, 0.00907193124294281, 0.005740094929933548, 0.0240167248994112, 0.013303413055837154, 0.02832617796957493, -0.041909851133823395, -0.07120227068662643, 0.010882719419896603, -0.07969506829977036, 0.08555101603269577, 0.095650829374

In [9]:
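# Load the embedding model once and reuse it for every call
embedder = Embed4All()

def get_text_embeddings(text):
    """Return the embedding vector (a list of floats) for `text`."""
    return embedder.embed(text)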

In [10]:
text_embeddings = np.array([get_text_embeddings(chunk) for chunk in chunks])
text_embeddings

array([[-0.0183204 ,  0.03166866,  0.07731772, ..., -0.00083027,
        -0.10891361, -0.0249987 ],
       [ 0.02102805,  0.00987257,  0.00708334, ...,  0.03518564,
        -0.07391566, -0.0995004 ],
       [-0.03956671, -0.0004522 ,  0.07877214, ...,  0.01673155,
        -0.06144706, -0.00548403],
       ...,
       [-0.05741516,  0.01029754,  0.0467902 , ...,  0.00423208,
         0.00068158, -0.00826758],
       [-0.05094521,  0.07352471,  0.0106987 , ..., -0.03207523,
        -0.01087563, -0.03456756],
       [ 0.00239646,  0.0342568 ,  0.0919314 , ..., -0.02383728,
        -0.05840307, -0.05525459]])

In [11]:
len(text_embeddings[0])

384

## Step 4: Embed the user query

In [13]:
question = "What are the ways that AI can reduce emissions in Agriculture?"
question_embeddings = np.array([get_text_embeddings(question)])

In [16]:
question_embeddings.shape

(1, 384)

## Step 5: Search for chunks similar to query
Search the index for the chunks closest to the query. Faiss performs fast vector retrieval, and its `search` method returns two arrays:

-- D, which contains the distances from the query to its nearest neighbors, and

-- I, which contains the indices of those neighbors in the dataset.

The parameter k=2 requests the two closest neighbors for each query embedding.
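To make the meaning of D and I concrete, here is a dependency-free sketch of what an exact L2 search computes. It is an illustration under stated assumptions, not how Faiss is implemented: the function `l2_search` and the toy vectors are hypothetical, and the distances are squared Euclidean distances, which is what `IndexFlatL2` reports.

```python
def l2_search(vectors, queries, k=2):
    """Return (D, I): squared L2 distances and row indices of the k nearest vectors."""
    D, I = [], []
    for q in queries:
        # Squared Euclidean distance from the query to every stored vector
        dists = [sum((a - b) ** 2 for a, b in zip(v, q)) for v in vectors]
        # Indices of the k smallest distances, closest first
        nearest = sorted(range(len(vectors)), key=lambda j: dists[j])[:k]
        I.append(nearest)
        D.append([dists[j] for j in nearest])
    return D, I


toy_chunks = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]  # stand-in "chunk embeddings"
toy_query = [[0.9, 0.1]]                           # stand-in "question embedding"
toy_D, toy_I = l2_search(toy_chunks, toy_query, k=2)
print(toy_I)  # [[1, 0]]
```

Row 1 is the closest vector and row 0 the second closest, so `toy_I[0]` can be used directly to look up the matching chunks, just as `I` is used in the cells that follow.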

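In [21]:
dimension = question_embeddings.shape[1]  # embedding size (384)
index = faiss.IndexFlatL2(dimension)      # exact L2 index
# The chunk embeddings must be added before searching (Faiss works on float32)
index.add(text_embeddings.astype("float32"))

In [22]:
D, I = index.search(question_embeddings.astype("float32"), k=2)
print(I)

[[-1 -1]]

Note: this output and the retrieved chunks shown next were recorded with an empty index, before `index.add` was called. Faiss returns -1 when no neighbor exists, and `chunks[-1]` then selects the last chunk twice.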


In [23]:
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
print(retrieved_chunk)

['eration, manufacturing, food production, and transportation — could make a significant dent in greenhouse gas emissions.We’re thinking:\xa0AI also has an important role to play in advancing the science of climate geoengineering, such as stratospheric aerosol injection (SAI), to cool down the planet. More research is needed to determine whether SAI is a good idea, but AI-enabled climate modeling will help answer this question.', 'eration, manufacturing, food production, and transportation — could make a significant dent in greenhouse gas emissions.We’re thinking:\xa0AI also has an important role to play in advancing the science of climate geoengineering, such as stratospheric aerosol injection (SAI), to cool down the planet. More research is needed to determine whether SAI is a good idea, but AI-enabled climate modeling will help answer this question.']


In [24]:

prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

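Interpolating `retrieved_chunk` directly places the Python list representation (brackets, quotes, escape sequences such as `\xa0`) into the prompt. A possible alternative, sketched here under the assumption that plain text separated by blank lines is preferable, is to join the chunks first. The helper name `build_prompt` is hypothetical and not part of the original notebook.

```python
def build_prompt(retrieved_chunks, question):
    """Assemble the RAG prompt from plain-text chunks and the user question."""
    # Join chunks as plain text instead of embedding the list's repr()
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question.strip()}\n"
        "Answer:"
    )


example_prompt = build_prompt(
    ["first chunk ", " second chunk"],  # stand-in chunks
    "\nWhat is X?\n",                   # stand-in question
)
print(example_prompt)
```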
## Combined QA with Context

In [33]:
# Combine all the steps into one function
def qa_with_context(text, question, chunk_size=512):
    # Step 2: split the text into fixed-size character chunks
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Step 3: embed every chunk (Faiss works on float32)
    text_embeddings = np.array(
        [get_text_embeddings(chunk) for chunk in chunks], dtype="float32"
    )
    d = text_embeddings.shape[1]

    # Build the index and add the chunk embeddings to it
    index = faiss.IndexFlatL2(d)
    index.add(text_embeddings)

    # Step 4: embed the question
    question_embeddings = np.array([get_text_embeddings(question)], dtype="float32")

    # Step 5: retrieve the two closest chunks (-1 marks a missing neighbor)
    D, I = index.search(question_embeddings, k=2)
    retrieved_chunk = [chunks[i] for i in I.tolist()[0] if i >= 0]

    prompt = f"""
    Context information is below.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {question}
    Answer:
    """

    # Send the prompt to the local llama2 server and stream the reply
    url = "http://localhost:11434/api/generate"
    data = {
        "model": "llama2",
        "prompt": prompt  # already fully formatted by the f-string above
    }
    headers = {'Content-Type': 'application/json'}
    response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
    full_response = []
    try:
        for line in response.iter_lines():
            if line:
                decoded_line = json.loads(line.decode('utf-8'))
                full_response.append(decoded_line['response'])
    finally:
        response.close()
    # Each streamed piece carries its own spacing, so join without a separator
    return ''.join(full_response)
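The server streams its reply as one JSON object per line, and each `response` fragment is a token that already carries its own leading whitespace. Joining the fragments with a space therefore splits words apart, while joining with an empty string restores the text. The sketch below shows this on hand-written stand-in lines, so it runs without a server. The helper name `join_stream` is hypothetical, and the `done` field is assumed to mark the final message, as in Ollama's streaming format.

```python
import json


def join_stream(lines):
    """Concatenate the `response` fragments of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        if not line:  # skip keep-alive blank lines
            continue
        payload = json.loads(line.decode("utf-8"))
        parts.append(payload.get("response", ""))
        if payload.get("done"):  # assumed end-of-stream marker
            break
    # Fragments include their own spacing, so no separator is added
    return "".join(parts)


# Hand-written stand-in for response.iter_lines()
fake_lines = [
    b'{"response": " B", "done": false}',
    b'{"response": "ased", "done": false}',
    b'{"response": " on", "done": false}',
    b'',
    b'{"response": "", "done": true}',
]
print(repr(join_stream(fake_lines)))  # ' Based on'
```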

   

In [34]:
# `text` must hold the full article text from Step 1
question = """
What are the ways AI can mitigate climate change in transportation?
"""
print(qa_with_context(text, question))


Based on the provided roadmap, there are several ways that AI can help mitigate climate change in the transportation sector. The roadmap identifies six "high-potential opportunities" for using data science, computer vision, and AI-driven simulation to reduce greenhouse gas emissions:

1. Autonomous electric vehicles: AI can optimize routes and schedules for electric vehicles, reducing the need for fossil fuels and lowering emissions.
2. Smart traffic management: AI can analyze real-time traffic data and optimize traffic flow, reducing congestion and lowering emissions.
3. Hyperloop systems: AI can help design and optimize hyperloop systems, which could revolutionize transportation by reducing travel times and emissions.
4. Electric vehicle charging infrastructure: AI can optimize the placement and operat