## Download and set up the Elasticsearch instance


In [None]:
!pip install elasticsearch==7.13.4 sentence-transformers faiss-cpu
!pip install -q -U google-generativeai

Collecting elasticsearch==7.13.4
  Downloading elasticsearch-7.13.4-py2.py3-none-any.whl.metadata (7.7 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.1.0-py3-none-any.whl.metadata (23 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Collecting urllib3<2,>=1.21.1 (from elasticsearch==7.13.4)
  Downloading urllib3-1.26.20-py2.py3-none-any.whl.metadata (50 kB)
Downloading elasticsearch-7.13.4-py2.py3-none-any.whl (356 kB)
Downloading sentence_transformers-3.1.0-py3-none-any.whl (249 kB)
Downloading faiss_cpu

In [None]:
%%bash

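# Download the Elasticsearch OSS 7.9.2 archive and its published SHA-512 checksum.
wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
# Verify the archive before unpacking it.
shasum -a 512 -c elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
# Elasticsearch refuses to run as root, so hand the files to the daemon user.
sudo chown -R daemon:daemon elasticsearch-7.9.2/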

elasticsearch-oss-7.9.2-linux-x86_64.tar.gz: OK
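
The same integrity check can be done from Python when `shasum` is not available. The following is a minimal sketch, assuming the checksum file uses the usual `<hex digest>  <filename>` layout; the helper names `sha512_of` and `verify_sha512` are hypothetical and not part of the original notebook.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(archive_path, checksum_path):
    """Compare an archive against a '<hex digest>  <filename>' checksum file."""
    expected = Path(checksum_path).read_text().split()[0].lower()
    return sha512_of(archive_path) == expected


# Usage with the files downloaded above:
# verify_sha512("elasticsearch-oss-7.9.2-linux-x86_64.tar.gz",
#               "elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512")
```

Reading in chunks keeps memory use flat even though the archive is a few hundred megabytes.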


Start the instance in the background, running as the `daemon` user.

In [None]:
%%bash --bg

sudo -H -u daemon elasticsearch-7.9.2/bin/elasticsearch

In [None]:
# Wait 40 seconds to give the instance time to start.
import time
time.sleep(40)
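
A fixed sleep can be too short on a slow machine and needlessly long on a fast one. As an alternative, here is a minimal polling sketch that retries until the HTTP endpoint answers. The helper names `http_probe` and `wait_until_ready`, and the timeout values, are assumptions made for illustration; the URL is the default local endpoint this notebook queries later.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, timeout=120.0, interval=2.0):
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


# Usage (not executed here, since it needs the running instance):
# wait_until_ready(lambda: http_probe("http://localhost:9200/"))
```

Passing the probe as a callable keeps the waiting loop independent of how readiness is checked.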

Once the instance has started, grep the process list for `elasticsearch` to confirm that it is running.

In [None]:
%%bash

ps -ef | grep elasticsearch

root       65823   65821  0 00:05 ?        00:00:00 sudo -H -u daemon elasticsearch-7.9.2/bin/elasti
daemon     65824   65823 53 00:05 ?        00:00:21 /content/elasticsearch-7.9.2/jdk/bin/java -Xshar
root       66222   66220  0 00:06 ?        00:00:00 grep elasticsearch


Query the base endpoint to retrieve information about the cluster.

In [None]:
%%bash

curl -sX GET "localhost:9200/"

{
  "name" : "ab60910205dc",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "cGF1c94kRN6FSGTG8iJXdg",
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
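
The response above can also be checked programmatically, for example to confirm that the server's major version lines up with the `elasticsearch==7.13.4` client installed earlier. This is a small sketch that parses an abbreviated copy of that response; the helper name `summarize_cluster_info` is hypothetical.

```python
import json


def summarize_cluster_info(raw_json):
    """Extract the fields most useful for a sanity check from the root response."""
    info = json.loads(raw_json)
    version = info["version"]
    return {
        "cluster_name": info["cluster_name"],
        "version": version["number"],
        "major": int(version["number"].split(".")[0]),
        "flavor": version.get("build_flavor"),
    }


# Abbreviated copy of the response shown above.
sample = """
{
  "name": "ab60910205dc",
  "cluster_name": "elasticsearch",
  "version": {"number": "7.9.2", "build_flavor": "oss", "lucene_version": "8.6.2"},
  "tagline": "You Know, for Search"
}
"""
summary = summarize_cluster_info(sample)
print(summary)
```

Here the server reports major version 7, the same major version as the Python client.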


## Elasticsearch indexing

In [None]:
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss

# Initialize Elasticsearch and FAISS
es = Elasticsearch(hosts = ["http://localhost:9200"])
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
dimension = 384  # Size of the embedding vector
faiss_index = faiss.IndexFlatL2(dimension)

data =  [
    {
      "title": "Oil Prices Hit Three-Year Low",
      "content": "Global oil demand, led by a slowdown in China, has caused a sharp drop in oil prices, with Brent crude falling to $70 per barrel in early September 2024."
    },
    {
      "title": "Major X Flare Erupts from Sun",
      "content": "A powerful X4.5 solar flare erupted from sunspot AR3825 on September 14, 2024, followed by a coronal mass ejection that could impact Earth's magnetic field."
    },
    {
      "title": "UN Warns of 3°C Global Warming Threat",
      "content": "A new report from the UN weather agency warns that global temperatures could rise by 3°C unless urgent action is taken. 2024 has been the warmest year on record."
    },
    {
      "title": "Typhoon Yagi Devastates Southeast Asia",
      "content": "Typhoon Yagi has severely impacted millions of children across Southeast Asia, causing floods and landslides in Vietnam, Myanmar, Laos, and Thailand."
    },
    {
      "title": "Refugee Detention Practices Criticized",
      "content": "The UN refugee agency has called for an end to the arbitrary detention of asylum-seekers, citing numerous cases where individuals were unlawfully detained."
    }
]

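# Index documents into Elasticsearch and FAISS
index_name = "data_index"
for idx, doc in enumerate(data):
    # Index into Elasticsearch (BM25); the id doubles as the FAISS row number
    es.index(index=index_name, id=idx, body=doc)

    # Embed title and content together as one string and add the vector to FAISS
    embedding = embedding_model.encode(f"{doc['title']}. {doc['content']}")
    faiss_index.add(np.asarray([embedding], dtype="float32"))

# Make the newly indexed documents visible to search right away
es.indices.refresh(index=index_name)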
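
To make the FAISS side less opaque: a flat L2 index performs an exhaustive nearest-neighbour search, comparing the query against every stored vector by squared Euclidean distance and returning the row numbers of the closest ones. The pure-Python sketch below illustrates that idea on toy 2-dimensional vectors. It is not how FAISS is implemented, and the function names are hypothetical. It also shows why insertion order matters in this notebook: row `i` of the vector index has to correspond to the document stored under `id=i`.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, top_k):
    """Return (distance, row) pairs for the top_k rows closest to the query."""
    scored = sorted((squared_l2(vec, query), row) for row, vec in enumerate(vectors))
    return scored[:top_k]


# Toy "embeddings"; row i stands in for the document indexed with id=i.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
result = flat_l2_search(vectors, query=[0.9, 0.1], top_k=2)
print([row for _, row in result])
```

The query is closest to row 1, then row 0, so those are the document ids a retrieval step would look up.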



## Query and hybrid retrieval

In [None]:
def search_elasticsearch(query, top_k=3):
    # BM25 query search in Elasticsearch
    search_body = {
        "query": {
            "multi_match": {
                "query": query,
                "fields": ["title", "content"]
            }
        }
    }
    response = es.search(index=index_name, body=search_body, size=top_k)
    return [hit['_source'] for hit in response['hits']['hits']]

def search_faiss(query, top_k=2):
    # Semantic search in FAISS
    query_embedding = embedding_model.encode([query])
    _, top_k_indices = faiss_index.search(query_embedding, top_k)

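    # Fetch the matching documents from Elasticsearch (FAISS row number == document id)
    retrieved_docs = []
    for idx in top_k_indices[0]:
        if idx < 0:  # FAISS pads with -1 when fewer than top_k vectors exist
            continue
        doc = es.get(index=index_name, id=int(idx))
        retrieved_docs.append(doc['_source'])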

    return retrieved_docs

def hybrid_search(query, top_k_bm25=2, top_k_semantic=2):
    # Perform BM25 (keyword) and FAISS (semantic) search
    bm25_docs = search_elasticsearch(query, top_k=top_k_bm25)
    faiss_docs = search_faiss(query, top_k=top_k_semantic)

    # Combine and deduplicate results
    unique_docs = {doc['content']: doc for doc in bm25_docs + faiss_docs}.values()
    return list(unique_docs)

# Example Query
# query = "Was there any changes in global oil prices this year?"
query = input("Enter your query: ")
retrieved_docs = hybrid_search(query)
print(retrieved_docs)

Enter your query: Was there any changes in global oil prices this year?
[{'title': 'Oil Prices Hit Three-Year Low', 'content': 'Global oil demand, led by a slowdown in China, has caused a sharp drop in oil prices, with Brent crude falling to $70 per barrel in early September 2024.'}, {'title': 'UN Warns of 3°C Global Warming Threat', 'content': 'A new report from the UN weather agency warns that global temperatures could rise by 3°C unless urgent action is taken. 2024 has been the warmest year on record.'}]
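
The hybrid step above takes the union of both result lists and removes duplicates, so the final order carries no combined ranking. One common alternative, which this notebook does not use, is reciprocal rank fusion (RRF): each document scores the sum of `1 / (k + rank)` over the lists it appears in. The sketch below assumes documents are identified by their `content` field, as in the deduplication above, and uses the conventional constant `k=60`; the function name is hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, key=lambda doc: doc["content"]):
    """Fuse several ranked lists of documents into one list ordered by RRF score."""
    scores = {}
    docs_by_key = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            doc_key = key(doc)
            docs_by_key.setdefault(doc_key, doc)
            scores[doc_key] = scores.get(doc_key, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [docs_by_key[doc_key] for doc_key in ordered]


# Toy result lists standing in for the BM25 and FAISS hits.
bm25_hits = [{"title": "A", "content": "a"}, {"title": "B", "content": "b"}]
semantic_hits = [{"title": "B", "content": "b"}, {"title": "C", "content": "c"}]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print([doc["title"] for doc in fused])
```

Document B appears in both lists, so it ranks first even though neither retriever alone would put it ahead of everything else.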


## Enriching the prompt

In [None]:
def enrich_prompt(query, retrieved_docs):
    enriched_prompt = f"User Query: {query}\n\n"
    enriched_prompt += "Related Information:\n"
    for i, doc in enumerate(retrieved_docs):
        enriched_prompt += f"{i+1}. {doc['title']}: {doc['content']}\n"
    return enriched_prompt

# Enrich the prompt with the retrieved documents
enriched_prompt = enrich_prompt(query, retrieved_docs)
print(enriched_prompt)

User Query: Was there any changes in global oil prices this year?

Related Information:
1. Oil Prices Hit Three-Year Low: Global oil demand, led by a slowdown in China, has caused a sharp drop in oil prices, with Brent crude falling to $70 per barrel in early September 2024.
2. UN Warns of 3°C Global Warming Threat: A new report from the UN weather agency warns that global temperatures could rise by 3°C unless urgent action is taken. 2024 has been the warmest year on record.
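
The enriched prompt above lists the retrieved documents but does not tell the model how to use them. A possible variation, sketched here under assumptions, adds an explicit instruction to answer from the context only and caps the length of each snippet. The instruction wording, the `max_chars_per_doc` parameter, and the function name `build_grounded_prompt` are all hypothetical choices, not part of the original notebook.

```python
def build_grounded_prompt(query, docs, max_chars_per_doc=300):
    """Build a prompt that asks the model to answer from the numbered context only."""
    lines = [
        "Answer the question using only the numbered context below.",
        "If the context does not contain the answer, say so.",
        "",
        "Context:",
    ]
    for i, doc in enumerate(docs, start=1):
        snippet = doc["content"][:max_chars_per_doc]
        lines.append(f"[{i}] {doc['title']}: {snippet}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


# One of the documents indexed earlier, used as sample input.
sample_docs = [
    {
        "title": "Oil Prices Hit Three-Year Low",
        "content": "Global oil demand, led by a slowdown in China, has caused a sharp "
                   "drop in oil prices, with Brent crude falling to $70 per barrel in "
                   "early September 2024.",
    }
]
grounded_prompt = build_grounded_prompt(
    "Was there any changes in global oil prices this year?", sample_docs
)
print(grounded_prompt)
```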



## Generating the response with the Google Gemini API

In [None]:
import shutil

import google.generativeai as genai

# Google Gemini API credentials
API_KEY = '<Your_API_key>'
genai.configure(api_key=API_KEY)

def generate_response(prompt):
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(prompt)
    return response.text

# Compare the LLM's answer without and with the retrieved context
print(" Without context ".center(int(shutil.get_terminal_size().columns * 0.75), '*'))
print(generate_response(query))

final_response = generate_response(enriched_prompt)
print(" With context ".center(int(shutil.get_terminal_size().columns * 0.75), '*'))
print(final_response)

***************************** Without context *****************************
I do not have access to real-time information, including constantly changing data like global oil prices. 

To get the most up-to-date information on global oil prices, I recommend checking reputable financial news sources like:

* **Bloomberg:** [https://www.bloomberg.com/](https://www.bloomberg.com/)
* **Reuters:** [https://www.reuters.com/](https://www.reuters.com/)
* **Financial Times:** [https://www.ft.com/](https://www.ft.com/)
* **The Wall Street Journal:** [https://www.wsj.com/](https://www.wsj.com/)

You can also use websites like:

* **Oilprice.com:** [https://oilprice.com/](https://oilprice.com/)
* **Investing.com:** [https://www.investing.com/](https://www.investing.com/)

These sources provide current prices, charts, and analysis of the oil market. 

******************************* With context ******************************
Yes, there have been significant changes in global oil prices this year. 
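
The Gemini cell above keeps the API key as a literal placeholder in the notebook. A safer pattern, sketched here as an assumption rather than as the notebook's own method, is to read the key from an environment variable and fail early when it is missing. The variable name `GOOGLE_API_KEY` and the helper `load_api_key` are hypothetical.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == "<Your_API_key>":
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API.")
    return key


# Usage with the configure call the notebook already makes:
# genai.configure(api_key=load_api_key())
```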
