<a href="https://colab.research.google.com/github/hamzafarooq/advanced-llms/blob/main/semantic%20cache/semantic_cache_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

If you use our code, please cite:

@misc{2024<br>
  title = {Semantic Cache from Scratch},<br>
  author = {Hamza Farooq, Darshil Modi, Kanwal Mehreen, Nazila Shafiei},<br>
  keywords = {Semantic Cache},<br>
  year = {2024},<br>
  copyright = {MIT, non-exclusive license}<br>
}

In [None]:
!pip install -U faiss-cpu sentence_transformers transformers

Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Collecting sentence_transformers
  Downloading sentence_transformers-2.5.1-py3-none-any.whl (156 kB)
Installing collected packages: faiss-cpu, sentence_transformers
Successfully installed faiss-cpu-1.8.0 sentence_transformers-2.5.1


In [None]:
import faiss
import sqlite3
from sentence_transformers import SentenceTransformer
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import numpy as np
from pprint import pprint





# Traversaal Ares API Overview

Traversaal Ares API provides real-time search results generated from user queries. Using Large Language Models (LLMs), Ares connects to the internet to return accurate, factual information, including relevant URLs for reference. The API is tailored for speed and efficiency, returning search results within 3-4 seconds. It is currently free during the beta phase, with paid plans coming soon.

## Key Features:
- **Real-time Search Results:** Ares API offers unparalleled speed in generating search results.
- **Internet Connectivity:** Connects to the internet to fetch the latest and most accurate information.
- **Lightning-Fast Response:** Delivers search results with URLs in 3-4 seconds.
- **Free Beta Access:** Available for free during the beta phase, with pricing plans to be introduced.
- **Factual and Accurate:** Ensures the information provided is accurate and supported by relevant references.

## Getting Started:
To access the Ares API, sign up at [api.traversaal.ai](https://api.traversaal.ai) and refer to the usage documentation at [docs.traversaal.ai](https://docs.traversaal.ai/docs/intro).

Experience the future of AI-driven search with Traversaal Ares API!


In [None]:
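import os
import requests

def make_prediction(data):
    url = "https://api-ares.traversaal.ai/live/predict"
    headers = {
        # Read the key from an environment variable instead of hard-coding it.
        # ARES_API_KEY is a name chosen for this notebook; set it to your own key.
        "x-api-key": os.environ.get("ARES_API_KEY", ""),
        "content-type": "application/json"
    }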

    payload = {"query": data}

    try:
        response = requests.post(url, json=payload, headers=headers)

        if response.status_code == 200:
            # The request was successful
            print("Request was successful.")
            # If the response contains JSON data, you can parse it using response.json()
            try:
                json_data = response.json()
                #print("Parsed JSON data:", json_data)
                return json_data
            except ValueError:
                print("No JSON data in the response.")
                return None
        else:
            # The request was not successful, handle the error
            print(f"Request failed with status code {response.status_code}.")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
        return None

# Example usage



In [None]:
response=make_prediction(['I am planning my 10th Anniversary, provide me a list of places in Boston which are quiet, private and climate controlled. '])

Request was successful.


In [None]:
response

{'data': {'response_text': "Here are some places in Boston that are quiet, private, and climate controlled for your 10th Anniversary:\n\n1. The Liberty Hotel: This historic hotel offers elegant and private event spaces with climate control for a quiet and intimate celebration.\n\n2. The Lenox Hotel: Located in the heart of Boston, The Lenox Hotel offers luxurious and private venues for a quiet anniversary celebration. Their event spaces are climate controlled for your comfort.\n\n3. The Taj Boston: This iconic hotel features elegant and private event spaces that are perfect for a quiet and intimate anniversary celebration. The venues are climate controlled to ensure your comfort.\n\n4. The Boston Harbor Hotel: With stunning waterfront views, this hotel offers private event spaces that are quiet and climate controlled. It's a perfect choice for a romantic anniversary celebration.\n\n5. The Fairmont Copley Plaza: This historic hotel offers elegant and private event spaces that are climat

In [None]:
pprint(response['data']['response_text'])

('Here are some places in Boston that are quiet, private, and climate '
 'controlled for your 10th Anniversary:\n'
 '\n'
 '1. The Liberty Hotel: This historic hotel offers elegant and private event '
 'spaces with climate control for a quiet and intimate celebration.\n'
 '\n'
 '2. The Lenox Hotel: Located in the heart of Boston, The Lenox Hotel offers '
 'luxurious and private venues for a quiet anniversary celebration. Their '
 'event spaces are climate controlled for your comfort.\n'
 '\n'
 '3. The Taj Boston: This iconic hotel features elegant and private event '
 'spaces that are perfect for a quiet and intimate anniversary celebration. '
 'The venues are climate controlled to ensure your comfort.\n'
 '\n'
 '4. The Boston Harbor Hotel: With stunning waterfront views, this hotel '
 "offers private event spaces that are quiet and climate controlled. It's a "
 'perfect choice for a romantic anniversary celebration.\n'
 '\n'
 '5. The Fairmont Copley Plaza: This historic hotel offers el

In [None]:
response['data']['web_url']

['https://sf.eater.com/maps/best-tacos-san-francisco',
 'https://www.sftravel.com/article/where-to-find-best-tacos-san-francisco',
 'https://lataco.com/san-francisco-best-tacos-guide',
 'https://www.reddit.com/r/AskSF/comments/16bn1w1/best_tacos_in_sf/',
 'https://www.femalefoodie.com/restaurant-reviews/best-tacos-in-san-francisco/',
 'https://www.yelp.com/search?find_desc=Street+Tacos&find_loc=San+Francisco%2C+CA',
 'https://traveloutlandish.com/blog/best-tacos-in-san-francisco-taquerias/',
 'https://www.foodtalkcentral.com/t/sf-chronicle-bay-area-tacos/15225',
 'https://www.yelp.com/search?find_desc=Tacos&find_loc=Outer+Sunset%2C+San+Francisco%2C+CA',
 'https://www.toasttab.com/local/san-francisco-ca-restaurants/dish/tacos']
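The outputs above suggest the response is a nested dictionary whose `data` object holds `response_text` and `web_url`. A minimal sketch of a helper that unpacks it defensively, assuming that shape holds for every successful call. The name `extract_answer` and the sample payload are illustrative and not part of the Ares API:

```python
from typing import Any, Optional


def extract_answer(result: Optional[dict[str, Any]]) -> tuple[str, list[str]]:
    """Return (response_text, web_urls) from an Ares-style response dict.

    Hypothetical helper: assumes the {'data': {'response_text', 'web_url'}}
    shape seen in the notebook output. Missing pieces fall back to empty values,
    which also covers the None that make_prediction returns on failure.
    """
    if not result:
        return "", []
    data = result.get("data") or {}
    return data.get("response_text", ""), list(data.get("web_url") or [])


# Made-up payload in the same shape as the real response
sample = {
    "data": {
        "response_text": "The capital of France is Paris.",
        "web_url": ["https://example.com/paris"],
    }
}

text, urls = extract_answer(sample)
print(text)
print(urls)
print(extract_answer(None))
```

Returning empty values instead of raising keeps the calling code simple, at the cost of hiding failed requests. Raise instead if a missing answer should stop the run.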

Instead of using an LLM endpoint, we use the Ares API for retrieval and generation. However, you can replace it with your own RAG function in the `generate_answer` method.

In [None]:
import faiss
import json
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM
import time

class SemanticCaching:
    def __init__(self, json_file='cache.json'):
        # Initialize a flat (exact-search) Faiss index over 768-dimensional vectors,
        # matching the all-mpnet-base-v2 embedding size.
        # Note: IndexFlatL2 reports squared Euclidean (L2) distances.
        self.index = faiss.IndexFlatL2(768)
        if self.index.is_trained:
            print('Index trained')

        # Initialize Sentence Transformer model
        self.encoder = SentenceTransformer('all-mpnet-base-v2')


        # Uncomment the following lines to use DialoGPT for question generation
        # self.tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
        # self.model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

        # Maximum (squared) Euclidean distance at which a cached question counts as a match
        self.euclidean_threshold = 0.3
        self.json_file = json_file
        self.load_cache()

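    def load_cache(self):
        # Load cache from JSON file, creating an empty cache if the file is not found
        try:
            with open(self.json_file, 'r') as file:
                self.cache = json.load(file)
        except FileNotFoundError:
            self.cache = {'questions': [], 'embeddings': [], 'answers': [], 'response_text': []}
        # Rebuild the in-memory index so its row ids line up with the cached rows
        if self.cache['embeddings']:
            self.index.add(np.array(self.cache['embeddings'], dtype='float32'))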

    def save_cache(self):
        # Save the cache to the JSON file
        with open(self.json_file, 'w') as file:
            json.dump(self.cache, file)

    def ask(self, question: str) -> str:
        # Method to retrieve an answer from the cache or generate a new one
        start_time = time.time()
        try:
            l = [question]
            embedding = self.encoder.encode(l)

            # Search for the nearest neighbor in the index
            D, I = self.index.search(embedding, 1)

            # A hit needs a valid neighbour (index -1 means none) within the threshold
            if I[0][0] != -1 and D[0][0] <= self.euclidean_threshold:
                row_id = int(I[0][0])
                print(f'Found cache in row: {row_id} with score {1 - D[0][0]}')
                end_time = time.time()
                elapsed_time = end_time - start_time
                print(f"Time taken: {elapsed_time} seconds")
                return self.cache['response_text'][row_id]

            # Handle the case when there are not enough results or Euclidean distance is not met
            answer, response_text = self.generate_answer(question)

            self.cache['questions'].append(question)
            self.cache['embeddings'].append(embedding[0].tolist())
            self.cache['answers'].append(answer)
            self.cache['response_text'].append(response_text)

            self.index.add(embedding)
            self.save_cache()
            end_time = time.time()
            elapsed_time = end_time - start_time
            print(f"Time taken: {elapsed_time} seconds")

            return response_text
        except Exception as e:
            # Chain the original exception so the full traceback is preserved
            raise RuntimeError(f"Error during 'ask' method: {e}") from e

    def generate_answer(self, question: str) -> tuple:
        # Generate an answer using a separate function (make_prediction in this case).
        # Returns a pair: (full API result, response text).
        try:
            result = make_prediction([question])
            response_text = result['data']['response_text']

            return result, response_text
        except Exception as e:
            raise RuntimeError(f"Error during 'generate_answer' method: {e}") from e
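To swap Ares for your own RAG pipeline, the replacement only has to honour the contract `ask` relies on: `generate_answer` returns a pair of the raw result and the response text. A minimal sketch of one way to do that, assuming a stub generator in place of a real pipeline. `local_generate_answer` and `PluggableGeneratorMixin` are hypothetical names introduced here for illustration:

```python
def local_generate_answer(question: str) -> tuple[dict, str]:
    """Stand-in for a real RAG pipeline (hypothetical).

    Returns (raw_result, response_text), the same pair that
    SemanticCaching.generate_answer returns in the notebook.
    """
    response_text = f"[stub answer for] {question}"
    # Mirror the Ares-style shape so anything reading cache['answers'] still works
    result = {"data": {"response_text": response_text, "web_url": []}}
    return result, response_text


class PluggableGeneratorMixin:
    """Overrides generate_answer; list it before SemanticCaching when subclassing."""

    def generate_answer(self, question: str) -> tuple[dict, str]:
        return local_generate_answer(question)


# In the notebook, where SemanticCaching is defined, this could be combined as:
#   class LocalSemanticCaching(PluggableGeneratorMixin, SemanticCaching):
#       pass
# (left as a comment so this block runs without Faiss or the encoder)

result, text = PluggableGeneratorMixin().generate_answer("What is the capital of France?")
print(text)
```

Keeping the return shape identical means `ask`, `save_cache`, and `load_cache` need no changes.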


In [None]:
cache = SemanticCaching()



Index trained




In [None]:
question1 = "What is the capital of France?"
answer1 = cache.ask(question1)
print(answer1)

# Question not seen before, generates answer from LLM

question2 = "Who is the CEO of Apple?"
answer2 = cache.ask(question2)
print(answer2)

# Stores question2, embedding and answer2 in cache

question3 = "Who is the CEO of Facebook?"
answer3 = cache.ask(question3)
print(answer3)

# With the 0.3 distance threshold, question3 is not close enough to question2,
# so a new answer is generated instead of returning the cached answer2

Request was successful.
Time taken: 2.2254648208618164 seconds
The capital of France is Paris.
Request was successful.
Time taken: 0.8209726810455322 seconds
The CEO of Apple is Timothy Donald Cook. He became the CEO in 2011, succeeding Steve Jobs. Cook joined Apple in 1998 and held various executive positions before becoming CEO. He is known for his successful streamlining of the company's supply chain and operations. Cook has also been involved in philanthropy and advocacy for political reform, cybersecurity, and environmental preservation.
Request was successful.
Time taken: 1.2991752624511719 seconds
The CEO of Facebook is Mark Zuckerberg.
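The hit-or-miss decision behind these outputs can be illustrated without Faiss or an encoder. The following is a toy sketch, not the notebook's implementation. It assumes hand-made 2-D unit vectors in place of real embeddings and a brute-force scan in place of the index, and `ToySemanticCache` is a hypothetical name:

```python
import math


class ToySemanticCache:
    """Brute-force nearest-neighbour cache with a squared-L2 threshold."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # maximum squared L2 distance for a hit
        self.vectors: list[list[float]] = []
        self.answers: list[str] = []

    @staticmethod
    def _squared_l2(a, b) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def lookup(self, vec):
        """Return (row_id, distance) of the nearest entry if within threshold, else None."""
        best = None
        for row_id, stored in enumerate(self.vectors):
            dist = self._squared_l2(vec, stored)
            if best is None or dist < best[1]:
                best = (row_id, dist)
        if best is not None and best[1] <= self.threshold:
            return best
        return None

    def add(self, vec, answer: str) -> None:
        self.vectors.append(list(vec))
        self.answers.append(answer)


toy = ToySemanticCache()
stored_vec = (1.0, 0.0)                       # pretend embedding of a cached question
near_vec = (math.cos(0.2), math.sin(0.2))     # a close paraphrase
far_vec = (0.0, 1.0)                          # an unrelated question

print(toy.lookup(stored_vec))                 # empty cache: miss
toy.add(stored_vec, "cached answer")
print(toy.lookup(near_vec) is not None)       # close enough: hit
print(toy.lookup(far_vec) is None)            # too far: miss, would call the generator
```

Lowering the threshold makes the cache stricter (fewer wrong hits, more API calls), and raising it does the opposite.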


In [None]:
answer4 = cache.ask("What is the Capital of India")
print(answer4)

Request was successful.
Time taken: 1.8540325164794922 seconds
The capital of India is New Delhi.


In [None]:
answer4 = cache.ask("Can you tell me what is the Capital of India")
print(answer4)

Found cache in row: 3 with score 0.80598483979702
Time taken: 0.07313203811645508 seconds
The capital of India is New Delhi.
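The printed score is `1 - D`, where `D` is the distance Faiss reports. `IndexFlatL2` reports squared L2 distances, so for unit-length vectors the cosine similarity is `1 - D/2` rather than `1 - D`. A short worked check, assuming the encoder returns unit-length embeddings (verify this for your model, for example with `np.linalg.norm`). The helper name is illustrative:

```python
import math


def cosine_from_squared_l2(d2: float) -> float:
    """For unit-length a and b: ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d2/2."""
    return 1.0 - d2 / 2.0


# Sanity check on two unit vectors 30 degrees apart
a = (1.0, 0.0)
b = (math.cos(math.radians(30)), math.sin(math.radians(30)))
d2 = sum((x - y) ** 2 for x, y in zip(a, b))
cos_ab = sum(x * y for x, y in zip(a, b))
print(abs(cosine_from_squared_l2(d2) - cos_ab) < 1e-12)

# The hit above printed a score of 0.80598..., i.e. a squared distance of about 0.194
printed_score = 0.80598483979702
d2_hit = 1.0 - printed_score
print(round(cosine_from_squared_l2(d2_hit), 3))  # 0.903
```

Under the same assumption, the 0.3 threshold corresponds to a cosine similarity of at least 0.85.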


In [None]:
print(cache.ask('Who is the CEO of Facebook?'))

Found cache in row: 2 with score 1.0
Time taken: 0.07919716835021973 seconds
The CEO of Facebook is Mark Zuckerberg.


In [None]:
print(cache.ask('Who is the current CEO of Google?'))

Request was successful.
Time taken: 2.804334878921509 seconds
The current CEO of Google is Sundar Pichai.


In [None]:
print(cache.ask('Is Sundar Pichai the CEO of Google?'))

Request was successful.
Time taken: 2.261371612548828 seconds
Yes, Sundar Pichai is the CEO of Google.


In [None]:
print(cache.ask('Best local food spots in Edinburgh for a couple?'))

Found cache in row: 6 with score 0.8507776856422424
Time taken: 0.08127784729003906 seconds
Here are some of the best local food spots in Edinburgh:

1. Baba: This restaurant offers exquisite Levantine cuisine with a contemporary Scottish twist. Their mezze platters and slow-cooked lamb shoulder are highly recommended.

2. Dishoom: Known for its long queues, Dishoom is a favorite among locals and visitors alike. It offers delicious Indian cuisine and is particularly famous for its lunch reservations.

3. Purslane: If you're looking for a splurge, Purslane is a great choice. This restaurant specializes in seafood and offers fabulous dishes with excellent service.

4. Mussel Inn: For seafood lovers, Mussel Inn is a must-visit. They serve fantastic seafood dishes in a casual setting.

5. Gordon's Trattoria: This small family-run Italian restaurant on the Royal Mile is highly recommended for its authentic Italian food. It's a favorite among locals and visitors alike.

6. The Piemaker: If y

In [None]:
print(cache.ask('Best local food spots in Edinburgh?'))

Found cache in row: 4 with score 1.0
Time taken: 0.0793464183807373 seconds
Here are some of the best local food spots in Edinburgh:

1. Baba: This restaurant offers exquisite Levantine cuisine with a contemporary Scottish twist. Their mezze platters and slow-cooked lamb shoulder are highly recommended.

2. Dishoom: Known for its long queues, Dishoom is a favorite among locals and visitors alike. It offers delicious Indian cuisine and is particularly famous for its lunch reservations.

3. Purslane: If you're looking for a splurge, Purslane is a great choice. This restaurant specializes in seafood and offers fabulous dishes with excellent service.

4. Mussel Inn: For seafood lovers, Mussel Inn is a must-visit. They serve fantastic seafood dishes in a casual setting.

5. Gordon's Trattoria: This small family-run Italian restaurant on the Royal Mile is highly recommended for its authentic Italian food. It's a favorite among locals and visitors alike.

6. The Piemaker: If you're in the moo

In [None]:
print(cache.ask('Best local food spots in London?'))

Request was successful.
Time taken: 1.5911924839019775 seconds
Here are some of the best local food spots in London:

1. The Laundry - Located in Brixton, this restaurant offers classic dishes with originality and flair. Try their succulent roasted pork belly and cured day-boat seabass.

2. SW16 Bar and Kitchen - Situated in Streatham, this Italian restaurant welcomes pets, children, and noisy friends. Enjoy their rich slow-cooked lamb ragu tagliatelle and delicious cocktails.

3. Plaquemine Lock - This Cajun and Creole restaurant in Angel serves up hearty and flavorsome dishes inspired by the cuisine of New Orleans. Don't miss their gumbo, buttermilk fried chicken, and beignets.

4. Brawn - Located in Columbia Road, this neighborhood restaurant offers a daily menu of seasonal, European-inspired dishes. Try their hand-made pasta and creamy Tiramisu.

5. Gold - Notting Hill's Gold restaurant offers a British-tapas style menu with inventive dishes. Don't miss their burrata, mushrooms on 

In [None]:
print(cache.ask('Best local food spots in London?'))

Found cache in row: 7 with score 1.0
Time taken: 0.06894540786743164 seconds
Here are some of the best local food spots in London:

1. The Laundry - Located in Brixton, this restaurant offers classic dishes with originality and flair. Try their succulent roasted pork belly and cured day-boat seabass.

2. SW16 Bar and Kitchen - Situated in Streatham, this Italian restaurant welcomes pets, children, and noisy friends. Enjoy their rich slow-cooked lamb ragu tagliatelle and delicious cocktails.

3. Plaquemine Lock - This Cajun and Creole restaurant in Angel serves up hearty and flavorsome dishes inspired by the cuisine of New Orleans. Don't miss their gumbo, buttermilk fried chicken, and beignets.

4. Brawn - Located in Columbia Road, this neighborhood restaurant offers a daily menu of seasonal, European-inspired dishes. Try their hand-made pasta and creamy Tiramisu.

5. Gold - Notting Hill's Gold restaurant offers a British-tapas style menu with inventive dishes. Don't miss their burrata,