<a href="https://colab.research.google.com/github/andrew128/graph-rag-playground/blob/main/graph_rag_experimentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [9]:
!pip install -q networkx transformers torch accelerate faiss-cpu numpy datasets

import networkx as nx
import faiss
import numpy as np
from typing import List
from transformers import pipeline



In [20]:
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...


<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
I don't have a physical body to eat, but I can tell you that humans typically eat about 100-150 grams (about 3.5-4.5 ounces) of food per meal, which includes everything from fruits and vegetables to proteins, grains, and dairy products. However, the exact number of helicopters you can eat depends on your weight and appetite. If you're a healthy person, you might be able to eat a few more than that, but it's better to err on the side of caution and aim for less than 100 grams per meal.
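
The printed `generated_text` above contains the full templated prompt followed by the model's reply. A minimal sketch of pulling out only the reply, assuming the `<|assistant|>` marker and `</s>` token seen in the output above. `extract_reply` is a hypothetical helper, not part of transformers; passing `return_full_text=False` to the text-generation pipeline call should make it unnecessary.

```python
def extract_reply(generated_text: str, marker: str = "<|assistant|>") -> str:
    """Return the text after the last assistant marker (hypothetical helper)."""
    # rpartition splits on the LAST occurrence of the marker, so earlier turns
    # are ignored. If the marker is missing, the whole string is returned.
    _, _, reply = generated_text.rpartition(marker)
    # Drop an end-of-sequence token if the model emitted one.
    return reply.replace("</s>", "").strip()


sample = (
    "<|system|>\nYou are a friendly chatbot</s>\n"
    "<|user|>\nHow many helicopters can a human eat in one sitting?</s>\n"
    "<|assistant|>\nI don't have a physical body to eat."
)
print(extract_reply(sample))
```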


In [21]:
class BasicGraphRag():
  """
  Basic graph RAG app: takes input text, chunks it, embeds the chunks, and adds them to a graph.

  TODO: include logging and metrics
  TODO: think about how to refactor this behind an interface for GraphRagApp,
    or an abstract class that holds the logging and report-metric implementations
    - e.g. a method generate() that calls generate_impl(), where generate() does the latency logging
    - generate_impl() is overridden by the subclasses
    - would probably need to move the instance variables into a GraphRagConfig class
    - maybe have a few top-level interface functions and nothing else on the abstract class
    - then have a separate evaluator class that takes in the data, defines some metrics, runs the graph rag app, and generates a report
    - but don't create this abstract class until there is more than one app, because e.g. it's unclear how to handle the LLM: one LLM may be wanted for different things,
    - or different LLMs for different parts
  """
  def __init__(self):
    self.graph = nx.Graph()
    self.embedding = None
    self.vector_store = None
    self.llm = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")
    self.message_history = []

  def query(self, text: str) -> str:
    """
    Top-level user entry point: retrieve texts relevant to the query, then generate an answer from them.
    """
    relevant_texts = self._retrieve(text)
    return self._generate(text, relevant_texts)

  def _generate(self, text: str, retrieved_texts: List[str]) -> str:
    messages = [
        {
            "role": "system",
            "content": f'You are a friendly chatbot. Here is some additional information that may be relevant to the user: {" ".join(retrieved_texts)}',
        },
        {"role": "user", "content": text},
    ]
    # Use this instance's pipeline (self.llm) rather than the notebook-level `pipe` global.
    prompt = self.llm.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = self.llm(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    return outputs[0]["generated_text"]

    # Earlier draft (unused): send one prompt string combining the user question and the retrieved context.

  def index(self, text: str) -> None:
    # chunk, then embed each chunk, then store in vector db
    pass

  def _retrieve(self, text: str) -> List[str]:
    input_embedding = self._embed(text)
    # retrieve from vector store
    return []

  def _embed(self, text: str) -> List[float]: # embeddings are floats; maybe turn this into a numpy array
    # call embedding api
    pass
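
The refactor outlined in the class docstring TODO (a public `generate()` that does the latency logging and delegates to an overridable `generate_impl()`) could look roughly like the sketch below. `GraphRagApp`, `EchoGraphRag`, and the `latencies` list are hypothetical names chosen for illustration, and the toy subclass makes no LLM call.

```python
import time
from abc import ABC, abstractmethod
from typing import List


class GraphRagApp(ABC):
    """Hypothetical base class: public methods add timing, subclasses do the work."""

    def __init__(self) -> None:
        self.latencies: List[float] = []  # seconds taken by each generate() call

    def generate(self, text: str, retrieved_texts: List[str]) -> str:
        # Template method: time the call and delegate to the subclass hook.
        start = time.perf_counter()
        try:
            return self.generate_impl(text, retrieved_texts)
        finally:
            self.latencies.append(time.perf_counter() - start)

    @abstractmethod
    def generate_impl(self, text: str, retrieved_texts: List[str]) -> str:
        """Subclasses override this with the actual LLM call."""


class EchoGraphRag(GraphRagApp):
    """Toy subclass with no LLM, only to show the override."""

    def generate_impl(self, text: str, retrieved_texts: List[str]) -> str:
        return f"{text} | context: {' '.join(retrieved_texts)}"


app = EchoGraphRag()
print(app.generate("what is bob", ["bob is a blob"]))
print(len(app.latencies))
```

Keeping the timing in the base class means each new app only has to implement `generate_impl()`, which matches the note in the docstring about deferring the abstraction until a second app exists.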

In [22]:
graph_rag_app = BasicGraphRag()
graph_rag_app.index('bob is a blob')
print(graph_rag_app.query('what is bob'))


<|system|>
You are a friendly chatbot. Here is some additional information that may be relevant to the user: </s>
<|user|>
what is bob</s>
<|assistant|>
Bob is a fictional chatbot created by researchers at the University of California, Berkeley. Bob is designed to help users with various tasks, including but not limited to ordering food online, finding local businesses, and answering general questions. Bob is programmed to understand natural language and respond appropriately to user queries. He has a friendly and conversational tone and provides helpful information in a clear and concise manner. Bob's purpose is to make life easier for users by providing them with a reliable and efficient chatbot solution.
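
The system message in the output above ends with an empty context because `index()` and `_retrieve()` are still stubs, so the answer is not grounded in 'bob is a blob'. A minimal sketch of the chunk, embed, store, retrieve flow follows, using only the standard library so it runs anywhere. As loud assumptions: a normalised bag-of-words vector stands in for a real embedding model, a brute-force cosine scan stands in for a FAISS index, and `chunk_text`, `embed`, and `TinyVectorStore` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def chunk_text(text: str, max_words: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Dict[str, float]:
    """Sparse, L2-normalised bag-of-words vector (stand-in for an embedding model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


class TinyVectorStore:
    """Brute-force store; a FAISS index would replace the linear scan."""

    def __init__(self) -> None:
        self.items: List[Tuple[Dict[str, float], str]] = []

    def index(self, text: str) -> None:
        # Chunk the text, embed each chunk, and keep (vector, chunk) pairs.
        for chunk in chunk_text(text):
            self.items.append((embed(chunk), chunk))

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = sorted(
            self.items,
            key=lambda item: sum(q.get(w, 0.0) * v for w, v in item[0].items()),
            reverse=True,
        )
        return [chunk for _, chunk in scored[:k]]


store = TinyVectorStore()
store.index("bob is a blob")
store.index("the weather today is sunny")
print(store.retrieve("what is bob"))
```

With something like this behind `_retrieve()`, the joined chunks would fill the currently empty slot in the system message.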


In [None]:
class Evaluator():
  """
  Class for running evaluation metrics
  """
  def __init__(self):
    pass

  def evaluate(self, text, graph_rag):
    pass
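
One possible shape for `evaluate`, following the docstring TODO about latency and report metrics: run each question through the app's `query()`, time the call, and check whether an expected keyword appears in the answer. The keyword-hit metric, the `(question, expected_keyword)` case format, and the `FakeRag` and `KeywordEvaluator` names are all assumptions for illustration, not the notebook's design.

```python
import time
from typing import Dict, List, Tuple


class FakeRag:
    """Stand-in for BasicGraphRag so the sketch runs without a model."""

    def query(self, text: str) -> str:
        return "bob is a blob" if "bob" in text else "I don't know"


class KeywordEvaluator:
    """Hypothetical evaluator: keyword-hit rate plus mean latency."""

    def evaluate(self, cases: List[Tuple[str, str]], graph_rag) -> Dict[str, float]:
        hits = 0
        latencies = []
        for question, expected_keyword in cases:
            start = time.perf_counter()
            answer = graph_rag.query(question)
            latencies.append(time.perf_counter() - start)
            # Case-insensitive substring match is a crude proxy for correctness.
            hits += expected_keyword.lower() in answer.lower()
        n = len(cases)
        return {
            "hit_rate": hits / n if n else 0.0,
            "mean_latency_s": sum(latencies) / n if n else 0.0,
        }


report = KeywordEvaluator().evaluate(
    [("what is bob", "blob"), ("what is alice", "human")], FakeRag()
)
print(report["hit_rate"])
```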

In [19]:
# Install required packages
!pip install transformers torch accelerate -q

# Import libraries
from transformers import pipeline

# Create a text generation pipeline
# This will download and use a smaller model suitable for Colab
generator = pipeline('text-generation', model='TinyLlama/TinyLlama-1.1B-Chat-v1.0')
# Or use Mistral-7B if you have more RAM:
# generator = pipeline('text-generation', model='mistralai/Mistral-7B-v0.1')

# Generate text
response = generator(
    "Write a short story about a robot",
    max_length=200,
    do_sample=True,
    temperature=0.7
)
print(response[0]['generated_text'])

# For chat-style interactions
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

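# Chat function
# Note: the prompt is passed to the model as-is, without the tokenizer's chat template,
# so the model continues the raw text and the decoded output starts with the prompt itself.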
def chat(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=200, do_sample=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
print(chat("What is Python programming language?"))



Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Write a short story about a robot who overcomes its programming error to save humanity from a devastating disaster. The robot should be autonomous and capable of decision-making, but it must also have a human-like intelligence to provide emotional depth to the story. Use a descriptive and suspenseful writing style to create a sense of urgency, and incorporate themes of sacrifice, resilience, and humanity's connection to technology. Consider the potential consequences of the robot's actions on human society and the environment.
What is Python programming language?
<|assistant|>
Python programming language is an interpreted, object-oriented programming language that is generally used for creating interactive and user-friendly desktop applications. It is known for its efficiency, ease of use, and compatibility with many different operating systems. Unlike other popular programming languages like C or Java, Python is not designed for high performance, but rather, it favors simplicity in de