# Homework: Implementing a RAG Example with Ollama and Mistral LLM

## Overview

In this homework, you will work on a practical application of the DataStax RAGStack. The goal is to modify this Jupyter Notebook, which currently uses an OpenAI LLM (Large Language Model) for a RAG example. Your task is to adapt the notebook to use Ollama running the Mistral LLM as the backbone of the RAG implementation.

## Why Ollama?

Ollama makes it possible to run an LLM on a local machine. Self-managed LLMs are of particular interest to customers running Cassandra or DSE on-premises or in internet-restricted environments, and to users of Cassandra, DSE, and Astra DB who are cautious about sending sensitive data to cloud-based LLM services because of privacy and cost concerns. Because the model runs locally, the data stays within an environment the business controls, which suits both demonstrations and organizations handling critical data.

For those seeking self-managed LLMs, alternatives like Mistral are available, offering performance comparable to OpenAI's models. Interested parties are encouraged to [review the Mistral documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mistral). Mistral is designed for easy installation and can be efficiently hosted on the robust computing resources available in customer data centers.

## Objectives

1. **Understand the Current Implementation**: Begin by familiarizing yourself with the existing Jupyter Notebook. It uses DataStax's RAGStack, integrating Astra DB as a vector store, and employs an OpenAI LLM for generating responses.

2. **Transition to Ollama and Mistral LLM**: Your primary task is to modify the code in the notebook to replace the OpenAI LLM with Ollama running Mistral LLM. This will involve understanding the differences between the two models and adapting the API calls and data handling accordingly.

3. **Test and Validate**: Keep in mind that both the notebook and Ollama run on your local machine. After implementing the changes, test the notebook to ensure that it works correctly with the new LLM.
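
For the test-and-validate step, it can help to confirm that the local Ollama server is reachable and that the Mistral model has been pulled before running the notebook. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists locally available models. The helper names `fetch_local_models` and `model_available` are illustrative and not part of any library.

```python
import json
import urllib.request


def fetch_local_models(base_url="http://localhost:11434", timeout=5):
    """Return the names of models the Ollama server reports via /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def model_available(model_names, wanted="mistral"):
    """True if any local model matches `wanted`, ignoring the ':tag' suffix."""
    return any(name.split(":", 1)[0] == wanted for name in model_names)


# Example use (needs a running Ollama server, so it is left commented out):
# try:
#     print("mistral available:", model_available(fetch_local_models()))
# except OSError as exc:  # connection refused, timeout, DNS failure, ...
#     print(f"Ollama server not reachable: {exc}")
```

If the notebook runs inside a container, pass the host's address as `base_url` instead of the default.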

## Resources

- **Ollama Documentation**: [How to install Ollama](https://ollama.com/download) and [How to run Mistral LLM powered by Ollama](https://ollama.com/library/mistral/tags)

## Submission Guidelines

- Complete the task in the provided Jupyter Notebook.
- Ensure all code cells are well-documented.
- Submit the final notebook along with a brief report summarizing your approach, key challenges, and solutions.

Good luck, and feel free to reach out if you have any questions or need further clarifications!


In [1]:
!pip install ragstack-ai sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-2.5.1-py3-none-any.whl.metadata (11 kB)
Collecting transformers<5.0.0,>=4.32.0 (from sentence-transformers)
  Downloading transformers-4.38.2-py3-none-any.whl.metadata (130 kB)
Collecting torch>=1.11.0 (from sentence-transformers)
  Downloading torch-2.2.1-cp311-cp311-manylinux2014_aarch64.whl.metadata (25 kB)
Collecting scikit-learn (from sentence-transformers)
  Downloading scikit_learn-1.4.1.post1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (11 kB)
Collecting scipy (from sentence-transformers)
  Downloading scipy-1.12.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (60 kB)
Collecting huggingface-hub>=0.15.

In [8]:
ASTRA_DB_API_ENDPOINT = input("Astra DB API Endpoint:")

Astra DB API Endpoint: https://1014346a-a40c-4d1a-b1a3-78769cc72312-us-east1.apps.astra.datastax.com


In [3]:
import getpass

ASTRA_DB_APPLICATION_TOKEN = getpass.getpass("Enter your ASTRA_DB_APPLICATION_TOKEN: ")

Enter your ASTRA_DB_APPLICATION_TOKEN:  ········


In [9]:
from astrapy.db import AstraDB

db = AstraDB(
  token=ASTRA_DB_APPLICATION_TOKEN,
  api_endpoint=ASTRA_DB_API_ENDPOINT)

print(f"Connected to Astra DB: {db.get_collections()}")

Connected to Astra DB: {'status': {'collections': ['book_collection', 'langchain_message_store', 'pdf_collection', 'vector_context_datastax', 'vector_test']}}


In [10]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [7]:
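import requests

# Download the sample essay and save it locally for the document loader
response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt')
response.raise_for_status()  # fail early if the download did not succeed
text = response.text

# The context manager closes the file even if the write fails
with open('essay.txt', 'w') as f:
    f.write(text)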

In [11]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import AstraDB
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain

In [14]:
from langchain_community.llms import Ollama

ollama = Ollama(model="mistral")

In [15]:
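# Point the client at the Ollama server on the Docker host.
# Assumes the notebook runs inside a container; 11434 is Ollama's default port.
ollama.base_url = 'http://host.docker.internal:11434'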
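
The base URL above is specific to a notebook running inside Docker. A minimal sketch of making it configurable follows. It assumes a hypothetical environment variable `OLLAMA_BASE_URL` (the name is an illustration, not something this notebook, Ollama, or LangChain is known to read) and falls back to Ollama's default local address.

```python
import os

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def resolve_ollama_base_url(env=None):
    """Pick the Ollama base URL from a (hypothetical) OLLAMA_BASE_URL variable."""
    env = os.environ if env is None else env
    url = env.get("OLLAMA_BASE_URL", "").strip()
    # Drop a trailing slash so later path joins do not produce '//'.
    return url.rstrip("/") if url else DEFAULT_OLLAMA_URL


# Usage in the notebook (requires langchain_community):
# ollama = Ollama(model="mistral", base_url=resolve_ollama_base_url())
```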

In [16]:
ollama.invoke("Why is the sky blue?")

" The color of the sky appears blue due to a process called Rayleigh scattering. When the sun's rays enter Earth's atmosphere, they are scattered in all directions by gas molecules and particles present in the air. Blue light has a shorter wavelength and gets scattered more easily than other colors such as red or yellow. As a result, when we look up at the sky, we predominantly see the blue light that has been scattered, making the sky appear blue to our eyes."

In [18]:
# Load data
loader = TextLoader("essay.txt")
docs = loader.load()

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)

# Define the embedding model
embeddings = HuggingFaceEmbeddings()

# Embed the chunks and store them in an Astra DB collection
#vector = load_vector_store()
vector = AstraDB.from_documents(
    documents,
    embeddings,
    collection_name="essay_for_ollama",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)
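
To make the chunking step concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that class also tries to split on separators such as paragraph and line breaks). The function name and the small sizes are illustrative only.

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into chunk_size-character windows sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which tends to help retrieval.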

In [19]:
# Define a retriever interface
retriever = vector.as_retriever()

# Define prompt template
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

# Create a retrieval chain to answer questions
document_chain = create_stuff_documents_chain(ollama, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)
response = retrieval_chain.invoke({"input": "What were the two main things the author worked on before college?"})
print(response["answer"])

 The two main things the author worked on before college were Interleaf, a software company where he did Lisp programming, and freelance Lisp programming for other clients. He also wrote a Lisp book during this time.
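
Conceptually, the "stuff" documents chain places the text of every retrieved document into the `{context}` slot of the prompt before calling the LLM. The plain-Python sketch below mimics that step with ordinary string formatting. The helper name `stuff_prompt` and the sample passages are made up for illustration, and the blank-line separator is an assumption rather than a statement about the library's internals.

```python
PROMPT_TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(passages, question, separator="\n\n"):
    """Join retrieved passages and fill them into the prompt template."""
    context = separator.join(passages)
    return PROMPT_TEMPLATE.format(context=context, input=question)


# Made-up passages standing in for retriever results.
example = stuff_prompt(
    ["First retrieved passage.", "Second retrieved passage."],
    "What is this about?",
)
print(example)
```

Because everything retrieved is placed into a single prompt, the quality of the answer depends heavily on which chunks the retriever returns.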


In [20]:
# Ask the same question in Japanese
response = retrieval_chain.invoke({"input": "著者が大学入学前に取り組んだ主な 2 つのことは何ですか?"})
print(response["answer"])

 The author worked mainly on two things before college: writing and programming. He wrote short stories but they were awful, with hardly any plot. The first programs he tried writing were on an IBM 1401 computer using a punch card system. He couldn't figure out what to do with it and his clearest memory is of the moment he learned it was possible for programs not to terminate, which was both a technical and social error. With microcomputers, everything changed as you could have a computer sitting right in front of you that could respond to your keystrokes as it ran instead of just churning through a stack of punch cards and then stopping. The first friend of the author's to get a microcomputer built it himself using a kit called Heathkit. Computers were expensive in those days and it took the author years of nagging before his father finally bought one, a TRS-80, around 1980. From then on he really started programming, writing simple games, programs to predict how high his model rocket

The response to the Japanese question above is unusable: the model answered in English rather than Japanese.
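
One way to catch this kind of language mismatch automatically during testing is to check whether the answer contains any Japanese characters at all. The sketch below is a rough heuristic, assuming that Hiragana, Katakana, or CJK ideographs in the text indicate a Japanese answer. The function name is illustrative.

```python
def contains_japanese(text):
    """Heuristic: True if text has Hiragana, Katakana, or CJK ideographs."""
    ranges = (
        (0x3040, 0x309F),  # Hiragana
        (0x30A0, 0x30FF),  # Katakana
        (0x4E00, 0x9FFF),  # CJK Unified Ideographs (shared with Chinese)
    )
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


question = "著者が大学入学前に取り組んだ主な 2 つのことは何ですか?"
answer = "The author worked mainly on two things before college: writing and programming."

print(contains_japanese(question))  # → True
print(contains_japanese(answer))    # → False
```

A check like this only detects the script, not whether the answer is correct, so it complements rather than replaces reading the output.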