In [9]:
import numpy as np
import faiss
from langchain_community.document_loaders import CSVLoader
from langchain_community.llms import Ollama
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser



This code uses several different models at different steps of the RAG (Retrieval-Augmented Generation) pipeline:

1. **Embedding Model**: 
   - `OllamaEmbeddings(model="nomic-embed-text:latest")`
   - This uses the Nomic Embed Text model running through Ollama to convert text into vector embeddings
   - Used for both document embedding and query embedding

2. **Vector Database**: 
   - `faiss.IndexFlatL2` 
   - Not a machine learning model per se, but a vector similarity search index from Facebook AI Similarity Search (FAISS)
   - Performs exact (brute-force) search by squared L2 (Euclidean) distance; lower values mean more similar vectors

3. **Language Model (LLM)**:
   - `Ollama(model="llama3")`
   - This uses Llama 3 running locally through Ollama
   - Generates the final response based on the retrieved context and the user's question
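`IndexFlatL2` performs no approximation. It compares the query with every stored vector and returns the k smallest *squared* Euclidean distances (FAISS skips the square root), so values such as the `0.66` printed for Leon Edwards below are squared distances and only their ordering matters. Below is a minimal NumPy sketch of the same exhaustive search. The random vectors stand in for real embeddings, and `flat_l2_search` is a hypothetical helper, not a FAISS function.

```python
import numpy as np


def flat_l2_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Brute-force k-nearest-neighbor search by squared L2 distance.

    Returns (distances, indices) shaped (n_queries, k), like faiss IndexFlatL2.search.
    """
    # Squared distance ||q - d||^2 for every query/database pair
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    # Keep the k smallest per query, sorted from nearest to farthest
    idx = np.argsort(sq_dists, axis=1)[:, :k]
    dists = np.take_along_axis(sq_dists, idx, axis=1)
    return dists.astype("float32"), idx


rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 8)).astype("float32")  # stand-in "document embeddings"
query = docs[42:43] + 0.01                           # a query very close to document 42
distances, indices = flat_l2_search(docs, query, k=3)
print(indices[0][0])  # 42: the nearest document comes first
```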

The workflow is:
1. Documents are embedded using Nomic Embed Text
2. User queries are embedded using the same model
3. FAISS finds the most similar document embeddings to the query embedding
4. The retrieved documents are combined into context
5. Llama 3 generates a response using the context and question
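The five steps can be traced end to end without Ollama or FAISS. This is an illustrative sketch only. `toy_embed` is a hypothetical bag-of-words embedding (word counts over a vocabulary built from the documents) standing in for Nomic Embed Text. The ranking uses the same squared-L2 criterion as `IndexFlatL2`, and the final `prompt` is simply the text an LLM would receive.

```python
import math


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign a column to every lower-cased word seen in the documents."""
    vocab: dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def toy_embed(text: str, vocab: dict[str, int]) -> list[float]:
    """Hypothetical stand-in for an embedding model: L2-normalized word counts."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:  # words outside the vocabulary are ignored
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the quantity IndexFlatL2 ranks by."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


docs = [
    "Fighter: leon edwards\nWeight Class: welterweight",
    "Fighter: merab dvalishvili\nWeight Class: bantamweight",
    "Fighter: sean woodson\nWeight Class: featherweight",
]
question = "Tell me about leon edwards"

vocab = build_vocab(docs)
doc_vectors = [toy_embed(d, vocab) for d in docs]   # 1. embed the documents
query_vector = toy_embed(question, vocab)           # 2. embed the query the same way
ranked = sorted(range(len(docs)), key=lambda i: sq_l2(doc_vectors[i], query_vector))
top_k = ranked[:2]                                  # 3. nearest documents first
context = "\n\n".join(docs[i] for i in top_k)       # 4. combine into one context string
prompt = f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"  # 5. text for the LLM
print(top_k[0])  # 0, the Leon Edwards document
```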

This is a fully local RAG implementation, with all models running on your machine through Ollama.


In [2]:
# Load data
loader = CSVLoader('data-raw/fighter_info.csv')
data = loader.load()

# Embed documents
embeddings = OllamaEmbeddings(model="nomic-embed-text:latest")
embedded_doc = embeddings.embed_documents([text.page_content for text in data])

# Convert to a float32 numpy array (FAISS stores float32 vectors)
embedded_doc = np.array(embedded_doc, dtype="float32")

# Create a FAISS index
index = faiss.IndexFlatL2(embedded_doc.shape[1])

# Add vectors to the index
index.add(embedded_doc)

# Save embeddings and index (run this before closing kernel)
np.save('embedded_doc.npy', embedded_doc)
faiss.write_index(index, 'fighter_index.faiss')


In [3]:
data

[Document(metadata={'source': 'data-raw/fighter_info.csv', 'row': 0}, page_content="Fighter: ottman azaitar\nNickname: bulldozer\nBirth Date: feb 20, 1990\nNationality: germany\nHometown: cologne\nAssociation: jupps fight team\nWeight Class: lightweight\nHeight: 5'7\nWins: 13\nLosses: 3\nWin_Decision: 1\nWin_KO: 10\nWin_Sub: 2\nLoss_Decision: 0\nLoss_KO: 3\nLoss_Sub: 0\nFighter_ID: 173073"),
 Document(metadata={'source': 'data-raw/fighter_info.csv', 'row': 1}, page_content="Fighter: sean woodson\nNickname: the sniper\nBirth Date: jun 7, 1992\nNationality: united states\nHometown: st. louis, missouri\nAssociation: wolves' den training center\nWeight Class: featherweight\nHeight: 6'2\nWins: 13\nLosses: 1\nWin_Decision: 8\nWin_KO: 4\nWin_Sub: 1\nLoss_Decision: 0\nLoss_KO: 0\nLoss_Sub: 1\nFighter_ID: 235563"),
 Document(metadata={'source': 'data-raw/fighter_info.csv', 'row': 2}, page_content="Fighter: jairzinho rozenstruik\nNickname: bigi boy\nBirth Date: mar 17, 1988\nNationality: surinam

In [4]:
index

<faiss.swigfaiss.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x1379914a0> >

In [5]:
embedded_doc.shape

(2646, 768)

In [6]:
# Generate a query vector
# query_text = "Find and tell me Jon Jones method of victories from his career"
query_text = "Tell me about Leon Edwards"
query_vector = embeddings.embed_query(query_text)
query_vector = np.array([query_vector])

# Perform similarity search
distances, indices = index.search(query_vector, k=5)

# Print results
print("Similarity distances:", distances)
print("Indices of similar documents:", indices)

Similarity distances: [[0.6606034  0.8654603  0.871382   0.92637384 0.94489515]]
Indices of similar documents: [[ 549 1242 1723 1709  337]]


In [7]:
# Iterate over results and print matching documents
for i, idx in enumerate(indices[0]):
    print(f"Document {i+1} (Index: {idx}):")
    print(data[idx])

Document 1 (Index: 549):
page_content='Fighter: leon edwards
Nickname: rocky
Birth Date: aug 25, 1991
Nationality: england
Hometown: erdington, west midlands
Association: team renegade
Weight Class: welterweight
Height: 6'0
Wins: 22
Losses: 4
Win_Decision: 12
Win_KO: 7
Win_Sub: 3
Loss_Decision: 3
Loss_KO: 0
Loss_Sub: 0
Fighter_ID: 62665' metadata={'source': 'data-raw/fighter_info.csv', 'row': 549}
Document 2 (Index: 1242):
page_content='Fighter: te edwards
Nickname: t
Birth Date: sep 11, 1990
Nationality: united states
Hometown: phoenix, arizona
Association: mma lab
Weight Class: lightweight
Height: 5'8
Wins: 6
Losses: 3
Win_Decision: 0
Win_KO: 6
Win_Sub: 0
Loss_Decision: 2
Loss_KO: 1
Loss_Sub: 0
Fighter_ID: 156303' metadata={'source': 'data-raw/fighter_info.csv', 'row': 1242}
Document 3 (Index: 1723):
page_content='Fighter: yves edwards
Nickname: thugjitsu master
Birth Date: sep 30, 1976
Nationality: united states
Hometown: woodlands, texas
Association: american top team
Weight Class:

In [15]:
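# Use retrieved documents with an LLM for better answers (optional)
# Note: this cell ran as In [15], after the reload cell In [14], so `query_text`
# and `indices` held the Merab query; that is why the response below describes
# Merab Dvalishvili rather than Leon Edwards.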

llm = Ollama(model="llama3")

# Create prompt template
prompt_template = """
You are an MMA expert assistant. Use the following information to answer the question.

Context:
{context}

Question: {question}

Answer:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template
)

# Create chain
# chain = LLMChain(llm=llm, prompt=prompt)
chain = prompt | llm | StrOutputParser()

# Get the most similar documents
similar_docs = [data[idx] for idx in indices[0]]

# Combine retrieved documents into context
context = "\n\n".join([doc.page_content for doc in similar_docs])

# Get response
# response = chain.run(context=context, question=query_text)
response = chain.invoke({"context": context, "question": query_text})

print("\nAI Response:")
print(response)


AI Response:
Merab Dvalishvili, also known as "The Machine". Here's a breakdown of his profile:

**Age:** As of January 2023, Merab is 32 years old.

**Nationality:** Georgian

**Hometown:** Tbilisi, Georgia

**Association:** Serra-Longo Fight Team

**Weight Class:** Bantamweight (135 lbs)

**Height:** 5'6" (168 cm)

**Record:**

* **Wins:** 19
	+ **Decisions:** 15
	+ **KOs/TKOs:** 3
	+ **Submissions:** 1
* **Losses:** 4
	+ **Decisions:** 3
	+ **KOs/TKOs:** 0
	+ **Submissions:** 1

Overall, Merab has an impressive record with a high winning percentage. His ability to secure decisions is one of his strengths, indicating that he can outlast and outmaneuver his opponents over the course of three rounds. While he doesn't have a significant number of KOs or submissions, his all-around skillset suggests that he's well-rounded and capable of adapting to different situations in fights.

Would you like me to compare Merab with Enrique Marin (Wasabi) or Marcos Mariano (Dhalsim)?


---

In [14]:
# Reload and ask new question

# query_text = "Analyze and tell me about Aljamain Sterling"
query_text = "Analyze and tell me about Merab"
 
loader = CSVLoader('data-raw/fighter_info.csv')
data = loader.load()
embeddings = OllamaEmbeddings(model="nomic-embed-text:latest")
embedded_doc = np.load('embedded_doc.npy')
index = faiss.read_index('fighter_index.faiss')
llm = Ollama(model="llama3")
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
You are an MMA expert assistant. Use the following information to answer the question.

Context:
{context}

Question:
{question}

Answer:
""",
)
# chain = LLMChain(llm=llm, prompt=prompt)
chain = prompt | llm | StrOutputParser()
query_vector = embeddings.embed_query(query_text)
query_vector = np.array([query_vector])
k = 3  
distances, indices = index.search(query_vector, k)
similar_docs = [data[idx] for idx in indices[0]]
print(f"Query: {query_text}")
print(f"\nSimilarity distances: {distances}")
print(f"Indices of similar documents: {indices}")
for i, doc in enumerate(similar_docs):
    print(f"\nDocument {i+1} (Distance: {distances[0][i]:.4f}):")
    print(doc.page_content)
context = "\n\n".join([doc.page_content for doc in similar_docs])
response = chain.invoke({"context": context, "question": query_text})
print("\nAI Response:")
print(response)
# print(response['text'])

Query: Analyze and tell me about Merab

Similarity distances: [[0.9572598 1.0666069 1.0700512]]
Indices of similar documents: [[  33 1533 1294]]

Document 1 (Distance: 0.9573):
Fighter: merab dvalishvili
Nickname: the machine
Birth Date: jan 10, 1991
Nationality: georgia
Hometown: tbilisi
Association: serra-longo fight team
Weight Class: bantamweight
Height: 5'6
Wins: 19
Losses: 4
Win_Decision: 15
Win_KO: 3
Win_Sub: 1
Loss_Decision: 3
Loss_KO: 0
Loss_Sub: 1
Fighter_ID: 157355

Document 2 (Distance: 1.0666):
Fighter: enrique marin
Nickname: wasabi
Birth Date: sep 2, 1986
Nationality: spain
Hometown: seville, andalusia
Association: sutemi mma
Weight Class: welterweight
Height: 5'10
Wins: 13
Losses: 7
Win_Decision: 6
Win_KO: 3
Win_Sub: 4
Loss_Decision: 3
Loss_KO: 1
Loss_Sub: 3
Fighter_ID: 27077

Document 3 (Distance: 1.0701):
Fighter: marcos mariano
Nickname: dhalsim
Birth Date: oct 12, 1986
Nationality: brazil
Hometown: guarulhos, sao paulo
Association: killer bees
Weight Class: lightwei
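The reload cell re-reads the CSV and reuses the saved index, so it assumes that row *i* of `data` is still vector *i* in `fighter_index.faiss`. If the CSV is edited after the index was saved, `data[idx]` can return the wrong fighter. Here is a minimal guard. It assumes only that the index exposes FAISS's `ntotal` attribute (the number of stored vectors), and `check_index_alignment` is a hypothetical helper name.

```python
from types import SimpleNamespace


def check_index_alignment(index, docs) -> None:
    """Raise ValueError if the index and the document list have different lengths.

    `index` can be any object with an `ntotal` attribute, such as a faiss.Index.
    Equal counts do not prove the rows are in the same order; this only catches
    rows added to or removed from the CSV after the index was saved.
    """
    if index.ntotal != len(docs):
        raise ValueError(
            f"Index holds {index.ntotal} vectors but {len(docs)} documents were loaded; "
            "re-run the embedding cell to rebuild fighter_index.faiss."
        )


# In the notebook, call it right after faiss.read_index(...) and loader.load():
# check_index_alignment(index, data)

# Self-contained demo: a stand-in object instead of a real FAISS index
check_index_alignment(SimpleNamespace(ntotal=2646), ["row"] * 2646)  # passes silently
```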

In [None]:
# Interactive version of the cell above: adds conversation history and an input loop

loader = CSVLoader('data-raw/fighter_info.csv')
data = loader.load()
embeddings = OllamaEmbeddings(model="nomic-embed-text:latest")
embedded_doc = np.load('embedded_doc.npy')
index = faiss.read_index('fighter_index.faiss')
llm = OllamaLLM(model="llama3")
prompt = PromptTemplate(
    input_variables=["context", "history", "question"],
    template="""
You are an MMA expert assistant. Use the following information to answer the question.

Context information:
{context}

Conversation history:
{history}

Current question: {question}

Answer:
"""
)
chain = prompt | llm | StrOutputParser()
conversation_history = []
def format_history():
    """Format conversation history for the prompt"""
    if not conversation_history:
        return "No previous conversation."
    formatted = []
    for q, a in conversation_history:
        formatted.append(f"Question: {q}")
        formatted.append(f"Answer: {a}")
    return "\n".join(formatted)
def query_mma_assistant(query_text, k=3):
    """Process a query and return a response"""
    query_vector = embeddings.embed_query(query_text)
    query_vector = np.array([query_vector])
    distances, indices = index.search(query_vector, k)
    similar_docs = [data[idx] for idx in indices[0]]
    print(f"\nFound {len(similar_docs)} relevant fighter profiles:")
    for i, doc in enumerate(similar_docs):
        fighter_name = doc.page_content.split("\n")[0].replace("Fighter: ", "")
        print(f"- {fighter_name} (distance: {distances[0][i]:.4f}, lower is closer)")
    context = "\n\n".join([doc.page_content for doc in similar_docs])
    history = format_history()
    response = chain.invoke({
        "context": context, 
        "history": history,
        "question": query_text
    })
    conversation_history.append((query_text, response))
    return response
print("\n===== MMA Fighter Assistant =====")
print("Ask questions about MMA fighters. Type 'quit' to exit or 'clear' to reset conversation.")
while True:
    query = input("\nYour question: ")
    if query.lower() in ['quit', 'exit', 'q']:
        print("Goodbye!")
        break
    if query.lower() in ['clear', 'reset']:
        conversation_history = []
        print("Conversation history cleared.")
        continue
    if not query.strip():
        continue
    response = query_mma_assistant(query)
    print("\nMMA Assistant:")
    print(response)
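`conversation_history` grows with every question, and the whole history is pasted into each prompt, so a long session will eventually exceed the model's context window. One option is to keep only the most recent turns. The sketch below assumes a hypothetical `max_turns` limit and a `format_turns` variant of `format_history` that takes the history as an argument instead of reading a global.

```python
from collections import deque


def make_history(max_turns: int = 5) -> deque:
    """History buffer that silently drops the oldest (question, answer) pair when full."""
    return deque(maxlen=max_turns)


def format_turns(history) -> str:
    """Same text layout as the notebook's format_history, for any iterable of pairs."""
    if not history:
        return "No previous conversation."
    lines = []
    for question, answer in history:
        lines.append(f"Question: {question}")
        lines.append(f"Answer: {answer}")
    return "\n".join(lines)


history = make_history(max_turns=2)
for n in range(1, 4):
    history.append((f"q{n}", f"a{n}"))
print(format_turns(history))
# Question: q2
# Answer: a2
# Question: q3
# Answer: a3
```

In the notebook, `conversation_history = deque(maxlen=5)` would work with the existing `append` call and `format_history` loop. The `clear` command would then need `conversation_history.clear()`, because reassigning `[]` replaces the bounded deque with an unbounded list.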

---



# Models Used in Your MMA RAG System and Potential Alternatives

Your current RAG implementation uses several models at different stages of the pipeline. Here's a breakdown of what you're using and some improved alternatives you could try:

## 1. Embedding Model
**Current**: `OllamaEmbeddings(model="nomic-embed-text:latest")`

**Improved Alternatives**:
- **BGE Embeddings**: `from langchain_community.embeddings import HuggingFaceEmbeddings; embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")`
- **E5 Embeddings**: `HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2")`
- **Instructor Embeddings**: `from langchain_community.embeddings import HuggingFaceInstructEmbeddings; embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")`
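Swapping the embedding model raises two practical points. First, the FAISS index must be rebuilt. `IndexFlatL2` is created for a fixed dimension (768 for the current vectors, per `embedded_doc.shape`), while both `bge-large-en-v1.5` and `e5-large-v2` produce 1024-dimensional vectors. Second, the E5 model card asks for `"query: "` and `"passage: "` prefixes on the input text. The wrapper below is a hypothetical helper, not a LangChain class. It adds those prefixes around any object that has the `embed_documents`/`embed_query` methods this notebook calls, and a stand-in embedder lets it run without downloading a model.

```python
class PrefixedEmbeddings:
    """Hypothetical wrapper that adds E5-style prefixes before delegating to a real embedder."""

    def __init__(self, base, query_prefix: str = "query: ", passage_prefix: str = "passage: "):
        self.base = base
        self.query_prefix = query_prefix
        self.passage_prefix = passage_prefix

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self.base.embed_documents([self.passage_prefix + t for t in texts])

    def embed_query(self, text: str) -> list[float]:
        return self.base.embed_query(self.query_prefix + text)


class EchoEmbedder:
    """Stand-in embedder for the demo: records the exact text it receives."""

    def __init__(self):
        self.seen: list[str] = []

    def embed_documents(self, texts):
        self.seen.extend(texts)
        return [[float(len(t))] for t in texts]

    def embed_query(self, text):
        self.seen.append(text)
        return [float(len(text))]


base = EchoEmbedder()
emb = PrefixedEmbeddings(base)
emb.embed_documents(["Fighter: leon edwards"])
emb.embed_query("Tell me about Leon Edwards")
print(base.seen)
```

With the real model this would be `PrefixedEmbeddings(HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2"))`. Because it is not a LangChain `Embeddings` subclass, use it only where the methods are called directly, as the notebook's cells do.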

## 2. Vector Database
**Current**: `faiss.IndexFlatL2`

**Improved Alternatives**:
- **FAISS with HNSW**: `index = faiss.IndexHNSWFlat(dimension, 16)` (the second argument is `M`, the HNSW graph connectivity parameter)
- **Chroma DB**: `from langchain_community.vectorstores import Chroma; db = Chroma.from_documents(documents, embeddings)`
- **Qdrant**: `from langchain_community.vectorstores import Qdrant; db = Qdrant.from_documents(documents, embeddings)`
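Whether an approximate index is worth it depends on scale. The current index holds 2646 × 768 = 2,032,128 float32 values, about 8.1 MB, so exhaustive search over it is already cheap. HNSW mainly pays off for much larger collections. If you do switch, a common check is recall@k: the fraction of the exact `IndexFlatL2` neighbors that the approximate index also returns. Below is a sketch of that metric. The two result lists would come from `index.search` on the flat and HNSW indexes, and the ids in the demo are made up for illustration.

```python
def recall_at_k(exact_ids, approx_ids, k: int) -> float:
    """Average fraction of the exact top-k neighbors recovered by an approximate search.

    exact_ids / approx_ids: one row of neighbor ids per query, e.g. the `indices`
    arrays returned by faiss IndexFlatL2.search and IndexHNSWFlat.search.
    """
    if len(exact_ids) != len(approx_ids):
        raise ValueError("both result sets must cover the same queries")
    total = 0.0
    for exact_row, approx_row in zip(exact_ids, approx_ids):
        exact_top = set(list(exact_row)[:k])
        approx_top = set(list(approx_row)[:k])
        total += len(exact_top & approx_top) / k
    return total / len(exact_ids)


exact = [[1, 2, 3], [4, 5, 6]]   # illustrative exact neighbors for two queries
approx = [[1, 3, 2], [4, 6, 9]]  # approximate search misses one neighbor of query 2
print(recall_at_k(exact, approx, k=3))
```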

## 3. Language Model (LLM)
**Current**: `Ollama(model="llama3")`

**Improved Alternatives**:
- **Llama 2 70B**: `Ollama(model="llama2:70b")` - Outperforms the smaller Llama 2 models on the benchmarks reported in its model card, but needs far more memory than the 8B model used above [source](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- **Mistral 7B**: `Ollama(model="mistral")` - Outperforms Llama2-13B on all metrics despite being smaller [source](https://www.e2enetworks.com/blog/mistral-7b-vs-llama2-which-performs-better-and-why)
- **Mixtral 8x7B**: `Ollama(model="mixtral")`
- **Llama 3 8B/70B**: `Ollama(model="llama3:8b")` is the same model as the current `llama3` tag; `Ollama(model="llama3:70b")` is the larger variant
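Benchmark scores are only a rough guide. For this chatbot, a more direct comparison is to send the same retrieved context and question to each candidate model and check whether each answer reproduces the facts in the context. This is a minimal sketch under stated assumptions. `generate_fns` maps a model name to any callable that takes the prompt and returns text; with Ollama this could wrap `OllamaLLM(model=name).invoke`. The case-insensitive substring check is a deliberately crude, hypothetical scoring rule, and the demo uses stand-in "models".

```python
from typing import Callable


def fact_coverage(answer: str, expected_facts: list[str]) -> float:
    """Fraction of expected facts (plain substrings) found in the answer, ignoring case."""
    text = answer.lower()
    hits = sum(1 for fact in expected_facts if fact.lower() in text)
    return hits / len(expected_facts)


def compare_models(generate_fns: dict[str, Callable[[str], str]],
                   prompt: str, expected_facts: list[str]) -> dict[str, float]:
    """Send one prompt to every model and score each answer."""
    return {name: fact_coverage(fn(prompt), expected_facts)
            for name, fn in generate_fns.items()}


# Stand-in "models" so the sketch runs without Ollama
fake_models = {
    "model_a": lambda p: "Merab Dvalishvili has 19 wins and 4 losses.",
    "model_b": lambda p: "Merab is a bantamweight with 20 wins.",
}
facts = ["19", "4 losses", "bantamweight"]  # taken from the Merab profile retrieved above
print(compare_models(fake_models, "prompt with retrieved context", facts))
```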

## Performance Considerations

According to the Llama 2 model card linked above, Llama 2 70B significantly outperforms the 7B variant on the reported benchmarks:
- **MMLU**: 68.9% vs 45.3%
- **BBH**: 51.2% vs 32.6%
- **TruthfulQA**: 50.18% vs 33.29%

Mistral 7B is particularly noteworthy because it outperforms Llama 2 13B on the reported benchmarks with roughly half the parameters, which means:
- Faster inference
- Lower memory requirements
- Better throughput
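The memory point can be made concrete with a back-of-the-envelope estimate: weight memory ≈ parameter count × bits per weight ÷ 8. The sketch below gives only that rough figure. It ignores the KV cache, activations, and runtime overhead, and the 4-bit and 16-bit settings are illustrative assumptions, not statements about any particular Ollama tag.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


for name, params in [("Mistral 7B", 7.3), ("Llama 2 13B", 13), ("Llama 2 70B", 70)]:
    print(f"{name}: ~{weight_memory_gb(params, 4):.1f} GB at 4-bit, "
          f"~{weight_memory_gb(params, 16):.1f} GB at 16-bit")
```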

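For your MMA chatbot, Mistral 7B could provide a good balance of performance and efficiency, especially if you run the models locally. Note, however, that the comparisons above are against Llama 2. The notebook already uses Llama 3 8B (`llama3`), a model of similar size, so Mistral 7B is worth testing on your own questions rather than assuming it is an upgrade.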
