Resources used:
- https://python.langchain.com/v0.1/docs/integrations/vectorstores/milvus/
- https://milvus.io/docs/integrate_with_langchain.md

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

#OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # Optional: only needed if an OpenAI model is used

MODEL = "mistral:latest" # Name of the model used by Ollama
EMBEDDING_MODEL = "nomic-embed-text"
COLLECTION_NAME = 'scouting' # Name of the Collection to be created
DIMENSION = 768 # Dimension of the embeddings

URI = 'http://localhost:19530' # Connection parameters for the Milvus Server

In [2]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=EMBEDDING_MODEL)

In [23]:
# Configure the prompt template that is used to ask the LLM

from langchain_core.prompts import PromptTemplate

PROMPT_TEMPLATE = """
Human: You are an AI assistant in football (soccer) scouting and provide answers to questions using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer from the context, just say that you don't know, don't try to make up an answer.

<context>
{context}
</context>

<question>
{question}
</question>

The response should be specific and use statistics or numbers when possible.
The response should be structured so that you rank the players based on their reports and provide a short summary of the reports from the context.

A:"""

prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
print(prompt.format(context="Here is some context", question="Here is a question"))


H: You are an AI assistant in football (soccer) scouting and provide answers to questions using fact-based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer from the context, just say that you don't know, don't try to make up an answer.

<context>
Here is some context
</context>

<question>
Here is a question
</question>

The response should be specific and use statistics or numbers when possible.
The response should be structured so that you rank the players based on their reports and provide a short summary of the reports from the context.

Assistant:


In [4]:
# Use Milvus as Vectorstore

from langchain_community.vectorstores import Milvus

connection_args = {'uri': URI }

vectorstore = Milvus(
    embedding_function=embeddings,
    connection_args=connection_args,
    collection_name=COLLECTION_NAME,
    vector_field="embeddings",
    primary_field="id",
    auto_id=True
)


In [5]:
# Convert the vector store to a retriever
# k:2 --> Limit to 2 documents
retriever = vectorstore.as_retriever(search_kwargs={'k': 2})
# Define a function to format the retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [19]:
# Redefine format_docs so that each report is prefixed with its Transfermarkt player ID.
# The retriever is recreated unchanged: k=2 limits retrieval to the 2 most similar documents.
retriever = vectorstore.as_retriever(search_kwargs={'k': 2})

def format_docs(docs):
    formatted = "\n\n".join(
        f"Player ID: {doc.metadata['player_transfermarkt_id']}, Report-Content: " + doc.page_content
        for doc in docs
    )
    print(formatted)  # Show the context that is passed to the prompt
    return formatted
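
The `format_docs` above reads `doc.metadata['player_transfermarkt_id']` directly, which raises a `KeyError` if a report was stored without that field. A minimal sketch of a more tolerant variant follows. `FakeDoc` is a hypothetical stand-in for LangChain's `Document` so the cell runs without a Milvus server, and the `"unknown"` fallback is an assumption, not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing the two attributes format_docs reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_safe(docs):
    """Join reports into one context string, tolerating missing player IDs."""
    parts = []
    for doc in docs:
        # Fall back to a placeholder instead of raising KeyError.
        player_id = doc.metadata.get("player_transfermarkt_id", "unknown")
        parts.append(f"Player ID: {player_id}, Report-Content: {doc.page_content}")
    return "\n\n".join(parts)


sample_docs = [
    FakeDoc("Solid right back.", {"player_transfermarkt_id": "951357"}),
    FakeDoc("Report without an ID."),
]
formatted_sample = format_docs_safe(sample_docs)
print(formatted_sample)
```

Because the function only touches `page_content` and `metadata`, it should also accept the real documents returned by the retriever.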

In [24]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# rag_chain.get_graph().print_ascii()

# Invoke the RAG chain with a specific question and retrieve the response
query = "I need a right-back who can play consistently throughout the game. He should also be good going forward"
res = rag_chain.invoke(query)
res

Player ID: 951357, Report-Content: 
 Played as a right back and showed a consistent performance over 90 minutes.
Was secure defensively and allowed few crosses.
Offensively, he set accents with precise passes and well-timed forward runs.
His stamina and commitment were remarkable.
However, he lacked top speed, which could be problematic against fast wingers.
At 25 years old, he is at a good age to improve further.
 

Player ID: 159753, Report-Content: 
 Played as a left back again with impressive speed and strong defensive positioning.
Had difficulties in ball handling under pressure, but was offensively dangerous with some good crosses.
His stamina allows him to perform consistently over 90 minutes.
More precision in long passes would be desirable.
At 24 years old, still a player with promising potential.
 


" Based on the reports provided, both players (951357 and 159753) have shown consistent performance throughout the game as right-backs. However, if we focus specifically on qualities that make a player good going forward, here's a ranking:\n\n1. Player ID: 159753 - This player demonstrated offensive danger with some good crosses, but could benefit from more precision in long passes. His speed and defensive positioning are impressive.\n\n2. Player ID: 951357 - While this player was secure defensively and set accents with precise passes and well-timed forward runs, his lack of top speed against fast wingers is a potential issue. He has the stamina to perform consistently over 90 minutes, but his offensive contributions might not be as impactful as Player ID: 159753.\n\nIn summary, Player ID: 159753 appears to have more offensive potential compared to Player ID: 951357, but both players can play consistently throughout the game."

In [16]:
from langchain_core.documents.base import Document

def extract_metadata(doc: Document) -> dict:
    return doc.metadata

In [17]:
# How to work with the metadata returned for our query
retrived_documents = retriever.invoke(query)
retrived_documents

[Document(page_content='\n Played as a right back and showed a consistent performance over 90 minutes.\nWas secure defensively and allowed few crosses.\nOffensively, he set accents with precise passes and well-timed forward runs.\nHis stamina and commitment were remarkable.\nHowever, he lacked top speed, which could be problematic against fast wingers.\nAt 25 years old, he is at a good age to improve further.\n ', metadata={'id': 450360548525353451, 'scout_id': '0987', 'player_id': 'bf1234567890abcdef123456', 'player_transfermarkt_id': '951357', 'grade_rating': 0.699999988079071, 'grade_potential': 0.75}),
 Document(page_content='\n Played as a left back again with impressive speed and strong defensive positioning.\nHad difficulties in ball handling under pressure, but was offensively dangerous with some good crosses.\nHis stamina allows him to perform consistently over 90 minutes.\nMore precision in long passes would be desirable.\nAt 24 years old, still a player with promising potent

In [18]:
metadata = extract_metadata(retrived_documents[0])
metadata

{'id': 450360548525353451,
 'scout_id': '0987',
 'player_id': 'bf1234567890abcdef123456',
 'player_transfermarkt_id': '951357',
 'grade_rating': 0.699999988079071,
 'grade_potential': 0.75}
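
Since every report carries `grade_rating` and `grade_potential` in its metadata, the retrieved documents can be re-ordered by grade before they are shown or passed on. The sketch below works on plain dictionaries shaped like the output above, with values copied from two reports in this notebook. The helper name `rank_by_grade` and the default of `0.0` for a missing grade are assumptions made for illustration.

```python
def rank_by_grade(metadatas, key="grade_rating"):
    """Return metadata dicts sorted by the given grade, best first."""
    return sorted(metadatas, key=lambda m: m.get(key, 0.0), reverse=True)


sample_metadata = [
    {
        "player_transfermarkt_id": "159486",
        "grade_rating": 0.6499999761581421,
        "grade_potential": 0.699999988079071,
    },
    {
        "player_transfermarkt_id": "951357",
        "grade_rating": 0.699999988079071,
        "grade_potential": 0.75,
    },
]

ranked = rank_by_grade(sample_metadata)
for m in ranked:
    print(m["player_transfermarkt_id"], round(m["grade_rating"], 2))
```

With real results, the input would be `[doc.metadata for doc in retrived_documents]`; passing `key="grade_potential"` ranks by potential instead.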

In [19]:
# Transfermarkt Link
print(f"Transfermarkt.com Link: https://www.transfermarkt.com/player-name/profil/spieler/{metadata['player_transfermarkt_id']}")

Transfermarkt.com Link: https://www.transfermarkt.com/player-name/profil/spieler/951357


In [20]:
from langchain_core.documents.base import Document

def print_texts(doc: Document):
    print(doc.page_content)

In [21]:
# Print the original report text
print_texts(retrived_documents[0])


 Played as a right back and showed a consistent performance over 90 minutes.
Was secure defensively and allowed few crosses.
Offensively, he set accents with precise passes and well-timed forward runs.
His stamina and commitment were remarkable.
However, he lacked top speed, which could be problematic against fast wingers.
At 25 years old, he is at a good age to improve further.
 


In [22]:
# German for: "I want to bake a cake. Which recipes can you suggest?"
irrelevant_query = "Ich will einen Kuchen backen. Welche Rezepte kannst du mir vorschlagen?"
res = rag_chain.invoke(irrelevant_query)
res

' I\'m sorry for any confusion, but it seems there is a mix-up in the context provided. The discussion appears to be about soccer players rather than baking recipes. However, since you asked for a recipe, let me suggest a simple one based on your context: "Quick Apple Pie". Here\'s an easy recipe with 3 apples, 1 cup of sugar, 2 tablespoons of flour, 1 teaspoon of cinnamon, and a ready-made pie crust. This recipe is quick, uses precise measurements (numbers), and the apple symbolizes your soccer players\' potential to rise just like an apple rises in a pie after baking! Enjoy your baking!'
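
The answer above shows the model ignoring the prompt's instruction to say that it does not know. One possible mitigation is to skip the LLM call entirely when nothing relevant was retrieved. The sketch below assumes `(document, distance)` pairs such as those returned by the vector store's `similarity_search_with_score`, and a metric such as L2 where a smaller distance means a closer match. The helper names, the threshold of `0.8`, and the sample distances are hypothetical and would need tuning against real data.

```python
NO_ANSWER = "I don't know."


def keep_relevant(scored_docs, max_distance=0.8):
    """Keep only documents whose distance is at or below max_distance."""
    return [doc for doc, distance in scored_docs if distance <= max_distance]


def answer_or_refuse(scored_docs, answer_fn, max_distance=0.8):
    """Call answer_fn on the relevant docs, or refuse when none are relevant."""
    relevant = keep_relevant(scored_docs, max_distance)
    if not relevant:
        return NO_ANSWER
    return answer_fn(relevant)


# Illustrative values only: strings stand in for Document objects.
football_hits = [("right-back report", 0.35), ("left-back report", 0.52)]
baking_hits = [("right-back report", 1.4), ("goalkeeper report", 1.6)]

print(answer_or_refuse(football_hits, lambda docs: f"{len(docs)} reports used"))
print(answer_or_refuse(baking_hits, lambda docs: f"{len(docs)} reports used"))
```

In the notebook, `answer_fn` could format the surviving documents and invoke the prompt and model. The guard keeps off-topic questions from reaching the LLM with unrelated scouting reports as context.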

In [23]:
# Demonstrate retrieval filtered by metadata (only reports written by scout 3456)
filtered_retriever = vectorstore.as_retriever(search_kwargs={"expr": 'scout_id == "3456"'})
expr_res = filtered_retriever.invoke(query)
expr_res

[Document(page_content='\n As a goalkeeper, he again showed a solid performance, but with weaknesses in high balls.\nMade several difficult saves and kept his team from an early deficit.\nHis insecurities in controlling the penalty area persisted, leading to unnecessary risks.\nHis passing game was better this time, especially in short passes.\nAt 28 years old, he has stabilized, but his development potential is limited.\n ', metadata={'id': 450360548525353458, 'scout_id': '3456', 'player_id': '8e1234567890abcdef123456', 'player_transfermarkt_id': '159486', 'grade_rating': 0.6499999761581421, 'grade_potential': 0.699999988079071}),
 Document(page_content='\n Played as a goalkeeper and showed a solid performance.\nMade several skillful saves and kept his team from falling behind.\nHowever, he seemed insecure on high balls and made mistakes in controlling the penalty area.\nHis goal kick was strong, but his passing game in buildup was rather weak.\nAt 28 years old, he has developed well,