# Purpose
This notebook explains LLMs, prompt engineering, and how to work with the LangChain framework using LCEL (LangChain Expression Language). Happy learning :)

## LLM
LLM stands for Large Language Model. An LLM is an AI model trained on a large and varied body of data. Examples of LLMs include:
* IBM Watson
* LLaMA by Meta (Facebook)
* ChatGPT (surely the best known)

LLMs are useful for a wide range of tasks. For instance, they can:
* understand human language through NLP (Natural Language Processing);
* interpret images through computer vision, when the model is multimodal. Vision models are, for example, used to help detect some diseases in medical scans.

# Prompt Engineering
Prompt engineering is the practice of writing well-formed, useful inputs (prompts) for an LLM so that it generates a suitable output. In other words, the prompt acts as a driver that steers the LLM's response. Because LLMs have been trained on a broad range of data, a carefully designed prompt helps obtain a response that fits the task.

## In-Context Learning
In in-context learning, the LLM learns from examples included in the prompt itself. We don't need to retrain the model or update its weights.

In [142]:
from ollama import Client  # low-level client for a local Ollama server
from langchain_ollama import ChatOllama, OllamaEmbeddings  # LangChain wrappers around Ollama
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_community.vectorstores import Chroma  # vector store (requires the chromadb package)


# Prompt template
A prompt template lets us define a prompt once, with placeholders, and reuse it with different inputs. Let's look at some types of prompt templates.

# Zero-shot template
In zero-shot prompting, the model receives only the instruction, without any examples.

In [66]:
template = """
You are a agriculture expert. Explain this :{concept}
like you speak to a child
Answer:
"""

promptZeroShot= PromptTemplate(
    input_variables = ['concept'],
    template=template
)

## Let's inspect promptZeroShot

In [67]:
promptZeroShot

PromptTemplate(input_variables=['concept'], input_types={}, partial_variables={}, template='\nYou are a agriculture expert. Explain this :{concept}\nlike you speak to a child\nAnswer:\n')

# One-shot prompting
In one-shot prompting, we give the model a single example before asking the actual question.

In [58]:
templateOneshot = """
You are an expert . I will give you some samples of sentences and emotion it leads
Phrase 1: I dont like your behavior
it express a disapointment

Tell which sentiment is expressed in this sentence {sentence}
"""

promptOneShot = PromptTemplate(
    input_variables=['sentence'],
    template= templateOneshot
)

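# Other types of prompting
* Few-shot: like one-shot, but we give the model several examples.
* CoT (Chain of Thought): we encourage the model to break a problem into intermediate steps and reason through them before giving the final answer.
* Self-consistency: we sample several reasoning paths (for example, several CoT answers) and keep the final answer that appears most often.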
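Self-consistency ends with a voting step. The sketch below assumes we have already sampled several chain-of-thought answers from the model and extracted their final answers; the `samples` list is hypothetical, and `majority_answer` is an illustrative helper, not a LangChain function.

```python
from collections import Counter


def majority_answer(final_answers: list[str]) -> str:
    """Return the most frequent answer after light normalization."""
    normalized = [answer.strip().lower() for answer in final_answers]
    # most_common(1) returns [(answer, count)] for the top answer
    answer, _count = Counter(normalized).most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five sampled reasoning paths
samples = ["42", "42", "41", " 42 ", "40"]
best = majority_answer(samples)
print(best)  # 42
```

Normalizing before counting matters: without `strip()`, `" 42 "` would be counted as a different answer from `"42"`.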

In [59]:
### Let's look at the model's response to the promptZeroShot prompt

In [60]:
client = Client()  # connects to the local Ollama server (default: http://localhost:11434)
responseAgri = client.chat(model='llama3', messages=[{'role': 'user', 'content': promptZeroShot.format(concept='planting')}])


In [61]:
print(responseAgri['message']['content'])

Little buddy! So, you know how we need food to eat every day? Like fruits and vegetables? Well, plants grow those yummy foods for us! And planting is the special way we help them grow.

Planting is like giving a little home to a tiny seed. We take that seed, and we put it in the ground, with some dirt and water around it. Then, we give it some love and care, so it can grow into a big strong plant!

Think of it like building a block tower. You start with one block, then you add another, and another, until your tower gets really tall! With plants, we start with a small seed, then we help it grow bigger and stronger by giving it the right food (like sunlight and water), and keeping it safe from harm.

There are lots of special things we do to help plants grow:

1. **Seeding**: We put the tiny seeds in the ground.
2. **Watering**: We give them a drink, so they don't get thirsty!
3. **Sunlight**: We make sure they get plenty of sunshine, like when you play outside on a sunny day!
4. **Ferti

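# LCEL: LangChain Expression Language
LCEL lets us compose components such as a prompt, a model, and an output parser into a chain with the `|` operator: the output of each step becomes the input of the next one.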

## Output parser
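An output parser converts the model's raw output into the format we need. Among them, we have:
* JsonOutputParser: parses a JSON response into a Python object (usually a dict)
* CommaSeparatedListOutputParser: turns a comma-separated response into a list of strings
* StrOutputParser: returns the response as a plain string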

In [62]:
llm = ChatOllama(model='llama3')
parser = StrOutputParser()

In [63]:
chain = promptZeroShot | llm | parser 

In [64]:
responseChain = chain.invoke({'concept':'engrais'})

In [29]:
print(responseChain)

Yay! Let's talk about ENGRAIS!

So, you know how we need food to eat, right? Like fruits and veggies and grains like bread?

Well, plants need food too! And that's where ENGRAIS comes in.

Engrais is just a fancy word for FERTILIZER. It's like special food for the plants that helps them grow big and strong.

Just like how we need vitamins and minerals to be healthy, plants need certain nutrients to grow well. And that's where engrais comes in!

Engrais has all sorts of good stuff like nitrogen, phosphorus, and potassium that help plants make leaves, stems, and roots. It's like a special boost for the plant's growth!

Farmers use engrais to help their crops grow big and healthy. They put it on the soil around the plants, and it helps the plants absorb all the good nutrients they need.

Just like how we take medicine when we're sick, farmers use engrais to keep their plants healthy and strong. And that means we get yummy food to eat!

So, that's what engrais is! It's like a special food 

In [75]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in agriculture."),
    ("user", "Answer this question: {question} as if you were explaining it to a {user_type}. If you don't know the answer, say so instead of giving a false response."),
])

chain2 = prompt | llm | parser

input_ = {"question": "how to increase a farm's productivity", "user_type": "kid"}

response2 = chain2.invoke(input_)
print(response2)

In [88]:
# JsonOutputParser


# Create your JSON parser
json_parser = JsonOutputParser()

# Create more explicit format instructions
format_instructions = """RESPONSE FORMAT: Return ONLY a single JSON object—no markdown, no examples, no extra keys.  It must look exactly like:
{
  "title": "movie title",
  "director": "director name",
  "year": 2000,
  "genre": "movie genre"
}

IMPORTANT: Your response must be *only* that JSON.  Do NOT include any illustrative or example JSON."""

# Create your prompt template with clearer instructions
prompt_template = PromptTemplate(
    template="""You are a JSON-only assistant.

Task: Generate info about the movie "{movie_name}" in JSON format.

{format_instructions}
""",
    input_variables=["movie_name"],
    partial_variables={"format_instructions": format_instructions},
)

# Create the chain without cleaning step
movie_chain = prompt_template | llm | json_parser

# Test with a movie name
movie_name = "The Matrix"
result = movie_chain.invoke({"movie_name": movie_name})

# Print the structured result
print("Parsed result:")
print(f"Title: {result['title']}")
print(f"Director: {result['director']}")
print(f"Year: {result['year']}")
print(f"Genre: {result['genre']}")


Parsed result:
Title: The Matrix
Director: The Wachowskis
Year: 1999
Genre: Science Fiction


# Document
LangChain represents text as `Document` objects. A Document has two parts:
* page_content: the text of the document
* metadata: a dictionary of information about the document, such as an id or an author. Metadata is optional.

# Document loaders
Document loaders load content from a source, such as a PDF file or an HTML page, and return it as a list of `Document` objects.

In [121]:
from langchain_core.documents import Document
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
from langchain_text_splitters import CharacterTextSplitter


In [98]:
document = Document(
    page_content='Math is wonderful',
    metadata={
        "document_id": "99",
        "document_author": "Paul yves etiens"
    }
)
print(document)


page_content='Math is wonderful' metadata={'document_id': '99', 'document_author': 'Paul yves etiens'}


In [100]:
loader = PyPDFLoader("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/96-FDF8f7coh0ooim7NyEQ/langchain-paper.pdf")

doc = loader.load()
print(doc)

[Document(metadata={'producer': 'PyPDF', 'creator': 'Microsoft Word', 'creationdate': '2023-12-31T03:50:13+00:00', 'author': 'IEEE', 'moddate': '2023-12-31T03:52:06+00:00', 'title': 's8329 final', 'source': 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/96-FDF8f7coh0ooim7NyEQ/langchain-paper.pdf', 'total_pages': 6, 'page': 0, 'page_label': '1'}, page_content="* corresponding author - jkim72@kent.edu \nRevolutionizing Mental Health Care through \nLangChain: A Journey with a Large Language \nModel\nAditi Singh \n Computer Science  \n Cleveland State University  \n a.singh22@csuohio.edu \nAbul Ehtesham  \nThe Davey Tree Expert \nCompany  \nabul.ehtesham@davey.com \nSaifuddin Mahmud  \nComputer Science & \nInformation Systems  \n Bradley University  \nsmahmud@bradley.edu  \nJong-Hoon Kim* \n Computer Science,  \nKent State University,  \njkim72@kent.edu \nAbstract— Mental health challenges are on the rise in our \nmodern society, and the imperative to address mental di

In [101]:
print(doc[2])  # doc[2] is the third page: list indexes start at 0

page_content='Figure 2. An AIMessage illustration 
C. Prompt Template 
Prompt templates [10] allow you to structure input for LLMs. 
They provide a convenient way to format user inputs and 
provide instructions to generate responses. Prompt templates 
help ensure that the LLM understands the desired context and 
produces relevant outputs. 
The prompt template classes in LangChain are built to 
make constructing prompts with dynamic inputs easier. Of 
these classes, the simplest is the PromptTemplate. 
D. Chain 
Chains [11] in LangChain refer to the combination of 
multiple components to achieve specific tasks. They provide 
a structured and modular approach to building language 
model applications. By combining different components, you 
can create chains that address various u se cases and 
requirements. Here are some advantages of using chains: 
• Modularity: Chains allow you to break down 
complex tasks into smaller, manageable 
components. Each component can be developed and 
teste

In [103]:
print(doc[3].page_content[500:]) 

atbot in providing 
mental health support. It assures users of a safe and 
confidential space to express their concerns.  
Step 2. User Input - Prompt: Users can input mental health-
related questions or seek advice by typing their queries 
into the input box integrated into the Streamlit interface. 
Step 3. Data Transfer to LangChain: Implement the 
functionality that sends the user's input (question) as a 
chat prompt template to the LangChain framework. This 
input serves as the "human message prompt" template. 
Step 4. LangChain Framework: In this phase, the LangChain 
framework serves as the backbone of the chatbot, where 
all the foundational components and building blocks are 
meticulously orchestrated. Here's a deeper dive into the 
critical elements of LangChain Processing: 
• ChatMessage and Prompt Templates:  Within 
LangChain, the chatbot's core communication 
infrastructure is established by  creating 
ChatMessage and prompt templates for optimal 
chatbot engagement. 
• LL

# Loading website content

In [106]:
loader = WebBaseLoader("https://python.langchain.com/v0.2/docs/introduction/")

web_data = loader.load()

print(web_data[0].page_content[:700])

LangChain overview - Docs by LangChainSkip to main contentWe've raised a $125M Series B to build the platform for agent engineering. Read more.Docs by LangChain home pageLangChain + LangGraphSearch...⌘KGitHubTry LangSmithTry LangSmithSearch...NavigationLangChain overviewLangChainLangGraphDeep AgentsIntegrationsLearnReferenceContributePythonOverviewLangChain v1.0Release notesMigration guideGet startedInstallQuickstartPhilosophyCore componentsAgentsModelsMessagesToolsShort-term memoryStreamingMiddlewareStructured outputAdvanced usageGuardrailsRuntimeContext engineeringModel Context Protocol (MCP)Human-in-the-loopMulti-agentRetrievalLong-term memoryUse in productionStudioTestDeployAgent Chat UI


In [129]:
# chunk_overlap=20: consecutive chunks can share up to 20 characters, which keeps some context across chunk boundaries
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20, separator="\n")

chunks = text_splitter.split_documents(doc)

print('Chunk length',len(chunks))
print(chunks[100].page_content)

Chunk length 147
challenges or concerns you may have. Please feel free 
to share what's on your mind, and we'll work together 
to address your needs. Remember, this is a safe and
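To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified fixed-width sliding window in plain Python. It is not how `CharacterTextSplitter` works internally (that splitter first splits on `separator`, here `"\n"`, and then merges the pieces up to `chunk_size`), so its chunk counts will differ; `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last 3 characters of the previous one, so a sentence cut at a boundary still appears partly in both chunks.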


# Loading documents and comparing two splitters

In [134]:

# Load the LangChain paper
paper_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/96-FDF8f7coh0ooim7NyEQ/langchain-paper.pdf"
pdf_loader = PyPDFLoader(paper_url)
pdf_document = pdf_loader.load()

# Load content from LangChain website
web_url = "https://python.langchain.com/v0.2/docs/introduction/"
web_loader = WebBaseLoader(web_url)
web_document = web_loader.load()

# Create two different text splitters
splitter_1 = CharacterTextSplitter(chunk_size=300, chunk_overlap=30, separator="\n")
splitter_2 = CharacterTextSplitter(chunk_size=400, chunk_overlap=20, separator="\n")

# Apply both splitters to the PDF document
chunks_1 = splitter_1.split_documents(pdf_document)
chunks_2 = splitter_2.split_documents(pdf_document)

# Define a function to display document statistics
def display_document_stats(docs, name):
    """Display statistics about a list of document chunks"""
    total_chunks = len(docs)
    total_chars = sum(len(doc.page_content) for doc in docs)
    avg_chunk_size = total_chars / total_chunks if total_chunks > 0 else 0
    
    # Count unique metadata keys across all documents
    all_metadata_keys = set()
    for doc in docs:
        all_metadata_keys.update(doc.metadata.keys())
    
    # Print the statistics
    print(f"\n=== {name} Statistics ===")
    print(f"Total number of chunks: {total_chunks}")
    print(f"Average chunk size: {avg_chunk_size:.2f} characters")
    print(f"Metadata keys preserved: {', '.join(all_metadata_keys)}")
    
    if docs:
        print("\nExample chunk:")
        example_doc = docs[min(5, total_chunks-1)]  # Chunk at index 5 (the sixth), or the last one if there are fewer
        print(f"Content (first 150 chars): {example_doc.page_content[:150]}...")
        print(f"Metadata: {example_doc.metadata}")
        
        # Calculate length distribution
        lengths = [len(doc.page_content) for doc in docs]
        min_len = min(lengths)
        max_len = max(lengths)
        print(f"Min chunk size: {min_len} characters")
        print(f"Max chunk size: {max_len} characters")

# Display stats for both chunk sets
display_document_stats(chunks_1, "Splitter 1")
display_document_stats(chunks_2, "Splitter 2")


=== Splitter 1 Statistics ===
Total number of chunks: 95
Average chunk size: 263.80 characters
Metadata keys preserved: author, source, creator, title, total_pages, producer, page_label, page, creationdate, moddate

Example chunk:
Content (first 150 chars): comprehensive support within the field of mental health. 
Additionally, the paper discusses the implementation of 
Streamlit to enhance the user ex pe...
Metadata: {'producer': 'PyPDF', 'creator': 'Microsoft Word', 'creationdate': '2023-12-31T03:50:13+00:00', 'author': 'IEEE', 'moddate': '2023-12-31T03:52:06+00:00', 'title': 's8329 final', 'source': 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/96-FDF8f7coh0ooim7NyEQ/langchain-paper.pdf', 'total_pages': 6, 'page': 0, 'page_label': '1'}
Min chunk size: 49 characters
Max chunk size: 299 characters

=== Splitter 2 Statistics ===
Total number of chunks: 71
Average chunk size: 353.62 characters
Metadata keys preserved: author, source, creator, title, total_pages, p

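# Chroma
Chroma is a vector store: it stores an embedding (a vector) for each chunk and lets us retrieve the chunks whose embeddings are closest to the embedding of a query.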

In [148]:
embedding = OllamaEmbeddings(model="llama3")

docsearch = Chroma.from_documents(chunks_1, embedding)  # embed every chunk and store the vectors in Chroma

In [None]:
QUERY = 'langchain'

docs = docsearch.similarity_search(QUERY)  # returns a list of the most similar Documents (4 by default)
print(docs[0].page_content)
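Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below reproduces that idea with cosine similarity over hand-made 3-dimensional vectors. Real embeddings have hundreds or thousands of dimensions, and Chroma's distance metric is configurable, so treat this as an illustration rather than Chroma's implementation; `top_k` and the vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    # Score every stored chunk, then keep the k highest-scoring texts
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical chunks with hand-made "embeddings"
store = [
    ("LangChain chains combine components", [0.9, 0.1, 0.0]),
    ("Streamlit builds the chat interface", [0.1, 0.8, 0.2]),
    ("Mental health support chatbot", [0.2, 0.3, 0.9]),
]
query = [1.0, 0.0, 0.1]  # pretend embedding of the query "langchain"

results = top_k(query, store, k=2)
print(results)
```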