# LangChain (Core Framework)

LangChain is a framework for building applications that use language models in multi-step workflows. It's designed to make it easier to build chains of prompts, connect different models, and orchestrate them for real-world tasks such as chatbots, document processing, and data extraction.
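To make the idea of a chain concrete, here is a minimal, framework-free sketch of composing steps with the `|` operator, the style LangChain's expression language uses (for example `prompt | llm | StrOutputParser()` in the commented cells below). Each step's output becomes the next step's input. The `Step` class and the fake model are hypothetical stand-ins written for illustration; they are not LangChain's classes, and no real model is called.

```python
from typing import Any, Callable


class Step:
    """A tiny composable step (hypothetical stand-in, not LangChain's Runnable)."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical stages: fill a prompt template, call a fake "model", parse the output.
prompt = Step(lambda d: f"You are a witty comedian.\nHuman: {d['question']}")
fake_llm = Step(lambda text: {"content": f"(model reply to: {text.splitlines()[-1]})"})
parser = Step(lambda msg: msg["content"].strip())

chain = prompt | fake_llm | parser
print(chain.invoke({"question": "Tell me a joke."}))
# (model reply to: Human: Tell me a joke.)
```

Swapping `fake_llm` for a real model wrapper is the only change a real chain needs; the template and parser stay the same.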

In [1]:
# Install LangChain if it is not already installed
# !pip install langchain
# !pip install langchain-community

In [2]:
from IPython.display import Markdown, display, update_display
import time
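# Note: recent releases deprecate this class in favor of OllamaLLM from the langchain-ollama package
from langchain_community.llms import Ollama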

In [3]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [4]:
llm = Ollama(model="gemma3")



In [5]:
start_time = time.time()
response_langchain = llm.invoke(tell_a_joke)
print(time.time() - start_time)  # elapsed time in seconds
display(Markdown(response_langchain))

11.353596687316895


Okay, here's a joke for a student on the journey to becoming an LLM Engineering expert:

Why did the LLM break up with the prompt? 

... Because it said, "I need more context! You're not giving me enough *parameters*!" 

---

Hope that brought a smile to your face! Let me know if you'd like another one. 😊 

Do you want a joke related to a specific aspect of LLM engineering (like fine-tuning, prompt engineering, or evaluation)?
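`Ollama` from `langchain_community.llms` is a completion-style (text in, text out) wrapper, so when it receives a chat-style list like `tell_a_joke`, LangChain first converts the messages into a single prompt string, roughly `Human: Tell a joke ...`. The helper below is a hypothetical, simplified sketch of that conversion, assuming the role prefixes used by `langchain_core`'s `get_buffer_string` (System, Human, AI); it is not the library's own code. Chat wrappers such as `ChatOllama`, used in the commented cells below, send structured messages instead.

```python
def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten OpenAI-style chat messages into one prompt string (simplified sketch)."""
    # Assumed role-to-prefix mapping: user -> Human, assistant -> AI, system -> System.
    prefixes = {"system": "System", "user": "Human", "assistant": "AI"}
    lines = []
    for message in messages:
        prefix = prefixes.get(message["role"], message["role"])
        lines.append(f"{prefix}: {message['content']}")
    return "\n".join(lines)


example = [
    {"role": "system", "content": "You are a witty comedian."},
    {"role": "user", "content": "Tell a joke about LLMs."},
]
print(messages_to_prompt(example))
# System: You are a witty comedian.
# Human: Tell a joke about LLMs.
```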

In [7]:
# from langchain_community.chat_models import ChatOllama
# from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# from langchain_core.output_parsers import StrOutputParser
# from langchain.memory import ConversationBufferMemory
# from langchain.chains import LLMChain

# llm = ChatOllama(model="gemma3")

# memory = ConversationBufferMemory(return_messages=True)

# prompt = ChatPromptTemplate.from_messages([
#     ("system", "You are a friendly assistant who remembers context."),
#     MessagesPlaceholder(variable_name="history"),
#     ("human", "{message}")
# ])

# chain = LLMChain(
#     llm=llm,
#     prompt=prompt,
#     memory=memory
# )

# print(chain.invoke({"message": "Hi, my name is John."}))
# print(chain.invoke({"message": "What is my name?"}))
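The cell above relies on `ConversationBufferMemory`, which stores every earlier message and injects the whole list into the `history` placeholder on each call, so a follow-up such as "What is my name?" can be answered from earlier turns. Below is a minimal, framework-free sketch of the same buffer idea. `BufferedChat` and `echo_model` are hypothetical names, and the fake model only reports what it can see instead of calling Ollama.

```python
from typing import Callable

Message = dict[str, str]


class BufferedChat:
    """Keeps the full conversation and sends all of it with every new message."""

    def __init__(self, model: Callable[[list[Message]], str], system_prompt: str):
        self.model = model
        self.history: list[Message] = [{"role": "system", "content": system_prompt}]

    def send(self, text: str) -> str:
        self.history.append({"role": "user", "content": text})
        reply = self.model(self.history)  # the model sees every earlier turn
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_model(messages: list[Message]) -> str:
    # Stand-in for a real LLM: report how many user turns it received.
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s); the first was {user_turns[0]!r}."


chat = BufferedChat(echo_model, "You are a friendly assistant who remembers context.")
print(chat.send("Hi, my name is John."))
print(chat.send("What is my name?"))
```

The trade-off of a plain buffer is that the prompt grows with every turn, so long conversations eventually run into the model's context window.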


In [8]:
# from langchain_community.chat_models import ChatOllama
# from langchain_core.prompts import ChatPromptTemplate
# from langchain_core.output_parsers import StrOutputParser

# llm = ChatOllama(model="gemma3")

# prompt = ChatPromptTemplate.from_messages([
#     ("system", "You are a witty comedian."),
#     ("human", "{question}")
# ])

# chain = prompt | llm | StrOutputParser()

# print(chain.invoke({"question": "Tell me a joke."}))


In [9]:
# from langchain_community.chat_models import ChatOllama
# from langchain_core.prompts import ChatPromptTemplate
# from langchain_core.output_parsers import StrOutputParser


# class OllamaChatChain:
#     def __init__(self, model_name="gemma3", system_prompt="You are a helpful assistant."):
#         # LLM
#         self.llm = ChatOllama(model=model_name)

#         # Prompt template with system + human messages
#         self.prompt = ChatPromptTemplate.from_messages([
#             ("system", system_prompt),
#             ("human", "{user_input}")
#         ])

#         # Create the chain
#         self.chain = self.prompt | self.llm | StrOutputParser()

#     def __call__(self, text):
#         return self.chain.invoke({"user_input": text})


# # ---- USAGE ----
# chat = OllamaChatChain(system_prompt="You are a witty comedian.")

# print(chat("Tell me a joke about computers."))


# LiteLLM

LiteLLM is an open-source Python/JS library that provides a unified API for calling many different large-language-model providers using an OpenAI-style interface.

Think of it as a “model router” or compatibility layer: you can switch between OpenAI, Anthropic, Google Gemini, Azure, local models served through Ollama, and other providers largely by changing the model string, without rewriting the calling code.
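The `ollama/gemma3` model string used below follows LiteLLM's `provider/model` naming convention. As a rough illustration of what a model router does with such a string, here is a toy sketch that splits off the provider and picks a default endpoint. It is not LiteLLM's implementation: `route` and its lookup table are hypothetical, and LiteLLM itself supports many more providers and can also infer the provider for some unprefixed model names.

```python
from typing import Optional, Tuple

# Hypothetical lookup table of default endpoints per provider.
DEFAULT_API_BASES = {
    "ollama": "http://localhost:11434",  # Ollama's default local endpoint
    "openai": "https://api.openai.com/v1",
}


def route(model: str) -> Tuple[str, str, Optional[str]]:
    """Split a 'provider/model' string and look up a default API base (toy version)."""
    provider, sep, name = model.partition("/")  # splits on the first '/' only
    if not sep or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name, DEFAULT_API_BASES.get(provider)


print(route("ollama/gemma3"))  # ('ollama', 'gemma3', 'http://localhost:11434')
```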

In [10]:
# Install LiteLLM if it is not already installed
# !pip install litellm

In [11]:
from litellm import completion

response_litellm = completion(
    model="ollama/gemma3",
    messages=tell_a_joke,
    api_base="http://localhost:11434",
)

In [12]:
print(response_litellm)

ModelResponse(id='chatcmpl-118c30b2-080d-44d0-ae53-1f7b4bc49ef2', created=1763528588, model='ollama/gemma3', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='Okay, here’s a joke for a student on the journey to becoming an LLM Engineering expert:\n\nWhy did the LLM cross the road? \n\n… To optimize its attention mechanism and generate a more nuanced response! \n\n---\n\nWould you like to hear another one, or maybe a joke related to a specific area of LLM engineering (like prompt engineering or fine-tuning)?', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None, reasoning_content=None))], usage=Usage(completion_tokens=81, prompt_tokens=31, total_tokens=112, completion_tokens_details=None, prompt_tokens_details=None))


In [13]:
reply = response_litellm.choices[0].message.content
display(Markdown(reply))

Okay, here’s a joke for a student on the journey to becoming an LLM Engineering expert:

Why did the LLM cross the road? 

… To optimize its attention mechanism and generate a more nuanced response! 

---

Would you like to hear another one, or maybe a joke related to a specific area of LLM engineering (like prompt engineering or fine-tuning)?
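LiteLLM returns its `ModelResponse` in the OpenAI chat-completion shape, which is why `choices[0].message.content` and `usage.prompt_tokens` work regardless of the provider behind the call. The sketch below walks the same fields on a plain dictionary with that shape, filled with the token counts printed above (the reply text is shortened). `first_reply` and `token_counts` are hypothetical helper names.

```python
response_dict = {
    "model": "ollama/gemma3",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Why did the LLM cross the road? ..."},
        }
    ],
    "usage": {"prompt_tokens": 31, "completion_tokens": 81, "total_tokens": 112},
}


def first_reply(resp: dict) -> str:
    # The generated text lives in the first choice's message.
    return resp["choices"][0]["message"]["content"]


def token_counts(resp: dict) -> tuple[int, int, int]:
    usage = resp["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]


print(first_reply(response_dict))
print(token_counts(response_dict))  # (31, 81, 112)
```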

In [14]:
print(f"Input tokens: {response_litellm.usage.prompt_tokens}")
print(f"Output tokens: {response_litellm.usage.completion_tokens}")
print(f"Total tokens: {response_litellm.usage.total_tokens}")
print(f"Total cost: {response_litellm._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 31
Output tokens: 81
Total tokens: 112
Total cost: 0.0000 cents
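The cost figure comes from the token counts: input tokens times the per-token input price, plus output tokens times the per-token output price. Here the locally served Ollama model is reported as costing zero, hence `0.0000 cents`. The worked example below uses hypothetical prices of $0.15 and $0.60 per million input and output tokens, chosen only for illustration, with the token counts from this call. LiteLLM also provides a public `completion_cost()` helper, which may be preferable to reading the private `_hidden_params` attribute.

```python
def cost_in_cents(prompt_tokens: int, completion_tokens: int,
                  usd_per_1m_input: float, usd_per_1m_output: float) -> float:
    """Token-based cost in US cents; prices are given in USD per million tokens."""
    usd = (prompt_tokens * usd_per_1m_input + completion_tokens * usd_per_1m_output) / 1_000_000
    return usd * 100  # dollars -> cents, as in the cell above


# Hypothetical prices, for illustration only.
cents = cost_in_cents(31, 81, usd_per_1m_input=0.15, usd_per_1m_output=0.60)
print(f"Total cost: {cents:.4f} cents")  # Total cost: 0.0053 cents

# Zero prices (as for the local model above) reproduce the 0.0000 result.
print(f"Total cost: {cost_in_cents(31, 81, 0.0, 0.0):.4f} cents")  # Total cost: 0.0000 cents
```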


In [15]:
response_litellm._hidden_params

{'custom_llm_provider': 'ollama',
 'region_name': None,
 'optional_params': {},
 'litellm_call_id': '84494e4e-1557-4736-a086-7dc3b3038565',
 'api_base': 'http://localhost:11434',
 'model_id': None,
 'response_cost': 0.0,
 'additional_headers': {},
 'litellm_model_name': 'ollama/gemma3',
 '_response_ms': 4129.165999999999}

# Now let's use LiteLLM to illustrate a pro feature: prompt caching

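**Prompt caching** is a technique for speeding up LLM applications and reducing cost by reusing (or partially reusing) work from previous model calls instead of recomputing it every time. Provider-side prompt caching typically applies to the beginning of a prompt, so long, unchanging content (such as a reference document) should come first and the part that varies (the question) last.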
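To illustrate why the position of the repeated content matters, here is a toy simulation of prefix-based prompt caching: each new prompt is compared with earlier ones, and the length of the longest shared prefix counts as cached work that would not need recomputing. It is a sketch under simplifying assumptions (whitespace-separated words stand in for tokens, and nothing is ever evicted); real providers cache internal model state on their servers and apply their own minimum lengths and expiry rules. `ToyPrefixCache` is a hypothetical name.

```python
class ToyPrefixCache:
    """Counts how much of each prompt matches the start of a previously seen prompt."""

    def __init__(self) -> None:
        self._seen: list[list[str]] = []  # token lists of earlier prompts

    @staticmethod
    def _shared_prefix(a: list[str], b: list[str]) -> int:
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def process(self, prompt: str) -> tuple[int, int]:
        """Return (cached_tokens, new_tokens) for this prompt, then remember it."""
        tokens = prompt.split()
        cached = max((self._shared_prefix(tokens, old) for old in self._seen), default=0)
        self._seen.append(tokens)
        return cached, len(tokens) - cached


document = "ACT I SCENE I Elsinore A platform before the castle"
cache = ToyPrefixCache()
print(cache.process(document + " Who speaks first?"))     # (0, 13)
print(cache.process(document + " Where is the castle?"))  # (10, 4): document reused
print(cache.process("Where is the castle? " + document))  # (0, 14): question first, no reuse
```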

In [16]:
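# Path to a local plain-text copy of Hamlet; adjust it for your machine
with open("/home/alexender/Downloads/hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
assert loc != -1, "Passage not found; check the file contents."
print(hamlet[loc:loc+100])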

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [30]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

response = completion(
    model="ollama/gemma3",
    messages=question,
    api_base="http://localhost:11434",
)

reply = response.choices[0].message.content
display(Markdown(reply))

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

The reply to Laertes’s question, “Where is my father?” in Hamlet is:

**‚ÄúHe is gone to France.‚Äù**

This is delivered by Hamlet, and it's a deliberately vague and misleading response intended to frustrate and enrage Laertes, furthering the play's central themes of deception and suspicion.

Input tokens: 32
Output tokens: 68
Total tokens: 100
Total cost: 0.0000 cents


In [31]:
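# The model can answer wrongly from memory alone, so we add the full play as context.
# The play goes first and the question last; we build a new message list instead of
# mutating `question`, so re-running this cell does not prepend the play twice.
question_with_context = [{
    "role": "user",
    "content": "For context, here is the entire text of Hamlet:\n\n" + hamlet
    + "\n\n" + question[0]["content"],
}]

response = completion(
    model="ollama/gemma3",
    messages=question_with_context,
    api_base="http://localhost:11434",
)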

reply = response.choices[0].message.content
display(Markdown(reply))

In *Hamlet*, when Laertes asks “Where is my father?” Horatio replies:

“**He is dead, doubled, like a Valkyrie.**” 

This is a key moment in the play, revealing the truth about Hamlet's father’s death and setting the stage for the tragic events to come.

In [32]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 4096
Output tokens: 68
Total tokens: 4164
Total cost: 0.0000 cents
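The input count of exactly 4096 tokens is suspicious: the full play is far longer than that, so the prompt was most likely cut down to the model's context window (Ollama's `num_ctx` option, whose default depends on the Ollama version and model). If so, most of the play never reached the model, which would also explain why the answer is still wrong (in the text, the King replies "Dead."). A rough pre-flight check like the one below can flag this before sending a request. It assumes about 4 characters per token, a common rule of thumb for English text that is not exact for any particular tokenizer; `estimate_tokens` and `fits_in_context` are hypothetical helpers.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, num_ctx: int = 4096, reserve_for_output: int = 512) -> bool:
    """True if the estimated prompt plus room for the reply fits in the context window."""
    return estimate_tokens(text) + reserve_for_output <= num_ctx


short_prompt = "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"
print(estimate_tokens(short_prompt), fits_in_context(short_prompt))

# With the play loaded earlier, you could check: fits_in_context(hamlet)
```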
