# Intro

In this example, I use LangChain to build a simple LLM-powered assistant that can run on either of two providers:
1. Google Gemini (`gemini-2.0-flash`)
2. OpenAI (`gpt-4.1-nano`)

In [1]:
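# !pip install -qU \
#   langchain-core==0.3.33 \
#   langchain-openai==0.3.3 \
#   langchain-community==0.3.16 \
#   langchain langchain-google-genai langchain-huggingface \
#   faiss-cpu sentence-transformers python-dotenv pandas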

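First, we need to initialize an LLM. I'll use either _Google Gemini_ (`gemini-2.0-flash`) or _OpenAI `gpt-4.1-nano`_. You can get an OpenAI API key from [OpenAI](https://platform.openai.com/settings/organization/api-keys); for Gemini you need a Google API key (e.g., from Google AI Studio).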

# Problem Setting

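I'll take a company report (`new_report_IBM.txt`), a list of metrics (`metrics.csv`), and scored examples for other companies (`train_examples.csv`), and use LangChain to:
1. Score the company in the new report against each metric (score 1, 2, or 3)
2. Extract from the new report 3 pieces of supporting evidence for each score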

# Initialize LLM

In [None]:
AI_Provider = "GOOGLE"
# AI_Provider = "OPENAI"

if AI_Provider == "GOOGLE":
    llm_model = "gemini-2.0-flash"
elif AI_Provider == "OPENAI":
    llm_model = "gpt-4.1-nano"  # less expensive than gpt-4o-mini
else:
    raise ValueError(f"Unsupported AI_Provider: {AI_Provider!r} (use 'GOOGLE' or 'OPENAI')")

In [None]:
from dotenv import load_dotenv
import os
from getpass import getpass


load_dotenv()  # Try to load a local .env file (for local dev); silently skipped if not found (e.g., in CI)
key_name = f"{AI_Provider}_API_KEY"
os.environ[key_name] = os.getenv(key_name) or getpass(f"Enter {AI_Provider} API Key: ")  # Get API key from environment or user input
if not os.environ[key_name]:  # getpass returns "" (never None) when nothing is entered
    raise ValueError(f"❌ {key_name} not found. Make sure it's in your .env file or set as a GitHub Action secret.")
else:
    print(f"✅ {key_name} loaded successfully (not printing it for security).")

✅ GOOGLE_API_KEY loaded successfully (not printing it for security).


In [None]:
if AI_Provider == "GOOGLE":
    from langchain_google_genai import ChatGoogleGenerativeAI as ChatLLM
elif AI_Provider == "OPENAI":
    from langchain_openai import ChatOpenAI as ChatLLM

llm = ChatLLM(temperature=0.0, model=llm_model)  # temperature=0 for (near-)deterministic, repeatable scoring
# print("My LLM version is:", llm.invoke("What LLM version are you?").content)



Since we output _several fields_, we'll have the LLM use __structured outputs__ so that the generated fields match our requirements.
For this, we create a _pydantic model_ that describes the required output format; this format description is then passed to our model using the `with_structured_output` method:

In [None]:
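from pydantic import BaseModel, Field, field_validator


class CompanyScore(BaseModel):
    Company: str = Field(..., description="The name of the company")
    MetricID: int = Field(..., description="The ID of the metric")
    Score: int = Field(..., description="Score must be 1, 2, or 3")  # the description guides the LLM; `int` alone accepts any integer
    Reason1: str = Field(..., description="First reason for the score")
    Reason2: str = Field(..., description="Second reason for the score")
    Reason3: str = Field(..., description="Third reason for the score")

    @field_validator("Score")
    @classmethod
    def check_score_range(cls, value: int) -> int:
        # Reject out-of-range scores when the LLM reply is parsed into this model
        if value not in (1, 2, 3):
            raise ValueError("Score must be 1, 2, or 3")
        return value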


llm_structured = llm.with_structured_output(CompanyScore)
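To see what `llm_structured` hands back, the sketch below skips the LLM and parses a hand-written JSON reply (made-up values) with pydantic directly. `with_structured_output` similarly turns the class into a schema for the model and parses the reply into a `CompanyScore` instance; how the schema is actually sent (tool calling or a JSON mode) depends on the provider integration, so treat this as an approximation:

```python
from pydantic import BaseModel, Field, ValidationError


class CompanyScore(BaseModel):
    # Same fields as the model above
    Company: str = Field(..., description="The name of the company")
    MetricID: int = Field(..., description="The ID of the metric")
    Score: int = Field(..., description="Score must be 1, 2, or 3")
    Reason1: str = Field(..., description="First reason for the score")
    Reason2: str = Field(..., description="Second reason for the score")
    Reason3: str = Field(..., description="Third reason for the score")


# The schema derived from the class lists every field as required
print(CompanyScore.model_json_schema()["required"])

# A hand-written JSON string standing in for the model's structured reply (made-up values)
raw = '{"Company": "ExampleCo", "MetricID": 1, "Score": 2, "Reason1": "a", "Reason2": "b", "Reason3": "c"}'
result = CompanyScore.model_validate_json(raw)
print(type(result).__name__, result.Company, result.MetricID, result.Score)

# A reply missing required fields is rejected instead of silently passing through
rejected_errors = 0
try:
    CompanyScore.model_validate_json('{"Company": "ExampleCo", "MetricID": 1, "Score": 2}')
except ValidationError as err:
    rejected_errors = err.error_count()  # one error per missing Reason field
print("incomplete reply rejected with", rejected_errors, "errors")
```

Because the parsed object is a regular pydantic instance, the chain below can read its fields as attributes (`llm_output.Score`, and so on).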

# Create VectorDB from TXT

In [6]:
chunk_size = 250  # max characters per chunk; FIXME: calibrate
chunk_overlap = 50  # characters shared by consecutive chunks; FIXME: calibrate

In [7]:
# Read the IBM report file into a string
with open('data/new_report_IBM.txt', 'r', encoding='utf-8') as file:
    report = file.read()
print("Report loaded successfully!")
print(f"Report length: {len(report)} characters")
print("First 200 characters of the report:")
print(report[:200] + "...")

Report loaded successfully!
Report length: 2591 characters
First 200 characters of the report:
Balanced Power of IBM

IBM’s latest sustainability report opens with a bold statement: “IBM is committed to achieving net-zero GHG Scope 1 emissions by 2050 through a comprehensive, three-phase decarb...


In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter  # to split the report into chunks
from langchain_core.documents import Document

documents = [Document(page_content=report)]  # wrap the report string in a Document
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)  # initialize the text splitter
docs = text_splitter.split_documents(documents)  # split report into overlapping chunks
print(f'The number of chunks is {len(docs)}')
# # test
# print(f'Test: docs[0].page_content:\n{docs[0].page_content}')
# for i, doc in enumerate(docs):
#     print(f'Test: docs[{i}].page_content:\n{doc.page_content}')

The number of chunks is 16
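To get a feel for what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window chunker. `sliding_window_chunks` is a hypothetical helper, not a LangChain function: `RecursiveCharacterTextSplitter` is smarter and prefers to break at paragraph, line, and word boundaries, so its chunks are usually shorter than `chunk_size`, which is why it can end up with more chunks than a plain sliding window.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
print(len(sliding_window_chunks("x" * 2591, chunk_size=250, chunk_overlap=50)))
```

With the notebook's settings (250/50) on a 2591-character text, this sketch yields 13 windows, while the real splitter returned 16 chunks for the IBM report.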


In [9]:
from langchain_huggingface import HuggingFaceEmbeddings
# Alternatives: from langchain_openai import OpenAIEmbeddings
#               from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Define how to map a string (a sentence or a paragraph) to a vector.
# all-MiniLM-L6-v2 is small (~80MB), fast on CPU, and good for English.
# Ultra-fast, memory-light alternative (~45MB): model_name="sentence-transformers/paraphrase-MiniLM-L3-v2"
# API-based alternative: embeddings = OpenAIEmbeddings()
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# # Test: one embed string
# embed_test = embeddings.embed_query("What company is the report about?")
# print(f'Test: len(embeddings)={len(embed_test)}, embeddings[:5]={embed_test[:5]}')  # Should return a 384-dim vector




In [10]:
from langchain_community.vectorstores import FAISS  # vector database (index); alternatives: Pinecone, Weaviate
db = FAISS.from_documents(docs, embeddings)  # create a DB of vector embeddings from the docs
print(f'The number of vectors in the DB is {db.index.ntotal}')

# db.save_local("data/faiss_index")
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # must be the same model that built the index
# db = FAISS.load_local("data/faiss_index", embeddings, allow_dangerous_deserialization=True)  # only for index files you created yourself (loading uses pickle)

The number of vectors in the DB is 16


In [11]:
def make_vectordb_from_report(report_filename: str) -> FAISS:
    return db # TODO: Implement this function
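The TODO above could be filled in roughly as follows. This is a sketch under my own assumptions: it takes the embedding model and chunking settings as extra (hypothetical) parameters instead of relying on notebook globals, and it imports from `langchain_text_splitters`, the package that `langchain.text_splitter` re-exports in LangChain 0.3. The imports sit inside the function only so the cell can be defined before the packages are installed.

```python
def make_vectordb_from_report(report_filename: str, embeddings, chunk_size: int = 250, chunk_overlap: int = 50) -> "FAISS":
    """Read a plain-text report, split it into overlapping chunks, and index the chunks in FAISS."""
    # Read the whole report into one string (as in the IBM cell above)
    with open(report_filename, "r", encoding="utf-8") as file:
        report_text = file.read()

    from langchain_core.documents import Document
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import FAISS

    # Split into overlapping chunks, then embed every chunk and store the vectors
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks = text_splitter.split_documents([Document(page_content=report_text)])
    return FAISS.from_documents(chunks, embeddings)
```

Usage would then look like `db = make_vectordb_from_report("data/new_report_IBM.txt", embeddings, chunk_size=chunk_size, chunk_overlap=chunk_overlap)`, which repeats the three cells above in one call.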

# Similarity Search

In [12]:
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate
def prep_simularity_search_query(metrics, metric_id:int) -> str:

    user_prompt = HumanMessagePromptTemplate.from_template(
        """{metric_name} metric:\n{metric_description}""",
        input_variables=["metric_name", "metric_description"]
    )

    query = user_prompt.format(
        metric_name = metrics[metrics["MetricID"]==metric_id]["MetricName"].values[0],
        metric_description = metrics[metrics["MetricID"]==metric_id]["MetricDescription"].values[0]).content

    return query
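The query built above is just the text `"<MetricName> metric:\n<MetricDescription>"`. As a sketch, the same string can be produced with a plain f-string, doing the DataFrame lookup once and failing loudly when a `MetricID` is missing (the `.values[0]` lookup above would raise a bare `IndexError`). `metric_fields`, `similarity_query`, and the toy metrics table are hypothetical:

```python
import pandas as pd


def metric_fields(metrics: pd.DataFrame, metric_id: int) -> tuple[str, str]:
    """Look up (MetricName, MetricDescription) for one MetricID, failing if it is missing or duplicated."""
    row = metrics.loc[metrics["MetricID"] == metric_id, ["MetricName", "MetricDescription"]]
    if len(row) != 1:
        raise KeyError(f"expected exactly one row for MetricID={metric_id}, found {len(row)}")
    name, description = row.iloc[0]
    return name, description


def similarity_query(metrics: pd.DataFrame, metric_id: int) -> str:
    """Same text as prep_simularity_search_query builds: '<name> metric:' + newline + '<description>'."""
    name, description = metric_fields(metrics, metric_id)
    return f"{name} metric:\n{description}"


# Toy metrics table (made-up rows with the same columns as metrics.csv)
toy_metrics = pd.DataFrame({
    "MetricID": [1, 2],
    "MetricName": ["Emissions", "Fitness"],
    "MetricDescription": ["Scope 1 targets", "Employee fitness"],
})
print(similarity_query(toy_metrics, 1))
```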

In [13]:
def simularity_search(query:str, db, chunks_number:int=4) -> str:
    docs = db.similarity_search(query, k=chunks_number)  # find chunks_number docs similar to the user's query; FAISS does the similarity search
    new_company_report_chunks_summary = " ".join([doc.page_content for doc in docs])  # combine "page_content" fields from each of the found docs
    return new_company_report_chunks_summary
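With LangChain's default (Euclidean) distance strategy, `FAISS.from_documents` builds an exact flat L2 index, so `similarity_search` is conceptually a nearest-neighbour search over the chunk vectors. A brute-force sketch of that idea on toy 2-D vectors; `knn_indices` and the data are hypothetical, and real all-MiniLM-L6-v2 vectors have 384 dimensions:

```python
import math


def knn_indices(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Indices of the k stored vectors closest to query_vec by Euclidean (L2) distance, nearest first."""
    distances = [math.dist(query_vec, vec) for vec in doc_vecs]  # brute force: compare with every stored vector
    return sorted(range(len(doc_vecs)), key=distances.__getitem__)[:k]


# Toy "embeddings" for four chunks (made-up values)
chunks = ["net-zero by 2050", "Scope 1 emissions fell", "sprint times", "push-ups"]
chunk_vecs = [[0.9, 0.1], [1.0, 0.2], [0.0, 1.0], [-0.1, 0.9]]
query_vec = [1.0, 0.0]

top = knn_indices(query_vec, chunk_vecs, k=2)
print(top)
print(" ".join(chunks[i] for i in top))  # same joining step as simularity_search
```

FAISS does the same ranking far more efficiently, but for 16 chunks the brute-force version shows exactly which chunks end up in `new_company_report_chunks_summary`.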

# Prompt

In [14]:
import pandas as pd
metrics_filename = 'data/metrics.csv'
metrics = pd.read_csv(metrics_filename)
# print(metrics.iloc[0]["MetricDescription"])
# metrics.head()

In [15]:
train_examples_filename = 'data/train_examples.csv'
train_examples = pd.read_csv(train_examples_filename)
# train_examples.head(1)

In [16]:
def prep_prompt_inputs(new_company, metrics, metric_id, train_examples, new_company_report_chunks_summary) -> dict:
    prompt_inputs = {
        'new_company': new_company,
        'metric_id': metric_id,
        'metric_name': metrics[metrics["MetricID"]==metric_id]["MetricName"].values[0],
        'metric_description': metrics[metrics["MetricID"]==metric_id]["MetricDescription"].values[0]
    }

    # add inputs from the 3 scored example companies: Company_n, Score_n, Reason1_n, Reason2_n, Reason3_n for n = 1..3
    df = train_examples[train_examples['MetricID']==metric_id].reset_index(drop=True)
    assert len(df) == 3, "Expected exactly 3 example companies for 1 metric"  # safety check; FIXME: if we add more training examples later

    for n in range(1, 4):
        row = df.loc[n - 1]
        prompt_inputs[f'Company_{n}'] = row['Company']
        prompt_inputs[f'Score_{n}'] = row['Score']
        for r in range(1, 4):
            prompt_inputs[f'Reason{r}_{n}'] = row[f'Reason{r}']

    prompt_inputs['new_company_report_chunks_summary'] = new_company_report_chunks_summary
    
    return prompt_inputs

In [17]:
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate
# Defining the system prompt (how the AI should act)
system_prompt = SystemMessagePromptTemplate.from_template('You are a sustainability consultant tasked to score a company against the provided metric. Score can be: 1, 2, or 3.')

# The user prompt carries the dynamic inputs: the metric, the 3 scored example companies, and the retrieved chunks of the new company's report
user_prompt = HumanMessagePromptTemplate.from_template(
    """# You need to score the company "{new_company}" against the metric and criteria (provided below) and provide 3 reasons for the score.
        ## The output should be a JSON object with the following fields (no other explanation or text or fields are allowed):
        - Company: the name of the company
        - MetricID: MetricID
        - Score: the score of the company
        - Reason1: first reason for the score
        - Reason2: second reason for the score
        - Reason3: third reason for the score
    
    MetricID: {metric_id}
    Metric name: {metric_name}
    Scoring criteria: {metric_description}

    # Below are examples of the scoring applied to 3 companies:
    Company 1: {Company_1}
    Score: {Score_1}
    Reason 1: {Reason1_1}
    Reason 2: {Reason2_1}
    Reason 3: {Reason3_1}

    Company 2: {Company_2}
    Score: {Score_2}
    Reason 1: {Reason1_2}
    Reason 2: {Reason2_2}
    Reason 3: {Reason3_2}

    Company 3: {Company_3}
    Score: {Score_3}
    Reason 1: {Reason1_3}
    Reason 2: {Reason2_3}
    Reason 3: {Reason3_3}
    
    # The report of the company "{new_company}" is:
    {new_company_report_chunks_summary}
    """,

    input_variables=["metric_id", "metric_name", "metric_description", "new_company",
        "Company_1", "Score_1", "Reason1_1", "Reason2_1", "Reason3_1",
        "Company_2", "Score_2", "Reason1_2", "Reason2_2", "Reason3_2",
        "Company_3", "Score_3", "Reason1_3", "Reason2_3", "Reason3_3", "new_company_report_chunks_summary"]
)
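Keeping the hand-written `input_variables` list in sync with the `{placeholders}` in the template is error-prone with 20 variables. A minimal stdlib sketch for checking that a dict like `prompt_inputs` covers every placeholder of an f-string-style template; `template_fields` and the toy template are hypothetical, not part of LangChain:

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Names of all {placeholders} in an f-string-style template (escaped {{ }} braces are skipped)."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


toy_template = "Score {new_company} on {metric_name}.\nExample: {Company_1} got {Score_1}. Literal braces: {{not_a_field}}"
needed = template_fields(toy_template)
provided = {"new_company": "ExampleCo", "metric_name": "Emissions", "Company_1": "A"}
missing = needed - set(provided)
print(sorted(missing))
```

Running the same check on the real template string and the real `prompt_inputs` before calling the LLM surfaces a misspelled key immediately.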

Now we can merge the system and user prompts into a full chat prompt using the `ChatPromptTemplate`:

In [18]:
from langchain.prompts import ChatPromptTemplate

# full chat prompt: the system message followed by the user message
prompt = ChatPromptTemplate.from_messages([system_prompt, user_prompt])

When formatted as a single string (e.g., with `prompt.format(...)`), `ChatPromptTemplate` prefixes each individual message with its role, i.e. `System:`, `Human:`, or `AI:`.

In [19]:
# prompt.messages

By default, the `ChatPromptTemplate` will read the `input_variables` from each of the prompt templates inserted and allow us to use those input variables when formatting the full chat prompt template:

In [20]:
# print(prompt.format(**prompt_inputs))

# Main Pipeline

Now chain the `prompt` template and the structured-output `llm_structured` object defined earlier to create an LLM chain for **prompt formatting > LLM generation > get output**.

Let's use __LCEL__ (LangChain Expression Language) to construct the chain: define the inputs with `{"metric_id": lambda x: x["metric_id"], ...}` and use the pipe operator (`|`) to feed the output of the component on its left into the input of the component on its right.
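To make the pipe idea concrete, here is a toy re-implementation of the composition rule in plain Python. `Step`, `to_step`, `fake_prompt`, and `fake_llm` are hypothetical stand-ins, not LangChain classes: real Runnables also handle batching, streaming, and async. The sketch only shows how `a | b` means "run `a`, feed its output to `b`", and how a dict of lambdas becomes a step that runs every value on the same input (LCEL turns such a dict into a `RunnableParallel`).

```python
from typing import Any, Callable


class Step:
    """Hypothetical, minimal stand-in for a LangChain Runnable: wraps one function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

    def __or__(self, other: Any) -> "Step":
        nxt = to_step(other)
        return Step(lambda x: nxt.invoke(self.invoke(x)))  # run self first, then feed its output to nxt

    def __ror__(self, other: Any) -> "Step":
        # Called for `dict | Step`, because dict's own `|` only accepts another dict
        return to_step(other) | self


def to_step(obj: Any) -> Step:
    """Coerce a Step, a plain function, or a dict of functions into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Run every value on the same input and collect the results under the same keys
        steps = {key: to_step(value) for key, value in obj.items()}
        return Step(lambda x: {key: step.invoke(x) for key, step in steps.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot pipe {type(obj).__name__}")


fake_prompt = Step(lambda d: f"Score {d['new_company']} on metric {d['metric_id']}")  # stands in for `prompt`
fake_llm = Step(lambda text: {"answer": text})  # stands in for `llm_structured`

toy_chain = (
    {"metric_id": lambda x: x["metric_id"], "new_company": lambda x: x["new_company"]}
    | fake_prompt
    | fake_llm
    | (lambda out: out["answer"].upper())
)
print(toy_chain.invoke({"metric_id": 1, "new_company": "IBM", "unused": 0}))
```

Note that the leading dict also drops keys the prompt does not need (`unused` here), which is the same role the input dict plays in the real chain.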

In [21]:
# For the given metric, the chain outputs the new company's score and 3 supporting reasons
chain = (
    {
        "metric_id": lambda x: x["metric_id"],
        "metric_name": lambda x: x["metric_name"],
        "metric_description": lambda x: x["metric_description"],
        "new_company": lambda x: x["new_company"],
        "Company_1": lambda x: x["Company_1"],
        "Score_1": lambda x: x["Score_1"],
        "Reason1_1": lambda x: x["Reason1_1"],
        "Reason2_1": lambda x: x["Reason2_1"],
        "Reason3_1": lambda x: x["Reason3_1"],
        "Company_2": lambda x: x["Company_2"],
        "Score_2": lambda x: x["Score_2"],
        "Reason1_2": lambda x: x["Reason1_2"],
        "Reason2_2": lambda x: x["Reason2_2"],
        "Reason3_2": lambda x: x["Reason3_2"],
        "Company_3": lambda x: x["Company_3"],
        "Score_3": lambda x: x["Score_3"],
        "Reason1_3": lambda x: x["Reason1_3"],
        "Reason2_3": lambda x: x["Reason2_3"],
        "Reason3_3": lambda x: x["Reason3_3"],
        "new_company_report_chunks_summary": lambda x: x["new_company_report_chunks_summary"]
    }
    | prompt
    | llm_structured
    | {
        "Company": lambda llm_output: llm_output.Company,
        "MetricID": lambda llm_output: llm_output.MetricID,
        "Score": lambda llm_output: llm_output.Score,
        "Reason1": lambda llm_output: llm_output.Reason1,
        "Reason2": lambda llm_output: llm_output.Reason2,
        "Reason3": lambda llm_output: llm_output.Reason3
        }
)

In [22]:
new_company = "IBM"  # FIXME

new_company_scores_df = pd.DataFrame(columns=train_examples.columns)
new_company_scores_df

| Unnamed: 0 | Company | MetricID | Score | Reason1 | Reason2 | Reason3 |
|---|---|---|---|---|---|---|


In [None]:
# # Run 1 iteration of the below for loop
# i = 0

# metric_id = metrics.iloc[i]["MetricID"]
# print(f"> MetricID: {metric_id}")

# query = prep_simularity_search_query(metrics, metric_id)
# print(query)

# new_company_report_chunks_summary = simularity_search(query, db, chunks_number=3)
# print(f"\n> new_company_report_chunks_summary: {new_company_report_chunks_summary}")

# prompt_inputs = prep_prompt_inputs(new_company, metrics, metric_id, train_examples, new_company_report_chunks_summary)

# # output = llm_structured.invoke(prompt.format(**prompt_inputs))  # run the LLM to score the new company and extract 3 reasons for the score
# output = chain.invoke(prompt_inputs)
# print("\n> ", type(output))
# print("\n> ", output)

In [23]:
rows = []  # one output dict per metric; the DataFrame is built once at the end
for metric_id in metrics["MetricID"]:
    # print(f"MetricID: {metric_id}")

    # 1) retrieve the report chunks most relevant to this metric
    query = prep_simularity_search_query(metrics, metric_id)
    new_company_report_chunks_summary = simularity_search(query, db, chunks_number=3)
    # print(f"new_company_report_chunks_summary: {new_company_report_chunks_summary}")

    # 2) score the new company and extract 3 reasons (a single LLM call per metric)
    prompt_inputs = prep_prompt_inputs(new_company, metrics, metric_id, train_examples, new_company_report_chunks_summary)
    rows.append(chain.invoke(prompt_inputs))

new_company_scores_df = pd.DataFrame(rows, columns=new_company_scores_df.columns)

# save new_company_scores_df to new_company_scores.csv
new_company_scores_df.to_csv("data/new_company_scores.csv", index=False)
new_company_scores_df

| Unnamed: 0 | Company | MetricID | Score | Reason1 | Reason2 | Reason3 |
|---|---|---|---|---|---|---|
| 0 | IBM | 1 | 2 | IBM will get to net zero by 2050 via a three-p... | It'll allow GHG S1 to reach these levels: 2020... | IBM hit these GHG S1 levels: 2020 - 80 tons, 2... |
| 1 | IBM | 2 | 3 | IBM employees run 100m in 14.8s. | IBM employees can do 40 pushups. | IBM employees can do 60 jumps. |
| 2 | IBM | 3 | 1 | Panelist said math is just a rough guideline. | Modern discourse sometimes sacrifices intellec... | Some educational narratives veer into pseudosc... |


## <span style="color:red"> __NEXT__ </span>