
### Step 1: Install Required Libraries

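In the first cells of your Colab notebook, install the necessary libraries. LangChain and its OpenAI integration (`langchain-openai`) are the primary libraries needed; the cells below also pin `httpx` and install `kagglehub`, which is used later to download the dataset.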

In [1]:
!pip install httpx==0.23.0



In [None]:
!pip install langchain_community kagglehub

In [8]:
!pip install -qU langchain-openai



- **LangChain**: A framework for building applications with language models.



### Step 2: **Import Libraries**

After installing the libraries, import the modules the rest of the notebook uses.

In [9]:
from langchain_openai import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
import kagglehub
import pandas as pd



### Step 3: **Set Up OpenAI API Key**

You'll need an OpenAI API key to access the language model. You can obtain one from [OpenAI's website](https://openai.com/api/). Once you have the key, set it as the `OPENAI_API_KEY` environment variable, replacing the placeholder below with your own key.

In [10]:
# Set your OpenAI API key
import os
os.environ["OPENAI_API_KEY"] = "YOUR-OPENAI_API_KEY"
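
Hardcoding the key in a cell makes it easy to leak when the notebook is shared. As one alternative, the sketch below asks for the key only when it is not already set. The helper name `ensure_api_key` is hypothetical and is not part of LangChain or the OpenAI SDK.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass, name="OPENAI_API_KEY"):
    """Return the key stored in `env`, asking for it via `prompt` only if missing.

    `ensure_api_key` is a hypothetical helper written for this illustration.
    """
    if not env.get(name):
        # getpass hides the typed value, so the key does not appear in cell output.
        env[name] = prompt(f"Enter {name}: ")
    return env[name]
```

Calling `ensure_api_key()` in a cell before the model is created would then replace the hardcoded assignment above.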


### Step 4: **Import Dataset from KaggleHub**


In [4]:
# Download latest version
path = kagglehub.dataset_download("kyanyoga/sample-sales-data")
print("Path to dataset files:", path)

Path to dataset files: /root/.cache/kagglehub/datasets/kyanyoga/sample-sales-data/versions/1



* We pass the `encoding` parameter to `pd.read_csv` and set it to `'latin1'`. This tells pandas to decode the file as Latin-1 instead of the default UTF-8.

In [5]:
# Load the CSV file
file_path = f"{path}/sales_data_sample.csv"
sales_data = pd.read_csv(file_path, encoding='latin1')
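
To see why the encoding choice matters, the sketch below decodes a made-up byte string (not taken from the dataset) that contains the byte `0xE9`. That byte is "é" in Latin-1 but is not valid on its own in UTF-8.

```python
# Hypothetical sample bytes for illustration; not taken from the sales dataset.
raw = b"CITY\nMontr\xe9al\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xE9 followed by the letter "a" is not a valid UTF-8 sequence.
    utf8_ok = False

# Latin-1 maps every possible byte to a character, so decoding cannot fail.
decoded = raw.decode("latin1")
print(utf8_ok, decoded.splitlines()[1])
```

Because Latin-1 never raises an error, it can also silently mis-decode a file that was written in a different encoding, so it is worth spot-checking a few text columns after loading.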


### Step 5: **Filter the Large Dataset & Define the Prompt to Chain with the Input Data**

Filter the dataset to the fourth quarter, aggregate the key metrics, then create a prompt and set up an `LLMChain` to generate the sales report.




In [11]:
# Filter data for the fourth quarter (QTR_ID == 4); this matches Q4 rows from every year in the dataset
last_quarter_data = sales_data[sales_data['QTR_ID'] == 4]

# Aggregate necessary metrics
revenue = last_quarter_data['SALES'].sum()
units_sold = last_quarter_data['QUANTITYORDERED'].sum()
top_product = (
    last_quarter_data.groupby('PRODUCTCODE')['SALES'].sum().idxmax()
)

# Prepare the data for LangChain
input_data = {
    "revenue": f"${revenue:,.2f}",
    "units_sold": units_sold,
    "top_product": top_product,
}

# Define the prompt
report_prompt = """
Generate a sales report for last quarter.
Revenue: {revenue}
Units Sold: {units_sold}
Top Product: {top_product}
"""

# Create a prompt template
prompt_template = PromptTemplate.from_template(report_prompt)

# Set up the LLMChain
chain = LLMChain(llm=OpenAI(), prompt=prompt_template)
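
Because the filter above uses only `QTR_ID`, it pools fourth-quarter rows from every year in the data. If a single, most recent fourth quarter is wanted, one option is to filter on the year as well. The sketch below runs on a tiny made-up DataFrame: the `YEAR_ID` column is assumed to exist in the real dataset, and all values are invented for illustration. It also renders the prompt with plain `str.format`, which for simple `{name}` placeholders should give the same text the model receives.

```python
import pandas as pd

# Invented rows for illustration; YEAR_ID is an assumed column name.
toy = pd.DataFrame({
    "YEAR_ID": [2003, 2003, 2004, 2004, 2004],
    "QTR_ID": [4, 4, 4, 4, 1],
    "PRODUCTCODE": ["A", "B", "A", "B", "A"],
    "SALES": [100.0, 50.0, 30.0, 80.0, 999.0],
    "QUANTITYORDERED": [10, 5, 3, 8, 99],
})

q4 = toy[toy["QTR_ID"] == 4]
# Keep only the most recent year that actually has Q4 rows.
latest_q4 = q4[q4["YEAR_ID"] == q4["YEAR_ID"].max()]

metrics = {
    "revenue": f"${latest_q4['SALES'].sum():,.2f}",
    "units_sold": int(latest_q4["QUANTITYORDERED"].sum()),
    "top_product": latest_q4.groupby("PRODUCTCODE")["SALES"].sum().idxmax(),
}

# Same wording as the notebook's prompt, rendered without calling any model.
template = (
    "Generate a sales report for last quarter.\n"
    "Revenue: {revenue}\n"
    "Units Sold: {units_sold}\n"
    "Top Product: {top_product}\n"
)
rendered = template.format(**metrics)
print(rendered)
```

Printing the rendered prompt before calling the model is a cheap way to confirm that the aggregated values are the ones intended.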


### Step 6: **Run the Chain and Print the Response**

Pass the aggregated metrics from the previous step to the chain and print the result.

In [12]:
# Run the chain with the data
response = chain.invoke(input_data)

# Print the response
print(response)

{'revenue': '$3,874,780.01', 'units_sold': 38148, 'top_product': 'S18_3232', 'text': '\nSales Report for Last Quarter:\n\nRevenue: $3,874,780.01\nUnits Sold: 38,148\nAverage Revenue per Unit: $101.62\n\nTop Product: S18_3232\nUnits Sold: 1,764\nRevenue: $211,681.28\n\nOther Top Selling Products:\n1. S24_2000 - Units Sold: 1,501, Revenue: $189,099.26\n2. S18_4600 - Units Sold: 1,416, Revenue: $141,468.84\n3. S18_3029 - Units Sold: 1,297, Revenue: $129,700.03\n4. S24_1937 - Units Sold: 1,263, Revenue: $152,149.73\n5. S18_1662 - Units Sold: 1,246, Revenue: $149,520.92\n\nTotal Revenue: $3,874,780.01\nTotal Units Sold: 38,148\nAverage Revenue per Unit: $101.62'}
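
As the output above shows, `chain.invoke` returns a dictionary that echoes the inputs and stores the generated report under the `text` key. A minimal sketch of pulling out just the report, using a shortened hand-copied sample in place of a live API call:

```python
# Shortened copy of the response shown above; stands in for a live chain.invoke call.
sample_response = {
    "revenue": "$3,874,780.01",
    "units_sold": 38148,
    "top_product": "S18_3232",
    "text": "\nSales Report for Last Quarter:\n\nRevenue: $3,874,780.01\nUnits Sold: 38,148\n",
}

# The generated report lives under "text"; strip surrounding whitespace before printing.
report = sample_response["text"].strip()
print(report)
```

Only the revenue, units sold, and top product code were supplied in the prompt, so any other figures in the generated text (such as the per-product breakdown) come from the model rather than from the dataset and should be checked against the data before use.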



### Additional Resources

- **OpenAI API Documentation**: [OpenAI API Docs](https://platform.openai.com/docs/)
- **LangChain GitHub**: [LangChain GitHub](https://github.com/langchain-ai/langchain)