## Objective

- Accepts user questions in natural language.

- Parses a messy customer orders dataset of 85,000 rows.

- Reads additional context from a board meeting PDF and a market summary TXT file.

- Uses Gemini to generate business insights from combined data sources.

- Deploys the chatbot using Streamlit locally or on the web.

---
**Import Libraries**

In [27]:
import pandas as pd
import numpy as np
from datetime import datetime
import PyPDF2
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings("ignore")

---
**Load Dataset**

In [14]:
df = pd.read_csv('customer_orders.csv')
df.head()

| | Order ID | Order Date | Customer Name | Product Category | Region | Sales Channel | Units Sold | Unit Price | Order Notes | Revenue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2023-01-31 | Sarah Wright | Groceries | North | Wholesale | 69 | 4143 | Rich vote represent black three conference tru... | 285867 |
| 1 | 2 | 2023-12-30 | Zoe Prince | Clothing | North | Online | 54 | 2311 | Today draw story Mrs few beyond thank serve sc... | 124794 |
| 2 | 3 | 2022-05-10 | David Waters | Furniture | North | Wholesale | 56 | 2685 | Man control movement exist society according w... | 150360 |
| 3 | 4 | 2023-07-18 | Keith Wilcox | Books | North | Retail | 1 | 4391 | Reach here oil receive piece able heavy reveal... | 4391 |
| 4 | 5 | 2023-02-04 | Robert Lester | Clothing | West | Online | 59 | 1421 | Right quality bill money idea city bit. | 83839 |


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85000 entries, 0 to 84999
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Order ID          85000 non-null  int64 
 1   Order Date        85000 non-null  object
 2   Customer Name     84150 non-null  object
 3   Product Category  84150 non-null  object
 4   Region            84150 non-null  object
 5   Sales Channel     85000 non-null  object
 6   Units Sold        85000 non-null  int64 
 7   Unit Price        85000 non-null  int64 
 8   Order Notes       85000 non-null  object
 9   Revenue           85000 non-null  int64 
dtypes: int64(4), object(6)
memory usage: 6.5+ MB


In [16]:
df.shape

(85000, 10)
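
In the five rows shown by `df.head()`, `Revenue` equals `Units Sold` times `Unit Price`. The notebook does not say whether that holds for all 85,000 rows, so a consistency check like the sketch below may be worth running on the full frame. The `sample` frame re-enters the five displayed rows by hand. The names `sample`, `expected`, and `mismatches` are illustrative only.

```python
import pandas as pd

# The five rows shown by df.head(), re-entered by hand for illustration.
sample = pd.DataFrame({
    "Order ID": [1, 2, 3, 4, 5],
    "Units Sold": [69, 54, 56, 1, 59],
    "Unit Price": [4143, 2311, 2685, 4391, 1421],
    "Revenue": [285867, 124794, 150360, 4391, 83839],
})

# Recompute revenue and keep the rows where the stored value disagrees.
expected = sample["Units Sold"] * sample["Unit Price"]
mismatches = sample[sample["Revenue"] != expected]

print(f"{len(mismatches)} of {len(sample)} rows disagree")
```

On these five rows the check finds no disagreement. Replacing `sample` with `df` would apply the same check to the whole dataset.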

---
**Handle Missing Values**

In [29]:
df.isnull().sum()

Order ID            0
Order Date          0
Customer Name       0
Product Category    0
Region              0
Sales Channel       0
Units Sold          0
Unit Price          0
Order Notes         0
Revenue             0
Month               0
dtype: int64

In [30]:
text_cols = ['Customer Name', 'Product Category', 'Region']
df[text_cols] = df[text_cols].fillna('Unknown')

In [31]:
df.isnull().sum()

Order ID            0
Order Date          0
Customer Name       0
Product Category    0
Region              0
Sales Channel       0
Units Sold          0
Unit Price          0
Order Notes         0
Revenue             0
Month               0
dtype: int64
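
The `df.info()` output earlier reports 84,150 non-null values out of 85,000 for `Customer Name`, `Product Category`, and `Region`, which implies 850 missing entries in each of those columns before filling. The sketch below shows the same fill step on a small made-up frame, counting missing values before and after. The `toy` frame and its values are assumptions for illustration, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the missing values are made up.
toy = pd.DataFrame({
    "Customer Name": ["Sarah Wright", np.nan, "David Waters"],
    "Product Category": ["Groceries", "Clothing", np.nan],
    "Region": [np.nan, "North", "North"],
    "Revenue": [285867, 124794, 150360],
})

text_cols = ["Customer Name", "Product Category", "Region"]

# Count missing values, fill them with a sentinel label, then count again.
missing_before = toy[text_cols].isnull().sum()
toy[text_cols] = toy[text_cols].fillna("Unknown")
missing_after = toy[text_cols].isnull().sum()

print(missing_before.to_dict())
print(missing_after.to_dict())
```

Filling with `'Unknown'` keeps every row available for revenue totals, at the cost of an extra category that will appear in any grouping by customer, category, or region.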

---
**Parse Order Dates**

In [32]:
# Parse order dates, then derive a monthly period (e.g. 2023-01) for grouping.
df['Order Date'] = pd.to_datetime(df['Order Date'])
df['Month'] = df['Order Date'].dt.to_period('M')
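
Because the dataset is described as messy, some date strings may fail to parse. The sketch below is one possible defensive variant, not the notebook's method: it passes `errors="coerce"` so unparseable values become `NaT`, counts them, and then aggregates revenue by the derived month. The `toy` frame, the `"not a date"` entry, and the names `bad_dates` and `monthly_revenue` are made up for illustration.

```python
import pandas as pd

# Toy frame with one deliberately invalid date; the values are made up.
toy = pd.DataFrame({
    "Order Date": ["2023-01-31", "2023-12-30", "not a date", "2023-01-15"],
    "Revenue": [285867, 124794, 150360, 4391],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
toy["Order Date"] = pd.to_datetime(toy["Order Date"], errors="coerce")
bad_dates = toy["Order Date"].isna().sum()

# Rows with NaT get a missing period and are dropped by groupby by default.
toy["Month"] = toy["Order Date"].dt.to_period("M")
monthly_revenue = toy.groupby("Month")["Revenue"].sum()

print(f"unparseable dates: {bad_dates}")
print(monthly_revenue)
```

Counting `NaT` values right after parsing shows how many rows will silently drop out of any month-based aggregation.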