**Week 2 - Stock Processing in Python**

**Tools: Python (pandas, NumPy)**

**Capstone Tasks:**

- Load stock movement logs (CSV or JSON)

- Clean and transform data (e.g., date formatting, quantity validation)

- Calculate current stock levels using numpy

- Use pandas to flag items below reorder threshold

**Deliverables:**

 - Python script (pandas/NumPy) with group and filter logic
  
 - Output file with warehouse-level stock status

**Load stock movement logs (CSV or JSON)**

In [1]:
from google.colab import files
uploaded = files.upload()

Saving products.csv to products.csv
Saving stock_movements.csv to stock_movements.csv
Saving supplier.csv to supplier.csv
Saving warehouse.csv to warehouse.csv


In [2]:
import pandas as pd
import numpy as np

products = pd.read_csv("products.csv")
suppliers = pd.read_csv("supplier.csv")
warehouses = pd.read_csv("warehouse.csv")
stock_movements = pd.read_csv("stock_movements.csv")
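The task allows CSV or JSON input, while the cell above reads CSV only. A minimal sketch of a loader that picks the reader from the file extension follows. `load_table` is a hypothetical helper, and the JSON branch assumes a list-of-records layout, which the source files do not confirm.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a CSV or JSON file into a DataFrame based on its extension.

    Hypothetical helper, not part of the original notebook.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".json":
        # Assumption: the JSON file is a list of records,
        # e.g. [{"movement_id": 1, "quantity": 5}, ...]
        return pd.read_json(path, orient="records")
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

With this helper, `stock_movements = load_table("stock_movements.csv")` would behave like the `pd.read_csv` call above, and a `.json` export of the same log could be loaded without changing the rest of the notebook.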

In [3]:
print(products.head())
print(suppliers.head())
print(warehouses.head())
print(stock_movements.head())

   product_id                name                 description     category  \
0           1              LED TV            42 inch Smart TV  Electronics   
1           2   Organic Green Tea  Imported organic green tea      Grocery   
2           3    Wooden Bookshelf     5-tier sturdy bookshelf    Furniture   
3           4  Men's Denim Jacket           Blue denim jacket     Clothing   
4           5     Kindle E-reader       6 inch display device  Electronics   

   unit_price  reorder_level  status  supplier_id        created_at  \
0       35000             10  active       1000.0  01-08-2024 10:00   
1        1200             25  active       1001.0  01-08-2024 11:15   
2        8000              5  active       1002.0  02-08-2024 09:18   
3        2500              6  active       1000.0  02-08-2024 10:15   
4        8500             20  active       1003.0  02-08-2024 11:25   

       last_updated  
0  05-08-2024 09:30  
1  05-08-2024 11:15  
2  05-08-2024 14:23  
3  05-08-2024 13

**Clean and transform data (e.g., date formatting, quantity validation)**

In [4]:
products['product_id'] = pd.to_numeric(products['product_id'], errors='coerce')
products['name']= products['name'].fillna('Unknown')
products['description'] = products['description'].fillna('Unknown')
products['category'] = products['category'].fillna('Unknown')
products['unit_price'] = pd.to_numeric(products['unit_price'], errors='coerce').fillna(0)
products['reorder_level'] = pd.to_numeric(products['reorder_level'], errors='coerce').fillna(0).astype(int)
products['status'] = products['status'].fillna('inactive')
products['created_at'] = pd.to_datetime(products['created_at'], errors='coerce')
products['last_updated'] = pd.to_datetime(products['last_updated'], errors='coerce')
products['supplier_id'] = pd.to_numeric(products['supplier_id'], errors='coerce')
products.isnull().sum()

column,null_count
product_id,0
name,0
description,0
category,0
unit_price,0
reorder_level,0
status,0
supplier_id,1
created_at,0
last_updated,0
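One point to check in the date handling: the raw values look like `01-08-2024 10:00`, and `pd.to_datetime` without a format reads the first field as the month. If the files are actually day-first (DD-MM-YYYY), which the run of values 01-08, 02-08, 05-08 suggests but the source does not state, an explicit format avoids the ambiguity. The sketch below uses a few sample strings copied from the preview above; the day-first reading is an assumption.

```python
import pandas as pd

# Sample strings taken from the products preview
raw = pd.Series(["01-08-2024 10:00", "02-08-2024 09:18", "05-08-2024 09:30"])

# Assumption: the source is DD-MM-YYYY HH:MM (day first).
# An explicit format removes the month/day ambiguity.
day_first = pd.to_datetime(raw, format="%d-%m-%Y %H:%M", errors="coerce")

print(day_first.dt.month.tolist())  # [8, 8, 8]
print(day_first.dt.day.tolist())    # [1, 2, 5]
```

If the day-first reading is correct, the same `format` argument could be passed for `created_at`, `last_updated`, and `movement_date`.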


In [5]:
suppliers['name']= suppliers['name'].fillna('Unknown')
suppliers['address'] = suppliers['address'].fillna('Unknown')
suppliers['phone'] = suppliers['phone'].fillna('Unknown')
suppliers['email'] = suppliers['email'].fillna('Unknown')
suppliers['gst_number'] = suppliers['gst_number'].fillna('Unknown')
suppliers['rating'] = pd.to_numeric(suppliers['rating'], errors='coerce').fillna(0).astype(int)
suppliers['supplier_id'] = pd.to_numeric(suppliers['supplier_id'], errors='coerce')
suppliers.isnull().sum()

column,null_count
supplier_id,0
name,0
address,0
email,0
phone,0
gst_number,0
rating,0


In [6]:
warehouses['warehouse_id'] = pd.to_numeric(warehouses['warehouse_id'], errors='coerce')
warehouses['location'] = warehouses['location'].fillna('Unknown')
warehouses['manager_name'] = warehouses['manager_name'].fillna('Unknown')
warehouses['contact_number']= warehouses['contact_number'].fillna('Unknown')
warehouses['capacity'] = pd.to_numeric(warehouses['capacity'], errors='coerce').fillna(0).astype(int)
warehouses.isnull().sum()

column,null_count
warehouse_id,0
location,0
capacity,0
manager_name,0
contact_number,0


In [7]:
stock_movements['movement_id'] = pd.to_numeric(stock_movements['movement_id'], errors='coerce')
stock_movements['product_id'] = pd.to_numeric(stock_movements['product_id'], errors='coerce')
stock_movements['warehouse_id'] = pd.to_numeric(stock_movements['warehouse_id'], errors='coerce')
stock_movements['quantity'] = pd.to_numeric(stock_movements['quantity'], errors='coerce').fillna(0).astype(int)
stock_movements['movement_type'] = stock_movements['movement_type'].fillna('Unknown')
stock_movements['movement_date'] = pd.to_datetime(stock_movements['movement_date'], errors='coerce')
stock_movements['reason_code']= stock_movements['reason_code'].fillna('Unknown')
stock_movements['recorded_by'] = stock_movements['recorded_by'].fillna('Unknown')
stock_movements['verified'] = stock_movements['verified'].fillna(False)
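The cell above coerces `quantity` to a number and fills failures with 0, which hides bad rows rather than reporting them. A minimal sketch of a stricter validation step is shown below. The sample data is made up, and the rule "quantity must be a positive number" is an assumption, not something the source specifies.

```python
import pandas as pd

# Hypothetical sample with one good row and three bad ones
raw = pd.DataFrame({
    "movement_id": [1, 2, 3, 4],
    "quantity": ["10", "-4", "abc", None],
})

# Non-numeric and missing values become NaN
qty = pd.to_numeric(raw["quantity"], errors="coerce")

# Assumed rule: a valid quantity is numeric and strictly positive
raw["quantity_valid"] = qty.notna() & (qty > 0)

# Keep rejected rows for review instead of silently zeroing them
rejected = raw.loc[~raw["quantity_valid"]]

# Valid rows get an integer quantity
clean = raw.loc[raw["quantity_valid"]].assign(
    quantity=qty[raw["quantity_valid"]].astype(int)
)
```

Keeping `rejected` as its own frame makes it possible to report how many movements were dropped and why.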


In [8]:
valid_product_ids = set(products['product_id'])
valid_warehouse_ids = set(warehouses['warehouse_id'])
valid_supplier_ids = set(suppliers['supplier_id'])

invalid_suppliers = products.loc[~products['supplier_id'].isin(valid_supplier_ids)]
invalid_movements = stock_movements.loc[(~stock_movements['product_id'].isin(valid_product_ids)) | (~stock_movements['warehouse_id'].isin(valid_warehouse_ids))]

stock_movements = stock_movements.loc[(stock_movements['product_id'].isin(valid_product_ids)) & (stock_movements['warehouse_id'].isin(valid_warehouse_ids))].copy()
stock_movements.isnull().sum()

column,null_count
movement_id,0
product_id,0
warehouse_id,0
quantity,0
movement_type,0
movement_date,0
reason_code,0
recorded_by,0
verified,0
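The `isin` filters above can also be written as an anti-join, which keeps the orphaned rows together with the reason they failed. The sketch below uses `merge` with `indicator=True` on made-up sample frames; the `sample_` names are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
import pandas as pd

# Hypothetical sample data: movement 102 points at a product that does not exist
sample_products = pd.DataFrame({"product_id": [1, 2, 3]})
sample_movements = pd.DataFrame({
    "movement_id": [101, 102, 103],
    "product_id": [1, 9, 3],
})

# Left join with an indicator column: 'left_only' marks movements
# whose product_id has no match in the products table
checked = sample_movements.merge(
    sample_products[["product_id"]].drop_duplicates(),
    on="product_id",
    how="left",
    indicator=True,
)

orphans = checked.loc[checked["_merge"] == "left_only"].drop(columns="_merge")
valid = checked.loc[checked["_merge"] == "both"].drop(columns="_merge")
```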


**Calculate current stock levels using numpy**

In [9]:
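# +1 for inbound movements; every other movement_type (including 'Unknown') counts as outbound
stock_movements['direction'] = np.where(stock_movements['movement_type'] == 'IN', 1, -1)

# Signed quantity: positive adds to stock, negative removes it
stock_movements['stock_effect'] = stock_movements['quantity'] * stock_movements['direction']

# Net stock per product, summed across all warehouses
stock_summary = stock_movements.groupby('product_id')['stock_effect'].sum().reset_index()
stock_summary = stock_summary.rename(columns={'stock_effect': 'current_stock'})

# Left join so products without any movement are kept, with a stock of 0
products_stock = pd.merge(products, stock_summary, on='product_id', how='left')
products_stock['current_stock'] = products_stock['current_stock'].fillna(0).astype(int)
products_stock.head()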

index,product_id,name,description,category,unit_price,reorder_level,status,supplier_id,created_at,last_updated,current_stock
0,1,LED TV,42 inch Smart TV,Electronics,35000,10,active,1000.0,2024-01-08 10:00:00,2024-05-08 09:30:00,25
1,2,Organic Green Tea,Imported organic green tea,Grocery,1200,25,active,1001.0,2024-01-08 11:15:00,2024-05-08 11:15:00,42
2,3,Wooden Bookshelf,5-tier sturdy bookshelf,Furniture,8000,5,active,1002.0,2024-02-08 09:18:00,2024-05-08 14:23:00,15
3,4,Men's Denim Jacket,Blue denim jacket,Clothing,2500,6,active,1000.0,2024-02-08 10:15:00,2024-05-08 13:19:00,6
4,5,Kindle E-reader,6 inch display device,Electronics,8500,20,active,1003.0,2024-02-08 11:25:00,2024-05-08 12:01:00,30
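The `np.where` call treats every movement that is not `'IN'` as outbound, so a row whose type was filled with `'Unknown'` reduces stock. A possible alternative, sketched below on made-up data, maps the types explicitly with `np.select` and does the per-product sum in NumPy with `np.unique` and `np.bincount`. The `'OUT'` label and the choice to ignore unknown types are assumptions; the source only shows the comparison against `'IN'`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample movements
sample = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 3],
    "movement_type": ["IN", "OUT", "IN", "Unknown", "IN"],
    "quantity": [30, 5, 50, 8, 15],
})

# Assumption: 'IN' adds stock, 'OUT' removes it, anything else is ignored (0)
direction = np.select(
    [sample["movement_type"] == "IN", sample["movement_type"] == "OUT"],
    [1, -1],
    default=0,
)
effect = sample["quantity"].to_numpy() * direction

# Group-by-sum in NumPy: map each product_id to a group index,
# then add up the signed quantities per group
ids, inverse = np.unique(sample["product_id"].to_numpy(), return_inverse=True)
totals = np.bincount(inverse, weights=effect).astype(int)

stock = pd.DataFrame({"product_id": ids, "current_stock": totals})
print(stock["current_stock"].tolist())  # [25, 50, 15]
```

Whether unknown movement types should be ignored, rejected, or treated as outbound is a business decision; the sketch only makes that choice explicit.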


**Use pandas to flag items below reorder threshold**

In [10]:
low_stock_df = products_stock[products_stock['current_stock'] < products_stock['reorder_level']]
print(f"\nProducts below reorder level (total {len(low_stock_df)}):")
low_stock_df


Products below reorder level (total 24):


index,product_id,name,description,category,unit_price,reorder_level,status,supplier_id,created_at,last_updated,current_stock
7,8,Organic Almonds,Raw shelled almonds,Grocery,1200,12,active,1002.0,2024-03-08 13:20:00,2024-05-08 15:22:00,0
8,9,Wireless Earphones,Noise-cancelling earbuds,Electronics,5500,18,active,1004.0,2024-03-08 14:00:00,2024-05-08 10:05:00,16
9,10,Children’s Story Book,Illustrated fairy tales,Books,800,7,active,1002.0,2024-03-08 15:45:00,2024-05-08 16:00:00,6
12,13,Wooden Dining Table,6-seater solid wood,Furniture,25000,4,active,1002.0,2024-04-08 11:00:00,2024-05-08 18:00:00,3
15,16,Organic Honey,Pure wild honey,Grocery,1900,25,active,1003.0,2024-04-08 13:30:00,2024-06-08 10:00:00,24
16,17,Fiction Novel,Best-selling thriller,Books,450,8,active,1002.0,2024-04-08 14:00:00,2024-06-08 10:30:00,5
18,19,Desk Lamp,LED adjustable lamp,Electronics,1350,30,active,1002.0,2024-04-08 15:00:00,2024-06-08 11:30:00,24
20,21,Organic Coffee Beans,Dark roast premium,Grocery,1500,3,active,1002.0,2024-04-08 16:00:00,2024-06-08 12:30:00,2
21,22,Children’s Puzzle Book,Brain teaser games,Books,650,30,active,1004.0,2024-04-08 16:30:00,2024-06-08 13:00:00,20
23,24,Office Desk,Wooden top with drawers,Furniture,9500,20,active,1004.0,2024-04-08 17:30:00,2024-06-08 14:00:00,12
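The deliverable asks for a warehouse-level stock status, while the summary above is per product across all warehouses. A minimal sketch of the same group-and-filter logic at warehouse level follows, on made-up data. Comparing each warehouse's stock against the product-level `reorder_level` is an assumption, and the `stock_status` labels are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
sample_products = pd.DataFrame({"product_id": [1, 2], "reorder_level": [10, 25]})
sample_movements = pd.DataFrame({
    "warehouse_id": [1, 1, 2, 2, 2],
    "product_id": [1, 1, 1, 2, 2],
    "stock_effect": [30, -5, 8, 40, -20],
})

# Net stock per (warehouse, product) pair, with the reorder level attached
warehouse_stock = (
    sample_movements
    .groupby(["warehouse_id", "product_id"], as_index=False)["stock_effect"]
    .sum()
    .rename(columns={"stock_effect": "current_stock"})
    .merge(sample_products, on="product_id", how="left")
)

# Assumption: the product's reorder_level applies to each warehouse separately
warehouse_stock["stock_status"] = np.where(
    warehouse_stock["current_stock"] < warehouse_stock["reorder_level"],
    "REORDER",
    "OK",
)

print(warehouse_stock["stock_status"].tolist())  # ['OK', 'REORDER', 'REORDER']
```

Applied to the notebook's `stock_movements` and `products` frames, the resulting table could be written out as the warehouse-level output file.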


**Save to CSV files**

In [11]:
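# index=False keeps the pandas row index out of the files (otherwise it is written as an unnamed first column)
products.to_csv('products_Cleaned.csv', index=False)
suppliers.to_csv('suppliers_Cleaned.csv', index=False)
warehouses.to_csv('warehouses_Cleaned.csv', index=False)
stock_movements.to_csv('stock_movements_Cleaned.csv', index=False)
low_stock_df.to_csv('low_stock_df.csv', index=False)
products_stock.to_csv('products_stock.csv', index=False)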
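`to_csv` writes the row index as an unnamed first column unless `index=False` is passed, and that column comes back as `Unnamed: 0` when the file is read again. The short round trip below illustrates the difference on a made-up two-row frame, using in-memory buffers instead of files.

```python
import io

import pandas as pd

df = pd.DataFrame({"product_id": [1, 2], "current_stock": [25, 42]})

# Default: the index is written as an extra, unnamed column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False: only the real columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'product_id', 'current_stock']
print(cols_without)  # ['product_id', 'current_stock']
```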

**Download the cleaned and processed CSV files**

In [12]:
files.download('products_Cleaned.csv')
files.download('suppliers_Cleaned.csv')
files.download('warehouses_Cleaned.csv')
files.download('stock_movements_Cleaned.csv')
files.download('low_stock_df.csv')
files.download('products_stock.csv')
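Six separate `files.download` calls trigger six browser downloads. One option is to bundle the outputs into a single archive first. The sketch below uses only the standard library; `bundle_outputs` and the archive name `stock_outputs.zip` are hypothetical, and the Colab call is left as a comment so the block also runs outside Colab.

```python
import zipfile
from pathlib import Path


def bundle_outputs(paths, archive_name="stock_outputs.zip"):
    """Zip the given files, skipping any that do not exist.

    Hypothetical helper. Returns the list of file names added to the archive.
    """
    added = []
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, paths):
            if p.exists():
                # Store under the bare file name, without directories
                zf.write(p, arcname=p.name)
                added.append(p.name)
    return added


# Example usage in the notebook (not executed here):
# bundle_outputs(["products_Cleaned.csv", "low_stock_df.csv", "products_stock.csv"])
# files.download("stock_outputs.zip")  # google.colab.files, as imported in cell 1
```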
