# PerishAI: Smart Shelf-Life Aware Routing for Perishable Goods

This notebook builds, step by step, an ML-based pipeline that orders deliveries of perishable goods by predicted remaining shelf life, with the goal of reducing food waste.

## 1. Import Required Libraries

Import the libraries used throughout the notebook: numpy and pandas for data handling, and random for sampling categorical values.

In [20]:
# Import required libraries
import numpy as np
import pandas as pd
import random

## 2. Step 1: Define the Problem or Task

We aim to optimize delivery routes for perishable goods by predicting remaining shelf life with a machine-learning model and prioritizing deliveries accordingly.

**Sample Input Fields (with units):**
- product_type (e.g., apple, milk)
- time_in_transit (hours)
- temperature_exposure (°C)
- humidity (%)
- storage_conditions (e.g., cold, ambient)
- shelf_life_left (days)

**Expected Output:**
- A dataset with realistic values for the above fields (with units in column names), to be used for model training and routing.

## 3. Step 2: Plan the Implementation Steps

1. Generate a list of possible product types and storage conditions.
2. Randomly generate values for each field for a set number of deliveries.
3. Calculate or assign shelf_life_left based on the generated parameters.
4. Save the generated dataset as a CSV file for later use.
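
Because step 2 relies on random sampling, fixing a seed makes the generated dataset reproducible across runs. The following is a minimal standard-library sketch of that idea, not the notebook's generator: the helper names `sample_delivery` and `sample_deliveries`, the seed value, the shortened product and storage lists, and the per-storage temperature ranges in `TEMP_RANGES_C` are all assumptions chosen for illustration.

```python
import random

# Hypothetical temperature ranges (°C) per storage condition; illustrative only.
TEMP_RANGES_C = {
    "frozen": (-20.0, -10.0),
    "cold": (0.0, 8.0),
}
DEFAULT_RANGE_C = (15.0, 35.0)  # used for any condition not listed above


def sample_delivery(rng, product_types, storage_conditions):
    """Draw one synthetic delivery record from a seeded random generator."""
    storage = rng.choice(storage_conditions)
    low, high = TEMP_RANGES_C.get(storage, DEFAULT_RANGE_C)
    return {
        "product_type": rng.choice(product_types),
        "storage_conditions": storage,
        "time_in_transit (hours)": round(rng.uniform(1, 48), 1),
        "temperature_exposure (°C)": round(rng.uniform(low, high), 1),
        "humidity (%)": round(rng.uniform(40, 90), 1),
    }


def sample_deliveries(n, seed=42):
    """Return n records; the same seed always yields the same records."""
    rng = random.Random(seed)
    products = ["apple", "milk", "lettuce"]
    storages = ["cold", "ambient", "frozen"]
    return [sample_delivery(rng, products, storages) for _ in range(n)]


rows = sample_deliveries(5)
print(rows[0])
```

Using a dedicated `random.Random` instance keeps the seed local to the generator, so other code that calls the global `random` functions does not change the dataset.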

## 4. Step 3: Implement the First Step

Generate lists of product types and storage conditions to be used in the dataset.

In [21]:
# Define product types and storage conditions
product_types = ['apple', 'banana', 'milk', 'cheese', 'lettuce', 'chicken', 'yogurt']
storage_conditions = [
    'cold', 'ambient', 'frozen', 'vacuum', 'controlled atmosphere', 'dry', 'humidified'
]
print('Product types:', product_types)
print('Storage conditions:', storage_conditions)

Product types: ['apple', 'banana', 'milk', 'cheese', 'lettuce', 'chicken', 'yogurt']
Storage conditions: ['cold', 'ambient', 'frozen', 'vacuum', 'controlled atmosphere', 'dry', 'humidified']


## 5. Step 4: Test and Validate the First Step

Check that the lists of product types and storage conditions are correctly defined.

In [22]:
# Test: Print the lists
display(product_types)
display(storage_conditions)

['apple', 'banana', 'milk', 'cheese', 'lettuce', 'chicken', 'yogurt']

['cold',
 'ambient',
 'frozen',
 'vacuum',
 'controlled atmosphere',
 'dry',
 'humidified']

## 6. Step 5: Implement the Next Step

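Generate a dummy dataset of deliveries with random values for each field. shelf_life_left is derived with a simple linear rule: a per-product base shelf life minus penalties for transit time, temperature above 5 °C, and humidity relative to a 50% reference.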

In [32]:
# Generate dummy dataset with units in column names
num_samples = 30

# Nominal shelf life (days) for each product before any penalties
base_shelf_life_days = {'apple': 30, 'banana': 7, 'milk': 10, 'cheese': 60, 'lettuce': 5, 'chicken': 6, 'yogurt': 14}

data = []
for _ in range(num_samples):
    product = random.choice(product_types)
    storage = random.choice(storage_conditions)
    time_in_transit = np.round(np.random.uniform(1, 48), 1)  # hours
    # Only 'cold' storage gets its own range; every other condition shares 15-35 °C
    temperature = np.round(np.random.uniform(0, 25) if storage == 'cold' else np.random.uniform(15, 35), 1)  # °C
    humidity = np.round(np.random.uniform(40, 90), 1)  # %
    # Simple logic: shelf life left decreases with higher temp, time, and humidity
    base_shelf_life = base_shelf_life_days[product]
    shelf_life_left = base_shelf_life - (0.1 * time_in_transit) - (0.2 * max(0, temperature - 5)) - (0.05 * (humidity - 50))
    shelf_life_left = max(0, np.round(shelf_life_left, 1))  # clamp at zero days
    data.append({
        'product_type': product,
        'time_in_transit (hours)': time_in_transit,
        'temperature_exposure (°C)': temperature,
        'humidity (%)': humidity,
        'storage_conditions': storage,
        'shelf_life_left (days)': shelf_life_left
    })
df = pd.DataFrame(data)
df.to_csv('perishai_dummy_data.csv', index=False)
df.head()

Unnamed: 0,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,shelf_life_left (days)
0,apple,13.3,28.9,80.8,frozen,22.4
1,lettuce,30.5,8.1,58.8,cold,0.9
2,apple,11.5,15.5,70.1,ambient,25.7
3,lettuce,19.5,7.4,77.7,cold,1.2
4,cheese,39.6,33.2,53.1,ambient,50.2
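
To make the labeling rule easier to check by hand, the sketch below restates it as a standalone function and applies it to row 1 of the table above. The function name `shelf_life_left` and the constant `BASE_SHELF_LIFE_DAYS` are introduced here for illustration; the coefficients and base values are copied from the generation cell.

```python
# Base shelf life in days per product, copied from the generation cell.
BASE_SHELF_LIFE_DAYS = {
    "apple": 30, "banana": 7, "milk": 10, "cheese": 60,
    "lettuce": 5, "chicken": 6, "yogurt": 14,
}


def shelf_life_left(product, hours_in_transit, temperature_c, humidity_pct):
    """Apply the notebook's linear rule and return days left (never below 0)."""
    days = (
        BASE_SHELF_LIFE_DAYS[product]
        - 0.1 * hours_in_transit            # transit-time penalty
        - 0.2 * max(0, temperature_c - 5)   # penalty only above 5 °C
        - 0.05 * (humidity_pct - 50)        # penalty above 50 %, bonus below
    )
    return max(0, round(days, 1))


# Row 1 of the table above: lettuce, 30.5 h, 8.1 °C, 58.8 % humidity.
# 5 - 3.05 - 0.62 - 0.44 = 0.89, which rounds to 0.9 days.
print(shelf_life_left("lettuce", 30.5, 8.1, 58.8))
```

Note that the humidity term is not clamped, so humidity below 50% adds shelf life rather than leaving it unchanged.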


## 7. Step 6: Test and Validate the Next Step

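Reload the saved dataset, fit the encoder, the scaler, and an XGBoost regressor on it, save the fitted objects to disk, and display the first few rows to confirm the data looks correct.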

In [33]:
# Prepare data for model training with new column names
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBRegressor
import joblib
import pandas as pd
import numpy as np

# Use new column names with units
df = pd.read_csv('perishai_dummy_data.csv')
categorical_cols = ['product_type', 'storage_conditions']
continuous_cols = ['time_in_transit (hours)', 'temperature_exposure (°C)', 'humidity (%)']
target_col = 'shelf_life_left (days)'

X_cat = df[categorical_cols]
X_cont = df[continuous_cols]
y = df[target_col]

# handle_unknown='ignore' encodes categories unseen during fit as all zeros;
# with drop='first' that is the same encoding as the dropped first category
encoder = OneHotEncoder(sparse_output=False, drop='first', handle_unknown='ignore')
X_cat_encoded = encoder.fit_transform(X_cat)
scaler = StandardScaler()
X_cont_scaled = scaler.fit_transform(X_cont)

X_prepared = np.hstack([X_cat_encoded, X_cont_scaled])

X_train, X_test, y_train, y_test = train_test_split(X_prepared, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save model and encoders
joblib.dump(model, 'shelf_life_model.pkl')
joblib.dump(encoder, 'encoder.pkl')
joblib.dump(scaler, 'scaler.pkl')

# Display the first few rows of the dataset
df.head()

Unnamed: 0,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,shelf_life_left (days)
0,apple,13.3,28.9,80.8,frozen,22.4
1,lettuce,30.5,8.1,58.8,cold,0.9
2,apple,11.5,15.5,70.1,ambient,25.7
3,lettuce,19.5,7.4,77.7,cold,1.2
4,cheese,39.6,33.2,53.1,ambient,50.2


## 8. Step 7: Continue Step-by-Step Implementation

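Simulate new deliveries, predict their remaining shelf life with the saved model, and save both the training dataset and the predictions as CSV files for use in later steps.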

In [43]:
import pandas as pd
import numpy as np
import random
import joblib

# Define product types and storage conditions (MATCH TRAINING DATA)
product_types = ['apple', 'banana', 'milk', 'cheese', 'lettuce', 'chicken', 'yogurt']
storage_conditions = [
    'cold', 'ambient', 'frozen', 'vacuum', 'controlled atmosphere', 'dry', 'humidified'
]

# Simulate new deliveries and predict shelf life with new column names
sim_data = []
for i in range(20):
    product = random.choice(product_types)
    storage = random.choice(storage_conditions)
    time_in_transit = np.round(np.random.uniform(1, 48), 1)
    temperature = np.round(np.random.uniform(0, 25) if storage == 'cold' else np.random.uniform(15, 35), 1)
    humidity = np.round(np.random.uniform(40, 90), 1)
    sim_data.append({
        'delivery_id': f'DELV{i+1:03d}',
        'product_type': product,
        'time_in_transit (hours)': time_in_transit,
        'temperature_exposure (°C)': temperature,
        'humidity (%)': humidity,
        'storage_conditions': storage
    })
sim_df = pd.DataFrame(sim_data)

# Load encoders and model
encoder = joblib.load('encoder.pkl')
scaler = joblib.load('scaler.pkl')
model = joblib.load('shelf_life_model.pkl')

sim_X_cat = sim_df[['product_type', 'storage_conditions']]
sim_X_cont = sim_df[['time_in_transit (hours)', 'temperature_exposure (°C)', 'humidity (%)']]
sim_X_encoded = encoder.transform(sim_X_cat)
sim_X_scaled = scaler.transform(sim_X_cont)
sim_X_prepared = np.hstack([sim_X_encoded, sim_X_scaled])
sim_df['predicted_shelf_life (days)'] = model.predict(sim_X_prepared)

# Save the dataset as a CSV file
df.to_csv('perishai_dummy_data.csv', index=False)
print('Dataset saved as perishai_dummy_data.csv')

sim_df.to_csv('simulated_deliveries_with_predictions.csv', index=False)
sim_df.head()

Dataset saved as perishai_dummy_data.csv




Unnamed: 0,delivery_id,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,predicted_shelf_life (days)
0,DELV001,yogurt,22.6,24.4,43.3,controlled atmosphere,8.005722
1,DELV002,banana,30.5,23.9,59.5,dry,0.360162
2,DELV003,milk,42.7,33.8,61.5,vacuum,0.00044
3,DELV004,lettuce,4.6,15.3,64.7,humidified,25.468138
4,DELV005,lettuce,31.2,25.0,60.1,frozen,-0.135442


## 9. Step 8: Final Testing and Validation

Reload the saved CSV file to confirm it was written, then sort the simulated deliveries by predicted shelf life to produce the route plan.

In [26]:
# Load the saved CSV file to verify that it exists and parses
check_df = pd.read_csv('perishai_dummy_data.csv')
check_df.head()  # not rendered: a cell displays only its last expression

# Optimize delivery route based on predicted shelf life (ascending order)
route_df = sim_df.sort_values(by='predicted_shelf_life (days)')
route_df.reset_index(drop=True, inplace=True)
route_df.to_csv('route_plan.csv', index=False)
route_df.head()

Unnamed: 0,delivery_id,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,predicted_shelf_life (days)
0,DELV003,chicken,18.6,26.1,75.6,frozen,-0.215491
1,DELV013,banana,10.6,23.6,82.6,ambient,0.043114
2,DELV020,banana,12.3,30.5,82.6,frozen,0.043638
3,DELV008,apple,47.9,21.4,87.5,ambient,1.650995
4,DELV005,chicken,10.7,30.1,48.2,frozen,2.533785


# Step 2: Build and Train the Shelf-Life Prediction Model

In this section, we will use the generated dataset to train a machine learning model that predicts the remaining shelf life of perishable goods based on features such as product type, time in transit, temperature exposure, humidity, and storage conditions.

## 1. Load the Dataset and Required Libraries

We will load the previously generated dataset and import additional libraries needed for machine learning, such as scikit-learn and xgboost.

In [30]:
# Import additional libraries for ML
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
import joblib

# Load the dataset
df = pd.read_csv('perishai_dummy_data.csv')
df.head()

Unnamed: 0,product_type,time_in_transit,temperature_exposure,humidity,storage_conditions,shelf_life_left
0,yogurt,23.5,16.3,69.9,frozen,1.0
1,yogurt,13.2,20.1,83.0,ambient,1.0
2,apple,18.8,32.4,62.3,ambient,10.2
3,chicken,35.3,16.0,80.3,cold,1.0
4,cheese,45.1,27.6,42.1,ambient,30.7


## 2. Preprocess the Data

We will one-hot encode categorical variables and normalize continuous features to prepare the data for model training.

In [34]:
# Separate features and target
# Use new column names with units
categorical_cols = ['product_type', 'storage_conditions']
continuous_cols = ['time_in_transit (hours)', 'temperature_exposure (°C)', 'humidity (%)']
target_col = 'shelf_life_left (days)'

y = df[target_col]
X = df[categorical_cols + continuous_cols]

# One-hot encode categorical variables
encoder = OneHotEncoder(sparse_output=False, drop='first', handle_unknown='ignore')
X_encoded = encoder.fit_transform(X[categorical_cols])
encoded_cols = encoder.get_feature_names_out(categorical_cols)

# Normalize continuous features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X[continuous_cols])

# Combine all features (numpy is already imported as np in the first cell)
X_prepared = np.hstack([X_encoded, X_scaled])

# For reference, keep feature names
feature_names = list(encoded_cols) + continuous_cols

## 3. Train-Test Split

Split the data into training and testing sets for model evaluation.

In [35]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_prepared, y, test_size=0.2, random_state=42)
print('Train shape:', X_train.shape)
print('Test shape:', X_test.shape)

Train shape: (24, 11)
Test shape: (6, 11)


## 4. Train the XGBoost Regression Model

We will train an XGBoost regressor to predict the remaining shelf life.

In [36]:
# Train the model
model = XGBRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict on test set
predictions = model.predict(X_test)

# Evaluate performance
mae = mean_absolute_error(y_test, predictions)
print(f"Mean Absolute Error (MAE): {mae:.2f}")

Mean Absolute Error (MAE): 9.47
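
MAE is the average absolute difference between true and predicted values, so it is expressed in days here. An MAE figure is easier to interpret next to a naive baseline, such as always predicting the mean of the training targets. The sketch below shows both computations with the standard library; the function name and all numbers are illustrative assumptions, not values from this notebook's dataset.

```python
from statistics import mean


def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Illustrative values only, not taken from the notebook's dataset.
y_train_demo = [22.4, 0.9, 25.7, 1.2]
y_test_demo = [50.0, 2.0]
model_preds_demo = [40.0, 3.0]

# Baseline: always predict the mean of the training targets.
baseline_value = mean(y_train_demo)
baseline_preds = [baseline_value] * len(y_test_demo)

model_mae = mean_absolute_error_manual(y_test_demo, model_preds_demo)
baseline_mae = mean_absolute_error_manual(y_test_demo, baseline_preds)
print("Model MAE:   ", model_mae)
print("Baseline MAE:", baseline_mae)
```

If the model's MAE is not clearly below the baseline's, the model has learned little beyond the average shelf life, which is plausible with only 24 training rows.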


## 5. Save the Trained Model

We will save the trained model to disk for use in later steps of the project.

In [37]:
# Save the trained model
joblib.dump(model, 'shelf_life_model.pkl')
print('Model saved as shelf_life_model.pkl')

Model saved as shelf_life_model.pkl
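
Saving the model separately from its encoder and scaler means several files must stay in sync. One alternative, sketched here under assumptions, is to wrap preprocessing and the regressor in a single scikit-learn `Pipeline` and save that one object. The tiny `demo_df`, the file name `shelf_life_pipeline.pkl`, and the use of `GradientBoostingRegressor` as a stand-in for `XGBRegressor` are illustrative choices, and the sketch omits `drop='first'` for simplicity.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_cols = ["product_type", "storage_conditions"]
continuous_cols = ["time_in_transit (hours)", "temperature_exposure (°C)", "humidity (%)"]
feature_cols = categorical_cols + continuous_cols

# Tiny illustrative frame; in the notebook this would be the generated dataset.
demo_df = pd.DataFrame({
    "product_type": ["apple", "milk", "lettuce", "apple"],
    "storage_conditions": ["cold", "cold", "ambient", "ambient"],
    "time_in_transit (hours)": [10.0, 20.0, 30.0, 40.0],
    "temperature_exposure (°C)": [4.0, 6.0, 20.0, 25.0],
    "humidity (%)": [50.0, 60.0, 70.0, 80.0],
    "shelf_life_left (days)": [29.0, 7.5, 0.5, 21.0],
})

# sparse_threshold=0 forces a dense array regardless of the encoder's output.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("num", StandardScaler(), continuous_cols),
    ],
    sparse_threshold=0,
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    # Stand-in regressor; an XGBRegressor could be placed here instead.
    ("model", GradientBoostingRegressor(random_state=42)),
])

pipeline.fit(demo_df[feature_cols], demo_df["shelf_life_left (days)"])
joblib.dump(pipeline, "shelf_life_pipeline.pkl")  # hypothetical file name

# Later: one load call restores preprocessing and model together.
restored = joblib.load("shelf_life_pipeline.pkl")
demo_predictions = restored.predict(demo_df[feature_cols])
print(demo_predictions)
```

With this layout, prediction code passes the raw DataFrame columns to `predict` and never handles the encoder or scaler directly.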


# Step 3: Simulate Delivery Data and Predict Shelf Life

In this section, we will simulate new delivery data and use the trained model to predict the remaining shelf life for each delivery. These predictions are the input to the route optimization in the next step.

## 1. Simulate New Delivery Data

We will generate a new set of mock deliveries with randomized parameters, similar to the original dataset, but without the shelf_life_left column.

In [41]:
# Simulate new delivery data
num_deliveries = 20
sim_data = []
for i in range(num_deliveries):
    product = random.choice(product_types)
    storage = random.choice(storage_conditions)
    time_in_transit = np.round(np.random.uniform(1, 48), 1)
    temperature = np.round(np.random.uniform(0, 25) if storage == 'cold' else np.random.uniform(15, 35), 1)
    humidity = np.round(np.random.uniform(40, 90), 1)
    sim_data.append({
        'delivery_id': f'DELV{i+1:03d}',
        'product_type': product,
        'time_in_transit (hours)': time_in_transit,
        'temperature_exposure (°C)': temperature,
        'humidity (%)': humidity,
        'storage_conditions': storage
    })
sim_df = pd.DataFrame(sim_data)
sim_df.head()

Unnamed: 0,delivery_id,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions
0,DELV001,lettuce,7.5,1.0,54.3,cold
1,DELV002,lettuce,20.8,27.8,65.2,frozen
2,DELV003,apple,5.9,23.2,55.5,cold
3,DELV004,banana,26.7,7.0,81.8,cold
4,DELV005,lettuce,45.0,1.3,41.6,cold


## 2. Preprocess the Simulated Data

Apply the same preprocessing steps (encoding and scaling) to the new simulated data as were used for the training data.

In [44]:
# Preprocess simulated data
sim_X_encoded = encoder.transform(sim_df[['product_type', 'storage_conditions']])
sim_X_scaled = scaler.transform(sim_df[['time_in_transit (hours)', 'temperature_exposure (°C)', 'humidity (%)']])
sim_X_prepared = np.hstack([sim_X_encoded, sim_X_scaled])
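
The cell above calls `transform`, not `fit_transform`, so the simulated data is scaled with the mean and standard deviation learned from the training data. The standard-library sketch below mimics that behavior for one feature; the function names and temperature values are illustrative assumptions. It uses the population standard deviation, which is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Learn mean and population standard deviation from training values."""
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Scale values with previously learned statistics."""
    return [(v - mean) / std for v in values]


train_temps_c = [10.0, 20.0, 30.0]   # illustrative training temperatures
new_temps_c = [20.0, 40.0]           # illustrative simulated deliveries

mean_c, std_c = fit_standardizer(train_temps_c)
scaled_new = standardize(new_temps_c, mean_c, std_c)
print(scaled_new)
```

Refitting the scaler on the simulated data would shift these values and feed the model inputs on a different scale than it was trained on.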



## 3. Predict Remaining Shelf Life for Each Delivery

Use the trained model to predict the remaining shelf life for each simulated delivery.

In [45]:
# Predict shelf life for each simulated delivery
sim_df['predicted_shelf_life'] = model.predict(sim_X_prepared)
sim_df.head()

Unnamed: 0,delivery_id,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,predicted_shelf_life (days),predicted_shelf_life
0,DELV001,yogurt,22.6,24.4,43.3,controlled atmosphere,8.005722,8.005722
1,DELV002,banana,30.5,23.9,59.5,dry,0.360162,0.360162
2,DELV003,milk,42.7,33.8,61.5,vacuum,0.00044,0.00044
3,DELV004,lettuce,4.6,15.3,64.7,humidified,25.468138,25.468138
4,DELV005,lettuce,31.2,25.0,60.1,frozen,-0.135442,-0.135442
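
The regressor is unconstrained, so it can return slightly negative values such as the one for DELV005, even though shelf life cannot be below zero. A minimal post-processing sketch, assuming a zero floor is acceptable for routing, is shown below on an illustrative frame. The values are rounded examples, not a rerun of the model.

```python
import pandas as pd

# Illustrative predictions, including a slightly negative one.
demo = pd.DataFrame({
    "delivery_id": ["DELV001", "DELV002", "DELV003"],
    "predicted_shelf_life (days)": [8.0, -0.135, 0.36],
})

# Shelf life cannot be negative, so floor the predictions at zero.
demo["predicted_shelf_life (days)"] = demo["predicted_shelf_life (days)"].clip(lower=0)
print(demo)
```

Clipping does not change the delivery order except that all expired items tie at zero.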


## 4. Save the Simulated Deliveries with Predictions

Save the simulated deliveries, including the predicted shelf life, as a CSV file for use in route optimization.

In [46]:
# Save the simulated deliveries with predictions
sim_df.to_csv('simulated_deliveries_with_predictions.csv', index=False)
print('Simulated deliveries with predictions saved as simulated_deliveries_with_predictions.csv')

Simulated deliveries with predictions saved as simulated_deliveries_with_predictions.csv


# Step 4: Rule-Based Route Optimizer

In this section, we will implement a simple rule-based optimizer that sorts deliveries by ascending predicted shelf life, ensuring products closest to expiry are delivered first.

## 1. Load Simulated Deliveries with Predictions

We will load the CSV file containing simulated deliveries and their predicted shelf life.

In [47]:
# Load the simulated deliveries with predictions
route_df = pd.read_csv('simulated_deliveries_with_predictions.csv')
route_df.head()

Unnamed: 0,delivery_id,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,predicted_shelf_life (days),predicted_shelf_life
0,DELV001,yogurt,22.6,24.4,43.3,controlled atmosphere,8.005722,8.005722
1,DELV002,banana,30.5,23.9,59.5,dry,0.360162,0.360162
2,DELV003,milk,42.7,33.8,61.5,vacuum,0.00044,0.00044
3,DELV004,lettuce,4.6,15.3,64.7,humidified,25.468138,25.468138
4,DELV005,lettuce,31.2,25.0,60.1,frozen,-0.135442,-0.135442


## 2. Sort Deliveries by Predicted Shelf Life

We will sort the deliveries so that those with the lowest predicted shelf life are delivered first.

In [48]:
# Sort deliveries by ascending predicted shelf life
route_df_sorted = route_df.sort_values(by='predicted_shelf_life')
route_df_sorted.reset_index(drop=True, inplace=True)
route_df_sorted.head()

Unnamed: 0,delivery_id,product_type,time_in_transit (hours),temperature_exposure (°C),humidity (%),storage_conditions,predicted_shelf_life (days),predicted_shelf_life
0,DELV005,lettuce,31.2,25.0,60.1,frozen,-0.135442,-0.135442
1,DELV003,milk,42.7,33.8,61.5,vacuum,0.00044,0.00044
2,DELV002,banana,30.5,23.9,59.5,dry,0.360162,0.360162
3,DELV006,apple,32.1,24.6,82.5,humidified,1.162857,1.162857
4,DELV016,apple,26.9,21.0,59.0,dry,2.805107,2.805107
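
The sorting rule can also be written as a small pure function, which makes it easy to test outside the notebook. The sketch below is an illustration under assumptions: the function name `plan_route`, the tie-break on `delivery_id`, and the added `stop` field are not part of the notebook, and the sample values are rounded from the table above.

```python
def plan_route(deliveries):
    """Order deliveries by predicted shelf life, shortest first.

    Ties are broken by delivery_id so the plan is deterministic.
    """
    ordered = sorted(
        deliveries,
        key=lambda d: (d["predicted_shelf_life"], d["delivery_id"]),
    )
    # Attach a 1-based stop number to each delivery in visiting order.
    return [{**d, "stop": i} for i, d in enumerate(ordered, start=1)]


demo_deliveries = [
    {"delivery_id": "DELV001", "predicted_shelf_life": 8.01},
    {"delivery_id": "DELV005", "predicted_shelf_life": -0.14},
    {"delivery_id": "DELV003", "predicted_shelf_life": 0.0},
]
route = plan_route(demo_deliveries)
for stop in route:
    print(stop["stop"], stop["delivery_id"], stop["predicted_shelf_life"])
```

This rule ignores distance and vehicle capacity, so it prioritizes freshness only.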


## 3. Save the Optimized Route Plan

Save the sorted delivery plan as a CSV file for use in the dashboard and reporting.

In [49]:
# Save the optimized route plan
route_df_sorted.to_csv('route_plan.csv', index=False)
print('Optimized route plan saved as route_plan.csv')

Optimized route plan saved as route_plan.csv


# Step 5: Streamlit Dashboard for Visualization

In this section, we outline a Streamlit dashboard that visualizes the optimized delivery plan, product freshness, and sustainability metrics.

## 1. Dashboard Overview

The Streamlit dashboard will display:
- The optimized delivery plan (route table)
- Product freshness tracker (visualization)
- Sustainability metrics (waste avoided, CO₂ saved)

The dashboard code belongs in a separate Python file (e.g., `dashboard.py`) that you run with Streamlit.

In [None]:
# To run the dashboard, use the following command in your terminal:
# streamlit run dashboard.py

# The dashboard will display the optimized delivery plan, product freshness tracker, and sustainability metrics.
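
One possible shape for `dashboard.py` is sketched below. It is a minimal illustration, not a finished dashboard: the `AT_RISK_DAYS` threshold, the `freshness_status` labels, and the choice of widgets are assumptions. The sketch reads `route_plan.csv` and expects the `predicted_shelf_life (days)` column. Waste-avoided and CO₂ figures need conversion factors that this notebook does not define, so the sketch reports only a count of flagged deliveries.

```python
"""dashboard.py: minimal Streamlit view of the route plan (illustrative sketch)."""
from pathlib import Path

SHELF_LIFE_COL = "predicted_shelf_life (days)"
AT_RISK_DAYS = 2.0  # hypothetical threshold for flagging deliveries


def freshness_status(days_left, at_risk_days=AT_RISK_DAYS):
    """Map predicted days left to a coarse freshness label."""
    if days_left <= 0:
        return "expired"
    if days_left <= at_risk_days:
        return "at risk"
    return "fresh"


def render(csv_path="route_plan.csv"):
    """Draw the route table, a freshness chart, and a simple at-risk count."""
    import pandas as pd       # imported lazily so the helper above has no dependencies
    import streamlit as st

    st.title("PerishAI: Delivery Plan")
    if not Path(csv_path).exists():
        st.error(f"{csv_path} not found. Run the route optimizer first.")
        return

    plan = pd.read_csv(csv_path)
    plan["freshness"] = plan[SHELF_LIFE_COL].apply(freshness_status)

    st.subheader("Optimized delivery plan")
    st.dataframe(plan)

    st.subheader("Product freshness tracker")
    st.bar_chart(plan.set_index("delivery_id")[SHELF_LIFE_COL])

    st.subheader("Sustainability metrics")
    # Waste and CO2 figures need domain-specific conversion factors that the
    # notebook does not define, so only a count of flagged deliveries is shown.
    flagged = int((plan["freshness"] != "fresh").sum())
    st.metric("Deliveries at risk or expired", flagged)


if __name__ == "__main__":
    try:
        render()
    except ModuleNotFoundError:
        print("Install pandas and streamlit to view the dashboard.")
```

Keeping `freshness_status` free of Streamlit calls lets it be unit-tested without starting the dashboard.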

# Step 6: Project Reporting and Presentation Templates

In this final (optional) step, we provide templates for documenting your project and preparing a presentation. Use these as a starting point to communicate your work, results, and impact to stakeholders or for academic/professional purposes.

<h3>Project Report Template</h3>

<b>1. Executive Summary</b>
- Brief overview of PerishAI and its objectives
- Key results and impact

<b>2. Introduction</b>
- Problem statement
- Importance of shelf-life aware routing for perishable goods
- Project goals

<b>3. Data Simulation</b>
- Description of simulated dataset
- Features and rationale
- Data generation process

<b>4. Shelf-Life Prediction Modeling</b>
- Model selection and justification
- Preprocessing steps
- Training and evaluation metrics
- Model performance summary

<b>5. Delivery Simulation & Route Optimization</b>
- Delivery data simulation
- Route optimization logic and rules
- Results and analysis

<b>6. Dashboard Visualization</b>
- Dashboard features and screenshots
- Insights from visualizations

<b>7. Sustainability & Business Impact</b>
- How PerishAI improves sustainability
- Potential business benefits

<b>8. Conclusion & Future Work</b>
- Summary of achievements
- Limitations and next steps

<b>9. References</b>
- Cited works, libraries, and resources

---

<h3>Presentation Template (Slides Outline)</h3>

1. Title Slide: Project name, team, date
2. Problem & Motivation
3. Solution Overview (PerishAI)
4. Data Simulation Process
5. Shelf-Life Prediction Model
6. Route Optimization Approach
7. Dashboard Demo (screenshots)
8. Results & Impact
9. Future Work
10. Q&A

Use these templates to create your final documentation and presentation for PerishAI.