# SafeRoute: Weather-Driven Delivery Feasibility

### Understanding the Business Problem

#### This dataset contains hourly weather records. The task is to predict whether a given hourly record makes same-day local ground delivery risky or likely to be delayed, so that logistics can re-route, delay, or add buffers.

##### A delivery is considered feasible when all of the following conditions are met:
##### 1. Temperature > 0 °C
##### 2. Wind Speed <= 40 km/h
##### 3. Visibility > 2 km
##### 4. No adverse weather condition such as Rain, Rain Showers, Fog, Thunderstorms, Snow Showers, Snow Pellets, Freezing Rain, Snow, Blowing Snow, Ice Pellets, or Heavy Rain Showers
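
The four conditions can be expressed as a single boolean mask. The sketch below is a minimal illustration on a few made-up rows; the helper name `favourable_mask` and the sample values are hypothetical, while the column names follow the dataset loaded later in this notebook.

```python
import re
import pandas as pd

# Weather descriptions treated as adverse (substring match), as listed above
RISKY_WEATHER = ["Rain", "Rain Showers", "Fog", "Thunderstorms", "Snow Showers",
                 "Snow Pellets", "Freezing Rain", "Snow", "Blowing Snow",
                 "Ice Pellets", "Heavy Rain Showers"]


def favourable_mask(frame: pd.DataFrame) -> pd.Series:
    """Return True for rows where all four delivery conditions hold."""
    pattern = "|".join(re.escape(w) for w in RISKY_WEATHER)
    weather_ok = ~frame["Weather"].astype(str).str.contains(pattern, regex=True)
    return (
        (frame["Temp_C"] > 0)
        & (frame["Wind Speed_km/h"] <= 40)
        & (frame["Visibility_km"] > 2)
        & weather_ok
    )


# Hypothetical sample rows: only the first one satisfies every condition
sample = pd.DataFrame({
    "Temp_C": [5.0, -3.0, 10.0, 8.0],
    "Wind Speed_km/h": [10, 10, 55, 15],
    "Visibility_km": [25.0, 25.0, 25.0, 1.0],
    "Weather": ["Clear", "Clear", "Cloudy", "Fog"],
})
mask = favourable_mask(sample)
print(mask.tolist())
```

Because the match is on substrings, an entry such as "Rain" also covers combined descriptions that contain it.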

### Loading Data

In [1]:
import pandas as pd

df = pd.read_csv("Weather.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8784 entries, 0 to 8783
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Date/Time         8784 non-null   object 
 1   Temp_C            8784 non-null   float64
 2   Dew Point Temp_C  8784 non-null   float64
 3   Rel Hum_%         8784 non-null   int64  
 4   Wind Speed_km/h   8784 non-null   int64  
 5   Visibility_km     8784 non-null   float64
 6   Press_kPa         8784 non-null   float64
 7   Weather           8784 non-null   object 
dtypes: float64(4), int64(2), object(2)
memory usage: 549.1+ KB


### Data Pre-Processing

##### From the `df.info()` output we can see that:
##### 1. There are no null values in the dataset
##### 2. The Date/Time column has dtype object, so let's convert it to datetime

In [2]:
# Before Conversion of Date/Time
df["Date/Time"].head()

0    1/1/2012 0:00
1    1/1/2012 1:00
2    1/1/2012 2:00
3    1/1/2012 3:00
4    1/1/2012 4:00
Name: Date/Time, dtype: object

In [3]:
# After Conversion to DateTime
df["Date/Time"] = pd.to_datetime(df["Date/Time"])
df["Date/Time"].head()

0   2012-01-01 00:00:00
1   2012-01-01 01:00:00
2   2012-01-01 02:00:00
3   2012-01-01 03:00:00
4   2012-01-01 04:00:00
Name: Date/Time, dtype: datetime64[ns]
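
The first rows shown above (`1/1/2012`) do not reveal whether the file uses month/day or day/month order. A minimal sketch of parsing with an explicit format, assuming month/day/year order (an assumption, not confirmed by the rows shown), on a few made-up strings:

```python
import pandas as pd

# Hypothetical raw strings in the same style as the Date/Time column
raw = pd.Series(["1/1/2012 0:00", "1/2/2012 13:00", "12/31/2012 23:00"])

# Assumption: month/day/year. An explicit format raises an error on rows
# that do not match instead of silently guessing.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")
print(parsed)
```

If the file were day/month/year instead, the format string would be `"%d/%m/%Y %H:%M"`.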

#### Label Engineering

In [4]:
# Derive a Delivery_Risk label (Safe / Caution / Risky) from the weather readings
def delivery_risk(df):
    # Substrings of the Weather column that indicate adverse conditions
    risky = ['Rain', 'Rain Showers', 'Fog', 'Thunderstorms', 'Snow Showers', 'Snow Pellets',
             'Freezing Rain', 'Snow', 'Blowing Snow', 'Ice Pellets', 'Heavy Rain Showers']

    def classify(row):
        temp_ok = row["Temp_C"] > 0
        visibility_ok = row["Visibility_km"] > 2
        wind_ok = row["Wind Speed_km/h"] <= 40
        weather_ok = not any(risk in str(row["Weather"]) for risk in risky)

        if temp_ok and visibility_ok and wind_ok and weather_ok:
            return 'Safe'
        elif temp_ok and visibility_ok:
            return 'Caution'  # High wind and/or adverse weather; temperature and visibility are fine
        else:
            return 'Risky'  # Temperature at or below 0 or visibility at or below 2 km

    # Adds the label column to df in place and returns the class proportions
    df["Delivery_Risk"] = df.apply(classify, axis=1)
    return df["Delivery_Risk"].value_counts(normalize=True)

delivery_risk(df)

Delivery_Risk
Safe       0.633766
Risky      0.253643
Caution    0.112591
Name: proportion, dtype: float64

In [5]:
df["Delivery_Risk"].value_counts()

Delivery_Risk
Safe       5567
Risky      2228
Caution     989
Name: count, dtype: int64

#### Encoding for Delivery_Risk Column

In [6]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df["Labelled_Delivery_Risk"] = le.fit_transform(df["Delivery_Risk"])
df["Labelled_Delivery_Risk"].value_counts()

Labelled_Delivery_Risk
2    5567
1    2228
0     989
Name: count, dtype: int64
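
`LabelEncoder` assigns integer codes in sorted order of the class names, which is why `Safe` became 2 in the counts above. A small sketch on made-up labels shows how to read the mapping back from the fitted encoder (the variable names here are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels using the same three class names
labels = ["Safe", "Risky", "Caution", "Safe"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ holds the class names in code order (index == integer code)
mapping = {str(name): int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())
```

Note that the codes follow alphabetical order, not severity: `Caution` (0) sorts before `Risky` (1) and `Safe` (2).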

#### Auto EDA

In [7]:
from ydata_profiling import ProfileReport
report = ProfileReport(
    df,
    title="EDA_Report",
    explorative=True,
)
report.to_file("Y_EDA_Report.html")
ai_report = report.to_json()




##### Adding rolling features

In [8]:
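def rolling_feature(df, col_name):
    """Add 3-row (3-hour) rolling mean and std columns for col_name to df in place."""
    # Rolling mean; the first two rows lack a full window, so fall back to the expanding mean
    df[f"{col_name}_rolling_mean"] = df[col_name].rolling(window=3).mean()
    df[f"{col_name}_rolling_mean"] = df[f"{col_name}_rolling_mean"].fillna(df[col_name].expanding().mean())

    # Rolling std; fall back to the expanding std, and to 0 for the first row where std is undefined
    df[f"{col_name}_rolling_std"] = df[col_name].rolling(window=3).std()
    df[f"{col_name}_rolling_std"] = df[f"{col_name}_rolling_std"].fillna(df[col_name].expanding().std().fillna(0))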


rolling_feature(df,"Temp_C")
rolling_feature(df, "Dew Point Temp_C")
rolling_feature(df, "Rel Hum_%")
rolling_feature(df, "Wind Speed_km/h")
rolling_feature(df, "Visibility_km")
rolling_feature(df, "Press_kPa")
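
For a column without missing values, the fill-from-expanding pattern used in `rolling_feature` should give the same result as letting the window shrink at the start of the series with `min_periods=1`. A minimal sketch on a made-up series compares the two:

```python
import pandas as pd

# Hypothetical values standing in for one weather column
values = pd.Series([2.0, 4.0, 9.0, 1.0, 5.0])

# Pattern used in rolling_feature: full 3-row window, gaps filled from an expanding window
mean_filled = values.rolling(window=3).mean().fillna(values.expanding().mean())
std_filled = values.rolling(window=3).std().fillna(values.expanding().std().fillna(0))

# Shorthand: allow windows of 1 or 2 rows at the start of the series
mean_short = values.rolling(window=3, min_periods=1).mean()
std_short = values.rolling(window=3, min_periods=1).std().fillna(0)

print(pd.DataFrame({"mean_filled": mean_filled, "mean_short": mean_short,
                    "std_filled": std_filled, "std_short": std_short}))
```

The shorthand is only a drop-in replacement when the input column has no NaN values, which `df.info()` reports for this dataset.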

##### Train and Test splitting

In [9]:
# Split into train and test sets at an 80/20 ratio, stratified on the target
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, stratify=df["Labelled_Delivery_Risk"], random_state=42)
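
Stratifying on the target keeps the class mix the same in both splits. The sketch below checks this on a made-up frame whose class shares roughly follow the proportions reported earlier; the frame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: 63% class 2, 25% class 1, 12% class 0
toy = pd.DataFrame({
    "label": [2] * 630 + [1] * 250 + [0] * 120,
    "feature": range(1000),
})

toy_train, toy_test = train_test_split(
    toy, test_size=0.2, stratify=toy["label"], random_state=42
)

# Class shares per split, aligned by class code
train_share = toy_train["label"].value_counts(normalize=True).sort_index()
test_share = toy_test["label"].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"train": train_share, "test": test_share}))
```

The same comparison can be run on `train` and `test` with the `Labelled_Delivery_Risk` column.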

In [10]:
# individual EDA for Train and Test
# For Train
train_report = ProfileReport(df = train, title="Y_Train_EDA_Report", explorative=True)
train_report.to_file("Y_Train_EDA.html")



In [11]:
# For Test
test_report = ProfileReport(df = test, title="Y_Test_EDA_Report", explorative=True)
test_report.to_file("Y_Test_EDA.html")



In [12]:
# Compare Test and Train data
compare_report = train_report.compare(test_report)
compare_report.to_file("Y_Compare_Train_Test.html")



#### Auto ML

In [13]:
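from sklearn.model_selection import train_test_split

# Features: drop the encoded target, the raw label, and the non-numeric Date/Time and Weather columns
x = df.drop(columns=["Labelled_Delivery_Risk", "Date/Time", "Weather", "Delivery_Risk"])
y = df["Labelled_Delivery_Risk"]

# Note: this split is not stratified, unlike the train/test split used for the EDA reports
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)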

In [14]:
from flaml import AutoML

ml = AutoML()
ml.fit(x_train, y_train, task="classification", time_budget=30, metric='accuracy', seed=42)
print("FLAML score:", ml.score(x_test, y_test))
print("FLAML best model:", ml.best_estimator)


[flaml.automl.logger: 10-09 14:38:45] {1752} INFO - task = classification
[flaml.automl.logger: 10-09 14:38:45] {1763} INFO - Evaluation method: holdout
[flaml.automl.logger: 10-09 14:38:45] {1862} INFO - Minimizing error metric: 1-accuracy
[flaml.automl.logger: 10-09 14:38:45] {1979} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'lrl1']
[flaml.automl.logger: 10-09 14:38:45] {2282} INFO - iteration 0, current learner lgbm
[flaml.automl.logger: 10-09 14:38:45] {2417} INFO - Estimated sufficient time budget=239s. Estimated necessary time budget=6s.
[flaml.automl.logger: 10-09 14:38:45] {2466} INFO -  at 0.1s,	estimator lgbm's best error=0.1133,	best estimator lgbm's best error=0.1133
[flaml.automl.logger: 10-09 14:38:45] {2282} INFO - iteration 1, current learner lgbm
[flaml.automl.logger: 10-09 14:38:45] {2466} INFO -  at 0.1s,	estimator lgbm's best error=0.1133,	best estimator lgbm's best error=0.1133
[flaml.automl.logger: 10
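
Accuracy alone can hide weak performance on the smaller `Caution` class, so a per-class view is a useful complement. The sketch below uses made-up true and predicted codes rather than the FLAML model's output; with the fitted model, the predictions would presumably come from `ml.predict(x_test)`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted codes (0 = Caution, 1 = Risky, 2 = Safe)
y_true = [2, 2, 2, 2, 1, 1, 1, 0, 0, 2]
y_pred = [2, 2, 2, 2, 1, 1, 2, 2, 0, 2]

# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
per_class_report = classification_report(
    y_true,
    y_pred,
    labels=[0, 1, 2],
    target_names=["Caution", "Risky", "Safe"],
    zero_division=0,
)

print(matrix)
print(per_class_report)
```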

#### AI Data Analysis

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8784 entries, 0 to 8783
Data columns (total 22 columns):
 #   Column                         Non-Null Count  Dtype         
---  ------                         --------------  -----         
 0   Date/Time                      8784 non-null   datetime64[ns]
 1   Temp_C                         8784 non-null   float64       
 2   Dew Point Temp_C               8784 non-null   float64       
 3   Rel Hum_%                      8784 non-null   int64         
 4   Wind Speed_km/h                8784 non-null   int64         
 5   Visibility_km                  8784 non-null   float64       
 6   Press_kPa                      8784 non-null   float64       
 7   Weather                        8784 non-null   object        
 8   Delivery_Risk                  8784 non-null   object        
 9   Labelled_Delivery_Risk         8784 non-null   int64         
 10  Temp_C_rolling_mean            8784 non-null   float64       
 11  Temp_C_rolling_st

In [17]:
import requests
import pandas as pd
import json
import os
import numpy as np

numeric_cols = df.select_dtypes(include=[np.number]).columns
df_numeric = df[numeric_cols]

missing_values = df.isna().sum()
correlations = df_numeric.corr()
outliers = ((df_numeric < df_numeric.quantile(0.05)) | (df_numeric > df_numeric.quantile(0.95))).sum()
summary_stats = df_numeric.describe().T

# Summarize data (so the model doesn't get overloaded with raw rows)
summary = f"""
Dataset Summary:
    - Rows: {df.shape[0]}
    - Columns: {df.shape[1]}
    - Missing Values: {missing_values.to_dict()}
    - Numeric Summary: {summary_stats.head(5).to_dict()}
    - Feature Correlations: {correlations.to_dict()}
    - Potential Outliers (count per column): {outliers.to_dict()}

    Target: Labelled_Delivery_Risk (class counts: {df["Labelled_Delivery_Risk"].value_counts().to_dict()})
    Task: "classification"
"""

# Query the OpenRouter API (gemma-3-27b)
url = "https://openrouter.ai/api/v1/chat/completions"
# Despite its name, HF_TOKEN holds the OpenRouter API key, read from the Auto_DA environment variable
HF_TOKEN = os.getenv("Auto_DA")
HEADERS = {"Authorization": f"Bearer {HF_TOKEN}"}  # Make sure Auto_DA is set in the environment
payload = {
    "model": "google/gemma-3-27b-it:free",
    "messages" : [{
        "role":"user",
        "content" : f"""You are a senior Data Analyst in logistics. Analyze the dataset {summary} and provide actionable insights that could impact delivery operations. 
        - Highlight risk factors from weather and other features.
        - Suggest preventive or operational actions.
        - Focus on correlations, anomalies, and patterns over time.
        - Provide bullet points with impact and recommendation.
        - Make sure these insights are industry-relevant and practical and are related to dataset but not general information.
        """}]
        }

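# The full JSON body is read at once, so streaming is not needed; a timeout avoids hanging forever
response = requests.post(url, headers=HEADERS, json=payload, timeout=120)

if response.status_code != 200:
    # Report the failure and skip parsing
    print(f"OpenRouter API error: {response.status_code} {response.text}")
else:
    # Extract the AI reply
    data = response.json()
    try:
        reply = data["choices"][0]["message"]["content"]
        with open("AI_Report.txt", "w", encoding="utf-8") as f:
            f.write(reply)
        print("\nAI Insights:\n", reply)
    except (KeyError, IndexError):
        # Unexpected response shape: show the raw body for debugging
        print(str(data))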
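
The reply extraction can be checked without calling the API. The sketch below wraps the lookup in a hypothetical helper, `extract_reply`, and runs it on made-up response bodies shaped like the `choices[0].message.content` structure the cell above reads; the error body is an invented example, not a documented OpenRouter payload.

```python
from typing import Any, Optional


def extract_reply(data: Any) -> Optional[str]:
    """Return the first message content from a chat-completions style body, or None."""
    try:
        return data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # Missing keys, an empty choices list, or a non-dict body
        return None


# Hypothetical bodies for illustration only
ok_body = {"choices": [{"message": {"role": "assistant", "content": "Example insight"}}]}
error_body = {"error": {"code": 429, "message": "Rate limited"}}

print(extract_reply(ok_body))
print(extract_reply(error_body))
```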
    





AI Insights:
 ## Logistics Delivery Risk Analysis - Actionable Insights

Here's an analysis of the provided dataset, focusing on identifying risk factors impacting delivery operations and suggesting actionable steps.

**Executive Summary:**

The dataset reveals strong correlations between weather conditions (Temperature, Dew Point, Relative Humidity, Visibility) and delivery risk.  Specifically, lower temperatures, high humidity, and reduced visibility are associated with increased risk.  The rolling statistics suggest that sudden changes in these conditions are also significant.  A substantial number of outliers exist in several weather variables, indicating potentially extreme conditions that require specific attention.

**1. Key Risk Factors & Correlations:**

*   **Temperature & Delivery Risk (Correlation: 0.47):**  A positive correlation indicates that lower temperatures are associated with higher delivery risk. This could be due to ice, snow, or increased driver discomfort impac