## **1. Importing Necessary Libraries**

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

- **Purpose**: Import libraries required for data manipulation (`pandas`, `numpy`), model training (`sklearn`), and evaluation.


## **2. Generating a Sample Dataset**

In [None]:
# Generate a sample dataset
np.random.seed(42)
n_samples = 1000

data = {
    'distance': np.random.uniform(50, 1000, n_samples),
    'weight': np.random.uniform(0.5, 10, n_samples),
    'priority': np.random.choice(['low', 'medium', 'high'], n_samples),
    'weather_condition': np.random.choice(['normal', 'rain', 'snow', 'extreme weather alert'], n_samples),
    'on_time': np.random.choice([0, 1], n_samples)
}
df = pd.DataFrame(data)

- **Purpose**: Create a synthetic dataset with four features and one binary target:
  - `distance`: Numeric feature, uniform between 50 and 1000.
  - `weight`: Numeric feature, uniform between 0.5 and 10.
  - `priority`: Categorical feature with values `low`, `medium`, `high`.
  - `weather_condition`: Categorical feature with values `normal`, `rain`, `snow`, `extreme weather alert`.
  - `on_time`: Target variable (binary: `0` for late, `1` for on time), sampled at random and independently of the features.
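
Since every column is drawn with its own `np.random` call, the target carries no real signal from the features. The sketch below rebuilds the same dataset (same seed and column names) and inspects the class balance and the on-time rate per weather condition; the variable `class_balance` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic dataset with the same seed and call order as above.
np.random.seed(42)
n_samples = 1000
df = pd.DataFrame({
    'distance': np.random.uniform(50, 1000, n_samples),
    'weight': np.random.uniform(0.5, 10, n_samples),
    'priority': np.random.choice(['low', 'medium', 'high'], n_samples),
    'weather_condition': np.random.choice(
        ['normal', 'rain', 'snow', 'extreme weather alert'], n_samples),
    'on_time': np.random.choice([0, 1], n_samples),
})

# Share of each class in the target.
class_balance = df['on_time'].value_counts(normalize=True)
print(class_balance)

# On-time rate per weather condition. With a randomly drawn target these
# rates should differ only by sampling noise.
print(df.groupby('weather_condition')['on_time'].mean())
```

Any model trained on this data should therefore be read as a walkthrough of the workflow, not as evidence of predictive skill.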


## **3. Converting Categorical Variables to Numeric**


In [None]:
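# Convert categorical variables to numeric codes. Categories are listed explicitly
# because the default (alphabetical) order would not match the maps in section 11.
priority_levels = ['low', 'medium', 'high']
weather_levels = ['normal', 'rain', 'snow', 'extreme weather alert']
df['priority'] = pd.Categorical(df['priority'], categories=priority_levels).codes
df['weather_condition'] = pd.Categorical(df['weather_condition'], categories=weather_levels).codes

- **Purpose**: Transform categorical features (`priority` and `weather_condition`) into integer codes for model compatibility. The explicit category order gives `low`=0, `medium`=1, `high`=2 and `normal`=0, `rain`=1, `snow`=2, `extreme weather alert`=3, which matches the mappings used by `predict_delivery` in section 11.
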
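
The integer assigned to each label depends on the category order. When no `categories` argument is given, `pd.Categorical` infers the categories and sorts them, so string labels are coded alphabetically. A small sketch of the difference, using a made-up four-element series:

```python
import pandas as pd

priorities = pd.Series(['low', 'medium', 'high', 'low'])

# Default behaviour: categories are inferred and sorted alphabetically.
default_cat = pd.Categorical(priorities)
print(list(default_cat.categories))  # ['high', 'low', 'medium']
print(list(default_cat.codes))       # [1, 2, 0, 1]

# Explicit order: codes follow the position in the `categories` list.
ordered_cat = pd.Categorical(priorities, categories=['low', 'medium', 'high'])
print(list(ordered_cat.categories))  # ['low', 'medium', 'high']
print(list(ordered_cat.codes))       # [0, 1, 2, 0]
```

Whichever order is used at training time, any mapping applied to new inputs at prediction time has to reproduce it exactly.
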
## **4. Splitting Features and Target**


In [None]:
# Split features and target
X = df[['distance', 'weight', 'priority', 'weather_condition']]
y = df['on_time']

- **Purpose**: Separate the dataset into:
  - `X`: Features (input variables).
  - `y`: Target (output variable).
  
## **5. Splitting Data into Training and Testing Sets**


In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

- **Purpose**: Split the data into training (80%) and testing (20%) sets so the model is evaluated on rows it has not seen. `random_state=42` makes the split reproducible.

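
As an optional variation, `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both subsets. The sketch below uses stand-in arrays rather than the notebook's `X` and `y`, so the data itself is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(1000, 4))      # stand-in feature matrix (4 features)
y = rng.choice([0, 1], size=1000)    # stand-in binary target

# stratify=y preserves the share of each class in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)   # (800, 4) (200, 4)
print(y_train.mean(), y_test.mean())
```

Stratification matters most when classes are imbalanced; with a roughly balanced target it mainly reduces split-to-split variation.
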
## **6. Scaling the Features**


In [None]:
# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

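- **Purpose**: Standardize the features so that each has a mean of 0 and a standard deviation of 1 on the training set. The scaler is fit on the training data only and then applied unchanged to the test data, which keeps test-set statistics from leaking into training.
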
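
The effect of fitting on the training split only can be checked directly. In this sketch (stand-in data with two numeric columns in the same ranges as `distance` and `weight`), the training columns come out with mean 0 and standard deviation 1 up to floating-point error, while the test columns are only approximately standardized.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(50, 1000, 1000),   # stand-in for distance
    rng.uniform(0.5, 10, 1000),    # stand-in for weight
])
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from train
X_test_scaled = scaler.transform(X_test)        # reuses the train statistics

print("train:", X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print("test: ", X_test_scaled.mean(axis=0), X_test_scaled.std(axis=0))
```
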
## **7. Training the Logistic Regression Model**


In [None]:
# Create and train the logistic regression model
model = LogisticRegression(random_state=42)
model.fit(X_train_scaled, y_train)

- **Purpose**: Train a logistic regression model on the scaled training data.
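
An alternative way to organize sections 6 and 7, shown here only as a sketch on stand-in data, is to chain the scaler and the classifier with `make_pipeline`. The pipeline then applies the same scaling automatically whenever `fit` or `predict` is called.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train = rng.uniform(size=(800, 4))     # stand-in training features
y_train = rng.choice([0, 1], size=800)   # stand-in binary target
X_test = rng.uniform(size=(200, 4))      # stand-in test features

# Scaling and classification become one estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
pipeline.fit(X_train, y_train)           # fits the scaler, then the model

y_pred = pipeline.predict(X_test)        # scales X_test internally
print(y_pred[:10])
```

This is a design choice, not a requirement: the notebook's separate `scaler` and `model` objects do the same work, but they have to be kept in sync by hand.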


## **8. Making Predictions on the Test Set**

In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

- **Purpose**: Use the trained model to predict delivery status (`on_time`) for the testing set.
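
For a binary logistic regression, `predict` amounts to thresholding the class-1 probability from `predict_proba` at 0.5. The sketch below checks this on stand-in data and shows how a stricter cutoff could be applied; the 0.7 threshold is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X_train = rng.normal(size=(800, 4))      # stand-in scaled features
y_train = rng.choice([0, 1], size=800)   # stand-in binary target
X_test = rng.normal(size=(200, 4))

model = LogisticRegression(random_state=42).fit(X_train, y_train)

# Column 1 of predict_proba is the probability of class 1 ("on time").
proba_on_time = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)

# The default decision rule is equivalent to a 0.5 cutoff.
y_from_threshold = (proba_on_time > 0.5).astype(int)
print(np.array_equal(y_pred, y_from_threshold))

# A stricter, hypothetical cutoff: only call a delivery on time above 0.7.
y_strict = (proba_on_time > 0.7).astype(int)
print(y_strict.sum(), "of", len(y_strict), "predicted on time at 0.7")
```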

---

## **9. Evaluating the Model**

In [None]:
# Print model evaluation metrics
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

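- **Purpose**: Assess the model's performance on the test set. Because `on_time` was generated at random in section 2, scores close to chance level (around 0.5) are the expected outcome here. The two outputs are:
  - **Confusion Matrix**: Counts of true negatives, false positives, false negatives, and true positives. For labels `0` and `1`, scikit-learn lays it out as `[[TN, FP], [FN, TP]]` (rows are true classes, columns are predicted classes).
  - **Classification Report**: Displays precision, recall, F1-score, and support (number of true samples) for each class.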
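
To connect the two outputs, precision and recall for class `1` can be derived by hand from the confusion matrix. The labels below are a small made-up example, not results from the model above.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels chosen for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2

precision = tp / (tp + fp)  # of predicted on-time, how many really were
recall = tp / (tp + fn)     # of truly on-time, how many were found
print(precision, recall)

# The library functions give the same values.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```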

---

## **10. Analyzing Feature Coefficients**

In [None]:
# Print feature coefficients
feature_names = X.columns
coefficients = model.coef_[0]
for name, coef in zip(feature_names, coefficients):
    print(f"{name}: {coef}")

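- **Purpose**: Interpret the impact of each feature on the prediction by displaying the logistic regression coefficients. Because the model was trained on standardized features, each coefficient is the change in the log-odds of on-time delivery for a one-standard-deviation increase in that feature; a positive value pushes the prediction toward "on time", a negative value away from it.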
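
Log-odds coefficients are often easier to read as odds ratios, obtained by exponentiating them. The following sketch fits a model on stand-in data with the same four column names; the numbers it prints are not the notebook's results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
feature_names = ['distance', 'weight', 'priority', 'weather_condition']
X = np.column_stack([
    rng.uniform(50, 1000, 500),   # stand-in distance
    rng.uniform(0.5, 10, 500),    # stand-in weight
    rng.integers(0, 3, 500),      # stand-in priority codes 0..2
    rng.integers(0, 4, 500),      # stand-in weather codes 0..3
])
y = rng.choice([0, 1], size=500)  # random target, so no real effect expected

X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(random_state=42).fit(X_scaled, y)

# exp(coef) is the multiplicative change in the odds of class 1
# for a one-standard-deviation increase in the feature.
odds_ratios = np.exp(model.coef_[0])
for name, coef, ratio in zip(feature_names, model.coef_[0], odds_ratios):
    print(f"{name}: coef={coef:+.3f}, odds ratio={ratio:.3f}")
```

An odds ratio close to 1 means the feature barely moves the prediction, which is what a randomly generated target should produce.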

---

## **11. Predicting Delivery Status for a New Package**

In [None]:
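# Function to predict delivery status for a new package
def predict_delivery(distance, weight, priority, weather_condition):
    """Return (label, probability of on-time delivery) for one package."""
    # These codes must match the encoding used for the training data
    priority_map = {'low': 0, 'medium': 1, 'high': 2}
    weather_map = {'normal': 0, 'rain': 1, 'snow': 2, 'extreme weather alert': 3}

    # Build a one-row DataFrame with the training column names so the fitted
    # scaler does not warn about missing feature names
    features = pd.DataFrame([[
        distance,
        weight,
        priority_map[priority],
        weather_map[weather_condition]
    ]], columns=X.columns)

    scaled_features = scaler.transform(features)
    prediction = model.predict(scaled_features)
    probability = model.predict_proba(scaled_features)[0][1]

    label = "On time" if prediction[0] == 1 else "Not on time"
    return label, probability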

In [None]:
# Example usage
result, prob = predict_delivery(500, 5, 'medium', 'rain')
print(f"\nPrediction for a new package: {result}")
print(f"Probability of being on time: {prob:.2f}")

- **Purpose**: Predict delivery status and probability for a new package based on its features:
  - Takes `distance`, `weight`, `priority`, and `weather_condition` as inputs.
  - Returns a tuple: the label (`"On time"` or `"Not on time"`) and the predicted probability of class `1` (on time).
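
As written, `predict_delivery` raises a bare `KeyError` when it receives a priority or weather value outside the mappings. One possible guard, sketched below, validates the inputs before encoding them. `encode_package`, `PRIORITY_MAP`, and `WEATHER_MAP` are hypothetical names introduced for this example; the mappings themselves are copied from the function above.

```python
# Mappings copied from predict_delivery; the constant names are hypothetical.
PRIORITY_MAP = {'low': 0, 'medium': 1, 'high': 2}
WEATHER_MAP = {'normal': 0, 'rain': 1, 'snow': 2, 'extreme weather alert': 3}


def encode_package(distance, weight, priority, weather_condition):
    """Hypothetical helper: validate inputs and return the numeric feature row."""
    if priority not in PRIORITY_MAP:
        raise ValueError(
            f"Unknown priority {priority!r}; expected one of {sorted(PRIORITY_MAP)}")
    if weather_condition not in WEATHER_MAP:
        raise ValueError(
            f"Unknown weather_condition {weather_condition!r}; "
            f"expected one of {sorted(WEATHER_MAP)}")
    return [float(distance), float(weight),
            PRIORITY_MAP[priority], WEATHER_MAP[weather_condition]]


print(encode_package(500, 5, 'medium', 'rain'))  # [500.0, 5.0, 1, 1]

try:
    encode_package(500, 5, 'urgent', 'rain')
except ValueError as err:
    print(err)
```

The returned row could then be passed to the scaler and model in the same way `predict_delivery` does.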