# SAR Rental Cancellation Prediction
This project predicts car rental cancellations from booking behavior, travel patterns, and time-related trends. The goal is to identify high-risk bookings early so that SAR Rentals can intervene before the cancellation happens. We compare Logistic Regression and Random Forest models and draw business insights from the results.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load dataset
df = pd.read_csv('SAR_Cleaned_Median.csv')
df.head()

## Step 1: Data Preparation

In [None]:
# Check nulls and shape
print(df.shape)
print(df.isnull().sum())

# Separate features and target
X = df.drop(columns=['Car_Cancellation'])
y = df['Car_Cancellation']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
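Before modelling, it can help to check how balanced the target is, since cancellations are often the minority class. The sketch below uses synthetic data with a 10% cancellation rate (an assumption, not a property of `SAR_Cleaned_Median.csv`) to show how the `stratify` argument of `train_test_split` keeps the class ratio the same in both splits. The names `demo_X`, `demo_y`, and the `feature_*` columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the booking data (assumed 10% cancellation rate)
rng = np.random.default_rng(42)
n_rows = 1000
demo_X = pd.DataFrame({
    "feature_a": rng.normal(size=n_rows),
    "feature_b": rng.normal(size=n_rows),
})
demo_y = pd.Series((np.arange(n_rows) % 10 == 0).astype(int), name="Car_Cancellation")

# Share of each class in the full dataset
print(demo_y.value_counts(normalize=True))

# stratify=demo_y preserves the class ratio in the train and test sets
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42, stratify=demo_y
)
print("Train cancellation rate:", yd_train.mean())
print("Test cancellation rate:", yd_test.mean())
```

On the real data, the equivalent change would be passing `stratify=y` to the existing split.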

## Step 2: Logistic Regression Model

In [None]:
log_model = LogisticRegression(max_iter=1000)
log_model.fit(X_train, y_train)
y_pred_log = log_model.predict(X_test)

print('Logistic Regression Performance:')
print(confusion_matrix(y_test, y_pred_log))
print(classification_report(y_test, y_pred_log))
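Logistic Regression with its default regularization is sensitive to feature scale, so features such as distance can dominate if left unscaled. The following is a minimal sketch, on synthetic data from `make_classification` rather than the SAR dataset, of wrapping the model in a pipeline with `StandardScaler`. Whether this helps on the actual data is not established here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data used only for illustration
Xs, ys = make_classification(n_samples=500, n_features=8, random_state=42)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    Xs, ys, test_size=0.2, random_state=42, stratify=ys
)

# The scaler is fitted on the training split only, then applied to the test split
scaled_log_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_log_model.fit(Xs_train, ys_train)

scaled_acc = scaled_log_model.score(Xs_test, ys_test)
print(f"Accuracy with scaling: {scaled_acc:.3f}")
```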

## Step 3: Random Forest Model

In [None]:
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)

print('Random Forest Performance:')
print(confusion_matrix(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf))

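# Feature importance (impurity-based, from the fitted Random Forest)
feat_imp = pd.Series(rf_model.feature_importances_, index=X.columns)
# Sort ascending so the most important feature appears at the top of the bar chart
feat_imp.nlargest(10).sort_values().plot(kind='barh', title='Top 10 Important Features')
plt.xlabel('Importance')
plt.tight_layout()
plt.show()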
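Impurity-based importances are computed on the training data and can favor features with many distinct values. As a cross-check, permutation importance measures how much the test score drops when a feature is shuffled. The sketch below uses synthetic data and hypothetical `feature_*` column names; it is an illustration of the technique, not a result from the SAR dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical column names
Xp, yp = make_classification(
    n_samples=400, n_features=6, n_informative=3, random_state=42
)
cols = [f"feature_{i}" for i in range(Xp.shape[1])]
Xp = pd.DataFrame(Xp, columns=cols)
Xp_train, Xp_test, yp_train, yp_test = train_test_split(
    Xp, yp, test_size=0.2, random_state=42, stratify=yp
)

rf_demo = RandomForestClassifier(n_estimators=50, random_state=42)
rf_demo.fit(Xp_train, yp_train)

# Shuffle each feature 5 times on the held-out set and average the score drop
perm = permutation_importance(rf_demo, Xp_test, yp_test, n_repeats=5, random_state=42)
perm_imp = pd.Series(perm.importances_mean, index=cols).sort_values(ascending=False)
print(perm_imp)
```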

## Step 4: Conclusion & Business Insights
- **Random Forest** performed slightly better than Logistic Regression, most likely because it can model non-linear relationships and feature interactions that a linear model cannot.
- Key features affecting cancellations include **distance**, **booking timing**, and **package type**.
- These insights can be used by SAR Rentals to:
   - Flag risky bookings for review or verification.
   - Offer dynamic pricing or promotions for high-risk cases.
   - Improve customer segmentation and retention strategies.
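Flagging risky bookings could be sketched as follows: score each booking with `predict_proba` and mark those above a cutoff for review. The data here is synthetic, and `RISK_THRESHOLD = 0.3` is an arbitrary illustrative value; in practice the cutoff would be chosen from the cost of a missed cancellation versus an unnecessary review.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (assumed ~10% cancellations)
Xf, yf = make_classification(
    n_samples=500, n_features=6, weights=[0.9, 0.1], random_state=42
)
Xf_train, Xf_test, yf_train, yf_test = train_test_split(
    Xf, yf, test_size=0.2, random_state=42, stratify=yf
)

risk_model = RandomForestClassifier(n_estimators=100, random_state=42)
risk_model.fit(Xf_train, yf_train)

# Column 1 of predict_proba is the estimated probability of the positive class
cancel_prob = risk_model.predict_proba(Xf_test)[:, 1]

RISK_THRESHOLD = 0.3  # hypothetical cutoff, to be tuned on real data
flags = pd.DataFrame({
    "cancel_prob": cancel_prob,
    "flag_for_review": cancel_prob >= RISK_THRESHOLD,
})
print(flags.sort_values("cancel_prob", ascending=False).head())
```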

Future enhancements may include:
- Trying advanced models like XGBoost.
- Handling imbalance with techniques like SMOTE.
- Adding time-of-day and holiday effects.
- Deploying the model behind a real-time API.
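As an illustration of the time-of-day and holiday idea, the sketch below derives three features from a timestamp using pandas. The column name `booking_time` and the holiday list are hypothetical; the actual dataset's column names and relevant holiday calendar would need to be substituted.

```python
import pandas as pd

# Hypothetical bookings with a timestamp column
bookings = pd.DataFrame({
    "booking_time": pd.to_datetime([
        "2013-01-01 08:30",  # Tuesday, New Year's Day
        "2013-01-05 22:15",  # Saturday
        "2013-01-07 14:00",  # Monday
    ])
})

# Hypothetical holiday calendar (dates at midnight)
holiday_dates = pd.to_datetime(["2013-01-01"])

# Hour of day, 0 to 23
bookings["hour"] = bookings["booking_time"].dt.hour
# dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday
bookings["is_weekend"] = bookings["booking_time"].dt.dayofweek >= 5
# normalize() drops the time part so dates compare against the calendar
bookings["is_holiday"] = bookings["booking_time"].dt.normalize().isin(holiday_dates)

print(bookings)
```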
