# Overview
Online retailers face significant financial challenges due to high return rates, especially in industries like clothing, where return rates can be as high as 50%. While free or low-cost return shipping has become an expectation among customers, it places a substantial burden on businesses. To mitigate this, companies can leverage data-driven strategies to predict and reduce return rates.

Your task is to analyze the provided datasets and develop a predictive model that determines whether an ordered item will be returned. You will also extract actionable insights into the key factors influencing product returns.

# Dataset Description
You will be provided with two datasets:

N2N Train Set: Contains historical order data, including product details, user demographics, and whether the product was returned.

N2N Test Set: A separate dataset for evaluation, where you will apply your model to predict return probabilities.

The datasets contain the following attributes:

```
order_item_id: Unique identifier for each ordered item.
order_date: Date the order was placed.
delivery_date: Date the product was delivered.
item_id: Unique identifier for the product.
item_size: Size of the ordered product.
item_color: Color of the ordered product.
brand_id: Unique identifier for the product's brand.
item_price: Price of the product.
user_id: Unique identifier for the customer.
user_title: Customer's salutation or title.
user_dob: Customer's date of birth.
user_state: Customer's state of residence.
user_reg_date: Date of customer registration on the platform.
return: Binary indicator of return status (0 = not returned, 1 = returned).
```

These fields provide comprehensive details on customer orders, product characteristics, and user demographics, enabling the identification of patterns associated with product returns.
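The date columns are most useful once they are turned into durations. The following is a minimal sketch, assuming the columns arrive as ISO-formatted strings and may contain missing values. The sample rows are made up for illustration, and the derived feature names (`delivery_days`, `user_age_years`, `account_age_days`) are hypothetical, not part of the provided datasets.

```python
import pandas as pd

# Tiny made-up sample using column names from the attribute list above.
# The values are illustrative only and are not taken from the real datasets.
orders = pd.DataFrame({
    "order_item_id": [1, 2, 3],
    "order_date": ["2016-06-22", "2016-06-22", "2016-07-01"],
    "delivery_date": ["2016-06-27", None, "2016-07-03"],
    "user_dob": ["1969-04-17", "1970-04-22", None],
    "user_reg_date": ["2015-02-17", "2016-06-22", "2016-01-10"],
    "item_price": [49.90, 19.95, 79.90],
    "return": [0, 1, 0],
})

date_cols = ["order_date", "delivery_date", "user_dob", "user_reg_date"]
for col in date_cols:
    # errors="coerce" converts missing or unparseable entries to NaT
    orders[col] = pd.to_datetime(orders[col], errors="coerce")

# Hypothetical derived features (names are assumptions for this sketch)
orders["delivery_days"] = (orders["delivery_date"] - orders["order_date"]).dt.days
orders["user_age_years"] = (orders["order_date"] - orders["user_dob"]).dt.days / 365.25
orders["account_age_days"] = (orders["order_date"] - orders["user_reg_date"]).dt.days

print(orders[["order_item_id", "delivery_days", "user_age_years", "account_age_days"]])
```

Rows with a missing `delivery_date` or `user_dob` end up with `NaN` in the corresponding feature, so a later step would need to decide how to impute or flag them.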

---

## 1. Exploratory Data Analysis (EDA)

- Identify trends and patterns in product returns.

- Examine relationships between product attributes, user demographics, and return behavior.

```python
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

# Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, OneHotEncoder

# Machine Learning Models
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier

# Model Evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, mean_squared_error, r2_score
```
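One way to approach the EDA goals listed in this section is to compare the return rate across groups. The sketch below uses a small made-up DataFrame in place of the train set. The price band edges and the `price_band` column name are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up sample standing in for the train set; values are illustrative only.
orders = pd.DataFrame({
    "user_title": ["Mrs", "Mrs", "Mr", "Mrs", "Mr", "Mrs"],
    "item_price": [19.9, 49.9, 89.9, 129.9, 24.9, 69.9],
    "return": [0, 1, 0, 1, 0, 1],
})

# Overall share of returned items (mean of a 0/1 column)
overall_rate = orders["return"].mean()

# Return rate and item count per customer title
rate_by_title = (
    orders.groupby("user_title")["return"]
    .agg(return_rate="mean", n_items="count")
    .sort_values("return_rate", ascending=False)
)

# Return rate per price band; the band edges are arbitrary assumptions
orders["price_band"] = pd.cut(
    orders["item_price"],
    bins=[0, 50, 100, float("inf")],
    labels=["0-50", "50-100", "100+"],
)
rate_by_price = orders.groupby("price_band", observed=True)["return"].mean()

print(f"Overall return rate: {overall_rate:.2f}")
print(rate_by_title)
print(rate_by_price)
```

Reporting the item count next to each rate helps avoid over-reading groups that contain only a handful of orders.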
