# Instax Sales Transaction Analysis

This notebook is the main interactive environment for the Instax Sales Transaction machine learning project. It covers every stage of the project: data loading and preprocessing, exploratory analysis, model training, and evaluation.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from src.preprocessing import load_and_preprocess_data
from src.linear_regression import LinearRegression
from src.logistic_regression import LogisticRegression
from src.experiments import run_experiments
from src.widgets_interface import create_widgets

# Load and preprocess the data
data = load_and_preprocess_data('data/instax_sales_transaction_data.csv')

# Display the first few rows of the dataset
data.head()

## Exploratory Data Analysis (EDA)

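In this section, we examine the dataset's shape, its missing values, summary statistics for numeric and categorical features, and the pairwise correlations between numeric features.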

In [2]:
# Dataset shape
print(f'Dataset shape: {data.shape}')

# Missing values overview
missing_values = data.isnull().sum()
print('Missing values overview:')
print(missing_values[missing_values > 0])

# Numeric and categorical feature summary
numeric_summary = data.describe()
categorical_summary = data.select_dtypes(include=['object']).describe()
print('Numeric feature summary:')
print(numeric_summary)
print('Categorical feature summary:')
print(categorical_summary)

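# Correlation matrix over numeric columns only
# (pandas >= 2.0 raises an error if non-numeric columns are passed to corr())
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm')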
plt.title('Correlation Matrix')
plt.show()

## Data Cleaning

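In this section, we clean the dataset by forward-filling missing values, converting transaction dates to datetimes, engineering date and revenue features, one-hot encoding the `category` column, and defining the regression and classification targets.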

In [3]:
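# Handle missing values: forward fill for simplicity
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is the replacement)
data = data.ffill()

# Convert dates to datetime
data['transaction_date'] = pd.to_datetime(data['transaction_date'])

# Create new features
data['month'] = data['transaction_date'].dt.month
data['day'] = data['transaction_date'].dt.day
# Meteorological seasons: 1 = Dec-Feb, 2 = Mar-May, 3 = Jun-Aug, 4 = Sep-Nov
data['season'] = data['transaction_date'].dt.month % 12 // 3 + 1
data['revenue'] = data['quantity'] * data['price']
data['profit'] = data['revenue'] - data['cost']

# Encode categorical features as 0/1 integers (pandas >= 2.0 defaults to bool)
data = pd.get_dummies(data, columns=['category'], drop_first=True, dtype=int)

# Split dataset for regression and classification tasks.
# Keep only numeric columns: the from-scratch models need a numeric matrix,
# so the raw datetime column (and any remaining text columns) are excluded.
# Note: 'revenue' and 'profit' are built from the same quantities as the target
# and may leak it; consider dropping them if they mirror 'total_revenue'.
X = data.drop(columns=['total_revenue']).select_dtypes(include='number')
y_regression = data['total_revenue']
# Binary label: 1 if total revenue is above the median, else 0
y_classification = (data['total_revenue'] > data['total_revenue'].median()).astype(int)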
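The `season` feature relies on integer arithmetic: `month % 12` sends December to 0, and integer division by 3 then groups each run of three consecutive months. The standalone check below prints the full month-to-season mapping. The season names are an assumption added for readability (Northern Hemisphere meteorological seasons); the notebook itself only stores the integers 1 to 4.

```python
import numpy as np

# Hypothetical labels for illustration (Northern Hemisphere meteorological seasons)
SEASON_NAMES = {1: "winter", 2: "spring", 3: "summer", 4: "autumn"}

months = np.arange(1, 13)
# Same arithmetic as the notebook: month % 12 sends December to 0,
# then integer division by 3 groups each run of three consecutive months
seasons = months % 12 // 3 + 1

for month, season in zip(months, seasons):
    print(f"month {month:2d} -> season {season} ({SEASON_NAMES[int(season)]})")
```

For Southern Hemisphere data the integer codes stay the same; only the names would swap.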

## Linear Regression

In this section, we train the linear regression model implemented from scratch with NumPy (imported from `src.linear_regression`) and plot its training MSE over 1,000 epochs.
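The project's `LinearRegression` class is not shown in this notebook. As a rough illustration of what a from-scratch NumPy model with `learning_rate`, `epochs`, and a per-epoch `losses` list can look like, here is a minimal batch gradient descent sketch on the mean squared error. The class name `GDLinearRegression`, the feature standardization step, and all internals are assumptions, not the project's code; standardization is included because gradient descent with a single learning rate tends to diverge on unscaled sales figures.

```python
import numpy as np

class GDLinearRegression:
    """Hypothetical sketch: linear regression fitted by batch gradient descent on MSE."""

    def __init__(self, learning_rate=0.01, epochs=1000):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.losses = []  # training MSE recorded once per epoch

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Standardize features so one learning rate suits every column (assumption)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero on constant columns
        Xs = (X - self.mean_) / self.std_
        n_samples, n_features = Xs.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.losses = []
        for _ in range(self.epochs):
            error = Xs @ self.weights + self.bias - y
            self.losses.append(np.mean(error ** 2))
            # Gradients of the MSE with respect to the weights and the bias
            grad_w = 2.0 / n_samples * (Xs.T @ error)
            grad_b = 2.0 * np.mean(error)
            self.weights -= self.learning_rate * grad_w
            self.bias -= self.learning_rate * grad_b
        return self

    def predict(self, X):
        Xs = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return Xs @ self.weights + self.bias

# Usage on synthetic data with a known linear relationship
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 3.0
model = GDLinearRegression(learning_rate=0.1, epochs=500).fit(X_demo, y_demo)
print(f"first MSE {model.losses[0]:.3f}, last MSE {model.losses[-1]:.2e}")
```

Plotting `model.losses` gives the same kind of "MSE vs Epochs" curve as the cell below.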

In [4]:
# Initialize and train the linear regression model
linear_model = LinearRegression(learning_rate=0.01, epochs=1000)
linear_model.fit(X, y_regression)

# Plot MSE vs epochs
plt.plot(linear_model.losses)
plt.title('MSE vs Epochs')
plt.xlabel('Epochs')
plt.ylabel('MSE')
plt.show()

## Logistic Regression

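In this section, we train the logistic regression model implemented from scratch (imported from `src.logistic_regression`) to predict whether `total_revenue` is above its median, and plot its training loss over 1,000 epochs.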
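The `LogisticRegression` class from `src.logistic_regression` is not shown here either. A minimal sketch of binary logistic regression trained by batch gradient descent on the mean binary cross-entropy follows; the class name `GDLogisticRegression`, the standardization step, and the `predict_proba` method are assumptions for illustration, and the project's class may not provide a probability method.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep np.exp from overflowing on large-magnitude inputs
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

class GDLogisticRegression:
    """Hypothetical sketch: binary logistic regression trained by batch gradient descent."""

    def __init__(self, learning_rate=0.01, epochs=1000):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.losses = []  # mean binary cross-entropy recorded once per epoch

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Standardize features so gradient descent behaves on unscaled data (assumption)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0
        Xs = (X - self.mean_) / self.std_
        n_samples, n_features = Xs.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.losses = []
        eps = 1e-12  # keeps log() finite when probabilities reach 0 or 1
        for _ in range(self.epochs):
            p = sigmoid(Xs @ self.weights + self.bias)
            loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
            self.losses.append(loss)
            # Gradient of the mean cross-entropy: X^T (p - y) / n for weights, mean(p - y) for bias
            error = p - y
            self.weights -= self.learning_rate * (Xs.T @ error) / n_samples
            self.bias -= self.learning_rate * np.mean(error)
        return self

    def predict_proba(self, X):
        Xs = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return sigmoid(Xs @ self.weights + self.bias)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)

# Usage on synthetic data whose label depends linearly on the features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
clf = GDLogisticRegression(learning_rate=0.1, epochs=1000).fit(X_demo, y_demo)
accuracy = np.mean(clf.predict(X_demo) == y_demo)
print(f"first loss {clf.losses[0]:.3f}, last loss {clf.losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

With all weights starting at zero, every predicted probability is 0.5, so the first recorded loss is ln 2 (about 0.693); a falling curve below that value is a quick sanity check on the training plot.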

In [5]:
# Initialize and train the logistic regression model
logistic_model = LogisticRegression(learning_rate=0.01, epochs=1000)
logistic_model.fit(X, y_classification)

# Plot training loss vs epochs
plt.plot(logistic_model.losses)
plt.title('Training Loss vs Epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()

## Model Evaluation

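In this section, we evaluate the logistic regression model with accuracy, precision, recall, F1 score, the confusion matrix, and ROC AUC. Because predictions are made on the same data the model was trained on, these scores measure training fit rather than performance on unseen data.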
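All of these classification metrics derive from the four confusion-matrix counts. The sketch below computes them by hand with NumPy on a small made-up label vector to make the definitions explicit; the function name `binary_metrics` is hypothetical. It also computes the area under the ROC curve obtained from hard 0/1 predictions, which works out to the mean of recall and specificity (balanced accuracy). That is what `roc_auc_score` reports when it receives predicted labels instead of probabilities.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Hypothetical helper: hand-computed binary classification metrics."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # With hard labels the ROC curve is (0,0) -> (FPR, TPR) -> (1,1);
    # summing the two trapezoids gives (TPR + 1 - FPR) / 2 = (recall + specificity) / 2
    fpr = 1 - specificity
    roc_auc_hard = fpr * recall / 2 + (1 - fpr) * (recall + 1) / 2
    return {
        "confusion": [[tn, fp], [fn, tp]],  # same layout as sklearn's confusion_matrix
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc_hard": roc_auc_hard,
    }

# Made-up labels: 4 positives, 6 negatives
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
m = binary_metrics(y_true_demo, y_pred_demo)
for name, value in m.items():
    print(name, value)
```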

In [6]:
# Evaluate models
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_auc_score

# Predictions
y_pred_class = logistic_model.predict(X)

# Calculate metrics
accuracy = accuracy_score(y_classification, y_pred_class)
precision = precision_score(y_classification, y_pred_class)
recall = recall_score(y_classification, y_pred_class)
f1 = f1_score(y_classification, y_pred_class)
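conf_matrix = confusion_matrix(y_classification, y_pred_class)
# ROC AUC on hard 0/1 predictions reduces to balanced accuracy; if the model
# exposes class probabilities, pass those instead for a more informative AUC.
roc_auc = roc_auc_score(y_classification, y_pred_class)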

# Display metrics
print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1 Score: {f1}')
print('Confusion Matrix:')
print(conf_matrix)
print(f'ROC AUC: {roc_auc}')
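The evaluation cell scores only the classifier. Regression metrics such as MSE, RMSE, and R² are the usual counterpart for the linear regression model. The NumPy helper below is a sketch; the name `regression_metrics` is hypothetical, and applying it to the notebook's model assumes that `linear_model` provides a `predict` method, which this notebook does not show.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MSE, RMSE, and R^2 for a regression model's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # R^2 compares the model's squared error with that of always predicting the mean
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": np.sqrt(mse), "r2": r2}

# Toy check with made-up numbers
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
# In the notebook (assumes a predict method exists):
# regression_metrics(y_regression, linear_model.predict(X))
```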

## Interactive Interface

In this section, we create an interactive ipywidgets interface using the `create_widgets` helper from `src.widgets_interface`.

In [7]:
# Create interactive widgets
create_widgets()

## Conclusion

In this notebook, we loaded and preprocessed the data, performed exploratory data analysis, and trained linear and logistic regression models implemented from scratch. We then evaluated the logistic regression classifier and created an interactive interface for further exploration.