## Fashion sales project

This project aims to gain insights into the fashion market and optimize the selling process by applying several machine-learning models to the available fashion sales data.

#### Workflow of the project
- Data collection
- Data checks
- Exploratory data analysis
- Data pre-processing
- Model training
- Selection of the best model

In [14]:
# Import all the packages needed to carry out the project
# --- Data visualization and data analysis ---
import matplotlib.pyplot as plt
#from mlxtend.plotting import plot_decision_regions
import seaborn as sns
import numpy as np
from scipy.stats import uniform
import pandas as pd
#import prince
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn import model_selection
from sklearn import preprocessing
from sklearn.utils import resample
from imblearn.over_sampling import SMOTE

# --- Machine learning models ---
from sklearn.ensemble import RandomForestClassifier
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from mlxtend.classifier import StackingCVClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV

%matplotlib inline

# Used to ignore warnings generated from StackingCVClassifier
import warnings
warnings.simplefilter('ignore')

In [3]:
# Load the dataset
df = pd.read_csv('data/mock_fashion_data_uk_us.csv')

#### 4) Data Pre-processing
- Separate the data into predictors (independent variables) and responses or targets (dependent variables)
- Apply the corresponding transformation to each variable

In [12]:
# Predictors
X = df.drop(columns=['Product Name', 'Price'])

# The response variable is the price of the clothes
y = df['Price']

In [13]:
# Check if it was done correctly
X.head()

Unnamed: 0,Brand,Category,Description,Rating,Review Count,Style Attributes,Total Sizes,Available Sizes,Color,Purchase History,Age,Fashion Magazines,Fashion Influencers,Season,Time Period Highest Purchase,Customer Reviews,Social Media Comments,feedback
0,Ralph Lauren,Footwear,Bad,1.421706,492,Streetwear,"M, L, XL",XL,Green,Medium,24,Vogue,Chiara Ferragni,Fall/Winter,Daytime,Mixed,Mixed,Other
1,Ted Baker,Tops,Not Good,1.037677,57,Vintage,"M, L, XL",XL,Black,Above Average,61,Glamour,Leandra Medine,Winter,Weekend,Negative,Neutral,Other
2,Jigsaw,Footwear,Very Bad,3.967106,197,Streetwear,"S, M, L",M,Blue,Average,27,Marie Claire,Gigi Hadid,Summer,Nighttime,Unknown,Negative,Neutral
3,Alexander McQueen,Outerwear,Not Good,2.844659,473,Formal,"S, M, L",L,Red,Very High,50,Marie Claire,Chiara Ferragni,Fall/Winter,Weekend,Neutral,Other,Other
4,Tommy Hilfiger,Bottoms,Very Good,1.183242,55,Sporty,"M, L, XL",S,Green,Above Average,23,Glamour,Song of Style,Spring,Daytime,Positive,Mixed,Positive


In [17]:
# Separate numerical and categorical features
num_features = X.select_dtypes(exclude="object").columns
cat_features = X.select_dtypes(include="object").columns

# Instantiate transformers
numeric_transformer = StandardScaler()
oh_transformer = OneHotEncoder()

# Create a Column Transformer with 2 types of transformers
preprocessor = ColumnTransformer([("OneHotEncoder", oh_transformer, cat_features),
                                  ("StandardScaler", numeric_transformer, num_features),])

In [18]:
# Apply transformations
X = preprocessor.fit_transform(X)
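
The cell above fits the preprocessor on the full dataset. If the data is later split for model training, a common alternative is to fit the transformers on the training split only and reuse them on the test split, so that no test-set statistics leak into the scaling. The sketch below illustrates this on a small toy frame. The names `toy`, `X_toy`, `toy_preprocessor` and the toy values are assumptions made for illustration and are not part of the project data. `handle_unknown="ignore"` is also an assumption: it lets the encoder skip categories that appear only in the test split instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the fashion dataset (illustrative values only)
toy = pd.DataFrame({
    "Brand": ["Jigsaw", "Ted Baker", "Jigsaw", "Ralph Lauren", "Ted Baker", "Jigsaw"],
    "Rating": [1.4, 1.0, 3.9, 2.8, 1.2, 4.5],
    "Age": [24, 61, 27, 50, 23, 35],
    "Price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
X_toy = toy.drop(columns=["Price"])
y_toy = toy["Price"]

# Hold out 2 of the 6 rows as a test split
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=2, random_state=0
)

# Column lists are written out explicitly for the toy frame
cat_cols = ["Brand"]
num_cols = ["Rating", "Age"]

toy_preprocessor = ColumnTransformer([
    # Categories unseen during fit are encoded as all zeros
    ("OneHotEncoder", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("StandardScaler", StandardScaler(), num_cols),
])

# Learn categories, means and variances from the training split only
X_train_t = toy_preprocessor.fit_transform(X_train)
# Reuse the fitted transformers on the test split
X_test_t = toy_preprocessor.transform(X_test)

print(X_train_t.shape, X_test_t.shape)
```

Both splits end up with the same number of columns because the column layout is fixed when the preprocessor is fitted.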

In [30]:
print([x for x in X[45489, :]])

[<1x107 sparse matrix of type '<class 'numpy.float64'>'
	with 18 stored elements in Compressed Sparse Row format>]
