# Clothes Size Predictions 🧶

#### If you like my work, an upvote on this notebook would be much appreciated!
#### If not, a comment on what I should work on and improve would be really helpful!

## Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter("ignore")

## Loading up the data

In [None]:
df = pd.read_csv("../input/clothessizeprediction/final_test.csv")
df.head()

In [None]:
df.isna().sum()

In [None]:
# Fill the missing values in age and height with each column's median
df['age'] = df['age'].fillna(df['age'].median())
df['height'] = df['height'].fillna(df['height'].median())

In [None]:
df.isna().sum()

In [None]:
df.describe()

In [None]:
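# Having a look at the correlation matrix of the numeric columns
# (`size` still holds string labels at this point, so it is left out)

fig, ax = plt.subplots(figsize=(8,6))
sns.heatmap(df.select_dtypes("number").corr(), annot=True, fmt='.1g', cmap="viridis", ax=ax);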

Clothes size depends much more strongly on weight than on height or age.

In [None]:
df["size"].value_counts()

In [None]:
sns.set_theme()  # replaces plt.style.use("seaborn"), which newer matplotlib releases reject
fig, ax = plt.subplots(figsize=(8,6))
sns.countplot(x=df["size"], palette="hls");

The most common size is `M` and the least common is `XXL`.

In [None]:
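sns.set_theme()  # replaces plt.style.use("seaborn"), which newer matplotlib releases reject
fig, ax = plt.subplots(figsize=(8,6))
# histplot with a KDE overlay stands in for the deprecated sns.distplot
sns.histplot(df["height"], kde=True, stat="density", color="r", ax=ax);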

Most people are between `160 cm and 175 cm` tall.

In [None]:
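sns.set_theme()  # replaces plt.style.use("seaborn"), which newer matplotlib releases reject
fig, ax = plt.subplots(figsize=(8,6))
# histplot with a KDE overlay stands in for the deprecated sns.distplot
sns.histplot(df["weight"], kde=True, stat="density", color="b", ax=ax);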

Most people appear to weigh between `50 and 70 kg`.

In [None]:
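sns.set_theme()  # replaces plt.style.use("seaborn"), which newer matplotlib releases reject
fig, ax = plt.subplots(figsize=(8,6))
# histplot with a KDE overlay stands in for the deprecated sns.distplot
sns.histplot(df["age"], kde=True, stat="density", color="darkorange", ax=ax);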

Most people appear to be between `30 and 40 years` old.

In [None]:
df["size"].value_counts()

## Mapping the size of clothes to make the dataset have numeric values

* **XXS** : 1

* **S** : 2

* **M** : 3

* **L** : 4

* **XL** : 5

* **XXL** : 6

* **XXXL** : 7

In [None]:
df['size'] = df['size'].map({'XXS': 1, 'S': 2, "M" : 3, "L" : 4, "XL" : 5, "XXL" : 6, "XXXL" : 7})
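
One thing worth checking after a dictionary mapping like this: `Series.map` silently turns any label that is not a key in the dictionary into `NaN`. The sketch below shows that behaviour on a few made-up labels (the `sizes` series and the `XS` label are hypothetical and not taken from the dataset) and lists the labels that were missed.

```python
import pandas as pd

# Hypothetical toy labels; "XS" is deliberately absent from the mapping.
sizes = pd.Series(["M", "S", "XXL", "XS", "L"])

size_map = {"XXS": 1, "S": 2, "M": 3, "L": 4, "XL": 5, "XXL": 6, "XXXL": 7}

mapped = sizes.map(size_map)

# Series.map yields NaN for every value that is not a key in the dictionary,
# so the original labels at those positions are the ones the mapping missed.
unmapped = sizes[mapped.isna()].unique().tolist()
print("Labels missing from the mapping:", unmapped)
```

An empty list here means every label was covered; on the real data the same check can be run against the raw `size` column before it is overwritten.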

In [None]:
# Having a look at the dataset after the numerical transformation
df.head()
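
Now that `size` is numeric, the earlier observation about weight can be checked directly by correlating each feature with `size`. The block below is a minimal sketch on a small made-up frame (the `toy` values are invented for illustration and are not rows from the dataset); on the real data the same expression would be applied to `df`.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the notebook's data.
toy = pd.DataFrame({
    "weight": [50, 55, 60, 65, 70, 80, 90],
    "age":    [31, 45, 28, 39, 52, 33, 41],
    "height": [165, 160, 172, 168, 170, 175, 171],
    "size":   [1, 2, 3, 3, 4, 5, 6],
})

# Pearson correlation of every feature with the numeric size,
# sorted so the strongest positive relationship comes first.
corr_with_size = toy.corr()["size"].drop("size").sort_values(ascending=False)
print(corr_with_size)
```

Because `size` is an ordinal label rather than a true measurement, `toy.corr(method="spearman")` may be the more appropriate choice; both variants are available in pandas.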

## Splitting the data into training and test datasets
Here, we are trying to predict a person's clothes size from the given data. Hence, `size` will be the y label and the remaining columns will be X, the input data.

In [None]:
# X data
X = df.drop("size", axis=1)

In [None]:
X.head()

In [None]:
# y data
y = df["size"]
y.head()

In [None]:
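# Splitting the data into X train, X test and y train, y test
# (random_state fixes the shuffle so the split, and the scores below, are reproducible)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)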

In [None]:
len(X_train), len(X_test)
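
Since the size classes are imbalanced (`M` is far more common than `XXL`), a stratified split may be worth considering so that the rare sizes appear in both parts in the same proportion. The sketch below uses a made-up two-class frame (`toy`, `X_toy`, `y_toy` are hypothetical names) to show the `stratify` argument of `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows, two size classes with 10 rows each.
toy = pd.DataFrame({
    "weight": range(50, 70),
    "size": [1] * 10 + [2] * 10,
})
X_toy = toy.drop("size", axis=1)
y_toy = toy["size"]

# stratify=y_toy keeps the class proportions equal in the train and test parts;
# random_state makes the shuffle repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(len(X_tr), len(X_te))
print(y_te.value_counts().to_dict())
```

With 20 rows and `test_size=0.2`, the test part holds 4 rows, 2 from each class.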

## Training the model

### Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression
clf = LinearRegression()

In [None]:
clf.fit(X_train,y_train)

In [None]:
clf.predict(X_test)

In [None]:
# For a regressor, .score() returns R^2 (coefficient of determination), not classification accuracy
LinearRegressionScore = clf.score(X_test,y_test)
print("R^2 score obtained by Linear Regression model:",LinearRegressionScore)
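
The score of a `LinearRegression` model is R², so it is not directly comparable with the accuracy reported by the classifiers. One possible way to put it on the same footing is to round each prediction to the nearest size code, clip it to the valid range 1 to 7, and then compute accuracy. The sketch below does this on made-up numbers (`X_toy`, `y_toy` and `reg` are hypothetical names, and the model is scored on its own training points only to keep the example short).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy data: one feature (weight) and a numeric size code.
X_toy = np.array([[50], [55], [60], [65], [70], [75], [80], [85]])
y_toy = np.array([1, 2, 2, 3, 4, 5, 5, 7])

reg = LinearRegression().fit(X_toy, y_toy)

# R^2: the share of variance in the size codes explained by the model.
r2 = reg.score(X_toy, y_toy)

# Turn the continuous predictions into size codes: round to the nearest
# integer, then clip to the valid label range 1..7.
pred_labels = np.clip(np.rint(reg.predict(X_toy)), 1, 7).astype(int)

# Accuracy: the fraction of rounded predictions that match the true code.
acc = accuracy_score(y_toy, pred_labels)

print("R^2:", r2)
print("Accuracy after rounding:", acc)
```

In the notebook itself the same two lines would be applied to `clf.predict(X_test)` and `y_test`, which gives a number that can sit next to the classifier accuracies.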

### Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

In [None]:
model.fit(X_train,y_train)

In [None]:
model.predict(X_test)

In [None]:
np.array(y_test)

In [None]:
RandomForestClassifierScore = model.score(X_test,y_test)
print("Accuracy obtained by Random Forest Classifier model:",RandomForestClassifierScore*100)

### KNeighborsClassifier

In [None]:
from sklearn.neighbors import KNeighborsClassifier
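# The first positional argument is n_neighbors (not a random seed), so name it explicitly
clf1 = KNeighborsClassifier(n_neighbors=42)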

In [None]:
clf1.fit(X_train,y_train)

In [None]:
clf1.predict(X_test)

In [None]:
np.array(y_test)

In [None]:
KNeighborsClassifierScore = clf1.score(X_test,y_test)
print("Accuracy obtained by K Neighbors Classifier model:",KNeighborsClassifierScore*100)

### DecisionTreeClassifier

In [None]:
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier()

In [None]:
tree.fit(X_train,y_train)

In [None]:
DecisionTreeClassifierScore = tree.score(X_test,y_test)
print("Accuracy obtained by Decision Tree Classifier model:",DecisionTreeClassifierScore*100)

## Comparing the performance of the models

In [None]:
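sns.set_theme()  # replaces plt.style.use("seaborn"), which newer matplotlib releases reject

# Separate names so the target variable `y` is not overwritten
model_names = ["Random Forest Classifier", 
               "K Neighbors Classifier", 
               "Decision Tree Classifier", 
               "Linear Regression"]

# Note: the classifiers report accuracy, while the Linear Regression score is R^2
scores = [RandomForestClassifierScore, 
          KNeighborsClassifierScore, 
          DecisionTreeClassifierScore, 
          LinearRegressionScore]

fig, ax = plt.subplots(figsize=(8,6))
sns.barplot(x=model_names, y=scores, palette="crest");
plt.ylabel("Test score (accuracy; R^2 for Linear Regression)")
plt.xticks(rotation=40)
plt.title("Model Comparison - Test Scores");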