# <font color=red>EDA & Visualizations</font>

## <font color=red>Introduction</font>

 > H&M Group is a family of brands and businesses with 53 online markets and approximately 4,850 stores. Our online store offers shoppers an extensive selection of products to browse through. But with too many choices, customers might not quickly find what interests them or what they are looking for, and ultimately, they might not make a purchase. To enhance the shopping experience, product recommendations are key. More importantly, helping customers make the right choices also has positive implications for sustainability, as it reduces returns and thereby minimizes emissions from transportation.

In this competition, H&M Group invites you to develop product recommendations based on data from previous transactions, as well as from customer and product metadata. The available metadata ranges from simple data, such as garment type and customer age, to text data from product descriptions and image data from garment images.

There are no preconceptions about what information may be useful – that is for you to find out. If you want to investigate a categorical data type algorithm, or dive into NLP and image processing deep learning, that is up to you.

## <font color=red>Data Description</font>

* For this challenge you are given the purchase history of customers across time, along with supporting metadata.
* Your challenge is to predict which articles each customer will purchase in the 7-day period immediately after the training data ends.
* Customers who did not make any purchase during that time are excluded from the scoring.
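
The 7-day target window suggests a matching local validation setup. The following is a minimal sketch, not the competition's official procedure: it holds out the last 7 days of a toy transactions table. The column names `t_dat` and `customer_id`, the helper `split_last_week`, and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for transactions_train (hypothetical values and assumed column names)
toy_trans = pd.DataFrame({
    "t_dat": pd.to_datetime(
        ["2020-09-01", "2020-09-10", "2020-09-16", "2020-09-20", "2020-09-22"]
    ),
    "customer_id": ["a", "b", "a", "c", "b"],
    "article_id": [101, 102, 103, 101, 104],
})

def split_last_week(df, date_col="t_dat", days=7):
    """Hold out the final `days` days of `df` as a validation window."""
    cutoff = df[date_col].max() - pd.Timedelta(days=days)
    train = df[df[date_col] <= cutoff]
    valid = df[df[date_col] > cutoff]
    return train, valid

train_part, valid_part = split_last_week(toy_trans)

# Ground truth per customer: the articles bought in the held-out week.
# Customers without a purchase in that week simply do not appear.
valid_truth = valid_part.groupby("customer_id")["article_id"].apply(list)
print(valid_truth)
```

Customers who bought nothing in the held-out week drop out of `valid_truth` automatically, which mirrors the scoring rule in the last bullet.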

### Three important steps to keep in mind are:
1- Understand the data 

2- Clean the data

3- Find relationships in the data
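
The first step can be condensed into one overview table per DataFrame. This is a minimal sketch; the helper name `summarize` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd

def summarize(df):
    """Per-column overview: dtype, missing count, missing share, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isnull().sum(),
        "pct_missing": (df.isnull().mean() * 100).round(1),
        "n_unique": df.nunique(),
    })

# Hypothetical toy data standing in for one of the competition tables
toy = pd.DataFrame({
    "age": [25.0, None, 40.0, 25.0],
    "status": ["ACTIVE", "ACTIVE", None, "PRE-CREATE"],
})

overview = summarize(toy)
print(overview)
```

Calling `summarize(articles)` or `summarize(customers)` would give the same overview for the real tables, assuming they fit in memory.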

In [None]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
articles = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/articles.csv')
customers = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/customers.csv')
transactions_train = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv')
sample_submission = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/sample_submission.csv")

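One detail worth checking after loading: `pd.read_csv` infers integer types for all-digit ID columns, which silently drops leading zeros. The sketch below assumes that `article_id` values carry a leading zero and are 10 characters wide; the two-row CSV is a hypothetical stand-in for `articles.csv`.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for articles.csv
toy_csv = "article_id,product_code\n0108775015,108775\n0110065001,110065\n"

as_int = pd.read_csv(io.StringIO(toy_csv))
as_str = pd.read_csv(io.StringIO(toy_csv), dtype={"article_id": str})

print(as_int["article_id"].iloc[0])  # parsed as an integer, leading zero lost
print(as_str["article_id"].iloc[0])  # parsed as text, leading zero kept

# An integer column can be padded back to a fixed width (10 characters assumed)
restored = as_int["article_id"].astype(str).str.zfill(10)
```

Keeping the integer form is fine for numeric exploration; the string form matters mainly when IDs have to be written back out unchanged.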
In [None]:
art = articles
art

In [None]:
articles.info()


In [None]:
cust = customers
cust

In [None]:
customers.info()


In [None]:
trans = transactions_train
trans

In [None]:
transactions_train.info()

In [None]:
sample = sample_submission
sample

In [None]:
sample.info()

In [None]:
art.head()


In [None]:
art.tail()

In [None]:
cust.head()


In [None]:
trans.head()

In [None]:
sample.head()

In [None]:
art.shape

In [None]:
cust.shape

In [None]:
trans.shape

In [None]:
art.describe()

In [None]:
cust.describe()

In [None]:
trans.describe()

In [None]:
sample.describe()

In [None]:
# Call the method; without parentheses only the bound method object is displayed
art.nunique()

In [None]:
cust.nunique()

In [None]:
trans.nunique()

In [None]:
art.columns

In [None]:
cust.columns

In [None]:
trans.columns

# Cleaning and Filtering the data
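Dropping a column or filling every gap with one constant treats all columns alike. As a complement, here is a minimal sketch of per-column handling. It assumes that a missing `FN` or `Active` flag means the flag is off and that the median is an acceptable stand-in for a missing age; both are assumptions about the data, and the toy frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the customers table
toy_cust = pd.DataFrame({
    "FN": [1.0, np.nan, 1.0, np.nan],
    "Active": [1.0, np.nan, np.nan, np.nan],
    "age": [24.0, np.nan, 51.0, 33.0],
})

cleaned = toy_cust.copy()

# Assumption: a missing flag means the flag is off.
cleaned[["FN", "Active"]] = cleaned[["FN", "Active"]].fillna(0).astype(int)

# Assumption: the median is an acceptable stand-in for a missing age.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(cleaned)
```

Unlike a blanket `fillna(0)`, this keeps the age distribution free of artificial zeros.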

In [None]:
# find missing values inside
art.isnull().sum()

In [None]:
# Drop the detail_desc column, which contains missing values (cleaning data)
art_clean = art.drop(['detail_desc'], axis=1)
art_clean.head()

In [None]:
art_clean.isnull().sum()

In [None]:
art_clean = art_clean.dropna()

In [None]:
art_clean.shape

In [None]:
art_clean.isnull().sum()

In [None]:
art_clean.shape

In [None]:
art.shape

In [None]:
art_clean['article_id'].value_counts()

In [None]:
art.describe()

In [None]:
cust.isnull().sum()

In [None]:
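# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the replacement
sns.histplot(cust["age"], kde=True)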

In [None]:
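# Preview only: fillna returns a new DataFrame and the result is not assigned,
# so cust itself still contains its missing values
cust.fillna(0)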

In [None]:
# Drop the FN column, which contains missing values (cleaning data)
cust_clean = cust.drop(['FN'], axis=1)
cust_clean.head()

In [None]:
# Outlier removal: check the mean age first
cust_clean["age"].mean()


In [None]:
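# Keep customers younger than 50; rows with a missing age are dropped too,
# because NaN < 50 evaluates to False
cust_clean = cust_clean[cust_clean['age'] < 50]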
cust_clean.head()

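The cutoff of 50 above is a fixed threshold. A common data-driven alternative is the interquartile-range (IQR) rule; the sketch below swaps in that technique on hypothetical ages and is not what this notebook uses. The helper name `iqr_bounds` and the multiplier `k=1.5` are illustrative choices.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (low, high) limits at k * IQR below Q1 and above Q3."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical ages with one clear outlier
toy_age = pd.Series([18, 22, 25, 27, 30, 33, 36, 40, 99], name="age")

low, high = iqr_bounds(toy_age)
kept = toy_age[toy_age.between(low, high)]

print(low, high)
print(kept.tolist())
```

Whether an age above the upper limit is an error or a genuine customer is a judgment call, so it is worth inspecting the flagged rows before dropping them.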




In [None]:
cust_clean.shape

In [None]:
cust_clean["age"].mean()

In [None]:
sns.histplot(cust_clean["age"], kde=True)

In [None]:
sns.boxplot(x="Active", y="age", data=cust_clean)

In [None]:
cust_clean.head()

In [None]:
cust_clean.boxplot()

In [None]:
# NaN < 300 evaluates to False, so rows with a missing Active value are dropped here as well
cust_clean = cust_clean[cust_clean['Active'] < 300]
cust_clean.boxplot()

In [None]:
sns.histplot(cust_clean['Active'], kde=True)

In [None]:
cust_clean.hist()

In [None]:
# pd.value_counts is deprecated; use the Series method instead
cust_clean["Active"].value_counts().plot.bar()

In [None]:
# numeric_only=True avoids a TypeError on string columns in pandas 2.x
cust_clean.groupby(["Active", "age"]).mean(numeric_only=True)

In [None]:
cust.groupby(["FN", "Active", "age"]).mean(numeric_only=True)

In [None]:
cust.head()

In [None]:
cust.tail()

In [None]:
# find missing values inside
trans.isnull().sum()

In [None]:
trans.describe()

In [None]:
trans.head()

# Relationship
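Pearson correlation, as computed by `corr()`, only applies to numeric columns, and for identifier-like codes it mostly reflects how the IDs were numbered. For two categorical columns, a contingency table is often more informative. The sketch below shows both on a hypothetical toy table; the category column names and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative assumptions
toy_art = pd.DataFrame({
    "article_id": [1001, 1002, 1003, 1004, 1005],
    "section_no": [10, 10, 20, 20, 30],
    "index_group_name": ["Ladieswear", "Ladieswear", "Menswear", "Menswear", "Sport"],
    "product_group_name": [
        "Garment Upper body", "Garment Lower body",
        "Garment Upper body", "Garment Upper body", "Shoes",
    ],
})

# Correlation matrix over the numeric columns only
numeric_corr = toy_art.select_dtypes("number").corr()

# Contingency table: how often each pair of categories occurs together
pair_counts = pd.crosstab(toy_art["index_group_name"], toy_art["product_group_name"])

print(numeric_corr)
print(pair_counts)
```

A table like `pair_counts` can be passed to `sns.heatmap` in the same way as a correlation matrix.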

In [None]:
# Restrict to numeric columns; string columns raise an error in pandas 2.x
art_clean.corr(numeric_only=True)

In [None]:
cust_clean.corr(numeric_only=True)

In [None]:
corr_art_clean = art_clean.corr(numeric_only=True)

In [None]:
corr_cust_clean = cust_clean.corr(numeric_only=True)

In [None]:
sns.heatmap(corr_art_clean, annot=True)

In [None]:
sns.heatmap(corr_cust_clean, annot=True)

In [None]:
sns.relplot(x="article_id", y="section_no", hue="product_code", data=art_clean)