<span style="font-family:cursive; font-size:25px; color:crimson;"> Introduction </span>

H&M Group is a family of brands and businesses with 53 online markets and approximately 4,850 stores. Our online store offers shoppers an extensive selection of products to browse through. But with too many choices, customers might not quickly find what interests them or what they are looking for, and ultimately, they might not make a purchase. To enhance the shopping experience, product recommendations are key. More importantly, helping customers make the right choices also has positive implications for sustainability, as it reduces returns and thereby minimizes emissions from transportation.

In this competition, H&M Group invites you to develop product recommendations based on data from previous transactions, as well as from customer and product metadata. The available metadata ranges from simple data, such as garment type and customer age, to text data from product descriptions, to image data from garment images.

There are no preconceptions about what information may be useful; that is for you to find out. If you want to investigate a categorical data type algorithm, or dive into NLP and image processing deep learning, that is up to you.

<img src="https://www.ecotextile.com/images/stories/2021/November/HnM.jpg" width=500></img>

<span style="font-family:cursive; font-size:25px; color:crimson;"> Data Description </span> 

For this challenge you are given the purchase history of customers across time, along with supporting metadata. Your challenge is to predict what articles each customer will purchase in the 7-day period immediately after the training data ends. Customers who did not make any purchase during that time are excluded from the scoring.
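
Because the target is the 7 days right after the training data ends, one common way to evaluate ideas locally is to hold out the last 7 days of the training transactions as a validation window. The sketch below shows that split on a toy frame. The column names `t_dat`, `customer_id` and `article_id` are assumptions here (check them against `transaction_train.head()`), and `split_last_days` is a hypothetical helper, not part of the competition material.

```python
import pandas as pd

# Toy stand-in for transactions_train.csv. The column names are assumed,
# not confirmed by this notebook.
toy = pd.DataFrame({
    "t_dat": ["2020-09-01", "2020-09-10", "2020-09-16", "2020-09-20", "2020-09-22"],
    "customer_id": ["a", "b", "a", "c", "b"],
    "article_id": [111, 222, 333, 111, 444],
})
toy["t_dat"] = pd.to_datetime(toy["t_dat"])


def split_last_days(df, days=7, date_col="t_dat"):
    """Hold out the final `days` calendar days as a validation window."""
    cutoff = df[date_col].max() - pd.Timedelta(days=days)
    train = df[df[date_col] <= cutoff]   # everything up to the cutoff
    valid = df[df[date_col] > cutoff]    # the last `days` days
    return train, valid


train_part, valid_part = split_last_days(toy)
print(len(train_part), "training rows,", len(valid_part), "validation rows")
```

On the real data the same call would be `split_last_days(transaction_train)` after converting the date column with `pd.to_datetime`.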

<span style="font-family:cursive; font-size:25px; color:crimson;"> Files </span>


* images/ - a folder of images corresponding to each article_id; images are placed in subfolders starting with the first three digits of the article_id; note, not all article_id values have a corresponding image.
* articles.csv - detailed metadata for each article_id available for purchase
* customers.csv - metadata for each customer_id in the dataset
* sample_submission.csv - a sample submission file in the correct format
* transactions_train.csv - the training data, consisting of the purchases of each customer on each date, as well as additional information. Duplicate rows correspond to multiple purchases of the same item. Your task is to predict the article_ids each customer will purchase during the 7-day period immediately after the training data period.
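
The subfolder rule for `images/` can be sketched as a small path helper. This is only an illustration: `image_path` is a hypothetical function, and the zero-padding width of 10 and the `.jpg` extension are assumptions to adjust after looking at the actual folder. Padding matters because pandas reads numeric-looking IDs as integers by default, which drops any leading zeros.

```python
from pathlib import Path


def image_path(article_id, root="images", width=10, ext=".jpg"):
    """Build the expected image path for an article_id.

    width=10 and ext=".jpg" are assumptions for illustration only.
    """
    aid = str(article_id).zfill(width)       # restore leading zeros lost by int parsing
    return Path(root) / aid[:3] / f"{aid}{ext}"  # subfolder = first three digits


p = image_path(108775015)  # example id
print(p.as_posix())
```

Since not every article_id has an image, it is worth checking `p.exists()` before trying to open the file.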

NOTE: You must make predictions for all customer_id values found in the sample submission. All customers who made purchases during the test period are scored, regardless of whether they had purchase history in the training data.
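
Since every customer_id in the sample submission needs a prediction, including customers with no purchase history, a simple fallback is to pad each customer's list with globally popular articles. The sketch below is one possible baseline under stated assumptions: the column names `customer_id`, `article_id` and `prediction`, the space-separated prediction string, and the list length `k` are all assumed here, and `fill_predictions` is a hypothetical helper.

```python
import pandas as pd

# Toy data; column names and the space-separated format are assumptions.
tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b", "b"],
    "article_id": ["0001", "0002", "0002", "0003", "0002", "0003"],
})
sub = pd.DataFrame({"customer_id": ["a", "b", "c"]})  # "c" has no history


def fill_predictions(sub, tx, k=3):
    """Give every customer k article ids: own purchases first, then popular ones."""
    popular = tx["article_id"].value_counts().index.tolist()  # most bought first
    own = tx.groupby("customer_id")["article_id"].unique().to_dict()
    rows = []
    for cid in sub["customer_id"]:
        seen = list(own.get(cid, []))                      # empty for new customers
        padded = seen + [a for a in popular if a not in seen]
        rows.append(" ".join(padded[:k]))
    return sub.assign(prediction=rows)


out = fill_predictions(sub, tx)
print(out)
```

Customer `c` never appears in the toy transactions and still receives a full list, which is the point of the fallback.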

<span style="font-family:cursive; font-size:25px; color:crimson;"> Going through data </span>

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Shared input folder for all competition files
DATA_DIR = "../input/h-and-m-personalized-fashion-recommendations"

article = pd.read_csv(f"{DATA_DIR}/articles.csv")
customer = pd.read_csv(f"{DATA_DIR}/customers.csv")
sample_submission = pd.read_csv(f"{DATA_DIR}/sample_submission.csv")
transaction_train = pd.read_csv(f"{DATA_DIR}/transactions_train.csv")

In [None]:
article.head()

In [None]:
customer.head()

In [None]:
sample_submission.head()

In [None]:
transaction_train.head()

In [None]:
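import plotly.express as px
# Take labels and counts from the same value_counts() result so they stay aligned:
# unique() returns order of appearance, while value_counts() sorts by frequency.
count = article.index_group_name.value_counts()
fig = px.pie(values=count.values, names=count.index,
             color_discrete_sequence=px.colors.sequential.Purpor)
fig.show()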

In [None]:
# Labels and counts come from the same value_counts() result,
# so each bar is paired with its own garment group name.
sizes = article.garment_group_name.value_counts()
fig = px.bar(x=sizes.index, y=sizes.values, color=sizes.index,
             color_discrete_sequence=px.colors.sequential.Plotly3)
fig.update_layout(
    title="Count of Garment Group Name",
    xaxis_title="Garment Group Name",
    yaxis_title="Count"
)
fig.show()

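In the original run, Jersey Basic appears with the highest count and Dresses/Skirts girls with the lowest. That ranking only holds if each label is paired with its own count, so it is worth confirming against `article.garment_group_name.value_counts()`.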

In [None]:
# Labels and counts come from the same value_counts() result,
# so each bar is paired with its own index name.
sizes = article.index_name.value_counts()
fig = px.bar(x=sizes.index, y=sizes.values, color=sizes.index,
             color_discrete_sequence=px.colors.diverging.Picnic)
fig.update_layout(
    title="Count of Index Name",
    xaxis_title="index_name",
    yaxis_title="Count"
)
fig.show()

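In the original run of the graph above, Ladieswear has the highest count and Children Accessories, Swimwear the lowest. The ranking depends on each label being paired with its own count, so confirm it against `article.index_name.value_counts()`.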

<span style="font-family:cursive; font-size:25px; color:crimson;"> Sunburst Chart </span>

Sunburst plots visualize hierarchical data spanning outwards radially from root to leaves. The root starts from the center and children are added to the outer rings.

In [None]:
fig = px.sunburst(article, path=['index_group_name', 'index_name', 'garment_group_name'],width=800,
    height=800,color_discrete_sequence=px.colors.cyclical.Edge)
fig.show()

**<span style="color:purple;"> Click on a parent segment (Ladieswear, Sport, Divided, etc.) to zoom into it and explore its children interactively.</span>**

In the graph above, each color represents a different section.

The parent sits in the middle, and each outer ring subdivides it further.

For example, Baby/Children in the middle is the parent, and it is further divided into the different children sizes, baby sizes, and children accessories and swimwear.
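
The parent and child relationship that the sunburst draws can also be read as a plain table by grouping on the same columns. The sketch below uses a few made-up rows rather than the real `article` frame. The column names match the sunburst `path`, while the `n_articles` column name is just an illustrative choice.

```python
import pandas as pd

# Toy rows standing in for the `article` frame (values chosen for illustration).
toy_articles = pd.DataFrame({
    "index_group_name": ["Baby/Children", "Baby/Children", "Baby/Children", "Ladieswear"],
    "index_name": ["Baby Sizes 50-98", "Children Sizes 92-140", "Baby Sizes 50-98", "Ladieswear"],
})

# Count articles per (parent, child) pair: the same numbers a sunburst ring encodes.
hier = (
    toy_articles
    .groupby(["index_group_name", "index_name"])
    .size()
    .reset_index(name="n_articles")
)
print(hier)
```

On the real data, replacing `toy_articles` with `article` and adding `garment_group_name` to the group keys gives the counts for all three rings.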

I am on the lookout for more cool visualizations. If I find any, I will update this notebook or make a new one :)