In [2]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt

# Women’s Clothing E-Commerce

This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by customers.

This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review, and includes the variables:

- **Clothing ID**: Integer Categorical variable that refers to the specific piece being reviewed.
- **Age**: Positive Integer variable of the reviewer's age.
- **Title**: String variable for the title of the review.
- **Review Text**: String variable for the review body.
- **Rating**: Positive Ordinal Integer variable for the product score granted by the customer, from 1 (worst) to 5 (best).
- **Recommended IND**: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
- **Positive Feedback Count**: Positive Integer documenting the number of other customers who found this review positive.
- **Division Name**: Categorical name of the product's high-level division.
- **Department Name**: Categorical name of the product's department.
- **Class Name**: Categorical name of the product's class.
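The value ranges described in the list above can be checked directly once the data is loaded. The following is a minimal sketch on a small hand-typed sample rather than the full dataset; the `toy_df` and `checks` names are illustrative assumptions and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Small hand-typed sample (an assumption for illustration, not the full dataset).
toy_df = pd.DataFrame({
    "Age": [33, 34, 60],
    "Rating": [4, 5, 3],
    "Recommended IND": [1, 1, 0],
})

# One boolean per documented constraint.
checks = {
    "Age is positive": (toy_df["Age"] > 0).all(),
    "Rating is between 1 and 5": toy_df["Rating"].between(1, 5).all(),
    "Recommended IND is 0 or 1": toy_df["Recommended IND"].isin([0, 1]).all(),
}

for name, passed in checks.items():
    print(f"{name}: {passed}")
```

Running the same checks on the real `df` would flag any rows that do not match the column descriptions.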

### Questions:
- Which product classes received the most ratings?
- Which product classes have the most reviews?
- What are the top five products that reviewers recommend to other people?


### Steps of the project:
1. Load Dataset
2. Explore Dataset
3. Cleaning Dataset
4. Analysis and Visualization



## Load Dataset

In [3]:
# Read the dataset
df = pd.read_csv("Womens Clothing E-Commerce Reviews.csv")
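An `Unnamed: 0` column usually appears when a saved row index is read back as ordinary data. As a hedged alternative to dropping it later, `pd.read_csv` accepts `index_col=0` to treat the first column as the index. The sketch below uses a tiny in-memory CSV with made-up rows (`toy_csv` and `toy_df` are illustrative names), since it assumes nothing about the real file beyond an unnamed first column.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV (made-up rows); the first column is an
# unnamed saved index, which is what produces "Unnamed: 0" by default.
toy_csv = io.StringIO(
    ",Clothing ID,Age,Rating\n"
    "0,767,33,4\n"
    "1,1080,34,5\n"
)

# index_col=0 uses the first column as the row index instead of
# loading it as a data column.
toy_df = pd.read_csv(toy_csv, index_col=0)

print(toy_df.columns.tolist())
```

This only handles a leading index column; any other empty or unnamed columns would still need to be dropped explicitly.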

## Explore Dataset

In [6]:
# Show the first five rows
df.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Unnamed: 11,Unnamed: 12
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,,
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,,
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,,
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,,
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,,


In [7]:
df.tail()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Unnamed: 11,Unnamed: 12
23481,23481,1104,34,Great dress for many occasions,I was very happy to snag this dress at such a ...,5,1,0,General Petite,Dresses,Dresses,,
23482,23482,862,48,Wish it was made of cotton,"It reminds me of maternity clothes. soft, stre...",3,1,0,General Petite,Tops,Knits,,
23483,23483,1104,31,"Cute, but see through","This fit well, but the top was very see throug...",3,0,1,General Petite,Dresses,Dresses,,
23484,23484,1084,28,"Very cute dress, perfect for summer parties an...",I bought this dress for a wedding i have this ...,3,1,2,General,Dresses,Dresses,,
23485,23485,1104,52,Please make more like this one!,This dress in a lovely platinum is feminine an...,5,1,22,General Petite,Dresses,Dresses,,


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23486 entries, 0 to 23485
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Unnamed: 0               23486 non-null  int64  
 1   Clothing ID              23486 non-null  int64  
 2   Age                      23486 non-null  int64  
 3   Title                    19676 non-null  object 
 4   Review Text              22641 non-null  object 
 5   Rating                   23486 non-null  int64  
 6   Recommended IND          23486 non-null  int64  
 7   Positive Feedback Count  23486 non-null  int64  
 8   Division Name            23472 non-null  object 
 9   Department Name          23472 non-null  object 
 10  Class Name               23472 non-null  object 
 11  Unnamed: 11              0 non-null      float64
 12  Unnamed: 12              1 non-null      object 
dtypes: float64(1), int64(6), object(6)
memory usage: 2.3+ MB
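The non-null counts from `df.info()` can also be turned into an explicit per-column summary of missing values. The following is a minimal sketch on a made-up frame (`toy_df`, `missing`, and `summary` are illustrative names); the same expressions should apply to the real `df`.

```python
import pandas as pd

# Made-up frame with a few gaps, standing in for the real dataset.
toy_df = pd.DataFrame({
    "Title": ["Nice", None, "Cute", None],
    "Rating": [4, 5, 3, 5],
    "Class Name": ["Dresses", "Knits", None, "Pants"],
})

# Count missing values per column and express them as a share of all rows.
missing = toy_df.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(toy_df) * 100).round(1),
})

# Show only the columns that actually have missing values.
print(summary[summary["missing"] > 0])
```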


## Cleaning Dataset
In this step, we clean the data: we identify the rows and columns that contain missing values and delete the columns and rows that we do not need.

In [10]:
# Drop the columns that are not needed for the analysis
df.drop(['Unnamed: 0', 'Title', 'Review Text', 'Unnamed: 11', 'Unnamed: 12', 'Positive Feedback Count'], axis='columns', inplace=True)

In [12]:
df.head()

Unnamed: 0,Clothing ID,Age,Rating,Recommended IND,Division Name,Department Name,Class Name
0,767,33,4,1,Initmates,Intimate,Intimates
1,1080,34,5,1,General,Dresses,Dresses
2,1077,60,3,0,General,Dresses,Dresses
3,1049,50,5,1,General Petite,Bottoms,Pants
4,847,47,5,1,General,Tops,Blouses


In [37]:
# Create a DataFrame of the rows that have a missing value in the Department Name column
data = df[df['Department Name'].isnull()]

In [20]:
data.head(50)

Unnamed: 0,Clothing ID,Age,Rating,Recommended IND,Division Name,Department Name,Class Name
9444,72,25,5,1,,,
13767,492,23,5,1,,,
13768,492,49,5,1,,,
13787,492,48,5,1,,,
16216,152,36,5,1,,,
16221,152,37,5,1,,,
16223,152,39,5,1,,,
18626,184,34,5,1,,,
18671,184,54,5,1,,,
20088,772,50,5,1,,,


In [38]:
# Drop the rows that contain missing values
df.drop(index=data.index, inplace=True)

In [30]:
# Show the number of null values in each column
df.isnull().sum()

Clothing ID        0
Age                0
Rating             0
Recommended IND    0
Division Name      0
Department Name    0
Class Name         0
dtype: int64
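The row-dropping step above selects the incomplete rows first and then drops them by index. An equivalent and more compact approach is `DataFrame.dropna` with a `subset` of columns. The sketch below is an alternative under stated assumptions, not the method used in this notebook, and it runs on a made-up frame (`toy_df` and `cleaned` are illustrative names).

```python
import pandas as pd

# Made-up frame: two rows lack the division, department, and class names.
toy_df = pd.DataFrame({
    "Clothing ID": [72, 1080, 492],
    "Rating": [5, 5, 5],
    "Division Name": [None, "General", None],
    "Department Name": [None, "Dresses", None],
    "Class Name": [None, "Dresses", None],
})

# Drop every row that is missing any of the three categorical names.
# dropna returns a new frame, so the original stays unchanged.
cleaned = toy_df.dropna(subset=["Division Name", "Department Name", "Class Name"])

print(cleaned.isnull().sum())
```

Returning a new frame instead of using `inplace=True` keeps the original data available if a cell has to be re-run.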

## Analysis and Visualization

### Q1: Which product classes received the most ratings?