# Similar-Products

### 1. Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import os


### 2. Loading and understanding Data 

In [2]:
# Loading the Data
data = pd.read_json('C:\\Users\\YASH\\Downloads\\tops_fashion.json')

In [3]:
# Shape of the data
print('Number of Products : ', data.shape[0],
      'Number of Features/Variables:', data.shape[1])

Number of Products :  183138 Number of Features/Variables: 19
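
The absolute Windows path in the loading cell only resolves on one machine. A minimal sketch of a more portable pattern follows. It assumes a hypothetical `DATA_DIR` folder and uses two made-up records as a stand-in for `tops_fashion.json`; neither is part of the original notebook.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Made-up records standing in for the real tops_fashion.json (assumption).
records = [
    {"asin": "A0000000001", "title": "Example striped top", "color": "Red"},
    {"asin": "A0000000002", "title": "Example plain top", "color": None},
]

with tempfile.TemporaryDirectory() as tmp:
    # DATA_DIR is a hypothetical name; point it at the folder holding the real file.
    DATA_DIR = Path(tmp)
    json_path = DATA_DIR / "tops_fashion.json"
    json_path.write_text(json.dumps(records), encoding="utf-8")

    # pathlib joins path parts with the correct separator on any OS,
    # so no escaped backslashes are needed.
    sample = pd.read_json(json_path)

print(sample.shape)
```

With the real dataset, only `DATA_DIR` would need to change between machines.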


In [4]:
data.columns

Index(['sku', 'asin', 'product_type_name', 'formatted_price', 'author',
       'color', 'brand', 'publisher', 'availability', 'reviews',
       'large_image_url', 'availability_type', 'small_image_url',
       'editorial_review', 'title', 'model', 'medium_image_url',
       'manufacturer', 'editorial_reivew'],
      dtype='object')

Of these 19 features, we will use only 7 in our recommendation system, including the image URL:
1. asin (Amazon Standard Identification Number)
2. brand (the brand the product belongs to)
3. color (color information of the apparel; a value can contain several colors, e.g. red and black stripes)
4. product_type_name (type of the apparel, e.g. SHIRT/TSHIRT)
5. medium_image_url (URL of the image)
6. title (title of the product)
7. formatted_price (price of the product)

In [5]:
data = data[['asin', 'brand', 'color', 'medium_image_url', 'product_type_name', 'title', 'formatted_price']]

In [6]:
print('Number of data points : ', data.shape[0],
      'Number of features:', data.shape[1])

Number of data points :  183138 Number of features: 7


In [7]:
#viewing data
data.head()

| | asin | brand | color | medium_image_url | product_type_name | title | formatted_price |
|---|---|---|---|---|---|---|---|
| 0 | B016I2TS4W | FNC7C | | https://images-na.ssl-images-amazon.com/images... | SHIRT | Minions Como Superheroes Ironman Long Sleeve R... | |
| 1 | B01N49AI08 | FIG Clothing | | https://images-na.ssl-images-amazon.com/images... | SHIRT | FIG Clothing Womens Izo Tunic | |
| 2 | B01JDPCOHO | FIG Clothing | | https://images-na.ssl-images-amazon.com/images... | SHIRT | FIG Clothing Womens Won Top | |
| 3 | B01N19U5H5 | Focal18 | | https://images-na.ssl-images-amazon.com/images... | SHIRT | Focal18 Sailor Collar Bubble Sleeve Blouse Shi... | |
| 4 | B004GSI2OS | FeatherLite | Onyx Black/ Stone | https://images-na.ssl-images-amazon.com/images... | SHIRT | Featherlite Ladies' Long Sleeve Stain Resistan... | $26.26 |


In [8]:
data.isnull().sum()

asin                      0
brand                   151
color                118182
medium_image_url          0
product_type_name         0
title                     0
formatted_price      154743
dtype: int64
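
Raw null counts are easier to judge as a share of all rows: from the counts above, about 64.5% of `color` values (118182 of 183138) and about 84.5% of `formatted_price` values (154743 of 183138) are missing. A minimal sketch of that calculation, using a made-up four-row frame (`toy` and its values are assumptions for illustration, not rows from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of gaps as the real data (values are invented).
toy = pd.DataFrame({
    "asin": ["a1", "a2", "a3", "a4"],
    "color": ["Black", np.nan, np.nan, "Red"],
    "formatted_price": [np.nan, np.nan, np.nan, "$9.99"],
})

# isnull() gives a boolean frame; the column mean is the fraction of nulls.
missing_share = toy.isnull().mean().mul(100).round(1)
print(missing_share)
```

Applied to `data`, the same expression would report the percentages quoted above.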

In [9]:
data.describe()

| | asin | brand | color | medium_image_url | product_type_name | title | formatted_price |
|---|---|---|---|---|---|---|---|
| count | 183138 | 182987 | 64956 | 183138 | 183138 | 183138 | 28395 |
| unique | 183138 | 10577 | 7380 | 170782 | 72 | 175985 | 3135 |
| top | B071ZV3NB5 | Zago | Black | https://images-na.ssl-images-amazon.com/images... | SHIRT | Nakoda Cotton Self Print Straight Kurti For Women | $19.99 |
| freq | 1 | 223 | 13207 | 23 | 167794 | 77 | 945 |


### Removing rows with null values in color and formatted_price

In [10]:
data = data.loc[~data['formatted_price'].isnull()]
data = data.loc[~data['color'].isnull()]
print('Number of data points After eliminating some Null rows :', data.shape[0])

Number of data points After eliminating some Null rows : 28385
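
The two boolean-mask filters can also be written as a single `dropna` call restricted to the two columns. A minimal sketch on a made-up frame (`toy` and its values are assumptions for illustration), checking that both approaches keep the same rows:

```python
import numpy as np
import pandas as pd

# Invented rows: a2 lacks a color, a3 lacks a price.
toy = pd.DataFrame({
    "asin": ["a1", "a2", "a3", "a4"],
    "color": ["Black", np.nan, "Blue", "Red"],
    "formatted_price": ["$5.00", "$7.50", np.nan, "$9.99"],
})

# The notebook's approach: two successive boolean masks.
masked = toy.loc[~toy["formatted_price"].isnull()]
masked = masked.loc[~masked["color"].isnull()]

# Equivalent one-liner: drop rows with a null in either listed column.
dropped = toy.dropna(subset=["formatted_price", "color"])

print(masked.equals(dropped))
print(dropped["asin"].tolist())
```

Both versions leave nulls in other columns (such as `brand`) untouched, because only the listed columns are checked.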


### Understanding and Removing Duplicates

In [11]:
print(sum(data.duplicated('title')))

2325
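
`duplicated('title')` marks every repeat of a title after its first occurrence, so the count above is the number of extra copies, not the number of distinct titles that repeat. A small sketch on made-up titles (the frame `toy` is an assumption for illustration) shows the difference between the default and `keep=False`:

```python
import pandas as pd

# Invented data: "Blue Top" appears three times.
toy = pd.DataFrame({
    "asin": ["a1", "a2", "a3", "a4"],
    "title": ["Blue Top", "Red Top", "Blue Top", "Blue Top"],
})

# Default keep='first': only the second and third "Blue Top" are flagged.
n_repeats = toy.duplicated("title").sum()

# keep=False flags every member of a duplicate group, useful for inspection.
all_copies = toy[toy.duplicated("title", keep=False)]

# drop_duplicates keeps the first row of each title.
deduped = toy.drop_duplicates("title")

print(n_repeats, all_copies["asin"].tolist(), deduped["asin"].tolist())
```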


In [12]:
# Remove all products whose titles have four or fewer words
data_sorted = data[data['title'].apply(lambda x: len(x.split()) > 4)]
print("After removal of products with short description:", data_sorted.shape[0])

After removal of products with short description: 27949
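
The word-count filter can also be expressed with pandas string methods instead of `apply` with a lambda. A minimal sketch on made-up titles (the `titles` Series is an assumption for illustration):

```python
import pandas as pd

# Invented titles with 2, 5, 4 and 6 words respectively.
titles = pd.Series([
    "Blue Top",
    "Womens Long Sleeve Cotton Blouse",
    "Plain White Tee Shirt",
    "Striped Tank Top With Lace Trim",
])

# str.split() splits on whitespace; str.len() then counts the pieces.
word_counts = titles.str.split().str.len()

# Keep only titles with more than four words, as in the cell above.
long_enough = titles[word_counts > 4]

print(word_counts.tolist())
print(long_enough.index.tolist())
```

The threshold is strict: a title with exactly four words is removed.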


In [13]:
# Sort the data by title in reverse alphabetical order.
# Reassigning instead of using inplace=True avoids pandas' SettingWithCopyWarning.
data_sorted = data_sorted.sort_values('title', ascending=False)
data_sorted.head()

| | asin | brand | color | medium_image_url | product_type_name | title | formatted_price |
|---|---|---|---|---|---|---|---|
| 61973 | B06Y1KZ2WB | Éclair | Black/Pink | https://images-na.ssl-images-amazon.com/images... | SHIRT | Éclair Women's Printed Thin Strap Blouse Black... | $24.99 |
| 133820 | B010RV33VE | xiaoming | Pink | https://images-na.ssl-images-amazon.com/images... | SHIRT | xiaoming Womens Sleeveless Loose Long T-shirts... | $18.19 |
| 81461 | B01DDSDLNS | xiaoming | White | https://images-na.ssl-images-amazon.com/images... | SHIRT | xiaoming Women's White Long Sleeve Single Brea... | $21.58 |
| 75995 | B00X5LYO9Y | xiaoming | Red Anchors | https://images-na.ssl-images-amazon.com/images... | SHIRT | xiaoming Stripes Tank Patch/Bear Sleeve Anchor... | $15.91 |
| 151570 | B00WPJG35K | xiaoming | White | https://images-na.ssl-images-amazon.com/images... | SHIRT | xiaoming Sleeve Sheer Loose Tassel Kimono Woma... | $14.32 |
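
The descending sort puts "Éclair" before the lowercase "xiaoming" titles because string sorting compares Unicode code points: "É" ranks above every ASCII letter, and lowercase letters rank above uppercase ones. A small sketch on made-up titles (the frame `toy` is an assumption for illustration) shows that ordering, plus a case-insensitive variant using the `key` argument of `sort_values`:

```python
import pandas as pd

# Invented titles mixing upper case, lower case and a non-ASCII initial.
toy = pd.DataFrame(
    {"title": ["xiaoming Top", "Zago Shirt", "Éclair Blouse", "abc Tee"]}
)

# Reassigning the result (rather than inplace=True) leaves `toy` untouched.
ordered = toy.sort_values("title", ascending=False)

# Optional: compare lowercased titles so that case no longer affects the order.
ordered_ci = toy.sort_values(
    "title", ascending=False, key=lambda s: s.str.lower()
)

print(ordered["title"].tolist())
print(ordered_ci["title"].tolist())
```

In the plain sort, "Zago Shirt" lands last because uppercase "Z" has a lower code point than any lowercase letter.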
