<a href="https://colab.research.google.com/github/Kanoru01/eagles-final-project/blob/master/Amazon_Fashion_Recommender_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AMAZON FASHION RECOMMENDATION SYSTEM

## Business Understanding

### Problem statement
As customers' fashion needs grow, retailers must keep up with them. Customers want accurate recommendations that match their fashion sense, so it has become essential to save them time by recommending which product to purchase.

### Business Question
The stakeholders are entrepreneurs venturing into the fashion industry and new startups. We would like to build a recommendation system that recommends products to customers based on their input fashion choices.

### Main objective
To build a model that recommends Amazon fashion products to customers using images.

### Supplementary Objectives
- To identify the competition based on sales.
- To identify successful brands and project future collaborations.
- To recommend highly rated products to customers.
- To recommend highly rated sellers to customers.

## Data Understanding

[Data.world](https://data.world/promptcloud/amazon-fashion-products-2020), through PromptCloud, provided the dataset. It is a sample of the full Amazon Fashion Products Dataset 2020 from DataStock. The information was assembled in 2021 by data mining teams at PromptCloud and DataStock for various analytical purposes and is presented in JSON format.

The dataset contains 33 columns and 30,000 observations.
Below are some of the column descriptions:

- uniq_id-- The unique ID of the product
- crawl_timestamp-- The time of the crawl to pull the data
- asin-- The ASIN of the product
- product_url-- The URL of the product
- product_name-- The name of the product
- image_urls__small-- The URLs of the product images in small size
- medium-- The URLs of the product images in medium size
- large-- The URLs of the product images in large size
- browsenode-- The Amazon browse node (category) ID of the product
- seller_name-- the name of the seller of the product
- seller_id-- the ID of the seller of the product
- brand-- the brand of the product
- sales_price-- the price of the sale of the product
- discount_percentage-- the discount that was being offered on the product
- weight-- the weight of the product
- rating-- the rating of the product
- no__of_reviews-- The number of reviews that have been given to the product
- delivery_type-- the way the product is delivered to the buyer
- meta_keywords-- the keywords used to search for the product
- amazon_prime__y_or_n-- whether the product carries the Amazon Prime tag (Y or N)
- best_seller_tag__y_or_n-- whether the product carries the best-seller tag (Y or N)
- technical_details__k_v_pairs-- the technical details of the product as key-value pairs

## Loading the Data

### Loading libraries

Connect to Google Drive and import the necessary libraries and modules.

In [None]:
# Connect Colab to Google Drive

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#import all the relevant libraries

import pandas as pd
import numpy as np
import os, shutil

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns




### Loading the data

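The data was loaded with `pd.read_json`, passing `lines=True` because the file is line-delimited JSON. `os.listdir` was then used to list the contents of the `amazon_fashion` folder.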

In [None]:
data = pd.read_json('/content/drive/Shareddrives/Eagles/marketing_sample_for_amazon_com-amazon_fashion_products__20200201_20200430__30k_data.ldjson', lines=True)
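
The `.ldjson` file stores one JSON object per line, which is why `lines=True` is passed to `pd.read_json`. The sketch below illustrates that behaviour on two hypothetical records held in memory; the record values and the `sample` and `df` names are assumptions for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Two hypothetical records in line-delimited JSON: one JSON object per line.
sample = io.StringIO(
    '{"asin": "A1", "brand": "BrandX", "rating": 4.5}\n'
    '{"asin": "A2", "brand": null, "rating": 3.0}\n'
)

# lines=True makes pandas parse each line as a separate record (row).
df = pd.read_json(sample, lines=True)

print(df.shape)          # number of rows and columns
print(df["brand"].isna().sum())  # JSON null values become missing values
```

Without `lines=True`, pandas would try to parse the whole file as a single JSON document and would typically fail on a file in this format.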

In [None]:
#printing the contents of the amazon fashion folder
print(os.listdir('/content/drive/Shareddrives/Eagles/amazon_fashion'))

In [None]:
data.head()

Unnamed: 0,uniq_id,crawl_timestamp,asin,product_url,product_name,image_urls__small,medium,large,browsenode,brand,...,colour,no__of_reviews,seller_name,seller_id,left_in_stock,no__of_offers,no__of_sellers,technical_details__k_v_pairs,formats___editions,name_of_author_for_books
0,26d41bdc1495de290bc8e6062d927729,2020-02-07 05:11:36 +0000,B07STS2W9T,https://www.amazon.in/Facon-Kalamkari-Handbloc...,LA' Facon Cotton Kalamkari Handblock Saree Blo...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,1968255000.0,LA' Facon,...,,,,,,,,,,
1,410c62298852e68f34c35560f2311e5a,2020-02-07 08:45:56 +0000,B07N6TD2WL,https://www.amazon.in/Sf-Jeans-Pantaloons-T-Sh...,Sf Jeans By Pantaloons Men's Plain Slim fit T-...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,1968123000.0,,...,,,,,,,,,,
2,52e31bb31680b0ec73de0d781a23cc0a,2020-02-06 11:09:38 +0000,B07WJ6WPN1,https://www.amazon.in/LOVISTA-Traditional-Prin...,LOVISTA Cotton Gota Patti Tassel Traditional P...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,1968255000.0,LOVISTA,...,,,,,,,,,,
3,25798d6dc43239c118452d1bee0fb088,2020-02-07 08:32:45 +0000,B07PYSF4WZ,https://www.amazon.in/People-Printed-Regular-T...,People Men's Printed Regular fit T-Shirt,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,1968123000.0,,...,,,,,,,,,,
4,ad8a5a196d515ef09dfdaf082bdc37c4,2020-02-06 14:27:48 +0000,B082KXNM7X,https://www.amazon.in/Monte-Carlo-Cotton-Colla...,Monte Carlo Grey Solid Cotton Blend Polo Colla...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,https://images-na.ssl-images-amazon.com/images...,1968070000.0,,...,,,,,,,,,,


In [None]:
data.shape

(30000, 33)

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 33 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   uniq_id                        30000 non-null  object 
 1   crawl_timestamp                30000 non-null  object 
 2   asin                           30000 non-null  object 
 3   product_url                    30000 non-null  object 
 4   product_name                   30000 non-null  object 
 5   image_urls__small              29998 non-null  object 
 6   medium                         29998 non-null  object 
 7   large                          28841 non-null  object 
 8   browsenode                     29480 non-null  float64
 9   brand                          21857 non-null  object 
 10  sales_price                    27110 non-null  float64
 11  weight                         30000 non-null  object 
 12  rating                         30000 non-null 

In [None]:
data['rating'][0]

5.0
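
The `data.info()` output above shows that some columns are incomplete: `brand` has 21,857 non-null values out of 30,000 (8,143 missing, about 27%) and `sales_price` has 27,110 (2,890 missing). One way to summarise this per column is sketched below on a small hypothetical frame; `toy` and its values are illustrative assumptions standing in for `data`.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`; the values are illustrative only.
toy = pd.DataFrame({
    "brand": ["LOVISTA", None, None, "People"],
    "sales_price": [499.0, None, 250.0, 799.0],
    "rating": [5.0, 4.0, 3.5, 4.2],
})

# Count missing values per column and express them as a percentage of rows.
missing = toy.isna().sum()
missing_pct = (toy.isna().mean() * 100).round(1)

# Combine both views and list the most incomplete columns first.
summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
summary = summary.sort_values("missing", ascending=False)
print(summary)
```

Running the same three steps on `data` instead of `toy` would give the missing-value counts for all 33 columns in one table.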