### Imports

Since this will be an exploratory data analysis (EDA) of the previously scraped data, we only need analysis tools such as pandas and numpy, plus visualization tools such as matplotlib and seaborn.

In [7]:
import pandas as pd #We will be using pandas to analyze and do data transformation
import numpy as np #Useful in almost every project
import sqlalchemy #SQL Database connector
from sqlalchemy import create_engine
import config #Database credentials
import datetime as dt

import warnings
warnings.filterwarnings('ignore') 

pd.options.display.max_columns = 300
pd.options.display.max_rows = 300
pd.options.display.max_colwidth = 400

In [8]:
dbtype = config.database_new['dbtype']
user = config.database_new['user']
password = config.database_new['password']
ip = config.database_new['ip']
port = config.database_new['port']
name = config.database_new['name']

engine = create_engine(f'{dbtype}://{user}:{password}@{ip}:{port}/{name}')
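
One caveat with building the connection string by plain interpolation: if the user name or password contains characters that are special in a URL (such as `@` or `/`), the URL can be parsed incorrectly. A minimal sketch of percent-encoding the credentials first, using only the standard library. The credentials, host, database name and the `postgresql` dialect below are made-up placeholders, not the values stored in `config`.

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only; the real ones live in config.py
user = "analyst"
password = "p@ss/word"  # contains characters that are special inside a URL

# Percent-encode user and password before interpolating them into the URL
safe_url = (
    f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
    "@localhost:5432/shop"
)
print(safe_url)
```

The resulting string can be passed to `create_engine` in the same way as the f-string above.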

### Thinking before starting to explore

What can we do with this data?
- We can pretend to be a brand launching a new product that needs market research to find a suitable price by looking at similar brands.
- With the images and product types we can build a clothing article classifier that could be useful for recommendation algorithms later on.
- We can analyze which item is the best seller in each shop. If we assume that the ecommerce team knows how to optimize sales, we can plot `compare_at_price` values over time and see which products are discounted the least.
- We can extract color palettes by category or by store.

There are probably more things we can do with the Shopify data, but let's stick with these points for this EDA.
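
As a rough sketch of the discount idea, the snippet below computes a discount percentage per product and averages it per vendor. It assumes that `compare_at_price` holds the pre-discount price and `price` the current one. The rows are toy values, not scraped data, and `discount_pct` is a column name introduced here only for illustration.

```python
import pandas as pd

# Toy rows using the same column names as the scraped table (values are made up)
toy = pd.DataFrame({
    "vendor": ["A", "A", "B"],
    "title": ["scarf", "blazer", "parka"],
    "price": [40.0, 149.0, 90.0],
    "compare_at_price": [50.0, 149.0, 120.0],
})

# Discount as a percentage of the compare-at price; 0 means no markdown
toy["discount_pct"] = (1 - toy["price"] / toy["compare_at_price"]) * 100

# Average discount per vendor, rounded for display
avg_discount = toy.groupby("vendor")["discount_pct"].mean().round(1)
print(avg_discount)
```

On the real table, rows with a missing or zero `compare_at_price` would probably need to be handled before dividing.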

In [9]:
df = pd.read_sql_table('competitor_products', engine)

In [10]:
print(f'The shape of the Dataset is: {df.shape}')

The shape of the Dataset is: (9513, 14)


In [11]:
df['vendor'].value_counts().head(8)

edmmond.com         1961
Pompeii Brand       1221
ALOHAS              1214
The Brubaker        1075
BlueBanana Brand    1063
scalperscompany     1027
Materia              291
Marathon             258
Name: vendor, dtype: int64

### Splitting data into dataframes just in case we want to do specific store analysis later on.

In [13]:
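# .copy() gives each store its own DataFrame, so adding columns later
# does not write to a slice of df
scalpers_df = df[df['vendor']=='scalperscompany'].copy()
pompeii_df = df[df['vendor'].str.contains('ompeii')].copy()
edmmond_df = df[df['vendor']=='edmmond.com'].copy()
brubaker_df = df[df['vendor'].str.contains('ruba')].copy()
aloha_df = df[df['vendor']=='ALOHAS'].copy()
bluebanana_df = df[df['vendor']=='BlueBanana Brand'].copy()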

print(f'Scalpers database shape is: {scalpers_df.shape}')
print(f'Pompeii´s database shape is: {pompeii_df.shape}')
print(f'Edmmond´s database shape is: {edmmond_df.shape}')
print(f'The Brubaker´s database shape is: {brubaker_df.shape}')
print(f'The Aloha´s database shape is: {aloha_df.shape}')
print(f'The Bluebanana Brand´s database shape is: {bluebanana_df.shape}')

Scalpers database shape is: (1027, 14)
Pompeii´s database shape is: (1271, 14)
Edmmond´s database shape is: (1961, 14)
The Brubaker´s database shape is: (1175, 14)
The Aloha´s database shape is: (1214, 14)
The Bluebanana Brand´s database shape is: (1063, 14)


In [11]:
new = scalpers_df['handle'].str.rsplit('-', n = 3, expand = True) 

scalpers_df["producto"]= new[0]
scalpers_df["target"]= new[1]
scalpers_df["collection"] = new[2]
scalpers_df["color"] = new[3]
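
To see what the split does, here is a small sketch with plain Python strings; `Series.str.rsplit` applies the same rule to each row. The first handle comes from the Scalpers data. The second one is a hypothetical handle with a two-word color, included to show how the parts shift when a handle does not follow the expected pattern.

```python
# Handle taken from the Scalpers data
handle = "26290-puffy-scarf-aw2021-khaki"
parts = handle.rsplit("-", 3)  # at most 3 splits, counted from the right
print(parts)

# Hypothetical handle with a two-word color: the parts shift by one position
odd = "12345-slim-pant-aw2021-light-blue"
odd_parts = odd.rsplit("-", 3)
print(odd_parts)
```

In the second case the season code lands in the `target` slot and `light` in the `collection` slot, so it may be worth checking the value counts of the new columns for unexpected entries.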

In [12]:
scalpers_df.head()

Unnamed: 0,title,handle,created,updated,product_type,vendor,price,compare_at_price,sku,available,image,require_shipping,position,producto,target,collection,color
0,PUFFY SCARF,26290-puffy-scarf-aw2021-khaki,2020-06-23 14:16:23+00:00,2020-10-22 10:59:47+00:00,Bufanda,scalperscompany,49.9,49.9,8445279010114,True,https://cdn.shopify.com/s/files/1/0015/0942/5197/products/26290-KHAKI-S-1.jpg?v=1603364373,True,1,26290-puffy,scarf,aw2021,khaki
1,PUFFY SCARF,26290-puffy-scarf-aw2021-burgundy,2020-06-23 14:16:23+00:00,2020-10-22 10:59:20+00:00,Bufanda,scalperscompany,49.9,49.9,8445279010107,True,https://cdn.shopify.com/s/files/1/0015/0942/5197/products/26290-BURGUNDY-S-1.jpg?v=1603364347,True,1,26290-puffy,scarf,aw2021,burgundy
2,PUFFY SCARF,26290-puffy-scarf-aw2021-black,2020-06-23 14:16:23+00:00,2020-10-22 14:51:47+00:00,Bufanda,scalperscompany,49.9,49.9,8445279010121,True,https://cdn.shopify.com/s/files/1/0015/0942/5197/products/26290-BLACK-S-1.jpg?v=1603364317,True,1,26290-puffy,scarf,aw2021,black
3,BLAZER BRILLO,26047-shiny-blazer-aw2021-black,2020-06-23 14:15:56+00:00,2020-10-22 11:51:55+00:00,Blazer,scalperscompany,149.0,149.0,8433740165661,True,https://cdn.shopify.com/s/files/1/0015/0942/5197/products/26047-BLACK-S.jpg?v=1599754254,True,1,26047-shiny,blazer,aw2021,black
4,BLAZER BRILLO,26047-shiny-blazer-aw2021-black,2020-06-23 14:15:56+00:00,2020-10-22 11:51:55+00:00,Blazer,scalperscompany,149.0,149.0,8433740165678,True,https://cdn.shopify.com/s/files/1/0015/0942/5197/products/26047-BLACK-S.jpg?v=1599754254,True,2,26047-shiny,blazer,aw2021,black


In [14]:
scalpers_item_df = scalpers_df.groupby(['title', 'product_type', 'color']).agg({'position':['min', 'max', 'mean'],
                                                                                'price':['max'],
                                                                                'compare_at_price':['min', 'mean']}).reset_index()

In [24]:
scalpers_items_images = scalpers_df[['title', 'product_type', 'color', 'image']]

In [None]:
# Rows whose image URL contains 'Sin-titulo' ("untitled")
scalpers_items_images[scalpers_items_images['image'].str.contains('Sin-titulo')]

In [33]:
scalpers_items_images['image'].sample(20)

719                                             https://cdn.shopify.com/s/files/1/0015/0942/5197/products/25274-BURGUNDY-S.jpg?v=1602168095
4677                                             https://cdn.shopify.com/s/files/1/0015/0942/5197/products/25575-SKYBLUE-5.jpg?v=1602234710
5238                                               https://cdn.shopify.com/s/files/1/0015/0942/5197/products/26221-BLACK-S.jpg?v=1601549398
4463                      https://cdn.shopify.com/s/files/1/0015/0942/5197/products/1_bdc845de-8f5a-4af9-ad38-9f9ec5818178.jpg?v=1602851773
4434                                                https://cdn.shopify.com/s/files/1/0015/0942/5197/products/23230-NAVY-S.jpg?v=1603119028
854                                                https://cdn.shopify.com/s/files/1/0015/0942/5197/products/25816-KHAKI-S.jpg?v=1601459952
4406                                                https://cdn.shopify.com/s/files/1/0015/0942/5197/products/25264-NAVY-S.jpg?v=1598344470
189     https://cdn.

In [None]:
scalpers_items_images[scalpers_items_images['title']=='ALASKA BT PARKA']['image']

In [None]:
scalpers_item_df.columns = ['%s%s' % (a, '_%s' % b if b else '') for a, b in scalpers_item_df.columns]
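
The line above flattens the two-level column index that `agg` produces when given a dictionary of lists. A minimal sketch on toy rows (titles and prices borrowed from the sample output, positions made up) shows the columns before and after. It uses `"_".join`, which should give the same names as the `%`-formatting version.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["PUFFY SCARF", "PUFFY SCARF", "BLAZER BRILLO"],
    "position": [1, 2, 1],
    "price": [49.9, 49.9, 149.0],
})

# Aggregating with a dict of lists yields two-level (MultiIndex) columns
agg = toy.groupby("title").agg({"position": ["min", "max"],
                                "price": ["max"]}).reset_index()
print(agg.columns.tolist())

# Join both levels with "_" and skip empty second levels such as ('title', '')
agg.columns = ["_".join(filter(None, col)) for col in agg.columns]
print(agg.columns.tolist())
```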

In [None]:
scalpers_item_df

In [None]:
scalpers_df.columns

In [None]:
scalpers_df[scalpers_df['available']==True]['image'].head(8)

In [None]:
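import requests  # not imported in the first cell

response = requests.get("https://cdn.shopify.com/s/files/1/0015/0942/5197/products/25641-KHAKI-S.jpg?v=1603292942")
response.raise_for_status()  # stop here if the download failed

# The source is a JPEG, so keep the .jpg extension
with open("sample_image.jpg", "wb") as file:
    file.write(response.content)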

In [None]:
scalpers_df['color'].value_counts()

In [None]:
scalpers_df['collection'].value_counts()

In [None]:
a[a['collection']=='pant']

In [None]:
a[a['handle'].str.contains('24879')]['handle'].value_counts().to_frame()

In [None]:
a

In [None]:
a['other'].value_counts()

In [None]:
a['categoria'].value_counts()

In [None]:
df['created'] = pd.to_datetime(df['created'], utc = True)
df['updated'] = pd.to_datetime(df['updated'], utc = True)
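
Once both columns are timezone-aware datetimes, they can be subtracted directly. A small sketch using the `created` and `updated` values of the first Scalpers row; `days_live` is a hypothetical column name, not part of the scraped table.

```python
import pandas as pd

# Timestamps copied from the first row of the Scalpers sample
toy = pd.DataFrame({
    "created": ["2020-06-23 14:16:23+00:00"],
    "updated": ["2020-10-22 10:59:47+00:00"],
})
toy["created"] = pd.to_datetime(toy["created"], utc=True)
toy["updated"] = pd.to_datetime(toy["updated"], utc=True)

# Whole days between creation and last update
toy["days_live"] = (toy["updated"] - toy["created"]).dt.days
print(toy["days_live"].iloc[0])
```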