# Business Question

The company is Star Jeans. Eduardo and Marcelo are two Brazilians, friends and business partners.

After several successful ventures, they are planning to enter the US fashion market
with an e-commerce business model.

The initial idea is to enter the market with a single product for a specific audience:
jeans for men. The goal is to keep operating costs low and to scale as they
acquire customers.

However, even with the entry product and the audience defined, the two partners have no experience
in the fashion market, so they do not know how to define basics such as the price, the type of
jeans, and the material used to manufacture each piece.

The two partners therefore hired a Data Science consultancy to answer the following
questions:
1. What is the best selling price for the jeans?
2. How many types of jeans, and in which colors, should the initial product include?
3. Which raw materials are needed to make the jeans?

Star Jeans' main competitors in the US market are H&M and Macy's.

# 0.0 Imports


In [1]:
import requests
import numpy as np
import pandas as pd
import seaborn as sns

from datetime import datetime
from bs4 import BeautifulSoup
from matplotlib import pyplot as plt
from IPython.display import HTML, display

## 0.1 Helper Functions 

In [2]:
def jupyter_settings():
    %matplotlib inline
    plt.style.use('bmh')
    plt.rcParams['figure.figsize'] = [25,12]
    plt.rcParams['font.size'] = 24
    plt.rcParams['figure.dpi'] = 100
    
    display( HTML( '<style>.container{width:100% !important; }</style>'))
    pd.set_option('display.float_format', lambda x: '%.2f' % x)
    
    # ignore future warnings
    #warnings.filterwarnings('ignore')
    
    sns.set()

In [3]:
jupyter_settings()

# 1.0 HTML Data Extraction

In [4]:
# make request
url = 'https://www2.hm.com/en_us/men/products/jeans.html'

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
page = requests.get(url, headers=headers)

In [5]:
soup = BeautifulSoup(page.text, "html.parser")

In [6]:
# extract all products
products = soup.find('ul', class_='products-listing small')

In [7]:
product_list = products.find_all('article', class_='hm-product-item')

# product id
product_id = [p.get('data-articlecode') for p in product_list]

# product_category
product_category = [p.get('data-category') for p in product_list]
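
The attribute lookups above can be tried offline on a small hand-written snippet. This is a minimal sketch: the HTML string, the article codes, and the names `html_demo`, `soup_demo`, `listing`, `items`, and `records` are made up for illustration and only mimic the attributes the notebook reads. The real page markup may differ or change over time.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the attributes read in the cells above.
html_demo = """
<ul class="products-listing small">
  <li><article class="hm-product-item" data-articlecode="0000001001" data-category="men_jeans_slim"></article></li>
  <li><article class="hm-product-item" data-articlecode="0000002001" data-category="men_jeans_loose"></article></li>
</ul>
"""

soup_demo = BeautifulSoup(html_demo, 'html.parser')
listing = soup_demo.find('ul', class_='products-listing small')

# find() returns None when the listing is missing, so guard before using it.
items = listing.find_all('article', class_='hm-product-item') if listing else []

# One record per product, built from the data-* attributes.
records = [
    {'product_id': item.get('data-articlecode'),
     'product_category': item.get('data-category')}
    for item in items
]
print(records)
```

Guarding against a missing listing avoids an `AttributeError` when the site returns a page without the expected `ul` element, for example after a layout change or a blocked request.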

In [8]:
# product name
product_list = products.find_all('a', class_='link')
product_name = [p.get_text() for p in product_list]

In [9]:
# price
product_list = products.find_all('span', class_='price regular')
product_price = [p.get_text() for p in product_list]
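
The scraped prices are strings such as `$ 39.99`, so they need to be converted before any numeric analysis. A minimal sketch, assuming prices follow a `$ <number>` pattern; `parse_price` is a hypothetical helper, not part of the original notebook:

```python
import re

def parse_price(text):
    """Return the first number in a price string such as '$ 39.99' as a float, or None."""
    match = re.search(r'\d+(?:,\d{3})*(?:\.\d+)?', text)
    if match is None:
        return None
    # Remove thousands separators before converting.
    return float(match.group().replace(',', ''))

prices = [parse_price(p) for p in ['$ 39.99', '$ 44.99', 'N/A']]
print(prices)
```

Returning `None` for unparseable text keeps bad rows visible instead of failing the whole scrape.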

In [10]:
# product color

In [11]:
# product composition

In [12]:
# merge scraped data into a data frame
data = pd.DataFrame([product_id, product_category, product_name, product_price]).T
data.columns = ['product_id', 'product_category', 'product_name', 'product_price']

# scrape datetime
data['scrapy_datetime'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

In [13]:
data.head()

| | product_id | product_category | product_name | product_price | scrapy_datetime |
|---|---|---|---|---|---|
| 0 | 690449022 | men_jeans_ripped | Skinny Jeans | $ 39.99 | 2022-07-25 19:53:45 |
| 1 | 1013317006 | men_jeans_joggers | Hybrid Regular Tapered Joggers | $ 44.99 | 2022-07-25 19:53:45 |
| 2 | 690449043 | men_jeans_ripped | Skinny Jeans | $ 39.99 | 2022-07-25 19:53:45 |
| 3 | 1013317002 | men_jeans_joggers | Hybrid Regular Tapered Joggers | $ 44.99 | 2022-07-25 19:53:45 |
| 4 | 979945001 | men_jeans_loose | Loose Jeans | $ 39.99 | 2022-07-25 19:53:45 |


# Request all products

In [14]:
# collect all pages from products

total_item = soup.find_all('h2', class_='load-more-heading')[0].get('data-total')
total_item

'73'

In [15]:
page_number = np.ceil(int(total_item)/36)
page_number

3.0

In [16]:
url02 = url + '?page-size=' + str(int(page_number*36))
url02

'https://www2.hm.com/en_us/men/products/jeans.html?page-size=108'
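
The page-size arithmetic in the cells above can be wrapped in a small helper. This is a sketch under assumptions: `build_full_listing_url` is a hypothetical name, and it assumes the site serves 36 items per page and accepts a `page-size` query parameter, as the cells above suggest.

```python
import math
from urllib.parse import urlencode

def build_full_listing_url(base_url, total_items, items_per_page=36):
    """Build a listing URL whose page size covers every item."""
    # Round up so a partially filled last page is still included.
    pages = math.ceil(int(total_items) / items_per_page)
    query = urlencode({'page-size': pages * items_per_page})
    return f'{base_url}?{query}'

full_url = build_full_listing_url(
    'https://www2.hm.com/en_us/men/products/jeans.html', '73')
print(full_url)
```

Using `urlencode` rather than string concatenation makes it harder to drop the `=` between the parameter name and its value.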

# Collect product color

In [17]:
# API request
url03 = 'https://www2.hm.com/en_us/productpage.0690449022.html'

page03 = requests.get(url03, headers=headers)

# Beautiful Soup
soup03 = BeautifulSoup(page03.text, 'html.parser')

In [43]:
# color name
product_list = soup03.find_all('a', {'class':['filter-option miniature', 'filter-option miniature active']} )
color_name = [p.get('data-color') for p in product_list]

# product id
product_id = [p.get('data-articlecode') for p in product_list]

df_color = pd.DataFrame( [product_id, color_name]).T
df_color.columns = ['product_id', 'color_name']

# generate style id + color id
df_color['style_id'] = df_color['product_id'].apply(lambda x: x[:-3])
df_color['color_id'] = df_color['product_id'].apply(lambda x: x[-3:])
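
The split above treats an article code as a style identifier followed by a three-character color identifier. A minimal sketch of the same slicing as a standalone function; `split_article_code` is a hypothetical helper, and the "last three characters are the color" rule is taken from the cell above rather than from any documented H&M format.

```python
def split_article_code(article_code):
    """Split an article code into (style_id, color_id) using the last three characters."""
    # Keep the code as a string so leading zeros are preserved.
    code = str(article_code)
    if len(code) <= 3:
        raise ValueError(f'article code too short: {code!r}')
    return code[:-3], code[-3:]

style_id, color_id = split_article_code('0690449022')
print(style_id, color_id)  # 0690449 022
```

Keeping the identifiers as strings matters: converting them to integers would drop the leading zero and break later joins on `style_id`.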

In [44]:
# composition

product_composition_list = soup03.find_all('div', class_='details-attributes-list-item')
product_composition = [list(filter(None, p.get_text().split('\n'))) for p in product_composition_list]

# rename dataframe
df_composition = pd.DataFrame(product_composition).T
df_composition.columns = df_composition.iloc[0]

# drop the header row and forward-fill missing values
df_composition = df_composition.iloc[1:].ffill()

# generate style id + color id
df_composition['style_id'] = df_composition['Art. No.'].apply(lambda x: x[:-3])
df_composition['color_id'] = df_composition['Art. No.'].apply(lambda x: x[-3:])

# merge data color + composition
data_sku = pd.merge(df_color, df_composition[['style_id', 'Fit', 'Composition']], how='left', on='style_id')
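
To answer the raw-materials question, composition strings such as `Cotton 98%, Spandex 2%` eventually need to become material and percentage pairs. A minimal sketch, assuming each entry has the form `<material> <number>%` with an optional part label such as `Lining:` in front; `parse_composition` is a hypothetical helper, not part of the original notebook.

```python
import re

def parse_composition(text):
    """Turn 'Cotton 98%, Spandex 2%' into {'Cotton': 98, 'Spandex': 2}."""
    # Discard an optional part label such as 'Lining:' before the materials.
    _, _, materials = text.rpartition(':')
    pairs = re.findall(r'([A-Za-z][A-Za-z ]*?)\s+(\d+)%', materials)
    return {name.strip(): int(pct) for name, pct in pairs}

print(parse_composition('Cotton 98%, Spandex 2%'))
print(parse_composition('Lining: Polyester 100%'))
```

This sketch drops the part label; if shell and lining materials must be analyzed separately, the label would need to be kept as an extra field.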

In [46]:
data_sku

| | product_id | color_name | style_id | color_id | Fit | Composition |
|---|---|---|---|---|---|---|
| 0 | 0690449001 | Light denim blue/trashed | 0690449 | 001 | Skinny fit | Cotton 98%, Spandex 2% |
| 1 | 0690449001 | Light denim blue/trashed | 0690449 | 001 | Skinny fit | Lining: Polyester 100% |
| 2 | 0690449001 | Light denim blue/trashed | 0690449 | 001 | Skinny fit | Lining: Polyester 100% |
| 3 | 0690449001 | Light denim blue/trashed | 0690449 | 001 | Skinny fit | Lining: Polyester 100% |
| 4 | 0690449001 | Light denim blue/trashed | 0690449 | 001 | Skinny fit | Lining: Polyester 100% |
| ... | ... | ... | ... | ... | ... | ... |
| 109 | 0690449059 | Denim blue | 0690449 | 059 | Skinny fit | Lining: Polyester 100% |
| 110 | 0690449059 | Denim blue | 0690449 | 059 | Skinny fit | Lining: Polyester 100% |
| 111 | 0690449059 | Denim blue | 0690449 | 059 | Skinny fit | Lining: Polyester 100% |
| 112 | 0690449059 | Denim blue | 0690449 | 059 | Skinny fit | Lining: Polyester 100% |
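
The output above repeats each color several times. This appears to happen because `df_composition` holds more than one row with the same `style_id`, so the left merge emits one row for every matching pair. A minimal sketch on made-up data (the two small frames below are illustrative, not scraped values) reproduces the effect and shows one possible remedy: dropping duplicate composition rows before merging.

```python
import pandas as pd

# Illustrative data only: two colors of one style, and a composition
# table that repeats the same style_id on several rows.
df_color_demo = pd.DataFrame({
    'product_id': ['0000001001', '0000001002'],
    'style_id': ['0000001', '0000001'],
})
df_composition_demo = pd.DataFrame({
    'style_id': ['0000001', '0000001', '0000001'],
    'Composition': ['Cotton 98%, Spandex 2%',
                    'Lining: Polyester 100%',
                    'Lining: Polyester 100%'],
})

# Left merge: each color row is paired with every composition row.
merged_raw = pd.merge(df_color_demo, df_composition_demo, how='left', on='style_id')

# Dropping exact duplicates first keeps one row per distinct composition.
merged_dedup = pd.merge(df_color_demo, df_composition_demo.drop_duplicates(),
                        how='left', on='style_id')

print(len(merged_raw), len(merged_dedup))  # 6 4
```

Deduplication still leaves one row per distinct composition line for each color. Collapsing those lines into a single row per product would be a separate design decision.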
