# Challenge: Promotions

In this challenge, you'll develop codes to parse and analyze data returned from another API on Zalando such as [Promos homme (Men's Promotions)
](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson but you'll work with different data.

## Obtaining the link

Wrote your codes in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

1. Print and review the data you obtained.

In [10]:
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize

In [11]:
url='https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=84&sort=sale'

In [12]:
hdr = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

def my_get(url):
    return requests.get(url, headers=hdr)

#response = requests.get(url, headers=hdr)
response = my_get(url)

In [13]:
#response = requests.get(url)
results = response.json()
flattened_data = json_normalize(results)

flattened_data1 = json_normalize(flattened_data.articles[0])
flattened_data1.head(2)

Unnamed: 0,brand_name,family_articles,flags,is_premium,media,name,outfits,price.has_different_original_prices,price.has_different_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,price.original,price.promotional,product_group,sizes,sku,url_key
0,Nike Sportswear,"[{'sku': 'NI122E04W-Q12', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': 'Jusqu’à -25...",False,[{'path': 'NI/12/2E/04/WQ/12/NI122E04W-Q12@11....,PANT TRIBUTE - Pantalon de survêtement - black...,,False,True,True,False,"49,95 €","37,45 €",clothing,"[S, M, L, XL, XXL, 3XL]",NI122E04W-Q12,nike-sportswear-pant-oh-tribute-pantalon-de-su...
1,Hollister Co.,"[{'sku': 'H0422O054-K12', 'url_key': 'holliste...","[{'key': 'discountRate', 'value': '-20%', 'tra...",False,[{'path': 'H0/42/2O/05/4K/12/H0422O054-K12@4.j...,MUSCLE FIT CREW - T-shirt basique - cobalt,,False,False,False,False,"17,95 €","14,35 €",clothing,"[XS, S, M, L, XL, XXL]",H0422O054-K12,hollister-co-t-shirt-basique-cobalt-h0422o054-k12


In [14]:
import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()

In [15]:
# Get the total number of pages

total_pages=results['pagination']['page_count']

# Your code
df=pd.DataFrame()
for i in range(total_pages):
    k=84*i
    if i%100==0:
        print(i)
    url=f'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset={k}&sort=sale'
    #response = requests.get(url)promo-enfant&limit=84&offset={k}&sort=sale
    response=my_get(url)
    results = response.json()
    flattened_data = json_normalize(results)
    flattened_data1 = json_normalize(flattened_data.articles[0])
    flattened_data1=flattened_data1.set_index('sku')
    df = df.append(flattened_data1)
df.head(20)

0


of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  sort=sort)


100
200
300
400
500


KeyError: 'sku'

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [None]:
# Display the trending brand in DataFrame
df.brand_name.value_counts().index[0]

In [9]:
df.columns

Index(['amount', 'brand_name', 'family_articles', 'flags', 'is_premium',
       'media', 'name', 'outfits', 'price.base_price',
       'price.has_different_original_prices', 'price.has_different_prices',
       'price.has_different_promotional_prices',
       'price.has_discount_on_selected_sizes_only', 'price.original',
       'price.promotional', 'product_group', 'sizes',
       'tracking_information.impression_beacon',
       'tracking_information.metrigo_impression_urls',
       'tracking_information.source', 'url_key'],
      dtype='object')

In [10]:
df.drop(['url_key','price.has_different_original_prices','price.has_different_prices','price.has_different_promotional_prices','price.has_discount_on_selected_sizes_only','tracking_information.impression_beacon','tracking_information.metrigo_impression_urls','tracking_information.source','amount'],axis=1,inplace=True)
#droping the columns I don't need

In [11]:
df.columns

Index(['brand_name', 'family_articles', 'flags', 'is_premium', 'media', 'name',
       'outfits', 'price.base_price', 'price.original', 'price.promotional',
       'product_group', 'sizes'],
      dtype='object')

In [12]:
df.dtypes

brand_name           object
family_articles      object
flags                object
is_premium             bool
media                object
name                 object
outfits              object
price.base_price     object
price.original       object
price.promotional    object
product_group        object
sizes                object
dtype: object

In [13]:
#THIS IS AN EXAMPLE OF THE LECON. Display the brand with maximal total discount (sum of discounts on all goods

#Our data is still text. Convert prices into numbers:
df['price.original']=df['price.original'].str.extract('(\d*,\d*)')
df['price.promotional']=df['price.promotional'].str.extract('(\d*,\d*)')

df['price.original'] = [x.replace(',', '.') for x in df['price.original']]
df['price.promotional'] = [x.replace(',', '.') for x in df['price.promotional']]

df['discount_amount']=df['price.original'].astype(float)-df['price.promotional'].astype(float)
df_discount=df.copy()

total_disc=df_discount.groupby(['brand_name']).sum().discount_amount
total_disc.sort_values(ascending=False).index[0]
# Display the brand with maximal total discount

'Polo Ralph Lauren'

In [14]:
# df['price.original'] = df['price.original'].astype('float')
# df['price.original'].dtype
#  there were issues with type between float and string method


The product(s) with the highest discount.

The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [16]:
# df1 is a copy of df but with new column discount amount
df_discount.head(2)

Unnamed: 0_level_0,brand_name,family_articles,flags,is_premium,media,name,outfits,price.base_price,price.original,price.promotional,product_group,sizes,discount_amount
sku,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
YO122O0CW-Q11,YOURTURN,"[{'sku': 'YO122O0CW-Q11', 'url_key': 'yourturn...","[{'key': 'sponsored', 'value': 'Sponsorisé', '...",False,[{'path': 'YO/12/2O/0C/WQ/11/YO122O0CW-Q11@10....,3-PACK - Débardeur - black,,,19.99,12.0,clothing,"[XXS, XS, S, M, L]",7.99
TH342G04S-E11,The North Face,"[{'sku': 'TH342G04S-E11', 'url_key': 'the-nort...","[{'key': 'sponsored', 'value': 'Sponsorisé', '...",False,[{'path': 'TH/34/2G/04/SE/11/TH342G04S-E11@2.1...,Sweatshirt - yellow,,,84.95,76.46,clothing,"[S, M, L, XL, XXL]",8.49


In [17]:
df_discount['discount_amount'].max()

686.0

In [18]:
df_discount['discount_amount'].idxmax()

'6MI22Q003-T11'

In [25]:
max_disc=df_discount.groupby('sku').max().discount_amount
max_disc.sort_values(ascending=False).index[0]

'6MI22Q003-T11'

In [20]:
max_disc.sort_values(ascending=False).head(3)
#Here is the product with highest discount

sku
6MI22Q003-T11    686.0
6MI22E000-K11    672.0
MT022T00P-C11    648.0
Name: discount_amount, dtype: float64

In [32]:
df_discount.rename(columns={"price.promotional": "price_promo", "price.original": "price_orig"})

Unnamed: 0_level_0,brand_name,family_articles,flags,is_premium,media,name,outfits,price.base_price,price_orig,price_promo,product_group,sizes,discount_amount
sku,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
YO122O0CW-Q11,YOURTURN,"[{'sku': 'YO122O0CW-Q11', 'url_key': 'yourturn...","[{'key': 'sponsored', 'value': 'Sponsorisé', '...",False,[{'path': 'YO/12/2O/0C/WQ/11/YO122O0CW-Q11@10....,3-PACK - Débardeur - black,,,19.99,12.00,clothing,"[XXS, XS, S, M, L]",7.99
TH342G04S-E11,The North Face,"[{'sku': 'TH342G04S-E11', 'url_key': 'the-nort...","[{'key': 'sponsored', 'value': 'Sponsorisé', '...",False,[{'path': 'TH/34/2G/04/SE/11/TH342G04S-E11@2.1...,Sweatshirt - yellow,,,84.95,76.46,clothing,"[S, M, L, XL, XXL]",8.49
AD542D1NQ-Q11,adidas Performance,"[{'sku': 'AD542D1NQ-Q11', 'url_key': 'adidas-p...","[{'key': 'sponsored', 'value': 'Sponsorisé', '...",False,[{'path': 'AD/54/2D/1N/QQ/11/AD542D1NQ-Q11@11....,RUN TEE - T-shirt imprimé - black,,,24.95,17.45,clothing,"[XS, S, M, L, XL, XXL]",7.50
NI122O0CE-C11,Nike Sportswear,"[{'sku': 'NI122O0CE-C11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-10%', 'tra...",False,[{'path': 'NI/12/2O/0C/EC/11/NI122O0CE-C11@4.j...,CLUB TEE - T-shirt basique - dark grey heather...,,,19.95,17.95,clothing,"[XS, S, M, L, XL, XXL]",2.00
L0642D008-Q11,Lacoste Sport,"[{'sku': 'L0642D008-Q11', 'url_key': 'lacoste-...","[{'key': 'discountRate', 'value': 'Jusqu’à -31...",False,[{'path': 'L0/64/2D/00/8Q/11/L0642D008-Q11@16....,HERREN - T-shirt imprimé - black,,,45.00,31.00,clothing,"[XS, M, L, XL, XXL, 3XL, 4XL]",14.00
LA222D01Y-503,Lacoste,"[{'sku': 'LA222D01Y-503', 'url_key': 'lacoste-...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",False,[{'path': 'LA/22/2D/01/Y5/03/LA222D01Y-503@20....,T-shirt basique - navy blue,,,40.00,28.00,clothing,"[XS, S, M, L, XL, XXL, 3XL]",12.00
TO122P00J-K12,Tommy Hilfiger,"[{'sku': 'TO122P00J-K12', 'url_key': 'tommy-hi...","[{'key': 'discountRate', 'value': '-50%', 'tra...",False,[{'path': 'TO/12/2P/00/JK/12/TO122P00J-K12@15....,PERFORMANCE REGULAR FIT - Polo - cloud heather,,,69.95,34.95,clothing,"[M, L, XL, XXL, 3XL]",35.00
JA222E0FG-O11,Jack & Jones,"[{'sku': 'JA222E0FG-O11', 'url_key': 'jack-and...","[{'key': 'discountRate', 'value': 'Jusqu’à -50...",False,[{'path': 'JA/22/2E/0F/GO/11/JA222E0FG-O11@3.1...,Chino - olive night,,,39.95,20.00,clothing,"[27x30, 27x32, 28x30, 28x32, 29x30, 29x32, 29x...",19.95
NI122E044-Q11,Nike Sportswear,"[{'sku': 'NI122E044-Q11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': 'Jusqu’à -60...",False,[{'path': 'NI/12/2E/04/4Q/11/NI122E044-Q11@10....,PANT - Pantalon de survêtement - black/black,,,79.95,32.00,clothing,"[XS, S, M, L, XL, XXL]",47.95
YO122E00V-B11,YOURTURN,"[{'sku': 'YO122E00V-B11', 'url_key': 'your-tur...","[{'key': 'discountRate', 'value': '-40%', 'tra...",False,[{'path': 'YO/12/2E/00/VB/11/YO122E00V-B11@17....,Pantalon cargo - camel,,,34.99,21.00,clothing,"[30, 32, 33, 34, 36]",13.99


In [33]:
df_discount['price_promo'].sum()
pp

KeyError: 'price_promo'

In [None]:
sum_discount=pp/po