**Instructions:** Please read all of the instructions and the entire prompt below. Please return to us any code you write and any files that your code outputs. If you write any code, Python is the preferred language, but you may use any language you like. Feel free to use any tools, data sources, or references, but do document anything you use. This exercise is representative of the kind of work you will be doing here at Acorns. Please return your response to this exercise within 72 hours of receiving it.

**Prompt:** What are all the items that Macy's sells online? Please make a list of items you would be able to buy at www.macys.com. Explain how your approach could be generalized to other merchants. How would you assess how accurate you are?
Please submit a Jupyter notebook and include any third-party data you have used, with references to substantiate your methodology.
Preference is given to the simplest answer, based on sound principles, that is most scalable in its implementation.

**Two different approaches:**
1. Build a web scraper with a library such as Beautiful Soup 4.
2. Access the Macy's application programming interface (API) directly. Assuming API access is granted, this would be the most accurate approach and the fastest to implement.

## Macy's API Available Methods
This is a subset of the available methods:
 - **Category Index v4:** This service provides a list of all active categories (or only the categories specified) in a hierarchical tree that can be navigated in either direction.
 - **Category Brand Index v4:** This API lets users specify one or more categories and retrieve the complete list of brands for them. This is called the Brand Index.
 - **Product Detail (Using Product ID) v4:** The v4 Product service retrieves highly customizable, detailed descriptions of a particular product by product ID. The URL and response changed substantially from v3 to v4; v4 responses follow a nested-object pattern.
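Because the Category Index is described as a hierarchical tree that can be navigated in either direction, one simple way to work with it locally is to flatten it into rows that record each category's parent. The sketch below assumes a nested JSON shape with `id`, `name`, and a `category` list of children; those field names are hypothetical and would need to be checked against a real Category Index v4 response.

```python
from typing import Any, Dict, Iterator, List, Optional


def flatten_categories(
    node: Dict[str, Any], parent_id: Optional[int] = None
) -> Iterator[Dict[str, Any]]:
    """Yield one row per category, recording its parent's id.

    Assumes a hypothetical node shape:
    {"id": ..., "name": ..., "category": [child nodes]}.
    """
    yield {"id": node["id"], "name": node["name"], "parent_id": parent_id}
    for child in node.get("category", []):
        yield from flatten_categories(child, parent_id=node["id"])


def children_of(rows: List[Dict[str, Any]], category_id: int) -> List[Dict[str, Any]]:
    """Navigate down: return the direct subcategories of a category."""
    return [row for row in rows if row["parent_id"] == category_id]


def path_to_root(rows: List[Dict[str, Any]], category_id: int) -> List[str]:
    """Navigate up: return category names from the root down to category_id."""
    by_id = {row["id"]: row for row in rows}
    names: List[str] = []
    current = by_id.get(category_id)
    while current is not None:
        names.append(current["name"])
        current = by_id.get(current["parent_id"])
    return list(reversed(names))
```

On a toy tree Root → Women → Dresses (plus Root → Men), `path_to_root(rows, 3)` returns `['Root', 'Women', 'Dresses']` and `children_of(rows, 1)` returns the Women and Men rows. Flat rows like these also load directly into a pandas DataFrame.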

The following request downloads the category index (note that it calls the v3 endpoint, not the v4 service listed above):
`curl -X GET -H "x-macys-webservice-client-id: xxxxxxxxxxx" -H "Accept: application/json" "http://api.macys.com/v3/catalog/category/index"`
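The same request can be made from Python with only the standard library. This is a sketch: it reuses the v3 URL and the header names from the curl command, assumes the API returns UTF-8 JSON, and has not been run against the live service.

```python
import json
from typing import Any
from urllib.request import Request, urlopen

# URL and header names copied from the curl command above (v3 endpoint).
CATEGORY_INDEX_URL = "http://api.macys.com/v3/catalog/category/index"


def build_category_index_request(api_key: str) -> Request:
    """Build the GET request, passing the key in the client-id header."""
    return Request(
        CATEGORY_INDEX_URL,
        headers={
            "x-macys-webservice-client-id": api_key,
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_category_index(api_key: str, timeout: float = 30.0) -> Any:
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    with urlopen(build_category_index_request(api_key), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a valid key in the environment, usage would look like `categories = fetch_category_index(os.environ["MACYS_API_KEY"])`.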

In [27]:
import pandas as pd
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import os
# Load the private Macy's API key from an environment variable (raises KeyError if unset)
api_key = os.environ["MACYS_API_KEY"]

In [28]:
# Newegg graphics-card listing page, used here to prototype the scraping approach
my_url = 'https://www.newegg.com/Desktop-Graphics-Cards/SubCategory/ID-48?Tid=7709'

In [29]:
# Open connection, grab page 
uClient = uReq(my_url)

In [30]:
# Read the raw HTML bytes into a variable, then close the connection
page_html = uClient.read()
uClient.close()
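The open, read, and close steps above can be folded into one helper. This sketch uses a `with` block so the connection is closed even if `read()` fails, adds a timeout, and sends a custom User-Agent header; the header value is a hypothetical placeholder, and whether a given site requires one is an assumption to verify.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent string; some sites reject urllib's default agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; catalog-research/0.1)"}


def fetch_html(url: str, timeout: float = 30.0) -> bytes:
    """Download a page and return its raw bytes.

    The with block closes the connection even if read() raises,
    so no separate close() call is needed.
    """
    request = Request(url, headers=DEFAULT_HEADERS)
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

In this notebook, `page_html = fetch_html(my_url)` would replace the two cells that open the connection and read the page.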

In [31]:
# Parse the HTML with Python's built-in html.parser backend
page_soup = soup(page_html, 'html.parser')

In [32]:
# Inspect the page's first <h1> header
page_soup.h1

<h1 class="page-title-text">Desktop Graphics Cards</h1>

In [33]:
# Inspect the page's first <p> tag
page_soup.p

<p>Newegg.com - A great place to buy computers, computer parts, electronics, software, accessories, and DVDs online. With great prices, fast shipping, and top-rated customer service - once you know, you Newegg.</p>

In [34]:
# Inspect the first <span> inside the <body>
page_soup.body.span

<span class="noCSS">Skip to:</span>

In [41]:
# Find every div with class 'item-container'; each one holds a single product
containers = page_soup('div', {'class': 'item-container'})
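The same lookup can be written with a CSS selector. This sketch (the helper name is hypothetical) also illustrates that Beautiful Soup matches a multi-valued `class` attribute one class at a time, which is why featured items with `class="item-container is-feature-item"` are matched by the lookup above.

```python
from bs4 import BeautifulSoup


def find_product_containers(html: str) -> list:
    """Return every <div> whose class list includes 'item-container'.

    A div with class="item-container is-feature-item" is included too,
    because each class in the attribute is matched separately.
    """
    page = BeautifulSoup(html, "html.parser")
    # Same result as page('div', {'class': 'item-container'})
    return page.select("div.item-container")
```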

In [42]:
len(containers)

37

In [43]:
# Paste the output into jsbeautifier to read the markup more easily.
# The 'title' of the img inside 'item-brand' holds the brand name.
# Every item appears to include a product name and a brand.
containers[0]

<div class="item-container is-feature-item ">
<!--product image-->
<a class="item-img" href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814126186">
<img alt="ASUS ROG GeForce GTX 1080 Ti DirectX 12 STRIX-GTX1080TI-O11G-GAMING 11GB 352-Bit GDDR5X PCI Express 3.0 HDCP Ready Video Card" src="//images10.newegg.com/NeweggImage/ProductImageCompressAll300/14-126-186-V01.jpg" title="ASUS ROG GeForce GTX 1080 Ti DirectX 12 STRIX-GTX1080TI-O11G-GAMING 11GB 352-Bit GDDR5X PCI Express 3.0 HDCP Ready Video Card"/>
</a>
<div class="item-info">
<!--brand info-->
<div class="item-branding">
<a class="item-brand" href="https://www.newegg.com/ASUS/BrandStore/ID-1315">
<img alt="ASUS" src="//images10.newegg.com/brandimage//Brand1315.gif" title="ASUS"/>
</a>
<!--rating info-->
<a class="item-rating" href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814126186&amp;SortField=0&amp;SummaryType=0&amp;PageSize=10&amp;SelectedRating=-1&amp;VideoOnlyMark=False&amp;ignorebbr=1&amp;IsFeedback

In [44]:
container = containers[0]

In [48]:
# container.a returns the 'item-img' link (not very useful); container.div.div returns the 'item-branding' div
container.div.div

<div class="item-branding">
<a class="item-brand" href="https://www.newegg.com/ASUS/BrandStore/ID-1315">
<img alt="ASUS" src="//images10.newegg.com/brandimage//Brand1315.gif" title="ASUS"/>
</a>
<!--rating info-->
<a class="item-rating" href="https://www.newegg.com/Product/Product.aspx?Item=N82E16814126186&amp;SortField=0&amp;SummaryType=0&amp;PageSize=10&amp;SelectedRating=-1&amp;VideoOnlyMark=False&amp;ignorebbr=1&amp;IsFeedbackTab=true#scrollFullInfo" title="Rating + 5"><i class="rating rating-5"></i><span class="item-rating-num">(128)</span></a>
</div>

In [49]:
container.div.div.a.img['title']

'ASUS'

In [57]:
container.find_all('li', {'class': 'price-ship'})[0].text.strip()

'$4.99 Shipping'
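The chained lookups used above raise an exception as soon as one product lacks a branding image, title link, or shipping line. A more defensive variant, sketched below with hypothetical helper names, uses CSS selectors that mirror the class names seen in `containers[0]` and returns `None` for missing fields; it has not been checked against every Newegg layout.

```python
from typing import Dict, List, Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def extract_product(container: Tag) -> Dict[str, Optional[str]]:
    """Pull brand, product name, and shipping text from one item-container.

    Missing elements give None instead of the AttributeError, TypeError, or
    KeyError that a chained lookup like container.div.div.a.img['title'] raises.
    """
    brand_img = container.select_one("a.item-brand img")
    title_link = container.select_one("a.item-title")
    shipping = container.select_one("li.price-ship")
    return {
        "brand": brand_img.get("title") if brand_img is not None else None,
        "product_name": title_link.get_text(strip=True) if title_link is not None else None,
        "shipping": shipping.get_text(strip=True) if shipping is not None else None,
    }


def extract_all(html: str) -> List[Dict[str, Optional[str]]]:
    """Parse a listing page and extract every product container."""
    page = BeautifulSoup(html, "html.parser")
    return [extract_product(c) for c in page.select("div.item-container")]
```

Rows with a `None` brand or name can then be counted, which gives a rough check on how many products the selectors missed.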

In [62]:
filename = 'products.csv'
# Header row; the trailing newline keeps the first product off the header line
headers = 'brand,product_name,shipping\n'
f = open(filename, 'w', encoding='utf-8')
f.write(headers)

28
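The hand-built CSV in the next cell replaces commas in product names with `|` so they do not split columns. Python's standard `csv` module handles this by quoting fields instead, which keeps the original text intact. A sketch (function names are hypothetical), meant for rows shaped like the `brand`/`product_name`/`shipping` fields above:

```python
import csv
import io
from typing import Dict, Iterable, Optional, TextIO

FIELDNAMES = ["brand", "product_name", "shipping"]


def write_products(rows: Iterable[Dict[str, Optional[str]]], out: TextIO) -> None:
    """Write a header plus one CSV row per product.

    csv.DictWriter quotes any field containing a comma, so product names
    can be kept verbatim instead of swapping commas for '|'. None becomes "".
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def products_to_csv_string(rows: Iterable[Dict[str, Optional[str]]]) -> str:
    """Render rows to a CSV string, handy for a quick look inside the notebook."""
    buffer = io.StringIO()
    write_products(rows, buffer)
    return buffer.getvalue()
```

To write the file, open it with `newline=''` as the `csv` documentation recommends, e.g. `with open('products.csv', 'w', newline='', encoding='utf-8') as f: write_products(rows, f)`.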

In [64]:
for container in containers:
    # Brand comes from the title attribute of the img inside 'item-branding'
    brand = container.div.div.a.img['title']
    title_container = container.find_all('a', {'class': 'item-title'})
    product_name = title_container[0].text
    shipping_container = container.find_all('li', {'class': 'price-ship'})[0].text.strip()

    print("brand: " + brand)
    print("product_name: " + product_name)
    print("shipping_container: " + shipping_container)
    # Replace commas in the product name with '|' so they don't split the CSV columns
    f.write(brand + ',' + product_name.replace(',', '|') + ',' + shipping_container + "\n")

f.close()

brand: ASUS
product_name: ASUS ROG GeForce GTX 1080 Ti DirectX 12 STRIX-GTX1080TI-O11G-GAMING 11GB 352-Bit GDDR5X PCI Express 3.0 HDCP Ready Video Card
shipping_container: $4.99 Shipping
brand: GIGABYTE
product_name: GIGABYTE GeForce GTX 1060 DirectX 12 GV-N1060G1 GAMING-6GD REV 2.0 6GB 192-Bit GDDR5 PCI Express 3.0 x16 ATX Video Card
shipping_container: $4.99 Shipping
brand: ASUS
product_name: ASUS GeForce GTX 1070 Ti TURBO-GTX1070TI-8G 8GB 256-Bit GDDR5 PCI Express 3.0 HDCP Ready SLI Support Video Card
shipping_container: $4.99 Shipping
brand: Sapphire Tech
product_name: Sapphire Radeon NITRO+ RX 580 4GB GDDR5 PCI-E Dual HDMI / DVI-D / Dual DP w/ Backplate (UEFI)
shipping_container: Free Shipping
brand: EVGA
product_name: EVGA GeForce GTX 1080 Ti SC2 GAMING, 11G-P4-6593-KR, 11GB GDDR5X, iCX Technology - 9 Thermal Sensors & RGB LED G/P/M
shipping_container: Free Shipping
brand: ASUS
product_name: ASUS GeForce GTX 1080 TURBO-GTX1080-8G 8GB 256-Bit GDDR5X PCI Express 3.0 HDCP Ready SLI 