# Scraping Micro Center for Graphics Card Availability

TODO:
- Introduction about web scraping
- Introduction to microcenter
- Explain tools used(Python, requests, Beautiful Soup, Pandas)



Here is the outline we'll follow:
- Set up the Jupyter notebook
- Scrape https://www.microcenter.com/ for graphics card data: name, stock status, and card URL.
- Use functions to take the data and create a dictionary for each URL. Combine the results into one dataframe.
- Execute code using predetermined urls
- Create csv file from the dataframe
- List references and next steps

### Setup jupyter notebook
Here we will set up the Jupyter notebook so it can be run anywhere. We will install and import the required libraries. The jovian library is used for uploading to www.jovian.ai.

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [22]:
# Execute this to save new versions of the notebook.
# Not required to run the notebook. Used for saving.
jovian.commit(project="scraping-microcenter-video-cards")

[jovian] Attempting to save notebook..
[jovian] Updating notebook "ethan-morrison/scraping-microcenter-video-cards" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/ethan-morrison/scraping-microcenter-video-cards


'https://jovian.ai/ethan-morrison/scraping-microcenter-video-cards'

In [27]:
jovian.submit(assignment="zerotoanalyst-project1")

[jovian] Attempting to save notebook..
[jovian] Updating notebook "ethan-morrison/scraping-microcenter-video-cards" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/ethan-morrison/scraping-microcenter-video-cards
[jovian] Submitting assignment..
[jovian] Verify your submission at https://jovian.ai/learn/zero-to-data-analyst-bootcamp/assignment/project-1-web-scraping-with-python


In [4]:
!pip install pandas --quiet

In [5]:
import pandas as pd

In [6]:
!pip install requests --upgrade --quiet

In [7]:
import requests

In [8]:
!pip install beautifulsoup4 --upgrade --quiet

In [9]:
from bs4 import BeautifulSoup 

### Scraping Micro Center
Micro Center is an electronics retailer here in the United States. They have many physical stores where you can purchase computer parts and electronics without waiting for items to ship.

We need to define a dataframe. This notebook is designed to pull multiple pages of video cards, and each page is appended to the dataframe that is later written to the CSV file. The dataframe is created outside the other code so that it is not overwritten with each URL.

In [10]:
# DF created with the columns we will later fill.
cards_df = pd.DataFrame(columns = ['Video_Card', 'In_Stock_Status', 'Card_Url'])

Now we will define a function using the requests and Beautiful Soup libraries.

Requests downloads the web page and exposes its HTML as text on the response object. Beautiful Soup then parses that text so we can pull data out of it.

In [11]:
def get_card_page(page_url):
    #download the page
    response = requests.get(page_url)
    #check for successful response
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(page_url))
    # Parse using Beautiful soup
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

Now that the variable 'doc' holds the parsed HTML, we will pull the required data with a few more functions.

First, the card names. In the HTML, each card name is the text of an 'a' tag.

In [12]:
def get_title(doc):
    # Get the a tags containing video card title 
    selection_data = 'Video Cards'
    title_tags = doc.find_all('a', {'data-category': selection_data})
    # Each card is matched twice by this search, so keep every second tag to drop the duplicates
    title_tags = [v for i, v in enumerate(title_tags) if i % 2 == 1]
    video_cards = []
    for tag in title_tags:
        video_cards.append(tag.text)
    return video_cards 
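
To make the filtering step concrete, here is a minimal sketch run against a small hand-written HTML snippet. The markup is an assumption for illustration (each product linked twice, first around an image and then around the title); the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: every product has an image link followed by a title link.
sample_html = """
<ul>
  <li>
    <a data-category="Video Cards" href="/product/1/card-a"><img alt=""></a>
    <a data-category="Video Cards" href="/product/1/card-a">Card A</a>
  </li>
  <li>
    <a data-category="Video Cards" href="/product/2/card-b"><img alt=""></a>
    <a data-category="Video Cards" href="/product/2/card-b">Card B</a>
  </li>
  <li><a data-category="Monitors" href="/product/3/monitor">Monitor</a></li>
</ul>
"""

sample_doc = BeautifulSoup(sample_html, 'html.parser')

# Match only the 'a' tags whose data-category attribute is 'Video Cards'.
tags = sample_doc.find_all('a', {'data-category': 'Video Cards'})

# Keep the tags at odd positions (the title links) and read their text.
titles = [tag.text for i, tag in enumerate(tags) if i % 2 == 1]
print(titles)
```

Keeping every second tag only works while the duplicates arrive in pairs. If the page layout changes, this step could silently return the wrong tags, so it is worth spot-checking the output.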


Second, we will pull the URL information. The URL is found in the 'href' attribute of the same 'a' tag.

In [13]:
def get_url(doc):
    selection_data = 'Video Cards'
    title_tags = doc.find_all('a', {'data-category': selection_data})
    # Each card is matched twice by this search, so keep every second tag to drop the duplicates
    title_tags = [v for i, v in enumerate(title_tags) if i % 2 == 1]
    # The href values are relative, so prepend the site's base URL
    base_urls = 'https://www.microcenter.com'
    video_card_urls = []
    for tag in title_tags:
        video_card_urls.append(base_urls + tag['href'])
    return video_card_urls
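
As an alternative to joining the strings by hand, the standard library's `urljoin` can build the full address. This is a small sketch, not part of the original notebook, and the product path below is made up for illustration.

```python
from urllib.parse import urljoin

base_url = 'https://www.microcenter.com'

# Hypothetical relative link, shaped like an href value from the page.
relative_link = '/product/12345/example-card'
full_link = urljoin(base_url, relative_link)
print(full_link)

# If an href is already absolute, urljoin returns it unchanged
# instead of producing a doubled-up address.
absolute_link = urljoin(base_url, 'https://www.microcenter.com/site/stores/default.aspx')
print(absolute_link)
```

The benefit is mostly defensive: plain concatenation breaks if a link ever comes back as a full URL, while `urljoin` handles both cases.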

Third, we pull the stock status. This code returns the store location as well as the in-stock amount, if it is listed on the website. The stock status is found in 'div' tags with the class 'stock'.

In [14]:
def get_stock(doc): 
    # Get stock tags
    selection_data2 = 'stock'
    status = []
    stock_tags = doc.find_all('div', {'class': selection_data2})
    for tag in stock_tags:
        status.append(tag.get_text("|", strip=True))
    return status
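
The call `get_text("|", strip=True)` is what turns a nested tag into one readable string. Here is a minimal sketch on a made-up 'div'; the inner tags and wording are assumptions, not copied from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical stock block with two nested pieces of text and stray whitespace.
stock_html = (
    '<div class="stock">'
    '<strong>  5 NEW IN STOCK </strong>'
    '<span> at Example Store </span>'
    '</div>'
)

stock_div = BeautifulSoup(stock_html, 'html.parser').find('div', {'class': 'stock'})

# Default behavior: text pieces are concatenated as-is, whitespace included.
raw_text = stock_div.get_text()

# With a separator and strip=True, each piece is trimmed and joined with '|'.
joined_text = stock_div.get_text("|", strip=True)
print(joined_text)
```

The '|' separator makes it possible to split the amount and the location apart later, for example with `joined_text.split('|')`.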

### Multiple URL to Dataframe
With a function for each field in place, we build a dictionary of lists for one URL and turn it into a dataframe. The dataframes from multiple URLs are combined later.

In [15]:
def get_availability():
    # Reads the notebook-level variable page_url, so set page_url before calling.
    doc = get_card_page(page_url)
    cards_dict = {
        'Video_Card': get_title(doc),
        'In_Stock_Status': get_stock(doc),
        'Card_Url': get_url(doc)
    }
    cards_df = pd.DataFrame(cards_dict)
    return cards_df
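
Building a dataframe from a dictionary of lists only works when every list has the same length. If one of the scraping functions misses an item, pandas raises a `ValueError`. The sketch below adds an explicit check with a clearer message. The helper name `build_cards_frame` and the sample values are hypothetical and not part of the original notebook.

```python
import pandas as pd

def build_cards_frame(titles, stock, urls):
    """Build the cards dataframe, failing early if the lists are misaligned."""
    lengths = {
        'Video_Card': len(titles),
        'In_Stock_Status': len(stock),
        'Card_Url': len(urls),
    }
    # All three lists must describe the same cards in the same order.
    if len(set(lengths.values())) != 1:
        raise ValueError('Column lengths differ: {}'.format(lengths))
    return pd.DataFrame({
        'Video_Card': titles,
        'In_Stock_Status': stock,
        'Card_Url': urls,
    })

# Made-up sample values, not real scraped data.
sample_frame = build_cards_frame(
    ['Card A', 'Card B'],
    ['In stock', 'Sold out'],
    ['https://www.microcenter.com/product/1/card-a',
     'https://www.microcenter.com/product/2/card-b'],
)
print(sample_frame.shape)
```

A length check does not prove the rows line up correctly, but it catches the most common failure, where one selector matches more or fewer tags than the others.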

### Execute Code
We will use the get_availability function to complete the scraping process.

Two custom urls are used for this notebook. They were predetermined from microcenter.com. They are set to use the store located in St. Louis Park, Minnesota USA. Also they are set to display 96 items per page. 

In [16]:
#page 1 from microcenter
page_url = 'https://www.microcenter.com/search/search_results.aspx?Ntk=all&sortby=match&N=4294966937&myStore=false&rpp=96'

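We call get_availability and use pd.concat to add each page to the dataframe. (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)

In [17]:
cards_df = pd.concat([cards_df, get_availability()], ignore_index=True)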

In [18]:
#page 2 from microcenter
page_url = 'https://www.microcenter.com/search/search_results.aspx?Ntk=all&sortby=match&N=4294966937&myStore=false&rpp=96&page=2'

In [19]:
cards_df = pd.concat([cards_df, get_availability()], ignore_index=True)

Final dataframe has now been created. 

Uncomment and run the following code if you need to clear the dataframe. 

In [20]:
# cards_df = pd.DataFrame(columns = ['Video_Card', 'In_Stock_Status', 'Card_Url'])

### Create CSV
Creating a CSV from a dataframe is a simple task. The file Video_Card_Availability.csv will be created in the notebook's working directory. The file contains the video cards scraped above with their availability and URL.

In [21]:
cards_df.to_csv('Video_Card_Availability.csv', index=False)

### References and Next Step Ideas
This web scraping project was a guided project. The references used were "Zero to Data Analyst" by Jovian (https://www.jovian.ai/data-analyst-bootcamp) and the YouTube video "Let's Build a Python Web Scraping Project from Scratch | Hands-On Tutorial" by Jovian (https://www.youtube.com/watch?v=RKsLLG-bzEY). The documentation for each Python library was also consulted.

Future changes to the code could include a loop that processes both URLs with a single call. Another change would be to find a way to create the dataframe and add to it without running separate cells.
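
One possible shape for that loop is sketched below. It assumes a scraping function that accepts the URL as an argument, which differs from get_availability as written in this notebook, since that function reads the notebook-level page_url. The names `scrape_pages` and `fake_scrape` are hypothetical, and `fake_scrape` is a stand-in so the example runs without a network connection.

```python
import pandas as pd

def scrape_pages(urls, scrape_one):
    """Call scrape_one(url) for every URL and stack the resulting dataframes."""
    frames = [scrape_one(url) for url in urls]
    # One concat at the end replaces the separate "create, then add" steps.
    return pd.concat(frames, ignore_index=True)

def fake_scrape(url):
    # Stand-in for a real scraper: returns a one-row dataframe per URL.
    return pd.DataFrame({
        'Video_Card': ['Card from ' + url],
        'In_Stock_Status': ['unknown'],
        'Card_Url': [url],
    })

demo_df = scrape_pages(['page-1', 'page-2'], fake_scrape)
print(demo_df)
```

Collecting the per-page dataframes in a list and concatenating once also removes the need for the empty starter dataframe.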