In [1]:
import requests
import pandas as pd

In this notebook, you'll see some other ways to interact with web pages to scrape data. Specifically, you'll see how to use your browser's developer tools to determine how to scrape certain types of data.

The directions below are for the Firefox web browser, but the same process works in other browsers. You'll just have to open the developer tools and find the corresponding tabs.

Our goal will be to scrape data on Pokemon card prices from this page: https://www.pricecharting.com/console/pokemon-base-set

First, navigate to this URL in your browser so you can see what to expect back from a request. Then we can try to retrieve the results with a regular request to the URL.

In [6]:
URL = 'https://www.pricecharting.com/console/pokemon-base-set'

response = requests.get(URL)

Reading the results into a DataFrame shows a portion of the main table.

In [9]:
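from io import StringIO

# response.text is already a string; wrapping it in StringIO avoids the
# deprecation warning recent pandas versions raise for literal HTML input
pd.read_html(StringIO(response.text))[0]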

Unnamed: 0,Card,Ungraded,Grade 9,PSA 10,Unnamed: 4,Unnamed: 5,Unnamed: 6
0,Charizard [1st Edition] #4,"$2,570.00","$24,300.00","$137,350.60",+ Collection In One Click + Collection With...,,
1,Charizard #4,$187.41,"$1,213.00","$13,000.00",+ Collection In One Click + Collection With...,,
2,Machamp [1st Edition] #8,$6.50,$166.76,"$1,399.99",+ Collection In One Click + Collection With...,,
3,Blastoise #2,$56.75,$466.24,"$4,024.50",+ Collection In One Click + Collection With...,,
4,Mewtwo #10,$8.02,$172.60,$895.02,+ Collection In One Click + Collection With...,,
5,Charizard [Shadowless] #4,$473.76,"$5,200.27","$30,100.00",+ Collection In One Click + Collection With...,,
6,Blastoise [1st Edition] #2,$740.96,"$5,346.02","$31,334.00",+ Collection In One Click + Collection With...,,
7,Pikachu [1st Edition] #58,$54.64,$360.83,"$4,315.00",+ Collection In One Click + Collection With...,,
8,Mewtwo [1st Edition] #10,$332.00,"$2,476.63","$14,643.48",+ Collection In One Click + Collection With...,,
9,Pikachu [1st Edition Red Cheeks] #58,$188.13,$993.50,"$6,822.33",+ Collection In One Click + Collection With...,,


However, if you compare the results from the response to what is on the webpage, you'll see that we are missing a large number of rows from our table.

Let's figure out what is going on. Open the browser menu, go to More Tools -> Web Developer Tools, and click the Network tab. Then refresh the page.

Now scroll down through the table on the web page and watch as new requests appear in the Network tab while you scroll.

Click the first new request that appears. In the right-hand panel, open the Response tab and you'll see that the server returned JSON. The Headers tab shows the URL of this request, which is the one we'll request ourselves below.

JSON stands for **J**ava**S**cript **O**bject **N**otation. It uses human-readable text in a form that looks like a Python dictionary or list of dictionaries (or dictionaries of lists containing dictionaries, etc.) and is a very popular way to serialize data, especially in web development.

An example of what JSON can look like is below.

```
{
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": ["running", "sky diving", "singing"],
    "age": 35,
    "children": [
        {
            "firstName": "Alice",
            "age": 6
        },
        {
            "firstName": "Bob",
            "age": 8
        }
    ]
}
```
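
Python's built-in `json` module converts JSON text into Python objects: JSON objects become dictionaries, arrays become lists, and `true`/`false`/`null` become `True`/`False`/`None`. A minimal sketch that parses the example above:

```python
import json

# The example JSON document from above, stored as a string
text = '''
{
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": ["running", "sky diving", "singing"],
    "age": 35,
    "children": [
        {"firstName": "Alice", "age": 6},
        {"firstName": "Bob", "age": 8}
    ]
}
'''

person = json.loads(text)          # JSON object -> Python dict

print(type(person))                # <class 'dict'>
print(person['hobbies'][1])        # sky diving
# The "children" array of objects becomes a list of dicts
print([child['firstName'] for child in person['children']])   # ['Alice', 'Bob']
```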

Now, let's see what happens if we make a GET request to the URL of that new request.

In [10]:
URL = 'https://www.pricecharting.com/console/pokemon-base-set?sort=popularity&cursor=50&format=json'

In [11]:
response = requests.get(URL)

Since this request returns JSON, we'll parse the body into Python objects using the response's `.json()` method.

In [12]:
response.json()

{'cursor': '100',
 'products': [{'consoleUri': 'pokemon-base-set',
   'hasProduct': False,
   'id': '630422',
   'price1': '$4.43',
   'price2': '$610.28',
   'price3': '$96.00',
   'priceChange': 0,
   'priceChangePercentage': '0.0',
   'priceChangeSign': '',
   'productName': 'Magneton #9',
   'productUri': 'magneton-9',
   'showCollectionLinks': True,
   'wishlistHasProduct': False},
  {'consoleUri': 'pokemon-base-set',
   'hasProduct': False,
   'id': '630456',
   'price1': '$1.25',
   'price2': '$71.00',
   'price3': '$33.97',
   'priceChange': 11,
   'priceChangePercentage': '9.6',
   'priceChangeSign': '+',
   'productName': 'Abra #43',
   'productUri': 'abra-43',
   'showCollectionLinks': True,
   'wishlistHasProduct': False},
  {'consoleUri': 'pokemon-base-set',
   'hasProduct': False,
   'id': '2021481',
   'price1': '$17.17',
   'price2': '$561.73',
   'price3': '$180.00',
   'priceChange': 608,
   'priceChangePercentage': '26.2',
   'priceChangeSign': '-',
   'productName':

The response has two pieces:
* A `cursor`, which is used to page through the results. Each request returns at most 50 rows, so the cursor marks where the next page starts (we requested `cursor=50` and got back `'100'`), and its presence indicates that more results are available.
* The `products`, a list of dictionaries containing the content of the table.
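
Since only the `cursor` value changes from one page to the next, the page URL can be built from its query parameters instead of by editing the string by hand. A minimal sketch using the standard library's `urllib.parse.urlencode`; `page_url` is a hypothetical helper, and the parameters are simply the ones in the URL found in the Network tab:

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.pricecharting.com/console/pokemon-base-set'

def page_url(cursor):
    """Build the JSON URL for the page starting at `cursor` (hypothetical helper)."""
    params = {'sort': 'popularity', 'cursor': cursor, 'format': 'json'}
    # urlencode keeps the dict's insertion order and escapes values if needed
    return BASE_URL + '?' + urlencode(params)

print(page_url('50'))
# https://www.pricecharting.com/console/pokemon-base-set?sort=popularity&cursor=50&format=json
```

`requests.get` performs the same encoding if you pass the dictionary through its `params` argument instead, e.g. `requests.get(BASE_URL, params=params)`.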

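Because `products` is a list of dictionaries that share the same keys, we can convert it directly to a DataFrame, with one row per product and one column per key.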

In [13]:
pd.DataFrame(response.json()['products'])

Unnamed: 0,consoleUri,hasProduct,id,price1,price2,price3,priceChange,priceChangePercentage,priceChangeSign,productName,productUri,showCollectionLinks,wishlistHasProduct
0,pokemon-base-set,False,630422,$4.43,$610.28,$96.00,0,0.0,,Magneton #9,magneton-9,True,False
1,pokemon-base-set,False,630456,$1.25,$71.00,$33.97,11,9.6,+,Abra #43,abra-43,True,False
2,pokemon-base-set,False,2021481,$17.17,$561.73,$180.00,608,26.2,-,Pikachu [Shadowless Red Cheeks] #58,pikachu-shadowless-red-cheeks-58,True,False
3,pokemon-base-set,False,715631,$44.50,"$1,175.46",$275.00,1050,30.9,+,Wartortle [1st Edition] #42,wartortle-1st-edition-42,True,False
4,pokemon-base-set,False,715623,$19.98,$500.00,$103.15,242,13.8,+,Machoke [1st Edition] #34,machoke-1st-edition-34,True,False
5,pokemon-base-set,False,715612,$35.41,$744.28,$206.06,541,18.0,+,Arcanine [1st Edition] #23,arcanine-1st-edition-23,True,False
6,pokemon-base-set,False,2336244,$218.63,,,71,0.3,+,Pikachu [PokeTour 1999] #58,pikachu-poketour-1999-58,True,False
7,pokemon-base-set,False,715637,$15.50,$283.93,$100.19,50,3.3,+,Abra [1st Edition] #43,abra-1st-edition-43,True,False
8,pokemon-base-set,False,715641,$12.50,$299.00,$79.50,300,19.4,-,Machop [1st Edition] #52,machop-1st-edition-52,True,False
9,pokemon-base-set,False,715609,$35.25,"$1,225.00",$192.50,0,0.0,,Electabuzz [1st Edition] #20,electabuzz-1st-edition-20,True,False


To pull in all of the results, we can use a `while` loop that follows the cursor from page to page.

In [15]:
cursor = '0'        # Start at the beginning of the result set.
products = []

while cursor:        # Continue while the previous page returned a cursor; otherwise all pages have been retrieved.
    URL = 'https://www.pricecharting.com/console/pokemon-base-set?sort=popularity&cursor={}&format=json'.format(cursor)
    response = requests.get(URL)
    response.raise_for_status()           # Fail loudly on HTTP errors instead of at the JSON parse
    data = response.json()                # Parse the JSON body once per page
    products.extend(data['products'])     # Add the current page to our accumulated results

    # If the response has a cursor, use it to fetch the next page. Otherwise, we are at the end of the results.
    cursor = data.get('cursor')
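
The same cursor-following logic can be moved into a function that takes the page-fetching step as an argument. That makes it testable without contacting the site and leaves room for a cap on the number of pages, in case a server never stops returning cursors. A minimal sketch; `fetch_all` and `max_pages` are hypothetical names, and the fake pages below only imitate the `cursor`/`products` shape of the real responses:

```python
def fetch_all(fetch_page, start='0', max_pages=100):
    """Follow `cursor` values until a page comes back without one.

    fetch_page(cursor) must return a dict shaped like the JSON responses above:
    {'cursor': <next cursor, or missing on the last page>, 'products': [...]}.
    """
    products = []
    cursor = start
    for _ in range(max_pages):          # Safety cap on the number of requests
        data = fetch_page(cursor)
        products.extend(data['products'])
        cursor = data.get('cursor')     # Missing or empty cursor means this was the last page
        if not cursor:
            return products
    raise RuntimeError('Stopped after max_pages pages; results may be incomplete')

# Fake pages standing in for real HTTP responses (illustrative data only)
fake_pages = {
    '0': {'cursor': '2', 'products': [{'id': 'a'}, {'id': 'b'}]},
    '2': {'cursor': '4', 'products': [{'id': 'c'}, {'id': 'd'}]},
    '4': {'products': [{'id': 'e'}]},
}

result = fetch_all(fake_pages.__getitem__)
print([p['id'] for p in result])   # ['a', 'b', 'c', 'd', 'e']
```

Against the real site, `fetch_page` could be something like `lambda c: requests.get('https://www.pricecharting.com/console/pokemon-base-set', params={'sort': 'popularity', 'cursor': c, 'format': 'json'}).json()`.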

In [16]:
pd.DataFrame(products)

Unnamed: 0,consoleUri,hasProduct,id,price1,price2,price3,priceChange,priceChangePercentage,priceChangeSign,productName,productUri,showCollectionLinks,wishlistHasProduct
0,pokemon-base-set,False,715593,"$2,570.00","$137,350.60","$24,300.00",140351,35.3,-,Charizard [1st Edition] #4,charizard-1st-edition-4,True,False
1,pokemon-base-set,False,630417,$187.41,"$13,000.00","$1,213.00",4586,32.4,+,Charizard #4,charizard-4,True,False
2,pokemon-base-set,False,715597,$6.50,"$1,399.99",$166.76,228,26.0,-,Machamp [1st Edition] #8,machamp-1st-edition-8,True,False
3,pokemon-base-set,False,630415,$56.75,"$4,024.50",$466.24,105,1.9,+,Blastoise #2,blastoise-2,True,False
4,pokemon-base-set,False,630423,$8.02,$895.02,$172.60,148,15.6,-,Mewtwo #10,mewtwo-10,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
302,pokemon-base-set,False,2386883,,,,0,0.0,,Staryu [Trainer Deck B] #65,staryu-trainer-deck-b-65,True,False
303,pokemon-base-set,False,2386895,,,,0,0.0,,Super Potion [Trainer Deck A] #90,super-potion-trainer-deck-a-90,True,False
304,pokemon-base-set,False,2386897,,,,0,0.0,,Switch [Trainer Deck A] #95,switch-trainer-deck-a-95,True,False
305,pokemon-base-set,False,2365262,,$382.00,,0,0.0,,Wartortle [Trainer Deck B] #42,wartortle-trainer-deck-b-42,True,False
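
The prices come back as strings such as `'$1,175.46'`, and some are empty, so they need cleaning before any arithmetic. The JSON also uses the generic names `price1`, `price2`, and `price3`. Comparing the Charizard [1st Edition] #4 row here with the HTML table at the top of the notebook, `price1` appears to correspond to Ungraded, `price2` to PSA 10, and `price3` to Grade 9; this mapping is inferred from matching rows, not documented by the site. A minimal sketch, with hypothetical helper and column names:

```python
# Column names inferred by matching rows against the HTML table (an assumption, not site documentation)
PRICE_COLUMNS = {'price1': 'ungraded', 'price2': 'psa_10', 'price3': 'grade_9'}

def parse_price(value):
    """Convert a price string like '$1,175.46' to a float; return None for blanks (hypothetical helper)."""
    if not value:
        return None
    return float(value.replace('$', '').replace(',', ''))

def clean_product(product):
    """Return a copy of one product dict with renamed, numeric price fields."""
    cleaned = {k: v for k, v in product.items() if k not in PRICE_COLUMNS}
    for old, new in PRICE_COLUMNS.items():
        cleaned[new] = parse_price(product.get(old))
    return cleaned

row = {'productName': 'Charizard #4', 'price1': '$187.41', 'price2': '$13,000.00', 'price3': '$1,213.00'}
print(clean_product(row))
# {'productName': 'Charizard #4', 'ungraded': 187.41, 'psa_10': 13000.0, 'grade_9': 1213.0}
```

Applied to the full list, `pd.DataFrame([clean_product(p) for p in products])` would give numeric price columns with descriptive names.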
