## Load JSON data

In [None]:
import matplotlib.pyplot as plt
from urllib.request import urlretrieve

dhs_url = "https://assets.datacamp.com/production/repositories/4412/datasets/29672336a81043615a204add894bb2bc1b289b16/dhs_counts.json"
urlretrieve(dhs_url, "dhs_daily_report.json")

# Print JSON file to console
with open("dhs_daily_report.json") as f:
  text = f.read()
  print(text)

In [None]:
# Load pandas as pd
import pandas as pd

# Load the daily report to a data frame
pop_in_shelters = pd.read_json("dhs_daily_report.json")

# View summary stats about pop_in_shelters
print(pop_in_shelters.describe())

<p>Many open data portals publish JSON datasets that are particularly easy to parse. They can be accessed directly by URL. Each object is a record, all objects have the same set of attributes, and none of the values are nested objects that themselves need to be parsed.</p>
<p>The New York City Department of Homeless Services Daily Report is such a dataset, containing years' worth of homeless shelter population counts. You can view it in the console before loading it to a data frame with <code>pandas</code>'s <code>read_json()</code> function.</p>

<ul>
<li>Get a sense of the contents of <code>dhs_daily_report.json</code>, which are printed in the console.</li>
<li>Load <code>pandas</code> as <code>pd</code>.</li>
<li>Use <code>read_json()</code> to load <code>dhs_daily_report.json</code> to a data frame, <code>pop_in_shelters</code>.</li>
<li>View summary statistics about <code>pop_in_shelters</code> with the data frame's <code>describe()</code> method.</li>
</ul>

<ul>
<li>Make sure the JSON file was correctly passed to <code>read_json()</code>, with no typos or keyword arguments.</li>
<li><code>describe()</code> is a DataFrame method, not a <code>pandas</code> function -- be sure to use dot notation with the data frame name.</li>
</ul>
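
<p>The record-oriented layout described above can be sketched with a tiny inline dataset. The field names and values below are made up for illustration and are not taken from the DHS report; <code>StringIO</code> stands in for a file on disk.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical records: every object has the same keys and no nested values
sample_json = """[
  {"report_day": "2024-01-01", "adults": 120, "children": 45},
  {"report_day": "2024-01-02", "adults": 118, "children": 47},
  {"report_day": "2024-01-03", "adults": 125, "children": 44}
]"""

# read_json() accepts a path, a URL, or a file-like object
sample = pd.read_json(StringIO(sample_json))

# describe() summarizes the numeric columns
print(sample.describe())
```

<p>Because each object maps cleanly to one row, no extra arguments are needed here.</p>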

## Work with JSON orientations

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from urllib.request import urlretrieve

dhs_split_url = "https://assets.datacamp.com/production/repositories/4412/datasets/d58e20d4eff661e6c01d89a105c5e01a137c6b6d/dhs_counts_split.json"
urlretrieve(dhs_split_url, 'dhs_report_reformatted.json')

<p>JSON isn't a tabular format, so <code>pandas</code> makes assumptions about its orientation when loading data. Most JSON data you encounter will be in orientations that <code>pandas</code> can automatically transform into a data frame.</p>
<p>Sometimes, as in this modified version of the Department of Homeless Services Daily Report, data is oriented differently. To reduce the file size, it has been stored in the <code>split</code> orientation. You'll see what happens when you try to load it normally versus with the <code>orient</code> keyword argument. The <code>try/except</code> block will alert you if there are errors loading the data.</p>
<p><code>pandas</code> has been loaded as <code>pd</code>.</p>
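
<p>A minimal sketch of the difference, assuming a small hand-written split-oriented document rather than the real report. Depending on the data, the default load may raise an error or quietly produce a misshapen data frame, so the sketch guards it with <code>try/except</code> and then loads again with <code>orient="split"</code>.</p>

```python
from io import StringIO

import pandas as pd

# Hypothetical split-oriented JSON: column names, index, and rows stored separately
split_json = """{
  "columns": ["adults", "children"],
  "index": [0, 1],
  "data": [[120, 45], [118, 47]]
}"""

try:
    # Without orient, pandas assumes a column-oriented layout
    guessed = pd.read_json(StringIO(split_json))
    print(guessed)
except ValueError as err:
    print("Default load failed:", err)

# With orient="split", the three keys are interpreted as intended
shelters = pd.read_json(StringIO(split_json), orient="split")
print(shelters)
```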

## Get data from an API

In [None]:
import pandas as pd
#import requests
from urllib.request import urlretrieve
url = "https://assets.datacamp.com/production/repositories/4412/datasets/7b4d21edba07766a12341ed66b99d25670ad432c/yelp_api_key.txt"
urlretrieve(url, "yelp_api_key.txt")

with open("yelp_api_key.txt", "r") as f:
  api_key = f.readlines()[0]

headers = {"Authorization": "Bearer {}".format(api_key)}
params = {"term": "cafe",
          "location": "NYC"}

############################################################
# Set up MockRequests
#===========================================================
import urllib.request
from mock_request.requests import MockRequests

# First, download the relevant data files, which contain dictionaries
exp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/6960241321d7d91fbfe16f74d3216b58074427f9/nyc_cafes.pkl"
avr_path = "https://assets.datacamp.com/production/repositories/4412/datasets/67a5ee3539b2916df0e6ebc63216c6f396693137/available_requests.dat"
err_path = "https://assets.datacamp.com/production/repositories/4412/datasets/cfc89bc158082f77e30cfa230890bc60b2e68df5/404_error.pkl"
erp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/aaf775650bf3386e156f27c8b0198fcc94b4d9f4/request_errors.dat"

# Download the file from URL path and save it locally
urllib.request.urlretrieve(exp_path, "nyc_cafes.pkl")
urllib.request.urlretrieve(avr_path, "available_requests.dat")
urllib.request.urlretrieve(err_path, "404_error.pkl")
urllib.request.urlretrieve(erp_path, "request_errors.dat")

# Then, instantiate the MockRequests object and save it as requests
requests = MockRequests("available_requests.dat", "request_errors.dat")

# Use requests.get as normal
############################################################

In [None]:
api_url = "https://api.yelp.com/v3/businesses/search"

# Get data about NYC cafes from the Yelp API
response = requests.get(api_url, 
                        headers=headers, 
                        params=params)

# Extract JSON data from the response
data = response.json()

# Load data to a data frame
cafes = pd.DataFrame(data["businesses"])

# View the data's dtypes
print(cafes.dtypes)

<p>In this exercise, you'll use <code>requests.get()</code> to query the Yelp Business Search API for cafes in New York City. <code>requests.get()</code> needs a URL to get data from. The Yelp API also needs search parameters and authorization headers passed to the <code>params</code> and <code>headers</code> keyword arguments, respectively.</p>
<p>You'll need to extract the data from the response with its <code>json()</code> method, and pass it to <code>pandas</code>'s <code>DataFrame()</code> function to make a data frame. Note that the necessary data is under the dictionary key <code>"businesses"</code>.</p>
<p><code>pandas</code> (as <code>pd</code>) and <code>requests</code> have been loaded. Authorization data is in the dictionary <code>headers</code>, and the needed API parameters are stored as <code>params</code>.</p>

<ul>
<li>Get data about New York City cafes from the Yelp API (<code>api_url</code>) with <code>requests.get()</code>. The necessary <code>params</code> and <code>headers</code> information has been provided.</li>
<li>Extract the JSON data from the response with its <code>json()</code> method, and assign it to <code>data</code>.</li>
<li>Load the cafe listings to the data frame <code>cafes</code> with <code>pandas</code>'s <code>DataFrame()</code> function. The listings are under the <code>"businesses"</code> key in <code>data</code>.</li>
<li>Print the data frame's <code>dtypes</code> to see what information you're getting.</li>
</ul>

<ul>
<li><code>json()</code> doesn't take any additional arguments.</li>
<li>Remember to preface <code>DataFrame()</code> with its library alias, just as you would with a function like <code>read_json()</code>.</li>
<li><code>DataFrame()</code> takes the data to load as an argument. Check that you're providing just the business data from the response.</li>
<li>The syntax to get a dictionary value by its key is <code>dictionary["key"]</code>.</li>
</ul>
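
<p>The step from parsed response to data frame can be sketched without a network call. The dictionary below is a hypothetical stand-in for what <code>response.json()</code> returns; a real Yelp payload carries many more fields.</p>

```python
import pandas as pd

# Hypothetical stand-in for response.json()
data = {
    "total": 2,
    "businesses": [
        {"name": "Cafe A", "rating": 4.5, "review_count": 210},
        {"name": "Cafe B", "rating": 4.0, "review_count": 87},
    ],
}

# The listings live under "businesses"; other keys describe the search itself
cafes = pd.DataFrame(data["businesses"])
print(cafes.dtypes)
```

<p>Passing all of <code>data</code> instead would mix search metadata with the listings, which is why only the <code>"businesses"</code> value is loaded.</p>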

## Set API parameters

In [None]:
import pandas as pd
#import requests
from urllib.request import urlretrieve
url = "https://assets.datacamp.com/production/repositories/4412/datasets/7b4d21edba07766a12341ed66b99d25670ad432c/yelp_api_key.txt"
urlretrieve(url, "yelp_api_key.txt")

with open("yelp_api_key.txt", "r") as f:
  api_key = f.readlines()[0]

api_url = "https://api.yelp.com/v3/businesses/search"
headers = {"Authorization": "Bearer {}".format(api_key)}


############################################################
# Set up MockRequests
#===========================================================
import urllib.request
from mock_request.requests import MockRequests

# First, download the relevant data files, which contain dictionaries
exp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/6960241321d7d91fbfe16f74d3216b58074427f9/nyc_cafes.pkl"
avr_path = "https://assets.datacamp.com/production/repositories/4412/datasets/67a5ee3539b2916df0e6ebc63216c6f396693137/available_requests.dat"
err_400_path = "https://assets.datacamp.com/production/repositories/4412/datasets/ff9363ccbabbd036db806f7279c4d2e54f0684a8/400_error.pkl"
err_404_path = "https://assets.datacamp.com/production/repositories/4412/datasets/cfc89bc158082f77e30cfa230890bc60b2e68df5/404_error.pkl"
erp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/aaf775650bf3386e156f27c8b0198fcc94b4d9f4/request_errors.dat"

# Download the file from URL path and save it locally
urllib.request.urlretrieve(exp_path, "nyc_cafes.pkl")
urllib.request.urlretrieve(avr_path, "available_requests.dat")
urllib.request.urlretrieve(err_400_path, "400_error.pkl")
urllib.request.urlretrieve(err_404_path, "404_error.pkl")
urllib.request.urlretrieve(erp_path, "request_errors.dat")

# Then, instantiate the MockRequests object and save it as requests
requests = MockRequests("available_requests.dat", "request_errors.dat")

# Use requests.get as normal
############################################################

In [None]:
# Create dictionary to query API for cafes in NYC
parameters = {"term": "cafe",
              "location": "NYC"}

# Query the Yelp API with headers and params set
response = requests.get(api_url, 
                        headers=headers, 
                        params=parameters)

# Extract JSON data from response
data = response.json()

# Load "businesses" values to a data frame and print head
cafes = pd.DataFrame(data["businesses"])
print(cafes.head())

<p>Formatting parameters to get the data you need is an integral part of working with APIs. These parameters can be passed to the <code>get()</code> function's <code>params</code> keyword argument as a dictionary.</p>
<p>The Yelp API requires the <code>location</code> parameter be set. It also lets users supply a <code>term</code> to search for. You'll use these parameters to get data about cafes in NYC, then process the result to create a data frame.</p>
<p><code>pandas</code> (as <code>pd</code>) and <code>requests</code> have been loaded. The API endpoint is stored in the variable <code>api_url</code>. Authorization data is stored in the dictionary <code>headers</code>.</p>

<ul>
<li>Create a dictionary, <code>parameters</code>, with the <code>term</code> and <code>location</code> parameters set to search for <code>"cafe"</code>s in <code>"NYC"</code>.</li>
<li>Query the Yelp API (<code>api_url</code>) with <code>requests</code>'s <code>get()</code> function and the <code>headers</code> and <code>params</code> keyword arguments set. Save the result as <code>response</code>.</li>
<li>Extract the JSON data from <code>response</code> with the appropriate method. Save the result as <code>data</code>.</li>
<li>Load the <code>"businesses"</code> values in <code>data</code> to the data frame <code>cafes</code> and print the head.</li>
</ul>

<ul>
<li>The method to extract JSON data from a response is <code>json()</code>.</li>
<li>Remember that only <code>data["businesses"]</code> should be loaded to the data frame, not all of <code>data</code>.</li>
</ul>
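
<p>To see what a parameters dictionary turns into, the sketch below encodes it by hand with the standard library. HTTP clients such as <code>requests</code> do the equivalent encoding when given <code>params</code>; the variable names <code>query_string</code> and <code>full_url</code> are illustrative.</p>

```python
from urllib.parse import urlencode

api_url = "https://api.yelp.com/v3/businesses/search"
parameters = {"term": "cafe", "location": "NYC"}

# Each key-value pair becomes a name=value pair in the query string
query_string = urlencode(parameters)
full_url = f"{api_url}?{query_string}"
print(full_url)
```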

## Set request headers

In [None]:
import pandas as pd
from urllib.request import urlretrieve
url = "https://assets.datacamp.com/production/repositories/4412/datasets/7b4d21edba07766a12341ed66b99d25670ad432c/yelp_api_key.txt"
urlretrieve(url, "yelp_api_key.txt")

with open("yelp_api_key.txt", "r") as f:
  api_key = f.readlines()[0]

api_url = "https://api.yelp.com/v3/businesses/search"
params = {"term": "cafe",
          "location": "NYC",
          "sort_by": "rating"}

############################################################
# Set up MockRequests
#===========================================================
import urllib.request
from mock_request.requests import MockRequests

# First, download the relevant data files, which contain dictionaries
exp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/ef0fd6cedb81a1c6fa83240ab9337311d31150cc/top_nyc_cafes.pkl"
avr_path = "https://assets.datacamp.com/production/repositories/4412/datasets/67a5ee3539b2916df0e6ebc63216c6f396693137/available_requests.dat"
err_path = "https://assets.datacamp.com/production/repositories/4412/datasets/cfc89bc158082f77e30cfa230890bc60b2e68df5/404_error.pkl"
erp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/aaf775650bf3386e156f27c8b0198fcc94b4d9f4/request_errors.dat"

# Download the file from URL path and save it locally
urllib.request.urlretrieve(exp_path, "top_nyc_cafes.pkl")
urllib.request.urlretrieve(avr_path, "available_requests.dat")
urllib.request.urlretrieve(err_path, "404_error.pkl")
urllib.request.urlretrieve(erp_path, "request_errors.dat")

# Then, instantiate the MockRequests object and save it as requests
requests = MockRequests("available_requests.dat", "request_errors.dat")

# Use requests.get as normal
############################################################

In [None]:
# Create dictionary that passes Authorization and key string
headers = {"Authorization": "Bearer {}".format(api_key)}

# Query the Yelp API with headers and params set
response = requests.get(api_url, 
                        headers=headers, 
                        params=params)

# Extract JSON data from response
data = response.json()

# Load "businesses" values to a data frame and print names
cafes = pd.DataFrame(data["businesses"])
print(cafes.name)

<p>Many APIs require users to provide an API key, obtained by registering for the service. Keys are typically passed in the request header rather than as parameters.</p>
<p>The <a href="https://www.yelp.com/developers/documentation/v3/authentication">Yelp API documentation</a> says "To authenticate API calls with the API Key, set the <code>Authorization</code> HTTP header value as <code>Bearer API_KEY</code>."</p>
<p>You'll set up a dictionary to pass this information to <code>get()</code>, call the API for the highest-rated cafes in NYC, and parse the response.</p>
<p><code>pandas</code> (as <code>pd</code>) and <code>requests</code> have been loaded. The API endpoint is stored as <code>api_url</code>, and the key is <code>api_key</code>. Parameters are in the dictionary <code>params</code>.</p>

<ul>
<li>Create a dictionary, <code>headers</code>, that passes the formatted key string to the <code>"Authorization"</code> header value.</li>
<li>Query the Yelp API (<code>api_url</code>) with <code>get()</code> and the necessary headers and parameters. Save the result as <code>response</code>.</li>
<li>Extract the JSON data from <code>response</code>. Save the result as <code>data</code>.</li>
<li>Load the <code>"businesses"</code> values in <code>data</code> to the data frame <code>cafes</code> and print the <code>name</code> column.</li>
</ul>

<ul>
<li>The string <code>format()</code> method is used to embed variable values, like <code>api_key</code>, in a string.</li>
<li><code>requests.get()</code> should have <code>headers</code> and <code>params</code> arguments specified.</li>
<li>A response's <code>json()</code> method will return just the response data.</li>
</ul>
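
<p>A minimal sketch of building the header, assuming a placeholder key. It also strips whitespace, since a key read from a file often ends with a newline. The standard library's <code>Request</code> object is used here only to inspect the header; nothing is sent.</p>

```python
from urllib.request import Request

# Placeholder key for illustration; a real key comes from registering with the API
api_key = "EXAMPLE_KEY\n"

# Strip stray whitespace, such as a trailing newline read from a file
headers = {"Authorization": "Bearer {}".format(api_key.strip())}

# Build, but do not send, a request to inspect the header it would carry
request = Request("https://api.yelp.com/v3/businesses/search", headers=headers)
print(request.get_header("Authorization"))
```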

## Flatten nested JSONs

In [None]:
import pandas as pd
from urllib.request import urlretrieve
url = "https://assets.datacamp.com/production/repositories/4412/datasets/7b4d21edba07766a12341ed66b99d25670ad432c/yelp_api_key.txt"
urlretrieve(url, "yelp_api_key.txt")

with open("yelp_api_key.txt", "r") as f:
  api_key = f.readlines()[0]

api_url = "https://api.yelp.com/v3/businesses/search"
headers = {"Authorization": "Bearer {}".format(api_key)}
parameters = {"term": "cafe",
              "location": "NYC"}

############################################################
# Set up MockRequests
#===========================================================
import urllib.request
from mock_request.requests import MockRequests

# First, download the relevant data files, which contain dictionaries
exp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/6960241321d7d91fbfe16f74d3216b58074427f9/nyc_cafes.pkl"
avr_path = "https://assets.datacamp.com/production/repositories/4412/datasets/67a5ee3539b2916df0e6ebc63216c6f396693137/available_requests.dat"
err_path = "https://assets.datacamp.com/production/repositories/4412/datasets/cfc89bc158082f77e30cfa230890bc60b2e68df5/404_error.pkl"
erp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/aaf775650bf3386e156f27c8b0198fcc94b4d9f4/request_errors.dat"

# Download the file from URL path and save it locally
urllib.request.urlretrieve(exp_path, "nyc_cafes.pkl")
urllib.request.urlretrieve(avr_path, "available_requests.dat")
urllib.request.urlretrieve(err_path, "404_error.pkl")
urllib.request.urlretrieve(erp_path, "request_errors.dat")

# Then, instantiate the MockRequests object and save it as requests
requests = MockRequests("available_requests.dat", "request_errors.dat")

# Use requests.get as normal
############################################################


response = requests.get(api_url, 
                        headers=headers, 
                        params=parameters)



In [None]:
# Load json_normalize()
from pandas.io.json import json_normalize

# Isolate the JSON data from the API response
data = response.json()

# Flatten business data into a data frame, replace separator
cafes = json_normalize(data["businesses"],
                       sep="_")

# View data
print(cafes.head())

<p>A feature of JSON data is that it can be nested: an attribute's value can consist of attribute-value pairs. This nested data is more useful unpacked, or flattened, into its own data frame columns. The <code>pandas.io.json</code> submodule has a function, <code>json_normalize()</code>, that does exactly this.</p>
<p>The Yelp API response data is nested. Your job is to flatten out the next level of data in the <code>coordinates</code> and <code>location</code> columns.</p>
<p><code>pandas</code> (as <code>pd</code>) and <code>requests</code> have been imported. The results of the API call are stored as <code>response</code>.</p>

<ul>
<li>Load the <code>json_normalize()</code> function from <code>pandas</code>' <code>io.json</code> submodule.</li>
<li>Isolate the JSON data from <code>response</code> and assign it to <code>data</code>.</li>
<li>Use <code>json_normalize()</code> to flatten and load the businesses data to a data frame, <code>cafes</code>. Set the <code>sep</code> argument to use underscores (<code>_</code>), rather than periods.</li>
<li>Print the head of <code>cafes</code>.</li>
</ul>

<ul>
<li>When importing a function from a module, you do not need to include the parentheses in the name.</li>
<li>The method to get JSON data from a <code>response</code> object is <code>json()</code>.</li>
</ul>
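
<p>The flattening step can be sketched on hand-written records shaped like a small slice of a business listing. The values are hypothetical. The sketch calls the function as <code>pd.json_normalize()</code>, the top-level name available since pandas 1.0; the <code>pandas.io.json</code> import path used in this exercise is deprecated in newer releases.</p>

```python
import pandas as pd

# Hypothetical records with one level of nesting
businesses = [
    {"name": "Cafe A",
     "coordinates": {"latitude": 40.73, "longitude": -73.99},
     "location": {"city": "New York", "zip_code": "10003"}},
    {"name": "Cafe B",
     "coordinates": {"latitude": 40.68, "longitude": -73.97},
     "location": {"city": "Brooklyn", "zip_code": "11217"}},
]

# Nested keys become columns named parent + sep + child
flat = pd.json_normalize(businesses, sep="_")
print(flat.columns.tolist())
```

<p>Underscore-separated names such as <code>location_zip_code</code> can be used with attribute access, which the default period separator would not allow.</p>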

## Handle deeply nested data

In [None]:
import pandas as pd
from urllib.request import urlretrieve
from pandas.io.json import json_normalize
url = "https://assets.datacamp.com/production/repositories/4412/datasets/7b4d21edba07766a12341ed66b99d25670ad432c/yelp_api_key.txt"
urlretrieve(url, "yelp_api_key.txt")

with open("yelp_api_key.txt", "r") as f:
  api_key = f.readlines()[0]

# Set variables for API call
api_url = "https://api.yelp.com/v3/businesses/search"
headers = {"Authorization": "Bearer {}".format(api_key)}
parameters = {"term": "cafe",
              "location": "NYC"}

############################################################
# Set up MockRequests
#===========================================================
import urllib.request
from mock_request.requests import MockRequests

# First, download the relevant data files, which contain dictionaries
exp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/6960241321d7d91fbfe16f74d3216b58074427f9/nyc_cafes.pkl"
avr_path = "https://assets.datacamp.com/production/repositories/4412/datasets/67a5ee3539b2916df0e6ebc63216c6f396693137/available_requests.dat"
err_path = "https://assets.datacamp.com/production/repositories/4412/datasets/cfc89bc158082f77e30cfa230890bc60b2e68df5/404_error.pkl"
erp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/aaf775650bf3386e156f27c8b0198fcc94b4d9f4/request_errors.dat"

# Download the file from URL path and save it locally
urllib.request.urlretrieve(exp_path, "nyc_cafes.pkl")
urllib.request.urlretrieve(avr_path, "available_requests.dat")
urllib.request.urlretrieve(err_path, "404_error.pkl")
urllib.request.urlretrieve(erp_path, "request_errors.dat")

# Then, instantiate the MockRequests object and save it as requests
requests = MockRequests("available_requests.dat", "request_errors.dat")

# Use requests.get as normal
############################################################

# Get API data
response = requests.get(api_url, 
                        headers=headers, 
                        params=parameters)
data = response.json()

<p>In the last exercise, you flattened data nested one level deep. Here, you'll unpack more deeply nested data.</p>
<p>The <code>categories</code> attribute in the Yelp API response contains lists of objects. To flatten this data, you'll employ <code>json_normalize()</code> arguments to specify the path to <code>categories</code> and pick other attributes to include in the data frame. You should also change the separator to facilitate column selection and prefix the other attributes to prevent column name collisions. We'll work through this in steps.</p>
<p><code>pandas</code> (as <code>pd</code>) and <code>json_normalize()</code> have been imported. JSON-formatted Yelp data on cafes in NYC is stored as <code>data</code>.</p>
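
<p>One way the arguments described above fit together, sketched on hypothetical records. The prefix <code>biz_</code> and the chosen meta fields are illustrative choices, and the call uses the top-level <code>pd.json_normalize()</code> available since pandas 1.0.</p>

```python
import pandas as pd

# Hypothetical records: each business holds a list of category objects
businesses = [
    {"name": "Cafe A", "alias": "cafe-a", "rating": 4.5,
     "categories": [{"alias": "coffee", "title": "Coffee & Tea"},
                    {"alias": "bakeries", "title": "Bakeries"}]},
    {"name": "Cafe B", "alias": "cafe-b", "rating": 4.0,
     "categories": [{"alias": "cafes", "title": "Cafes"}]},
]

# One row per category; business-level fields are repeated and prefixed
flat_cafes = pd.json_normalize(
    businesses,
    sep="_",
    record_path="categories",
    meta=["name", "alias", "rating"],
    meta_prefix="biz_",
)
print(flat_cafes)
```

<p>Both the business and its categories have an <code>alias</code> attribute here, so the prefix is what keeps the two columns distinct.</p>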

## Append data frames

In [None]:
import pandas as pd
from pandas.io.json import json_normalize

from urllib.request import urlretrieve
url = 'https://assets.datacamp.com/production/repositories/4412/datasets/7b4d21edba07766a12341ed66b99d25670ad432c/yelp_api_key.txt'
urlretrieve(url, 'yelp_api_key.txt')


with open('yelp_api_key.txt', 'r') as f:
  api_key = f.readlines()[0]
api_url = 'https://api.yelp.com/v3/businesses/search'
headers = {'Authorization': 'Bearer %s' % api_key}

top_50_cafes = pd.read_json('https://assets.datacamp.com/production/repositories/4412/datasets/7b31d59aecf48bff27dd1743f6de83147fc3f23b/top_50_cafes.json')

############################################################
# Set up MockRequests
#===========================================================
import urllib.request
from mock_request.requests import MockRequests

# First, download the relevant data files, which contain dictionaries
exp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/1dd811c7b392767be29ca33877113f976afe3053/next_50_cafes.pkl"
avr_path = "https://assets.datacamp.com/production/repositories/4412/datasets/67a5ee3539b2916df0e6ebc63216c6f396693137/available_requests.dat"
err_path = "https://assets.datacamp.com/production/repositories/4412/datasets/cfc89bc158082f77e30cfa230890bc60b2e68df5/404_error.pkl"
erp_path = "https://assets.datacamp.com/production/repositories/4412/datasets/aaf775650bf3386e156f27c8b0198fcc94b4d9f4/request_errors.dat"

# Download the file from URL path and save it locally
urllib.request.urlretrieve(exp_path, "next_50_cafes.pkl")
urllib.request.urlretrieve(avr_path, "available_requests.dat")
urllib.request.urlretrieve(err_path, "404_error.pkl")
urllib.request.urlretrieve(erp_path, "request_errors.dat")

# Then, instantiate the MockRequests object and save it as requests
requests = MockRequests("available_requests.dat", "request_errors.dat")

# Use requests.get as normal
############################################################

In [None]:
# Add an offset parameter to get cafes 51-100
params = {"term": "cafe", 
          "location": "NYC",
          "sort_by": "rating", 
          "limit": 50,
          "offset": 50}

result = requests.get(api_url, headers=headers, params=params)
next_50_cafes = json_normalize(result.json()["businesses"])

# Append the results, setting ignore_index to renumber rows
cafes = top_50_cafes.append(next_50_cafes, ignore_index=True)

# Print shape of cafes
print(cafes.shape)

<p>In this exercise, you’ll practice appending records by creating a dataset of the 100 highest-rated cafes in New York City according to Yelp.</p>
<p>APIs often limit the amount of data returned, since sending large datasets can be time- and resource-intensive. The Yelp Business Search API limits the results returned in a call to 50 records. However, the <code>offset</code> parameter lets a user retrieve results starting after a specified number. By modifying the offset, we can get results 1-50 in one call and 51-100 in another. Then, we can append the data frames.</p>
<p><code>pandas</code> (as <code>pd</code>), <code>requests</code>, and <code>json_normalize()</code> have been imported. The 50 top-rated cafes are already in a data frame, <code>top_50_cafes</code>.</p>

<ul>
<li>Add an <code>"offset"</code> parameter to <code>params</code> so that the Yelp API call will get cafes 51-100.</li>
<li>Append the results of the API call to <code>top_50_cafes</code>, setting <code>ignore_index</code> so rows will be renumbered.</li>
<li>Print the shape of the resulting data frame, <code>cafes</code>, to confirm there are 100 records.</li>
</ul>

<ul>
<li>If using an offset of 0 gets results 1-50, what offset will get results starting at 51?</li>
<li>Recall that <code>append()</code> is a DataFrame method like <code>head()</code>, not a function like <code>read_json()</code>.</li>
<li><code>ignore_index</code> takes <code>True</code>/<code>False</code> as an argument. Did you set the right value?</li>
</ul>
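
<p>The <code>DataFrame.append()</code> method was deprecated in pandas 1.4 and removed in pandas 2.0. On newer versions, the same stacking can be sketched with <code>pd.concat()</code>. The two small frames below are hypothetical stand-ins for two pages of API results.</p>

```python
import pandas as pd

# Hypothetical stand-ins for two pages of API results
first_page = pd.DataFrame({"name": ["Cafe A", "Cafe B"], "rating": [5.0, 4.5]})
second_page = pd.DataFrame({"name": ["Cafe C", "Cafe D"], "rating": [4.5, 4.0]})

# ignore_index=True renumbers rows from 0 instead of repeating 0, 1
all_cafes = pd.concat([first_page, second_page], ignore_index=True)
print(all_cafes.shape)
```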

## Merge data frames

In [None]:
import pandas as pd
from urllib.request import urlretrieve

crosswalk_url = 'https://assets.datacamp.com/production/repositories/4412/datasets/ef15460ec3234617ebebb227119282bd44d49c8c/zip_to_puma.csv'
pop_data_url = 'https://assets.datacamp.com/production/repositories/4412/datasets/e2a8074a29a987eacfe9ee49043094eefb909fcc/2016acs5yr_puma.xlsx'
cafes_url = 'https://assets.datacamp.com/production/repositories/4412/datasets/2c7576fc1a19d7a62df027a8ed662b0db9b7f16d/cafes.json'
urlretrieve(crosswalk_url, 'zip_to_puma.csv')
urlretrieve(pop_data_url, 'pop_data.xlsx')
urlretrieve(cafes_url, 'cafes.json')

crosswalk = pd.read_csv('zip_to_puma.csv', dtype={'zipcode':'str', 'puma':'str'})
pop_data = pd.read_excel('pop_data.xlsx', dtype={'puma': 'str'})
cafes = pd.read_json('cafes.json',orient='records', dtype={'location_zip_code':'str'}, precise_float=True)

<p>In the last exercise, you built a dataset of the top 100 cafes in New York City according to Yelp. Now, you'll combine that with demographic data to investigate which neighborhood has the most good cafes per capita.</p>
<p>To do this, you'll merge two datasets into <code>cafes</code> with the DataFrame <code>merge()</code> method. The first, <code>crosswalk</code>, is a crosswalk between ZIP codes and Public Use Microdata Areas (PUMAs), which are aggregates of census tracts and correspond roughly to NYC neighborhoods. Then, you'll merge in <code>pop_data</code>, which contains 2016 population estimates for each PUMA.</p>
<p><code>pandas</code> (as <code>pd</code>) has been imported, as has the <code>cafes</code> data frame from last exercise.</p>
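
<p>A minimal sketch of the two merges, assuming miniature hand-made versions of the three tables. The key column names follow the loading code above; the <code>total_pop</code> column and all values are hypothetical.</p>

```python
import pandas as pd

# Hypothetical miniature versions of the three tables
cafes = pd.DataFrame({"name": ["Cafe A", "Cafe B", "Cafe C"],
                      "location_zip_code": ["10003", "10003", "11217"]})
crosswalk = pd.DataFrame({"zipcode": ["10003", "11217"],
                          "puma": ["3809", "4005"]})
pop_data = pd.DataFrame({"puma": ["3809", "4005"],
                         "total_pop": [150000, 130000]})

# The key columns have different names, so name each side explicitly
cafes_with_pumas = cafes.merge(crosswalk,
                               left_on="location_zip_code",
                               right_on="zipcode")

# Both frames share the "puma" column, so on= is enough
cafes_with_pop = cafes_with_pumas.merge(pop_data, on="puma")
print(cafes_with_pop)
```

<p>Keeping the ZIP and PUMA codes as strings on both sides matters, since a merge only matches keys whose values and types agree.</p>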