# Scraping Data from REST APIs

Many websites make their data readily available using a REST API. Instead of scraping the HTML from a webpage,

- you specify the data that you want by issuing an HTTP request to a URL (called an **endpoint**)
- the server responds with the data in a convenient format (called a **representation**), such as JSON or XML
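
As a rough illustration of these two ideas, the sketch below builds an endpoint URL with a query string and parses a JSON representation, using only the standard library. The domain `api.example.com`, the path, and the sample payload are made up for this example, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the domain and path are placeholders.
domain = "https://api.example.com"
endpoint = "recipes/list"
params = {"q": "daikon", "size": 2}

# The request URL is the endpoint plus an encoded query string.
url = f"{domain}/{endpoint}?{urlencode(params)}"
print(url)

# A made-up JSON representation, as a server might return it.
body = '{"count": 2, "results": [{"name": "Pho"}, {"name": "Kimchi"}]}'

# Parsing turns the JSON text into ordinary Python dicts and lists.
data = json.loads(body)
names = [recipe["name"] for recipe in data["results"]]
print(data["count"], names)
```

A library such as `requests` performs both steps for you: it encodes the `params` into the URL and exposes the parsed body through `response.json()`.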

To see how this works, we are going to practice with the [REST API](https://rapidapi.com/apidojo/api/tasty) for Tasty, a recipe website.

## Preparation (before section)

You will have time to work on these exercises in section. All you need to do before section is:

1. [create an account](https://rapidapi.com/apidojo/api/tasty) and subscribe to the "Basic" (free) plan
2. log in and copy the X-RapidAPI-Key, which is a long string of letters and digits
3. paste this key in the `headers` below

If you did everything correctly, then running the cell below should return a JSON object containing all the tags recognized by the Tasty API.

In [None]:
import requests

domain = "https://tasty.p.rapidapi.com"
endpoint = "tags/list"
url = f"{domain}/{endpoint}"

# TODO: Update the `headers` with your X-RapidAPI-Key.
headers = {
    "X-RapidAPI-Key": "MY_API_KEY",
    "X-RapidAPI-Host": "tasty.p.rapidapi.com"
}

# Make an HTTP request to the REST API, get the JSON response.
response = requests.get(url, headers=headers)
response.json()

{'count': 556,
 'results': [{'display_name': 'German',
   'id': 64450,
   'name': 'german',
   'parent_tag_name': 'european',
   'root_tag_type': 'cuisine',
   'type': 'european'},
  {'display_name': 'Indian',
   'id': 64452,
   'name': 'indian',
   'parent_tag_name': 'asian',
   'root_tag_type': 'cuisine',
   'type': 'asian'},
  {'display_name': 'Japanese',
   'id': 64454,
   'name': 'japanese',
   'parent_tag_name': 'asian',
   'root_tag_type': 'cuisine',
   'type': 'asian'},
  {'display_name': 'Central & South American',
   'id': 64456,
   'name': 'central_south_american',
   'parent_tag_name': 'cuisine',
   'root_tag_type': 'cuisine',
   'type': 'cuisine'},
  {'display_name': 'Dairy-Free',
   'id': 64463,
   'name': 'dairy_free',
   'parent_tag_name': 'dietary',
   'root_tag_type': 'dietary',
   'type': 'dietary'},
  {'display_name': 'Vegan',
   'id': 64468,
   'name': 'vegan',
   'parent_tag_name': 'dietary',
   'root_tag_type': 'dietary',
   'type': 'dietary'},
  {'display_name':

You will need to pass these `headers` with every HTTP request to the API. The API key (X-RapidAPI-Key) is how the server keeps track of how many requests you have made.
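
Since the same `headers` accompany every request, it may be convenient to build them in one place and keep the key out of the notebook itself. The helper below is a minimal sketch under assumptions: `build_headers` and the environment variable name `RAPIDAPI_KEY` are hypothetical names chosen for this example, not part of the Tasty API or RapidAPI.

```python
import os


def build_headers(host="tasty.p.rapidapi.com", env_var="RAPIDAPI_KEY"):
    """Return the RapidAPI headers, reading the key from an environment variable."""
    # RAPIDAPI_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": host}
```

After setting the variable in your shell, `headers = build_headers()` gives a dictionary that can be passed as `headers=headers` to each `requests.get` call.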

Please have a look at [the documentation](https://rapidapi.com/apidojo/api/tasty). Click on some of the different endpoints. Notice that this brings up a form that you can fill in, which generates the corresponding code. (By default, it's set to Node.js. You should set it to Python.)

## Question 1

You have a daikon (an Asian radish) in your fridge and would like to find recipes that use it.

Which endpoint would you use? Issue an HTTP request to the endpoint, and convert the JSON response to a `DataFrame` using `json_normalize`.

In [None]:
import requests

domain = "https://tasty.p.rapidapi.com"
endpoint = "recipes/list"
url = f"{domain}/{endpoint}"

# TODO: Update the `headers` with your X-RapidAPI-Key.
headers = {
    "X-RapidAPI-Key": "MY_API_KEY",
    "X-RapidAPI-Host": "tasty.p.rapidapi.com"
}

query_string = {"from":"0","size":"40","q":"daikon"}

response = requests.get(url, headers=headers, params=query_string)

# save the JSON response to a file (the data/ directory must already exist)
with open("data/daikon_recipes.json", "w") as f:
    f.write(response.text)

In [3]:
import json

with open("data/daikon_recipes.json", "r") as f:
    daikon_recipes = json.load(f)

# check the keys of the JSON response
daikon_recipes.keys()

dict_keys(['count', 'results'])

In [4]:
# check the keys of the first recipe
daikon_recipes["results"][0].keys()

dict_keys(['approved_at', 'aspect_ratio', 'beauty_url', 'brand', 'brand_id', 'buzz_id', 'canonical_id', 'compilations', 'cook_time_minutes', 'country', 'created_at', 'credits', 'description', 'draft_status', 'facebook_posts', 'id', 'inspired_by_url', 'instructions', 'is_app_only', 'is_one_top', 'is_shoppable', 'is_subscriber_content', 'keywords', 'language', 'name', 'num_servings', 'nutrition', 'nutrition_visibility', 'original_video_url', 'prep_time_minutes', 'price', 'promotion', 'renditions', 'sections', 'seo_path', 'seo_title', 'servings_noun_plural', 'servings_noun_singular', 'show', 'show_id', 'slug', 'tags', 'thumbnail_alt_text', 'thumbnail_url', 'tips_and_ratings_enabled', 'tips_summary', 'topics', 'total_time_minutes', 'total_time_tier', 'updated_at', 'user_ratings', 'video_ad_content', 'video_id', 'video_url', 'yields'])

In [5]:
import pandas as pd

# convert the list of recipes to a DataFrame
df_daikon_recipes = pd.json_normalize(daikon_recipes['results'])
df_daikon_recipes[['name', 'description', 'tags', 'nutrition.calories', 'nutrition.fat', 'nutrition.protein']]

Unnamed: 0,name,description,tags,nutrition.calories,nutrition.fat,nutrition.protein
0,How To Make Vegan Pho,Warm up your soul with this aromatic Vegan Pho...,"[{'display_name': 'Vietnamese', 'id': 64461, '...",179.0,5.0,9.0
1,Banh Mi Meatball Sandwich,This Vietnamese-inspired sandwich features ten...,"[{'display_name': 'Vietnamese', 'id': 64461, '...",462.0,26.0,31.0
2,Instant Pot Beef Bulgogi,Choose your own dinner adventure with this bee...,"[{'display_name': 'Korean', 'id': 64455, 'name...",1137.0,70.0,76.0
3,How To Make Vegan Kimchi,Unleash your inner fermentation guru with this...,"[{'display_name': 'Korean', 'id': 64455, 'name...",170.0,0.0,5.0
4,Low-Carb Pad Thai,Experience the flavors of Thailand with this L...,"[{'display_name': 'Thai', 'id': 64460, 'name':...",310.0,16.0,23.0
5,Top Chef Junior Pork Bánh Mì Burger,Top Chef Junior pork banh mi burger is a fusio...,"[{'display_name': 'Vietnamese', 'id': 64461, '...",905.0,48.0,41.0
6,Vegan Tofu Bao Buns With Pickled Vegetables,Love vegetables and a satisfying challenge?! T...,"[{'display_name': 'Chinese', 'id': 64448, 'nam...",320.0,9.0,8.0
7,Bibimbap By Chef Esther Choi,This bibimbap recipe by Chef Esther Choi is a ...,"[{'display_name': 'Korean', 'id': 64455, 'name...",1382.0,51.0,40.0
8,Japanese Omelette,"If you’ve never tried a Japanese omelette, kno...","[{'display_name': 'Japanese', 'id': 64454, 'na...",412.0,22.0,28.0
9,Grilled Lemongrass Pork Bánh Mì,This recipe makes it easy to recreate a classi...,"[{'display_name': 'Vietnamese', 'id': 64461, '...",1224.0,80.0,63.0


## Question 2

Find recipes containing avocado. Among recipes with over 500 reviews, which one had the highest score (proportion of positive reviews)?

There are hundreds of results, but the API returns only 20 results by default (and at most 40, even if you specify a larger value for the `size=` parameter). You will need to use a loop, incrementing the `from=` parameter on each request. Be sure to respect the API's rate limits, or you may be blocked!
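
One way to structure such a loop is sketched below. It is kept independent of the network so the paging logic can be checked on its own: `fetch_all`, `fetch_page`, and `fake_fetch_page` are hypothetical names for this example, and the one-second default pause is an assumed value, not a documented rate limit.

```python
import time


def fetch_all(fetch_page, size=40, pause=1.0):
    """Collect every result by advancing the offset one page at a time.

    fetch_page(offset, size) must return a dict with "count" and "results".
    """
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, size)
        results.extend(page["results"])
        offset += size
        # Stop once we have passed the reported total or the page came back empty.
        if offset >= page.get("count", 0) or not page["results"]:
            break
        # Wait before the next request; the right pause depends on your plan.
        time.sleep(pause)
    return results


# Stand-in for the real API: 95 fake recipes served one page at a time.
def fake_fetch_page(offset, size):
    items = [{"id": i} for i in range(95)]
    return {"count": len(items), "results": items[offset:offset + size]}


recipes = fetch_all(fake_fetch_page, size=40, pause=0)
print(len(recipes))
```

With the real API, `fetch_page` would wrap a `requests.get` call that passes `{"from": offset, "size": size, "q": "avocado"}` as `params` and returns `response.json()`.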

In [None]:
import json
import time

import pandas as pd
import requests

domain = "https://tasty.p.rapidapi.com"
endpoint = "recipes/list"
url = f"{domain}/{endpoint}"

# TODO: Update the `headers` with your X-RapidAPI-Key.
headers = {
    "X-RapidAPI-Key": "MY_API_KEY",
    "X-RapidAPI-Host": "tasty.p.rapidapi.com"
}

offset = 0
size = 40  # the API returns at most 40 results per request
query_string = {"from": offset, "size": size, "q": "avocado"}

# get the total result count from the first query
response = requests.get(url, headers=headers, params=query_string)
result_count = response.json().get("count", 0)

# loop through the results, 40 at a time
all_avocado_recipes = pd.DataFrame()
while offset < result_count:
    query_string = {"from": offset, "size": size, "q": "avocado"}
    response = requests.get(url, headers=headers, params=query_string)
    # write each response to a separate file
    with open(f"data/avocado_recipes_{offset}.json", "w") as f:
        f.write(response.text)
    # read the file back and append its recipes to the DataFrame
    with open(f"data/avocado_recipes_{offset}.json", "r") as f:
        avocado_recipes = json.load(f)
    avocado_recipes_df = pd.json_normalize(avocado_recipes['results'])
    all_avocado_recipes = pd.concat([all_avocado_recipes, avocado_recipes_df], ignore_index=True)
    offset += size
    # pause between requests to respect the rate limit (1 second is an assumed value)
    time.sleep(1)



In [7]:
all_avocado_recipes

Unnamed: 0,approved_at,aspect_ratio,beauty_url,brand,brand_id,buzz_id,canonical_id,compilations,cook_time_minutes,country,...,user_ratings.score,tips_summary.by_line,tips_summary.content,tips_summary.header,total_time_tier,brand.id,brand.image_url,brand.name,brand.slug,recipes
0,1553195044,1:1,,,,,recipe:4704,"[{'approved_at': 1553197578, 'aspect_ratio': '...",0.0,US,...,0.918605,,,,,,,,,
1,1498084698,1:1,,,,4743339.0,recipe:449,"[{'approved_at': 1516131836, 'aspect_ratio': N...",4.0,US,...,0.987034,Powered By Botatouille,• Reduce the amount of salt in the avocado sal...,Highlights,,,,,,
2,1495581883,1:1,,,,,recipe:56,"[{'approved_at': 1541095392, 'aspect_ratio': '...",15.0,US,...,0.896691,Powered By Botatouille,• Add more salt for a tastier dish 🧂\n• Double...,Highlights,,,,,,
3,1501004125,1:1,,,,,recipe:1340,"[{'approved_at': 1501004143, 'aspect_ratio': '...",12.0,ZZ,...,0.980076,Powered By Botatouille,• Reduce the salt for a more balanced flavor 🤤...,Highlights,,,,,,
4,1497032696,1:1,,,,,recipe:657,"[{'approved_at': 1552488567, 'aspect_ratio': '...",12.0,ZZ,...,0.934959,Powered By Botatouille,• Reduce the amount of chili and cayenne peppe...,Highlights,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
470,1601148018,16:9,,,,,recipe:6625,[],25.0,US,...,,,,,,,,,,
471,1623943193,16:9,,,,,recipe:7430,[],30.0,US,...,0.000000,,,,,,,,,
472,1598467230,16:9,,,,,recipe:6487,[],17.0,US,...,,,,,,,,,,
473,1596675129,16:9,,,,,recipe:6455,[],20.0,US,...,,,,,,,,,,


In [8]:
all_avocado_recipes.columns

Index(['approved_at', 'aspect_ratio', 'beauty_url', 'brand', 'brand_id',
       'buzz_id', 'canonical_id', 'compilations', 'cook_time_minutes',
       'country', 'created_at', 'credits', 'description', 'draft_status',
       'facebook_posts', 'id', 'inspired_by_url', 'instructions',
       'is_app_only', 'is_one_top', 'is_shoppable', 'is_subscriber_content',
       'keywords', 'language', 'name', 'num_servings', 'nutrition_visibility',
       'original_video_url', 'prep_time_minutes', 'promotion', 'renditions',
       'sections', 'seo_path', 'seo_title', 'servings_noun_plural',
       'servings_noun_singular', 'show_id', 'slug', 'tags',
       'thumbnail_alt_text', 'thumbnail_url', 'tips_and_ratings_enabled',
       'topics', 'total_time_minutes', 'updated_at', 'video_ad_content',
       'video_id', 'video_url', 'yields', 'nutrition.calories',
       'nutrition.carbohydrates', 'nutrition.fat', 'nutrition.fiber',
       'nutrition.protein', 'nutrition.sugar', 'nutrition.updated_at',
   

In [9]:
# print the 10 best avocado recipes among those with more than 500 ratings
n_ratings = all_avocado_recipes['user_ratings.count_positive'] + all_avocado_recipes['user_ratings.count_negative']
rating_over_500 = all_avocado_recipes[n_ratings > 500]
columns = ['name', 'description', 'user_ratings.count_positive', 'user_ratings.count_negative', 'user_ratings.score']
best_avocado_recipes = rating_over_500[columns].sort_values(by='user_ratings.score', ascending=False).head(10)
best_avocado_recipes

Unnamed: 0,name,description,user_ratings.count_positive,user_ratings.count_negative,user_ratings.score
130,Weekday Meal-Prep Turkey Taco Bowls,Say hello to your new favorite lunchtime savio...,1361.0,17.0,0.987663
1,Grilled Salmon With Avocado Salsa,This grilled salmon with avocado salsa is a fr...,1827.0,24.0,0.987034
74,Chicken Enchilada-Stuffed Zucchini Boats,These chicken enchilada stuffed zucchini boats...,763.0,12.0,0.984516
161,Carne Asada by Gabriel Barajas (aka Mr. TacosWay),These carne asada tacos are a true showstopper...,869.0,17.0,0.980813
129,Honey Mustard Chicken Salad,Serve up this salad for a fresh and satisfying...,1395.0,28.0,0.980323
3,Avocado Lime Salmon,This avocado lime salmon is a healthy and flav...,2066.0,42.0,0.980076
138,One-Pan Southwestern Chicken Quinoa,This one-pan wonder is a flavor explosion! Loa...,1029.0,21.0,0.98
81,Chopped Mediterranean Salad,"Bursting with vibrant flavors, this Chopped Me...",1418.0,29.0,0.979959
190,Pulled Pork Nachos,These crowd-pleasing nachos feature the most f...,516.0,11.0,0.979127
133,Chicken Tortilla Soup,Dive into a mouthwatering medley of shredded c...,953.0,21.0,0.978439
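
The `user_ratings.score` column appears to be the fraction of ratings that are positive. As a quick sanity check, the sketch below recomputes it from the counts in the first two rows of the table above; `positive_share` is a hypothetical helper name used only here.

```python
def positive_share(count_positive, count_negative):
    """Proportion of positive ratings among all ratings."""
    total = count_positive + count_negative
    return count_positive / total if total else float("nan")


# Counts copied from the first two rows of the table above.
print(round(positive_share(1361, 17), 6))  # 0.987663
print(round(positive_share(1827, 24), 6))  # 0.987034
```

Both values match the reported scores, which supports reading the score as positive ratings divided by total ratings.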


## Question 3

Take the avocado JSON data from above (you do not need to read in the data from the REST API again). How many recipes are vegetarian? You should be able to identify this from the "tags" attribute.

_Hint:_ Try flattening the data so that there is one row for each tag.

In [10]:
all_avocado_recipes["tags"]

0      [{'display_name': 'North American', 'id': 6444...
1      [{'display_name': 'North American', 'id': 6444...
2      [{'display_name': 'North American', 'id': 6444...
3      [{'display_name': 'North American', 'id': 6444...
4      [{'display_name': 'North American', 'id': 6444...
                             ...                        
470    [{'display_name': 'North American', 'id': 6444...
471    [{'display_name': 'North American', 'id': 6444...
472    [{'display_name': 'Indian', 'id': 64452, 'name...
473    [{'display_name': 'Indian', 'id': 64452, 'name...
474    [{'display_name': 'Italian', 'id': 64453, 'nam...
Name: tags, Length: 475, dtype: object

In [11]:
# Flatten the tags column for each recipe
avocado_recipes_exploded = all_avocado_recipes.explode('tags').reset_index(drop=True)
avocado_recipes_exploded['tags']

0        {'display_name': 'North American', 'id': 64444...
1        {'display_name': 'Comfort Food', 'id': 64462, ...
2        {'display_name': 'Gluten-Free', 'id': 64465, '...
3        {'display_name': 'Healthy', 'id': 64466, 'name...
4        {'display_name': 'Low-Carb', 'id': 64467, 'nam...
                               ...                        
12746    {'display_name': 'Cooking Style', 'id': 929581...
12747    {'display_name': 'Meal', 'id': 9295813, 'name'...
12748    {'display_name': 'Chicken', 'id': 9299514, 'na...
12749    {'display_name': 'Pasta', 'id': 9299522, 'name...
12750    {'display_name': 'Dairy', 'id': 10623608, 'nam...
Name: tags, Length: 12751, dtype: object

In [12]:
# normalize the tags column
avocado_recipes_tags = pd.json_normalize(avocado_recipes_exploded['tags'])
# add a prefix to the column names to avoid name clashes
avocado_recipes_tags = avocado_recipes_tags.add_prefix('tag_')
avocado_recipes_tags

Unnamed: 0,tag_display_name,tag_id,tag_name,tag_parent_tag_name,tag_root_tag_type,tag_type
0,North American,64444.0,north_american,cuisine,cuisine,cuisine
1,Comfort Food,64462.0,comfort_food,cooking_style,cooking_style,cooking_style
2,Gluten-Free,64465.0,gluten_free,dietary,dietary,dietary
3,Healthy,64466.0,healthy,,healthy,healthy
4,Low-Carb,64467.0,low_carb,healthy,healthy,healthy
...,...,...,...,...,...,...
12746,Cooking Style,9295810.0,cooking_style,,cooking_style,cooking_style
12747,Meal,9295813.0,meal,,meal,meal
12748,Chicken,9299514.0,chicken,dinner,meal,dinner
12749,Pasta,9299522.0,pasta,dinner,meal,dinner


In [13]:
# concatenate with the original DataFrame
avocado_recipes_with_tags = pd.concat([avocado_recipes_exploded, avocado_recipes_tags], axis=1).drop(columns=['tags'])

In [14]:
# get the vegetarian recipes
vegetarian_recipes = avocado_recipes_with_tags[avocado_recipes_with_tags['tag_name'] == 'vegetarian']
vegetarian_recipes[['name', 'description', 'tag_name']]

Unnamed: 0,name,description,tag_name
172,Chocolate Avocado Brownies,These chocolate avocado brownies are the perfe...,vegetarian
266,"Cucumber, Tomato, And Avocado Salad",,vegetarian
321,"Egg, Avocado, & Tomato Toast",The Egg Avocado Tomato Toast is a delicious an...,vegetarian
346,Avocado Quinoa Power Salad,Fuel your day with this vibrant Avocado Quinoa...,vegetarian
446,Roasted Chickpea And Avocado Salad,This roasted chickpea and avocado salad is a h...,vegetarian
...,...,...,...
12566,Chickpea And Veggie Avocado Wrap,Try these healthy and delicious veggie wraps m...,vegetarian
12617,Easy Vegan Asian Tofu Nourish Bowl,Savor the vibrant flavors of this Easy Vegan A...,vegetarian
12637,Sheet Pan Sweet Potato Tacos,,vegetarian
12686,Instant Pot Pav Bhaji,,vegetarian
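
The table lists the matching rows, but the question asks for a count. With the DataFrame above, something like `vegetarian_recipes['id'].nunique()` should give it, assuming `id` uniquely identifies a recipe. The same flatten-then-count idea is sketched below in plain Python on a few made-up records that mimic the shape of the API results.

```python
# Tiny made-up records that mimic the shape of the API results.
recipes = [
    {"id": 1, "name": "Avocado Toast", "tags": [{"name": "vegetarian"}, {"name": "breakfast"}]},
    {"id": 2, "name": "Chicken Salad", "tags": [{"name": "lunch"}]},
    {"id": 3, "name": "Guacamole", "tags": [{"name": "vegetarian"}, {"name": "vegan"}]},
]

# Flatten to one (recipe id, tag name) pair per tag.
pairs = [(r["id"], t["name"]) for r in recipes for t in r["tags"]]

# Count distinct recipes carrying the tag, so duplicates are not double-counted.
vegetarian_ids = {recipe_id for recipe_id, tag in pairs if tag == "vegetarian"}
print(len(vegetarian_ids))  # 2
```

Counting distinct ids rather than rows guards against a recipe being counted twice if it appears more than once in the paged results.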


Now you can use the Tasty API to help you decide what's for dinner!