<a href="https://colab.research.google.com/github/JyotiKuber/CodeDivision-Python/blob/main/Copy_of_Working_with_JSON_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting data from APIs and working with the JSON format and Python dictionaries
---
\
Often, the data we need to work with is updated at regular intervals. If this is the case, then we need to be able to access it, process it and store it repeatedly. This type of data often comes from APIs in a 'raw' form, not organised into tables ready for a spreadsheet or a pandas dataframe.

Data requested from an API often arrives in JSON format. JSON is a structured text format that closely resembles a Python dictionary. Because it is plain text, it is relatively compact, which makes it economical and fast to transfer.
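To see how closely JSON text maps onto a Python dictionary, here is a small sketch using the standard `json` module. The JSON string is a hand-trimmed version of the Buzz record shown in Exercise 1. In the exercises you will not need to call `json.loads` yourself, because the `.json()` method of a `requests` response does this parsing step for you.

```python
import json

# A small JSON document, as it might arrive from an API: it is just text
json_text = '{"name": "Buzz", "abv": 4.5, "food_pairing": ["Caramel toffee cake"]}'

# json.loads parses the text into ordinary Python objects (dict, list, float, str)
beer = json.loads(json_text)
print(type(beer))               # <class 'dict'>
print(beer["abv"])              # 4.5
print(beer["food_pairing"][0])  # Caramel toffee cake

# json.dumps turns the dictionary back into text for sending or storing
print(json.dumps(beer))
```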

The exercises below will get you to look at the data requested from an API (the Punk API, a free catalogue of beer recipes), turn it into Python dictionaries, and manipulate it to get useful information from it.

Towards the end you will turn some of the data into a pandas dataframe. Converting data into table format is often the goal of processing, because tables are what you need for analysis in a spreadsheet or in pandas.

The pandas library provides some useful functions for converting dictionary data (created from the JSON data) into a data table, and for expanding it to increase the number of records (sometimes creating repetition), so that you can perform common operations, such as averages, linear regressions and categorisation, where appropriate to the data.

Two useful functions for this worksheet are:

`df = pd.DataFrame(json_data)` 

This will take data in dictionary format and create a table whose columns come from the top-level keys of each record (you will see this when you get to Exercises 8 and 9). When working with JSON data that has been converted to dictionaries, you should always inspect the data first to see how it is organised, which parts of it are useful and which can be discarded. You can then create the dataframe from the useful data, where it makes sense as a table.
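As a small sketch of the two dictionary layouts `pd.DataFrame` accepts: a list of dictionaries (the shape of `beer_data`) gives one row per dictionary, while a dictionary of lists gives one column per top-level key. The two records below reuse the names and ABV values printed in Exercise 3; everything else is illustrative.

```python
import pandas as pd

# A list of dictionaries (like beer_data): each dictionary becomes a row,
# and the dictionary keys become the column names
records = [
    {"name": "Buzz", "abv": 4.5},
    {"name": "Trashy Blonde", "abv": 4.1},
]
rows_df = pd.DataFrame(records)
print(rows_df.columns.tolist())   # ['name', 'abv']

# A dictionary of lists: each top-level key becomes a column
columns = {"name": ["Buzz", "Trashy Blonde"], "abv": [4.5, 4.1]}
cols_df = pd.DataFrame(columns)

# Both layouts describe the same table
print(rows_df.equals(cols_df))    # True
```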

**Run the code** below to get an idea of how a dataframe with a list column might look.

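```python
import pandas as pd
json_data_dict = {"furniture":[{"item":"sofa", "colours":["red", "green","lilac"]},{"item":"armchair","colours":["black","beige","green"]}]}
# Each dictionary in the "furniture" list becomes one row
df = pd.DataFrame(json_data_dict["furniture"])
display(df)  # display() is available by default in Jupyter and Colab notebooks
```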

|   | item | colours |
|---|------|---------|
| 0 | sofa | [red, green, lilac] |
| 1 | armchair | [black, beige, green] |


We can expand (or explode) this dataframe so that it has one row for each item in the `colours` lists.

`df = df.explode(column_name)`

This will take a dataframe with a column that contains lists of items (often the case where data is from a JSON source) and will create a row for each item in the list, containing all the other data that was associated with that list.  So the example dataframe will be exploded to a dataframe with 6 rows instead of 2, but with a fair amount of repetition.

**Run the code** below to see how the dataframe would look if the colours column is exploded into separate rows for each item in the list.

```python
exploded_df = df.explode("colours")
display(exploded_df)
```

|   | item | colours |
|---|------|---------|
| 0 | sofa | red |
| 0 | sofa | green |
| 0 | sofa | lilac |
| 1 | armchair | black |
| 1 | armchair | beige |
| 1 | armchair | green |


You will notice that the index from the original 2-row dataframe has been kept, so each index value appears three times. Before this dataframe is used, it may be necessary to re-index it. This can be done easily by passing the parameter `ignore_index=True` to `df.explode(column_name)`, as shown when you **run the code** below:

```python
exploded_df = df.explode("colours", ignore_index=True)
display(exploded_df)
```

|   | item | colours |
|---|------|---------|
| 0 | sofa | red |
| 1 | sofa | green |
| 2 | sofa | lilac |
| 3 | armchair | black |
| 4 | armchair | beige |
| 5 | armchair | green |
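To make it clearer what `explode` is doing, here is a sketch of the same transformation written in plain Python on the furniture data from above, compared with the pandas result. The list comprehension is only an illustration of the idea, not how pandas implements it internally.

```python
import pandas as pd

furniture = [
    {"item": "sofa", "colours": ["red", "green", "lilac"]},
    {"item": "armchair", "colours": ["black", "beige", "green"]},
]

# Plain-Python equivalent of explode: one new row per list element,
# with the other fields of the original row copied into each new row
exploded_rows = [
    {"item": row["item"], "colours": colour}
    for row in furniture
    for colour in row["colours"]
]
print(len(exploded_rows))  # 6

# The pandas version, re-indexed 0..5
exploded_df = pd.DataFrame(furniture).explode("colours", ignore_index=True)
print(exploded_df.to_dict("records") == exploded_rows)  # True
```

Calling `.reset_index(drop=True)` after a plain `explode()` is another way to get the same fresh 0..5 index.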


### Exercise 1 - getting the data
---
\
Some APIs charge for each request, so minimising the number of times you call them is important. This one is free, but it will block you if you make too many requests in a short period. So you will call it once each time you open this worksheet and then use the variable `beer_data` to access the data.

**Run the code** below to request the data from the API. This particular API will return a list of dictionaries (as you will see from the printout). 

The code will assign the data from the API to a variable called **beer_data** so that we don't keep calling the API. 

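```python
import requests

# Call the API once and keep the parsed result in beer_data
response = requests.get("https://api.punkapi.com/v2/beers", timeout=10)
response.raise_for_status()  # stop with an error if the request failed
beer_data = response.json()  # parse the JSON text into a list of dictionaries
print(beer_data)
```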

```
[{'id': 1, 'name': 'Buzz', 'tagline': 'A Real Bitter Experience.', 'first_brewed': '09/2007', 'description': 'A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.', 'image_url': 'https://images.punkapi.com/v2/keg.png', 'abv': 4.5, 'ibu': 60, 'target_fg': 1010, 'target_og': 1044, 'ebc': 20, 'srm': 10, 'ph': 4.4, 'attenuation_level': 75, 'volume': {'value': 20, 'unit': 'litres'}, 'boil_volume': {'value': 25, 'unit': 'litres'}, 'method': {'mash_temp': [{'temp': {'value': 64, 'unit': 'celsius'}, 'duration': 75}], 'fermentation': {'temp': {'value': 19, 'unit': 'celsius'}}, 'twist': None}, 'ingredients': {'malt': [{'name': 'Maris Otter Extra Pale', 'amount': {'value': 3.3, 'unit': 'kilograms'}}, {'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}}, {'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}], 'hops': [{'name': 'Fuggles', 'amount': {'value': 25, 'unit': 'grams'}, 'add': 'start', 'attribute': 'bitter'}, {'nam
```
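The API cell calls the server every time it is run. One way to keep the number of calls down, sketched here under the assumption that you can write a file next to the notebook, is to save the first response to disk and reuse it afterwards. The helper name `load_beer_data` and the file name `beer_data.json` are hypothetical; the `fetch` argument stands in for the real request (for example `lambda: requests.get(url, timeout=10).json()`), so the demo below makes no network call.

```python
import json
import os
import tempfile
from pathlib import Path

def load_beer_data(cache_path, fetch):
    """Return the data cached at cache_path, calling fetch() only if no cache exists.

    fetch is any zero-argument function returning JSON-compatible data
    (hypothetical helper, not part of requests or the Punk API).
    """
    cache = Path(cache_path)
    if cache.exists():
        # Reuse the saved copy instead of calling the API again
        return json.loads(cache.read_text(encoding="utf-8"))
    data = fetch()
    # Save the response as JSON text for next time
    cache.write_text(json.dumps(data), encoding="utf-8")
    return data

# Demo with a stand-in fetch function that counts how often it is called
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "beer_data.json")
    calls = []

    def fake_fetch():
        calls.append(1)
        return [{"id": 1, "name": "Buzz"}]

    first = load_beer_data(path, fake_fetch)
    second = load_beer_data(path, fake_fetch)
    print(first == second, len(calls))  # True 1
```

The trade-off is freshness: the cached file is reused until you delete it, so data that the API updates regularly would need the cache clearing from time to time.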

### Exercise 2 - `beer_data` is a list of dictionaries.  Get the first item in the list and print its keys and values.
---
\
Write a function that will:
*  get the first dictionary in the list of dictionaries in `beer_data`
*  get the dictionary's keys as a list
*  loop through the list of keys and print the key and its corresponding value 

**Expected output**:  
id  -  1  
name  -  Buzz  
tagline  -  A Real Bitter Experience.  
first_brewed  -  09/2007  
description  -  A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.  
image_url  -  https://images.punkapi.com/v2/keg.png  
abv  -  4.5  
ibu  -  60  
target_fg  -  1010  
target_og  -  1044  
ebc  -  20  
srm  -  10  
ph  -  4.4  
attenuation_level  -  75  
volume  -  {'value': 20, 'unit': 'litres'}  
boil_volume  -  {'value': 25, 'unit': 'litres'}  
method  -  {'mash_temp': [{'temp': {'value': 64, 'unit': 'celsius'}, 'duration': 75}], 'fermentation': {'temp': {'value': 19, 'unit': 'celsius'}}, 'twist': None}  
ingredients  -  {'malt': [{'name': 'Maris Otter Extra Pale', 'amount': {'value': 3.3, 'unit': 'kilograms'}}, {'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}}, {'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}], 'hops': [{'name': 'Fuggles', 'amount': {'value': 25, 'unit': 'grams'}, 'add': 'start', 'attribute': 'bitter'}, {'name': 'First Gold', 'amount': {'value': 25, 'unit': 'grams'}, 'add': 'start', 'attribute': 'bitter'}, {'name': 'Fuggles', 'amount': {'value': 37.5, 'unit': 'grams'}, 'add': 'middle', 'attribute': 'flavour'}, {'name': 'First Gold', 'amount': {'value': 37.5, 'unit': 'grams'}, 'add': 'middle', 'attribute': 'flavour'}, {'name': 'Cascade', 'amount': {'value': 37.5, 'unit': 'grams'}, 'add': 'end', 'attribute': 'flavour'}], 'yeast': 'Wyeast 1056 - American Ale™'}  
food_pairing  -  ['Spicy chicken tikka masala', 'Grilled chicken quesadilla', 'Caramel toffee cake']  
brewers_tips  -  The earthy and floral aromas from the hops can be overpowering. Drop a little Cascade in at the end of the boil to lift the profile with a bit of citrus.  
contributed_by  -  Sam Mason <samjbmason>  

```python
def print_key_value_pairs(data):
  # add your code here
  first_item = data[0]
  key_list = list(first_item.keys())
  for key in key_list:
    # print() puts a space between its arguments, giving "key  -  value"
    print(key, " - ", first_item[key])

print_key_value_pairs(beer_data)
```

```
id  -  1
name  -  Buzz
tagline  -  A Real Bitter Experience.
first_brewed  -  09/2007
description  -  A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.
image_url  -  https://images.punkapi.com/v2/keg.png
abv  -  4.5
ibu  -  60
target_fg  -  1010
target_og  -  1044
ebc  -  20
srm  -  10
ph  -  4.4
attenuation_level  -  75
volume  -  {'value': 20, 'unit': 'litres'}
boil_volume  -  {'value': 25, 'unit': 'litres'}
method  -  {'mash_temp': [{'temp': {'value': 64, 'unit': 'celsius'}, 'duration': 75}], 'fermentation': {'temp': {'value': 19, 'unit': 'celsius'}}, 'twist': None}
ingredients  -  {'malt': [{'name': 'Maris Otter Extra Pale', 'amount': {'value': 3.3, 'unit': 'kilograms'}}, {'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}}, {'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}], 'hops': [{'name': 'Fuggles', 'amount': {'value': 25, 'unit': 'grams'}, 'add': 'start', 'attribute': 'bitter'}, {'name': 'First Gold', 'amount': {'value': 25, 'unit': 'grams'}, 'add': 'start', 'attrib
```
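A dictionary's `.items()` method yields `(key, value)` pairs directly, so the separate list of keys is not strictly needed. The sketch below (function name `format_key_value_pairs` is hypothetical) returns the lines as strings instead of printing them, which makes the result easy to check; the sample uses a few fields from the Buzz record.

```python
def format_key_value_pairs(data):
    """Return one 'key  -  value' string per entry of the first dictionary in data."""
    first_item = data[0]
    # .items() yields (key, value) pairs, so no separate list of keys is needed
    return [f"{key}  -  {value}" for key, value in first_item.items()]

sample = [{"id": 1, "name": "Buzz", "abv": 4.5}, {"name": "Trashy Blonde"}]
for line in format_key_value_pairs(sample):
    print(line)
```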

### Exercise 3 - Create a new list with the name and ABV of only the beers that are under 5% ABV.
---
\
Write a function that will take `beer_data` as a parameter and will:
*  loop through the list and copy any beer that has an ABV less than 5% (as a dictionary containing the name and ABV only) into a new list
*  loop through the new list and print the name and ABV of each beer, one per line

**Expected output**: 

Buzz - 4.5  
Trashy Blonde - 4.1  
Berliner Weisse With Yuzu - B-Sides - 4.2  
Fake Lager - 4.7  
Movember - 4.5  
Alpha Dog - 4.5  
Vice Bier - 4.3  
Bad Pixie - 4.7  



```python
def get_low_ABV_list(data):
  filtered_data = []
  # add your code below here
  for item in data:
    if item['abv'] < 5:
      # keep only the name and ABV of each low-ABV beer
      filtered_data.append({'name': item['name'], 'abv': item['abv']})
  for beer in filtered_data:
    print(beer['name'], "-", beer['abv'])
  return filtered_data

# Testing the function
actual = len(get_low_ABV_list(beer_data))
expected = 8
print()
if actual == expected:
  print("Passed! got {}".format(actual))
else:
  print("Failed! got {} when expected output is {}".format(actual, expected))
```

```
Buzz - 4.5
Trashy Blonde - 4.1
Berliner Weisse With Yuzu - B-Sides - 4.2
Fake Lager - 4.7
Movember - 4.5
Alpha Dog - 4.5
Vice Bier - 4.3
Bad Pixie - 4.7

Passed! got 8
```
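The same filter can be written as a single list comprehension. This is a sketch rather than the expected exercise answer: the function name `get_low_abv_list` and the `limit` parameter are hypothetical additions, and the third sample record is made up purely to show a beer being excluded.

```python
def get_low_abv_list(data, limit=5):
    """Return {'name', 'abv'} dictionaries for beers with an ABV below limit."""
    return [
        {"name": beer["name"], "abv": beer["abv"]}
        for beer in data
        if beer["abv"] < limit
    ]

sample = [
    {"name": "Buzz", "abv": 4.5, "ibu": 60},
    {"name": "Trashy Blonde", "abv": 4.1},
    {"name": "Hypothetical Strong Ale", "abv": 8.0},  # made-up record
]
for beer in get_low_abv_list(sample):
    print(beer["name"], "-", beer["abv"])
```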


### Exercise 4 - Print a label for each bottle of beer using the data. 
---
\
Write a function that will:
*  loop through the beers
*  for each beer, print a 'label' in the form shown below (*must include [name, tagline, first_brewed, description, abv]*)

**Expected output (all labels to look like this first one):**  
Buzz - A Real Bitter Experience.  
Created on 09/2007  
A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.  
4.5% alc. by vol.  

```python
def print_labels(data):
  # add your code here
  for item in data:
    print(item['name'], "-", item['tagline'])
    print("Created on", item['first_brewed'])
    print(item['description'])
    # an f-string avoids the space print() would add before the % sign
    print(f"{item['abv']}% alc. by vol.\n")

print_labels(beer_data)
```

```
Buzz - A Real Bitter Experience.
Created on 09/2007
A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.
4.5% alc. by vol.

Trashy Blonde - You Know You Shouldn't
Created on 04/2008
A titillating, neurotic, peroxide punk of a Pale Ale. Combining attitude, style, substance, and a little bit of low self esteem for good measure; what would your mother say? The seductive lure of the sassy passion fruit hop proves too much to resist. All that is even before we get onto the fact that there are no additives, preservatives, pasteurization or strings attached. All wrapped up with the customary BrewDog bite and imaginative twist.
4.1% alc. by vol.

Berliner Weisse With Yuzu - B-Sides - Japanese Citrus Berliner Weisse.
Created on 11/2015
Japanese citrus fruit intensifies the sour nature of this German classic.
4.2% alc. by vol.

Pilsen Lager - Unleash the Yeast Series.
Created on 09/2013
Our Unleash the Yeast series was an epic experiment into th
```
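Building each label as a string, rather than printing it piece by piece, makes it easy to check or reuse (for example, to write labels to a file). This is a sketch with a hypothetical helper name, `make_label`, applied to the Buzz values shown above.

```python
def make_label(beer):
    """Return the label text for one beer dictionary as a single string."""
    return (
        f"{beer['name']} - {beer['tagline']}\n"
        f"Created on {beer['first_brewed']}\n"
        f"{beer['description']}\n"
        f"{beer['abv']}% alc. by vol."
    )

buzz = {
    "name": "Buzz",
    "tagline": "A Real Bitter Experience.",
    "first_brewed": "09/2007",
    "description": "A light, crisp and bitter IPA brewed with English and American hops. "
                   "A small batch brewed only once.",
    "abv": 4.5,
}
print(make_label(buzz))
```

With this helper, the loop in `print_labels` could reduce to `print(make_label(item), end="\n\n")` for each beer.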

### Exercise 5 - Print the beer name and the brewing ingredients list for each beer.
---
\
Write a function that will:
*  loop through the data
*  for each beer, print its name and ingredients list in the format shown below

**Expected output:**  
Buzz

malt:  
Maris Otter Extra Pale  
Caramalt  
Munich  

hops:  
Fuggles - start  
First Gold - start  
Fuggles - middle  
First Gold - middle  
Cascade - end  

yeast:  
Wyeast 1056 - American Ale™

```python
def print_ingredients(data):
  # add your code below here
  for item in data:
    print()
    print(item['name'])
    ingredients_data = item['ingredients']
    for key, value in ingredients_data.items():
      # blank line, then the section name, e.g. "malt:"
      print(f"\n{key}:")
      if isinstance(value, list):
        # malt and hops are lists of dictionaries
        for ingredient in value:
          if key == 'hops':
            print(ingredient['name'], "-", ingredient['add'])
          else:
            print(ingredient['name'])
      else:
        # yeast is a single string
        print(value)

print_ingredients(beer_data)
```

```
Buzz

malt:
Maris Otter Extra Pale
Caramalt
Munich

hops:
Fuggles - start
First Gold - start
Fuggles - middle
First Gold - middle
Cascade - end

yeast:
Wyeast 1056 - American Ale™

Trashy Blonde

malt:
Maris Otter Extra Pale
Caramalt
Munich

hops:
Amarillo - start
Simcoe - start
Amarillo - end
Motueka - end

yeast:
Wyeast 1056 - American Ale™

Berliner Weisse With Yuzu - B-Sides

malt:
Propino Pale Malt
Wheat Malt
Propino Pale Malt for kettle souring
Acidulated Malt for kettle souring

hops:
Bramling Cross - middle

yeast:
Wyeast 1056 - American Ale™

Pilsen Lager

malt:
Extra Pale
Caramalt
Dark Crystal
Munich

hops:
Centennial - start
Amarillo - start
Centennial - middle
Amarillo - middle
Centennial - end
Amarillo - end

yeast:
Wyeast 2007 - Pilsen Lager™

Avery Brown Dredge

malt:
Lager Malt
Wheat

hops:
Saaz - start
Saaz - middle
Saaz - end

yeast:
Wyeast 2007 - Pilsen Lager™

Electric India

malt:
Extra Pale
Munich
Wheat

hops:
Amarillo - start
Nelson Sauvin - middle
Amarillo - mi
```
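An alternative design is to separate collecting the ingredient names from printing them. The sketch below (hypothetical helper `ingredient_lines`) returns a dictionary of section name to list of lines. It assumes that any ingredient dictionary with an `add` key should show when it is added, which generalises the `key == 'hops'` check above. The sample is a trimmed copy of Buzz's ingredients with the amounts left out.

```python
def ingredient_lines(ingredients):
    """Return {section: [lines]} for one beer's 'ingredients' dictionary."""
    sections = {}
    for key, value in ingredients.items():
        if isinstance(value, list):
            # malt and hops are lists of dictionaries; hops also record when they are added
            sections[key] = [
                f"{item['name']} - {item['add']}" if "add" in item else item["name"]
                for item in value
            ]
        else:
            # yeast is a single string
            sections[key] = [value]
    return sections

buzz_ingredients = {  # trimmed: amounts and some hops omitted
    "malt": [{"name": "Maris Otter Extra Pale"}, {"name": "Caramalt"}],
    "hops": [{"name": "Fuggles", "add": "start"}, {"name": "Cascade", "add": "end"}],
    "yeast": "Wyeast 1056 - American Ale™",
}
for section, lines in ingredient_lines(buzz_ingredients).items():
    print(f"\n{section}:")
    print("\n".join(lines))
```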

### Exercise 6 - Create a pandas dataframe from the data and create a copy of just the first 14 columns. 
---
\
Write a function that will:
*  convert the `beer_data` into a pandas dataframe
*  create a second dataframe containing only the first 14 columns of the original
*  return the second dataframe

```python
import pandas as pd
def get_14_col_dataframe(data):
  # add your code below here
  df = pd.DataFrame(data)
  df = df[df.columns[:14]]
  return df

# Testing the function
actual = get_14_col_dataframe(beer_data)
expected = 14

if len(actual.columns) == expected:
  print("Passed! got {}".format(len(actual.columns)))
else:
  print("Failed! got {} expected {}".format(len(actual.columns), expected))
```

```
Passed! got 14
```


### Exercise 7 - create a dataframe with the id, name and all the method information.
---
\
Write a function that will:
*  create a dataframe from just the id, name and method columns of `beer_data`
*  return the new dataframe

```python
def get_method_dataframe(data):
  #  add your code below here
  df= pd.DataFrame(data)
  new_df = df[['id','name','method']].copy()
  return new_df

# Testing the function
new_df = get_method_dataframe(beer_data)
actual = list(new_df.columns)
expected = ['id', 'name', 'method']

if actual == expected:
  print("Passed! got {}".format(actual))
else:
  print("Failed! got {} expected {}".format(actual, expected))
```

```
Passed! got ['id', 'name', 'method']
```
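Exercise 7 keeps `method` as a single column of nested dictionaries. If you want each nested value in its own column instead, `pd.json_normalize` (pandas 1.0 or later) flattens nested dictionaries into dot-separated column names such as `method.fermentation.temp.value`. Lists such as `mash_temp` are left as lists, which you could then explode. This is a sketch; the sample record copies Buzz's `method` from the Exercise 2 output.

```python
import pandas as pd

beers = [
    {
        "id": 1,
        "name": "Buzz",
        "method": {
            "mash_temp": [{"temp": {"value": 64, "unit": "celsius"}, "duration": 75}],
            "fermentation": {"temp": {"value": 19, "unit": "celsius"}},
            "twist": None,
        },
    },
]

# Nested dictionaries become separate columns; lists stay in a single cell
flat = pd.json_normalize(beers)
print(flat.columns.tolist())
print(flat.loc[0, "method.fermentation.temp.value"])  # 19
```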


### Exercise 8 - create a dataframe with id, name and food pairing.
---
\
Write a function that will:
*  create a dataframe from just the id, name and food_pairing columns of `beer_data`
*  return the new dataframe

```python
def get_food_pairing_dataframe(data):
  # add your code below here
  df = pd.DataFrame(data)
  # select food_pairing (not method), as the exercise asks
  new_df = df[['id', 'name', 'food_pairing']].copy()
  return new_df

# Testing the function
actual = get_food_pairing_dataframe(beer_data)
expected = 3

if len(actual.columns) == expected:
  print("Passed! got {}".format(len(actual.columns)))
else:
  print("Failed! got {} expected {}".format(len(actual.columns), expected))
```

```
Passed! got 3
```


### Exercise 9 - Find all beers that pair with Smoked chicken wings.
---
\
Write a function that will take the `beer_data` and a food as parameters and will:
*  return a list of all the beers that pair well with that food.

_Hint: you can process this as a dictionary (original `beer_data`) or convert to pandas, explode on the `food_pairing` column and filter_

```python
def get_beers_food_pair(data, food):
  # add your code below here
  df = pd.DataFrame(data)
  food_df = df[['name', 'food_pairing']].copy()
  # one row per (beer, food) pair
  new_df = food_df.explode('food_pairing', ignore_index=True)
  paired_df = new_df[new_df['food_pairing'] == food]
  # the exercise asks for a list of beers, not a dataframe
  return paired_df['name'].tolist()

# Testing the function
actual = get_beers_food_pair(beer_data, "Smoked chicken wings")
expected = 1

if len(actual) == expected:
  print("Passed! got {}".format(len(actual)))
else:
  print("Failed! got {} expected {}".format(len(actual), expected))
```

```
Passed! got 1
```
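The hint also mentions processing the original dictionaries without pandas. A minimal sketch of that route, with the hypothetical name `get_beers_for_food`: keep each beer whose `food_pairing` list contains the food. The Buzz pairings come from the Exercise 2 output; the second record is made up so that the example has a match.

```python
def get_beers_for_food(data, food):
    """Return the names of beers whose food_pairing list contains food exactly."""
    return [beer["name"] for beer in data if food in beer.get("food_pairing", [])]

sample = [
    {"name": "Buzz",
     "food_pairing": ["Spicy chicken tikka masala", "Grilled chicken quesadilla", "Caramel toffee cake"]},
    {"name": "Hypothetical Smoky Porter", "food_pairing": ["Smoked chicken wings"]},  # made-up record
]
print(get_beers_for_food(sample, "Smoked chicken wings"))  # ['Hypothetical Smoky Porter']
```

Like the pandas version, this is an exact, case-sensitive match, so "smoked chicken wings" would find nothing.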


### Exercise 10 - Calculate the average pH value and then print all beers above the average (less acidic) with their pH value.
---
\
Write a function that will take `beer_data` as a parameter and will:
*  create a dataframe with just the columns `id, name, ph`
*  calculate the average pH
*  filter for all beers that have an above-average pH value (so they are less acidic)
*  display and return the filtered data

```python
def get_less_acidic(data):
  # add your code below here
  df = pd.DataFrame(data)
  ph_df = df[['id','name','ph']].copy()
  mean_avg = ph_df['ph'].mean()
  less_acidic_df = ph_df[(ph_df['ph'] > mean_avg)]
  return less_acidic_df

# Testing the function
actual = get_less_acidic(beer_data)
expected_shape = (4,3)

if actual.shape == expected_shape:
  print("Passed! got {} rows and {} columns".format(actual.shape[0], actual.shape[1]))
else:
  print("Failed! got {} rows and {} columns, expected {} rows and {} columns".format(actual.shape[0], actual.shape[1], expected_shape[0], expected_shape[1]))
```

```
Passed! got 4 rows and 3 columns
```
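For comparison, here is a sketch of the same calculation on the raw list of dictionaries, using `statistics.mean` from the standard library. The function name and the three sample records (names and pH values) are made up for illustration.

```python
from statistics import mean

def get_less_acidic_plain(data):
    """Return (name, ph) pairs for beers whose pH is above the average pH."""
    average_ph = mean(beer["ph"] for beer in data)
    # a higher pH means less acidic
    return [(beer["name"], beer["ph"]) for beer in data if beer["ph"] > average_ph]

sample = [  # made-up names and pH values
    {"name": "Beer A", "ph": 4.4},
    {"name": "Beer B", "ph": 4.2},
    {"name": "Beer C", "ph": 3.2},
]
print(get_less_acidic_plain(sample))  # [('Beer A', 4.4), ('Beer B', 4.2)]
```

One difference to be aware of: pandas' `mean()` skips missing values by default, whereas this version assumes every beer has a numeric `ph` and would raise an error on `None`.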
