## Wrangling 102:
### Moving beyond tabular data

There are many types of data, but we will be looking at one in particular: JSON data, whose objects correspond to dictionaries in Python.

In [1]:
import json
import numpy as np
import pandas
import random
import requests

### Dictionaries

Dictionaries store data in Python, but the data doesn't have to be tabular. One key can hold several items, such as a whole list of ingredients.

In [2]:
## Set up some dinner suggestions:

fish_stew = {"name": "fish stew",
             "ingredients": ["fish", "white wine", "shrimp", "canned tomatoes", "garlic"],
             "active_time": 20, "passive_time": 5}

sausage = {"name": "sausage with rice and green veg",
           "ingredients": ["sausage", "rice", "green veggie"],
           "active_time": 10, "passive_time": 40}

pasta = {"name": "Spaghetti with meat sauce",
         "ingredients": ["ground beef", "canned tomatoes", "garlic", "onion",
                         "tomato sauce", "tomato paste", "pasta"],
         "active_time": 20, "passive_time": 10}


## Can my dictionary contain dictionaries? Yes! Yes it can.
## These are nested:

meal_list = {"fish stew": fish_stew,
             "sausage": sausage,
             "pasta with meat sauce": pasta}
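Because `meal_list` is a dictionary of dictionaries, you can loop over it with `.items()` to get each key together with its inner dictionary. A minimal sketch using a cut-down copy of two meals; `meals` and `total_times` are illustrative names chosen so the real `meal_list` is left untouched:

```python
# A cut-down copy of the nested meal dictionary (separate name, so meal_list is unchanged)
meals = {
    "fish stew": {"name": "fish stew", "active_time": 20, "passive_time": 5},
    "sausage": {"name": "sausage with rice and green veg", "active_time": 10, "passive_time": 40},
}

# .items() yields (key, value) pairs; here each value is itself a dictionary
total_times = {}
for key, meal in meals.items():
    total_times[key] = meal["active_time"] + meal["passive_time"]
    print(key, "takes", total_times[key], "minutes in total")
```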

In [3]:
## Explore:

#print(pasta)

#print("To make pasta I will need:", pasta["ingredients"])


## Drill down two layers: first the meal, then its ingredients:
meal_list["sausage"]["ingredients"]

['sausage', 'rice', 'green veggie']

In [4]:
## This is just for fun.
## What should I have for dinner?
##choose a random item from a list of the meal list keys:

supper = random.choice(list(meal_list.keys()))

print("Supper is", meal_list[supper]["name"])

print("Grocery List:", meal_list[supper]["ingredients"])

total_time = meal_list[supper]["active_time"] + meal_list[supper]["passive_time"]
print("It will take", total_time, "minutes, but only",
      meal_list[supper]["active_time"], "minutes are active.")

Supper is sausage with rice and green veg
Grocery List: ['sausage', 'rice', 'green veggie']
It will take 50 minutes, but only 10 minutes are active.


In [5]:
## .get() is another way to look up a key:
ingredients = sausage.get("ingredients")

print(ingredients)

['sausage', 'rice', 'green veggie']
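`.get()` looks up a key just like square brackets do, but when the key is missing it returns `None` (or a default you pass in) instead of raising a `KeyError`. A small sketch; the `"servings"` key is hypothetical and deliberately absent:

```python
sausage = {"name": "sausage with rice and green veg",
           "ingredients": ["sausage", "rice", "green veggie"],
           "active_time": 10, "passive_time": 40}

print(sausage.get("ingredients"))  # same result as sausage["ingredients"]
print(sausage.get("servings"))     # missing key -> None
print(sausage.get("servings", 2))  # missing key -> the default you supplied, 2

try:
    sausage["servings"]            # square brackets raise on a missing key
except KeyError as err:
    print("KeyError:", err)        # KeyError: 'servings'
```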


### What is happening here?  Let's look more carefully at the dictionary:



In [6]:
french = {"Bonjour": "hello",
         "Au revoir": "goodbye",
         "Merci": "thanks",
         "Chat": "cat",
         "Chien": "dog"}

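A dictionary is made up of items, each a key-value pair, kept in the order they were added:
* key: value

Here, "Bonjour" is the key, and "hello" is the value.

Here every key is a string. Keys can also be numbers or other immutable types, and each key must be unique.

A value can be:
* a string
* a number
* a list
* another dictionary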
* any object!
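To see these value types side by side, here is a small sketch; the `recipe` and `codes` dictionaries (including the `"source"` entry) are made up for illustration. It also shows that keys don't have to be strings, as long as they are immutable:

```python
# Values can be of any type, including lists and other dictionaries
recipe = {
    "name": "fish stew",                            # a string
    "active_time": 20,                              # a number
    "ingredients": ["fish", "garlic"],              # a list
    "source": {"book": "my notebook", "page": 12},  # another dictionary
}

# Keys are usually strings, but any immutable (hashable) type works
codes = {1: "one", (2, 3): "a tuple key"}

print(recipe["source"]["page"])  # 12
print(codes[(2, 3)])             # a tuple key

# A mutable object such as a list cannot be a key
try:
    bad = {["not", "allowed"]: 1}
except TypeError as err:
    print("TypeError:", err)
```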



In [7]:
french["Merci"]

'thanks'
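A couple of everyday operations on the `french` dictionary, as a short sketch: `in` tests whether a key exists (it checks keys, not values), assignment adds a new entry (`"Oiseau"` is just an example word), and `.items()` loops over the pairs:

```python
french = {"Bonjour": "hello", "Au revoir": "goodbye", "Merci": "thanks",
          "Chat": "cat", "Chien": "dog"}

# `in` checks the keys, not the values
print("Merci" in french)   # True
print("thanks" in french)  # False

# Adding a new key-value pair is a simple assignment
french["Oiseau"] = "bird"

# Loop over the key-value pairs
for word, meaning in french.items():
    print(word, "->", meaning)
```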

## JSON

When we store non-tabular data, it often looks like a dictionary. We are going to look at JSON (JavaScript Object Notation) in particular. Look at the data below:


#### Abridged example response from: https://pokeapi.co/api/v2/ability/1/

```
{
  "id": 1,
  "name": "stench",
  "is_main_series": true,
  "generation": {
    "name": "generation-iii",
    "url": "https://pokeapi.co/api/v2/generation/3/"
  },
  "names": [
    {
      "name": "Stench",
      "language": {
        "name": "en",
        "url": "https://pokeapi.co/api/v2/language/9/"
      }
    }
  ],
  "effect_entries": [
    {
      "effect": "This Pokémon's damaging moves have a 10% chance to make the target [flinch]{mechanic:flinch} with each hit if they do not already cause flinching as a secondary effect.\n\nThis ability does not stack with a held item.\n\nOverworld: The wild encounter rate is halved while this Pokémon is first in the party.",
      "short_effect": "Has a 10% chance of making target Pokémon [flinch]{mechanic:flinch} with each hit.",
      "language": {
        "name": "en",
        "url": "https://pokeapi.co/api/v2/language/9/"
      }
    }
  ],
  "effect_changes": [
    {
      "version_group": {
        "name": "black-white",
        "url": "https://pokeapi.co/api/v2/version-group/11/"
      },
      "effect_entries": [
        {
          "effect": "Has no effect in battle.",
          "language": {
            "name": "en",
            "url": "https://pokeapi.co/api/v2/language/9/"
          }
        }
      ]
    }
  ],
  "flavor_text_entries": [
    {
      "flavor_text": "è‡­ãã¦ã€€ç›¸æ‰‹ãŒ\nã²ã‚‹ã‚€ã€€ã“ã¨ãŒã‚ã‚‹ã€‚",
      "language": {
        "name": "ja-kanji",
        "url": "https://pokeapi.co/api/v2/language/11/"
      },
      "version_group": {
        "name": "x-y",
        "url": "https://pokeapi.co/api/v2/version-group/15/"
      }
    }
  ],
  "pokemon": [
    {
      "is_hidden": true,
      "slot": 3,
      "pokemon": {
        "name": "gloom",
        "url": "https://pokeapi.co/api/v2/pokemon/44/"
      }
    }
  ]
}
```
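Python's built-in `json` module turns JSON text like this into dictionaries and lists. A minimal sketch that parses an abridged copy of the ability data above (`raw` and `ability` are illustrative names); note that JSON `true` becomes Python `True`, and a JSON array becomes a list you index before drilling further:

```python
import json

# An abridged copy of the ability JSON above, stored as text
raw = '''
{
  "id": 1,
  "name": "stench",
  "is_main_series": true,
  "generation": {"name": "generation-iii",
                 "url": "https://pokeapi.co/api/v2/generation/3/"},
  "pokemon": [{"is_hidden": true, "slot": 3,
               "pokemon": {"name": "gloom",
                           "url": "https://pokeapi.co/api/v2/pokemon/44/"}}]
}
'''

ability = json.loads(raw)             # JSON text -> Python dict
print(type(ability))                  # <class 'dict'>
print(ability["is_main_series"])      # True (JSON true -> Python True)
print(ability["generation"]["name"])  # generation-iii
# "pokemon" is a list, so index it first, then keep drilling down
print(ability["pokemon"][0]["pokemon"]["name"])  # gloom
```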

In [8]:
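## Let's use an API to learn about a Pokémon.
## You can change the name to any Pokémon you like (use the lowercase name, e.g. 'bulbasaur').

import requests
r = requests.get('https://pokeapi.co/api/v2/pokemon/pikachu', timeout=10)
r.raise_for_status()  # raise an error if the request failed, e.g. 404 for a misspelled name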



In [9]:
## Inspect:
r.json()

{'abilities': [{'ability': {'name': 'static',
    'url': 'https://pokeapi.co/api/v2/ability/9/'},
   'is_hidden': False,
   'slot': 1},
  {'ability': {'name': 'lightning-rod',
    'url': 'https://pokeapi.co/api/v2/ability/31/'},
   'is_hidden': True,
   'slot': 3}],
 'base_experience': 112,
 'cries': {'latest': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/25.ogg',
  'legacy': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/25.ogg'},
 'forms': [{'name': 'pikachu',
   'url': 'https://pokeapi.co/api/v2/pokemon-form/25/'}],
 'game_indices': [{'game_index': 84,
   'version': {'name': 'red', 'url': 'https://pokeapi.co/api/v2/version/1/'}},
  {'game_index': 84,
   'version': {'name': 'blue', 'url': 'https://pokeapi.co/api/v2/version/2/'}},
  {'game_index': 84,
   'version': {'name': 'yellow',
    'url': 'https://pokeapi.co/api/v2/version/3/'}},
  {'game_index': 25,
   'version': {'name': 'gold', 'url': 'https://pokeapi.co/api/v2/version

In [10]:
## It's a dictionary!
pikachu = r.json()
type(pikachu)

dict
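`r.json()` parses the response body from JSON text into Python objects, much like calling `json.loads(r.text)` yourself; `json.dumps` goes the other way. A minimal round trip, assuming a small hand-written dictionary (the name `pokemon` is illustrative) built from values shown in the output above, so no network call is needed:

```python
import json

# A small, hand-written dictionary shaped like part of the API response
pokemon = {"name": "pikachu", "weight": 60,
           "abilities": [{"ability": {"name": "static"}, "is_hidden": False, "slot": 1}]}

text = json.dumps(pokemon, indent=2)  # dict -> JSON string
print(text)                           # note: Python False is written as JSON false

back = json.loads(text)               # JSON string -> dict
print(back == pokemon)                # True: nothing was lost in the round trip
```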

In [11]:
## we can learn about pikachu

pikachu['abilities']

[{'ability': {'name': 'static', 'url': 'https://pokeapi.co/api/v2/ability/9/'},
  'is_hidden': False,
  'slot': 1},
 {'ability': {'name': 'lightning-rod',
   'url': 'https://pokeapi.co/api/v2/ability/31/'},
  'is_hidden': True,
  'slot': 3}]

In [12]:
## what can I learn about pikachu?
pikachu.keys()

dict_keys(['abilities', 'base_experience', 'cries', 'forms', 'game_indices', 'height', 'held_items', 'id', 'is_default', 'location_area_encounters', 'moves', 'name', 'order', 'past_abilities', 'past_types', 'species', 'sprites', 'stats', 'types', 'weight'])

In [13]:
## PokeAPI reports weight in hectograms, so 60 means 6.0 kg
pikachu["weight"]

60

In [14]:
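## I can drill down in abilities. "abilities" is a list, so pick an item by index first.
## (In a notebook only the last expression is displayed, so the .keys() result isn't shown.)
pikachu["abilities"][0].keys()
#pikachu["abilities"][1].keys()

pikachu["abilities"][0]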

{'ability': {'name': 'static', 'url': 'https://pokeapi.co/api/v2/ability/9/'},
 'is_hidden': False,
 'slot': 1}
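Drilling down one item at a time gets tedious, so a list comprehension can pull the same field out of every item. A sketch that works on a copy of the `abilities` list printed above (so it runs offline); with live data you would use `pikachu["abilities"]` in place of `abilities`. The names `names` and `hidden` are illustrative:

```python
# The "abilities" list as printed above, copied so this runs without the API
abilities = [
    {"ability": {"name": "static", "url": "https://pokeapi.co/api/v2/ability/9/"},
     "is_hidden": False, "slot": 1},
    {"ability": {"name": "lightning-rod", "url": "https://pokeapi.co/api/v2/ability/31/"},
     "is_hidden": True, "slot": 3},
]

# For each list item: item -> "ability" dict -> "name" string
names = [entry["ability"]["name"] for entry in abilities]
# The same idea with a filter: keep only hidden abilities
hidden = [entry["ability"]["name"] for entry in abilities if entry["is_hidden"]]

print(names)   # ['static', 'lightning-rod']
print(hidden)  # ['lightning-rod']
```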

## XML

XML (Extensible Markup Language) is another popular non-tabular data format. Because every value is wrapped in descriptive tags, it can be easier for a person to read than JSON, though it is also more verbose.

It's a markup language, much like HTML. Here's a small example from https://www.w3schools.com/xml/

```
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>
   Two of our famous Belgian Waffles with plenty of real maple syrup
   </description>
    <calories>650</calories>
</food>
<food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>
    Light Belgian waffles covered with strawberries and whipped cream
    </description>
    <calories>900</calories>
</food>
</breakfast_menu>
```

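It's easier to see what's happening here, but it's harder to convert to a dictionary: Python's standard library has no direct XML-to-dictionary conversion like `json.loads`, and every value arrives as text, so numbers such as `650` have to be converted by hand.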
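Python's standard library can still parse XML with `xml.etree.ElementTree`, but it returns a tree of elements rather than a dictionary, so you choose the mapping yourself. A minimal sketch, assuming one dictionary per `<food>` element with the child tag names as keys (`menu_xml` and `foods` are illustrative names; the `<?xml ...?>` declaration is omitted for simplicity):

```python
import xml.etree.ElementTree as ET

# The breakfast menu above, as a string
menu_xml = """
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>
      Two of our famous Belgian Waffles with plenty of real maple syrup
    </description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>
      Light Belgian waffles covered with strawberries and whipped cream
    </description>
    <calories>900</calories>
  </food>
</breakfast_menu>
"""

root = ET.fromstring(menu_xml.strip())  # root is the <breakfast_menu> element

# Build one dictionary per <food> by hand; findtext() returns each child's text
foods = []
for food in root.findall("food"):
    foods.append({
        "name": food.findtext("name"),
        "price": food.findtext("price"),                      # stays a string
        "description": food.findtext("description").strip(),  # drop the newlines/indentation
        "calories": int(food.findtext("calories")),           # text -> int by hand
    })

print(foods[0]["name"], foods[0]["calories"])  # Belgian Waffles 650
```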