# Lecture 22 – File Formats and Dictionaries

## Data 6, Summer 2022

In [None]:
from datascience import * 
import numpy as np

## File Formats

While we can save information in our Python programs or Jupyter Notebooks by assigning a value to a name, we may also want to store information in files (or import information _from_ files). Files come in many formats, including both tabular and non-tabular ones.

We've already seen one type of tabular file: the `.csv` file, which we have used to create many of our tables. The method `Table.read_csv(filepath)` converts the `.csv` file at the `filepath` location into a table that we can work with in Python.

In [None]:
... # Load in the 'covid.csv' file as a Table

If we want to store data that doesn't fit neatly into a table, we may want to use a **dictionary** or another data structure. A dictionary is an example of a non-tabular data structure.

In [None]:
grandma_tree = {
    "name": "Grandma",
    "children": [
        {
            "name": "Dad",
            "children": [
                {"name": "Me"},
                {"name": "Brother"}
            ]
        },
        {
            "name": "my aunt",
            "children": [
                {"name": "Cousin 1"},
                {
                    "name": "Cousin 2",
                    "children": [
                        {"name": "Cousin 2 Jr."}
                    ]
                }
            ]
        }
    ]
}

In [None]:
grandma_tree
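Each level of `grandma_tree` is either a dictionary or a list of dictionaries, so we can reach a nested value by chaining lookups: a key for a dictionary, an index for a list. Here is a minimal sketch; it rebuilds a trimmed-down copy of the tree (without Cousin 2 Jr.) so that it runs on its own.

```python
# A trimmed-down copy of grandma_tree, so this cell runs on its own
grandma_tree = {
    "name": "Grandma",
    "children": [
        {"name": "Dad", "children": [{"name": "Me"}, {"name": "Brother"}]},
        {"name": "my aunt", "children": [{"name": "Cousin 1"}, {"name": "Cousin 2"}]},
    ],
}

# "children" maps to a list, so [0] selects Dad's dictionary and ["name"] reads his name
print(grandma_tree["children"][0]["name"])  # Dad

# Chaining one more level reaches a grandchild
print(grandma_tree["children"][0]["children"][0]["name"])  # Me

# Loop over Grandma's children and print each child's name
for child in grandma_tree["children"]:
    print(child["name"])
```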

The JSON (JavaScript Object Notation) file format, which uses the `.json` extension, supports non-tabular data like dictionaries.
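As a minimal sketch of how JSON text maps onto Python objects, the standard-library `json` module can turn a dictionary into a JSON string with `json.dumps` and parse it back with `json.loads`. The `pet` dictionary is a made-up example. One detail worth knowing: JSON object keys are always strings, so a non-string key such as `4` comes back as `'4'`.

```python
import json

# A small made-up dictionary containing a list
pet = {"name": "Junior", "age": 11, "toys": ["ball", "rope"]}

# Convert the dictionary to a JSON-formatted string
text = json.dumps(pet)
print(text)  # {"name": "Junior", "age": 11, "toys": ["ball", "rope"]}

# Parse the JSON string back into an equal Python dictionary
restored = json.loads(text)
print(restored == pet)  # True

# JSON keys are strings, so the integer key 4 becomes the string '4'
print(json.loads(json.dumps({4: "four"})))  # {'4': 'four'}
```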

In [None]:
# Just run this cell
import json
def read_json(path):
    return json.load(open(path, 'r'))

In [None]:
read_json('data/family.json')

In the example above, note that the `path` to the `family.json` file also includes `'data/'`. This tells Python that the `family.json` file is located in the `data` folder.
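The contents of `data/family.json` aren't shown here, so the sketch below writes its own small JSON file and reads it back. It assumes a temporary folder created with `tempfile.mkdtemp()` in place of `data/`, and uses `os.path.join` to build a path to `family.json` inside that folder, just as `'data/family.json'` names a file inside the `data` folder.

```python
import json
import os
import tempfile

def read_json(path):
    # Open the file, parse its JSON contents, and close the file afterward
    with open(path, 'r') as f:
        return json.load(f)

# A temporary folder stands in for the notebook's data/ folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'family.json')

# Save a small (made-up) family tree to the file as JSON
with open(path, 'w') as f:
    json.dump({"name": "Grandma", "children": [{"name": "Dad"}]}, f)

family = read_json(path)
print(family["children"][0]["name"])  # Dad
```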

## Dictionaries

Up until this point, we have mostly used arrays to store multiple values under a single name.

In [None]:
names = make_array('bill', 'sarah', 'cal', 'nina')
names

Arrays are great for storing data that have a particular order. We can retrieve a particular element from an array using its **index**, its position in the array (starting from 0).

In [None]:
... # Retrieve the item 'cal' by using its index

But not all collections of data have a set order. For example, nothing says that the four names above must be in the order bill, sarah, cal, nina rather than, say, cal, bill, nina, sarah.

When we don't care about the order of our values, we may want to use a **dictionary**. Dictionaries store data in **key-value pairs**, meaning that we can retrieve a corresponding value for each key that we give the dictionary. You can think of keys as words in a physical dictionary and values as the word definitions that you are trying to find.
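To contrast the two kinds of lookup, here is a small sketch. It uses a plain Python list in place of a `datascience` array so it runs without that library, and the ages are made up. A list is read by position, so reordering it gives a different list; a dictionary is read by key, so the same key-value pairs in a different order compare equal. (Python dictionaries do remember the order in which keys were inserted, but you still cannot look values up by position.)

```python
# A list is looked up by position, so order matters
names_list = ['bill', 'sarah', 'cal', 'nina']
print(names_list[2])  # cal
print(names_list == ['cal', 'bill', 'nina', 'sarah'])  # False

# A dictionary is looked up by key, so the same pairs in a different order are equal
ages_a = {'bill': 20, 'sarah': 21}
ages_b = {'sarah': 21, 'bill': 20}
print(ages_a['sarah'])  # 21
print(ages_a == ages_b)  # True
```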

In [None]:
dog = {'name': 'Junior', 
       'age': 11, 
       4: ['kibble', 'treat']}

For example, given a dictionary called `dog`, I can find the name of the dog by using the key `'name'`.

In [None]:
... # Get the name of the dog

I can also find the dog's age:

In [None]:
... # Get the age of the dog

And retrieve the value stored at key `4`:

In [None]:
... # Find the value assigned to the key `4`

But remember that dictionaries are not indexed by position, so the following code raises a `KeyError` because there is no key `0` in the dictionary.

In [None]:
dog[0]
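If we are not sure a key exists, there are safer ways to look it up. This sketch catches the `KeyError`, checks for the key with `in`, and uses the built-in dictionary method `.get`, which returns `None` (or a default we supply) instead of raising an error.

```python
dog = {'name': 'Junior', 'age': 11, 4: ['kibble', 'treat']}

# Looking up a missing key raises a KeyError
try:
    dog[0]
except KeyError as error:
    print('KeyError:', error)  # KeyError: 0

# 'in' checks whether a key exists without raising an error
print(0 in dog)  # False
print(4 in dog)  # True

# .get returns None (or the given default) when the key is missing
print(dog.get(0))            # None
print(dog.get(0, 'no key'))  # no key
print(dog.get('name'))       # Junior
```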

### Quick Check 1

After defining `bears`, what are the values of:
* `bears['polar']['hungry']`
* `bears[None][1]`
* `bears['weight_range']`

Try to answer the question before running the code below.

In [None]:
bears = {
    'polar': {
        'color': 'white',
        'weight_range': [175, 700],
        'hungry': True
    },
    'grizzly': {
        'color': 'brown',
        'weight_range': [130, 360],
        'endangered': False
    },
    None: ['koala', 'panda']
}

In [None]:
bears['polar']['hungry']

In [None]:
bears[None][1]

In [None]:
bears['weight_range']
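A worked answer to Quick Check 1, as a self-contained sketch: `True` and `'panda'` come straight from chained lookups, but `bears['weight_range']` raises a `KeyError` because `'weight_range'` is a key of the inner dictionaries, not of `bears` itself. To get a weight range, first select a bear.

```python
bears = {
    'polar': {'color': 'white', 'weight_range': [175, 700], 'hungry': True},
    'grizzly': {'color': 'brown', 'weight_range': [130, 360], 'endangered': False},
    None: ['koala', 'panda'],
}

print(bears['polar']['hungry'])  # True
print(bears[None][1])            # panda

# 'weight_range' is not a top-level key, which is why bears['weight_range'] fails
print('weight_range' in bears)   # False

# Select a bear first, then look up its weight range
print(bears['polar']['weight_range'])       # [175, 700]
print(bears['grizzly']['weight_range'][1])  # 360
```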

### Modifying Dictionary Entries

We can add a new entry to a dictionary, or modify an existing one, by assigning a value to a key. If the key already exists, its old value is replaced.

In [None]:
# {} creates a new empty dictionary
slang = {}
slang

In [None]:
slang['btw'] = 'by the way'
slang

In [None]:
slang['nw'] = 'no worries'
slang

In [None]:
slang

In [None]:
... # Update the value of the key 'btw'

In [None]:
... # Update the value of the key 'nw'
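The same assignment syntax both adds and updates entries, and a key can appear only once. In this sketch the replacement value `'BY THE WAY'` is a made-up example: assigning it to `'btw'` overwrites the old meaning rather than adding a second `'btw'` entry, so the length stays the same.

```python
slang = {}
slang['btw'] = 'by the way'  # adds a new key
slang['nw'] = 'no worries'   # adds another key
print(len(slang))  # 2

# Assigning to an existing key overwrites its old value; no second 'btw' is created
slang['btw'] = 'BY THE WAY'
print(slang)       # {'btw': 'BY THE WAY', 'nw': 'no worries'}
print(len(slang))  # 2
```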

### Quick Check 2

After running the following five lines of code, what are the values of `numbers['1']`, `numbers['five']`, `numbers[1]`, and `numbers[2]`?

Try to answer before actually running the code yourself.

In [None]:
two = 1
numbers = {'1': 2}
numbers['five'] = 5
numbers[two] = numbers['1']
numbers[2] = numbers[1] + numbers['five']

In [None]:
numbers['1']

In [None]:
numbers['five']

In [None]:
numbers[1]

In [None]:
numbers[2]
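A traced answer to Quick Check 2. The two traps are that the name `two` holds the value `1`, and that the string `'1'` and the integer `1` are different keys, so `numbers` ends up with four separate entries.

```python
two = 1
numbers = {'1': 2}                         # string key '1' -> 2
numbers['five'] = 5                        # string key 'five' -> 5
numbers[two] = numbers['1']                # two is 1, so integer key 1 -> 2
numbers[2] = numbers[1] + numbers['five']  # integer key 2 -> 2 + 5 = 7

print(numbers)  # {'1': 2, 'five': 5, 1: 2, 2: 7}
```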

### Dictionary Iteration

In [None]:
more_slang = {
    'haha': 'that was not funny',
    'smh': 'shake my head',
    'lol': 'laugh out loud',
    'GOAT': 'greatest of all time'
}

In [None]:
# Number of key-value pairs
len(more_slang)

In [None]:
# We will do this often!
list(more_slang.keys())

In [None]:
# We will not do this often
list(more_slang.values())

In [None]:
# Checks if 'smh' is a key
'smh' in more_slang

In [None]:
# Checks if 'shake my head' is a key
# It is not – it is a value
'shake my head' in more_slang
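Since `in` on a dictionary only checks keys, we can test membership in `.values()` when we want to check the values instead. The loop below is a minimal sketch of a reverse lookup: it collects every key whose value matches, returning a list because more than one key could share the same value.

```python
more_slang = {
    'haha': 'that was not funny',
    'smh': 'shake my head',
    'lol': 'laugh out loud',
    'GOAT': 'greatest of all time'
}

# Membership in .values() checks the values instead of the keys
print('shake my head' in more_slang.values())  # True

# Reverse lookup: collect every abbreviation whose meaning matches
matches = []
for abb in more_slang.keys():
    if more_slang[abb] == 'shake my head':
        matches.append(abb)
print(matches)  # ['smh']
```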

In [None]:
more_slang

We can iterate through all keys in a dictionary like this:

In [None]:
for abb in more_slang.keys():
    print(more_slang[abb])

In [None]:
for abb in more_slang.keys():
    print(abb, more_slang[abb])
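When we need both the key and its value, the built-in `.items()` method yields them together as pairs, which avoids looking each key up again. Looping over a dictionary directly, without `.keys()`, also visits its keys in insertion order.

```python
more_slang = {
    'haha': 'that was not funny',
    'smh': 'shake my head',
    'lol': 'laugh out loud',
    'GOAT': 'greatest of all time'
}

# Each item is a (key, value) pair, unpacked into two names
for abb, meaning in more_slang.items():
    print(abb, meaning)

# Iterating over the dictionary itself yields the same keys as .keys()
print(list(more_slang) == list(more_slang.keys()))  # True
```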

### Example: Deciphering Gen Z Lingo

In [None]:
# Define the function replace_slang(text) to
# replace all abbreviations in text that are defined in more_slang with their full forms

def replace_slang(text):
    ...

In [None]:
replace_slang('smh, I did not lol')

In [None]:
replace_slang('serena is the GOAT')
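One possible approach, sketched under a few assumptions: abbreviations are matched only as whole words (so `lol` inside a longer word like `lollipop` is left alone), and matching is case-sensitive. The sketch uses Python's built-in `re` module, where `\b` marks a word boundary and `re.escape` protects any special characters in an abbreviation. It is given the hypothetical name `replace_slang_sketch` so it does not overwrite the exercise's `replace_slang`.

```python
import re

more_slang = {
    'haha': 'that was not funny',
    'smh': 'shake my head',
    'lol': 'laugh out loud',
    'GOAT': 'greatest of all time'
}

def replace_slang_sketch(text, slang=more_slang):
    # Replace each abbreviation in turn, matching whole words only
    for abb in slang.keys():
        pattern = r'\b' + re.escape(abb) + r'\b'
        # A function replacement inserts the meaning literally (no backslash processing)
        text = re.sub(pattern, lambda match: slang[abb], text)
    return text

print(replace_slang_sketch('smh, I did not lol'))  # shake my head, I did not laugh out loud
print(replace_slang_sketch('serena is the GOAT'))  # serena is the greatest of all time
```

Because the keys are replaced one at a time, a meaning that itself contained a later abbreviation as a whole word would be replaced again; none of the meanings in `more_slang` do.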