# Dictionaries

*Note: You can explore the [associated workbook](https://mybinder.org/v2/gh/melaniewalsh/Intro-Cultural-Analytics/master?urlpath=lab/tree/book/02-Python/Workbooks/11.5-Dictionaries-WORKBOOK.ipynb) for this chapter in the cloud.*

In this lesson, we're going to learn about Python dictionaries by drawing on Anelise Shrout's [Bellevue Almshouse Dataset](https://www.nyuirish.net/almshouse/the-almshouse-records/), excerpted below.

**Preview The Bellevue Almshouse Dataset**

In [1]:
import pandas
pandas.read_csv("../data/bellevue_almshouse_modified.csv").head(20)

| Unnamed: 0 | date_in | first_name | last_name | age | disease | profession | gender | children |
|---|---|---|---|---|---|---|---|---|
| 0 | 1847-04-17 | Mary | Gallagher | 28.0 | recent emigrant | married | f | Child Alana 10 days |
| 1 | 1847-04-08 | John | Sanin (?) | 19.0 | recent emigrant | laborer | m | Catherine 2 mo |
| 2 | 1847-04-17 | Anthony | Clark | 60.0 | recent emigrant | laborer | m | Charles Riley afed 10 days |
| 3 | 1847-04-08 | Lawrence | Feeney | 32.0 | recent emigrant | laborer | m | Child |
| 4 | 1847-04-13 | Henry | Joyce | 21.0 | recent emigrant | | m | Child 1 mo |
| 5 | 1847-04-14 | Bridget | Hart | 20.0 | recent emigrant | spinster | f | Child |
| 6 | 1847-04-14 | Mary | Green | 40.0 | recent emigrant | spinster | f | And child 2 months |
| 7 | 1847-04-19 | Daniel | Loftus | 27.0 | destitution | laborer | m | |
| 8 | 1847-04-10 | James | Day | 35.0 | recent emigrant | laborer | m | |
| 9 | 1847-04-10 | Margaret | Farrell | 30.0 | recent emigrant | widow | f | |


```{margin} The Bellevue Almshouse Dataset 
The Bellevue Almshouse Dataset includes information about Irish-born immigrants who were admitted to the almshouse in the 1840s. The Bellevue Almshouse was part of New York City's public health system, a place where poor, sick, homeless, and otherwise marginalized people were sent — sometimes voluntarily and sometimes forcibly. This dataset was transcribed from the almshouse's own admissions records by Anelise Shrout.
```

We're using the [Bellevue Almshouse Dataset](https://www.nyuirish.net/almshouse/the-almshouse-records/) to practice dictionaries because we want to think deeply about the consequences of reducing human life to data even at this early stage in our Python journey. This immigration data, as Shrout argues in her essay ["(Re)Humanizing Data: Digitally Navigating the Bellevue Almshouse,"](https://crdh.rrchnm.org/essays/v01-10-(re)-humanizing-data/) was "produced with the express purpose of reducing people to bodies; bodies to easily quantifiable aspects; and assigning value to those aspects which proved that the marginalized people to who they belonged were worth less than their elite counterparts."

___

## Dictionary

When we used lists with the Bellevue Almshouse data, it was easier than assigning a separate variable to each piece of information. We could put multiple names into a single list and multiple ages into another list.

By using a Python data collection type called a *dictionary*, we can go even further and group each person's name, age, and profession into a single collection.

**Individual Variables**

In [None]:
person1_name = 'Mary Gallagher'
person2_name = 'John Sanin (?)'
person1_age = 28
person2_age = 19

**Lists**

In [9]:
names = ['Mary Gallagher', 'John Sanin(?)', 'Anthony Clark', 'Margaret Farrell']
ages = [28, 19, 60, 30]
professions = ['married', 'laborer', 'laborer', 'widow']

**Dictionary**

In [31]:
person1 = {"name": "Mary Gallagher",
             "age": 28,
             "profession": "married"}

In [2]:
person1 = {"name": "Mary Gallagher",
             "age": 28,
             "profession": "married"}
type(person1)

dict

In [3]:
person2 = {"name": "John Sanin(?)",
             "age": 19,
             "profession": "laborer"}
type(person2)

dict
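
To make the jump from lists to dictionaries concrete, here is a minimal sketch that turns the three parallel lists from the "Lists" example into one dictionary per person. The variable name `people` and the use of `zip()` are illustrative choices, not part of the original lesson.

```python
# Parallel lists, copied from the "Lists" example
names = ['Mary Gallagher', 'John Sanin(?)', 'Anthony Clark', 'Margaret Farrell']
ages = [28, 19, 60, 30]
professions = ['married', 'laborer', 'laborer', 'widow']

# Walk the three lists in step and build one dictionary per person.
# `people` is a hypothetical name chosen for this sketch.
people = []
for name, age, profession in zip(names, ages, professions):
    people.append({"name": name, "age": age, "profession": profession})

print(people[0])
# {'name': 'Mary Gallagher', 'age': 28, 'profession': 'married'}
```

Each person's details now travel together, so there is no risk of a name and an age drifting out of sync across separate lists.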

## Key-Value

A dictionary is made up of key-value pairs. Each key is separated from its value by a colon `:`, and each pair is separated from the next by a comma `,`. The whole dictionary is enclosed in curly brackets `{}`.

In [31]:
person1 = {"name": "Mary Gallagher",
             "age": 28,
             "profession": "married"}

You can check all the keys in a dictionary by using the `.keys()` method or all the values in a dictionary by using the `.values()` method.

In [28]:
person1.keys()

dict_keys(['name', 'age', 'profession'])

In [29]:
person1.values()

dict_values(['Mary Gallagher', 28, 'married'])
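
The `dict_keys` and `dict_values` objects shown above can be converted to ordinary lists if you want to index or slice them. The sketch below also shows two related built-ins, `len()` and the `in` operator, which the lesson has not introduced for dictionaries; the variable names are illustrative.

```python
person1 = {"name": "Mary Gallagher",
           "age": 28,
           "profession": "married"}

# Convert the special "view" objects into plain lists
key_list = list(person1.keys())
value_list = list(person1.values())
print(key_list)    # ['name', 'age', 'profession']
print(value_list)  # ['Mary Gallagher', 28, 'married']

# Count the key-value pairs
print(len(person1))  # 3

# Check whether a key exists (this tests keys, not values)
print("age" in person1)     # True
print("gender" in person1)  # False
```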

## Access Items

You can access a value in a dictionary by using square brackets `[]` and its key name (kind of like how we indexed a string or a list).

In [8]:
person1["name"]

'Mary Gallagher'

In [5]:
person1["age"]

28

In [9]:
person1["profession"]

'married'
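
Square brackets only work for keys that exist. As a hedged illustration, the sketch below asks for a `"gender"` key that `person1` does not have: square brackets raise a `KeyError`, while the `.get()` method returns `None` or a fallback value you supply. The fallback text `"not recorded"` is a made-up placeholder.

```python
person1 = {"name": "Mary Gallagher",
           "age": 28,
           "profession": "married"}

# Square brackets raise a KeyError when the key is missing
try:
    person1["gender"]
except KeyError as error:
    print(f"Missing key: {error}")  # Missing key: 'gender'

# .get() returns None instead of raising an error
gender = person1.get("gender")
print(gender)  # None

# .get() can also return a fallback value of your choosing
gender_or_default = person1.get("gender", "not recorded")
print(gender_or_default)  # not recorded
```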

## Change Item

You can change a value in a dictionary by re-assigning a new value to a dictionary key.

In [34]:
person1["age"] = 100

In [35]:
person1

{'name': 'Mary Gallagher', 'age': 100, 'profession': 'married'}

In [37]:
person1['profession'] = 'spinster'

In [38]:
person1

{'name': 'Mary Gallagher', 'age': 100, 'profession': 'spinster'}
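
The same assignment syntax also adds a key that is not there yet, and keys can be removed as well. This is a small sketch starting from the original `person1` values; the `"gender"` value `"f"` comes from the dataset preview at the top of the lesson, and `removed` is a hypothetical variable name.

```python
person1 = {"name": "Mary Gallagher",
           "age": 28,
           "profession": "married"}

# Assigning to a key that does not exist yet adds a new key-value pair
person1["gender"] = "f"

# .pop() removes a key and hands back its value
removed = person1.pop("profession")
print(removed)  # married

# del removes a key without returning anything
del person1["age"]

print(person1)  # {'name': 'Mary Gallagher', 'gender': 'f'}
```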

## Nested Dictionary

You can also nest a dictionary inside another dictionary.

In [65]:
bellevue_people = {
                "person1":
                  {"name": "Mary Gallagher",
                   "age": 28,
                   "profession": "married"},
                "person2":
                  {"name": "John Sanin(?)",
                   "age": 19,
                   "profession": "laborer"}
                }

In [51]:
bellevue_people['person1']

{'name': 'Mary Gallagher', 'age': 28, 'profession': 'married'}

In [53]:
bellevue_people['person1']['name']

'Mary Gallagher'

In [52]:
bellevue_people['person2']

{'name': 'John Sanin(?)', 'age': 19, 'profession': 'laborer'}

In [None]:
bellevue_people['person2']['age']
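
A nested dictionary can grow the same way a flat one does. The sketch below adds a third record, using Anthony Clark's details from the dataset preview; the key name `"person3"` simply follows the pattern of the existing keys.

```python
bellevue_people = {
    "person1": {"name": "Mary Gallagher", "age": 28, "profession": "married"},
    "person2": {"name": "John Sanin(?)", "age": 19, "profession": "laborer"},
}

# Add a whole new inner dictionary under a new outer key
bellevue_people["person3"] = {"name": "Anthony Clark",
                              "age": 60,
                              "profession": "laborer"}

# Chain square brackets to reach a value inside the inner dictionary
print(bellevue_people["person3"]["age"])  # 60
print(len(bellevue_people))               # 3
```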

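## Iterate Through Dictionary

You can loop through a dictionary's keys with `.keys()`, its values with `.values()`, or its key-value pairs with `.items()`.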

In [48]:
for person in bellevue_people.keys():
    print(person)

person1
person2


In [49]:
for person in bellevue_people.values():
    print(person)

{'name': 'Mary Gallagher', 'age': 28, 'profession': 'married'}
{'name': 'John Sanin(?)', 'age': 19, 'profession': 'laborer'}


In [60]:
for person in bellevue_people.values():
    if person['age'] > 20:
        name = person['name']
        age = person['age']
        print(f'{name} is more than 20 years old. She is {age}.')

Mary Gallagher is more than 20 years old. She is 28.


In [47]:
for person in bellevue_people.items():
    print(person)

('person1', {'name': 'Mary Gallagher', 'age': 28, 'profession': 'married'})
('person2', {'name': 'John Sanin(?)', 'age': 19, 'profession': 'laborer'})
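
Because `.items()` yields (key, value) pairs, the loop can unpack each pair into two variables instead of one. A minimal sketch, where `key`, `record`, and `lines` are illustrative names:

```python
bellevue_people = {
    "person1": {"name": "Mary Gallagher", "age": 28, "profession": "married"},
    "person2": {"name": "John Sanin(?)", "age": 19, "profession": "laborer"},
}

lines = []
# Unpack each (key, value) pair into two loop variables
for key, record in bellevue_people.items():
    line = f"{key}: {record['name']} ({record['age']})"
    lines.append(line)
    print(line)

# person1: Mary Gallagher (28)
# person2: John Sanin(?) (19)
```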


## Exercise 1

In [None]:
movie = {'title': 'Selma',
         'site': 'http://www.imdb.com/title/tt1020072/',
         'country': 'US/UK',
         'year_release': 2014,
         'box_office': '$52.1M',
         'director': 'Ava DuVernay',
         'number_of_subjects': 1,
         'subject': 'Martin Luther King, Jr',
         'type_of_subject': 'Activist',
         'race_known': 'Known',
         'subject_race': 'African American',
         'person_of_color': 1,
         'subject_sex': 'Male',
         'lead_actor_actress': 'David Oyelowo'}

Print out all the "keys" in the dictionary `movie`

In [5]:
movie = {'title': 'Selma',
         'site': 'http://www.imdb.com/title/tt1020072/',
         'country': 'US/UK',
         'year_release': 2014,
         'box_office': '$52.1M',
         'director': 'Ava DuVernay',
         'number_of_subjects': 1,
         'subject': 'Martin Luther King, Jr',
         'type_of_subject': 'Activist',
         'race_known': 'Known',
         'subject_race': 'African American',
         'person_of_color': 1,
         'subject_sex': 'Male',
         'lead_actor_actress': 'David Oyelowo'}
movie.keys()

dict_keys(['title', 'site', 'country', 'year_release', 'box_office', 'director', 'number_of_subjects', 'subject', 'type_of_subject', 'race_known', 'subject_race', 'person_of_color', 'subject_sex', 'lead_actor_actress'])

Print out all the "values" in the dictionary `movie`

In [12]:
movie.values()

dict_values(['Selma', 'http://www.imdb.com/title/tt1020072/', 'US/UK', 2014, '$52.1M', 'Ava DuVernay', 1, 'Martin Luther King, Jr', 'Activist', 'Known', 'African American', 1, 'Male', 'David Oyelowo'])

Access the value for the key "director"

In [13]:
movie['director']

'Ava DuVernay'

## Exercise 2

By using the Python library called `pandas`, we can read in the entire "biopics.csv" data from the *538* project and make it into a list of dictionaries.

Don't worry about the `pandas` code at this point. We will get to it in a couple of weeks.

In [15]:
import pandas as pd 
biopics_df = pd.read_csv("/Users/benpost/Documents/GitHub/3844f23-writing-digital-media/book/data/02-python/biopics.csv", encoding='utf-8')
biopics_list = biopics_df.to_dict('records')

In [16]:

biopics_list

[{'title': '10 Rillington Place',
  'site': 'http://www.imdb.com/title/tt0066730/',
  'country': 'UK',
  'year_release': 1971,
  'box_office': '-0',
  'director': 'Richard Fleischer',
  'number_of_subjects': 1,
  'subject': 'John Christie',
  'type_of_subject': 'Criminal',
  'race_known': 'Unknown',
  'subject_race': nan,
  'person_of_color': 0,
  'subject_sex': 'Male',
  'lead_actor_actress': 'Richard Attenborough'},
 {'title': '12 Years a Slave',
  'site': 'http://www.imdb.com/title/tt2024544/',
  'country': 'US/UK',
  'year_release': 2013,
  'box_office': '$56.7M',
  'director': 'Steve McQueen',
  'number_of_subjects': 1,
  'subject': ' Solomon Northup',
  'type_of_subject': 'Other',
  'race_known': 'Known',
  'subject_race': 'African American',
  'person_of_color': 1,
  'subject_sex': 'Male',
  'lead_actor_actress': 'Chiwetel Ejiofor'},
 {'title': '127 Hours',
  'site': 'http://www.imdb.com/title/tt1542344/',
  'country': 'US/UK',
  'year_release': 2010,
  'box_office': '$18.3M',
 

In [10]:
type(biopics_list)

list

In [14]:

type(biopics_list[0])


dict

Loop through this list of dictionaries (`biopics_list`) and print out the movie title and release year for all the movies that featured an "African American" `subject`. Print out the movie *title* and *release year* with a "//" in between them.

In [41]:
import pandas as pd 
biopics_df = pd.read_csv("/Users/benpost/Documents/GitHub/3844f23-writing-digital-media/book/data/02-python/biopics.csv", encoding='utf-8')
biopics_list = biopics_df.to_dict('records')
for movie in biopics_list:
    if movie["subject_race"] == "African American":
        print(movie["title"] + " // " + str(movie["year_release"]))
    

12 Years a Slave // 2013
42 // 2013
Ali // 2001
American Gangster // 2007
Antwone Fisher // 2002
Baadasssss! // 2003
Blue Caprice // 2013
Cadillac Records // 2008
Cadillac Records // 2008
Frankie & Alice // 2010
Fruitvale Station // 2013
Get on Up // 2014
Get Rich or Die Tryin' // 2005
Greased Lightning // 1977
Jo Jo Dancer, Your Life Is Calling // 1986
Lady Sings the Blues // 1972
Lee Daniels' The Butler // 2013
Malcolm X // 1992
Men of Honor // 2000
Phantom Punch // 2008
Radio // 2003
Ray // 2004
Remember the Titans // 2000
Selma // 2014
Talk to Me // 2007
The Blind Side // 2009
The Express // 2008
The Great Debaters // 2007
The Greatest // 1977
The Hurricane // 1999
The Jackie Robinson Story // 1950
The Longshots // 2008
The Pursuit of Happyness // 2006
The Soloist // 2009
Why Do Fools Fall in Love // 1998


Now, choose a different **"subject_race"** and print out all the *titles* and *release years* for those movies. 

Print out the movie title and release year with a "//" in between them.

Here's a list of values to consider:

- White  
- [blank]  
- African American  
- Multi racial  
- Hispanic (Latin American)  
- Middle Eastern (White)  
- Middle Eastern  
- African  
- Hispanic (White)  
- Hispanic (Latino)  
- Asian  
- Native American  
- Asian American  
- Indian  
- Caribbean  
- Mediterranean  
- Eurasian  
- Hispanic (Latina)

In [39]:
for movie in biopics_list:
    if movie["subject_race"] == "Indian":
        print(movie["title"] + " // " + str(movie["year_release"]))

Bandit Queen // 1994
Gandhi // 1982


## Exercise 3

The Eviction Lab makes its data [available for download here](https://data-downloads.evictionlab.org/). By using the Python library called pandas, we can read in eviction data about cities in the state of New York and make it into a list of dictionaries.

Don't worry about the pandas code at this point. We will get to it in a couple of weeks.

In [4]:
import pandas as pd
cities_df = pd.read_csv('./../data/02-python/ny_cities_eviction.csv', encoding='utf-8')
cities_list = cities_df.to_dict('records')
cities_list

[{'GEOID': 3600155,
  'year': 2000,
  'name': 'Accord',
  'parent-location': 'New York',
  'population': 622.0,
  'poverty-rate': 4.15,
  'renter-occupied-households': nan,
  'pct-renter-occupied': 27.88,
  'median-gross-rent': 650.0,
  'median-household-income': 52083.0,
  'median-property-value': 98700.0,
  'rent-burden': 19.0,
  'pct-white': 90.19,
  'pct-af-am': 2.41,
  'pct-hispanic': 3.86,
  'pct-am-ind': 0.64,
  'pct-asian': 1.45,
  'pct-nh-pi': 0.0,
  'pct-multiple': 1.45,
  'pct-other': 0.0,
  'eviction-filings': nan,
  'evictions': nan,
  'eviction-rate': nan,
  'eviction-filing-rate': nan,
  'low-flag': 0,
  'imputed': 0,
  'subbed': 0},
 {'GEOID': 3600155,
  'year': 2001,
  'name': 'Accord',
  'parent-location': 'New York',
  'population': 622.0,
  'poverty-rate': 4.15,
  'renter-occupied-households': nan,
  'pct-renter-occupied': 27.88,
  'median-gross-rent': 650.0,
  'median-household-income': 52083.0,
  'median-property-value': 98700.0,
  'rent-burden': 19.0,
  'pct-whit

Loop through this list of dictionaries and print out the *year* and *number of evictions* for the city of *Ithaca* (note the `name` key).

Print out the year and number of evictions with a custom f-string.

In [33]:
import pandas as pd
cities_df = pd.read_csv('./../data/02-python/ny_cities_eviction.csv', encoding='utf-8')
cities_list = cities_df.to_dict('records')
# Loop through every record and keep only the rows for Ithaca
for record in cities_list:
    if record["name"] == "Ithaca":
        year = record["year"]
        evictions = record["evictions"]
        print(f"year: {year}, Number of Evictions: {evictions}")


year: 2000, Number of Evictions: 0.0
year: 2001, Number of Evictions: 0.0
year: 2002, Number of Evictions: 0.0
year: 2003, Number of Evictions: 0.0
year: 2004, Number of Evictions: 0.0
year: 2005, Number of Evictions: 0.0
year: 2006, Number of Evictions: 0.0
year: 2007, Number of Evictions: 0.0
year: 2008, Number of Evictions: 0.0
year: 2009, Number of Evictions: 1.47
year: 2010, Number of Evictions: 6.2
year: 2011, Number of Evictions: 9.63
year: 2012, Number of Evictions: 11.98
year: 2013, Number of Evictions: 9.22
year: 2014, Number of Evictions: 4.47
year: 2015, Number of Evictions: 8.73
year: 2016, Number of Evictions: 6.47
