**Dictionaries** 

Objectives:

*   What is a dictionary?
*   What are the advantages of dictionaries?
*   How do I use the content of a dictionary?
  - Examining
  - Modifying
  - Iterating
  - Methods






**What is a dictionary?**

A dictionary is a collection of elements organized as key: value pairs.

In [None]:
poets_dict = {"name": "Forough Farrokhzad", 
            "year of birth": 1935, 
            "year of death": 1967, 
            "place of birth": "Iran", 
            "language": "Persian", 
            "works": ["Remembrance of a Day","Unison","The Shower of Your Hair","Portrait of Forough"]}

For this reason, a dictionary is often described as a collection of key-value pairs.

In this example, 

the **keys** are **name, year of birth, year of death, place of birth, language and works**, and

everything after each colon is the **value** assigned to that key: **Forough Farrokhzad, 1935, 1967, Iran, Persian, and ["Remembrance of a Day","Unison","The Shower of Your Hair","Portrait of Forough"]** 

Dictionaries are defined with curly brackets holding the key-value pairs, written in the format key:value and separated by commas.

The general syntax of a dictionary is the following:

In [None]:
Developmental_Context = {
  "state": "NY",
  "City": "Ithaca",
  "Zip Code": 14850,
  "Colleges": ["Cornell University", "Ithaca College"]
}

**When should I use dictionaries and when should I use lists?**

If the data you are storing is complex and hierarchical, the dictionary's key/value structure is very helpful: you look items up by a meaningful label (the key) rather than by position, as you would in a list. This is the advantage of dictionaries.

Keys must be unique (a dictionary cannot contain two entries with the same key), and a key cannot be edited in place once it is in the dictionary.

Values, on the other hand, can be anything, including strings, integers, booleans, lists of them or even other dictionaries.

Let's see an example with different data types: strings, booleans, integers and lists.

In [None]:
Developmental_Context = {
  "City": "Ithaca",
  "Urban": False,
  "Year": 2021,
  "Colleges": ["Cornell University", "Ithaca College"]
}

Here is another example of a dictionary called *valid_dict* containing two other dictionaries *dict_nums* and *dict_ints*.

In [None]:
valid_dict = {'dict_nums':{1:'one', 2:'two', 3:'three'},
             'dict_ints':{'one':1, 'two':2, 'three':3}}

In this case, *'dict_nums'* and *'dict_ints'* are keys of *valid_dict*, and the value stored under each of them is itself a dictionary.

While dictionaries can be values in other dictionaries, they cannot be keys. Running the following cell raises a `TypeError`:

In [None]:
invalid_dict = {{1:'one', 2:'two', 3:'three'}:'dict_nums',
             {'one':1, 'two':2, 'three':3}:'dict_ints'}
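
The cell above fails because dictionary keys must be hashable, and mutable objects such as dictionaries and lists are not. The sketch below catches the error so that it can run to completion, and then shows a tuple used as a key instead. The `coordinates` dictionary and the `type_error_raised` flag are made up for illustration and are not part of the lesson data.

```python
# Using a dictionary as a key fails because dictionaries are mutable (unhashable).
try:
    invalid_dict = {{1: 'one', 2: 'two', 3: 'three'}: 'dict_nums'}
    type_error_raised = False
except TypeError as error:
    type_error_raised = True
    print("TypeError:", error)

# A tuple of immutable items is hashable, so it can be used as a key.
# (coordinates is a hypothetical example.)
coordinates = {(42.44, -76.50): "Ithaca"}
print(coordinates[(42.44, -76.50)])
```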

**Examining a Dictionary**

You can use the function *print* and the methods *.values* and *.keys* to see the content of your dictionary.

Let's use this on our first example:

In [None]:
print(poets_dict)

{'name': 'Forough Farrokhzad', 'year of birth': 1935, 'year of death': 1967, 'place of birth': 'Iran', 'language': 'Persian', 'works': ['Remembrance of a Day', 'Unison', 'The Shower of Your Hair', 'Portrait of Forough']}


In [None]:
print(poets_dict.values())

dict_values(['Forough Farrokhzad', 1935, 1967, 'Iran', 'Persian', ['Remembrance of a Day', 'Unison', 'The Shower of Your Hair', 'Portrait of Forough']])


In [None]:
print(poets_dict.keys())


dict_keys(['name', 'year of birth', 'year of death', 'place of birth', 'language', 'works'])


Now let's use these two methods on our example of a dictionary containing two dictionaries.

In [None]:
print(valid_dict.keys())

dict_keys(['dict_nums', 'dict_ints'])


In [None]:
print(valid_dict.values())

dict_values([{1: 'one', 2: 'two', 3: 'three'}, {'one': 1, 'two': 2, 'three': 3}])


You may want your notebook to show you a specific element from your dictionary. Use its key to look it up like so:

In [None]:
poets_dict["language"]

'Persian'

Or you may want it to show you the last element of a list in your dictionary:

In [None]:
poets_dict["works"][-1]

'Portrait of Forough'
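
It may help to know what happens when you ask for a key that is not there. The sketch below uses a shortened copy of `poets_dict` so that it runs on its own. Square brackets raise a `KeyError` for a missing key, while the `.get` method returns `None` or a default value of your choice.

```python
# A shortened copy of the lesson's dictionary, so this cell is self-contained.
poets_dict = {"name": "Forough Farrokhzad", "language": "Persian"}

# Square brackets raise a KeyError when the key is missing.
try:
    poets_dict["gender"]
except KeyError:
    print("No such key")

# .get returns None (or a default you choose) instead of raising an error.
print(poets_dict.get("gender"))             # None
print(poets_dict.get("gender", "unknown"))  # unknown
```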

**Modifying a Dictionary**

As we said before, keys cannot be modified, and they must be unique.

If you repeat a key when defining a dictionary, the later value overwrites the earlier one:

In [None]:
Developmental_Context = {
  "City": "Ithaca",
  "Urban": False,
  "Year": 2021,
  "Year": 2020
  }

In [None]:
print(Developmental_Context)

{'City': 'Ithaca', 'Urban': False, 'Year': 2020}


Although an existing key cannot be modified, new keys can be added:

In [None]:
poets_dict["gender"] = "Female"
print(poets_dict)

{'name': 'Forough Farrokhzad', 'year of birth': 1935, 'year of death': 1967, 'place of birth': 'Iran', 'language': 'Persian', 'works': ['Remembrance of a Day', 'Unison', 'The Shower of Your Hair', 'Portrait of Forough'], 'gender': 'Female'}


The values in a dictionary, on the other hand, are mutable. You can change them using the same syntax as before:

In [None]:
poets_dict["language"] = "Farsi"
print(poets_dict)

{'name': 'Forough Farrokhzad', 'year of birth': 1935, 'year of death': 1967, 'place of birth': 'Iran', 'language': 'Farsi', 'works': ['Remembrance of a Day', 'Unison', 'The Shower of Your Hair', 'Portrait of Forough'], 'gender': 'Female'}


A dictionary can, however, be assigned to more than one variable name. When you do this, both variables refer to the same dictionary.

In [None]:
my_new_developmental_context = Developmental_Context

In [None]:
print(my_new_developmental_context)

{'City': 'Ithaca', 'Urban': False, 'Year': 2020}


In [None]:
print(Developmental_Context)

{'City': 'Ithaca', 'Urban': False, 'Year': 2020}


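If you have more than one variable referring to the same dictionary, a change made through either of them is visible through both. Note that keys are case-sensitive, so `"year"` below is added as a new key alongside `"Year"`: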

In [None]:
Developmental_Context["year"] = 2022
print(Developmental_Context)

{'City': 'Ithaca', 'Urban': False, 'Year': 2020, 'year': 2022}


In [None]:
Developmental_Context["Urban"] = True
print(Developmental_Context)

{'City': 'Ithaca', 'Urban': True, 'Year': 2020, 'year': 2022}
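
To see the shared reference directly, here is a small sketch. The names `alias` and `independent` are made up for this example. It also uses the `.copy` method, which the lesson has not covered so far: it builds a separate dictionary with the same pairs (a shallow copy, so nested lists or dictionaries inside it would still be shared).

```python
Developmental_Context = {"City": "Ithaca", "Urban": False, "Year": 2020}

alias = Developmental_Context                # a second name for the same dictionary
independent = Developmental_Context.copy()   # a new dictionary with the same pairs

alias["Urban"] = True                        # change made through the alias

print(alias is Developmental_Context)   # True: both names point to one object
print(Developmental_Context["Urban"])   # True: the change is visible here too
print(independent["Urban"])             # False: the copy was not affected
```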


**Iterating with Dictionaries**

When you need to make several changes to your dictionary, loops can help.

For example, you could use a loop to have your notebook print not just one but all the keys along with their values in your dictionary:

In [None]:
#printpairs
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

for key in d.keys():
    print(key, d[key])

apples 0.49
oranges 0.99
pears 1.49
bananas 0.32


Now let's say you wish to correct all the prices in this dictionary by adding the tax (here 5%) to each:

In [None]:
#addtax
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

for key in d.keys():
    d[key] = round(1.05 * d[key], 2)

print(d)

**Other methods and functions for Dictionaries**

The method *.items* can come in handy with dictionaries. What do you think it does?

In [None]:
poets_dict.items()

dict_items([('name', 'Forough Farrokhzad'), ('year of birth', 1935), ('year of death', 1967), ('place of birth', 'Iran'), ('language', 'Farsi'), ('works', ['Remembrance of a Day', 'Unison', 'The Shower of Your Hair', 'Portrait of Forough']), ('gender', 'Female')])
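
Because `.items` yields (key, value) pairs, it combines naturally with a `for` loop. A minimal sketch using the fruit prices from the iteration examples is below; the loop variable names `fruit` and `price` are arbitrary.

```python
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

# Each pair is unpacked into two loop variables, so no d[key] lookup is needed.
for fruit, price in d.items():
    print(fruit, price)
```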

If you want to know how many pairs there are in a dictionary, use *len*:

In [None]:
print(len(poets_dict))

7


The in operator works both for lists and dictionaries. 

It will allow you to check if an element is contained within a list or a dictionary.

In [None]:
l = ["Afghanistan", "Canada", "Sierra Leone", "Denmark", "Japan"]
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

print('Canada' in l)
print('grapefruit' in d)
print('grapefruit' not in d)

True
False
True
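
One detail worth noting: on a dictionary, `in` checks the keys, not the values. To test for a value, you can check against `.values()`, as in this short sketch:

```python
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

print('apples' in d)        # True: 'apples' is a key
print(0.49 in d)            # False: 0.49 is a value, not a key
print(0.49 in d.values())   # True: search the values explicitly
```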


**Practice**

Using the dictionary below and a for loop, calculate how much it'll cost you to buy 2 pieces of each fruit.

In [None]:
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

-Break-

## Files

Learning Objectives:

- Learn the Python way of reading in files.
- Understand how to read/write text files and csv files.

Example file:

- We are going to use an example dataset retrieved from Stack Overflow.

- Stack Overflow is a question and answer website. Users can ask and answer questions related to programming. Here is an example question and all the answers: https://stackoverflow.com/questions/4/how-to-convert-a-decimal-to-a-double-in-c

- Here is an overview of the data schema: https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede



## Reading from a file
Reading a file requires three steps:

- Opening the file: `open` function
- Reading the file: `read` method
- Closing the file: `close` method

In [None]:
from google.colab import files ## will change this part when we move to binder environment
uploaded = files.upload()

Saving so_question_title.csv to so_question_title.csv
Saving so_question_untitle.csv to so_question_untitle.csv


In [None]:
my_file = open("so_question_title.csv", "r")
text = my_file.read()
my_file.close()

print(text)

- However, it is better to use the `with open` syntax, which automatically closes the file for you.
- The 'r' indicates that you are reading the file, as opposed to, say, writing to it.

In [None]:
# better code
with open('so_question_title.csv', 'r') as my_file:
    text = my_file.read()
    
print(text)

#note that we are also reading the title as the first line and we will deal with this issue later
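
To check that `with` really closes the file, you can inspect the file object's `closed` attribute. The sketch below creates a small throwaway file in a temporary folder so that it runs without the workshop data. The file name `demo.csv` and its contents are invented for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own.
# (demo.csv and its two lines are made up for illustration.)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.csv")
    with open(path, "w") as demo_file:
        demo_file.write("Id,Title\n1,first question\n")

    with open(path, "r") as my_file:
        text = my_file.read()
        print(my_file.closed)   # False: still inside the with block
    print(my_file.closed)       # True: closed automatically on leaving the block
    was_closed = my_file.closed

print(text)
```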

## Reading a file as a list
- Very often we want to read in a file line by line, storing those lines as a list.
- To do that, we can use a `for` loop over the file object:

In [None]:
stored = []
with open('so_question_title.csv', 'r') as my_file:
    for line in my_file:
        stored.append(line)

In [None]:
stored

Remember that the variable name can be anything. It does not have to be `line`. Iterating over a file object always yields one line at a time.

We can use the `strip` method to get rid of those line breaks at the end:

In [None]:
stored = []
with open('so_question_title.csv', 'r') as my_file:
    for line in my_file:
        stored.append(line.strip())

In [None]:
stored

## Read certain lines in file
- We can pick certain lines to read using the `enumerate` function.
- We can use this approach to drop the title line.

The `enumerate()` function adds a counter to an iterable and returns it as an enumerate object. The counter starts at 0 by default. This object can be used directly in `for` loops or converted into a list of tuples using the `list()` function.

In [None]:
list1 = ["Sam","Jonathan","Xiaomeng"]

for ele in enumerate(list1):
    print (ele)
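
Each element produced by `enumerate` is a (counter, item) tuple, so it can be unpacked into two loop variables. The optional `start` argument changes where the counter begins. A short sketch (the variable names `position` and `name` are arbitrary):

```python
list1 = ["Sam", "Jonathan", "Xiaomeng"]

print(list(enumerate(list1)))
# [(0, 'Sam'), (1, 'Jonathan'), (2, 'Xiaomeng')]

# Unpack each tuple and start counting from 1 instead of 0.
for position, name in enumerate(list1, start=1):
    print(position, name)
```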

In [None]:
stored = []
with open('so_question_title.csv', 'r') as my_file:
    for i, line in enumerate(my_file):  # use enumerate to count which line we are on
      if i > 0:
        stored.append(line.strip())
      else:
        continue # skip the first line (i == 0), i.e. do not store the title line

In [None]:
stored # now the title line has been removed!
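
Another common way to drop the title line, sketched here as an alternative rather than as the workshop's method, is to consume it with the built-in `next` function before looping. `io.StringIO` stands in for an open file so the cell is self-contained, and the data rows are invented.

```python
import io

# io.StringIO behaves like an open text file; the rows below are made up.
fake_file = io.StringIO("Id,Title\n1,first question\n2,second question\n")

header = next(fake_file)   # read and set aside the title line

stored = []
for line in fake_file:     # the loop continues from the second line
    stored.append(line.strip())

print(header.strip())
print(stored)
```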

## Exercise

Read from line 7 to line 20 of the file. Remember that `enumerate` starts counting at 0.

In [None]:
stored = []
with open('so_question_title.csv', 'r') as my_file:
    for i, line in enumerate(my_file):
      if _______:
        stored.append(line.strip())
      else:
        continue 

## Writing to a file
We can use the `with open` syntax for writing files as well.

In [None]:
# this is okay...
new_file = open("output.csv", "w")
bees = ['bears', 'beets', 'Battlestar Galactica']
for i in bees:
    new_file.write(i + '\n')
new_file.close()

In [None]:
# but this is better, can anyone tell why?
bees = ['bears', 'beets', 'Battlestar Galactica']
with open('output.csv', 'w') as new_file:
    for i in bees:
        new_file.write(i + '\n')

Let's take a look at the file we wrote.
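
One way to do that is to read the file back in with the same `with open` pattern. The sketch below writes and then reads the file inside a temporary folder, which is an assumption made so that it runs anywhere without touching your real `output.csv`. In the notebook you could simply open `'output.csv'` with `'r'`.

```python
import os
import tempfile

bees = ['bears', 'beets', 'Battlestar Galactica']

# A temporary folder keeps this sketch from overwriting a real output.csv.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "output.csv")

    # Write one item per line, as in the cell above.
    with open(path, "w") as new_file:
        for i in bees:
            new_file.write(i + "\n")

    # Read the whole file back as a single string.
    with open(path, "r") as new_file:
        written = new_file.read()

print(written)
```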

## Using the CSV Module

- It is built specifically for reading and writing csv files.
- It has helpful classes such as `csv.DictReader` and `csv.DictWriter` that make reading and writing csv files more convenient.

In [None]:
import csv

Let's first try the same csv file, 'so_question_title.csv', with the title in the first line.

In [None]:
questions = [] # make empty list
with open('so_question_title.csv', 'r') as csvfile: # open file
    reader = csv.DictReader(csvfile) # create a reader
    for row in reader: # loop through rows
        questions.append(row) # append each row to the list

In [None]:
questions[:5] 

In [None]:
# get the keys in each dictionary
keys = questions[1].keys()
keys
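
The writing counterpart, `csv.DictWriter`, takes a list of column names and writes one dictionary per row. Here is a minimal sketch; the column names `Id` and `Title` and the two rows are hypothetical, and `io.StringIO` stands in for a file opened for writing.

```python
import csv
import io

# Hypothetical rows; the column names are assumptions for this example.
rows = [
    {"Id": "1", "Title": "first question"},
    {"Id": "2", "Title": "second question"},
]

buffer = io.StringIO()   # stands in for open('some_file.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=["Id", "Title"])
writer.writeheader()     # write the title line
writer.writerows(rows)   # write one line per dictionary

print(buffer.getvalue())
```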

How about using an untitled csv file?

In [None]:
#read csv and read into a list of dictionaries
questions = [] # make empty list
with open('so_question_untitle.csv', 'r') as csvfile: # open file
    reader = csv.DictReader(csvfile) # create a reader
    for row in reader: # loop through rows
        questions.append(row) # append each row to the list

In [None]:
questions[:5] # what are the problems?

Let's check the list of dictionaries of this untitled csv file.

In [None]:
# get the keys in each dictionary
keys = questions[1].keys()
keys
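
When a file has no title line, `csv.DictReader` treats the first data row as the column names, which is why the keys look wrong. One possible fix is to pass the column names yourself through the `fieldnames` argument. In the sketch below, the names `Id` and `Title` and the data rows are assumptions for illustration, not the real columns of the workshop file.

```python
import csv
import io

# io.StringIO stands in for the untitled csv file; the rows are made up.
untitled = io.StringIO("1,first question\n2,second question\n")

# Supplying fieldnames tells DictReader that every line is data.
reader = csv.DictReader(untitled, fieldnames=["Id", "Title"])

questions = []
for row in reader:
    questions.append(row)

print(questions[0])
```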

# List Comprehensions

**Learning Objectives**:
- Understand the syntax of list comprehensions and how they can make your code cleaner and more compact

## Motivation: List Comprehensions are another way of doing loops with accumulation

Recall that in Part 1 of our Python series, we talked about using the *accumulator pattern* with loops to transform list elements. As a quick reminder, the accumulator pattern lets us do something to each element in a list and store ("accumulate") the results. We specifically looked at the following example of squaring every element in a list:

In [None]:
values = [1, 2, 3, 4, 8, 9, 10]

squared_values = []
for x in values:
    squared_values.append(x**2)

print(squared_values)

[1, 4, 9, 16, 64, 81, 100]


But this is a lot of code to do something that is conceptually rather simple! It took us three lines: one to initialize the accumulator variable, one to start the `for` loop, and one for the body of the loop. Thankfully, Python offers a way to do all three of these things in a single line of code! It is called a *list comprehension*:

In [None]:
squared_values = [x**2 for x in values]

print(squared_values) # we get the same result!

[1, 4, 9, 16, 64, 81, 100]


In terms of syntax, you can see that the list comprehension looks a lot like the original `for` loop, just in a slightly different order. The computation (in this case, `x**2`) comes *before* the `for` syntax, there is no ending colon, and the whole thing is inside the square brackets.

## List comprehensions can incorporate conditional logic

Recall that another thing we did with loops was to combine them with conditionals to filter list elements. For example, the following code takes a list of ages and keeps only the elders (defined as those above 50):

In [None]:
ages = [20, 43, 12, 88, 97]
filtered = []
for age in ages:
    if age > 50: # we want to select only the elders
        filtered.append(age)
print(filtered) # filtered only contains the two elders, ages 88 and 97

[88, 97]


We can do this with a list comprehension as well! All we need to do is add the `if` condition after the `for`:

In [None]:
filtered = [age for age in ages if age > 50]

print(filtered) # we get the same result!

[88, 97]


A word of warning, however: the filtering condition at the end of a list comprehension can only use `if`; it does **not** support `else` and `elif`. This means that while conditionals inside list comprehensions are useful for basic filtering operations, more complicated conditional branching quickly becomes hard to read; for those it is usually better to stick with loops.
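
One related pattern is worth knowing. Python's conditional expression (`a if condition else b`) can appear at the front of a comprehension, in the position of the computation. This does not filter anything: every element is kept, but it is transformed one way or the other. A small sketch, with labels invented for illustration:

```python
ages = [20, 43, 12, 88, 97]

# The if/else here chooses the output value; no element is dropped.
labels = ["elder" if age > 50 else "younger" for age in ages]

print(labels)
# ['younger', 'younger', 'younger', 'elder', 'elder']
```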

There are several advantages to list comprehensions, the most obvious being cleaner, more readable code. Less obvious is that list comprehensions usually run somewhat faster than the equivalent `for` loop with `append`. In general, prefer list comprehensions when the logic is simple, and use loops when you need to do something more complicated than a comprehension can express readably.
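
If you would like to check the speed difference on your own machine, the standard library's `timeit` module can time both versions. This is only a rough sketch: the function names `with_loop` and `with_comprehension` are made up, and the exact timings will vary from run to run and between computers.

```python
import timeit

values = list(range(1000))

def with_loop():
    # Accumulator pattern with an explicit for loop.
    squared_values = []
    for x in values:
        squared_values.append(x**2)
    return squared_values

def with_comprehension():
    # The same computation as a list comprehension.
    return [x**2 for x in values]

# Run each version 1000 times and report the total time in seconds.
loop_time = timeit.timeit(with_loop, number=1000)
comprehension_time = timeit.timeit(with_comprehension, number=1000)

print(f"loop: {loop_time:.3f} s, comprehension: {comprehension_time:.3f} s")
```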

##  Exercise 1: convert loops

Convert the following code to list comprehensions:

In [None]:
# Square elements greater than 4
a = [3, 4, 5]
b = []
for i in a:
    if i > 4:
        b.append(i**2)

In [None]:
# Put your solution here:


In [None]:
# Add three to all list members.
a = [3, 4, 5]
for i in range(len(a)):
    a[i] += 3

In [None]:
# Put your solution here:


## Challenge Exercise: write a list comprehension from a verbal description

Write a list comprehension that does the following: for each elder (age greater than 50) in the `ages` list, compute their *birth year*, assuming that the ages were recorded in 2022.


In [None]:
ages = [20, 43, 12, 88, 97]
# Put your solution below:


## Dictionary comprehensions

The comprehension syntax isn't limited to lists! You can also use a very similar syntax to build *dictionaries*. The only differences are that you use curly brackets instead of square brackets and the key: value syntax in your computation. For example, consider the following code that uses a loop to filter a dictionary of ages:

In [None]:
ages = {'Kathy': 20, 'Karthik': 43, 'Fernando': 12, 'Lin': 88, 'Eva': 97}
elders = {}
for name in ages.keys():
    if ages[name] > 50:
        elders[name] = ages[name]
print(elders)

In comprehension syntax, it would look like this:

In [None]:
elders = {name: ages[name] for name in ages.keys() if ages[name] > 50}

print(elders)
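
The same comprehension can also be written with the `.items` method covered earlier, which hands you each name and age together and avoids the repeated `ages[name]` lookups. This is an equivalent sketch, not a required style:

```python
ages = {'Kathy': 20, 'Karthik': 43, 'Fernando': 12, 'Lin': 88, 'Eva': 97}

# Unpack each (name, age) pair directly from .items().
elders = {name: age for name, age in ages.items() if age > 50}

print(elders)
# {'Lin': 88, 'Eva': 97}
```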

## Exercise 2: computations using dictionary comprehensions

Earlier in this workshop, we used a `for` loop to iterate over a dictionary of prices and compute the total price with sales tax applied for each item (the code is copied again below for reference). Rewrite this code using a dictionary comprehension.

In [None]:
# code copied from earlier in the workshop for reference
d = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

for key in d.keys():
    d[key] = round(1.05 * d[key], 2)

print(d)

In [None]:
# Put your solution here:
