# Wednesday, October 25

## Announcements and Reminders

- Chapter 10 reading: Fixed? Apparently not. Sorry.
- Celebration of Mind: TONIGHT!!!
- Exercises for Chapter 10: due Friday
- Read Chapter 11 (due Monday)


## Activity: Data Wrangling with Dictionaries

Today we will continue to explore the dictionary type (`dict`) in Python and see how we can use it to build a large database.

But first...

### Wrapping Up *The Raven*

Last time we saw how to use a dictionary to count the occurrences of each word in a string.  Here is what we ended with (this is slightly cleaned up from what we actually did).

In [None]:
import string

# Import text as string
with open("raven.txt", "r") as f:
  contents = f.read()

# clean up string: everything lower case; remove punctuation
contents = contents.lower()
for mark in string.punctuation:
  contents = contents.replace(mark,"")

# Convert string to a list of words
word_list = contents.split()

# Create a dictionary accumulating the counts of each word.
word_counts = {}
for word in word_list:
  if word in word_counts:
    word_counts[word] += 1
  else:
    word_counts[word] = 1

print(word_counts)

We can create the dictionary using some simpler code if we use the `.get()` method.

In [None]:
word_counts = {}

for word in word_list:
  word_counts[word] = word_counts.get(word,0)+1


print(word_counts)
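
To see why the one-line version works, here is a small sketch of what `.get()` returns when a key is present and when it is missing. The `counts` dictionary and the words in it are made up for illustration; they are not taken from `raven.txt`.

```python
# Hypothetical starting counts, just for illustration
counts = {"raven": 2}

# Key present: .get() returns the stored value
print(counts.get("raven", 0))      # 2

# Key missing: .get() returns the default instead of raising a KeyError
print(counts.get("nevermore", 0))  # 0

# So one line handles both the "seen before" and "first time" cases
for word in ["raven", "nevermore"]:
  counts[word] = counts.get(word, 0) + 1

print(counts)  # {'raven': 3, 'nevermore': 1}
```
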

If we want to find the item in the dictionary with the largest value, we have a few options:

1. Look through each key and keep track of the largest value.
2. Create a list of key-value pairs and then sort them by the second component.
3. Create a list of value-key pairs and sort them normally.

In [9]:
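# Use max_count rather than max, so we don't shadow Python's built-in max()
max_count = 0
max_word = ""
for word in word_counts:
  if max_count < word_counts[word]:
    max_count = word_counts[word]
    max_word = word

print(max_count, max_word)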

59 the
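
The other two options from the list could look something like the sketch below. The small `word_counts` dictionary here is a made-up stand-in so the example runs on its own; with the real counts from *The Raven* the same lines should apply.

```python
# Hypothetical counts standing in for the real word_counts
word_counts = {"the": 3, "raven": 2, "nevermore": 1}

# Option 2: list of (key, value) pairs, sorted by the second component
pairs = sorted(word_counts.items(), key=lambda pair: pair[1], reverse=True)
top_word, top_count = pairs[0]
print(top_count, top_word)  # 3 the

# Option 3: list of (value, key) pairs, sorted normally.
# Tuples compare by their first component first, so the largest count ends up last.
value_key = sorted((count, word) for word, count in word_counts.items())
largest_count, largest_word = value_key[-1]
print(largest_count, largest_word)  # 3 the
```

One difference to keep in mind: if two words are tied for the largest count, option 3 breaks the tie alphabetically, while option 2 keeps whichever word was added to the dictionary first. Python's built-in `max` also accepts a `key` argument, so `max(word_counts, key=word_counts.get)` is another way to get the word.
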


### Creating a Movie Database

It's movie night.  You want to find a horrible scary movie about computer science to watch with your Mom.  What do you do???

#### Getting some data

Inside this week's folder you will find a `.csv` file called `imdb.csv`.

Once that file is somewhere your notebook can read it, you should be able to load it with the code below.  We will use the `csv` library to help parse the file.

In [14]:
import csv

# Open the file (the csv docs recommend newline='' for files passed to csv.reader)
with open('imdb.csv', 'r', newline='') as f:
  # Use the csv library to read the data
  data = list(csv.reader(f))

print(data[1546])


['Shanghai Knights', '2003', 'PG-13', '07 Feb 2003', '114 min', 'Action, Adventure, Comedy', 'David Dobkin', 'Alfred Gough (characters), Miles Millar (characters), Alfred Gough, Miles Millar', 'Jackie Chan, Owen Wilson, Aaron Taylor-Johnson, Tom Fisher', "When a Chinese rebel murders Chon's estranged father and escapes to England, Chon and Roy make their way to London with revenge on their minds.", 'English, Mandarin', 'USA, Hong Kong', '4 nominations.', 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTMxMTgwOTI3Nl5BMl5BanBnXkFtZTYwMTI2NDQ3._V1_SX300.jpg', 'Internet Movie Database', '6.2/10', '58', '6.2', '86,082', 'tt0300471', 'movie', '15 Jul 2003', '$60,447,592', 'Touchstone Pictures', 'http://bventertainment.go.com/movies/shanghaiknights', 'True', 'http://www.rottentomatoes.com/m/shanghai_knights/']


In [15]:
print(data[0])

['Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', 'Actors', 'Plot', 'Language', 'Country', 'Awards', 'Poster', 'Ratings.Source', 'Ratings.Value', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID', 'Type', 'DVD', 'BoxOffice', 'Production', 'Website', 'Response', 'tomatoURL']


Note that `csv.reader(f)` by itself returns a `_csv.reader` object, not a list.  Each item in this iterable object is a list containing the elements in a given row of the csv file.  We can loop over it and print each list one at a time, or, as we did above, convert the entire data to a list of lists with `list()`.

We need to do that conversion inside the `with` context: the `csv.reader` object reads rows from the file only as they are requested, so the file needs to stay open until we have read them all.
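
Here is a small sketch of what goes wrong if the conversion happens too late. To keep it self-contained, it uses an `io.StringIO` object holding a made-up two-row CSV in place of `imdb.csv`; like a real file, a `StringIO` object is closed at the end of its `with` block.

```python
import csv
import io

# Hypothetical two-row CSV standing in for imdb.csv
sample = "Title,Year\nShanghai Knights,2003\n"

# Converting inside the with block works: the file is still open
with io.StringIO(sample) as f:
  rows = list(csv.reader(f))
print(rows)  # [['Title', 'Year'], ['Shanghai Knights', '2003']]

# Keeping only the reader and using it after the block fails,
# because the reader tries to pull rows from a closed file
with io.StringIO(sample) as f:
  reader = csv.reader(f)

try:
  rows_late = list(reader)
except ValueError as err:
  rows_late = None
  print("Could not read:", err)
```
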

Now we can manipulate the movie database using lists.  Explore this below.  Find a movie you might want to watch.


In [17]:
mymovie = {}
# Pair each column name in the header row with the matching entry in row 1546
for col, value in zip(data[0], data[1546]):
  mymovie[col] = value
print(mymovie)

{'Title': 'Shanghai Knights', 'Year': '2003', 'Rated': 'PG-13', 'Released': '07 Feb 2003', 'Runtime': '114 min', 'Genre': 'Action, Adventure, Comedy', 'Director': 'David Dobkin', 'Writer': 'Alfred Gough (characters), Miles Millar (characters), Alfred Gough, Miles Millar', 'Actors': 'Jackie Chan, Owen Wilson, Aaron Taylor-Johnson, Tom Fisher', 'Plot': "When a Chinese rebel murders Chon's estranged father and escapes to England, Chon and Roy make their way to London with revenge on their minds.", 'Language': 'English, Mandarin', 'Country': 'USA, Hong Kong', 'Awards': '4 nominations.', 'Poster': 'https://images-na.ssl-images-amazon.com/images/M/MV5BMTMxMTgwOTI3Nl5BMl5BanBnXkFtZTYwMTI2NDQ3._V1_SX300.jpg', 'Ratings.Source': 'Internet Movie Database', 'Ratings.Value': '6.2/10', 'Metascore': '58', 'imdbRating': '6.2', 'imdbVotes': '86,082', 'imdbID': 'tt0300471', 'Type': 'movie', 'DVD': '15 Jul 2003', 'BoxOffice': '$60,447,592', 'Production': 'Touchstone Pictures', 'Website': 'http://bventert

#### Creating a dictionary

We should create a dictionary to hold our data, so it is easier to work with.  Pick a single movie (a single list from the list of lists) and create a dictionary for it.

Actually, why don't we just do that for all the movies?  Make a list of dictionaries, where each dictionary represents a single movie.
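
One possible way to build that list is sketched below. So that it runs on its own, it reads a tiny made-up CSV from an `io.StringIO` object instead of `imdb.csv`; the three columns and the second movie ("Example Movie") are invented for illustration. With the real file, `data` would be the list of lists loaded earlier.

```python
import csv
import io

# Hypothetical three-column sample standing in for imdb.csv
sample = (
  "Title,Year,Genre\n"
  "Shanghai Knights,2003,\"Action, Adventure, Comedy\"\n"
  "Example Movie,1999,Horror\n"
)

with io.StringIO(sample) as f:
  data = list(csv.reader(f))

# The first row holds the column names; every other row is one movie
header = data[0]
movies = []
for row in data[1:]:
  # zip pairs each column name with the matching entry in the row
  movies.append(dict(zip(header, row)))

print(movies[0]["Title"])  # Shanghai Knights

# With dictionaries, we can filter by column name instead of by index
horror = [m["Title"] for m in movies if "Horror" in m["Genre"]]
print(horror)  # ['Example Movie']
```

The `csv` library also provides `csv.DictReader`, which yields one dictionary per row using the first row as the keys, so it may be worth comparing against the hand-built version.
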