<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Python Review With Movie Data (Oliver's solutions)


### 1) Load the provided list of `movies` dictionaries.

So we start with a list of dictionaries. Remember:

 - lists are ordered, so `movies[0]` always returns the first item and `movies[-1]` returns the last. Because a list has an order, we can also sort it.
 - dictionaries are accessed by key rather than by position, e.g. `movies[0]['name']`. (Since Python 3.7 they remember insertion order, but you still can't index them by position.)
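
Here's a quick sketch of those two access patterns. The two-item `sample` list is just a stand-in for the first two entries of `movies`, and the variable names are only for illustration.

```python
# A tiny stand-in for the `movies` list (first two entries only)
sample = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
]

# Lists are indexed by position
first_movie = sample[0]
last_movie = sample[-1]

# Dictionaries are indexed by key
first_name = first_movie["name"]

# Because a list has an order, it can be sorted (here by IMDb score, lowest first)
by_score = sorted(sample, key=lambda movie: movie["imdb"])

print(first_name)           # Usual Suspects
print(by_score[0]["name"])  # Hitman
```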

In [1]:
# List of movies dictionaries:

movies = [
{
"name": "Usual Suspects", 
"imdb": 7.0,
"category": "Thriller"
},
{
"name": "Hitman",
"imdb": 6.3,
"category": "Action"
},
{
"name": "Dark Knight",
"imdb": 9.0,
"category": "Adventure"
},
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
},
{
"name": "The Choice",
"imdb": 6.2,
"category": "Romance"
},
{
"name": "Colonia",
"imdb": 7.4,
"category": "Romance"
},
{
"name": "Love",
"imdb": 6.0,
"category": "Romance"
},
{
"name": "Bride Wars",
"imdb": 5.4,
"category": "Romance"
},
{
"name": "AlphaJet",
"imdb": 3.2,
"category": "War"
},
{
"name": "Ringing Crime",
"imdb": 4.0,
"category": "Crime"
},
{
"name": "Joking muck",
"imdb": 7.2,
"category": "Comedy"
},
{
"name": "What is the name",
"imdb": 9.2,
"category": "Suspense"
},
{
"name": "Detective",
"imdb": 7.0,
"category": "Suspense"
},
{
"name": "Exam",
"imdb": 4.2,
"category": "Thriller"
},
{
"name": "We Two",
"imdb": 7.2,
"category": "Romance"
}
]

---

### 2) Filtering data by IMDb score.

#### 2.1)

Write a function that:

1) Accepts a single movie dictionary from the `movies` list as an argument.
2) Returns `True` if the IMDb score is greater than 5.5.

In [2]:
# Function takes a single argument
# We expect that the movie has a key called `imdb`
# Just check it's greater than 5.5
def movie_is_decent(movie):
    return movie['imdb'] > 5.5

Let's try it out!

I'll iterate over all the movies and check whether each one is any good. Then I'll print a string depending on the answer.

In [3]:
for movie in movies:
    if movie_is_decent(movie):
        print(movie['name'], 'is decent')
    else:
        print(movie['name'], 'is total crap')

Usual Suspects is decent
Hitman is decent
Dark Knight is decent
The Help is decent
The Choice is decent
Colonia is decent
Love is decent
Bride Wars is total crap
AlphaJet is total crap
Ringing Crime is total crap
Joking muck is decent
What is the name is decent
Detective is decent
Exam is total crap
We Two is decent


#### 2.2 [Challenge])

Write a function that:

1) Accepts the `movies` list and a specified category.
2) Returns `True` if the average score of the category is higher than the average score of all movies.

**Oliver notes**

At a high level, we are performing two operations:
1. A filter! We filter down the list to a subset of movies that matches some criteria (in this case checking the category)
2. A reduction! This means we take a list of movies and reduce it to a single number (in this case it's the average score)

Remember there are always 100 ways to solve a problem, and that's true in Python too. In general you should choose the solution that is **the easiest to read!!!** That is the only criterion you should care about.

I'm going to show you two approaches; you can choose whichever you prefer.

In [4]:
# A list comprehension (my fave)
# https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
def category_score_1(movies, category_name):
    # Create a new list by filtering the movies
    movies_in_category = [
        movie for movie in movies if movie['category'] == category_name
    ]
    
    # Create a new list just containing the scores
    scores = [movie['imdb'] for movie in movies_in_category]
    
    # Compute the average
    average_score = sum(scores) / len(scores)
    return average_score

In [5]:
category_score_1(movies, 'Romance')

6.44

In [6]:
# Append in a for loop
def category_score_2(movies, category_name):
    # Create a new list by filtering the movies
    movies_in_category = []
    for movie in movies:
        if movie['category'] == category_name:
            movies_in_category.append(movie)
    # Create a new list just containing the scores
    scores = []
    for movie in movies_in_category:
        scores.append(movie['imdb'])
        
    # Compute the average
    average_score = sum(scores) / len(scores)
    return average_score

In [7]:
category_score_2(movies, 'Romance')

6.44

**Oliver note**

Both of these approaches can be reduced even further (filter the movies and extract the IMDb score in the same line). It's up to you whether you prefer that!
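
The functions above return the category average, while the exercise asks for `True` or `False`. Here's one possible sketch that filters and extracts the score in a single comprehension and then compares against the overall average. The names `category_average` and `category_beats_overall` are made up for this example, `sample` is a small subset of `movies`, and returning `False` for an unknown category is an assumption (the exercise doesn't say what should happen).

```python
# A small subset of the `movies` list, so the sketch runs on its own
sample = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "The Choice", "imdb": 6.2, "category": "Romance"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
]


def category_average(movies, category_name):
    # Filter and extract the score in a single comprehension
    scores = [m["imdb"] for m in movies if m["category"] == category_name]
    # Assumption: an unknown category has no average, so return None
    if not scores:
        return None
    return sum(scores) / len(scores)


def category_beats_overall(movies, category_name):
    category_avg = category_average(movies, category_name)
    # Assumption: a category with no movies cannot beat the overall average
    if category_avg is None:
        return False
    overall_avg = sum(m["imdb"] for m in movies) / len(movies)
    return category_avg > overall_avg


print(category_beats_overall(sample, "Adventure"))  # True
print(category_beats_overall(sample, "Romance"))    # False
```

Without the empty-list check, an unknown category would raise a `ZeroDivisionError` when dividing by `len(scores)`.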

---

### 3) Creating subsets by numeric condition.

#### 3.1)

Write a function that:

1) Accepts the list of movies and a specified IMDb score.
2) Returns the sublist of movies that have scores greater than the one specified.

**Oliver note**

Have you spotted the pattern this time? It's a filter! So just create a new list with a subset of the movies based on the condition.

Here are two more options, using a list comprehension and `append` respectively.

In [8]:
def filter_on_score(movies, score):
    return [movie for movie in movies if movie['imdb'] > score]

# OR 

def filter_on_score(movies, score):
    movies_filtered = []
    for movie in movies:
        if movie['imdb'] > score:
            movies_filtered.append(movie)
    return movies_filtered

In [9]:
filter_on_score(movies, 7)

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'},
 {'name': 'The Help', 'imdb': 8.0, 'category': 'Drama'},
 {'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'},
 {'name': 'Joking muck', 'imdb': 7.2, 'category': 'Comedy'},
 {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'},
 {'name': 'We Two', 'imdb': 7.2, 'category': 'Romance'}]

#### 3.2 [Expert])

Write a function that:

1) Accepts the `movies` list as an argument.
2) Returns the `movies` list sorted first by category and then by movie according to category average score and individual IMDb score, respectively.


**Oliver note**

Ok this one is hard! Because the problem is more complex, there are **even more possible solutions** - so yours might look quite different. That's ok.

So in this case, I spot two different operations:
1. Enrichment! We want to add some new data to each movie. In this case it's the average of the category
2. Sort! We then want to sort the result.

I think the trick here is figuring out that first pattern. Let's try it out

In [12]:
def ranked_movies_1(movies):
    # Add a new entry to each movie that is the category score we computed earlier
    for movie in movies:
        movie['category_score'] = category_score_1(movies, movie['category'])
    
    # Sort the movies with the built-in `sorted`. The key is a tuple, so movies are
    # ordered by category score first and by IMDb score within each category
    return sorted(movies, key=lambda movie: (movie['category_score'], movie['imdb']))

Let's look at the results! Makes total sense, Dark Knight is a sick film. I need to see `AlphaJet`!

In [13]:
ranked_movies_1(movies)

[{'name': 'AlphaJet', 'imdb': 3.2, 'category': 'War', 'category_score': 3.2},
 {'name': 'Ringing Crime',
  'imdb': 4.0,
  'category': 'Crime',
  'category_score': 4.0},
 {'name': 'Exam', 'imdb': 4.2, 'category': 'Thriller', 'category_score': 5.6},
 {'name': 'Usual Suspects',
  'imdb': 7.0,
  'category': 'Thriller',
  'category_score': 5.6},
 {'name': 'Hitman', 'imdb': 6.3, 'category': 'Action', 'category_score': 6.3},
 {'name': 'Bride Wars',
  'imdb': 5.4,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'Love', 'imdb': 6.0, 'category': 'Romance', 'category_score': 6.44},
 {'name': 'The Choice',
  'imdb': 6.2,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'We Two',
  'imdb': 7.2,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'Colonia',
  'imdb': 7.4,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'Joking muck',
  'imdb': 7.2,
  'category': 'Comedy',
  'category_score': 7.2},
 {'name': 'The Help', 'imdb': 8.0, 'category': 'Dram

**Oliver note**

You might be thinking that the solution above is inefficient because we recompute the category score for every film. e.g. we compute the `Thriller` score for every film that is tagged `Thriller`.

Two things:
1. Premature optimisation is the biggest danger in programming. Until slow code actually creates problems for you, don't focus on making it faster. Make it more readable!
2. That said, we can optimise this one: build a lookup table of category scores first, then enrich the records from it.

In [14]:
def ranked_movies_2(movies):
    # Get the unique categories (this is how we reduce the computation)
    # A set comprehension collects every category and keeps only the unique values
    categories = {movie['category'] for movie in movies}
    
    # WHOA!!! We can do dict-comprehensions too!
    # This is our lookup table. So the key is the category name
    # and the value is the category score.
    # Note we're only computing the score once per category - we got our optimisation
    category_scores = {
        category_name: category_score_1(movies, category_name)
        for category_name in categories
    }
    
    # Now we enrich the records
    for movie in movies:
        movie['category_score'] = category_scores[movie['category']]
    
    return sorted(movies, key=lambda movie: (movie['category_score'], movie['imdb']))

In [15]:
ranked_movies_2(movies)

[{'name': 'AlphaJet', 'imdb': 3.2, 'category': 'War', 'category_score': 3.2},
 {'name': 'Ringing Crime',
  'imdb': 4.0,
  'category': 'Crime',
  'category_score': 4.0},
 {'name': 'Exam', 'imdb': 4.2, 'category': 'Thriller', 'category_score': 5.6},
 {'name': 'Usual Suspects',
  'imdb': 7.0,
  'category': 'Thriller',
  'category_score': 5.6},
 {'name': 'Hitman', 'imdb': 6.3, 'category': 'Action', 'category_score': 6.3},
 {'name': 'Bride Wars',
  'imdb': 5.4,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'Love', 'imdb': 6.0, 'category': 'Romance', 'category_score': 6.44},
 {'name': 'The Choice',
  'imdb': 6.2,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'We Two',
  'imdb': 7.2,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'Colonia',
  'imdb': 7.4,
  'category': 'Romance',
  'category_score': 6.44},
 {'name': 'Joking muck',
  'imdb': 7.2,
  'category': 'Comedy',
  'category_score': 7.2},
 {'name': 'The Help', 'imdb': 8.0, 'category': 'Dram
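
Both versions above write `category_score` into the original dictionaries and sort lowest-first. If that side effect is unwanted, or you'd rather see the best films first, here's one possible variation. The name `ranked_movies_copy` and the `best_first` parameter are made up for this sketch, and the short `sample` list stands in for `movies`.

```python
from collections import defaultdict

# A small subset of the `movies` list, so the sketch runs on its own
sample = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
]


def ranked_movies_copy(movies, best_first=False):
    # Group the scores by category in a single pass over the list
    scores_by_category = defaultdict(list)
    for movie in movies:
        scores_by_category[movie["category"]].append(movie["imdb"])

    # Lookup table: category name -> average score
    category_scores = {
        category: sum(scores) / len(scores)
        for category, scores in scores_by_category.items()
    }

    # Build new dictionaries so the input list is left untouched
    enriched = [
        {**movie, "category_score": category_scores[movie["category"]]}
        for movie in movies
    ]
    return sorted(
        enriched,
        key=lambda movie: (movie["category_score"], movie["imdb"]),
        reverse=best_first,
    )


for movie in ranked_movies_copy(sample, best_first=True):
    print(movie["name"], movie["category_score"])
```

Whether the extra copying is worth it depends on whether anything else relies on the original `movies` list staying unchanged.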

**BONUS** was it worth it?

Well, it's 10 microseconds faster on average!

But it also took us an extra 10 mins to write the new function.

Ten minutes is 600 seconds, so at 10 microseconds saved per call we would have to run this function 60,000,000 times to make it worth our effort.

In [40]:
%%timeit
ranked_movies_1(movies)

31.3 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [41]:
%%timeit
ranked_movies_2(movies)

21.6 µs ± 611 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
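
The break-even point can be checked with a few lines of arithmetic. This sketch assumes the two timings reported above and the rough 10 minutes of extra writing time; the variable names are only for illustration.

```python
# Timings reported by %%timeit above, in seconds per call
time_v1 = 31.3e-6
time_v2 = 21.6e-6

# Rough extra development time: 10 minutes, in seconds
extra_dev_time = 10 * 60

# Time saved on every call to the faster version
saving_per_call = time_v1 - time_v2

# Number of calls needed before the saving repays the development time
break_even_calls = extra_dev_time / saving_per_call
print(f"{break_even_calls:,.0f} calls")
```

With the rounded figures (10 µs saved, 600 seconds spent) this works out to 60 million calls; the measured timings give roughly 62 million.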


---

### 5) Multiple conditions.

#### 5.1)

Write a function that:

1) Accepts the `movies` list and a "search criteria" variable.
2) If the criteria variable is numeric, return a list of movie titles with a score greater than or equal to the criteria.
3) If the criteria variable is a string, return a list of movie titles that match that category (case-insensitive!). If there is no match, return an empty list and print an informative message.
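
Here's one possible sketch for 5.1, not the only way to do it. The function name `search_by_criteria` is made up, `sample` is a small subset of `movies`, and raising a `TypeError` for anything that is neither a number nor a string is an assumption the exercise doesn't spell out.

```python
# A small subset of the `movies` list, so the sketch runs on its own
sample = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
    {"name": "Love", "imdb": 6.0, "category": "Romance"},
    {"name": "We Two", "imdb": 7.2, "category": "Romance"},
]


def search_by_criteria(movies, criteria):
    # bool is a subclass of int, so rule it out before the numeric check
    if isinstance(criteria, bool):
        raise TypeError("criteria must be a number or a string")

    # Numeric criteria: keep titles scoring at least that much
    if isinstance(criteria, (int, float)):
        return [m["name"] for m in movies if m["imdb"] >= criteria]

    # String criteria: case-insensitive match on the category
    if isinstance(criteria, str):
        wanted = criteria.lower()
        titles = [m["name"] for m in movies if m["category"].lower() == wanted]
        if not titles:
            print(f"No movies found in category '{criteria}'")
        return titles

    raise TypeError("criteria must be a number or a string")


print(search_by_criteria(sample, 8))          # ['Dark Knight', 'The Help']
print(search_by_criteria(sample, "ROMANCE"))  # ['Love', 'We Two']
```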

#### 5.2 [Expert])

Write a function that:

1) Accepts the `movies` list and a string search criteria variable.
2) The search criteria variable can contain within it:
  - Boolean operations: `'AND'`, `'OR'`, and `'NOT'` (these may also be lowercase; they are capitalized here for clarity).
  - Search criteria specified with the syntax `score=...`, `category=...`, and/or `title=...`, where the `...` indicates what to look for.
    - If `score` is present, it indicates scores greater than or equal to the value.
    - For `category` and `title`, the string indicates that the category or title must _contain_ the search string (case-insensitive).
3) Return the matches for the search criteria specified.
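
Here's a rough sketch of one way to approach 5.2. It makes several simplifying assumptions that the exercise doesn't state: no parentheses, `NOT` binds tighter than `AND`, `AND` binds tighter than `OR`, and search values never contain the words "and" or "or" surrounded by spaces. All function names are made up, and `sample` is a small subset of `movies`.

```python
import re

# A small subset of the `movies` list, so the sketch runs on its own
sample = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "Colonia", "imdb": 7.4, "category": "Romance"},
    {"name": "Love", "imdb": 6.0, "category": "Romance"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
]


def _field_matches(movie, term):
    # A term looks like `score=7`, `category=rom` or `title=dark`
    field, _, value = term.partition("=")
    field = field.strip().lower()
    value = value.strip().lower()
    if field == "score":
        return movie["imdb"] >= float(value)
    if field == "category":
        return value in movie["category"].lower()
    if field == "title":
        return value in movie["name"].lower()
    raise ValueError(f"Unknown search field: {field!r}")


def _term_matches(movie, term):
    term = term.strip()
    # A leading NOT inverts the result of whatever follows it
    negation = re.match(r"not\s+", term, flags=re.IGNORECASE)
    if negation:
        return not _term_matches(movie, term[negation.end():])
    return _field_matches(movie, term)


def query_movies(movies, query):
    # OR has the lowest precedence, so split on it first
    or_groups = re.split(r"\s+or\s+", query.strip(), flags=re.IGNORECASE)
    results = []
    for movie in movies:
        for group in or_groups:
            and_terms = re.split(r"\s+and\s+", group, flags=re.IGNORECASE)
            # A movie matches a group when every AND-ed term matches
            if all(_term_matches(movie, term) for term in and_terms):
                results.append(movie)
                break
    return results


matches = query_movies(sample, "category=rom AND score=7")
print([m["name"] for m in matches])  # ['Colonia']
```

Supporting parentheses or quoted values would need a proper tokenizer and parser, which is beyond this sketch.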

---

### 4) Creating subsets by string condition.

#### 4.1)

Write a function that:

1) Accepts the `movies` list and a category name.
2) Returns the movie names within that category (case-insensitive!).
3) If the category is not in the data, prints a message that says it does not exist and returns `None`.

Recall that, to convert a string to lowercase, you can use:

```python
mystring = 'Dumb and Dumber'
lowercase_mystring = mystring.lower()
print(lowercase_mystring)
# prints: dumb and dumber
```
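
Using that hint, here's one possible sketch for 4.1. The function name `names_in_category` is made up for this example and `sample` is a small subset of `movies`.

```python
# A small subset of the `movies` list, so the sketch runs on its own
sample = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "The Choice", "imdb": 6.2, "category": "Romance"},
    {"name": "Colonia", "imdb": 7.4, "category": "Romance"},
]


def names_in_category(movies, category_name):
    # Lowercase both sides so the comparison ignores case
    wanted = category_name.lower()
    names = [m["name"] for m in movies if m["category"].lower() == wanted]

    # No match means the category is not in the data
    if not names:
        print(f"Category '{category_name}' does not exist")
        return None
    return names


print(names_in_category(sample, "ROMANCE"))  # ['The Choice', 'Colonia']
```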

#### 4.2 [Challenge])

Write a function that:

1) Accepts the `movies` list and a "search string."
2) Returns a dictionary with the keys `'category'` and `'title'` whose values are lists of categories that contain the search string and titles that contain the search string, respectively (case-insensitive!).
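
Here's one possible sketch for 4.2. The function name `search_categories_and_titles` is made up, `sample` is a small subset of `movies`, and listing each matching category only once is an assumption (the exercise doesn't say whether duplicates should be kept).

```python
# A small subset of the `movies` list, so the sketch runs on its own
sample = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
]


def search_categories_and_titles(movies, search_string):
    needle = search_string.lower()

    # Collect each matching category once, in first-seen order
    categories = []
    for movie in movies:
        category = movie["category"]
        if needle in category.lower() and category not in categories:
            categories.append(category)

    # Titles are matched the same way, ignoring case
    titles = [m["name"] for m in movies if needle in m["name"].lower()]

    return {"category": categories, "title": titles}


print(search_categories_and_titles(sample, "SUS"))
# {'category': ['Suspense'], 'title': ['Usual Suspects']}
```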