<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Python Review with Movie Data

_Author: Kiefer Katovich and Dave Yerrington | DSI-SF_

---

In this lab you'll be using the provided imdb `movies` list below as your dataset. 

This lab is designed to practice iteration and functions in particular. The standard questions are gentler, and the challenge questions are suited to students with advanced Python skills or prior programming experience.

Every question requires writing a function that uses iteration. Print a test of each function you write.


### 1. Load the provided list of movie dictionaries.

In [1]:
# List of movie dictionaries:

movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
    {"name": "The Choice", "imdb": 6.2, "category": "Romance"},
    {"name": "Colonia", "imdb": 7.4, "category": "Romance"},
    {"name": "Love", "imdb": 6.0, "category": "Romance"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
    {"name": "Ringing Crime", "imdb": 4.0, "category": "Crime"},
    {"name": "Joking muck", "imdb": 7.2, "category": "Comedy"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
    {"name": "We Two", "imdb": 7.2, "category": "Romance"}
]
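A quick sanity check before the exercises: loop over the list once and confirm that every dictionary carries the same three keys. This is a minimal sketch; `sample_movies` is a hypothetical three-entry excerpt of `movies` so the cell runs on its own, and `EXPECTED_KEYS` is an assumed name. In the notebook you would iterate over `movies` itself.

```python
# Hypothetical excerpt of the `movies` list above, so this cell runs on its own
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
]

# Assumed schema: the three keys used throughout this lab
EXPECTED_KEYS = {"name", "imdb", "category"}

# One pass over the list, collecting any entry whose keys differ from the schema
bad_entries = [m for m in sample_movies if set(m) != EXPECTED_KEYS]

print("Movies loaded:", len(sample_movies))
print("Entries with unexpected keys:", len(bad_entries))
for movie in sample_movies:
    # Left-align the title, right-align the score, then show the category
    print("{:<20} {:>4} {}".format(movie["name"], movie["imdb"], movie["category"]))
```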

---

### 2. Filtering data by IMDB score

#### 2.1 

Write a function that:

1. Accepts a single movie dictionary from the `movies` list as an argument.
2. Returns True if the IMDB score is above 5.5.



In [2]:
# Your code here.

def has_good_score(movie):
    # True when the movie's IMDB score is strictly above 5.5
    return movie["imdb"] > 5.5

In [3]:
assert has_good_score(movies[0]) == True
assert has_good_score(movies[7]) == False
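Because `has_good_score` takes a single movie and returns a boolean, it can serve as the condition when iterating over the whole list. A minimal sketch, assuming a hypothetical `sample_movies` excerpt; the predicate is repeated so the cell is self-contained:

```python
def has_good_score(movie):
    # Same predicate as above: True when the IMDB score is above 5.5
    return movie["imdb"] > 5.5

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
]

# The predicate plugs into a list comprehension or into the built-in filter()
good_titles = [m["name"] for m in sample_movies if has_good_score(m)]
good_via_filter = [m["name"] for m in filter(has_good_score, sample_movies)]

print(good_titles)  # ['Usual Suspects']
```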

#### 2.2 [Challenge] 

Write a function that:

1. Accepts the movies list and a specified category.
2. Returns True if the average score of the category is higher than the average score of all movies.

In [30]:
def check_category(movies, category):
    # Calculate the category average
    movies_category_ratings = [m["imdb"] for m in movies if m["category"] == category]
    if not movies_category_ratings:
        # Guard against dividing by zero when no movie has this category
        print('Category {} not found.'.format(category))
        return False
    average_score_category = sum(movies_category_ratings) / len(movies_category_ratings)

    # Calculate the overall average
    movies_all_ratings = [m["imdb"] for m in movies]
    average_score_all = sum(movies_all_ratings) / len(movies_all_ratings)

    print('Category avg: {}. Overall avg: {}.'.format(average_score_category, average_score_all))
    return average_score_category > average_score_all

In [31]:
check_category(movies, "Romance")

Category avg: 6.44. Overall avg: 6.486666666666667.


False

In [32]:
check_category(movies, "Suspense")

Category avg: 8.1. Overall avg: 6.486666666666667.


True
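`check_category` answers one category per call and recomputes the overall average each time. If you want the answer for every category, one option (a sketch, not the lab's required solution) is to compute the overall mean once with the standard-library `statistics.mean` and then loop over the distinct categories. `categories_above_average` and `sample_movies` are hypothetical names:

```python
from statistics import mean

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "The Choice", "imdb": 6.2, "category": "Romance"},
    {"name": "Love", "imdb": 6.0, "category": "Romance"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
]

def categories_above_average(movies):
    """Return {category: bool} saying whether each category beats the overall mean."""
    overall = mean(m["imdb"] for m in movies)  # computed once, not per category
    result = {}
    for category in {m["category"] for m in movies}:
        category_avg = mean(m["imdb"] for m in movies if m["category"] == category)
        result[category] = category_avg > overall
    return result

print(categories_above_average(sample_movies))
```

Because the categories come from a set, the key order of the printed dictionary can vary between runs, so compare the result as a dictionary rather than as text.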

---

### 3. Creating subsets by numeric condition

#### 3.1

Write a function that:

1. Accepts the list of movies and a specified IMDB score.
2. Returns the sublist of movies that have a score greater than the specified score.



In [7]:
def filter_on_score(movies, score):
    return [m for m in movies if m["imdb"]>score]

In [8]:
filter_on_score(movies, score=7.2)

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'},
 {'name': 'The Help', 'imdb': 8.0, 'category': 'Drama'},
 {'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'},
 {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'}]
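`filter_on_score` returns movies in their original list order. If you also want them ranked, one possible variant (hypothetical name `top_titles_above`, run on a hypothetical excerpt) sorts the filtered movies using `operator.itemgetter` as the key:

```python
from operator import itemgetter

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "Colonia", "imdb": 7.4, "category": "Romance"},
    {"name": "Joking muck", "imdb": 7.2, "category": "Comedy"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
]

def top_titles_above(movies, score):
    """Titles scoring strictly above `score`, highest score first (hypothetical helper)."""
    above = [m for m in movies if m["imdb"] > score]
    # itemgetter("imdb") extracts each dict's score to use as the sort key
    above.sort(key=itemgetter("imdb"), reverse=True)
    return [m["name"] for m in above]

print(top_titles_above(sample_movies, 7.2))  # ['Dark Knight', 'Colonia']
```

The comparison stays strict (`>`), so a movie scoring exactly 7.2 is excluded, matching `filter_on_score` above.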

#### 3.2 [Expert] 

Write a function that:

1. Accepts the movies list as an argument.
2. Returns the movies list sorted by category, with categories ordered by their average score, and then, within each category, by each movie's individual score.

In [68]:
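# Your code here.
import numpy as np


def sorted_movies(movies):
    # Map each category to the mean imdb score of its movies, computed
    # from the list passed in rather than from a global variable
    categories = set(m['category'] for m in movies)
    categories_averages = {c: np.mean([m['imdb'] for m in movies if m['category'] == c])
                           for c in categories}
    # Highest category average first; the category name keeps tied
    # categories grouped; within a category, highest score first
    return sorted(movies, key=lambda movie: (-categories_averages[movie['category']],
                                             movie['category'],
                                             -movie['imdb']))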

In [69]:
sorted_movies(movies)

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'},
 {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'},
 {'name': 'Detective', 'imdb': 7.0, 'category': 'Suspense'},
 {'name': 'The Help', 'imdb': 8.0, 'category': 'Drama'},
 {'name': 'Joking muck', 'imdb': 7.2, 'category': 'Comedy'},
 {'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'},
 {'name': 'We Two', 'imdb': 7.2, 'category': 'Romance'},
 {'name': 'The Choice', 'imdb': 6.2, 'category': 'Romance'},
 {'name': 'Love', 'imdb': 6.0, 'category': 'Romance'},
 {'name': 'Bride Wars', 'imdb': 5.4, 'category': 'Romance'},
 {'name': 'Hitman', 'imdb': 6.3, 'category': 'Action'},
 {'name': 'Usual Suspects', 'imdb': 7.0, 'category': 'Thriller'},
 {'name': 'Exam', 'imdb': 4.2, 'category': 'Thriller'},
 {'name': 'Ringing Crime', 'imdb': 4.0, 'category': 'Crime'},
 {'name': 'AlphaJet', 'imdb': 3.2, 'category': 'War'}]
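The solution above uses NumPy for the per-category means and scans the list once per category. A sketch of a standard-library alternative, assuming the same descending ordering, accumulates totals and counts in a single pass with `collections.defaultdict`; `sorted_movies_no_numpy` and `sample_movies` are hypothetical:

```python
from collections import defaultdict

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
]

def sorted_movies_no_numpy(movies):
    # Single pass: accumulate the total score and movie count per category
    totals = defaultdict(float)
    counts = defaultdict(int)
    for m in movies:
        totals[m["category"]] += m["imdb"]
        counts[m["category"]] += 1
    averages = {c: totals[c] / counts[c] for c in totals}
    # Highest category average first, category name breaks ties, then highest score
    return sorted(movies, key=lambda m: (-averages[m["category"]], m["category"], -m["imdb"]))

for m in sorted_movies_no_numpy(sample_movies):
    print(m["name"], m["imdb"], m["category"])
```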

---

### 4. Creating subsets by string condition

#### 4.1

Write a function that:

1. Accepts the movies list and a category name.
2. Returns the movie names within that category (case-insensitive!)
3. If the category is not in the data, print a message that it does not exist and return None.

Recall that to convert a string to lowercase, you can use:

```python
mystring = 'Dumb and Dumber'
lowercase_mystring = mystring.lower()
print(lowercase_mystring)
# dumb and dumber
```
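`str.lower()` is sufficient here because the movie data is plain ASCII. For arbitrary Unicode text, Python 3 also offers `str.casefold()`, which is intended for caseless comparison. A small illustration (the example word is arbitrary):

```python
word = "Straße"

# str.lower() leaves the German sharp s unchanged
print(word.lower())     # straße
# str.casefold() folds more aggressively, for caseless matching
print(word.casefold())  # strasse

# A case-insensitive comparison is therefore safer with casefold()
print("STRASSE".casefold() == word.casefold())  # True
```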



In [12]:
def select_by_category(movies, category):
    # Lowercase both sides so the comparison is case-insensitive
    query = category.lower()
    names = [m['name'] for m in movies if m['category'].lower() == query]
    if not names:
        print("Category '{}' does not exist.".format(category))
        return None
    return names

In [13]:
select_by_category(movies, category='RoManCe')

['The Choice', 'Colonia', 'Love', 'Bride Wars', 'We Two']

In [14]:
select_by_category(movies, category='')

Category '' does not exist.
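If you expect to look up many categories, you could build a lowercase index once and reuse it instead of scanning the whole list on every call. A minimal sketch, with hypothetical names `titles_by_category` and `lookup_category` and a hypothetical excerpt of the data:

```python
from collections import defaultdict

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "The Choice", "imdb": 6.2, "category": "Romance"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
    {"name": "We Two", "imdb": 7.2, "category": "Romance"},
]

# Build the index once: lowercased category -> list of movie names
titles_by_category = defaultdict(list)
for m in sample_movies:
    titles_by_category[m["category"].lower()].append(m["name"])

def lookup_category(category):
    """Case-insensitive lookup; prints a message and returns None when missing."""
    # .get() does not insert a key, unlike indexing a defaultdict
    names = titles_by_category.get(category.lower())
    if names is None:
        print("Category '{}' does not exist.".format(category))
    return names

print(lookup_category("RoManCe"))  # ['The Choice', 'We Two']
```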


#### 4.2 [Challenge]

Write a function that:

1. Accepts the movies list and a "search string".
2. Returns a dictionary with keys `'category'` and `'title'` whose values are lists of categories that contain the search string and titles that contain the search string, respectively (case-insensitive!)

In [15]:
# Your code here.

def select_by_string(movies, string):
    string = string.lower()
    # Several movies can share a category, so duplicates must be dropped;
    # dict.fromkeys keeps the first-seen order, unlike set()
    categories = [movie['category'] for movie in movies if string in movie['category'].lower()]
    titles = [movie['name'] for movie in movies if string in movie['name'].lower()]
    return {'category': list(dict.fromkeys(categories)),
            'title': titles}

In [16]:
select_by_string(movies, string='ar')

{'category': ['War'], 'title': ['Dark Knight', 'Bride Wars']}
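Another way to remove duplicate categories is a set comprehension followed by `sorted()`, which also makes the output order predictable. A sketch with a hypothetical function name and sample data; note that it returns categories alphabetically rather than in first-seen order:

```python
# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
    {"name": "AlphaJet", "imdb": 3.2, "category": "War"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
]

def select_by_string_sorted(movies, search):
    search = search.lower()
    # The set comprehension drops duplicate categories; sorted() fixes the order
    categories = sorted({m["category"] for m in movies if search in m["category"].lower()})
    titles = [m["name"] for m in movies if search in m["name"].lower()]
    return {"category": categories, "title": titles}

print(select_by_string_sorted(sample_movies, "A"))
```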

---

### 5. Multiple conditions

#### 5.1

Write a function that:

1. Accepts the movies list and a "search criteria" variable.
2. If the criteria variable is numeric, return a list of movie titles with a score greater than or equal to the criteria.
3. If the criteria variable is a string, return a list of movie titles that match that category (case-insensitive!). If there is no match, return an empty list and print an informative message.



In [62]:
def search(movies, criteria):
    # Numeric criteria (bool excluded): titles with a score >= criteria
    if isinstance(criteria, (int, float)) and not isinstance(criteria, bool):
        return [movie['name'] for movie in movies if movie['imdb'] >= criteria]
    # String criteria: titles whose category matches, ignoring case
    if isinstance(criteria, str):
        titles = [movie['name'] for movie in movies
                  if criteria.lower() == movie['category'].lower()]
        if not titles:
            print("No movies found in category '{}'.".format(criteria))
        return titles
    # Anything else is invalid input
    raise TypeError("criteria must be a number or a string, got {}".format(type(criteria).__name__))


In [63]:
search(movies, 5)

['Usual Suspects',
 'Hitman',
 'Dark Knight',
 'The Help',
 'The Choice',
 'Colonia',
 'Love',
 'Bride Wars',
 'Joking muck',
 'What is the name',
 'Detective',
 'We Two']

In [65]:
search(movies, 8.0)

['Dark Knight', 'The Help', 'What is the name']

In [66]:
search(movies, "Suspense")

['What is the name', 'Detective']

In [67]:
search(movies, "a category that doesn't exist")

No movies found in category 'a category that doesn't exist'.


[]
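The `isinstance` checks in `search` can also be expressed with `functools.singledispatch`, which picks an implementation from the type of the first argument. This is an alternative design, not the lab's expected answer; `_search` and `search_dispatch` are hypothetical names, and `criteria` is passed first because dispatch only looks at the first argument:

```python
from functools import singledispatch

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
]

@singledispatch
def _search(criteria, movies):
    # Fallback for any type without a registered implementation
    raise TypeError("criteria must be a number or a string, got {!r}".format(type(criteria).__name__))

@_search.register(float)
@_search.register(int)
def _(criteria, movies):
    # Numeric criteria: titles scoring greater than or equal to it
    return [m["name"] for m in movies if m["imdb"] >= criteria]

@_search.register(str)
def _(criteria, movies):
    # String criteria: titles whose category matches, ignoring case
    titles = [m["name"] for m in movies if m["category"].lower() == criteria.lower()]
    if not titles:
        print("No movies found in category '{}'.".format(criteria))
    return titles

def search_dispatch(movies, criteria):
    # singledispatch selects on the first argument, so pass criteria first
    return _search(criteria, movies)

print(search_dispatch(sample_movies, 7))           # ['Dark Knight', 'Detective']
print(search_dispatch(sample_movies, "suspense"))  # ['Detective']
```

Registering `int` also covers `bool`, since `bool` is a subclass of `int`; add a separate `bool` registration if you want to reject it.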

#### 5.2 [Expert]

Write a function that:

1. Accepts the movies list and a string search criteria variable.
2. The search criteria variable can contain within it:
  - Boolean operations: `'AND'`, `'OR'`, and `'NOT'` (these may also appear in lowercase; they are capitalized here for clarity).
  - Search criteria specified with the syntax `score=...`, `category=...`, and/or `title=...`, where `...` indicates what to look for.
    - For `score`, match movies whose score is greater than or equal to the value.
    - For `category` and `title`, the category or title must _contain_ the search string (case-insensitive).
3. Return the matches for the search criteria specified.

In [22]:
"NOT category=aa"

"score=10 AND category=aa"
"score=10 OR category=aa"
"score=10 AND NOT category=aa"


"score=10 AND ( category=aa OR category=bb )"

'score=10 AND ( category=aa OR category=bb )'

In [23]:
# Your code here.
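One possible approach to the expert question is a small recursive-descent parser that compiles the query into a predicate function. This is a sketch under several assumptions the task does not state: `NOT` binds tighter than `AND`, which binds tighter than `OR`; values contain no spaces or parentheses (so `title=What is the name` would not work); and the function returns matching titles. All names (`parse_query`, `search_query`, `_criterion`) and the sample data are hypothetical.

```python
import re

def _criterion(token):
    # Turn one "field=value" token into a predicate over a movie dict
    field, sep, value = token.partition("=")
    field = field.lower()
    if not sep or field not in ("score", "category", "title"):
        raise ValueError("Unknown search term: {!r}".format(token))
    if field == "score":
        threshold = float(value)
        return lambda m: m["imdb"] >= threshold
    key = "name" if field == "title" else "category"
    needle = value.lower()
    return lambda m: needle in m[key].lower()

def parse_query(query):
    """Compile a query into a predicate. Precedence: (), NOT, AND, OR.

    Assumption: values contain no spaces or parentheses.
    """
    # Parentheses become their own tokens; everything else splits on whitespace
    tokens = re.findall(r"[()]|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        pred = parse_and()
        while peek() == "OR":
            take()
            pred = lambda m, a=pred, b=parse_and(): a(m) or b(m)
        return pred

    def parse_and():
        pred = parse_factor()
        while peek() == "AND":
            take()
            pred = lambda m, a=pred, b=parse_factor(): a(m) and b(m)
        return pred

    def parse_factor():
        tok = peek()
        if tok is None:
            raise ValueError("Query ended unexpectedly")
        if tok == "NOT":
            take()
            inner = parse_factor()
            return lambda m: not inner(m)
        if tok == "(":
            take()
            inner = parse_or()
            if peek() != ")":
                raise ValueError("Missing closing parenthesis")
            take()
            return inner
        return _criterion(take())

    predicate = parse_or()
    if pos != len(tokens):
        raise ValueError("Unexpected token: {!r}".format(tokens[pos]))
    return predicate

def search_query(movies, query):
    # Return the titles of every movie that satisfies the query
    predicate = parse_query(query)
    return [m["name"] for m in movies if predicate(m)]

# Hypothetical excerpt of the `movies` list
sample_movies = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
    {"name": "Colonia", "imdb": 7.4, "category": "Romance"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
]

print(search_query(sample_movies, "score=7 AND category=sus"))
print(search_query(sample_movies, "category=thriller OR ( category=suspense AND NOT title=name )"))
```

Each precedence level has its own function, so supporting another operator means adding one level rather than rewriting the parser. The default arguments in the `AND`/`OR` lambdas bind the current sub-predicates immediately, avoiding Python's late-binding closure pitfall inside the loops.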
