<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Iteration practice with movies

---

In this lab you'll use the provided IMDb `movies` list below as your dataset.

This lab is designed to practice iteration. The normal questions are gentler, and the challenge questions are aimed at advanced or expert students with prior Python or general programming experience.

Every question requires writing a function that uses iteration to solve the problem. You should print out a test of each function you write.

---

### 1. Load the provided list of movie dictionaries.

In [1]:
# List of movie dictionaries:

movies = [
{
"name": "Usual Suspects", 
"imdb": 7.0,
"category": "Thriller"
},
{
"name": "Hitman",
"imdb": 6.3,
"category": "Action"
},
{
"name": "Dark Knight",
"imdb": 9.0,
"category": "Adventure"
},
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
},
{
"name": "The Choice",
"imdb": 6.2,
"category": "Romance"
},
{
"name": "Colonia",
"imdb": 7.4,
"category": "Romance"
},
{
"name": "Love",
"imdb": 6.0,
"category": "Romance"
},
{
"name": "Bride Wars",
"imdb": 5.4,
"category": "Romance"
},
{
"name": "AlphaJet",
"imdb": 3.2,
"category": "War"
},
{
"name": "Ringing Crime",
"imdb": 4.0,
"category": "Crime"
},
{
"name": "Joking muck",
"imdb": 7.2,
"category": "Comedy"
},
{
"name": "What is the name",
"imdb": 9.2,
"category": "Suspense"
},
{
"name": "Detective",
"imdb": 7.0,
"category": "Suspense"
},
{
"name": "Exam",
"imdb": 4.2,
"category": "Thriller"
},
{
"name": "We Two",
"imdb": 7.2,
"category": "Romance"
}
]

---

### 2. Filtering data by IMDB score

#### 2.1 

Write a function that:

1. Accepts a single movie dictionary from the `movies` list as an argument.
2. Returns True if the IMDb score is above 5.5, otherwise False.

In [2]:
# 2.1:

def imdb_score_over_bad(movie):
    """Takes a single movie as an argument and returns True if its imdb is above 5.5, otherwise False."""
    # The comparison already evaluates to True or False, so return it directly
    return movie['imdb'] > 5.5

print(movies[0])
print(imdb_score_over_bad(movies[0]))

{'name': 'Usual Suspects', 'imdb': 7.0, 'category': 'Thriller'}
True
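
A predicate like this becomes useful once it is applied to every movie in turn. The following is a minimal sketch, assuming a two-movie sample (`sample_movies`, taken from the list above) and a one-line version of the predicate so that the block runs on its own:

```python
# Two entries copied from the movies list, used here as a small sample
sample_movies = [
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
]

def imdb_score_over_bad(movie):
    """Return True if the movie's imdb score is above 5.5, otherwise False."""
    return movie["imdb"] > 5.5

# Iterate over the sample and keep the names of movies that pass the predicate
good_names = [m["name"] for m in sample_movies if imdb_score_over_bad(m)]
print(good_names)  # ['Hitman']
```

The list comprehension is equivalent to a `for` loop that appends to an initially empty list.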


#### 2.2 [Challenge] 

Write a function that:

1. Accepts the movies list and a specified category.
2. Returns True if the average score of the category is higher than the average score of all movies.

In [3]:
# 2.2

def movies_category_over_avg(movies, category):
    """ 1. Accepts the movies list and a specified category.
        2. Returns True if the average score of the category is higher than the average score of all movies."""
    all_scores = []       # imdb scores of every movie
    category_scores = []  # imdb scores of the movies in the requested category
    
    # Loop through all movies
    for movie in movies:
        # Collect the imdb score of every movie
        all_scores.append(movie['imdb'])
        # If the movie category is equal to the desired category, collect its score as well
        if movie['category'] == category:
            category_scores.append(movie['imdb'])
    
    # if there was no movie of the desired category, return False
    if len(category_scores) == 0:
        print('no movies in specified category:', category)
        return False
    
    # otherwise calculate both averages and return the result of the comparison
    overall_average = sum(all_scores)/len(all_scores)
    category_average = sum(category_scores)/len(category_scores)
    return category_average > overall_average

print(movies_category_over_avg(movies, 'Thriller'))
print(movies_category_over_avg(movies, 'Suspense'))

False
True
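
As a possible alternative, the averaging can be delegated to `statistics.mean` from the standard library. This is only a sketch under stated assumptions: the function name `category_beats_overall` and the four-movie `sample_movies` list are made up for illustration, and the sample average differs from the average of the full list.

```python
from statistics import mean

# Four entries copied from the movies list, used here as a small sample
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Exam", "imdb": 4.2, "category": "Thriller"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
]

def category_beats_overall(movies, category):
    """Hypothetical helper: True if the category average exceeds the overall average."""
    category_scores = [m["imdb"] for m in movies if m["category"] == category]
    if not category_scores:
        # mean() raises an error on empty input, so handle this case first
        return False
    return mean(category_scores) > mean(m["imdb"] for m in movies)

print(category_beats_overall(sample_movies, "Thriller"))  # False
print(category_beats_overall(sample_movies, "Suspense"))  # True
```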


---

### 3. Creating subsets by numeric condition

#### 3.1

Write a function that:

1. Accepts the list of movies and a specified imdb score.
2. Returns the sublist of movies that have a score greater than the specified score.

In [4]:
# 3.1

def score_greater_subset(movies, score):
    """ 1. Accepts the list of movies and a specified imdb score.
        2. Returns the sublist of movies that have a score greater than the specified score."""
    subset = [] # initialize the subset list
    
    # loop over the movies and check if the imdb score is above the given score
    for movie in movies:
        if movie['imdb'] > score:
            subset.append(movie)
            
    return subset

print(score_greater_subset(movies, 8.5))

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'}, {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'}]


#### 3.2 [Challenge] 

Write a function that:

1. Accepts the movies list as an argument.
2. Returns the movies list sorted first by the average score of each movie's category and then, within each category, by the movie's individual score (highest first in both cases).
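
Sorting by two conditions at once relies on the fact that Python compares tuples element by element, so a `key` function that returns a tuple sorts by its first element and breaks ties with the second. A small sketch on made-up toy data (the `pairs` list is purely illustrative):

```python
# Hypothetical toy data: (group label, value) pairs
pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 5)]

# The key returns a tuple: sort by label first, then by value within each label.
# reverse=True flips the order of both parts of the key.
sorted_pairs = sorted(pairs, key=lambda p: (p[0], p[1]), reverse=True)
print(sorted_pairs)  # [('b', 3), ('b', 2), ('a', 5), ('a', 1)]
```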

In [5]:
# 3.2
# See here for further examples and explanations of sorting with a lambda key:
# http://stackoverflow.com/questions/3766633/how-to-sort-with-lambda-in-python
# http://stackoverflow.com/questions/14299448/sorting-by-multiple-conditions-in-python

def category_score_sorted(movies):
    """ 1. Accepts the movies list as an argument.
        2. Returns the movies list sorted first by the average score of each movie's category and then, within each category, by the movie's individual score (highest first in both cases)."""

    category_scores = {} # Initialize dictionary for category scores
    
    # loop over all movies
    for movie in movies:
        # check if category is already contained in category score, add if not
        if movie['category'] not in category_scores:
            category_scores[movie['category']] = [movie['imdb']]
        else:
            category_scores[movie['category']].append(movie['imdb'])
    
    
    category_averages = {} # Initialize dictionary for category averages
    
    # loop over the category scores and store the average score for each category
    for cat, vals in category_scores.items():
        category_averages[cat] = sum(vals)/len(vals)
    
    # Sort movies, key specifies that they should be ordered according to their category average, and then within each category according to their imdb score
    movies_sorted = sorted(movies, key=lambda x: (category_averages[x['category']],
                                                  x['imdb']), reverse=True)
    
    return movies_sorted

category_score_sorted(movies)

[{'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'},
 {'category': 'Suspense', 'imdb': 9.2, 'name': 'What is the name'},
 {'category': 'Suspense', 'imdb': 7.0, 'name': 'Detective'},
 {'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
 {'category': 'Comedy', 'imdb': 7.2, 'name': 'Joking muck'},
 {'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
 {'category': 'Romance', 'imdb': 7.2, 'name': 'We Two'},
 {'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'},
 {'category': 'Romance', 'imdb': 6.0, 'name': 'Love'},
 {'category': 'Romance', 'imdb': 5.4, 'name': 'Bride Wars'},
 {'category': 'Action', 'imdb': 6.3, 'name': 'Hitman'},
 {'category': 'Thriller', 'imdb': 7.0, 'name': 'Usual Suspects'},
 {'category': 'Thriller', 'imdb': 4.2, 'name': 'Exam'},
 {'category': 'Crime', 'imdb': 4.0, 'name': 'Ringing Crime'},
 {'category': 'War', 'imdb': 3.2, 'name': 'AlphaJet'}]

---

### 4. Creating subsets by string condition

#### 4.1

Write a function that:

1. Accepts the movies list and a category name.
2. Returns the movie names within that category (case-insensitive!)
3. If the category is not in the data, print a message that it does not exist and return None.

Recall that to convert a string to lowercase, you can use:

```python
mystring = 'Dumb and Dumber'
lowercase_mystring = mystring.lower()
print(lowercase_mystring)
# dumb and dumber
```

In [6]:
# 4.1

def category_subset(movies, category):
    """ 1. Accepts the movies list and a category name.
        2. Returns the movie names within that category (case-insensitive!)
        3. If the category is not in the data, print a message that it does not exist and return None."""
    
    category = category.lower() # convert given category to lower case
    movies_subset = [] # initialize empty movie subset list
    
    # loop over all the movies and collect the name of each movie in the desired category
    for movie in movies:
        movie_category = movie['category'].lower()
        if movie_category == category:
            movies_subset.append(movie['name'])
    
    # Print a message and return None if nothing matched, otherwise return the names
    if len(movies_subset) == 0:
        print('No movies in category:', category)
        return None
    else:
        return movies_subset
    
print(category_subset(movies, 'suspense'))
print()
print(category_subset(movies, 'sci-fi'))

['What is the name', 'Detective']

No movies in category: sci-fi
None


#### 4.2 [Challenge]

Write a function that:

1. Accepts the movies list and a "search string".
2. Returns a dictionary with keys `'category'` and `'title'` whose values are lists of categories that contain the search string and titles that contain the search string, respectively (case-insensitive!)

In [7]:
# 4.2

def category_title_search(movies, search_string):
    """ 1. Accepts the movies list and a "search string".
        2. Returns a dictionary with keys `'category'` and `'title'` whose values are lists of categories that contain the search string and titles that contain the search string, respectively (case-insensitive!)"""
    
    search_string = search_string.lower() # transform search string to lower case
    
    results = {'category':[], 'title':[]} # initialize dictionary for search results
   
    # loop over all the movies and extract category and title for each
    for movie in movies:
        movie_category = movie['category'].lower()
        movie_title = movie['name'].lower()
        
        # if the search string is contained in the movie category append the category to the results if it is not yet contained
        if search_string in movie_category:
            if movie_category not in results['category']:
                results['category'].append(movie_category)
          
        # if the search string is contained in the movie title, append to the results (title should be unique, so no need to check if already included)
        if search_string in movie_title:
            results['title'].append(movie_title)
            
    return results

print(category_title_search(movies, 'cr'))

{'category': ['crime'], 'title': ['ringing crime']}


---

### 5. Multiple conditions

#### 5.1

Write a function that:

1. Accepts the movies list and a "search criteria" variable.
2. If the criteria variable is numeric, return a list of titles of the movies whose score is greater than or equal to it.
3. If the criteria variable is a string, return a list of titles of the movies whose category matches it (case-insensitive!). If there is no match, return an empty list and print an informative message.

In [8]:
# 5.1

def general_search(movies, criterion):
    """ 1. Accepts the movies list and a "search criteria" variable.
        2. If the criteria variable is numeric, return a list of movie titles with a score greater than or equal to the criteria.
        3. If the criteria variable is a string, return a list of movie titles that match that category (case-insensitive!). If there is no match, return an empty list and print an informative message."""
        
    titles_matches = [] # initialize list for title matches
    
    # check if criterion is numeric or string and then search by either score or category
    if type(criterion) in [int, float]:
        search_for = 'score'
    elif type(criterion) == str:
        search_for = 'category'
        criterion = criterion.lower()
    else:
        print('criterion neither string nor numeric')
        return titles_matches
    
    # loop over all movies
    for movie in movies:
        
        # if numeric criterion, filter imdb score
        if search_for == 'score':
            if movie['imdb'] >= criterion:
                titles_matches.append(movie['name'])
        # if string criterion, filter category
        else:
            if movie['category'].lower() == criterion:
                titles_matches.append(movie['name'])
                
    # return matches or inform that there are no matches            
    if len(titles_matches) == 0:
        print('no matches found')
    
    return titles_matches

print(general_search(movies, 6.9))
print(general_search(movies, 'suspense'))
print(general_search(movies, 'horror'))
print(general_search(movies, {'name':'the godfather'}))

['Usual Suspects', 'Dark Knight', 'The Help', 'Colonia', 'Joking muck', 'What is the name', 'Detective', 'We Two']
['What is the name', 'Detective']
no matches found
[]
criterion neither string nor numeric
[]
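
Type checks of this kind are often written with `isinstance` instead of comparing `type(...)` directly. One caveat is that `bool` is a subclass of `int`, so `isinstance(True, int)` is True. The sketch below, with a hypothetical helper named `kind_of`, shows one way to classify a criterion while treating booleans as non-numeric (that treatment is an assumption, not part of the exercise):

```python
def kind_of(criterion):
    """Hypothetical helper: classify a criterion as 'numeric', 'string' or 'other'."""
    # bool is a subclass of int, so rule it out before the numeric check
    if isinstance(criterion, bool):
        return "other"
    if isinstance(criterion, (int, float)):
        return "numeric"
    if isinstance(criterion, str):
        return "string"
    return "other"

print(kind_of(6.9))         # numeric
print(kind_of("suspense"))  # string
print(kind_of(True))        # other
```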


#### 5.2 [Expert]

Write a function that:

1. Accepts the movies list and a search string.
2. The search string can contain:
  - Boolean operations: `'AND'`, `'OR'`, and `'NOT'` (case-insensitive; capitalized here for clarity).
  - Search criteria specified with the syntax `imdb=...`, `category=...`, and/or `title=...`, where `...` is the value to look for.
    - If `imdb` is present, it matches scores greater than or equal to the value.
    - For `category` and `title`, the category or title must _contain_ the search string (case-insensitive).
    - Example: 
        ```'imdb=7.0 NOT category=suspense OR NOT title=love'```
3. Return the matches for the search criteria specified.
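
To make the syntax concrete, here is one way the example string might be broken into `(operator, key, value)` triples before any filtering happens. This is a sketch, not a full solution: the names `parsed` and `pending` are illustrative, and treating a criterion with no preceding operator as `'and'` is an assumption.

```python
query = "imdb=7.0 NOT category=suspense OR NOT title=love"

operators = {"and", "or", "not"}
tokens = query.lower().split()  # lowercase first, then split on whitespace

parsed = []   # (operator, key, value) triples
pending = []  # operators seen since the last criterion
for token in tokens:
    if token in operators:
        pending.append(token)
    else:
        # partition splits 'key=value' at the first '='
        key, _, value = token.partition("=")
        # Assumption: a criterion with no operator in front of it counts as 'and'
        parsed.append((" ".join(pending) or "and", key, value))
        pending = []

print(parsed)
# [('and', 'imdb', '7.0'), ('not', 'category', 'suspense'), ('or not', 'title', 'love')]
```

Each triple can then be turned into a set of matching movies and combined with set operations.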

In [9]:
# 5.2

def movie_matches_subparser(movies, movie_key, value):
    """ Take movies list, 
        a key specifying title, category or imdb
        and a value according to which to filter"""
    
    # check if search criteria specified correctly
    if movie_key == 'title':
        movie_key = 'name'
    elif movie_key not in ['category','imdb']:
        print('movie lookup key', movie_key, 'incorrect')
        return []
    
    # if criterion is imdb, try to convert value to float
    if movie_key == 'imdb':
        try:
            value = float(value)
        except ValueError:
            print('imdb', value, 'cannot become float')
            return []
        
    subset = [] # initialize the movie subset
    
    # loop through movie list
    for movie_ind, movie in enumerate(movies):
        # check value type, if numeric filter imdb scores, otherwise by title or category
        if type(value) == float:
            if movie[movie_key] >= value:
                subset.append(movie_ind)
        else:
            if value in movie[movie_key].lower():
                subset.append(movie_ind)
    
    return subset



def meets_boolean_criteria(movies, criteria_info):
    """Take movies, filter according to criteria_info and return remaining movies"""
    
    movie_inds = list(range(len(movies))) # setup movie indices
    
    full_set = set(movie_inds) # transform to sets to be able to apply set operation
    return_set = set(movie_inds)
    
    # loop over criteria info
    for boolean, movie_subset in criteria_info:
        
        movie_subset = set(movie_subset) # remove duplicates from movie subset
        
        # apply set operation required by boolean search string
        if boolean == 'and':
            return_set = return_set & movie_subset
        elif boolean == 'or':
            return_set = return_set | movie_subset
        elif boolean == 'not':
            return_set = return_set - movie_subset
        elif boolean == 'ornot':
            return_set = return_set | (full_set - movie_subset)
    
    # Obtain list of movies remaining after boolean search, in their original order
    return_list = []
    for ind in sorted(return_set):
        return_list.append(movies[ind])
        
    return return_list  
            
                

def boolean_search(movies, search):
    """ 1. Accepts the movies list and a string search criteria variable.
        2. The search criteria variable can contain within it:
          - Boolean operations: `'AND'`, `'OR'`, and `'NOT'` (case-insensitive; capitalized here for clarity).
          - Search criteria specified with the syntax `imdb=...`, `category=...`, and/or `title=...`, where `...` is the value to look for.
            - If `imdb` is present, it matches scores greater than or equal to the value.
            - For `category` and `title`, the category or title must _contain_ the search string (case-insensitive).
            - Example: 
            ```'imdb=7.0 NOT category=suspense OR NOT title=love'```
        3. Return the matches for the search criteria specified."""
        
    search = search.lower() # transform search string to lower case
    search = search.split(' ') # split search string on spaces
    
    criteria_info = [] # list for criterion
    current_boolean = 'and' # define default value for boolean
    
    # loop through search list as long as it contains elements
    while len(search) > 0:
        item = search.pop(0) # remove first item from search list
        # check if item is boolean
        if item in ['and','or','not']: 
            if (current_boolean == 'or') and (item == 'not'):
                current_boolean = 'ornot'
            else:
                current_boolean = item
            continue
        # if item is not boolean, split on = , otherwise return error     
        else:
            if '=' in item:
                item = item.split('=')
            else:
                print(item, 'syntax incorrect')
                return []
                        
            # call to above function using extracted items    
            movie_match_inds = movie_matches_subparser(movies, item[0], item[1])
            # append result together with current boolean to criteria info
            criteria_info.append([current_boolean, movie_match_inds])

    # call above function with criteria info        
    matches = meets_boolean_criteria(movies, criteria_info)
    return matches

In [10]:
# test of movie_matches_subparser

results = movie_matches_subparser(movies,'title','sus')
print([movies[i] for i in results])
print() 

results = movie_matches_subparser(movies,'imdb','9')
print([movies[i] for i in results])
print() 

results = movie_matches_subparser(movies,'category','om')
print([movies[i] for i in results])
print()

[{'name': 'Usual Suspects', 'imdb': 7.0, 'category': 'Thriller'}]

[{'name': 'Dark Knight', 'imdb': 9.0, 'category': 'Adventure'}, {'name': 'What is the name', 'imdb': 9.2, 'category': 'Suspense'}]

[{'name': 'The Choice', 'imdb': 6.2, 'category': 'Romance'}, {'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'}, {'name': 'Love', 'imdb': 6.0, 'category': 'Romance'}, {'name': 'Bride Wars', 'imdb': 5.4, 'category': 'Romance'}, {'name': 'Joking muck', 'imdb': 7.2, 'category': 'Comedy'}, {'name': 'We Two', 'imdb': 7.2, 'category': 'Romance'}]



In [11]:
# Illustration of set operations
set_1 = set(range(10))
set_2 = set(range(2,6))

print(set_1) 
print(set_1 & set_2)
print(set_1 - set_2)
print(set_1 | set_2)
print(set_1 | (set_1 - set_2))

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
{2, 3, 4, 5}
{0, 1, 6, 7, 8, 9}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


#### Examples for applying the final function

In [12]:
boolean_search(movies, 'imdb=7.0 NOT category=suspense AND NOT title=love')

[{'category': 'Thriller', 'imdb': 7.0, 'name': 'Usual Suspects'},
 {'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'},
 {'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
 {'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
 {'category': 'Comedy', 'imdb': 7.2, 'name': 'Joking muck'},
 {'category': 'Romance', 'imdb': 7.2, 'name': 'We Two'}]

In [13]:
boolean_search(movies, 'imdb=8.9')

[{'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'},
 {'category': 'Suspense', 'imdb': 9.2, 'name': 'What is the name'}]

In [14]:
boolean_search(movies, 'NOT imdb=8.9 AND NOT category=suspense AND NOT category=thriller')

[{'category': 'Action', 'imdb': 6.3, 'name': 'Hitman'},
 {'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
 {'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'},
 {'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
 {'category': 'Romance', 'imdb': 6.0, 'name': 'Love'},
 {'category': 'Romance', 'imdb': 5.4, 'name': 'Bride Wars'},
 {'category': 'War', 'imdb': 3.2, 'name': 'AlphaJet'},
 {'category': 'Crime', 'imdb': 4.0, 'name': 'Ringing Crime'},
 {'category': 'Comedy', 'imdb': 7.2, 'name': 'Joking muck'},
 {'category': 'Romance', 'imdb': 7.2, 'name': 'We Two'}]

In [15]:
boolean_search(movies, 'imdb=notafloat')

imdb notafloat cannot become float


[]

In [16]:
boolean_search(movies, 'category=1')

[]

In [17]:
boolean_search(movies, 'category=suspense')

[{'category': 'Suspense', 'imdb': 9.2, 'name': 'What is the name'},
 {'category': 'Suspense', 'imdb': 7.0, 'name': 'Detective'}]

In [18]:
boolean_search(movies, 'category=suspense WHEN imdb=5.5')

when syntax incorrect


[]

In [19]:
boolean_search(movies, 'review_count=100')

movie lookup key review_count incorrect


[]