<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Python Review with Movie Data


---

In this lab you'll use the provided IMDB `movies` list below as your dataset.

This lab is designed to practice iteration and functions in particular. The normal questions are gentler, and the challenge questions are suited to advanced Python users or students with prior programming experience.

Every question requires writing a function that uses iteration. You should print out a test of each function you write.
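As a rough sketch of what printing a test might look like, the example below uses a hypothetical function, `is_long_title`, which is not part of the lab and only illustrates the habit of calling a function on a known input and printing the result.

```python
# Hypothetical example function, used only to illustrate testing by printing.
def is_long_title(movie, min_length=10):
    """Return True if the movie's name has at least `min_length` characters."""
    return len(movie["name"]) >= min_length


sample = {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"}

# Print the call alongside its result so the output is easy to check by eye.
print("is_long_title(sample) ->", is_long_title(sample))
```

Picking an input whose correct answer you already know makes it easier to spot a mistake.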


### 1. Load the provided list of movie dictionaries.

In [38]:
# List of movie dictionaries:

movies = [
{
"name": "Usual Suspects", 
"imdb": 7.0,
"category": "Thriller"
},
{
"name": "Hitman",
"imdb": 6.3,
"category": "Action"
},
{
"name": "Dark Knight",
"imdb": 9.0,
"category": "Adventure"
},
{
"name": "The Help",
"imdb": 8.0,
"category": "Drama"
},
{
"name": "The Choice",
"imdb": 6.2,
"category": "Romance"
},
{
"name": "Colonia",
"imdb": 7.4,
"category": "Romance"
},
{
"name": "Love",
"imdb": 6.0,
"category": "Romance"
},
{
"name": "Bride Wars",
"imdb": 5.4,
"category": "Romance"
},
{
"name": "AlphaJet",
"imdb": 3.2,
"category": "War"
},
{
"name": "Ringing Crime",
"imdb": 4.0,
"category": "Crime"
},
{
"name": "Joking muck",
"imdb": 7.2,
"category": "Comedy"
},
{
"name": "What is the name",
"imdb": 9.2,
"category": "Suspense"
},
{
"name": "Detective",
"imdb": 7.0,
"category": "Suspense"
},
{
"name": "Exam",
"imdb": 4.2,
"category": "Thriller"
},
{
"name": "We Two",
"imdb": 7.2,
"category": "Romance"
}
]

---

### 2. Filtering data by IMDB score

#### 2.1 

Write a function that:

1. Accepts a single movie dictionary from the `movies` list as an argument.
2. Returns True if the IMDB score is above 5.5.

#### 2.2 [Challenge] 

Write a function that:

1. Accepts the movies list and a specified category.
2. Returns True if the average score of the category is higher than the average score of all movies.

In [39]:
def filter21(dictionary):
    return dictionary['imdb'] > 5.5 

filter21(movies[2])

True

In [40]:
def filter22(cat, movies = movies):
    avg = sum([movie['imdb'] for movie in movies])/len(movies)
    cat_movies = [movie for movie in movies if movie['category'] == cat]
    return sum([movie['imdb'] for movie in cat_movies])/len(cat_movies) > avg

filter22('Suspense')

True
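Note that `filter22` divides by `len(cat_movies)`, so it raises a `ZeroDivisionError` when no movie belongs to the requested category. A minimal sketch of a guarded variant follows. The name `category_beats_overall` and the choice to return `None` for an unknown category are assumptions made for illustration, and the sketch assumes the movies list itself is not empty.

```python
def category_beats_overall(cat, movies):
    """Return True if the category's mean score exceeds the overall mean.

    Returns None (after printing a message) when no movie is in the category.
    Assumes `movies` is a non-empty list of dictionaries.
    """
    overall_avg = sum(m["imdb"] for m in movies) / len(movies)
    cat_scores = [m["imdb"] for m in movies if m["category"] == cat]
    if not cat_scores:
        # Guard against dividing by zero for an unknown category.
        print(f"No movies found in category {cat!r}")
        return None
    return sum(cat_scores) / len(cat_scores) > overall_avg


# A small sample in the same shape as the lab's `movies` list.
sample_movies = [
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
    {"name": "Love", "imdb": 6.0, "category": "Romance"},
    {"name": "Bride Wars", "imdb": 5.4, "category": "Romance"},
]

print(category_beats_overall("Adventure", sample_movies))
print(category_beats_overall("Romance", sample_movies))
print(category_beats_overall("Western", sample_movies))
```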

---

### 3. Creating subsets by numeric condition

#### 3.1

Write a function that:

1. Accepts the list of movies and a specified imdb score.
2. Returns the sublist of movies that have a score greater than the specified score.

#### 3.2 [Expert] 

Write a function that:

1. Accepts the movies list as an argument.
2. Returns the movies list sorted first by each category's average score, and then by individual movie score within each category.

In [41]:
def filter31(score, movies=movies):
    return [movie['name'] for movie in movies if movie['imdb']>score]

filter31(7)

['Dark Knight',
 'The Help',
 'Colonia',
 'Joking muck',
 'What is the name',
 'We Two']
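The question asks for the sublist of movies, while `filter31` returns only the names. If the full dictionaries are wanted, one possible variant is sketched below. The name `movies_above` is hypothetical, and the comparison is kept strictly greater-than to match the question.

```python
def movies_above(score, movies):
    """Return the movie dictionaries whose IMDB score is strictly greater than `score`."""
    return [movie for movie in movies if movie["imdb"] > score]


# A small sample in the same shape as the lab's `movies` list.
sample_movies = [
    {"name": "Usual Suspects", "imdb": 7.0, "category": "Thriller"},
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "Dark Knight", "imdb": 9.0, "category": "Adventure"},
]

result = movies_above(7, sample_movies)
print(result)
```

With a threshold of 7, a movie scoring exactly 7.0 is excluded because the comparison is strict.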

In [42]:

def movie_sorter(movies = movies):

    movie_cats = set([i['category'] for i in movies])

    cat_avg_dict = {}
    for cat in movie_cats:
        cat_scores = []
        for movie in movies:
            if movie['category'] == cat:
                cat_scores.append(movie['imdb'])
        cat_avg_dict[cat] = sum(cat_scores)/len(cat_scores)

    # Attach each movie's category average.
    # Note: this adds a 'cat_avg' key to the dictionaries in the input list.
    for movie in movies:
        movie['cat_avg'] = cat_avg_dict[movie['category']]

    def sorter(d):
        return d['cat_avg'] , d['imdb']

    return sorted(movies, key=sorter, reverse=True)

movie_sorter()

[{'cat_avg': 9.0, 'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'},
 {'cat_avg': 8.1,
  'category': 'Suspense',
  'imdb': 9.2,
  'name': 'What is the name'},
 {'cat_avg': 8.1, 'category': 'Suspense', 'imdb': 7.0, 'name': 'Detective'},
 {'cat_avg': 8.0, 'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
 {'cat_avg': 7.2, 'category': 'Comedy', 'imdb': 7.2, 'name': 'Joking muck'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 7.2, 'name': 'We Two'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 6.0, 'name': 'Love'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 5.4, 'name': 'Bride Wars'},
 {'cat_avg': 6.3, 'category': 'Action', 'imdb': 6.3, 'name': 'Hitman'},
 {'cat_avg': 5.6,
  'category': 'Thriller',
  'imdb': 7.0,
  'name': 'Usual Suspects'},
 {'cat_avg': 5.6, 'category': 'Thriller', 'imdb': 4.2, 'name': 'Ex
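`movie_sorter` stores a `cat_avg` key on every dictionary in the list it is given, which is why that key shows up in later outputs. A sketch of a variant that leaves the input untouched is shown below. The function name `sort_by_category_average` is an assumption made for illustration, and it relies on `sorted` comparing tuple keys element by element.

```python
from collections import defaultdict


def sort_by_category_average(movies):
    """Return a new list sorted by category average, then by score, both descending."""
    # Group the scores by category.
    scores_by_cat = defaultdict(list)
    for movie in movies:
        scores_by_cat[movie["category"]].append(movie["imdb"])
    cat_avg = {cat: sum(s) / len(s) for cat, s in scores_by_cat.items()}

    # The key is a tuple: the first element is compared first,
    # and the second element breaks ties.
    return sorted(
        movies,
        key=lambda m: (cat_avg[m["category"]], m["imdb"]),
        reverse=True,
    )


# A small sample in the same shape as the lab's `movies` list.
sample_movies = [
    {"name": "Hitman", "imdb": 6.3, "category": "Action"},
    {"name": "Detective", "imdb": 7.0, "category": "Suspense"},
    {"name": "What is the name", "imdb": 9.2, "category": "Suspense"},
    {"name": "The Help", "imdb": 8.0, "category": "Drama"},
]

ordered = sort_by_category_average(sample_movies)
print([m["name"] for m in ordered])
```

As with the original, two categories that happen to share the same average would have their movies interleaved by individual score.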

---

### 4. Creating subsets by string condition

#### 4.1

Write a function that:

1. Accepts the movies list and a category name.
2. Returns the movie names within that category (case-insensitive!)
3. If the category is not in the data, print a message that it does not exist and return None.

Recall that to convert a string to lowercase, you can use:

```python
mystring = 'Dumb and Dumber'
lowercase_mystring = mystring.lower()
print(lowercase_mystring)
# dumb and dumber
```

#### 4.2 [Challenge]

Write a function that:

1. Accepts the movies list and a "search string".
2. Returns a dictionary with keys `'category'` and `'title'` whose values are lists of categories that contain the search string and titles that contain the search string, respectively (case-insensitive!)

In [43]:
def filter41(cat, movies = movies):
    movie_list = [movie['name'] for movie in movies if movie['category'].lower() == cat.lower()]
    if len(movie_list) >= 1:
        return movie_list
    else:
        print(f"{cat} is not a valid category")
        return None
filter41('romance')

['The Choice', 'Colonia', 'Love', 'Bride Wars', 'We Two']

In [44]:
def filter42(search, movies=movies):
    dct = {} 
    dct['title'] = [movie['name'] for movie in movies if search.lower() in movie['name'].lower()]
    dct['category'] = list(set([movie['category'] for movie in movies if search.lower()\
                                in movie['category'].lower()]))
    return dct

filter42('r')

{'category': ['Drama', 'Adventure', 'Thriller', 'War', 'Crime', 'Romance'],
 'title': ['Dark Knight', 'Bride Wars', 'Ringing Crime']}

---

### 5. Multiple conditions

#### 5.1

Write a function that:

1. Accepts the movies list and a "search criteria" variable.
2. If the criteria variable is numeric, return a list of movie titles with a score greater than or equal to the criteria.
3. If the criteria variable is a string, return a list of movie titles that match that category (case-insensitive!). If there is no match, return an empty list and print an informative message.

#### 5.2 [Expert]

Write a function that:

1. Accepts the movies list and a string search criteria variable.
2. The search criteria variable can contain within it:
  - Boolean operations: `'AND'`, `'OR'`, and `'NOT'` (lowercase is also accepted; they are capitalized here only for clarity).
  - Search criteria specified with syntax `score=...`, `category=...`, and/or `title=...`, where the `...` indicates what to look for.
    - If `score` is present, it means scores greater than or equal to the value.
    - For `category` and `title`, the string indicates that the category or title must _contain_ the search string (case-insensitive).
3. Return the matches for the search criteria specified.
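One way to approach the syntax described above is to split the search string into tokens before evaluating anything. The sketch below covers only that first step. The names `OPERATORS` and `tokenize_query` are hypothetical, and it assumes that terms are separated by whitespace and that values contain no spaces.

```python
OPERATORS = {"and", "or", "not"}


def tokenize_query(search):
    """Split a search string into operator tokens and (key, value) criteria."""
    tokens = []
    for term in search.lower().split():
        if term in OPERATORS:
            tokens.append(("op", term))
        elif "=" in term:
            # Split on the first '=' only, so the value may itself contain '='.
            key, value = term.split("=", 1)
            tokens.append(("criterion", key, value))
        else:
            raise ValueError(f"Cannot parse term: {term!r}")
    return tokens


print(tokenize_query("score=7 AND category=rom NOT title=love"))
```

Keeping tokenizing separate from evaluation makes malformed input easier to report before any movies are filtered.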

In [46]:
def filter51(search, movies=movies):
    if type(search) in [int, float]:
        return [movie['name'] for movie in movies if movie['imdb'] >= search]
    titles = [movie['name'] for movie in movies if movie['category'].lower() == search.lower()]
    if len(titles) == 0:
        print(f"{search} doesn't match any movie categories")
    return titles

filter51('romance')

['The Choice', 'Colonia', 'Love', 'Bride Wars', 'We Two']

In [47]:
# Not fully built out: operators are applied strictly left to right, and "or not" is not handled


def filter52(search, movies=movies):

    def names_that_meet_criteria(condition):
        left, right = condition.split('=')
        if left == 'score':
            try:
                return set([movie['name'] for movie in movies if movie['imdb'] >= float(right)])
            except ValueError:
                print('please enter a numeric value with the parameter score')
                return set([])
        elif left == 'category':
            return set([movie['name'] for movie in movies if right in movie['category'].lower()])
        else:
            return set([movie['name'] for movie in movies if right in movie['name'].lower()])

    all_names = set([movie['name'] for movie in movies])
    search = search.lower().split()
    criteria = [term for term in search if term not in ['and', 'or','not']]
    bool_conditions = [term for term in search if term in ['and', 'or', 'not']]

    current_titles = set.intersection(all_names, names_that_meet_criteria(criteria[0]))
    for criterion, condition in zip(criteria[1:], bool_conditions):
        if condition == 'and':
            current_titles = set.intersection(current_titles, names_that_meet_criteria(criterion))
        elif condition == 'or':
            current_titles = set.union(current_titles, names_that_meet_criteria(criterion))
        else:
            current_titles = set.difference(current_titles, names_that_meet_criteria(criterion))
    return current_titles

filter52('score=5 and category=romance or category=action or category=suspense not title=t')

{'Bride Wars', 'Colonia', 'Love'}

In [48]:
# 5.2

# this function is used later in the function boolean_search and may not make sense initially.
def movie_matches_subparser(movies, movie_key, value):
    # if we are assessing a title criterion, use the dictionary key 'name'
    if movie_key == 'title':
        movie_key = 'name'
    # if not a title, category or imdb, print an error message and return no matches
    elif movie_key not in ['category','imdb']:
        print('movie lookup key', movie_key, 'incorrect')
        return []
    # we are assessing a score criterion
    if movie_key == 'imdb':
        try:
            value = float(value)
        # if the score is not numeric, print an error message and return no matches
        except ValueError:
            print('imdb', value, 'cannot become float')
            return []
        
    subset = []
    # enumerate the movies and collect the indexes of those that meet the criterion
    for movie_ind, movie in enumerate(movies):
        # numeric criterion: score must be greater than or equal to the value
        if isinstance(value, float):
            if movie[movie_key] >= value:
                subset.append(movie_ind)
        # string criterion: the value must appear in the (lowercased) field
        else:
            if value in movie[movie_key].lower():
                subset.append(movie_ind)
    
    return subset


# this function is used later in the function boolean_search and may not make sense initially.
def meets_boolean_criteria(movies, criteria_info):
    # one index per movie; the criteria are applied to sets of these indexes
    movie_inds = list(range(len(movies)))
    
    full_set = set(movie_inds)
    return_set = set(movie_inds)
    
    # walk through each (boolean, matching indexes) pair in order
    for boolean, movie_subset in criteria_info:
        
        # convert the list of indexes to a set so set operations can be applied
        movie_subset = set(movie_subset)
        
        # use the boolean to add or drop movie indexes from the return set
        if boolean == 'and':
            return_set = return_set & movie_subset
        elif boolean == 'or':
            return_set = return_set | movie_subset
        elif boolean == 'not':
            return_set = return_set - movie_subset
        elif boolean == 'ornot':
            return_set = return_set | (full_set - movie_subset)
            
    return_list = []
    # use the surviving indexes, in list order, to look up the full movie dictionaries
    for ind in sorted(return_set):
        return_list.append(movies[ind])
        
    return return_list  
            
                

def boolean_search(movies, search):
    # convert string to lower
    search = search.lower()
    # split the criteria into parts on single spaces
    search = search.split(' ')
    # note: extra or missing whitespace in the search string will break the parsing
    criteria_info = []
    current_boolean = 'and'
    
    # use a while loop to extract and assess each criterion individually
    while len(search) > 0:
        # pop off the first remaining term
        item = search.pop(0)
        # Decide whether the current term is a boolean operator
        # or a search criterion of the form key=value.
        
        if item in ['and','or','not']:
            if (current_boolean == 'or') and (item == 'not'):
                current_boolean = 'ornot'
            else:
                current_boolean = item
            continue
        else:
            if '=' in item:
                item = item.split('=')
            else:
                print(item, 'syntax incorrect')
                return []
            # pass the specified criteria through the movie_matches_subparser             
            movie_match_inds = movie_matches_subparser(movies, item[0], item[1])
            # now we will append the index results from the movie_match_inds with their desired bool  
            criteria_info.append([current_boolean, movie_match_inds])

    # finally compare the list of movies to the identified index values and bools        
    matches = meets_boolean_criteria(movies, criteria_info)
    return matches
        

In [49]:
boolean_search(movies, 'imdb=7.0 NOT category=suspense OR NOT title=love')

[{'cat_avg': 5.6,
  'category': 'Thriller',
  'imdb': 7.0,
  'name': 'Usual Suspects'},
 {'cat_avg': 6.3, 'category': 'Action', 'imdb': 6.3, 'name': 'Hitman'},
 {'cat_avg': 9.0, 'category': 'Adventure', 'imdb': 9.0, 'name': 'Dark Knight'},
 {'cat_avg': 8.0, 'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
 {'cat_avg': 6.44, 'category': 'Romance', 'imdb': 5.4, 'name': 'Bride Wars'},
 {'cat_avg': 3.2, 'category': 'War', 'imdb': 3.2, 'name': 'AlphaJet'},
 {'cat_avg': 4.0, 'category': 'Crime', 'imdb': 4.0, 'name': 'Ringing Crime'},
 {'cat_avg': 7.2, 'category': 'Comedy', 'imdb': 7.2, 'name': 'Joking muck'},
 {'cat_avg': 8.1,
  'category': 'Suspense',
  'imdb': 9.2,
  'name': 'What is the name'},
 {'cat_avg': 8.1, 'category': 'Suspense', 'imdb': 7.0, 'name': 'Detective'},
 {'cat_avg': 5.6, 'category': 'Thriller', 'imdb': 4.2, 'name': 