# Lab: Using Cursor-like aggregation stages

## In this lab, you'll use cursor-like aggregation stages to answer the following scenario.

#### The dataset for this lab can be downloaded [here](https://s3.amazonaws.com/edu-static.mongodb.com/lessons/coursera/aggregation/movies.json) for upload to your own cluster.

### Movie Night

Your organization has a movie night scheduled, and you've again been tasked with coming up with a selection.

HR has polled employees and assembled the following list of preferred actresses and actors.

favorites = [
  "Sandra Bullock",
  "Tom Hanks",
  "Julia Roberts",
  "Kevin Spacey",
  "George Clooney"
]

For movies released in the **USA** with a ``tomatoes.viewer.rating`` greater
than or equal to **3**, calculate a new field called ``num_favs`` that represents how
many **favorites** appear in the ``cast`` field of the movie.

Sort your results by ``num_favs``, ``tomatoes.viewer.rating``, and ``title``, all in descending order.

What is the ``title`` of the **25th** film in the aggregation result?

**Hint**: MongoDB has a great expression for quickly determining whether there are common elements in lists, ``$setIntersection``
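
As a rough illustration of what the hint computes, the plain-Python sketch below mimics ``$size`` applied to ``$setIntersection`` using built-in sets. The helper name ``count_favorites`` is hypothetical and not part of the lab; in the pipeline itself this work happens server-side.

```python
favorites = [
    "Sandra Bullock",
    "Tom Hanks",
    "Julia Roberts",
    "Kevin Spacey",
    "George Clooney",
]


def count_favorites(cast, favorites):
    """Count the distinct favorites that appear in cast (hypothetical helper)."""
    # Set intersection drops duplicates, much like $setIntersection does.
    return len(set(favorites) & set(cast))


# Cast list taken from the sample result shown in this lab.
sample_cast = ["Sandra Bullock", "Melissa McCarthy", "Demian Bichir", "Marlon Wayans"]
print(count_favorites(sample_cast, favorites))  # prints 1
```

Because both inputs are treated as sets, a name repeated in ``cast`` is still counted only once.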

In [49]:
import pymongo

In [50]:
course_cluster_uri = "mongodb://agg-student:agg-password@cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/test?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin"
course_client = pymongo.MongoClient(course_cluster_uri)

In [51]:
movies = course_client['aggregations']['movies']

In [52]:
favorites = [
  "Sandra Bullock",
  "Tom Hanks",
  "Julia Roberts",
  "Kevin Spacey",
  "George Clooney"
]

In [53]:
movies.find_one({})

{'_id': ObjectId('573a1390f29313caabcd4192'),
 'title': 'The Conjuring of a Woman at the House of Robert Houdin',
 'year': 1896,
 'runtime': 1,
 'cast': ["Jeanne d'Alcy", 'Georges Méliès'],
 'plot': 'A woman disappears on stage.',
 'fullplot': 'An elegantly dressed man enters through a stage door onto a set with decorated back screen, a chair and small table. He brings a well-dressed women through the door, spreads a newspaper on the floor, and places the chair on it. She sits and fans herself; he covers her with a diaphanous cloth. She disappears; he tries to conjure her back with incomplete results. Can he go beyond the bare bones of a conjuring trick and succeed in the complete reconstitution of a the lady?',
 'lastupdated': '2015-08-26 00:05:55.493000000',
 'type': 'movie',
 'directors': ['Georges Méliès'],
 'imdb': {'rating': 6.3, 'votes': 759, 'id': 75},
 'countries': ['France'],
 'genres': ['Short'],
 'tomatoes': {'viewer': {'rating': 3.7, 'numReviews': 59},
  'lastUpdated': dat

In [54]:
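predicate = {
    "$match": {
        "title": { "$exists": True },
        # Equality on an array field matches documents whose array contains "USA".
        # ("$countries" inside $match would be a literal string, not a field reference.)
        "countries": "USA",
        # Require a non-empty cast array so $setIntersection always receives an array.
        "cast": { "$elemMatch": { "$exists": True } },
        "tomatoes.viewer.rating": { "$gte": 3 }
    }
}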

In [55]:
projection = {
    "$project": {
        "_id": 0,
        "title": 1,
        "cast": 1,
        "rating": "$tomatoes.viewer.rating",
        "num_favs": { "$size": { "$setIntersection": [favorites, "$cast"] } }
    }
}

In [56]:
sorting = {
    "$sort": {
        "num_favs": -1,
        "rating": -1,
        "title": -1
    }
}

In [57]:
skipping = {
    "$skip": 24
}

In [58]:
limiting = {
    "$limit": 1
}

In [59]:
pipeline = [
    predicate,
    projection,
    sorting,
    skipping,
    limiting
]

display(list(movies.aggregate(pipeline)))

[{'title': 'The Heat',
  'cast': ['Sandra Bullock',
   'Melissa McCarthy',
   'Demian Bichir',
   'Marlon Wayans'],
  'rating': 3.8,
  'num_favs': 1}]
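
The ``$sort``, ``$skip``, and ``$limit`` stages above can be sanity-checked in plain Python. The sketch below is only an approximation under stated assumptions: the helper ``nth_result`` and the three sample documents are hypothetical, and Python's tuple ordering stands in for MongoDB's sort semantics on these simple numeric and string values.

```python
def nth_result(docs, n):
    """Return a list holding the n-th (1-based) document after a descending sort."""
    ordered = sorted(
        docs,
        key=lambda d: (d["num_favs"], d["rating"], d["title"]),
        reverse=True,  # every key descending, like -1 in the $sort stage
    )
    # Slicing [n - 1 : n] plays the role of $skip: n - 1 followed by $limit: 1.
    return ordered[n - 1 : n]


# Hypothetical sample documents, not taken from the movies collection.
docs = [
    {"title": "A", "rating": 3.0, "num_favs": 1},
    {"title": "B", "rating": 4.0, "num_favs": 1},
    {"title": "C", "rating": 3.5, "num_favs": 2},
]
print(nth_result(docs, 2))  # prints [{'title': 'B', 'rating': 4.0, 'num_favs': 1}]
```

Skipping 24 documents and keeping one is the same idea with ``n = 25``; if fewer than ``n`` documents match, the slice is simply empty, just as the pipeline would return no results.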