In [7]:
import pymongo

In [8]:
course_cluster_uri = "mongodb://agg-student:agg-password@cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/test?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin"
course_client = pymongo.MongoClient(course_cluster_uri)

In [9]:
movies = course_client['aggregations']['movies']

# Lab: Group Accumulators

## For this lab, you'll be using group accumulators.

### Question

In this lab, you will need to capture the highest `imdb.rating`, lowest `imdb.rating`, average, and **sample** standard deviation for all films that won an Oscar.

You may find documentation on [group accumulators](https://docs.mongodb.com/manual/reference/operator/aggregation-group/#group-accumulator-operators) helpful!

The matching stage to find films with Oscar wins is provided below.

In [10]:
matching = {
    "$match": {
        "awards": { "$regex": "Won \\d{1,2} Oscars?"}
    }
}

``$regex`` :- Provides regular-expression pattern matching on strings in queries.

``"Won \\d{1,2} Oscars?"`` :- This pattern breaks down as follows:

``Won `` : Matches the literal word "Won" followed by a space. The pattern is not anchored, so this text can appear anywhere in the `awards` string, not only at the start.

``\\d{1,2}`` : Matches a digit (`\d` in regex, with the backslash doubled to escape it in the string literal). The `{1,2}` quantifier means one or two digits, allowing for numbers from 0 to 99.

``Oscars?`` : Matches the word "Oscar" followed by an optional "s" (`s?` makes the "s" optional), accommodating both singular and plural forms. The literal space before "Oscar" ensures that the word follows the number, separated by exactly one space.
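To see how the pattern behaves, it can be tried locally with Python's `re` module. This is only a sketch: the sample `awards` strings below are made up for illustration and are not taken from the `movies` collection, and it assumes that Python's regex engine and the one MongoDB uses agree on a pattern this simple.

```python
import re

# Same pattern as in the $match stage, written as a raw string
# so the backslash does not need to be doubled.
pattern = re.compile(r"Won \d{1,2} Oscars?")

# Hypothetical awards strings, invented for this example.
samples = [
    "Won 1 Oscar. Another 5 wins.",                       # singular form
    "Won 11 Oscars. Another 100 wins & 50 nominations.",  # two digits, plural
    "Nominated for 2 Oscars. Another 3 wins.",            # nominated, not won
    "Won 3 Golden Globes.",                               # won, but not an Oscar
    "Also Won 2 Oscars.",                                 # match is not at the start
]

# search() looks for the pattern anywhere in the string,
# which mirrors how an unanchored $regex behaves.
matches = [bool(pattern.search(s)) for s in samples]

for text, matched in zip(samples, matches):
    print(matched, "<-", text)
```

The last sample shows that the match does not have to begin at the first character of the string.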

In [11]:

grouping = {
    "$group": {
        "_id": None,
        "highest_rating": { "$max": "$imdb.rating" },
        "lowest_rating": { "$min": "$imdb.rating" },
        "average_rating": { "$avg": "$imdb.rating" },
        "sample_st_dev_rating": { "$stdDevSamp": "$imdb.rating" }
    }
}

``$max`` :- Used as a ``$group`` accumulator, ``$max`` returns the highest value of the given expression across all documents in the group; ``$min`` returns the lowest. Both can compare values of different types, using the BSON comparison order.

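``$stdDevPop`` :- population standard deviation; use it when the values cover the entire population.

``$stdDevSamp`` :- sample standard deviation; use it when the values are a sample of a larger population, which is what this lab asks for.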
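The difference between the two standard deviations can be checked with Python's standard `statistics` module. This is a minimal sketch on a made-up list of ratings, not on the lab data; it only illustrates the arithmetic that the four accumulators perform within a single group.

```python
import statistics

# Hypothetical ratings, invented for this example.
ratings = [7.0, 8.0, 9.0]

highest = max(ratings)              # what $max would return for this group
lowest = min(ratings)               # what $min would return
average = statistics.mean(ratings)  # what $avg would return

# Population standard deviation divides the sum of squared deviations by N.
pop_st_dev = statistics.pstdev(ratings)

# Sample standard deviation divides by N - 1, so it is never smaller.
sample_st_dev = statistics.stdev(ratings)

print(highest, lowest, average, pop_st_dev, sample_st_dev)
```

Here the squared deviations from the mean of 8.0 sum to 2.0, so the population figure is the square root of 2/3 and the sample figure is the square root of 2/2.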

In [12]:
pipeline = [
    matching,
    grouping
]

display(list(movies.aggregate(pipeline)))

[{'_id': None,
  'highest_rating': 9.2,
  'lowest_rating': 4.5,
  'average_rating': 7.527024070021882,
  'sample_st_dev_rating': 0.5988145513344504}]