# Tutorial 3: NoSQL

__The goal of this assignment is to create 10 "queries" based on 2 NoSQL databases.__

I put queries in quotes, as these databases do not provide a declarative query language such as SQL.

Instead, you must rely on the database API to extract information and combine the results with some Python code.
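
As a minimal sketch of what this means in practice, the snippet below emulates `SELECT title FROM movies WHERE year > 2000` with plain Python over a dictionary. The `movies` dictionary, its field names and the `titles_after` helper are made-up illustrations, not the schema or API of the assignment databases.

```python
# Toy key-value data standing in for a database (hypothetical records).
movies = {
    "m1": {"title": "Alpha", "year": 1999},
    "m2": {"title": "Beta", "year": 2005},
    "m3": {"title": "Gamma", "year": 2012},
}


def titles_after(records, year):
    """Emulate SELECT title ... WHERE year > :year with a full scan."""
    return [rec["title"] for rec in records.values() if rec["year"] > year]


recent_titles = titles_after(movies, 2000)
```

The filter and the projection that SQL would express declaratively are written here by hand as a loop over every record.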

During this tutorial, I encourage you to consider the pros and cons of each database compared to relational databases.

__The first database: imdb_basics.shelve contains the movie records of IMDB (basics)__.

This database is stored using `shelve`, a module of the Python standard library.

`shelve` is a simple Key-Value store that acts as a persistent dictionary.

This is the same model as:
- Redis
- Memcached
- Google LevelDB
- Amazon DynamoDB
- Facebook RocksDB
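
To illustrate the "persistent dictionary" idea, here is a small sketch that writes one record to a throwaway shelve file, closes it, and reads it back in a second session. The file name `demo.shelve` and the record contents are assumptions made for this example only.

```python
import os
import shelve
import tempfile

# Work in a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.shelve")  # hypothetical file name

    # First session: write a record, then close to flush it to disk.
    with shelve.open(path) as db:
        db["k1"] = {"title": "Alpha", "year": 1999}

    # Second session: reopen read-only; the data survived the close.
    with shelve.open(path, "r") as db:
        stored_keys = list(db.keys())
        restored = db["k1"]
```

Keys must be strings, while values can be any picklable Python object, which is why a whole record can be stored under one key.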

__The second database: imdb.json contains the person records of IMDB (names)__.

This database is stored using `tinymongo`, a drop-in replacement for MongoDB.

`tinymongo` is a Document store that provides query methods on JSON docs.

This is the same model as:
- MongoDB
- CouchDB
- ArangoDB
- RethinkDB
- ElasticSearch
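
To give an intuition of how a document store evaluates a query, the sketch below implements a toy matcher for MongoDB-style filters in pure Python. It is not how `tinymongo` or MongoDB is implemented: the `matches` helper, the `people` documents and their fields are hypothetical, and only equality, `$gt` and `$regex` are handled.

```python
import re


def matches(doc, query):
    """Return True if `doc` satisfies a small MongoDB-style `query`.

    Toy version: only equality, `$gt` and `$regex` are supported.
    """
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$gt":
                    if value is None or not value > arg:
                        return False
                elif op == "$regex":
                    if not isinstance(value, str) or not re.search(arg, value):
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:
            return False
    return True


# Hypothetical documents, unrelated to the assignment data.
people = [
    {"_id": "p1", "name": "Ada", "born": 1990},
    {"_id": "p2", "name": "Bob", "born": 2001},
    {"_id": "p3", "name": "Bea", "born": 2003},
]

query = {"born": {"$gt": 2000}, "name": {"$regex": "^Be"}}
found = [doc["_id"] for doc in people if matches(doc, query)]
```

The query is itself a JSON-like document, so it can be built, stored and combined like any other dictionary.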

__These two data models are [the most popular ones](https://db-engines.com/en/ranking) and they are also used to create more [advanced data models](https://github.com/datathings/greycat/tree/master/plugins)__.

__Grade scale__: 20 points
- correct query: 2 points
- incorrect query: 0 points

__Further documentation__:
* https://www.imdb.com/interfaces/
* https://learnxinyminutes.com/docs/python/
* https://github.com/schapman1974/tinymongo/
* https://docs.python.org/3.6/library/shelve.html

# Core

In [None]:
# import shelve from standard library
import shelve

# open a connection to shelve database
basics = shelve.open('imdb_basics.shelve', 'r')

In [None]:
# import TinyMongoClient from the external tinymongo dependency
from tinymongo import TinyMongoClient

# open a connection to tinymongo database
names = TinyMongoClient('.').imdb.names

# Examples

In [None]:
# get the first 20 keys from basics collection
list(basics.keys())[:20]

In [None]:
# get the first item from the basics collection
basics['tt0100275']

In [None]:
# get the first item from the names collection
names.find_one()

# Queries

__1. How many movies are in the `basics` collection?__
- __hint__: you don't have to use a loop to answer this query
- __return__: Count (where Count = int)

In [None]:
def Q1():
    # YOUR CODE HERE
    raise NotImplementedError()

Q1()

In [None]:
assert isinstance(Q1(), int)

__2. Select the record associated to the movie whose primaryTitle is 'Blade Runner 2049'__
* __hint__: you have to use a method the database provides
* __return__: Record (where Record = Dict[str, Any])

In [None]:
def Q2():
    # YOUR CODE HERE
    raise NotImplementedError()

Q2()

In [None]:
assert isinstance(Q2(), dict)
assert Q2()['primaryTitle'] == 'Blade Runner 2049'

__3. Select the primary title and runtime of every movie longer than 300 minutes (excluded)__
* __hint__: you have to construct your own return value
* __return__: List[Tuple[primaryTitle, runtimeMinutes]] (where primaryTitle = str, runtimeMinutes=int)

In [None]:
def Q3():
    # YOUR CODE HERE
    raise NotImplementedError()

Q3()

In [None]:
assert len(Q3()) == 14
assert all(len(row) == 2 and row[1] > 300 for row in Q3())

__4. Select the record in the `names` collection associated to \_id 'nm0705356'__
* __hint__: use `find_one` to return only one record
* __return__: Record (where Record = Dict[str, Any])

In [None]:
def Q4():
    # YOUR CODE HERE
    raise NotImplementedError()

Q4()

In [None]:
assert isinstance(Q4(), dict)
assert Q4()['_id'] == 'nm0705356'

__5. Select the primaryName of the first 20 persons born in 2000, sorted by name (descending)__
* __hint__: use the `find` method to return multiple results
* __return__: List[Name] (where Name = str)

In [None]:
def Q5():
    # YOUR CODE HERE
    raise NotImplementedError()

Q5()

In [None]:
assert len(Q5()) == 20
assert all(isinstance(x, str) for x in Q5())

__6. Select the primaryName and birthYear of persons born after 2000 (excluded) and whose name starts with the letter 'M'__
* __hint__: use the `$and`, `$gt` and `$regex` operators of MongoDB
* __return__: List[Tuple[primaryName, birthYear]] (where primaryName = str, birthYear = int)

In [None]:
def Q6():
    # YOUR CODE HERE
    raise NotImplementedError()

Q6()

In [None]:
assert len(Q6()) == 9
assert all(len(row) == 2 for row in Q6())
assert all(row[1] > 2000 for row in Q6())

__7. Compute the average movie runtime in minutes__
* __hint__: aggregation has to be performed with code
* __return__: Average (where Average = float)

In [None]:
def Q7():
    # YOUR CODE HERE
    raise NotImplementedError()

Q7()

In [None]:
assert isinstance(Q7(), float)

__8. Select the primary name and the primary titles that the first 20 persons are known for__
* __hint__: you have to join the two database collections
* __return__: List[Tuple[primaryName, List[primaryTitle]]] (where primaryName = primaryTitle = str)
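
Without a `JOIN` operator, a join is a loop over one collection plus a key lookup in the other. The sketch below shows the pattern on toy data. The `titles` and `people` structures, the `known_for` field and the `join_titles` helper are hypothetical, so adapt the names to the real records.

```python
# Toy stand-ins for the two collections (hypothetical schema).
titles = {"t1": {"title": "Alpha"}, "t2": {"title": "Beta"}}
people = [
    {"name": "Ada", "known_for": ["t1", "t2"]},
    {"name": "Bob", "known_for": ["t2", "t9"]},  # "t9" has no matching title
]


def join_titles(persons, title_store):
    """Pair each person's name with the titles found in `title_store`."""
    rows = []
    for person in persons:
        found = [
            title_store[key]["title"]
            for key in person["known_for"]
            if key in title_store  # skip dangling references
        ]
        rows.append((person["name"], found))
    return rows


joined = join_titles(people, titles)
```

The `if key in title_store` guard matters because nothing enforces referential integrity between the two stores.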

In [None]:
def Q8():
    # YOUR CODE HERE
    raise NotImplementedError()

Q8()

In [None]:
assert len(Q8()) == 20
assert all(len(row) == 2 for row in Q8())

__9. Select a sorted (ascending) and distinct list of movie genres__
* __hint__: Python provides a `set` structure and `sorted` function
* __return__: List[Genre] (where Genre = str)
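
As a quick reminder of how `set` and `sorted` combine, here is a sketch on toy records. The `tags` field is an assumption, and the real records may store their values in a different shape.

```python
# Hypothetical records, each carrying a list of labels.
records = [
    {"tags": ["drama", "comedy"]},
    {"tags": ["comedy"]},
    {"tags": []},
]

# A set comprehension removes duplicates; sorted() returns an ascending list.
unique_tags = sorted({tag for rec in records for tag in rec["tags"]})
```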

In [None]:
def Q9():
    # YOUR CODE HERE
    raise NotImplementedError()

Q9()

In [None]:
assert len(Q9()) == 27
assert all(isinstance(x, str) for x in Q9())

__10. Select the number of distinct movies that a person is known for and that exist in the `basics` collection__
- __hint__: pay attention to the words "distinct" and "exist" in the question
- __return__: Count (where Count = int)

In [None]:
def Q10():
    # YOUR CODE HERE
    raise NotImplementedError()

Q10()

In [None]:
assert isinstance(Q10(), int)