# Connecting to MongoDB locally

In this notebook we create a MongoDB database, populate it with a simple example, and run three queries against it.
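The notebook does not show the step that fills the collection. Below is a minimal sketch of that step. It assumes documents with the fields the queries rely on (`Title`, `Rating`, `Genre`, `Director`), with `Genre` and `Director` stored as arrays. `SAMPLE_MOVIES` and `populate` are hypothetical names, and the three documents contain illustrative values, not the notebook's actual data.

```python
# Hypothetical sample documents (illustrative values). Field names follow the
# queries in this notebook; Genre and Director are arrays, which is why the
# later aggregations use $unwind.
SAMPLE_MOVIES = [
    {"Title": "The Shawshank Redemption", "Rating": 9.3,
     "Genre": ["Drama"], "Director": ["Frank Darabont"]},
    {"Title": "The Godfather", "Rating": 9.2,
     "Genre": ["Crime", "Drama"], "Director": ["Francis Ford Coppola"]},
    {"Title": "The Dark Knight", "Rating": 9.0,
     "Genre": ["Action", "Crime", "Drama"], "Director": ["Christopher Nolan"]},
]


def populate(collection, movies=SAMPLE_MOVIES):
    """Insert the sample movies into a collection and return the new ids."""
    # pymongo's insert_many adds an "_id" key to each dict it receives,
    # so pass shallow copies to keep SAMPLE_MOVIES unchanged
    result = collection.insert_many([dict(m) for m in movies])
    return result.inserted_ids
```

With the `collection` object created in the connection cell, `populate(collection)` inserts the three documents in a single `insert_many` call.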

In [1]:
import pymongo
from pymongo import MongoClient

In [None]:
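# MongoDB connection. This mongodb+srv URI targets a MongoDB Atlas cluster; for the local
# instance defined in docker-compose.yml, use a URI such as "mongodb://localhost:27017" instead.
mongo_client = pymongo.MongoClient("mongodb+srv://mongo:123@clusterfed.dbvxq.mongodb.net/?retryWrites=true&w=majority&appName=ClusterFED")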

# Select the database
db = mongo_client["imdb_db"]

# Select the collection (MongoDB creates it lazily on the first insert)
collection = db["movies"]
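Since the title refers to a local instance, here is a hedged sketch of a connection helper. It reads the URI from an environment variable instead of hardcoding credentials and falls back to a local server on MongoDB's default port 27017. The variable name `MONGO_URI` and the helper names are assumptions. The `ping` command is used only to fail fast when no server is reachable.

```python
import os

# Assumed default: a local mongod published on the standard port,
# e.g. by a docker-compose service
DEFAULT_LOCAL_URI = "mongodb://localhost:27017"


def get_mongo_uri(env_var="MONGO_URI"):
    """Return the connection string from the environment, or the local default."""
    return os.environ.get(env_var, DEFAULT_LOCAL_URI)


def connect(uri=None, timeout_ms=2000):
    """Create a MongoClient and check that a server answers."""
    # Imported lazily so get_mongo_uri can be used without pymongo installed
    from pymongo import MongoClient

    client = MongoClient(uri or get_mongo_uri(), serverSelectionTimeoutMS=timeout_ms)
    # Raises pymongo.errors.ServerSelectionTimeoutError if no server is reachable
    client.admin.command("ping")
    return client
```

Usage: `client = connect()` followed by `collection = client["imdb_db"]["movies"]` selects the same database and collection as above.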

#### Query 1: The movies with the highest rating

In [19]:

high_rated_movies = collection.find({"Rating": {"$gt": 9}})

print("\nBest films of All Time:")
for movie in high_rated_movies:
    print(movie.get("Title"))


Best films of All Time:
The Shawshank Redemption
The Godfather


*Only The Shawshank Redemption and The Godfather have a rating above 9 in this collection; they are widely considered two of the best movies of all time.*
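To check what the filter in Query 1 matches, the in-memory sketch below mirrors `{"Rating": {"$gt": 9}}` on plain Python dicts. It assumes `Rating` is stored as a number. `$gt` is a strict comparison, so a rating of exactly 9.0 is excluded. Documents whose `Rating` is missing or stored as a string do not match a numeric bound. `high_rated` and the sample documents are hypothetical.

```python
from numbers import Real


def high_rated(movies, threshold=9):
    """Pure-Python analogue of find({"Rating": {"$gt": threshold}})."""
    titles = []
    for movie in movies:
        rating = movie.get("Rating")
        # A numeric $gt bound only matches numeric values (MongoDB type bracketing);
        # bool is excluded explicitly because Python treats it as an int subclass
        if isinstance(rating, Real) and not isinstance(rating, bool) and rating > threshold:
            titles.append(movie.get("Title"))
    return titles


movies = [
    {"Title": "A", "Rating": 9.3},
    {"Title": "B", "Rating": 9.0},    # not strictly greater than 9: excluded
    {"Title": "C", "Rating": "9.5"},  # stored as a string: excluded
    {"Title": "D"},                   # missing field: excluded
]
print(high_rated(movies))  # ['A']
```

On the server, the same query can also project and sort the results, e.g. `collection.find({"Rating": {"$gt": 9}}, {"_id": 0, "Title": 1, "Rating": 1}).sort("Rating", -1)`.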

#### Query 2: The number of movies in each genre

In [14]:
genre_stats = collection.aggregate([
    {"$unwind": "$Genre"},  # Deconstruct the Genre array: one document per genre
    {"$group": {"_id": "$Genre", "count": {"$sum": 1}}},  # Group by genre and count
    {"$sort": {"count": -1}}  # Sort from highest to lowest count
])

print("\nNumber of movies by genre:")
for stat in genre_stats:
    print(f"Genre: {stat['_id']}, Quantity: {stat['count']}")



Number of movies by genre:
Genre: Drama, Quantity: 22
Genre: Crime, Quantity: 9
Genre: Adventure, Quantity: 7
Genre: Fantasy, Quantity: 5
Genre: Action, Quantity: 4
Genre: Sci-Fi, Quantity: 3
Genre: Biography, Quantity: 2
Genre: War, Quantity: 1
Genre: Western, Quantity: 1
Genre: History, Quantity: 1
Genre: Romance, Quantity: 1
Genre: Family, Quantity: 1
Genre: Mystery, Quantity: 1
Genre: Thriller, Quantity: 1


*As can be seen, Drama is by far the most frequent genre (22 movies), followed by Crime (9) and Adventure (7).*
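The three stages of Query 2 can be mirrored in plain Python to make their semantics explicit. This sketch uses `collections.Counter` and is an analogue, not how MongoDB executes the pipeline. `$unwind` emits one record per array element, treats a non-array value as a one-element array, and by default drops documents where the field is missing, `null`, or an empty array. `$group` with `{"$sum": 1}` counts the records per value, and the sort orders them by count, descending. The helper name `count_by` is hypothetical.

```python
from collections import Counter


def count_by(movies, field):
    """Analogue of [$unwind field, $group with {"$sum": 1}, $sort by count desc]."""
    counts = Counter()
    for movie in movies:
        value = movie.get(field)
        if value is None or value == []:
            continue  # $unwind drops missing, null and empty-array values by default
        # A scalar behaves like a one-element array
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    # most_common sorts by count, descending; equal counts keep first-seen order
    return counts.most_common()


sample = [
    {"Genre": ["Drama"]},
    {"Genre": ["Crime", "Drama"]},
    {"Genre": "Action"},  # scalar: counted like a one-element array
    {"Genre": []},        # empty array: dropped
    {},                   # missing field: dropped
]
print(count_by(sample, "Genre"))  # [('Drama', 2), ('Crime', 1), ('Action', 1)]
```

`count_by(movies, "Director")[:5]` gives the corresponding analogue of Query 3, including its `$limit`.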

#### Query 3: Directors with the most movies

In [24]:
# Use an aggregation to count how many movies each director has
director_stats = collection.aggregate([
    {"$unwind": "$Director"},  # If a movie has several directors, emit one document per director
    {"$group": {"_id": "$Director", "movie_count": {"$sum": 1}}},
    {"$sort": {"movie_count": -1}},
    {"$limit": 5}  # top 5
])

# Print the top 5 directors by number of movies
for stat in director_stats:
    print(f"Director: {stat['_id']} | Nº of movies: {stat['movie_count']}")


Director: Peter Jackson | Nº of movies: 3
Director: Christopher Nolan | Nº of movies: 3
Director: Francis Ford Coppola | Nº of movies: 2
Director: Steven Spielberg | Nº of movies: 2
Director: David Fincher | Nº of movies: 2


*The directors with the most movies in this list are Peter Jackson and Christopher Nolan, with three each. Others, such as Coppola, Spielberg and Fincher, also appear, with two movies each.*
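Peter Jackson and Christopher Nolan are tied, and MongoDB does not guarantee the relative order of documents with equal sort keys, so the ranking above is not guaranteed to be the same across runs. Below is a sketch of the Query 3 pipeline with a secondary sort key on `_id` (the director name), which makes the order deterministic. The `avg_rating` field is an optional extra that assumes `Rating` is numeric, and `top_directors_pipeline` is a hypothetical helper name.

```python
def top_directors_pipeline(limit=5):
    """Build the Query 3 aggregation pipeline with a deterministic tie-breaker."""
    return [
        # One document per director when a movie lists several
        {"$unwind": "$Director"},
        {"$group": {
            "_id": "$Director",
            "movie_count": {"$sum": 1},
            # Optional extra: average rating per director (non-numeric values are ignored)
            "avg_rating": {"$avg": "$Rating"},
        }},
        # Primary key: movie count (descending); tie-breaker: director name (ascending)
        {"$sort": {"movie_count": -1, "_id": 1}},
        {"$limit": limit},
    ]
```

It runs with `for stat in collection.aggregate(top_directors_pipeline()): print(stat)`. Python 3.7+ dicts keep insertion order, so pymongo sends the `$sort` keys in the order written, which keeps `movie_count` as the primary key.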