# S18 T01: NoSQL database task

In [1]:
from getpass import getpass  # used below to prompt for credentials

import pymongo
import pandas as pd

## Level 1
- Exercise 1

Create a NoSQL database using MongoDB. Add some sample data to it so you can check that you are able to process its information in a basic way.

 We've created an Atlas account (MongoDB's cloud service) and deployed the sample databases that come with the service.
 
 ![Atlas1](./Screenshots/Atlas1.png)

 ![Atlas2](./Screenshots/Atlas2.png)

- Exercise 2

Connect the NoSQL database to Python using, for example, pymongo.

In [5]:
user_name = getpass("Username: ")
pw = getpass("Password: ")

In [6]:
conn_str = f"mongodb+srv://{user_name}:{pw}@warrior.nydx7.mongodb.net/"
myclient = pymongo.MongoClient(conn_str, serverSelectionTimeoutMS=5000)
myclient.list_database_names()

['mydatabase_test',
 'sample_airbnb',
 'sample_analytics',
 'sample_geospatial',
 'sample_mflix',
 'sample_restaurants',
 'sample_supplies',
 'sample_training',
 'sample_weatherdata',
 'admin',
 'local']
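
If the username or password contains reserved characters such as `@`, `:` or `/`, the URI built above will not parse correctly, so these values need to be percent-escaped first. A minimal sketch, assuming a hypothetical helper `build_atlas_uri` (not part of the original notebook) and the same cluster host as above:

```python
from urllib.parse import quote_plus


def build_atlas_uri(user_name: str, pw: str, host: str = "warrior.nydx7.mongodb.net") -> str:
    """Hypothetical helper: build a mongodb+srv URI with escaped credentials."""
    # quote_plus percent-encodes reserved characters such as '@', ':' and '/'
    return f"mongodb+srv://{quote_plus(user_name)}:{quote_plus(pw)}@{host}/"


# Made-up credentials, used only to show the escaping
example_uri = build_atlas_uri("vetusta@home", "p@ss:word/1")
```

The resulting string could then be passed to `pymongo.MongoClient` in place of `conn_str`.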

In [None]:
database_name = "mydatabase_test"
collection_name = "users"
user = {"username": "Vetusta", "email": "vetusta_13@fakemail.com", "pw": "fakepassword4", "Is_admin":True, "admin_level": 5}

mydb = myclient[database_name]  # Reference the database; it is created on the first insert
mycol = mydb[collection_name]  # Reference a collection, the equivalent of a SQL table

In [None]:
mycol.delete_many({})  # Delete any records left over from previous runs
myrecord = mycol.insert_one(user)  # Insert a single record into the collection

# A database only shows up once it holds a collection with data in it
if database_name in myclient.list_database_names():
    print(f"The database {database_name} exists.")
if collection_name in mydb.list_collection_names():  # Check that the collection exists
    print(f"The collection {collection_name} exists:")
    for r in mycol.find():  # Print the records of the collection
        print(r)


In [None]:
# Create sample records from the username list below:
user_list = ["Blanquet", "Pilgrm", "Ford", "Garrison", "Stormtrooper", "Milwakee", "Cliford", "Saturna", "Hugaceo", "Mad_Racoon", "Tolkien", "Matilda", "SuperSayan"]
users = []

for n, name in enumerate(user_list, start=1):
    record = {"_id": n, "username": name, "email": f"{name}_{n}@fakemail.com".lower(), "pw": f"fakepassword{n}", "Is_admin": False}
    if n % 3 == 0:
        # If the id is divisible by 3, the user is an admin and gets an "admin_level" attribute
        record.update({"Is_admin": True, "admin_level": n // 3})
    users.append(record)

for u in users:
    print(u)


In [None]:
x = mycol.insert_many(users)  # Insert multiple records into the collection
for r in mycol.find():  # Print the records
    print(r)

Note that, because we did not set the "_id" of the first record, MongoDB assigned it a unique ObjectId automatically.

## Level 2

- Exercise 1

Load some simple queries into a Pandas DataFrame.

In [None]:
for x in mycol.find({'Is_admin':True}):
    print(x)

In [None]:
for x in mycol.find({"admin_level":{"$gte":3}}, {"_id":0, "email":0, "pw":0}).sort("username"):  # Users with admin_level >= 3, sorted by username
    print(x)

At this point I realize that the "Is_admin" attribute is redundant, since the mere existence of the "admin_level" field already determines whether the user is an admin.
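
The observation above can be sketched without a server: admin status is derived from whether `admin_level` is present. The filter dictionary uses MongoDB's `$exists` operator and could be passed to `mycol.find`; the short local list of documents and the helper name `is_admin` are illustrative assumptions.

```python
# Filter that would select admins by field presence, e.g. mycol.find(admins_filter)
admins_filter = {"admin_level": {"$exists": True}}

# Small local stand-in for the collection (illustrative subset of the records)
docs = [
    {"_id": 1, "username": "Blanquet", "Is_admin": False},
    {"_id": 3, "username": "Ford", "Is_admin": True, "admin_level": 1},
    {"_id": 6, "username": "Milwakee", "Is_admin": True, "admin_level": 2},
]


def is_admin(doc: dict) -> bool:
    """Hypothetical helper: a user is an admin if the document has 'admin_level'."""
    return "admin_level" in doc


# The derived flag agrees with the stored one, so 'Is_admin' carries no extra information
redundant = all(is_admin(d) == d["Is_admin"] for d in docs)
```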

In [None]:
# For the next queries, switch to one of the sample databases that Atlas provides.

database_name = "sample_mflix"
mydb = myclient[database_name]  # Select the existing sample database
mydb.list_collection_names()

In [None]:
mycol = mydb["movies"]

# Search for all movies with an IMDb rating <= 8 whose critics' consensus describes them as a masterpiece.

cursor = mycol.find({"imdb.rating":{"$lte":8}, "tomatoes.consensus":{"$regex":"masterpiece"}}, {"_id":False, "plot":False, "poster":False, "fullplot":False, "tomatoes.consensus":False}).sort("genres")
movies_df = pd.DataFrame(list(cursor))
movies_df
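
`$regex` matching is case-sensitive by default, so the query above would miss a consensus that spells the word "Masterpiece"; adding `"$options": "i"` makes it case-insensitive. The sketch below uses Python's `re` module only as a local stand-in for that behavior (the sample sentences are made up) and shows an adjusted filter that could be passed to `mycol.find`.

```python
import re

# Made-up consensus strings, used only to illustrate case sensitivity
samples = ["A flawed masterpiece.", "Masterpiece of suspense.", "Solid but forgettable."]

# Case-sensitive match, analogous to {"$regex": "masterpiece"}
sensitive = [s for s in samples if re.search("masterpiece", s)]

# Case-insensitive match, analogous to adding "$options": "i"
insensitive = [s for s in samples if re.search("masterpiece", s, re.IGNORECASE)]

# Filter with the option applied; same rating condition as the query above
movies_filter = {
    "imdb.rating": {"$lte": 8},
    "tomatoes.consensus": {"$regex": "masterpiece", "$options": "i"},
}
```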

## Level 3

- Exercise 1

Generate a statistical summary of the information contained in the database.

In [None]:
# Accessing the DB we created before
mydb = myclient["mydatabase_test"] 
mycol = mydb["users"]

In [None]:
# Retrieving all the data into a pandas DataFrame

cursor = mycol.find({})
users_df = pd.DataFrame(list(cursor))
users_df

In [None]:
users_df.info()

In [None]:
# Showing the statistics of the DB:
users_df.describe(include="all")
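
`describe` works on the data after it has been pulled into pandas. A server-side counterpart could be an aggregation pipeline; the sketch below is an assumption rather than part of the original notebook. It defines a `$group` stage that counts users per `admin_level` (it could be run as `mycol.aggregate(pipeline)`) and reproduces the same count locally with `collections.Counter` on the 13 generated users, leaving out the first, manually inserted record.

```python
from collections import Counter

# Pipeline that would count users per admin level on the server
pipeline = [
    {"$group": {"_id": "$admin_level", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]

# Local stand-in: rebuild the generated users (ids 1..13, every third one an admin)
users = []
for n in range(1, 14):
    doc = {"_id": n, "Is_admin": False}
    if n % 3 == 0:
        doc.update({"Is_admin": True, "admin_level": n // 3})
    users.append(doc)

# Users without 'admin_level' are counted under None, as they fall under null in $group
level_counts = Counter(doc.get("admin_level") for doc in users)
```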

To summarize statistically some of the features of the sample_mflix database (one of the sample databases provided by the MongoDB Atlas cloud service), we've created an [interactive dashboard](https://charts.mongodb.com/charts-tasca18-mspka/public/dashboards/60df227c-e219-496c-8607-b4068dc95752) with Atlas Charts.


 ![Atlas_Dashboard](./Screenshots/MoviesDashboard.png)