<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#MongoClient" data-toc-modified-id="MongoClient-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>MongoClient</a></span></li><li><span><a href="#Reading-data" data-toc-modified-id="Reading-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Reading data</a></span></li><li><span><a href="#Querying-MongoDB" data-toc-modified-id="Querying-MongoDB-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Querying MongoDB</a></span></li><li><span><a href="#Bonus" data-toc-modified-id="Bonus-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Bonus</a></span></li></ul></div>

# Mongo Through Python

In [None]:
from pymongo import MongoClient

## MongoClient
To communicate with MongoDB from Python, we need a `client`: the object through which every request to the server is sent.

In [55]:
# For localhost connection, no arguments needed
client = MongoClient()

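The client now exists. PyMongo opens the actual connection in the background, so if the server is down, the error only shows up on the first real operation.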

In [56]:
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

This connection is to the whole server. To select a database, access it as an attribute of the client, using the name of the database.

`client.database`

In [57]:
client.datamad0121

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'datamad0121')

For better readability, we don't have to keep referring to the database as `client.database`. We can store it in a variable, which is more convenient.

In [26]:
db = client.datamad0121

The same applies to collections: access the collection as an attribute of the database, using the collection name.

`database.collection`

`client.database.collection`

In [None]:
db.students
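
Attribute access only works when the name is a valid Python identifier. PyMongo also accepts dictionary-style access (`client["name"]`, `db["name"]`), which helps when a name contains a dash or is stored in a variable. A minimal sketch, where `get_collection` is a hypothetical helper name and not part of PyMongo:

```python
def get_collection(client, db_name: str, coll_name: str):
    """Return a collection using bracket (dictionary-style) access.

    `client` is expected to be a pymongo.MongoClient; the helper itself is
    only an illustration and does nothing beyond two lookups.
    """
    return client[db_name][coll_name]


# Usage with the client from this notebook (needs a running server):
# students = get_collection(client, "datamad0121", "students")
```

Both forms return the same kind of object, so `client["datamad0121"]["students"]` and `client.datamad0121.students` can be used interchangeably.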

## Reading data
Once we have the collection, reading data from it is easy.

We use the `.find` method and pass a Mongo query (a dictionary) as the argument.

In [29]:
db.students.find({"nickname":"Pepe"})

<pymongo.cursor.Cursor at 0x109749e20>

This method returns a cursor object.

Cursors are iterable: we can call `next()` on them to get the results one by one, loop over them with a `for` loop, or convert them to a list.

In [31]:
results = list(db.students.find({"nickname":"Pepe"}))
results

[{'_id': ObjectId('601a84641c1cb76202b22d53'),
  'name': 'Jose',
  'last_name': 'Lopez',
  'nickname': 'Pepe',
  'age': 32}]
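
One detail worth keeping in mind is that a cursor is consumed as it is read, so documents already taken with `next()` are not returned again by a later `list()`. The sketch below illustrates this with a plain Python iterator standing in for the cursor (an assumption made so the snippet runs without a server); `first_or_none` is a hypothetical helper name.

```python
def first_or_none(cursor):
    """Return the next document from a cursor, or None if it is exhausted."""
    return next(cursor, None)


# Stand-in for db.students.find(...): a plain iterator, NOT a real cursor.
fake_cursor = iter([{"nickname": "Pepe"}, {"nickname": "Pepa"}])

first = first_or_none(fake_cursor)  # takes the first document
rest = list(fake_cursor)            # only the documents that are left
```

With a real cursor, calling `.find` again is the simplest way to start over.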

The elements of this list are dictionaries.

In [32]:
type(results)

list

In [33]:
results[0]

{'_id': ObjectId('601a84641c1cb76202b22d53'),
 'name': 'Jose',
 'last_name': 'Lopez',
 'nickname': 'Pepe',
 'age': 32}

In [34]:
type(results[0])

dict

In [35]:
results[0]["nickname"]

'Pepe'

### We can even use databases and collections that do not exist yet, and they will be created!

There is no need to create databases beforehand or define columns as in MySQL. Just treat the database or collection as if it already existed, and MongoDB will create it when the first document is inserted.

In [36]:
obj = {"Hello":"World"}

In [38]:
db.test.insert_one(obj)

<pymongo.results.InsertOneResult at 0x10a46da40>
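
The `InsertOneResult` above carries the `_id` that was assigned to the new document, available as `inserted_id`. A minimal sketch, assuming `collection` is a PyMongo collection; `insert_and_get_id` is a hypothetical helper name:

```python
def insert_and_get_id(collection, document: dict):
    """Insert one document and return the _id assigned to it."""
    result = collection.insert_one(document)
    return result.inserted_id


# Usage with the database from this notebook (needs a running server):
# new_id = insert_and_get_id(db.test, {"Hello": "World"})
# "test" in db.list_collection_names()   # the collection exists after the insert
```

PyMongo also adds the generated `_id` to the dictionary that was passed in, so inserting the very same `obj` a second time raises a duplicate key error.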

## Querying MongoDB
As in the MongoDB Compass GUI, a query made with pymongo can have four parts.

- `filter` and `project` are arguments to the `.find` method.

- `skip` and `limit` are methods of their own, called on the cursor that `.find` returns.

In [44]:
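# Assumption: the movies collection lives in this database under the name "movies"
movies = db.movies
# year > 1989 and year <= 2000
filt = {"year":{"$gt":1989,"$lte":2000}}
# Return only the title (_id is included by default)
project = {"title":1}
results = movies.find(filt,project)#.skip(1).limit(1)
results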

<pymongo.cursor.Cursor at 0x10a85dee0>

In [45]:
results = list(results)
results[:5]

[{'_id': ObjectId('601a87e037473c88e7f25c49'),
  'title': 'The Adventures of Ford Fairlane'},
 {'_id': ObjectId('601a87e037473c88e7f25c4a'),
  'title': 'After Dark, My Sweet'},
 {'_id': ObjectId('601a87e037473c88e7f25c4b'), 'title': 'Air America'},
 {'_id': ObjectId('601a87e037473c88e7f25c4c'), 'title': 'Alice'},
 {'_id': ObjectId('601a87e037473c88e7f25c4d'), 'title': 'Almost an Angel'}]
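
The four parts (filter, projection, skip and limit) can be combined to read results one page at a time. The sketch below is one possible way to do it; `year_range_filter` and `titles_page` are hypothetical helper names, and the sort by title is an added assumption so that pages come back in a stable order.

```python
def year_range_filter(after: int, up_to: int) -> dict:
    """Build a filter matching after < year <= up_to."""
    return {"year": {"$gt": after, "$lte": up_to}}


def titles_page(collection, after: int, up_to: int, page: int, page_size: int) -> list:
    """Return the titles on the given zero-based page of the matching movies."""
    cursor = (
        collection.find(year_range_filter(after, up_to), {"title": 1, "_id": 0})
        .sort("title", 1)        # 1 means ascending order
        .skip(page * page_size)  # jump over the previous pages
        .limit(page_size)        # keep a single page
    )
    return [doc.get("title") for doc in cursor]


# Usage with the collection from this notebook (needs a running server):
# titles_page(movies, 1989, 2000, page=0, page_size=5)
```

Setting `"_id": 0` in the projection drops the `_id` field, which is otherwise always returned.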

## Bonus

Let's compare querying our local Mongo server with requesting data from an API.

In [47]:
import requests 

In [46]:
%%timeit
filt = {"year":{"$gt":1989,"$lte":2000}}
project = {"title":1} 
results = movies.find(filt,project)#.skip(1).limit(1)
results

3.13 µs ± 50.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [48]:
%%timeit
requests.get("https://pokeapi.co/api/v2/pokemon/pikachu")

69.9 ms ± 6.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


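As we can see, there is a significant difference in time! Keep in mind, though, that `.find` only builds a cursor and does not fetch any documents until the cursor is iterated, so the Mongo figure above measures creating the cursor, not retrieving the results.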
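
A fairer measurement consumes the cursor, for example with `list(...)`, so that the documents are really fetched. Outside of `%%timeit`, this can be sketched with the standard library; `average_seconds` is a hypothetical helper name and no timings are claimed here.

```python
import time
from typing import Callable


def average_seconds(run: Callable[[], object], repeats: int = 5) -> float:
    """Call `run` several times and return the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        run()
    return (time.perf_counter() - start) / repeats


# Usage with the objects from this notebook (needs a server and network access):
# average_seconds(lambda: list(movies.find(filt, project)))
# average_seconds(lambda: requests.get("https://pokeapi.co/api/v2/pokemon/pikachu"))
```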

And even though the PokeAPI is free, many APIs are not, and they may limit the number of requests we can make.

In that case, it is a good idea to store the data we gather from the API locally. This way we avoid losing all the information if the API blocks us or our request quota runs out.

Our program, even if limited, will still work.

To do this, we need to check whether the information we want is already in our database before requesting it from the API.

First of all, let's add some data to a new Mongo collection. This can be done beforehand or not. It doesn't really matter, since Mongo allows us to treat nonexistent databases and collections as if they existed.

In [58]:
poke_nums = range(1,152)

In [59]:
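coll = client.poke_data.pokemons
for num in poke_nums:
    # Skip the request if this pokemon is already stored
    res = list(coll.find({"#":num}))
    if res: 
        continue
    # Otherwise request it from the API and store it under its number
    data = requests.get(f"https://pokeapi.co/api/v2/pokemon/{num}").json()
    coll.insert_one({**data,"#":num})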

Finally, when we are actually working with our data, we first look for it in the database.

In [60]:
for num in [55, 200, 451]:
    res = list(coll.find({"#":num}))
    # If this pokemon is not in our database, res will be an empty list
    if not res: 
        print("Not found... Requesting api")
        # In that case, we request the information from the API
        data = requests.get(f"https://pokeapi.co/api/v2/pokemon/{num}").json()
        # And immediately insert it into the database
        coll.insert_one({**data,"#":num})
    else:
        # If we do have the data we queried, we just have to take it out of the list.
        # Keep in mind that this list may contain more than one element.
        # Here that shouldn't happen, because no two pokemons share the same number,
        # so we can just take element 0.
        data = res[0]
    print(data["sprites"]["front_default"])

https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/55.png
https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/200.png
Not found... Requesting api
https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/451.png
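
The code above relies on every number being stored only once. One way to have MongoDB enforce that, rather than trusting the insert logic, is a unique index on the `"#"` field. A minimal sketch, where `ensure_unique_number` is a hypothetical helper name:

```python
def ensure_unique_number(collection, field: str = "#") -> str:
    """Create a unique index on `field` and return the index name.

    Calling it again when the same index already exists changes nothing.
    """
    return collection.create_index(field, unique=True)


# Usage with the collection from this notebook (needs a running server):
# ensure_unique_number(coll)
```

Creating the index fails if the collection already holds two documents with the same number, so it is best created while the collection is still empty or known to be clean.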


In [61]:
# When we repeat the trial, pokemons we just added to the database are no longer requested from the API.
for num in [451,440, 151]:
    res = list(coll.find({"#":num}))
    if not res: 
        print("Not found... Requesting api")
        data = requests.get(f"https://pokeapi.co/api/v2/pokemon/{num}").json()
        coll.insert_one({**data,"#":num})
    else:
        data = res[0]
    print(data["sprites"]["front_default"])

https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/451.png
Not found... Requesting api
https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/440.png
https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/151.png
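
The look-up-then-request logic repeated in the two cells above could be wrapped in a single function. The sketch below is one possible shape, not the only one: `get_or_fetch` is a hypothetical name, `fetch` is any function that takes the number and returns the data as a dictionary, and `find_one` is used in place of `list(coll.find(...))` because it returns a single document or `None`.

```python
def get_or_fetch(collection, num: int, fetch):
    """Return (data, found_locally) for the pokemon with the given number.

    Looks in the collection first; only if nothing is stored does it call
    `fetch(num)` and save the result under the "#" field.
    """
    doc = collection.find_one({"#": num})
    if doc is not None:
        return doc, True
    data = fetch(num)
    collection.insert_one({**data, "#": num})
    return data, False


# Usage with the objects from this notebook (needs a server and network access):
# fetch = lambda n: requests.get(f"https://pokeapi.co/api/v2/pokemon/{n}").json()
# data, found = get_or_fetch(coll, 451, fetch)
# print(data["sprites"]["front_default"])
```

Passing `fetch` as an argument keeps the database logic separate from the HTTP request, which also makes the function easy to try with a stand-in for the API.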
