## Interacting with MongoDB from Python!

We'll be using the pymongo package, much as we used py2neo yesterday.

In [1]:
!pip install pymongo
import pymongo

Collecting pymongo
  Downloading pymongo-3.11.4-cp37-cp37m-macosx_10_6_intel.whl (414 kB)
Installing collected packages: pymongo
Successfully installed pymongo-3.11.4


## Start the mongo daemon! ~ scary ~

With the daemon running, import the MongoClient class and connect to it. The host and port below are MongoDB's defaults.

In [8]:
from pymongo import MongoClient
import pandas as pd
client = MongoClient('localhost', 27017)

### Create a database

In [9]:
db = client.launch  # Get a handle to a database called "launch"; MongoDB only creates it on the first write

### Create a collection

As we said before, we can think of a collection as a loose equivalent to a SQL table. However, because documents can be nested, you may not end up with as many collections as you would SQL tables!

In [10]:
#Get the "people" collection (like the database, it is only created on the first insert)
collection = db.people

### Let's make the type of data we want to add. 

Same as yesterday with Forge full-timers: 

In [11]:
people = ['Daniel Willson', 'Andy Page', 'Kaleigh Watson', 'Amanda Coombs']
job_title = ['VPP', 'ED', 'LPD', 'COO']
schools = ['UVA', 'UVA', 'UVA', 'UVA']
workplace = ['Forge', 'Forge', 'Forge', 'Forge']


people = pd.DataFrame({'name': people, 'job': job_title, 'alma_mater': schools,
                       'workplace': workplace})

In [12]:
school_name = ['UVA', 'VT']
school_type = ['Public', 'Public']
school_size = [16000, 30000]

company_name = ['Forge', 'Astraea']
company_type = ['501(c)(3)', 'For-profit startup']


schools = pd.DataFrame({'name': school_name, 'type': school_type, 'size': school_size})
companies = pd.DataFrame({'name': company_name, 'type': company_type})

In [13]:
people

             name  job  alma_mater  workplace
0  Daniel Willson  VPP         UVA      Forge
1       Andy Page   ED         UVA      Forge
2  Kaleigh Watson  LPD         UVA      Forge
3   Amanda Coombs  COO         UVA      Forge


In [14]:
schools

   name    type   size
0   UVA  Public  16000
1    VT  Public  30000


In [15]:
companies

      name                type
0    Forge           501(c)(3)
1  Astraea  For-profit startup


### Restructuring our data 

To insert our data into MongoDB, we want it in the form of Python dictionaries.

This lets us nest our objects properly, and it is much more convenient and Pythonic than building literal string representations.

In [16]:
#Isolating just Daniel's row as an example: 
dan = people.iloc[0]

In [17]:
dan

name          Daniel Willson
job                      VPP
alma_mater               UVA
workplace              Forge
Name: 0, dtype: object

In [25]:
#Let's see how we can iterate over these properties
#(items() yields label/value pairs, avoiding positional dan[i] lookups on a labelled Series)
for key, value in dan.items():
    print(key, value)

name Daniel Willson
job VPP
alma_mater UVA
workplace Forge


In [26]:
#Making a dictionary in flat structure (dan.to_dict() gives the same result):
daniel_dictionary = {}
for key, value in dan.items():
    daniel_dictionary[key] = value

In [28]:
daniel_dictionary["name"]

'Daniel Willson'

In [29]:
#Insert it into the database with insert_one
collection.insert_one(daniel_dictionary)

<pymongo.results.InsertOneResult at 0x118d5fe88>
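
The loop above handles a single row. As a minimal sketch of doing the same for everyone at once, the flat lists from earlier can be zipped into one dictionary per person using plain Python. The names `names` and `records` are introduced here for illustration only, and the final `insert_many` call is left as a comment because it assumes the running server and the `collection` object from the earlier cells.

```python
# Same values as the lists defined earlier in the notebook
names = ['Daniel Willson', 'Andy Page', 'Kaleigh Watson', 'Amanda Coombs']
job_title = ['VPP', 'ED', 'LPD', 'COO']
schools = ['UVA', 'UVA', 'UVA', 'UVA']
workplace = ['Forge', 'Forge', 'Forge', 'Forge']

# Build one flat dictionary per person (hypothetical variable name: records)
records = [
    {'name': n, 'job': j, 'alma_mater': s, 'workplace': w}
    for n, j, s, w in zip(names, job_title, schools, workplace)
]

print(records[0])

# With a running server, all of them could be inserted in one call:
# collection.insert_many(records)
```

If the data is already in a DataFrame, `people.to_dict('records')` should produce the same list of dictionaries. Note that inserting every record here would add Daniel a second time, since he was already inserted above.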

### Insert successful!

Since the value stored under a dictionary key can itself be a dictionary (read that again!), it's easy to build the nested structure we discussed earlier.

In [38]:
nest = {'name':'Ben',
        'lname':'Artuso',
        
        'position':'Data Scientist',
        
        'company':{
            'name':'Astraea',
            'location': {
                'city':'Charlottesville',
                'state':'Virginia'
            },
            'domain':'GeoAI'
        },
        'pets':[
            {'name':'Ozzie',
            'species':'Dog',
            'breed':'Cocker Spaniel'},
            {'name':'Pip',
            'species':'Dog',
            'breed':'Yorkie'}
        ]}

In [39]:
collection.insert_one(nest)

<pymongo.results.InsertOneResult at 0x118d64608>

### Simple queries

In [42]:
collection.find_one({'name':'Ben'})

{'_id': ObjectId('60b7bd00a70826c53dd6385b'),
 'name': 'Ben',
 'position': 'Data Scientist',
 'company': {'name': 'Astraea',
  'location': {'city': 'Charlottesville', 'state': 'Virginia'},
  'domain': 'GeoAI'},
 'pets': [{'name': 'Ozzie', 'species': 'Dog', 'breed': 'Cocker Spaniel'},
  {'name': 'Pip', 'species': 'Dog', 'breed': 'Yorkie'}]}

In [45]:
#Ben was inserted twice (the first time without 'lname') so you can see find() return multiple documents:
for val in collection.find({'name':'Ben'}):
    print(val)
    print("\n\n")

{'_id': ObjectId('60b7bd00a70826c53dd6385b'), 'name': 'Ben', 'position': 'Data Scientist', 'company': {'name': 'Astraea', 'location': {'city': 'Charlottesville', 'state': 'Virginia'}, 'domain': 'GeoAI'}, 'pets': [{'name': 'Ozzie', 'species': 'Dog', 'breed': 'Cocker Spaniel'}, {'name': 'Pip', 'species': 'Dog', 'breed': 'Yorkie'}]}



{'_id': ObjectId('60b7bdaca70826c53dd6385c'), 'name': 'Ben', 'lname': 'Artuso', 'position': 'Data Scientist', 'company': {'name': 'Astraea', 'location': {'city': 'Charlottesville', 'state': 'Virginia'}, 'domain': 'GeoAI'}, 'pets': [{'name': 'Ozzie', 'species': 'Dog', 'breed': 'Cocker Spaniel'}, {'name': 'Pip', 'species': 'Dog', 'breed': 'Yorkie'}]}





In [47]:
#how many docs are in the entire collection? 
collection.count_documents({})

3
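
Fields inside a nested document can be queried with MongoDB's dot notation, for example `collection.find_one({'company.location.city': 'Charlottesville'})`. The sketch below is a simplified, pure-Python illustration of how such a dotted path walks into a nested dictionary. `get_path` is a hypothetical helper written for this example, not part of pymongo, and it ignores arrays and the other matching rules the real server applies.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'company.location.city' into nested dicts.

    Hypothetical helper for illustration only; returns None if any key is missing.
    """
    current = doc
    for key in dotted_path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# A trimmed-down copy of the nested document inserted above
ben = {
    'name': 'Ben',
    'company': {
        'name': 'Astraea',
        'location': {'city': 'Charlottesville', 'state': 'Virginia'},
    },
}

print(get_path(ben, 'company.location.city'))  # Charlottesville
print(get_path(ben, 'company.founder'))        # None (no such key)

# The equivalent query against the running server would be:
# collection.find_one({'company.location.city': 'Charlottesville'})
```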

### But wait - how do I programmatically insert data in the right structure, i.e. as nested objects? How can I even do that? 

In the words of Launch instructors past:

In [48]:
print("Figure it out ;)")

Figure it out ;)
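
As a starting point, here is one possible approach sketched in plain Python: treat the schools and companies data as lookups keyed by name, then replace each person's `alma_mater` and `workplace` strings with the matching dictionary. The `nest_person` helper and the lookup variable names are hypothetical, and this is only one of several reasonable designs, not the intended solution.

```python
# Lookup tables keyed by name, mirroring the schools and companies data above
schools_by_name = {
    'UVA': {'name': 'UVA', 'type': 'Public', 'size': 16000},
    'VT': {'name': 'VT', 'type': 'Public', 'size': 30000},
}
companies_by_name = {
    'Forge': {'name': 'Forge', 'type': '501(c)(3)'},
    'Astraea': {'name': 'Astraea', 'type': 'For-profit startup'},
}


def nest_person(person):
    """Return a copy of a flat person dict with school and company embedded."""
    nested = dict(person)  # shallow copy so the flat input is not modified
    # .get() returns None when the school or company is not in the lookup
    nested['alma_mater'] = schools_by_name.get(person['alma_mater'])
    nested['workplace'] = companies_by_name.get(person['workplace'])
    return nested


flat = {'name': 'Andy Page', 'job': 'ED', 'alma_mater': 'UVA', 'workplace': 'Forge'}
doc = nest_person(flat)
print(doc)

# With the server running, the nested document could then be inserted:
# collection.insert_one(doc)
```

Starting from the DataFrames instead, a lookup of the same shape could likely be built with something like `schools.set_index('name', drop=False).to_dict('index')`.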
