
# Fundamentals of Big Data
## Notes 1.1, Document Databases with MongoDB

---

## How do I obtain an example mongo db on the cloud?

* go to mongodb.com
* TRY FREE
    * SIGN UP (with any email)
    * SKIP any optional pages (press "Skip" wherever it is offered)
    * CREATE **FREE** CLUSTER
    * GREEN **"CREATE CLUSTER"** BUTTON in BOTTOM RIGHT

* Clusters Screen
    * Under the Cluster0
    * Press CONNECT
    * Press "Add Your Current IP Address"
        * Press "Add IP Address"
    * Create a User
        * eg., admin/1234
    * Press "Create MongoDB User"
    * Press "Choose a connection method"
    * Press "Connect to Your Application"
        * Choose "Python" in drop down, and version "3.6+"
        * Choose "full driver example"
            * Press "Copy" 
            * Create a new notebook and paste into a cell 
            * change `<password>` to `1234`


### Part 2:
* Run `!pip install pymongo dnspython` until it reads `Requirement already satisfied`
* Add `import pymongo`
* Change `<password>` in the connection string to the password you set for the admin user
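
If the password contains characters such as `@`, `:` or `/`, it needs to be percent-encoded before it goes into the connection string. A minimal sketch, assuming a hypothetical host `cluster0.example.mongodb.net` and a helper name `build_mongo_uri` (neither comes from these notes, and the helper is not part of pymongo):

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host):
    """Return a mongodb+srv URI with the credentials percent-encoded."""
    # Characters such as '@', ':' and '/' would otherwise break the URI.
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Hypothetical host: use the one from your own copied connection string.
uri = build_mongo_uri("admin", "1234", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can then be passed to `pymongo.MongoClient(uri)` in place of the hand-edited one.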

## How do I install MongoDB python libraries?

In [None]:
!pip install pymongo dnspython



## How do I import mongo?

In [None]:
import pymongo

## How do I connect to a running mongo instance?

In [None]:
client = pymongo.MongoClient("127.0.0.1")

## How do I select a database?

Select the `test` database from the connection:

In [None]:
db = client.test

In [None]:
db

Database(MongoClient(host=['127.0.0.1:27017'], document_class=dict, tz_aware=False, connect=True), 'test')

## How can I use mongo locally?

Mongo provides a querying shell:

```bash 

(base) michael@192 ~ % mongo
MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017
Implicit session: session { "id" : UUID("903f64e2-ed3d-417f-94ff-08859615b6a8") }
MongoDB server version: 4.4.1
Welcome to the MongoDB shell.
```

    

At the shell prompt you can, for example, create a user:

```bash
> db.createUser( { "user": "admin", "pwd": "1234", "customData": { "employeeId": 12345 },
...                  "roles": [ { role: "clusterAdmin", db: "admin" }, "readWrite"] },
...                { "w": "majority" , "wtimeout": 5000 } )

Successfully added user: {
	"user" : "admin",
	"customData" : {
		"employeeId" : 12345
	},
	"roles" : [
		{
			"role" : "clusterAdmin",
			"db" : "admin"
		},
		"readWrite"
	]
}
> 

```

```bash

> show databases
admin   0.000GB
config  0.000GB
local   0.000GB
test    0.000GB
> use test
switched to db test
> db.people.find()
{ "_id" : ObjectId("5fbcff4626d654e250c0d8d8"), "name" : "Sherlock", "age" : 18, "fav_hat" : "deer stalker", "location" : "Baker Street", "history" : [ { "location" : "Manchester", "postcode" : "MA1 1AP" }, { "location" : "Paris", "postcode" : "Notre" }, { "location" : "New York", "postcode" : "90210" } ] }
{ "_id" : ObjectId("5fbd03e726d654e250c0d8d9"), "name" : "Michael", "fav_color" : "purple", "age" : 18, "location" : "Old Street", "history" : [ { "location" : "Leeds", "postcode" : "LS1 1LU" }, { "location" : "Paris", "postcode" : "Notre" }, { "location" : "New York", "postcode" : "90210" } ] }
> 


```

## How do I choose a `collection` (aka. table)?

Access a collection as `db.COLLECTION_NAME`. If it doesn't exist, Mongo creates it when the first document is inserted:

In [None]:
db.people

Collection(Database(MongoClient(host=['127.0.0.1:27017'], document_class=dict, tz_aware=False, connect=True), 'test'), 'people')

There is no need for a "create table" step (esp. because there is no explicit schema).

## What are documents?

In [None]:
person_document1 = {
    'name': 'Michael',
    'fav_color': 'purple',
    'age': 18,
    'location': 'Old Street',
    'history': [
        {'location': 'Leeds', 'postcode': 'LS1 1LU'},
        {'location': 'Paris', 'postcode': 'Notre'},
        {'location': 'New York', 'postcode': '90210'},
    ],
}

In [None]:
person_document2 = {
    'name': 'Sherlock',
    'age': 18,
    'fav_hat': 'deer stalker',
    'location': 'Baker Street',
    'history': [
        {'location': 'Manchester', 'postcode': 'MA1 1AP'},
        {'location': 'Paris', 'postcode': 'Notre'},
        {'location': 'New York', 'postcode': '90210'},
    ]
}

## How do I insert documents?

In [None]:
db.people.insert_one(person_document1)

<pymongo.results.InsertOneResult at 0x7f98405a3e00>
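
Several documents can be inserted in one call with `insert_many`. A minimal sketch, assuming a helper name `insert_people` that is purely illustrative; it takes the collection as an argument so it works with any pymongo collection:

```python
def insert_people(collection, documents):
    """Insert several documents in one call and return their generated ids."""
    # insert_many expects a non-empty list of dicts.
    result = collection.insert_many(list(documents))
    # inserted_ids holds one _id per document, in insertion order.
    return result.inserted_ids
```

For example, `insert_people(db.people, [person_document1, person_document2])`. Note that pymongo adds an `_id` key to each dict it inserts, so inserting the same dict object a second time should fail with a duplicate key error.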

## How do I select ("find") documents?

In Python, calling `find()` does not run the query; it returns a cursor:

In [None]:
db.people.find()

<pymongo.cursor.Cursor at 0x7f98600cdd30>

...this does not compute the result set. It just sets up the query.

You can compute the results using `list()` (ie., convert the cursor to a list):

In [None]:
list(db.people.find())

[{'_id': ObjectId('5fbcff4626d654e250c0d8d8'),
  'name': 'Sherlock',
  'age': 18,
  'fav_hat': 'deer stalker',
  'location': 'Baker Street',
  'history': [{'location': 'Manchester', 'postcode': 'MA1 1AP'},
   {'location': 'Paris', 'postcode': 'Notre'},
   {'location': 'New York', 'postcode': '90210'}]},
 {'_id': ObjectId('5fbd03e726d654e250c0d8d9'),
  'name': 'Michael',
  'fav_color': 'purple',
  'age': 18,
  'location': 'Old Street',
  'history': [{'location': 'Leeds', 'postcode': 'LS1 1LU'},
   {'location': 'Paris', 'postcode': 'Notre'},
   {'location': 'New York', 'postcode': '90210'}]}]

The cursor is "load on demand" (ie., it's an iterator), so results are computed only as you iterate:

In [None]:
for doc in db.people.find():
    print(doc['name'])

Sherlock
Michael


Above, each name is produced "on-demand"; the whole result set is never held in memory at once.
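
The same on-demand behaviour can be seen without a database. The sketch below uses a plain Python generator, `fake_cursor`, as a stand-in for a cursor; it is an illustration only, and a real pymongo cursor fetches documents from the server in batches rather than strictly one at a time:

```python
def fake_cursor(documents, produced):
    """Stand-in for a cursor: yields documents one at a time."""
    for doc in documents:
        produced.append(doc["name"])  # note when each document is handed over
        yield doc


produced = []
cursor = fake_cursor([{"name": "Sherlock"}, {"name": "Michael"}], produced)
before = list(produced)  # still empty: creating the iterator does no work
first = next(cursor)     # pull exactly one document
after = list(produced)   # only the first document has been produced so far
print(before, first["name"], after)
```

Only the document that was actually requested is produced; the second one stays untouched until the loop (or another `next()`) asks for it.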

## Exercise (20 minutes)

* Review this file and the additional Mongo notes
* Create a collection of documents for yourself
    * eg., create a collection of film reviews
* Query this collection using `.find()`
    * eg., obtain all reviews, reviews matching a condition, etc.
* Review the mongo documentation
    * eg.,  https://docs.mongodb.com/manual/reference/method/db.collection.find/
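
As a starting point for the "matching a condition" part, `.find()` accepts a filter document and, optionally, a projection. A minimal sketch against the `people` collection used in these notes; the helper name `names_at_least` is an assumption made for illustration:

```python
def names_at_least(collection, min_age):
    """Return the names of people whose age is at least min_age."""
    query = {"age": {"$gte": min_age}}  # filter: age >= min_age
    projection = {"name": 1, "_id": 0}  # return only the name field
    return [doc["name"] for doc in collection.find(query, projection)]
```

For example, `names_at_least(db.people, 18)`. The same pattern should carry over to a film reviews collection by swapping the field names.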

## What is a use case of a mixed schema document database?

Almost all applications, and almost all developers, require and should use a schema. 

Mixed schemas suit a few cases, eg., advertising, dating websites,.... 

Consider a website which tracks users and fills in demographic & profile information whenever it can be obtained/inferred. 


```
user1 = {
  age:
  location:
  gender:
  purchases:
  visithistory:
  events: 
  //...1000s 
}

user2 = {
 age:
 location:
 lastvisit:
}


user3 = {
 newfield:
}

```
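
Code that reads such documents has to cope with fields that may be absent. A minimal sketch using `dict.get` with defaults; the helper `describe_user` and the two sample documents are made up for illustration:

```python
def describe_user(user):
    """Summarise a user document whose fields may be missing."""
    age = user.get("age")  # None when the field was never filled in
    location = user.get("location", "unknown")
    purchases = user.get("purchases", [])
    return {
        "age": age,
        "location": location,
        "n_purchases": len(purchases),
    }


# Made-up documents with different sets of fields.
user_a = {"age": 18, "location": "Old Street", "purchases": ["hat", "pipe"]}
user_b = {"newfield": True}
print(describe_user(user_a))
print(describe_user(user_b))
```

Every document goes through the same function, whichever fields it happens to carry.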

In [None]:
def predict_likewine():
    pass

def predict_likefilm():
    pass