## 12-Web-Scraping-and-Document-Databases - Day 1 - Mastering MongoDB

### Class Objectives

* `Create` and `connect` to local MongoDB databases
* `Create`, `read`, `update`, and `delete` MongoDB documents using the Mongo Shell
* Create simple Python applications that `connect to` and `modify` MongoDB databases using the PyMongo library


### Resources:
* [The difference between NoSQL and SQL](https://www.geeksforgeeks.org/difference-between-sql-and-nosql/)
* [Great 7 min read on NoSQL](https://medium.com/better-programming/introduction-to-nosql-databases-7f6ed6e055c5)
* [Mongo in 30 minutes](https://www.youtube.com/watch?v=pWbMrx5rVBE)
* [Python Requests](http://docs.python-requests.org/en/master/)
* [Webscraping with BeautifulSoup](https://www.dataquest.io/blog/web-scraping-tutorial-python/)
* [Python Splinter](https://splinter.readthedocs.io/en/latest/)

### Install:
* [MongoDB download](https://www.mongodb.com/download-center/community)
* [Install MongoDB on macOS](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/)
* Windows: Add MongoDB to your PATH `C:\Program Files\MongoDB\Server\4.4\bin`
* `conda install pymongo -y` (Run as Administrator)
* `pip install pymongo`


### Add MongoDB to Windows PATH:

* Copy the path to your MongoDB `bin` directory, for example `C:\Program Files\MongoDB\Server\4.4\bin`. The version number in the path depends on the release you installed.

![windows_path_windows_01.png](images/windows_path_windows_01.png)

![windows_path_windows_02.png](images/windows_path_windows_02.png)

![windows_path_windows_03.png](images/windows_path_windows_03.png)



# ==========================================

### 1.01 Instructor Do: Basic MongoDB Queries (0:15)

# Query 1 - Creating dbs, inserting data and finding data

* Start up a new database by switching to it. The database is not actually created until it contains a collection.

```
use travel_db
```

* Show the current db by running `db`.

```
db
```

* Show current databases in existence

```
show dbs
```

* Create a collection

```
db.createCollection("destinations")
```

* See all collections in a database

```
show collections
```

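* Insert data into the `destinations` collection of the travel_db database with this command.

  - NOTE: If the collection does not exist yet, the insert creates it automatically. The inserted document is essentially a JavaScript object, and here it includes an array.

  - Newer versions of the shell (`mongosh`) mark `insert()` as deprecated in favor of `insertOne()` and `insertMany()`.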

```
db.destinations.insert({"continent": "Africa", "country": "Morocco",
                        "major_cities": ["Casablanca", "Fez", "Marrakech"]})
```

* As a class, come up with 3-5 more countries and insert them into the db using the same syntax as above.

```
db.destinations.insert({"continent": "Europe", "country": "France",
                        "major_cities": ["Paris", "Marseille", "Bordeaux"]})

db.destinations.insert({"continent": "North America", "country": "USA",
                        "major_cities": ["Washington DC", "New York City", "San Francisco"]})
```

* Observe where the data was entered in the MongoDB instance (in mongod)

* Find all data in a Collection with `db.[COLLECTION_NAME].find()`

  - The MongoDB \_id was created automatically.

  - This id is unique to each document in the collection.

```
db.destinations.find()
```

* Adding `.pretty()` makes the data more readable.

```
db.destinations.find().pretty()
```

* Find specific data by matching a field.

```
db.destinations.find({"continent": "Africa"})
db.destinations.find({"country": "Morocco"})
```

* Try a few queries with the examples we came up with as a class.

  - Also, pick something that will find more than one entry so we can see how it will return all matches.

* Find specific data by matching an \_id.

```
db.destinations.find({_id: ObjectId("<ID Number Here>")})

db.destinations.find({_id: ObjectId("5416fe1d94bcf86cd7854390")})
```
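An `ObjectId` string must be exactly 24 hexadecimal characters, so a quick format check before pasting an id into a query can save a confusing error. The following is a minimal Python sketch using only the standard library. The helper name `looks_like_object_id` is hypothetical, and it only checks the format, not whether a document with that id exists.

```python
import re

# An ObjectId is 12 bytes, written as 24 hexadecimal characters.
_OBJECT_ID_PATTERN = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(text):
    """Return True if text has the 24-hex-character shape of an ObjectId."""
    return bool(_OBJECT_ID_PATTERN.fullmatch(text))


print(looks_like_object_id("5fe41b1af81430bb63ab2b0e"))    # True: 24 hex characters
print(looks_like_object_id("5fe41b1af81430bb63ab2b0e99"))  # False: 26 characters
```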


# ==========================================

### 1.02 Students Do: Mongo Class (0:10)

## Instructions

* Use the command line to create a `ClassDB` database

* Insert entries into this database for yourself and the people around you within a collection called `classroom`

* Each document should have a field of `name` with the person's name, a field of `favorite_python_library` for the person's favorite Python library, a field of `age` for the person's age, and a field of `hobbies` which will hold a list of that person's hobbies.

* Use `find()` to get a list of everyone of a specific age, then use `name` to retrieve the entry for a single person.


# Solution

## A. Use the command line to create a `ClassDB` database.

* Insert entries for yourself and the people in your row into a `classroom` collection. Each document should have:

1. A `name` field with the person's name.
2. An `age` field with the person's age.
3. A `favorite_python_library` field with the person's favorite Python library, e.g. pandas.
4. A `hobbies` field holding a list of the person's hobbies.


## Example:

```
// Select the database
use ClassDB

// Insert a document
db.classroom.insert({name: 'Mariah', age: 23, favorite_python_library: 'Seaborn', hobbies: ['Coding', 'Reading', 'Running']})

// Insert a document
db.classroom.insert({name: 'Ricky', age: 34, favorite_python_library: 'Matplotlib', hobbies: ['Not Coding', 'Not Reading', 'Not Running', 'Guitar']})

// Insert a document
db.classroom.insert({name: 'Srikanth', age: 28, favorite_python_library: 'Pandas', hobbies: ['Netflix', 'Guitar', 'Traveling']})
```

## B. Use find commands to get:

1. A list of everyone of a certain age.


```
db.classroom.find({age: 23}).pretty()
```

2. An entry for a single person.


```
db.classroom.find({name: 'Ricky'}).pretty()
```

# ==========================================

### 1.03 Instructor Do: Removing, Updating and Dropping in MongoDB (0:10)

# Update, Delete and Drop in MongoDB

* Use the travel_db

```shell
db
use travel_db
```

* Insert two countries in Africa

```shell
db.destinations.insert({'country': 'Egypt', 'continent': 'Africa', major_cities: ['Cairo', 'Luxor']})
db.destinations.insert({'country': 'Nigeria', 'continent': 'Africa', major_cities: ['Lagos', 'Kano']})
```

* Update data using `db.[COLLECTION_NAME].update()`. For comparison, a roughly equivalent SQL statement would be:

```sql
UPDATE destinations SET "continent"='Antarctica' WHERE "country"= 'Egypt' LIMIT 1
```

* The MongoDB version:

```shell
db.destinations.update({"country": "Egypt"}, {$set: {"continent": "Antarctica"}})
```

* Note that the above will only update the first entry it matches.

* To update multiple entries, add `{multi: true}`. All countries listed as being in Africa will now show Antarctica as their continent. In SQL terms:

```sql
UPDATE destinations SET "continent"='Antarctica' WHERE "continent"= 'Africa'
```

```shell
db.destinations.update({"continent": "Africa"}, {$set: {"continent": "Antarctica"}}, {multi: true})
```

* Alternatively, we can use this syntax to update more than one record.

```shell
db.destinations.updateMany({"continent": "Africa"}, {$set: {"continent": "Antarctica"}})
```

* Q: What do you think will happen when you run this command, even though the `capital` field doesn't exist yet?

```shell
db.destinations.update({"country": "Morocco"}, {$set: {"capital": "Rabat"}})
```

* A: It adds the `capital` field to the document. Once the field exists, the same command updates its value.

```shell
db.destinations.update({"country": "Morocco"}, {$set: {"capital": "RABAT"}})
```

* Push to an array with `$push`.

```shell
db.destinations.update({"country": "Morocco"}, {$push: {"major_cities": "Agadir"}})
```

* The upsert option updates a document if one exists; it otherwise creates a new document.

```shell
db.destinations.update({'country': 'Canada'}, {$set: {'capital': 'Ottawa'}}, {upsert: true})
```

* Delete a single entry with `db.[COLLECTION_NAME].remove(<query>, {justOne: true})`.

```shell
db.destinations.remove({"country": "Morocco"}, {justOne: true})
```

* Empty a collection with `db.[COLLECTION_NAME].remove({})`.

```shell
db.destinations.remove({})
```

* Drop a collection with `db.[COLLECTION_NAME].drop()`.

```shell
db.destinations.drop()
```

* Drop a database

```shell
db.dropDatabase()
```

# ==========================================

### 1.04 Students Do: Dumpster DB (0:15)

## Instructions

* Create and use a new database called `Dumpster_DB` using the Mongo shell.

* Create a collection called `divers` which will contain a string field for `name`, an integer field for `yearsDiving`, a boolean field for `stillDiving`, and an array of strings for `bestFinds`.

* Insert three new documents into the collection. Be creative with what you put in here and have some fun with it.

* Update the `yearsDiving` fields for your documents so that they are one greater than their original values.

* Update the `stillDiving` value for one of the documents so that it is now false.

* Push a new value into the `bestFinds` array for one of the documents.

* Look through the collection, find the diver with the smallest number of `bestFinds`, and remove it from the collection.


# Solution

* Create and use a database called `Dumpster_DB`.

```
use Dumpster_DB
```

* Create the `divers` collection by inserting three documents into it

```shell
db.divers.insert({"name":"Davey", "yearsDiving":10, "stillDiving": true, "bestFinds":["Flat Screen", "Ruby Collar", "$100"]})

db.divers.insert({"name":"Jeanie", "yearsDiving":1, "stillDiving": true, "bestFinds":["Movie Theater Chairs", "Music Box"]})

db.divers.insert({"name":"Boppo", "yearsDiving":5, "stillDiving": true, "bestFinds":["Half-Eaten Hamburger", "Some Goop"]})
```

* Update `yearsDiving` so that it is one year higher for each diver

```shell
db.divers.update({"name":"Davey"},{$set:{"yearsDiving":11}})
db.divers.update({"name":"Jeanie"},{$set:{"yearsDiving":2}})
db.divers.update({"name":"Boppo"},{$set:{"yearsDiving":6}})
```
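The solution above hard-codes each new value with `$set`. MongoDB also has an `$inc` update operator that adds to the current value, so the update does not need to know the original number. Below is a minimal Python sketch of building such an update document. The helper names are hypothetical, and `collection` is assumed to be a PyMongo collection such as `client.Dumpster_DB.divers` (or any object with an `update_many` method).

```python
def build_increment(field, amount=1):
    """Build an update document that adds amount to field using $inc."""
    return {"$inc": {field: amount}}


def add_one_year_to_all(collection):
    """Increment yearsDiving by one for every diver in the collection.

    The empty filter {} matches every document.
    """
    return collection.update_many({}, build_increment("yearsDiving"))


print(build_increment("yearsDiving"))  # {'$inc': {'yearsDiving': 1}}
```

The same idea in the shell would be `db.divers.updateMany({}, {$inc: {"yearsDiving": 1}})`.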

* Update `stillDiving` to `false` for Davey

```shell
db.divers.update({"name":"Davey"},{$set:{"stillDiving": false}})
```

* Add a new value to Jeanie's `bestFinds`

```shell
db.divers.update({"name":"Jeanie"},{$push:{"bestFinds":"Mona Lisa"}})
```

* Remove `Boppo` from the collection

```shell
db.divers.remove({"name":"Boppo"})
```
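The solution picks Boppo by eye. With more documents, the diver with the fewest `bestFinds` could be found programmatically. The sketch below works on plain Python dictionaries shaped like the documents above (the same shape PyMongo returns from `find()`); the function name `diver_with_fewest_finds` is hypothetical.

```python
def diver_with_fewest_finds(divers):
    """Return the document whose bestFinds list is shortest (first one on ties)."""
    return min(divers, key=lambda diver: len(diver.get("bestFinds", [])))


# Sample documents matching the collection after Jeanie's $push.
divers = [
    {"name": "Davey", "bestFinds": ["Flat Screen", "Ruby Collar", "$100"]},
    {"name": "Jeanie", "bestFinds": ["Movie Theater Chairs", "Music Box", "Mona Lisa"]},
    {"name": "Boppo", "bestFinds": ["Half-Eaten Hamburger", "Some Goop"]},
]

fewest = diver_with_fewest_finds(divers)
print(fewest["name"])  # Boppo
```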

# ==========================================

### 1.04.2 Instructor Do: Mongo Compass (0:10)

* Download [MongoDB Compass](https://www.mongodb.com/products/compass) if you did not already install it along with MongoDB Server.

![Mongo Compass Connect](1/Images/07-MongoCompass_Connect.png)

* After clicking the `CONNECT` button, you should be able to view a list of all of the MongoDB databases hosted on your localhost server.

* Clicking on a database's name will take you to a list of all of the collections stored on that database.
* Clicking on a collection name will then take you into a view in which you can peruse all of that collection's documents.

![Compass Docs View](1/Images/07-MongoCompass_DocsView.png)

* When inside of the Document Viewer, you can create, read, update, and even delete data using the GUI.
* You can also view your data as a table if you prefer.

# ==========================================

### 1.04.3 Students Do: Compass Playground (0:05)

* Now that MongoDB Compass is installed on your computer, take some time to play around with the application.

# ==========================================

### BREAK (0:10)

# ==========================================

### 1.05 Instructor Do: Introduction to PyMongo (0:15)

[MongoDB Compass](https://www.mongodb.com/products/compass)

In [None]:
# !conda install pymongo -y

In [None]:
# !pip install pymongo

In [1]:
# Module used to connect Python with MongoDB
import pymongo

In [6]:
# The default port used by MongoDB is 27017
# https://docs.mongodb.com/manual/reference/default-mongodb-port/
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Define the 'ClassDB' database in Mongo
db = client.ClassDB

In [7]:
# db = client.DataClass

In [8]:
# Query all students
# Here, db.classroom refers to the 'classroom' collection
classroom = db.classroom.find()

# Iterate through each student in the collection
for student in classroom:
    print(student)

In [9]:
# Insert a document into the 'classroom' collection
db.classroom.insert_one(
    {
        'name': 'Ahmed',
        'row': 3,
        'favorite_python_library': 'Matplotlib',
        'hobbies': ['Running', 'Stargazing', 'Reading']
    }
)

# query the classroom collection
classroom = db.classroom.find()

# see change in collection
for student in classroom:
    print(student)

{'_id': ObjectId('5fe41b1af81430bb63ab2b0e'), 'name': 'Ahmed', 'row': 3, 'favorite_python_library': 'Matplotlib', 'hobbies': ['Running', 'Stargazing', 'Reading']}


In [11]:
# Update a document
db.classroom.update_one(
    {'name': 'Ahmed'},
    {'$set':
        {'row': 4}
     }
)

# query the classroom collection
classroom = db.classroom.find()

# see change in collection
for student in classroom:
    print(student)

{'_id': ObjectId('5fe41b1af81430bb63ab2b0e'), 'name': 'Ahmed', 'row': 4, 'favorite_python_library': 'Matplotlib', 'hobbies': ['Running', 'Stargazing', 'Reading']}


In [12]:
# Add an item to a document array
db.classroom.update_one(
    {'name': 'Ahmed'},
    {'$push':
        {'hobbies': 'Listening to country music'}
     }
)

# query the classroom collection
classroom = db.classroom.find()

# see change in collection
for student in classroom:
    print(student)

{'_id': ObjectId('5fe41b1af81430bb63ab2b0e'), 'name': 'Ahmed', 'row': 4, 'favorite_python_library': 'Matplotlib', 'hobbies': ['Running', 'Stargazing', 'Reading', 'Listening to country music']}


In [13]:
# Delete a field from a document
db.classroom.update_one({'name': 'Ahmed'},
                        {'$unset':
                         {'row': ""}
                         }
                        )

# query the classroom collection
classroom = db.classroom.find()

# see change in collection
for student in classroom:
    print(student)

{'_id': ObjectId('5fe41b1af81430bb63ab2b0e'), 'name': 'Ahmed', 'favorite_python_library': 'Matplotlib', 'hobbies': ['Running', 'Stargazing', 'Reading', 'Listening to country music']}


In [14]:
from pprint import pprint

# Delete a document from a collection
db.classroom.delete_one(
    {'name': 'Ahmed'}
)

# query the classroom collection
classroom = db.classroom.find()

# see change in collection
for student in classroom:
    pprint(student)
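The cells above use `update_one` and `delete_one`, which each touch a single document. PyMongo also provides `update_many` and an `upsert` keyword argument, which correspond to the `{multi: true}` and `{upsert: true}` options from the shell section. A minimal sketch follows; the helper names are hypothetical, and `collection` is assumed to be a PyMongo collection such as `client.travel_db.destinations`.

```python
def set_continent_first_match(collection, old, new):
    """Update only the first document whose continent equals old."""
    return collection.update_one({"continent": old}, {"$set": {"continent": new}})


def set_continent_all_matches(collection, old, new):
    """Update every document whose continent equals old."""
    return collection.update_many({"continent": old}, {"$set": {"continent": new}})


def upsert_capital(collection, country, capital):
    """Set the capital, creating the country document if it does not exist."""
    return collection.update_one(
        {"country": country}, {"$set": {"capital": capital}}, upsert=True
    )
```

With a running server, `set_continent_all_matches(client.travel_db.destinations, "Africa", "Antarctica").modified_count` would report how many documents changed.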

# ==========================================

### 1.06 Students Do: Mongo Grove (0:25)

## Instructions

* You are the purchaser for the produce department of a large supermarket chain. You decide to use MongoDB to create a database of fruits received from your various suppliers.

### Part I

* Use PyMongo to create a `fruits_db` database, and a `fruits` collection.

* Into that collection, insert two documents of fruit shipments received by your supermarket. They should contain the following information: vendor name, type of fruit, quantity received, and ripeness rating (1 for unripe, 2 for ripe, 3 for over-ripe).

### Part II

* Because not every supermarket employee is versed in using MongoDB, your task is to build an easy-to-use app that can be run from the console.

* Build a Python script that asks the user for the above information, then inserts a document into a MongoDB database.

### Part III

* Modify the app so that when the record is entered, the current date and time is automatically inserted into the document.

* Hint: consult the [documentation](https://docs.python.org/3/library/datetime.html) on the `datetime` library.


In [None]:
# Dependencies
import pymongo
import datetime

In [None]:
# The default port used by MongoDB is 27017
# https://docs.mongodb.com/manual/reference/default-mongodb-port/
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Declare the database
db = client.fruits_db

# Declare the collection
collection = db.fruits

In [None]:
# Part I
# A dictionary that represents the document to be inserted
post = {
    'vendor': 'fruit star',
    'fruit': 'raspberry',
    'quantity': 21,
    'ripeness': 2,
    'date': datetime.datetime.utcnow()
}
# Insert the document into the database
# The database and collection, if they don't already exist, will be created at this point.
collection.insert_one(post)

In [None]:
# Part II
# Ask the user for input. Store information into variables.
vendor = input('Vendor name: ')
fruit_type = input('Type of fruit: ')
# input() returns strings, so convert the numeric fields to int to match Part I
quantity = int(input('Number of boxes received: '))
ripeness = int(input('Ripeness of fruit (1 is unripe, 2 is ripe, 3 is over-ripe): '))

# A dictionary that will become a MongoDB document
post = {
    'vendor': vendor,
    'fruit': fruit_type,
    'quantity': quantity,
    'ripeness': ripeness,
    'date': datetime.datetime.utcnow()
}

# Insert document into collection
collection.insert_one(post)

In [None]:
# Verify results:
from pprint import pprint

results = collection.find()
for result in results:
    pprint(result)
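Because `input()` always returns a string, the numeric fields need converting and checking before they go into the database. The sketch below shows one way to do that with only the standard library. The function names `parse_ripeness` and `build_shipment` are hypothetical, and the timezone-aware timestamp is a substitute for `utcnow()`, which is deprecated as of Python 3.12.

```python
import datetime


def parse_ripeness(text):
    """Convert user input to a ripeness rating of 1, 2, or 3."""
    rating = int(text.strip())  # raises ValueError for non-numeric input
    if rating not in (1, 2, 3):
        raise ValueError("ripeness must be 1, 2, or 3")
    return rating


def build_shipment(vendor, fruit, quantity_text, ripeness_text):
    """Build the document for one shipment, with numeric fields stored as ints."""
    return {
        "vendor": vendor.strip(),
        "fruit": fruit.strip(),
        "quantity": int(quantity_text.strip()),
        "ripeness": parse_ripeness(ripeness_text),
        # Timezone-aware UTC timestamp instead of the deprecated utcnow().
        "date": datetime.datetime.now(datetime.timezone.utc),
    }


print(build_shipment("fruit star", "raspberry", "21", "2"))
```

The returned dictionary can be passed straight to `collection.insert_one(...)`, and wrapping the call in `try`/`except ValueError` lets the app ask the user again on bad input.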

# ==========================================

### Rating Class Objectives

* Rate your understanding of each objective on a scale of 1 to 5.

In [None]:
objectives = [
    "Create and connect to local MongoDB databases",
    "Create, read, update, and delete MongoDB documents using the Mongo Shell",
    "Create simple Python applications that connect to and modify MongoDB databases using the PyMongo library",
]
rating = []
total = 0
for i in range(len(objectives)):
    rate = input(objectives[i]+"? ")
    total += int(rate)
    rating.append(objectives[i] + ". (" + rate + "/5)")
print("="*96)
print("My rating today is:")
print("-"*24)
for i in rating:
    print(i)
print("-"*64)
print("Average: " + str(total/len(objectives)))