# Docker experiments
## Connecting to a Mongo database in a container

We will connect to MongoDB using the pymongo library, which requires a running MongoDB server. If you are using Docker, you can start one with:

```
docker run -d -p 27017:27017 --name myMongoDB mongo:latest
```
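
Before running the notebook cells it can help to confirm that something is listening on the published port. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the check only proves that a TCP port accepts connections, not that the process behind it is MongoDB.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` with the defaults checks `localhost:27017`, the port mapping used in the `docker run` command above.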

We will then:

1. Make a new database called movie_db
2. Create a collection called fave_movies
3. Add some records to the collection
4. Query those records back
5. Add an extra new record and show the number of records has changed

In [1]:
import pandas as pd
from pymongo import MongoClient

In [2]:
# Create a client for the local MongoDB server
# (pymongo connects lazily, when the first operation runs)

client = MongoClient('localhost', 27017)

In [3]:
# Get a handle to a database called "movie_db".
# If it doesn't exist, MongoDB creates it when data is first written to it
db = client["movie_db"]

In [4]:
# Get a handle to a collection called "fave_movies".
# If it doesn't exist, MongoDB creates it when the first document is inserted
collection = db["fave_movies"]

In [5]:
# Make some fake data

sample_movie1 = {
    "Title": "Star Wars",
    "Director": "George Lucas",
    "Year": 1977
}

sample_movie2 = {
    "Title": "Tenet",
    "Director": "Christopher Nolan",
    "Year": 2020
}

In [6]:
all_data = [sample_movie1, sample_movie2]

In [7]:
all_data

[{'Title': 'Star Wars', 'Director': 'George Lucas', 'Year': 1977},
 {'Title': 'Tenet', 'Director': 'Christopher Nolan', 'Year': 2020}]

In [8]:
# Show there are no records in our collection yet
collection.estimated_document_count()

0

In [9]:
# Bulk insert our list of movies

result = collection.insert_many(all_data)
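
One detail worth knowing about this cell: pymongo adds an `_id` key to each dictionary it inserts, modifying the dictionaries in place, so running the cell a second time with the same `all_data` fails with a duplicate-key error. A minimal sketch of one way around this is to insert shallow copies. `insert_copies` is a hypothetical helper, and `collection` is assumed to be a pymongo collection (or anything with a compatible `insert_many`).

```python
def insert_copies(collection, movies):
    """Insert shallow copies of `movies` so the caller's dicts never gain an `_id`.

    Returns the list of ids assigned to the inserted documents.
    """
    # dict(m) makes a shallow copy; pymongo adds `_id` to the copy, not the original
    result = collection.insert_many([dict(m) for m in movies])
    return result.inserted_ids
```

With this helper, `insert_copies(collection, all_data)` can be re-run safely, although each run adds another set of documents to the collection.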

In [10]:
# Query a single record (it's not really random; with no filter, find_one returns the first record)
collection.find_one()

{'_id': ObjectId('603d7a8566f96887a2ebc7d9'),
 'Title': 'Star Wars',
 'Director': 'George Lucas',
 'Year': 1977}
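
Queries can also be narrowed with a filter document. As a minimal sketch, the hypothetical helper below returns every movie by a given director and uses a projection to leave out the `_id` field. It assumes `collection` is the pymongo collection created above and that documents use the `Director` key from the sample data.

```python
def movies_by_director(collection, director):
    """Return all documents whose Director field equals `director`, without `_id`."""
    # First argument: equality filter. Second argument: projection that hides _id.
    cursor = collection.find({"Director": director}, {"_id": 0})
    return list(cursor)
```

For example, `movies_by_director(collection, "George Lucas")` would return a list containing only the Star Wars document.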

In [11]:
# Extract all the records and put them in a DataFrame

pd.DataFrame(collection.find())

| | _id | Title | Director | Year |
|---|---|---|---|---|
| 0 | 603d7a8566f96887a2ebc7d9 | Star Wars | George Lucas | 1977 |
| 1 | 603d7a8566f96887a2ebc7da | Tenet | Christopher Nolan | 2020 |


In [12]:
# Show there are 2 records in our collection so far
collection.estimated_document_count()

2

In [13]:
# Add a new movie with additional fields

sample_movie3 = {
    "Title": "Batman",
    "Director": "Christopher Nolan",
    "Actor": "Christian Bale",
    "Year": 2020,
    "BoxOffice": 10000000
}

result = collection.insert_one(sample_movie3)

In [14]:
# Show the number of records has increased

collection.estimated_document_count()

3
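
`estimated_document_count()` reads the count from collection metadata, which is fast but takes no filter. When a count of matching documents is needed, pymongo's `count_documents()` accepts a filter document instead. The sketch below is a hypothetical wrapper, assuming the same `collection` and the `Director` key from the sample data.

```python
def count_movies(collection, director=None):
    """Count documents, optionally restricted to a single director."""
    # An empty filter {} matches every document in the collection
    query = {} if director is None else {"Director": director}
    return collection.count_documents(query)
```

With the three movies inserted so far, `count_movies(collection, "Christopher Nolan")` should report two documents.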

In [15]:
# If we store the data in a DataFrame, fields missing from the older records come back as NaN (blank)

pd.DataFrame(collection.find())

| | _id | Title | Director | Year | Actor | BoxOffice |
|---|---|---|---|---|---|---|
| 0 | 603d7a8566f96887a2ebc7d9 | Star Wars | George Lucas | 1977 | | |
| 1 | 603d7a8566f96887a2ebc7da | Tenet | Christopher Nolan | 2020 | | |
| 2 | 603d7a8566f96887a2ebc7db | Batman | Christopher Nolan | 2020 | Christian Bale | 10000000.0 |


In [17]:
# If we store the data in a list, each record keeps only the fields it was inserted with

for record in list(collection.find()):
    print(record)

{'_id': ObjectId('603d7a8566f96887a2ebc7d9'), 'Title': 'Star Wars', 'Director': 'George Lucas', 'Year': 1977}
{'_id': ObjectId('603d7a8566f96887a2ebc7da'), 'Title': 'Tenet', 'Director': 'Christopher Nolan', 'Year': 2020}
{'_id': ObjectId('603d7a8566f96887a2ebc7db'), 'Title': 'Batman', 'Director': 'Christopher Nolan', 'Actor': 'Christian Bale', 'Year': 2020, 'BoxOffice': 10000000}
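
The difference between the two views comes from how a table is built out of documents with different keys. The sketch below reproduces the idea in plain Python, without a database or pandas: it collects the union of keys in first-seen order and fills the gaps with `None`. This only approximates what pandas does (pandas fills with `NaN`, which is why `BoxOffice` becomes a float), and the `_id` field is left out for brevity.

```python
# The three sample documents, without the _id field
records = [
    {"Title": "Star Wars", "Director": "George Lucas", "Year": 1977},
    {"Title": "Tenet", "Director": "Christopher Nolan", "Year": 2020},
    {"Title": "Batman", "Director": "Christopher Nolan",
     "Actor": "Christian Bale", "Year": 2020, "BoxOffice": 10000000},
]

# Build the column list as the union of all keys, in first-seen order
columns = []
for record in records:
    for key in record:
        if key not in columns:
            columns.append(key)

# Build one row per record; keys a record lacks become None
rows = [[record.get(col) for col in columns] for record in records]

print(columns)
for row in rows:
    print(row)
```

The list of dictionaries never changes: each record still holds only its own keys, and the padding exists only in the tabular view.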
