# Project 2
### Python Backend: Load Tsunami TSV file to MongoDB

#### Requirements:
* Ensure that the input TSV file is located in the same directory as this Jupyter notebook.
* Ensure that the MongoDB `mongod` process is running. On macOS, run `ps aux | grep -v grep | grep mongod` to verify that the process is active.
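
The `ps` check above is specific to Unix-like systems. As a rough cross-platform alternative, the sketch below tests whether anything accepts TCP connections on the default MongoDB port. The helper name `port_is_open` is hypothetical, and a `True` result only shows that some process is listening on that port, not that it is `mongod`.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 27017 is the default MongoDB port used later in this notebook
print(f"Something is listening on port 27017: {port_is_open()}")
```

If this prints `False`, start `mongod` before running the cells that follow.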

#### Imports

In [2]:
import pandas as pd
import pymongo
import numpy as np
import csv
import pprint

#### Create MongoDB connection

In [3]:
# The default port used by MongoDB is 27017
# https://docs.mongodb.com/manual/reference/default-mongodb-port/
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Define the 'Project2' database in Mongo
db = client.Project2

#### Read input TSV file

In [4]:
input_tsv_file = "runups-2000_to_2020.tsv"

In [None]:
# Collect one dictionary (JSON-like document) per data row
tsunami_runups_lst = []

# Open the input TSV file for reading (the csv module recommends newline='')
with open(input_tsv_file, newline='') as tsvfile:
    reader = csv.reader(tsvfile, delimiter='\t')

    # Save header
    tsv_header = next(reader)

    # Skip the row that holds the search parameters
    next(reader)

    for row in reader:
        # Build one document by pairing each header name with its value.
        # zip stops at the shorter sequence, so a row with extra cells
        # no longer raises IndexError.
        tsunami_runups_lst.append(dict(zip(tsv_header, row)))
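
To see what the parsing step produces, here is a minimal sketch that applies the same logic (save the header, skip the search-parameters row, build one dictionary per row) to a small in-memory sample. The column names and values in `sample_tsv` are made up for illustration and are not taken from the real file.

```python
import csv
import io

# Hypothetical three-column sample; the real file has its own header names
sample_tsv = (
    "Year\tCountry\tMax Water Height\n"
    "Search Parameters:\t\t\n"
    "2001\tCOUNTRY_A\t1.5\n"
    "2002\tCOUNTRY_B\t\n"
)

reader = csv.reader(io.StringIO(sample_tsv), delimiter="\t")
header = next(reader)   # column names
next(reader)            # discard the search-parameters row
documents = [dict(zip(header, row)) for row in reader]

print(len(documents))
print(documents[0])
print(documents[1])
```

Note that `csv.reader` returns every cell as a string and an empty cell as `''`, so the documents store years and heights as text. Numeric comparisons in MongoDB queries would probably need the values converted before insertion.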


#### Query existing tsunami data collection
* Use this to count how many records currently exist in the tsunami data collection

In [11]:
# Query 'tsunamidata_collection'
record_count = db.tsunamidata_collection.count_documents({})
print(f"Number of tsunami data records in DB: {record_count}")

Number of tsunami data records in DB: 14119


#### Remove existing rows from tsunami data collection
* Use this to remove existing records before inserting new records based on the input file

In [8]:
# Empty the collection
db.tsunamidata_collection.delete_many({})

<pymongo.results.DeleteResult at 0x7fb0a18d24c0>

#### Insert tsunami data collection records
* Insert all records from the input file
* After inserting, rerun the record count query from the earlier section to confirm the number of documents in the database

In [10]:
db.tsunamidata_collection.insert_many(tsunami_runups_lst)

<pymongo.results.InsertManyResult at 0x7fb0a40dc2c0>
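
The delete, insert, and count steps above can be combined into one helper so that the result objects report something more useful than their memory address. This is a sketch under assumptions: `reload_collection` is a hypothetical name, and it relies only on the `deleted_count` and `inserted_ids` attributes of the pymongo result objects plus `count_documents`.

```python
def reload_collection(collection, documents):
    """Empty `collection`, insert `documents`, and return the resulting count."""
    if not documents:
        # Check before deleting so a failed parse does not wipe the collection;
        # pymongo's insert_many also rejects an empty list
        raise ValueError("No documents to insert; was the TSV file parsed?")

    deleted = collection.delete_many({}).deleted_count
    inserted = len(collection.insert_many(documents).inserted_ids)
    total = collection.count_documents({})

    print(f"Deleted {deleted}, inserted {inserted}, collection now holds {total}")
    if total != len(documents):
        raise RuntimeError(f"Expected {len(documents)} documents, found {total}")
    return total
```

In this notebook it would be called as `reload_collection(db.tsunamidata_collection, tsunami_runups_lst)`.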