# MongoDB Connection
Connecting to the MongoDB database is fairly straightforward. Before connecting, you need to install the pymongo driver in your environment.

Once it has been installed, navigate to the MongoDB Atlas 'austin-green-energy' cluster in your web browser and select Connect. You'll then select 'Connect your application.' On the next screen select Python and your version of Python (this example uses Python 3.6 or later), and Atlas will generate the connection string. Copy the connection string and paste it below. Be sure to change the default database and include a config file with your username and password.

config file:
Your config file (config.py, since the notebook loads it with `import config`) needs to define the USERNAME and PASSWORD variables.

USERNAME = "your username"  
PASSWORD = "your password"

In [1]:
# import dependencies
import config
import pymongo
import pandas as pd
import json

## Create Connection String and Test Connection

The first thing to do is to create the connection string by pulling the username and password from the config file (be sure to list the config file in .gitignore so the credentials are not committed). Then we use a try and except block to make sure we are connected to the database.

You might also need to install dnspython, which pymongo requires for connection strings that begin with "mongodb+srv" (one way is `pip install "pymongo[srv]"`). [This posting](https://stackoverflow.com/questions/52930341/pymongo-mongodbsrv-dnspython-must-be-installed-error) can help you install the package. Try the first answer and then go through the rest of the posting if that doesn't work.
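
One detail worth checking when building the connection string: if the username or password contains characters such as `@`, `:` or `/`, they need to be percent-encoded before they go into the URI. The following is a minimal sketch using the standard library's `urllib.parse.quote_plus`. The helper name `build_mongo_uri` and the sample credentials are made up for illustration; the host and database are the ones used in this notebook.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, database):
    """Return a mongodb+srv URI with the credentials percent-encoded."""
    # quote_plus escapes reserved characters (e.g. '@' -> '%40') so they
    # cannot be mistaken for URI delimiters.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb+srv://{user}:{pwd}@{host}/{database}"
        "?retryWrites=true&w=majority"
    )


# Hypothetical credentials, only to show the escaping.
uri = build_mongo_uri(
    "energy_user",
    "p@ss:word/1",
    "austin-green-energy.pwzpm.mongodb.net",
    "wind_solar_data",
)
print(uri)
```

In the notebook, the resulting string would be passed to `pymongo.MongoClient` in place of the hand-written f-string.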

In [2]:
# set string variables
DEFAULT_DATABASE = 'wind_solar_data' 
USERNAME = config.USERNAME
PASSWORD = config.PASSWORD

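# create connection to database
client = pymongo.MongoClient(f"mongodb+srv://{USERNAME}:{PASSWORD}@austin-green-energy.pwzpm.mongodb.net/{DEFAULT_DATABASE}?retryWrites=true&w=majority")
try:
    # server_info() makes a round trip to the server, so it raises if the connection fails
    client.server_info()
    print("Mongodb connected")
except pymongo.errors.PyMongoError as err:
    # catch driver errors only, instead of a bare except that hides every exception
    print(f"The Mongodb failed to connect. Check username/password in connection string. ({err})")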


Mongodb connected
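
The connection check can also be wrapped in a small function that returns a boolean, which is handy when later cells should only run if the database is reachable. This is a sketch under assumptions, not the notebook's own code: it sends MongoDB's `ping` command through `client.admin.command`, and the helper name `is_connected` is hypothetical. It catches a broad `Exception` only to stay independent of the driver import; narrowing it to `pymongo.errors.PyMongoError` would be stricter.

```python
def is_connected(client):
    """Return True if the server answers a ping, False otherwise."""
    try:
        # 'ping' is a lightweight server command; with pymongo this call
        # raises if no server can be reached.
        client.admin.command("ping")
        return True
    except Exception as err:  # broad on purpose, see the note above
        print(f"MongoDB failed to connect: {err}")
        return False
```

With the `client` created above, this would be called as `is_connected(client)`. Passing `serverSelectionTimeoutMS=5000` to `pymongo.MongoClient` should make a failed check return after roughly five seconds rather than the driver's default of 30.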


## Pull the Data Sets from the Database
There are two data sets to pull into dataframes: the wind data and the solar data. The wind_solar_data database has two collections, wind_data and solar_data.

### Wind Data

In [3]:
# select database
db = client.get_database('wind_solar_data')
# select collection
collection = db.wind_data

# pull collection into dataframe
wind_df = pd.DataFrame(list(collection.find()))
wind_df


Unnamed: 0,_id,Date_Time,Year,Month,Day,Hour,MWH,MWH_perTurbine,Temperature_F,Humidity_percent,WindSpeed_mph,WindGust_mph,WindDirection_degrees,WindDirection_compass,Weather_Description
0,5f98662ac1c5e33be427ce93,2019-01-01 00:00:00,2019,1,1,0,5.009100,0.069571,35,73,12,24,126,SE,Clear
1,5f98662ac1c5e33be427ce94,2019-01-01 01:00:00,2019,1,1,1,110.487950,1.534555,35,74,13,23,89,E,Clear
2,5f98662ac1c5e33be427ce95,2019-01-01 02:00:00,2019,1,1,2,72.020225,1.000281,35,76,14,23,53,NE,Clear
3,5f98662ac1c5e33be427ce96,2019-01-01 03:00:00,2019,1,1,3,67.639475,0.939437,35,77,15,22,17,NNE,Clear
4,5f98662ac1c5e33be427ce97,2019-01-01 04:00:00,2019,1,1,4,63.718900,0.884985,35,77,14,21,18,NNE,Clear
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13866,5f98662ac1c5e33be42804bd,2020-07-31 19:00:00,2020,7,31,19,10.764125,0.149502,82,35,8,11,104,ESE,Patchy rain possible
13867,5f98662ac1c5e33be42804be,2020-07-31 20:00:00,2020,7,31,20,4.998600,0.069425,82,39,8,12,78,ENE,Patchy rain possible
13868,5f98662ac1c5e33be42804bf,2020-07-31 21:00:00,2020,7,31,21,16.390275,0.227643,82,43,7,13,52,NE,Patchy rain possible
13869,5f98662ac1c5e33be42804c0,2020-07-31 22:00:00,2020,7,31,22,20.637800,0.286636,82,47,7,13,55,NE,Patchy rain possible


### Solar Data

In [4]:
# select database
db = client.get_database('wind_solar_data')
# select collection
collection = db.solar_data

# pull collection into dataframe
solar_df = pd.DataFrame(list(collection.find()))
solar_df

Unnamed: 0,_id,Date_Time,Year,Month,Day,Hour,MWH,MWH_perPanel,Temperature_F,Humidity_percent,Sunhour,CloudCover_percent,uvIndex,Weather_Description
0,5f986632c1c5e33be42804c2,2019-01-01 00:00:00,2019,1,1,0,0.0,0.0,43,88,6.7,0,1,Clear
1,5f986632c1c5e33be42804c3,2019-01-01 01:00:00,2019,1,1,1,0.0,0.0,43,88,6.7,0,1,Clear
2,5f986632c1c5e33be42804c4,2019-01-01 02:00:00,2019,1,1,2,0.0,0.0,43,89,6.7,0,1,Clear
3,5f986632c1c5e33be42804c5,2019-01-01 03:00:00,2019,1,1,3,0.0,0.0,43,90,6.7,0,1,Clear
4,5f986632c1c5e33be42804c6,2019-01-01 04:00:00,2019,1,1,4,0.0,0.0,43,90,6.7,0,1,Clear
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13866,5f986632c1c5e33be4283aec,2020-07-31 19:00:00,2020,7,31,19,0.0,0.0,79,58,6.9,73,1,Partly cloudy
13867,5f986632c1c5e33be4283aed,2020-07-31 20:00:00,2020,7,31,20,0.0,0.0,79,62,6.9,73,1,Partly cloudy
13868,5f986632c1c5e33be4283aee,2020-07-31 21:00:00,2020,7,31,21,0.0,0.0,79,66,6.9,73,1,Partly cloudy
13869,5f986632c1c5e33be4283aef,2020-07-31 22:00:00,2020,7,31,22,0.0,0.0,79,71,6.9,73,1,Partly cloudy
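
Both cells above follow the same pattern, so it could be factored into one helper. The sketch below is hypothetical (the name `fetch_records` is not part of the notebook). It uses the projection argument of `find` to leave out the `_id` column, which is usually not needed for analysis.

```python
def fetch_records(collection, query=None, include_id=False):
    """Return the matching documents of a collection as a list of dicts."""
    # A projection of {"_id": 0} asks MongoDB to omit the ObjectId field;
    # None means "return every field".
    projection = None if include_id else {"_id": 0}
    # An empty filter matches every document in the collection.
    return list(collection.find(query or {}, projection))
```

For example, `wind_df = pd.DataFrame(fetch_records(db.wind_data))` would rebuild the wind dataframe without `_id`, and `pd.DataFrame(fetch_records(db.solar_data, {"Year": 2020}))` would pull only the 2020 solar rows, assuming `Year` is stored as a number as the output above suggests.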


## Useful Functions

In [5]:
# list all of the databases
for db_info in client.list_databases():
    print(db_info)

{'name': 'sample_airbnb', 'sizeOnDisk': 54894592.0, 'empty': False}
{'name': 'sample_analytics', 'sizeOnDisk': 9895936.0, 'empty': False}
{'name': 'sample_geospatial', 'sizeOnDisk': 983040.0, 'empty': False}
{'name': 'sample_mflix', 'sizeOnDisk': 42336256.0, 'empty': False}
{'name': 'sample_restaurants', 'sizeOnDisk': 5865472.0, 'empty': False}
{'name': 'sample_supplies', 'sizeOnDisk': 983040.0, 'empty': False}
{'name': 'sample_training', 'sizeOnDisk': 42512384.0, 'empty': False}
{'name': 'sample_weatherdata', 'sizeOnDisk': 2490368.0, 'empty': False}
{'name': 'wind_solar_data', 'sizeOnDisk': 3145728.0, 'empty': False}
{'name': 'admin', 'sizeOnDisk': 286720.0, 'empty': False}
{'name': 'local', 'sizeOnDisk': 4082724864.0, 'empty': False}


In [6]:
# # Uploading the wind data to the database

# # select database
# db = client.get_database('wind_solar_data')
# # select collection
# collection = db.wind_data

# # pull the csv from file (raw string so the backslashes are not read as escapes)
# wind_data = pd.read_csv(r'..\Output\Hackberry_Wind_MWH.csv')
# # turn the CSV into JSON records
# wind_data_json = json.loads(wind_data.to_json(orient='records'))

# # remove what is currently in the collection
# # (remove() and insert() are gone in PyMongo 4; use delete_many() and insert_many())
# collection.delete_many({})
# # insert the new JSON data into the database
# collection.insert_many(wind_data_json)

In [7]:
# # Uploading the solar data to the database

# # select database
# db = client.get_database('wind_solar_data')
# # select collection
# collection = db.solar_data

# # pull the csv from file (raw string so the backslashes are not read as escapes)
# solar_data = pd.read_csv(r'..\Output\Webberville_Solar_MWH.csv')
# # turn the CSV into JSON records
# solar_data_json = json.loads(solar_data.to_json(orient='records'))

# # remove what is currently in the collection
# # (remove() and insert() are gone in PyMongo 4; use delete_many() and insert_many())
# collection.delete_many({})
# # insert the new JSON data into the database
# collection.insert_many(solar_data_json)
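
The two upload cells differ only in the CSV file and the target collection, so the replace-and-insert step could be shared. This is a minimal sketch, assuming PyMongo's `delete_many` and `insert_many` collection methods; the helper name `replace_collection` is hypothetical. Like the cells above, it wipes the collection first, so it should only be run deliberately.

```python
def replace_collection(collection, records):
    """Delete every document in the collection, then insert the new records."""
    records = list(records)
    # An empty filter matches all documents in the collection.
    collection.delete_many({})
    # insert_many rejects an empty list, so skip the call in that case.
    if records:
        collection.insert_many(records)
    return len(records)
```

With the variables from the wind cell, `replace_collection(db.wind_data, wind_data_json)` would stand in for its last two statements and return the number of documents written.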