# How to retrieve data into Python from Postgres and/or files in an S3 bucket

## Retrieving data from Postgres tables

In [None]:
# Imports
import psycopg2
from sqlalchemy import create_engine
import pandas as pd

Special note: Maya is awesome for figuring out that sqlalchemy piece!  :-)

In [None]:
# Set up the connection to the database
db_loc = 'postgresql+psycopg2://postgres:pass@localhost:5432/solarenergy'
engine = create_engine(db_loc)
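
The connection string above packs the driver, user, password, host, port, and database name into one URL. As a minimal sketch of how those parts fit together, the helper below rebuilds the same string from its pieces. The function name `build_db_url` and the use of the `PGUSER` and `PGPASSWORD` environment variables are assumptions made for illustration, not part of the project code. Percent-encoding the user and password is there so that special characters such as `@` do not break the URL.

```python
import os
from urllib.parse import quote_plus


def build_db_url(user, password, host="localhost", port=5432, database="solarenergy"):
    """Assemble a SQLAlchemy-style Postgres URL (hypothetical helper)."""
    # quote_plus percent-encodes characters such as '@' or ':' in credentials
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Assumption: credentials come from environment variables when they are set,
# and otherwise fall back to the values used in this notebook.
db_loc = build_db_url(
    user=os.environ.get("PGUSER", "postgres"),
    password=os.environ.get("PGPASSWORD", "pass"),
)
```

The resulting `db_loc` can be passed to `create_engine` exactly as in the cell above.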

In [None]:
# Read contents of a postgres table into a dataframe
# IMPORTANT: index_col makes the plant_id column the dataframe's index.
# If you want the dataframe to get a new default numeric index instead,
# remove the index_col argument from the code below.
# If you're unsure how this would look, just run the code below
# with and without the index_col argument, then compare
# the resulting dataframes to see the difference.
test_df = pd.read_sql('SELECT * FROM plants', engine, index_col="plant_id")

Put the code above at the top of any script that analyzes the data. Note that how you load the table with `read_sql` (for example, whether or not you pass `index_col`) determines how you query the resulting dataframe later in the script. A dataframe created in one script persists only within that script.
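
To make the effect of `index_col` on later queries concrete, here is a small sketch. It swaps in an in-memory SQLite table for the Postgres database so that it runs anywhere; the `plant_name` column and the two rows are made up for illustration and are not the real contents of the plants table.

```python
import sqlite3

import pandas as pd

# Stand-in for the Postgres database: an in-memory SQLite table with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (plant_id INTEGER, plant_name TEXT)")
conn.executemany("INSERT INTO plants VALUES (?, ?)", [(101, "Alpha"), (102, "Beta")])
conn.commit()

# Same query, loaded with and without index_col
with_index = pd.read_sql("SELECT * FROM plants", conn, index_col="plant_id")
default_index = pd.read_sql("SELECT * FROM plants", conn)

# With index_col, plant_id is the index, so rows are looked up by plant_id
name_a = with_index.loc[101, "plant_name"]

# Without it, plant_id is an ordinary column and rows are filtered on it
name_b = default_index.loc[default_index["plant_id"] == 101, "plant_name"].iloc[0]
```

Both lookups return the same value; only the way the row is addressed changes.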

In [None]:
# Always take a look at the dataframe while you are building the code!
test_df

## Retrieving data from CSV files

In [None]:
# Assumption 1: the file (test.csv) is in the folder
# from which this command is run.  If not, put
# the full path to the file inside the quotes, e.g.,
# pd.read_csv('/home/w205/whatever_name.csv', etc...

# Assumption 2: the index column in the CSV is
# named "index".  This, of course, won't be the case in
# our working files.
# The solution:  take a look at the CSV file and note
# the name of the first column, which is presumably
# the one that should become the index of the dataframe.
# Then replace the word 'index' in the code below with
# that column name.
# If you would rather have this dataframe create a
# new numerical index, remove the index_col argument instead.
test_df = pd.read_csv('test.csv', index_col = 'index')
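
If opening the CSV by hand is inconvenient, the first column's name can also be read programmatically. The sketch below is one way to do that, using a small in-memory CSV with made-up contents in place of a real file. With a real file, pass its path instead of the `io.StringIO` object.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for a real file on disk
csv_text = "plant_id,plant_name\n101,Alpha\n102,Beta\n"

# Read only the header row (nrows=0) to find the name of the first column
first_col = pd.read_csv(io.StringIO(csv_text), nrows=0).columns[0]

# Use that column as the dataframe's index
df = pd.read_csv(io.StringIO(csv_text), index_col=first_col)
```

This assumes the first column really is the one you want as the index, so it is still worth looking at the resulting dataframe.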

In [None]:
# As when retrieving data from a Postgres table,
# always take a look at the data before building code!
test_df

## Retrieving a file from an S3 bucket via Python

Note: writing code to programmatically copy CSV files from the AMI to S3 turned out to require some work with permissions that was beyond me (Laura). So that particular piece is not included in the project code.

I manually uploaded CSV files created from the Postgres tables to my Amazon S3 account. Anyone can download those files with the following commands.

**Make sure to enter them on the command line, or add `!` before each one if running them from a notebook.**

In [None]:
wget http://s3.amazonaws.com/w205.project.sunshine/csv_files/plants.csv
wget http://s3.amazonaws.com/w205.project.sunshine/csv_files/generation.csv
wget http://s3.amazonaws.com/w205.project.sunshine/csv_files/solar.csv
wget http://s3.amazonaws.com/w205.project.sunshine/csv_files/uscrn.csv
wget http://s3.amazonaws.com/w205.project.sunshine/csv_files/stations.csv
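
The same downloads can also be done from Python itself rather than through `wget`. The following is a minimal sketch using only the standard library; the helper names `csv_url` and `download_all` are made up for illustration, and it assumes the files are still publicly readable at the URLs listed above.

```python
import os
from urllib.request import urlretrieve

# Base URL and file names taken from the wget commands above
BASE_URL = "http://s3.amazonaws.com/w205.project.sunshine/csv_files"
FILE_NAMES = ["plants", "generation", "solar", "uscrn", "stations"]


def csv_url(name):
    """Return the download URL for one of the CSV files."""
    return f"{BASE_URL}/{name}.csv"


def download_all(dest_dir="."):
    """Download every CSV into dest_dir and return the local paths."""
    paths = []
    for name in FILE_NAMES:
        dest = os.path.join(dest_dir, f"{name}.csv")
        urlretrieve(csv_url(name), dest)  # needs network access
        paths.append(dest)
    return paths
```

Calling `download_all()` saves the five files into the current folder, after which they can be loaded with `pd.read_csv` as in the CSV section.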