# Go Further with PostGIS and Plotly
Assumptions: <br>
- PostgreSQL is installed locally

## Let's Set Up

Download and unzip the Plotly Database Connector app

In [None]:
!wget https://github.com/plotly/plotly-database-connector/releases/download/v0.0.7-alpha/Plotly.Database.Connector-Mac.zip

In [None]:
!unzip Plotly.Database.Connector-Mac.zip -d ./

Download and unzip the data required for this example

In [None]:
!wget https://github.com/plotly/plotly-database-connector/raw/master/examples/postgis/dc/dc_census.zip

In [None]:
!unzip dc_census.zip -d ./

Start your PostgreSQL server with a command similar to this one (the data directory depends on where you installed PostgreSQL):<BR>
$ postgres -D /usr/local/pgsql/data

In [None]:
# Create a new PostgreSQL database called 'dc_census_tracts'
!createdb dc_census_tracts

In [None]:
# Add the plpgsql language to the dc_census_tracts database
!createlang plpgsql dc_census_tracts
# If the language is already installed, you will see a message like this:
# $ language "plpgsql" is already installed in database "postgis"

In [None]:
# Install the PostGIS extensions in the dc_census_tracts database
!psql -d dc_census_tracts -c "CREATE EXTENSION postgis;"
!psql -d dc_census_tracts -c "CREATE EXTENSION postgis_topology;"
# If they are already installed, you will see the following:
# ERROR:  extension "postgis" already exists
# ERROR:  extension "postgis_topology" already exists

In [None]:
!cd dc_census && ls

In [None]:
# Import shapefile
!shp2pgsql -c -D -s 4269 -I dc_census/tl_2010_11001_tract10.shp dc_census_tracts | psql -d dc_census_tracts

Create a table in the database by opening the psql prompt: <br>
$ psql dc_census_tracts <br>
and entering the following SQL statement at the `dc_census_tracts=#` prompt:
```
CREATE TABLE dc_census_data (GEOID varchar(11), SUMLEV varchar(3), STATE varchar(2), COUNTY varchar(3), CBSA varchar(5), CSA varchar(3), NECTA integer, CNECTA integer, NAME varchar(30), POP100 integer, HU100 integer, POP1002000 integer, HU1002000 integer, P001001 integer, P0010012000 integer);
```

In [None]:
!cat dc_census/all_140_in_11.P1.csv | psql -d dc_census_tracts -c 'COPY dc_census_data FROM STDIN WITH CSV HEADER'

## Let's Connect To Our Database

Start up the Plotly Database Application and connect to the `dc_census_tracts` database.

In [None]:
!open ./Plotly\ Database\ Connector-darwin-x64/Plotly\ Database\ Connector.app

Follow the instructions until you are connected and can view the desired tables.

In [None]:
import pandas as pd

In [1]:
import requests

Let's make sure the app is connected by using its API.

In [None]:
auth = requests.get('http://localhost:5000/v1/authenticate')

In [None]:
auth.json()

The API lets us switch databases if we need to. Since we are already connected to `dc_census_tracts`, selecting a database here is optional, but this is how it would work:

In [None]:
connectDatabase = requests.get('http://localhost:5000/v1/selectdatabase?database=dc_census_tracts')

In [None]:
connectDatabase.json()

We just created two tables in that database: `dc_census_tracts` and `dc_census_data`. Let's make sure they are there by retrieving the list of tables from our database.

In [None]:
tables = requests.get('http://localhost:5000/v1/tables')

In [None]:
tables.json()

Looks like both `u'dc_census_tracts'` and `u'dc_census_data'` are there!

## Let's Explore Our Data

In [None]:
response = requests.get('http://localhost:5000/v1/preview?tables=dc_census_tracts,dc_census_data')

The `response` has a `previews` object that contains the first five rows of each table listed in the `tables` parameter of the request. Let's look at the `dc_census_tracts` table and inspect the GeoJSON of the first row (index 0) only. The GeoJSON object is stored under the `geom` key.

In [None]:
response.json()['previews'][0]['dc_census_tracts']['raw'][0]['geom']
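
The index-based lookup above depends on the order of the tables in the request. As a minimal sketch, assuming the payload has the shape described above (a `previews` list whose entries are dictionaries keyed by table name), a preview can also be looked up by name. The helper name `find_preview` and the sample payload are hypothetical and exist only for illustration.

```python
def find_preview(payload, table_name):
    """Return the preview dict for table_name, or None if it is absent."""
    for entry in payload.get('previews', []):
        if table_name in entry:
            return entry[table_name]
    return None


# Hypothetical payload that mimics the shape described above (made-up values).
sample_payload = {
    'previews': [
        {'dc_census_tracts': {'raw': [{'geom': {'type': 'MultiPolygon',
                                                'coordinates': []}}]}},
        {'dc_census_data': {'rows': [['11001000100', 4890]],
                            'columnnames': ['geoid', 'pop100']}},
    ]
}

tracts_preview = find_preview(sample_payload, 'dc_census_tracts')
first_geom = tracts_preview['raw'][0]['geom']
print(first_geom['type'])
```

With the real response, the call would be `find_preview(response.json(), 'dc_census_tracts')`.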

Looks like the data is a collection of complex Polygons.
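
To check that impression without reading the raw output, a short sketch like the one below can report the geometry type and the number of vertices. It assumes the `geom` value is a standard GeoJSON geometry dictionary; the function names and the sample square are hypothetical.

```python
def count_vertices(coords):
    """Recursively count coordinate positions in nested GeoJSON coordinates."""
    if not coords:
        return 0
    if isinstance(coords[0], (int, float)):
        return 1  # reached a single [lon, lat] position
    return sum(count_vertices(part) for part in coords)


def summarize_geometry(geometry):
    """Return (type, vertex count) for a GeoJSON geometry dict."""
    return geometry['type'], count_vertices(geometry.get('coordinates', []))


# Hypothetical geometry: one square polygon wrapped in a MultiPolygon.
square = {
    'type': 'MultiPolygon',
    'coordinates': [[[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]],
}
print(summarize_geometry(square))
```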

Let's look at the other table, `dc_census_data`, whose preview is also in our response object.

In [None]:
df = pd.DataFrame(response.json()['previews'][1]['dc_census_data']['rows'])
df.columns = response.json()['previews'][1]['dc_census_data']['columnnames']

In [None]:
df

Looks like it has population data for each census tract.

## Let's Extract Our Data

We can decide up front how much data we want to visualize. Set `LIMIT` to a SQL clause such as `'LIMIT 100'`, or to an empty string (`LIMIT = ''`) to fetch every row.

In [4]:
LIMIT = '' #'LIMIT 100'

Let's combine both tables and do some analysis.

First, let's add a column that holds the centroid of each census tract. <br>
Run these commands in the psql prompt.

`ALTER TABLE "dc_census_tracts" ADD centroid_geom geometry;` <br>
`UPDATE "dc_census_tracts" SET centroid_geom = ST_Centroid(geom);`
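
`ST_Centroid` returns the geometric center of mass of a geometry. As a rough illustration of what that means for a single polygon ring, here is a minimal Python sketch based on the shoelace formula. It is not how PostGIS implements the function: it ignores holes and multi-part geometries and treats coordinates as planar. The name `ring_centroid` is hypothetical.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring given as [x, y] pairs."""
    area2 = 0.0  # twice the signed area of the ring
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError('degenerate ring with zero area')
    # The centroid formula divides by 6 * area, which equals 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# A unit square, closed by repeating the first vertex.
unit_square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
print(ring_centroid(unit_square))
```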

In [5]:
query = 'SELECT * from dc_census_tracts JOIN dc_census_data on dc_census_tracts.geoid10 = dc_census_data.geoid ' + LIMIT

The connector API also lets us send our own queries.

In [6]:
queryResponse = requests.get('http://localhost:5000/v1/query?statement=' + query)
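
Because the statement contains spaces and an `=` sign, it may be safer to percent-encode it before appending it to the URL. The following is a minimal Python 3 sketch using only the standard library. The helper names `build_query` and `build_query_url` are hypothetical, and the sketch assumes the connector decodes a standard percent-encoded query string.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'http://localhost:5000/v1/query'


def build_query(limit=None):
    """Join both tables; append a LIMIT clause only when limit is given."""
    sql = ('SELECT * FROM dc_census_tracts '
           'JOIN dc_census_data '
           'ON dc_census_tracts.geoid10 = dc_census_data.geoid')
    if limit is not None:
        sql += ' LIMIT {}'.format(int(limit))  # int() rejects non-numeric input
    return sql


def build_query_url(sql):
    """Percent-encode the SQL so spaces and '=' survive inside the URL."""
    return BASE_URL + '?' + urlencode({'statement': sql}, quote_via=quote)


print(build_query_url(build_query(limit=100)))
```

The resulting string can be passed to `requests.get` in place of the concatenated URL.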

In [None]:
# queryResponse.json()

Looks like we have the data we need. Let's create a geometries object that we can use when drawing shapes with Plotly. These geometry objects are stored in our data under the `geom` key.

## Let's Process Our Data

We only need the raw response from PostGIS, so let's put it into a local variable and go from there.

In [7]:
locations = queryResponse.json()['raw']

#### 1 Geometries of Census Tracts

In [8]:
geometries = [location['geom'] for location in locations]

In [9]:
geojsons = [{
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "GeometryCollection",
            "geometries": [geometry]
        }
    }]
} for geometry in geometries]

#### 2 Centroids of Census Tracts

In [10]:
centroids = [location['centroid_geom'] for location in locations]

In [None]:
# centroids
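
GeoJSON stores a position as `[longitude, latitude]`, so index 0 of `coordinates` is the longitude and index 1 is the latitude. A minimal sketch of splitting the centroids into separate lists follows, assuming each centroid is a GeoJSON `Point` dictionary. The helper name `split_lon_lat` and the sample points are hypothetical.

```python
def split_lon_lat(points):
    """Return (lons, lats) from a list of GeoJSON Point dicts."""
    lons = [point['coordinates'][0] for point in points]
    lats = [point['coordinates'][1] for point in points]
    return lons, lats


# Hypothetical centroids near Washington, DC (made-up values).
sample_centroids = [
    {'type': 'Point', 'coordinates': [-77.03, 38.90]},
    {'type': 'Point', 'coordinates': [-77.01, 38.88]},
]
lons, lats = split_lon_lat(sample_centroids)
print(lons, lats)
```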

#### 3 Populations of Census Tracts

In [11]:
populations = [location['pop100'] for location in locations]

In [None]:
# populations

#### 4 A map of DC has to have the White House on it...

In [12]:
USA_HQ = dict(
            lon='-77.0365',
            lat='38.8977'
        )

## Let's Make a Plot!

In [13]:
import plotly.plotly as py
import plotly.tools as tls
from plotly.graph_objs import *

In [14]:
mapbox_access_token = 'pk.eyJ1IjoiY2hyaWRkeXAiLCJhIjoiRy1GV1FoNCJ9.yUPu7qwD_Eqf_gKNzDrrCQ'

In [16]:
data = Data([
    Scattermapbox(
        name='USA HQ',
        lat=['38.8977'],
        lon=['-77.0365'],
        mode='markers',
        marker=Marker(
            size=10
        ),
        text=['Barack Obama lives in this house']
    ),
    Scattermapbox(
        name='Census Tract Populations',
        lat=[str(centroid['coordinates'][1]) for centroid in centroids],
        lon=[str(centroid['coordinates'][0]) for centroid in centroids],
        mode='markers',
        marker=Marker(
            size=10
        ),
        text=[str(population) + ' people live in this census tract' for population in populations]
    )
])

In [17]:
layout = Layout(
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        center=dict(
            lat=38.8977,
            lon=-77.0365
        ),
        pitch=0,
        zoom=12,
        layers=[
            {
                'sourcetype':'geojson',
                'source': geojson,
                'type': 'fill',
                'color': 'rgba(30, 30, 30, 0.2)'            
            } for geojson in geojsons
        ]
    )
)

In [18]:
fig = dict(data=data, layout=layout)

In [19]:
tls.set_credentials_file(username='alexandres', api_key='1mfdjhzsd3')

In [None]:
py.iplot(fig, filename='dc_census')