# Loading a PostGIS Database

You will often accumulate data from multiple sources and want to persist it in a multilayer geodatabase.
While Shapefiles and GeoDB files support this, they are inherently *single user*.
When you want to support a *multi-user* environment, or more advanced geospatial analytics via SQL
instead of programming, PostGIS may be the solution you are looking for!

In previous courses, and in earlier labs for this course, you have seen how to query PostgreSQL and even PostGIS data.
In this practice, we are going to load PostGIS data!


# Data Sets

For this practice, you will build on what you have learned and your previous data carpentry skills to acquire, stage, ingest, and persist various datasets.

We will be accessing data linked at the US Government's Geospatial Platform: https://www.geoplatform.gov/


All the datasets will be in different formats. Some you may have seen, some will be new.
* [New Mexico Populated Places (GNIS), 2009](http://gstore.unm.edu/apps/rgis/datasets/c73b5e4d-fd64-4a2c-8a93-668e47d982d8/gnis_nm_poppl09.derived.csv)
* [Bureau of Land Management Land Grant Boundaries](http://gstore.unm.edu/apps/rgis/datasets/3d23ac95-2b28-4c1f-b5cc-b656133a018f/land_grants.original.zip/)

In the previous practice, you acquired these data sets.
If, for some reason, you removed or cleaned up your data, please re-acquire it using the code you wrote in the [Geospatial Data Carpentry Notebook](./GeoCarpentry.ipynb).

In [None]:
import pandas as pd

df=pd.read_csv('../temp/gnis_nm_poppl09.csv')

df.describe().transpose()

We see from the table above that we have the following columns:
 * SOURCE_LAT
 * Object_ID
 * FEATURE_ID
 * STATE_NUME
 * COUNTY_NUM
 * PRIM_LAT_D
 * PRIM_LONG1
 * SOURCE_LON
 * SOURCE_L_1
 * SOURCE_L_2
 * ELEVATION
 * observed

Read about this dataset [here](https://catalog.data.gov/dataset/new-mexico-populated-places-gnis-2009).

After some inspection, and digging into [related links](https://geonames.usgs.gov/), we can see that *FEATURE_ID*, *PRIM_LAT_D*, *PRIM_LONG1*, and *ELEVATION* are the columns we need to build a map of populated places (location and elevation).

## Defining a PostGIS Table

Please review the [DBASE information sheet](../../resources/DSA_Student_DBASE_HotTo.pdf) for details on connecting to the `dsa_student` PostGIS database using the `psql` CLI.  
Please review the [PostGIS information sheet](../../resources/PostGIS_Info_Sheet.pdf) for details on creating a PostGIS table.

 * **NOTE:** Substitute your actual MU SSO for `SSO` below.
 
```SQL
CREATE TABLE SSO.new_mexico_populated_places (
  feature_id INT,
  elevation real,
  CONSTRAINT pk_new_mexico_populated_places
    PRIMARY KEY (feature_id)
);
SELECT AddGeometryColumn('SSO','new_mexico_populated_places','coords',4326,'POINT',2);
CREATE INDEX idx_new_mexico_populated_places_point ON SSO.new_mexico_populated_places USING GIST (coords);
```

### Result Check:

```SQL
dsa_student=# \d new_mexico_populated_places 
  Table "scottgs.new_mexico_populated_places"
   Column   |         Type         | Modifiers 
------------+----------------------+-----------
 feature_id | integer              | not null
 elevation  | real                 | 
 coords     | geometry(Point,4326) | 
Indexes:
    "pk_new_mexico_populated_places" PRIMARY KEY, btree (feature_id)
    "idx_new_mexico_populated_places_point" gist (coords)

```

## Task : Reduce the data frame to just the desired columns

##### ('FEATURE_ID','ELEVATION','PRIM_LONG1','PRIM_LAT_D')
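
A minimal sketch of column selection, assuming a small made-up data frame in place of the real GNIS file (the name `sample_df` and all values are illustrative only). Indexing a data frame with a list of column names returns a new data frame holding just those columns, in the order listed.

```python
import pandas as pd

# Illustrative stand-in for the GNIS data frame; the values are made up.
sample_df = pd.DataFrame({
    'FEATURE_ID': [1, 2],
    'COUNTY_NUM': [61, 57],
    'PRIM_LAT_D': [34.6, 34.5],
    'PRIM_LONG1': [-106.5, -106.3],
    'ELEVATION': [1701.0, 2123.0],
})

# Keep only the columns needed for the table, in the order they will be inserted.
wanted = ['FEATURE_ID', 'ELEVATION', 'PRIM_LONG1', 'PRIM_LAT_D']
reduced_df = sample_df[wanted].copy()

print(reduced_df.dtypes)
```

Ordering the columns as ID, elevation, longitude, latitude here means each row can later be passed to the insert statement without rearranging it.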

In [None]:
## M2:P2:Cell01
# Create a DF with limited original data
# ---------- Add your code below this line




# ----- Then check data types and ensure you have what you expect
df.dtypes

#### Load your password into Memory

In [None]:
import getpass
mypasswd = getpass.getpass()

#### Get a connection Object to the PostGIS database

In [None]:
import psycopg2
import numpy
from psycopg2.extensions import adapt, register_adapter, AsIs

connection = psycopg2.connect(database = 'dsa_student', 
                              user = 'scottgs',  # replace with your own SSO
                              host = 'dbase.dsa.missouri.edu',
                              password = mypasswd)

#### Unload your password from memory

In [None]:
del mypasswd


## Task: Load data into database table


### Review the code below, looking at the structure

### Then, review these API links:

 * http://initd.org/psycopg/docs/extensions.html#sql-adaptation-protocol-objects
 * PostGIS Documentation for creating points
   * [GeomFromText](http://www.postgis.net/docs/ST_GeomFromText.html)
   * [MakePoint](https://postgis.net/docs/ST_MakePoint.html)
   * [PointFromText](https://postgis.net/docs/ST_PointFromText.html)
   
We are going to use *ST_MakePoint*, as it is considered the fastest (most efficient) option.
Along with that, we will need the function [*ST_SetSRID*](https://postgis.net/docs/ST_SetSRID.html) to ensure the data is in the `4326` spatial reference system.
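
One way to combine the two functions in a parameterized statement is sketched below. The helper name `build_point_insert` is hypothetical, and `SSO` is the same placeholder schema used in the table definition above. Note that `ST_MakePoint` takes X (longitude) first, then Y (latitude).

```python
def build_point_insert(schema, table):
    """Return a parameterized INSERT that builds a 4326 point from lon/lat.

    Hypothetical helper: the four %s placeholders are meant to be filled by
    the database driver in the order feature_id, elevation, lon, lat.
    """
    return (
        f'INSERT INTO {schema}.{table} (feature_id, elevation, coords) '
        'VALUES (%s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))'
    )


sql = build_point_insert('SSO', 'new_mexico_populated_places')
print(sql)
```

Only the schema and table names are formatted into the string. The row values stay as `%s` placeholders so that the driver handles quoting and type adaptation.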

##### Below, we have provided the structure and some starter code.

 * Finally, [read about prepared statements](http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/)
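
The starter code iterates with `itertuples()`, which by default puts the data frame index in position 0 of every tuple. A small sketch of two ways to drop that element, using a made-up two-row data frame (`sample_df` is illustrative, not the GNIS data):

```python
import pandas as pd

# Illustrative data frame; the values are made up.
sample_df = pd.DataFrame({
    'FEATURE_ID': [1, 2],
    'ELEVATION': [1701.0, 2123.0],
})

for row in sample_df.itertuples():
    # row[0] is the data frame index; slicing from position 1 drops it.
    data = tuple(row[1:])
    print(data)

# Alternatively, index=False leaves the index out of each tuple entirely.
rows_without_index = [tuple(r) for r in sample_df.itertuples(index=False)]
```

Either form gives a plain tuple whose length matches the number of selected columns, which is what a parameterized `execute` call expects.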


In [None]:
## M2:P2:Cell02
# Magic adapters for the Numpy Fun of Pandas
register_adapter(numpy.int64,AsIs)
register_adapter(numpy.float64,AsIs)

# NOTE: replace scottgs with your own SSO schema
INSERT_SQL = 'INSERT INTO scottgs.new_mexico_populated_places '
INSERT_SQL += ' (feature_id, elevation, coords) values '
INSERT_SQL += # Add the rest of the insert statement that includes ST_MakePoint


# Note: The Commit Will Be Automatic after this with clause
with connection, connection.cursor() as cursor:
    for row in df.itertuples():  # pull each row as a tuple
        
        # This is an indexed Tuple
        print(row) 
        
        # TODO: This is needed to remove the index element
        data = # Add your code
        
        print(data)

        # TODO: Insert the row
        cursor.execute(  <add_code>  )


#### Check for Data:

```SQL
dsa_student=# select count(*) from scottgs.new_mexico_populated_places;
 count 
-------
  1702
(1 row)
```

#### Peek at it
```SQL
dsa_student=# select feature_id,elevation, ST_AsText(coords)
dsa_student-# from scottgs.new_mexico_populated_places 
dsa_student-# limit 5;
 feature_id | elevation |          st_astext           
------------+-----------+------------------------------
    2413618 |      1701 | POINT(-106.537676 34.607173)
    2375434 |      2123 | POINT(-106.329722 34.582778)
    2375433 |      2107 | POINT(-106.34 34.611944)
    2055903 |      1931 | POINT(-106.381135 35.082269)
    2413664 |      1471 | POINT(-106.73403 34.649593)
(5 rows)
```

# Your Turn: A second data set

The second dataset we will work with is [http://gstore.unm.edu/apps/rgis/datasets/3d23ac95-2b28-4c1f-b5cc-b656133a018f/land_grants.original.zip](http://gstore.unm.edu/apps/rgis/datasets/3d23ac95-2b28-4c1f-b5cc-b656133a018f/land_grants.original.zip).

### Task: Explore the layers in the file, then load the data into a GeoPandas data frame.

In [None]:
## M2:P2:Cell03
import fiona
GEODATA_FILE = '../temp/land_grants'



In [None]:
## M2:P2:Cell04
import geopandas as gpd

geo_df = # Add code here

In [None]:
geo_df.head()

In [None]:
print(geo_df.crs)

### NOTE: The EPSG (Coordinate Reference System) is 26913!

We want to have it in 4326 so it is in the most common CRS and compatible with our PostGIS data.

### Task: Use the GeoPandas built-in functions to convert to CRS 4326
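
As a hedged illustration of reprojection, the sketch below builds a one-row GeoDataFrame in EPSG:26913 and converts it with `to_crs`. The name `sample_gdf` and the coordinate are made up for the example; the real land grant data would take its place.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative one-row GeoDataFrame in EPSG:26913 (NAD83 / UTM zone 13N);
# the coordinate is made up.
sample_gdf = gpd.GeoDataFrame(
    {'name': ['example']},
    geometry=[Point(350000, 3900000)],
    crs='EPSG:26913',
)

# to_crs returns a new GeoDataFrame; the original keeps its CRS.
sample_4326 = sample_gdf.to_crs(epsg=4326)
print(sample_4326.crs)
```

Because `to_crs` does not modify the frame in place, the result has to be assigned to a variable to be kept.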

In [None]:
## M2:P2:Cell05

geo_df = # Your code here

### Task: Define your table, geometry column, and indexing

Write your SQL statements below, then copy and paste them into the database command line in your terminal.

Note that you should end up with a **coords** column that is an SRID=4326 POLYGON of 2-D (Lon,Lat).

In [None]:
import getpass
mypasswd = getpass.getpass()

In [None]:
import psycopg2
import numpy
from psycopg2.extensions import adapt, register_adapter, AsIs

connection = psycopg2.connect(database = 'dsa_student', 
                              user = 'scottgs',  # replace with your own SSO
                              host = 'dbase.dsa.missouri.edu',
                              password = mypasswd)
del mypasswd

---

## Pause: 

Think about what a polygon is relative to a point.
Imagine the programming "*fun*" of constructing Polygons in code for INSERT statements.

Because it will be the opposite of fun, we will use the friendly *ST_GeomFromText* function.
The challenge is how to build the WKT text that PostGIS wants.
Since a GeoSeries is a series of Shapely geometries, we can look to that [API](http://shapely.readthedocs.io/en/stable/manual.html).
Then, for the element of the tuple that is the Shapely geometry, we can extract the WKT.
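
A minimal sketch of the WKT extraction, assuming a made-up unit square in place of a real land grant geometry. Shapely geometries expose their WKT text through the `wkt` attribute.

```python
from shapely.geometry import Polygon

# Illustrative unit square; real geometries would come from the GeoDataFrame.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

# Every Shapely geometry exposes its WKT representation as .wkt.
wkt_text = square.wkt
print(wkt_text)
```

The resulting string is what gets bound to the `%s` inside `ST_GeomFromText(%s, 4326)`.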

### Task: Load the data including the Polygon.  
#### The SQL provided is ready to use if your table columns match up.


In [None]:
## M2:P2:Cell06
# Magic adapters for the Numpy Fun of Pandas
register_adapter(numpy.int64,AsIs)
register_adapter(numpy.float64,AsIs)

# Note, ID is left off because it is SERIAL type and auto-incremented
# NOTE: replace scottgs with your own SSO schema
INSERT_SQL = 'INSERT INTO scottgs.new_mexico_land_grants '
# This next line may need an edit based on your table construction.
INSERT_SQL += ' (area,grant_conf,grant_name,land_grant,land_gra_1,perimeter,survey_app,coords) '
INSERT_SQL += ' values (%s,%s,%s,%s,%s,%s,%s,ST_GeomFromText(%s, 4326))'

# Note: The Commit Will Be Automatic after this with clause
with connection, connection.cursor() as cursor:
    for row in geo_df.itertuples():  # pull each row as a tuple
        
        # This is an indexed Tuple
        print(row) 
        
        # This is needed to remove the index element
        data = # Add your code here.
        
        print(data)
        cursor.execute(INSERT_SQL,data)


### Verify

```SQL
dsa_student=# select count(*) from scottgs.new_mexico_land_grants;
 count 
-------
   222
(1 row)
```

```SQL
dsa_student=# select id,grant_name, grant_conf,st_area(coords) from scottgs.new_mexico_land_grants limit 5;
 id |      grant_name      | grant_conf |     st_area      
----+----------------------+------------+------------------
  1 | TIERRA AMARILLA      | 1860-06-21 | 2004791148.87488
  2 | SANGRE DE CRISTO     | 1860-06-21 | 890653352.742407
  3 | BEAUBIEN AND MIRANDA | 1860-06-21 | 5899487289.42897
  4 | ANTOINE LEROUX       | 1869-03-03 | 63101507.3669469
  5 | ARROYO HONDO         | 1900-12-18 |  74328127.180984
(5 rows)
```

In [None]:
## M2:P2:Cell07

check_sql= "select id,grant_name, grant_conf, coords from scottgs.new_mexico_land_grants"


gdf= # Add code here
gdf.head()

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline


gdf.plot(figsize=(15,15))

## Write a little Spatial SQL

Write an SQL query to count the number of populated places within each land grant polygon, showing only the top 10!

Write your SQL below and copy-paste it into the CLI for the database.
Also, please paste your results in the cell as well.

#### Expected Output:

```SQL
 id  |         grant_name          | count 
-----+-----------------------------+-------
 216 | NON GRANT                   |  1176
  29 | MORA                        |    47
   3 | BEAUBIEN AND MIRANDA        |    44
  66 | LAS VEGAS                   |    30
 191 | TOME                        |    15
   1 | TIERRA AMARILLA             |    14
 138 | SAN MIGUEL DEL BADO TRACT 2 |    12
 198 | SEVILLETA                   |    11
  49 | SANTA CLARA PUEBLO          |    11
 203 | PEDRO ARMENDARIZ NO. 33     |    10
(10 rows)
```

# Save Your Notebook
## Then Notebook Menu: File > Close and Halt