# Geospatial Data Carpentry

In this exercise, you will build on what you have learned about geospatial data and on your previous data carpentry skills to acquire, stage, ingest, and render several datasets.

We will be accessing data linked at the US Government's Geospatial Platform: https://www.geoplatform.gov/


Each dataset is in a different format. Some you may have seen; others will be new.
 * [New Mexico Populated Places (GNIS), 2009](http://gstore.unm.edu/apps/rgis/datasets/c73b5e4d-fd64-4a2c-8a93-668e47d982d8/gnis_nm_poppl09.derived.csv)
 * [Bureau of Land Management Land Grant Boundaries](http://gstore.unm.edu/apps/rgis/datasets/3d23ac95-2b28-4c1f-b5cc-b656133a018f/land_grants.original.zip/)
 * http://gstore.unm.edu/apps/rgis/datasets/b4ae8f53-8dff-46bb-9058-e5501cabdd1b/school_district_boundaries.derived.gml
 * http://gstore.unm.edu/apps/rgis/datasets/ab17adb4-0992-436b-8ae4-575d8405d188/gpsrdsddshp.derived.kml

These datasets, while discoverable on geoplatform.gov, are hosted at the University of New Mexico.

## Exercise Prerequisite
In the module 2 practices, you ingested the first two data sources into PostGIS.
This is a prerequisite, because you will ingest additional files and then run geospatial queries against all of the ingested data.

# File 1: Geography Markup Language (GML) format

 * http://gstore.unm.edu/apps/rgis/datasets/b4ae8f53-8dff-46bb-9058-e5501cabdd1b/school_district_boundaries.derived.gml

This file is a GML formatted file of school district boundaries.

Read about the GML format here: https://en.wikipedia.org/wiki/Geography_Markup_Language


Acquire and stage the GML into the `../temp/` folder as you did in the practices.
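
The acquire-and-stage step can be sketched as follows. This is a minimal sketch, not the required solution: `stage_file` is a hypothetical helper name, and it assumes the staged file should keep the last segment of the URL path as its local name.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def stage_file(url, dest_dir="../temp"):
    """Download `url` into `dest_dir` and return the local path.

    Hypothetical helper: the local file name is taken from the last
    segment of the URL path.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    local_path = dest_dir / Path(urlparse(url).path).name
    # Stream the response to disk instead of reading it all into memory.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```

Calling `stage_file(...)` with the GML URL listed above would be expected to leave the file at `../temp/school_district_boundaries.derived.gml`.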


In [None]:
import urllib.request
import shutil
from pathlib import Path

In [None]:
## M2:E1:Cell01
## ----- Add Acquisition Code Below -----------














You should then be able to peek at the first few lines of the file:

```BASH
$ head -n8 school_district_boundaries.gml
<?xml version="1.0" encoding="UTF-8"?>
    <gml:FeatureCollection 
        xmlns:gml="http://www.opengis.net/gml" 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xmlns:ogr="http://ogr.maptools.org/">
    <gml:description>NM School District Boundaries</gml:description>
    <gml:featureMember><ogr:g_school_district_boundaries><ogr:geometryProperty><gml:Polygon srsName="EPSG:4326"><gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>-105.505039000052733,35.870676999750749 -105.354281999606613,35.870495999899177 ...
 -105.505039000052733,35.870676999750749</gml:coordinates></gml:LinearRing></gml:outerBoundaryIs></gml:Polygon></ogr:geometryProperty><ogr:id>18071698</ogr:id><ogr:COUNTY>San Miguel</ogr:COUNTY><ogr:CNTY_CODE>47</ogr:CNTY_CODE><ogr:DIST_CODE>69</ogr:DIST_CODE><ogr:NAMEPROPER>Las Vegas City</ogr:NAMEPROPER><ogr:Shape_Area>3283502950.87</ogr:Shape_Area><ogr:observed></ogr:observed></ogr:g_school_district_boundaries></gml:featureMember>
```

#### Confirm the file name looks correct and then attempt to examine the file with Fiona

In [None]:
import fiona

## ------ 
# Fill in the Filename as appropriate 
# based on acquisition above
GEODATA_FILE = ## ?

# Expect an error
numLayers = len(fiona.listlayers(GEODATA_FILE))
print("'{}' has {} layers".format(GEODATA_FILE, numLayers))

## Fiona + GML = Error

So, we have a file that looks like XML, but we cannot simply hand it to a library to load it into GeoPandas.
You will need to draw on your previous classes and data carpentry experience.
Because GML is a type of [XML](https://www.w3schools.com/xml/), it can be parsed into a hierarchical [Document Object Model (DOM)](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction), much like [HTML](https://www.w3schools.com/html/).

## Processing GML

Programmatically, we must do the following:
 1. Parse file into a DOM
 1. Find all `<gml:featureMember>`
 1. Parse each **`featureMember`** into a record with:
    * Polygon
    * County
    * County Code
    * District Code
    * Name Proper
    * Shape Area
 1. Add each record into the database
 
Prerequisite: Create an appropriate PostGIS table to hold the data as noted by the properties above.
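
The parsing steps above can be sketched with the standard library's `xml.etree.ElementTree`. This is one possible approach under stated assumptions: the helper names and record keys are hypothetical, the inline `SAMPLE` is a shortened, made-up feature in the same shape as the file, and each feature is assumed to be a single `Polygon` with only an outer boundary.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in the GML file header.
NS = {
    "gml": "http://www.opengis.net/gml",
    "ogr": "http://ogr.maptools.org/",
}

# Shortened, illustrative feature (coordinates are not real data).
SAMPLE = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml"
    xmlns:ogr="http://ogr.maptools.org/">
  <gml:featureMember><ogr:g_school_district_boundaries>
    <ogr:geometryProperty><gml:Polygon srsName="EPSG:4326">
      <gml:outerBoundaryIs><gml:LinearRing>
        <gml:coordinates>-105.5,35.8 -105.3,35.8 -105.3,35.9 -105.5,35.8</gml:coordinates>
      </gml:LinearRing></gml:outerBoundaryIs>
    </gml:Polygon></ogr:geometryProperty>
    <ogr:COUNTY>San Miguel</ogr:COUNTY><ogr:CNTY_CODE>47</ogr:CNTY_CODE>
    <ogr:DIST_CODE>69</ogr:DIST_CODE><ogr:NAMEPROPER>Las Vegas City</ogr:NAMEPROPER>
    <ogr:Shape_Area>3283502950.87</ogr:Shape_Area>
  </ogr:g_school_district_boundaries></gml:featureMember>
</gml:FeatureCollection>"""


def field_text(feature, tag):
    """Return the text of the first <ogr:tag> under `feature`, or None."""
    node = feature.find(f".//ogr:{tag}", NS)
    return node.text if node is not None else None


def ring_to_wkt(coord_text):
    """Convert GML 'lon,lat lon,lat ...' text into a WKT POLYGON string."""
    points = [pair.split(",")[:2] for pair in coord_text.split()]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in points)
    return f"POLYGON(({ring}))"


def parse_features(root):
    """Build one record (dict) per <gml:featureMember> child of `root`."""
    records = []
    for feature in root.findall("gml:featureMember", NS):
        coords = feature.find(
            ".//gml:outerBoundaryIs/gml:LinearRing/gml:coordinates", NS
        )
        records.append({
            "wkt": ring_to_wkt(coords.text),
            "county": field_text(feature, "COUNTY"),
            "county_nbr": field_text(feature, "CNTY_CODE"),
            "dist_code": field_text(feature, "DIST_CODE"),
            "name_proper": field_text(feature, "NAMEPROPER"),
            "shape_area": field_text(feature, "Shape_Area"),
        })
    return records


records = parse_features(ET.fromstring(SAMPLE))
print(records[0]["name_proper"], records[0]["wkt"])
```

For the staged file, `ET.parse(GEODATA_FILE).getroot()` would take the place of `ET.fromstring(SAMPLE)`.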



### Task: Define your table, geometry column, and indexing

Write your SQL statements below, then copy and paste them into the database command line in your terminal.

Note that you should end up with a **coords** column that is a SRID=4326 POLYGON of 2-D (Lon,Lat).
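
As a starting point, the table definition could look something like the sketch below, which builds the SQL as Python strings so it can be printed and pasted. The column names follow the check query later in this exercise; the column types, the index name, and the `build_ddl` helper are assumptions, not the required answer.

```python
def build_ddl(schema):
    """Return CREATE TABLE and CREATE INDEX statements for the districts table.

    Hypothetical helper; column types and the index name are assumptions.
    """
    table = f"{schema}.new_mexico_school_districts"
    create_table = f"""CREATE TABLE {table} (
    id          SERIAL PRIMARY KEY,
    county      VARCHAR(64),
    county_nbr  INTEGER,
    dist_code   INTEGER,
    name_proper VARCHAR(128),
    shape_area  REAL,
    coords      geometry(POLYGON, 4326)  -- 2-D (Lon,Lat), SRID 4326
);"""
    # A GiST index supports spatial predicates on the geometry column.
    create_index = (
        f"CREATE INDEX new_mexico_school_districts_coords_gix "
        f"ON {table} USING GIST (coords);"
    )
    return [create_table, create_index]


# Replace SSO with your SSO before pasting the printed statements.
for statement in build_ddl("SSO"):
    print(statement)
```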

### Task: Load the table with geospatial data!
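
A minimal sketch of the load step, assuming each parsed record is a dict with the hypothetical keys `county`, `county_nbr`, `dist_code`, `name_proper`, `shape_area`, and `wkt` (a WKT polygon string), and that `connection` is an open `psycopg2` connection. `load_districts` is a hypothetical helper name. PostGIS's `ST_GeomFromText` converts the WKT text into a geometry with the given SRID.

```python
INSERT_SQL = """
INSERT INTO {table}
    (county, county_nbr, dist_code, name_proper, shape_area, coords)
VALUES (%s, %s, %s, %s, %s, ST_GeomFromText(%s, 4326))
"""


def load_districts(connection, schema, records):
    """Insert parsed records and return how many rows were sent.

    `schema` is interpolated into the SQL text, so it must be a trusted
    value (for example, your own SSO); the data values are passed as
    query parameters.
    """
    sql = INSERT_SQL.format(table=f"{schema}.new_mexico_school_districts")
    rows = [
        (
            r["county"],
            int(r["county_nbr"]),
            int(r["dist_code"]),
            r["name_proper"],
            float(r["shape_area"]),
            r["wkt"],
        )
        for r in records
    ]
    with connection.cursor() as cursor:
        cursor.executemany(sql, rows)
    connection.commit()
    return len(rows)
```

Passing the values as parameters lets the driver handle quoting, which matters for names such as `T or C` or any value containing an apostrophe.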

In [None]:
## M2:E1:Cell03
## ----- Add Connection Setup Code Below -----------

import getpass
import psycopg2
mypasswd = getpass.getpass()
connection = psycopg2.connect(database = ### ?
                              user = ### ? 
                              host = ### ?
                              password = ### ?
                             )
del mypasswd

In [None]:
## M2:E1:Cell04
## ----- Add Code Below -----------
















### Check the data exists!

 * Replace SSO with your SSO
 
```SQL
select count(*) 
 from SSO.new_mexico_school_districts;
 count 
-------
    89
(1 row)
```


```SQL
select id,county,county_nbr,dist_code
      ,name_proper,shape_area,st_area(coords) 
 from SSO.new_mexico_school_districts 
 limit 10;
 id |   county   | county_nbr | dist_code |  name_proper   | shape_area  |       st_area        
----+------------+------------+-----------+----------------+-------------+----------------------
  1 | San Miguel |         47 |        69 | Las Vegas City |  3.2835e+09 |    0.327119394040282
  2 | Colfax     |          7 |         8 | Cimarron       | 3.72951e+09 |    0.376029998974695
  3 | Union      |         59 |        84 | Clayton        |  6.8283e+09 |     0.68589709019756
  4 | Taos       |         55 |        76 | Taos           | 1.64788e+09 |    0.165643766421673
  5 | Mora       |         33 |        44 | Mora           | 1.92629e+09 |    0.192729991835299
  6 | Sierra     |         51 |        73 | T or C         | 1.09801e+10 |     1.06067417213241
  7 | Rio Arriba |         39 |        54 | Dulce          | 3.56663e+09 |    0.359921591946156
  8 | Farmington |         45 |        64 | Aztec          | 1.12842e+09 | 8.73823529385122e-05
  9 | Farmington |         45 |        65 | Farmington     | 2.03918e+09 |    0.205177989999736
 10 | San Juan   |         45 |        67 | Central        | 7.30004e+09 |    0.732714906827257
(10 rows)
```

### Task: Query the database to answer the following questions:

#### Q1: List the counties with more than 3 school districts, in descending order by number of school districts, then in alphabetic order within a group of the same number of school districts.

#### Q2: List the top 5 school districts, in descending order by number of populated places.

#### Q3 (Optional): List the top 3 counties based on total size of school districts, in descending order by size.  List the size in square kilometers!

Expected Answer:

```SQL
  county  |      sqr_km      
----------+------------------
 Catron   | 19532.8696952213
 Otero    |  16983.887376987
 McKinley |  14471.088162269
(3 rows)
```
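
On units: `st_area(coords)` on an SRID 4326 geometry is computed in square degrees, which is why the `st_area` column in the check output above shows small fractional values. One possible way to get square kilometers is to cast to PostGIS's `geography` type, for which `ST_Area` returns square meters, and divide by 1e6. The sketch below builds such a query as a string. The helper name is hypothetical, and this approach is not guaranteed to reproduce the expected numbers digit for digit.

```python
def area_by_county_query(schema, limit=3):
    """Return SQL that totals district area per county in square kilometers.

    Hypothetical helper; assumes the `coords` column is SRID 4326.
    """
    return f"""
SELECT county,
       SUM(ST_Area(coords::geography)) / 1e6 AS sqr_km  -- m^2 to km^2
  FROM {schema}.new_mexico_school_districts
 GROUP BY county
 ORDER BY sqr_km DESC
 LIMIT {int(limit)};
"""


# Replace SSO with your SSO before running the printed query.
print(area_by_county_query("SSO"))
```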

---

# File 2: Keyhole Markup Language (KML) format

 * http://gstore.unm.edu/apps/rgis/datasets/ab17adb4-0992-436b-8ae4-575d8405d188/gpsrdsddshp.derived.kml
 

This file is a KML formatted file of GPS coordinates of roads in New Mexico.
Read more [here](https://catalog.data.gov/dataset/gps-roads).

KML is a file format used to display geographic data in an Earth browser such as Google Earth. 
KML uses a tag-based structure with nested elements and attributes and is based on the XML standard. 
All tags are case-sensitive and must appear exactly as they are listed in the KML Reference. 

Read about KML [here](https://developers.google.com/kml/documentation/kml_tut)

Acquire and stage the KML into the `../temp/` folder as you did in the practices.

## Processing 

KML is similar in nature to GML.  
So, for this last file, you will repeat the exercise above with the changes necessary to import KML instead of GML and to load the data into a new table.

The elements we will process are **`Placemark`**s.
These Placemarks have **`LineString`** geometries with a **`coordinates`** list.
```
<Placemark id="17948705"><name>17948705</name>
   <LineString><coordinates>-107.915138244628793,36.809299468994105 -107.915 ... </coordinates></LineString>
   <ExtendedData><SchemaData schemaUrl="#attributes">
     <SimpleData name="NAME">I 40</SimpleData>
     <SimpleData name="TYPE">State Highway</SimpleData>
...
</Placemark>
```
  * There are numerous additional fields; however, we will limit our parsing and loading to the Name and Type.

Programmatically, we must do the following:
 1. Parse file into a DOM
 1. Find all `<Placemark>`
 1. Parse each **`Placemark`** into a record with:
    * LineString
    * Name
    * Road Type
 1. Add each record into the database
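
The Placemark steps above can be sketched with `xml.etree.ElementTree`. Assumptions: the helper name and record keys are hypothetical, the inline `SAMPLE` is a shortened, made-up Placemark, and the `{*}` wildcard (Python 3.8+) is used so the lookup works whether or not the file declares a KML namespace.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative document (coordinates are not real data).
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark id="17948705"><name>17948705</name>
  <LineString><coordinates>-107.915,36.809 -107.916,36.810</coordinates></LineString>
  <ExtendedData><SchemaData schemaUrl="#attributes">
    <SimpleData name="NAME">I 40</SimpleData>
    <SimpleData name="TYPE">State Highway</SimpleData>
  </SchemaData></ExtendedData>
</Placemark>
</Document></kml>"""


def parse_placemarks(root):
    """Build one record (dict) per Placemark found anywhere under `root`."""
    records = []
    for placemark in root.findall(".//{*}Placemark"):
        coords = placemark.find(".//{*}LineString/{*}coordinates")
        # KML tuples are lon,lat[,alt]; keep only lon and lat for 2-D WKT.
        points = [t.split(",")[:2] for t in coords.text.split()]
        wkt = "LINESTRING({})".format(
            ", ".join(f"{lon} {lat}" for lon, lat in points)
        )
        # Collect the SimpleData fields by their `name` attribute.
        attrs = {
            node.get("name"): node.text
            for node in placemark.findall(".//{*}SimpleData")
        }
        records.append({
            "name": attrs.get("NAME"),
            "road_type": attrs.get("TYPE"),
            "wkt": wkt,
        })
    return records


records = parse_placemarks(ET.fromstring(SAMPLE))
print(records[0])
```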


### Task: Define your table, geometry column, and indexing

Write your SQL statements below, then copy and paste them into the database command line in your terminal.

Note that you should end up with a **coords** column that is a SRID=4326 LINESTRING of 2-D (Lon,Lat).

### Task: Load the table with geospatial data!

 * **NOTE:** You may need to rerun the DB Connection Cell Above

In [None]:
## M2:E1:Cell08
## ----- Add Code Below -----------

GEODATA_FILE = '../temp/gpsrdsddshp.derived.kml'












### Check the data exists!

 * Replace SSO with your SSO
 
```SQL
select count(*) 
 from SSO.new_mexico_roads;
 count 
-------
 11299
(1 row)
```

### Task: Query the database to answer a couple of questions:

#### Q1: What is the total kilometers of each road type in New Mexico?

#### Q2: Which school district has the most interstate roadway?

### Task: Use GeoPandas to pull the road data from PostGIS, plot it, and then write it out as an ESRI Shapefile

 1. Write the query
 1. Use GeoPandas to pull straight into the GeoDataFrame
 1. Plot the roads
 1. Save to `../temp/roads.shp` as ESRI Shapefile
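
The four steps above might be sketched as below. The column names `name` and `road_type` and the helper `pull_plot_save` are assumptions; adjust them to match your own table. `geopandas.read_postgis` needs to be told which column holds the geometry via `geom_col`.

```python
# Assumed column names; adjust to match your table definition.
ROADS_QUERY = "SELECT name, road_type, coords FROM {schema}.new_mexico_roads"


def pull_plot_save(connection, schema, out_path="../temp/roads.shp"):
    """Read the roads table into a GeoDataFrame, plot it, and save it.

    Hypothetical helper; `connection` is an open database connection.
    """
    # Imported here so this sketch can be loaded without GeoPandas installed.
    import geopandas as gpd

    gdf = gpd.read_postgis(
        ROADS_QUERY.format(schema=schema), connection, geom_col="coords"
    )
    gdf.plot()
    gdf.to_file(out_path, driver="ESRI Shapefile")
    return gdf
```

Shapefile attribute names are limited to 10 characters, so short column names avoid truncation when saving.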

In [None]:
## M2:E1:Cell11
## ----- Add Pull and Plot from PostGIS Code Below -----------










In [None]:
## M2:E1:Cell12
## ----- Add Save to Shapefile Code Below -----------






# Save Your Notebook
## Then Notebook Menu: File > Close and Halt

### Additional Resources
 * https://geohackweek.github.io/vector/03-encodings-libraries/