## Importing Data Notebook

Code sources:
- https://github.com/dlab-berkeley/Geospatial-Fundamentals-in-Python/blob/master/Geopandas_Intro_F2019_GC.ipynb
- https://rasterio.readthedocs.io/en/stable/

### Working with Geopandas (Mac)

Geopandas works for vector data. For raster data, use rasterio (see next section).

First, install Homebrew so that the `!brew install spatialindex` command works. `spatialindex` is a dependency of `rtree`, which is a dependency of `geopandas`. To install Homebrew, open a new Terminal window and enter the following command. Type it without a leading `!`; that prefix is only needed when running a shell command from a notebook cell.

`/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`

When it prompts for your password, type your computer's password and press Enter. The password will not appear on screen as you type, but it is still being entered.

Install libraries:

In [None]:
# os and zipfile ship with Python's standard library, so they need no installation
# Install wget
!pip install wget
# Install pysal
!pip install pysal
# Install mapclassify
!pip install mapclassify

# Install Geopandas dependencies
!pip install fiona 
!pip install shapely 
!pip install pyproj 
!brew install spatialindex # dependency of rtree
!pip install rtree

# Install Geopandas
!pip install geopandas
# Install descartes - Geopandas requirement
!pip install descartes

Geopandas can also be installed directly from GitHub. This still requires installing the geopandas dependencies separately (`pandas`, `fiona`, `shapely`, `pyproj`, `rtree`).

To install directly from GitHub: `!pip install git+https://github.com/geopandas/geopandas.git`
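
Before moving on to the imports, it can help to confirm which packages the current kernel can actually find. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the imports used in this notebook.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level modules imported later in this notebook
required = ["os", "zipfile", "wget", "pandas", "fiona", "shapely",
            "pyproj", "rtree", "geopandas", "mapclassify", "matplotlib"]

print("Missing:", missing_modules(required))
```

Any name printed here would need to be installed (or the kernel restarted after installation) before the import cell will run.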

Import libraries:

In [None]:
import os
import zipfile
import wget

# geopandas dependencies
import pandas as pd
import fiona 
import shapely
import pyproj
import rtree

import geopandas as gpd
import mapclassify
import matplotlib.pyplot as plt
from shapely.geometry import Point
from matplotlib import pyplot

Fetch data files with `wget`

In [None]:
# Project root on the author's (Windows) machine; change this to your own local path
master = 'C:\\Users\\theaa\\Desktop\\Data Science Pedagogy Resources\\Python\\Human_Mobility_Project\\Human_Mobility_Project'

os.chdir(master)

In [None]:
myfiles = ('TravelTime_50k.zip', 
           'World_Country_Borders.zip', 
           'PovMap_Global_Infant_Mortality.zip', 
           'val_prod.zip')

In [None]:
data_path = 'data/raw_data'

In [None]:
# Use the /raw/ URL: the /blob/ URL returns GitHub's HTML page rather than the file itself
prefix = 'https://github.com/AaronScherf/Human_Mobility_Project/raw/master/data/raw_data/'

for f in myfiles:
    wget.download(prefix+f, out = data_path)

The archive names are stored in the tuple `myfiles`, and each download is saved into `data_path`.

Unzip the data files

In [None]:
# for f in myfiles: # works for Macs but not Windows
#  print("Unzipping: ", f)
#  !unzip {f}

In [None]:
for f in myfiles:
    with zipfile.ZipFile(data_path+'/'+f, 'r') as zip_ref:
        zip_ref.extractall(data_path)
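
The extraction loop above assumes every download produced a valid archive. A failed download (for example, an HTML page saved under a `.zip` name) would raise an error partway through. The sketch below shows one way to check each file with `zipfile.is_zipfile` first; the helper name `extract_valid_archives` and the demo file names are hypothetical.

```python
import os
import tempfile
import zipfile

def extract_valid_archives(filenames, folder):
    """Extract each valid zip archive in `folder`; return the names that were skipped."""
    skipped = []
    for name in filenames:
        path = os.path.join(folder, name)
        # is_zipfile returns False for missing files and for non-zip content
        if not zipfile.is_zipfile(path):
            skipped.append(name)
            continue
        with zipfile.ZipFile(path, "r") as zip_ref:
            zip_ref.extractall(folder)
    return skipped

# Demonstration in a throwaway folder with one good and one bad archive
demo_dir = tempfile.mkdtemp()
with zipfile.ZipFile(os.path.join(demo_dir, "good.zip"), "w") as zf:
    zf.writestr("hello.txt", "hello")
with open(os.path.join(demo_dir, "bad.zip"), "w") as fh:
    fh.write("<html>not a zip</html>")

skipped = extract_valid_archives(["good.zip", "bad.zip", "absent.zip"], demo_dir)
print(skipped)  # ['bad.zip', 'absent.zip']
```

In this notebook the equivalent call would be `extract_valid_archives(myfiles, data_path)`, and any name it returns points to a download worth retrying.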

### Working with raster data

References: 
- https://rasterio.readthedocs.io/en/stable/
- https://rasterio.readthedocs.io/en/stable/quickstart.html#reading-raster-data

Note that geopandas only works for vector data.

In [None]:
# !pip install rasterio ## Works for Macs but not Windows

In [None]:
conda config --add channels conda-forge

In [None]:
conda install rasterio

In [None]:
import rasterio
import rasterio.features
import rasterio.warp

## Step 1: Import our data

Import the shapefile for country borders. Steps needed:
1. Import shapefile (done)
2. Change the bounds to only include southern Africa (to do)
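
For the second step, one option is the GeoDataFrame `.cx` coordinate indexer, which selects features that intersect a bounding box. The sketch below uses a toy GeoDataFrame instead of the real shapefile. The longitude/latitude limits for southern Africa are illustrative assumptions, not values from the source.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for country_borders: three square "countries" in lon/lat degrees
toy_borders = gpd.GeoDataFrame(
    {"name": ["south", "north", "west"]},
    geometry=[
        box(20, -30, 25, -25),
        box(20, 40, 25, 45),
        box(-80, -30, -75, -25),
    ],
    crs="EPSG:4326",
)

# Assumed bounding box for southern Africa (illustrative values only)
min_lon, max_lon = 10, 42
min_lat, max_lat = -36, -8

# .cx[xmin:xmax, ymin:ymax] keeps features whose geometry intersects the box
southern = toy_borders.cx[min_lon:max_lon, min_lat:max_lat]
print(list(southern["name"]))  # ['south']
```

Note that `.cx` keeps whole features that touch the box. `geopandas.clip` would instead cut geometries at the box edge, which may be preferable if the plot should stop exactly at the bounds.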

In [None]:
# read in borders shapefile
country_borders = gpd.read_file(data_path+'/'+
                                'World_Country_Borders/ne_50m_admin_0_countries.shp') 
# visualize country borders
country_borders.plot()

Read in the data for variables to use in the principal component analysis.

In [None]:
# travel time to nearest city with population of at least 50,000
travel_time = rasterio.open(data_path+'/'+'TravelTime_50k/Traveltime_50k.tif')
infant_mort = rasterio.open(data_path+'/'+'PovMap_Global_Infant_Mortality/povmap_global_subnational_infant_mortality_rates_v2.tif')
val_prod = rasterio.open(data_path+'/'+'val_prod.tif')
val_prod_per_hect = rasterio.open(data_path+'/'+'val_prod_per_ha.tif')

Next step: use dataset.transform to map pixel locations. See https://rasterio.readthedocs.io/en/stable/quickstart.html#reading-raster-data
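
To make the idea concrete, the sketch below reproduces the affine arithmetic in plain Python. A raster transform has six coefficients `(a, b, c, d, e, f)` such that `x = a*col + b*row + c` and `y = d*col + e*row + f`. The function name `pixel_to_coords` and the example grid are hypothetical. With an open rasterio dataset, `dataset.transform * (col, row)` gives the pixel's upper-left corner and `dataset.xy(row, col)` gives its center.

```python
def pixel_to_coords(transform, row, col, center=True):
    """Map a (row, col) pixel index to (x, y) using affine coefficients (a, b, c, d, e, f)."""
    a, b, c, d, e, f = transform
    # Shift by half a pixel to address the pixel center instead of its upper-left corner
    offset = 0.5 if center else 0.0
    col_f, row_f = col + offset, row + offset
    x = a * col_f + b * row_f + c
    y = d * col_f + e * row_f + f
    return x, y

# Hypothetical north-up grid: 0.5 degree pixels, upper-left corner at (-180, 90)
example_transform = (0.5, 0.0, -180.0, 0.0, -0.5, 90.0)

print(pixel_to_coords(example_transform, 0, 0, center=False))  # (-180.0, 90.0)
print(pixel_to_coords(example_transform, 0, 0))                # (-179.75, 89.75)
```

The negative `e` coefficient reflects the usual convention that row numbers increase downward while the y coordinate increases upward.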

In [None]:

with rasterio.open(data_path+'/'+'TravelTime_50k/Traveltime_50k.tif') as data_set:

    # Read the dataset's valid data mask as a ndarray.
    mask = data_set.dataset_mask()

    # Extract feature shapes and values from the array.
    for geom, val in rasterio.features.shapes(
            mask, transform=data_set.transform):

        # Transform shapes from the dataset's own coordinate
        # reference system to CRS84 (EPSG:4326).
        geom = rasterio.warp.transform_geom(
            data_set.crs, 'EPSG:4326', geom, precision=6)

        # Print GeoJSON shapes to stdout.
        print(geom)

In [None]:
pyplot.imshow(travel_time.read(1), cmap='pink')

pyplot.show()

In [None]:
pyplot.imshow(infant_mort.read(1), cmap='pink')

pyplot.show()

In [None]:
pyplot.imshow(val_prod.read(1), cmap='pink')

pyplot.show()
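
If a plot looks washed out, a likely cause is that the raster's nodata value (often a large negative number) stretches the color scale. The sketch below shows the masking idea with NumPy alone. The array and the `-9999` nodata value are hypothetical. With rasterio, `dataset.read(1, masked=True)` returns a masked array directly, and `dataset.nodata` reports the file's own nodata value.

```python
import numpy as np

# Hypothetical 2x3 band where -9999 marks missing cells (the real nodata value may differ)
band = np.array([[1.0, 2.0, -9999.0],
                 [4.0, -9999.0, 6.0]])
nodata = -9999.0

# Mask the nodata cells so they are ignored in statistics and left blank by imshow
masked_band = np.ma.masked_equal(band, nodata)

print(masked_band.min(), masked_band.max())  # 1.0 6.0
```

Passing `masked_band` to `pyplot.imshow` in place of the raw band would scale the colormap to the valid values only.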