<center>
<table>
  <tr>
    <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Courses</font></h1>
</center>

---

<center><h1><font color="red" size="+3">Introduction to GeoPandas</font></h1></center>

## Reference Documents

- [GeoPandas User Guide](https://geopandas.org/en/stable/docs/user_guide.html)
- [GeoPandas Tutorial](http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/geopandas.html)
- [Analyze Geospatial Data in Python: GeoPandas and Shapely](https://www.learndatasci.com/tutorials/geospatial-data-python-geopandas-shapely/)
- [GeoPandas Tutorial: An Introduction to Geospatial Analysis](https://www.datacamp.com/tutorial/geopandas-tutorial-geospatial-analysis)
- [Introduction to geospatial data analysis with GeoPandas and the PyData stack](https://github.com/jorisvandenbossche/geopandas-tutorial)
- [GeoPandas Projections](https://geopandas.org/en/stable/docs/user_guide/projections.html)
- [Introduction to Python GIS: Map Projection](https://automating-gis-processes.github.io/CSC/notebooks/L2/projections.html)
- [Understanding a CRS: Proj4 and CRS codes](https://pygis.io/docs/d_understand_crs_codes.html)

_______

# <font color="red"> Objectives</font>

In this class, we want to accomplish the following:

- Provide an overview of GeoPandas and its main data structures.
- Learn how to create and manipulate a GeoDataFrame.
- Learn how to read different types of data files and perform analyses and visualizations.

# <font color="red"> What is GeoPandas? </font>

- A Python library for processing geospatial data (such as shapefiles) in tabular form (like Pandas), where every row is associated with a geometry.
- Designed to primarily work with vector data.
- Provides access to many spatial functions for applying geometries, plotting maps, and geocoding. 
- Extends the capabilities of Pandas to enable spatial operations. 
- Includes new data types such as `GeoDataFrame` and `GeoSeries`, which are subclasses of the Pandas DataFrame and Series and enable efficient vector data processing in Python. 
- Is built on top of the following libraries that allow it to be spatially aware:
  - `Shapely`: geometric operations (e.g., buffer, intersection)
  - `PyProj`: working with projections
  - `Fiona`: file input (reading) and output (writing).

Geospatial data describe objects or features on Earth's surface. Common questions they help answer include:

- Which area will be hit hardest by a hurricane?
- How does ice cap melting relate to carbon emissions?
- Which areas will be at the highest risk of fires?

# <font color="red"> GeoPandas Data Structure </font>

GeoPandas implements two main data structures:
- GeoSeries
- GeoDataFrame. 

These are subclasses of Pandas Series and Pandas DataFrame, respectively.
This means that we can use all our Pandas skills also when working with GeoPandas.

The main difference between GeoDataFrames and Pandas DataFrames is that a GeoDataFrame should contain one column for geometries.
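
Because both classes subclass their Pandas counterparts, ordinary Pandas operations keep working on them. The following is a minimal sketch with made-up data (the column names and coordinates are illustrative assumptions, not part of the course datasets):

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical two-row table; values are for illustration only.
demo_gdf = gpd.GeoDataFrame(
    {"label": ["a", "b"], "value": [10, 20]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# A GeoDataFrame is a DataFrame, and its geometry column is a (Geo)Series.
is_dataframe = isinstance(demo_gdf, pd.DataFrame)
is_series = isinstance(demo_gdf.geometry, pd.Series)

# Regular Pandas filtering and aggregation work unchanged.
big_rows = demo_gdf[demo_gdf["value"] > 15]
total = demo_gdf["value"].sum()
print(is_dataframe, is_series, len(big_rows), total)
```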

## <font color="blue">GeoSeries</font>
- A vector where each entry in the vector is a set of shapes corresponding to one observation. 
- An entry may consist of only one shape (like a single polygon) or multiple shapes that are meant to be thought of as one observation (like the many polygons that make up the State of Hawaii or a country like Indonesia).

#### Coordinate Reference System

Every GeoSeries comes with associated [Coordinate Reference System (CRS)](https://pygis.io/docs/d_crs_what_is_it.html#d-crs-what-is-it) information.
- A coordinate-based local, regional or global system used to locate geographical objects.
- Tells GeoPandas where and how to place coordinates on Earth’s surface.
- Is critical for spatial analysis.
- There are two main categories of CRS:
   - **Geographic coordinates**. They define a global position in degrees of latitude and longitude relative to the equator and the prime meridian.
   - **Projected coordinates.** Moving from the three-dimensional Earth to a two-dimensional map inevitably introduces some distortions. We need different approaches for creating projected coordinates.
- If you plan to use two GeoPandas objects (for analysis and visualization) it is important to redefine (or reproject) the CRS to be identical in both objects.
- Choosing an appropriate projection for your map depends on what you want to represent and on the spatial scale of your data.
   - There is no “perfect projection”: each has strengths and weaknesses, so choose the one that best fits your needs.
   - Any choice of CRS involves a tradeoff that distorts one or all of the following: shape, scale/distance, and area.
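
To see why geographic coordinates cannot be treated as distances, consider how the east-west length of one degree of longitude shrinks toward the poles. This is a rough sketch that assumes a spherical Earth with a mean radius of 6371 km (the function name is hypothetical):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def km_per_degree_longitude(latitude_deg):
    """Approximate east-west length of one degree of longitude at a latitude."""
    circumference = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))
    return circumference / 360.0

for lat in (0, 45, 60, 80):
    print(f"latitude {lat:>2}: {km_per_degree_longitude(lat):6.1f} km per degree")
```

At 60 degrees of latitude a degree of longitude covers half the ground distance it covers at the equator, which is why areas and distances should be computed in a suitable projected CRS.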
   


#### Choosing CRS

When choosing the CRS, it is important to keep the following in mind:

- Mixing coordinate systems: When combining datasets, the spatial objects must have the same reference system.
    - You need to convert everything to the same CRS. 
- Calculating areas: Use an equal-area CRS before measuring a shape's area.
- Calculating distances: Use an equidistant CRS when calculating distances between objects.


The [European Petroleum Survey Group (EPSG)](https://epsg.io) compiled and disseminated CRSs for different locations around the globe. They are used in GeoPandas to facilitate visualization and analyses.

#### Attributes and Methods for GeoSeries
The GeoSeries class implements nearly all of the attributes and methods of Shapely objects. When applied to a GeoSeries, they will apply elementwise to all geometries in the series.

Some important attributes are:
- `area`: shape area (units of projection)
- `bounds`: min and max coordinates on each axis for each shape (`minx`, `miny`, `maxx`, `maxy`)
- `total_bounds`: min and max coordinates on each axis for the entire GeoSeries
- `geom_type`: type of geometry.
- `is_valid`: tests whether the coordinates make a reasonable geometric shape.

Some basic methods are:
- `distance()`: returns a Series with the minimum distance from each entry to `other`
- `centroid`: (an attribute, not a method) returns a GeoSeries of centroids
- `representative_point()`: returns GeoSeries of points that are guaranteed to be within each geometry. It does NOT return centroids.
- `to_crs()`: change coordinate reference system.
- `plot()`: plot GeoSeries.
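
A small sketch of these attributes and methods on two hand-made squares (the shapes are illustrative; no CRS is set, so the units are arbitrary):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

shapes = gpd.GeoSeries([
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),   # 2 x 2 square
    Polygon([(3, 0), (4, 0), (4, 1), (3, 1)]),   # 1 x 1 square
])

areas = shapes.area                     # one value per geometry
per_shape_bounds = shapes.bounds        # DataFrame: minx, miny, maxx, maxy
overall_bounds = shapes.total_bounds    # array: minx, miny, maxx, maxy
centers = shapes.centroid               # GeoSeries of Points
dist_to_origin = shapes.distance(Point(0, 0))  # minimum distance to a point

print(areas.tolist())
print(overall_bounds)
print(dist_to_origin.tolist())
```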

## <font color="blue">GeoDataFrame</font>
- A tabular data structure that contains a GeoSeries.
- __It always has one GeoSeries column__ that holds a special status. 
    - This GeoSeries is referred to as the GeoDataFrame’s “geometry”. 
- When a spatial method is applied to a GeoDataFrame (or a spatial attribute like `area` is called), the command will always act on the “geometry” column.
- The geometry column defines a point, line, or polygon associated with the rest of the columns. This column is a collection of shapely objects. Whatever you can do with shapely objects, you can also do with the geometry object.
- The Coordinate Reference System (CRS) of the geometry column tells us where a point, line, or polygon lies on the Earth's surface. GeoPandas uses it to map each geometry onto the Earth's surface.
- The “geometry” column – no matter its name – can be accessed through the geometry attribute (`gdf.geometry`), and the name of the `geometry` column can be found by typing `gdf.geometry.name`.

A GeoDataFrame may also contain other columns with geometrical (Shapely) objects, but only one column can be the active geometry at a time.
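
The idea of an active geometry can be sketched with `set_geometry()`, which returns a copy whose active geometry is a different column. The data and the column name `center` are illustrative assumptions:

```python
import geopandas as gpd
from shapely.geometry import Polygon

squares = gpd.GeoDataFrame(
    {"label": ["unit square"]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
)
# Store a second geometry column next to the active one.
squares["center"] = squares.centroid

active_before = squares.geometry.name    # name of the active geometry column
area_before = squares.area.iloc[0]       # area of the polygon

# Switch the active geometry; spatial attributes now act on "center".
by_center = squares.set_geometry("center")
active_after = by_center.geometry.name
area_after = by_center.area.iloc[0]      # points have zero area

print(active_before, area_before, active_after, area_after)
```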

![fig_frame](https://geopandas.org/en/stable/_images/dataframe.svg)
Image Source: [GeoPandas](https://geopandas.org/en/stable/getting_started/introduction.html)

---

## Required Packages

- __Matplotlib__: for basic plots.
- __Pandas__: Manipulation and exploratory data analysis of tabular data.
- __Shapely__: For manipulation and analysis of planar geometric objects
- __GeoPandas__: Combines the capabilities of Pandas and Shapely for geospatial operations.

----

### <font color="red">Uncomment and run the cell below only if in Google Colab</font>

In [None]:
#!sudo apt-get update && apt-get install -y libspatialindex-dev
#!pip install rtree
#!pip install geopandas
#!pip install mapclassify

----

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
from zipfile import ZipFile
from pathlib import Path
import requests

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.ticker as mticker
from mpl_toolkits.axes_grid1.axes_divider import make_axes_locatable

In [None]:
import pandas as pd

In [None]:
import geopandas as gpd
from geopandas.tools import geocode

In [None]:
from shapely import geometry as shpgeom
from shapely import wkt as shpwkt

In [None]:
# supports geospatial join
import rtree 

In [None]:
# visualize all columns in dataframe
pd.set_option('display.max_columns', None) 

In [None]:
print(f"Version of Pandas:    {pd.__version__}")
print(f"Version of GeoPandas: {gpd.__version__}")

# <font color="red"> Creating GeoDataFrame </font>

- We start with a Pandas DataFrame that has latitude and longitude coordinates as columns representing locations of cities.
- We perform transformations to create a GeoPandas GeoDataFrame that includes the "geometry" column (representing points).

[Mapping in Python](https://cybergisxhub.cigi.illinois.edu/notebook/spatial-data-exploration-and-visualization-on-google-colab/)

In [None]:
cities = ['Paris', 'New York', 'Mumbai', 'Tokyo', 
          'Moscow', 'Mexico City', 'Sao Paulo', 'Yaounde', 
          'Vancouver', 'Sydney', 'Harare']
countries = ['France', 'USA', 'India', 'Japan', 
             'Russia', 'Mexico', 'Brazil', 'Cameroon', 
             'Canada', 'Australia', 'Zimbabwe']
longitudes = [2.25, -73.92, 72.83, 139.69, 37.36, -99.13, 
              -46.63, 11.50, -123.08, 151.20, 31.0]
latitudes = [48.85, 40.69, 19.08, 35.68, 55.45, 19.43,
             -23.55, 3.84, 49.32, -33.87, -18.0]

cities_df = pd.DataFrame({
    'City': cities,
    'Country': countries,
    'Longitude': longitudes,
    'Latitude': latitudes
})

cities_df

#### We zip the `Longitude` and `Latitude` columns together (x first, then y) to create a new column named `Coordinates`.

In [None]:
cities_df["Coordinates"] = list(zip(cities_df.Longitude, cities_df.Latitude))
cities_df

In [None]:
type(cities_df.Coordinates[0])

- We turn each `Coordinates` tuple into a Shapely `Point` object.
    - Apply Shapely’s `Point` constructor to the `Coordinates` column.

In [None]:
cities_df["Coordinates"] = cities_df["Coordinates"].apply(shpgeom.Point)
cities_df

In [None]:
type(cities_df.Coordinates[0])

- Finally, we will convert our DataFrame into a GeoDataFrame by calling the `geopandas.GeoDataFrame` constructor.
- GeoDataFrame is a data structure with the convenience of a normal DataFrame but also an understanding of how to plot maps.

>The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status. This GeoSeries is referred to as the GeoDataFrame’s “geometry”. When a spatial method is applied to a GeoDataFrame (or a spatial attribute like `area` is called), the command will always act on the “geometry” column.

In [None]:
cities_gdf = gpd.GeoDataFrame(cities_df, geometry="Coordinates")
cities_gdf.head()
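
The zip-and-apply steps above can also be condensed with `geopandas.points_from_xy()`, which builds the points directly from two coordinate columns. A minimal sketch reusing two of the cities (the variable names are illustrative):

```python
import pandas as pd
import geopandas as gpd

# Two of the cities from the table above, reused for illustration.
sample_df = pd.DataFrame({
    "City": ["Paris", "Tokyo"],
    "Longitude": [2.25, 139.69],
    "Latitude": [48.85, 35.68],
})

# points_from_xy takes x (longitude) first, then y (latitude).
sample_gdf = gpd.GeoDataFrame(
    sample_df,
    geometry=gpd.points_from_xy(sample_df["Longitude"], sample_df["Latitude"]),
)
print(sample_gdf)
```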

Does not look different than a vanilla Pandas DataFrame:

In [None]:
print(f'cities_gdf is of type: {type(cities_gdf)}')

How can we tell which column is the geometry column?

In [None]:
print(f'\nThe geometry column is: {cities_gdf.geometry.name}')

Plot the city locations:

In [None]:
cities_gdf.plot()

#### What is the CRS?

In [None]:
print(f"CRS:  {cities_gdf.crs}")

In [None]:
cities_gdf.crs is None

#### <font color="green"> If the CRS is not set in the GeoDataFrame, the default value is `None`.</font>
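
When the CRS is missing, it can be attached with `set_crs()` and then changed with `to_crs()`. The sketch below assumes the coordinates are longitude/latitude in WGS84 (EPSG:4326) and uses Web Mercator (EPSG:3857) only as an example target:

```python
import geopandas as gpd
from shapely.geometry import Point

pts = gpd.GeoSeries([Point(2.25, 48.85), Point(139.69, 35.68)])
crs_before = pts.crs                      # None: no CRS attached yet

# set_crs only labels the existing coordinates; it does not move them.
pts_wgs84 = pts.set_crs("EPSG:4326")

# to_crs reprojects the coordinates into another CRS (Web Mercator here).
pts_mercator = pts_wgs84.to_crs("EPSG:3857")

print(crs_before, pts_wgs84.crs.to_epsg(), pts_mercator.crs.to_epsg())
```

The distinction matters: `set_crs()` declares what the numbers already mean, while `to_crs()` computes new numbers.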

# <font color="red">Reading Files</font>

- `Fiona` can read and write real-world data using multi-layered GIS formats, zipped and in-memory virtual file systems, from files on your local machine or at remote online locations. 
- The GeoPandas function `read_file()` uses `Fiona` under the hood to read files (GeoPandas 1.0 and later default to `pyogrio` instead).
- The call of `read_file()` returns a GeoDataFrame object.

You can obtain the list of supported file formats:

In [None]:
import fiona
fiona.supported_drivers

# <font color="red">Application 1: Belgium Administrative Boundary</font>

- We use the `geopandas.read_file()` function to read a remote shapefile (as `zip` file)  containing geospatial data about the different provinces in Belgium. 
- After reading the file, we will obtain a GeoDataFrame that will be used for data manipulation and visualization.
- **This application is similar to what was done with Cartopy (same shapefile).**

In [None]:
url = "https://portal.nccs.nasa.gov/datashare/astg/training/python/cartopy/borders/BEL_shapefile.zip"

In [None]:
bel_provinces = gpd.read_file(url)
bel_provinces

In [None]:
bel_provinces.info()

We can quickly plot the geometry:

In [None]:
bel_provinces.plot()

We can randomly color the provinces:

In [None]:
import numpy as np
n = len(bel_provinces)
bel_provinces.plot(color=np.random.rand(n,3), edgecolor='k')

Determine the area of each province:

In [None]:
bel_provinces.area

Add the `area` column:

In [None]:
bel_provinces['area'] = bel_provinces.area
bel_provinces

Plot again the provinces weighted by the area:

In [None]:
bel_provinces.plot('area', 
                   edgecolor='black',
                   cmap='jet',
                   legend=True, 
                   figsize=(20,11)
                  );

#### We did not talk here about the CRS

In [None]:
bel_provinces.crs

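If the CRS is geographic (degrees), the areas computed above are in squared degrees rather than square meters. Reprojection is addressed in the next example.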

# <font color="red">Application 2: Exploring Washington, DC</font>

- We use the `geopandas.read_file()` function to read a GeoJSON file hosted on [GitHub](https://github.com/codeforgermany/click_that_hood/blob/main/public/data/washington.geojson) containing geospatial data about the different neighborhoods of the city of Washington, DC. 

#### The url pointing to the raw GeoJSON file

In [None]:
url = "https://raw.githubusercontent.com/codeforgermany/click_that_hood/main/public/data/washington.geojson"

#### Read the file

In [None]:
dc_neighborhoods = gpd.read_file(url)
dc_neighborhoods

- The GeoDataFrame resembles a traditional Pandas DataFrame.
- The `geometry` column can contain any type of vector data, such as points, lines, and polygons.

#### Quick plot of the first row geometry

In [None]:
dc_neighborhoods['name'][0]

In [None]:
dc_neighborhoods['geometry'][0]

We can extract the geometry object:

In [None]:
str(dc_neighborhoods['geometry'][0])

#### CRS information

GeoPandas requires that we know the geospatial reference system identifier.  
Here are some common ones:

- `EPSG:4326`: WGS84 Latitude/Longitude, used in GPS. 
- `EPSG:3857`: Web (Spherical) Mercator. Google Maps, OpenStreetMap, Bing Maps
- `EPSG:32633`: UTM zone 33 North (Universal Transverse Mercator); codes `EPSG:32601` to `EPSG:32660` cover the northern zones
- `EPSG:32733`: UTM zone 33 South (Universal Transverse Mercator); codes `EPSG:32701` to `EPSG:32760` cover the southern zones

In [None]:
dc_neighborhoods.crs

A CRS has the following components:

- **Datum** - The reference system, which in our case defines the starting point of measurement (Prime Meridian) and the model of the shape of the Earth (Ellipsoid). The most common Datum is WGS84.
- **Area of use** - In our case, the area of use is the whole world, but there are many CRS that are optimized for a particular area of interest.
- **Axes and Units** - Usually, longitude and latitude are measured in degrees. Units for x, y coordinates are often measured in meters.

We can inspect the CRS's `axis_info` attribute to get more information, in particular each axis name and axis unit.

In [None]:
dc_neighborhoods.crs.axis_info
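
The same inspection can be done on any EPSG code with `pyproj` (the library GeoPandas uses for CRS handling), without loading a dataset. This sketch compares a geographic and a projected CRS; the two codes are chosen only as examples:

```python
from pyproj import CRS

geographic = CRS.from_epsg(4326)   # WGS84 latitude/longitude
projected = CRS.from_epsg(3857)    # Web Mercator

geo_units = [axis.unit_name for axis in geographic.axis_info]
proj_units = [axis.unit_name for axis in projected.axis_info]

for crs, units in ((geographic, geo_units), (projected, proj_units)):
    print(crs.name, "| geographic:", crs.is_geographic, "| axis units:", units)
```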

- The district GeoDataFrame is associated with geographic coordinates (EPSG:4326), but we want to transform it into projected coordinates so we can make some calculations in meters. 
- We can transform it (using the `GeoDataFrame.to_crs()` method) to EPSG:9311, a projected CRS covering the continental US.

In [None]:
dc_neighborhoods.to_crs(epsg=9311, inplace=True)
dc_neighborhoods.crs

In [None]:
dc_neighborhoods.crs.axis_info

Note that the unit for each axis is `metre` (`meter`).

#### Manipulating the area of each neighborhood

- The area attribute returns the calculated area of a geometry. 

In [None]:
dc_neighborhoods.area

- We can save the resulting area (converted to $km^2$) in a new column.

In [None]:
dc_neighborhoods['area'] = dc_neighborhoods.area/10**6
dc_neighborhoods

In [None]:
dc_neighborhoods['area'].sum()

In [None]:
dc_neighborhoods['area'].hist()

#### Determining the centroid of each neighborhood

- The centroid attribute returns the center point of a geometry. 
- We can add it to our dataset in a new geometry column.

In [None]:
dc_neighborhoods['centroid'] = dc_neighborhoods.centroid
dc_neighborhoods

#### Determining the boundary

In [None]:
dc_neighborhoods['boundary'] = dc_neighborhoods.boundary
dc_neighborhoods

#### Distance to the White House

- We want to calculate the distance from the White House ($77.0365 W$ and $38.8977 N$) to the centroid of every neighborhood in Washington, DC.
- We add the distances (in kilometers) in a new column. 

We first create a point using the Shapely `Point()` constructor with the desired coordinates, convert it into a `GeoSeries` with the right CRS, and then use the `GeoSeries.distance()` method.

In [None]:
white_house = shpgeom.Point(-77.0365, 38.8977)
white_house = gpd.GeoSeries(white_house, crs=4326)
white_house = white_house.to_crs(epsg=9311)
# distance() is vectorized: it measures from every centroid to the single White House point (m -> km)
dc_neighborhoods['white_house_dist'] = dc_neighborhoods.centroid.distance(white_house.iloc[0]) / 10**3

In [None]:
dc_neighborhoods
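
As a sanity check on projected distances, a great-circle (haversine) distance can be computed directly from longitude and latitude. The sketch assumes a spherical Earth of radius 6371 km, and the second point (approximate US Capitol coordinates) is an illustrative assumption that does not come from the dataset:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two lon/lat points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

white_house_lonlat = (-77.0365, 38.8977)
capitol_lonlat = (-77.0091, 38.8899)  # approximate, assumed for illustration

dist_km = haversine_km(*white_house_lonlat, *capitol_lonlat)
print(f"White House to Capitol: {dist_km:.2f} km")
```

Small differences between this value and a projected distance are expected, since every projection distorts distances to some degree.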

#### Statistical analysis on the distance

In [None]:
farthest = dc_neighborhoods[dc_neighborhoods['white_house_dist']==dc_neighborhoods['white_house_dist'].max()]
farthest

In [None]:
closest = dc_neighborhoods[dc_neighborhoods['white_house_dist']==dc_neighborhoods['white_house_dist'].min()]
closest

In [None]:
stdv = dc_neighborhoods['white_house_dist'].std()
avg = dc_neighborhoods['white_house_dist'].mean()

In [None]:
farthest.white_house_dist.values[0]

In [None]:
print(f"Largest distance ({farthest.name.values[0]}) = {farthest.white_house_dist.values[0]:,.3f} km")
print(f"Smallest distance ({closest.name.values[0]}) = {closest.white_house_dist.values[0]:,.3f} km")
print(f"Distance STDV: {stdv:,.3f} km")
print(f"Average distance: {avg:,.3f} km")
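
An alternative way to locate the extreme rows is `idxmax()` and `idxmin()`, which return the index label of the largest and smallest value. The table below is hypothetical and only mimics the columns used above:

```python
import pandas as pd

# Hypothetical distances (km); names and values are for illustration only.
demo = pd.DataFrame({
    "name": ["North", "Center", "South"],
    "white_house_dist": [7.5, 0.4, 5.1],
})

# idxmax/idxmin return the index label of the extreme value.
farthest_row = demo.loc[demo["white_house_dist"].idxmax()]
closest_row = demo.loc[demo["white_house_dist"].idxmin()]

print(f"Largest distance ({farthest_row['name']}) = {farthest_row['white_house_dist']:,.3f} km")
print(f"Smallest distance ({closest_row['name']}) = {closest_row['white_house_dist']:,.3f} km")
```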

Do a `barh` plot:

In [None]:
dc_neighborhoods.sort_values('white_house_dist').plot.barh(x='name', 
                                                           y='white_house_dist',
                                                           figsize=(12,11));

#### Basic plot with GeoPandas

In [None]:
ax = dc_neighborhoods.plot(figsize=(10,6))

#### Plot with colors and legend

In [None]:
fig, ax = plt.subplots(1,1,figsize=(12,9))
dc_neighborhoods.plot(ax=ax,
                      column='name', 
                      edgecolor='black', 
                      legend=True,
                      legend_kwds={'bbox_to_anchor': (1.4, 1)})
ax.axis('off');

#### Include White House and centroids on the plot

In [None]:
fig, ax = plt.subplots(1,1,figsize=(12,9))
dc_neighborhoods.plot(ax=ax, column='name', 
                      alpha=0.5, legend=True,
                      legend_kwds={'bbox_to_anchor': (1.37, 1)})
dc_neighborhoods["centroid"].plot(ax=ax, color="green")
white_house.plot(ax=ax, color='black', marker='*', markersize=40)
plt.title('Map of Washington, DC')
plt.axis('off');

#### Saving the data into a `csv` file

In [None]:
dc_neighborhoods.to_csv("washington_dc.csv")

### <font color="green">Breakout</font>

Go to the webpage:

[https://github.com/codeforgermany/click_that_hood/tree/main/public/data](https://github.com/codeforgermany/click_that_hood/tree/main/public/data)

Select an arbitrary GeoJSON file and perform the same operations as above.

# <font color="red">Application 3: Manipulating the World Map</font>

From [Spatial Analysis with Colab](https://cybergisxhub.cigi.illinois.edu/notebook/spatial-data-exploration-and-visualization-on-google-colab/)

#### Available datasets from the NaturalEarth database

In [None]:
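# Note: the geopandas.datasets module was removed in GeoPandas 1.0;
# the cells using it require an older GeoPandas release.
gpd.datasets.available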

#### Obtain the dataset for the countries

In [None]:
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.head()

#### Set the index to be the country abbreviations.

In [None]:
world = world.set_index("iso_a3")
world.head()

world is a GeoDataFrame with the following columns:

- `pop_est`: Contains a population estimate for the country
- `continent`: The country’s continent
- `name`: The country’s name
- `iso_a3`: The country’s 3 letter abbreviation
- `gdp_md_est`: An estimate of country’s GDP
- `geometry`: A POLYGON for each country

In [None]:
world.geometry.name

#### Get the CRS

In [None]:
world.crs

Here, the CRS is `EPSG:4326`. That CRS uses Latitude and Longitude in degrees as coordinates.

In [None]:
world.crs.axis_info

#### Determine the population density

In [None]:
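# Areas in EPSG:4326 are in squared degrees, so reproject to an equal-area CRS
# (EPSG:6933, a global equal-area grid) to get square meters; 10**6 converts to per km^2.
world['pop_density'] = world.pop_est / world.to_crs(epsg=6933).area * 10**6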

world.sort_values(by='pop_density', ascending=False)

#### Show the world map

In [None]:
world.plot();

In [None]:
world.plot('pop_density', legend=True, figsize=(20,11));

In [None]:
norm = matplotlib.colors.LogNorm(vmin=world.pop_density.min(), 
                                 vmax=world.pop_density.max())

world.to_crs('epsg:4326').plot("pop_density", 
                                   figsize=(20,11), 
                                   legend=True,  
                                   norm=norm);

#### Select a specific continent

In [None]:
world[world['continent']=='Asia'].plot(cmap='CMRmap')

#### Color each continent

In [None]:
list_continents = world['continent'].unique().tolist()
list_continents

In [None]:
list_continents = world['continent'].unique().tolist()
list_colors = ["MistyRose", "PaleGoldenRod", "Plum", 
               "PaleTurquoise", "LightPink", "olive",
               "LightGreen", "LightBlue"]

world_map = world.plot(figsize=(18, 12))
for i, c in enumerate(list_continents):
    world[world['continent']==c].plot(ax=world_map, color=list_colors[i])

#### Show the geometry of the USA

In [None]:
world.loc["USA", 'geometry']

#### Filter the data to exclude Antarctica

In [None]:
# .copy() so that adding a column below does not modify a view of `world`
world_gdp = world[(world.pop_est>0) & (world.name!="Antarctica")].copy()

world_gdp['gdp_per_cap'] = world_gdp.gdp_md_est / world_gdp.pop_est

world_gdp.plot(column='gdp_per_cap');

Add legend:

In [None]:
fig, axes = plt.subplots(1, 1)

world.plot(column='pop_est', ax=axes, legend=True);

Resize the colorbar:

In [None]:
fig, axes = plt.subplots(1, 1)

divider = make_axes_locatable(axes)
cax = divider.append_axes("right", size="5%", pad=0.1)

world.plot(column='pop_est', ax=axes, legend=True, cax=cax);

#### Use the `legend_kwds` argument to add more features:

In [None]:
fig, axes = plt.subplots(1, 1)

world.plot(column='pop_est',
           ax=axes,
           legend=True,
           legend_kwds={'label': "Population by Country",
                        'orientation': "horizontal"});

#### Use the `cmap` argument to select the colormap:

In [None]:
world_gdp.plot(column='gdp_per_cap', cmap='OrRd');

To make the color transparent for when you just want to show the boundary, you have two options:
- Use `world.plot(facecolor="none", edgecolor="black")`. 
- Use `world.boundary.plot()`. 

In [None]:
world_gdp.boundary.plot();

#### Scale the colormaps by using the `scheme` option

In [None]:
world_gdp.plot(column='gdp_per_cap', 
               cmap='OrRd', 
               scheme='quantiles');
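
The `scheme` option (which relies on the `mapclassify` package) groups values into classes before coloring them. With `quantiles`, each class holds roughly the same number of values. A rough Pandas analogue using `pd.qcut`, on hypothetical values standing in for a column such as `gdp_per_cap`:

```python
import pandas as pd

# Hypothetical values; not taken from the world dataset.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Five quantile classes: each class receives the same number of values here.
classes = pd.qcut(values, q=5, labels=False)
counts = classes.value_counts().sort_index()

print(classes.tolist())
print(counts.tolist())
```

This is only an analogy for how quantile binning behaves, not the code GeoPandas runs internally.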

#### Plot cities on top of the map

In [None]:
base = world_gdp.boundary.plot();
cities_gdf.plot(ax=base, marker='o', color='red', markersize=5)

In [None]:
fig, axes = plt.subplots(figsize=(15, 10))
world_gdp.plot(ax=axes)
cities_gdf.plot(ax=axes, marker='o', color='red', markersize=9)

#### Select an area of the world

Consider the polygon:

In [None]:
#my_box = [(-165, 80), (-165, -60), (-20, -60), (-20, 80)]
my_box = [(-165, 80), (-165, -60), (-20, -60), (-20, 40), (-65,80)]
polygon_geom = shpgeom.Polygon(my_box)
polygon_geom

In [None]:
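# Give the polygon the same CRS as `world` so the overlay below compares like with like
polygon_gpd = gpd.GeoDataFrame(index=[0], geometry=[polygon_geom], crs="EPSG:4326") 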
polygon_gpd

In [None]:
polygon_gpd.area

In [None]:
polygon_gpd.centroid

In [None]:
polygon_gpd.plot(edgecolor='r')

In [None]:
fig, axes = plt.subplots(1, 1)

world.plot(ax=axes, cmap='CMRmap')
polygon_gpd.plot(ax=axes, facecolor="none", edgecolor="r")

In [None]:
world.overlay(polygon_gpd, how="intersection").plot(cmap='CMRmap')

In [None]:
world.overlay(polygon_gpd, how="difference").plot(cmap='CMRmap')
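
The `intersection` and `difference` modes correspond to set operations on geometries. At the level of two individual shapes, the same operations are available in Shapely. A minimal sketch with two overlapping squares (illustrative shapes, unrelated to the world map):

```python
from shapely.geometry import box

square_a = box(0, 0, 2, 2)
square_b = box(1, 1, 3, 3)

common = square_a.intersection(square_b)   # part shared by both squares
only_a = square_a.difference(square_b)     # part of A not covered by B

print(common.area, only_a.area)
```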

# <font color="red">Application 4: Tracking the International Space Station</font>
- The ISS travels at a speed of 17,100 mph, which leads to one orbit about every hour and a half.
- We read a `csv` file containing a time series of ISS locations. Information on how the file can be generated is [HERE](https://medium.com/@katehayes.m51/tracking-the-international-space-station-a-mini-project-with-geopandas-e682e8a3489f).
- We use GeoPandas to plot the ISS track on a map.

#### Read the file using Pandas

In [None]:
url = "https://portal.nccs.nasa.gov/datashare/astg/training/python/geopandas/ISS_timeseries_path.csv"

df = pd.read_csv(url, sep=",")

In [None]:
df

In [None]:
df.info()

In [None]:
type(df.geometry[0])

#### Need to convert the `geometry` column into Shapely geometry

In [None]:
df['geometry'] = df['geometry'].apply(shpwkt.loads)

In [None]:
df

In [None]:
type(df.geometry[0])
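
The conversion above parses Well-Known Text (WKT) strings into Shapely geometries. The sketch below shows the same step on a single string, plus `GeoSeries.from_wkt()` as an alternative for a whole sequence; the sample strings are illustrative and not taken from the ISS file:

```python
import geopandas as gpd
from shapely import wkt

text = "POINT (-77.0365 38.8977)"

# Parse a single WKT string into a Shapely geometry.
point = wkt.loads(text)

# Parse a whole sequence of WKT strings at once.
points = gpd.GeoSeries.from_wkt([text, "POINT (2.25 48.85)"], crs="EPSG:4326")

print(type(point).__name__, point.x, point.y)
print(points)
```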

#### Convert the Pandas DataFrame into GeoDataFrame

In [None]:
# The {'init': ...} dictionary syntax is deprecated; pass the EPSG string directly
iss_path = gpd.GeoDataFrame(df, crs="EPSG:4326")

In [None]:
iss_path

In [None]:
iss_path.info()

#### Plot ISS track

In [None]:
iss_path.plot(figsize=(15,10), color='red');

In [None]:
world.plot(figsize=(15,10),  color='#0B2380');

In [None]:
fig, ax = plt.subplots(1, figsize=(15,10))
base = world.plot(ax=ax, color='#8BE898',)
base.set_facecolor('#9CD6FF')
# Plot the ISS positions over the Earth, colored with the 'jet' colormap
iss_path.plot(ax=base, marker="*", markersize=10, cmap = 'jet');

# <font color="red">Application 5: US Census and Tornado Data</font>

## <font color="blue">State Geographical Data</font>

#### Read the state geographical data as a GeoDataFrame

In [None]:
geo_url = "http://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_state_5m.zip"
state_geo = gpd.read_file(geo_url)

How many data points?

In [None]:
len(state_geo)

Print the first data points

In [None]:
state_geo.head()

In [None]:
state_geo.info()

Some of the columns are:
- **NAME**: the names of the states
- **STUSPS**: USPS acronym of each state
- **ALAND**: land area (in square meters)
- **AWATER**: water area (in square meters)


In [None]:
for i, st in enumerate(state_geo['NAME'].tolist(), start=1):
    print(f"{i:>3} --> {st}")

#### We can do a quick plot of the USA with state boundaries:

In [None]:
fig, axes = plt.subplots(figsize=(15, 10))
state_geo.plot(ax=axes);

#### How could we only map the area covering the USA?
- We first need to grab the spatial extent of the `state_geo` object.
- It is an array of 4 values: `(xmin, ymin, xmax, ymax)`.

In [None]:
df_bounds = state_geo.geometry.total_bounds
df_bounds

In [None]:
fig, axes = plt.subplots(figsize=(15, 10))
xlim =([-176.0, -64.0])
ylim =([13.0, df_bounds[-1]])
axes.set_xlim(xlim)
axes.set_ylim(ylim)
state_geo.plot(ax=axes);

#### Color each state

In [None]:
fig, axes = plt.subplots(figsize=(15, 10))
xlim =([-176.0, -64.0])
ylim =([13.0, df_bounds[-1]])
axes.set_xlim(xlim)
axes.set_ylim(ylim)
state_geo.plot(ax=axes, cmap='CMRmap');

#### Plot the states colored by their land area

In [None]:
norm = matplotlib.colors.LogNorm(vmin=state_geo.ALAND.min(), 
                                 vmax=state_geo.ALAND.max())
fig, axes = plt.subplots(figsize=(15, 10))
xlim =([-176.0, -64.0])
ylim =([13.0, df_bounds[-1]])
axes.set_xlim(xlim)
axes.set_ylim(ylim)

divider = make_axes_locatable(axes)
cax = divider.append_axes("right", size="5%", pad=0.1)

state_geo.to_crs('epsg:4326').plot("ALAND", ax=axes, 
                                   legend=True,  cax=cax, norm=norm);

#### Plot the states colored by their water surface area

In [None]:
norm = matplotlib.colors.LogNorm(vmin=state_geo.AWATER.min(), 
                                 vmax=state_geo.AWATER.max())
fig, axes = plt.subplots(figsize=(15, 10))
xlim =([-176.0, -64.0])
ylim =([13.0, df_bounds[-1]])
axes.set_xlim(xlim)
axes.set_ylim(ylim)

divider = make_axes_locatable(axes)
cax = divider.append_axes("right", size="5%", pad=0.1)

state_geo.to_crs('epsg:4326').plot("AWATER", ax=axes, 
                                   legend=True,  
                                   cax=cax,
                                   norm=norm, cmap="Blues");

#### Add the State capitals

In [None]:
capital_dict = {
    'Alabama': 'Montgomery',
    'Alaska': 'Juneau',
    'Arizona':'Phoenix',
    'Arkansas':'Little Rock',
    'California': 'Sacramento',
    'Colorado':'Denver',
    'Connecticut':'Hartford',
    'Delaware':'Dover',
    'Florida': 'Tallahassee',
    'Georgia': 'Atlanta',
    'Hawaii': 'Honolulu',
    'Idaho': 'Boise',
    'Illinois': 'Springfield',
    'Indiana': 'Indianapolis',
    'Iowa': 'Des Moines',
    'Kansas': 'Topeka',
    'Kentucky': 'Frankfort',
    'Louisiana': 'Baton Rouge',
    'Maine': 'Augusta',
    'Maryland': 'Annapolis',
    'Massachusetts': 'Boston',
    'Michigan': 'Lansing',
    'Minnesota': 'St. Paul',
    'Mississippi': 'Jackson',
    'Missouri': 'Jefferson City',
    'Montana': 'Helena',
    'Nebraska': 'Lincoln',
    'Nevada': 'Carson City',
    'New Hampshire': 'Concord',
    'New Jersey': 'Trenton',
    'New Mexico': 'Santa Fe',
    'New York': 'Albany',
    'North Carolina': 'Raleigh',
    'North Dakota': 'Bismarck',
    'Ohio': 'Columbus',
    'Oklahoma': 'Oklahoma City',
    'Oregon': 'Salem',
    'Pennsylvania': 'Harrisburg',
    'Rhode Island': 'Providence',
    'South Carolina': 'Columbia',
    'South Dakota': 'Pierre',
    'Tennessee': 'Nashville',
    'Texas': 'Austin',
    'Utah': 'Salt Lake City',
    'Vermont': 'Montpelier',
    'Virginia': 'Richmond',
    'Washington': 'Olympia',
    'West Virginia': 'Charleston',
    'Wisconsin': 'Madison',
    'Wyoming': 'Cheyenne'  
}

Create a Pandas DataFrame

In [None]:
loc_list = [", ".join([capital_dict[key], key]) for key in capital_dict]
capital_df = pd.DataFrame(dict(city_name=loc_list))
capital_df 

Use `geocode` to get the coordinates of each capital:

In [None]:
capital_gdf = geocode(capital_df['city_name'])

In [None]:
capital_gdf

In [None]:
capital_gdf.iloc[0].geometry.coords[0][0]

In [None]:
fig, axes = plt.subplots(figsize=(20, 14))

#xlim =([-176.0, -64.0])
#ylim =([13.0, df_bounds[-1]])
#axes.set_xlim(xlim)
#axes.set_ylim(ylim)

# US Lower 48 Bounding Box
xlim =([-129.00, -66.00])
ylim =([22.00, 50.50])
axes.set_xlim(xlim)
axes.set_ylim(ylim)

state_geo.to_crs("EPSG:4326").plot(ax=axes, color="none", linewidth=.9)

capital_gdf.apply(lambda x: axes.annotate(text=x.address.split(",")[0], 
                                        xy=x.geometry.coords[0],
                                        xytext=(x.geometry.coords[0][0],
                                               x.geometry.coords[0][1]+0.2),
                                        ha='center', 
                                        color="blue",
                                        fontsize=9),
                     axis=1);
capital_gdf.plot(ax=axes)
axes.set_title('United States - State Capitals', fontsize=16);

## <font color="blue">Tornado Data</font>

#### Read the tornado dataset

In [None]:
torn_url = "https://portal.nccs.nasa.gov/datashare/astg/training/python/geopandas/1950-2022-torn-initpoint.zip"

In [None]:
tornado_gpd = gpd.read_file(torn_url)

In [None]:
tornado_gpd.shape

In [None]:
tornado_gpd.info()

#### Reproject coordinates

In [None]:
tornado_gpd.crs

In [None]:
tornado_gpd = tornado_gpd.to_crs("EPSG:4326")
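
The call above converts every geometry to longitude/latitude (EPSG:4326). As a minimal sketch of what `to_crs` does, the example below uses a made-up point and assumes Web Mercator (EPSG:3857) as the other CRS. The tornado file's actual CRS is whatever `tornado_gpd.crs` reports.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical single-point dataset in geographic coordinates (lon, lat).
demo = gpd.GeoDataFrame(
    {"name": ["demo point"]},
    geometry=[Point(-97.5, 35.5)],
    crs="EPSG:4326",
)

# Project to Web Mercator (units of metres), then back to degrees.
demo_3857 = demo.to_crs("EPSG:3857")
demo_back = demo_3857.to_crs("EPSG:4326")

print(demo_3857.crs.to_epsg())     # EPSG code of the projected copy
print(demo_3857.geometry.iloc[0])  # coordinates are now in metres
print(demo_back.geometry.iloc[0])  # round trip returns to degrees
```

Reprojection changes the coordinate values, not the locations they describe, and `to_crs` returns a new object rather than modifying the original.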

#### Visualization

In [None]:
tornado_gpd.plot(figsize=(12,9), color='red', markersize=1)

In [None]:
fig, axes = plt.subplots(figsize=(20, 14))

# US Lower 48 Bounding Box
xlim =([-129.00, -66.00])
ylim =([22.00, 50.50])
axes.set_xlim(xlim)
axes.set_ylim(ylim)

state_geo.apply(lambda x: axes.annotate(x.NAME, 
                                        xy=x.geometry.centroid.coords[0], 
                                        ha='center', 
                                        color="blue",
                                        fontsize=9),
                     axis=1);
state_geo.to_crs("EPSG:4326").plot(ax=axes, color="none", linewidth=.9)
tornado_gpd.plot(ax=axes, color='red', marker='.', markersize=1)
axes.set_title('United States Tornado Map (1950-2022)', fontsize=16);

#### Tornados by State

In [None]:
# Create a copy of the original DataFrame
tornado_by_state = tornado_gpd.copy()
# Add a new column and set the value to 1
tornado_by_state['tornados'] = 1

# use groupby() and count() to total up all the tornadoes by state
tornado_by_state = tornado_by_state[['st','tornados']].groupby('st').count()

In [None]:
# sort by most tornadoes first
tornado_by_state.sort_values('tornados', ascending=False)
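
The per-state totals can also be computed without a helper column of ones. The sketch below is one possible alternative, shown on a hypothetical six-row table that stands in for the tornado data. Only the `st` column name is taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature of the tornado table: one row per tornado.
demo = pd.DataFrame({"st": ["TX", "KS", "TX", "OK", "TX", "KS"]})

# groupby().size() counts the rows in each group directly.
by_state = demo.groupby("st").size().rename("tornados")

# value_counts() gives the same totals, sorted from most to fewest.
by_state_sorted = demo["st"].value_counts()

print(by_state)
print(by_state_sorted)
```

Both forms return a Series indexed by state, so `.sort_values()` and `.plot.barh()` work on them as well.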

In [None]:
tornado_by_state.sort_values('tornados', ascending=True).plot.barh(figsize=(12,9), title='Tornados by State (1950-2022)');

In [None]:
fig, axes = plt.subplots(figsize=(20, 14))

state_geo[state_geo['NAME']=='Texas'].to_crs("EPSG:4326").plot(ax=axes, color="none", linewidth=1.3)
tornado_gpd[tornado_gpd['st'] == 'TX'].plot(ax=axes, color='red', marker='.', markersize=16)
axes.set_title('Texas Tornado Map (1950-2022)', fontsize=16);

## <font color="blue">County Geographical Data</font>

#### Read county geographical data as a GeoDataFrame

In [None]:
county_url = "http://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_county_5m.zip"
county_geo = gpd.read_file(county_url)
county_geo.head()

In [None]:
len(county_geo)

In [None]:
norm = matplotlib.colors.LogNorm(vmin=county_geo.ALAND.min(), 
                                 vmax=county_geo.ALAND.max())
fig, axes = plt.subplots(figsize=(15, 10))
xlim =([-176.0, -64.0])
ylim =([13.0, df_bounds[-1]])
axes.set_xlim(xlim)
axes.set_ylim(ylim)

divider = make_axes_locatable(axes)
cax = divider.append_axes("right", size="5%", pad=0.1)

county_geo.to_crs('epsg:4326').plot("ALAND", ax=axes, cax=cax, 
                                   legend=True,  norm=norm);

### <font color="blue">Zoom in on the State of Wisconsin</font>

#### Obtain the geographical data for the state of Wisconsin

In [None]:
wisconsin_geo = state_geo.query("NAME == 'Wisconsin'")
wisconsin_geo

#### Draw the map of the state

In [None]:
fig, axes = plt.subplots(figsize=(10, 10))
wisconsin_geo.plot(ax=axes, edgecolor="black", color="white");

#### Get county-level geographical data for the State of Wisconsin
- Use `STATEFP == '55'` (the state FIPS code for Wisconsin) to select the state's counties

In [None]:
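# .copy() makes an independent GeoDataFrame, so later column edits cannot raise a SettingWithCopyWarning
wis_county_geo = county_geo.query("STATEFP == '55'").copy()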
wis_county_geo

#### Plot the map of the different counties in Wisconsin

In [None]:
fig, axes = plt.subplots(figsize=(10, 10))

wis_county_geo.plot(ax=axes, edgecolor="red", color="white");
wisconsin_geo.plot(ax=axes, edgecolor="black", color="none")

Color each county and label it with its name:

In [None]:
fig, axes = plt.subplots(figsize=(10, 10))

wis_county_geo.apply(lambda x: axes.annotate(x.NAME, 
                                             xy=x.geometry.centroid.coords[0], 
                                             ha='center', 
                                             fontsize=7),
                     axis=1);

wis_county_geo.plot(ax=axes, cmap='Pastel2', edgecolor="red")
wisconsin_geo.plot(ax=axes, edgecolor="black", color="none");

#### Use 2016 Presidential Election Results

In [None]:
url = "https://datascience.quantecon.org/assets/data/ruhl_cleaned_results.csv"

pres_election_2016 = pd.read_csv(url, thousands=",")
pres_election_2016.head()

In [None]:
pres_election_2016.info()

In [None]:
pres_election_2016["county"]

In [None]:
pres_election_2016["county"] = pres_election_2016["county"].str.title()
pres_election_2016["county"] = pres_election_2016["county"].str.strip()

In [None]:
wis_county_geo["NAME"] = wis_county_geo["NAME"].str.title()
wis_county_geo["NAME"] = wis_county_geo["NAME"].str.strip()

In [None]:
res_states = wis_county_geo.merge(
    pres_election_2016, 
    left_on="NAME", 
    right_on="county", 
    how="inner"
    )

In [None]:
res_states.head()
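
An inner merge silently drops any county whose name is spelled differently in the two tables. The following is a minimal sketch of one way to check for that, using hypothetical county names with a deliberate mismatch. The `indicator=True` option of `merge` adds a `_merge` column that records where each key was found.

```python
import pandas as pd

# Hypothetical names; "St. Croix" vs "Saint Croix" is a deliberate mismatch.
geo_names = pd.DataFrame({"NAME": ["Dane", "St. Croix", "Brown"]})
results = pd.DataFrame({"county": ["Dane", "Saint Croix", "Brown"],
                        "total": [100, 50, 80]})

# An outer merge keeps every row from both sides and labels its origin.
check = geo_names.merge(results, left_on="NAME", right_on="county",
                        how="outer", indicator=True)

# Rows that are not "both" would be lost with how="inner".
unmatched = check[check["_merge"] != "both"]
print(unmatched[["NAME", "county", "_merge"]])
```

An empty `unmatched` table would indicate that every county found a partner.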

In [None]:
%%time
res_states["trump_share"] = res_states["trump"] / (res_states["total"])
res_states["rel_trump_share"] = res_states["trump"] / (res_states["trump"]+res_states["clinton"])

In [None]:
res_states.head()
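
The two share columns answer different questions: `trump_share` divides by all ballots cast, while `rel_trump_share` divides by the two-candidate vote only. A small worked example with made-up vote counts (not taken from the election file) shows the difference:

```python
import pandas as pd

# Hypothetical vote counts for two made-up counties.
votes = pd.DataFrame({
    "county": ["A", "B"],
    "trump": [400, 300],
    "clinton": [500, 100],
    "total": [1000, 500],  # also includes third-party votes
})

# Share of all ballots cast.
votes["trump_share"] = votes["trump"] / votes["total"]
# Share of the two-candidate vote only.
votes["rel_trump_share"] = votes["trump"] / (votes["trump"] + votes["clinton"])

print(votes[["county", "trump_share", "rel_trump_share"]])
```

The relative share is never smaller than the overall share, because its denominator excludes third-party votes.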

Show the vote map:

In [None]:
fig, axes = plt.subplots(figsize = (10,8))

# Plot the state
wisconsin_geo.plot(ax=axes, edgecolor='black',color='white')

# Plot the counties and pass 'rel_trump_share' as the data to color
res_states.plot(
    ax=axes, edgecolor='black', column='rel_trump_share', 
    legend=True, cmap='RdBu_r',
    vmin=0.01, vmax=0.95
)

# Add text to let people know what we are plotting
axes.annotate('Republican vote share',
              xy=(0.76, 0.06),  xycoords='figure fraction')

# Hide the axes so no longitude/latitude ticks are drawn
plt.axis('off')

Number of counties won by each candidate:

In [None]:
res_states.eval("trump > clinton").sum()

In [None]:
res_states.eval("trump < clinton").sum()

Total number of votes obtained by each candidate:

In [None]:
res_states["trump"].sum()

In [None]:
res_states["clinton"].sum()

### <font color="green">Breakout</font>

Use the dataset below to map (by county) the results of the 2020 presidential election in the State of Maryland.

```python
url_2020_elect = "https://raw.githubusercontent.com/tonmcg/US_County_Level_Election_Results_08-20/master/2020_US_County_Level_Presidential_Results.csv"

elect_2020_data = pd.read_csv(url_2020_elect, sep=",")
elect_2020_data
```

# <font color="red"> Application 5: [Smithsonian Global Volcanism Database](https://volcano.si.edu/) </font>

In [None]:
server = 'https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?'
query = 'service=WFS&version=2.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes&outputFormat=json'
volc_gdf = gpd.read_file(server+query)
volc_gdf.head()

In [None]:
volc_gdf.info()

In [None]:
volc_gdf.iloc[2]

#### Subsetting
- Often we only want points in a certain bounding box.
- GeoPandas supports this directly through the `cx` coordinate indexer, which slices by x (longitude) and y (latitude) ranges.

In [None]:
# Bounding box in degrees: longitudes -124 to -120, latitudes 45 to 49
xmin, xmax, ymin, ymax = -124, -120, 45, 49
subset = volc_gdf.cx[xmin:xmax, ymin:ymax]
subset
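
To make the behavior of `cx` concrete, here is a minimal sketch on three hypothetical points (not taken from the volcano dataset). The x range is given first and the y range second.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points given as (longitude, latitude).
pts = gpd.GeoDataFrame(
    {"name": ["inside", "too far east", "too far south"]},
    geometry=[Point(-122.0, 46.5), Point(-100.0, 46.5), Point(-122.0, 30.0)],
    crs="EPSG:4326",
)

# .cx keeps the geometries that intersect the given bounding box.
xmin, xmax, ymin, ymax = -124, -120, 45, 49
inside = pts.cx[xmin:xmax, ymin:ymax]

print(inside["name"].tolist())
```

The result is a regular GeoDataFrame, so it can be plotted or filtered further like any other.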

#### Plot the locations of volcanoes on the map of the world

In [None]:
fig, axes = plt.subplots(figsize=(15, 10))
world.plot(ax=axes, edgecolor="black", color="white")
volc_gdf.plot(ax=axes, marker='o', color='red', markersize=5);

### <font color="blue">Focus on Colombia</font>

#### Get the volcanoes located in Colombia

In [None]:
colombia = world.query('name == "Colombia"')
colombia

In [None]:
# `predicate` replaces the deprecated `op` keyword (GeoPandas 0.10 and later)
colombian_volcanoes = gpd.sjoin(volc_gdf, colombia, how="inner", predicate='within')
colombian_volcanoes
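
A spatial join matches rows by geometry rather than by a shared key column. The sketch below illustrates the idea on hypothetical data: a square standing in for a country and three made-up points. It assumes a GeoPandas release that accepts the `predicate` keyword (0.10 or later).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical "country": a rectangle from (0, 0) to (10, 10).
regions = gpd.GeoDataFrame({"region": ["square"]},
                           geometry=[box(0, 0, 10, 10)], crs="EPSG:4326")

# Hypothetical points: two inside the rectangle, one outside.
points = gpd.GeoDataFrame(
    {"label": ["p1", "p2", "p3"]},
    geometry=[Point(1, 1), Point(5, 9), Point(20, 20)],
    crs="EPSG:4326",
)

# Keep only the points that fall within a region; the region's
# attribute columns are appended to each matching point.
joined = gpd.sjoin(points, regions, how="inner", predicate="within")

print(joined[["label", "region"]])
```

Both inputs share the same CRS here. With real data, it is worth comparing the `.crs` of the two GeoDataFrames before joining.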

#### Plot the location of volcanoes on the map of Colombia

In [None]:
fig, axes = plt.subplots(figsize=(10, 10))
colombia.plot(ax=axes, edgecolor="black", color="white")
colombian_volcanoes.plot(ax=axes, marker='o', color='red', markersize=5);

#### Simple analysis

In [None]:
# Create a copy of the original DataFrame
volcano_by_country = volc_gdf.copy()
# Add a new column and set the value to 1
volcano_by_country['volcanos'] = 1

# use groupby() and count() to total up all the volcanos by country
volcano_by_country = volcano_by_country[['Country', 'volcanos']].groupby('Country').count()
volcano_by_country

In [None]:
volcano_by_country.sort_values('volcanos', ascending=True).plot.barh(figsize=(7, 16), title='Volcanos by Country');