<a href="https://colab.research.google.com/github/cul-data-club/meetings/blob/main/2022/march-24-geopandas/Hello%2C%20GeoPandas!.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hello, GeoPandas!

Start out by installing GeoPandas (which you can do via the Anaconda Navigator). You should also install `cartopy`, `geoplot`, and `contextily` so that the example notebooks in the `gallery_jupyter` folder will run.

If you have trouble installing via the Navigator, try using the shell:

```
conda install -c conda-forge geopandas
conda install -c conda-forge cartopy
conda install -c conda-forge geoplot
conda install -c conda-forge contextily
```

Geographic data can take several primitive forms. [GeoData@Columbia](https://geodata.library.columbia.edu/) lists ten different primitive formats that data can take, but they boil down to four, more or less:

1. **Points** With point data, every observation/row/member is a single location given by at least two coordinates (such as longitude and latitude). Each point is independent of the others.
2. **Lines** Instead of one point, every observation/row/member is a sequence of at least two points joined by straight segments, where order matters.
3. **Polygons** Like lines, except the lines close to make shapes with calculable areas.
4. **Rasters** “Pictures” of the area under study, where each pixel represents a certain amount of space, like with satellite photography or other remote sensing data sources.
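To make the three vector primitives concrete, here is a minimal sketch using Shapely, the geometry library GeoPandas builds on (it is installed along with GeoPandas). Rasters are left out because they are grids of pixel values rather than geometries.

```python
from shapely.geometry import LineString, Point, Polygon

# A point: one coordinate pair (x, y)
pt = Point(0, 0)

# A line: an ordered sequence of at least two points
line = LineString([(0, 0), (3, 4)])

# A polygon: a ring that closes back on itself, so it encloses an area
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

print(pt.geom_type, line.geom_type, square.geom_type)  # Point LineString Polygon
print(line.length)  # 5.0 (the hypotenuse of a 3-4-5 triangle)
print(square.area)  # 1.0
```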

The first three types, as a whole, are called “vector data.”

For vector data, every observation/row/member will typically have other properties that take familiar data types: discrete numeric, continuous, and categorical variables.

GeoPandas, then, merges the “geometry” of an observation/row/member with its other properties into a GeoDataFrame: a pandas dataframe with an extra geometry column.
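A minimal sketch of that merge, building a GeoDataFrame by hand from ordinary columns plus a list of Shapely geometries. The names, visitor counts, and coordinates are made up for illustration.

```python
import geopandas
from shapely.geometry import Point

# Ordinary attribute columns, just like a pandas DataFrame (hypothetical values)
data = {
    "name": ["A", "B"],
    "visitors": [120, 45],
}

# The geometry column holds one Shapely geometry per row
gdf = geopandas.GeoDataFrame(
    data,
    geometry=[Point(-73.96, 40.80), Point(-73.99, 40.73)],
    crs="EPSG:4326",  # longitude/latitude on the WGS 84 datum
)

print(gdf)
print(gdf.geometry.geom_type.tolist())  # ['Point', 'Point']
```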

Even though geospatial data typically only has the four primitives mentioned above (often in some mixture), the data can be *formatted* in many, many ways. For GeoPandas, we will look at two file formats:

1. **Shapefile** Created by Esri, the company behind ArcGIS, [shapefiles](https://en.wikipedia.org/wiki/Shapefile) are an established vector format. Every shapefile is actually a set of files sharing one base name (at minimum `.shp`, `.shx`, and `.dbf`), which are often bundled together as a `.zip`. GeoPandas can read a shapefile by pointing at its `.shp` file, and it can also read a zipped shapefile directly.
2. **GeoJSON** A relative newcomer to geospatial data encoding, [GeoJSON](http://geojson.org/) encodes all of the data into a single plain-text file formatted as JSON, or JavaScript Object Notation. As such, every GeoJSON file can be parsed directly as a JavaScript object. With only one file, GeoJSON is somewhat more portable than shapefiles, and the format is especially web-friendly.

You can create your own toy GeoJSON data at [http://geojson.io/](http://geojson.io/).

In fact, go ahead and do so. Give each feature a `sentiment` property (for example, `happy`), since we filter on it later, and save your file as `test.json` or something similar in the same folder as this notebook.

Now let’s import GeoPandas and fire up inline Matplotlib.

In [None]:
# Colab may not ship GeoPandas; install it here (skip this cell in a local conda environment)
!pip install geopandas

In [1]:
import geopandas
%matplotlib inline


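Older versions of GeoPandas came with a few datasets built in: two from [Natural Earth](http://naturalearth.org), and one of NYC borough boundaries (`nybb`). Since GeoPandas 1.0 these are no longer bundled, and the NYC data is available through the separate `geodatasets` package. Just as regular Pandas has `read_csv()`, GeoPandas has a top-level `read_file()` function that creates a geodataframe from a file. Here, we read in the NYC data.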

Geodataframes have a built-in `.plot()` method.

In [None]:
import geodatasets  # the sample data moved here in GeoPandas 1.0 (pip install geodatasets)
nyc = geopandas.read_file(geodatasets.get_path("nybb"))
nyc.plot()

Geodataframes also have a `.crs` property that describes the coordinate reference system (CRS). For the NYC data, the CRS is identified by an EPSG code, 2263, which we can look up here: [http://spatialreference.org/ref/epsg/2263/](http://spatialreference.org/ref/epsg/2263/)

In [None]:
nyc.crs
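`nyc.crs` is a pyproj `CRS` object (so `nyc.crs.to_epsg()` returns the bare code), which also means you can inspect an EPSG code without leaving Python. A minimal sketch with pyproj, a GeoPandas dependency, assuming the NYC data reports EPSG:2263 as linked above:

```python
from pyproj import CRS

# Build a CRS object straight from the EPSG code that nyc.crs reports
crs = CRS.from_epsg(2263)

print(crs.name)  # NAD83 / New York Long Island (ftUS)
print(crs.is_projected)  # True: coordinates are planar distances, not degrees
print(crs.axis_info[0].unit_name)  # the linear unit of the x axis
```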

We can now read in our own GeoJSON file, but note that its CRS differs from the NYC data’s: GeoJSON stores longitude/latitude coordinates on the WGS 84 datum (EPSG:4326).

In [None]:
df = geopandas.read_file("./test.json")
df.crs

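Luckily, unifying the CRSes is straightforward: reproject one geodataframe into the other’s CRS with `.to_crs()`, which transforms the coordinates themselves. (Simply relabeling with `.set_crs()` would change the label but leave the numbers in the old units.)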

In [None]:
df = df.to_crs(nyc.crs)
df.crs
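The difference between reprojecting and relabeling is easy to see on a single point. This sketch uses one made-up point near midtown Manhattan and compares `.to_crs()` with `.set_crs(..., allow_override=True)`.

```python
import geopandas
from shapely.geometry import Point

# One hypothetical point in longitude/latitude (EPSG:4326)
pts = geopandas.GeoDataFrame(geometry=[Point(-73.9857, 40.7484)], crs="EPSG:4326")

# to_crs() transforms the coordinates into the target CRS (feet on EPSG:2263)
reprojected = pts.to_crs(epsg=2263)

# set_crs() only swaps the label; the numbers stay in degrees, which is now wrong
relabeled = pts.set_crs(epsg=2263, allow_override=True)

print(reprojected.geometry.iloc[0])  # coordinates in the hundreds of thousands of feet
print(relabeled.geometry.iloc[0])  # POINT (-73.9857 40.7484), mislabeled as feet
```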

Geodataframes behave much like regular dataframes.

In [None]:
df

In [None]:
# na=False treats features without a sentiment value as non-matches
df[df["sentiment"].str.contains("happy", na=False)]
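Boolean filtering returns another GeoDataFrame, geometries included, so the result can be plotted or reprojected directly. A self-contained sketch on made-up data, where the `sentiment` column mirrors the one assumed in `test.json`:

```python
import geopandas
from shapely.geometry import Point

# Hypothetical features, one of them missing a sentiment value
df = geopandas.GeoDataFrame(
    {"sentiment": ["happy", "sad", None]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    crs="EPSG:4326",
)

# na=False turns the missing value into False instead of NaN
happy = df[df["sentiment"].str.contains("happy", na=False)]

print(type(happy).__name__)  # GeoDataFrame
print(len(happy))  # 1
```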

We can plot layers together by passing the Axes returned by the first `.plot()` call as the `ax` argument of the next.

In [None]:
base = nyc.plot(figsize=(10, 10), edgecolor="k", color="green")
df.plot(ax=base, color="red", edgecolor="white")
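An equivalent pattern creates the Matplotlib figure and Axes explicitly with `plt.subplots()` and passes the same `ax` to every layer, which makes the drawing order explicit and leaves a handle for titles or axis tweaks. The geometries below are made up; in practice both layers must already share a CRS, since GeoPandas does not reproject when plotting.

```python
import matplotlib.pyplot as plt

import geopandas
from shapely.geometry import Point, Polygon

# Hypothetical layers: a square "borough" and two points inside it
area = geopandas.GeoDataFrame(geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])])
points = geopandas.GeoDataFrame(geometry=[Point(2, 3), Point(7, 8)])

# Create the Axes up front, then draw each layer onto it, back to front
fig, ax = plt.subplots(figsize=(6, 6))
area.plot(ax=ax, color="green", edgecolor="k")
points.plot(ax=ax, color="red", edgecolor="white")
ax.set_axis_off()
```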

## NYC MTA data

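Now let’s grab the [subway station location data](https://data.cityofnewyork.us/Transportation/Subway-Stations/arq3-7z49) from the City of New York. Export it as a shapefile, unzip it into a `Subway Stations` folder next to this notebook, and adjust the `geo_export_` file name below to match your download, since yours will likely differ.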

In [None]:
stations = geopandas.read_file("./Subway Stations/geo_export_ab83d225-393b-4f95-b275-4a8d050fc8e3.shp")
stations.head()

In [None]:
stations.plot()
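Before overlaying the stations on the borough map, their CRS has to match `nyc.crs`; a city Open Data export may well come in longitude/latitude, so check `stations.crs` first. The helper below, `plot_layers`, is hypothetical (not a GeoPandas function): it reprojects every layer to the first layer's CRS and stacks them on one Axes. The toy data stands in for `nyc` and `stations`.

```python
import matplotlib.pyplot as plt

import geopandas
from shapely.geometry import Point, Polygon


def plot_layers(layers, styles, figsize=(8, 8)):
    """Hypothetical helper: draw GeoDataFrames on one Axes in a common CRS.

    The first layer's CRS is the target; later layers are reprojected to it.
    """
    target_crs = layers[0].crs
    fig, ax = plt.subplots(figsize=figsize)
    for layer, style in zip(layers, styles):
        if layer.crs != target_crs:
            layer = layer.to_crs(target_crs)
        layer.plot(ax=ax, **style)
    return ax


# Toy stand-ins: a polygon in feet (EPSG:2263) and a point in lon/lat (EPSG:4326)
borough = geopandas.GeoDataFrame(
    geometry=[Polygon([(980000, 200000), (1000000, 200000), (1000000, 220000), (980000, 220000)])],
    crs="EPSG:2263",
)
station = geopandas.GeoDataFrame(geometry=[Point(-73.9857, 40.7484)], crs="EPSG:4326")

ax = plot_layers([borough, station], [{"color": "green", "edgecolor": "k"}, {"color": "red"}])
```

With the notebook's own data, the call would look like `plot_layers([nyc, stations], [{"color": "green"}, {"color": "red"}])`.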