# Introduction

TBD. This section will cover:
- What is geospatial data?
- Applications of geospatial data visualization.
- Pandas is a strong prerequisite for this course. This tutorial will give you an idea of whether you need to review the Pandas course before proceeding.
- Setting up the example: why we want to create a map of New York. (The idea: perhaps you live in NYC and want to go hiking or camping in the countryside for a weekend. Instead of using TripAdvisor to guide your trip, you're excited about all of the data that the state of New York has released and want to do your own analysis, tailored to your specific preferences.)

# Reading and plotting (shapefile) data

The first step is to read in the data! To do this, we'll use the [GeoPandas](http://geopandas.org/) library.

```python
import geopandas as gpd
```

GeoPandas is an extension of the [Pandas](https://pandas.pydata.org/) library that adds functionality for working with geospatial data.

With GeoPandas, we can read data from a variety of common geospatial file formats, such as [shapefile](https://en.wikipedia.org/wiki/Shapefile), [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON), [KML](https://en.wikipedia.org/wiki/Keyhole_Markup_Language), and [GPKG](https://en.wikipedia.org/wiki/GeoPackage). You can also work with [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) files.
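
Tabular files with coordinate columns need one extra step. As a minimal sketch, assuming a CSV whose hypothetical `lon` and `lat` columns hold longitude and latitude, the table can be read with Pandas and its coordinates converted to `Point` geometries with `gpd.points_from_xy`. The rows below are made up and kept in a string so the example runs without any file.

```python
import io

import geopandas as gpd
import pandas as pd

# Made-up CSV; the column names "name", "lon", and "lat" are assumptions for this sketch.
csv_text = """name,lon,lat
site_a,-74.0,42.5
site_b,-73.5,43.0
site_c,-75.2,44.1
"""

# Read the table with Pandas, exactly as for any other CSV file.
df = pd.read_csv(io.StringIO(csv_text))

# Convert the coordinate columns into Point geometries and wrap the table in a GeoDataFrame.
sites = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]))

print(sites.geom_type.tolist())  # ['Point', 'Point', 'Point']
```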

For now, we'll load a shapefile containing information about lands under the care of the [Department of Environmental Conservation](https://www.dec.ny.gov/index.html) in the state of New York.

```python
# Read in the data
full_data = gpd.read_file("../input/geospatial-course-data/dec_lands/DEC_lands.shp")
full_data.head()
```

The command above reads the data into a GeoPandas **GeoDataFrame**. A GeoDataFrame is a subclass of the Pandas DataFrame, so it has all of a DataFrame's capabilities, and everything you learned in the [Pandas micro-course](https://www.kaggle.com/learn/pandas) can be used to work with the data.

```python
type(full_data)
```
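
A quick way to confirm the relationship between the two classes is an `isinstance` check. This sketch builds a tiny made-up GeoDataFrame (the `acres` column and its values are hypothetical) and calls an ordinary DataFrame method on it.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Tiny made-up GeoDataFrame; the "acres" column is hypothetical.
gdf = gpd.GeoDataFrame(
    {"CLASS": ["WILDERNESS", "WILD FOREST"], "acres": [120.0, 45.5]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# GeoDataFrame subclasses the Pandas DataFrame, so DataFrame checks and methods apply.
print(isinstance(gdf, pd.DataFrame))  # True
print(gdf["acres"].mean())            # 82.75
```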

For instance, if we don't plan to use all of the columns, we can select a subset of them.  
> To review other methods for selecting data, check out [this tutorial](https://www.kaggle.com/residentmario/indexing-selecting-assigning/) from the Pandas micro-course.

```python
data = full_data.loc[:, ["CLASS", "COUNTY", "geometry"]].copy()
```
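
Here is a self-contained version of the same selection on a made-up stand-in for `full_data`; the `UNUSED` column and all values are illustrative only. Because the `geometry` column is kept, the result is still a GeoDataFrame.

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up stand-in for full_data, including a column we do not need.
full = gpd.GeoDataFrame(
    {"CLASS": ["WILDERNESS", "WILD FOREST"], "COUNTY": ["Essex", "Hamilton"], "UNUSED": [1, 2]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# Keep only the columns of interest; including "geometry" keeps the spatial data.
subset = full.loc[:, ["CLASS", "COUNTY", "geometry"]].copy()

print(type(subset).__name__)  # GeoDataFrame
print(list(subset.columns))   # ['CLASS', 'COUNTY', 'geometry']
```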

We can use the `value_counts()` method to list the different land types, along with the number of times each appears in the dataset.
> To review this (and related methods), check out [this tutorial](https://www.kaggle.com/residentmario/summary-functions-and-maps) from the Pandas micro-course.

```python
# How many lands of each type are there?
data.CLASS.value_counts()
```
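
The same method works on any Pandas Series. In this sketch the Series of land classes is made up rather than taken from the DEC file; the most frequent value comes first in the result.

```python
import pandas as pd

# Made-up land classes; in the tutorial this would be data.CLASS.
classes = pd.Series(["WILD FOREST", "WILDERNESS", "WILD FOREST", "STATE FOREST", "WILD FOREST"])

# value_counts() tallies each distinct value, sorted from most to least frequent.
counts = classes.value_counts()
print(counts["WILD FOREST"])  # 3
```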

You can also use `loc` (and `iloc`) and `isin` to select subsets of the data.  
> To review this, check out [this tutorial](https://www.kaggle.com/residentmario/indexing-selecting-assigning/) from the Pandas micro-course.

```python
# Select lands that fall under the "WILD FOREST" or "WILDERNESS" category
wild_lands = data.loc[data.CLASS.isin(['WILD FOREST', 'WILDERNESS'])].copy()
wild_lands.head()
```
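
To see what `isin` does step by step, this sketch applies it to a made-up stand-in for `data`. The mask is an ordinary boolean Series, and indexing with it keeps the geometry of each matching row.

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up stand-in for the tutorial's `data` GeoDataFrame.
data = gpd.GeoDataFrame(
    {"CLASS": ["WILD FOREST", "STATE FOREST", "WILDERNESS", "OTHER"]},
    geometry=[Point(i, i) for i in range(4)],
)

# isin() is True wherever CLASS matches any of the listed values.
mask = data.CLASS.isin(["WILD FOREST", "WILDERNESS"])
print(mask.tolist())  # [True, False, True, False]

# Passing the mask to .loc keeps only the matching rows (and their geometries).
wild = data.loc[mask].copy()
print(wild.index.tolist())  # [0, 2]
```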

GeoDataFrames also have some added methods and attributes that plain DataFrames lack. For instance, we can quickly visualize the data with the `plot()` method, which accepts several optional parameters for customizing the appearance of the plot.

```python
# Styling keywords such as color are passed through to matplotlib
wild_lands.plot(color='darkgreen')
```
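
A sketch of a few more styling options, using two made-up square parcels as a stand-in for `wild_lands`. It assumes the non-interactive `Agg` backend so it can run outside a notebook. `edgecolor`, `linewidth`, and `figsize` are all accepted by `plot()`, and the matplotlib `Axes` it returns can be customized further (here with a title).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook

import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square "parcels" standing in for wild_lands.
parcels = gpd.GeoDataFrame(
    {"CLASS": ["WILD FOREST", "WILDERNESS"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
)

# plot() returns a matplotlib Axes; edgecolor and linewidth style the shape outlines.
ax = parcels.plot(color="darkgreen", edgecolor="black", linewidth=0.5, figsize=(6, 3))
ax.set_title("Made-up parcels")
```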

# The "geometry" column

Every GeoDataFrame contains a special "geometry" column.  It contains all of the geometric objects that are displayed when we call the `plot()` method.

```python
wild_lands.geometry.head()
```
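
The `geometry` attribute returns this column as a GeoSeries whose entries are individual geometric objects. A minimal sketch with made-up points:

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up GeoDataFrame with two points.
gdf = gpd.GeoDataFrame({"name": ["a", "b"]}, geometry=[Point(0, 0), Point(1, 2)])

# .geometry gives the geometry column as a GeoSeries.
geoms = gdf.geometry
print(type(geoms).__name__)              # GeoSeries

# Each entry is a geometric object with its own attributes, e.g. x and y for a point.
print(geoms.iloc[1].x, geoms.iloc[1].y)  # 1.0 2.0
```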

While this column can contain a variety of different geometry types, each entry will typically be a `Point`, `LineString`, or `Polygon`.

![](https://i.imgur.com/N1llefr.png)
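
These three types come from the [Shapely](https://shapely.readthedocs.io/) library, which GeoPandas uses for its geometry objects. This sketch builds one made-up object of each type and reads two basic measurements. The coordinates are arbitrary, unitless numbers.

```python
from shapely.geometry import LineString, Point, Polygon

# One made-up object of each of the three common geometry types.
campsite = Point(0.0, 0.0)                           # a single location
trail = LineString([(0, 0), (3, 4)])                 # an ordered sequence of points
parcel = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])   # a closed ring enclosing an area

for geom in (campsite, trail, parcel):
    print(geom.geom_type)  # Point, then LineString, then Polygon

print(trail.length)  # 5.0
print(parcel.area)   # 4.0
```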

The "geometry" column in the dataset that we've just loaded contains 2983 different `Polygon` objects, each corresponding to a different shape in the plot above.
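
To check the mix of geometry types in a layer yourself, tally the `geom_type` of each entry, for example with `wild_lands.geom_type.value_counts()`. The sketch below does this on a made-up GeoDataFrame that also includes a `MultiPolygon` (one entry made of several polygons), a type that can appear in real datasets alongside plain `Polygon` objects.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])

# Made-up GeoDataFrame mixing single-part and multi-part shapes.
shapes = gpd.GeoDataFrame(geometry=[square, shifted, MultiPolygon([square, shifted])])

# geom_type reports the type of every entry, so value_counts() summarizes the column.
type_counts = shapes.geom_type.value_counts()
print(type_counts["Polygon"], type_counts["MultiPolygon"])  # 2 1
```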

In the code cell below, we create three more GeoDataFrames, containing campsite locations (`Point`), foot trails (`LineString`), and county boundaries (`Polygon`).

```python
# Contains locations of campsites in New York state (Point)
POI_data = gpd.read_file("../input/geospatial-course-data/dec_pointsinterest/Decptsofinterest.shp")
campsites = POI_data.loc[POI_data.ASSET=='PRIMITIVE CAMPSITE'].copy()

# Contains foot trails in New York state (LineString)
roads_trails = gpd.read_file("../input/geospatial-course-data/dec_roadstrails/Decroadstrails.shp")
trails = roads_trails.loc[roads_trails.ASSET=='FOOT TRAIL'].copy()

# Contains county boundaries in New York state (Polygon)
counties = gpd.read_file("../input/geospatial-course-data/NY_county_boundaries/NY_county_boundaries.shp")
```

And now we have a map of New York that we can use to guide our next camping trip!

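```python
# Draw county outlines first; zorder=3 keeps them on top of the layers added later
ax = counties.plot(figsize=(10,10), color='none', edgecolor='gainsboro', zorder=3)
# Reuse the same axes (ax=ax) so every layer lands on one map
wild_lands.plot(color='lightgreen', ax=ax)
campsites.plot(color='maroon', markersize=2, ax=ax)
# markersize only affects points; for lines, set linewidth instead
trails.plot(color='black', linewidth=1, ax=ax)
```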
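
The layering pattern is: plot one GeoDataFrame, keep the returned `Axes`, and pass it as `ax=` to every later `plot()` call. This sketch repeats the pattern with made-up stand-ins for the counties, trails, and campsites, again assuming the non-interactive `Agg` backend.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook

import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# Made-up stand-ins for the counties, trails, and campsites layers.
county = gpd.GeoDataFrame(geometry=[Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])])
paths = gpd.GeoDataFrame(geometry=[LineString([(0, 2), (4, 2)])])
sites = gpd.GeoDataFrame(geometry=[Point(1, 1), Point(3, 2)])

# Draw the base layer and keep its Axes; zorder=3 draws the outline above later layers.
ax = county.plot(figsize=(5, 5), color="none", edgecolor="gray", zorder=3)

# Pass the same Axes to each later plot() call so all layers share one map.
paths.plot(ax=ax, color="black", linewidth=1)
sites.plot(ax=ax, color="maroon", markersize=20)
```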

# Your turn

Use what you've learned to investigate ...