# Lesson 10 & 11 (from book), Vector Data, Geopandas


## 10.1 Points, lines, and polygons

Vector data represents specific features on the Earth’s surface. There are three types of vector data:

Points: each point has a single x,y location. Examples of data that can be represented as point vector data are sampling locations or animal sightings.

Lines: a line is composed of at least two points that are connected. Roads and streams are commonly depicted as line vector data.

Polygons: polygons are sets of three or more vertices that are connected and form a closed region. Political boundaries (outlines of countries, states, cities, etc.) are examples of polygon vector data.




In addition to the geospatial information stored, vector data can include attributes that describe each feature. For example, a vector dataset where each feature is a polygon representing the boundary of a state could have the population and area of the state as attributes.
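
As a rough illustration of the three geometry types, the sketch below builds one of each with `shapely`, the geometry library that `geopandas` builds on. All coordinates and attribute values are made up for illustration; they do not come from a real dataset.

```python
from shapely.geometry import Point, LineString, Polygon

# Hypothetical coordinates (x, y), for illustration only
sighting = Point(-121.5, 37.1)  # a point: a single x,y location
stream = LineString([(-121.5, 37.1), (-121.4, 37.2), (-121.3, 37.2)])  # connected points
parcel = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # vertices forming a closed region

for geom in (sighting, stream, parcel):
    print(geom.geom_type, geom.wkt)

# Attributes that describe a feature can be stored alongside its geometry
feature = {"geometry": parcel, "name": "example parcel", "area": parcel.area}
print(feature["name"], feature["area"])
```

Here a plain dictionary stands in for a feature with attributes; in practice a `geopandas.GeoDataFrame` plays that role, with one row per feature.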



## 10.2 Shapefiles

One of the most popular formats to store vector data is the shapefile data format. The shapefile format is developed and maintained by the Environmental Systems Research Institute (Esri).

So far we’ve been working with data that comes stored in a single file, like a csv or txt file for tabular data. A shapefile is actually a collection of files that work together as a single dataset. All the files that make up a shapefile need to have the same name (with different extensions) and be in the same directory. For our shapefiles to work we need at least these three files:

- .shp: shape format, this file has the geometries for all features.
- .shx: shape index format, this file indexes the features.
- .dbf: attribute format, this file stores the attributes for features as a table.

Sometimes shapefiles will have additional files, including:

- .prj: a file containing information about the projection and coordinate reference system.
- .sbn and .sbx: files that contain a spatial index of the features.
- .shp.xml: geospatial metadata in XML format.

Check the Wikipedia page about shapefiles to see a more extensive list of files associated with shapefiles.
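
Since a shapefile only works when its companion files sit next to it, it can help to check for them before reading the data. The following is a minimal sketch using only the standard library; the helper name `missing_shapefile_parts` and the file name `example` are hypothetical, and the check assumes lowercase extensions.

```python
from pathlib import Path
import tempfile

# Extensions the text lists as the minimum needed for a shapefile to work
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    shp_path = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not shp_path.with_suffix(ext).exists()
    ]


# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "example"
    base.with_suffix(".shp").touch()
    base.with_suffix(".dbf").touch()
    demo_missing = missing_shapefile_parts(base.with_suffix(".shp"))

print(demo_missing)  # the .shx placeholder was never created
```

An empty list from this helper means the three required files share a name and a directory, as described above.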

# Intro to geopandas

GeoPandas is a Python library that extends the `pandas` library by adding support for geospatial data. In this lesson we will introduce the `geopandas` library to work with vector data. We will also make our first map.

To begin with, let’s import `geopandas` with its standard abbreviation `gpd`:

```python
# this is the library we will explore
import geopandas as gpd

# we will start using matplotlib for making maps
import matplotlib.pyplot as plt
```

In this lesson we will use simplified point data about wild pig (Sus scrofa) sightings in California, USA, from the Global Biodiversity Information Facility (GBIF).

We can read in a shapefile with geopandas by using the `gpd.read_file()` function.

```python
# read in the data using geopandas
# the path goes through the 'data' folder
# and points to the .shp file, the main file of the shapefile
pigs = gpd.read_file('data/gbif_sus_scroga_california/gbif_sus_scroga_california.shp')

# view first 5 rows
pigs.head()
```

| | gbifID | species | state | individual | day | month | year | inst | collection | catalogNum | identified | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 899953814 | Sus scrofa | California | | 22.0 | 3.0 | 2014.0 | iNaturalist | Observations | 581956 | edwardrooks | POINT (-121.53812 37.08846) |
| 1 | 899951348 | Sus scrofa | California | | 9.0 | 6.0 | 2007.0 | iNaturalist | Observations | 576047 | Bruce Freeman | POINT (-120.54942 35.47354) |
| 2 | 896560733 | Sus scrofa | California | | 20.0 | 12.0 | 1937.0 | MVZ | Hild | MVZ:Hild:195 | Museum of Vertebrate Zoology, University of Ca... | POINT (-122.27063 37.87610) |
| 3 | 896559958 | Sus scrofa | California | | 1.0 | 4.0 | 1969.0 | MVZ | Hild | MVZ:Hild:1213 | Museum of Vertebrate Zoology, University of Ca... | POINT (-121.82297 38.44543) |
| 4 | 896559722 | Sus scrofa | California | | 1.0 | 1.0 | 1961.0 | MVZ | Hild | MVZ:Hild:1004 | Museum of Vertebrate Zoology, University of Ca... | POINT (-121.74559 38.54882) |


One shapefile = multiple files

Although the only argument we pass to `gpd.read_file()` is the `.shp` file, remember that we need to have at least the `.shx` and `.dbf` files in the same directory as the `.shp` to read in the data.

## 11.2 GeoSeries and GeoDataFrame

The core data structure in GeoPandas is the `geopandas.GeoDataFrame`. We can think of it as a `pandas.DataFrame` with a dedicated geometry column that can perform spatial operations.

The geometry column in a `gpd.GeoDataFrame` holds the geometry (point, polygon, etc.) of each spatial feature. The other columns in the `gpd.GeoDataFrame`, which hold attributes about the features, are `pandas.Series`, like in a regular `pd.DataFrame`.
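
To make the distinction between the geometry column and the attribute columns concrete, here is a minimal sketch that builds a small `gpd.GeoDataFrame` by hand and inspects the type of each kind of column. The variable `toy` and its two rows are hypothetical values chosen for illustration, not part of the pigs dataset.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical two-row dataset, for illustration only
toy = gpd.GeoDataFrame(
    {"species": ["Sus scrofa", "Sus scrofa"], "year": [2014, 2007]},
    geometry=[Point(-121.5, 37.1), Point(-120.5, 35.5)],
)

print(type(toy))             # the table itself is a GeoDataFrame
print(type(toy.geometry))    # the geometry column is a GeoSeries
print(type(toy["species"]))  # an attribute column is a regular pandas Series
```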

Example

First of all, notice that the rightmost column of `pigs` is a column named `geometry` whose values indicate points.