<a href="https://colab.research.google.com/github/PaulToronto/Applied-Geospatial-Data-Science-with-Python---Book/blob/main/2_What_Is_Geospatial_Data_and_Where_Can_I_Find_It.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What Is Geospatial Data and Where Can I Find It?

- **Geospatial data**:
    - data that has a geographic component
    - data that ties the data to a point on, or adjacent to, the Earth's surface

## Static and dynamic geospatial data

- **static geospatial data**:
    - does not change over a short-term time period
    - examples:
        - epicentre of an earthquake
        - location of a store
        - the number of college-educated adults in an area
- **dynamic geospatial data**:
    - can change in real-time
    - examples:
        - location of a shopper within a shopping mall
        - the position of a bike courier delivering food
        - the spread of an infectious disease
    - dynamic geospatial data is also called **spatiotemporal data**, or data relating to both time and space
- static and dynamic geospatial data come in two formats:
    1. **vector**
    2. **raster**

## Geospatial file formats

- the geographic component of geospatial data is often **latitude** and **longitude**
- a latitude and longitude coordinate is collected via a **global positioning system (GPS)**
- a geographic component can also be derived from an address using a process called **geocoding**
- geospatial data is a subset of **spatial data**, which is data related to a point in some broader study space

## Vector data

- vector data is not unique to the field of **geographic information systems (GIS)** or to geospatial data science
    - it has applications in many digital mediums
- within GIS or geospatial data science, the vector data represents real-world features
- the foundation of vector graphics is a **vertex** or **point** that is typically denoted by an **X** and **Y** **coordinate**
    - this point is the location of something
- X: **longitude**
- Y: **latitude**
- if you have two or more vertices, they can be connected by paths to form **polylines**
- a series of polylines can be connected to form a **polygon**
- polygons can also have interior vertices and polylines that carve out internal sections and form **multipart polygons**
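
The point, polyline and polygon building blocks described above can be sketched with plain Python tuples and lists. This is only an illustration: the coordinates and the helper names (`polyline_length`, `ring_area`, `polygon_area`) are made up for this example, and the calculations are planar, so they are in the units of the coordinates and are not geodesic distances or areas.

```python
from math import hypot

# A point is an (x, y) pair: x = longitude, y = latitude (illustrative values).
store = (-79.3832, 43.6532)

# A polyline is an ordered list of two or more vertices.
route = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# A polygon is a closed ring: the last vertex repeats the first.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

# An interior ring carves a hole out of the polygon.
courtyard = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0), (1.0, 1.0)]


def polyline_length(vertices):
    """Sum of straight-line segment lengths between consecutive vertices."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )


def ring_area(ring):
    """Planar area of a closed ring using the shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the area of any interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


print(polyline_length(route))
print(polygon_area(parcel, holes=[courtyard]))
```

In practice a geometry library would normally handle this, but the sketch shows that every vector feature reduces to ordered lists of X and Y coordinates.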

### Other vector data uses

- **X**, **Y** and **Z** coordinate data can be used to create **point clouds**
    - which are becoming more ubiquitous as **Light Detection and Ranging (LiDAR)** technologies become more mainstream
        - self-driving cars use LiDAR
- point clouds can also be created from photography via a process called **photogrammetry** to convert overlapping 2D images into 3D models of objects
    - photogrammetry is used in surveying on Earth and in space
        - the James Webb Space Telescope uses photogrammetry

### Vector file formats

https://gisgeography.com/gis-formats/

#### Shapefile

- developed by Esri as an open specification data storage format
- stores the spatial geometry of points, lines and polygons
- also stores attribute information related to those features
- a shapefile is a multipart file format with 4 main parts (only the first 3 are mandatory):
    1. `.shp`: the geometry of a point, line or polygon feature
    2. `.shx`: the index of a feature
    3. `.dbf`: attribute data that stores columnar variables related to their features
    4. `.prj`: projection metadata that uses **well-known text (WKT)** to store information related to the **projection and coordinate reference system**
- a shapefile can include several other parts
- https://en.wikipedia.org/wiki/Shapefile
- the United States Census Bureau maintains a specialized shapefile format called the **Topologically Integrated Geographic Encoding and Referencing (TIGER)** system
    - TIGER files do not contain attribute data that is collected by census products
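
Because a shapefile is really a set of sibling files, a quick sanity check before loading one is to confirm that the mandatory parts sit next to the `.shp`. The sketch below uses only the standard library; the function names are hypothetical, and it assumes lowercase extensions.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)


def shapefile_parts(shp_path):
    """Map each expected extension to whether that file exists beside the .shp."""
    shp_path = Path(shp_path)
    return {
        ext: shp_path.with_suffix(ext).exists()
        for ext in MANDATORY + OPTIONAL
    }


def missing_mandatory(shp_path):
    """Return the mandatory extensions that are absent."""
    parts = shapefile_parts(shp_path)
    return [ext for ext in MANDATORY if not parts[ext]]
```

For example, `missing_mandatory("data/roads.shp")` (a made-up path) would return `[".dbf"]` if only `roads.shp` and `roads.shx` were present in that folder.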

#### GeoJSON

- **Geographic JavaScript Object Notation**
    - the geographic sibling of **JavaScript Object Notation (JSON)**
    - mostly used for web-based mapping
- GeoJSON formats store the coordinates of the geometry as well as the columnar attribute information related to those geometries as text within curly braces
- can easily be read by any text-based file editor as well as web-based tools for working with JSON data (ex. CodeBeautify's JSON View)
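
As a small illustration of geometry and attributes living together as text within curly braces, the sketch below builds a one-feature GeoJSON document with the standard `json` module and reads it back. The store name and coordinates are invented for the example.

```python
import json

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON orders coordinates as [longitude, latitude].
        "coordinates": [-79.3832, 43.6532],
    },
    # Columnar attribute information goes under "properties".
    "properties": {"name": "Example store", "category": "retail"},
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to text (what would be saved in a .geojson file) ...
text = json.dumps(collection, indent=2)

# ... and parse it back into Python objects.
parsed = json.loads(text)
for feat in parsed["features"]:
    lon, lat = feat["geometry"]["coordinates"]
    print(feat["properties"]["name"], lon, lat)
```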

#### KML

- **Keyhole Markup Language**
- a file format for storing and displaying geographic data, originally developed by Keyhole, Inc., which Google acquired in 2004
- Google later handed KML over to the **Open Geospatial Consortium (OGC)**, which maintains and evolves it as a standard format for displaying GIS data on web-based and mobile 2D maps and 3D Earth browsers
- KML is an XML language primarily focused on geographic data visualization, including annotating maps and images
- beyond displaying data, KML also helps the end user navigate by providing context on what to look for and how to get there
- https://www.ogc.org/standards/kml/
- https://developers.google.com/kml/documentation/kmlreference
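
To make the XML nature of KML concrete, here is a minimal sketch that writes a single `Placemark` with the standard library's `xml.etree.ElementTree`. The helper name `placemark_kml` and the sample place are assumptions for illustration; real KML files usually carry far more styling and structure.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialize the KML namespace as the default namespace (no prefix).
ET.register_namespace("", KML_NS)


def placemark_kml(name, lon, lat):
    """Return a KML document string containing one point Placemark."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML writes coordinates as "longitude,latitude[,altitude]".
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


doc = placemark_kml("Example store", -79.3832, 43.6532)
print(doc)
```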

#### OSM

- an XML-based file format that was created to store and easily distribute geospatial data by OpenStreetMap
- OpenStreetMap is one of the largest crowdsourcing communities for geospatial data
- the OSM file format is a collection of vector-based features from the crowdsourcing community
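
The sketch below parses a tiny, hand-written OSM XML fragment to show how the format encodes vector features: `node` elements carry a latitude and longitude, `way` elements reference nodes in order, and `tag` elements hold key/value attributes. The IDs, coordinates and names are invented for the example.

```python
import xml.etree.ElementTree as ET

OSM_XML = """<osm version="0.6">
  <node id="1" lat="43.6532" lon="-79.3832">
    <tag k="amenity" v="pharmacy"/>
    <tag k="name" v="Example Pharmacy"/>
  </node>
  <node id="2" lat="43.6540" lon="-79.3800"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""

root = ET.fromstring(OSM_XML)

# Nodes are points: an id, a coordinate pair and optional tags.
nodes = {
    n.get("id"): {
        "lat": float(n.get("lat")),
        "lon": float(n.get("lon")),
        "tags": {t.get("k"): t.get("v") for t in n.findall("tag")},
    }
    for n in root.findall("node")
}

# Ways are polylines (or polygons): an ordered list of node references.
ways = {
    w.get("id"): [nd.get("ref") for nd in w.findall("nd")]
    for w in root.findall("way")
}

print(nodes["1"]["tags"], ways["10"])
```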

## Raster data

- vector data uses points, polylines and polygons to model real-world objects
- raster data is any picture data that is composed of uniform cells or **pixels** (typically square)
- raster data takes the form of a matrix of cells or pixels
- in geospatial data, each cell is geolocated to a specific point on the Earth's surface
- typically used for continuous data, which cannot be easily formatted as vector data
- often used as a background layer underneath vector data to provide more context
- Google Maps:
    - **points of interest (POIs)** are vector data
    - the underlying satellite image is raster data
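
A raster's "matrix of geolocated cells" can be sketched as a nested list plus three numbers: the coordinates of the top-left corner and the cell size. Everything below (the values, the origin, the function names) is hypothetical and assumes square, north-up cells.

```python
# A tiny 3 x 4 raster; each value could be an elevation, a temperature, etc.
grid = [
    [10, 12, 13, 15],
    [11, 14, 16, 18],
    [12, 15, 19, 21],
]

# Hypothetical georeferencing: top-left corner and cell size in degrees.
ORIGIN_X = -80.0  # west edge (longitude)
ORIGIN_Y = 44.0   # north edge (latitude)
CELL_SIZE = 0.5


def cell_center(row, col):
    """Coordinates of the centre of the cell at (row, col)."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y - (row + 0.5) * CELL_SIZE  # rows count downward from the top
    return x, y


def cell_at(x, y):
    """Row and column of the cell that contains the coordinate (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    return row, col


def extent():
    """Spatial extent as (min_x, min_y, max_x, max_y)."""
    n_rows, n_cols = len(grid), len(grid[0])
    return (
        ORIGIN_X,
        ORIGIN_Y - n_rows * CELL_SIZE,
        ORIGIN_X + n_cols * CELL_SIZE,
        ORIGIN_Y,
    )


row, col = cell_at(-78.25, 42.75)
print(row, col, grid[row][col], extent())
```

The cell size plays the role of spatial resolution: halving it would describe the same extent with four times as many cells.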

### Raster file formats

#### GeoTIFF

- based on the **Tagged Image File Format (TIFF)**
- an evolution of the TIFF file format in that it allows for the addition of **georeferencing** information within the image, thus allowing for geographic metadata in the image file
- metadata included in the GeoTIFF file:
    - vertical and horizontal components of the raster
    - the **coordinate reference system (CRS)** that the data is based on
    - the spatial extent and spatial resolution
    - rules for how to project the raster data into a 2D digital medium
- the OGC has set forth the **OGC GeoTIFF 1.1 format standard**

#### JPEG

- **Joint Photographic Experts Group**
- open standard image format for lossy compressed image data
- did not allow for the inclusion of georeferenced metadata until the release of the JPEG 2000 format

#### PNG

- **Portable Network Graphics (PNG)**
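- has no native georeferencing; geographic metadata is typically supplied through a sidecar world file (e.g. `.pgw`)
- uses lossless compression only
- commonly stores 24-bit RGB images (32-bit with an alpha channel)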

## Introducing geospatial databases and storage

### PostgreSQL and PostGIS

- **PostgreSQL** is not spatially enabled by default, but it can be spatially enabled through the use of the PostGIS database extender
    - https://www.postgresql.org/about/
- **PostGIS** is a project of the **Open Source Geospatial Foundation (OSGeo)**
    - https://postgis.net/workshops/postgis-intro/
    - adds spatial operations such as distance, area, union and intersection, as well as spatial geometry, to the standard PostgreSQL database
    - example queries:
        - How far away is the nearest pharmacy from X address?
        - What is the size of this census tract?
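
The two example questions could look roughly like the following PostGIS queries, held here as Python strings so they can be passed to a database driver. The table and column names (`pharmacies`, `census_tracts`, `geom`, `geoid`) are hypothetical, the geometries are assumed to be stored in EPSG:4326, and the address is assumed to have already been geocoded to a longitude and latitude. `ST_Distance`, `ST_Area`, `ST_MakePoint` and `ST_SetSRID` are PostGIS functions; casting to `geography` makes the results come back in metres and square metres.

```python
# How far away is the nearest pharmacy from a (geocoded) address?
NEAREST_PHARMACY_SQL = """
SELECT name,
       ST_Distance(
           geom::geography,
           ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography
       ) AS distance_m
FROM pharmacies
ORDER BY distance_m
LIMIT 1;
"""

# What is the size of this census tract?
TRACT_AREA_SQL = """
SELECT geoid,
       ST_Area(geom::geography) AS area_m2
FROM census_tracts
WHERE geoid = %(geoid)s;
"""

# Named parameters in the %(name)s style used by drivers such as psycopg.
pharmacy_params = {"lon": -79.3832, "lat": 43.6532}
tract_params = {"geoid": "06037123456"}  # made-up tract identifier
```

Sorting every row by computed distance is the simplest correct form; on a large table an indexed nearest-neighbour search would normally be preferred.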

### ArcGIS geodatabase

- proprietary database created by Esri
- used with the ArcGIS suite of products

## Exploring open geospatial data assets

### Human geography

- human geographic data or anthropogeographic data is a branch of geographic data that deals with humans and their relationship to the area around them

#### United States Census Bureau data

- https://data.census.gov/
- one of the best sources of **geo-demographic data**
- **American Housing Survey (AHS)**: https://www.census.gov/programs-surveys/ahs.html
- **Annual Business Survey (ABS)**: https://www.census.gov/programs-surveys/abs.html

##### GEOIDs

- census data products include geographic entity codes or **GEOIDs**
- there are two primary types of GEOIDs:
    1. **Federal Information Processing Standards (FIPS)** codes
        - nested hierarchical
    2. **Geographic Names Information System (GNIS)** codes
        - chronological
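
The nested, hierarchical nature of FIPS-based GEOIDs can be shown by slicing one apart: 2 digits for the state, 3 for the county, 6 for the tract, then either 1 digit for the block group or 4 for the block. The function name `split_geoid` is made up for this sketch, and the tract digits in the example are invented (`06` is California and `037` is Los Angeles County).

```python
def split_geoid(geoid):
    """Split a FIPS-based GEOID string into its nested components."""
    if not geoid.isdigit() or len(geoid) not in (2, 5, 11, 12, 15):
        raise ValueError(f"unsupported GEOID: {geoid!r}")
    parts = {"state": geoid[:2]}
    if len(geoid) >= 5:
        parts["county"] = geoid[2:5]
    if len(geoid) >= 11:
        parts["tract"] = geoid[5:11]
    if len(geoid) == 12:
        parts["block_group"] = geoid[11]
    if len(geoid) == 15:
        parts["block"] = geoid[11:]
    return parts


print(split_geoid("06037"))
print(split_geoid("06037123456"))
```

GEOIDs are best kept as strings rather than integers, since many begin with a leading zero that would otherwise be lost.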

##### Data Products

- census geodemographic data products can easily be downloaded from: https://data.census.gov/


#### OpenStreetMap

- by leveraging OSM data, we can create new geospatial features based on distance calculations generated by driving, walking or public transit modes
- OSM street network data is also useful when it comes to solving **vehicle routing problems (VRPs)** or shortest path problems
- OSM data can also be used as a **reference layer** within mapping products
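
A shortest path problem on a street network can be sketched with Dijkstra's algorithm over a small weighted graph. The network below is a hypothetical stand-in for OSM street data (intersections as nodes, travel times in minutes as edge weights); it is not how any particular OSM library represents streets.

```python
import heapq

# Hypothetical street network: node -> {neighbour: travel time in minutes}.
STREETS = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total cost, path) or None if unreachable."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour])
                )
    return None


print(shortest_path(STREETS, "A", "D"))
```

Swapping the edge weights from minutes to metres, or to walking times, gives the distance-based features mentioned above for other travel modes.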

#### United Nations Environment Programme geodata

- the **United Nations Environment Programme (UNEP)** maintains a rich catalog of human geographic data that covers geographies across the globe
- https://datacore-gn.unepgrid.ch/geonetwork/srv/eng/catalog.search#/home

#### University of Wisconsin Center for Sustainability and the Global Environment (SAGE)

- mostly grid-based datasets that represent topics such as air quality, urban expansion and crop calendars
- to view SAGE's data catalog: https://sage.nelson.wisc.edu/data-and-models/


#### CIA World Factbook

- the US **Central Intelligence Agency (CIA)** maintains a data catalog called the **World Factbook** that provides intelligence regarding:
    - governments
    - people
    - geography
    - transportation systems
    - militaries
    - terrorism
- https://www.cia.gov/the-world-factbook/

### Physical Geography

- physical geography is a branch of geospatial data that represents the physical, or natural environment
- includes:
    - weather
    - climate
    - land formations
    - plants
    - natural phenomena such as earthquakes and tsunamis
    - ...

#### United States Geological Survey

- the **United States Geological Survey (USGS)** maintains a rich data catalog of both real-time and historical physical geography data
- real-time data includes active monitoring of earthquakes, landslides, volcanoes, wildfires and geomagnetism
- the USGS, in partnership with the **National Aeronautics and Space Administration (NASA)**, publishes and maintains **Landsat** data
    - images captured by NASA's Earth observation satellites
    - raster data
- https://www.usgs.gov/

#### National Aeronautics and Space Administration (NASA)

- provides Earth observation data via its **Earthdata portal**
- https://search.earthdata.nasa.gov/search
- to download data you need to register for a free account: https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/earthdata-login

#### OpenTopography