# GEOG-414 Final Exam


Please take note of the following important information regarding the final exam:

**Date and Location**

The final exam is scheduled to take place on Tuesday, December 12, 2023, in room BGB 404. The exam begins at 10:30 AM, and you will have 120 minutes to complete it.

**Exam Structure**

The final exam accounts for 20% of the total grade, equivalent to 200 points. It consists of five questions worth 40 points each: four about Earth Engine and one about DuckDB. The exam is open-book, and you may use online resources to find solutions, but you must complete it independently, without collaborating with others. The exam must be completed within 120 minutes and is due at precisely 12:30 PM. A penalty of 10% will be deducted from your score for each 10-minute interval of late submission.

**Submission Requirements**

1. **Screenshots:** For each question, upload a screenshot of your map/chart. Ensure your name appears in the screenshot.
2. **HTML file:** Submit an HTML version of your notebook. Ensure all code outputs are visible. (Export via VS Code: Notebook > Export > HTML).
3. **Colab link:** Provide a link to your notebook hosted on Google Colab for interactive review.

## Question 1

Create annual cloud-free Landsat composites (**2015-2023**) for the state of Tennessee and display them on the map as false color composites.

Relevant datasets:

* [TIGER: US Census States](https://developers.google.com/earth-engine/datasets/catalog/TIGER_2018_States): `ee.FeatureCollection("TIGER/2018/States")`
* [USGS Landsat 8 Level 2, Collection 2, Tier 1](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2): `ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")`

In [3]:
# Import ee, geemap and initialize geemap
import ee
import geemap
geemap.ee_initialize()
m = geemap.Map()

In [17]:
# Add tiger and filter to Tennessee
states = ee.FeatureCollection("TIGER/2018/States")
tenn = states.filter(ee.Filter.eq('NAME', 'Tennessee'))
roi = tenn.geometry()
m.addLayer(tenn, {}, 'Tennessee')

In [20]:
# Build annual Landsat composites (2015-2023) and define false color visualization parameters
collection = geemap.landsat_timeseries(
    roi=roi, start_year=2015, end_year=2023, frequency='year'
)

vis = {
    'min': 0.0,
    'max': 0.5,
    'bands': ['NIR', 'Red', 'Green'],
}

In [21]:
# Create a function to add each landsat layer by year to the map
def add_landsat_layers(m, collection, vis):
    for year in range(2015, 2024):
        # Filter the collection to only include images from the current year
        yearly_collection = collection.filter(ee.Filter.calendarRange(year, year, 'year'))
        
        # Get the first image from the filtered collection
        image = yearly_collection.first()
        
        # Add the image as a layer to the map
        m.add_layer(image, vis, f'Landsat {year}')

# Call the function to add the Landsat layers to the map
add_landsat_layers(m, collection, vis)
m.center_object(tenn)
m.add_text('Made by Vance Russell', add_header=False)
m
#this bloody took about 2 hours to do itself!


![](https://i.imgur.com/UKQUi85.png)

## Question 2

Based on Question 1, extract annual water areas (**2015-2023**) for the state of Tennessee based on the [Normalized Difference Water Index (NDWI)](https://en.wikipedia.org/wiki/Normalized_difference_water_index) and display them on the map. See [this example](https://developers.google.com/earth-engine/guides/image_visualization#color-palettes).

Relevant datasets:

* [TIGER: US Census States](https://developers.google.com/earth-engine/datasets/catalog/TIGER_2018_States): `ee.FeatureCollection("TIGER/2018/States")`
* [USGS Landsat 8 Level 2, Collection 2, Tier 1](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2): `ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")`

In [None]:
# Add your code here
# Only 2015 was completed; the result was added under Question 2 in the final exam file

![](https://i.imgur.com/GSfICAZ.png)
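
As a rough illustration of the NDWI step, the sketch below computes the index for a few hand-made reflectance pairs in plain Python. It assumes the McFeeters form, NDWI = (Green - NIR) / (Green + NIR), and a water threshold of 0. The threshold, the helper names, and the sample values are assumptions and not part of the exam. In Earth Engine, the same ratio is usually obtained with `image.normalizedDifference(['Green', 'NIR'])` on each annual composite.

```python
def ndwi(green, nir):
    """Return (green - nir) / (green + nir), or None when the sum is zero."""
    total = green + nir
    if total == 0:
        return None
    return (green - nir) / total


def water_mask(green_band, nir_band, threshold=0.0):
    """Flag pixels as water (1) when NDWI exceeds the threshold, else 0."""
    mask = []
    for green, nir in zip(green_band, nir_band):
        value = ndwi(green, nir)
        mask.append(1 if value is not None and value > threshold else 0)
    return mask


# Hypothetical surface reflectance values for four pixels
green_band = [0.08, 0.10, 0.06, 0.0]
nir_band = [0.02, 0.30, 0.06, 0.0]

mask = water_mask(green_band, nir_band)
print(mask)

# Each Landsat pixel covers 30 m x 30 m, so the area follows from the pixel count
water_area_km2 = sum(mask) * 30 * 30 / 1e6
print(water_area_km2)
```

Only the first pixel, where green reflectance exceeds near-infrared, ends up flagged as water.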

## Question 3

Based on Question 2, create the maximum water extent (**2015-2023**) for the state of Tennessee. Each pixel in the maximum water extent indicates that the pixel has been detected as water at least once since 2015. Also extract the surface water extent for the state of Tennessee based on the [JRC Global Surface Water Mapping Layers](https://developers.google.com/earth-engine/datasets/catalog/JRC_GSW1_4_GlobalSurfaceWater) (select the `occurrence` band). Create a split map to visually compare the water areas extracted by the two methods (i.e., NDWI and JRC).

**Hint:** Use the `sum()` function on the ImageCollection, then convert the result to a binary image to get the maximum water extent.

Relevant datasets:

* [TIGER: US Census States](https://developers.google.com/earth-engine/datasets/catalog/TIGER_2018_States): `ee.FeatureCollection("TIGER/2018/States")`
* [USGS Landsat 8 Level 2, Collection 2, Tier 1](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2): `ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")`
* [JRC Global Surface Water Mapping Layers](https://developers.google.com/earth-engine/datasets/catalog/JRC_GSW1_4_GlobalSurfaceWater): `ee.Image("JRC/GSW1_4/GlobalSurfaceWater").select('occurrence')`

In [None]:
# Add your code here

![](https://i.imgur.com/lHtlpB4.png)
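
The hint can be pictured on tiny rasters: summing the yearly binary water masks gives, per pixel, the number of years flagged as water, and thresholding that sum gives the maximum extent. The sketch below uses nested Python lists with made-up values. In Earth Engine the equivalent would be along the lines of `water_collection.sum().gt(0)`, where `water_collection` is a hypothetical ImageCollection of yearly 0/1 masks.

```python
def max_water_extent(yearly_masks):
    """Combine yearly 0/1 masks into one mask of pixels that were water at least once."""
    rows = len(yearly_masks[0])
    cols = len(yearly_masks[0][0])
    extent = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Number of years in which this pixel was classified as water
            years_as_water = sum(mask[r][c] for mask in yearly_masks)
            row.append(1 if years_as_water > 0 else 0)
        extent.append(row)
    return extent


# Hypothetical 2 x 3 water masks for three years
masks = [
    [[1, 0, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0]],
]

extent = max_water_extent(masks)
print(extent)
```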

## Question 4

Create annual composites of 4-band (RGBN) NAIP imagery and the Normalized Difference Vegetation Index (NDVI) for Knox County, Tennessee, and display them on the map.

Relevant datasets:

* [TIGER: US Census Counties](https://developers.google.com/earth-engine/datasets/catalog/TIGER_2018_Counties): `ee.FeatureCollection("TIGER/2018/Counties")`
* [NAIP: National Agriculture Imagery Program](https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ): `ee.ImageCollection("USDA/NAIP/DOQQ")`

In [None]:
# Add your code here

![](https://i.imgur.com/yWkyENq.png)
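
For reference, NDVI is the normalized difference of the near-infrared and red bands. Written with the NAIP band names `N` and `R` (this assumes 4-band imagery, since some older NAIP acquisitions may lack the near-infrared band):

```latex
\mathrm{NDVI} = \frac{N - R}{N + R}, \qquad -1 \le \mathrm{NDVI} \le 1
```

In Earth Engine this would typically be computed per image with `image.normalizedDifference(['N', 'R'])`.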

## Question 5

Analyze NYC crime data from 2003 to 2011 using DuckDB. The database `nyc_data.db` is available for download [here](https://github.com/opengeos/data/raw/main/duckdb/nyc_data.db.zip). Two of its tables are needed: `nyc_homicides`, which contains the homicide data from 2003 to 2011, and `nyc_neighborhoods`, which contains the neighborhood boundaries of New York City. Use these two tables to answer the following questions:

1. What is the total number of homicides in New York City from 2003 to 2011?

In [1]:
import leafmap
import duckdb

In [13]:
con = duckdb.connect()
con.install_extension("httpfs")
con.load_extension("httpfs")
con.install_extension("spatial")
con.load_extension("spatial")

In [3]:
url = "https://github.com/opengeos/data/raw/main/duckdb/nyc_data.db.zip"
leafmap.download_file(url, unzip=True)

nyc_data.db.zip already exists. Skip downloading. Set overwrite=True to overwrite.


'c:\\Users\\vance\\OneDrive\\1 Consulting\\Spatial\\geog-414-copy\\final\\nyc_data.db.zip'

In [17]:
# Connect to the database
con = duckdb.connect('nyc_data.db')
con.sql("SHOW TABLES;")

┌─────────────────────┐
│        name         │
│       varchar       │
├─────────────────────┤
│ nyc_census_blocks   │
│ nyc_homicides       │
│ nyc_neighborhoods   │
│ nyc_streets         │
│ nyc_subway_stations │
└─────────────────────┘

In [18]:
# Show the nyc_homicides table as a DataFrame
con.sql('FROM nyc_homicides AS homicides').to_df()

|  | INCIDENT_D | BORONAME | NUM_VICTIM | PRIMARY_MO | ID | WEAPON | LIGHT_DARK | YEAR | geom |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2008-01-01 | Brooklyn | 1 |  | 7 | gun | D | 2008 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 1 | 2008-01-04 | Manhattan | 1 |  | 14 | gun | D | 2008 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 2 | 2008-01-05 | Queens | 1 |  | 15 | gun | D | 2008 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 3 | 2008-01-04 | Queens | 1 |  | 16 | knife | D | 2008 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 4 | 2008-01-05 | Queens | 1 |  | 18 | gun | D | 2008 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3977 | 2010-10-11 | The Bronx | 1 |  | 4269 | gun |  | 2010 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 3978 | 2010-10-06 | The Bronx | 1 |  | 4271 | knife |  | 2010 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 3979 | 2011-07-26 | The Bronx | 1 |  | 4282 | gun |  | 2011 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |
| 3980 | 2011-07-28 | The Bronx | 1 |  | 4284 | gun |  | 2011 | [0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,... |


In [21]:
con.sql('FROM nyc_homicides LIMIT 5')

┌────────────┬───────────┬────────────┬────────────┬───────┬─────────┬────────────┬───────┬────────────────────────────┐
│ INCIDENT_D │ BORONAME  │ NUM_VICTIM │ PRIMARY_MO │  ID   │ WEAPON  │ LIGHT_DARK │ YEAR  │            geom            │
│    date    │  varchar  │  varchar   │  varchar   │ int32 │ varchar │  varchar   │ int32 │          geometry          │
├────────────┼───────────┼────────────┼────────────┼───────┼─────────┼────────────┼───────┼────────────────────────────┤
│ 2008-01-01 │ Brooklyn  │ 1          │ NULL       │     7 │ gun     │ D          │  2008 │ \x00\x00\x18\x00\x00\x00…  │
│ 2008-01-04 │ Manhattan │ 1          │ NULL       │    14 │ gun     │ D          │  2008 │ \x00\x00\x18\x00\x00\x00…  │
│ 2008-01-05 │ Queens    │ 1          │ NULL       │    15 │ gun     │ D          │  2008 │ \x00\x00\x18\x00\x00\x00…  │
│ 2008-01-04 │ Queens    │ 1          │ NULL       │    16 │ knife   │ D          │  2008 │ \x00\x00\x18\x00\x00\x00…  │
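│ 2008-01-05 │ Queens    │ 1          │ NULL       │    18 │ gun     │ D          │  2008 │ \x00\x00\x18\x00\x00\x00…  │
└────────────┴───────────┴────────────┴────────────┴───────┴─────────┴────────────┴───────┴────────────────────────────┘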

In [9]:
# Count homicide records (rows with a non-null NUM_VICTIM value)
con.sql("""
SELECT COUNT(NUM_VICTIM) FROM nyc_homicides;
""")

┌───────────────────┐
│ count(NUM_VICTIM) │
│       int64       │
├───────────────────┤
│              3974 │
└───────────────────┘
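
Note that `COUNT(NUM_VICTIM)` counts the rows with a non-null value, while `NUM_VICTIM` is stored as text and may hold values greater than 1. A hedged sketch of the difference follows. It uses Python's built-in `sqlite3` module as a stand-in for DuckDB and a made-up three-row table (the `homicides` table and `toy_con` connection are hypothetical), so the numbers say nothing about the real data.

```python
import sqlite3

# In-memory toy table; the rows are invented for illustration only
toy_con = sqlite3.connect(":memory:")
toy_con.execute("CREATE TABLE homicides (NUM_VICTIM TEXT)")
toy_con.executemany(
    "INSERT INTO homicides VALUES (?)",
    [("1",), ("2",), (None,)],
)

# COUNT(column) ignores NULLs and counts rows, not victims
row_count = toy_con.execute(
    "SELECT COUNT(NUM_VICTIM) FROM homicides"
).fetchone()[0]

# Casting the text column to an integer lets SUM add up the victims
victim_total = toy_con.execute(
    "SELECT SUM(CAST(NUM_VICTIM AS INTEGER)) FROM homicides"
).fetchone()[0]

print(row_count, victim_total)
toy_con.close()
```

The `SUM(CAST(NUM_VICTIM AS INTEGER))` expression is also valid DuckDB SQL, so it could be tried against `nyc_homicides` to check whether the two totals differ.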

2. Find out the top 10 neighborhoods with the highest number of homicides in New York City from 2003 to 2011.

In [22]:
con.sql('FROM nyc_neighborhoods LIMIT 5;').to_df()

|  | BORONAME | NAME | geom |
|---|---|---|---|
| 0 | Brooklyn | Bensonhurst | [5, 4, 41, 0, 0, 0, 0, 0, 54, 71, 14, 73, 198,... |
| 1 | Manhattan | East Village | [5, 4, 152, 0, 0, 0, 0, 0, 35, 215, 14, 73, 13... |
| 2 | Manhattan | West Village | [5, 4, 91, 0, 0, 0, 0, 0, 161, 95, 14, 73, 212... |
| 3 | The Bronx | Throggs Neck | [5, 4, 141, 0, 0, 0, 0, 0, 128, 232, 17, 73, 1... |
| 4 | The Bronx | Wakefield-Williamsbridge | [5, 4, 126, 0, 0, 0, 0, 0, 83, 85, 17, 73, 17,... |


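In [24]:
# Load the spatial extension on this connection; the earlier INSTALL/LOAD ran
# on a different (in-memory) connection, so ST_* functions were not available here
con.install_extension("spatial")
con.load_extension("spatial")

# Count the homicide points that fall inside each neighborhood polygon.
# ST_Intersects returns a boolean, which a JOIN condition requires;
# ST_Intersection returns a geometry and cannot be used there.
con.sql("""
SELECT
    neighborhoods.NAME,
    neighborhoods.BORONAME,
    COUNT(*) AS num_homicides
FROM nyc_neighborhoods AS neighborhoods
JOIN nyc_homicides AS homicides
ON ST_Intersects(neighborhoods.geom, homicides.geom)
GROUP BY neighborhoods.NAME, neighborhoods.BORONAME
ORDER BY num_homicides DESC
LIMIT 10;
""")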



3. Create a bar chart to visualize the number of homicides in New York City by year from 2003 to 2011. The bar chart title should contain your name.

In [None]:
# Add your code here

![](https://i.imgur.com/HUwUYkY.png)
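
One possible approach, sketched under assumptions: aggregate the incidents per year, then pass the counts to a bar chart. The helper names, the sample years, and the use of matplotlib are assumptions. With the real database the counts would come from a query such as `SELECT YEAR, COUNT(*) FROM nyc_homicides GROUP BY YEAR ORDER BY YEAR`.

```python
from collections import Counter


def homicides_per_year(years):
    """Return (year, count) pairs sorted by year."""
    return sorted(Counter(years).items())


def plot_homicides_per_year(pairs, author):
    """Draw a bar chart of the (year, count) pairs and return the figure."""
    # Imported lazily so the aggregation helper works without matplotlib
    import matplotlib.pyplot as plt

    labels = [str(year) for year, _ in pairs]
    counts = [count for _, count in pairs]
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of homicides")
    ax.set_title(f"NYC homicides by year ({author})")
    return fig


# Invented YEAR values standing in for the nyc_homicides.YEAR column
sample_years = [2008, 2008, 2010, 2011, 2011, 2011]
pairs = homicides_per_year(sample_years)
print(pairs)
```

Calling `plot_homicides_per_year(pairs, "Your Name")` would then produce a chart whose title contains the name, as the question requires.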

4. Create a pie chart to visualize the number of homicides in New York City by borough from 2003 to 2011. The pie chart title should contain your name.

In [None]:
# Add your code here

![](https://i.imgur.com/px7UYTF.png)
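
A pie chart shows each borough's share of the total, so a hedged sketch of that step is to turn per-borough counts into percentages before plotting. The counts below are invented and the helper name is hypothetical. The real counts would come from grouping `nyc_homicides` by `BORONAME`.

```python
def borough_shares(counts):
    """Convert {borough: count} into {borough: percentage of total}, rounded to 1 decimal."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Invented counts, for illustration only
sample_counts = {"Brooklyn": 5, "Queens": 3, "Manhattan": 2}
shares = borough_shares(sample_counts)
print(shares)
```

The borough names and counts could then be passed to a plotting call such as matplotlib's `ax.pie(values, labels=labels)`, with the title set to include your name.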