# Exercises

Use the spatial data you've imported in this lesson to try the following analyses:

1. Earlier, you found which US county has the largest area. Now, aggregate the county data to find the area of each state in square miles. (Use the `statefp` column in the `us_counties_2019_shp` table.) How many states are bigger than the Yukon-Koyukuk area?
2. Using `ST_Distance()`, determine how many miles separate these two farmers' markets: The Oakleaf Greenmarket (9700 Argyle Forest Blvd, Jacksonville, Florida) & Columbia Farmers Market (1701 West Ash Street, Columbia, Missouri). You'll need to first find the coordinates for both in the `farmers_markets` table.
3. More than 500 rows in the `farmers_markets` table are missing a value in the `county` column, which is an example of dirty government data. Using the `us_counties_2019_shp` table & the `ST_Intersects()` function, perform a spatial join to find the missing county names based on the longitude & latitude of each market. Because `geog_point` in `farmers_markets` is of the `geography` type & its SRID is `4326`, you'll need to cast `geom` in the census table to the `geography` type & change its SRID using `ST_SetSRID()`.
4. The `nyc_yellow_taxi_trips` table contains the longitude & latitude where each trip began & ended. Use PostGIS functions to turn the drop-off coordinates into a `geometry` type & count the state/county pairs where each drop-off occurred. As with the previous exercise, you'll need to join to the `us_counties_2019_shp` table & use its `geom` column for the spatial join.

---

# 1. 

```
WITH county_areas 
AS (
	-- One row per county: area in square miles.
	-- ST_Area() on a geography returns square meters;
	-- 2589988.110336 is the number of square meters in a square mile.
	SELECT pop.state_name,
		   pop.county_name,
		   (ST_Area(shp.geom::geography) / 
		   	   2589988.110336)::numeric AS sq_mile_area
	FROM us_counties_2019_shp AS shp
	JOIN us_counties_pop_est_2019 AS pop
		ON shp.statefp = pop.state_fips 
			AND shp.countyfp = pop.county_fips
	)
SELECT state_name,
	   round(sum(sq_mile_area), 2) AS total_sq_mile_area
FROM county_areas
GROUP BY state_name
HAVING sum(sq_mile_area) > (
	SELECT sq_mile_area 
	FROM county_areas
	WHERE county_name = 'Yukon-Koyukuk Census Area'
)
ORDER BY total_sq_mile_area DESC;
```

Three states are bigger: Alaska (the Yukon-Koyukuk Census Area lies within Alaska), Texas, & California. But holy hell, I never realised Alaska was so big. I always knew the largest states were Alaska, Texas, & California, in that order, but I never had a sense of the scale.

<img src = "Exercise Images/States Larger than Yukon-Koyukuk Borough.png" width = "600" style = "margin:auto"/>
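The exercise hints at aggregating on `statefp` directly, so here is a minimal sketch of that route without the join to the population table. It assumes that Yukon-Koyukuk is the largest county in the table (as the exercise text implies), so comparing against the largest county area stands in for looking it up by name. The result shows state FIPS codes rather than state names.

```sql
WITH county_areas AS (
    -- Area of every county in square miles (square meters / 2589988.110336)
    SELECT statefp,
           ST_Area(geom::geography) / 2589988.110336 AS sq_mile_area
    FROM us_counties_2019_shp
)
SELECT statefp,
       round(sum(sq_mile_area)::numeric, 2) AS total_sq_mile_area
FROM county_areas
GROUP BY statefp
-- Assumption: the largest county is the Yukon-Koyukuk Census Area
HAVING sum(sq_mile_area) > (SELECT max(sq_mile_area) FROM county_areas)
ORDER BY total_sq_mile_area DESC;
```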

# 2.

First, let's see where both of these farmers' markets are.

```
SELECT market_name, geog_point
FROM farmers_markets
WHERE market_name = 'The Oakleaf Greenmarket';

SELECT market_name, geog_point
FROM farmers_markets
WHERE market_name = 'Columbia Farmers Market';
```

This is the location of The Oakleaf Greenmarket.

<img src = "Exercise Images/The Oakleaf Greenmarket.png" width = "600" style = "margin:auto"/>

This is the location of the Columbia Farmers Market.

<img src = "Exercise Images/Columbia Farmers Market.png" width = "600" style = "margin:auto"/>
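Selecting `geog_point` on its own returns the point in its binary form, which is hard to read outside a map viewer. As a rough sketch, the two lookups can be combined into one query that also prints the coordinates as text. Including the `county` column is only a convenience for telling the rows apart if a market name turns out not to be unique.

```sql
SELECT market_name,
       county,
       ST_AsText(geog_point) AS point_wkt,         -- e.g. POINT(longitude latitude)
       ST_X(geog_point::geometry) AS longitude,    -- X is longitude
       ST_Y(geog_point::geometry) AS latitude      -- Y is latitude
FROM farmers_markets
WHERE market_name IN ('The Oakleaf Greenmarket',
                      'Columbia Farmers Market');
```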

We'll then use `ST_Distance()` to find the distance between the two farmers' markets. Because `geog_point` is of the `geography` type, the function returns meters, so we divide by 1609.344 (the number of meters in a mile) to get miles.

```
SELECT round((ST_Distance(
	(SELECT geog_point
	FROM farmers_markets
	WHERE market_name = 'The Oakleaf Greenmarket'),
	(SELECT geog_point
	FROM farmers_markets
	WHERE market_name = 'Columbia Farmers Market')
) / 1609.344)::numeric, 2) AS distance_miles;
```

The distance between The Oakleaf Greenmarket & the Columbia Farmers Market is approximately 850.53 miles.

<img src = "Exercise Images/Distance Between Oakleaf Greenmarket & Columbia Farmers Market.png" width = "600" style = "margin:auto"/>
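The scalar subqueries above raise an error if either market name matches more than one row. A possible alternative, sketched here, joins the table to itself. That way each matching pair simply produces its own result row.

```sql
SELECT a.market_name AS market_a,
       b.market_name AS market_b,
       -- ST_Distance() on geography values returns meters
       round((ST_Distance(a.geog_point, b.geog_point)
              / 1609.344)::numeric, 2) AS distance_miles
FROM farmers_markets AS a
CROSS JOIN farmers_markets AS b
WHERE a.market_name = 'The Oakleaf Greenmarket'
  AND b.market_name = 'Columbia Farmers Market';
```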

# 3.

Let's confirm that there are indeed more than 500 rows with a missing value in the `county` column.

```
SELECT *
FROM farmers_markets
WHERE county IS NULL;
```

<img src = "Exercise Images/Missing county Column in farmers_markets Table.png" width = "600" style = "margin:auto"/>

There are 523 farmers' markets with a missing `county` value.
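Since only the number of rows matters for this check, a count query is a lighter way to get the same figure without pulling every column back:

```sql
SELECT count(*) AS missing_county_count
FROM farmers_markets
WHERE county IS NULL;
```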

The equivalent to the `county` column in the `farmers_markets` table is the `name` column in the `us_counties_2019_shp` table.

```
SELECT fm.county, c.name
FROM farmers_markets AS fm
JOIN us_counties_2019_shp AS c
	-- Set the SRID to 4326, then cast to geography to match geog_point
	ON ST_Intersects(fm.geog_point, 
		ST_SetSRID(c.geom, 4326)::geography)
WHERE fm.county IS NULL;
```

The spatial join returns 496 rows, so 496 of the 523 farmers' markets with a missing county name fall inside a county in the `us_counties_2019_shp` table. The recovered county names are in the `name` column of the result.

<img src = "Exercise Images/ST_Intersects to Find Missing County Names.png" width = "600" style = "margin:auto"/>
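The inner join silently drops the markets that match no county. To look at those leftovers, one option is a `LEFT JOIN` that keeps only the rows where no county was found. This is a sketch, and it makes no assumption about why a market fails to match, such as a missing point or a location outside every county polygon.

```sql
SELECT fm.market_name,
       ST_AsText(fm.geog_point) AS point_wkt   -- NULL if the market has no point
FROM farmers_markets AS fm
LEFT JOIN us_counties_2019_shp AS c
    ON ST_Intersects(fm.geog_point,
                     ST_SetSRID(c.geom, 4326)::geography)
WHERE fm.county IS NULL
  AND c.name IS NULL;   -- no county matched this market
```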

# 4.

First, I'll check how many records are in the `nyc_yellow_taxi_trips` table.

```
SELECT count(*)
FROM nyc_yellow_taxi_trips;
```

There should be 368,774 records in the table.

```
SELECT pop.state_name, 
	   pop.county_name,
	   ST_AsText(ST_MakePoint(taxi.dropoff_longitude, 
	   taxi.dropoff_latitude)) AS dropoff_point
FROM nyc_yellow_taxi_trips AS taxi
JOIN us_counties_2019_shp AS c
	ON ST_Intersects(ST_SetSRID(ST_MakePoint(
		dropoff_longitude, dropoff_latitude), 4269), 
		c.geom)
JOIN us_counties_pop_est_2019 AS pop
	ON c.statefp = pop.state_fips
		AND c.countyfp = pop.county_fips;
```

The query returns 364,519 rows, so 364,519 of the 368,774 taxi trips ended inside a county in the `us_counties_2019_shp` table. Because we want the names of the states & counties rather than their FIPS codes, the query also joins the `us_counties_pop_est_2019` table.

<img src = "Exercise Images/NYC Taxi Dropoff Locations.png" width = "600" style = "margin:auto"/>
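To check how many trips matched without returning every row, the same joins can be wrapped in a count. This is a minimal sketch that reuses the join conditions from the query above unchanged:

```sql
SELECT count(*) AS matched_trips
FROM nyc_yellow_taxi_trips AS taxi
JOIN us_counties_2019_shp AS c
    -- Build the drop-off point in SRID 4269 to match c.geom
    ON ST_Intersects(
           ST_SetSRID(ST_MakePoint(taxi.dropoff_longitude,
                                   taxi.dropoff_latitude), 4269),
           c.geom)
JOIN us_counties_pop_est_2019 AS pop
    ON c.statefp = pop.state_fips
       AND c.countyfp = pop.county_fips;
```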

We can wrap the above query in a CTE to count the number of drop-offs for each state/county pair.

```
WITH dropoff_locations
AS (
	SELECT pop.state_name, 
		   pop.county_name,
		   ST_AsText(ST_MakePoint(taxi.dropoff_longitude, 
		   taxi.dropoff_latitude)) AS dropoff_point
	FROM nyc_yellow_taxi_trips AS taxi
	JOIN us_counties_2019_shp AS c
		ON ST_Intersects(ST_SetSRID(ST_MakePoint(
			dropoff_longitude, dropoff_latitude), 4269), 
			c.geom)
	JOIN us_counties_pop_est_2019 AS pop
		ON c.statefp = pop.state_fips
			AND c.countyfp = pop.county_fips
)
SELECT state_name, county_name,
	   count(*)
FROM dropoff_locations
GROUP BY state_name, county_name
ORDER BY count(*) DESC;
```

Most of the drop-offs are within New York (big surprise, `nyc` is in the table's name), but there are a few drop-offs in counties in New Jersey, New Hampshire, Connecticut, & even Virginia.

<img src = "Exercise Images/Dropoff County Count.png" width = "600" style = "margin:auto"/>
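The CTE is not strictly required, because the text version of each point is never used in the final count. A more compact sketch groups directly on the joined tables. The `dropoff_count` alias is just a name chosen here for readability.

```sql
SELECT pop.state_name,
       pop.county_name,
       count(*) AS dropoff_count
FROM nyc_yellow_taxi_trips AS taxi
JOIN us_counties_2019_shp AS c
    ON ST_Intersects(
           ST_SetSRID(ST_MakePoint(taxi.dropoff_longitude,
                                   taxi.dropoff_latitude), 4269),
           c.geom)
JOIN us_counties_pop_est_2019 AS pop
    ON c.statefp = pop.state_fips
       AND c.countyfp = pop.county_fips
GROUP BY pop.state_name, pop.county_name
ORDER BY dropoff_count DESC;
```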