# Lab 4

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/giswqs/geog-312/blob/main/book/labs/lab_04.ipynb)

This lab will help you solidify your understanding of working with `NumPy`, `Pandas`, and `GeoPandas` for geospatial data analysis. Through these exercises, you will perform data manipulation, spatial analysis, and visualizations by combining these powerful libraries.

## Exercise 1: NumPy Array Operations and Geospatial Coordinates

In this exercise, you will work with NumPy arrays representing geospatial coordinates (latitude and longitude) and perform basic array operations.

1. Create a 2D NumPy array containing the latitude and longitude of the following cities: Tokyo (35.6895, 139.6917), New York (40.7128, -74.0060), London (51.5074, -0.1278), and Paris (48.8566, 2.3522).
2. Convert the latitude and longitude values from degrees to radians using `np.radians()`.
3. Calculate the element-wise difference between Tokyo and the other cities' latitude and longitude in radians.

In [93]:
import numpy as np

# 1. City coordinates as (latitude, longitude): Tokyo, New York, London, Paris
coord_list = np.array([(35.6895, 139.6917), (40.7128, -74.0060), (51.5074, -0.1278), (48.8566, 2.3522)])

# 2. Convert degrees to radians
rad = np.radians(coord_list)
print(rad, end="\n\n")

# 3. Difference between Tokyo (row 0) and each of the other cities.
#    np.diff would instead return differences between consecutive rows.
diff = rad[0] - rad[1:]
print(diff)

[[ 6.22899283e-01  2.43808010e+00]
 [ 7.10572408e-01 -1.29164837e+00]
 [ 8.98973719e-01 -2.23053078e-03]
 [ 8.52708531e-01  4.10536347e-02]]

[[-0.08767312  3.72972847]
 [-0.27607444  2.44031063]
 [-0.22980925  2.39702647]]
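
Step 3 asks for differences relative to one reference row (Tokyo), which is not the same as the consecutive-row differences that `np.diff` computes. The sketch below contrasts the two on a small integer array so the arithmetic can be checked by hand; the array `points` and both result names are illustrative, not part of the lab.

```python
import numpy as np

# Small integer rows (illustrative values) so the results are easy to verify.
points = np.array([[10, 100], [12, 90], [15, 95], [11, 80]])

# np.diff along axis 0 subtracts each row from the next one: row[i + 1] - row[i].
consecutive = np.diff(points, axis=0)

# Broadcasting subtracts every remaining row from the first (reference) row.
relative_to_first = points[0] - points[1:]

print(consecutive)
print(relative_to_first)
```

Both results have three rows, but only the second one compares every city against the same reference.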


## Exercise 2: Pandas DataFrame Operations with Geospatial Data

In this exercise, you'll use Pandas to load and manipulate a dataset containing city population data, and then calculate and visualize statistics.

1. Load the world cities dataset from this URL using Pandas: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Display the first 5 rows and check for missing values.
3. Filter the dataset to only include cities with a population greater than 1 million.
4. Group the cities by their country and calculate the total population for each country.
5. Sort the cities by population in descending order and display the top 10 cities.

In [94]:
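import pandas as pd

url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"

# 1. Load the world cities dataset
df_data = pd.read_csv(url)

# 2. Show the first 5 rows and count missing values per column
df_top5 = df_data.head(n=5)
print(df_top5, end="\n\n")
missing_counts = df_data.isna().sum()

# 3. Cities with more than 1 million inhabitants.
#    Note: .head(n=10) keeps only the first 10 matches, so steps 4 and 5
#    operate on these 10 rows rather than on every such city.
df_filtered = df_data[df_data["population"] > 1_000_000].head(n=10)
print(df_filtered, end="\n\n")

# 4. Total population per country, largest first
df_grouped = df_filtered.groupby("country")["population"].sum().sort_values(ascending=False)
print(df_grouped, end="\n")

# 5. Cities sorted by population in descending order
df_filtered.sort_values("population", ascending=False)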

   id         name country  latitude  longitude  population
0   1        Bombo     UGA    0.5833    32.5333       75000
1   2  Fort Portal     UGA    0.6710    30.2750       42670
2   3      Potenza     ITA   40.6420    15.7990       69060
3   4   Campobasso     ITA   41.5630    14.6560       50762
4   5        Aosta     ITA   45.7370     7.3150       34062

      id            name country  latitude  longitude  population
97    98           Turin     ITA  45.07039    7.66996     1652000
103  104           Lille     FRA  50.64997    3.08001     1044000
123  124  San Bernardino     USA  34.12038 -117.30003     1745000
124  125      Bridgeport     USA  41.17998  -73.19996     1018000
126  127      Manchester     GBR  53.50042   -2.24799     2230000
127  128      Gujranwala     PAK  32.16043   74.18502     1513000
128  129         Incheon     KOR  37.47615  126.64223     2550000
129  130      Benin City     NGA   6.34048    5.62001     1190000
130  131          Xiamen     CHN  24.44999  118.08002     2519000

      id            name country  latitude  longitude  population
128  129         Incheon     KOR  37.47615  126.64223     2550000
130  131          Xiamen     CHN  24.44999  118.08002     2519000
126  127      Manchester     GBR  53.50042   -2.24799     2230000
131  132        Nanchong     CHN  30.78043  106.13000     2174000
123  124  San Bernardino     USA  34.12038 -117.30003     1745000
97    98           Turin     ITA  45.07039    7.66996     1652000
127  128      Gujranwala     PAK  32.16043   74.18502     1513000
129  130      Benin City     NGA   6.34048    5.62001     1190000
103  104           Lille     FRA  50.64997    3.08001     1044000
124  125      Bridgeport     USA  41.17998  -73.19996     1018000
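
Applying `.head()` before grouping or sorting restricts every later step to the first few matches. A minimal sketch of steps 2 to 5 that defers truncation to the final step is given below. It runs on a handful of rows copied from the outputs above instead of the full download, and the names `sample`, `large`, `by_country`, and `top_cities` are illustrative.

```python
import pandas as pd

# Sample rows copied from the outputs above; the real dataset is much larger.
sample = pd.DataFrame(
    {
        "name": ["Bombo", "Aosta", "Turin", "Lille",
                 "San Bernardino", "Bridgeport", "Manchester"],
        "country": ["UGA", "ITA", "ITA", "FRA", "USA", "USA", "GBR"],
        "population": [75000, 34062, 1652000, 1044000,
                       1745000, 1018000, 2230000],
    }
)

# Step 2: count missing values per column.
missing_counts = sample.isna().sum()

# Step 3: keep every city above 1 million (no truncation at this stage).
large = sample[sample["population"] > 1_000_000]

# Step 4: total population per country, largest first.
by_country = (
    large.groupby("country")["population"].sum().sort_values(ascending=False)
)

# Step 5: rank the filtered cities, then take the top N (3 here, 10 in the lab).
top_cities = large.sort_values("population", ascending=False).head(3)

print(missing_counts)
print(by_country)
print(top_cities)
```

With the full dataset, the same pipeline would use `pd.read_csv(url)` in place of `sample` and `.head(10)` in the last step.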


## Exercise 3: Creating and Manipulating GeoDataFrames with GeoPandas

This exercise focuses on creating and manipulating GeoDataFrames, performing spatial operations, and visualizing the data.

1. Load the New York City building dataset from the GeoJSON file using GeoPandas: https://github.com/opengeos/datasets/releases/download/places/nyc_buildings.geojson
2. Create a plot of the building footprints and color them based on the building height (use the `height_MS` column).
3. Create an interactive map of the building footprints and color them based on the building height (use the `height_MS` column).
4. Calculate the average building height (use the `height_MS` column).
5. Select buildings with a height greater than the average height.
6. Save the GeoDataFrame to a new GeoJSON file.

## Exercise 4: Combining NumPy, Pandas, and GeoPandas

This exercise requires you to combine the power of NumPy, Pandas, and GeoPandas to analyze and visualize spatial data.

1. Use Pandas to load the world cities dataset from this URL: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Filter the dataset to include only cities with latitude values between -40 and 60 (i.e., excluding cities at high northern latitudes and in the far south).
3. Create a GeoDataFrame from the filtered dataset by converting the latitude and longitude into geometries.
4. Reproject the GeoDataFrame to the Web Mercator projection (EPSG:3857).
5. Calculate the distance (in meters) between each city and the city of Paris.
6. Plot the cities on a world map, coloring the points by their distance from Paris.
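
The steps above can be sketched on a few rows before running them on the full dataset. The four city coordinates are taken from Exercise 1; the fifth row is a hypothetical high-latitude point added only to show the filter removing something, and the column name `distance_to_paris_m` is an assumption.

```python
import geopandas as gpd
import pandas as pd

# Coordinates from Exercise 1, plus one hypothetical point outside the band.
cities = pd.DataFrame(
    {
        "name": ["Tokyo", "New York", "London", "Paris",
                 "Hypothetical polar site"],
        "latitude": [35.6895, 40.7128, 51.5074, 48.8566, 78.0],
        "longitude": [139.6917, -74.0060, -0.1278, 2.3522, 15.0],
    }
)

# Step 2: keep latitudes between -40 and 60 (inclusive on both ends).
in_band = cities[cities["latitude"].between(-40, 60)].copy()

# Steps 3 and 4: build point geometries (x = longitude, y = latitude),
# declare them as WGS 84, then reproject to EPSG:3857.
gdf = gpd.GeoDataFrame(
    in_band,
    geometry=gpd.points_from_xy(in_band["longitude"], in_band["latitude"]),
    crs="EPSG:4326",
).to_crs("EPSG:3857")

# Step 5: planar distance from every city to Paris, in projected meters.
paris = gdf.loc[gdf["name"] == "Paris", "geometry"].iloc[0]
gdf["distance_to_paris_m"] = gdf.geometry.distance(paris)

print(gdf[["name", "distance_to_paris_m"]])
```

Distances measured in EPSG:3857 are planar and become increasingly inflated away from the equator, so they are best read as relative values for coloring the map, for example with `gdf.plot(column="distance_to_paris_m", legend=True)`.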