# Lab 4

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/giswqs/geog-312/blob/main/book/labs/lab_04.ipynb)

This lab reinforces your skills with `NumPy`, `Pandas`, and `GeoPandas` for geospatial data analysis. In the exercises, you will manipulate data, run spatial analyses, and create visualizations by combining the three libraries.

## Exercise 1: NumPy Array Operations and Geospatial Coordinates

In this exercise, you will work with NumPy arrays representing geospatial coordinates (latitude and longitude) and perform basic array operations.

1. Create a 2D NumPy array containing the latitude and longitude of the following cities: Tokyo (35.6895, 139.6917), New York (40.7128, -74.0060), London (51.5074, -0.1278), and Paris (48.8566, 2.3522).
2. Convert the latitude and longitude values from degrees to radians using `np.radians()`.
3. Calculate the element-wise difference between Tokyo and the other cities' latitude and longitude in radians.

In [1]:
import numpy as np

In [6]:
arr_2d = np.array([[35.6895, 139.6917], [40.7128, -74.0060], [51.5074, -0.1278], [48.8566, 2.3522]])
conv_arr = np.radians(arr_2d)
print(conv_arr)

[[ 6.22899283e-01  2.43808010e+00]
 [ 7.10572408e-01 -1.29164837e+00]
 [ 8.98973719e-01 -2.23053078e-03]
 [ 8.52708531e-01  4.10536347e-02]]


In [8]:
arr_2d[0]

array([ 35.6895, 139.6917])

In [10]:
tokyo_arr = arr_2d[0]

# Broadcasting subtracts every row from Tokyo's row in one step, so no loop is needed.
calc = np.radians(tokyo_arr - arr_2d)
print(calc)


[[ 0.          0.        ]
 [-0.08767312  3.72972847]
 [-0.27607444  2.44031063]
 [-0.22980925  2.39702647]]
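
The result above still contains Tokyo's own row, which is all zeros. A minimal sketch of one way to keep only the other cities, assuming a parallel `names` array (not part of the original cells) and a boolean mask:

```python
import numpy as np

# Coordinates (latitude, longitude) in degrees, in the order given in the exercise.
# `names` is a helper array introduced here for illustration.
names = np.array(["Tokyo", "New York", "London", "Paris"])
coords_deg = np.array([
    [35.6895, 139.6917],
    [40.7128, -74.0060],
    [51.5074, -0.1278],
    [48.8566, 2.3522],
])
coords_rad = np.radians(coords_deg)

# Boolean mask selecting every city except Tokyo.
others = names != "Tokyo"

# Tokyo's row (shape (2,)) is broadcast against the remaining rows (shape (3, 2)).
diff_rad = coords_rad[0] - coords_rad[others]

for name, (dlat, dlon) in zip(names[others], diff_rad):
    print(f"Tokyo - {name}: dlat={dlat:.4f} rad, dlon={dlon:.4f} rad")
```

Converting to radians before or after subtracting gives the same values up to floating-point rounding, because the conversion is a constant scale factor.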


## Exercise 2: Pandas DataFrame Operations with Geospatial Data

In this exercise, you'll use Pandas to load and manipulate a dataset containing city population data, and then calculate and visualize statistics.

1. Load the world cities dataset from this URL using Pandas: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Display the first 5 rows and check for missing values.
3. Filter the dataset to only include cities with a population greater than 1 million.
4. Group the cities by their country and calculate the total population for each country.
5. Sort the cities by population in descending order and display the top 10 cities.

In [13]:
import pandas as pd

In [16]:
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"
df = pd.read_csv(url)
df.head()

   id         name country  latitude  longitude  population
0   1        Bombo     UGA    0.5833    32.5333       75000
1   2  Fort Portal     UGA     0.671     30.275       42670
2   3      Potenza     ITA    40.642     15.799       69060
3   4   Campobasso     ITA    41.563     14.656       50762
4   5        Aosta     ITA    45.737      7.315       34062
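
Step 2 also asks for a missing-value check, which the cells here do not show. A minimal sketch of one common approach follows. It uses a toy table instead of the downloaded file, and one `NaN` is inserted on purpose so the check has something to find; on the real data you would call the same methods on `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the world cities table (names taken from the head() output).
# The NaN in `population` is deliberately inserted for illustration only.
toy = pd.DataFrame({
    "name": ["Bombo", "Fort Portal", "Potenza"],
    "country": ["UGA", "UGA", "ITA"],
    "population": [75000, np.nan, 69060],
})

# Count missing values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Single yes/no answer for the whole table.
has_missing = bool(toy.isnull().values.any())
print("Any missing values:", has_missing)
```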


In [20]:
new_df = df[df['population'] > 1000000]
print(new_df)

        id            name country  latitude  longitude  population
97      98           Turin     ITA  45.07039    7.66996     1652000
103    104           Lille     FRA  50.64997    3.08001     1044000
123    124  San Bernardino     USA  34.12038 -117.30003     1745000
124    125      Bridgeport     USA  41.17998  -73.19996     1018000
126    127      Manchester     GBR  53.50042   -2.24799     2230000
...    ...             ...     ...       ...        ...         ...
1244  1245  Rio de Janeiro     BRA -22.92502  -43.22502    11748000
1245  1246       Sao Paulo     BRA -23.55868  -46.62502    18845000
1246  1247          Sydney     AUS -33.92001  151.18518     4630000
1247  1248       Singapore     SGP   1.29303  103.85582     5183700
1248  1249       Hong Kong     CHN  22.30498  114.18501     7206000

[392 rows x 6 columns]


In [24]:
df_grouped = df.groupby('country')['population'].sum()
print(df_grouped)

country
AFG     4931702
AGO     6821544
ALB      895350
ALD       10682
AND       53998
         ...   
WSM       61916
YEM     3759000
ZAF    13373789
ZMB     2326947
ZWE     2611745
Name: population, Length: 200, dtype: int64


In [28]:
# Sort the individual rows; grouping by name would merge distinct cities that share a name.
df_filter = df.set_index('name')['population'].sort_values(ascending=False)
df_filter.head(10)

name
Tokyo           35676000
New York        19040000
Mexico City     19028000
Mumbai          18978000
Sao Paulo       18845000
Delhi           15926000
Shanghai        14987000
Kolkata         14787000
Dhaka           12797394
Buenos Aires    12795000
Name: population, dtype: int64
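
One pitfall when ranking cities: grouping by `name` before sorting adds together distinct cities that happen to share a name. The sketch below illustrates the difference on an invented three-row table (the city names, country codes, and populations are all made up for the example).

```python
import pandas as pd

# Invented data: two different cities share the name "Springfield".
toy = pd.DataFrame({
    "name": ["Springfield", "Springfield", "Rivertown"],
    "country": ["AAA", "BBB", "AAA"],
    "population": [600, 500, 900],
})

# Grouping by name sums both Springfields (600 + 500 = 1100).
by_name = toy.groupby("name")["population"].sum().sort_values(ascending=False)

# Sorting rows keeps each city separate, so Rivertown (900) ranks first.
by_row = toy.sort_values("population", ascending=False)

print(by_name)
print(by_row[["name", "country", "population"]])
```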

## Exercise 3: Creating and Manipulating GeoDataFrames with GeoPandas

This exercise focuses on creating and manipulating GeoDataFrames, performing spatial operations, and visualizing the data.

1. Load the New York City building dataset from the GeoJSON file using GeoPandas: https://github.com/opengeos/datasets/releases/download/places/nyc_buildings.geojson
2. Create a plot of the building footprints and color them based on the building height (use the `height_MS` column).
3. Create an interactive map of the building footprints and color them based on the building height (use the `height_MS` column).
4. Calculate the average building height (use the `height_MS` column).
5. Select buildings with a height greater than the average height.
6. Save the GeoDataFrame to a new GeoJSON file.

In [None]:
url = "https://github.com/opengeos/datasets/releases/download/places/nyc_buildings.geojson"
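
The steps listed for this exercise can be sketched as follows. This is an outline under stated assumptions, not a reference solution: it builds a tiny GeoDataFrame of toy rectangles instead of downloading the file, the heights are invented, and the output file name `tall_buildings.geojson` is hypothetical. With the real data you would start from `gpd.read_file(url)`.

```python
import os
import tempfile

import geopandas as gpd
from shapely.geometry import box

# Toy footprints standing in for the NYC buildings; heights are invented.
# With the real dataset: gdf = gpd.read_file(url)
gdf = gpd.GeoDataFrame(
    {"height_MS": [10.0, 25.0, 40.0, 5.0]},
    geometry=[box(i, 0, i + 1, 1) for i in range(4)],
    crs="EPSG:4326",
)

# Static and interactive maps colored by height (left commented out because
# they need matplotlib, and folium plus mapclassify, respectively):
# gdf.plot(column="height_MS", legend=True)
# gdf.explore(column="height_MS")

# Average height, then the buildings taller than the average.
mean_height = gdf["height_MS"].mean()
tall = gdf[gdf["height_MS"] > mean_height]
print(f"Mean height: {mean_height:.1f}, buildings above it: {len(tall)}")

# Save the selection to a new GeoJSON file (written to a temporary directory here).
out_path = os.path.join(tempfile.mkdtemp(), "tall_buildings.geojson")
tall.to_file(out_path, driver="GeoJSON")
```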


## Exercise 4: Combining NumPy, Pandas, and GeoPandas

This exercise requires you to combine the power of NumPy, Pandas, and GeoPandas to analyze and visualize spatial data.

1. Use Pandas to load the world cities dataset from this URL: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Filter the dataset to include only cities with latitude values between -40 and 60 (i.e., excluding cities at high latitudes in either hemisphere).
3. Create a GeoDataFrame from the filtered dataset by converting the latitude and longitude into geometries.
4. Reproject the GeoDataFrame to the Web Mercator projection (EPSG:3857).
5. Calculate the distance (in meters) between each city and the city of Paris.
6. Plot the cities on a world map, coloring the points by their distance from Paris.
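
A minimal sketch of steps 2 to 5 is shown below, assuming a small hand-built table in place of the downloaded CSV. The four cities reuse the coordinates from Exercise 1; Reykjavik (approximate coordinates) is added only so the latitude filter has a row to drop. Note that distances measured in EPSG:3857 are projected distances, which grow increasingly larger than true ground distances away from the equator.

```python
import geopandas as gpd
import pandas as pd

# Stand-in for the world cities table. Reykjavik's coordinates are approximate
# and included only to exercise the latitude filter.
cities = pd.DataFrame({
    "name": ["Tokyo", "New York", "London", "Paris", "Reykjavik"],
    "latitude": [35.6895, 40.7128, 51.5074, 48.8566, 64.15],
    "longitude": [139.6917, -74.0060, -0.1278, 2.3522, -21.94],
})

# Step 2: keep latitudes between -40 and 60 (inclusive on both ends).
in_range = cities[cities["latitude"].between(-40, 60)].copy()

# Step 3: build point geometries; x is longitude, y is latitude.
gdf = gpd.GeoDataFrame(
    in_range,
    geometry=gpd.points_from_xy(in_range["longitude"], in_range["latitude"]),
    crs="EPSG:4326",
)

# Step 4: reproject to Web Mercator, whose units are meters.
gdf_3857 = gdf.to_crs(epsg=3857)

# Step 5: planar distance from every city to Paris, in projected meters.
paris_point = gdf_3857.loc[gdf_3857["name"] == "Paris", "geometry"].iloc[0]
gdf_3857["distance_m"] = gdf_3857.distance(paris_point)
print(gdf_3857[["name", "distance_m"]])

# Step 6 (needs matplotlib): gdf_3857.plot(column="distance_m", legend=True)
```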