# Lab 5

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)

## Exercise 1: Calculating Distances with Functions

- Define a function `calculate_distance` that takes two geographic coordinates (latitude and longitude) and returns the distance between them using the Haversine formula.
- Use this function to calculate the distance between multiple pairs of coordinates.

In [2]:
import math

# First, define the function to calculate distance using the Haversine formula
def calculate_distance(coord1, coord2):
    """
    Calculate the great-circle distance between two points on the Earth 
    using the Haversine formula.
    
    Parameters:
    coord1 (tuple): (latitude, longitude) of first point
    coord2 (tuple): (latitude, longitude) of second point
    
    Returns:
    float: Distance in kilometers
    """
    # Approximate equatorial radius of Earth in kilometers (the mean radius, 6371.0 km, is the more common choice)
    R = 6378.0  

    # Convert latitude and longitude from degrees to radians
    lat1, lon1 = math.radians(coord1[0]), math.radians(coord1[1])
    lat2, lon2 = math.radians(coord2[0]), math.radians(coord2[1])

    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    # Compute the distance
    distance = R * c
    return distance

# Next, define multiple pairs of coordinates
coordinates_pairs = [
    ((40.7128, -74.0060), (34.0522, -118.2437)),  # New York to Los Angeles
    ((51.5074, -0.1278), (48.8566, 2.3522)),      # London to Paris
    ((35.6895, 139.6917), (37.7749, -122.4194)),  # Tokyo to San Francisco
]

# Finally, calculate and print distances for each pair
for pair in coordinates_pairs:
    distance = calculate_distance(pair[0], pair[1])
    print(f"Distance between {pair[0]} and {pair[1]}: {distance:.2f} km")


Distance between (40.7128, -74.006) and (34.0522, -118.2437): 3940.07 km
Distance between (51.5074, -0.1278) and (48.8566, 2.3522): 343.93 km
Distance between (35.6895, 139.6917) and (37.7749, -122.4194): 8279.80 km
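
For reference, the computation inside `calculate_distance` corresponds to the Haversine formula written out below. The symbols are chosen here for illustration: $\varphi$ is latitude and $\lambda$ is longitude, both in radians, and $R$ is the Earth radius used in the code.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
     + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```

Each line maps onto one statement in the function: `a`, `c`, and `distance = R * c`.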


## Exercise 2: Batch Distance Calculation

- Create a function `batch_distance_calculation` that accepts a list of coordinates and returns a list of distances between each consecutive pair of points.
- Test the function with a list of coordinates representing several cities.

In [None]:
# Step 1: Define the batch distance calculation function
def batch_distance_calculation(coord_list):
    """
    Calculate distances between consecutive coordinate pairs in a list using the Haversine formula.
    
    Parameters:
    coord_list (list of tuples): List of geographic coordinates (latitude, longitude).
    
    Returns:
    list: List of distances (in kilometers) between consecutive coordinate pairs.
    """
    distances = []
    
    # Iterate through consecutive pairs in the list
    for i in range(len(coord_list) - 1):
        distance = calculate_distance(coord_list[i], coord_list[i + 1])
        distances.append(distance)
    
    return distances

# Step 2: Define a list of coordinates representing cities
city_coordinates = [
    (40.7128, -74.0060),  # New York City
    (34.0522, -118.2437), # Los Angeles
    (41.8781, -87.6298),  # Chicago
    (29.7604, -95.3698),  # Houston
    (48.8566, 2.3522),    # Paris
]

# Step 3: Call the function and store results
distances_between_cities = batch_distance_calculation(city_coordinates)

# Step 4: Print the distances between consecutive city pairs
for i in range(len(city_coordinates) - 1):
    print(f"Distance from {city_coordinates[i]} to {city_coordinates[i+1]}: {distances_between_cities[i]:.2f} km")
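
The index-based loop above can also be written by pairing each coordinate with its successor. The following is a minimal, self-contained sketch of that idea; the names `haversine_km` and `batch_distances_zip` are hypothetical and not part of the lab, and the mean Earth radius of 6371.0 km is assumed.

```python
import math


def haversine_km(coord1, coord2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) tuples in degrees."""
    lat1, lon1 = map(math.radians, coord1)
    lat2, lon2 = map(math.radians, coord2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return radius_km * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def batch_distances_zip(coord_list):
    """Pair each coordinate with the next one and measure each leg."""
    return [haversine_km(start, end) for start, end in zip(coord_list, coord_list[1:])]


route = [
    (40.7128, -74.0060),   # New York City
    (34.0522, -118.2437),  # Los Angeles
    (41.8781, -87.6298),   # Chicago
]
legs = batch_distances_zip(route)
print([round(d, 2) for d in legs])
```

Because `zip` stops at the shorter input, an empty list or a single coordinate yields an empty result without any special-casing.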


## Exercise 3: Creating and Using a Point Class

- Define a `Point` class to represent a geographic point with attributes `latitude`, `longitude`, and `name`.
- Add a method `distance_to` that calculates the distance from one point to another.
- Instantiate several `Point` objects and calculate the distance between them.

In [None]:
import math

# First, define the Point class
class Point:
    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

    # Then, define a method to calculate the distance to another Point using the Haversine formula
    def distance_to(self, other_point):
        R = 6371.0  # Radius of the Earth in kilometers

        # Convert latitude and longitude from degrees to radians
        lat1 = math.radians(self.latitude)
        lon1 = math.radians(self.longitude)
        lat2 = math.radians(other_point.latitude)
        lon2 = math.radians(other_point.longitude)

        # Haversine formula
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        distance = R * c

        return distance

# Next, create several Point objects
point1 = Point("New York City", 40.7128, -74.0060)
point2 = Point("Los Angeles", 34.0522, -118.2437)
point3 = Point("Paris", 48.8566, 2.3522)

# Calculate and print distance between the points 
print(f"Distance from {point1.name} to {point2.name}: {point1.distance_to(point2):.2f} km")
print(f"Distance from {point1.name} to {point3.name}: {point1.distance_to(point3):.2f} km")
print(f"Distance from {point2.name} to {point3.name}: {point2.distance_to(point3):.2f} km")


Distance from New York City to Los Angeles: 3935.75 km
Distance from New York City to Paris: 5837.24 km
Distance from Los Angeles to Paris: 9085.51 km


## Exercise 4: Reading and Writing Files

- Write a function `read_coordinates` that reads a file containing a list of coordinates (latitude, longitude) and returns them as a list of tuples.
- Write another function `write_coordinates` that takes a list of coordinates and writes them to a new file.
- Ensure that both functions handle exceptions, such as missing files or improperly formatted data.

In [4]:
import os

# Function to read coordinates from a file and return a list of tuples
def read_coordinates(file_path):
    coordinates = []
    try:
        with open(file_path, 'r') as file:
            for line in file:
                try:
                    lat, lon = map(float, line.strip().split(','))
                    coordinates.append((lat, lon))
                except ValueError:
                    print(f"Skipping malformed line: {line.strip()}")
    except FileNotFoundError:
        print(f"Error: File not found - {file_path}")
    return coordinates

# Function to write a list of coordinate tuples to a file
def write_coordinates(file_path, coordinates):
    try:
        with open(file_path, 'w') as file:
            for lat, lon in coordinates:
                file.write(f"{lat},{lon}\n")
    except Exception as e:
        print(f"Error writing to file: {e}")

# Test the functions with sample data (write first, then read)
sample_coords = [(40.7128, -74.0060), (34.0522, -118.2437), (48.8566, 2.3522)]
# Use a path relative to the working directory; a fixed directory such as "/mnt/data" may not exist
write_path = "sample_coords.txt"

# Write coordinates to file
write_coordinates(write_path, sample_coords)

# Read coordinates from the file
read_coords = read_coordinates(write_path)
read_coords


[(40.7128, -74.006), (34.0522, -118.2437), (48.8566, 2.3522)]
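
Round-trip tests like this are easier to reproduce when they write to a temporary directory instead of a fixed path. The sketch below restates the two functions so it runs on its own, then exercises the malformed-line and missing-file branches as well. The file names and the deliberately bad line are made up for illustration.

```python
import os
import tempfile


def read_coordinates(file_path):
    """Return (lat, lon) tuples from a comma-separated file, skipping bad lines."""
    coordinates = []
    try:
        with open(file_path, "r") as file:
            for line in file:
                try:
                    lat, lon = map(float, line.strip().split(","))
                    coordinates.append((lat, lon))
                except ValueError:
                    print(f"Skipping malformed line: {line.strip()}")
    except FileNotFoundError:
        print(f"Error: File not found - {file_path}")
    return coordinates


def write_coordinates(file_path, coordinates):
    """Write one 'lat,lon' line per coordinate tuple."""
    try:
        with open(file_path, "w") as file:
            for lat, lon in coordinates:
                file.write(f"{lat},{lon}\n")
    except OSError as e:
        print(f"Error writing to file: {e}")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_coords.txt")
    write_coordinates(path, [(40.7128, -74.0060), (34.0522, -118.2437)])

    # Append a deliberately malformed line to exercise the ValueError branch
    with open(path, "a") as file:
        file.write("not,a,coordinate\n")

    round_trip = read_coordinates(path)
    missing = read_coordinates(os.path.join(tmp_dir, "does_not_exist.txt"))

print(round_trip)
print(missing)
```

The temporary directory is removed when the `with` block exits, so the test leaves nothing behind.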

## Exercise 5: Processing Coordinates from a File

- Create a function that reads coordinates from a file and uses the `Point` class to create `Point` objects.
- Calculate the distance between each consecutive pair of points and write the results to a new file.
- Ensure the function handles file-related exceptions and gracefully handles improperly formatted lines.

In [7]:
# Create a sample coordinates.txt file
sample_data = """35.6895,139.6917
34.0522,-118.2437
51.5074,-0.1278
-33.8688,151.2093
48.8566,2.3522"""

output_file = "coordinates.txt"

try:
    with open(output_file, "w") as file:
        file.write(sample_data)
    print(f"Sample file '{output_file}' has been created successfully.")
except Exception as e:
    print(f"An error occurred while creating the file: {e}")



Sample file 'coordinates.txt' has been created successfully.


In [8]:
import math

# Define the Point class
class Point:
    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

    def distance_to(self, other_point):
        R = 6371.0  # Earth radius in kilometers
        lat1 = math.radians(self.latitude)
        lon1 = math.radians(self.longitude)
        lat2 = math.radians(other_point.latitude)
        lon2 = math.radians(other_point.longitude)
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        return R * c

# Function to read from a file and process distances between consecutive points
def process_points_from_file(input_file, output_file):
    points = []
    try:
        with open(input_file, 'r') as file:
            for index, line in enumerate(file.readlines()):
                try:
                    lat, lon = map(float, line.strip().split(','))
                    point = Point(f"Point {index + 1}", lat, lon)
                    points.append(point)
                except ValueError:
                    print(f"Skipping malformed line: {line.strip()}")
    except FileNotFoundError:
        print(f"File not found: {input_file}")
        return

    # Calculate distances between consecutive points and write to output file
    try:
        with open(output_file, 'w') as out_file:
            for i in range(len(points) - 1):
                p1 = points[i]
                p2 = points[i + 1]
                distance = p1.distance_to(p2)
                out_file.write(f"Distance from {p1.name} to {p2.name}: {distance:.2f} km\n")
    except Exception as e:
        print(f"An error occurred while writing to the output file: {e}")

# Reuse the coordinates.txt file created in the previous cell and write results to the working directory
input_file_path = "coordinates.txt"
output_file_path = "output_distances.txt"

# Process the points and calculate distances
process_points_from_file(input_file_path, output_file_path)

# Show the output file content, guarding against a missing output file
try:
    with open(output_file_path, 'r') as result_file:
        output_lines = result_file.readlines()
except FileNotFoundError:
    output_lines = []
    print(f"Output file not found: {output_file_path}")
output_lines

## Exercise 6: Exception Handling in Data Processing

- Modify the `batch_distance_calculation` function to handle exceptions that might occur during the calculation, such as invalid coordinates.
- Ensure the function skips invalid data and continues processing the remaining data.

In [9]:
import math

# Reuse the Haversine distance function
def calculate_distance(coord1, coord2):
    try:
        # Radius of Earth in kilometers
        R = 6371.0

        # Convert latitude and longitude from degrees to radians
        lat1, lon1 = math.radians(coord1[0]), math.radians(coord1[1])
        lat2, lon2 = math.radians(coord2[0]), math.radians(coord2[1])

        # Haversine formula
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        return R * c
    except Exception as e:
        print(f"Error calculating distance between {coord1} and {coord2}: {e}")
        return None

# Modified function with exception handling
def batch_distance_calculation(coord_list):
    distances = []
    for i in range(len(coord_list) - 1):
        try:
            distance = calculate_distance(coord_list[i], coord_list[i + 1])
            if distance is not None:
                distances.append(distance)
        except Exception as e:
            print(f"Skipping pair {coord_list[i]} to {coord_list[i + 1]} due to error: {e}")
            continue
    return distances

# Test with a list containing valid and invalid data
test_coordinates = [
    (40.7128, -74.0060),   # New York
    (34.0522, -118.2437),  # Los Angeles
    "Invalid",             # Invalid entry
    (51.5074, -0.1278),    # London
    (None, 139.6917),      # Invalid latitude
    (48.8566, 2.3522)      # Paris
]

# Clean the list by removing obviously invalid entries before processing
clean_coords = [c for c in test_coordinates if isinstance(c, tuple) and len(c) == 2 and all(isinstance(val, (int, float)) for val in c)]

# Run the modified batch distance calculation
batch_distances = batch_distance_calculation(clean_coords)
batch_distances


[3935.746254609723, 8755.602341157259, 343.55606034104153]
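
The cell above filters out invalid entries before calling the function, so the exception handlers never run. A minimal sketch of the other reading of the exercise follows: pass the raw list in and let the function skip any pair it cannot compute. The name `batch_distance_skipping_invalid` and the `skipped` return value are hypothetical additions for illustration.

```python
import math


def calculate_distance(coord1, coord2, radius_km=6371.0):
    """Haversine distance in km; raises TypeError or ValueError on malformed input."""
    lat1, lon1 = coord1  # ValueError if this is not a 2-item sequence
    lat2, lon2 = coord2
    # math.radians raises TypeError for None or other non-numeric values
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return radius_km * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def batch_distance_skipping_invalid(coord_list):
    """Return (distances, skipped_pairs) for consecutive entries of coord_list."""
    distances = []
    skipped = []
    for start, end in zip(coord_list, coord_list[1:]):
        try:
            distances.append(calculate_distance(start, end))
        except (TypeError, ValueError) as exc:
            skipped.append((start, end))
            print(f"Skipping pair {start} to {end}: {exc}")
    return distances, skipped


test_coordinates = [
    (40.7128, -74.0060),   # New York
    (34.0522, -118.2437),  # Los Angeles
    "Invalid",             # Invalid entry
    (51.5074, -0.1278),    # London
    (None, 139.6917),      # Invalid latitude
    (48.8566, 2.3522),     # Paris
]

distances, skipped = batch_distance_skipping_invalid(test_coordinates)
print(len(distances), "computed;", len(skipped), "skipped")
```

The two approaches give different results. Skipping pairs keeps only legs whose endpoints are both valid, here just New York to Los Angeles. Filtering first, as in the cell above, joins the surviving points into new legs such as Los Angeles to London.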

## Exercise 7: NumPy Array Operations and Geospatial Coordinates

In this exercise, you will work with NumPy arrays representing geospatial coordinates (latitude and longitude) and perform basic array operations.

1. Create a 2D NumPy array containing the latitude and longitude of the following cities: Tokyo (35.6895, 139.6917), New York (40.7128, -74.0060), London (51.5074, -0.1278), and Paris (48.8566, 2.3522).
2. Convert the latitude and longitude values from degrees to radians using np.radians().
3. Calculate the element-wise difference between Tokyo and the other cities' latitude and longitude in radians.

In [10]:
import numpy as np

# Step 1: Create a 2D NumPy array of city coordinates (latitude, longitude)
cities_deg = np.array([
    [35.6895, 139.6917],   # Tokyo
    [40.7128, -74.0060],   # New York
    [51.5074, -0.1278],    # London
    [48.8566, 2.3522]      # Paris
])

# Step 2: Convert the coordinates from degrees to radians
cities_rad = np.radians(cities_deg)

# Step 3: Calculate the element-wise difference in radians between Tokyo and the other cities
# Tokyo is at index 0
tokyo_rad = cities_rad[0]
diff_from_tokyo = cities_rad[1:] - tokyo_rad  # Subtract Tokyo's lat/lon from the rest

# Display results
{
    "Cities in Radians": cities_rad,
    "Difference from Tokyo (in radians)": diff_from_tokyo
}


{'Cities in Radians': array([[ 6.22899283e-01,  2.43808010e+00],
        [ 7.10572408e-01, -1.29164837e+00],
        [ 8.98973719e-01, -2.23053078e-03],
        [ 8.52708531e-01,  4.10536347e-02]]),
 'Difference from Tokyo (in radians)': array([[ 0.08767312, -3.72972847],
        [ 0.27607444, -2.44031063],
        [ 0.22980925, -2.39702647]])}
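
One detail in the output is worth a closer look: the longitude difference for New York, about -3.73 rad, has a magnitude larger than π, so it describes the long way around the globe. The Haversine formula is unaffected because sin²(Δλ/2) repeats every 2π, but other uses of the difference may need it wrapped. The sketch below shows one way to wrap the longitude column into [-π, π); the variable name `dlon_wrapped` is an illustrative choice.

```python
import numpy as np

# Same city coordinates (latitude, longitude) as in the exercise
cities_deg = np.array([
    [35.6895, 139.6917],   # Tokyo
    [40.7128, -74.0060],   # New York
    [51.5074, -0.1278],    # London
    [48.8566, 2.3522],     # Paris
])
cities_rad = np.radians(cities_deg)
diff_from_tokyo = cities_rad[1:] - cities_rad[0]

# Wrap only the longitude differences (column 1) into [-pi, pi)
dlon_wrapped = (diff_from_tokyo[:, 1] + np.pi) % (2 * np.pi) - np.pi

print(dlon_wrapped)
```

Only the New York entry changes, by exactly 2π. The London and Paris differences already lie inside the target interval.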

## Exercise 8: Pandas DataFrame Operations with Geospatial Data

In this exercise, you'll use Pandas to load and manipulate a dataset containing city population data, and then calculate and visualize statistics.

1. Load the world cities dataset from this URL using Pandas: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Display the first 5 rows and check for missing values.
3. Filter the dataset to only include cities with a population greater than 1 million.
4. Group the cities by their country and calculate the total population for each country.
5. Sort the cities by population in descending order and display the top 10 cities.

In [15]:
# Make sure pandas is installed before running this cell.

import pandas as pd

# Step 1: Load the dataset
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"
df = pd.read_csv(url)

# Step 2: Display the first 5 rows and check for missing values
print("First 5 rows of the dataset:")
print(df.head())
print("\nSummary of missing values:")
print(df.isnull().sum())

# Step 3: Filter cities with population greater than 1 million
df_filtered = df[df['population'] > 1_000_000]

# Step 4: Group by country and calculate total population
country_population = df_filtered.groupby('country')['population'].sum().reset_index()
country_population = country_population.sort_values(by='population', ascending=False)
print("\nTotal population per country (for cities > 1 million):")
print(country_population)

# Step 5: Sort cities by population in descending order and display the top 10
top_cities = df_filtered.sort_values(by='population', ascending=False).head(10)
print("\nTop 10 cities by population (greater than 1 million):")
print(top_cities[['name', 'country', 'population']])



URLError: <urlopen error [Errno 11001] getaddrinfo failed>
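
When the download is unavailable, the filter, group, and sort steps can still be checked offline on a small stand-in table. The rows below are invented placeholders rather than values from the real dataset; only the column names `name`, `country`, and `population` are taken from the code above.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (all values are invented)
df = pd.DataFrame({
    "name": ["City A", "City B", "City C", "City D"],
    "country": ["X", "X", "Y", "Y"],
    "population": [2_500_000, 800_000, 1_200_000, 3_000_000],
})

# Step 3: keep cities with more than 1 million people
df_filtered = df[df["population"] > 1_000_000]

# Step 4: total population per country, largest first
country_population = (
    df_filtered.groupby("country")["population"].sum()
    .reset_index()
    .sort_values(by="population", ascending=False)
)

# Step 5: top cities by population
top_cities = df_filtered.sort_values(by="population", ascending=False).head(10)

print(country_population)
print(top_cities[["name", "country", "population"]])
```

Running the same statements against the real `df` should work unchanged once `pd.read_csv(url)` succeeds.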

## Exercise 9: Creating and Manipulating GeoDataFrames with GeoPandas

This exercise focuses on creating and manipulating GeoDataFrames, performing spatial operations, and visualizing the data.

1. Load the New York City building dataset from the GeoJSON file using GeoPandas: https://github.com/opengeos/datasets/releases/download/places/nyc_buildings.geojson
2. Create a plot of the building footprints and color them based on the building height (use the `height_MS` column).
3. Create an interactive map of the building footprints and color them based on the building height (use the `height_MS` column).
4. Calculate the average building height (use the `height_MS` column).
5. Select buildings with a height greater than the average height.
6. Save the GeoDataFrame to a new GeoJSON file.

## Exercise 10: Combining NumPy, Pandas, and GeoPandas

This exercise requires you to combine the power of NumPy, Pandas, and GeoPandas to analyze and visualize spatial data.

1. Use Pandas to load the world cities dataset from this URL: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Filter the dataset to include only cities with latitude values between -40 and 60 (i.e., excluding cities at high latitudes, south of 40°S or north of 60°N).
3. Create a GeoDataFrame from the filtered dataset by converting the latitude and longitude into geometries.
4. Reproject the GeoDataFrame to the Web Mercator projection (EPSG:3857).
5. Calculate the distance (in meters) between each city and the city of Paris.
6. Plot the cities on a world map, coloring the points by their distance from Paris.
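
The steps above might be sketched as follows. To run without a download, the sketch uses the four city coordinates from Exercise 7 plus one hypothetical far-north point that the latitude filter should drop. The column names `name`, `latitude`, and `longitude` are assumed to match the CSV, and the new column name `distance_to_paris_m` is an illustrative choice.

```python
import geopandas as gpd
import pandas as pd

# Stand-in rows; the real table would come from pd.read_csv(url)
df = pd.DataFrame({
    "name": ["Tokyo", "New York", "London", "Paris", "Hypothetical far-north point"],
    "latitude": [35.6895, 40.7128, 51.5074, 48.8566, 70.0],
    "longitude": [139.6917, -74.0060, -0.1278, 2.3522, 20.0],
})

# Step 2: keep latitudes between -40 and 60 (inclusive by default)
df_filtered = df[df["latitude"].between(-40, 60)].copy()

# Step 3: build point geometries (x = longitude, y = latitude) in WGS 84
gdf = gpd.GeoDataFrame(
    df_filtered,
    geometry=gpd.points_from_xy(df_filtered["longitude"], df_filtered["latitude"]),
    crs="EPSG:4326",
)

# Step 4: reproject to EPSG:3857, whose units are meters
gdf_3857 = gdf.to_crs(epsg=3857)

# Step 5: planar distance from every city to Paris
paris = gdf_3857.loc[gdf_3857["name"] == "Paris", "geometry"].iloc[0]
gdf_3857["distance_to_paris_m"] = gdf_3857.geometry.distance(paris)

print(gdf_3857[["name", "distance_to_paris_m"]])

# Step 6 (needs matplotlib):
# gdf_3857.plot(column="distance_to_paris_m", legend=True)
```

Distances measured in EPSG:3857 are planar and increasingly overstate true ground distance away from the equator, so they will not match the Haversine results from the earlier exercises.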

## Submission Requirements

Complete the exercises above and upload the notebook to your GitHub repository. Make sure the notebook has a Colab badge at the top so that it can be easily opened in Google Colab. Submit the URL of the notebook to Canvas.