# Functions and Packages

## Overview

1. This lecture first introduces functions in Python, focusing on their application in geospatial programming. Functions allow you to encapsulate code into reusable blocks, making your scripts more modular and easier to maintain.

2. It then introduces [NumPy](https://numpy.org) and [Pandas](https://pandas.pydata.org), two fundamental libraries for data manipulation and analysis in Python, with applications in geospatial programming. `NumPy` is essential for numerical operations and handling arrays, while `Pandas` provides powerful tools for analyzing tabular data. Understanding these libraries will enable you to perform complex data operations efficiently in geospatial contexts.

## Learning Objectives

By the end of this lecture, you should be able to:

- Define and use functions to perform specific tasks and promote code reuse in geospatial applications.
- Understand the basics of `NumPy` arrays and how to perform operations on them.
- Utilize `Pandas` DataFrames to organize, analyze, and manipulate tabular data.
- Apply `NumPy` and `Pandas` in geospatial programming to process and analyze geospatial datasets.
- Combine `NumPy` and `Pandas` to streamline data processing workflows.
- Develop the ability to perform complex data operations, such as filtering, aggregating, and transforming geospatial data.

## Functions

Functions are blocks of code that perform a specific task and can be reused multiple times. They allow you to structure your code more efficiently and reduce redundancy.

### Defining a Simple Function

Here's a simple function that adds two numbers:

In [None]:
def add(a, b):
    return a + b


# Example usage
result = add(5, 3)
print(f"Result: {result}")

This function takes two parameters `a` and `b`, and returns their sum. You can call it by passing two values as arguments.

In [None]:
# Function to multiply two numbers
def multiply(a, b):
    return a * b


# Calling the function
result = multiply(4, 5)
print(f"Multiplication Result: {result}")

You can call the `multiply` function with two numbers, and it will return their product.
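
Functions can also carry a docstring and default parameter values, so callers only pass what they want to change. The following is a minimal sketch; `km_to_miles`, its `ndigits` parameter, and the approximate conversion factor are hypothetical choices made here for illustration and are not part of the lecture's own code.

```python
def km_to_miles(distance_km, ndigits=2):
    """Convert a distance from kilometers to miles, rounded to `ndigits` decimals."""
    miles_per_km = 0.621371  # approximate conversion factor (assumption for this sketch)
    return round(distance_km * miles_per_km, ndigits)


# Use the default rounding (2 decimals)
default_result = km_to_miles(100)
print(f"100 km is about {default_result} miles")

# Override the default with a keyword argument
coarse_result = km_to_miles(100, ndigits=0)
print(f"Rounded to whole miles: {coarse_result}")
```

Passing `ndigits` by keyword keeps the call readable when a function has several optional parameters.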

### Geospatial Example: Haversine Function

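Let's apply these concepts to a geospatial problem. The [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula) calculates the great-circle distance between two points on the Earth’s surface from their latitudes and longitudes. The function below takes coordinates in decimal degrees and returns the distance in kilometers.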

![Illustration of great-circle distance](https://upload.wikimedia.org/wikipedia/commons/c/cb/Illustration_of_great-circle_distance.svg)
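
For reference, the formula can be written as follows, where $\varphi$ denotes latitude and $\lambda$ longitude (both in radians), $R$ is the Earth's radius, and $d$ is the resulting distance. The symbols are chosen here to mirror the variables `a`, `c`, and `R` used in the code in this section.

$$
a = \sin^2\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\Delta\lambda}{2}\right)
$$

$$
c = 2 \, \operatorname{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right), \qquad d = R \, c
$$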

In [None]:
from math import radians, sin, cos, sqrt, atan2

In [None]:
def haversine(lat1, lon1, lat2, lon2):
    """Return the great-circle distance in km between two points given in decimal degrees."""
    R = 6371.0  # Earth radius in kilometers
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (
        sin(dlat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    )
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c
    return distance


# Example usage
distance = haversine(35.6895, 139.6917, 34.0522, -118.2437)
print(f"Distance: {distance:.2f} km")
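
A quick way to build confidence in a function like this is to check it against cases whose answers are easy to reason about. The sketch below repeats the `haversine` definition so that it runs on its own, then checks three properties: one degree of latitude along a meridian should be about 111.19 km for a radius of 6371 km, swapping the two points should not change the distance, and the distance from a point to itself should be zero. The variable names are illustrative.

```python
from math import radians, sin, cos, sqrt, atan2


def haversine(lat1, lon1, lat2, lon2):
    R = 6371.0  # Earth radius in kilometers
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (
        sin(dlat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    )
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c


# One degree of latitude along a meridian: R * radians(1)
one_degree = haversine(0.0, 0.0, 1.0, 0.0)
print(f"One degree of latitude: {one_degree:.2f} km")

# The distance should not depend on the order of the two points
forward = haversine(35.6895, 139.6917, 34.0522, -118.2437)
backward = haversine(34.0522, -118.2437, 35.6895, 139.6917)
print(f"Symmetric: {abs(forward - backward) < 1e-6}")

# The distance from a point to itself should be zero
same_point = haversine(35.6895, 139.6917, 35.6895, 139.6917)
print(f"Same point: {same_point} km")
```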

Now, let's create a function that takes a list of coordinate pairs and returns a list of distances between consecutive points.

In [None]:
def batch_haversine(coord_list):
    distances = []
    for i in range(len(coord_list) - 1):
        lat1, lon1 = coord_list[i]
        lat2, lon2 = coord_list[i + 1]
        distance = haversine(lat1, lon1, lat2, lon2)
        distances.append(distance)
    return distances


# Example usage
coordinates = [(35.6895, 139.6917), (34.0522, -118.2437), (40.7128, -74.0060)]
distances = batch_haversine(coordinates)
print(f"Distances: {distances}")
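
The same pairing of consecutive points can be expressed with `zip`, which avoids manual index arithmetic. This is a sketch of an alternative, not a replacement for the loop above; `batch_haversine_zip` is a hypothetical name, and `haversine` is repeated so the block runs on its own.

```python
from math import radians, sin, cos, sqrt, atan2


def haversine(lat1, lon1, lat2, lon2):
    R = 6371.0  # Earth radius in kilometers
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (
        sin(dlat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    )
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))


def batch_haversine_zip(coord_list):
    # zip(coord_list, coord_list[1:]) pairs each point with the next one
    return [
        haversine(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(coord_list, coord_list[1:])
    ]


coordinates = [(35.6895, 139.6917), (34.0522, -118.2437), (40.7128, -74.0060)]
leg_distances = batch_haversine_zip(coordinates)
total_distance = sum(leg_distances)

for leg in leg_distances:
    print(f"Leg: {leg:.2f} km")
print(f"Total route length: {total_distance:.2f} km")
```

Because `zip` stops at the shorter input, an empty list or a single point simply yields an empty list of distances.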

## Introduction to NumPy

`NumPy` (Numerical Python) is a library used for scientific computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

### Creating NumPy Arrays

Let's start by creating some basic `NumPy` arrays.

In [1]:
import numpy as np

In [2]:
# Creating a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
print(f"1D Array: {arr_1d}")

1D Array: [1 2 3 4 5]


In [3]:
# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"2D Array:\n{arr_2d}")

2D Array:
[[1 2 3]
 [4 5 6]]


In [4]:
# Creating an array of zeros
zeros = np.zeros((3, 3))
print(f"Array of zeros:\n{zeros}")

Array of zeros:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [5]:
# Creating an array of ones
ones = np.ones((2, 4))
print(f"Array of ones:\n{ones}")

Array of ones:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [6]:
# Creating an array with a range of values
range_arr = np.arange(0, 10, 2)
print(f"Range Array: {range_arr}")

Range Array: [0 2 4 6 8]
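
Once an array exists, its attributes describe its layout. The short sketch below recreates two of the arrays from this section and inspects `shape`, `ndim`, `size`, and `dtype`. The exact integer dtype printed for `arr_2d` can vary by platform and NumPy version, so treat that line as indicative.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
zeros = np.zeros((3, 3))

print(f"Shape: {arr_2d.shape}")  # (rows, columns)
print(f"Number of dimensions: {arr_2d.ndim}")
print(f"Total number of elements: {arr_2d.size}")
print(f"dtype of arr_2d: {arr_2d.dtype}")  # an integer type
print(f"dtype of zeros: {zeros.dtype}")  # np.zeros defaults to float64
```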


### Basic Array Operations

`NumPy` allows you to perform element-wise operations on arrays.

In [None]:
# Array addition
arr_sum = arr_1d + 10
print(f"Array after addition: {arr_sum}")

In [None]:
# Array multiplication
arr_product = arr_1d * 2
print(f"Array after multiplication: {arr_product}")

In [None]:
# Multiply each row of the 2D array element-wise by a 1D array (broadcasting)
arr_2d_product = arr_2d * np.array([1, 2, 3])
print(f"Element-wise multiplication of 2D array:\n{arr_2d_product}")
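
The last operation works even though the two arrays have different shapes, because `NumPy` broadcasts the smaller array across the larger one. The sketch below makes the shapes explicit and adds a column-shaped array for comparison; the names `row` and `col` are illustrative.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([1, 2, 3])  # shape (3,)
col = np.array([[10], [100]])  # shape (2, 1)

# The 1D array is reused for every row of arr_2d
row_product = arr_2d * row
print(f"Multiplied by a row:\n{row_product}")

# The column is stretched across every column of arr_2d
col_product = arr_2d * col
print(f"Multiplied by a column:\n{col_product}")
```

Broadcasting requires each pair of trailing dimensions to be equal or for one of them to be 1; otherwise `NumPy` raises a `ValueError`.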

### Working with Geospatial Coordinates

You can use `NumPy` to perform calculations on arrays of geospatial coordinates, such as converting from degrees to radians.

In [7]:
# Array of latitudes and longitudes
coords = np.array([[35.6895, 139.6917], [34.0522, -118.2437], [51.5074, -0.1278]])

# Convert degrees to radians
coords_radians = np.radians(coords)
print(f"Coordinates in radians:\n{coords_radians}")

Coordinates in radians:
[[ 6.22899283e-01  2.43808010e+00]
 [ 5.94323008e-01 -2.06374188e+00]
 [ 8.98973719e-01 -2.23053078e-03]]
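
Because each row holds a latitude and longitude pair, slicing by column gives all latitudes or all longitudes at once. The following sketch shows column slicing, a boolean mask, and a round trip back to degrees; the variable names are illustrative. The arithmetic mean of the latitudes is shown only as an array operation and is not a geographic centroid.

```python
import numpy as np

coords = np.array([[35.6895, 139.6917], [34.0522, -118.2437], [51.5074, -0.1278]])

# Column 0 holds latitudes, column 1 holds longitudes
lats = coords[:, 0]
lons = coords[:, 1]
mean_lat = lats.mean()
print(f"Mean latitude: {mean_lat:.4f}")

# A boolean mask keeps only the rows with a negative (western) longitude
western = coords[lons < 0]
print(f"Western hemisphere points:\n{western}")

# Converting to radians and back should recover the original values
round_trip = np.degrees(np.radians(coords))
print(f"Round trip matches: {np.allclose(round_trip, coords)}")
```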


## Introduction to Pandas

`Pandas` is a powerful data manipulation library that provides data structures like Series and DataFrames to work with structured data. It is especially useful for handling tabular data.

### Creating Pandas Series and DataFrames

Let's create a `Pandas` Series and DataFrame.

In [9]:
import pandas as pd

In [18]:
# Creating a Series
city_series = pd.Series(["Tokyo", "Los Angeles", "London"], name="City")
print(f"Pandas Series:\n{city_series}\n")

Pandas Series:
0          Tokyo
1    Los Angeles
2         London
Name: City, dtype: object



In [11]:
# Creating a DataFrame
data = {
    "City": ["Tokyo", "Los Angeles", "London"],
    "Latitude": [35.6895, 34.0522, 51.5074],
    "Longitude": [139.6917, -118.2437, -0.1278],
}
df = pd.DataFrame(data)
print(f"Pandas DataFrame:\n{df}")

Pandas DataFrame:
          City  Latitude  Longitude
0        Tokyo   35.6895   139.6917
1  Los Angeles   34.0522  -118.2437
2       London   51.5074    -0.1278
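
Before working with a DataFrame, it helps to check its size, column names, and column types. The sketch below rebuilds the same small DataFrame so it runs on its own. How the text column's dtype is reported may differ between `Pandas` versions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "City": ["Tokyo", "Los Angeles", "London"],
        "Latitude": [35.6895, 34.0522, 51.5074],
        "Longitude": [139.6917, -118.2437, -0.1278],
    }
)

print(f"Shape (rows, columns): {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"Column types:\n{df.dtypes}")
```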


### Basic DataFrame Operations

You can perform various operations on `Pandas` DataFrames, such as filtering, selecting specific columns, and applying functions.

In [12]:
# Selecting a specific column
latitudes = df["Latitude"]
print(f"Latitudes:\n{latitudes}\n")

Latitudes:
0    35.6895
1    34.0522
2    51.5074
Name: Latitude, dtype: float64



In [13]:
# Filtering rows based on a condition
df_filtered = df[df["Longitude"] < 0]
df_filtered

|   | City | Latitude | Longitude |
|---|------|----------|-----------|
| 1 | Los Angeles | 34.0522 | -118.2437 |
| 2 | London | 51.5074 | -0.1278 |


In [14]:
# Adding a new column with a calculation
df["Lat_Radians"] = np.radians(df["Latitude"])
df

|   | City | Latitude | Longitude | Lat_Radians |
|---|------|----------|-----------|-------------|
| 0 | Tokyo | 35.6895 | 139.6917 | 0.622899 |
| 1 | Los Angeles | 34.0522 | -118.2437 | 0.594323 |
| 2 | London | 51.5074 | -0.1278 | 0.898974 |
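
The operations in this section can be extended in two common ways: applying your own function to a column, and combining several filter conditions. The sketch below rebuilds the DataFrame so it runs on its own; the `hemisphere` helper and the `Hemisphere` column are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "City": ["Tokyo", "Los Angeles", "London"],
        "Latitude": [35.6895, 34.0522, 51.5074],
        "Longitude": [139.6917, -118.2437, -0.1278],
    }
)


def hemisphere(lon):
    """Label a longitude as East or West of the prime meridian."""
    return "West" if lon < 0 else "East"


# Apply a custom function to every value in a column
df["Hemisphere"] = df["Longitude"].apply(hemisphere)
print(df)

# Combine conditions with & (and) or | (or); each condition needs parentheses
west_and_north = df[(df["Longitude"] < 0) & (df["Latitude"] > 40)]
print(west_and_north)

# Select rows and specific columns in one step with .loc
west_names = df.loc[df["Longitude"] < 0, ["City", "Latitude"]]
print(west_names)
```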


## Combining NumPy and Pandas

You can combine `NumPy` and `Pandas` to perform complex data manipulations. For instance, you might want to apply `NumPy` functions to a `Pandas` DataFrame or use `Pandas` to organize and visualize the results of `NumPy` operations.

Let's say you have a dataset of cities, and you want to calculate the distance between each pair of cities.

In [15]:
# Define the Haversine formula using NumPy
def haversine_np(lat1, lon1, lat2, lon2):
    R = 6371.0  # Earth radius in kilometers
    dlat = np.radians(lat2 - lat1)
    dlon = np.radians(lon2 - lon1)
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(dlon / 2) ** 2
    )
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    distance = R * c
    return distance


# Create a new DataFrame with city pairs
city_pairs = pd.DataFrame(
    {
        "City1": ["Tokyo", "Tokyo", "Los Angeles"],
        "City2": ["Los Angeles", "London", "London"],
        "Lat1": [35.6895, 35.6895, 34.0522],
        "Lon1": [139.6917, 139.6917, -118.2437],
        "Lat2": [34.0522, 51.5074, 51.5074],
        "Lon2": [-118.2437, -0.1278, -0.1278],
    }
)

# Calculate distances between city pairs
city_pairs["Distance_km"] = haversine_np(
    city_pairs["Lat1"], city_pairs["Lon1"], city_pairs["Lat2"], city_pairs["Lon2"]
)
city_pairs

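|   | City1 | City2 | Lat1 | Lon1 | Lat2 | Lon2 | Distance_km |
|---|-------|-------|------|------|------|------|-------------|
| 0 | Tokyo | Los Angeles | 35.6895 | 139.6917 | 34.0522 | -118.2437 | 8815.473356 |
| 1 | Tokyo | London | 35.6895 | 139.6917 | 51.5074 | -0.1278 | 9558.713695 |
| 2 | Los Angeles | London | 34.0522 | -118.2437 | 51.5074 | -0.1278 | 8755.602341 |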
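
To get the average distance from each city to all the others, one option is to build the full distance matrix with `NumPy` broadcasting and then average each row. This is a minimal sketch under that assumption rather than the only approach; the `cities` DataFrame, the `dist_matrix` variable, and the `Avg_Distance_km` column are names introduced here, and `haversine_np` is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd


def haversine_np(lat1, lon1, lat2, lon2):
    R = 6371.0  # Earth radius in kilometers
    dlat = np.radians(lat2 - lat1)
    dlon = np.radians(lon2 - lon1)
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(dlon / 2) ** 2
    )
    return R * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


cities = pd.DataFrame(
    {
        "City": ["Tokyo", "Los Angeles", "London"],
        "Latitude": [35.6895, 34.0522, 51.5074],
        "Longitude": [139.6917, -118.2437, -0.1278],
    }
)

lat = cities["Latitude"].to_numpy()
lon = cities["Longitude"].to_numpy()

# Shapes (n, 1) and (1, n) broadcast to an (n, n) matrix of pairwise distances
dist_matrix = haversine_np(lat[:, None], lon[:, None], lat[None, :], lon[None, :])

# The diagonal (distance to itself) is zero, so divide each row sum by n - 1
n = len(cities)
cities["Avg_Distance_km"] = dist_matrix.sum(axis=1) / (n - 1)
print(cities)
```

The matrix holds n × n values, so this approach suits small to moderate numbers of cities.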


Pandas can read and write data in various formats, such as CSV, Excel, and SQL databases. This makes it easy to load and save data from different sources. For example, you can read a CSV file into a Pandas DataFrame and then perform operations on the data.
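
As a small offline illustration of writing and reading, the sketch below sends a DataFrame to CSV text in memory and reads it back. It uses `io.StringIO` in place of a file on disk so that it runs anywhere; the variable names are illustrative.

```python
import io

import pandas as pd

cities = pd.DataFrame(
    {
        "City": ["Tokyo", "Los Angeles", "London"],
        "Latitude": [35.6895, 34.0522, 51.5074],
        "Longitude": [139.6917, -118.2437, -0.1278],
    }
)

# Write the DataFrame as CSV text; index=False leaves out the row labels
csv_text = cities.to_csv(index=False)
print(csv_text)

# Read the CSV text back into a new DataFrame
cities_from_csv = pd.read_csv(io.StringIO(csv_text))
print(cities_from_csv)
```

Passing a file path instead of the `StringIO` object to `to_csv` and `read_csv` would work with a file on disk in the same way.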

Let's read a CSV file from an HTTP URL into a Pandas DataFrame and display the first few rows of the data.

In [16]:
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"
df = pd.read_csv(url)
df.head()

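|   | id | name | country | latitude | longitude | population |
|---|----|------|---------|----------|-----------|------------|
| 0 | 1 | Bombo | UGA | 0.5833 | 32.5333 | 75000 |
| 1 | 2 | Fort Portal | UGA | 0.671 | 30.275 | 42670 |
| 2 | 3 | Potenza | ITA | 40.642 | 15.799 | 69060 |
| 3 | 4 | Campobasso | ITA | 41.563 | 14.656 | 50762 |
| 4 | 5 | Aosta | ITA | 45.737 | 7.315 | 34062 |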


The DataFrame contains information about world cities, including their names, countries, populations, and geographical coordinates. We can calculate the total population of all cities in the dataset using NumPy and Pandas as follows.

In [17]:
np.sum(df["population"])

1475534501
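
The same total is available from the `sum` method of a `Pandas` Series, and `groupby` extends the idea to totals per group. The sketch below runs offline on only the five rows shown in the `df.head()` output above, so its numbers describe that sample and not the full dataset; the `sample` DataFrame and the other variable names are introduced here for illustration.

```python
import pandas as pd

# The first five rows of the dataset, copied from the df.head() output
sample = pd.DataFrame(
    {
        "name": ["Bombo", "Fort Portal", "Potenza", "Campobasso", "Aosta"],
        "country": ["UGA", "UGA", "ITA", "ITA", "ITA"],
        "population": [75000, 42670, 69060, 50762, 34062],
    }
)

# Total population using the Series method instead of np.sum
total_population = sample["population"].sum()
print(f"Total population in the sample: {total_population}")

# Total population per country
population_by_country = sample.groupby("country")["population"].sum()
print(population_by_country)

# Name of the most populous city in the sample
largest_city = sample.loc[sample["population"].idxmax(), "name"]
print(f"Largest city in the sample: {largest_city}")
```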