# Getting Comfortable with Latitude and Longitude

In the next few lectures, we will be working with **geospatial data**, that is, data involving locations on Earth.

Locations on Earth are specified using **latitude** and **longitude**.

- Longitude specifies east-west position.
- Latitude specifies north-south position.

For example, the location below is at a longitude of $\lambda = -45^\circ$ and a latitude of $\phi = 30^\circ$.

<img src="https://camo.githubusercontent.com/4be5598b5c6c48e7dee4fad2718c496985af48da3e3afe9cdc2c467ef7c51301/68747470733a2f2f6769746875622e636f6d2f646c73756e2f706f64732f626c6f622f6d61737465722f31322d47656f7370617469616c2d446174612f696d672f636f6f7264696e6174652e706e673f7261773d31" width="500">

The Earth is split into:

- Northern and Southern Hemispheres by the **equator** (at $0^\circ$ latitude)
- Eastern and Western Hemispheres by the **prime meridian** (at $0^\circ$ longitude).

Latitude and longitude represent angles relative to the equator and prime meridian, respectively.

<img src="https://camo.githubusercontent.com/743975394143e912443d9e8c3a5331056e8b1008ad574d0181640047ab669b32/68747470733a2f2f6769746875622e636f6d2f646c73756e2f706f64732f626c6f622f6d61737465722f31322d47656f7370617469616c2d446174612f696d672f636f6f7264696e6174655f6c6162656c65642e706e673f7261773d31" width="500">

Note that longitudes west of the prime meridian and latitudes south of the equator are negative.
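Maps and gazetteers often write coordinates with hemisphere letters (for example, 45° W, 30° N) rather than signs. As a minimal sketch of the sign convention above, the helper `to_signed` below converts such a coordinate to a signed number of degrees. The function name is hypothetical and is not part of any library.

```python
def to_signed(degrees, hemisphere):
    """Convert an unsigned angle plus a hemisphere letter to signed degrees.

    Hypothetical helper for illustration: "S" and "W" are negative,
    "N" and "E" are positive.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * degrees

# The example location from earlier: 45 degrees W, 30 degrees N.
lng = to_signed(45, "W")
lat = to_signed(30, "N")
print(lng, lat)  # -45 30
```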

## World Cities Data

We will work with a dataset containing the locations and populations of world "cities". (As you will see, some of the places in this dataset are not really cities.)

In [None]:
import pandas as pd
df_cities = pd.read_csv("http://dlsun.github.io/pods/data/worldcities.csv")
df_cities

### Exercise 1

What are the northernmost and southernmost "cities" in the world?

(You may want to try restricting to places with a population of at least 1000.)

### Exercise 2

Let's call a place a "city" if it has a population of at least 30,000. What are the westernmost and easternmost cities in the world?

Look these cities up on a map. How far apart are these cities really?

### Exercise 3

Which state in the U.S. is closest to Morocco? (Take a guess before you do this exercise.)

You can answer this question by finding the closest cities in the United States to Casablanca, Morocco (made famous by the 1942 movie).

## Problems with Euclidean Distance

In Exercise 3, you probably used Euclidean distance to measure the distance between two points $\vec x_1 = (\lambda_1, \phi_1)$ and $\vec x_2 = (\lambda_2, \phi_2)$, treating longitude and latitude (in degrees) as if they were coordinates on a flat plane. That is, you calculated the distance as:

$$ d(\vec x_1, \vec x_2) = \sqrt{(\lambda_1 - \lambda_2)^2 + (\phi_1 - \phi_2)^2}. $$

However, there are problems with Euclidean distance, as we now explore.
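One problem shows up near the $180^\circ$ meridian, where longitude jumps from $180^\circ$ to $-180^\circ$. The sketch below applies the formula above to two made-up points (they are not taken from the dataset) that both sit on the equator, only $2^\circ$ of longitude apart, on either side of that meridian. The function name `euclidean` is just for illustration.

```python
import math

def euclidean(p1, p2):
    """Euclidean distance between two (longitude, latitude) pairs in degrees."""
    (lng1, lat1), (lng2, lat2) = p1, p2
    return math.sqrt((lng1 - lng2) ** 2 + (lat1 - lat2) ** 2)

# Two hypothetical points on the equator, on either side of the 180-degree meridian.
point_a = (179.0, 0.0)
point_b = (-179.0, 0.0)

print(euclidean(point_a, point_b))  # 358.0
```

The two points are $2^\circ$ apart on the globe, yet Euclidean distance reports 358, because it treats longitude as a number line rather than as a circle.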

Which city is closer to London: Vancouver or Singapore? If we use Euclidean distance:

In [None]:
from sklearn.metrics import pairwise_distances

df_cities_indexed = df_cities.set_index(["city", "country"])
cities = [("London", "United Kingdom"),
          ("Vancouver", "Canada"),
          ("Singapore", "Singapore")]

pairwise_distances(df_cities_indexed.loc[cities, ["lat", "lng"]])

Singapore appears to be closer than Vancouver (distance of 115 vs. 123). But is it really? A flight between London and Vancouver takes 9-10 hours; a flight between London and Singapore takes 13-15 hours.

Because the Earth is (approximately) a sphere, the shortest path between two locations along its surface is generally not the straight line in latitude-longitude space that Euclidean distance measures.

Instead, the shortest path is along the **great circle** passing through the two points. A great circle is any circle on the surface of the Earth that divides it into two equal halves.

<img src="https://camo.githubusercontent.com/445680e66ecafe2c93e04c723bdd8ee2e7c9189bb78ae90e78d5f74b66d4d642/68747470733a2f2f6769746875622e636f6d2f646c73756e2f706f64732f626c6f622f6d61737465722f31322d47656f7370617469616c2d446174612f696d672f686176657273696e652e706e673f7261773d31" />

To appreciate the difference between the shortest path and the straight line, take a look at [this app](https://academo.org/demos/geodesics/). This phenomenon is likely familiar to you if you have taken a transcontinental flight.

To calculate the length of the shortest path, we need a new distance metric.

**Haversine distance** calculates the distance between two points on a sphere. It is defined as:
$$ d(\vec x_1, \vec x_2) = 2r \arcsin\left( \sqrt{\sin^2\left( \frac{\phi_1 - \phi_2}{2} \right) + \cos(\phi_1) \cos(\phi_2) \sin^2\left( \frac{\lambda_1 - \lambda_2}{2} \right)} \right),$$
where $r$ is the radius of the sphere. Note that the latitudes $\phi_j$ and longitudes $\lambda_j$ must be in _radians_.
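To make the formula concrete, here is a minimal sketch that implements it directly with Python's standard `math` module. The function name `haversine` and its argument order are choices made for this illustration; it takes latitudes and longitudes in degrees, converts them to radians, and defaults to a unit sphere ($r = 1$).

```python
import math

def haversine(lat1, lng1, lat2, lng2, r=1.0):
    """Haversine distance between two points given in degrees.

    Returns the distance on a sphere of radius r (default: unit sphere).
    """
    # Convert degrees to radians, as the formula requires.
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lng1, lat2, lng2))
    a = (math.sin((phi1 - phi2) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam1 - lam2) / 2) ** 2)
    # Clamp to 1 to guard against floating-point rounding for near-antipodal points.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))

# Sanity check: the path from the equator to the North Pole is a quarter of a
# great circle, so its length on the unit sphere should be pi / 2.
print(haversine(0, 0, 90, 0))
print(math.pi / 2)
```

The two printed values should agree up to floating-point rounding.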

Because the Earth is not actually a sphere, Haversine distance is only approximate for measuring distance between two locations on Earth. But it is much better than Euclidean distance.

We can calculate Haversine distance using Scikit-Learn by specifying `metric="haversine"`. Don't forget to convert latitude and longitude, which are in degrees, to radians by multiplying by $\pi / 180$.

In [None]:
# convert latitude and longitude to radians
import numpy as np
radians = np.pi / 180 * df_cities_indexed.loc[cities, ["lat", "lng"]]

pairwise_distances(radians, metric="haversine")

Now, Vancouver is correctly identified as being closer to London (distance of 1.19) than Singapore (distance of 1.70).

Note that Scikit-Learn's Haversine distance assumes that the radius is $r=1$. To get the distance in miles or kilometers, multiply by the radius of the Earth, which is about 3,959 miles (6,371 kilometers).
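As a quick worked example of this conversion, the sketch below rescales the London-to-Vancouver distance reported above (1.19 on the unit sphere). The radius values are an assumption: they are the commonly quoted mean radius of the Earth, roughly 6,371 kilometers or 3,959 miles.

```python
# Mean radius of the Earth (approximate, assumed values).
EARTH_RADIUS_KM = 6371
EARTH_RADIUS_MILES = 3959

# London-to-Vancouver distance on the unit sphere, as reported above.
unit_distance = 1.19

print(unit_distance * EARTH_RADIUS_KM)     # roughly 7,580 kilometers
print(unit_distance * EARTH_RADIUS_MILES)  # roughly 4,710 miles
```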

### Exercise 4

Let's do Exercise 3 again, but correctly this time. Using Haversine distance, which cities in the United States are actually closest to Casablanca, Morocco?