In [1]:
import requests
from IPython.display import Markdown

url = 'https://kata.geosci.ai/challenge/birthquakes'

r = requests.get(url)
print('Status', r.status_code)

Markdown(r.text)

Status 200


# Birthquakes

We are going to look at earthquakes, on your birthdate. Birthquakes!

We will also be implementing the haversine formula for determining the distance between two points on the earth's surface.

This challenge is a bit different from the previous ones. You can use any old string for your key, as usual, but if you use a date, you'll get data for that date. For example:

      url = 'https://kata.geosci.ai/challenge/birthquakes'
      params = {'key': '1980-06-30'}  # <-- The key can be a date.
      r = requests.get(url, params)

Your challenge input is now `r.text`. There is a header row containing the names of the columns, plus a number of data rows or 'records'. Each row has 13 columns, and represents the data for a single earthquake.

You need to answer the following questions:

1. How many records (i.e. earthquakes) are there?
2. What is the depth **in metres** of the earthquake with the largest **Magnitude**?
3. What is the great circle distance **to the nearest km**, as given by the haversine formula, between the epicentres of the two **largest** earthquakes, as measured by magnitude?
4. Consider all pairs of events. How many pairs are within 100 km of each other?

Note that because we're asking about epicentres, you don't need to worry about depth when calculating great circle distances.

For Question 4, only count unique pairs. For example, in the diagram below there are 15 pairs of points altogether, of which 7 pairs have a mutual distance of <100 km (1 pair on the left and 6 on the right):

      
      x                  x
                            x
         x              x  x
            ==========
              100 km


## Haversine formula

There are several formulas for computing [great circle distance](https://en.wikipedia.org/wiki/Great-circle_distance) on a sphere. The simplest accurate one is the haversine formula, which is described here.

Given two points with (_latitude_, _longitude_), we'll denote point 1 with $(\varphi_1, \lambda_1)$ and point 2 with $(\varphi_2, \lambda_2)$. Then distance _d_ is related to radius _r_ by:

$$   d  = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1) \cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

Some hints about implementing this in Python:

- Use $r = 6371\ \mathrm{km}$ for the radius of the earth.
- $\sin^2(x)$ means $\sin(x) \times \sin(x)$.
- Both the `math` module and NumPy have the functions `sin()`, `cos()`; these functions expect radians, so an angle in degrees must be converted to radians with `radians()` before giving it to the function.
- The arcsine function in `math` is called `asin()`; in NumPy it's `arcsin()`.
- You should get the following results from your function:
  - The distance **to the nearest km** from (0, 0) to (0, 1) is 111 km.
  - The distance **to the nearest km** from (0, 2.35) to (90, 2.35) is 10008 km. [(Why?)](https://en.wikipedia.org/wiki/History_of_the_metre)
  - The distance **to the nearest km** from (44.65, -63.58) to (53.73, -1.86) is 4448 km.


## A quick reminder how this works

You can retrieve your data by choosing any date (or any old Python string to choose a random date) as a **`<KEY>`** and substituting here:
    
    https://kata.geosci.ai/challenge/birthquakes?key=<KEY>
                                                     ^^^^^
                                                     you can use a date, e.g. 2001-01-01

To answer question 1, make a request like:

    https://kata.geosci.ai/challenge/birthquakes?key=<KEY>&question=1&answer=123
                                                     ^^^^^          ^        ^^^
                                                     your key       Q        your answer

[Complete instructions at kata.geosci.ai](https://kata.geosci.ai/challenge)

----

© 2020 Agile Scientific, licensed CC-BY

## Load the input data

I found this problem to be a little buggy. I tried 'scibbatical' and '1982-12-25' as my_key, but my answer for question 4 came back as 'Incorrect' both times, even though I was fairly sure it was right. With 'huh' as the key, the same approach was accepted.

I think the trouble might be that some pairs of earthquakes are very close to 100 km apart (100.1... km). Maybe there is a rounding error somewhere, or a difference in the haversine calculation?

In [8]:
my_key = '1982-12-25' #'huh' # Not "scibbatical" this time!

params = {'key': my_key}

r = requests.get(url, params)

# Look at the first bit of the input:
r.text[:200]

'#EventID|Time|Latitude|Longitude|Depth/km|Author|Catalog|Contributor|ContributorID|MagType|Magnitude|MagAuthor|EventLocationName\nci500531|1982-12-25T23:44:43.810|36.109|-117.808|4.611|ci|ci|ci|ci50053'

### Parse the input

The lines are separated by '\n', and the table cells within each line are separated by '|'. I think using Pandas is a good idea ;)

In [9]:
import pandas as pd

In [10]:
# Carve off the headers from the first line
headers = r.text.split('\n')[0].split('|')

# Define the other rows
interim = r.text.split('\n')[1:]

# BUT! I noticed that the last record finishes with '\n', creating an empty row. Ignore it.
interim = interim[:-1]

In [11]:
# Now define a DataFrame for the earthquakes:
eqs = pd.DataFrame([x.split('|') for x in interim], columns=headers)

In [12]:
eqs.tail()

| | #EventID | Time | Latitude | Longitude | Depth/km | Author | Catalog | Contributor | ContributorID | MagType | Magnitude | MagAuthor | EventLocationName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 112 | hv19824966 | 1982-12-25T01:30:28.690 | 19.3863333 | -155.2813333 | 1.719 | hv | hv | hv | hv19824966 | ml | 2.07 | hv | Hawaii region, Hawaii |
| 113 | nc1083441 | 1982-12-25T01:04:54.950 | 37.6205 | -118.88 | 1.526 | nc | nc | nc | nc1083441 | md | 1.32 | nc | Long Valley area, California |
| 114 | ci500502 | 1982-12-25T01:03:00.830 | 33.777 | -115.966 | 5.58 | ci | ci | ci | ci500502 | mc | 1.81 | ci | 22km ENE of Coachella, CA |
| 115 | ci500501 | 1982-12-25T00:41:01.060 | 33.56 | -116.711 | 3.756 | ci | ci | ci | ci500501 | mc | 1.82 | ci | 4km W of Anza, CA |
| 116 | usp0001rq8 | 1982-12-25T00:02:27.790 | -15.392 | -73.648 | 99.1 | us | us | us | usp0001rq8 | mb | 4.9 | us | southern Peru |
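
As an alternative to splitting the text by hand, pandas can probably read the pipe-delimited text directly. This is only a sketch: `sample_text` is a stand-in for `r.text`, rebuilt from the header and two of the rows shown above, and `df` is a hypothetical name.

```python
import io

import pandas as pd

# Stand-in for r.text: the header row plus two records from the table above.
sample_text = (
    "#EventID|Time|Latitude|Longitude|Depth/km|Author|Catalog|Contributor|"
    "ContributorID|MagType|Magnitude|MagAuthor|EventLocationName\n"
    "ci500501|1982-12-25T00:41:01.060|33.56|-116.711|3.756|ci|ci|ci|"
    "ci500501|mc|1.82|ci|4km W of Anza, CA\n"
    "usp0001rq8|1982-12-25T00:02:27.790|-15.392|-73.648|99.1|us|us|us|"
    "usp0001rq8|mb|4.9|us|southern Peru\n"
)

# read_csv wants a file-like object, so wrap the string in StringIO.
# The trailing newline does not create an empty record here.
df = pd.read_csv(io.StringIO(sample_text), sep='|')

print(df.shape)
print(df[['Latitude', 'Longitude', 'Depth/km', 'Magnitude']].dtypes)
```

With this approach the numeric columns are parsed as floats on load, and empty cells become NaN by default, so the later type conversion would not be needed.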


## Question 1

_How many records (i.e. earthquakes) are there?_

In [13]:
answer1 = eqs.shape[0]

print('There are', answer1, 'earthquake records on', my_key, '.')

There are 117 earthquake records on 1982-12-25 .


Submit answer 1:

In [14]:
params = {'key': my_key,   # <--- must be the same key as before
          'question': 1,   # <--- which question you're answering
          'answer': answer1,  # <--- your answer to that question
         }

r = requests.get(url, params)

r.text

'Correct'
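
Every submission uses the same three parameters, so a small helper could cut down on the repetition. The sketch below only builds the URL (following the pattern in the challenge text) using the standard library; `submission_url` and `BASE_URL` are hypothetical names, not part of the challenge.

```python
from urllib.parse import urlencode

BASE_URL = 'https://kata.geosci.ai/challenge/birthquakes'

def submission_url(key, question, answer):
    """Build the URL that submits `answer` to `question` for a given `key`."""
    query = urlencode({'key': key, 'question': question, 'answer': answer})
    return f'{BASE_URL}?{query}'

example = submission_url('1982-12-25', 1, 117)
print(example)
```

The resulting string can be passed straight to `requests.get()`, or the same dictionary can be passed as `params`, as in the cells here.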

## Question 2

_What is the depth in metres of the earthquake with the largest Magnitude?_

Well, first of all, we need to convert the strings held in the Magnitude and Depth/km cells into numbers. And before we do that, we'll need to fill in some empty cells in the Magnitude column.

In [15]:
import numpy as np

# Replace empty cells with nan (index the DataFrame itself with .loc to avoid chained assignment)
eqs.loc[eqs.Magnitude == '', 'Magnitude'] = np.nan

# Now replace the strings in these columns with numbers 
eqs = eqs.astype({'Latitude': 'float',
            'Longitude': 'float',
            'Depth/km': 'float',
            'Magnitude': 'float'
           })
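
The two steps above (blank cells to NaN, then strings to floats) could probably be done in one pass with `pd.to_numeric`. A minimal sketch, using a small made-up frame: the empty Magnitude cell is hypothetical, and `raw`, `clean` and `numeric_cols` are illustrative names.

```python
import pandas as pd

# Tiny stand-in for the string-valued DataFrame; the '' cell is hypothetical.
raw = pd.DataFrame({'Depth/km': ['1.719', '99.1', '5.58'],
                    'Magnitude': ['2.07', '', '1.81']})

numeric_cols = ['Depth/km', 'Magnitude']
clean = raw.copy()

# errors='coerce' turns anything that cannot be parsed (such as '') into NaN.
clean[numeric_cols] = clean[numeric_cols].apply(pd.to_numeric, errors='coerce')

print(clean.dtypes)
print(clean)
```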

In [16]:
answer2 = eqs['Depth/km'].loc[eqs.Magnitude == eqs.Magnitude.max()].values[0] * 1000

print('The largest earthquake had a magnitude of', eqs.Magnitude.max(),
      ', and a depth of', answer2,'m.')

The largest earthquake had a magnitude of 5.9 , and a depth of 33000.0 m.
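
Another way to pick out the row with the largest magnitude is `idxmax()`, which skips NaN values by default. A sketch on the five records shown in `eqs.tail()` (so the result is for that subset only, not the full day); `sample`, `row` and `depth_m` are illustrative names.

```python
import pandas as pd

# Depth and magnitude of the five records displayed by eqs.tail() earlier.
sample = pd.DataFrame({
    'Depth/km': [1.719, 1.526, 5.58, 3.756, 99.1],
    'Magnitude': [2.07, 1.32, 1.81, 1.82, 4.9],
})

row = sample['Magnitude'].idxmax()            # index label of the largest magnitude
depth_m = sample.loc[row, 'Depth/km'] * 1000  # km to m

print(row, depth_m)
```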


Submit answer 2:

In [17]:
params = {'key': my_key,   # <--- must be the same key as before
          'question': 2,   # <--- which question you're answering
          'answer': answer2,  # <--- your answer to that question
         }

r = requests.get(url, params)

r.text

'Correct'

## Question 3

_What is the great circle distance to the nearest km, as given by the haversine formula, between the epicentres of the two largest earthquakes, as measured by magnitude?_

Okay, so I need to code up the haversine formula...? Okay, maybe I just copy it from https://gist.github.com/rochacbruno/2883505!

In [18]:
import math

def havers(origin, destination):
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371 # km - this copied function has the same radius suggested by Agile!

    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c

    return d

# Same calculation as havers(), but using the arcsine form given in the challenge text
def havers2(origin, destination):
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371 # km - this copied function has the same radius suggested by Agile!

    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * radius * math.asin(math.sqrt(a))

    return c
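
The challenge text lists three reference distances, which make a handy sanity check before using the function on real data. The sketch below is self-contained: `haversine_km` is a hypothetical stand-in that follows the arcsine form of the formula, and the same checks could be run against `havers` or `havers2`.

```python
import math

def haversine_km(p1, p2, radius=6371.0):
    """Great circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# (point 1, point 2, expected distance to the nearest km) from the challenge text.
checks = [
    ((0, 0), (0, 1), 111),
    ((0, 2.35), (90, 2.35), 10008),
    ((44.65, -63.58), (53.73, -1.86), 4448),
]

for p1, p2, expected in checks:
    assert round(haversine_km(p1, p2)) == expected

print('All reference distances match.')
```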

First, let's sort the DataFrame by Magnitude size and then find the distance between the first two entries:

In [19]:
big_locs = eqs.sort_values('Magnitude', ascending=False, axis=0)[['Latitude', 'Longitude']].iloc[:2].values

big_dist = round(havers(big_locs[0], big_locs[1]))

print('The distance between the two largest earthquakes is', big_dist,'km.')

The distance between the two largest earthquakes is 1023 km.
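
Sorting the whole table works, but `nlargest()` is a slightly more direct way to get the top two rows. A sketch on the five records shown in `eqs.tail()` (so the result is for that subset only); `sample`, `top2` and `coords` are illustrative names.

```python
import pandas as pd

# Coordinates and magnitudes of the five records displayed by eqs.tail() earlier.
sample = pd.DataFrame({
    'Latitude': [19.3863333, 37.6205, 33.777, 33.56, -15.392],
    'Longitude': [-155.2813333, -118.88, -115.966, -116.711, -73.648],
    'Magnitude': [2.07, 1.32, 1.81, 1.82, 4.9],
})

# The two rows with the largest magnitudes, largest first.
top2 = sample.nlargest(2, 'Magnitude')
coords = top2[['Latitude', 'Longitude']].to_numpy()

print(coords)
```

Each row of `coords` is a (latitude, longitude) pair that can be handed to the haversine function.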


Submit answer 3:

In [20]:
params = {'key': my_key,   # <--- must be the same key as before
          'question': 3,   # <--- which question you're answering
          'answer': big_dist,  # <--- your answer to that question
         }

r = requests.get(url, params)

r.text

'Correct'

## Question 4

_Consider all pairs of events. How many pairs are within 100 km of each other?_

We'll create an array to hold the distance values between each earthquake, and then populate it with the distance values.

In [21]:
# Create an array filled with a large placeholder distance (9999 km), so unfilled cells are never counted as close
eq_dist = np.full((answer1, answer1), 9999.0)

# Now calculate the distances, populating only the bottom half of the array (j < i)
coords = eqs[['Latitude', 'Longitude']].values
for i in range(answer1):
    for j in range(i):
        eq_dist[i, j] = havers(coords[i], coords[j])

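Now count how many distances are less than 100 km. I used a cutoff of 99.6 km rather than 100 km, which leaves out one borderline pair at about 99.65 km (inspected at the end of this notebook); with that cutoff the answer was accepted: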

In [29]:
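# Count the pairs below the cutoff (99.6 km, not 100 km, to exclude the borderline pair at ~99.65 km)
answer4 = int(np.count_nonzero(eq_dist < 99.6))

print(answer4, 'pairs of earthquakes occurred within 100 km of each other.')

633 pairs of earthquakes occurred within 100 km of each other.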
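
The double loop can also be written as a vectorized NumPy calculation, which may be quicker for days with many events. This is a sketch rather than the method used for the answer: `haversine_matrix` is a hypothetical helper, the coordinates are the five records shown in `eqs.tail()`, and the cutoff here is the nominal 100 km.

```python
import numpy as np

def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great circle distances (km) between all points, as an (n, n) array."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating point excursions outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Coordinates of the five records displayed by eqs.tail() earlier.
lats = [19.3863333, 37.6205, 33.777, 33.56, -15.392]
lons = [-155.2813333, -118.88, -115.966, -116.711, -73.648]

dist = haversine_matrix(lats, lons)

# Indices of the upper triangle, excluding the diagonal: each unique pair once.
i, j = np.triu_indices(len(lats), k=1)
n_pairs = int(np.sum(dist[i, j] < 100))

print(n_pairs, 'pair(s) within 100 km in this subset')
```

Using `np.triu_indices` avoids the need for a placeholder value, because only the unique pairs are ever examined.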


Submit answer 4:

In [97]:
params = {'key': my_key,   # <--- must be the same key as before
          'question': 4,   # <--- which question you're answering
          'answer': answer4,  # <--- your answer to that question
         }

r = requests.get(url, params)

r.text

'Correct! The next challenge is: https://kata.geosci.ai/challenge/fossil-hunting - good luck!'

The remaining cells take a closer look at the borderline pair: events 80 and 89 are about 99.7 km apart, just under the 100 km limit.

In [35]:
np.where(eq_dist.round(1) == 99.7)

(array([89]), array([80]))

In [31]:
eq_dist.round(1)

array([[9.9990e+03, 9.9990e+03, 9.9990e+03, ..., 9.9990e+03, 9.9990e+03,
        9.9990e+03],
       [8.0000e-01, 9.9990e+03, 9.9990e+03, ..., 9.9990e+03, 9.9990e+03,
        9.9990e+03],
       [1.2000e+00, 5.0000e-01, 9.9990e+03, ..., 9.9990e+03, 9.9990e+03,
        9.9990e+03],
       ...,
       [3.0890e+02, 3.0830e+02, 3.0780e+02, ..., 9.9990e+03, 9.9990e+03,
        9.9990e+03],
       [3.0060e+02, 2.9990e+02, 2.9940e+02, ..., 7.3000e+01, 9.9990e+03,
        9.9990e+03],
       [7.3692e+03, 7.3688e+03, 7.3683e+03, ..., 7.0694e+03, 7.1031e+03,
        9.9990e+03]])

In [38]:
eqs.iloc[[80,89]][['Latitude','Longitude']].values

array([[  33.47 , -116.589],
       [  34.297, -117.005]])

In [41]:
weird_locs = eqs.iloc[[80,89]][['Latitude','Longitude']].values

havers2(weird_locs[0], weird_locs[1])

99.65397699412607
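
One possible source of disagreement is the earth radius used. The sketch below recomputes the distance for the borderline pair with three radii; the WGS84 equatorial and polar values are my own additions for comparison, not something the challenge mentions, and `haversine_km` is a hypothetical stand-in for `havers2`.

```python
import math

def haversine_km(p1, p2, radius):
    """Great circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# The borderline pair (events 80 and 89) from the cells above.
pair = ((33.47, -116.589), (34.297, -117.005))

# Radii in km; only 6371 comes from the challenge text, the others are assumptions.
radii = {
    'mean, as suggested (6371)': 6371.0,
    'WGS84 equatorial (6378.137)': 6378.137,
    'WGS84 polar (6356.752)': 6356.752,
}

distances = {name: haversine_km(pair[0], pair[1], r) for name, r in radii.items()}

for name, d in distances.items():
    print(f'{name}: {d:.3f} km')
```

Under these assumptions the pair stays below 100 km for all three radii, so the choice of radius alone probably does not explain why the pair had to be excluded. How the server computes its answer is not something I can tell from here.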