In [1]:
import requests
from IPython.display import Markdown

import io
import itertools
import numpy as np
import pandas as pd

**Warning!** This is a solution. If you would rather attempt these
[Agile Geosciences](https://agilescientific.com/blog/2020/4/16/geoscientist-challenge-thyself)
challenges on your own, please visit this
[Jupyter Notebook](https://colab.research.google.com/drive/1eP68NTV-GA3R-BYUh-CUxcgYDQ5IuetS)
to get started.

In [2]:
def get_data(url, key):
    # Use the key passed in, not the notebook-level my_key
    params = {'key': key}
    r = requests.get(url, params)
    return r.text

def get_question(url):
    r = requests.get(url)
    return r.text

def check_answer(questionNum,answer):
    params = {'key':my_key,
              'question':questionNum,
              'answer':answer
             }
    result = requests.get(url, params)
    return result.text
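
Note that `check_answer` reads `url` and `my_key` from the notebook's global scope, so it only works once both have been defined in later cells. As a minimal sketch of what the check request looks like, the hypothetical helper `build_check_url` below (not part of the original notebook) assembles the same query string with the standard library and sends nothing:

```python
from urllib.parse import urlencode

def build_check_url(url, key, question, answer):
    """Return the URL used to check an answer. No request is sent."""
    # Hypothetical helper: every value is passed in explicitly instead of read from globals
    params = {'key': key, 'question': question, 'answer': answer}
    return f'{url}?{urlencode(params)}'

check_url = build_check_url('https://kata.geosci.ai/challenge/birthquakes', '2001-01-01', 1, 123)
print(check_url)
# https://kata.geosci.ai/challenge/birthquakes?key=2001-01-01&question=1&answer=123
```

Passing the URL and key as arguments keeps the helper usable before the later cells have run.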

## Request Challenge Description

In [3]:
url = 'https://kata.geosci.ai/challenge/birthquakes' 
r = get_question(url)

Markdown(r)

# Birthquakes

We are going to look at earthquakes, on your birthdate. Birthquakes!

We will also be implementing the haversine formula for determining the distance between two points on the earth's surface.

This challenge is a bit different from the previous ones. You can use any old string for your key, as usual, but if you use a date, you'll get data for that date. For example:

      url = 'https://kata.geosci.ai/challenge/birthquakes'
      params = {'key': '1980-06-30'}  # <-- The key can be a date.
      r = requests.get(url, params)

Your challenge input is now `r.text`. There is a header row containing the names of the columns, plus a number of data rows or 'records'. Each row has 13 columns, and represents the data for a single earthquake.

You need to answer the following questions:

1. How many records (i.e. earthquakes) are there?
2. What is the depth **in metres** of the earthquake with the largest **Magnitude**? (If there's more than one, give the deepest.)
3. What is the great circle distance **to the nearest km**, as given by the haversine formula, between the epicentres of the two **largest** earthquakes, as measured by magnitude?
4. Consider all pairs of events. How many pairs are less than 100 km of each other? (Exactly 100 km would **not** be included.)

Note that because we're asking about epicentres, you don't need to worry about depth when calculating great circle distances.

For Question 4, only count unique pairs. For example, in the diagram below there are 15 pairs of points altogether, of which there are 7 pairs with a mutual distance of < 100 km here &mdash; 1 pair on the left and 6 on the right:

      
      x                  x
                            x
         x              x  x
            ==========
              100 km


## Haversine formula

There are several formulas for computing [great circle distance](https://en.wikipedia.org/wiki/Great-circle_distance) on a sphere. The simplest accurate one is the haversine formula, which is described here.

Given two points with (_latitude_, _longitude_), we'll denote point 1 with $(\varphi_1, \lambda_1)$ and point 2 with $(\varphi_2, \lambda_2)$. Then distance _d_ is related to radius _r_ by:

$$   d  = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1) \cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

Some hints about implementing this in Python:

- Use $r = 6371\ \mathrm{km}$ for the radius of the earth.
- $\sin^2(x)$ means $\sin(x) \times \sin(x)$.
- Both the `math` module and NumPy have the functions `sin()`, `cos()`; these functions expect radians, so an angle in degrees must be converted to radians with `radians()` before giving it to the function.
- The arcsine function in `math` is called `asin()`; in NumPy it's `arcsin()`.
- The function should return distances **to the nearest km**.
- You should get the following results from your function:
  - The distance from (0, 0) to (0, 1) is 111 km.
  - The distance from (0, 2.35) to (90, 2.35) is 10008 km. [(Why?)](https://en.wikipedia.org/wiki/History_of_the_metre)
  - The distance from (44.65, -63.58) to (53.73, -1.86) is 4448 km.


## A quick reminder how this works

You can retrieve your data by choosing any date (or any old Python string to choose a random date) as a **`<KEY>`** and substituting here:
    
    https://kata.geosci.ai/challenge/birthquakes?key=<KEY>
                                                     ^^^^^
                                                     you can use a date, e.g. 2001-01-01

To answer question 1, make a request like:

    https://kata.geosci.ai/challenge/birthquakes?key=<KEY>&question=1&answer=123
                                                     ^^^^^          ^        ^^^
                                                     your key       Q        your answer

[Complete instructions at kata.geosci.ai](https://kata.geosci.ai/challenge)

----

© 2020 Agile Scientific, licensed CC-BY

## My solution

For this challenge the key doubles as a date: if it is entered in the same `YYYY-MM-DD` format as the one below, the data returned are for that date. Feel free to change it to check the consistency of the answers!

In [4]:
my_key = '1990-08-27'

## Input
r = get_data(url, my_key)

r[:1000]

'#EventID|Time|Latitude|Longitude|Depth/km|Author|Catalog|Contributor|ContributorID|MagType|Magnitude|MagAuthor|EventLocationName\nusp0004dzs|1990-08-27T23:59:48.120|59.71|-152.799|88.7|ags|us|us|usp0004dzs|||us|Southern Alaska\nuu50052925|1990-08-27T23:58:29.860|41.9908333|-111.9688333|0.23|uu|uu|uu|uu50052925|md|1.34|uu|Utah\nuu50052920|1990-08-27T23:56:41.670|41.993|-111.974|2.8|uu|uu|uu|uu50052920|md|1.47|uu|Utah\nak990aznfv39|1990-08-27T23:51:08.210|63.1598|-150.7097|129.9|ak|ak|ak|ak990aznfv39|ml|3|ak|66 km SE of Denali National Park, Alaska\nuu50052915|1990-08-27T23:45:28.950|41.991|-111.974|0.8|uu|uu|uu|uu50052915|md|1.19|uu|Utah\nnc1179798|1990-08-27T23:30:42.180|37.6746667|-118.8746667|3.006|nc|nc|nc|nc1179798|md|0.85|nc|Long Valley area, California\nci2003577|1990-08-27T23:30:27.480|33.236|-117.173|-0.781|ci|ci|ci|ci2003577|||ci|8km ENE of Vista, CA\nusp0004dzq|1990-08-27T23:27:34.350|-31.905|-72.189|33|us|us|us|usp0004dzq|||us|offshore Coquimbo, Chile\nusp0004dzp|1990-08-27

In [5]:
## Read the pipe-delimited text into pandas
data = pd.read_csv(io.StringIO(r), sep='|')

## Strip the comment character out of the headers
data.columns = [c.strip('#') for c in data.columns]
data.head()

Unnamed: 0,EventID,Time,Latitude,Longitude,Depth/km,Author,Catalog,Contributor,ContributorID,MagType,Magnitude,MagAuthor,EventLocationName
0,usp0004dzs,1990-08-27T23:59:48.120,59.71,-152.799,88.7,ags,us,us,usp0004dzs,,,us,Southern Alaska
1,uu50052925,1990-08-27T23:58:29.860,41.990833,-111.968833,0.23,uu,uu,uu,uu50052925,md,1.34,uu,Utah
2,uu50052920,1990-08-27T23:56:41.670,41.993,-111.974,2.8,uu,uu,uu,uu50052920,md,1.47,uu,Utah
3,ak990aznfv39,1990-08-27T23:51:08.210,63.1598,-150.7097,129.9,ak,ak,ak,ak990aznfv39,ml,3.0,ak,"66 km SE of Denali National Park, Alaska"
4,uu50052915,1990-08-27T23:45:28.950,41.991,-111.974,0.8,uu,uu,uu,uu50052915,md,1.19,uu,Utah
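
The same parsing step can be sketched without pandas, which makes the structure of the response easier to see. This is only an illustration using the standard `csv` module, not the approach the notebook takes; the sample text is the header and first two rows of the raw response shown above.

```python
import csv
import io

# Header plus the first two data rows of the pipe-delimited response
sample = (
    '#EventID|Time|Latitude|Longitude|Depth/km|Author|Catalog|Contributor|'
    'ContributorID|MagType|Magnitude|MagAuthor|EventLocationName\n'
    'usp0004dzs|1990-08-27T23:59:48.120|59.71|-152.799|88.7|ags|us|us|usp0004dzs|||us|Southern Alaska\n'
    'uu50052925|1990-08-27T23:58:29.860|41.9908333|-111.9688333|0.23|uu|uu|uu|uu50052925|md|1.34|uu|Utah\n'
)

reader = csv.reader(io.StringIO(sample), delimiter='|')
header = [name.lstrip('#') for name in next(reader)]  # drop the leading comment character
records = [dict(zip(header, row)) for row in reader]  # one dict per earthquake

print(len(header))                     # 13 columns
print(len(records))                    # 2 records
print(repr(records[0]['Magnitude']))   # '' because the magnitude is missing for this event
print(float(records[1]['Depth/km']))   # 0.23
```

Missing magnitudes arrive as empty strings here, whereas `pd.read_csv` converts them to `NaN`; either way they need to be handled before ranking events by magnitude.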


## Question 1
How many records do we have?

In [10]:
answer1 = len(data)
print(f'There are {answer1} samples.\n')

## Check
questionNum = 1
result = check_answer(questionNum,answer1)
print(f'Your answer is {result.lower()}!')

There are 171 samples.

Your answer is correct!


## Question 2
What is the depth, in metres, of the earthquake with the largest magnitude? If several share the largest magnitude, we want the deepest.

In [11]:
## Sort by magnitude and depth to make sure we match the right criteria
data.sort_values(by=['Magnitude','Depth/km'], ascending=False, inplace=True)

## Pull the top record and get the depth in meters.
depth = data.iloc[0]['Depth/km']*1e3
answer2 = round(depth)
print(f'The depth of the largest earthquake was {answer2} m.\n')

## Check
questionNum = 2
result = check_answer(questionNum,answer2)
print(f'Your answer is {result.lower()}!')

The depth of the largest earthquake was 506400 m.

Your answer is correct!
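
Sorting on both columns in descending order puts the largest magnitude first and, among ties, the deepest event first; by default pandas places rows with a missing magnitude last. As a rough sketch of the same selection rule in plain Python, with made-up events rather than the challenge data:

```python
# Hypothetical events for illustration only; None marks a missing magnitude
events = [
    {'Magnitude': 4.1, 'Depth/km': 10.0},
    {'Magnitude': 5.6, 'Depth/km': 33.0},
    {'Magnitude': None, 'Depth/km': 88.7},
    {'Magnitude': 5.6, 'Depth/km': 120.5},
]

# Drop events without a magnitude, then rank by magnitude with depth as the tie-breaker
rated = [e for e in events if e['Magnitude'] is not None]
largest = max(rated, key=lambda e: (e['Magnitude'], e['Depth/km']))

depth_m = round(largest['Depth/km'] * 1000)  # convert km to m
print(depth_m)  # 120500
```

The tuple key compares magnitude first and only falls back to depth when magnitudes are equal, which mirrors the two-column sort.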


## Question 3
Now we will define the haversine formula. We then use it to calculate the distance between the two largest earthquakes, taken from the dataframe we sorted in Question 2.

In [15]:
def haversine(loc1, loc2, r=6371.0):
    '''
    Takes lat/lons in degrees and gives the distance between two points in km.
    '''
    lat1, lon1 = np.radians(loc1.astype(float))
    lat2, lon2 = np.radians(loc2.astype(float))
    a = np.sin((lat2-lat1)/2)**2
    b = np.cos(lat1)*np.cos(lat2)*np.sin((lon2-lon1)/2)**2
    d = 2*r*np.arcsin(np.sqrt(a+b))
    return round(d)

loc1 = data.iloc[0][['Latitude','Longitude']].values
loc2 = data.iloc[1][['Latitude','Longitude']].values
answer3 = int(haversine(loc1, loc2))

print(f'The two largest earthquakes were {answer3} km apart.\n')

## Check
questionNum = 3
result = check_answer(questionNum,answer3)
print(f'Your answer is {result.lower()}!')

Your answer is correct!
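
The challenge description lists three reference distances that a haversine implementation should reproduce. A quick sanity check against those values might look like the sketch below; `haversine_km` is a standalone variant written with the `math` module for illustration, not the NumPy function defined above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance, rounded to the nearest km. Inputs are in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = math.sin((phi2 - phi1) / 2) ** 2
    b = math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    return round(2 * r * math.asin(math.sqrt(a + b)))

# Reference values quoted in the challenge hints
print(haversine_km(0, 0, 0, 1))                  # 111
print(haversine_km(0, 2.35, 90, 2.35))           # 10008
print(haversine_km(44.65, -63.58, 53.73, -1.86)) # 4448
```

Running a check like this before submitting helps separate formula errors from data-handling errors.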


## Question 4
Lastly, let's check how many pairs of earthquakes were within 100 km of each other on this day.

In [16]:
count = 0
comb = itertools.combinations(data.loc[:,['Latitude','Longitude']].values,2)

for loc1, loc2 in comb:
    if haversine(loc1,loc2) < 100:
        count += 1

answer4 = int(count)

print(f'{answer4} pairs of earthquakes were within 100 km of each other.\n')

## Check
questionNum = 4
result = check_answer(questionNum,answer4)
print(f'Your answer is {result.lower()}!')

741 pairs of earthquakes were within 100 km of each other.

Your answer is correct! the next challenge is: https://kata.geosci.ai/challenge/fossil-hunting - good luck!!
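
For reference, `itertools.combinations` yields each unordered pair exactly once, so n records give n(n-1)/2 pairs; with 171 records that is 14,535 distance calculations. The sketch below illustrates the counting logic on a few made-up points along the equator. It assumes, as the notebook does, that distances are rounded to the nearest km before being compared with 100; `haversine_km` is a hypothetical standalone helper.

```python
import itertools
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance, rounded to the nearest km. Inputs are in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = math.sin((phi2 - phi1) / 2) ** 2
    b = math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    return round(2 * r * math.asin(math.sqrt(a + b)))

# Hypothetical (lat, lon) points: two close clusters roughly 500 km apart
points = [(0.0, 0.0), (0.0, 0.5), (0.0, 5.0), (0.0, 5.2)]

pairs = list(itertools.combinations(points, 2))
close = sum(1 for p, q in pairs if haversine_km(*p, *q) < 100)

print(len(pairs), math.comb(len(points), 2))  # 6 6
print(close)                                  # 2
print(math.comb(171, 2))                      # 14535
```

At this size the plain loop is fast enough; a vectorised distance matrix would only start to matter for much larger catalogues.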
