In [1]:
import requests
from IPython.display import Markdown

import re
import itertools
import numpy as np

**Warning!** This is a solution. If you would rather work through these
[Agile Geosciences](https://agilescientific.com/blog/2020/4/16/geoscientist-challenge-thyself)
challenges on your own, please visit this
[Jupyter Notebook](https://colab.research.google.com/drive/1eP68NTV-GA3R-BYUh-CUxcgYDQ5IuetS)
to get started.


## Functions for URL requests
First a few functions to use along the way...

In [2]:
def get_data(url, key):
    params = {'key': key}  # use the argument, not the global my_key
    r = requests.get(url, params)
    return r.text

def get_question(url):
    r = requests.get(url)
    return r.text

def check_answer(questionNum,answer):
    # Note: relies on the notebook-level `url` and `my_key` set in later cells.
    params = {'key':my_key,
              'question':questionNum,
              'answer':answer
             }
    result = requests.get(url, params)
    return result.text
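
The helpers above hand their parameters to `requests.get`, which encodes them into the query string. As a rough illustration of what those URLs should look like, here is a minimal sketch that builds them with the standard library and makes no network call. The `preview_url` helper and the `'my-key'` value are hypothetical and are not part of the challenge or of `requests`.

```python
from urllib.parse import urlencode

BASE_URL = 'https://kata.geosci.ai/challenge/boreholes'


def preview_url(base_url, **params):
    """Return the URL a GET request with these query parameters would target."""
    return f'{base_url}?{urlencode(params)}'


# Hypothetical key, used only for illustration.
data_url = preview_url(BASE_URL, key='my-key')
check_url = preview_url(BASE_URL, key='my-key', question=1, answer=1234)

print(data_url)
print(check_url)
```

The second URL has the same shape as the answer-checking request described in the challenge text.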

## Request Challenge Description

In [3]:
url = 'https://kata.geosci.ai/challenge/boreholes' 
r = get_question(url)

Markdown(r)

# Boreholes

You have a list of boreholes. Each one has an (x, y) location. The locations are given as a Python string, and look like this:

    ..., (12.1, 34.3), (56.5, 78.7), (90.9, 12.1),...
    
Your data, when you receive it, will be longer than this.
    
We're going to analyse these locations. We need the answers to the following questions:

1. How many boreholes are there? We'll call this number _n_.
2. What's the distance, **to the nearest metre** between the first two boreholes in the list?
3. What is the mean straight-line distance between all pairs of boreholes **to the nearest metre**? Call this _m_.
4. There is a clump of boreholes. How many boreholes are in the clump? (A borehole is defined to be in a clump if the mean distance to its nearest _n_ / 5 neighbours is _m_ / 4 or less.)

Please note that all your answers must be integers. If you get a float for an answer, round it.


## Example

Here are the locations of some boreholes:

      (1, 4), (5, 4), (9, 3), (2, 8), (6, 4), (9, 9), (5, 5), (4, 3), (4, 5), (2, 1)
      
If we plot them, they look like this:

    y
    ^
    9 - - - - - - - - - 0
    8 - - 0 - - - - - - -
    7 - - - - - - - - - -
    6 - - - - - - - - - -
    5 - - - - 0 0 - - - -
    4 - 0 - - - 0 0 - - -
    3 - - - - 0 - - - - 0
    2 - - - - - - - - - -
    1 - - 0 - - - - - - -
    0 - - - - - - - - - -
      0 1 2 3 4 5 6 7 8 9 > x
    
Here's how we'd answer the questions for this small dataset:

- In this example, there are **10** wells (marked `0` on the plot above).
- The distance between the first two boreholes in the list, (1, 4) and (5, 4), is **4**.
- The mean distance between boreholes is 4.58... which to the nearest metre is **5**.
- There are **4** wells in the clump. See below.

Wells in the clump are marked `X` here (the borehole marked `O` does not meet the criterion):

    y
    ^
    9 - - - - - - - - - 0
    8 - - 0 - - - - - - -
    7 - - - - - - - - - -
    6 - - - - - - - - - -
    5 - - - - X X - - - -
    4 - 0 - - - X X - - -
    3 - - - - O - - - - 0
    2 - - - - - - - - - -
    1 - - 0 - - - - - - -
    0 - - - - - - - - - -
      0 1 2 3 4 5 6 7 8 9 > x


## A quick reminder how this works

You can retrieve your data by choosing any Python string as a **`<KEY>`** and substituting here:
    
    https://kata.geosci.ai/challenge/boreholes?key=<KEY>
                                                   ^^^^^
                                                   use your own string here

To answer question 1, make a request like:

    https://kata.geosci.ai/challenge/boreholes?key=<KEY>&question=1&answer=1234
                                                   ^^^^^          ^        ^^^^
                                                   your key       Q        your answer

[Complete instructions at kata.geosci.ai](https://kata.geosci.ai/challenge)

----

© 2020 Agile Scientific, licensed CC-BY

## My solution

Let's enter a seed phrase and get the data.

In [4]:
my_key = 'armstrys'

## Input
r = get_data(url, my_key)

r[:1000]

'(2269.98, 8363.1), (2849.86, 11984.2), (3755.78, 1237.35), (4519.45, 2194.88), (21136.84, 4414.17), (4036.98, 22688.91), (7320.03, 19218.4), (5886.19, 2726.28), (5170.13, 5276.42), (13195.08, 4954.25), (18490.19, 12699.52), (4442.62, 5062.02), (24969.76, 22470.84), (19755.23, 11811.29), (5854.66, 4121.88), (12132.53, 3167.46), (5745.32, 4277.12), (17631.58, 15221.84), (18708.8, 3330.9), (7065.91, 5432.68), (15431.77, 772.07), (3905.25, 3751.2), (3282.77, 1435.17), (6522.95, 3834.09), (24496.22, 246.77), (18258.67, 1412.23), (2233.48, 12986.24), (11693.8, 12318.72), (9507.82, 3981.35), (13953.97, 18100.89), (4599.58, 1144.23), (16913.75, 17182.38), (5577.41, 19368.72), (24923.87, 16183.78), (2929.49, 5521.66), (7728.52, 3436.43), (1945.08, 12396.35), (8369.58, 9317.44), (276.54, 1123.47), (23045.73, 17061.15), (2352.93, 24881.91), (5586.28, 7208.26), (4703.2, 22305.87), (18572.32, 12190.91), (13510.96, 21322.06), (16503.18, 554.36), (22402.0, 15547.56), (798.99, 4841.3), (21417.57, 108

## Formatting the string of data
Our text is conveniently written in Python-friendly syntax. We can evaluate it directly with `eval()` to form a tuple of coordinate pairs and then cast it to a NumPy array.

In [6]:
## read to array
boreholes = np.array(eval(r))

boreholes[:5,:]

array([[ 2269.98,  8363.1 ],
       [ 2849.86, 11984.2 ],
       [ 3755.78,  1237.35],
       [ 4519.45,  2194.88],
       [21136.84,  4414.17]])
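
Because `eval()` will execute whatever code the string contains, a more cautious option for text fetched from a server is `ast.literal_eval`, which only accepts Python literals. The sketch below is an alternative to the cell above, not what this notebook ran. It parses a short sample in the same format and shows that a string containing a function call is rejected.

```python
import ast

# A short sample in the same format as the challenge data.
sample = '(1, 4), (5, 4), (9, 3), (2, 8)'

# literal_eval parses numbers, tuples, lists and other literals only.
points = ast.literal_eval(sample)
print(points)

# Anything that is not a plain literal raises ValueError instead of running.
try:
    ast.literal_eval('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The resulting tuple of pairs can be passed to `np.array` in the same way as the output of `eval()`.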

## Question 1
How many boreholes do we have? Let's take the length.

In [9]:
answer1 = len(boreholes)
print(f'There are {answer1} boreholes.\n')

## Check
questionNum = 1
result = check_answer(questionNum,answer1)
print(f'Your answer is {result.lower()}!')

There are 555 boreholes.

Your answer is correct!


## Question 2
To get the distance between the first two boreholes we will create a simple function to calculate the distance given our boreholes list and two IDs.

In [13]:
def distance(boreholes,bID1,bID2):
    borehole1 = boreholes[bID1] 
    borehole2 = boreholes[bID2]
    d = np.sqrt(np.sum((borehole1 - borehole2)**2))
    return d

answer2 = int(round(distance(boreholes,0,1)))
print(f'The distance between the first two boreholes is ~{answer2} m.\n')

## Check
questionNum = 2
result = check_answer(questionNum,answer2)
print(f'Your answer is {result.lower()}!')

The distance between the first two boreholes is ~3667 m.

Your answer is correct!
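
As a sanity check on the distance calculation, the same Euclidean distance can be reproduced with `math.dist` from the standard library (Python 3.8+). This sketch uses the first two points of the challenge's small example and the first two boreholes shown in the data preview above. The variable names are illustrative only.

```python
import math

# First two boreholes in the challenge's small example.
example_d = math.dist((1, 4), (5, 4))

# First two boreholes from the data preview in this notebook.
data_d = math.dist((2269.98, 8363.1), (2849.86, 11984.2))

print(round(example_d), round(data_d))
```

Both rounded values should agree with the challenge text and with `answer2` respectively.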


## Question 3
Glad we created the function above! We will create a new function to loop through all combinations of borehole pairs and calculate a mean distance.

In [16]:

def mean_distance(boreholes):
    '''
    Calculates mean distance between boreholes.

    Args:
        boreholes (array): the coordinate array of the boreholes - our data.
    
    Returns:
        meanDist (float): mean distance of all well coordinates listed in boreholes.

    '''
    # Collect the distance for every unordered pair of borehole indices.
    distances = []
    for i, j in itertools.combinations(range(len(boreholes)), 2):
        distances.append(distance(boreholes, i, j))

    meanDist = np.mean(distances)

    return meanDist

answer3 = int(round(mean_distance(boreholes)))
print(f'The mean distance between boreholes is {answer3} m.\n')

## Check
questionNum = 3
result = check_answer(questionNum,answer3)
print(f'Your answer is {result.lower()}!')

The mean distance between boreholes is 12854 m.

Your answer is correct!
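
To see the pairwise averaging on something small enough to check by hand, here is a minimal sketch that applies the same idea to the ten example boreholes from the challenge description, using only the standard library. With 10 points there are 45 unordered pairs, and the challenge text gives the mean as 4.58..., which rounds to 5.

```python
import itertools
import math

# The ten example boreholes from the challenge description.
example = [(1, 4), (5, 4), (9, 3), (2, 8), (6, 4),
           (9, 9), (5, 5), (4, 3), (4, 5), (2, 1)]

# One distance per unordered pair of boreholes.
pair_distances = [math.dist(p, q)
                  for p, q in itertools.combinations(example, 2)]

m = sum(pair_distances) / len(pair_distances)
print(len(pair_distances), round(m))
```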


## Question 4
Time to find some clumps of boreholes! The method below is likely not the most efficient, but it will get the job done. For each borehole i, the function computes the distance to every other borehole j. We then sort those distances, take the mean of the nearest _n_ / 5 of them, and flag the well as being in a clump if that mean is within the _m_ / 4 threshold.

In [17]:
def flag_clump(boreholes, number, distance_thresh):
    '''
    Flag each well as in or out of a clump, based on the mean distance to
    its nearest neighboring wells.

    Args:
        boreholes (array): the coordinate array of the boreholes - our data.
        number (float): how many nearest neighbors to average over
            (rounded to an integer internally).
        distance_thresh (float): cutoff for the mean distance to those
            nearest neighbors.
    
    Returns:
        clump (array): a flag array with one entry per borehole. Flag is
            1.0 if the well is in a clump and 0.0 if it is not.
    '''
    clump = np.zeros(len(boreholes))
    for i in range(len(boreholes)):
        dx = np.zeros(len(boreholes))
        for j in range(len(boreholes)):
            if i==j:  # comparison to itself
                dx[j]=np.nan
            else:
                dx[j] = distance(boreholes,i,j)
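        dx.sort()  # NaN (the self-comparison) sorts to the end
        # The challenge says "or less", so the comparison is inclusive.
        clump[i] = (np.mean(dx[0:round(number)]) <= distance_thresh)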
    return clump

answer4 = int(sum(flag_clump(boreholes,answer1/5,answer3/4)))

print(f'There are {answer4} wells that are considered in a clump.\n')

## Check
questionNum = 4
result = check_answer(questionNum,answer4)
print(f'Your answer is {result.lower()}!')

There are 142 wells that are considered in a clump.

Your answer is correct! the next challenge is: https://kata.geosci.ai/challenge/sample-names - good luck!!
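
The clump rule can also be checked against the small example in the challenge description, where 4 wells are expected in the clump. The sketch below uses only the standard library and assumes, as the notebook does with `answer3`, that the rounded mean distance is the _m_ in the _m_ / 4 threshold. The `in_clump` helper is hypothetical.

```python
import itertools
import math

# The ten example boreholes from the challenge description.
example = [(1, 4), (5, 4), (9, 3), (2, 8), (6, 4),
           (9, 9), (5, 5), (4, 3), (4, 5), (2, 1)]

n = len(example)
pair_distances = [math.dist(p, q)
                  for p, q in itertools.combinations(example, 2)]
m = round(sum(pair_distances) / len(pair_distances))  # rounded mean distance

k = round(n / 5)    # number of nearest neighbours to average over
threshold = m / 4   # a well is in the clump if its mean is at or below this


def in_clump(i):
    """Hypothetical helper: apply the clump rule to the borehole at index i."""
    others = sorted(math.dist(example[i], q)
                    for j, q in enumerate(example) if j != i)
    return sum(others[:k]) / k <= threshold


clump = [example[i] for i in range(n) if in_clump(i)]
print(len(clump), clump)
```

With the unrounded mean the threshold would be slightly tighter, so the choice of rounded versus unrounded _m_ can change which wells qualify.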
