In [1]:
import requests
from IPython.display import Markdown

import re
import itertools

**Warning!** This is a solution. If you are looking to do these
[Agile Geosciences](https://agilescientific.com/blog/2020/4/16/geoscientist-challenge-thyself)
challenges on your own then please visit this
[Jupyter Notebook](https://colab.research.google.com/drive/1eP68NTV-GA3R-BYUh-CUxcgYDQ5IuetS)
to get started.


## Functions for URL requests
First a few functions to use along the way...

In [2]:
def get_data(url, key):
    # use the key passed in by the caller rather than a global
    params = {'key':key}
    r = requests.get(url, params)
    return r.text

def get_question(url):
    r = requests.get(url)
    return r.text

def check_answer(questionNum,answer):
    # relies on the notebook-level `url` and `my_key` defined in later cells
    params = {'key':my_key,
              'question':questionNum,
              'answer':answer
             }
    result = requests.get(url, params)
    return Markdown(result.text)
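
To make the request format concrete, here is a minimal offline sketch of roughly the URL that `check_answer` ends up requesting. It uses the standard library's `urllib.parse.urlencode` instead of `requests`, so nothing is sent over the network. The helper name `build_check_url` and the key `example-key` are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

def build_check_url(base_url, key, question, answer):
    """Hypothetical helper: build the answer-checking URL without sending it."""
    query = urlencode({'key': key, 'question': question, 'answer': answer})
    return f'{base_url}?{query}'

# 'example-key' is a placeholder; any string of your own would work here
example_url = build_check_url(
    'https://kata.geosci.ai/challenge/sequence', 'example-key', 1, 1234
)
print(example_url)
```

The printed URL has the same `key`, `question` and `answer` parameters described in the challenge instructions.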

## Request Challenge Description

In [3]:
url = 'https://kata.geosci.ai/challenge/sequence' 
r = get_question(url)

Markdown(r)

# Sequence

You have a string of lithology codes, reading from the **bottom up** of a geological section. There is a sample every metre. There are three lithologies:

- **M**udstone
- **F**ine sandstone or siltstone
- **S**andstone

The strings look like this:

      ...MFFSSFSSSS...

Your data, when you receive it, will be much longer than this.

We need to get some geological information from this string of codes. Specifically, you need to answer 3 questions:

1. What is the total thickess in metres of sandstone (`S`)? Each sample represents one metre.
2. How many sandstone beds are there? A bed is a contiguous group of one lithology, so `MMFFF` is 2 beds, one of `M` and one of `F`.
3. How many times does the most common *upwards* bed transition occur? Do not include transitions from a lithology to itself.

Remember that the sequence is given to you from the bottom up. So an upwards transition is equivalent to a transition to the right.


## Example

Here is some example input:

      SSMMFFFFFFFFSSMFFSSFSSSSFMFSSSSFFSSFFFMM
      ^^          ^^   ^^ ^^^^   ^^^^  ^^

And the answers to the 3 questions:

- In this example, the total thickess of sandstone is 16 m. So the required answer is: **16**
- There are 6 sandstone beds in the sequence (marked above). The answer is: **6**
- The most common bed transition is `F` to `S`, which occurs 5 times. So the answer is: **5**


## A quick reminder how this works

You can retrieve your data by choosing any Python string as a **`<KEY>`** and substituting here:
    
    https://kata.geosci.ai/challenge/sequence?key=<KEY>
                                                  ^^^^^
                                                  use your own string here

To answer question 1, make a request like:

    https://kata.geosci.ai/challenge/sequence?key=<KEY>&question=1&answer=1234
                                                  ^^^^^          ^        ^^^^
                                                  your key       Q        your answer

[Complete instructions at kata.geosci.ai](https://kata.geosci.ai/challenge)

----

© 2020 Agile Scientific, licensed CC-BY

## My solution

Let's enter a seed phrase and get the data.

In [4]:
my_key = 'armstrys'

## Input
r = get_data(url, my_key)

r[:400]

'FFSMMFFMMMMMMFFFMFFFFMMFFFFMMFSFFSMFFMFFFSSSFFFSMMMMFMFFFFFFSSFFFFFFSMMMMMMMFFFMMFSSMFSMFSFMMFFSFFSSFFMMMFFSMMMFSSFFFFFSSMMMMMMMFFFFFFFSMFFSSMMFFSMMMMMMMFMMSSSMMFFFSSSSSSSSSSSFFFMFFMFFSSFSSMFSSMMSSSSSFFFFFFSSMSSMMMSFFMMMMMFFFMMMMFSMFSFSSFFFFMMFFMFFFFSFFFSSSFFSSMSSSFFFMSMMMMMFFSMMMMFMFSMFFSSFMMFFFFFFMFFSSSSSMMFFFMFSSMMSMFMMMMMMFFFMMMFMFFFFSSSSMFFMMMMMFFFSSSSSFSSSMMFFMMMMMFSMMFFFFSFMMMFFSSSFFFFFSMMM'

We should be able to do most of what we need directly with the string. No real processing needed yet.

In [5]:
sequence = r
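
If you want a safety net before counting anything, a small sanity check is one option. The sketch below assumes the response might carry stray whitespace or unexpected characters, which is an assumption and not something observed in this data. `clean_sequence` is a hypothetical helper, shown here on the example string from the challenge text.

```python
def clean_sequence(raw, allowed='MFS'):
    """Hypothetical helper: strip whitespace and reject unexpected codes."""
    cleaned = raw.strip()
    unexpected = set(cleaned) - set(allowed)
    if unexpected:
        raise ValueError(f'Unexpected lithology codes: {sorted(unexpected)}')
    return cleaned

# Example string from the challenge description, with a trailing newline added
example = clean_sequence('SSMMFFFFFFFFSSMFFSSFSSSSFMFSSSSFFSSFFFMM\n')
print(len(example))
```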

## Question 1
We want to get the total thickness of the sandstone facies 'S'. Luckily, Python strings already have a method for that: `str.count`. We just need to count the number of occurrences and multiply by the sample interval, 1 meter. Change `lith_thick` to see the other facies thicknesses.

In [9]:
## Facies to count thickness of
lith_thick = 'S'

## calculate thickness
answer1 = sequence.count(lith_thick) # count occurrences in the string

Markdown(f'Total thickness of facies {lith_thick} is **{answer1}** m.\n')


Total thickness of facies S is **5712** m.


In [11]:
## Check
questionNum = 1
check_answer(questionNum,answer1)

Correct
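
As an alternative sketch, `collections.Counter` gives the thickness of every facies in one pass. It is run here on the example string from the challenge description, where the expected sandstone thickness is 16 m.

```python
from collections import Counter

# Example string from the challenge description (one sample per metre)
example = 'SSMMFFFFFFFFSSMFFSSFSSSSFMFSSSSFFSSFFFMM'

# Each key is a lithology code, each value its total thickness in metres
thickness = Counter(example)
print(thickness['S'])  # 16, matching the worked example
```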

## Question 2
We want to count the total number of beds of a given facies, in this case, sandstone, 'S'. We can do this by using a regex pattern that matches each run of at least `thresh` consecutive samples. Try changing `thresh` and `lith_count` to see how counts change.

In [12]:
thresh = 1
lith_count = 'S'

## find runs of at least `thresh` consecutive samples of lith and count them
regex_pattern = lith_count+'{'+str(thresh)+',}'
answer2 = re.subn(regex_pattern,'',sequence)[1]

Markdown(f'There are **{answer2}** beds of facies {lith_count} with a sample threshold of {thresh}.\n')

There are **2290** beds of facies S with a sample threshold of 1.


In [14]:
## Check
questionNum = 2
check_answer(questionNum,answer2)


Correct
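
The same bed count can be sketched without regex by grouping consecutive identical codes with `itertools.groupby`. The function `count_beds` is a hypothetical name used only for this illustration. On the example string from the challenge it should find the 6 sandstone beds.

```python
from itertools import groupby

# Example string from the challenge description
example = 'SSMMFFFFFFFFSSMFFSSFSSSSFMFSSSSFFSSFFFMM'

def count_beds(sequence, lith, thresh=1):
    """Hypothetical helper: count beds of `lith` at least `thresh` samples thick."""
    # groupby yields one (code, run) pair per contiguous bed
    run_lengths = [len(list(run)) for code, run in groupby(sequence) if code == lith]
    return sum(1 for n in run_lengths if n >= thresh)

print(count_beds(example, 'S'))            # 6, matching the worked example
print(count_beds(example, 'S', thresh=3))  # 2, only the two 4 m beds remain
```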

## Question 3
To find the most common facies transition, we build a dictionary that counts every facies transition and then take the maximum. We iterate over all ordered pairs (permutations) of the distinct facies in the sequence and use regex to count every occurrence of that facies change. For example, a sand to mud (`S` -> `M`) transition is counted by finding all occurrences of `SM` in our sequence. Because the permutations never pair a facies with itself, transitions from a lithology to itself are excluded.

In [24]:
facies = list(set(sequence))

transition_dict = {}
for transition in itertools.permutations(facies,2):
    regex_pattern = ''.join(transition)
    transition_dict[regex_pattern] = re.subn(regex_pattern,'',sequence)[1]

answer3 = max(transition_dict.values())
answer3ID = max(transition_dict, key=transition_dict.get)

print('Lithology transitions:',transition_dict,'\n')
Markdown(f'The most common transition is {answer3ID} and occurs **{answer3}** times!\n')


Lithology transitions: {'SF': 979, 'SM': 1311, 'FS': 1910, 'FM': 1347, 'MS': 380, 'MF': 2277} 



The most common transition is MF and occurs **2277** times!


In [26]:
## Check
questionNum = 3
check_answer(questionNum,answer3)

Correct! The next challenge is: https://kata.geosci.ai/challenge/boreholes - good luck!
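
For comparison, here is a sketch of the transition count that needs neither regex nor permutations. It pairs each sample with the one above it and tallies the pairs that differ using `collections.Counter`. On the example string from the challenge, `F` to `S` should come out on top with 5 occurrences.

```python
from collections import Counter

# Example string from the challenge description (bottom of section first)
example = 'SSMMFFFFFFFFSSMFFSSFSSSSFMFSSSSFFSSFFFMM'

# Pair every sample with its upward neighbour and skip same-lithology pairs
transitions = Counter(
    lower + upper
    for lower, upper in zip(example, example[1:])
    if lower != upper
)

top_transition, top_count = transitions.most_common(1)[0]
print(top_transition, top_count)  # FS 5, matching the worked example
```

A transition that never occurs, such as `MS` in this example, is simply absent from the counter rather than stored with a zero.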