# Info from the web

**This notebook goes with [a blog post at Agile*](http://ageo.co/xlines02).**

We're going to get some info from a web service, and from Wikipedia. We'll make good use of [the `requests` library](http://docs.python-requests.org/en/master/), a nicely designed Python library for making web requests.

## Simple http server

Before we do anything else, let's have a look at running a very simple server right on your computer.

Let's say you want to share a file with someone on your local network.

You can use Python's [`http.server`](https://docs.python.org/3/library/http.server.html) module for this.

### 1. Open a terminal

Then change directory to the location from which you want to share files.

### 2. Find your IP address

Your local IP address is given by `ipconfig` on Windows, or `ifconfig` on Mac or Linux. On some Linux systems `ifconfig` is not installed by default; `ip addr` gives the same information.

### 3. Start server

Type:

    python -m http.server
    
(Note the use of the `-m` switch to tell Python to run a module in its path directly.)

### 4. Visit the IP address

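In a browser on another device on the same network, go to the host's IP address. Don't forget to add the port, `8000` by default, to the end of the address: for example `http://192.168.1.20:8000` if the host's address were `192.168.1.20`.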

**You can see the files on the host computer!**
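
The same thing can be done from inside Python. The following is a minimal sketch, not part of the original walkthrough: it serves a temporary directory (the file name `hello.txt` and its contents are made up for illustration) and then fetches the file back over HTTP. It binds to `127.0.0.1`, so unlike the command-line steps above it is only reachable from the same computer.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path

# Create a temporary directory holding one small file to share.
share_dir = tempfile.mkdtemp()
Path(share_dir, "hello.txt").write_text("Hello from the host!")

# Serve that directory. Port 0 asks the operating system for any free port;
# on the command line, `python -m http.server` defaults to port 8000.
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=share_dir
)
server = http.server.HTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Request the file, bypassing any configured proxy for this local address.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{port}/hello.txt") as resp:
    status = resp.status
    body = resp.read().decode()

# Stop the server and release the port.
server.shutdown()
server.server_close()

print(status, body)
```

Binding to `""` instead of `"127.0.0.1"` would make the server visible to other devices on the network, which is what the command-line version does by default.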

## Using the curvenam.es web API

[`curvenam.es`](http://curvenam.es) is a little web app for looking up curve mnemonics from LAS files.

Here's what [the demo request from the site](http://curvenam.es/lookup) looks like:

    http://curvenam.es/lookup?mnemonic=TEST&method=fuzzy&limit=5&maxdist=2
    
We split this into the URL, and the query parameters:

----
## A note about proxies

If you are in a corporate environment, you probably connect to the Internet through another computer called a 'proxy'. You will need the URL of this proxy; it might look like `https://proxy.my-company.net:8080`. Pass it to `requests` in a dictionary keyed by the scheme of the URLs you are requesting. The `curvenam.es` URL below starts with `http://`, so it needs an `http` entry:

    proxies = {'http': 'https://proxy.my-company.net:8080',
               'https': 'https://proxy.my-company.net:8080'}
    r = requests.get(url, proxies=proxies)
    
Each time you use `requests.get()` you will need to pass the `proxies` dictionary in this way.
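
One way to avoid repeating the argument is to set the proxies once on a `requests.Session`. This is a sketch only; the proxy address is the same placeholder as above, and nothing is actually requested here.

```python
import requests

# Placeholder proxy address; replace with your organization's proxy.
proxy_url = "https://proxy.my-company.net:8080"

# A session remembers settings across requests made through it.
session = requests.Session()
session.proxies.update({"http": proxy_url, "https": proxy_url})

# Later requests made via the session use the proxies automatically, e.g.
#     r = session.get(url, params=params)
print(session.proxies)
```

`requests` also reads the `HTTP_PROXY` and `HTTPS_PROXY` environment variables, which may already be set on a corporate machine.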

----

In [1]:
import requests

In [2]:
url = 'http://curvenam.es/lookup'

In [3]:
params = {'mnemonic': 'DT4P',
          'method': 'fuzzy',
          'limit': 1,
          'maxdist': 2
         }

In [4]:
r = requests.get(url, params=params)
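
To see how a URL and its query parameters relate, the demo URL from the site can be taken apart with the standard library's `urllib.parse`. This sketch is independent of `requests` and its variable names are illustrative; `requests` does the reverse, building the query string from the `params` dictionary.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

demo = "http://curvenam.es/lookup?mnemonic=TEST&method=fuzzy&limit=5&maxdist=2"

# Split the demo request into its components.
parts = urlsplit(demo)
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Turn the query string into a dictionary. Note that all values are strings.
query = dict(parse_qsl(parts.query))

# Going the other way: encode the dictionary and append it to the base URL.
rebuilt = base_url + "?" + urlencode(query)

print(base_url)
print(query)
print(rebuilt == demo)
```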

If we were successful, the server sends back a `200` status code:

In [5]:
r.status_code

200
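
Status codes are standardized, and Python's `http.HTTPStatus` enumeration gives the phrase that goes with each one. A small sketch, with the three codes chosen only as common examples:

```python
from http import HTTPStatus

# Look up the standard phrase for a few common status codes.
phrases = {code: HTTPStatus(code).phrase for code in (200, 404, 500)}

for code, phrase in phrases.items():
    print(code, phrase)
```

In `requests`, the response also has an `ok` attribute and a `raise_for_status()` method, which raises an exception for 4xx and 5xx codes.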

In [5]:
r.headers

{'Cache-Control': 'no-cache', 'Content-Type': 'application/vnd.api+json', 'X-Cloud-Trace-Context': '1271d066be86ef3e055ee404174e97ac;o=1', 'Date': 'Thu, 15 Nov 2018 14:16:31 GMT', 'Server': 'Google Frontend', 'Content-Length': '359'}

The body of the response is in its `text` attribute:

In [6]:
r.text

'{"mnemonic": "DT4P", "maxdist": "2", "limit": "1", "result": [{"mnemonic": "DT4P", "distance": 0, "curve": {"mnemonic": "DT4P", "model": "", "unittype": "AcousticSlowness", "description": "Delta-T Compressional - Monopole P&S", "units": "", "company": "Schlumberger", "type": "Curve", "method": "Wireline"}}], "time": 0.0072801113128662109, "method": "fuzzy"}'

There's a convenient `json()` method that parses the JSON text and returns it as a Python dictionary:

In [7]:
r.json()

{'limit': '1',
 'maxdist': '2',
 'method': 'fuzzy',
 'mnemonic': 'DT4P',
 'result': [{'curve': {'company': 'Schlumberger',
    'description': 'Delta-T Compressional - Monopole P&S',
    'method': 'Wireline',
    'mnemonic': 'DT4P',
    'model': '',
    'type': 'Curve',
    'units': '',
    'unittype': 'AcousticSlowness'},
   'distance': 0,
   'mnemonic': 'DT4P'}],
 'time': 0.007280111312866211}
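
Calling `r.json()` is roughly equivalent to passing `r.text` to `json.loads()`. The sketch below uses a trimmed copy of the response text shown above, so it runs without a network connection. It also shows that this service returns `limit` as a string, so it needs converting before use as a number.

```python
import json

# A trimmed copy of the response text from the lookup above.
text = (
    '{"mnemonic": "DT4P", "maxdist": "2", "limit": "1", '
    '"result": [{"mnemonic": "DT4P", "distance": 0, '
    '"curve": {"description": "Delta-T Compressional - Monopole P&S", '
    '"company": "Schlumberger"}}], '
    '"method": "fuzzy"}'
)

# Parse the JSON text into ordinary Python objects (dict, list, str, int).
data = json.loads(text)

# The service echoes 'limit' back as a string, so convert it explicitly.
limit = int(data["limit"])

print(type(data).__name__, limit)
```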

In [8]:
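try:
    print(r.json()['result'][0]['curve']['description'])
except (KeyError, IndexError, ValueError):
    # Missing key, empty result list, or a response that isn't valid JSON.
    print("No results")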

Delta-T Compressional - Monopole P&S
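
If the lookup is going to be repeated, the extraction can be wrapped in a small function. This is a sketch: `first_description` is a hypothetical helper, and it assumes that an unsuccessful lookup comes back with an empty or missing `result` list, which the original does not confirm.

```python
def first_description(payload):
    """Return the description of the first match, or None if there is none."""
    # Assumption: a failed lookup has an empty or missing 'result' list.
    results = payload.get("result") or []
    if not results:
        return None
    return results[0].get("curve", {}).get("description")


# Two hand-made payloads standing in for r.json().
hit = {"result": [{"curve": {"description": "Delta-T Compressional - Monopole P&S"}}]}
miss = {"result": []}

print(first_description(hit))
print(first_description(miss))
```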


----

## Scraping geological ages from Wikipedia

Sometimes there isn't a nice API and we have to get what we want from unstructured data. Let's use the task of getting geological ages from Wikipedia as an example.

We'll start with the Jurassic, then generalize.

In [9]:
url = "http://en.wikipedia.org/wiki/Jurassic"

I used `View Source` in my browser to figure out where the age range is on the page, and what it looks like. The most predictable spot, one that should work on every period's page, is the infobox. There the age is given as a range, in italic text, with "million years ago" right after it.

Try to find the same string here.

In [10]:
r = requests.get(url)

Now we have the entire text of the webpage, along with some metadata. The text is stored in `r.text`, and I happen to know that the piece of text we need contains "million years ago":

In [14]:
r.text.find('million years ago')

8856

In [15]:
r.text[8800:8900]

'87,231)"><b>Jurassic Period</b><br />\n<i>201.3–145&#160;million years ago</i><br />\n<div id="Timelin'

We can get at that bit of text using a [regular expression](https://docs.python.org/3/library/re.html):

In [16]:
import re

s = re.search(r'<i>(.+?million years ago)</i>', r.text)
text = s.group(1)
text

'201.3–145&#160;million years ago'
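
The `&#160;` in that string is an HTML character reference for a non-breaking space. If readable text is wanted, the standard library's `html.unescape()` can decode it. A minimal sketch, using the string from the output above:

```python
import html

text = '201.3–145&#160;million years ago'

# Decode HTML character references; &#160; becomes the character '\xa0'.
decoded = html.unescape(text)

# Swap the non-breaking space for an ordinary space.
clean = decoded.replace('\xa0', ' ')

print(clean)
```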

And if we're really cunning, we can get the start and end ages:

In [17]:
start, end = re.search(r'<i>([\.0-9]+)–([\.0-9]+)&#160;million years ago</i>', r.text).groups()
duration = float(start) - float(end)

print("According to Wikipedia, the Jurassic lasted {:.2f} Ma.".format(duration))

According to Wikipedia, the Jurassic lasted 56.30 Ma.


### Exercise

- Make a function to get the start and end ages of *any* geologic period, taking the name of the period as an argument.

In [19]:
def get_age(period):
    url =  "http://en.wikipedia.org/wiki/" + period
    r = requests.get(url)
    start, end = re.search(r'<i>([\.0-9]+)–([\.0-9]+)&#160;million years ago</i>', r.text).groups()
    return float(start), float(end)

You should be able to call your function like this:

In [24]:
get_age('Cretaceous')

(145.0, 66.0)
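
As written, `get_age()` fails with an `AttributeError` if the pattern is not found, because `re.search()` returns `None`. One possible hardening, sketched here, is to separate the parsing from the download and raise a clearer error. The name `parse_age_range` and the slightly more tolerant pattern are illustrative choices, and the snippet is taken from the Jurassic page text shown earlier.

```python
import re

# Accept an en dash or a hyphen between the ages, and either the &#160;
# reference or a whitespace character before "million years ago".
AGE_PATTERN = re.compile(
    r"<i>([.0-9]+)\s*[–-]\s*([.0-9]+)(?:&#160;|\s)million years ago</i>"
)


def parse_age_range(page_text):
    """Return (start, end) in millions of years ago, or raise ValueError."""
    match = AGE_PATTERN.search(page_text)
    if match is None:
        raise ValueError("no age range found in page text")
    start, end = match.groups()
    return float(start), float(end)


snippet = '<b>Jurassic Period</b><br />\n<i>201.3–145&#160;million years ago</i><br />'
start, end = parse_age_range(snippet)
print(start, end)
```

Keeping the parsing separate also means it can be tested on a saved snippet without a network connection.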

### Exercise

- Make a function that returns the sentence we made before, with the duration, calling the function you just wrote:

In [22]:
def duration(period):
    t0, t1 = get_age(period)
    # Use a different name for the result so it doesn't shadow the function.
    length = t0 - t1
    response = "According to Wikipedia, the {0} lasted {1:.2f} Ma.".format(period, length)
    return response

In [23]:
duration('Cretaceous')

'According to Wikipedia, the Cretaceous lasted 79.00 Ma.'

## Downloading an image file

In [None]:
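import requests

# URL of a PNG figure to download.
image = "https://virtualexplorer.com.au/article/2002/53/evolution-of-the-western-mediterranean/media/figure07.png"

r = requests.get(image)

# For binary data such as images, use the raw bytes in `content`, not `text`.
r.content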

In [None]:
import io

# Wrap the bytes in an in-memory, file-like object so other libraries can read them.
f = io.BytesIO(r.content)

In [None]:
import matplotlib.image as mpimg

# Decode the PNG into a NumPy array of pixel values.
img = mpimg.imread(f)

In [None]:
import matplotlib.pyplot as plt

plt.imshow(img)

## Get predictions from an ML model

In [None]:
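import requests

url = "https://geofignet.geosci.ai/api"

# `image` is the URL string defined in the previous section; the service
# is sent the URL of the image, not the image bytes.
params = {'image': image}

# Ask for the response as JSON.
headers = {"Accept": "application/json"}

response = requests.get(url, headers=headers, params=params)
response.json()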

<hr />

<div>
<img src="https://avatars1.githubusercontent.com/u/1692321?s=50"><p style="text-align:center">© Agile Geoscience 2016</p>
</div>