# Web Scraping Mini Task

**Author:** Ties de Kok ([Personal Website](http://www.tiesdekok.com))  
**Last updated:** 15 May 2018  
**Python version:** Python 3.6  
**License:** MIT License  

## *Introduction*

In this notebook I will provide you with "tasks" that you can try to solve.  

Most of what you need is discussed in the tutorial notebooks, the rest you will have to Google (which is an important exercise in itself).

## *Relevant notebooks*

1) [`0_python_basics.ipynb`](https://nbviewer.jupyter.org/github/TiesdeKok/LearnPythonforResearch/blob/master/0_python_basics.ipynb)  

2) [`2_handling_data.ipynb`](https://nbviewer.jupyter.org/github/TiesdeKok/LearnPythonforResearch/blob/master/2_handling_data.ipynb)  

3) [`4_web_scraping.ipynb`](https://nbviewer.jupyter.org/github/TiesdeKok/LearnPythonforResearch/blob/master/4_web_scraping.ipynb)  

## Web Scraping Mini Task

The goal of this mini-task is to get hands-on experience with gathering data from the Web using `Requests` and `Requests-HTML`.

The tasks below are split up into three sections:  

1. API tasks  

2. Web scraping tasks  

3. *Extra challenge:* HTTP requests

## Import required packages

```python
import requests
from requests_html import HTMLSession
```

## API Tasks

### Retrieve the current price of the "Dogecoin" cryptocurrency in Euros

You can use the cryptonator API: https://www.cryptonator.com/api

### Follow up: Create a function that retrieves the current price in Euros for a given cryptocurrency "ticker"

Make sure that it can handle invalid tickers and HTTP errors (*hint:* use `.status_code`)
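A minimal sketch covering both cryptonator tasks, assuming the ticker endpoint has the form `https://api.cryptonator.com/api/ticker/<base>-<target>` and returns JSON with a `success` flag and a `ticker.price` string (check the API page linked above; the service may have changed or been retired since this notebook was written). `TICKER_URL`, `parse_price`, and `get_price_eur` are illustrative names, not part of any library. Network failures, non-200 status codes, malformed JSON, and unknown tickers all map to `None`.

```python
import requests

# Assumed endpoint format: .../ticker/<base>-<target>, e.g. .../ticker/doge-eur
TICKER_URL = "https://api.cryptonator.com/api/ticker/{base}-{target}"


def parse_price(payload):
    """Return the price as a float, or None if the API reported a failure."""
    if not isinstance(payload, dict) or not payload.get("success"):
        return None  # e.g. {"success": false, "error": "Pair not found"}
    try:
        return float(payload["ticker"]["price"])
    except (KeyError, TypeError, ValueError):
        return None  # unexpected response shape


def get_price_eur(ticker, timeout=10):
    """Current price of `ticker` (e.g. "doge") in euros, or None on any error."""
    url = TICKER_URL.format(base=ticker.lower(), target="eur")
    try:
        response = requests.get(url, timeout=timeout)
    except requests.RequestException:
        return None  # DNS failure, timeout, refused connection, ...
    if response.status_code != 200:
        return None  # HTTP error such as 404 or 503
    try:
        payload = response.json()
    except ValueError:
        return None  # body was not valid JSON
    return parse_price(payload)
```

Usage: `get_price_eur("doge")` answers the Dogecoin question, while an invalid ticker such as `get_price_eur("not-a-coin")` should return `None` instead of raising. Keeping the JSON parsing in its own function makes it testable without sending any HTTP requests.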

### Write a function that takes an artist and song title and returns the lyrics

Use this API: http://docs.lyricsovh.apiary.io/#reference/0/lyrics-of-a-song/search
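One possible approach, assuming the lyrics.ovh endpoint is `https://api.lyrics.ovh/v1/<artist>/<title>` and returns JSON with a `lyrics` field (verify this against the documentation linked above). The artist and title are percent-encoded with `urllib.parse.quote` so that spaces or a `/` in a band name cannot break the URL path. `build_lyrics_url` and `get_lyrics` are hypothetical helper names.

```python
from urllib.parse import quote

import requests

# Assumed endpoint format; check the lyrics.ovh documentation.
LYRICS_URL = "https://api.lyrics.ovh/v1/{artist}/{title}"


def build_lyrics_url(artist, title):
    """Percent-encode both parts; safe="" also encodes "/" (e.g. "AC/DC")."""
    return LYRICS_URL.format(artist=quote(artist, safe=""), title=quote(title, safe=""))


def get_lyrics(artist, title, timeout=10):
    """Return the lyrics as a string, or None if they could not be retrieved."""
    try:
        response = requests.get(build_lyrics_url(artist, title), timeout=timeout)
    except requests.RequestException:
        return None  # network-level failure
    if response.status_code != 200:
        return None  # assumption: the API answers with an error status when nothing is found
    try:
        return response.json().get("lyrics")
    except (ValueError, AttributeError):
        return None  # not JSON, or not a JSON object
```

Usage: `print(get_lyrics("Queen", "Bohemian Rhapsody"))`.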

### Write a function that guesses a person's gender from their first name

Use this API: https://genderize.io/

**NOTE:** if you get a "too many requests" message, you have most likely hit the API's request limit rather than found it down; wait a while before trying again.
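A sketch using the `params` argument of `requests.get`, which builds the `?name=...` query string for you. It assumes genderize.io answers at `https://api.genderize.io` with JSON containing `gender` (which may be `null` for unknown names) and `probability`; `guess_gender` and `parse_gender` are illustrative names. HTTP status 429 is the standard "Too Many Requests" code and is assumed to be how the rate limit shows up.

```python
import requests

GENDERIZE_URL = "https://api.genderize.io"


def parse_gender(payload):
    """Return (gender, probability); gender is None when the name is unknown."""
    return payload.get("gender"), payload.get("probability")


def guess_gender(first_name, timeout=10):
    """Ask genderize.io for a guess; returns (gender, probability) or (None, None)."""
    try:
        response = requests.get(GENDERIZE_URL, params={"name": first_name}, timeout=timeout)
    except requests.RequestException:
        return None, None
    if response.status_code != 200:
        # Includes 429 Too Many Requests, i.e. the rate limit from the note above.
        return None, None
    try:
        return parse_gender(response.json())
    except (ValueError, AttributeError):
        return None, None
```

Usage: `guess_gender("peter")` returns a `(gender, probability)` tuple.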

## Web Scraping Task

Your goal is to create a dataset with details on all the University of Bristol faculty and staff members listed on the starting page below.

This page serves as the starting point: http://www.bristol.ac.uk/efm/people/allstaff.html

Recommendation: use `requests-html`

### Step 1: write a function that can extract information from a staff member's profile page

For example: http://www.bristol.ac.uk/efm/people/mark-a-clatworthy/overview.html

Retrieve the following details:  

1. URL of their profile picture  
2. Their department (based on their department link)
3. Latest publication

**Note:** Make sure it can handle people without publications, profile pictures, or department links.

*Hint 1:* this might be relevant: https://www.w3schools.com/cssref/sel_attribute_value_contains.asp
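The hint is about CSS selectors of the form `tag[attr*="value"]`, which match elements whose attribute *contains* a substring; with `requests-html` you would pass such a selector to `r.html.find(...)`. To keep this sketch runnable without extra packages, it swaps in the standard library's `html.parser` and re-implements the "contains" match by hand in `first_attr`. The markers (`"profile"`, `"/departments/"`, `"publication"`) are placeholders, not the real Bristol markup: inspect a profile page in your browser and replace them. The sketch also assumes the first element with a publication class is the latest publication. Anything missing comes back as `None`, which covers people without a picture, department link, or publications.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProfileParser(HTMLParser):
    """Records every start tag plus the text of the first element whose
    class attribute contains `text_marker`."""

    def __init__(self, text_marker):
        super().__init__()
        self.text_marker = text_marker
        self.tags = []         # (tag, attrs) pairs in document order
        self.marked_text = []  # text chunks inside the first marked element
        self._depth = 0        # > 0 while inside the marked element
        self._found = False

    def handle_starttag(self, tag, attrs):
        attrs = {name: value or "" for name, value in attrs}
        self.tags.append((tag, attrs))
        if tag in VOID_TAGS:
            return  # no end tag will arrive, so do not track depth
        if self._depth:
            self._depth += 1
        elif not self._found and self.text_marker in attrs.get("class", ""):
            self._found = True
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.marked_text.append(data.strip())


def first_attr(tags, tag, attr, contains):
    """Hand-rolled equivalent of the CSS selector tag[attr*="contains"]."""
    for name, attrs in tags:
        if name == tag and contains in attrs.get(attr, ""):
            return attrs[attr]
    return None


def extract_profile(html_text, page_url,
                    picture_marker="profile",           # placeholder
                    department_marker="/departments/",  # placeholder
                    publication_marker="publication"):  # placeholder
    """Return picture URL, department link and latest publication (None if missing)."""
    parser = ProfileParser(publication_marker)
    parser.feed(html_text)
    parser.close()
    picture = first_attr(parser.tags, "img", "src", picture_marker)
    department = first_attr(parser.tags, "a", "href", department_marker)
    return {
        "profile_url": page_url,
        "picture_url": urljoin(page_url, picture) if picture else None,
        "department_link": urljoin(page_url, department) if department else None,
        "latest_publication": " ".join(parser.marked_text) or None,
    }
```

Usage: `extract_profile(requests.get(url, timeout=10).text, url)` for a profile URL such as the example above. `urljoin` turns relative `src`/`href` values into absolute URLs.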

### Step 2: retrieve a list of all faculty and staff members

Save the following details:  
1. Name  
2. Job title  
3. Email  
4. Phone number  
5. **Link to their page**

Recommendation: make sure to end up with a pandas DataFrame so that you can easily save it to an Excel sheet!

*Hint 1:* this might be relevant: https://www.w3schools.com/cssref/sel_attribute_value_contains.asp
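Here the hint fits the contact links: e-mail and phone links typically carry `mailto:` and `tel:` in their `href`, so selectors like `a[href*="mailto:"]` can find them. Assuming you have already scraped one dict of raw values per staff member (how you collect those depends on the page's markup), the sketch below strips those prefixes, turns relative profile links into absolute URLs with `urljoin`, and returns a pandas DataFrame with one column per requested detail. The column names and `build_staff_frame` are hypothetical.

```python
from urllib.parse import urljoin

import pandas as pd

START_URL = "http://www.bristol.ac.uk/efm/people/allstaff.html"
COLUMNS = ["name", "job_title", "email", "phone", "profile_url"]


def strip_prefix(value, prefix):
    """Turn 'mailto:x@y' into 'x@y' (or 'tel:+44...' into '+44...')."""
    if value and value.startswith(prefix):
        return value[len(prefix):]
    return value


def build_staff_frame(rows, base_url=START_URL):
    """rows: one dict per staff member with the raw values scraped from the list page."""
    records = []
    for row in rows:
        link = row.get("profile_url")
        records.append({
            "name": row.get("name"),
            "job_title": row.get("job_title"),
            "email": strip_prefix(row.get("email"), "mailto:"),
            "phone": strip_prefix(row.get("phone"), "tel:"),
            "profile_url": urljoin(base_url, link) if link else None,
        })
    # Passing columns= keeps a fixed column order, even for an empty list.
    return pd.DataFrame(records, columns=COLUMNS)
```

Usage: `build_staff_frame(rows).to_excel("efm_staff.xlsx", index=False)` writes the sheet; this needs an Excel writer such as `openpyxl` to be installed.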

### Step 3: run the function from Step 1 on all the URLs gathered in Step 2

**Note:** if it takes a long time to run, you can also run it on just a small subset of the data from Step 2.
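A sketch of the loop, written against two callables so it stays independent of how you fetch and parse pages: `fetch_html(url)` returns the page source and `extract(html, url)` is your Step 1 function. `limit` implements the "small subset" suggestion, `delay` pauses between requests so you do not hammer the server, and a page that raises an exception is recorded with an `error` entry instead of stopping the whole run. All names here are illustrative.

```python
import time


def scrape_profiles(urls, fetch_html, extract, limit=None, delay=1.0):
    """Apply `extract(fetch_html(url), url)` to each URL and collect the dicts.

    limit: only process the first `limit` URLs (None means all of them).
    delay: seconds to wait between requests, to be polite to the server.
    """
    results = []
    selected = list(urls)[:limit]  # [:None] keeps everything
    for i, url in enumerate(selected):
        try:
            record = extract(fetch_html(url), url)
        except Exception as exc:  # keep going if a single page fails
            record = {"profile_url": url, "error": repr(exc)}
        results.append(record)
        if delay and i < len(selected) - 1:
            time.sleep(delay)
    return results
```

Usage (hypothetical names): `scrape_profiles(profile_urls, lambda u: requests.get(u, timeout=10).text, my_step1_function, limit=20)`, where `profile_urls` is the link column from Step 2. Drop `limit` once the test run looks right.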

#### Bonus task:

Add these details to the initial DataFrame that you created in Step 2.
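A minimal sketch, assuming both tables share the profile URL as a key (the `profile_url` column name is an assumption). A left merge keeps every staff member from Step 2, including people whose profile could not be scraped; their new columns are simply left empty (NaN). Duplicate profile rows are dropped first so that the merge cannot multiply staff rows.

```python
import pandas as pd


def add_profile_details(staff_df, details, key="profile_url"):
    """Left-join a list of per-profile dicts onto the staff table."""
    details_df = pd.DataFrame(details)
    if details_df.empty:
        return staff_df.copy()  # nothing scraped yet: return the table unchanged
    details_df = details_df.drop_duplicates(subset=key)
    return staff_df.merge(details_df, on=key, how="left")
```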

## Extra Challenge Task

The Bristol city council has a "Neighbourhood-search":

https://www.bristol.gov.uk/my-neighbourhood-search

Try to create a function that takes a search string and returns the points of interest for that search.  

Hint: look for words like "api" or "rest" in the results of the `NetworkSniffer` Chrome extension. 
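The hint can be partly automated: copy the request URLs that the extension (or your browser's developer tools) records while you use the search box, then filter them for "api" or "rest". The sketch below matches these as whole words, so that for example `restaurants` is not a false positive, and adds a generic helper for calling whatever JSON endpoint you find. The actual Bristol endpoint and its parameter names are not given here; `candidate_endpoints`, `query_endpoint`, and any `params` you pass are placeholders to fill in from your own network log.

```python
import re

import requests

# "api" or "rest" as a separate word: "/api/search" and "/rest/services" match,
# "restaurants" does not.
API_HINT = re.compile(r"\b(api|rest)\b", re.IGNORECASE)


def candidate_endpoints(request_urls):
    """Keep only the captured URLs that look like API calls."""
    return [url for url in request_urls if API_HINT.search(url)]


def query_endpoint(endpoint, params, timeout=10):
    """GET a JSON endpoint with query parameters; returns the decoded JSON or None."""
    try:
        response = requests.get(endpoint, params=params, timeout=timeout)
    except requests.RequestException:
        return None
    if response.status_code != 200:
        return None
    try:
        return response.json()
    except ValueError:
        return None  # the endpoint did not return JSON
```

Once you have identified the right endpoint, your final function wraps `query_endpoint` and picks the points of interest out of the returned JSON.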