![UKDS Logo](./images/UKDS_Logos_Col_Grey_300dpi.png)

# Collecting Data II: APIs

Welcome to the <a href="https://ukdataservice.ac.uk/" target=_blank>UK Data Service</a> training series on *New Forms of Data for Social Science Research*. This series guides you through some of the most common and valuable new sources of data available for social science research: data collected from websites and social media platforms, text data, and simulations (agent-based modelling), to name a few. To help you get to grips with these new forms of data, we provide webinars, interactive notebooks containing live programming code, reading lists and more.

* To access training materials for the entire series: <a href="https://github.com/UKDataServiceOpen/new-forms-of-data" target=_blank>[Training Materials]</a>

* To keep up to date with upcoming and past training events: <a href="https://ukdataservice.ac.uk/news-and-events/events" target=_blank>[Events]</a>

* To get in contact with feedback, ideas or to seek assistance: <a href="https://ukdataservice.ac.uk/help.aspx" target=_blank>[Help]</a>

<a href="https://www.research.manchester.ac.uk/portal/julia.kasmire.html" target=_blank>Dr Julia Kasmire</a> and <a href="https://www.research.manchester.ac.uk/portal/diarmuid.mcdonnell.html" target=_blank>Dr Diarmuid McDonnell</a> <br />
UK Data Service  <br />
University of Manchester <br />
May 2020

## Introduction

Computational methods for collecting, cleaning and analysing data are an increasingly important component of a social scientist’s toolkit. Central to engaging in these methods is the ability to write readable and effective code using a programming language.

In this training series we demonstrate core programming concepts and methods through the use of social science examples. In particular we focus on four areas of programming/computational social science:
1. Introduction to Python.
2. Collecting data I: web-scraping. 
3. Collecting data II: APIs. [Focus of this notebook]
4. Setting up your computational environment.

### Aims

This lesson - **Collecting data II: APIs** - has two aims:
1. Demonstrate how to use Python to download data from the web through an Application Programming Interface (API).
2. Cultivate your computational thinking skills through coding examples. In particular, how to define and solve a data collection problem using a computational method.

### Lesson details

* **Level**: Introductory
* **Time**: 30-60 minutes
* **Pre-requisites**: None, though you may find it useful to work through our <a href="https://github.com/UKDataServiceOpen/code-demos/blob/master/code/ukds-intro-to-python-2020-05-06.ipynb" target=_blank>*Introduction to Python for social scientists*</a> and <a href="https://github.com/UKDataServiceOpen/code-demos/blob/master/code/ukds-web-scraping-2020-05-13.ipynb" target=_blank>*Collecting data I: web-scraping*</a>  lessons first.
* **Audience**: Researchers and analysts from any disciplinary background. The materials are slightly tailored for social scientists through the use of social data.
* **Learning outcomes**:
    1. Understand what an Application Programming Interface (API) is.
    2. Understand the key steps and requirements for collecting data from the web through an API.
    3. Be able to use Python for requesting, processing and saving data accessed through an API.

## Guide to using this resource

This learning resource was built using <a href="https://jupyter.org/" target=_blank>Jupyter Notebook</a>, an open-source software application that allows you to mix code, results and narrative in a single document. As <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>Barba et al. (2019)</a> espouse:
> In a world where every subject matter can have a data-supported treatment, where computational devices are omnipresent and pervasive, the union of natural language and computation creates compelling communication and learning opportunities.

If you are familiar with Jupyter notebooks then skip ahead to the main content (*What is an API?*). Otherwise, the following is a quick guide to navigating and interacting with the notebook.

### Interaction

**You only need to execute the code that is contained in sections which are marked by `In []`.**

To execute a cell, click or double-click the cell and press the `Run` button on the top toolbar (you can also use the keyboard shortcut Shift + Enter).

Try it for yourself:

In [None]:
print("Enter your name and press enter:")
name = input()
print("\r")
print("Hello {}, enjoy learning more about Python and APIs!".format(name))

### Learn more

Jupyter notebooks provide rich, flexible features for conducting and documenting your data analysis workflow. To learn more about additional notebook features, we recommend working through some of the <a href="https://github.com/darribas/gds19/blob/master/content/labs/lab_00.ipynb" target=_blank>materials</a> provided by Dani Arribas-Bel at the University of Liverpool. 

## What is an API?

An Application Programming Interface (API) is
> "a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service" (Oxford English Dictionary).

In essence: an API acts as an intermediary between software applications. Think of an API's role as similar to that of a translator facilitating a conversation between two individuals who do not speak the same language. Neither individual needs to know the other's language, just how to formulate their response in a way the translator can understand. Similarly, an API **simplifies** how applications communicate with each other.

It performs this role by providing a set of protocols/standards for making *requests* and formulating *responses* between applications. For example, a smart phone application might need real-time traffic data from an online database. An API can validate the application's request for data, and handle the online database's response (i.e., the transfer of data to the application). In the absence of an API, the smart phone application would need to know a lot more technical information about the online database in order to communicate with it (e.g., what commands does the database understand?). But thanks to the API, the smart phone application only needs to know how to formulate a request that the API understands, which then communicates the request to the database and handles the response.

Run the code below for a graphical representation of how an API works.

In [None]:
from IPython.display import IFrame
IFrame("./images/ukds-apis-slides.mp4", width=900, height=600)

### Why would you want to use an API?

Many public, private and charitable institutions collect and share data of value to social scientists. Often they deposit their data in a data portal (e.g., <a href="https://data.gov.uk/" target=_blank>UK Government Open Data</a>), allowing you to download the files as and when needed. However, another approach they can adopt is to allow access to the underlying information that is stored in their database through an API. Using this method, you can send a customised *request* for information to the database; if the request is valid, the database *responds* by providing you with the information you asked for. Think of the difference this way: with a bulk download you receive a raw data file that you must then filter to arrive at the information you need; with an API you perform the filtering when you request the data, so only what you need is returned.
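
To make the contrast concrete, here is a minimal sketch that uses a small in-memory list in place of a real database. The records and the two functions (`download_everything()` and `request_from_api()`) are invented for illustration; no real API is called.

```python
# Toy "database": invented records, used only for illustration
records = [
    {"category": "burglary", "month": "2020-01"},
    {"category": "robbery", "month": "2020-01"},
    {"category": "burglary", "month": "2020-02"},
]


def download_everything():
    """Bulk download: return every record and leave the filtering to the user."""
    return list(records)


def request_from_api(category):
    """API-style request: return only the records matching the request."""
    return [rec for rec in records if rec["category"] == category]


# Approach 1: download the whole file, then filter it yourself
all_records = download_everything()
filtered_locally = [rec for rec in all_records if rec["category"] == "burglary"]

# Approach 2: ask for exactly what you need
filtered_by_api = request_from_api("burglary")

print(filtered_locally == filtered_by_api)  # compare the two results
print(len(all_records), len(filtered_by_api))  # rows transferred in each approach
```

Both approaches arrive at the same records, but the API-style request transfers only the rows that were asked for.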

### What is the general approach for accessing data through an API?

We begin by identifying an online database containing information of interest. Then we need to **know** the following:
1. The location of the API (i.e., web address) through which the database can be accessed. For example, the UK Police API can be accessed via <a href="https://data.police.uk/api" target=_blank>https://data.police.uk/api</a>.
2. The terms of use associated with the API. Many APIs restrict the number of requests you can make over a given time period, while others require registration in order to authenticate who is trying to access the data. For example, the UK Police API does not require you to provide authentication but restricts the number of requests for data you can make (15 per second) - the number of allowable requests is known as the *rate limit*.
3. The location of the data of interest on the API. For example, data on street-level crime from the UK Police API is available at: <a href="https://data.police.uk/api/crimes-street" target=_blank>https://data.police.uk/api/crimes-street</a>. The location of the data is known as its *endpoint*.

We can usually find all of the information we need by reading the API's documentation e.g., <a href="https://data.police.uk/docs/" target=_blank>https://data.police.uk/docs/</a>.

Then we need to **do** the following:
4. Register your use of the API (if required).
5. Request data from the endpoint of interest, supplying authentication if required. This process is known as *making a call* to the API.
6. Write this data to a file for future use.

For any programming task, it is useful to write out the steps needed to solve the problem: we call this *pseudo-code*, as it captures the main tasks and the order in which they need to be executed.
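
As a rough illustration, the six steps could be written as pseudo-code in Python comments, with the "know" steps recorded as ordinary variables. The dictionary name `api_info` and its keys are our own choices for this sketch; the values are the UK Police API details quoted in this section.

```python
# Pseudo-code for the general approach:
# 1. Note the API's base web address
# 2. Note its terms of use (registration, rate limit)
# 3. Note the endpoint that holds the data of interest
# 4. Register (if required)
# 5. Request data from the endpoint
# 6. Write the data to a file

# Steps 1-3 written as Python, using details quoted in this lesson
api_info = {
    "base": "https://data.police.uk/api",
    "endpoint": "crimes-street",
    "requires_registration": False,
    "rate_limit_per_second": 15,
}

# Join base and endpoint to get the web address needed for step 5
request_url = api_info["base"] + "/" + api_info["endpoint"]
print(request_url)
```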

## A social science example

Let's work through the steps in our general approach using a real API, one that provides data on policing activities and street-level crime in England and Wales.

###  Locating the API

The UK Police API can be accessed via the following web address or link: <a href="https://data.police.uk/api" target=_blank>https://data.police.uk/api</a>.

Note that you cannot request this web address through your browser; this is because this link acts as the *base* web address from which you can access the different data sets. For example, we can access a list of all the police forces whose data is available via the API using the following web address: <a href="https://data.police.uk/api/forces" target=_blank>https://data.police.uk/api/forces</a>.

**TASK**: Try it yourself: click on the above link to see what happens when you request data on police forces.

Before delving further into requesting data, let's understand the terms of use/restrictions associated with the UK Police API.

### API terms of use

The UK Police API is reasonably well documented (not always the case, unfortunately) and we can clearly identify what is required in order to interact with it. Firstly, the API does not require authentication: you do not need to register your use of the API, nor provide a password (known as an API key) whenever you request data.

Secondly, the API allows you to make up to 15 calls (requests) per second on average, though you can make up to 30 in a single second. If you are using the API for research purposes, it is highly unlikely you'll exceed this limit (but who knows what data requirements you have).

See <a href="https://data.police.uk/docs/api-call-limits/" target=_blank>https://data.police.uk/docs/api-call-limits/</a> for full information on the API's call limits.
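
One simple way to stay within a rate limit is to pause briefly between calls. The sketch below assumes the average limit of 15 calls per second quoted above; the function `call_politely()` and the stand-in `fake_fetch()` are hypothetical names used for illustration, and no real request is made.

```python
import time

MAX_CALLS_PER_SECOND = 15  # average limit quoted above
MIN_INTERVAL = 1 / MAX_CALLS_PER_SECOND  # seconds to wait between calls


def call_politely(urls, fetch, pause=MIN_INTERVAL):
    """Call fetch(url) for each url, pausing between calls to respect the limit."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(pause)  # wait before every call except the first
        results.append(fetch(url))
    return results


# Demonstration with a stand-in fetch function that makes no real request
def fake_fetch(url):
    return "fetched " + url


demo = call_politely(["a", "b", "c"], fake_fetch)
print(demo)
```

A fixed pause is deliberately conservative: it never uses the burst allowance, which keeps the logic easy to follow.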

### Locating data

The UK Police API allows access to over twenty endpoints (data sets), grouped under the following headings:
* *Forces* e.g., senior officers
* *Crime* e.g., crime categories
* *Neighbourhoods* e.g., boundaries, events
* *Stop and search* e.g., by area or force

See <a href="https://data.police.uk/docs/" target=_blank>https://data.police.uk/docs/</a> for a complete list of endpoints accessible through this API.

### Registering use of API

We can skip this step as the UK Police API does not require us to register or provide any form of authentication (a good example of *open data*).

### Requesting data

We're ready for the interesting bit: requesting data through the API. To focus our activities, we'll attempt to do the following:
1. Download a list of police forces in the UK.
2. For each force, download its stop-and-search data.
3. Save each data set to a file for future use.

Before we download data, we need to ensure Python has the functionality it needs to interact with the API.

In [None]:
# Import modules

import os # module for navigating your machine (e.g., file directories)
import requests # module for requesting urls
import json # module for working with JSON data structures
from datetime import datetime # module for working with dates and time
print("Successfully imported necessary modules")

Modules are additional techniques or functions that are not present when you launch Python. Some do not even come with Python when you download it and must be installed on your machine separately - think of using `ssc install <package>` in Stata, or `install.packages(<package>)` in R. For now just understand that many useful modules need to be imported every time you start a new Python session.
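
If you are unsure whether a module is installed on your machine, you can ask Python before trying to import it. This is a small sketch using the standard library's `importlib.util.find_spec()`; the helper name `is_installed()` is our own.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None


for name in ["os", "json", "requests"]:
    status = "available" if is_installed(name) else "missing (install it first)"
    print(name, "->", status)
```

If `requests` is reported as missing, it can usually be installed from the command line with `pip install requests`.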

In [None]:
# Define web address and search terms

baseurl = "https://data.police.uk/api/" # base web address
forces = "forces" # endpoint where forces data is located

webadd = baseurl + forces # construct web address to request
print(webadd)

# Make call to API

response = requests.get(webadd) # request the web address
response.status_code # check if API was requested successfully

Let's unpack the above code. First, we define a variable (also known as an 'object' in Python) called `baseurl` that contains the base web address of the UK Police API. Then we define a variable containing the endpoint we want to access data from (`forces`). Finally we concatenate these separate elements to form a valid web address that can be requested from the API (`webadd`).

The next step is to use the `get()` method of the `requests` module to request the web address, and in the same line of code, we store the results of the request in a variable called `response`. Finally, we check whether the request was successful by calling on the `status_code` attribute of the `response` variable.

We get a status code of *200*, which means the request was successful. A status code in the *400s* or *500s* indicates an unsuccessful attempt at requesting a web address (see <a href="https://www.textbook.ds100.org/ch/07/web_http.html" target=_blank>Lau, Gonzalez and Nolan</a> for a succinct description of different types of response status codes).
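
The broad families of status codes can be summarised in a few lines of Python. The function below, `describe_status()`, is a hypothetical helper written for this lesson and is not part of the `requests` module; the groupings follow the standard HTTP ranges.

```python
def describe_status(code):
    """Give a rough, human-readable reading of an HTTP status code."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error (check the web address or your request)"
    if 500 <= code < 600:
        return "server error (the problem is on the API's side)"
    return "other"


for code in [200, 404, 429, 503]:
    print(code, "->", describe_status(code))
```

In the notebook you could call `describe_status(response.status_code)`. Code 429 is the standard HTTP code for "too many requests", which is the kind of response to expect if a rate limit is exceeded. The `requests` module also offers `response.raise_for_status()`, which raises an error for codes in the 400s and 500s.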

You may be wondering exactly what it is we requested. To see the content of our request i.e., the data, we can call the `json()` method on the `response` variable:

In [30]:
forces_data = response.json()
forces_data

[{'id': 'avon-and-somerset', 'name': 'Avon and Somerset Constabulary'},
 {'id': 'bedfordshire', 'name': 'Bedfordshire Police'},
 {'id': 'cambridgeshire', 'name': 'Cambridgeshire Constabulary'},
 {'id': 'cheshire', 'name': 'Cheshire Constabulary'},
 {'id': 'city-of-london', 'name': 'City of London Police'},
 {'id': 'cleveland', 'name': 'Cleveland Police'},
 {'id': 'cumbria', 'name': 'Cumbria Constabulary'},
 {'id': 'derbyshire', 'name': 'Derbyshire Constabulary'},
 {'id': 'devon-and-cornwall', 'name': 'Devon & Cornwall Police'},
 {'id': 'dorset', 'name': 'Dorset Police'},
 {'id': 'durham', 'name': 'Durham Constabulary'},
 {'id': 'dyfed-powys', 'name': 'Dyfed-Powys Police'},
 {'id': 'essex', 'name': 'Essex Police'},
 {'id': 'gloucestershire', 'name': 'Gloucestershire Constabulary'},
 {'id': 'greater-manchester', 'name': 'Greater Manchester Police'},
 {'id': 'gwent', 'name': 'Gwent Police'},
 {'id': 'hampshire', 'name': 'Hampshire Constabulary'},
 {'id': 'hertfordshire', 'name': 'Hertford

And Voila, we have a list of police forces in the UK (excluding Scotland and the British Transport Police).

We hope you agree that requesting data from an API is a relatively simple task. The real challenge lies with the way the data are *structured* in response to your request. While sometimes you may be able to request data in a tabular format (e.g., a CSV or Excel file), most of the time it arrives looking a bit different than you may be familiar with. For instance, we currently have a list of police forces, and for each one there are two fields: `id` and `name`. 

Therefore we need to figure out how to navigate these results and extract information of interest. Thankfully Python provides some intuitive methods for performing this task.
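
The structure returned by the API is JSON, which Python converts into its own lists and dictionaries. The sketch below reproduces that conversion offline using `json.loads()` on two records copied from the output above; in the notebook, `response.json()` performs the equivalent step for you. The variable names `raw_text` and `sample` are our own.

```python
import json

# Two records copied from the output above, written as JSON text
raw_text = """[
  {"id": "avon-and-somerset", "name": "Avon and Somerset Constabulary"},
  {"id": "bedfordshire", "name": "Bedfordshire Police"}
]"""

sample = json.loads(raw_text)  # convert JSON text into Python objects

print(type(sample))       # the outer structure is a list
print(type(sample[0]))    # each element is a dictionary
print(sample[0]["name"])  # look up a field by its key
```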

#### Working with lists

A *list* is a data type in Python that holds an ordered, mutable sequence of elements. Think of it as a variable that contains a certain type of value, just like an *integer* variable can only contain whole numbers, or a *string* variable contains characters that should be treated as text. Knowing which data type you're working with is crucial as it determines the kind of operations you can perform on the variable:

In [31]:
my_number = 25
my_string = "Hello there!"

print(my_number + 50) # this will work
print(my_string + 50) # this will not

75


TypeError: can only concatenate str (not "int") to str

The first thing to know is that we can confirm what data type a variable is:

In [32]:
type(forces_data)

list

We can also identify a list by the presence of opening and closing square brackets (`[]`).

Next, we can count how many elements a list contains like so:

In [33]:
len(forces_data)

44

And we can view each element in a list as follows: 

In [34]:
for el in forces_data:
    print(el)
    print("\r")

{'id': 'avon-and-somerset', 'name': 'Avon and Somerset Constabulary'}

{'id': 'bedfordshire', 'name': 'Bedfordshire Police'}

{'id': 'cambridgeshire', 'name': 'Cambridgeshire Constabulary'}

{'id': 'cheshire', 'name': 'Cheshire Constabulary'}

{'id': 'city-of-london', 'name': 'City of London Police'}

{'id': 'cleveland', 'name': 'Cleveland Police'}

{'id': 'cumbria', 'name': 'Cumbria Constabulary'}

{'id': 'derbyshire', 'name': 'Derbyshire Constabulary'}

{'id': 'devon-and-cornwall', 'name': 'Devon & Cornwall Police'}

{'id': 'dorset', 'name': 'Dorset Police'}

{'id': 'durham', 'name': 'Durham Constabulary'}

{'id': 'dyfed-powys', 'name': 'Dyfed-Powys Police'}

{'id': 'essex', 'name': 'Essex Police'}

{'id': 'gloucestershire', 'name': 'Gloucestershire Constabulary'}

{'id': 'greater-manchester', 'name': 'Greater Manchester Police'}

{'id': 'gwent', 'name': 'Gwent Police'}

{'id': 'hampshire', 'name': 'Hampshire Constabulary'}

{'id': 'hertfordshire', 'name': 'Hertfords

Finally, we can access a particular element in a list by referring to its location (i.e., positional value or index). For example, which police force is located in position 10 in the list?

In [35]:
forces_data[9]

{'id': 'dorset', 'name': 'Dorset Police'}

Python begins counting at zero, which is why the index "9" refers to the tenth element in the list. Simple rule of thumb: the *n*th element of a list has index *n-1*.
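
A few more indexing patterns are shown below on a short stand-in list (`sample_ids`, our own name) built from the first four force ids above, so the example runs without calling the API.

```python
# First few force ids from the list above, used as a small stand-in
sample_ids = ["avon-and-somerset", "bedfordshire", "cambridgeshire", "cheshire"]

print(sample_ids[0])    # first element (index 0)
print(sample_ids[2])    # third element (index 2)
print(sample_ids[-1])   # negative indexes count back from the end
print(sample_ids[1:3])  # a slice: the elements at index 1 and 2
```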

**TASK**: extract a different police force from the list using another index value.

In [36]:
forces_data[INSERT_INDEX_VALUE]

NameError: name 'INSERT_INDEX_VALUE' is not defined

OK, now that we're familiar with lists we can extract the `id` for each force and store them in a separate list like so:

In [37]:
force_ids = [el["id"] for el in forces_data]
force_ids

['avon-and-somerset',
 'bedfordshire',
 'cambridgeshire',
 'cheshire',
 'city-of-london',
 'cleveland',
 'cumbria',
 'derbyshire',
 'devon-and-cornwall',
 'dorset',
 'durham',
 'dyfed-powys',
 'essex',
 'gloucestershire',
 'greater-manchester',
 'gwent',
 'hampshire',
 'hertfordshire',
 'humberside',
 'kent',
 'lancashire',
 'leicestershire',
 'lincolnshire',
 'merseyside',
 'metropolitan',
 'norfolk',
 'north-wales',
 'north-yorkshire',
 'northamptonshire',
 'northumbria',
 'nottinghamshire',
 'northern-ireland',
 'south-wales',
 'south-yorkshire',
 'staffordshire',
 'suffolk',
 'surrey',
 'sussex',
 'thames-valley',
 'warwickshire',
 'west-mercia',
 'west-midlands',
 'west-yorkshire',
 'wiltshire']

To construct our list of force ids, we've made use of an intermediate technique in Python: *list comprehension*.

We create a new list called `force_ids`, and we populate this variable with the values from the `id` field for each element (`el`) in the list (`forces_data`).
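
If list comprehensions are new to you, it may help to see the same operation written as an ordinary `for` loop. The sketch uses a two-element stand-in for `forces_data` (called `sample_forces` here) so that it runs on its own.

```python
# Small stand-in for forces_data, copied from the output above
sample_forces = [
    {"id": "avon-and-somerset", "name": "Avon and Somerset Constabulary"},
    {"id": "bedfordshire", "name": "Bedfordshire Police"},
]

# Version 1: list comprehension
ids_comprehension = [el["id"] for el in sample_forces]

# Version 2: the same result written as an explicit loop
ids_loop = []
for el in sample_forces:
    ids_loop.append(el["id"])

print(ids_comprehension)
print(ids_comprehension == ids_loop)
```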

#### Stop-and-search data

Now that we have a list of force ids we can request their respective stop-and-search data:

In [43]:
baseurl = "https://data.police.uk/api/" # base web address
sas = "stops-force" # stop-and-search endpoint

webadd_list = [] # create a blank list for storing web addresses
for el in force_ids: # for each id in the list
    webadd = baseurl + sas + "?force=" + el # construct the web address for that force
    webadd_list.append(webadd) # append the web address to the list
    
webadd_list # view the list of web addresses

['https://data.police.uk/api/stops-force?force=avon-and-somerset',
 'https://data.police.uk/api/stops-force?force=bedfordshire',
 'https://data.police.uk/api/stops-force?force=cambridgeshire',
 'https://data.police.uk/api/stops-force?force=cheshire',
 'https://data.police.uk/api/stops-force?force=city-of-london',
 'https://data.police.uk/api/stops-force?force=cleveland',
 'https://data.police.uk/api/stops-force?force=cumbria',
 'https://data.police.uk/api/stops-force?force=derbyshire',
 'https://data.police.uk/api/stops-force?force=devon-and-cornwall',
 'https://data.police.uk/api/stops-force?force=dorset',
 'https://data.police.uk/api/stops-force?force=durham',
 'https://data.police.uk/api/stops-force?force=dyfed-powys',
 'https://data.police.uk/api/stops-force?force=essex',
 'https://data.police.uk/api/stops-force?force=gloucestershire',
 'https://data.police.uk/api/stops-force?force=greater-manchester',
 'https://data.police.uk/api/stops-force?force=gwent',
 'https://data.police.uk/
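
The list above only contains web addresses; nothing has been requested yet. The remaining steps of our plan (request each address, then save each data set) might look something like the sketch below. It is one possible approach rather than a definitive implementation: the function `download_and_save()` and its `fetch` argument are hypothetical, and the demonstration uses a stand-in `fetch` that returns an empty list so no real request is made.

```python
import json
import os
import tempfile


def download_and_save(urls_by_force, fetch, outdir):
    """Fetch each web address and save the result as <force id>.json in outdir."""
    os.makedirs(outdir, exist_ok=True)
    saved = []
    for force_id, url in urls_by_force.items():
        data = fetch(url)  # expected to return parsed JSON (lists/dicts)
        outfile = os.path.join(outdir, force_id + ".json")
        with open(outfile, "w") as f:
            json.dump(data, f)
        saved.append(outfile)
    return saved


# Offline demonstration: a stand-in fetch that returns an empty list
baseurl = "https://data.police.uk/api/"
demo_urls = {fid: baseurl + "stops-force?force=" + fid
             for fid in ["dorset", "durham"]}

demo_dir = tempfile.mkdtemp()  # temporary folder so nothing is overwritten
saved_files = download_and_save(demo_urls, lambda url: [], demo_dir)
print(sorted(os.listdir(demo_dir)))
```

To use this against the real API, you could pass `fetch=lambda url: requests.get(url).json()` and build `urls_by_force` from the full `force_ids` list, ideally pausing between calls to respect the rate limit.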

### Saving results from the scrape

Let's conclude by saving the requested data to a file for future use.

In [None]:
# Define a file to store the data

outfile = "./police-forces-data.json" # location and name of file

# Convert the list of forces to JSON text

data = json.dumps(forces_data)

# Open the file and write (save) the data to it

with open(outfile, "w") as f:
    f.write(data)

How do we know this worked? The simplest way is to check whether a) the file was created, and b) the results were written to it.

In [None]:
# Check presence of file in current folder

os.listdir()

In [None]:
# Open file and read (import) its contents

with open(outfile, "r") as f:
    data = f.read()
    
print(data)  

And Voila, we have successfully saved data requested through an API!

## A social science example

While the above example is good for learning the basics, scraping research-relevant data from a web page is a little more difficult:
* Data may be spread throughout a web page (or across multiple pages).
* There may be many tags with similar data that need to be filtered in order to get to the information you need.
* And many other potential issues.

Let's look at a social data example to get a better sense of what web-scraping for research involves.

### Collecting charity data

For one of us (Diarmuid), web-scraping provides a means of collecting data that cannot be accessed any other way (other than manually copying-and-pasting from each charity's web page...). In particular, we are interested in collecting data about which policies a charity reportedly has in place. This information is interesting as it can be linked to observed organisational outcomes (e.g., is there a correlation between not having a risk policy and going out of business?).

Once again we'll work through the key steps in our general approach, this time a bit quicker and with less narrative.

In [None]:
# Import modules

import requests # module for requesting urls
import os # module for performing operating system tasks
import pandas as pd # module for working with datasets
from IPython.display import IFrame # module for embedding web pages, documents etc
from bs4 import BeautifulSoup as soup # module for parsing web pages

###  Identifying the web address

We're going to use the Charity Commission for England and Wales' website to capture policy data: <a href="https://beta.charitycommission.gov.uk/" target=_blank>https://beta.charitycommission.gov.uk/</a>

We're going to focus on just one charity for now - Oxfam; therefore the web address looks like this: <a href="https://beta.charitycommission.gov.uk/charity-details/?regId=202918&subId=0" target=_blank>https://beta.charitycommission.gov.uk/charity-details/?regId=202918&subId=0</a>

In [None]:
IFrame("https://beta.charitycommission.gov.uk/charity-details/?regId=202918&subId=0", width="800", height="650")

### Locating information

Policy data is located in the *Documents* tab under a heading called *Policies*, which in terms of the source code is here:

### Requesting the web page

Now that we possess the necessary information, let's begin the process of scraping the web page.

In [None]:
url = "https://beta.charitycommission.gov.uk/charity-details/?regId=202918&subId=0"

response = requests.get(url, allow_redirects=True)
response.status_code

### Parsing the web page



In [None]:
soup_response = soup(response.text, "html.parser")
# soup_response.body # view HTML code

In [None]:
links = soup_response.find_all("a")
links

### Extracting information

The policies are contained within a set of `<div></div>` tags where the *class* attribute equals "pcg-charity-details__block col-lg-6". There are multiple sets of tags with this class, therefore we need to use the `find_all()` method.

In [None]:
sections = soup_response.find_all("div", class_="pcg-charity-details__block col-lg-6")
len(sections) # view how many sets of tags are returned

Multiple sets of tags are returned, therefore how can we identify the correct set?

In [None]:
searchterm = "Policies" # search term identifying section containing list of policies

for section in sections: # for each section contained in the sections list:
    if searchterm in str(section): # if the search term exists in the section
        policy_location = sections.index(section) # store the list location of the correct section
        print(policy_location) # view the location of the policies in the list (i.e, is it the first element in the list?)
    else:
        continue

policy_section = sections[policy_location] # create a new variable containing the correct section
policy_section

Let's unpick the logic of the code above:
1. We know the list of policies is contained in a section (`<div>`) where *class_="pcg-charity-details__block col-lg-6"*.
2. We find all sections where the _class_ attribute equals "pcg-charity-details__block col-lg-6", and navigate to the correct one by evaluating whether it contains a relevant piece of text ("Policies"). This process revealed that the list of policies was contained in the sixth section (remember: lists begin at position 0, so 5 identifies the sixth element of a list). If we knew that the list of policies was always contained in the sixth section we wouldn't need the use of a search term, but this way is more robust to deviations in the structure and content of each charity's web page.
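
One thing to watch for: if no section contains the search term, the loop above never defines `policy_location`, and the next line fails. A slightly more defensive variant is sketched below on plain strings with invented content; the function name `find_section()` is our own, and the same logic should apply to the parsed `sections` list.

```python
# Stand-in for the parsed sections: plain strings with invented content
sample_sections = [
    "<div>Overview</div>",
    "<div>Trustees</div>",
    "<div>Policies <span>Example policy</span></div>",
]


def find_section(sections, searchterm):
    """Return the index of the first section containing searchterm, else None."""
    for position, section in enumerate(sections):
        if searchterm in str(section):
            return position
    return None


print(find_section(sample_sections, "Policies"))
print(find_section(sample_sections, "Accounts"))
```

Returning `None` makes the "not found" case explicit, so it can be checked before indexing into the list.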

Now that we have the correct set of `<div></div>` tags, we need to extract the policy data from within the `<span></span>` tags.

In [None]:
policy_list = [] # define a blank list for storing the policy data
charity_name = "Oxfam" # define a variable for storing the charity's name

for tag in policy_section.find_all("span"): # for each set of span tags in the policy section
    policy = tag.text # extract the text from the tag
    observation = [charity_name, policy] # combine charity name and a policy
    policy_list.append(observation) # append the charity name and policy to the blank list
    
policy_list # view list of policies for the charity (long format)

Again, let's unpack the code above:
1. We define a variable called `policy_list` which will store the extracted text; at this point the list is empty. We also define a variable for storing the charity's name (`charity_name`).
2. Then, for each set of `<span></span>` tags in the `policy_section` variable, we extract the text from within the tags. We also define a variable called `observation` which stores a list of values: a charity's name and a given policy; finally we append the information to the empty list.
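
The resulting structure is often called *long format*: one row per charity-policy pair. The sketch below shows the shape of that structure and how it maps to CSV text using only Python's standard `csv` module. The policy names are placeholders invented for illustration and are not Oxfam's actual policies.

```python
import csv
import io

charity_name = "Oxfam"
# Placeholder policy names, invented for illustration only
example_policies = ["Example policy A", "Example policy B"]

# One row per charity-policy pair (long format)
rows = [[charity_name, policy] for policy in example_policies]

# Write the rows as CSV text to an in-memory buffer rather than a file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["charity_name", "policy"])  # header row
writer.writerows(rows)

print(buffer.getvalue())
```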

### Saving results from the scrape

Let's conclude by saving the scraped data to a file for future use. We'll make use of the excellent `pandas` module (shortened to `pd`) for converting the extracted text to a dataset prior to writing to a file.

In [None]:
policy_data = pd.DataFrame(list(policy_list), columns=["charity_name", "policy"])
policy_data

Finally, we can save the data set to a file for future use.

In [None]:
outfile = "./oxfam_policies.csv"

policy_data.to_csv(outfile, index=False)

In [None]:
os.listdir()

In [None]:
data = pd.read_csv(outfile, encoding="ISO-8859-1", index_col=False)
data

## What have we learned?

Let's recap what key skills and techniques we've learned:
* **How to import modules**. You will usually need to import modules into Python to support your work. Python does come with some methods and functions that are ready to use straight away, but for computational social science tasks you'll almost certainly need to import some additional modules.
* **How to request and parse web pages**. You can use Python to request a web page, and the `BeautifulSoup` module to parse its contents.
* **How to read and write data**. You can save the results of your scrape to a file for future use.
* **How to do all of the above in an efficient, clear and effective manner**.

## Conclusion

Web-scraping is a simple yet powerful computational method for collecting data of value for social science research. It also provides a relatively gentle introduction to using programming languages. However, "with great power comes great responsibility" (sorry). Web-scraping takes you into the realm of data protection, website Terms of Service (ToS), and many murky ethical issues. Wielded sensibly and sensitively, web-scraping is a valuable and exciting social science research method.

Good luck on your data-driven travels!

## Bibliography

Barba, Lorena A. et al. (2019). *Teaching and Learning with Jupyter*. <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>https://jupyter4edu.github.io/jupyter-edu-book/</a>.

Lau, S., Gonzalez, J., & Nolan, D. (n.d.). *Principles and Techniques of Data Science*. https://www.textbook.ds100.org

## Further reading and resources

We hope this brief lesson has whetted your appetite for learning more about web-scraping and Python programming in general. There are some fantastic learning materials available to you, many of them free. We highly recommend the materials referenced in the Bibliography.

In addition, you may find the following resources useful:
* <a href="https://github.com/UKDataServiceOpen/web-scraping" target=_blank>**Web-scraping for Social Science Research**</a> - a free UK Data Service training series on web-scraping, with three webinars and lots of detailed coding examples.
* <a href="https://automatetheboringstuff.com/" target=_blank>**Automate the Boring Stuff with Python**</a> - a free ebook covering lots of interesting, practical uses of Python. Chapter 12 covers web-scraping.

--END OF FILE--