![UKDS Logo](./images/UKDS_Logos_Col_Grey_300dpi.png)

# Being a Computational Social Scientist

Welcome to the <a href="https://ukdataservice.ac.uk/" target=_blank>UK Data Service</a> training series on *New Forms of Data for Social Science Research*. This series guides you through some of the most common and valuable new sources of data available for social science research: data collected from websites and social media platforms, text data, and simulations (agent-based modelling), to name a few. To help you get to grips with these new forms of data, we provide webinars, interactive notebooks containing live programming code, reading lists and more.

* To access training materials for the entire series: <a href="https://github.com/UKDataServiceOpen/new-forms-of-data" target=_blank>[Training Materials]</a>

* To keep up to date with upcoming and past training events: <a href="https://ukdataservice.ac.uk/news-and-events/events" target=_blank>[Events]</a>

* To get in contact with feedback, ideas or to seek assistance: <a href="https://ukdataservice.ac.uk/help.aspx" target=_blank>[Help]</a>

<a href="https://www.research.manchester.ac.uk/portal/julia.kasmire.html" target=_blank>Dr Julia Kasmire</a> and <a href="https://www.research.manchester.ac.uk/portal/diarmuid.mcdonnell.html" target=_blank>Dr Diarmuid McDonnell</a> <br />
UK Data Service  <br />
University of Manchester <br />
May 2020

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Guide-to-using-this-resource" data-toc-modified-id="Guide-to-using-this-resource-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Guide to using this resource</a></span><ul class="toc-item"><li><span><a href="#Interaction" data-toc-modified-id="Interaction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Interaction</a></span></li><li><span><a href="#Learn-more" data-toc-modified-id="Learn-more-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Learn more</a></span></li></ul></li><li><span><a href="#Writing-code" data-toc-modified-id="Writing-code-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Writing code</a></span><ul class="toc-item"><li><span><a href="#What-is-a-programming-language?" data-toc-modified-id="What-is-a-programming-language?-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>What is a programming language?</a></span></li><li><span><a href="#Making-sense-of-terms" data-toc-modified-id="Making-sense-of-terms-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Making sense of terms</a></span></li><li><span><a href="#How-should-we-code?" data-toc-modified-id="How-should-we-code?-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>How should we code?</a></span></li><li><span><a href="#Coding-as-a-method" data-toc-modified-id="Coding-as-a-method-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Coding as a method</a></span></li><li><span><a href="#Social-science-example" data-toc-modified-id="Social-science-example-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Social science example</a></span></li></ul></li><li><span><a href="#Bibliography" data-toc-modified-id="Bibliography-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Bibliography</a></span></li><li><span><a href="#Further-reading-and-resources" data-toc-modified-id="Further-reading-and-resources-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Further reading and resources</a></span></li></ul></div>

-------------------------------------

<div style="text-align: center"><i><b>This is notebook 3 of 6 in this lesson</b></i></div>

-------------------------------------

## Guide to using this resource

This learning resource was built using <a href="https://jupyter.org/" target=_blank>Jupyter Notebook</a>, an open-source software application that allows you to mix code, results and narrative in a single document. As <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>Barba et al. (2019)</a> put it:
> In a world where every subject matter can have a data-supported treatment, where computational devices are omnipresent and pervasive, the union of natural language and computation creates compelling communication and learning opportunities.

If you are familiar with Jupyter notebooks then skip ahead to the main content (*Writing code*). Otherwise, the following is a quick guide to navigating and interacting with the notebook.

### Interaction

**You only need to execute the code that is contained in sections which are marked by `In []`.**

To execute a cell, click or double-click the cell and press the `Run` button on the top toolbar (you can also use the keyboard shortcut Shift + Enter).

Try it for yourself:

In [1]:
print("Enter your name and press enter:")
name = input()
print("\r")
print("Hello {}, enjoy learning more about Python and computational social science!".format(name)) 

Enter your name and press enter:
Diarmuid

Hello Diarmuid, enjoy learning more about Python and computational social science!


### Learn more

Jupyter notebooks provide rich, flexible features for conducting and documenting your data analysis workflow. To learn more about additional notebook features, we recommend working through some of the <a href="https://github.com/darribas/gds19/blob/master/content/labs/lab_00.ipynb" target=_blank>materials</a> provided by Dani Arribas-Bel at the University of Liverpool. 

## Writing code

Perhaps the most crucial aspect of being a computational social scientist is the ability to write code using a programming language, a skill that can bring enormous rewards.

### What is a programming language?

In essence, a *programming language* is a set of instructions through which humans can interact with a computer. Similar to a spoken language, there are grammatical (e.g., specifying commands correctly) and syntactical (e.g., arranging commands in the correct order) rules that need to be followed.
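
To make the idea of rules concrete, here is a minimal sketch that uses Python's built-in `ast` module to check whether a line of code obeys the language's syntax. The helper name `is_valid_python` and the two example lines are our own, chosen purely for illustration.

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if the source text follows Python's syntax rules."""
    try:
        ast.parse(source)  # raises SyntaxError if the rules are broken
        return True
    except SyntaxError:
        return False

well_formed = 'print("Hello")'
malformed = 'print("Hello"'  # the closing bracket is missing

print(is_valid_python(well_formed))  # True
print(is_valid_python(malformed))    # False
```

Much like a reader stumbling over an unfinished sentence, the computer refuses to act on the second line because it cannot work out where the instruction ends.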

### Making sense of terms

Like learning any new language (spoken or programming), much of the difficulty arises from getting to grips with an unfamiliar vocabulary. The following are some general programming terms, adapted from Brooker (2020), that are worth keeping in mind as you progress on your computational odyssey:

* **Programming language** - a means of interacting with and issuing instructions to a computer.
* **Programming** - the practice of using a programming language (also known as *coding*).
* **Code** - the written down instructions that result from programming.
* **Script** - a collection of code.
* **Shell** - a tool that allows you to write and execute code, for example using R without RStudio, or using the Command Line Interface (CLI) on your computer.
* **Debugging** - fixing errors or issues with your code.
* **Testing** - the process of discovering errors or issues with your code.
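
The difference between *testing* and *debugging* can be sketched with a small, made-up example. The function names and the ages below are hypothetical, invented only to illustrate the two terms: a test compares a function's result with an answer we already know, and debugging is the fix we make once the test reveals a problem.

```python
def mean_age_buggy(ages):
    # Bug: divides by a fixed number rather than the number of respondents
    return sum(ages) / 2

def mean_age(ages):
    # Debugged version: divide by the actual number of respondents
    return sum(ages) / len(ages)

def passes_test(function):
    """Testing: compare the function's result with an answer we know is right."""
    return function([20, 30, 40]) == 30

print(passes_test(mean_age_buggy))  # False: the test discovers the error
print(passes_test(mean_age))        # True: the fix passes the same test
```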

### How should we code?

Before delving into the practice of writing code, it is worth taking a step back and considering the type of code we want to write. While code is designed to be *acted on* by computers, it is *written, read and adapted* by humans. **Literate programming** is an approach that emphasises the readability and fluency of code. The father of this approach, Donald Knuth (1984, p. 97), summarises its high level aim:

> Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

Let's contrast code written in accordance with a **literate programming** approach with code that is not. First, some bare, unadulterated code:

In [4]:
import requests
var1 = "https://httpbin.org/html"
var2 = requests.get(var1)
var2.text[0:500]

"<!DOCTYPE html>\n<html>\n  <head>\n  </head>\n  <body>\n      <h1>Herman Melville - Moby-Dick</h1>\n\n      <div>\n        <p>\n          Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, after concluding his contributory work for Ahab's leg, but still retained it on deck, fast lashed to r"

Even if you have some experience of programming and Python, is it clear what the above piece of code is doing? The computer clearly understands it, as it produces a result (i.e., no error message), but could you adapt this code for your own purposes?

Let's edit the above into a more literate format:

In [5]:
# Import modules

import requests # library for requesting web pages

# Define web address 

web_page = "https://httpbin.org/html"

# Request web page

response = requests.get(web_page)

# View text contained in web page (first 500 characters)

response.text[0:500]

"<!DOCTYPE html>\n<html>\n  <head>\n  </head>\n  <body>\n      <h1>Herman Melville - Moby-Dick</h1>\n\n      <div>\n        <p>\n          Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, after concluding his contributory work for Ahab's leg, but still retained it on deck, fast lashed to r"

The code still produces the same results but this time in a more legible manner. Note the addition of comments explaining what the next line of code does, as well as more descriptive variable names, line spacing, and clearer conceptual ordering of the commands.

Literate programming does not mean writing mountains of comments; it simply means keeping the human reader of your code in mind as you program.
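
Comments are only one way of keeping the reader in mind. As a minimal sketch (the function name `preview_text` and the sample text are our own illustrative assumptions, not part of any library), a descriptive name and a one-line description can do much of the explaining:

```python
def preview_text(text: str, n_characters: int = 500) -> str:
    """Return the first n_characters of text, for a quick look at its contents."""
    return text[:n_characters]

# A short stand-in for the text of a web page
sample_page = "<html><body><h1>Herman Melville - Moby-Dick</h1></body></html>"

print(preview_text(sample_page, n_characters=12))
```

A reader who meets `preview_text(sample_page)` later in a script can guess what it does without reading the function itself.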

### Coding as a method

Programming is an increasingly popular social research method and can be conceived as (Brooker, 2019, p.1234):

> a multipurpose toolkit for understanding and intervening in the (digital) social world in lots of different ways.

We can write code to collect data from online sources (e.g., social media platforms, websites), analyse large-scale and unstructured data (e.g., natural language processing), and communicate the results of our research (e.g., by creating a Twitter bot that posts findings from your research when a certain term is mentioned).
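
As a small taste of the second use (analysing unstructured text), the following sketch counts how often each word appears across a handful of short posts. The posts are invented for illustration and the function name `count_words` is our own; real natural language processing involves many more steps, such as removing very common words.

```python
from collections import Counter
import re

# Invented example posts, standing in for text collected from a social media platform
posts = [
    "Charities need more funding",
    "Funding for charities is falling",
    "Volunteers keep charities running",
]

def count_words(texts):
    """Count how often each lower-cased word appears across a list of texts."""
    words = []
    for text in texts:
        # Lower-case the text and pull out runs of letters (and apostrophes)
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words)

word_counts = count_words(posts)

# View the two most frequent words
print(word_counts.most_common(2))
```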

Even within a social science research context, learning to code opens up near limitless applications and benefits.

### Social science example

Let's look at a quick example from one of our own research projects (Diarmuid). Data about Australian charities is found on the regulator's website and we would like to scrape some of this data (e.g., about trustees and finances). Each charity has a unique organisation id that looks like this:
* 15211513464 (University Of Sydney)

However, a charity's web page is found using a link that looks like this:
* https://www.acnc.gov.au/charity/750082c09988ea696067685d71239110 (University Of Sydney)

Notice how the link doesn't contain the unique organisation id but instead a longer sequence of characters, which we'll call its webpage id. Thus, in order to scrape a charity's details we need to know its webpage id. One solution is to write a function that takes a charity's unique organisation id (`15211513464`) and returns its webpage id (`750082c09988ea696067685d71239110`).
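
One step in such a function is building the address of the regulator's search page for a given organisation id. The following is a minimal sketch, assuming the search page accepts the id in a `name_abn[0]` query parameter (this mirrors the address used in this notebook's code; the helper name `build_search_url` is hypothetical). Python's built-in `urllib.parse.urlencode` takes care of converting the square brackets into a form that is safe to use in a web address.

```python
from urllib.parse import urlencode

def build_search_url(abn: str) -> str:
    """Build the ACNC search address for one organisation id."""
    base = "https://www.acnc.gov.au/charity"
    query = urlencode({"name_abn[0]": abn})  # "[0]" becomes "%5B0%5D"
    return base + "?" + query

print(build_search_url("15211513464"))
```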

In [6]:
# Collect ACNC webpage ids for charities

import requests # library for requesting web pages
from bs4 import BeautifulSoup as soup # library for parsing web pages

def webid_download(abn: str):

    # Request web page

    session = requests.Session()

    webadd = "https://www.acnc.gov.au/charity?name_abn%5B0%5D=" + str(abn)
    response = session.get(webadd)

    # Parse web page

    html_org = response.text
    soup_org = soup(html_org, "html.parser")

    # Find the table cell that holds the link to the charity's web page

    charlinkdetails = soup_org.find('td', {'class': 'views-field views-field-acnc-search-api-title-sort'})

    if charlinkdetails:
        charlink = charlinkdetails.find('a').get('href')
        webid = charlink[9:] # drop the first 9 characters of the link
        success = 1

    else:
        webid = ""
        success = 0

    return abn, webid, success

The above code defines a function; now we need to call it using an example organisation id:

In [9]:
# Import modules

import requests
from bs4 import BeautifulSoup as soup

# Call the function and store the results in three variables

orgid = "86122248122"
abn, webid, success = webid_download(orgid)

# View results

if success==1:
    print("Charity {} has the following web id: {}".format(abn, webid))
else:
    print("Could not find web id for charity {}".format(abn))

Charity 86122248122 has the following web id: bd961f471d8ced41461cec2d0f1b0bc2
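
The line `webid = charlink[9:]` in the function deserves a closer look: it drops the first nine characters of the link. Assuming the links on the search results page have the form `/charity/` followed by the webpage id (an assumption based on the charity address shown earlier), those nine characters are exactly `/charity/`. A slightly more explicit version of that step, using a hypothetical helper called `extract_webid`, might look like this:

```python
def extract_webid(href: str, prefix: str = "/charity/") -> str:
    """Return the part of the link that follows the prefix, or "" if it is absent."""
    if href.startswith(prefix):
        return href[len(prefix):]
    return ""

# Example link, built from the webpage id shown earlier for the University Of Sydney
print(extract_webid("/charity/750082c09988ea696067685d71239110"))
```

Checking for the prefix first means an unexpected link produces an empty result rather than a silently mangled id.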


## Bibliography

Barba, Lorena A. et al. (2019). *Teaching and Learning with Jupyter*. <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>https://jupyter4edu.github.io/jupyter-edu-book/</a>.

Brooker, P. (2020). *Programming with Python for Social Scientists*. London: SAGE Publications Ltd.

Brooker, P. (2019). My unexpectedly militant bots: A case for Programming-as-Social-Science. *The Sociological Review, 67*: 1228-1248.

Knuth, D. (1984). Literate Programming. *The Computer Journal, 27*: 97-111.

## Further reading and resources

We hope this brief demonstration has whetted your appetite for learning more about Python and programming in general. There are some fantastic learning materials available to you, many of them free. We highly recommend the materials referenced in the Bibliography. In addition, you may find the following useful:
* <a href="https://github.com/UKDataServiceOpen/code-demos" target=_blank>Introduction to Python for Social Scientists</a>
* <a href="http://doi.org/10.5281/zenodo.3474043" target=_blank>Code Camp</a>
* <a href="https://assets.digitalocean.com/books/python/how-to-code-in-python.pdf" target=_blank>How to Code in Python 3</a>

<div style="text-align: right"><a href="./bcss-notebook-two-2020-02-12.ipynb" target=_blank><i>Previous section: Thinking computationally</i></a> &nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;<a href="./bcss-notebook-four-2020-02-12.ipynb" target=_blank><i>Next section: Computational environments</i></a></div>