![UKDS Logo](./images/UKDS_Logos_Col_Grey_300dpi.png)

# Being a Computational Social Scientist

Welcome to the <a href="https://ukdataservice.ac.uk/" target=_blank>UK Data Service</a> training series on *New Forms of Data for Social Science Research*. This series guides you through some of the most common and valuable new sources of data available for social science research: data collected from websites and social media platforms, text data, and data generated by simulations (agent-based modelling), to name a few. To help you get to grips with these new forms of data, we provide webinars, interactive notebooks containing live programming code, reading lists and more.

* To access training materials for the entire series: <a href="https://github.com/UKDataServiceOpen/new-forms-of-data" target=_blank>[Training Materials]</a>

* To keep up to date with upcoming and past training events: <a href="https://ukdataservice.ac.uk/news-and-events/events" target=_blank>[Events]</a>

* To get in contact with feedback, ideas or to seek assistance: <a href="https://ukdataservice.ac.uk/help.aspx" target=_blank>[Help]</a>

<a href="https://www.research.manchester.ac.uk/portal/julia.kasmire.html" target=_blank>Dr Julia Kasmire</a> and <a href="https://www.research.manchester.ac.uk/portal/diarmuid.mcdonnell.html" target=_blank>Dr Diarmuid McDonnell</a> <br />
UK Data Service  <br />
University of Manchester <br />
May 2020

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Guide-to-using-this-resource" data-toc-modified-id="Guide-to-using-this-resource-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Guide to using this resource</a></span><ul class="toc-item"><li><span><a href="#Interaction" data-toc-modified-id="Interaction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Interaction</a></span></li><li><span><a href="#Learn-more" data-toc-modified-id="Learn-more-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Learn more</a></span></li></ul></li><li><span><a href="#Knowing-your-computational-environment" data-toc-modified-id="Knowing-your-computational-environment-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Knowing your computational environment</a></span><ul class="toc-item"><li><span><a href="#File-system-and-working-directory" data-toc-modified-id="File-system-and-working-directory-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>File system and working directory</a></span></li><li><span><a href="#Hardware-and-software" data-toc-modified-id="Hardware-and-software-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Hardware and software</a></span></li></ul></li><li><span><a href="#Bibliography" data-toc-modified-id="Bibliography-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Bibliography</a></span></li><li><span><a href="#Further-reading-and-resources" data-toc-modified-id="Further-reading-and-resources-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Further reading and resources</a></span></li></ul></div>

-------------------------------------

<div style="text-align: center"><i><b>This is notebook 4 of 6 in this lesson</b></i></div>

-------------------------------------

## Guide to using this resource

This learning resource was built using <a href="https://jupyter.org/" target=_blank>Jupyter Notebook</a>, an open-source software application that allows you to mix code, results and narrative in a single document. As <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>Barba et al. (2019)</a> espouse:
> In a world where every subject matter can have a data-supported treatment, where computational devices are omnipresent and pervasive, the union of natural language and computation creates compelling communication and learning opportunities.

If you are familiar with Jupyter notebooks then skip ahead to the main content (*Knowing your computational environment*). Otherwise, the following is a quick guide to navigating and interacting with the notebook.

### Interaction

**You only need to execute the code that is contained in sections which are marked by `In []`.**

To execute a cell, click or double-click the cell and press the `Run` button on the top toolbar (you can also use the keyboard shortcut Shift + Enter).

Try it for yourself:

In [None]:
name = input("Enter your name and press enter: ")
print()  # print a blank line
print("Hello {}, enjoy learning more about Python and computational social science!".format(name))

### Learn more

Jupyter notebooks provide rich, flexible features for conducting and documenting your data analysis workflow. To learn more about additional notebook features, we recommend working through some of the <a href="https://github.com/darribas/gds19/blob/master/content/labs/lab_00.ipynb" target=_blank>materials</a> provided by Dani Arribas-Bel at the University of Liverpool. 

## Knowing your computational environment

All computational social science activities depend on knowing how to set up, manage and share a computational environment (The Turing Way Community, 2019). This ranges from understanding how and where files are located on your machine to defining and documenting which software packages, versions and configurations are necessary to execute your data analysis.

Whether you are thinking about scraping a web page or implementing an advanced machine learning algorithm, it all begins with establishing your computational environment. First, let's understand how files are stored and accessed on your machine.

### File system and working directory

It is critical that you think *logically* and in an *organised* way about how you manage and store files for your project.

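The first thing to know is that files and folders stored on your machine's hard drive can be accessed in two ways:
* Absolute path: the full location, starting from the root of the file system
* Relative path: the location starting from the folder you are currently working in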

First, let's figure out where we are on our machine:

In [2]:
import os

os.getcwd()

'C:\\Users\\t95171dm\\projects\\comp-soc-sci\\code'

Next, let's look at what other files and folders exist where we currently are:

In [3]:
os.listdir()

['.ipynb_checkpoints',
 'bcss-code-2020-05-06.ipynb',
 'bcss-notebook-five-2020-02-12.ipynb',
 'bcss-notebook-four-2020-02-12.ipynb',
 'bcss-notebook-four-extended-2020-02-12.ipynb',
 'bcss-notebook-one-2020-02-12.ipynb',
 'bcss-notebook-three-2020-02-12.ipynb',
 'bcss-notebook-two-2020-02-12.ipynb',
 'data',
 'images',
 'README.md']

Roughly translated, this command says "Using os, list the contents of the directory (here)". If you did not run the commands in the previous cell block (the command to import os), you would get an error here. If so, make sure you go back and run the commands to import os and then try this command block again.
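
`os.listdir()` returns names only; it does not say which entries are folders and which are files. As a minimal sketch (the variable names `folders` and `files` are ours, and the output depends on what is in your own working directory), you could separate the two like this:

```python
import os

# Sort each entry of the current working directory into folders or files.
folders = []
files = []
for entry in sorted(os.listdir()):
    if os.path.isdir(entry):
        folders.append(entry)
    else:
        files.append(entry)

print("Folders:", folders)
print("Files:", files)
```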

Using the "data" folder as an example, let's view the absolute and relative paths to this directory:

In [4]:
absolute_path = os.path.abspath("data")
absolute_path

'C:\\Users\\t95171dm\\projects\\comp-soc-sci\\code\\data'

In [5]:
os.listdir(absolute_path)

['oxfam-csv-2020-03-16.csv',
 'oxfam-csv-2020-03-16.json',
 'oxfam-csv-2020-03-16.xml']

In [10]:
os.listdir("./data")

['oxfam-csv-2020-03-16.csv',
 'oxfam-csv-2020-03-16.json',
 'oxfam-csv-2020-03-16.xml']
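
The two listings return the same files because both paths point to the same folder. The following is a small sketch of how Python converts between the two kinds of path. The variable names are ours, and a folder called `data` does not need to exist for these path calculations to run:

```python
import os

# A relative path is interpreted from the current working directory.
relative_path = os.path.join(".", "data")

# An absolute path starts at the root of the file system.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True

# Converting back: express the absolute path relative to the working directory.
print(os.path.relpath(absolute_path, start=os.getcwd()))  # data
```

Relative paths keep a project portable, because they still work when the project folder is moved or copied to another machine; absolute paths are unambiguous but tied to one machine's folder layout.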

### Hardware and software
Your computational environment consists of hardware (e.g., the physical machine and its Central Processing Unit) and software (e.g., operating system, programming languages and their versions, files). For instance, here is a snapshot of the environment of one of our work computers as of 2020-03-30. First, the operating system:
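
One way to inspect the operating system is sketched here with Python's standard `platform` module. This is our choice of tool, the variable names are illustrative, and the values printed will differ from machine to machine:

```python
import platform

# Name of the operating system, e.g. 'Windows', 'Linux' or 'Darwin' (macOS).
os_name = platform.system()

# Release of the operating system.
os_release = platform.release()

# Machine type (processor architecture), e.g. 'AMD64' or 'x86_64'.
machine = platform.machine()

# A single string summarising the platform.
summary = platform.platform()

print(os_name, os_release, machine)
print(summary)
```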

And the version of Python running on the computer, plus some of the additional packages (libraries) that were installed:

In [11]:
import sys

sys.version # view current version of Python

'3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]'

In [12]:
!pip freeze # view installed modules on your machine

alabaster==0.7.12
altair==4.0.1
anaconda-client==1.7.2
anaconda-navigator==1.9.7
anaconda-project==0.8.3
asn1crypto==1.3.0
astroid==2.3.3
astropy==4.0
atomicwrites==1.3.0
attrs==19.3.0
Babel==2.8.0
backcall==0.1.0
backports.functools-lru-cache==1.6.1
backports.os==0.1.1
backports.shutil-get-terminal-size==1.0.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.8.2
bitarray==1.2.1
bkcharts==0.2
bleach==3.1.0
bokeh==2.0.0
boto==2.49.0
Bottleneck==1.3.2
certifi==2019.11.28
cffi==1.14.0
chardet==3.0.4
chart-studio==1.0.0
Click==7.0
cloudpickle==1.3.0
clyent==1.2.2
colorama==0.4.3
colorlover==0.3.0
comtypes==1.1.7
conda==4.8.2
conda-build==3.18.8
conda-package-handling==1.6.0
conda-verify==3.4.2
contextlib2==0.6.0.post1
cryptography==2.8
cufflinks==0.17.3
cycler==0.10.0
Cython==0.29.15
cytoolz==0.10.1
dask==2.12.0
decorator==4.4.2
defusedxml==0.6.0
Deprecated==1.2.9
dicttoxml==1.7.4
distributed==2.12.0
docutils==0.16
entrypoints==0.3
et-xmlfile==1.0.1
fastcache==1.1.0
...

In [13]:
sys.modules.keys() # view imported modules



Computational environments tend to be unique: for example, you may have different software applications installed on your machine compared to your classmate, or some machines in your computer lab may run Windows 10 and others Windows 7. This customisability presents considerable challenges for conducting, sharing and reproducing scientific work. In the words of The Turing Way Community (2019):
> The analysis should be *mobile*. Mobility of compute is defined as the ability to define, create, and maintain a workflow locally while remaining confident that the workflow can be executed elsewhere.

Trying and failing to reproduce a piece of work after switching to a new machine is, frankly, soul-destroying. Thankfully, there are numerous, simple technological solutions for capturing and sharing your computational environment.
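
One simple option is to save a list of installed packages and their versions alongside your project, so that a collaborator can see what your analysis depended on. The sketch below is written under our own assumptions: the function name `snapshot_environment` and the file name `environment-snapshot.txt` are hypothetical, and `importlib.metadata` requires Python 3.8 or later (on older versions, the output of `pip freeze` serves the same purpose).

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onwards


def snapshot_environment():
    """Return a sorted list of 'name==version' strings for installed packages."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip any distribution with missing metadata
            pins.add("{}=={}".format(name, dist.version))
    return sorted(pins, key=str.lower)


# Hypothetical output file name; choose whatever suits your project.
output_file = "environment-snapshot.txt"

with open(output_file, "w") as f:
    # Record the Python version first, then one package per line.
    f.write("# Python {}\n".format(sys.version.split()[0]))
    f.write("\n".join(snapshot_environment()) + "\n")
```

Storing this file with your code and data gives others a record of the environment in which your results were produced.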

## Bibliography

Barba, Lorena A. et al. (2019). *Teaching and Learning with Jupyter*. <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>https://jupyter4edu.github.io/jupyter-edu-book/</a>.

The Turing Way Community. (2019). *The Turing Way: A Handbook for Reproducible Data Science (Version v0.0.4)*. Zenodo. http://doi.org/10.5281/zenodo.3233986

## Further reading and resources

We highly recommend the materials referenced in the Bibliography. In addition, you may find the following useful:
* <a href="https://the-turing-way.netlify.app/reproducible_environments/reproducible_environments.html" target=_blank>The Turing Way: A Handbook for Reproducible Data Science [Chapter 10]</a>
* <a href="https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/26/python-virtual-env/" target=_blank>Python Virtual Environments</a>

<div style="text-align: right"><a href="./bcss-notebook-three-2020-02-12.ipynb" target=_blank><i>Previous section: Writing code</i></a> &nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;<a href="./bcss-notebook-five-2020-02-12.ipynb" target=_blank><i>Next section: Understanding and manipulating data</i></a></div>