<p align="right"><i>Data Analysis for the Social Sciences - Part II - 2022-10-31</i></p>

# Week 8 - Getting Started

Welcome to Part II of Data Analysis for the Social Sciences. In this lab session we will get set up for conducting our first substantial piece of quantitative data analysis (Assessment 2).

We will focus on the following:
1. Installing R and RStudio
2. Setting up a project folder
3. Creating and executing a syntax file
4. Formulating / choosing a research question
5. Understanding the requirements for Assessment 2

### Lesson details

* **Level**: Introductory, for individuals with no prior knowledge or experience of quantitative data analysis.
* **Duration**: 45-60 minutes.
* **Pre-requisites**: None.
* **Programming language**: R.
* **Learning outcomes**:
	1. Understand how to set up your computing environment for conducting quantitative data analysis.
	2. Understand how to produce a robust, high-quality report for Assessment 2.

## Guide to using this notebook

This learning resource was built using <a href="https://jupyter.org/" target=_blank>Jupyter Notebook</a>, an open-source software application that allows you to mix code, results and narrative in a single document. As <a href="https://jupyter4edu.github.io/jupyter-edu-book/" target=_blank>Barba et al. (2019)</a> put it:
> In a world where every subject matter can have a data-supported treatment, where computational devices are omnipresent and pervasive, the union of natural language and computation creates compelling communication and learning opportunities.

If you are familiar with Jupyter notebooks then skip ahead to the main content (*Installing R and RStudio*). Otherwise, the following is a quick guide to navigating and interacting with the notebook.

### Interaction

**You only need to execute the code contained in cells marked by `[]`.**

To execute a cell, click or double-click it and press the `Play` button next to the cell, or select the `Run` button on the top toolbar (*Runtime > Run the focused cell*). You can also use the keyboard shortcuts `Shift + Enter` or `Ctrl + Enter`.

Try it for yourself:

In [None]:
name <- readline(prompt="Enter name: ")
print(paste("Hi,", name, "enjoy learning more about R and exploring data!"))

Notebooks are sequential, meaning code should be executed in order (top to bottom). For example, the following code won't work:

In [None]:
x * 5

As the error message suggests, there is no object (variable) called `x`, therefore we cannot do any calculations with it. 

Let's try a sequential approach:

In [None]:
x <- 10 # create an object called 'x' and give it the value '10'

In [None]:
x * 5 # multiply 'x' by 5

### Learn more

Jupyter notebooks provide rich, flexible features for conducting and documenting your data analysis workflow. To learn more about additional notebook features, we recommend working through some of the <a href="https://github.com/darribas/gds19/blob/master/content/labs/lab_00.ipynb" target=_blank>materials</a> provided by Dani Arribas-Bel at the University of Liverpool. 

### Learner input

Throughout the lessons there are times when you will need to do the following activities:
* **TASK:** A coding task for you to complete (e.g. create new variables).
* **QUESTION:** A question regarding your interpretation of some code or a technique (e.g. what is the piece of code doing?).
* **EXERCISE:** A data analysis challenge for you to complete.

## Installing R and RStudio

To conduct your own quantitative data analysis you will need two pieces of free software installed on your machine:
1. **R** - a programming language and environment designed for statistical computing and graphics.
2. **RStudio** - a programming environment that provides a user-friendly interface for writing and running R code.

### Installing R

Thankfully, this is a much simpler task these days. Head to https://www.stats.bris.ac.uk/R/ and select the correct link for your operating system.

For example, if you are using a Windows machine select the [Download R for Windows](https://www.stats.bris.ac.uk/R/) option.

Follow the download instructions and *R* should be installed in no time.

### Installing RStudio

Head to https://www.rstudio.com/products/rstudio/download/ and select the **RStudio Desktop (Free)** option.

Select the correct link for your operating system. Often this appears at the top of the page as a large, blue button.

For example, if you are using a Windows machine you should see a [DOWNLOAD RSTUDIO FOR WINDOWS](https://download1.rstudio.org/desktop/windows/RStudio-2022.07.2-576.exe) option.

Follow the download instructions and *RStudio* should be installed in no time.
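Once both are installed, you can check that RStudio has found R by opening RStudio and typing the lines below into the Console pane. This is a quick sanity check rather than a required step; the exact version string you see will depend on the release you downloaded.

```r
# Run this in the RStudio Console once R and RStudio are installed.
# It prints the version of R that RStudio is using (a line beginning "R version").
R.version.string

# For more detail (R version, operating system, loaded packages):
sessionInfo()
```

If either command produces an error, R has probably not been installed correctly, so revisit the *Installing R* step.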

## Setting up a project folder

In [1]:
# VIDEO TBA
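Until the video is available, the sketch below shows one minimal way to create a project folder from R itself. The folder name `assessment-2` and the subfolder names are hypothetical suggestions, not a required structure; RStudio also has its own Projects feature (*File > New Project*), which may be the approach the video takes instead.

```r
# A minimal sketch of one possible project folder layout.
# The folder and subfolder names are hypothetical suggestions only.
getwd()  # the folder R is currently working in; the project is created inside it

project_dir <- "assessment-2"
subfolders  <- c("data", "documentation", "syntax", "output")

for (folder in subfolders) {
  # recursive = TRUE also creates project_dir if it does not exist yet;
  # showWarnings = FALSE stays quiet if a folder already exists
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

list.files(project_dir)  # check the subfolders were created
```

Keeping the dataset, documentation (e.g. the codebook), syntax files and outputs in separate subfolders makes it easier to find things later and to re-run your analysis from scratch.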

## Opening and exploring RStudio

Please see the excellent advice below for getting started with RStudio.

https://crimebythenumbers.com/intro-to-r.html#using-rstudio

## Creating and executing a syntax file

In [1]:
# VIDEO TBA
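A syntax file is simply a plain-text file of R code (usually with a `.R` extension) that you can save and re-run at any time. Normally you would create one in RStudio via *File > New File > R Script*, type your code into it and run lines with `Ctrl + Enter`. The sketch below instead creates a tiny syntax file from R and then executes it with `source()`, so that both steps are visible in code; the file name `example-syntax.R` and the ages are made up for illustration.

```r
# The contents of a (hypothetical) syntax file, one line of code per element
syntax_lines <- c(
  "# example-syntax.R: a tiny syntax file",
  "ages <- c(23, 35, 41, 29, 52)  # made-up ages for illustration",
  "mean_age <- mean(ages)",
  "print(mean_age)"
)

# Save the syntax file (here in a temporary folder; in practice, your syntax/ folder)
syntax_file <- file.path(tempdir(), "example-syntax.R")
writeLines(syntax_lines, syntax_file)

# Execute every line of the file, top to bottom; echo = TRUE shows each line as it runs
source(syntax_file, echo = TRUE)
```

The last line of the file prints `[1] 36`, the mean of the five ages. Because the whole analysis lives in the file, anyone (including you, weeks later) can reproduce it exactly.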

## Research questions

A smart piece of quantitative data analysis needs a focus: many secondary datasets of interest contain hundreds of variables and tens of thousands of observations. As social researchers we are not interested in going on a "fishing expedition", conducting meandering, unfocused analyses. Specifying a well-defined research question provides a focus and framework for our analysis.

Therefore you need to formulate or select a research question to answer for Assessment 2. There are two ways you can do this:
1. Select one of the [questions suggested by the UK Data Service](./research-questions/8735_natsal_suggestions_for_research_questions.pdf).
2. Formulate your own question by exploring the [*Natsal* codebook](./codebook/8735_natsal_teaching_codebook_v1.pdf) for variables that may be of interest.

The only research question you cannot choose is the example we have been using throughout this module:

<p><center><i>Is religion associated with differences in sexual attitudes and behaviours among British people?</i></center></p>

## Assessment 2 requirements

Assessment 2 requires you to complete a 3000-word report based on a piece of quantitative data analysis (Scenario A).

To successfully complete this assessment, follow this framework:
1. Choose a research question.
2. Using the codebook, select 5-7 variables that help you answer this research question. One or two of these variables should be designated your **dependent variable(s)** and the others should be designated your **independent variables**.
3. Ensure R and RStudio are set up on your machine, and that you have created a sensible project folder containing the dataset, documentation, syntax files etc.
4. Conduct your data analysis using a syntax file (i.e., no drop-down menus); specifically we are looking for three types of analysis:
    * Univariate analysis
    * Bivariate analysis
    * Multivariate analysis
5. Write your report by following the structure and guidance provided in Week 12. You should also consult the [example report]() for inspiration.
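To give a rough feel for what the three types of analysis in step 4 can look like in a syntax file, here is a sketch on a small simulated data frame. The variable names (`age`, `sex`, `degree`, `outcome`) are hypothetical and are *not* Natsal variables, and logistic regression is just one common multivariate technique, not necessarily the one covered later in the module.

```r
# Hypothetical simulated data: NOT Natsal variables, made up for illustration
set.seed(42)  # makes the random numbers reproducible
n <- 200
survey <- data.frame(
  age     = sample(18:74, n, replace = TRUE),
  sex     = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  degree  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  outcome = factor(sample(c("No", "Yes"), n, replace = TRUE))  # dependent variable
)

# Univariate analysis: describe one variable at a time
summary(survey$age)
prop.table(table(survey$outcome))

# Bivariate analysis: the relationship between two variables
tab <- table(survey$degree, survey$outcome)
prop.table(tab, margin = 1)  # row proportions
chisq.test(tab)              # chi-square test of association

# Multivariate analysis: several independent variables at once
# (binary logistic regression; "Yes" is modelled as the outcome of interest)
model <- glm(outcome ~ age + sex + degree, data = survey, family = binomial)
summary(model)
```

Because the data are random, none of these results mean anything; the point is the shape of the workflow, moving from describing single variables to modelling several together.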

### Data resources

You can find the dataset needed for your assessment on the *Quantitative Data* page under the Journey tab on [Aula](https://uws.aula.education/?#/dashboard/66f737f1-69c9-4e50-90aa-037f35c2d6ce/journey/materials/671e0a11-9b6f-4d5e-89c6-14c04045c883).