<p align="right"><i>Data Analysis for the Social Sciences - Part II - 2022-10-31</i></p>

# Week 8 - Getting Started

Welcome to Part II of Data Analysis for the Social Sciences. In this lab session we will get set up for conducting our first substantial piece of quantitative data analysis (Assessment 2).

We will focus on the following:
1. Installing R and RStudio
2. Setting up a project folder
3. Creating and executing a syntax file
4. Formulating / choosing a research question
5. Understanding the requirements for Assessment 2

### Lesson details

* **Level**: Introductory, for individuals with no prior knowledge or experience of quantitative data analysis.
* **Duration**: 45-60 minutes.
* **Pre-requisites**: None.
* **Programming language**: R.
* **Learning outcomes**:
	1. Understand how to set up your computing environment for conducting quantitative data analysis.
	2. Understand how to produce a robust, high-quality report for Assessment 2.

## Installing R and RStudio

To conduct your own quantitative data analysis you will need two pieces of free software installed on your machine:
1. **R** - this is a programming language that has been customised for statistical analysis.
2. **RStudio** - this is a programming environment that provides a user-friendly interface for running R code.

### Installing R

Installing *R* is, thankfully, much simpler than it used to be. Head to https://www.stats.bris.ac.uk/R/ and select the correct link for your operating system.

For example, if you are using a Windows machine select the [Download R for Windows](https://www.stats.bris.ac.uk/R/bin/windows/base/R-4.2.1-win.exe) option.

Follow the download instructions and *R* should be installed in no time.

### Installing RStudio

Head to https://www.rstudio.com/products/rstudio/download/ and select the **RStudio Desktop (Free)** option.

Select the correct link for your operating system. Often this appears at the top of the page as a large, blue button.

For example, if you are using a Windows machine you should see a [DOWNLOAD RSTUDIO FOR WINDOWS](https://download1.rstudio.org/desktop/windows/RStudio-2022.07.2-576.exe) option.

Follow the download instructions and *RStudio* should be installed in no time.
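
Once both are installed, a quick way to confirm that everything works is to open RStudio and type a couple of lines into the console. The following is only a minimal sketch; the version string you see depends on the release you installed.

```r
# Print the version of R that RStudio is using
print(R.version.string)

# A simple calculation to confirm that code executes
print(1 + 1)
```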

## Setting up a project folder

It is important that you organise your data analysis files and workflow. Even for a self-contained project like Assessment 2, you are likely to generate / collect a number of files including:
* The dataset underpinning the assessment
* The syntax files associated with each week's lesson
* The syntax file that produces your data analysis for the assessment report
* The codebook and any example reports
* Miscellaneous files and documents

Therefore it saves a lot of time and stress to put some thought into a sensible folder structure for your assessment. Say you are using your personal machine to conduct the data analysis; the following is an example folder structure you could use:

![Example folder structure](./images/example-folder-structure.png)
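
If you prefer, a folder structure like this can also be created from *R* itself. The sketch below is illustrative only: the helper `create_project_folders()` and the folder names (`data`, `syntax`, `docs`, `reports`) are assumptions made for this example, not names required by the assessment, and the example builds the folders in a temporary location so it is safe to run as-is.

```r
# Hypothetical helper: creates a project folder with a few sub-folders.
# The folder names are illustrative assumptions, not required names.
create_project_folders <- function(root) {
  subfolders <- c("data", "syntax", "docs", "reports")
  for (folder in subfolders) {
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # Return the names of the folders that now exist under the project root
  list.dirs(root, full.names = FALSE, recursive = FALSE)
}

# Example: build the structure in a temporary location
# (replace with a path of your choosing, e.g., "~/assessment-2")
project_root <- file.path(tempdir(), "assessment-2")
created <- create_project_folders(project_root)
print(created)
```

Creating folders by hand in your file explorer works just as well; the point is to decide on a structure before the files start piling up.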

## Opening and exploring RStudio

Please see the excellent advice below for getting started with RStudio.

https://crimebythenumbers.com/intro-to-r.html#using-rstudio

## Creating and executing a syntax file

A syntax file - or script in RStudio - is a text document containing the *R* code that produces your data analysis.

Please see Christopher Barrie's (2022) concise guidance for creating and executing a syntax file in RStudio (ignore the *Loading packages* section).

https://cjbarrie.github.io/CS-ED/intror.html#getting-started-in-rstudio

You may wonder why we use syntax files: surely we can just type the code into the *R* / RStudio consoles, or use a dropdown menu? As I'm sure you can appreciate, these approaches suffer from a lack of **reproducibility**. That is, you need to retype the code (or repeat the menu clicks) each time you want to execute it, which is inefficient and invites errors.

With a syntax file you only need to write the code once, and it is then executable whenever you need it. Even better, you can reuse code elsewhere in the syntax file by copying and pasting it, e.g., executing the `summary()` function multiple times but for different variables.

The final major benefit is that all of your analysis is contained in a single file that can be saved and shared. The benefit of this will become apparent as you start writing up your analysis for the assessment report.
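
To make the idea of reusing code concrete, here is a minimal sketch of what a few lines of a syntax file might look like. It assumes the small `mtcars` example dataset that ships with *R*, which stands in here for the assessment dataset; the variables chosen are arbitrary.

```r
# mtcars is a small example dataset that ships with R;
# it stands in here for the assessment dataset.
data(mtcars)

# The same function, reused for different variables
print(summary(mtcars$mpg))  # miles per gallon
print(summary(mtcars$wt))   # weight
print(summary(mtcars$hp))   # horsepower
```

Saved in a syntax file, these lines can be re-run at any time and will produce the same results.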

### Opening a syntax file

Once a syntax file exists on your machine, it is a simple task to open it in RStudio. For example, if you had a syntax file named *data-analysis.R*, you would do the following:

`File --> Open File --> data-analysis.R`

See the screenshots below for a visual representation of this process, this time for opening the syntax file associated with the Week 3 lab session.

![Opening a syntax file - step one](./images/rstudio-open-syntax-file.png)

![Opening a syntax file - step two](./images/rstudio-open-syntax-file-step-two.png)

![Opening a syntax file - step three](./images/rstudio-open-syntax-file-step-three.png)
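
A syntax file can also be executed from the console with the `source()` function. The sketch below is self-contained for illustration: it first writes a tiny two-line script to a temporary folder (an assumption made so the example runs anywhere) and then executes it. In practice you would point `source()` at your own file, such as *data-analysis.R*.

```r
# Create a tiny example syntax file in a temporary folder
# (in practice this file would already exist in your project folder)
script_path <- file.path(tempdir(), "data-analysis.R")
writeLines(c("x <- c(2, 4, 6)", "mean_x <- mean(x)"), script_path)

# Execute every line of the syntax file; echo = TRUE prints each
# command as it runs
source(script_path, echo = TRUE)

# Objects created by the script are now available in the session
print(mean_x)
```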

## Research questions

A smart piece of quantitative data analysis needs a focus: many secondary datasets of interest contain hundreds of variables and tens of thousands of observations. As social researchers we are not interested in going on a "fishing expedition", conducting meandering, unfocused analyses. Specifying a well-defined research question provides a focus and framework for our analysis.

Therefore you need to formulate or select a research question to answer for Assessment 2. There are two ways you can do this:
1. Select one of the [questions suggested by the UK Data Service](./research-questions/8735_natsal_suggestions_for_research_questions.pdf).
2. Explore the [*Natsal* codebook](./codebook/8735_natsal_teaching_codebook_v1.pdf) for a list of variables that may be of interest, and formulate your own question around them.

The only research question you cannot choose is the example we have been using throughout this module:

<p><center><i>Is religion associated with differences in sexual attitudes and behaviours among British people?</i></center></p>

## Assessment 2 requirements

Assessment 2 requires you to complete a 3000-word report based on a piece of quantitative data analysis (Scenario A).

To successfully complete this assessment, follow this framework:
1. Choose a research question.
2. Using the codebook, select 5-7 variables that help you answer this research question. One or two of these variables should be designated your **dependent variable(s)** and the others should be designated your **independent variables**.
3. Ensure R and RStudio are set up on your machine, and you have a sensible project folder created that contains the dataset, documentation, syntax files etc.
4. Conduct your data analysis using a syntax file (i.e., no drop-down menus); specifically we are looking for three types of analysis:
    * Univariate analysis
    * Bivariate analysis
    * Multivariate analysis
5. Write your report by following the structure and guidance provided in Week 12. You should also consult the [example report]() for inspiration.
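
As a rough illustration of the three types of analysis named in step 4, the sketch below uses the built-in `mtcars` dataset as a stand-in for the assessment dataset. The techniques shown (summaries, a correlation, a cross-tabulation and a linear regression) are common examples chosen for this sketch; they are assumptions, not a statement of the techniques required for Assessment 2.

```r
# mtcars ships with R and stands in for the assessment dataset
data(mtcars)

# 1. Univariate analysis: describe one variable at a time
print(summary(mtcars$mpg))   # numeric variable
print(table(mtcars$cyl))     # categorical variable (frequency table)

# 2. Bivariate analysis: relationship between two variables
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)   # correlation coefficient
print(mpg_wt_cor)
print(table(mtcars$cyl, mtcars$am))        # cross-tabulation

# 3. Multivariate analysis: one dependent variable (mpg) and
#    several independent variables (wt and hp)
model <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(model))
```

Note how the dependent variable sits on the left of the `~` in the model formula and the independent variables on the right, mirroring the designation made in step 2.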

### Data resources

You can find the dataset needed for your assessment on the *Quantitative Data* page under the Journey tab on [Aula](https://uws.aula.education/?#/dashboard/66f737f1-69c9-4e50-90aa-037f35c2d6ce/journey/materials/671e0a11-9b6f-4d5e-89c6-14c04045c883).