Data analysis in social science 3, University of Exeter, 2020

Module outline


  • Wednesday, 9.30-11.30am, Queens LT4.2


  • Dr Alexey Bessudnov (a.bessudnov [at]
  • Teaching assistant: Yiyang Gao (yg319 [at]

Office hours

  • Location: Clayden 1.05
  • Monday, 10-11am
  • Friday, 10-11am
  • Yiyang's office hours: Clayden, Wednesday, 11.30am - 1.30pm

Aims of the module

This is the fourth module in the data analysis in social science series. In Introduction to Social Data you learned the basics of descriptive statistics and R. Data Analysis 1 introduced you to statistical inference. Data Analysis 2 covered linear regression analysis. In Data Analysis 3 we are not going to learn new statistical techniques, but will focus on how to apply the techniques you already know to the analysis of real-life data sets and how to produce good statistical reports.

This is a skill that you may need in a variety of jobs where data analytic expertise is required, such as market analysis, policy analysis in various fields, web analytics, data journalism, academic research, etc.

You already know how to use R to describe data and estimate simple statistical models. However, real-life data rarely come in the form of a perfectly formatted csv file ready for analysis. Real-life data sets often need to be reshaped, merged, recoded, aggregated and modified in various ways before you can even start your analysis. Unless you know how to do this, you will not be able to conduct independent statistical analysis.
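As a rough illustration of what reshaping, merging, recoding and aggregating can look like, here is a minimal sketch in base R. The data frames and variable names (`wide`, `persons`, `pid`, `inc_w1`, `inc_w2`, `sex`) are hypothetical toy examples, not actual Understanding Society variables, and base R is used only to keep the example self-contained.

```r
# Hypothetical toy data: income from two waves of a panel, in "wide" format
# (one row per person, one income column per wave).
wide <- data.frame(
  pid    = c(1, 2, 3),
  inc_w1 = c(1200, 1500, NA),
  inc_w2 = c(1300, 1450, 2000)
)

# Hypothetical person-level file with a numerically coded sex variable.
persons <- data.frame(
  pid = c(1, 2, 3),
  sex = c(1, 2, 2)
)

# Reshape: wide to long, so that there is one row per person per wave.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("inc_w1", "inc_w2"),
  v.names   = "income",
  timevar   = "wave",
  times     = 1:2,
  idvar     = "pid"
)

# Merge: attach person-level characteristics using the person identifier.
long <- merge(long, persons, by = "pid")

# Recode: turn the numeric codes into a labelled factor.
long$sex <- factor(long$sex, levels = c(1, 2), labels = c("male", "female"))

# Aggregate: mean income by sex and wave
# (the formula interface drops rows with missing income by default).
mean_income <- aggregate(income ~ sex + wave, data = long, FUN = mean)
print(mean_income)
```

The same four steps can be written in other ways; this sketch only shows the order in which they typically occur before any modelling starts.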

In this module we will use data from Understanding Society, a large household panel study conducted in the UK. We will work with longitudinal data, which introduces a number of technical challenges.

We will use R, and you are expected to know the basics of data analysis in R already. The pre-requisites for this module are POL/SOC1041 and POL/SOC2077.

The only way to learn data analysis is to do data analysis. I will not be able to teach you this, but I can guide your independent learning. I use the "flipped classroom" model of teaching. This means that you are expected to read and master the required material BEFORE you come to class, and we will often use class time to do exercises and check solutions rather than to introduce new material.


You will need to use your own laptops in class. Please install the following software.

  • R (please update your distribution if you already have it installed)
  • RStudio
  • Git
  • LaTeX

All this software is free.


This is a technical module, and it will require effort and time commitment from you. As with other technical skills, missing some initial bits means that you may not be able to catch up. Attendance in this module is crucial. If you do not attend classes you will not be able to do well in this module.


The assessment for this module is a statistical report of 2,000 words (50% of your mark) and five short statistical assignments (10% each).

Statistical assignments will usually be programming exercises. You will need to complete them using R and GitHub and submit them using GitHub Classroom, so that each assignment is a separate repository. You will also need to submit links to your repositories via eBart. I will explain the procedure in more detail in class, and we will have a test (not graded) submission to make sure you understand how it works.

I will release problems for assignments 6 days before each deadline. Assignments must be your individual work and you are not allowed to work on them together with other students.

The deadlines for statistical assignments are as follows.

  • Test submission: 28 January, 2pm (formative)
  • Assignment 1: 4 February, 2pm
  • Assignment 2: 11 February, 2pm
  • Assignment 3: 18 February, 2pm
  • Assignment 4: 3 March, 2pm
  • Assignment 5: 17 March, 2pm

The marking criteria for statistical assignments are correctness of your code and of substantive interpretations (where applicable).

The 3-week turnaround rule applies to statistical assignments, but our goal is to mark them and give you feedback as soon as possible.

For the final statistical report you will conduct independent analysis of the Understanding Society data and produce a report of 2,000 words describing the results of your analysis.

The deadline for the final statistical report is 28 April, 2pm. I will publish more details describing your task at least one month before the submission deadline. You will receive your marks and feedback by 12 June.

The marking criteria for statistical reports are the following: originality of approach, complexity of analysis, correctness of code, correctness of interpretations, knowledge of background literature, style and accuracy.

Students with ILPs

Students with ILPs sometimes request extensions for statistical assignments. Please note that you can only do the same assignment as other students if your extension is for no longer than one week. All deadlines for statistical assignments are set on Tuesdays, and the answers will be discussed in class on Wednesdays the following week. After the solutions have been discussed in class you can no longer submit the same assignment. In this case you need to contact me as soon as possible and you will be given a new assignment.


I assign homework for each class and you need to complete it before coming to class.

Syllabus plan

The plan below is flexible and I may change some topics and dates as we proceed.

  • 15 January. Introduction to the module.
  • 22 January. Data analysis workflow. Reproducible research. R Markdown. Understanding Society data.
  • 29 January. Data transformation.
  • 5 February. Relational data.
  • 12 February. Tidy data and reshaping.
  • 19 February. Data visualisation (1).
  • 26 February. Data visualisation (2).
  • 4 March. Conditional statements and iteration.
  • 11 March. Functions.
  • 18 March. Data types. Factors.
  • 25 March. Strings.

Reading list

The module has a website: Please note that the website accompanied the module as delivered in 2018, and in 2020 we will do some things differently.

The main text for this module is Grolemund and Wickham's R for Data Science.

Solutions for the exercises in R for Data Science:

For details on how to use R Markdown see:

The guide on using Git and GitHub with RStudio:

Data visualisation.

R Programming.

Machine learning.

There are many other resources that can help you with R. DataCamp is an online learning platform that covers most topics in this module. Also see a list of other resources here

Full documentation for Understanding Society is available at
