This course is aimed at professionals and students in the Social Sciences who are interested in using quantitative methods to analyze big data, with emphasis on the analysis of unstructured data and social networks. The objective is to provide a systematic exposition of the fundamental concepts for those who want to participate in the increasingly close link between Data Analysis and the Social Sciences.
The seminar is structured as 10 three-hour lectures. Each lecture will be complemented by a laboratory/practical session in the afternoon. The course will be taught in R/RStudio.
Online classes; all times are Mexico City time
Date: August 30th to September 10th, 2021
Schedule: 9:00 to 12:00
Lab Sessions: 13:00 to 15:00
Language: Taught in English
I am a Researcher at the @Twitter Civic Integrity and Misinformation Team and a Ph.D. Candidate (expected October 2021) in the Department of Government and Politics at the University of Maryland, College Park. My research lies in the fields of Comparative Political Behavior and Political Communication. My dissertation explores issues of criminal violence, inequality, and preferences for harsh-on-crime policies in Latin America, while my agenda in political communication focuses on digital field and survey experimentation, news sharing, behavioral effects of social media consumption, and empirical applications of text-as-data models.
Throughout my Ph.D., I have been a leading member of the Interdisciplinary Laboratory for Computational Social Science (iLCSS) at UMD. While at the iLCSS, I have collaborated on a variety of projects, several of which are already published in top-ranked journals. My research has been published in Electoral Studies, Digital Journalism, Journal of Elections, Public Opinion and Parties, Latin American Politics and Society, Journal of Quantitative Description: Digital Media, and The SAGE Handbook of Research Methods in Political Science and International Relations. My projects have been funded through grants and fellowships from the University of Maryland, the Russell Sage Foundation, EGAP, the Inter-American Development Bank, and CAPES (Brazilian Government).
I am also passionate about teaching and sharing my experience in computational social science with colleagues. I have taught several workshops at both the graduate and undergraduate levels, including a full-semester seminar on Introduction to Computational Social Science for undergraduate students. I was also the organizer of the first Summer Institute in Computational Social Science in Brazil, held in the summer of 2021.
You can contact me by email (ventura@umd.edu), and you can learn more about my research on my website.
During week one, we cover several topics I would put under a "Crash Course in Data Science" umbrella. We will start with a broad introduction to computational social science. Then we will go through an introduction to the basics of R, data manipulation, functional programming, and the concept of tidy data. After that, we will turn to what I consider one of the key skills a data scientist should have: how to access, download, and work with digital data. We will cover both web scraping and APIs. To conclude, I will give you an intro to text analysis in R.
The structure of the course is simple. Mornings will focus on lectures and coding examples. We will embed some coding exercises in the morning lectures; however, for most of that time I will walk you through code and presentations. In the afternoons, you will have some practice questions, as well as the opportunity to work through the code we discussed during the morning lectures.
Our syllabus is available here. The presentations and code for each section are listed below.
I prepared a tutorial to help you install the software we will be using in this workshop. Please run through this tutorial before the first day of class.
The tutorial is here
- Introduction to Computational Social Science (Presentation)
- Intro to R (Presentation, Code)
Exercises and Readings for the Lab
- Chapter 5 of Hands-On Programming with R
- Carpentry Classes on Data Types and Structures, Data Subsetting, Read and Write .csv files
- Introduction to Tidyverse (Presentation, Code)
Exercises and Readings for the Lab
R for Data Science
- Ch. 5: Data Transformation - r4ds
- Ch. 13: Relational Data - r4ds
- Ch. 18: Pipes - r4ds
- Ch. 10: Tibbles - r4ds
- Tidyverse II: Tidy Data + Strings in R (Presentation, Code)
- Functional Programming + Loops (Presentation, Code)
Exercises and Readings for the Lab
R for Data Science:
Rebecca Barter's tutorials on purrr and scoped verbs in R
- Web Scraping (Presentation, Code)
- APIs (Presentation, Code)
- Twitter API (Presentation, Code)
Exercises and Readings for the Lab
Check out all the excellent materials by Chris Bail on text analysis in R.
- Text analysis with tidy text (Presentation, Code, Data)
Exercises and Readings for the Lab
Text Mining with R
- Ch. 1: The tidy text format
- Ch. 2: Sentiment analysis with tidy data
- Ch. 3: Analyzing word and document frequency: tf-idf
- Ch. 5: Converting to and from non-tidy formats
- Ch. 6: Topic modeling
All the materials are available in this repo. If you are familiar with git, you can clone the repository locally and get access to everything in one place.
This course will use R, which is a free and open-source programming language primarily used for statistics and data analysis. We will also use RStudio, which is an easy-to-use interface to R. Make sure you install R and RStudio before the first day of class.