from IPython.display import HTML

def css_styling():
    # Load the custom stylesheet and wrap it so the notebook renders it.
    with open("data/www/styles/custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()

# Computational Social Science: Methods and Applications

## First of all, Welcome!

The research world you are entering is markedly different from the one of 20 years ago. What was once supplemental, the computer, has become ubiquitous throughout research. Further, the need for programming as a skill, whether to increase productivity, enhance reproducibility, or unlock innovative research programmes, has moved from nearly non-existent to almost expected.

As a part of this course, we will attempt to define computational social science, where it came from and what it means now, and we will highlight and develop three main skill areas to aid you in independent research. These broad areas are:

 * The web
 * Networks
 * Text analysis
 
Through primary research literature and hands-on work, we will develop both an understanding of research techniques in these areas and the skill to execute them.

## Class style

During class time we will use active learning, coding together and solving exercises, with short periods of lecture and discussion. I structure the class this way because the act of 'programming' is incidental to the goal of learning how to conduct research in these areas. In class, I will go over the 'big idea' and discuss implementation (making sure we all start and end at the same place of understanding).

## Before class preparation

As stated in the syllabus, this is **not** the class where you are taught how to program. You **must** already know the basics of how to program in Python. I do not expect you to be an expert, but I do expect that you can figure out how to solve basic issues and errors. 

My main focus in this class is maintaining momentum and equipping you with the skills necessary to confidently conduct independent research using these techniques. To accomplish this goal, there will not be time to help with remedial lessons on how to program. 

Because programming is very similar to speaking a foreign language, I have built some 'flex' time into the first two weeks for everyone to 'dust some cobwebs' from their brain. However, this time will not exist after that period, and students are expected to keep pace.

If you feel like you need a directed guide to get back into the swing of things, I recommend the materials I use to teach my introductory class on programming and data analysis (available at http://bit.ly/nico101). Similarly, if you are curious about what I consider to be the 'basics' of programming, you can refer to the topics covered in that course.

# Course Grades

* **Assignments (40%)** - Three programming assignments and one research project proposal. 

* **Participation (10%)** - One paper presentation and in-class discussion.

* **Final Project (50%)** - You will conduct an independent research project that involves the analysis of novel data. This project will commence in week 4 and will be due at the end of the quarter. The deliverable will be a paper, written in academic style, that describes your research question and results. A write-up of how you arrived at these conclusions (i.e. how you coded it) will comprise the methods section.


# Accessing and understanding human behavior on the web

#### Week 0. Introduction and digital trace data
1. [What is Computational Social Science?](presentations/intro_lecture.pdf)
2. [Digital trace data - APIs](lessons/Digital_Trace_Data-APIs.ipynb)
1. [Web scraping and crawling](lessons/Web-scraping.ipynb)
4. [Homework 1](homeworks/Week1Homework.ipynb)
5. Homework quiz (save the HTML of your result for next week): http://hexaco.org/hexaco-online

#### Week 1. Data extraction from digital trace data
0. [Homework 1 Peer Review](homeworks/peer_review.ipynb)
1. [Discussion - "The role of mentorship on protégé performance."](https://www.nature.com/articles/nature09040)
1. [Web structure](lessons/Web-Structure.ipynb)
1. [Discussion - "Computational social science: Obstacles and opportunities"](https://science.sciencemag.org/content/369/6507/1060)
2. [Processing web pages](lessons/Processing_Web_Pages.ipynb)
3. [Homework 2](homeworks/Week2Homework.ipynb)


# Fundamental analysis of our words

#### Week 2. Structuring unstructured data
1. Project Proposal Discussion
1. [Fundamentals of text processing](lessons/Text-processing.ipynb)
2. [Discussion - "A universal information theoretic approach to the identification of stopwords."](https://www.nature.com/articles/s42256-019-0112-6)
2. [Information theory and change](lessons/Information-Theory.ipynb) (On your own)
2. [Homework 4](homeworks/Week4Homework.ipynb)
3. Homework 3 - Research project proposal

#### Week 3. Connectedness
1. Homework 4 Peer Review
1. [Networks](lessons/Networks-graphtool.ipynb)
2. [Discussion - "Weaving the fabric of science: Dynamic network models of science's unfolding structure"](https://www.knowledgelab.org/publications/)

#### Week 4. Prediction as a task
1. Project Check-in
2. [Basics of sentiment analysis](lessons/Sentiment-Analysis.ipynb)
3. [Discussion - "Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter"](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0026752)
4. [Learned sentiment](lessons/Learned-Sentiment.ipynb)
5. [Scaling training data](lessons/Scaling-Training-Data.ipynb)

#### Week 5. Social Networks
0. [Social Networks](lessons/Social-Networks.ipynb)
1. [Discussion - "Collective dynamics of ‘small-world’ networks"](https://www.nature.com/articles/30918)
1. [Discussion - "Emergence of Scaling in Random Networks"](https://science.sciencemag.org/content/286/5439/509.abstract)
2. [Discussion - "The spread of obesity in a large social network over 32 years."](https://www.nejm.org/doi/full/10.1056/nejmsa066082)



# Global structure of the unstructured

#### Week 6. Mapping document clusters
1. [Discussion - "Latent Dirichlet Allocation"](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)
1. [Discussion - "Reading Tea Leaves"](https://proceedings.neurips.cc/paper/2009/file/f92586a25bb3145facd64ab20fd554ff-Paper.pdf)
1. [Topic maps](lessons/topic-maps.ipynb)

#### Week 7. Text vectors and concept universality
1. [Discussion - "Distributed representations of words and phrases and their compositionality"](https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf)
2. [Discussion - "Semantics derived automatically from language corpora contain human-like biases"](https://science.sciencemag.org/content/356/6334/183)
1. [Word embeddings](lessons/text-vectorization.ipynb)

#### Week 8. Community Detection
1. [Discussion - "Benchmark graphs for testing community detection algorithms"](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.78.046110)
2. [Discussion - "A network approach to topic models"](https://advances.sciencemag.org/content/4/7/eaaq1360)
1. [Community detection](lessons/Community-detection.ipynb)
<!-- [Null models and bootstrapping](lessons/Null-models.ipynb) -->





