from IPython.display import HTML

def css_styling():
    # Read the course stylesheet and wrap it so the notebook renders it.
    with open("data/www/styles/custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()

# Computational Social Science: Methods and Applications

## First of all, Welcome!

The research world you are entering is markedly different from the one of 20 years ago. The computer, once supplemental, has become ubiquitous throughout research. Further, programming as a skill, whether to increase productivity, enhance reproducibility, or unlock innovative research programmes, has moved from nearly non-existent to almost expected.

In this course, we will attempt to define computational social science, where it came from and what it means now, and we will highlight and develop three main skill areas to aid you in independent research. These broad areas are:

 * The Web
 * Networks
 * Text analysis
 
Through primary research literature and hands-on work, we will develop both an understanding of research techniques in these areas and skill at executing them.

## Class style

During class time we will use active learning: coding together and solving exercises, with short periods of lecture and discussion. I structure the class this way because the act of 'programming' is incidental to the goal of learning how to conduct research in these areas. In class, I will go over the 'big idea' and discuss implementation, making sure we all start and end at the same place of understanding.


# Refreshing Python Fundamentals

#### Introduction

1. [What is Computational Social Science?](presentations/intro_lecture.pdf)
2. [Basic Data Types](lessons/fundamentals/Data-Types.ipynb)
3. [Collections](lessons/fundamentals/Lists-Tuples-and-Sets.ipynb)
4. [Homework](homeworks/data_types_homework.ipynb)


#### Data processing with hash maps

0. [Homework Peer Review](homeworks/peer_review.ipynb)
1. [Flow Control](lessons/fundamentals/Flow-Control.ipynb)
2. [Dictionaries](lessons/fundamentals/dictionaries.ipynb)
3. [Homework](homeworks/dictionaries_homework.ipynb)

#### Tabular data management

0. Homework Peer Review
1. [Pandas](lessons/fundamentals/Structured-Data-Analysis-Pt1.ipynb)
2. [Regression](lessons/fundamentals/Structured-Data-Analysis-Pt2.ipynb)
3. [Homework](homeworks/pandas_homework.ipynb)


# Accessing and understanding human behavior on the web

#### Digital trace data acquisition


1. [Digital trace data - APIs](lessons/web/Digital_Trace_Data-APIs.ipynb)
2. [Web scraping and crawling](lessons/web/Web-scraping.ipynb)
3. [Homework](homeworks/api_homework.ipynb)

#### Data extraction from the web

0. Homework Peer Review
1. [Web structure](lessons/web/Web-Structure.ipynb)
2. [Processing web pages](lessons/web/Processing_Web_Pages.ipynb)
3. [Homework](homeworks/html_extraction_homework.ipynb)
4. Project (home)work - Research project proposal

#### Data extraction from PDF

0. Homework Peer Review
1. [PDF Extraction](lessons/web/PDF_Extraction.ipynb)
2. Project (home)work - Project Roundtable

# Fundamentals of unstructured data

#### Structuring unstructured data

1. [Fundamentals of text processing](lessons/Text-processing.ipynb)
2. [Discussion - "A universal information theoretic approach to the identification of stopwords."](https://www.nature.com/articles/s42256-019-0112-6)
3. [Information theory and change](lessons/Information-Theory.ipynb) (On your own)


#### Connectedness
1. [Networks](lessons/Networks-graphtool.ipynb)
2. [Discussion - "Weaving the fabric of science: Dynamic network models of science's unfolding structure"](https://www.knowledgelab.org/publications/)

#### Prediction as a task
1. [Basics of sentiment analysis](lessons/Sentiment-Analysis.ipynb)
2. [Discussion - "Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter"](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0026752)
3. [Learned sentiment](lessons/Learned-Sentiment.ipynb)
4. [Scaling training data](lessons/Scaling-Training-Data.ipynb)

#### Social Networks
0. [Social Networks](lessons/Social-Networks.ipynb)
1. [Discussion - "Collective dynamics of ‘small-world’ networks"](https://www.nature.com/articles/30918)
2. [Discussion - "Emergence of Scaling in Random Networks"](https://science.sciencemag.org/content/286/5439/509.abstract?casa_token=V_GwJkY7SSIAAAAA:pZleXxZvZGzdJ22iqHNqCrI3-1os7zvXMEunDZ-HE9KDHD452VEmBUJ_OLXPxwoFTxQQEpitkYg)
3. [Discussion - "The spread of obesity in a large social network over 32 years."](https://www.nejm.org/doi/full/10.1056/nejmsa066082)



# Global structure of the unstructured

#### Mapping document clusters
1. [Discussion - "Latent Dirichlet Allocation"](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)
2. [Discussion - "Reading Tea Leaves"](https://proceedings.neurips.cc/paper/2009/file/f92586a25bb3145facd64ab20fd554ff-Paper.pdf)
3. [Topic maps](lessons/topic-maps.ipynb)

#### Text vectors and concept universality
1. [Discussion - "Distributed representations of words and phrases and their compositionality"](https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf)
2. [Discussion - "Semantics derived automatically from language corpora contain human-like biases"](https://science.sciencemag.org/content/356/6334/183)
3. [Word embeddings](lessons/text-vectorization.ipynb)

#### Week 8. NER and Disambiguation
1. [NER and Disambiguation](lessons/ner.ipynb)
2. [Discussion -- "PRESIDE: A Judge Entity Recognition and Disambiguation Model for US District Court Records"](https://www.dropbox.com/s/b5gv3ebtmlespe5/PRESIDE.pdf?dl=0)


#### Community Detection
1. [Discussion - "Benchmark graphs for testing community detection algorithms"](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.78.046110)
2. [Discussion - "A network approach to topic models"](https://advances.sciencemag.org/content/4/7/eaaq1360?intcmp=trendmd-adv&utm_source=TrendMD&utm_medium=cpc&utm_campaign=TrendMD_1)
3. [Community detection](lessons/Community-detection.ipynb)
<!-- [Null models and bootstrapping](lessons/Null-models.ipynb) -->





