MACS 30000: Perspectives on Computational Analysis (Autumn 2018)
| |Dr. Richard Evans|Joshua G. Mausolf (TA)|Nora Nickels (TA)|
|---|---|---|---|
|Office|208 McGiffert House|204 McGiffert House|205 McGiffert House|
|Office Hours|Tu 10:30a-12:30p|M 1:30p-3:00p|W 2:00p-4:00p|
- Meeting day/time: MW 11:30a-1:20p, 247 Saieh Hall for Economics
- Lab session: W 4:30-5:20p, 247 Saieh Hall for Economics
- Office hours also available by appointment
Course Description, Objectives, and Outcomes
Computational Social Science (CSS) combines the theoretical paradigms of the social sciences with the expanded data and computational methods of computer science. Massive digital traces of human behavior and ubiquitous computation have both extended and altered classical social science inquiry. This course surveys successful social science applications of computational approaches to the representation of complex data, information visualization, and model construction and estimation. We will examine the scientific method in the social sciences in the context of both theory development and testing, exploring how computation and digital data enable new answers to classic investigations, the posing of novel questions, and new ethical challenges and opportunities. Students will review fundamental research designs such as observational studies and experiments, statistical summaries, visualization of data, and how computational opportunities can enhance them. The focus of the course is on exploring the wide range of contemporary approaches to computational social science, with problem sets, programming exercises, and written assignments to gain experience with these methods.
- You will be introduced to the major research paradigms in computational social science.
- You will read recent seminal papers in CSS.
- You will begin to practice implementing CSS methods through assignments.
- You will write analytical assessments of papers, methods, and approaches.
- [S2018] Salganik, Matthew J., Bit by Bit: Social Research in the Digital Age, Princeton University Press, 2018. free online version
- You should buy a copy of this book BECAUSE there is a free online version. It will also be a valuable reference in your personal library, and will remain relevant for many years.
Grades will be based on your performance on nine assignments, each of which is worth 10 points.
Homework: I will give you 9 assignments. Some of these will be writing assignments. Some of these will be computational exercises.
- Assignments will be given on the day listed in the Daily Course Outline section of this syllabus (see below). In general, assignments will be due before class at 11:30am a week after they are assigned. However, exact due dates and times will be listed on the assignment.
- You must submit your assignments by committing and pushing them to your fork of this GitHub repository on your personal GitHub account in the appropriate folder (e.g.,
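The submission step described above can be sketched as a small helper that assembles the usual git commands. This is only an illustration: the folder name `Assignments/A1`, the commit message, and the `origin` and `master` remote and branch names are assumptions, not values taken from this syllabus. The helper only prints the commands and does not run them.

```python
import shlex


def submission_commands(folder, message, remote="origin", branch="master"):
    """Build the git commands that stage, commit, and push one folder.

    All argument values are supplied by the caller; the defaults for
    `remote` and `branch` are assumptions about a typical fork.
    """
    return [
        ["git", "add", folder],            # stage the assignment folder
        ["git", "commit", "-m", message],  # record the change locally
        ["git", "push", remote, branch],   # send the commit to your fork
    ]


# Hypothetical folder name and message, for illustration only.
for cmd in submission_commands("Assignments/A1", "Submit assignment A1"):
    print(shlex.join(cmd))
```

Because the late penalty is based on the time stamp of the last commit, it is worth committing and pushing well before the deadline.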
Plagiarism on writing assignments: Josh and Nora held a Wednesday night lab on what constitutes plagiarism and how to avoid it. Academic honesty is an extremely important principle in academia and at the University of Chicago. See the course Canvas site library reserves for two chapters on plagiarism.
- Writing assignments must put in quotes and cite any excerpts taken from another work.
- If the cited work is the particular paper referenced in the Assignment, no works cited or references are necessary at the end of the composition.
- If the cited work is not the particular paper referenced in the Assignment, you MUST include a works cited or references section at the end of the composition.
- Any copying of other students' work will result in a zero grade and potential further academic discipline.
Late Problem Sets
Late problem sets will be penalized 1 point for every hour they are late. For example, if an assignment is due on Monday at 11:30am, the following points will be deducted based on the time stamp of the last commit.
|Example PR last commit|Points deducted|
|---|---|
|11:31am to 12:30pm|-1 point|
|12:31pm to 1:30pm|-2 points|
|1:31pm to 2:30pm|-3 points|
|2:31pm to 3:30pm|-4 points|
|8:31pm and beyond|-10 points (no credit)|
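The penalty schedule above can be expressed as a short function. This is a minimal sketch, assuming that any started hour after the deadline counts as a full hour and that the deduction is capped at the 10 points an assignment is worth. The function name `late_penalty` and its parameters are illustrative, not part of any course tooling.

```python
import math
from datetime import datetime

MAX_POINTS = 10  # each assignment is worth 10 points


def late_penalty(due, last_commit, points_per_hour=1, max_points=MAX_POINTS):
    """Return the points deducted for a last commit made at `last_commit`.

    Assumption: every started hour past `due` costs `points_per_hour`,
    and the total deduction never exceeds `max_points`.
    """
    if last_commit <= due:
        return 0
    seconds_late = (last_commit - due).total_seconds()
    hours_late = math.ceil(seconds_late / 3600)
    return min(points_per_hour * hours_late, max_points)


# Example: an assignment due Monday, Oct. 8, 2018 at 11:30am.
due = datetime(2018, 10, 8, 11, 30)
for commit_time in [
    datetime(2018, 10, 8, 11, 15),  # on time
    datetime(2018, 10, 8, 12, 30),  # within the first late hour
    datetime(2018, 10, 8, 14, 45),  # within the fourth late hour
    datetime(2018, 10, 8, 20, 31),  # more than nine hours late
]:
    print(commit_time.strftime("%I:%M%p"), "->", late_penalty(due, commit_time))
```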
Daily Course Schedule
|Date|Day|Topic|Readings|Assignment|
|---|---|---|---|---|
|Oct. 1|M|Introduction to Comp Soc Sci|Slides| |
|Oct. 3|W|Git and GitHub|Notes, Slides|A1|
|Oct. 8|M|Observational data, large data|S2018, Ch. 2| |
|Oct. 10|W|Observational data|Slides|A2|
|Oct. 15|M|Observational data|F2015, RW2000| |
|Oct. 17|W|Simulated data|Slides|A3|
|Oct. 22|M|Simulated data|M2002| |
|Oct. 24|W|Asking questions|S2018, Ch. 3, Slides|A4|
|Oct. 29|M|Asking questions|CE2015, WRGG2015| |
|Oct. 31|W|Experiments|S2018, Ch. 4, Slides|A5|
|Nov. 5|M|Experiments|SNCGG2007, AR2014| |
|Nov. 7|W|Collaboration|S2018, Ch. 5, Slides|A6|
|Nov. 12|M|Collaboration|W2014, BKV2010| |
|Nov. 14|W|Research collaboration|Slides, HJ2018|A7|
|Nov. 19|M|Ethics|S2018, Ch. 6, Slides| |
|Nov. 21|W|Ethics|BF2015, Z2010|A8|
|Nov. 26|M|CSS: Sociology|KTE2018, MDSW2017, Slides| |
|Nov. 28|W|CSS: Political Science|B2018, GST2018, Slides|A9|
|Dec. 3|M|CSS: Psychology|SMBMYF2018, YSCBGS2014, Slides| |
|Dec. 5|W|CSS: Economics|A2018, BS2017, Slides| |
- [A2017] Abrahao, Bruno, Paolo Parigi, Alok Gupta, and Karen S. Cook, "Reputation offsets trust judgments based on social biases among Airbnb users," PNAS, 114:37 (September 12, 2017), pp. 9849-9853.
- [AR2014] Allcott, Hunt and Todd Rogers, "The Short-run and Long-run Effects of Behavioral Interventions: Experimental Evidence from Energy Conservation," American Economic Review, 104:10 (Oct. 2014), pp. 3003-3037.
- [A1990] Angrist, Joshua D., "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, 80:3 (1990), pp. 313-336.
- [AH2012] Ansolabehere, Stephen and Eitan Hersh, "Validation: What Big Data Reveal about Survey Misreporting and the Real Electorate," Political Analysis, 20:3, (2012), pp. 437-459.
- [A2018] Athey, Susan, "The Impact of Machine Learning on Economics," in The Economics of Artificial Intelligence: An Agenda, eds. Ajay K. Agrawal, Joshua Gans, and Avi Goldfarb, National Bureau of Economic Research (forthcoming, 2018).
- [B2009] Beazley, David M., Python Essential Reference, 4th edition, Addison-Wesley (2009).
- [BKV2010] Bell, Robert M., Yehuda Koren, and Chris Volinsky, "All Together Now: A Perspective on the Netflix Prize," Chance, 23:1 (2010), pp. 24-29.
- [B2014] Blumenstock, Joshua, "Calling for Better Measurement: Estimating an Individual's Wealth and Well-Being from Mobile Phone Transaction Records," presented at KDD--Data Science for Social Good 2014, New York (2014).
- [B2018] Bonica, Adam, "Inferring Roll Call Scores from Campaign Contributions Using Supervised Machine Learning," American Journal of Political Science, (forthcoming, 2018). [link to paper]
- [BS2017] Brumm, Johannes and Simon Scheidegger, "Using Adaptive Sparse Grids to Solve High-dimensional Dynamic Models," Econometrica, 85:5, pp. 1575-1612 (Sep. 2017)
- [BF2015] Burnett, Sam and Nick Feamster, "Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests," in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, ACM, London (2015), pp. 653-667.
- [CE2015] Canann, Taylor J. and Richard W. Evans, "Determinants of Short-term Lender Location and Interest Rates," Journal of Financial Services Research, 48:3, (Dec. 2015) pp. 235-262. [link to paper]
- [CS2014] Chacon, Scott and Ben Straub, Pro Git: Everything You Need to Know about Git, 2nd Edition, Apress, 2014. Free online version
- [CK2013] Costa, Dora L. and Matthew E. Kahn, "Energy Conservation Nudges and Environmentalist Ideology: Evidence from a Randomized Residential Electricity Field Experiment," Journal of the European Economic Association, 11:3 (2013), pp. 680-702.
- [DEP2018] DeBacker, Jason, Richard W. Evans, and Kerk L. Phillips, "Integrating Microsimulation Models of Tax Policy into a DGE Macroeconomics Framework," Public Finance Review, forthcoming. [link to paper]
- [EKLS2015] Einav, Liran, Theresa Kuchler, Jonathan Levin, and Neel Sundaresan, "Assessing Sale Strategies in Online Markets Using Matched Listings," American Economic Journal: Microeconomics, 7:2 (2015), pp. 215-247.
- [EJQ2016] Evans, Richard W., Kenneth L. Judd, and Kramer Quist, "Big Data Techniques as a Solution to Theory Problems," in Conquering Big Data with High Performance Computing, ed. Ritu Arora, Springer (2016). [link to paper]
- [F2015] Farber, Henry S., "Why You Can't Find a Taxi in the Rain and Other Labor Supply Lessons from Cab Drivers," Quarterly Journal of Economics, 130:4 (2015), pp. 1975-2026.
- [GST2018] Gentzkow, Matthew, Jesse M. Shapiro, and Matt Taddy, "Measuring Group Differences in High-dimensional Choices: Method and Application to Congressional Speech," NBER Working Paper #22423 (August 2018).
- [G2018] Gopalan, Sushmita, "Predicting Infant Mortality: Minimizing False Negatives," unpublished MACSS thesis (2018). [link to paper]
- [HJ2018] Humpherys, Jeffrey and Tyler J. Jarvis, "Unit Testing," Ch. 7 Labs for Foundations of Applied Mathematics: Python Essentials, Creative Commons, Open Access (2018). [link here]
- [KW2009] Kossinets, Gueorgi and Duncan J. Watts, "Origins of Homophily in an Evolving Social Network," American Journal of Sociology 115:2, (2009), pp. 405-450.
- [KTE2018] Kozlowski, Austin C., Matt Taddy, and James A. Evans, "The Geometry of Culture: Analyzing Meaning through Word Embeddings," working paper, Knowledge Lab, University of Chicago, under review (2018).
- [L2010] Langtangen, Hans Petter, Python Scripting for Computational Science, Texts in Computational Science and Engineering, 3rd edition, Springer (2010).
- [L2006] List, John A., "Friend or Foe? A Natural Experiment of the Prisoner's Dilemma," Review of Economics and Statistics, 88:3 (August 2006), pp. 463-471.
- [L2013] Lutz, Mark, Learning Python, 5th edition, O'Reilly Media, Inc. (2013).
- [MDSW2017] Mao, Andrew, Lili Dworkin, Siddharth Suri, and Duncan J. Watts, "Resilient Cooperators Stabilize Long-run Cooperation in the Finitely Repeated Prisoner’s Dilemma," Nature Communications, p. 13800 (January 2017).
- [MM2009] Mas, Alexandre and Enrico Moretti, "Peers at Work," American Economic Review, 99:1 (2009), pp. 112-145.
- [M2018] McKinney, Wes, Python for Data Analysis, 2nd edition, O'Reilly Media, Inc. (2018).
- [M2002] Moretti, Sabrina, "Computer Simulation in Sociology: What Contribution?" Social Science Computer Review, 20:1 (Spring 2002), pp. 43-57.
- [RW2000] Rosenzweig, Mark R. and Kenneth I. Wolpin, "Natural 'Natural Experiments' in Economics," Journal of Economic Literature, 38:4 (Dec. 2000), pp. 827-874.
- [SMBMYF2018] Sanchez, Alessandro, Stephan C. Meylan, Mika Braginsky, Kyle E. MacDonald, Daniel Yurovsky, and Michael C. Frank, "childes-db: a Flexible and Reproducible Interface to the Child Language Data Exchange," under review (2018)
- [SNCGG2007] Schultz, P. Wesley, Jessica M. Nolan, Robert B. Cialdini, Noah J. Goldstein, and Vladas Griskevicius, "The Constructive, Destructive, and Reconstructive Power of Social Norms," Psychological Science, 18:5 (2007), pp. 429-434.
- [S2014] Sugie, Naomi F., "Finding Work: A Smartphone Study of Job Searching, Social Contacts, and Wellbeing After Prison," PhD Thesis, Princeton University (2014). [link here]
- [S2016] Sugie, Naomi F., "Utilizing Smartphones to Study Disadvantaged and Hard-to-Reach Groups," Sociological Methods & Research, January (2016).
- [WRGG2015] Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman, "Forecasting Elections with Non-Representative Polls," International Journal of Forecasting, 31:3 (2015) pp. 980-991.
- [W2014] Watts, Duncan J., "Common Sense and Sociological Explanations," American Journal of Sociology, 120:2 (Sep. 2014), pp. 313-351.
- [WWE2018] Wu, Lingfei, Dashun Wang, and James A. Evans, "Large Teams Have Developed Science and Technology; Small Teams Have Disrupted It," working paper, 2018. [link here]
- [YSCBGS2014] Yourganov, Grigori, Tanya Schmah, Nathan W. Churchill, Marc G. Berman, Cheryl L. Grady, and Stephen C. Strother, "Pattern Classification of fMRI Data: Applications for Analysis of Spatially Distributed Cortical Networks," NeuroImage, 96:1, pp. 117-132 (August 2014).
- [Z2010] Zimmer, Michael, "But the Data is Already Public: On the Ethics of Research in Facebook," Ethics and Information Technology, 12:4 (2010), pp. 313-325.
Jupyter notebooks are files that end with the `*.ipynb` suffix. Jupyter is an open source web application, run in a browser, that combines instructional text with live executable and modifiable code for many different programming languages (e.g., Python, R, Julia). Jupyter notebooks are an ideal tool for teaching programming because they provide both the code for a user to execute and the context and explanation for that code. A number of Jupyter notebooks are provided in the OSM Lab boot camp repository Tutorials folder.
These notebooks used to be Python-specific, and were therefore called IPython notebooks (hence the `*.ipynb` suffix). Jupyter notebooks now support many programming languages, although the name still pays homage to Python with the vestigial "py" in "Jupyter". The notebooks execute code from the kernel of the specific programming language on your local machine.
Jupyter notebook capability is installed automatically with the Anaconda distribution of Python. If you did not download the Anaconda distribution of Python, you can install Jupyter separately by following the instructions on the Jupyter install page.
Opening a Jupyter notebook
Once Jupyter is installed--whether through Anaconda or through the Jupyter website--you can open a Jupyter notebook by the following steps.
- Navigate in your terminal to the folder in which the Jupyter notebook files reside. In the case of the Jupyter notebook tutorials in this repository, you would navigate to the
- Type `jupyter notebook` at the terminal prompt.
- A Jupyter notebook session will open in your browser, showing the available `*.ipynb` files in that directory.
- In some cases, you might receive a prompt in the terminal telling you to paste a URL into your browser.
- Double click on the Jupyter notebook you would like to open.
It is worth noting that you can also simply navigate to the URL of the Jupyter notebook file in the GitHub repository on the web (e.g., https://github.com/OpenSourceMacro/BootCamp2018/blob/master/Tutorials/PythonReadIn.ipynb). You can read the Jupyter notebook on GitHub.com, but you cannot execute any of the cells. You can only execute the cells in the Jupyter notebook when you follow the steps above and open the file from a Jupyter notebook session in your browser.
Using an open Jupyter notebook
Once you have opened a Jupyter notebook, you will find the notebook has two main types of cells: Markdown cells and Code cells. Markdown cells have formatted Jupyter notebook markdown text, and serve primarily to present context for the coding cells. A reference for the markdown options in Jupyter notebooks is found in the Jupyter markdown documentation page.
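To make the two cell types concrete, here is a minimal sketch that relies on the fact that an `*.ipynb` file is stored as JSON, with each cell carrying a `cell_type` field. The tiny notebook below is built in memory for illustration and is not one of the tutorial notebooks; the helper name `count_cell_types` is hypothetical.

```python
import json
from collections import Counter

# A tiny notebook in the version 4 JSON layout, built in memory for illustration.
notebook_text = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Reading data\n", "Context for the code below."]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1 + 1"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})


def count_cell_types(nb_text):
    """Count how many cells of each type a notebook's JSON text contains."""
    notebook = json.loads(nb_text)
    return Counter(cell["cell_type"] for cell in notebook["cells"])


print(dict(count_cell_types(notebook_text)))  # {'markdown': 1, 'code': 2}
```

To inspect a real notebook, you could pass the contents of the file (read with `open(path).read()`) to the same function.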
You can edit a Markdown cell in a Jupyter notebook by double clicking on the cell and then making your changes. Make sure the cell-type box in the middle of the top menu bar is set to `Markdown`. To implement your changes in the Markdown cell, type
A Code cell will have an `In [ ]:` immediately to the left of the cell for input. The code in that cell can be executed by typing `Shift-Enter`. For a Code cell, the cell-type box in the middle of the top menu bar says
Closing a Jupyter notebook
When you are done with a Jupyter notebook, first save any changes that you want to keep. Then close the browser windows associated with that Jupyter notebook session. Finally, shut down the local server instance that was started in your terminal window to run the Jupyter notebook. On a Mac or Windows, this is done by going to your terminal window and typing `Ctrl-C` and then selecting `y` for yes and hitting
This course is not a programming course in which you receive in-class instruction on using a programming language such as Python or R. However, you will have assignments that require some basic use of a programming language. For this reason, I am pointing you to the OSM Lab boot camp repository, which contains six basic Python tutorials in the
- PythonReadIn.ipynb. This Jupyter notebook provides instruction on basic Python I/O, reading data into Python, and saving data to disk.
- PythonNumpyPandas.ipynb. This Jupyter notebook provides instruction on working with data using `NumPy` as well as Python's powerful data library
- PythonDescribe.ipynb. This Jupyter notebook provides instruction on describing, slicing, and manipulating data in Python.
- PythonFuncs.ipynb. This Jupyter notebook provides instruction on working with and writing Python functions.
- PythonVisualize.ipynb. This Jupyter notebook provides instruction on creating visualizations in Python.
- PythonRootMin.ipynb. This Jupyter notebook provides instruction on implementing univariate and multivariate root finders and unconstrained and constrained minimizers using functions in the
To further one's Python programming skills, a number of other great resources exist.
- The official Python 3 tutorial site
- QuantEcon.net is a site run by Thomas Sargent (NYU Stern) and John Stachurski (Australian National University). QuantEcon has a very large number of high-quality, economics-focused computational tutorials in Python. The first three sections provide a good introduction to Python programming.
- Python computational labs of the Applied and Computational Mathematics Emphasis at Brigham Young University.
- Codecademy's Python learning module
In addition, a number of excellent textbooks and reference manuals are very helpful and may be available in your local library. Or you may just want to have these in your own library. Lutz (2013) is a giant 1,500-page reference manual that has an expansive collection of materials targeted at beginners. Beazley (2009) is a more concise reference but is targeted at readers with some experience using Python. Despite its focus on a particular set of tools in the Python programming language, McKinney (2018) has a great introductory section that can serve as a good starting tutorial. Further, the data analysis capabilities it covers are among the most important features of Python. Rounding out the list is Langtangen (2010). This book's focus on scientists and engineers makes it a unique reference for optimization, wrapping C and Fortran, and other scientific computing topics using Python.
If you need any special accommodations, please provide Dr. Evans with a copy of your Accommodation Determination Letter (provided to you by the Student Disability Services office) as soon as possible so that you may discuss with him how your accommodations may be implemented in this course.