Instructor: Allison Parrish
The goal of this course is to introduce students to essential techniques in working with text and data and to enhance their literacy in the language and practice of contemporary computer programming. Students will learn the Python programming language from scratch and work toward a small final project that meaningfully contributes to their interests and practice. Topics covered: working on the command line, Jupyter Notebook, data formats (CSV and JSON), Pandas, web scraping, text analysis with natural language processing, Git, and GitHub.
- Introduction
- Text processing on the command line
- Suggested exercise: UNIX command line exercise
- Collaborate to make a list of data journalism, data science and text analysis projects that are meaningful and aspirational for you.
- Review the list of aspirational projects
- Python: Expressions and strings
- What is plain text?
- Python: Writing Python programs
- Suggested exercise: Write a Python program that mimics a UNIX command line tool. (Or otherwise creatively analyse or modify a text.)
- Jupyter Notebook tutorial
- Lists and loops
- Jupyter notebook from class
- Suggested exercise: Exercise A. (Download this Python file and make the modifications suggested in the comments until the output matches what the comments describe.)
- Dictionaries, sets and tuples
- Jupyter notebook we created in class
- Suggested exercise: Exercise B
- Counting things
- Accessing Web APIs
- Jupyter notebook from class
- Suggested exercise: Web API Worksheet
- Scraping HTML with Beautiful Soup
- Jupyter notebook from class
- Suggested exercise: Web Scraping Worksheet
- Pandas for simple data analysis and visualization
- Jupyter notebook from class
- Suggested exercise: Pandas worksheet
- Pandas, continued
- Jupyter notebook from class
- Suggested exercise: Continue the Pandas worksheet
- Terms to know when talking about language
- Intro to NLP with spaCy (skipped!)
- Parsing and tagging Chinese with Jieba
- Understanding word vectors
- Experimental Chinese word vector notebook! Very incomplete! (Requires this file of 50k word vectors from fastText. Do not decompress the file!)
- Project presentations