UCLALuskinDataScience/urbandatascience-s22

UP229 Urban Data Science

Spring 2022: Wednesdays 9-11.50, Public Affairs 2343

Instructor: Adam Millard-Ball, he/him

Teaching assistant: Tiffany Green

Office hours:

  • Adam Millard-Ball: Varies, but normally Tuesdays 2-3 and Wednesdays 12-1. Sign up here
  • Tiffany Green: Fridays 2-3 on Zoom. No sign up needed.

About this course: New data sources are a potential goldmine for urban planners and policy makers. But sometimes they are large, sometimes they are messy, sometimes they are awkward to access, and often they are all of these things. In this hands-on course, we’ll develop skills in scraping, processing, and managing urban data, and using tools such as natural language processing, geospatial analysis, and machine learning. We’ll use examples from transit, housing, and equity planning, and build competence in open-source tools and languages such as Python and SQL. We’ll also consider the limits to data science, and the biases and pitfalls that "big data" can entail.

Prerequisites: Basic Python programming experience, for example through UP206A (Introduction to Geographic Information Systems and Spatial Data Science), or an introductory Python course. One good, free option is offered by Data Carpentry. Another is the University of Michigan Introduction to Data Science in Python course; if you have no prior knowledge, you should take Programming for Everyone first. (You can take these for free if you choose the "audit" option.) Whichever option you choose, before starting this course you should be familiar with Python syntax, Jupyter notebooks, plotting via matplotlib, and pandas and geopandas dataframes.

Learning objectives

  • Expand your urban data analysis, visualization, and Python skills, regardless of your starting point
  • Identify applications of these techniques to urban planning challenges
  • Know how to read API documentation and where to get more information
  • Understand how to collaborate using git and other software tools
  • Critically analyze the constraints to data science methods, particularly in terms of ethics and causal inference

Course tools

All the assignments and lecture notebooks will be posted on this GitHub site. Readings will be on Canvas.

I ask you to do the assignments using a Jupyter Notebook in Anaconda. That helps make sure that we all have a consistent Python setup across different computers (Mac, Windows, etc.).

We will primarily use Slack to maintain communication outside of scheduled class times. In particular, it provides a space for problem solving and for you to help each other. You’ll receive an email invitation from me to join our Slack workspace. If you have not used Slack before, please familiarize yourself with the tool using the resources here.

Textbook

There is no textbook for the class. All required readings will be posted here on GitHub. However, you may find some of the following books helpful.

Class participation

Your active participation is essential to making this course successful and enjoyable. I expect you to:

  • Actively follow the examples in class through your own copy of the Jupyter notebook posted in advance. Tweak and experiment with the examples. If you don't follow an example, let me know—others will undoubtedly have the same question.
  • Discuss the substantive readings on Slack. Post a question, idea, or comment by 8.30am on the day of class. Please engage with the posts of others as well as writing your own—this is a discussion board, not a repository of essays.
  • Use Slack to help each other out with questions on the assignments and projects.

Graded assignments

Biweekly homeworks (25%). You must submit at least 4 out of 5 homeworks on time (but please do them all). We'll spend time in class on Wednesdays working through the homeworks, so please make a start on each one before then.

Challenge problems (25%). Most homeworks will include a challenge problem, which is more open ended. You must do at least 2 of these, and present one in class.

Final project (35%). Working in groups of 2-3, you'll conceptualize and implement an urban data science project. You'll submit a proposal (Week 3), and make lightning presentations of your interim (Weeks 6-7) and final analysis (Weeks 9-10).

Class participation (15%). Your class participation grade will consider attendance and active participation in class and on Slack.

Course Policies

Accessibility and Disabilities

If you require any accommodations because of a disability, please talk to me within the first two weeks of the quarter if possible. The sooner I am aware of any accessibility needs, the sooner I can try to accommodate them.

Late Submission of Assignments

Students can make a formal request to the instructor for special consideration for an extension to an assignment due date. This request should be received at least 48 hours in advance. Otherwise, one partial grade will be deducted for every 24-hour period an assignment is late. For example, an A- will go to a B+. Note that no extensions are possible for the homeworks (because I post the solutions promptly)—that's why only 4 out of 5 homeworks are required.

Academic Integrity

UCLA’s policy about plagiarism is clear: the sources of all ideas, text, pictures, or graphics that are not your (or your team’s) own must be fully cited, all passages copied from other sources must be in quotation marks with the source cited, and you absolutely cannot submit materials that have previously been submitted by other students in previous iterations of this course, even if you have re-worked this material for your submission. Being in this class constitutes an acknowledgment and willingness to abide by UCLA’s academic integrity policies. Should you have any questions about these policies, see here. (Thanks to Prof. Mike Manville for permission to use this text.)

The same principles of academic integrity apply to your code. If you borrow any code snippets or ideas, acknowledge them with a comment in the code. For example:

# Iteration code from: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
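
As a slightly fuller sketch of the same practice, the attribution comment can sit directly above the borrowed code it describes. Everything here is illustrative rather than course material: the function name `haversine_km`, the `EARTH_RADIUS_KM` constant, and the placeholder source are assumptions for the example.

```python
import math

# Haversine formula adapted from: <URL of the page you borrowed it from>
# (placeholder: replace with the real source, as in the comment above)

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Distance covered by one degree of longitude along the equator
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```

Placing the comment next to the borrowed lines, rather than at the top of the notebook, makes it clear which part of the code came from elsewhere.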

Weekly schedule

The schedule is preliminary and subject to change, depending on how quickly or slowly we move through the material.

Jupyter notebooks for the pre-recorded lectures and in-class assignments will be posted on GitHub. The course readings and lecture videos will all be posted on Canvas.

Pre-course:

Week 1: Introduction and APIs

HW1: Due Tuesday Apr 12.

Week 2: Web scraping

Week 3: Data wrangling

HW2: Due Tuesday Apr 26

Week 4: Spatial joins

Week 5: Machine learning: prediction

HW3: Due Tuesday May 10

Week 6: Machine learning: clustering

Week 7: Catch up

HW4: Due Tuesday May 24

Week 8: Natural language processing: parsing

Week 9: Natural language processing: topic modeling and sentiment analysis

Week 10: Databases, big data, privacy, and ethics

HW5: Due Tuesday June 7
