
MUSA-620-Spring-2018/course-materials

MUSA 620 - Data Wrangling and Data Visualization
University of Pennsylvania, School of Design

SCHEDULING

Class: Tuesdays from 9am to 12pm in Meyerson Hall, room B2.

Office hours: Monday from 4pm to 7pm. Email galkamaxd at gmail to schedule a time.

Instructor: Max Galka (galkamaxd at gmail dot com)

TA: Evan Cernea (ecernea at sas dot upenn dot edu)

OBJECTIVE

The purpose of this course is to familiarize students with the “pipeline” approach to data science. This involves the process of gathering data, storing the data, analyzing the data, and visualizing the data such that non-technical decision makers can make sense of it. The course is broken down accordingly into four sections.

  • Data collection: Students will learn how to gather data by way of web scraping, APIs, and other unstructured sources.
  • Databases: This part of the course teaches students how to store this data for efficient retrieval and analysis.
  • Analytics: Students will learn a range of machine-driven techniques for analyzing structured and unstructured data.
  • Data visualization: The last part of the course teaches students how to present the results of their analysis visually using R and the web application framework Shiny.

FORMAT

The course will be conducted in weekly sessions devoted to lectures, demonstrations, and in-class projects.

ASSIGNMENTS

There is one required final project at the end of the semester. Homework will be assigned before the close of class and will be due the following Tuesday by the end of day. Five of the homework assignments will be explicitly required. The remainder are optional, but will count toward the participation component of your final grade.

For the final project, students will replicate the pipeline approach on a dataset (or datasets) of their choosing. The final deliverable will be a web-based data visualization and accompanying description including a summary of the results and the methods used in each step of the process (collection, storage, analysis and visualization).

Final assignment

Assignment Q&A:

  • If you get stuck, the first step should always be to see if you can find the answer to your question online. In particular, Stack Overflow, Stack Exchange: GIS, and the rest of the Stack Exchange family are great resources.
  • You are encouraged to ask (and answer) questions via the Slack channel rather than by email, in case other students have the same question.
  • Evan and I are available for in-depth discussion of assignments during office hours.

GRADING

The grading breakdown is as follows: 50% for homework, 40% for the final project, and 10% for participation.

There will be five required homework assignments, due at the beginning of class. Late homework will be accepted for up to one week after the deadline, with a 10% deduction. No credit will be given for homework that is more than one week late.
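As a rough illustration of how the weights and the late policy combine, here is a minimal sketch in Python. It rests on several assumptions not stated above: scores are on a 0 to 100 scale, the five required homework assignments count equally, and the 10% deduction is taken as 10% of the score earned. The function names are hypothetical and this is not an official grade calculator.

```python
# Weights taken from the grading breakdown in this section.
WEIGHTS = {"homework": 0.50, "final_project": 0.40, "participation": 0.10}

# Assumption: the late deduction is 10% of the score earned on that homework.
LATE_PENALTY = 0.10


def homework_score(raw, days_late=0):
    """Return one homework score (0-100) after applying the late policy."""
    if days_late <= 0:
        return float(raw)                 # on time: full credit
    if days_late <= 7:
        return raw * (1 - LATE_PENALTY)   # up to one week late: 10% off
    return 0.0                            # more than one week late: no credit


def final_grade(homework, final_project, participation):
    """Combine component scores into a final grade on a 0-100 scale.

    homework is a list of (raw_score, days_late) pairs, assumed to be
    equally weighted (an assumption, not stated in the syllabus).
    """
    hw_avg = sum(homework_score(r, d) for r, d in homework) / len(homework)
    return (WEIGHTS["homework"] * hw_avg
            + WEIGHTS["final_project"] * final_project
            + WEIGHTS["participation"] * participation)
```

For example, under these assumptions `homework_score(90, days_late=3)` scales a raw 90 by 0.9, while `homework_score(90, days_late=8)` yields no credit.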

SOFTWARE

This course relies on use of the R Statistical Package in conjunction with Shiny and other associated extensions. For geospatial topics, we will also use QGIS.

SCHEDULE

| Class # | Date | Topic | Homework* |
| --- | --- | --- | --- |
| Week 1 | Jan 16 | ggplot2, QGIS, data visualization fundamentals | |
| Week 2 | Jan 23 | Data frames, tidyverse, map projections | Assign HW 1 |
| Week 3 | Jan 30 | Geocoding/mapping: ggmap, sf (simple features) package | |
| Week 4 | Feb 6 | Databases: Postgres, SQL | |
| Week 5 | Feb 13 | Databases: PostGIS, spatial queries | Assign HW 2 |
| Week 6 | Feb 20 | Web scraping 1: The DOM, web inspector | |
| Week 7 | Feb 27 | Web scraping 2: CSS selectors, scraping dynamic pages | Assign HW 3 |
| Spring Break | | | |
| Week 8 | Mar 13 | Unstructured data: Twitter API | |
| Week 9 | Mar 20 | Natural language processing: sentiment analysis | Assign HW 4 |
| Week 10 | Mar 27 | Advanced data visualization | |
| Week 11 | Apr 3 | Interactive maps: Leaflet | |
| Week 12 | Apr 10 | Shiny | Assign HW 5 |
| Week 13 | Apr 17 | Shiny | |
| Week 14 | Apr 24 | In-class work on final projects | |

\* Homework assignment dates are tentative and subject to change.

Final assignment
