MS Applied Data Science
Instructor: Dimitri Yatsenko, Ph.D.
TA: Cesar Flores
Organization concepts and terminology of data models and the underlying data architectures needed to support them. Presentation of the relational database management systems including an introduction to SQL programming: relational database design and data queries with integration into application programming languages, with Python used for examples.
The course will include practical exercises and will be graded based on a few indvidual and group projects on real-world datasets.
Textbook: This courses focuses on learning through examples and relevant concepts will be introduced while applying them.
We will use two practical books for learning SQL, which will provide us with examples and exercises.
Practical skills for querying (Viescas) and designing (Hernandez) relational databases:
- John Viescas SQL Queries for Mere Mortals, Addison-Wesley Professional; 4 edition (2018). ISBN-10: 0134858336, ISBN-13: 978-0134858333
- Michael Hernandez, Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design; 3rd edition (2013). ISBN-13: 978-0321884497, ISBN-10: 9780321884497
Not required: a more systematic and comprehensive introduction into database systems from the Computer Science undergraduate/grad perspective:
- Elmasri & Navathe, Fundamentals of Database Systems Person; 7th Edition (2015), ISBN-10: 1484213300, ISBN-13: 978-0133970777
Databases in data science. Data models: diverse ways to think about data: hiearchical, network, relational, object, graph, and document data models. History of datatabases and database technologies. Next-generation databases.
A Taste of SQL: Database access. Creating SQL tables and inserting data. Simple queries. sqlite and mysql.
Homework 1 (due Sep 10):
- Chapters 1-3
- Answer questions in [Block1.md](Block 1). Submit as a PDF file on Slack. Send a copy to the TA as well.
- Installation of SQL Magic for Jupyter
- Connecting to the database from Jupyter
- On the MySQL server, create a database named
<username>_universityand define a table namedperson. Make sure it has a well chosen primary key.
Querying databases from a host programming language for data analysis (Python).
Client interfaces. Work with individual tables: CREATE/DROP INSERT, DELETE, UPDATE, and SELECT statements.
Assignments: You are welcome to use any SQL client to connect to the database as discussed in class. I recommend using Jupyter for ease of interaction.
- Chapters 4-6
- Assignment 2 -- Due Sep 24. Print as PDF and send by Slack or email.
Advanced database design and queries. Modeling complex relationships. Multi-table queries.
Subqueries, joins.
Summarizing data. Using LEFT JOIN in conjunction with GROUP BY and the HAVING clause.
- Assignment 3 -- Due Oct 1. Print as PDF and send by Slack or email.
- Assignment 4 -- Due Oct 9. Print as PDF and send by Slack or email.
- Assignment 5 -- Due Oct 21. DataJoint queries with aggregation. Print as PDF and send by Slack or email.
Database design from conceptual modeling to implementation. Indexes. Data integrity. Database normalization.
Final Project -- group or individual.