Skip to content

LauraVGP/Database-Systems-for-Data-Science

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Database-Systems-for-Data-Science (Fall 2020)

MS Applied Data Science

Instructor: Dimitri Yatsenko, Ph.D.

TA: Cesar Flores

Syllabus

Organization concepts and terminology of data models and the underlying data architectures needed to support them. Presentation of the relational database management systems including an introduction to SQL programming: relational database design and data queries with integration into application programming languages, with Python used for examples.

The course will include practical exercises and will be graded based on a few indvidual and group projects on real-world datasets.

Textbook: This courses focuses on learning through examples and relevant concepts will be introduced while applying them.

We will use two practical books for learning SQL, which will provide us with examples and exercises.

Practical skills for querying (Viescas) and designing (Hernandez) relational databases:

  1. John Viescas SQL Queries for Mere Mortals, Addison-Wesley Professional; 4 edition (2018). ISBN-10: 0134858336, ISBN-13: 978-0134858333
  2. Michael Hernandez, Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design; 3rd edition (2013). ISBN-13: 978-0321884497, ISBN-10: 9780321884497

Not required: a more systematic and comprehensive introduction into database systems from the Computer Science undergraduate/grad perspective:

  • Elmasri & Navathe, Fundamentals of Database Systems Person; 7th Edition (2015), ISBN-10: 1484213300, ISBN-13: 978-0133970777

Weeks 1-2: (Aug 25, Sept 1)

Databases in data science. Data models: diverse ways to think about data: hiearchical, network, relational, object, graph, and document data models. History of datatabases and database technologies. Next-generation databases.

A Taste of SQL: Database access. Creating SQL tables and inserting data. Simple queries. sqlite and mysql.

Homework 1 (due Sep 10):

  1. Chapters 1-3
  2. Answer questions in [Block1.md](Block 1). Submit as a PDF file on Slack. Send a copy to the TA as well.
  3. Installation of SQL Magic for Jupyter
  4. Connecting to the database from Jupyter
  5. On the MySQL server, create a database named <username>_university and define a table named person. Make sure it has a well chosen primary key.

Weeks 4-6 (Sep 8, 15, 22)

Querying databases from a host programming language for data analysis (Python). Client interfaces. Work with individual tables: CREATE/DROP INSERT, DELETE, UPDATE, and SELECT statements.

Assignments: You are welcome to use any SQL client to connect to the database as discussed in class. I recommend using Jupyter for ease of interaction.

  1. Chapters 4-6
  2. Assignment 2 -- Due Sep 24. Print as PDF and send by Slack or email.

Weeks 7-8 (Sep 29, Oct 6)

Advanced database design and queries. Modeling complex relationships. Multi-table queries. Subqueries, joins. Summarizing data. Using LEFT JOIN in conjunction with GROUP BY and the HAVING clause.

  1. Assignment 3 -- Due Oct 1. Print as PDF and send by Slack or email.
  2. Assignment 4 -- Due Oct 9. Print as PDF and send by Slack or email.
  3. Assignment 5 -- Due Oct 21. DataJoint queries with aggregation. Print as PDF and send by Slack or email.

Week 9 (Oct 13) - No class - Fall Break

Database design from conceptual modeling to implementation. Indexes. Data integrity. Database normalization.

Week 10-12 (Oct 20, Oct 27, Nov 3)

Weeks 13-15 (Nov 10, 17, 24)

Final Project -- group or individual.

About

Database-Systems-for-Data-Science

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%