# Preface

## Course description

This course provides an introduction to data science for sophomore mechanical engineers with no prior knowledge of the topic. We start by defining data science and demonstrating why mechanical engineers should care. Most data science nowadays happens in Python, so the course continues with a brief overview of Python basics, going from simple mathematical expressions to data loading and visualization. Data science has to deal with uncertainty and randomness, which is why we introduce several concepts from probability theory. We use these probabilistic concepts to summarize and compare datasets. Data science is also about making models. For example, how can you use experimental data to build a model that gives you the efficiency of an engine under given operating conditions? So, we end by showing how to use data to make models and how to test whether these models are good.

## Course learning outcomes
After completing this course, you will be able to:
+ Program in Python within a Jupyter notebook environment.
+ Summarize and compare datasets using empirically estimated statistics.
+ Summarize and compare datasets visually.
+ Represent uncertainty using probabilities.
+ Apply simple probability rules to propagate uncertainty.
+ Estimate probabilities from data.
+ Solve regression problems (learn from data a linear model that takes you from a set of input variables to a continuous variable).
+ Solve classification problems (learn from data a model that takes you from a set of input variables to a discrete label).
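
To make the regression outcome above a bit more concrete, here is a minimal sketch of fitting a linear model with NumPy's least-squares solver. The data are synthetic and made up purely for illustration (one input variable, a known slope and intercept, and some added noise); they do not come from any real experiment.

```python
import numpy as np

# Synthetic data, made up for illustration only:
# one input variable x and a noisy continuous output y.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=50)

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([np.ones_like(x), x])

# Least-squares fit of y ~ intercept + slope * x.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Use the fitted model to predict the output at a new input.
y_new = intercept + slope * 4.0
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, prediction = {y_new:.2f}")
```

The fitted intercept and slope should land close to the values used to generate the data, which is one simple way to check that a model-fitting procedure behaves sensibly.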

## Prerequisites
+ Basic calculus.
+ Matrix-vector multiplication.
+ Some programming experience (e.g., in Matlab).

# Lecture 1: Introduction

## Learning objectives

+ Define data science.
+ Highlight some applications of data science in mechanical engineering.
+ Explain the lecture book structure.
+ Explain why we will be using Python.
+ Introduce Jupyter notebooks.
+ Demonstrate how to run Jupyter notebooks on Google Colab.
+ Introduce Python expressions and names.
+ Introduce the basics of function calling.
+ Introduce the basic Python data types.
+ Demonstrate how one can get help on a Python function.
+ Show students how they can complete and submit the first homework assignment.

# What is data science?
According to [Wikipedia](https://en.wikipedia.org/wiki/Data_science), "Data
science is an inter-disciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge and insights from [...] data." This
is a pretty good definition!

In mechanical engineering applications, "scientific methods" come in the form of
physical laws. So, in our context, data science is about combining domain
knowledge (dynamics, mechanics of materials, fluid mechanics, etc.) with data.
This connection between data and domain knowledge is not ad hoc. The "glue" is
provided by probability theory and statistics. This course gives you the
foundations you need to start learning more or, more precisely, the foundations
we can fit in one credit.

# Some cool applications of data science in mechanical engineering
What follows is an incomplete list of cool things that one can do combining
mechanical engineering with data science. Note that these applications are
rather advanced. We will not learn how to carry them out in this class. You
will, however, learn the fundamentals upon which the data science components of
these applications are based.

## SpaceX Falcon rocket landing
SpaceX uses a particular data science technique called the
[Kalman filter](https://en.wikipedia.org/wiki/Kalman_filter#:~:text=In%20statistics%20and%20control%20theory,than%20those%20based%20on%20a)
to estimate the position and velocity of the Falcon rocket using noisy data
coming from GPS and accelerometers. A key feature of this technique is that
it quantifies the uncertainty in the estimate. Once the estimate is available,
we can decide which thrusters to activate to control the rocket as desired.
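
To give a flavor of the idea, here is a minimal sketch of a Kalman filter for a toy one-dimensional problem. It is not the filter used on any real rocket. Every number in it (time step, noise levels, initial state) is an assumption chosen for illustration, and the object is assumed to move at constant velocity while we only measure its position with noise.

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict: push the state estimate and its covariance through the motion model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction using the new measurement z.
    y = z - H @ x_pred                   # measurement residual
    S = H @ P_pred @ H.T + R             # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
dt = 0.1                                # time step in seconds
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
H = np.array([[1.0, 0.0]])              # we measure position only
Q = 1e-4 * np.eye(2)                    # small process noise covariance
R = np.array([[4.0]])                   # measurement noise variance (std of 2 m)

true_x = np.array([100.0, -5.0])        # true position (m) and velocity (m/s)
x = np.array([0.0, 0.0])                # initial guess of the state
P = 1e3 * np.eye(2)                     # large initial uncertainty

for _ in range(200):
    true_x = F @ true_x                             # simulate the true motion
    z = H @ true_x + rng.normal(0.0, 2.0, size=1)   # noisy position measurement
    x, P = kalman_step(x, P, z, F, Q, H, R)

# The diagonal of P holds the variances of the position and velocity estimates.
position_std = np.sqrt(P[0, 0])
print(f"estimated position: {x[0]:.2f} m (+/- {position_std:.2f} m)")
print(f"true position:      {true_x[0]:.2f} m")
```

Notice that the filter returns both an estimate `x` and a covariance matrix `P`. The covariance is what quantifies the uncertainty in the estimate, and it shrinks as more measurements arrive.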

<iframe width="500" height="281" src="https://www.youtube.com/embed/l5I8jaMsHYk" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Boston Dynamics Spot autonomous navigation
Spot is a dog robot. Spot also uses the Kalman filter to localize itself in
space. Here is the description of the video: "Spot autonomously navigates a
specified route through an office and lab facility. Before the test, the robot
is manually driven through the space so it can build a map of the space using
visual data from cameras mounted on the front, back and sides of the robot.
During the autonomous run, Spot uses data from the cameras to localize itself in
the map and to detect and avoid obstacles. Once the operator presses 'GO' at the
beginning of the video, the robot is on its own. Total walk time for this route
is just over 6 minutes."

<iframe width="500" height="281" src="https://www.youtube.com/embed/Ve9kWX_KXus" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Smart extraterrestrial habitats
The [Resilient Extraterrestrial Habitats Institute (RETHi)](https://purdue.edu/rethi/?_ga=2.219727577.1509543321.1624260470-1305845852.1623658385) is a NASA-funded
project based at Purdue whose vision is to develop technologies that
enable the design of habitats on the Moon and Mars. These structures will likely
remain without crew for significant periods of time, ranging from months to years,
making autonomy a key requirement.
[Prof. Bilionis](https://www.predictivesciencelab.org) is leading the Awareness
Thrust of the institute. The Awareness Thrust is responsible for developing data
science technologies for assessing the health state of all habitat systems using
sensor data and controlling robotic agents to carry out maintenance and repair
activities. Note that every year there are various opportunities for
undergraduate research on data-science-related topics at the institute. In the
video you will see [Prof. Dyke](https://engineering.purdue.edu/IISL/), who is
leading the project.

<iframe width="500" height="281" src="https://www.youtube.com/embed/yFd8wE9qtkw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Virtual surgery
[Prof. Buganza](https://engineering.purdue.edu/tepolelab/) combines physical
models of skin with data collected from patients and experiments to simulate
skin surgeries. In the process, he has to solve parameter calibration problems,
quantify uncertainties in skin properties, and propagate these uncertainties
through the physical models.

<iframe width="438" height="415" src="https://www.youtube.com/embed/lOl_tPj9sMs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

# Why Python?

Python is the programming language used in the majority of data science
applications. Google uses Python (see https://www.tensorflow.org).
Facebook uses Python (see https://pytorch.org). We use Python for the
following reasons:
+ It is absolutely free.
+ It is very easy to learn.
+ There are amazing libraries for pretty much anything you may need to do
  related to data science.
+ It is fun!