
# <font color=#770000>ICPE 639 Introduction to Machine Learning </font>

## ------ With Energy Applications

Some of the examples and exercises in this course are based on several books and open-access materials on machine learning, including [Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/) and the [Towards Data Science](https://towardsdatascience.com/) forum.


<p> &#169; 2021: Xiaoning Qian </p>

[Homepage](http://xqian37.github.io/)

**<font color=blue>[Note]</font>** This is currently a work in progress and will be updated as the material is tested in the classroom.

All material is open source under a Creative Commons license and free for use in non-commercial applications.

Source material used under the Creative Commons Attribution-NonCommercial 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


**<font color=blue>[Acknowledgements]</font>** Some of the materials have been built upon the previous efforts by Huiling Liao, Xiaomeng Yan, and Prof. Jianhua Huang. Ziyu Xiang has helped to identify several energy applications in this set of notebooks. 

**<font color=blue>[Note]</font>** Microsoft has recently offered a free **Machine Learning for Beginners** course: https://github.com/microsoft/ML-For-Beginners. Some of its material will also be covered here.

### What is Machine Learning? 

In [None]:
from IPython.display import HTML, Image
Image(url= "https://www.catalyticgenerators.com/wp-content/uploads/2015/10/papayas-210x201.jpg")


In [None]:
Image(url= "https://thumbs.dreamstime.com/z/papaya-fruit-white-background-33588125.jpg", width=250, height=250)

In [None]:
Image(url= "https://thumbs.dreamstime.com/z/slice-juicy-papaya-fruit-14255329.jpg", width=200, height=250)

### Prof. Tom Mitchell's Machine Learning definition in 1997: 

"A computer program is said to learn from experience $E$ with respect to some task $T$ and some performance measure $P$, if its performance on $T$, as measured by $P$, improves with experience $E$."

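As a rough illustration of the definition, the sketch below uses a made-up toy problem: the task $T$ is labeling an integer as "large" or not, the experience $E$ is the number of labeled examples seen, and the performance measure $P$ is accuracy on a fixed test set. The data, the rule `x >= 10`, and the helper names `learn_threshold` and `accuracy` are assumptions chosen for illustration only.

```python
# Hypothetical toy setup illustrating Mitchell's E, T, and P.
# Task T: label an integer x as 1 if it is "large" (here, x >= 10), else 0.
# Experience E: the first k labeled examples of a small training stream.
# Performance P: accuracy on a fixed test set.

def learn_threshold(examples):
    """Place the threshold midway between the largest x labeled 0
    and the smallest x labeled 1 seen so far."""
    largest_zero = max(x for x, y in examples if y == 0)
    smallest_one = min(x for x, y in examples if y == 1)
    return (largest_zero + smallest_one) / 2

def accuracy(threshold, test_set):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(int(x > threshold) == y for x, y in test_set)
    return correct / len(test_set)

training_stream = [(0, 0), (12, 1), (3, 0), (11, 1), (8, 0), (10, 1)]
test_set = [(x, int(x >= 10)) for x in range(20)]

performance = {}
for k in (2, 4, 6):
    threshold = learn_threshold(training_stream[:k])
    performance[k] = accuracy(threshold, test_set)
    print(f"E = {k} examples -> P = {performance[k]:.2f}")
```

On this toy data the accuracy rises as more examples are seen, which is exactly what the definition means by "learning". Real datasets are noisy, so the improvement is usually less regular.
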
### My personal take on Machine Learning

1. **Technical**: It requires mathematical modeling, <font color="blue">probability, optimization</font>, signal processing, statistics, and computer science to enable computers to learn from previous **data** to make **predictions**. 

2. **Layman**: Machine Learning enables computers to **extract knowledge** from data, make **predictions** given observations, and support better **decision making**. 

3. **Example**: The image shows a papaya with orange skin (**data/image analysis**). The papaya is probably ripe (**prediction**, learned from experience) and tasty (**decision making**). 

In [None]:
Image(url= "https://www.researchgate.net/publication/330217507/figure/fig2/AS:712813135818756@1546959303617/Overview-of-categorical-types-and-different-machine-learning-algorithms-AI-artificial.png")

In [None]:
Image(url= "https://www.pmf-research.eu/wp-content/webp-express/webp-images/uploads/2019/05/venn-diagram.png.webp")

### References: 

1. [Introduction to Machine Learning with Python](https://www.oreilly.com/library/view/introduction-to-machine/9781449369880/)
     with GitHub [repository](https://github.com/amueller/introduction_to_ml_with_python)    
2. [Hands-on ML with scikit-learn](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/) 
     with GitHub [repository](https://github.com/ageron/handson-ml2)    
3. and many online resources...

### Practical Machine Learning

In [None]:
Image(url="http://xqian37.github.io/ML-forWSP.png", width=500)

# Supervised Learning

Supervised learning is one of the most commonly used and successful types of machine learning. Supervised learning is used whenever we want to **predict a certain outcome of interest** from a given input, and we have examples of input/output pairs. 

To establish notation for further use, we'll use $x_i$ to denote the "input" variables, also called input **features**, and $y_i$ to denote the "output" or **target** variable that we are trying to predict. A pair $(x_i,y_i)$ is called a **training example**, and the dataset that we'll be using to learn, a list of $n$ training examples $\{(x_i,y_i): i = 1,\cdots,n\}$, is called a **training set**. Note that the subscript $i$ in the notation is simply an index into the training set. We will also use $\mathcal{X}$ to denote the space of input values, and $\mathcal{Y}$ the space of output values.


To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function $h$: $\mathcal{X}\rightarrow\mathcal{Y}$ so that $h(x)$ is a "good" predictor for the corresponding value of $y$. For historical reasons, this function $h$ is called a **hypothesis** in statistics. 
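
To make the notation concrete, here is a minimal sketch in plain Python. The training set values are made up, and the nearest-neighbor rule is just one simple way to build a hypothesis $h$; it is an assumption for illustration, not the method this course prescribes. The name `learn_nearest_neighbor` is likewise hypothetical.

```python
# Training set {(x_i, y_i): i = 1, ..., n}; the values are made up for illustration.
training_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

def learn_nearest_neighbor(training_set):
    """Return a hypothesis h: X -> Y that predicts the y of the closest training x."""
    def h(x):
        nearest_x, nearest_y = min(training_set, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return h

# "Learning" maps a training set to a function h.
h = learn_nearest_neighbor(training_set)

# h can now be applied to inputs that were not in the training set.
print(h(2.2))  # the closest training input is x = 2.0
print(h(3.6))  # the closest training input is x = 4.0
```

Here $\mathcal{X}$ and $\mathcal{Y}$ are both the real numbers. Whether $h$ is a "good" predictor still has to be checked on examples it has not seen.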

There are two major types of supervised machine learning problems, called **classification** and **regression**. 

### Classification

In classification, the goal is to predict a class label, which is a choice from a predefined list of possibilities. 
Classification is sometimes separated into **binary classification**, which is the special case of distinguishing between exactly two classes, and **multiclass classification**, which is classification between more than two classes. You can think of binary classification as trying to answer a yes/no question. Examples are given below:

- Binary Classification:  Classifying emails as either spam or not spam.
- Multiclass Classification: Predicting what language a website is in from the text on the website. The classes here would be a pre-defined list of possible languages.
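
The spam example can be sketched with a very simple classifier. The code below assumes two made-up features per email (number of links and number of exclamation marks) and uses a nearest-centroid rule, chosen only because it fits in a few lines of standard-library Python. The data and the function names `fit_centroids` and `predict` are hypothetical; in practice a library such as scikit-learn would typically be used.

```python
from collections import defaultdict

# Made-up training emails: (number of links, number of exclamation marks).
features = [(5, 7), (6, 9), (7, 8), (0, 1), (1, 0), (1, 2)]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) of each class."""
    grouped = defaultdict(list)
    for x, label in zip(features, labels):
        grouped[label].append(x)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }

def predict(centroids, x):
    """Return the class label whose centroid is closest to x."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = fit_centroids(features, labels)
print(predict(centroids, (4, 6)))  # closer to the spam centroid
print(predict(centroids, (1, 1)))  # closer to the not-spam centroid
```

Nothing in the code is specific to two classes: with more distinct labels in `labels`, the same functions perform multiclass classification.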

### Regression

For regression tasks, the goal is to predict a continuous number, or a *floating-point* number in programming terms (or real number in mathematical terms). Examples: 
- Predicting a person’s annual income from their education, their age, and where they live is an example of a regression task. When predicting income, the predicted value is an amount, and can be any number in a given range.
- Predicting the yield of a corn farm given attributes such as previous yields, weather, and number of employees working on the farm. The yield again can be an arbitrary number.
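
As a small sketch of a regression task, the code below fits a straight line to made-up data relating years of education to annual income. Using a single input feature and ordinary least squares is a simplifying assumption, and the numbers and the name `fit_line` are hypothetical.

```python
# Made-up data: years of education and annual income in thousands of dollars.
years_of_education = [10, 12, 14, 16, 18]
annual_income = [30, 36, 45, 50, 59]

def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept with one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and centered squares.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(years_of_education, annual_income)

# Unlike a class label, the prediction is a continuous number.
predicted_income = slope * 15 + intercept
print(f"predicted income for 15 years of education: {predicted_income:.1f}")
```

The prediction for an input that never appeared in the data (15 years) is a real number between the neighboring observations, which is what distinguishes regression from classification.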

## Essence of (supervised) Machine Learning

<img src="https://dataanalyticsbook.info/graphics/2_1.png" alt="EML">

<center><font size=5>Fit the data to a mapping function $\hat{y}=f(x)$ by <font color=red>optimization</font></font></center>
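
The idea of fitting by optimization can be sketched as follows. The code assumes a linear form $f(x) = wx + b$, a mean squared error loss, and plain gradient descent; all three are illustrative choices, as are the made-up data, the learning rate, and the number of steps.

```python
# Made-up data generated from y = 2x + 1, so the best fit is known in advance.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def mean_squared_error(w, b):
    """Average squared gap between the predictions w*x + b and the targets y."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b = 0.0, 0.0          # initial guess for the parameters of f(x) = w*x + b
learning_rate = 0.05     # step size, small enough for this data to converge
n = len(xs)

for step in range(2000):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    # Move the parameters a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.4f}, b = {b:.4f}, loss = {mean_squared_error(w, b):.2e}")
```

Each step reduces the loss a little, and the parameters approach $w = 2$, $b = 1$. More flexible model families change the form of $f$, but the pattern of choosing a loss and minimizing it stays the same.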

### Make it more challenging: <font color=red> optimization under uncertainty</font>


## Course Project

Potential resources from our own institute (Prof. Le Xie): 

https://github.com/tamu-engineering-research/COVID-EMDA

https://github.com/tamu-engineering-research/2021TXPowerOutage

# Questions? 

In [None]:
Image(url= "https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png", width=100)

#jupyter-nbconvert --to slides Module1-ML-SL-concepts.ipynb --post serve