# What is Artificial Intelligence?

[Introduction to AI](https://www.youtube.com/watch?v=mJeNghZXtMo)

## Machine Learning

**Machine learning** is a field of computer science that gives computer systems the ability to "learn" from data without being explicitly programmed (Arthur Samuel, 1959).

**Modern definition**: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell, 1997)

Example: suppose your email program watches which emails you do or do not flag as spam and, based on that, learns to filter spam better.

*   Task (T): classify an email as spam or not spam.
*   Experience (E): watch you label emails as spam or not spam.
*   Performance (P): the number (or fraction) of emails correctly classified as spam or not spam.
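
The performance measure P from the list above can be sketched as a simple accuracy computation. This is only an illustration: the function name `accuracy` and the toy labels are hypothetical and not part of the original example.

```python
def accuracy(predicted, actual):
    """Fraction of emails whose predicted label matches the user's label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled email")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy data: True means "spam", False means "not spam".
user_labels = [True, False, False, True, False]    # experience E
filter_output = [True, False, True, True, False]   # result of task T

performance = accuracy(filter_output, user_labels)  # performance P
print(performance)
```

If the filter learns from more labeled emails (more E) and this number goes up, the program is "learning" in Mitchell's sense.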

### Machine Learning Algorithms:

![alt text](https://cdn-images-1.medium.com/max/2000/1*FUZS9K4JPqzfXDcC83BQTw.png)

#### Supervised learning
Supervised learning is where you have input variables ($x$) and an output variable ($Y$), and you use an algorithm to learn the mapping function from the input to the output:

$$Y = f(x)$$

Supervised learning problems are categorized into "regression" and "classification" problems.
##### Classification **"discrete label"** 
A classification problem is one where the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.


##### Regression **"continuous label"**
A regression problem is one where the output variable is a real value, such as a price in dollars or a weight.
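
The difference between the two problem types can be sketched with a 1-nearest-neighbour predictor, chosen here only because it is short. The function name and the toy data are hypothetical. The same code returns a category or a real value depending on the kind of label it is given.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    distances = [abs(x - query) for x in train_x]
    nearest = distances.index(min(distances))
    return train_y[nearest]


# Hypothetical toy data: a single numeric feature per example.
sizes = [1.0, 2.0, 8.0, 9.0]

# Classification: the label is a discrete category.
colors = ["red", "red", "blue", "blue"]
predicted_color = nearest_neighbor_predict(sizes, colors, 7.5)

# Regression: the label is a real value (here, a weight).
weights = [10.5, 19.8, 81.2, 90.1]
predicted_weight = nearest_neighbor_predict(sizes, weights, 7.5)

print(predicted_color, predicted_weight)
```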


#### Unsupervised learning
Unsupervised learning is where you have only input data ($X$) and no corresponding output variables. The goal is to model the underlying structure or distribution of the data in order to learn more about it.
![alt text](https://cdn-images-1.medium.com/max/1000/1*9bJ6MVxms5W8_9gfX7fR5Q.jpeg)
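
As one illustration of finding structure without labels, here is a minimal sketch of k-means clustering on one-dimensional data. The text above does not name k-means; it is picked here as a common example, and the function name, data, and initial centers are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            closest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[closest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[k]
            for k, c in enumerate(clusters)
        ]
    return centers, clusters


# Only inputs are given; there is no label column.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(data, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

The algorithm is never told which group a point belongs to; the two groups emerge from the distances between the inputs alone.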

### Machine Learning Terminology:


#### Label 
A **label** is the thing we're predicting—the y variable in simple linear regression. The label could be the future price of wheat, the kind of animal shown in a picture, the meaning of an audio clip, or just about anything.


#### Features 
A feature is an input variable—the x variable in simple linear regression. A simple machine learning project might use a single feature, while a more sophisticated machine learning project could use millions of features, specified as:
$$x_1, x_2, ..., x_N$$

$$\mathbf{X} = \begin{bmatrix}
    x_{1}^{(1)} & x_{2}^{(1)} & x_{3}^{(1)} & \dots  & x_{N}^{(1)} \\
    x_{1}^{(2)} & x_{2}^{(2)} & x_{3}^{(2)} & \dots  & x_{N}^{(2)} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{1}^{(M)} & x_{2}^{(M)} & x_{3}^{(M)} & \dots  & x_{N}^{(M)}
\end{bmatrix}
$$

(The superscript $(i)$ denotes the $i$th example, one row out of $M$, and the subscript $j$ denotes the $j$th feature, one column out of $N$.)
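
The indexing convention $x_{j}^{(i)}$ can be sketched with a nested list, where each inner list is one example. The values and the helper name `feature` are hypothetical, and the helper converts the 1-based math notation to Python's 0-based indexing.

```python
# Hypothetical feature matrix: 3 examples (rows), 4 features (columns).
X = [
    [5.1, 3.5, 1.4, 0.2],  # example 1
    [4.9, 3.0, 1.4, 0.2],  # example 2
    [6.2, 3.4, 5.4, 2.3],  # example 3
]


def feature(X, i, j):
    """Return x_j^(i): feature j of example i, both counted from 1."""
    return X[i - 1][j - 1]


num_examples = len(X)     # number of rows
num_features = len(X[0])  # number of columns

print(num_examples, num_features)
print(feature(X, 2, 1))   # first feature of the second example
```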

In the spam detector example, the features could include the following:
* words in the email text
* sender's address
* time of day the email was sent

#### Examples
An **example** is a particular instance of data, **x**. We break examples into two categories:

* labeled examples
* unlabeled examples

A **labeled** example includes both feature(s) and the label. That is:
$$\text{labeled examples: \{features, label\}: } (x, y)$$
Use labeled examples to **train** the model. In our spam detector example, the labeled examples would be individual emails that users have explicitly marked as "spam" or "not spam."

An **unlabeled** example contains features but not the label. That is:
$$\text{unlabeled examples: \{features, ?\}: } (x, ?)$$
Once we've trained our model with labeled examples, we use that model to predict the label on unlabeled examples. In the spam detector, unlabeled examples are new emails that humans haven't yet labeled.
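
The distinction between labeled and unlabeled examples can be sketched as a small data structure. The `Email` class, its field names, and the sample addresses are all hypothetical; `None` stands in for the "?" of an unlabeled example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    """One example: features x plus an optional label y."""
    features: dict
    label: Optional[str] = None  # None plays the role of "?"

    def is_labeled(self):
        return self.label is not None


emails = [
    Email({"sender": "friend@example.com", "hour_sent": 14}, label="not spam"),
    Email({"sender": "promo@example.com", "hour_sent": 3}, label="spam"),
    Email({"sender": "unknown@example.com", "hour_sent": 2}),  # not yet labeled
]

# Labeled examples are used for training; unlabeled ones await a prediction.
training_set = [e for e in emails if e.is_labeled()]
to_predict = [e for e in emails if not e.is_labeled()]

print(len(training_set), len(to_predict))
```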

#### Models
A **model** defines the relationship between features and label. For example, a spam detection model might associate certain features strongly with "spam". Let's highlight two phases of a model's life:

* **Training** means creating or learning the model. That is, you show the model labeled examples and enable the model to gradually learn the relationships between features and label.

* **Inference** means applying the trained model to unlabeled examples. That is, you use the trained model to make useful predictions (y').
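
The two phases above can be sketched with simple linear regression on one feature, fitted by ordinary least squares. The function names `train` and `infer` and the toy data are hypothetical; a real project would normally use a library rather than hand-written formulas.

```python
def train(xs, ys):
    """Training: fit y = w * x + b to labeled examples by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def infer(model, x):
    """Inference: apply the trained model to an unlabeled example."""
    w, b = model
    return w * x + b


# Hypothetical labeled examples that follow y = 2x + 1 exactly.
labeled_x = [1.0, 2.0, 3.0, 4.0]
labeled_y = [3.0, 5.0, 7.0, 9.0]

model = train(labeled_x, labeled_y)  # training phase
y_prime = infer(model, 5.0)          # inference phase: prediction y'
print(model, y_prime)
```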

In [None]:
import pandas as pd
housing = pd.read_csv('sample_data/california_housing_train.csv')
housing_3_features = housing[['housing_median_age', 'total_rooms', 'total_bedrooms', 'median_house_value']]

In [None]:
housing_3_features.head()

| | housing_median_age | total_rooms | total_bedrooms | median_house_value |
|---|---|---|---|---|
| 0 | 15.0 | 5612.0 | 1283.0 | 66900.0 |
| 1 | 19.0 | 7650.0 | 1901.0 | 80100.0 |
| 2 | 17.0 | 720.0 | 174.0 | 85700.0 |
| 3 | 14.0 | 1501.0 | 337.0 | 73400.0 |
| 4 | 20.0 | 1454.0 | 326.0 | 65500.0 |
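
To connect this table to the terminology above, the sketch below separates the columns into features and a label. It assumes that `median_house_value` is the label and the other three columns are the features; the source does not state this explicitly. The five rows shown above are re-entered by hand so the snippet runs without the CSV file.

```python
import pandas as pd

# The five rows from the output above, typed in directly (no CSV needed).
housing_sample = pd.DataFrame({
    "housing_median_age": [15.0, 19.0, 17.0, 14.0, 20.0],
    "total_rooms": [5612.0, 7650.0, 720.0, 1501.0, 1454.0],
    "total_bedrooms": [1283.0, 1901.0, 174.0, 337.0, 326.0],
    "median_house_value": [66900.0, 80100.0, 85700.0, 73400.0, 65500.0],
})

label_name = "median_house_value"  # assumption: this column is the label

X = housing_sample.drop(columns=[label_name])  # features, one row per example
y = housing_sample[label_name]                 # label, one value per example

print(X.shape, y.shape)
```

Each row of `X` paired with the matching entry of `y` is one labeled example $(x, y)$.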
