# Harshit's Artificial Intelligence (AI) Notes

In my view, the field of AI can be roughly divided into the following sub-domains:

- Data Science
- Learning
- Robotics

### Data Science
Data science is the field that deals with the large amounts of data that real-world problems both produce and require. The following topics may roughly be grouped under data science:

- Data Storage
- Databases
- Data Mining
- Data Exploration
- Big Data
- Data Cleansing
- Data Analysis
- Data Visualization And Representation

### Learning

Learning is the science of making computers or machines learn from data, in a way that is loosely similar to how humans learn. Learning is largely mathematical, and statistics plays a major role in it; hence it is also known as "statistical learning". When statistical learning methods are turned into computer algorithms and programs, we call it "machine learning".

### Robotics

Robotics is the science of creating machines which try to replicate how a human body works using mechanical, electrical, computer and electronic systems.

---

## Statistical Learning

Every statistical learning problem involves two kinds of variables:

- Input Variables
- Output Variables

### Input Variables
Input variables are the various factors affecting the _output_ or the _result_ of any statistical learning problem. They are also known as _predictors_, _independent variables_ or _features_.

Example:

In the advertising budget problem, the budgets for **television advertisements**, **radio advertisements** and **newspaper advertisements** can be represented by the variables **$X_{1}$**, **$X_{2}$** and **$X_{3}$**, respectively.

Factors | Variables
--- | ---
TV | $X_{1}$
RADIO | $X_{2}$
NEWSPAPER | $X_{3}$

These are the input variables for the advertising budget problem.

### Output Variables

Output variables are the final result of a prediction or learning problem, i.e. the outcome of the whole problem. They depend on the input variables of that particular problem. They are also known as _responses_ or _dependent variables_.

Example:

In the advertising budget problem, the output is the **sales**, which we want to predict from the input variables using one of the many statistical learning methods. We can represent it as **$Y$**.

Result | Variable
--- | ---
Sales | $Y$

### Relationship between the input variables and the output variables

More generally, suppose we observe a _quantitative response_ $Y$ and _$p$_ different _predictors_ $X_{1}, X_{2}, X_{3}, \cdots, X_{p}$.

Collecting the $p$ predictors into a single vector, we get:
$$
X = (X_{1}, X_{2}, X_{3}, \cdots, X_{p})
$$

We assume that $X$ affects $Y$, i.e. that there is some relationship between $X$ and $Y$. It can be written in the very general form:
$$
Y = f(X) + \varepsilon
$$

where,
> - $f$ is a fixed but unknown function of $X_{1}, \cdots, X_{p}$,
> - $\varepsilon$ is a random _error term_, which is independent of $X$ and has a mean of _zero_. Errors are **_positive_** if the observation lies **above** the _curve of $f(X)$_ and **_negative_** if it lies **below** it.

Our goal is to find an estimate of $f$, written $\hat{f}$, that maps $X$ to $Y$ with as little error as possible.
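To make $Y = f(X) + \varepsilon$ concrete, here is a minimal simulation sketch in Python with NumPy. The "true" $f$ below (a straight line) and the noise level are assumptions chosen only for illustration; in a real problem $f$ is unknown. It shows that the errors average out to roughly zero and that about half of the observations lie above the curve of $f(X)$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def f(x):
    # Hypothetical "true" function, chosen only for illustration;
    # in a real problem f is unknown.
    return 2.0 + 0.5 * x

n = 10_000
X = rng.uniform(0, 10, size=n)                # one predictor (p = 1)
eps = rng.normal(loc=0.0, scale=1.0, size=n)  # error: mean zero, drawn independently of X
Y = f(X) + eps                                # Y = f(X) + epsilon

# Observations above the curve of f(X) have positive error,
# observations below it have negative error.
above = Y > f(X)

print(f"mean of eps: {eps.mean():.3f}")
print(f"share of points above f(X): {above.mean():.2f}")
```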

### Why estimate $f$?

Our motive behind estimating $f$ can be one (or both) of the following:
1. Prediction
2. Inference

### Prediction

Prediction means estimating an unknown outcome, often a future one, from data observed in the past. Humans make predictions by studying past data and estimating what comes next; similarly, machines can be trained with various learning techniques to make such estimates. Examples of domains where prediction is applied include:

- Weather forecasts
- Disaster analysis
- Stock markets
- Traffic forecasts

For prediction, we have a set of _inputs_, $X$, readily available. The _output_, $Y$, is not available to us.

We can then predict $Y$ as follows:
$$
\hat{Y} = \hat{f}(X)
$$

Where,
> - $\hat{Y}$ is our prediction for $Y$, and,
> - $\hat{f}(X)$ is our estimate for $f(X)$.

Since the error term $\varepsilon$ averages to zero, it does not appear in the prediction formula.
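A minimal sketch of prediction, assuming a one-predictor problem whose hypothetical true $f$ is linear. We estimate $\hat{f}$ from training data with a least-squares straight line (`np.polyfit` with `deg=1`), then compute $\hat{Y} = \hat{f}(X)$ for new inputs whose $Y$ we do not know. The data and the choice of a linear fit are assumptions for illustration, not a prescribed method.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical training data generated from Y = f(X) + eps,
# with f(x) = 2 + 0.5x assumed for illustration only.
X_train = rng.uniform(0, 10, size=200)
Y_train = 2.0 + 0.5 * X_train + rng.normal(0.0, 1.0, size=200)

# Estimate f with a least-squares straight line; this is f-hat.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(X_train, Y_train, deg=1)

def f_hat(x):
    return intercept + slope * x

# New inputs whose responses are unknown: predict Y-hat = f-hat(X).
X_new = np.array([1.0, 5.0, 9.0])
Y_hat = f_hat(X_new)
print(Y_hat)
```

The error term is not modelled at all here: the prediction uses only $\hat{f}$, because $\varepsilon$ averages to zero.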

The accuracy of $\hat{Y}$ as a prediction of $Y$ depends on two quantities:
1. Reducible error
2. Irreducible error

In general, $\hat{f}$ will not be a perfect estimate of $f$, and this inaccuracy introduces a _reducible error_. We can reduce it by using more appropriate statistical learning methods to estimate $f$.

Even if we could estimate $f$ perfectly, so that $\hat{Y} = f(X)$, our prediction would still contain some error, because $Y$ is also a function of $\varepsilon$, which by definition cannot be predicted using $X$. The variability associated with $\varepsilon$ therefore also affects the accuracy of our predictions. This is the _irreducible error_: no matter how well we estimate $f$, we cannot remove it.

**Why is the irreducible error > 0?**

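$\varepsilon$ may contain _unmeasured variables_ that are useful in predicting $Y$; since we do not measure them, $\hat{f}$ cannot use them. It may also contain _unmeasurable variation_.

Assuming, for the moment, that both $\hat{f}$ and $X$ are fixed, the expected squared error of the prediction splits into two parts:

$$
E\left[(Y - \hat{Y})^{2}\right] = E\left[\left(f(X) + \varepsilon - \hat{f}(X)\right)^{2}\right] = [f(X) - \hat{f}(X)]^{2} + \operatorname{Var}(\varepsilon)
$$

where,
> - $E\left[(Y - \hat{Y})^{2}\right]$ is the _expected value_ of the squared difference between the actual value of $Y$ and its prediction $\hat{Y}$,
> - $[f(X) - \hat{f}(X)]^{2}$ is the _squared difference_ between the true function $f(X)$ and our estimate $\hat{f}(X)$. It is _reducible_ in nature.
> - $\operatorname{Var}(\varepsilon)$ is the variance associated with $\varepsilon$. It is _irreducible_ in nature.

The second equality holds because, after expanding the square, the cross term $2[f(X) - \hat{f}(X)]E[\varepsilon]$ vanishes ($\varepsilon$ has mean zero) and $E[\varepsilon^{2}] = \operatorname{Var}(\varepsilon)$.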
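The decomposition can be checked numerically. The sketch below fixes an input $x_0$ and a deliberately imperfect estimate $\hat{f}$, draws many responses $Y = f(x_0) + \varepsilon$, and compares the Monte Carlo average of $(Y - \hat{Y})^{2}$ with $[f(x_0) - \hat{f}(x_0)]^{2} + \operatorname{Var}(\varepsilon)$. The functions $f(x) = \sin x$ and $\hat{f}(x) = 0.8 \sin x$ and the noise level are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def f(x):
    # Hypothetical true function (unknown in practice).
    return np.sin(x)

def f_hat(x):
    # A fixed, deliberately imperfect estimate of f.
    return 0.8 * np.sin(x)

x0 = 1.0      # a fixed input X
sigma = 0.5   # standard deviation of eps, so Var(eps) = 0.25

# Draw many responses at the same input: Y = f(x0) + eps.
eps = rng.normal(0.0, sigma, size=1_000_000)
Y = f(x0) + eps
Y_hat = f_hat(x0)

mse = np.mean((Y - Y_hat) ** 2)       # Monte Carlo estimate of E[(Y - Y_hat)^2]
reducible = (f(x0) - f_hat(x0)) ** 2  # [f(X) - f_hat(X)]^2
irreducible = sigma ** 2              # Var(eps)

print(f"E[(Y - Y_hat)^2] ~ {mse:.4f}")
print(f"reducible + irreducible = {reducible + irreducible:.4f}")
```

Making `f_hat` closer to `f` shrinks only the first term; the `sigma ** 2` part remains even when `f_hat` equals `f`.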

### Inference

We are often interested in understanding how the output of a problem is affected by the inputs, i.e. how $Y$ changes as $X_{1}, X_{2}, X_{3}, \cdots, X_{p}$ change.

Thus, we want to understand the relationship between $X$ and $Y$.

Here, $\hat{f}$ cannot be treated as a "_black box_". We need the exact form of $\hat{f}$.

**We may want to infer:**
- which predictors are _associated_ with the response,
- what the _relationship between each_ predictor and the response is (positive, negative, etc.),
- whether the relationship between $Y$ and each predictor can be adequately summarized using a linear equation, or whether it is more complicated (quadratic, cubic, etc.).
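As an illustration of inference, the sketch below fits a linear $\hat{f}$ to synthetic advertising-style data by least squares (`np.linalg.lstsq`) and reads off the sign and size of each coefficient. The budgets, the coefficients used to generate sales, and the fact that newspaper has no effect are all made up for illustration; they are not the real advertising data. A proper analysis would also look at standard errors or confidence intervals rather than point estimates alone.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 500

# Synthetic stand-ins for X1 (TV), X2 (radio) and X3 (newspaper) budgets.
# These values are invented for illustration only.
tv = rng.uniform(0, 300, size=n)
radio = rng.uniform(0, 50, size=n)
newspaper = rng.uniform(0, 100, size=n)

# Hypothetical true relationship: newspaper has NO effect on sales.
sales = 3.0 + 0.045 * tv + 0.19 * radio + 0.0 * newspaper + rng.normal(0, 1.5, size=n)

# Fit f_hat(X) = b0 + b1*X1 + b2*X2 + b3*X3 by least squares.
design = np.column_stack([np.ones(n), tv, radio, newspaper])
coef, *_ = np.linalg.lstsq(design, sales, rcond=None)

for name, b in zip(["intercept", "TV", "radio", "newspaper"], coef):
    print(f"{name:>10}: {b:+.4f}")
```

Here the estimated TV and radio coefficients come out positive, while the newspaper coefficient stays close to zero, matching how the synthetic data were generated.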

### How do we estimate $f$?

To estimate $f$, we need to train (teach) a statistical learning method. For this we use a set of observed data called the _training data_: the dataset, or the part of the dataset, that is used to teach the method how to estimate $f$.

**Notation for the training data:**

- $n$ is the total number of observations and $p$ is the total number of predictors.
- $i = 1, 2, \cdots, n$ denotes the $i^{th}$ observation.
- $j = 1, 2, \cdots, p$ denotes the $j^{th}$ predictor.
- $x_{ij}$ is the value of the $j^{th}$ predictor for the $i^{th}$ observation.
- $y_{i}$ is the response variable for the $i^{th}$ observation.

Our training data set consists of,
$$
\{(x_{1}, y_{1}), (x_{2}, y_{2}), \cdots, (x_{n}, y_{n})\}
$$
where,
$$
x_{i} = (x_{i1}, x_{i2}, \cdots, x_{ip})^{T}
$$
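The notation maps directly onto an $n \times p$ array: row $i$ is $x_{i}$ and column $j$ holds predictor $X_{j}$. A small sketch with NumPy, using a made-up training set with $n = 4$ and $p = 3$; the helper `x(i, j)` is hypothetical and only converts the 1-based math indices to NumPy's 0-based ones.

```python
import numpy as np

# Made-up training set: n = 4 observations, p = 3 predictors.
# Row i holds x_i = (x_i1, x_i2, x_i3); column j holds all values of predictor X_j.
X = np.array([
    [10.0, 2.0, 5.0],
    [20.0, 1.0, 3.0],
    [15.0, 4.0, 0.0],
    [30.0, 3.0, 8.0],
])
y = np.array([7.0, 9.0, 8.5, 13.0])  # y_i: response of observation i

n, p = X.shape

def x(i, j):
    # Hypothetical helper: x_ij with 1-based i and j, as in the notation above.
    return X[i - 1, j - 1]

# The training set {(x_1, y_1), ..., (x_n, y_n)} as a list of (row, response) pairs.
training_set = [(X[i], y[i]) for i in range(n)]

print(n, p)     # 4 3
print(x(2, 3))  # value of predictor 3 for observation 2: 3.0
```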

Our goal is to apply a statistical learning method to our training data in order to estimate the unknown function $f$.

We want to find a function $\hat{f}$ such that
$$
Y \approx \hat{f}(X)
$$
for any observation $(X, Y)$.

Most statistical learning methods for this task can be classified as either:

1. Parametric methods
2. Non-parametric methods

### Parametric Methods

**Parametric methods involve a two-step, _model-based_ approach.**

**STEP 1:**

First, we make an assumption about the functional form, or shape, of $f$.