# What is Machine Learning?

## Introduction
Statistical learning refers to a large set of tools for understanding data. These tools can be grouped into two broad types: supervised and unsupervised. In supervised learning, you build a model to predict or estimate an output based on one or more inputs. Problems of this kind arise in many areas, such as business, medicine, space science, and public policy. In unsupervised learning, there is no output you are trying to predict, but you still try to find patterns or relationships in the data.


```{admonition} Best book on Machine Learning
My notes are based on the following book:
https://www.statlearning.com/
```



## Simple Linear Regression

```{image} https://cdn.mathpix.com/snip/images/AWvQ3klUs21gxWHz6kmsBrUr8yzpr5VDeF6QePIYyZ4.original.fullsize.png
:align: center
:alt: Simple linear regression fit
:width: 60%
```

Simple linear regression is a fundamental statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. This technique assumes a straight-line (linear) relationship between the variable being predicted (dependent variable) and the variable being used to predict (independent variable).

### Formula

$$
Y \approx \beta_0+\beta_1 X
$$

For example, $X$ may represent TV advertising and $Y$ may represent sales.
Then we can regress sales onto TV by fitting the model

$$
\text { sales } \approx \beta_0+\beta_1 \times \mathrm{TV}
$$


$\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept and slope terms in the linear model. Together, $\beta_0$ and $\beta_1$ are known as the model coefficients or parameters. Once we have used our training data to produce estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ for the model coefficients, we can predict future sales on the basis of a particular value of TV advertising by computing

$$
\hat{y}=\hat{\beta}_0+\hat{\beta}_1 x
$$

where $\hat{y}$ indicates a prediction of $Y$ on the basis of $X=x$. Here we use a hat symbol (as in $\hat{y}$) to denote the estimated value of an unknown parameter or coefficient, or the predicted value of the response.
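Once estimates are available, prediction is just evaluating the fitted line at a chosen $x$. A minimal Python sketch; the coefficient values below are hypothetical placeholders chosen for illustration, not estimates from the Advertising data.

```python
def predict(beta0_hat: float, beta1_hat: float, x: float) -> float:
    """Return y_hat = beta0_hat + beta1_hat * x for a single input x."""
    return beta0_hat + beta1_hat * x


# Hypothetical estimates, chosen only for illustration.
beta0_hat, beta1_hat = 7.0, 0.05

# Predicted sales for a few TV advertising budgets.
for tv in [0.0, 100.0, 200.0]:
    print(f"TV = {tv:6.1f} -> predicted sales = {predict(beta0_hat, beta1_hat, tv):.2f}")
```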

The fitted line is also often written as $y = b_0 + b_1 x$, where $b_0$ and $b_1$ play the roles of $\hat{\beta}_0$ and $\hat{\beta}_1$:
- $y$ is the dependent variable (the variable we are trying to predict).
- $x$ is the independent variable (the variable we are using to make predictions).
- $b_0$ is the y-intercept (the predicted value of $y$ when $x$ is 0).
- $b_1$ is the slope of the line (the change in $y$ for a one-unit change in $x$).
  
### Estimating the Coefficients

In practice, $\beta_0$ and $\beta_1$ are unknown. So before we can use the model to make predictions, we must use data to estimate the coefficients. Let
$$
\left(x_1, y_1\right),\left(x_2, y_2\right), \ldots,\left(x_n, y_n\right)
$$

represent $n$ observation pairs, each of which consists of a measurement of $X$ and a measurement of $Y$. In the Advertising example, this data set consists of the TV advertising budget and product sales in $n=200$ different markets. (Recall that the data are displayed in Figure 2.1 of the book.) Our goal is to obtain coefficient estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the linear model $Y \approx \beta_0+\beta_1 X$ fits the available data well, that is, so that $y_i \approx \hat{\beta}_0+\hat{\beta}_1 x_i$ for $i=1, \ldots, n$. In other words, we want to find an intercept $\hat{\beta}_0$ and a slope $\hat{\beta}_1$ such that the resulting line is as close as possible to the $n=200$ data points. There are a number of ways of measuring closeness. However, by far the most common approach involves minimizing the least squares criterion, and we take that approach here. Alternative approaches are considered in Chapter 6 of the book.

Let $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1 x_i$ be the prediction for $Y$ based on the $i$th value of $X$. Then $e_i=y_i-\hat{y}_i$ represents the $i$th residual: the difference between the $i$th observed response value and the $i$th response value predicted by our linear model. We define the residual sum of squares (RSS) as
$$
\mathrm{RSS}=e_1^2+e_2^2+\cdots+e_n^2
$$
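The residual and RSS definitions translate directly into code. A minimal sketch that scores two candidate lines on a small made-up data set (the data and both candidate lines are hypothetical, for illustration only); the line with the smaller RSS passes closer to the points.

```python
def residuals(x, y, beta0, beta1):
    """e_i = y_i - y_hat_i for the candidate line y_hat = beta0 + beta1 * x."""
    return [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]


def rss(x, y, beta0, beta1):
    """RSS = e_1^2 + e_2^2 + ... + e_n^2."""
    return sum(e ** 2 for e in residuals(x, y, beta0, beta1))


# Made-up observation pairs (x_i, y_i), for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two hypothetical candidate lines; the smaller RSS indicates a closer fit.
for beta0, beta1 in [(0.0, 2.0), (1.0, 1.5)]:
    print(f"beta0 = {beta0}, beta1 = {beta1}: RSS = {rss(x, y, beta0, beta1):.3f}")
```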


### Example 1: Height and Weight

Imagine we have data on the heights and weights of a group of people. We could use simple linear regression to predict the weight of someone based on their height. In this example:
- The independent variable $x$ would be height (e.g., in centimeters).
- The dependent variable $y$ would be weight (e.g., in kilograms).
- By analyzing the data, we calculate the values of $b_0$ (the intercept) and $b_1$ (the slope).
- Suppose we find the regression equation to be $y = 50 + 0.5x$. This means that for every additional centimeter in height, we expect weight to increase by 0.5 kilograms. The intercept of 50 kilograms is the predicted weight at a height of 0 cm; since nobody in the data is 0 cm tall, it anchors the line rather than describing a real person.
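The slope interpretation can be checked directly by evaluating the example's hypothetical equation $y = 50 + 0.5x$ at heights one centimeter apart. A small sketch; the function name is illustrative.

```python
def predicted_weight(height_cm: float) -> float:
    """Hypothetical fitted line from the example: weight (kg) = 50 + 0.5 * height (cm)."""
    return 50.0 + 0.5 * height_cm


# The slope is the change in predicted weight for a one-unit (1 cm) change in height.
slope = predicted_weight(171.0) - predicted_weight(170.0)
print(f"Change per extra cm: {slope:.2f} kg")  # → Change per extra cm: 0.50 kg

# A 10 cm difference in height corresponds to 10 * slope kg of predicted weight.
print(f"Change per extra 10 cm: {predicted_weight(180.0) - predicted_weight(170.0):.2f} kg")
```

Because the line is straight, the difference is the same 0.5 kg no matter which starting height you pick.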

### Example 2: Advertising Budget and Sales

Consider a company that wants to understand the relationship between its advertising budget and sales:
- The independent variable $x$ would be the advertising budget (e.g., in thousands of dollars).
- The dependent variable $y$ would be sales (e.g., in millions of dollars).
- After collecting data and running a regression analysis, we might find the regression equation to be $y = 10 + 3x$.
- This equation suggests that for every additional thousand dollars spent on advertising, sales are expected to increase by 3 million dollars, and that predicted sales are 10 million dollars when nothing is spent on advertising.
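Because the coefficients are tied to the units of $x$ and $y$, a budget must be converted to thousands of dollars before it is plugged into $y = 10 + 3x$, and the result is read in millions of dollars. A short sketch using the example equation; the function name is hypothetical.

```python
def predicted_sales_millions(budget_dollars: float) -> float:
    """Example line y = 10 + 3x, with x in thousands of dollars and y in millions of dollars."""
    x = budget_dollars / 1_000  # convert dollars to thousands of dollars
    return 10.0 + 3.0 * x


for budget in [0, 1_000, 5_000]:
    print(f"budget ${budget:,} -> predicted sales ${predicted_sales_millions(budget):.1f} million")
```

Forgetting the conversion (plugging in 5,000 instead of 5) would inflate the prediction a thousandfold, which is a common source of nonsensical regression outputs.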

### Understanding the Regression Line

The regression line is the best estimate of the relationship between the independent and dependent variables in the least squares sense: it minimizes the sum of the squared differences between the observed values and the values predicted by the line, which is exactly the RSS defined above.
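To make "minimizes" concrete, here is a brute-force sketch: try every intercept and slope on a grid and keep the pair with the smallest RSS. The data and grid ranges are made up for illustration, and a grid search is far less efficient and less precise than solving for the least squares estimates directly; it is shown only because it mirrors the definition.

```python
def rss(x, y, beta0, beta1):
    """Residual sum of squares for the candidate line y = beta0 + beta1 * x."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Candidate intercepts in [-1, 1] and slopes in [1, 3], both in steps of 0.01.
best = None
for i in range(-100, 101):
    for j in range(100, 301):
        beta0, beta1 = i / 100, j / 100
        value = rss(x, y, beta0, beta1)
        if best is None or value < best[0]:
            best = (value, beta0, beta1)

best_rss, best_beta0, best_beta1 = best
print(f"grid minimum: beta0 = {best_beta0:.2f}, beta1 = {best_beta1:.2f}, RSS = {best_rss:.4f}")
```

For this data the exact least squares solution happens to fall on the grid, so the search lands on it; in general a grid only approximates the minimizer.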

### Assumptions of Simple Linear Regression

The least squares line can always be computed, but interpreting it and drawing the standard inferences from it (standard errors, confidence intervals, hypothesis tests) rely on several key assumptions:
- Linear relationship: the relationship between the independent and dependent variables is linear.
- Independence: the residuals (differences between observed and predicted values) are independent of one another.
- Homoscedasticity: the residuals have constant variance at every level of the independent variable.
- Normality: the residuals are normally distributed. This matters mainly for exact inference in small samples.
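One way to see what these assumptions describe is to simulate data that satisfies all of them, $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with independent $\varepsilon_i \sim N(0, \sigma^2)$, and check that least squares recovers the true coefficients. A sketch under assumed values ($\beta_0 = 2$, $\beta_1 = 3$, $\sigma = 1$, $n = 1000$, all chosen arbitrarily). The estimates use the standard closed-form least squares formulas $\hat{\beta}_1 = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.

```python
import random


def least_squares(x, y):
    """Closed-form least squares estimates (beta0_hat, beta1_hat) for y ~ beta0 + beta1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = sxy / sxx
    return y_bar - beta1_hat * x_bar, beta1_hat


random.seed(0)
true_beta0, true_beta1, sigma = 2.0, 3.0, 1.0  # assumed "true" model
n = 1000

x = [random.uniform(0, 10) for _ in range(n)]
# Linear mean, independent errors, constant variance sigma**2, normally distributed.
y = [true_beta0 + true_beta1 * xi + random.gauss(0, sigma) for xi in x]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# Estimate of the error standard deviation, using n - 2 degrees of freedom.
sigma_hat = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5

print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}, sigma_hat = {sigma_hat:.3f}")
```

Rerunning with heavier-tailed or x-dependent noise (for example, a standard deviation that grows with `xi`) is a quick way to see which assumption each diagnostic is meant to catch.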

In practical terms, simple linear regression is a powerful tool for prediction and understanding relationships between variables, but its simplicity also means it has limitations, especially when dealing with complex, non-linear relationships or interactions between multiple variables.






