



# <center><font color='teal'>Introduction to Statistical Learning 📈
## <center><font color='teal'>with Applications in Python 🐍</center>


## <font color='teal'>Datasets 📊
    
| Dataset Name | Description                                                       |
|--------------|-------------------------------------------------------------------|
| Auto         | Gas mileage, horsepower and other info for cars                   |
| Bikeshare    | Hourly usage of a bike sharing program in Washington, DC          |
| Boston       | Housing values & other info about Boston census tracts            |
| BrainCancer  | Survival times for patients diagnosed with brain cancer           |
| Caravan      | Info about individuals offered caravan insurance                  |
| Carseats     | Info about car seat sales in 400 stores                           |
| College      | Demographic characteristics, tuition, and more for USA colleges   |
| Credit       | Info about credit card debt for 400 customers                     |
| Default      | Customer default records for a credit card company                |
| Fund         | Returns of 2,000 hedge fund managers over 50 months               |
| Hitters      | Records and salaries for baseball players                         |
| Khan         | Gene expression measurements for 4 cancer types                   |
| NCI60        | Gene expression measurements for 64 cancer cell lines             |
| NYSE         | Returns, volatility, and volume for the NY Stock Exchange         |
| OJ           | Sales info for Citrus Hill and Minute Maid orange juice           |
| Portfolio    | Past values for financial assets, for use in portfolio allocation |
| Publication  | Time to publication for 244 clinical trials                       |
| Smarket      | Daily percentage returns for S&P 500 over a 5-year period         |
| USArrests    | Crime statistics per 100K residents in 50 states of USA           |
| Wage         | Income survey data for men in central Atlantic region of USA      |
| Weekly       | 1,089 weekly stock market returns for 21 years                    |
    
All datasets are available in the ISLP package, with the exception of
USArrests, which is part of the base R distribution but is still accessible from Python. Install the package with:
    
`pip install ISLP`
    
Each dataset is also available in CSV format from the [Book Website](https://www.statlearning.com/).

In [None]:
# !pip install ISLP

In [2]:
import ISLP
import numpy as np
import pandas as pd

In [4]:
ISLP.load_data('Auto')

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
387,27.0,4,140.0,86,2790,15.6,82,1,ford mustang gl
388,44.0,4,97.0,52,2130,24.6,82,2,vw pickup
389,32.0,4,135.0,84,2295,11.6,82,1,dodge rampage
390,28.0,4,120.0,79,2625,18.6,82,1,ford ranger


## <font color='teal'>Notation Summary 📚

- We will use $n$ to represent the number of distinct data points, or observations, in our sample.
- We will let $p$ denote the number of variables that are available for use in making predictions.

In general, we will let $x_{ij}$ represent the value of the $j$th variable for the $i$th observation, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$.

Throughout this book, $i$ will be used to index the samples or observations (from $1$ to $n$) and
$j$ will be used to index the variables (from $1$ to $p$). We let $\mathbf{X}$ denote an $n \times p$ matrix whose $(i, j)$th element is $x_{ij}$:
    


$$
\mathbf{X} = \begin{pmatrix}
 x_{11} & x_{12} & \cdots & x_{1p} \\ 
 x_{21} & x_{22} & \cdots & x_{2p} \\ 
 \vdots & \vdots & \ddots & \vdots \\ 
 x_{n1} & x_{n2} & \cdots & x_{np} 
\end{pmatrix}
$$
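
As a minimal sketch of this notation in NumPy, the matrix $\mathbf{X}$ can be stored as a two-dimensional array whose shape is $(n, p)$. The values below are made up for illustration and do not come from any ISLP dataset. Python indexes from 0, so the element $x_{ij}$ of the text corresponds to `X[i - 1, j - 1]`.

```python
import numpy as np

# Hypothetical data: n = 3 observations (rows), p = 2 variables (columns)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

n, p = X.shape  # number of observations, number of variables

# The text uses 1-based indices; NumPy uses 0-based indices
i, j = 2, 1
x_21 = X[i - 1, j - 1]  # value of variable 1 for observation 2

print(n, p)   # 3 2
print(x_21)   # 3.0
```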

At times we will be interested in the rows of $\mathbf{X}$, which we write as $x_{1}, x_{2}, \ldots, x_{n}$.

Here $x_{i}$ is a vector of length $p$, containing the $p$ variable measurements for the $i$th observation (that is, the $i$th row of $\mathbf{X}$):

$$
x_{i} = \begin{pmatrix}
 x_{i1} \\ 
 x_{i2}  \\ 
 \vdots  \\ 
 x_{ip} 
\end{pmatrix}
$$

At other times we will instead be interested in the columns of $\mathbf{X}$, which we write as $\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{p}$.

Each is a vector of length $n$:

$$
\mathbf{x_{j}} = \begin{pmatrix}
 x_{1j} \\ 
 x_{2j}  \\ 
 \vdots  \\ 
 x_{nj} 
\end{pmatrix}
$$

For example, for the Wage data, $\mathbf{x_{1}}$ contains the $n = 3{,}000$ values for `year`.

Using this notation, the matrix $\mathbf{X}$ can be written as

$$
\mathbf{X} = (\mathbf{x_{1}}, \mathbf{x_{2}}, \ldots, \mathbf{x_{p}})
$$

or

$$
\mathbf{X} = \begin{pmatrix}
 x^{T}_{1} \\ 
 x^{T}_{2}  \\ 
 \vdots  \\ 
 x^{T}_{n} 
\end{pmatrix}
$$
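
The row and column views above can be sketched with NumPy slicing. This is only an illustration on made-up values. Note that NumPy returns both rows and columns as one-dimensional arrays, so the row/column orientation used in the text is not preserved in the slices themselves.

```python
import numpy as np

# Hypothetical data: n = 3 observations, p = 2 variables
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

x_row_2 = X[1, :]  # x_2 in the text: the p measurements for observation 2
x_col_1 = X[:, 0]  # bold x_1 in the text: the n values of variable 1

# Rebuild X from its p columns, and from its n rows
from_columns = np.column_stack([X[:, j] for j in range(X.shape[1])])
from_rows = np.vstack([X[i, :] for i in range(X.shape[0])])

print(x_row_2.shape, x_col_1.shape)     # (2,) (3,)
print(np.array_equal(from_columns, X))  # True
print(np.array_equal(from_rows, X))     # True
```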

The $^{T}$ notation denotes the transpose of a matrix or vector. So, for example,

$$
\mathbf{{X}}^{T} = \begin{pmatrix}
 x_{11} & x_{21} & \cdots & x_{n1} \\ 
 x_{12} & x_{22} & \cdots & x_{n2} \\ 
 \vdots & \vdots & \ddots & \vdots \\ 
 x_{1p} & x_{2p} & \cdots & x_{np} 
\end{pmatrix}
$$

While

$$
x^{T}_{i} = (x_{i1}, x_{i2}, \ldots, x_{ip})
$$
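
A short NumPy sketch of the transpose, again on made-up values: the `.T` attribute swaps the two axes, so an $n \times p$ array becomes $p \times n$, and the $(j, i)$ entry of the transpose equals the $(i, j)$ entry of the original.

```python
import numpy as np

# Hypothetical 3 x 2 matrix (n = 3, p = 2)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

Xt = X.T  # transpose: shape (p, n)

print(X.shape, Xt.shape)    # (3, 2) (2, 3)
print(Xt[0, 2] == X[2, 0])  # True
```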

We use $y_{i}$ to denote the $i$th observation of the variable on which we wish to make predictions.  
Hence, we write the set of all $n$ observations in vector form as

$$
\mathbf{y} = \begin{pmatrix}
 y_{1} \\
 y_{2} \\
 \vdots \\
 y_{n}
 \end{pmatrix}
$$

Then our observed data consists of $\{(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{n}, y_{n})\}$, where each $x_{i}$ is a vector of length $p$. (If $p = 1$, then $x_{i}$ is simply a scalar.)

In this text, a vector of length $n$ will always be denoted in lower case bold; e.g.

$$
\mathbf{a} = \begin{pmatrix}
 a_{1} \\
 a_{2} \\
 \vdots \\
 a_{n} \\
 \end{pmatrix}
$$

However, vectors that are not of length n (such as feature vectors of length $p$) will be denoted in lower case normal font, e.g. $a$. 

Scalars will also be denoted in lower case normal font, e.g. $a$. In the rare cases in which these two uses for lower case normal font lead to ambiguity, we will clarify which use is intended.

Matrices will be denoted using bold capitals, such as $\mathbf{A}$. Random variables will be denoted using capital normal font, e.g. $A$,
regardless of their dimensions.

Occasionally we will want to indicate the dimension of a particular object. To indicate that an object is a scalar, we will use the notation $a \in \mathbb{R}$. To indicate that it is a vector of length $k$, we will use $a \in \mathbb{R}^{k}$ (or $\mathbf{a} \in \mathbb{R}^{n}$ if it is of length $n$). We will indicate that an object is an $r \times s$ matrix using $\mathbf{A} \in \mathbb{R}^{r \times s}$.
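
As a rough NumPy analogue of this dimension notation, the shape of an array plays the role of the exponent on $\mathbb{R}$. The variable names and values below are illustrative only.

```python
import numpy as np

a_scalar = 2.5               # a in R
a_vector = np.zeros(4)       # a in R^k with k = 4
A_matrix = np.zeros((2, 3))  # A in R^{r x s} with r = 2, s = 3

print(np.ndim(a_scalar), np.shape(a_scalar))  # 0 ()
print(a_vector.ndim, a_vector.shape)          # 1 (4,)
print(A_matrix.ndim, A_matrix.shape)          # 2 (2, 3)
```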

<font color='teal'>**Matrix Multiplication**



We have avoided using matrix algebra whenever possible. However, in a few instances it becomes too cumbersome to avoid it entirely. In these rare instances it is important to understand the concept of multiplying two matrices.
    
Suppose that 

<font size=6>$$\mathbf{A} \in \mathbb{R}^{r \times d}$$ 
    
    and 
    
<font size=6>$$\mathbf{B} \in \mathbb{R}^{d \times s}$$ 

    
Then the product of $\mathbf{A}$ and $\mathbf{B}$ is denoted $\mathbf{AB}$. 
    
The $(i, j)$th element of $\mathbf{AB}$ is computed by multiplying each element of the $i$th row of $\mathbf{A}$ by the corresponding element of the $j$th column of $\mathbf{B}$ and summing the products. That is, 
    
<font size=6>$$(\mathbf{AB})_{ij} = \sum_{k=1}^{d} a_{ik} b_{kj}$$
    
As an example, consider
    
$$
\mathbf{A} = \begin{pmatrix}
 1 & 2 \\ 
 3 & 4 \\ 
\end{pmatrix} 
\quad \text{and} \quad \mathbf{B} = \begin{pmatrix}
 5 & 6 \\ 
 7 & 8 \\ 
\end{pmatrix}
$$
    
Then 
    
$$\mathbf{AB} = \begin{pmatrix}
1 \times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8 \\
3 \times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \\
\end{pmatrix} = \begin{pmatrix}
19 & 22 \\
43 & 50 \\
\end{pmatrix}
$$
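
The worked example can be checked in Python. The sketch below implements the sum formula directly with nested lists, using a hypothetical helper named `matmul` (not part of ISLP or NumPy), and then compares the result against NumPy's `@` operator.

```python
import numpy as np

def matmul(A, B):
    """Multiply an r x d matrix A by a d x s matrix B (nested lists)
    using (AB)_ij = sum over k of a_ik * b_kj."""
    r, d = len(A), len(A[0])
    if len(B) != d:
        raise ValueError("columns of A must equal rows of B")
    s = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(s)]
            for i in range(r)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

AB = matmul(A, B)
print(AB)  # [[19, 22], [43, 50]]

# NumPy's @ operator computes the same matrix product
AB_np = np.array(A) @ np.array(B)
print(np.array_equal(AB_np, np.array(AB)))  # True
```

In practice the `@` operator (or `np.matmul`) is the usual choice; the explicit loop is shown only to mirror the formula.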
    

In [6]:
wage_df = ISLP.load_data('Wage')

In [8]:
wage_df.describe()

Unnamed: 0,year,age,logwage,wage
count,3000.0,3000.0,3000.0,3000.0
mean,2005.791,42.414667,4.653905,111.703608
std,2.026167,11.542406,0.351753,41.728595
min,2003.0,18.0,3.0,20.085537
25%,2004.0,33.75,4.447158,85.38394
50%,2006.0,42.0,4.653213,104.921507
75%,2008.0,51.0,4.857332,128.680488
max,2009.0,80.0,5.763128,318.34243
