# Course 2 week 1 lecture notebook

## Outline
Please click on the desired link to jump to that section of the lecture notebook!

[Numpy and Pandas functions](#numpy-pandas-functions)

[Linear model](#linear-model)

[Concordance index](#c-index)

[Combine features](#combine-features)

<a name="numpy-pandas-functions"></a>
## Numpy and Pandas functions

In [None]:
import numpy as np
import pandas as pd
from utils import load_data

- We will load a small dataset and practice some pandas functions that will be helpful in this week's assignment.

In [None]:
X, y = load_data(2)

In [None]:
X

In [None]:
y

### Mean
- Calculate the mean of the dataframe

In [None]:
X.mean()

Notice how it calculates the mean of each column.  
- Pandas will treat each column separately.  
- With a 2D numpy array, calling `mean()` without an `axis` argument averages over every element of the matrix and returns a single number.
- Specifying the axis explicitly ensures that you take the mean of each column (`axis=0`) or of each row (`axis=1`) instead of the entire table of data.

- Calculate the mean of each column (each feature)

In [None]:
X.mean(axis=0)

- Calculate the mean of each example (also known as each record, row, or patient)

In [None]:
X.mean(axis=1)
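The axis behaviour described above can be checked on a tiny example. This is a minimal sketch: the array `arr` and the column names `a` and `b` are made up for illustration and are not part of the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 data, used only to illustrate the axis argument
arr = np.array([[1.0, 2.0],
                [3.0, 6.0]])
df = pd.DataFrame(arr, columns=["a", "b"])

overall_mean = arr.mean()        # numpy default: one number for the whole matrix
np_col_means = arr.mean(axis=0)  # one mean per column
np_row_means = arr.mean(axis=1)  # one mean per row

pd_default = df.mean()           # pandas default: one mean per column
pd_row_means = df.mean(axis=1)   # one mean per row

print(overall_mean)
print(np_col_means, np_row_means)
print(pd_default)
```

Note that the pandas default matches numpy's `axis=0` result, not numpy's default.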

### Natural log
- Calculate the natural log of the data
- Notice, pandas doesn't have a `.log()` function, so we'll use numpy
- Also notice that in numpy and pandas, the log function is the natural log (the base is the number 'e').

In [None]:
np.log(X)
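As a quick sanity check that `np.log` is the natural log and works element-wise on a dataframe, here is a small sketch. The dataframe `df_demo` is hypothetical and built from powers of `e` so the expected logs are easy to read off.

```python
import numpy as np
import pandas as pd

# Hypothetical data: powers of e, so the natural log should give 0, 1 and 2
df_demo = pd.DataFrame({"a": [1.0, np.e], "b": [np.e ** 2, 1.0]})

logged = np.log(df_demo)  # element-wise natural log; the result is still a DataFrame
print(logged)

# Other bases have their own numpy functions
log10_of_100 = np.log10(100.0)
log2_of_8 = np.log2(8.0)
```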

### This is the end of this practice section.

Please continue on with the lecture videos!

---

<a name='linear-model'></a>
## Linear model

We'll practice using a scikit-learn model for linear regression. You will do something similar in this week's assignment (but with a different model).

[sklearn.linear_model.LinearRegression()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

- Import the model

In [None]:
from sklearn.linear_model import LinearRegression

- Create the model object

In [None]:
model = LinearRegression()
model

- We'll load in some data

In [None]:
from utils import load_data

In [None]:
X, y = load_data(100)

- Fit the model

In [None]:
model.fit(X, y)
model

- View the coefficients (these are the 'weights' associated with each feature). 
- You'll use the coefficients for making predictions.
$$\hat{y} = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_N x_N$$

In [None]:
model.coef_
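To see how the coefficients turn into predictions, the sketch below fits a model on synthetic data and reproduces `predict` by hand. The data (`X_demo`, `y_demo`) and the "true" weights are made up for illustration. Note that `LinearRegression` also fits an intercept by default (stored in `intercept_`), so the manual prediction adds that term to the weighted sum shown in the formula.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data: y = 1.5*x1 - 2.0*x2 + 0.5*x3 + 4.0
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0

model_demo = LinearRegression().fit(X_demo, y_demo)

# Weighted sum of the features plus the fitted intercept
manual_pred = X_demo @ model_demo.coef_ + model_demo.intercept_

print(model_demo.coef_, model_demo.intercept_)
print(np.allclose(manual_pred, model_demo.predict(X_demo)))
```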

### This is the end of this practice section.

Please continue on with the lecture videos!

---

<a name='c-index'></a>
## Concordance index

- We'll generate some labels

In [None]:
import pandas as pd

- We will let `y` refer to the actual health outcome of the patient.
- 1 indicates disease, 0 indicates health (normal)

In [None]:
y = pd.Series([0,0,1,1,0])
y.name="health"
y

In [None]:
risk_score = pd.Series([2.2, 3.3, 4.4, 4.4])
risk_score.name='risk score'
risk_score

### Identify a permissible pair
- A pair of patients is permissible when their labels (health outcomes) are different

In [None]:
if y[0] != y[1]:
    print(f"y[0]={y[0]} and y[1]={y[1]} is a permissible pair")
else:
    print(f"y[0]={y[0]} and y[1]={y[1]} is not a permissible pair")

In [None]:
if y[0] != y[2]:
    print(f"y[0]={y[0]} and y[2]={y[2]} is a permissible pair")
else:
    print(f"y[0]={y[0]} and y[2]={y[2]} is NOT a permissible pair")

### Check for risk ties
- For permissible pairs, check if they have the same risk score

In [None]:
if risk_score[2] == risk_score[3]:
    print(f"patient 2 ({risk_score[2]}) and patient 3 ({risk_score[3]}) have a risk tie")
else:
    print(f"patient 2 ({risk_score[2]}) and patient 3 ({risk_score[3]}) DO NOT have a risk tie")

### Concordant pairs
- Check if a permissible pair is also a concordant pair
- We'll check one case, where the first patient is healthy and the second has the disease

In [None]:
if y[1] == 0 and y[2] == 1:
    if risk_score[1] < risk_score[2]:
        print("patients 1 and 2 are a concordant pair")

- Note that we checked the situation where patient 1 is healthy and patient 2 has the disease.
- We should also check the other situation where patient 1 has the disease and patient 2 is healthy.
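The checks above (permissible pair, risk tie, concordant pair in either direction) can be put together in a small loop. This is only a rough sketch under stated assumptions: the function name `c_index_sketch` and the four-patient data are hypothetical, and the final score uses one common formulation, (concordant + 0.5 × ties) / permissible, which may differ in detail from what the assignment asks for.

```python
from itertools import combinations

def c_index_sketch(y_true, scores):
    """Count permissible pairs, concordant pairs and risk ties (illustrative only)."""
    permissible = concordant = ties = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # same outcome: not a permissible pair
        permissible += 1
        if scores[i] == scores[j]:
            ties += 1  # permissible pair with identical risk scores
            continue
        # Work out which patient has the disease, whichever order they come in
        sick, healthy = (i, j) if y_true[i] == 1 else (j, i)
        if scores[sick] > scores[healthy]:
            concordant += 1
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return permissible, concordant, ties, (concordant + 0.5 * ties) / permissible

# Hypothetical labels and risk scores for four patients
y_demo = [0, 1, 1, 0]
risk_demo = [2.2, 3.3, 4.4, 3.3]
result = c_index_sketch(y_demo, risk_demo)
print(result)
```

Here patients 1 and 3 form a permissible pair with tied scores, so they contribute half a point rather than a full one.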

You'll practice implementing this algorithm in this week's assignment!

### This is the end of this practice section.

Please continue on with the lecture videos!

---

<a name="combine-features"></a>
## Combine features


In [None]:
import pandas as pd

In [None]:
from utils import load_data

In [None]:
X, y = load_data(2)

In [None]:
X

In [None]:
feature_names = X.columns
feature_names

### Combine strings

- Use f-strings to combine two strings.
- There are other ways to do this, but Python's f-strings are quite useful.

In [None]:
name1 = feature_names[0]
name2 = feature_names[1]

In [None]:
name1

In [None]:
name2

In [None]:
combined_names = f"{name1}_&_{name2}"
combined_names

### Add two columns
- Add the values from two columns and put them into a new column.
- You'll do something similar in this week's assignment.

In [None]:
X['new_column'] = X['Age'] + X['Systolic_BP']
X
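The two steps in this section, building a combined name and adding two columns, can be joined so that the new column is named after the features it combines. A minimal sketch, assuming a small made-up dataframe `X_demo` that reuses the `Age` and `Systolic_BP` column names; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical values; only the column names come from the notebook
X_demo = pd.DataFrame({"Age": [60.0, 45.0], "Systolic_BP": [120.0, 135.0]})

name1 = X_demo.columns[0]
name2 = X_demo.columns[1]

# Name the new column after the two features it combines
combined_name = f"{name1}_&_{name2}"
X_demo[combined_name] = X_demo[name1] + X_demo[name2]

print(X_demo)
```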

### This is the end of this practice section.

Please continue on with the lecture videos!

---