### The Sigmoid Function

$\hat y = \frac{1}{1 + e^{ax+b}}$

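* a and b are the model parameters
* together they set the position of the turning point, where $\hat y = 0.5$ (at $x = -b/a$)
* a also sets the steepness of the sigmoid
* a and b are estimated when the model is trained on training data
* x is an independent variable or feature (e.g. Age)
* many references, and scikit-learn, write the exponent as $-(ax+b)$; the two forms differ only in the sign of a and b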

In [11]:
import math

age = 10
a = 0.5  # we will use Python to find these automatically
b = -20
1 / (1 + math.exp(a*age + b))

0.999999694097773
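
To see how a and b shape the curve, the sketch below evaluates the same expression for several ages. It keeps the sign convention of the formula above; the helper name `sigmoid` and the list of ages are illustrative choices, not part of the original notebook. With a = 0.5 and b = -20 the exponent is zero at age = -b/a = 40, so the output is 0.5 there.

```python
import math

def sigmoid(x, a, b):
    """Evaluate 1 / (1 + e^(a*x + b)), the form used in this section."""
    return 1 / (1 + math.exp(a * x + b))

a, b = 0.5, -20
turning_point = -b / a  # the exponent a*x + b is zero here

# illustrative ages on both sides of the turning point
for age in [10, 30, 40, 50, 70]:
    print(age, round(sigmoid(age, a, b), 4))
```

Because the exponent enters with a positive sign in this formula, a positive a makes the output fall as x grows; a larger absolute value of a makes the transition around the turning point sharper.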

In [4]:
math.e  # Euler's number = base of the natural logarithm

2.718281828459045

We use e because the derivative of $e^x$ is $e^x$ itself, which keeps the derivatives needed during training simple.
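
The derivative property can be checked numerically. The sketch below compares `math.exp(x)` with a central finite-difference estimate of its slope; the helper `numeric_derivative` and the step size `h` are assumptions made for this illustration.

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# the value of e^x and its estimated slope should agree closely
for x in [0.0, 1.0, 2.5]:
    print(x, math.exp(x), numeric_derivative(math.exp, x))
```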

### Logistic Regression is a linear model

* x1 - Age
* x2 - Pclass
* x3 - Gender
* the parameters a1..a3 (the coefficients) weight the features
* this is a linear combination: we multiply each x value by its coefficient and add the results

$\hat y = \frac{1}{1 + e^{a_1x_1+a_2x_2+a_3x_3+b}}$

$\hat y = \frac{1}{1 + e^{a \cdot X+b}}$

In [19]:
age = 20
pclass = 1
gender = 1  # 1=female

a1 = 1.0
a2 = 2.5
a3 = 0.001
b = -20
1 / (1 + math.exp(a1*age + a2*pclass + a3*gender + b))

0.07578810603184725
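
The linear combination can also be written as a sum over (coefficient, feature) pairs, which is what the $a \cdot X$ notation stands for. This is a minimal sketch using plain Python lists and the values from the cell above; the variable names `features`, `coefficients`, `z` and `y_hat` are chosen here for illustration.

```python
import math

features = [20, 1, 1]              # age, pclass, gender (1 = female)
coefficients = [1.0, 2.5, 0.001]   # a1, a2, a3 from the cell above
b = -20

# linear combination: multiply each feature by its coefficient and add them up
z = sum(a_i * x_i for a_i, x_i in zip(coefficients, features)) + b

# same sign convention as the formula in this section
y_hat = 1 / (1 + math.exp(z))
print(z, y_hat)
```

Adding a feature then only means appending one value to each list; the formula itself does not change.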

In [27]:
import seaborn as sns

df = sns.load_dataset('titanic')

In [28]:
df.head(3)

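|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |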


In [58]:
df = df.dropna(subset=['age'])  # data cleaning: drop rows with a missing age
X = df[['age', 'sibsp']]        # DataFrame == table  == matrix
y = df['survived']              # Series    == column == vector

In [59]:
X.shape, y.shape

((714, 2), (714,))

In [60]:
from sklearn.linear_model import LogisticRegression

In [61]:
model = LogisticRegression(C=1e9)  # initialize the model (a very large C makes regularization negligible)
model.fit(X, y)                    # train the model
# (find the best a, b values for this data)

LogisticRegression(C=1000000000.0)

In [62]:
# calculate a metric (accuracy == fraction of correct predictions, here on the training data)
model.score(X, y)

0.603641456582633
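
For a classifier, `model.score` returns the mean accuracy on the data passed to it. What that number means can be sketched in a few lines; the function name `accuracy` and the labels below are hypothetical and not taken from the Titanic data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# hypothetical labels, for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 3 of 5 predictions are correct
```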

In [63]:
# inspect model parameters
model.coef_, model.intercept_

(array([[-0.01300558, -0.10069418]]), array([0.05470242]))

In [68]:
# make predictions for new passengers, each given as [age, sibsp]
passengers = [[25.0, 0.0], [2.0, 0.0], [99.0, 3.0]]
model.predict(passengers)

array([0, 1, 0])

In [69]:
# probabilities - actual values of the sigmoid function
# left column 0: p(dead)
# right column 1: p(alive)
model.predict_proba(passengers)

array([[0.56720018, 0.43279982],
       [0.49282768, 0.50717232],
       [0.82272681, 0.17727319]])
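
These probabilities can be reproduced by hand from the fitted parameters. The sketch below plugs the rounded values of `model.coef_` and `model.intercept_` shown earlier into the sigmoid. Note that matching scikit-learn's output requires a minus sign in the exponent, $1 / (1 + e^{-z})$, which is the opposite sign convention to the formula at the top of this section. The 0.5 decision threshold is an assumption that agrees with the predictions shown above.

```python
import math

# rounded values copied from model.coef_ and model.intercept_ above
coef = [-0.01300558, -0.10069418]   # weights for age and sibsp
intercept = 0.05470242

passengers = [[25.0, 0.0], [2.0, 0.0], [99.0, 3.0]]

probabilities = []
predictions = []
for age, sibsp in passengers:
    z = coef[0] * age + coef[1] * sibsp + intercept  # linear combination
    p_alive = 1 / (1 + math.exp(-z))                 # note the minus sign
    probabilities.append([1 - p_alive, p_alive])     # [p(dead), p(alive)]
    predictions.append(int(p_alive >= 0.5))          # assumed 0.5 threshold

print(probabilities)
print(predictions)
```

Because the copied coefficients are rounded to eight decimals, the results agree with `predict_proba` only up to small rounding differences.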

In [None]:
!pip install scikit-learn