```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

### 1. Data Setup
- Inputs (X): age and fare of the first 5 passengers with a recorded age
- Target (y): whether each of them survived (binary: 0 or 1)

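```python
# Drop passengers with no recorded age, then keep the first 5 rows
titanic = sns.load_dataset('titanic').dropna(subset=['age'])

X = titanic[['age', 'fare']].head()   # inputs: age and fare
y = titanic['survived'].head()        # target: 0 = died, 1 = survived
```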
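`sns.load_dataset` downloads the data on first use. For offline experimentation, a minimal sketch that rebuilds the same five rows by hand: the age and fare values are copied from the `X_with_bias` output in step 2, and the labels `[0, 1, 1, 1, 0]` are assumed to be the `survived` values of the first five Titanic rows. The names `X_offline` and `y_offline` are hypothetical.

```python
import pandas as pd

# Hypothetical offline stand-in for X and y
# (age/fare copied from the notebook output; labels assumed from the dataset)
X_offline = pd.DataFrame({
    'age':  [22.0, 38.0, 26.0, 35.0, 35.0],
    'fare': [7.25, 71.2833, 7.925, 53.1, 8.05],
})
y_offline = pd.Series([0, 1, 1, 1, 0], name='survived')

print(X_offline.shape, y_offline.tolist())
```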

### 2. Adding Bias Column (Convention: FIRST column)
Why a bias? It lets the model shift its output independently of the inputs, like the intercept in linear regression.
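To see why the bias matters, consider a single sigmoid neuron and an input that is all zeros. A minimal sketch with arbitrary illustrative weights: without a bias the weighted sum is forced to 0, so the output is stuck at `sigmoid(0) = 0.5` no matter what the weights are; adding a bias term moves it.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.0, 0.0])           # hypothetical all-zero input (age, fare)
w = np.array([0.3, -0.2])          # arbitrary illustrative weights

no_bias = sigmoid(x @ w)           # weighted sum is 0, so output is 0.5
with_bias = sigmoid(-2.0 + x @ w)  # a bias of -2.0 shifts the output down

print(no_bias, with_bias)          # 0.5 and about 0.119
```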

Result: X_with_bias shape = (5 rows, 3 columns):

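```python
# Adding the bias feature
def add_bias(X):
    # Build the bias column on X's own index so pd.concat aligns rows correctly,
    # even when the index is not 0..n-1 (e.g. after dropna or filtering)
    bias = pd.Series(1.0, index=X.index, name='bias')
    return pd.concat([bias, X], axis=1)

X_with_bias = add_bias(X)
X_with_bias.head()
```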


|   | bias | age  | fare    |
|---|------|------|---------|
| 0 | 1.0  | 22.0 | 7.25    |
| 1 | 1.0  | 38.0 | 71.2833 |
| 2 | 1.0  | 26.0 | 7.925   |
| 3 | 1.0  | 35.0 | 53.1    |
| 4 | 1.0  | 35.0 | 8.05    |


Shape: (5, 3) — 5 passengers, 3 features (bias + age + fare)
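`pd.concat(..., axis=1)` aligns rows by index label, not by position, so a bias Series created with the default 0..n-1 index only lines up with `X` when `X` has that same index (true for the five rows here). A sketch with a toy frame whose index has a gap, as can happen after `dropna`; `naive_add_bias` and `safe_add_bias` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame whose index has a gap, e.g. row 1 was removed by dropna
X_gap = pd.DataFrame({'age': [22.0, 26.0, 35.0], 'fare': [7.25, 7.925, 53.1]},
                     index=[0, 2, 3])

def naive_add_bias(X):
    # Series gets index 0..n-1, which need not match X.index
    bias = pd.Series(np.ones(len(X)), name='bias')
    return pd.concat([bias, X], axis=1)

def safe_add_bias(X):
    # Series shares X's index, so rows line up one-to-one
    bias = pd.Series(1.0, index=X.index, name='bias')
    return pd.concat([bias, X], axis=1)

print(naive_add_bias(X_gap))   # 4 rows, with NaNs where labels don't match
print(safe_add_bias(X_gap))    # 3 rows, no NaNs
```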

### 3. Initialize Layer 1 Weights (3 inputs × 3 hidden neurons)

```python
np.random.seed(42)
# One row per input (bias, age, fare), one column per hidden neuron
weights = np.random.normal(size=(3, 3))

print(weights.shape)
print(weights)
```


```
(3, 3)
[[ 0.49671415 -0.1382643   0.64768854]
 [ 1.52302986 -0.23415337 -0.23413696]
 [ 1.57921282  0.76743473 -0.46947439]]
```


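Meaning (rows = inputs, columns = hidden neurons):

- `weights[0]` = weights on the bias column, one per neuron
- `weights[1]` = weights on the age column, one per neuron
- `weights[2]` = weights on the fare column, one per neuron
- `weights[:, j]` = the full weight vector of hidden neuron `j`

Shape: (3, 3), i.e. 3 inputs × 3 hidden neurons. A single neuron would need only a (3,) vector, one weight per input.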
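A quick shape check makes the row/column convention concrete. This sketch uses random stand-in data with the same shapes as `X_with_bias` (the names `x_demo`, `w_single`, `w_layer` are illustrative): a (3,) weight vector gives one output per passenger, a (3, 3) matrix gives one per passenger per neuron, and column `j` of the matrix behaves exactly like a single neuron's weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(5, 3))     # 5 passengers x (bias, age, fare)
w_single = rng.normal(size=(3,))     # one neuron: one weight per input
w_layer = rng.normal(size=(3, 3))    # three neurons: one column each

print((x_demo @ w_single).shape)     # (5,)
print((x_demo @ w_layer).shape)      # (5, 3)

# Column j of the layer output equals a single-neuron pass with w_layer[:, j]
print(np.allclose((x_demo @ w_layer)[:, 1], x_demo @ w_layer[:, 1]))   # True
```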

### 4. Forward Pass: Weighted Sum

```python
# Computing the weighted sums for all passengers and all hidden neurons at once
layer_1_output = X_with_bias @ weights

print(layer_1_output)
```

```
            0          1          2
0   45.452664   0.274263  -7.907014
1  170.943350  45.669187 -41.715199
2   52.610752  -0.144332  -9.160457
3  137.658960  32.417152 -32.476195
4   66.515422  -2.155783 -11.326374
```


Matrix multiplication (@): each row of `X_with_bias` (one passenger) is multiplied by each column of `weights` (one neuron), giving one weighted sum per passenger per neuron.

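Math for passenger 0, neuron 0 (row 0 of `X_with_bias` times column 0 of `weights`; matches row 0, column 0 above):
```
weighted_sum[0, 0] = (1.0 × 0.49671415) + (22 × 1.52302986) + (7.25 × 1.57921282)
                   = 0.49671415 + 33.50665692 + 11.44929295
                   ≈ 45.452664
```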

Result shape: (5, 3), one weighted sum per passenger (row) per hidden neuron (column).
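The hand calculation can be checked in code. A sketch that hardcodes passenger 0's row and the printed `weights` values (rounded to 8 decimals, so the match with the output table is approximate) and computes each neuron's sum explicitly, term by term:

```python
import numpy as np

x0 = np.array([1.0, 22.0, 7.25])     # bias, age, fare for passenger 0
W = np.array([[0.49671415, -0.1382643,   0.64768854],
              [1.52302986, -0.23415337, -0.23413696],
              [1.57921282,  0.76743473, -0.46947439]])

# Explicit sum over inputs for every neuron j: sum_i x0[i] * W[i, j]
manual = [sum(x0[i] * W[i, j] for i in range(3)) for j in range(3)]
print(np.round(manual, 6))           # approx. [45.452664, 0.274263, -7.907014]
print(np.allclose(manual, x0 @ W))   # True
```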


### 5. Sigmoid Activation Function

```python
# Sigmoid activation, applied element-wise
def sigmoid(X):
    return 1 / (1 + np.exp(-X))

layer_1_activation = sigmoid(layer_1_output)
print(layer_1_activation)
```


```
     0         1             2
0  1.0  0.568139  3.680168e-04
1  1.0  1.000000  7.643973e-19
2  1.0  0.463980  1.051038e-04
3  1.0  1.000000  7.866251e-15
4  1.0  0.103792  1.205072e-05
```



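Purpose: squashes any real number into the range (0, 1). At the output layer this value is read as a survival probability; in the hidden layer it is simply each neuron's activation.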

Formula: $$f(x) = \frac{1}{1 + e^{-x}}$$

Effect:
- Large positive `x` → output ≈ 1 (high survival probability)
- Large negative `x` → output ≈ 0 (low survival probability)  
- `x` near 0 → output ≈ 0.5 (uncertain)

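Example: for passenger 0, neuron 1's weighted sum of 0.2743 gives `sigmoid(0.2743) ≈ 0.568` (row 0, column 1 above), while neuron 0's sum of 45.45 saturates to ≈ 1.0.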

Result: `layer_1_activation` shape = `(5, 3)`, one activation per passenger (row) per hidden neuron (column).
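For very negative inputs, `np.exp(-X)` overflows to `inf` and NumPy emits a RuntimeWarning, even though the result still rounds to 0. The sums in this notebook are far from that range, but a common stable formulation splits on the sign of the input. A sketch (`stable_sigmoid` is an illustrative name; SciPy's `scipy.special.expit` is an existing library function that computes the same thing):

```python
import numpy as np

def stable_sigmoid(x):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # x >= 0: exp(-x) <= 1, so no overflow
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # x < 0: use exp(x) / (1 + exp(x)); exp(x) < 1, so no overflow
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

z = np.array([-1000.0, -7.907014, 0.0, 0.274263, 45.452664])
print(stable_sigmoid(z))   # no overflow warning
```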

### 6. Making Predictions

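Rule: probability > 0.5 → predict "survived" (1), else "died" (0). The cell below applies this rule to the hidden-layer activations, which yields a 5 × 3 array of 0/1 values (one column per neuron) rather than survival predictions; the actual survival predictions come from thresholding the output-layer activations computed in step 8.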

Example (illustrative values only):
```
output = [0.896, 0.123, 0.754, 0.623, 0.761]
preds  = [1,     0,     1,     1,     1    ]
```

```python
# Threshold each hidden activation at 0.5
preds = np.where(layer_1_activation > 0.5, 1, 0)

preds
```

```
array([[1, 1, 0],
       [1, 1, 0],
       [1, 0, 0],
       [1, 1, 0],
       [1, 0, 0]])
```
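The 0.5 rule is a plain element-wise comparison, so it works on arrays of any shape. A sketch applying it to the illustrative `output` values from the example above, with `astype(int)` as an equivalent alternative to `np.where`:

```python
import numpy as np

output = np.array([0.896, 0.123, 0.754, 0.623, 0.761])   # illustrative values

preds_where = np.where(output > 0.5, 1, 0)
preds_cast = (output > 0.5).astype(int)    # True -> 1, False -> 0

print(preds_where)                               # [1 0 1 1 1]
print(np.array_equal(preds_where, preds_cast))   # True
```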

### 7. Add Bias and Initialize Layer 2 Weights

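```python
# Prepend a bias column to the 3 hidden activations -> shape (5, 4)
layer_1_with_bias = add_bias(layer_1_activation)
# Same seed as layer 1, so these 4 weights repeat the first 4 values drawn there
np.random.seed(42)
weights_2 = np.random.normal(size=(4,))   # bias + 3 hidden neurons -> 1 output
```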

### 8. Layer 2: Forward Pass and Activation

```python
# (5, 4) @ (4,) -> one weighted sum per passenger, then sigmoid
layer_2_output = layer_1_with_bias @ weights_2
layer_2_activation = sigmoid(layer_2_output)
print(layer_2_activation)
```

```
0    0.674144
1    0.732264
2    0.659064
3    0.732264
4    0.604845
dtype: float64
```
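Steps 2 through 8 can be collected into one function. A minimal sketch, assuming plain NumPy arrays instead of DataFrames and a hypothetical `forward` helper that prepends a bias column before every layer; with the rounded weights printed above it approximately reproduces `layer_2_activation`. Appending another weight matrix to the list adds another layer, which is how deeper networks stack neurons.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, layers):
    # layers: list of weight arrays, each with one row per input (+1 for bias)
    a = np.asarray(X, dtype=float)
    for W in layers:
        a = np.column_stack([np.ones(len(a)), a])   # prepend bias column
        a = sigmoid(a @ W)
    return a

X = np.array([[22.0, 7.25], [38.0, 71.2833], [26.0, 7.925],
              [35.0, 53.1], [35.0, 8.05]])          # age, fare
W1 = np.array([[0.49671415, -0.1382643,   0.64768854],
               [1.52302986, -0.23415337, -0.23413696],
               [1.57921282,  0.76743473, -0.46947439]])
w2 = np.array([0.49671415, -0.1382643, 0.64768854, 1.52302986])

print(forward(X, [W1, w2]))   # approx. [0.6741 0.7323 0.6591 0.7323 0.6048]
```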


### 9. Calculate Accuracy

How it works:
- `preds == y` → `[True, False, True, ...]` → converts to `[1, 0, 1, ...]`
- `sum()` → number of correct predictions
- Divide by `len(y)` (5) → accuracy as decimal (0.0 to 1.0)
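The same calculation is often written as the mean of a boolean array, since `True` counts as 1. A sketch using this run's predictions (every output activation above is greater than 0.5, so all predictions are 1) and assuming the labels `[0, 1, 1, 1, 0]` for the first five Titanic rows:

```python
import numpy as np

preds2 = np.array([1, 1, 1, 1, 1])   # all activations above were > 0.5
y_true = np.array([0, 1, 1, 1, 0])   # assumed labels of the first 5 rows

correct = preds2 == y_true           # [False, True, True, True, False]
print(correct.sum() / len(y_true))   # 0.6
print(correct.mean())                # 0.6, the same quantity
```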

```python
# Output-layer predictions: survived (1) if probability > 0.5
preds2 = np.where(layer_2_activation > 0.5, 1, 0)

starting_acc = (preds2 == y).sum() / len(y)
starting_acc
```

```
0.6
```

### Key Concepts Reinforced

1. Single neuron = logistic regression with bias
2. Weights = learned importance of each feature  
3. Bias = baseline prediction shift
4. Sigmoid = turns numbers into probabilities
5. 0.5 threshold = decision boundary for binary classification

This is a small two-layer network: 3 hidden sigmoid neurons feed a single sigmoid output neuron, and every neuron computes a weighted sum plus bias followed by a sigmoid. Deeper networks stack more such layers in the same way.

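`starting_acc` measures how well the random initial weights perform before any training. Here it is 0.6: every output activation is above 0.5, so the network predicts "survived" for all five passengers, and 3 of the 5 did survive. With only five samples, this number reflects the class balance rather than anything the network has learned.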