Using the Titanic dataset, you will work through the basic steps of a neural network's forward pass: adding a bias feature, creating weights, applying the weights and an activation function, and making predictions from the output of a sigmoid activation.

As a convention, add the bias column as the first column in the dataset. 

The first step when defining an ANN is to assign a weight for each input of each node.  

Because the ANN in this exercise has a single node, this amounts to creating three initial weights: one for the bias and one for each of the two features.

Below, set `np.random.seed(42)` for reproducibility. Next, use NumPy's normal random number generator (`np.random.normal`) to generate an array of three weights, one for the bias and one for each feature. Assign this array to the variable `weights`.

Use matrix multiplication to apply the `weights` to your data `X_with_bias` and assign the results to `weighted_sum`.

After computing the weighted sum for the data, apply a sigmoid activation function. Below, complete the definition of the `sigmoid` function, which takes an array of values and returns the result of applying the transformation:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Finally, apply the `sigmoid` function to the array `weighted_sum` and assign the results to the variable `output`.

Recall that the output of the sigmoid can be interpreted as the probability of belonging to the positive class (survived). Use the `output` variable to make predictions for your first pass through the neural network. What is the accuracy of your predictions? Assign the predictions to `preds` and the accuracy to `starting_acc`.

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### 1. Data Setup
Inputs (X): First 5 passengers' age and fare
Target (y): Whether they survived (binary: 0 or 1)

In [13]:

titanic = sns.load_dataset('titanic').dropna(subset = ['age'])

X = titanic[['age', 'fare']].head()
y = titanic['survived'].head()

### 2. Adding Bias Column (Convention: FIRST column)
Why bias? Allows the model to shift predictions independently of inputs (like intercept in linear regression).

Result: X_with_bias shape = (5 rows, 3 columns):

In [14]:

# Adding the Bias Feature
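def add_bias(X):
    # Reuse X's index so that concat lines the ones up with X's rows
    bias = pd.Series(np.ones(len(X)), index = X.index, name = 'bias')
    # Bias goes first, by the convention stated above
    ans = pd.concat([bias, X], axis = 1)
    return ans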

X_with_bias = add_bias(X)
X_with_bias.head()


|   | bias | age  | fare    |
|---|------|------|---------|
| 0 | 1.0  | 22.0 | 7.25    |
| 1 | 1.0  | 38.0 | 71.2833 |
| 2 | 1.0  | 26.0 | 7.925   |
| 3 | 1.0  | 35.0 | 53.1    |
| 4 | 1.0  | 35.0 | 8.05    |


Shape: (5, 3) — 5 passengers, 3 features (bias + age + fare)
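
One thing to watch when building a bias column with `pd.concat(axis=1)`: pandas aligns on index labels, not on row position. The sketch below uses a small hypothetical frame (`X_demo`, with labels 10 and 11, not part of the exercise) to show the misalignment that appears when the bias Series carries a default index, and how passing `index=X_demo.index` avoids it.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame whose index labels do not start at 0
X_demo = pd.DataFrame(
    {'age': [22.0, 38.0], 'fare': [7.25, 71.2833]},
    index=[10, 11],
)

# Default index (0, 1) does not match X_demo's labels (10, 11):
# concat takes the union of the labels and fills the gaps with NaN
bias_default = pd.Series(np.ones(len(X_demo)), name='bias')
misaligned = pd.concat([bias_default, X_demo], axis=1)

# Reusing X_demo's index keeps every row intact
bias_aligned = pd.Series(np.ones(len(X_demo)), index=X_demo.index, name='bias')
aligned = pd.concat([bias_aligned, X_demo], axis=1)

print(misaligned.shape, misaligned.isna().any().any())
print(aligned.shape, aligned.isna().any().any())
```

In this exercise the first five rows happen to be labeled 0 through 4, so the issue does not surface, but it can with other slices of the data.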

### 3. Initialize Weights (3 weights for 3 inputs)

In [15]:
np.random.seed(42)
weights = np.random.normal(size = 3)

print(weights.shape)
print(weights)


(3,)
[ 0.49671415 -0.1382643   0.64768854]


Meaning:

weights[0] = weight for bias column
weights[1] = weight for age column
weights[2] = weight for fare column

Shape: (3,) — one weight per input feature
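
`np.random.seed` sets the state of NumPy's global legacy generator. NumPy also provides a `Generator` API through `np.random.default_rng`, which keeps the random state in a local object. A minimal sketch follows; note that `default_rng(42)` produces a different sequence than `np.random.seed(42)`, so these weights will not match the values printed above. The names `weights_alt` and `weights_again` are hypothetical, chosen to avoid overwriting `weights`.

```python
import numpy as np

# Local generator object; the global NumPy random state is left untouched
rng = np.random.default_rng(42)
weights_alt = rng.normal(size=3)

# Re-creating the generator with the same seed reproduces the same draw
weights_again = np.random.default_rng(42).normal(size=3)

print(weights_alt.shape)
print(np.array_equal(weights_alt, weights_again))
```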

### 4. Forward Pass: Weighted Sum

In [16]:
# Computing the weighted sum
weighted_sum = X_with_bias @ weights

print(weighted_sum)

0     2.150641
1    41.412047
2     2.034774
3    30.049725
4     0.871356
dtype: float64


Matrix multiplication (`@`): each row of `X_with_bias` is multiplied elementwise by the `weights` vector and the products are summed, giving one dot product per row.

Math for passenger 0:
```
weighted_sum[0] = (1.0 × 0.49671415) + (22 × -0.1382643) + (7.25 × 0.64768854)
                ≈ 0.4967 - 3.0418 + 4.6957
                ≈ 2.1506
```

Result shape: (5,) — one weighted sum per passenger
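
As a check on the arithmetic, the matrix product can be compared with an explicit row-by-row sum of products. This is a sketch that hard-codes the five rows of `X_with_bias` shown earlier as a NumPy array (`X_arr`, a hypothetical name) so that it runs without loading the dataset.

```python
import numpy as np

# The five rows of X_with_bias: bias, age, fare
X_arr = np.array([
    [1.0, 22.0, 7.25],
    [1.0, 38.0, 71.2833],
    [1.0, 26.0, 7.925],
    [1.0, 35.0, 53.1],
    [1.0, 35.0, 8.05],
])

np.random.seed(42)
w = np.random.normal(size=3)

# Vectorized version, as in the cell above
z_matmul = X_arr @ w

# Explicit version: for each row, sum feature * weight over the three columns
z_manual = np.array([
    sum(x_j * w_j for x_j, w_j in zip(row, w))
    for row in X_arr
])

print(np.allclose(z_matmul, z_manual))
print(z_matmul.round(6))
```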


### 5. Sigmoid Activation Function

In [17]:

# Sigmoid Activation
def sigmoid(X):
    return 1/(1 + np.exp(-X))

output = sigmoid(weighted_sum)
print(output)


0    0.895729
1    1.000000
2    0.884400
3    1.000000
4    0.705028
dtype: float64



Purpose: Converts any real number → probability (0 to 1)

Formula: $$f(x) = \frac{1}{1 + e^{-x}}$$

Effect:
- Large positive `x` → output ≈ 1 (high survival probability)
- Large negative `x` → output ≈ 0 (low survival probability)  
- `x` near 0 → output ≈ 0.5 (uncertain)

Example: `sigmoid(2.1506) ≈ 0.896` (89.6% survival chance)

Result: `output` shape = `(5,)` — probabilities for all 5 passengers
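
The three regimes listed above, and the reason two of the five outputs print as exactly `1.000000`, can be seen by evaluating the function at a few chosen inputs. In this sketch the values -10 and 0 are illustrative; 2.150641 and 41.412047 are the weighted sums for passengers 0 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Large negative, zero, moderate positive, large positive
z = np.array([-10.0, 0.0, 2.150641, 41.412047])
p = sigmoid(z)

for z_i, p_i in zip(z, p):
    print(f"sigmoid({z_i:>10.6f}) = {p_i:.6f}")

# For z around 41, exp(-z) is about 1e-18, far below float64 precision
# relative to 1, so 1 + exp(-z) rounds to 1 and the output is exactly 1.0
print(p[3] == 1.0)
```

The unscaled `fare` and `age` values are what push some weighted sums this far from zero; with inputs on a smaller scale the outputs would sit in the less saturated part of the curve.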

### 6. Making Predictions

Rule: Probability > 50% → predict "survived" (1), else "died" (0)

Example, using the rounded `output` values from step 5:
```
output = [0.896, 1.000, 0.884, 1.000, 0.705]
preds  = [1,     1,     1,     1,     1    ]
```

In [18]:
preds = np.where(output > 0.5, 1, 0)

preds

array([1, 1, 1, 1, 1])
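
Because the sigmoid is strictly increasing and equals 0.5 at 0, thresholding the probability at 0.5 gives the same predictions as thresholding the weighted sum at 0. The sketch below checks this on the five weighted sums printed in step 4, hard-coded here so the block runs on its own. All five are positive, which is why every prediction is 1.

```python
import numpy as np

# Weighted sums from step 4
z = np.array([2.150641, 41.412047, 2.034774, 30.049725, 0.871356])
p = 1 / (1 + np.exp(-z))

preds_from_prob = np.where(p > 0.5, 1, 0)  # threshold on the probability
preds_from_sum = np.where(z > 0, 1, 0)     # equivalent threshold on the weighted sum

print(preds_from_prob)
print(np.array_equal(preds_from_prob, preds_from_sum))
```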

### 7. Calculate Accuracy

How it works:
- `preds == y` → elementwise comparison, here `[False, True, True, True, False]`; each `True` counts as 1 when summed
- `sum()` → number of correct predictions (3)
- Divide by `len(y)` (5) → accuracy as a decimal between 0.0 and 1.0

In [19]:
starting_acc = sum(preds == y) / len(y)
starting_acc

0.6
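
The same accuracy can be computed as the mean of the boolean comparison, a common shorthand. In this sketch the labels are the `survived` values of the first five rows, hard-coded so the block runs without loading the dataset; `y_true`, `y_pred`, `acc_sum` and `acc_mean` are hypothetical names.

```python
import numpy as np

y_true = np.array([0, 1, 1, 1, 0])  # survived, first five rows
y_pred = np.array([1, 1, 1, 1, 1])  # predictions from the forward pass

# Count of matches divided by the number of rows, as in the cell above
acc_sum = sum(y_pred == y_true) / len(y_true)

# Mean of a boolean array: True counts as 1, False as 0
acc_mean = (y_pred == y_true).mean()

print(acc_sum, acc_mean)
```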

## Complete Data Flow Visualization

```
Raw Data              With Bias                    Weights          Weighted Sum    Sigmoid         Predictions
┌─────┬───────┐       ┌──────┬─────┬───────┐       ┌────────┐       ┌───────┐       ┌───────┐       ┌───┐
│ age │ fare  │       │ bias │ age │ fare  │       │ w_bias │       │       │       │       │       │   │
│ 22  │  7.25 │   →   │ 1.0  │ 22  │  7.25 │   @   │ w_age  │   →   │  2.15 │   →   │ 0.896 │   →   │ 1 │
│ 38  │ 71.28 │       │ 1.0  │ 38  │ 71.28 │       │ w_fare │       │ 41.41 │       │ 1.000 │       │ 1 │
└─────┴───────┘       └──────┴─────┴───────┘       └────────┘       └───────┘       └───────┘       └───┘
(5, 2)                (5, 3)                       (3,)             (5,)            (5,)            (5,)
```
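
The whole flow can also be written as one function. This is a minimal NumPy sketch, not the notebook's own code: `forward_pass` and its `threshold` parameter are hypothetical names, and the age and fare values of the five passengers are hard-coded from the table in step 2.

```python
import numpy as np

def forward_pass(X, weights, threshold=0.5):
    """Bias column -> weighted sum -> sigmoid -> thresholded predictions."""
    X_b = np.column_stack([np.ones(len(X)), X])  # bias as the first column
    z = X_b @ weights                            # one weighted sum per row
    p = 1 / (1 + np.exp(-z))                     # sigmoid activation
    return np.where(p > threshold, 1, 0), p

# age and fare for the first five passengers
X_raw = np.array([
    [22.0, 7.25],
    [38.0, 71.2833],
    [26.0, 7.925],
    [35.0, 53.1],
    [35.0, 8.05],
])

np.random.seed(42)
w = np.random.normal(size=3)

preds_all, probs_all = forward_pass(X_raw, w)
print(preds_all)
print(probs_all.round(6))
```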


### Key Concepts Reinforced

1. Single neuron with a sigmoid = same functional form as logistic regression
2. Weights = importance assigned to each feature (random here, learned during training)
3. Bias = baseline prediction shift
4. Sigmoid = turns numbers into probabilities
5. 0.5 threshold = decision boundary for binary classification

This is the simplest possible neural network — just one neuron doing weighted sum + sigmoid. Real networks stack many of these together in layers.

The `starting_acc` of 0.6 shows how these random initial weights perform before any training. Every passenger is predicted to survive, so the accuracy simply equals the share of survivors among the five passengers (3 of 5).