# Survival Model 
We want to determine which factors affected a passenger's chances of dying on the Titanic. :(
| Variable | Definition | Key |
|---|---|---|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

### 1. Importing the Dataset
- We import the data with pandas' `read_csv` function.

```python
import pandas as pd

data = pd.read_csv('Titanic-Dataset.csv')
# Alternative to dropping rows later: fill only the missing ages with the mean of the 'Age' column
# data['Age'] = data['Age'].fillna(data['Age'].mean())
# print(data['Age'].mean())
# data.isnull().sum()  # count the null values in each column
```
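The commented-out lines above point to an alternative to dropping rows: filling missing ages with the column mean. Here is a minimal sketch on a small hypothetical DataFrame (not the real dataset). It fills only the `Age` column, because calling `DataFrame.fillna(value)` on the whole frame would also overwrite missing values in other columns such as `Cabin`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in two columns
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": ["C85", np.nan, np.nan, "C123"],
})

# Count the missing values in each column
missing = df.isnull().sum()
print(missing["Age"], missing["Cabin"])  # 2 2

# Fill only the Age column with its mean (NaNs are skipped when the mean is computed)
df["Age"] = df["Age"].fillna(df["Age"].mean())
print(df["Age"].tolist())  # [22.0, 24.0, 26.0, 24.0]

# Cabin keeps its NaNs because we only filled the Age column
print(df["Cabin"].isnull().sum())  # 2
```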

### 2. Separating our result from our features

- We need to decide which columns to drop and which data is relevant to whether someone survived.

```python
# Drop every row whose 'Age' is NaN
data.dropna(subset=['Age'], inplace=True)

# Encode sex as a number: male = 0, female = 1
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})

# Drop the columns we don't consider relevant, plus the 'Survived' label itself
features = data.drop(['PassengerId', 'Name', 'Ticket', 'Fare', 'Embarked', 'Cabin', 'Survived'], axis=1)
# The target: what the model should learn to predict, and what we compare its predictions against
target = data['Survived'].to_numpy()
features.head()
```

|   | Pclass | Sex | Age | SibSp | Parch |
|---|---|---|---|---|---|
| 0 | 3 | 0 | 22.0 | 1 | 0 |
| 1 | 1 | 1 | 38.0 | 1 | 0 |
| 2 | 3 | 1 | 26.0 | 0 | 0 |
| 3 | 1 | 1 | 35.0 | 1 | 0 |
| 4 | 3 | 0 | 35.0 | 0 | 0 |
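As a sanity check, the same preparation steps can be run on a tiny hypothetical DataFrame that reuses the Titanic column names (all values are made up, and the variables are prefixed with `toy_` so they don't overwrite the notebook's own). This sketch encodes `Sex` with `Series.map`, which gives the 0/1 values in one step. Exactly five feature columns should remain, matching the five inputs of the model's first layer in section 5.

```python
import numpy as np
import pandas as pd

# Hypothetical three-passenger frame using the Titanic column names
toy_data = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, np.nan, 26.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Ticket": ["t1", "t2", "t3"],
    "Fare": [7.25, 71.28, 7.93],
    "Cabin": [np.nan, "C85", np.nan],
    "Embarked": ["S", "C", "S"],
})

toy_data.dropna(subset=["Age"], inplace=True)  # removes passenger 2, whose Age is missing
toy_data["Sex"] = toy_data["Sex"].map({"male": 0, "female": 1})

toy_features = toy_data.drop(["PassengerId", "Name", "Ticket", "Fare", "Embarked", "Cabin", "Survived"], axis=1)
toy_target = toy_data["Survived"].to_numpy()

print(list(toy_features.columns))  # ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch']
print(toy_features["Sex"].tolist())  # [0, 1]
```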


### 3. Standardizing and Scaling our data

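- **Standardization** rescales each feature to a mean of 0 and a standard deviation of 1, using `(x - mean) / std`. Without it, a feature with large values such as `Age` can dominate features with small ranges such as `Sex`, biasing the model.

- **Min-max scaling** (often called normalization) maps each feature into a fixed range, usually 0 to 1, using `(x - min) / (max - min)`. Both techniques put the features on a comparable scale so that no feature's weights dominate just because of its units, which usually helps the model train faster and more stably.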
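A minimal comparison of the two techniques on a few hypothetical ages, using scikit-learn's `StandardScaler` and `MinMaxScaler` (the notebook itself only uses `StandardScaler`):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical ages for four passengers, as a single feature column
ages = np.array([[20.0], [30.0], [40.0], [50.0]])

# Standardization: z = (x - mean) / std, with the population std (ddof=0) as StandardScaler uses
standardized = StandardScaler().fit_transform(ages)
manual_z = (ages - ages.mean()) / ages.std()
print(np.allclose(standardized, manual_z))  # True

# Min-max scaling: (x - min) / (max - min), so the youngest maps to 0 and the oldest to 1
normalized = MinMaxScaler().fit_transform(ages)
print(normalized.ravel())  # approximately [0, 1/3, 2/3, 1]
```

After standardization the column has mean 0 and standard deviation 1 but is not confined to [0, 1]; after min-max scaling it lies in [0, 1] but its mean is not 0.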

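```python
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Standardize every feature to mean 0 and standard deviation 1.
# StandardScaler only standardizes; it does not also min-max scale.
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
```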

### 4. Creating our training and testing sets

- We split the data into two parts: a training set and a testing set.

- The model learns patterns from the training set; the testing set is held back to evaluate the final performance on data the model has never seen.

- `scaled_features`: the standardized feature matrix the model is trained on.

- `target`: a 1D array of labels (`Survived`) that the model should learn to predict from the features.

- `test_size=0.2`: the proportion of the data placed in the testing set, here 20%; the remaining 80% becomes the training set.

- `random_state=42`: seeds the random shuffle so the split is identical on every run. Any integer works; 0 and 42 are common choices.
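The sketch below shows what `test_size` and `random_state` do, on hypothetical random data rather than the Titanic features. It also shows a common variation: in this notebook the scaler is fit on all rows before splitting, so the test rows influence the mean and standard deviation it uses; fitting the scaler on the training split only avoids that leakage.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 rows, 5 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# test_size=0.2 puts 20 rows in the test set and 80 in the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (80, 5) (20, 5)

# The same random_state reproduces exactly the same split
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_train, X_train_again))  # True

# Variation: fit the scaler on the training rows only, then reuse it for the test rows
train_scaler = StandardScaler().fit(X_train)
X_train_scaled = train_scaler.transform(X_train)
X_test_scaled = train_scaler.transform(X_test)
```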

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

# Split into training (80%) and testing (20%) sets
x_train, x_test, y_train, y_test = train_test_split(scaled_features, target, test_size=0.2, random_state=42)

# Convert the NumPy arrays to float32 tensors; TensorDataset only accepts tensors
x_train = torch.tensor(x_train, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.float32)
# Labels get shape (N, 1) so they line up with the model's single output neuron
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
y_test = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

# Pair each feature row with its label, then serve them in batches of 32
training_set = TensorDataset(x_train, y_train)
testing_set = TensorDataset(x_test, y_test)

training_loader = DataLoader(training_set, batch_size=32, shuffle=True)
testing_loader = DataLoader(testing_set, batch_size=32, shuffle=False)
```
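To see how `DataLoader` batches the data, here is a sketch with hypothetical random tensors of 70 rows. With `batch_size=32` and the default `drop_last=False`, the last batch simply holds the leftover rows.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical tensors: 70 passengers, 5 features each, labels shaped (70, 1)
x = torch.randn(70, 5)
y = torch.randint(0, 2, (70, 1)).float()

loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=False)

for xb, yb in loader:
    print(xb.shape, yb.shape)
# torch.Size([32, 5]) torch.Size([32, 1])
# torch.Size([32, 5]) torch.Size([32, 1])
# torch.Size([6, 5]) torch.Size([6, 1])
```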



### 5. Building the Model 
- We build a small neural network that takes the five features as inputs, learns patterns among them, and outputs the probability that the passenger survived.


###### What are Weights and Biases?

**WEIGHT**

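- Each neuron holds one weight per input feature. The weights start out random when the layer is created and are then adjusted during training, so features that matter more for the prediction end up with larger-magnitude weights. For example, if `Age` affects survival more than `Sex`, training should give `Age` the stronger weight.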
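A quick way to see this in PyTorch, as a sketch: `nn.Linear(5, 7)` (the same shape as the model's first layer) stores a 7-by-5 weight matrix, one row of five weights per neuron. The PyTorch documentation states that the default initialization draws each weight uniformly from [-1/sqrt(in_features), 1/sqrt(in_features)], and seeding the random generator makes that random start reproducible.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(5, 7)  # 5 input features -> 7 neurons
print(layer.weight.shape)  # torch.Size([7, 5]): one row of 5 weights per neuron

# Default init: each weight lies within [-1/sqrt(5), 1/sqrt(5)]
bound = 1 / math.sqrt(5)
print(bool((layer.weight.abs() <= bound).all()))  # True

# The same seed reproduces the same random starting weights
torch.manual_seed(0)
same = nn.Linear(5, 7)
print(torch.equal(layer.weight, same.weight))  # True
```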

**BIASES**
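- A bias is a constant added to each neuron's weighted sum, so the neuron computes `w·x + b`. It lets the neuron shift its output up or down independently of the inputs, instead of forcing the decision boundary through the origin. (Non-linear patterns come from the activation functions, not from the biases.)

- For instance, the bias of the final neuron can encode the model's baseline leaning toward one class over the other before any feature is taken into account.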
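The formula can be checked directly with a single hypothetical neuron: `nn.Linear` computes `x @ weight.T + bias`, and with every input set to zero only the bias remains. The input below is the first passenger's unscaled features, used for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)
neuron = nn.Linear(5, 1)  # a single neuron: 5 weights and 1 bias

# First passenger's unscaled features (Pclass, Sex, Age, SibSp, Parch), for illustration only
x = torch.tensor([[3.0, 0.0, 22.0, 1.0, 0.0]])

# nn.Linear computes the weighted sum plus the bias: x @ W^T + b
manual = x @ neuron.weight.T + neuron.bias
print(torch.allclose(neuron(x), manual))  # True

# With every input at zero the weighted sum vanishes and only the bias is left
zeros = torch.zeros(1, 5)
print(torch.allclose(neuron(zeros), neuron.bias))  # True
```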

```python
from torch import nn

# "Binary" because there are only two outcomes: survived (1) or not survived (0)
class BinaryModel(nn.Module):
    def __init__(self):
        # BinaryModel subclasses nn.Module, which tracks the layers' weights and biases and provides the training machinery
        super().__init__()
        self.linear1 = nn.Linear(5, 7)  # 5 input features -> 7 hidden neurons, each with its own randomly initialized weights
        self.linear2 = nn.Linear(7, 4)
        self.linear3 = nn.Linear(4, 1)  # a single output neuron
        self.relu = nn.ReLU()  # non-linearity between layers; without it the stacked linear layers act like one linear layer
        self.activation = nn.Sigmoid()  # squashes the output neuron into a probability between 0 and 1

    def forward(self, x):
        x = self.relu(self.linear1(x))  # x is the batch of input features
        x = self.relu(self.linear2(x))
        x = self.linear3(x)
        return self.activation(x)

model = BinaryModel()
```
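Why put a non-linear activation between the linear layers? Two `nn.Linear` layers applied back to back with nothing in between compute a single linear function, so the extra layer adds no expressive power. The sketch below, with hypothetical layer sizes and random inputs, folds two layers into one equivalent layer and checks that they agree.

```python
import torch
from torch import nn

torch.manual_seed(0)
first = nn.Linear(5, 7)
second = nn.Linear(7, 1)
x = torch.randn(10, 5)  # hypothetical batch of 10 rows with 5 features

# Fold the two layers into one: W = W2 @ W1 and b = W2 @ b1 + b2
merged = nn.Linear(5, 1)
with torch.no_grad():
    merged.weight.copy_(second.weight @ first.weight)
    merged.bias.copy_(second.weight @ first.bias + second.bias)

# Without an activation in between, the two-layer stack equals the single merged layer
print(torch.allclose(second(first(x)), merged(x), atol=1e-5))  # True

# A non-linearity such as nn.ReLU between the layers prevents this folding in general
```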