# Titanic Passenger Survival Prediction System

## SU Meta Description
<p>
Discover the Titanic Passenger Survival Prediction System, a machine learning model that analyzes passenger data to predict survival chances with accuracy and insights.
</p>


## Introduction

##### **Definition** A binary classification ML task that predicts whether a passenger survived (1) or did not survive (0) the Titanic disaster from their attributes (e.g., class, sex, age, fare).
##### **Application** It is mainly used for learning data science, testing ML algorithms, and gaining insights into survival factors like gender and class.

##  Input and Output

###  Inputs (Features / Attributes)

| **Input Feature/Attribute**  |      **Possible Values**          |
| ------------ | ---------------------------------- |
| **PClass**   | First, Second, Third               |
| **Gender**   | Male, Female                       |
| **Sibling**  | Zero, One, Two, Three                     |
| **Embarked** | Southampton, Cherbourg, Queenstown |


### Output (Prediction)

| **Output Attribute** | **Possible Values** |
| -------------------- | ------------------------- |
| **Survived**         | Yes, No                   |



### Examples (Titanic Dataset)

| PClass | Gender | Sibling | Embarked    | Survived |
| :----: | :----: | :-----: | :---------- | :------: |
|  Third |  Male  |   One   | Southampton |    No    |
| Second | Female |   Zero  | Southampton |    Yes   |
|  Third |  Male  |   Zero  | Southampton |    No    |
|  Third | Female |  Three  | Southampton |    Yes   |
|  Third |  Male  |   Zero  | Queenstown  |    No    |


## Developer & System Information

|                                |                                                                                         |
| ------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| **Developer Name**                          | Mr. Mohsin Afzal, Dr. Rao Muhammad Adeel Nawab                                                      |
| **LinkedIn** (Mohsin Afzal)                 | [Mohsin Afzal](https://www.linkedin.com/in/mohsin-mahmood-7a2139347/)                               |
| **LinkedIn** (Dr. Rao Muhammad Adeel Nawab) | [Dr. Rao Muhammad Adeel Nawab](https://www.linkedin.com/in/rao-muhammad-adeel-nawab/)               |
| **Program Name**                            | titanic\_project                                                                                    |
| **IDE**                                     | Jupyter Notebook                                                                                    |
| **Programming Language**                    | Python 3.13.5                                                                                       |
| **Operating System**                        | Windows 11                                                                                          |
| **Libraries**                               | NumPy 2.1.3, Pandas 2.2.3, Pickle (built-in), scikit-learn 1.6.1, PrettyTable 3.16.0, Astropy 7.0.0 |
| **Date of Completion**                      | 9-Sep-2025                                                                                          |
| **Website**                                 | [ilmoirfan.ai](https://ilmoirfan.ai), [ilmoirfan.com](https://ilmoirfan.com)                        |
| **Email**                                   | [info@ilmoirfan.ai](mailto:info@ilmoirfan.ai), [info@ilmoirfan.com](mailto:info@ilmoirfan.com)      |


##  Table of Contents
- Step 1: Import Libraries  

- Step 2: Load Sample Data  

- Step 3: Understand and Pre-process Sample Data  
  - Step 3.1: Understand Sample Data  
  - Step 3.2: Pre-process Sample Data  

- Step 4: Feature Extraction  

- Step 5: Label Encoding the Sample Data (Input and Output are Converted into Numeric Representation)  
  - Step 5.1: Train the Label Encoder  
  - Step 5.2: Label Encode the Output  
  - Step 5.3: Label Encode the Input  

- Step 6: Execute the Training Phase  
  - Step 6.1: Splitting Sample Data into Training Data and Testing Data  
  - Step 6.2: Splitting Input Vectors and Outputs / Labels of Training Data  
  - Step 6.3: Train the Support Vector Classifier  
  - Step 6.4: Save the Trained Model  

- Step 7: Execute the Testing Phase  
  - Step 7.1: Splitting Input Vectors and Outputs/Labels of Testing Data  
  - Step 7.2: Load the Saved Model  
  - Step 7.3: Evaluate the Machine Learning Model  
    - Step 7.3.1: Make Predictions with the Trained Models on Testing Data  
  - Step 7.4: Calculate the Accuracy Score  

- Step 8: Execute the Application Phase  
  - Step 8.1: Take Input from User  
  - Step 8.2: Convert User Input into Feature Vector (Exactly Same as Feature Vectors of Sample Data)  
  - Step 8.3: Label Encoding of Feature Vector (Exactly Same as Label Encoded Feature Vectors of Sample Data)  
  - Step 8.4: Load the Saved Model  
  - Step 8.5: Model Prediction  
    - Step 8.5.1: Apply Model on the Label Encoded Feature Vector of unseen instance and return Prediction to the User  

- Step 9: Execute the Feedback Phase  
  - Step 9.1: Collect Feedback from Users and Domain Experts on Performance of the Model Deployed in the Real World  
  - Step 9.2: Make a List of Potential Improvements  
  - Step 9.3: Improve the Model Based on Feedback

## Code – Titanic Survival Prediction

###  Step 1: Import Libraries

In [58]:
"""
Purpose of this code:
----------------------
This script imports the required libraries for building and evaluating a 
machine learning model. It sets up tools for data manipulation, preprocessing, 
splitting datasets, training a Support Vector Machine (SVM) model, 
evaluating accuracy, and displaying results in tabular form.
"""

# Import numerical computing library (used for arrays, math operations, etc.)
import numpy as np

# Import data manipulation library (used for DataFrames, reading/writing data, etc.)
import pandas as pd

# Import pickle (used for saving and loading Python objects/models)
import pickle

# Import train_test_split (used to divide dataset into training and testing sets)
from sklearn.model_selection import train_test_split

# Import LabelEncoder (used to convert categorical values into numeric codes)
from sklearn.preprocessing import LabelEncoder

# Import Support Vector Machine (SVM) model from scikit-learn
from sklearn import svm

# Import accuracy_score (used to evaluate the performance of the model)
from sklearn.metrics import accuracy_score

# Import PrettyTable (used to display results in a clean, table format)
from prettytable import PrettyTable   

# Import Table and Column from astropy (used to create structured tables)
from astropy.table import Table, Column


### Step 2: Load Sample Data

In [59]:
"""
Purpose of this section:
-------------------------
This part of the code loads a sample dataset from a CSV file into a 
pandas DataFrame. It then displays the dataset so that students can 
see the raw data being used for further processing and model training.
"""

# Read the CSV file and load it into a pandas DataFrame
# 'sample-data.csv' should be present in the working directory
sample_data = pd.read_csv("sample-data.csv")

# Print a header to indicate that sample data is being displayed
print("\n\nSample Data:")
print("============\n")

# Configure pandas to display all rows and columns (so nothing gets hidden with '...')
pd.set_option("display.max_rows", None, "display.max_columns", None)

# Print the complete dataset
print(sample_data)




Sample Data:

    PClass  Gender Sibling     Embarked Survived
0    Third    Male     One  Southampton       No
1   Second  Female    Zero  Southampton      Yes
2    Third    Male    Zero  Southampton       No
3    Third  Female   Three  Southampton      Yes
4    Third    Male    Zero   Queenstown       No
5    First  Female   Three  Southampton      Yes
6    Third    Male    Zero  Southampton       No
7    Third    Male    Zero  Southampton      Yes
8    First    Male    Zero  Southampton       No
9   Second    Male    Zero  Southampton      Yes
10   Third    Male     One   Queenstown       No
11   First    Male    Zero    Cherbourg      Yes
12   First    Male    Zero    Cherbourg       No
13  Second  Female    Zero  Southampton      Yes
14  Second    Male    Zero  Southampton       No
15   Third    Male     One    Cherbourg      Yes
16   Third    Male     Two    Cherbourg       No
17   Third    Male    Zero  Southampton      Yes
18   Third    Male    Zero  Southampton       No
19  

### Step 3: Understand and Pre-process Sample Data

####  Step 3.1: Understand Sample Data

In [60]:
"""
Purpose of this section:
-------------------------
This part of the code helps us understand the dataset by:
1. Printing the names of all attributes (columns) in the DataFrame.
2. Showing the total number of instances (rows) available in the dataset.
"""

# Print a header for clarity
print("\n\nAttributes in Sample Data:")
print("==========================\n")

# Display all column names (attributes) in the dataset
print(sample_data.columns)

# Print the number of instances (rows); len() counts every row, even ones with missing values
print("\n\nNumber of Instances in Sample Data:", len(sample_data))
print("========================================\n")




Attributes in Sample Data:

Index(['PClass', 'Gender', 'Sibling', 'Embarked', 'Survived'], dtype='object')


Number of Instances in Sample Data: 100



#### Step 3.2: Pre-process Sample Data
- Sample Data is already pre-processed
- No pre-processing needs to be performed

### Step 4: Feature Extraction
- Features are already extracted
- No feature extraction needs to be performed

### Step 5: Label Encoding the Sample Data (Input and Output are Converted into Numeric Representation)

#### Step 5.1: Train the Label Encoder

In [61]:
"""
Purpose of this section:
-------------------------
This part of the code sets up and trains Label Encoders. 
Label Encoding is used to convert categorical values (like 'Male', 'Female', 'First', 'Second') 
into numeric codes that machine learning algorithms can work with.

Steps:
1. Define sample label categories for each attribute.
2. Initialize a LabelEncoder for each attribute.
3. Train (fit) each LabelEncoder with its respective category values.
"""

# Define label categories for each feature
# These are possible values each column can have in the dataset
pclass = pd.DataFrame({"PClass": ["First", "Second", "Third"]})
gender = pd.DataFrame({"Gender": ["Male", "Female"]})
sibling = pd.DataFrame({"Sibling": ["Zero", "One", "Two", "Three"]})
embarked = pd.DataFrame({"Embarked": ["Southampton", "Cherbourg", "Queenstown"]})
survived = pd.DataFrame({"Survived": ["Yes", "No"]})

# Create LabelEncoder objects for each categorical column
pclass_label_encoder = LabelEncoder()
gender_label_encoder = LabelEncoder()
sibling_label_encoder = LabelEncoder()
embarked_label_encoder = LabelEncoder()
survived_label_encoder = LabelEncoder()

# Train (fit) the encoders on the given categories
# np.ravel() flattens the DataFrame column into a 1D array for the encoder
pclass_label_encoder.fit(np.ravel(pclass))
gender_label_encoder.fit(np.ravel(gender))
sibling_label_encoder.fit(np.ravel(sibling))
embarked_label_encoder.fit(np.ravel(embarked))
survived_label_encoder.fit(np.ravel(survived))
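
`LabelEncoder` assigns codes by sorting the category values, not by the order in which they are listed above, which is why `Zero` ends up as 3 rather than 0. The plain-Python sketch below mimics that behaviour so the mapping can be inspected without scikit-learn. `fit_label_mapping` is a hypothetical helper introduced only for illustration; it is not part of the project code.

```python
def fit_label_mapping(categories):
    """Return {category: code}, numbering categories in sorted order.

    This imitates how a fitted LabelEncoder numbers its classes
    (hypothetical helper, for illustration only).
    """
    return {label: code for code, label in enumerate(sorted(set(categories)))}


sibling_mapping = fit_label_mapping(["Zero", "One", "Two", "Three"])
embarked_mapping = fit_label_mapping(["Southampton", "Cherbourg", "Queenstown"])

print(sibling_mapping)
print(embarked_mapping)
```

With the real encoders, the same information is available from the `classes_` attribute after fitting (for example `sibling_label_encoder.classes_`); the position of each value in that array is its code.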


#### Step 5.2: Label Encode the Output

In [62]:
"""
Purpose of this section:
-------------------------
This part of the code applies Label Encoding to the target/output variable 
("Survived") in the dataset. The categorical values ("Yes", "No") are 
converted into numerical values (e.g., 1 and 0) that can be used by 
machine learning algorithms.

Steps:
1. Make copies of the dataset for comparison (original vs. encoded).
2. Encode the "Survived" column using the trained LabelEncoder.
3. Display the original and encoded values side by side.
4. Print both original and encoded datasets for clarity.
5. Save the encoded dataset into a new CSV file.
"""

# Create copies of the dataset (one for encoded output, one to keep original)
sample_data_encoded_output = sample_data.copy()
original_sample_data = sample_data.copy()

# Transform "Survived" (categorical → numeric) using the trained LabelEncoder
print("\n\nSurvived Attribute After Label Encoding:")
print("========================================\n")
sample_data["encoded_survived"] = survived_label_encoder.transform(sample_data['Survived'])

# Show original "Survived" values and their numeric encodings
print(sample_data[["Survived", "encoded_survived"]])

# Replace original categorical columns with encoded values for the output dataset
sample_data_encoded_output[['PClass', 'Gender', 'Sibling', 'Embarked', 'Survived']] = \
    sample_data[['PClass', 'Gender', 'Sibling', 'Embarked', 'encoded_survived']]

# Print the original dataset
print("\n\nOriginal Sample Data:")
print("=====================\n")
print(original_sample_data)

# Print the dataset after encoding the output
print("\n\nSample Data after Label Encoding of Output:")
print("===========================================\n")
print(sample_data_encoded_output)

# Save the encoded dataset to a CSV file (useful for training/testing later)
sample_data_encoded_output.to_csv(r'sample-data-encoded-output.csv', index=False, header=True)




Survived Attribute After Label Encoding:

   Survived  encoded_survived
0        No                 0
1       Yes                 1
2        No                 0
3       Yes                 1
4        No                 0
5       Yes                 1
6        No                 0
7       Yes                 1
8        No                 0
9       Yes                 1
10       No                 0
11      Yes                 1
12       No                 0
13      Yes                 1
14       No                 0
15      Yes                 1
16       No                 0
17      Yes                 1
18       No                 0
19      Yes                 1
20       No                 0
21      Yes                 1
22       No                 0
23      Yes                 1
24       No                 0
25      Yes                 1
26       No                 0
27      Yes                 1
28       No                 0
29      Yes                 1
30       No               

#### Step 5.3: Label Encode the Input

In [63]:
"""
Purpose of this section:
-------------------------
This part of the code applies Label Encoding to the input features 
("PClass", "Gender", "Sibling", "Embarked"). Each categorical value 
is converted into a numeric value so that machine learning models 
can process them.

Steps:
1. Make copies of the dataset to preserve original values.
2. Encode each input attribute (PClass, Gender, Sibling, Embarked).
3. Display the original values alongside their numeric encodings.
4. Replace categorical columns with their encoded versions.
5. Print original vs. encoded datasets for clarity.
6. Save the fully encoded dataset into a CSV file.
"""

# Create copies of the dataset (to keep original values safe)
sample_data_encoded = sample_data_encoded_output.copy()
sample_data_encoded_output_orignal = sample_data_encoded_output.copy()

# Encode "PClass" attribute
print("\n\nPClass Attribute After Label Encoding:")
print("======================================\n")
sample_data_encoded_output["encoded_pclass"] = pclass_label_encoder.transform(sample_data_encoded_output['PClass'])
print(sample_data_encoded_output[["PClass", "encoded_pclass"]])

# Encode "Gender" attribute
print("\n\nGender Attribute After Label Encoding:")
print("======================================\n")
sample_data_encoded_output["encoded_gender"] = gender_label_encoder.transform(sample_data_encoded_output['Gender'])
print(sample_data_encoded_output[["Gender", "encoded_gender"]])

# Encode "Sibling" attribute
print("\n\nSibling Attribute After Label Encoding:")
print("=======================================\n")
sample_data_encoded_output["encoded_sibling"] = sibling_label_encoder.transform(sample_data_encoded_output['Sibling'])
print(sample_data_encoded_output[["Sibling", "encoded_sibling"]])

# Encode "Embarked" attribute
print("\n\nEmbarked Attribute After Label Encoding:")
print("========================================\n")
sample_data_encoded_output["encoded_embarked"] = embarked_label_encoder.transform(sample_data_encoded_output['Embarked'])
print(sample_data_encoded_output[["Embarked", "encoded_embarked"]])

# Replace categorical columns with encoded values in the dataset
sample_data_encoded[['PClass', 'Gender', 'Sibling', 'Embarked', 'Survived']] = \
    sample_data_encoded_output[['encoded_pclass', 'encoded_gender', 'encoded_sibling', 'encoded_embarked', 'Survived']]

# Print original dataset (with text categories)
print("\n\nOriginal Sample Data:")
print("=====================\n")
print(original_sample_data)

# Print encoded dataset (with numeric values instead of categories)
print("\n\nSample Data after Label Encoding:")
print("=================================\n")
print(sample_data_encoded)

# Save the encoded dataset to a new CSV file
sample_data_encoded.to_csv(r'sample-data-encoded.csv', index=False, header=True)




PClass Attribute After Label Encoding:

    PClass  encoded_pclass
0    Third               2
1   Second               1
2    Third               2
3    Third               2
4    Third               2
5    First               0
6    Third               2
7    Third               2
8    First               0
9   Second               1
10   Third               2
11   First               0
12   First               0
13  Second               1
14  Second               1
15   Third               2
16   Third               2
17   Third               2
18   Third               2
19   Third               2
20   Third               2
21   First               0
22   First               0
23  Second               1
24   Third               2
25   Third               2
26   Third               2
27   Third               2
28   Third               2
29   Third               2
30   Third               2
31   First               0
32   Third               2
33   First               0
34   First   

### Step 6: Execute the Training Phase


#### Step 6.1: Splitting Sample Data into Training Data and Testing Data

In [64]:
"""
Purpose of this section:
-------------------------
This part of the code splits the dataset into Training Data and Testing Data.
- Training Data is used to teach the machine learning model.
- Testing Data is used to check how well the model performs on unseen data.

Steps:
1. Use train_test_split() to split the dataset into training (80%) and testing (20%).
2. Save the training and testing sets as separate CSV files.
3. Print both datasets to verify the split.
"""

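# Split dataset into training (80%) and testing (20%)
# shuffle=False keeps the original order of rows, so the first 80% of rows
# become training data and the last 20% become testing data
# random_state=0 has no effect while shuffle=False (nothing is randomized);
# it only matters if shuffling is turned on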
training_data_encoded, testing_data_encoded = train_test_split(
    sample_data_encoded,
    test_size=0.2,
    random_state=0,
    shuffle=False
)

# Save the training and testing datasets to CSV files
training_data_encoded.to_csv(r'training-data-encoded.csv', index=False, header=True)
testing_data_encoded.to_csv(r'testing-data-encoded.csv', index=False, header=True)

# Print training dataset
print("\n\nTraining Data:")
print("==============\n")
print(training_data_encoded)

# Print testing dataset
print("\n\nTesting Data:")
print("==============\n")
print(testing_data_encoded)




Training Data:

    PClass  Gender  Sibling  Embarked  Survived
0        2       1        0         2         0
1        1       0        3         2         1
2        2       1        3         2         0
3        2       0        1         2         1
4        2       1        3         1         0
5        0       0        1         2         1
6        2       1        3         2         0
7        2       1        3         2         1
8        0       1        3         2         0
9        1       1        3         2         1
10       2       1        0         1         0
11       0       1        3         0         1
12       0       1        3         0         0
13       1       0        3         2         1
14       1       1        3         2         0
15       2       1        0         0         1
16       2       1        2         0         0
17       2       1        3         2         1
18       2       1        3         2         0
19       2       0    
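
Because `shuffle=False`, the split above is just a cut at a fixed row: the leading rows form the training set and the trailing rows form the testing set. The sketch below reproduces that arithmetic in plain Python, assuming the test size is rounded up to a whole number of rows. `ordered_split` and `row_indices` are hypothetical names used only for this illustration.

```python
import math


def ordered_split(rows, test_size):
    """Split rows without shuffling: the last ceil(test_size * n) rows are the test set."""
    n_test = math.ceil(test_size * len(rows))
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


row_indices = list(range(100))  # stand-in for the 100 encoded instances
train_rows, test_rows = ordered_split(row_indices, test_size=0.2)

print(len(train_rows), len(test_rows))
print(test_rows[0], test_rows[-1])
```

An ordered split is reproducible without any seed, but if the file happens to be sorted in some way, the testing rows may not be representative of the whole dataset.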

#### Step 6.2: Splitting Input Vectors and Outputs / Labels of Training Data

In [65]:
"""
Purpose of this section:
-------------------------
This part of the code separates the features (input vectors) and 
the labels (output values) from the training dataset.

- Input Vectors (X): All columns except the last one (features like PClass, Gender, etc.)
- Output Labels (y): The last column (Survived) which we want the model to predict.

Steps:
1. Select all columns except the last as input features.
2. Select the last column as the target output (label).
3. Print both to verify the split.
"""

# Extract input vectors (features) from training data
# iloc[: , :-1] → select all rows, and all columns except the last one
print("\n\nInputs Vectors (Feature Vectors) of Training Data:")
print("==================================================\n")
input_vector_train = training_data_encoded.iloc[:, :-1]
print(input_vector_train)

# Extract output labels (target variable) from training data
# iloc[: , -1] → select all rows, but only the last column
print("\n\nOutputs/Labels of Training Data:")
print("================================\n")
print("  Survived")
output_label_train = training_data_encoded.iloc[:, -1]
print(output_label_train)




Inputs Vectors (Feature Vectors) of Training Data:

    PClass  Gender  Sibling  Embarked
0        2       1        0         2
1        1       0        3         2
2        2       1        3         2
3        2       0        1         2
4        2       1        3         1
5        0       0        1         2
6        2       1        3         2
7        2       1        3         2
8        0       1        3         2
9        1       1        3         2
10       2       1        0         1
11       0       1        3         0
12       0       1        3         0
13       1       0        3         2
14       1       1        3         2
15       2       1        0         0
16       2       1        2         0
17       2       1        3         2
18       2       1        3         2
19       2       0        0         0
20       2       1        3         0
21       0       0        3         2
22       0       1        3         0
23       1       0        3       
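
The two `iloc` selections above follow one rule: everything except the last column is input, and the last column is the label. A minimal plain-Python sketch of the same slicing, using the first three encoded training rows shown earlier, looks like this (the variable names are illustrative only):

```python
# Three encoded rows in the column order PClass, Gender, Sibling, Embarked, Survived
encoded_rows = [
    [2, 1, 0, 2, 0],
    [1, 0, 3, 2, 1],
    [2, 1, 3, 2, 0],
]

# Everything except the last value is the input vector (like iloc[:, :-1])
input_vectors = [row[:-1] for row in encoded_rows]

# The last value is the output label (like iloc[:, -1])
output_labels = [row[-1] for row in encoded_rows]

print(input_vectors)
print(output_labels)
```

This positional rule only works while `Survived` stays the last column, so any column added to the dataset later should be inserted before it.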

#### Step 6.3: Train the Support Vector Classifier

In [66]:
"""
Purpose of this section:
-------------------------
This part of the code trains a Support Vector Classifier (SVC) on the training data. 
The SVC is a type of machine learning model that finds the best boundary 
to separate different classes (e.g., survived vs. not survived).

Steps:
1. Initialize the SVC model with parameters.
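   - gamma='auto' : sets the kernel coefficient to 1 / n_features
     (here 1/4, because there are four input features).
   - random_state=0 : fixes the random seed; SVC only uses it when
     probability=True, so it does not change this model.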
2. Train (fit) the model using input features (X) and output labels (y).
3. Print the trained model with its parameters.
"""

print("\n\nTraining the Support Vector Classifier on Training Data")
print("========================================================\n")

# Print message for clarity
print("\nParameters and their values:")
print("============================\n")

# Initialize the Support Vector Classifier with chosen parameters
svc_model = svm.SVC(gamma='auto', random_state=0)

# Train the model using the training data (features and labels)
# input_vector_train = feature columns
# output_label_train = target column (Survived)
svc_model.fit(input_vector_train, np.ravel(output_label_train))

# Print trained model details (parameters and values)
print(svc_model)




Training the Support Vector Classifier on Training Data


Parameters and their values:

SVC(gamma='auto', random_state=0)
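
With its default settings, `SVC` uses an RBF kernel, and `gamma='auto'` sets the kernel coefficient to 1 / n_features. The sketch below works through that formula, K(x, z) = exp(-gamma * ||x - z||^2), for two encoded rows from the training data. `rbf_kernel_value` is a hypothetical helper written for illustration; it is not how scikit-learn is invoked.

```python
import math


def rbf_kernel_value(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


row_0 = [2, 1, 0, 2]  # Third, Male, One, Southampton
row_1 = [1, 0, 3, 2]  # Second, Female, Zero, Southampton

gamma = 1 / len(row_0)  # gamma='auto' -> 1 / n_features = 0.25

print(rbf_kernel_value(row_0, row_0, gamma))  # identical rows give 1.0
print(rbf_kernel_value(row_0, row_1, gamma))
```

Because the label codes are arbitrary integers, the distance between `Zero` (3) and `One` (0) is larger than between `Zero` (3) and `Two` (2). This is a side effect of label encoding that is worth keeping in mind when interpreting the model.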


#### Step 6.4: Save the Trained Model

In [67]:
"""
Purpose of this section:
-------------------------
This part of the code saves the trained Support Vector Classifier (SVC) 
model to your computer so it can be reused later without retraining.

Steps:
1. Use pickle.dump() to serialize (save) the trained model.
2. Store it as a .pkl file on disk.
3. The saved model can be loaded back later for predictions.
"""

# Save the trained SVC model to a file named "svc_trained_model.pkl"
# 'wb' = write in binary mode; the with-block closes the file automatically
with open('svc_trained_model.pkl', 'wb') as model_file:
    pickle.dump(svc_model, model_file)


### Step 7: Execute the Testing Phase

#### Step 7.1: Splitting Input Vectors and Outputs/Labels of Testing Data

In [68]:
"""
Purpose of this section:
-------------------------
This part of the code separates the features (inputs) and the labels (outputs) 
from the testing dataset. 

- Input Vectors (X_test): All columns except the last one.
- Output Labels (y_test): The last column (Survived), which we use to check 
  how well the trained model performs.

Steps:
1. Extract feature columns from testing data.
2. Extract target/output column from testing data.
3. Print both for verification.
"""

# Extract input vectors (features) from testing data
# iloc[: , :-1] → select all rows, and all columns except the last one
print("\n\nInputs Vectors (Feature Vectors) of Testing Data:")
print("=================================================\n")
input_vector_test = testing_data_encoded.iloc[:, :-1]
print(input_vector_test)

# Extract output labels (target values) from testing data
# iloc[: , -1] → select all rows, and only the last column
print("\n\nOutputs/Labels of Testing Data:")
print("==============================\n")
print("  Survived")
output_label_test = testing_data_encoded.iloc[:, -1]
print(output_label_test)




Inputs Vectors (Feature Vectors) of Testing Data:

    PClass  Gender  Sibling  Embarked
80       2       1        2         2
81       2       0        3         2
82       2       1        3         2
83       0       1        0         2
84       1       1        3         2
85       2       0        0         1
86       1       1        3         2
87       0       0        0         2
88       2       1        3         2
89       1       0        3         2
90       1       1        3         2
91       1       0        3         2
92       2       1        3         0
93       0       1        0         2
94       0       1        0         2
95       2       1        0         0
96       2       1        0         1
97       2       1        0         2
98       1       1        0         2
99       2       1        0         2


Outputs/Labels of Testing Data:

  Survived
80    0
81    1
82    0
83    1
84    0
85    1
86    0
87    1
88    0
89    1
90    0
91    1
92    0

#### Step 7.2: Load the Saved Model

In [69]:
"""
Purpose of this section:
-------------------------
This part of the code loads the previously saved Support Vector Classifier (SVC) 
model from disk into memory. By doing this, we can use the trained model for 
making predictions without retraining it from scratch.

Steps:
1. Use pickle.load() to read the model file (.pkl).
2. Load the trained model into a Python variable.
3. The loaded model is now ready for predictions.
"""

# Load the saved SVC model from the file "svc_trained_model.pkl"
# 'rb' = read in binary mode; the with-block closes the file automatically
with open('svc_trained_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)
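
The save and load steps form a round trip: whatever object is written with `pickle.dump()` comes back unchanged from `pickle.load()`. The sketch below shows that round trip with a small dictionary standing in for the trained model and a temporary directory standing in for the working directory; both are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable Python object behaves the same way
stand_in_model = {"name": "svc_trained_model", "gamma": "auto"}

with tempfile.TemporaryDirectory() as temp_dir:
    model_path = os.path.join(temp_dir, "stand_in_model.pkl")

    # Save: the with-block closes the file even if an error occurs
    with open(model_path, "wb") as model_file:
        pickle.dump(stand_in_model, model_file)

    # Load: read the object back from disk
    with open(model_path, "rb") as model_file:
        restored_model = pickle.load(model_file)

print(restored_model == stand_in_model)
```

Only unpickle files that come from a trusted source, because loading a pickle can execute arbitrary code.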


#### Step 7.3: Evaluate the Machine Learning Model

##### Step 7.3.1: Make Predictions with the Trained Models on Testing Data

In [70]:
"""
Purpose of this section:
-------------------------
This part of the code evaluates the trained Support Vector Classifier (SVC) 
on the testing data. It uses the trained model to make predictions and then 
stores those predictions for further analysis.

Steps:
1. Use the trained model to predict outputs (Survived/Not Survived) for the test data.
2. Store the predictions in the testing dataset for comparison with actual values.
3. Save the dataset with predictions into a new CSV file.
4. Print the predictions to verify results.
"""

# Use the trained model to predict outcomes on the testing data
# input_vector_test = feature columns from the testing dataset
model_predictions = model.predict(input_vector_test)

# Work on an independent copy of the testing dataset (deep=True copies the data,
# not just references). The result must be assigned back; with a real copy,
# adding a column below does not trigger pandas' SettingWithCopyWarning
testing_data_encoded = testing_data_encoded.copy(deep=True)

# Add predictions as a new column to the testing dataset
testing_data_encoded["Predictions"] = model_predictions

# Save the predictions into a CSV file
testing_data_encoded.to_csv(r'model-predictions.csv', index=False, header=True)

# Print the predictions alongside the testing dataset
model_predictions = testing_data_encoded
print("\n\nPredictions Returned by svc_trained_model:")
print("==========================================\n")
print(model_predictions)




Predictions Returned by svc_trained_model:

    PClass  Gender  Sibling  Embarked  Survived  Predictions
80       2       1        2         2         0            0
81       2       0        3         2         1            1
82       2       1        3         2         0            0
83       0       1        0         2         1            0
84       1       1        3         2         0            0
85       2       0        0         1         1            1
86       1       1        3         2         0            0
87       0       0        0         2         1            1
88       2       1        3         2         0            0
89       1       0        3         2         1            1
90       1       1        3         2         0            0
91       1       0        3         2         1            1
92       2       1        3         0         0            0
93       0       1        0         2         1            0
94       0       1        0         2  

#### Step 7.4: Calculate the Accuracy Score

In [71]:
"""
Purpose of this section:
-------------------------
This part of the code calculates how accurate the trained model is on the 
testing dataset. Accuracy is the ratio of correctly predicted values to the 
total number of predictions.

Steps:
1. Compare predicted values with the actual labels (Survived).
2. Use accuracy_score() from sklearn to compute accuracy.
3. Print the accuracy score rounded to 2 decimal places.
"""

# Calculate accuracy by comparing predictions with actual labels
# "Survived"   = actual values from the testing dataset
# "Predictions" = values predicted by the trained model
model_accuracy_score = accuracy_score(
    model_predictions["Survived"],
    model_predictions["Predictions"]
)

# Print the accuracy score
print("\n\nAccuracy Score:")
print("===============\n")
print(round(model_accuracy_score, 2))  # rounded to 2 decimal places




Accuracy Score:

0.8
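
Accuracy is simply the number of correct predictions divided by the total number of predictions. The sketch below computes it by hand in plain Python. The `accuracy` function and the two label lists are made up for illustration and are not the project's test data.

```python
def accuracy(actual_labels, predicted_labels):
    """Fraction of positions where the prediction equals the actual label."""
    if len(actual_labels) != len(predicted_labels):
        raise ValueError("Both lists must have the same length")
    correct = sum(1 for a, p in zip(actual_labels, predicted_labels) if a == p)
    return correct / len(actual_labels)


# Made-up labels for illustration (not the project's test data)
actual = [0, 1, 0, 1, 0]
predicted = [0, 1, 0, 0, 0]

print(accuracy(actual, predicted))  # 4 of 5 correct -> 0.8
```

Accuracy treats both kinds of mistakes alike, so it does not reveal whether the model errs more often on survivors or on non-survivors.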


### Step 8: Execute the Application Phase

#### Step 8.1: Take Input from User

In [72]:
"""
Purpose of this section:
-------------------------
This part of the code allows the user to enter their own data 
(PClass, Gender, Sibling, Embarked). These values will later be 
encoded and used by the trained model to make a prediction.

Steps:
1. Ask the user to input their class (First, Second, Third).
2. Ask the user to input their gender (Male, Female).
3. Ask the user to input the number of siblings (Zero, One, Two, Three).
4. Ask the user to input their port of embarkation 
   (Cherbourg, Southampton, Queenstown).
"""

# Take Passenger Class input from user
pclass_input = input("\nPlease enter PClass here (First, Second, Third): ").strip()

# Take Gender input from user
gender_input = input("\nPlease enter your Gender here (Male, Female): ").strip()

# Take Sibling count input from user
sibling_input = input("\nPlease enter your Sibling here (Zero, One, Two, Three): ").strip()

# Take Port of Embarkation input from user
embarked_input = input("\nPlease enter Embarked here (Cherbourg, Southampton, Queenstown): ").strip()



Please enter PClass here (First, Second, Third):  First

Please enter your Gender here (Male, Female):  Male

Please enter your Sibling here (Zero, One, Two, Three):  Zero

Please enter Embarked here (Cherbourg, Southampton, Queenstown):  Queenstown
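A fitted `LabelEncoder` raises a `ValueError` when asked to transform a label it has not seen, so a typo such as `first` or `QUEENSTOWN` would stop the later encoding step. The sketch below shows one possible way to check the typed text before it is used. The names `ALLOWED_VALUES` and `normalize_choice` are hypothetical, and the allowed values are assumed to be exactly those listed in the prompts above.

```python
# Allowed values per attribute, assumed to match the prompts shown to the user
ALLOWED_VALUES = {
    "PClass": ["First", "Second", "Third"],
    "Gender": ["Male", "Female"],
    "Sibling": ["Zero", "One", "Two", "Three"],
    "Embarked": ["Cherbourg", "Southampton", "Queenstown"],
}


def normalize_choice(feature, raw_text):
    """Trim and re-capitalize the text, then confirm it is an allowed value."""
    cleaned = raw_text.strip().capitalize()
    if cleaned not in ALLOWED_VALUES[feature]:
        allowed = ", ".join(ALLOWED_VALUES[feature])
        raise ValueError(f"{feature} must be one of: {allowed}")
    return cleaned


# Example: stray spaces and lowercase text are accepted and normalized
print(normalize_choice("PClass", "  first "))
```

In the notebook, each `input(...)` result could be passed through such a helper, for example `normalize_choice("PClass", pclass_input)`, before the feature vector is built.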


### Step 8.2: Convert User Input into Feature Vector (Exactly Same as Feature Vectors of Sample Data)

In [73]:
"""
Purpose of this section:
-------------------------
This part of the code takes the user's input (entered as text) 
and converts it into a structured feature vector (DataFrame). 
This feature vector will later be encoded into numbers so the 
trained model can use it for prediction.

Steps:
1. Create a DataFrame with the user input (PClass, Gender, Sibling, Embarked).
2. Display the DataFrame so the user can confirm their entered values.
"""

# Convert user input into a DataFrame (feature vector)
# Each input is placed inside a list so it becomes a row in the DataFrame
user_input = pd.DataFrame({
    'PClass': [pclass_input],
    'Gender': [gender_input],
    'Sibling': [sibling_input],
    'Embarked': [embarked_input]
})

# Print the feature vector created from user input
print("\n\nUser Input Feature Vector:")
print("==========================\n")
print(user_input)




User Input Feature Vector:

  PClass Gender Sibling    Embarked
0  First   Male    Zero  Queenstown
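Because the user's feature vector must match the sample data's feature vectors exactly, it can help to confirm the column names and their order before encoding. The following is a minimal sketch under that assumption; `EXPECTED_COLUMNS` and `check_feature_columns` are hypothetical names, and the expected order is taken from the DataFrame printed above.

```python
# Column order assumed to match the feature vectors used during training
EXPECTED_COLUMNS = ["PClass", "Gender", "Sibling", "Embarked"]


def check_feature_columns(columns):
    """Raise ValueError if the columns differ from the expected names or order."""
    actual = list(columns)
    if actual != EXPECTED_COLUMNS:
        raise ValueError(
            f"Expected columns {EXPECTED_COLUMNS}, but received {actual}"
        )


# Passes silently when the names and order match
check_feature_columns(["PClass", "Gender", "Sibling", "Embarked"])
print("Columns match the expected feature order")
```

With the DataFrame from this step, the call would be `check_feature_columns(user_input.columns)`.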


### Step 8.3: Label Encoding of Feature Vector (Exactly Same as Label Encoded Feature Vectors of Sample Data)

In [74]:
"""
Purpose of this section:
-------------------------
This part of the code encodes the user's input into numeric values 
using the previously trained LabelEncoders. Since machine learning 
models work with numbers (not text), this step is necessary before 
making predictions.

Steps:
1. Copy the user input DataFrame to preserve original values.
2. Transform each categorical column (PClass, Gender, Sibling, Embarked) 
   into its numeric representation using the trained encoders.
3. Print both the original and encoded feature vectors for comparison.
"""

# Make a copy of the user input so we keep both original and encoded versions
unseen_data_features = user_input.copy()

# Encode each attribute using its respective LabelEncoder
unseen_data_features["PClass"] = pclass_label_encoder.transform(user_input['PClass'])
unseen_data_features["Gender"] = gender_label_encoder.transform(user_input['Gender'])
unseen_data_features["Sibling"] = sibling_label_encoder.transform(user_input['Sibling'])
unseen_data_features["Embarked"] = embarked_label_encoder.transform(user_input['Embarked'])

# Print the original user input feature vector
print("\n\nUser Input Feature Vector:")
print("==========================\n")
print(user_input)

# Print the encoded feature vector (numeric version for the model)
print("\n\nUser Input Encoded Feature Vector:")
print("==================================\n")
print(unseen_data_features)




User Input Feature Vector:

  PClass Gender Sibling    Embarked
0  First   Male    Zero  Queenstown


User Input Encoded Feature Vector:

   PClass  Gender  Sibling  Embarked
0       0       1        3         1
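The codes in the encoded vector follow from how `LabelEncoder` works: it numbers the labels it was fitted on in sorted order. The plain-Python sketch below imitates that ordering to show why `Zero` becomes 3 and `Queenstown` becomes 1. It assumes the encoders were fitted on exactly the values listed in the input prompts; `build_code_map` and `code_maps` are hypothetical names, not part of the notebook.

```python
def build_code_map(categories):
    """Map each distinct label to its position in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(categories)))}


# Category values assumed from the input prompts in Step 8.1
code_maps = {
    "PClass": build_code_map(["First", "Second", "Third"]),
    "Gender": build_code_map(["Male", "Female"]),
    "Sibling": build_code_map(["Zero", "One", "Two", "Three"]),
    "Embarked": build_code_map(["Cherbourg", "Southampton", "Queenstown"]),
}

# The same passenger entered in Step 8.1
user_row = {
    "PClass": "First",
    "Gender": "Male",
    "Sibling": "Zero",
    "Embarked": "Queenstown",
}

# Look up the integer code of each entered value
encoded_row = {feature: code_maps[feature][value] for feature, value in user_row.items()}
print(encoded_row)
# {'PClass': 0, 'Gender': 1, 'Sibling': 3, 'Embarked': 1}
```

Note that the order is alphabetical, not numerical: `One`, `Three`, `Two`, `Zero` receive the codes 0 to 3, so the Sibling codes do not reflect the actual sibling counts.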


### Step 8.4: Load the Saved Model

In [75]:
"""
Purpose of this section:
-------------------------
This part of the code loads the previously saved Support Vector Classifier (SVC) 
model from the `.pkl` file into memory. By doing this, we can use the trained 
model for making predictions on new (unseen) data without retraining it.

Steps:
1. Use pickle.load() to read the model file from disk.
2. Load the trained model into a Python variable.
3. The loaded model is ready to make predictions.
"""

# Load the trained SVC model from the file "svc_trained_model.pkl"
# 'rb' = read in binary mode; the with-block closes the file after loading
with open('svc_trained_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)
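The save and load round trip can be sketched in isolation as follows. This example uses a plain dictionary as a stand-in for the trained model and a temporary directory instead of the notebook's file, so `stand_in_model` and `example_model.pkl` are illustrative assumptions only.

```python
import os
import pickle
import tempfile

# Stand-in object (hypothetical); in the notebook this would be the trained SVC
stand_in_model = {"name": "placeholder", "note": "not a real trained model"}

with tempfile.TemporaryDirectory() as temp_dir:
    model_path = os.path.join(temp_dir, "example_model.pkl")

    # 'wb' = write in binary mode; the file is closed when the block ends
    with open(model_path, "wb") as model_file:
        pickle.dump(stand_in_model, model_file)

    # 'rb' = read in binary mode
    with open(model_path, "rb") as model_file:
        restored_model = pickle.load(model_file)

# The restored object is equal to the one that was saved
print(restored_model == stand_in_model)
```

Unpickling can execute arbitrary code, so `pickle.load()` should only be used on files from a trusted source.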


### Step 8.5: Model Prediction

#### Step 8.5.1: Apply the Model to the Label Encoded Feature Vector of the Unseen Instance and Return the Prediction to the User

In [76]:
"""
Purpose of this section:
-------------------------
This part of the code uses the trained Support Vector Classifier (SVC) 
to predict whether a new passenger (user input) survived or not.

Steps:
1. Provide the encoded user input to the trained model.
2. The model predicts whether the passenger survived (1) or did not survive (0).
3. Translate the numeric prediction into a readable label ("SURVIVED" / "NOT SURVIVED").
4. Display the prediction in a nicely formatted PrettyTable.
"""

# Use the trained model to predict survival based on user input
predicted_survival = model.predict(unseen_data_features)

# Convert numeric prediction (0/1) into a meaningful label
# predict() returns an array, so read its single element explicitly
if predicted_survival[0] == 1:
    prediction = "SURVIVED"
else:
    prediction = "NOT SURVIVED"

# Create a PrettyTable to display the prediction nicely
pretty_table = PrettyTable()
pretty_table.add_column("       ** Prediction **       ", [prediction])

# Print the result
print(pretty_table)


+--------------------------------+
|        ** Prediction **        |
+--------------------------------+
|            SURVIVED            |
+--------------------------------+


### Step 9: Execute the Feedback Phase

#### Step 9.1: Collect Feedback from Users and Domain Experts on Performance of the Model Deployed in the Real World

#### Step 9.2: Make a List of Potential Improvements

#### Step 9.3: Improve the Model Based on Feedback

## Conclusion
The analysis shows that survival on the Titanic was strongly influenced by gender, class, and age. Women, children, and first-class passengers had the highest survival rates. The Support Vector Classifier (SVC) trained here was consistent with these patterns, reaching an accuracy of about 0.8 on the testing data. Despite dataset limitations, the study highlights how socio-economic and demographic factors shaped survival outcomes in the disaster.