# **Experiment 1: Linear Regression Implementation**

### Objective: Predict each passenger's `Fare` from the numeric features `Pclass`, `Age`, `SibSp`, and `Parch`, plus the one-hot encoded categorical variables `Sex` and `Embarked`.

### **Step 1: Dataset Import and Preprocessing**

In [42]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the dataset
df = pd.read_csv('Titanic.csv')

# Display the first few rows
print("First few rows of the dataset:")
df.head()

First few rows of the dataset:


| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


In [None]:

# Handle missing values (assign the result back; calling fillna(..., inplace=True)
# on a selected column is deprecated in recent pandas and may not update df)
df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fill missing Age with the column mean
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])  # Fill missing Embarked with the most frequent value
df = df.drop(columns=['Cabin'], errors='ignore')  # Drop Cabin because most of its values are missing

# Convert categorical columns to numerical
df = pd.get_dummies(df, columns=['Sex', 'Embarked'], drop_first=True)

In [44]:
print("\nPreprocessed Dataset:")
df.head()


Preprocessed Dataset:


| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Age | SibSp | Parch | Ticket | Fare | Sex_male | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | 22.0 | 1 | 0 | A/5 21171 | 7.25 | True | False | True |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 | 1 | 0 | PC 17599 | 71.2833 | False | False | False |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | False | False | True |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 | 1 | 0 | 113803 | 53.1 | False | False | True |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | 35.0 | 0 | 0 | 373450 | 8.05 | True | False | True |
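
The preprocessing step imputes `Age` and `Embarked` and then one-hot encodes `Sex` and `Embarked`. The sketch below replays those steps on a tiny made-up frame so the effect of each call is easy to inspect. The `toy` frame and its values are assumptions for illustration only, not rows from `Titanic.csv`. With `drop_first=True`, the first level of each categorical column is dropped and becomes the baseline, which is why the preprocessed output has `Sex_male`, `Embarked_Q`, and `Embarked_S` but no `Sex_female` or `Embarked_C`.

```python
import pandas as pd

# Toy frame standing in for the Titanic data (values are made up for illustration)
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0, 35.0],
    "Sex": ["male", "female", "female", "male", "female"],
    "Embarked": ["S", "C", None, "Q", "S"],
})

# Count missing values per column before imputing
missing_before = toy.isna().sum()
print(missing_before)

# Impute by assigning the filled column back to the frame
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# One-hot encode; drop_first=True drops the first level of each column,
# so "female" and "C" become the implicit baseline categories
encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"], drop_first=True)
print(encoded.columns.tolist())
```

Running a per-column `isna().sum()` before imputing is a cheap way to confirm which columns actually need filling.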


### **Step 2: Define Features and Target Variable**

In [45]:
# Define features (X) and target variable (y)
X = df[['Pclass', 'Age', 'SibSp', 'Parch', 'Sex_male', 'Embarked_Q', 'Embarked_S']]
y = df['Fare']

# Split the dataset into 80% training and 20% testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("\nTraining and Testing Data Split:")
print(f"Training data size: {X_train.shape}")
print(f"Testing data size: {X_test.shape}")



Training and Testing Data Split:
Training data size: (712, 7)
Testing data size: (179, 7)
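
The split sizes above add up to 891 rows, and 20% of 891 is 178.2, not a whole number. The sketch below reproduces the reported sizes under the assumption that scikit-learn rounds the test count up and gives the remainder to the training set for a fractional `test_size`. The variable names are illustrative.

```python
import math

n_samples = 891   # 712 + 179, taken from the shapes printed above
test_size = 0.2

# Assumed rounding rule: the test count is rounded up, the rest is training data
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f"Training rows: {n_train}")
print(f"Testing rows: {n_test}")
```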


### **Step 3: Train the Linear Regression Model**

In [46]:
# Train Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)  # Train the model on training data

# Display model coefficients
print("\nModel Coefficients:")
print(f"Intercept: {lr_model.intercept_}")
print(f"Coefficients: {lr_model.coef_}")



Model Coefficients:
Intercept: 126.11076179425521
Coefficients: [-33.95376366  -0.08555086   5.79866895  10.84550666  -3.58633347
 -13.87123214 -21.18932071]
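
The coefficient array is easier to read when each value is paired with its feature name. The sketch below copies the intercept and coefficients printed above and applies them by hand to one passenger, using the feature values of the first row in the dataset preview (third class, age 22, one sibling or spouse, male, embarked at S). The dictionary layout and variable names are illustrative; in the notebook the same pairing could be built from `X.columns` and `lr_model.coef_`.

```python
# Values copied from the model output above
intercept = 126.11076179425521
coefficients = {
    "Pclass": -33.95376366,
    "Age": -0.08555086,
    "SibSp": 5.79866895,
    "Parch": 10.84550666,
    "Sex_male": -3.58633347,
    "Embarked_Q": -13.87123214,
    "Embarked_S": -21.18932071,
}

# Print each coefficient next to its feature name
for name, coef in coefficients.items():
    print(f"{name:>10}: {coef:8.2f}")

# Feature values for the first passenger in the preview (booleans written as 0/1)
passenger = {
    "Pclass": 3, "Age": 22.0, "SibSp": 1, "Parch": 0,
    "Sex_male": 1, "Embarked_Q": 0, "Embarked_S": 1,
}

# A linear model predicts intercept + sum(coefficient * feature value)
predicted_fare = intercept + sum(coefficients[k] * passenger[k] for k in coefficients)
print(f"Predicted fare: {predicted_fare:.2f}")
```

Read this way, each additional class number lowers the predicted fare by about 34, holding the other features fixed, while `Age` has almost no effect.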


### **Step 4: Evaluate the Model**


In [47]:
# Make predictions on the test data
y_pred = lr_model.predict(X_test)

# Calculate evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("\nEvaluation Metrics:")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared Value: {r2:.2f}")



Evaluation Metrics:
Mean Absolute Error (MAE): 20.81
Mean Squared Error (MSE): 928.73
R-squared Value: 0.40
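
MSE is expressed in squared fare units, so it is hard to compare directly with the MAE. Taking its square root gives the RMSE, which is on the same scale as `Fare`. The sketch below first computes the three metrics from their definitions on a small made-up example, then converts the MSE reported above into an RMSE. The helper `regression_metrics` and the toy values are assumptions for illustration, not part of the notebook.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) computed directly from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2

# Made-up values, used only to exercise the formulas
toy_true = [10.0, 20.0, 30.0, 40.0]
toy_pred = [12.0, 18.0, 33.0, 37.0]
toy_mae, toy_mse, toy_r2 = regression_metrics(toy_true, toy_pred)
print(f"Toy MAE: {toy_mae}, MSE: {toy_mse}, R^2: {toy_r2:.3f}")

# RMSE for the model above, from the reported MSE of 928.73
reported_rmse = math.sqrt(928.73)
print(f"RMSE: {reported_rmse:.2f}")
```

The resulting RMSE of roughly 30.5 is well above the MAE of 20.81, which suggests that a few large errors dominate the squared-error metric. An R-squared of 0.40 means the model explains about 40% of the variance in test-set fares.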
