# Tasks 

1. Load the Titanic dataset (sns.load_dataset("titanic")).
2. Identify columns with missing values.
3. For numerical columns: fill missing values with the median.
4. For categorical columns: fill missing values with the mode.
5. Compare the dataset before and after cleaning.

In [1]:
# Task 1. Load the Titanic dataset (sns.load_dataset("titanic"))

import seaborn as sns

# Keep an untouched copy so the cleaned data can be compared against it in Task 5.
titanic_original = sns.load_dataset("titanic")
titanic = titanic_original.copy()

In [2]:
# Task 2. Identify columns with missing values

missing_columns = titanic.columns[titanic.isnull().any()].tolist()
print("Columns with missing values:", missing_columns)

Columns with missing values: ['age', 'embarked', 'deck', 'embark_town']
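
Knowing which columns have gaps is more useful alongside how large the gaps are. The following is a minimal sketch on a small made-up frame (the `df` values and the `summary` table are illustrative assumptions, not the Titanic data) that reports the count and share of missing values per column:

```python
import pandas as pd

# Hypothetical miniature frame standing in for the Titanic data.
df = pd.DataFrame({
    "age": [22.0, None, 26.0, None],
    "fare": [7.25, 71.28, 7.93, 53.10],
    "embarked": pd.Series(["S", None, "S", "C"], dtype="object"),
})

missing_count = df.isnull().sum()    # number of missing cells per column
missing_share = df.isnull().mean()   # fraction of rows missing per column

# Keep only columns that actually have gaps, largest gap first.
summary = pd.DataFrame({"missing": missing_count, "share": missing_share})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

Applied to `titanic`, the same two calls would show how much of each column is missing, which helps decide whether imputing it is reasonable at all.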


In [3]:
# Task 3. For numerical columns: fill missing values with the median.

# 'number' matches every integer and float dtype (boolean columns are excluded).
numerical_columns = titanic.select_dtypes(include='number').columns

for column in numerical_columns:
    median_value = titanic[column].median()
    titanic[column] = titanic[column].fillna(median_value)

print("Numerical columns after filling missing values with median:")
print(titanic[numerical_columns].head())

Numerical columns after filling missing values with median:
   survived  pclass   age  sibsp  parch     fare
0         0       3  22.0      1      0   7.2500
1         1       1  38.0      1      0  71.2833
2         1       3  26.0      0      0   7.9250
3         1       1  35.0      1      0  53.1000
4         0       3  35.0      0      0   8.0500
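
The loop above can also be written without iterating, since `fillna` accepts a Series that maps column names to fill values. A minimal sketch on a small made-up frame (the `df` contents are illustrative assumptions):

```python
import pandas as pd

# Hypothetical miniature frame: two numeric columns with gaps, one text column.
df = pd.DataFrame({
    "age": [22.0, None, 26.0, 35.0],
    "fare": [7.25, 71.28, None, 53.10],
    "sex": pd.Series(["male", "female", "female", "male"], dtype="object"),
})

# Series of medians, indexed by the name of each numeric column.
medians = df.median(numeric_only=True)

# Each numeric column is filled with its own median; other columns are untouched.
filled = df.fillna(medians)
print(filled)
```

Both forms should give the same result; the loop is easier to extend with per-column logic, while the vectorized form is shorter.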


In [4]:
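# Task 4. For categorical columns: fill missing values with the mode.

# Note: include=['object'] selects only string columns. 'deck' and 'class'
# have the pandas 'category' dtype, so they are not selected (or filled) here.
categorical_columns = titanic.select_dtypes(include=['object']).columns

for column in categorical_columns:
    mode_value = titanic[column].mode()[0]  # mode() returns a Series; take the first value
    titanic[column] = titanic[column].fillna(mode_value)

print("Categorical columns after filling missing values with mode:")
print(titanic[categorical_columns].head())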

Categorical columns after filling missing values with mode:
      sex embarked    who  embark_town alive
0    male        S    man  Southampton    no
1  female        C  woman    Cherbourg   yes
2  female        S  woman  Southampton   yes
3  female        S  woman  Southampton   yes
4    male        S    man  Southampton    no


In [6]:
# Task 5. Compare the dataset before and after cleaning.

print("Missing values BEFORE cleaning:\n", titanic_original.isnull().sum())

Missing values BEFORE cleaning:
 survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64


In [7]:
print("\nMissing values AFTER cleaning:\n", titanic.isnull().sum())


Missing values AFTER cleaning:
 survived         0
pclass           0
sex              0
age              0
sibsp            0
parch            0
fare             0
embarked         0
class            0
who              0
adult_male       0
deck           688
embark_town      0
alive            0
alone            0
dtype: int64
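
`deck` still shows 688 missing values because it has the `category` dtype, which `select_dtypes(include=['object'])` does not match. One possible fix, sketched here on a small made-up frame (the `df` contents are illustrative assumptions), is to select both dtypes before filling with the mode:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic frame: one object column and one
# category column, each with a missing value.
df = pd.DataFrame({
    "embarked": pd.Series(["S", "C", None, "S"], dtype="object"),
    "deck": pd.Categorical(["C", None, "C", "E"]),
})

# Select string-like and categorical columns together.
categorical_columns = df.select_dtypes(include=["object", "category"]).columns

for column in categorical_columns:
    mode_value = df[column].mode()[0]          # most frequent value in the column
    df[column] = df[column].fillna(mode_value)

remaining_missing = int(df.isnull().sum().sum())
print(remaining_missing)
```

Whether mode imputation is appropriate for a column that is mostly missing, as `deck` is, is a separate judgment call; dropping the column or adding an explicit "unknown" category are common alternatives.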


In [8]:
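# Caution: NaN != NaN evaluates to True, so cells that are missing in both
# frames (all 688 missing 'deck' values) are counted as changed here.
differences = (titanic_original != titanic).sum()
print("\nNumber of changed values per column:\n", differences)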


Number of changed values per column:
 survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
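
The 688 reported for `deck` does not reflect real changes: an elementwise `!=` treats two missing values as different. A minimal sketch of a comparison that ignores cells missing in both frames, using small made-up frames (`before` and `after` are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical "before" frame: 'age' gets imputed, 'deck' is left untouched.
before = pd.DataFrame({
    "age": [22.0, np.nan, 26.0],
    "deck": pd.Series([np.nan, "C", np.nan], dtype="object"),
})
after = before.copy()
after["age"] = after["age"].fillna(after["age"].median())

# Naive count: NaN != NaN is True, so untouched missing cells look changed.
naive = (before != after).sum()

# NaN-aware count: drop cells that are missing in both frames.
both_missing = before.isna() & after.isna()
changed = ((before != after) & ~both_missing).sum()

print(naive)
print(changed)
```

With this adjustment, a column that was never filled reports zero changes, while imputed columns still report the number of cells that were filled.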
