# TASK 1: Load and Explore the Dataset

In [1]:
!pip show pandas

Name: pandas
Version: 2.3.3
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: 
Author-email: The Pandas Development Team <pandas-dev@python.org>
License: BSD 3-Clause License

In [2]:
import pandas as pd
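The `!pip show` shell escape only works inside IPython/Jupyter. This minimal sketch reads the same version number from plain Python instead. It assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import version

import pandas as pd

# Version recorded in the installed package metadata (what pip show reports).
installed = version("pandas")
# Version reported by the imported module itself.
runtime = pd.__version__

print(f"pandas {runtime} (metadata: {installed})")
```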

In [3]:
df = pd.read_csv('data.csv')
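`data.csv` is not shipped with this notebook. The sketch below passes `read_csv` an inline CSV string wrapped in `io.StringIO`, which it accepts in place of a file path. The rows are reconstructed from the outputs printed in this notebook, which is an assumption about the file's exact contents. With default settings, empty fields are parsed as `NaN`.

```python
import io

import pandas as pd

# Assumption: these rows mirror data.csv as reconstructed from the printed
# outputs in this notebook; empty fields stand in for the missing values.
CSV_TEXT = """Region,Tenure,SpendScore,Subscribed
North,12,780,No
West,4,420,Yes
East,6,510,No
West,9,610,No
East,10,,Yes
North,7,560,Yes
West,,490,No
North,11,830,
East,15,910,No
,8,670,Yes
"""

# read_csv accepts any file-like object, so StringIO substitutes for a path.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)  # (10, 4)
print(df.head())
```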

In [4]:
print(df.head())

  Region  Tenure  SpendScore Subscribed
0  North    12.0       780.0         No
1   West     4.0       420.0        Yes
2   East     6.0       510.0         No
3   West     9.0       610.0         No
4   East    10.0         NaN        Yes
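In the output above, `Tenure` and `SpendScore` appear as `12.0` and `780.0` even though the values are whole numbers. NumPy's `NaN` is a floating-point value, so pandas stores a numeric column with any missing entry as `float64`. A short check on a few `Tenure` values shows this. It also shows pandas' nullable `"Int64"` dtype, which keeps integers and marks the gap as `<NA>`.

```python
import numpy as np
import pandas as pd

# A few Tenure values, first complete and then with one value missing.
complete = pd.Series([12, 4, 6])
with_gap = pd.Series([12, 4, np.nan])
# Nullable integer dtype: keeps whole numbers and marks the gap as <NA>.
nullable = pd.Series([12, 4, None], dtype="Int64")

print(complete.dtype)  # an integer dtype
print(with_gap.dtype)  # float64, because NaN is a float
print(nullable.dtype)  # Int64
```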


In [5]:
print("\nSummary Statistics for Numerical Columns:")
print(df[['Tenure', 'SpendScore']].describe())


Summary Statistics for Numerical Columns:
          Tenure  SpendScore
count   9.000000    9.000000
mean    9.111111  642.222222
std     3.333333  167.539382
min     4.000000  420.000000
25%     7.000000  510.000000
50%     9.000000  610.000000
75%    11.000000  780.000000
max    15.000000  910.000000
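Both columns report `count` 9.0 although the dataset has 10 rows. `describe()` skips missing values, so every statistic is computed over the nine observed values. For `Tenure` the mean is therefore 82 / 9 ≈ 9.111, and `std` is the sample standard deviation (`ddof=1`). The sketch below checks this on the `Tenure` values printed in this notebook.

```python
import numpy as np
import pandas as pd

# Tenure column as printed above; row 6 is missing.
tenure = pd.Series([12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8])

stats = tenure.describe()
observed = tenure.dropna()

print(len(tenure), stats["count"])                     # 10 9.0
print(stats["mean"], observed.sum() / len(observed))   # both 82 / 9
# describe() reports the sample standard deviation (ddof=1).
print(stats["std"], observed.std(ddof=1), observed.std(ddof=0))
```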


In [6]:
print("\nMissing Values Check (True = Missing, False = Present):")
print(df.isnull())


Missing Values Check (True = Missing, False = Present):
   Region  Tenure  SpendScore  Subscribed
0   False   False       False       False
1   False   False       False       False
2   False   False       False       False
3   False   False       False       False
4   False   False        True       False
5   False   False       False       False
6   False    True       False       False
7   False   False       False        True
8   False   False       False       False
9    True   False       False       False
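The full True/False grid gets hard to scan as the data grows. Reducing it row-wise with `any(axis=1)` produces a boolean mask, and indexing with that mask lists only the incomplete rows. The frame below is rebuilt from the values printed in this notebook, on the assumption that they match `data.csv`.

```python
import numpy as np
import pandas as pd

# Rebuilt from the notebook's printed outputs (assumed to match data.csv).
df = pd.DataFrame({
    "Region": ["North", "West", "East", "West", "East",
               "North", "West", "North", "East", np.nan],
    "Tenure": [12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8],
    "SpendScore": [780, 420, 510, 610, np.nan, 560, 490, 830, 910, 670],
    "Subscribed": ["No", "Yes", "No", "No", "Yes",
                   "Yes", "No", np.nan, "No", "Yes"],
})

# True for every row that has at least one missing cell.
has_gap = df.isnull().any(axis=1)
print(df[has_gap])  # rows 4, 6, 7 and 9
```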


In [8]:
print("\nCount of missing values per column:")
print(df.isnull().sum())


Count of missing values per column:
Region        1
Tenure        1
SpendScore    1
Subscribed    1
dtype: int64
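Raw counts are easier to compare across datasets of different sizes when expressed as proportions. The mean of a boolean column is the fraction of `True` values, so `isnull().mean()` gives the share of missing rows per column. Summing twice gives the total number of missing cells. The frame is rebuilt from this notebook's printed values, which is an assumption about `data.csv`.

```python
import numpy as np
import pandas as pd

# Rebuilt from the notebook's printed outputs (assumed to match data.csv).
df = pd.DataFrame({
    "Region": ["North", "West", "East", "West", "East",
               "North", "West", "North", "East", np.nan],
    "Tenure": [12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8],
    "SpendScore": [780, 420, 510, 610, np.nan, 560, 490, 830, 910, 670],
    "Subscribed": ["No", "Yes", "No", "No", "Yes",
                   "Yes", "No", np.nan, "No", "Yes"],
})

missing_share = df.isnull().mean()        # fraction of rows missing, per column
total_missing = int(df.isnull().sum().sum())

print(missing_share)   # 0.1 for every column
print(total_missing)   # 4
```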


# TASK 2: Handle Missing Values

In [7]:
df_drop = df.dropna()
print("Data Frame after dropping rows with missing values:")
print(df_drop)

Data Frame after dropping rows with missing values:
  Region  Tenure  SpendScore Subscribed
0  North    12.0       780.0         No
1   West     4.0       420.0        Yes
2   East     6.0       510.0         No
3   West     9.0       610.0         No
5  North     7.0       560.0        Yes
8   East    15.0       910.0         No
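By default `dropna()` discards a row if any column is missing. Here it removed 4 of the 10 rows (40%), even though each of those rows lacks only a single value. The `subset` parameter limits the check to the named columns. The sketch rebuilds the data from the printed outputs (an assumption about `data.csv`) and only requires the numeric columns to be present.

```python
import numpy as np
import pandas as pd

# Rebuilt from the notebook's printed outputs (assumed to match data.csv).
df = pd.DataFrame({
    "Region": ["North", "West", "East", "West", "East",
               "North", "West", "North", "East", np.nan],
    "Tenure": [12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8],
    "SpendScore": [780, 420, 510, 610, np.nan, 560, 490, 830, 910, 670],
    "Subscribed": ["No", "Yes", "No", "No", "Yes",
                   "Yes", "No", np.nan, "No", "Yes"],
})

# Default: drop a row if any column is missing.
all_cols = df.dropna()
# Only require the numeric columns to be present.
numeric_only = df.dropna(subset=["Tenure", "SpendScore"])

print(len(df), len(all_cols), len(numeric_only))  # 10 6 8
```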


In [8]:
df_filled = df.copy()
# Fill the numerical columns with 0
df_filled['Tenure'] = df_filled['Tenure'].fillna(0)
df_filled['SpendScore'] = df_filled['SpendScore'].fillna(0)
# Fill the categorical columns with 'Unknown'
df_filled['Region'] = df_filled['Region'].fillna('Unknown')
df_filled['Subscribed'] = df_filled['Subscribed'].fillna('Unknown')
print("Data Frame after filling missing values:")
print(df_filled)

Data Frame after filling missing values:
    Region  Tenure  SpendScore Subscribed
0    North    12.0       780.0         No
1     West     4.0       420.0        Yes
2     East     6.0       510.0         No
3     West     9.0       610.0         No
4     East    10.0         0.0        Yes
5    North     7.0       560.0        Yes
6     West     0.0       490.0         No
7    North    11.0       830.0    Unknown
8     East    15.0       910.0         No
9  Unknown     8.0       670.0        Yes
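The four per-column assignments above can also be written as a single `fillna` call that takes a dictionary mapping column names to fill values. `fillna` returns a new frame, so the source frame keeps its gaps. The data is rebuilt from this notebook's printed outputs, which is an assumption about `data.csv`.

```python
import numpy as np
import pandas as pd

# Rebuilt from the notebook's printed outputs (assumed to match data.csv).
df = pd.DataFrame({
    "Region": ["North", "West", "East", "West", "East",
               "North", "West", "North", "East", np.nan],
    "Tenure": [12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8],
    "SpendScore": [780, 420, 510, 610, np.nan, 560, 490, 830, 910, 670],
    "Subscribed": ["No", "Yes", "No", "No", "Yes",
                   "Yes", "No", np.nan, "No", "Yes"],
})

# One fill value per column, applied in a single call.
fill_values = {"Tenure": 0, "SpendScore": 0,
               "Region": "Unknown", "Subscribed": "Unknown"}
df_filled = df.fillna(value=fill_values)

print(df_filled)
print(int(df.isnull().sum().sum()))  # 4: the original frame is untouched
```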


In [9]:
print("Missing Values in df_filled (True = Missing):")
print(df_filled.isnull())

print("\nCount of missing values per column after filling:")
print(df_filled.isnull().sum())

Missing Values in df_filled (True = Missing):
   Region  Tenure  SpendScore  Subscribed
0   False   False       False       False
1   False   False       False       False
2   False   False       False       False
3   False   False       False       False
4   False   False       False       False
5   False   False       False       False
6   False   False       False       False
7   False   False       False       False
8   False   False       False       False
9   False   False       False       False

Count of missing values per column after filling:
Region        0
Tenure        0
SpendScore    0
Subscribed    0
dtype: int64
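Reading a grid of `False` values works for ten rows but not for larger data. A check that raises on failure turns the inspection into a guard that a pipeline can rely on. `assert_no_missing` below is a hypothetical helper written for this sketch, not a pandas API. The data is rebuilt from this notebook's printed outputs, which is an assumption about `data.csv`.

```python
import numpy as np
import pandas as pd


def assert_no_missing(frame: pd.DataFrame) -> None:
    """Raise ValueError listing any column that still has missing values.

    Hypothetical helper for illustration; not part of pandas.
    """
    counts = frame.isnull().sum()
    missing = counts[counts > 0]
    if not missing.empty:
        raise ValueError(f"missing values remain:\n{missing}")


# Rebuilt from the notebook's printed outputs (assumed to match data.csv).
df = pd.DataFrame({
    "Region": ["North", "West", "East", "West", "East",
               "North", "West", "North", "East", np.nan],
    "Tenure": [12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8],
    "SpendScore": [780, 420, 510, 610, np.nan, 560, 490, 830, 910, 670],
    "Subscribed": ["No", "Yes", "No", "No", "Yes",
                   "Yes", "No", np.nan, "No", "Yes"],
})
df_filled = df.fillna({"Tenure": 0, "SpendScore": 0,
                       "Region": "Unknown", "Subscribed": "Unknown"})

assert_no_missing(df_filled)  # passes silently

try:
    assert_no_missing(df)     # the unfilled frame still has gaps
    caught = None
except ValueError as err:
    caught = err
    print(err)
```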


In [10]:
df['Tenure'] = df['Tenure'].fillna(df['Tenure'].mean())
print("Data Frame after filling Tenure with mean:")
print(df)

Data Frame after filling Tenure with mean:
  Region     Tenure  SpendScore Subscribed
0  North  12.000000       780.0         No
1   West   4.000000       420.0        Yes
2   East   6.000000       510.0         No
3   West   9.000000       610.0         No
4   East  10.000000         NaN        Yes
5  North   7.000000       560.0        Yes
6   West   9.111111       490.0         No
7  North  11.000000       830.0        NaN
8   East  15.000000       910.0         No
9    NaN   8.000000       670.0        Yes
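Mean imputation leaves the column mean unchanged because the new value sits exactly on it. It also adds an observation with zero deviation, so the spread shrinks. The sample standard deviation of `Tenure` falls from 10/3 ≈ 3.333 to √800 / 9 ≈ 3.143. The sketch verifies this on the `Tenure` values printed above.

```python
import numpy as np
import pandas as pd

# Tenure column as loaded; row 6 is missing.
tenure = pd.Series([12, 4, 6, 9, 10, 7, np.nan, 11, 15, 8])
imputed = tenure.fillna(tenure.mean())

print(tenure.mean(), imputed.mean())  # both 82 / 9, about 9.111
print(tenure.std(), imputed.std())    # about 3.333 vs about 3.143
```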


In [11]:
df['SpendScore'] = df['SpendScore'].fillna(df['SpendScore'].median())
print("Data Frame after filling SpendScore with median:")
print(df)

Data Frame after filling SpendScore with median:
  Region     Tenure  SpendScore Subscribed
0  North  12.000000       780.0         No
1   West   4.000000       420.0        Yes
2   East   6.000000       510.0         No
3   West   9.000000       610.0         No
4   East  10.000000       610.0        Yes
5  North   7.000000       560.0        Yes
6   West   9.111111       490.0         No
7  North  11.000000       830.0        NaN
8   East  15.000000       910.0         No
9    NaN   8.000000       670.0        Yes
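The fill value 610 is the middle of the nine observed `SpendScore` values once they are sorted. The mean is higher, about 642.22, because the two largest values (830 and 910) pull it up. The median depends only on ordering, which makes it less sensitive to such extremes. The sketch checks both figures on the values printed above.

```python
import numpy as np
import pandas as pd

# SpendScore column as loaded; row 4 is missing.
spend = pd.Series([780, 420, 510, 610, np.nan, 560, 490, 830, 910, 670])

observed = spend.dropna().sort_values().tolist()
middle = observed[len(observed) // 2]  # 9 values, so index 4 is the middle

print(observed)
print(spend.median(), middle)  # 610.0 610.0
print(spend.mean())            # about 642.22
```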


In [12]:
df['Subscribed'] = df['Subscribed'].fillna(df['Subscribed'].mode()[0])
print("Data Frame after filling Subscribed with mode:")
print(df)

Data Frame after filling Subscribed with mode:
  Region     Tenure  SpendScore Subscribed
0  North  12.000000       780.0         No
1   West   4.000000       420.0        Yes
2   East   6.000000       510.0         No
3   West   9.000000       610.0         No
4   East  10.000000       610.0        Yes
5  North   7.000000       560.0        Yes
6   West   9.111111       490.0         No
7  North  11.000000       830.0         No
8   East  15.000000       910.0         No
9    NaN   8.000000       670.0        Yes
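`value_counts()` shows why row 7 became "No": the observed values contain five "No" and four "Yes". `Series.mode()` always returns a Series, with tied values in sorted order. As a result, `mode()[0]` silently picks the first value in sort order when there is a tie. The tie case below uses a hypothetical two-value series.

```python
import numpy as np
import pandas as pd

# Subscribed column as loaded; row 7 is missing.
subscribed = pd.Series(["No", "Yes", "No", "No", "Yes",
                        "Yes", "No", np.nan, "No", "Yes"])

counts = subscribed.value_counts()  # NaN is excluded by default
print(counts)                       # No: 5, Yes: 4
print(subscribed.mode()[0])         # No

# Hypothetical tie: mode() returns every tied value in sorted order,
# so mode()[0] picks the first one alphabetically.
tie = pd.Series(["Yes", "No"])
print(tie.mode().tolist())          # ['No', 'Yes']
```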


In [13]:
df['Region'] = df['Region'].ffill()
print("Data Frame after applying forward fill to Region:")
print(df)

Data Frame after applying forward fill to Region:
  Region     Tenure  SpendScore Subscribed
0  North  12.000000       780.0         No
1   West   4.000000       420.0        Yes
2   East   6.000000       510.0         No
3   West   9.000000       610.0         No
4   East  10.000000       610.0        Yes
5  North   7.000000       560.0        Yes
6   West   9.111111       490.0         No
7  North  11.000000       830.0         No
8   East  15.000000       910.0         No
9   East   8.000000       670.0        Yes
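Forward fill copies the last non-missing value downward. Row 9 inherited "East" only because row 8 happens to be East, so the result depends on row order rather than on any relationship between the rows. A gap in the first row has nothing above it and stays missing. Chaining `.bfill()` is one common way to cover that case. The four-value series below is hypothetical, chosen to show both effects.

```python
import numpy as np
import pandas as pd

# Hypothetical series: a gap at the start and one at the end.
region = pd.Series([np.nan, "North", "West", np.nan])

forward = region.ffill()       # the first value has nothing before it
both = region.ffill().bfill()  # backward fill then covers the leading gap

print(forward.tolist())  # [nan, 'North', 'West', 'West']
print(both.tolist())     # ['North', 'North', 'West', 'West']
```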
