<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Finding Missing Values**


Estimated time needed: **30** minutes


Data wrangling is the process of cleaning, transforming, and organizing data to make it suitable for analysis. Finding and handling missing values is a crucial step in this process to ensure data accuracy and completeness. In this lab, you will focus exclusively on identifying and handling missing values in the dataset.


## Objectives


After completing this lab, you will be able to:


- Identify missing values in the dataset.

- Quantify missing values for specific columns.

- Impute missing values using various strategies.


## Hands-on Lab


##### Setup: Install Required Libraries


In [None]:
!pip install pandas
!pip install matplotlib
!pip install seaborn



##### Import Necessary Modules:


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Tasks


<h2>1. Load the Dataset</h2>
<p>
We use the <code>pandas.read_csv()</code> function to read CSV files. In this version of the lab, which runs on JupyterLite, the dataset is read from the URL given in the code below.
</p>


The code below loads the dataset from its URL into a pandas DataFrame:



In [None]:
# Define the URL of the dataset
file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/n01PQ9pSmiRX6520flujwQ/survey-data.csv"

# Load the dataset into a DataFrame
df = pd.read_csv(file_path)

# Display the first few rows to ensure it loaded correctly
print(df.head())
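
As a side note on how missing values arise at load time, the sketch below uses a small, made-up CSV string (not the survey data) to show that `pd.read_csv` converts empty fields and common markers such as `NA` to `NaN` by default, and that extra markers can be supplied through `na_values`. The column values and the `"unknown"` marker are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical CSV text for illustration; not taken from the survey dataset
csv_text = "Respondent,Employment,Age\n1,Employed,34\n2,,41\n3,NA,\n4,unknown,29\n"

# Default parsing: empty fields and 'NA' become NaN
toy_default = pd.read_csv(io.StringIO(csv_text))

# Also treat the custom marker 'unknown' as missing (added to the defaults)
toy_custom = pd.read_csv(io.StringIO(csv_text), na_values=["unknown"])

print(toy_default.isnull().sum())
print(toy_custom.isnull().sum())
```

If a dataset encodes missing answers with its own placeholder text, passing that placeholder through `na_values` may be needed before the counts in the later tasks are meaningful.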


### 2. Explore the Dataset
##### Task 1: Display basic information and summary statistics of the dataset.


In [None]:
## Write your code here
df.info()  # column dtypes and non-null counts
df.describe(include='all')  # summary statistics for all columns

### 3. Finding Missing Values
##### Task 2: Identify missing values for all columns.


In [None]:
## Write your code here

# Boolean DataFrame: True where a cell is missing
missing_data = df.isnull()

# For each column, count missing (True) and present (False) cells
for column in missing_data.columns:
    print(column)
    print(missing_data[column].value_counts())
    print()
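
For a dataset with many columns, a more compact view may be easier to read than one `value_counts()` printout per column. The sketch below builds a per-column summary of missing counts and percentages with `isnull().sum()` and `isnull().mean()`. It runs on a small hypothetical DataFrame named `toy`, not on the survey data; the same calls could be applied to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Employment": ["Employed", np.nan, "Student", "Employed"],
    "Country": ["US", "IN", np.nan, np.nan],
    "Age": [34, 41, 29, 52],
})

# Count and percentage of missing cells per column
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_percent": (toy.isnull().mean() * 100).round(1),
})

# Show the columns with the most missing values first
missing_summary = missing_summary.sort_values("missing_count", ascending=False)
print(missing_summary)
```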

##### Task 3: Visualize missing values using a heatmap (Using seaborn library).



In [None]:
## Write your code here

plt.figure(figsize=(10, 6))

# Create a heatmap to visualize missing values
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)

# Add labels and title
plt.title('Missing Values Heatmap')
plt.xlabel('Columns')
plt.ylabel('Rows')

# Show the plot
plt.show()
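
When a dataset is wide, fully populated columns can crowd the heatmap. One possible refinement, sketched here on a hypothetical DataFrame named `toy`, is to keep only the columns that contain at least one missing value before plotting. The resulting `mask_subset` could then be passed to `sns.heatmap` in place of `df.isnull()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "Employment": ["Employed", np.nan, "Student"],
    "Age": [34, 41, 29],
    "Country": [np.nan, np.nan, "US"],
})

# Boolean mask of missing cells
mask = toy.isnull()

# Keep only the columns that have at least one missing value
mask_subset = mask.loc[:, mask.any()]
print(mask_subset)
```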


##### Task 4: Count the number of missing values in a specific column (e.g., `Employment`).


In [None]:
## Write your code here

missing_count_employment = df['Employment'].isnull().sum()

# Print the count of missing values
print('Number of missing values in the Employment column:', missing_count_employment)

### 4. Imputing Missing Values
##### Task 5: Identify the most frequent (majority) value in a specific column (e.g., `Employment`).


In [None]:
## Write your code here
most_frequent_value = df['Employment'].value_counts().idxmax()
print('Most frequent value in Employment:', most_frequent_value)
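
One detail worth checking is that `value_counts()` excludes missing values by default, so `idxmax()` returns the most frequent non-missing category even when `NaN` is the most common entry. The sketch below illustrates this on a hypothetical Series, and also shows `mode()` as an alternative way to obtain the same value.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which NaN occurs more often than any category
employment = pd.Series(
    ["Employed", np.nan, np.nan, np.nan, "Student", "Employed"],
    name="Employment",
)

# Default behaviour: NaN is ignored when counting
top_non_missing = employment.value_counts().idxmax()

# Include NaN in the counts to see how common it is
counts_with_nan = employment.value_counts(dropna=False)

# mode() also skips NaN by default; [0] takes the first mode
mode_value = employment.mode()[0]

print(top_non_missing)
print(counts_with_nan)
print(mode_value)
```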


##### Task 6: Impute missing values in the `Employment` column with the most frequent value.



In [None]:
## Write your code here

# Impute missing values in the 'Employment' column with the most frequent value
df['Employment'] = df['Employment'].fillna(most_frequent_value)
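
Most-frequent-value imputation suits categorical columns such as `Employment`. For numeric columns, a common alternative strategy is to fill with the median. The sketch below applies both to a hypothetical DataFrame named `toy`; the `Age` column and its values are assumptions made for illustration, not columns confirmed to be in the survey data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Employment": ["Employed", np.nan, "Student", "Employed"],
    "Age": [34.0, np.nan, 29.0, 52.0],
})

# Categorical column: fill with the most frequent value
toy["Employment"] = toy["Employment"].fillna(toy["Employment"].mode()[0])

# Numeric column: fill with the median, which is robust to outliers
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

print(toy)
```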

### 5. Visualizing Imputed Data
##### Task 7: Visualize the distribution of a column after imputation (e.g., `Employment`).


In [None]:
## Write your code here

df_employment = df['Employment'].value_counts()

df_employment.plot(kind='pie')
plt.title('Distribution of Employment Type After Imputation')
plt.show()
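
Because most-frequent-value imputation assigns every missing entry to a single category, it can inflate that category's share. A quick way to see the effect is to compare the category proportions before and after imputation, as in the sketch below on a hypothetical Series.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
before = pd.Series(
    ["Employed", np.nan, "Student", "Employed", np.nan],
    name="Employment",
)

# Impute with the most frequent non-missing value
after = before.fillna(before.value_counts().idxmax())

# Proportions of each category; 'before' is computed over non-missing rows only
comparison = pd.DataFrame({
    "before": before.value_counts(normalize=True),
    "after": after.value_counts(normalize=True),
}).round(3)

print(comparison)
```

A large shift between the two columns suggests that the imputed distribution should be interpreted with some caution.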

### Summary


In this lab, you:
- Loaded the dataset into a pandas DataFrame.
- Identified missing values across all columns.
- Quantified missing values in specific columns.
- Imputed missing values in a categorical column using the most frequent value.
- Visualized the imputed data for better understanding.
  


Copyright © IBM Corporation. All rights reserved.
