{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas Null-Value Handling Practice\n",
    "\n",
    "This notebook contains practice questions for handling null values in pandas.\n",
    "\n",
    "**Dataset:** sales_data.csv (Employee sales data with missing values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup: Import Libraries and Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Load the dataset\n",
    "df = pd.read_csv('sales_data.csv')\n",
    "print(\"Dataset loaded successfully!\")\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 1: Detecting Null Values\n",
    "\n",
    "**Task:** \n",
    "- Display the total number of null values in each column\n",
    "- Display the percentage of null values in each column\n",
    "- Which column has the most missing values?"
   ]
  },
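  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Worked example (toy data):** a minimal sketch of the counting pattern, assuming a small made-up DataFrame. The `toy` frame and its columns `a` and `b` are hypothetical and are not part of `sales_data.csv`. `isnull().sum()` counts nulls per column, and `isnull().mean() * 100` converts the same mask into a percentage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not the real dataset\n",
    "toy = pd.DataFrame({'a': [1.0, np.nan, 3.0, np.nan], 'b': ['x', None, 'z', 'w']})\n",
    "\n",
    "null_counts = toy.isnull().sum()       # number of nulls in each column\n",
    "null_pct = toy.isnull().mean() * 100   # share of nulls in each column, in percent\n",
    "\n",
    "print(null_counts)\n",
    "print(null_pct)\n",
    "print(null_counts.idxmax())            # label of the column with the most nulls"
   ]
  },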
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 2: Visualizing Missing Data\n",
    "\n",
    "**Task:** \n",
    "- Use `isnull()` to create a boolean DataFrame showing where nulls exist\n",
    "- Display rows that contain at least one null value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3: Drop Rows with Null Values\n",
    "\n",
    "**Task:** \n",
    "- Create a new DataFrame that drops all rows containing any null values\n",
    "- How many rows remain after dropping?\n",
    "- Is this a good approach for this dataset? Why or why not?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 4: Drop Columns with Many Nulls\n",
    "\n",
    "**Task:** \n",
    "- Drop columns that have more than 20% missing values\n",
    "- Which columns were dropped?"
   ]
  },
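  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Worked example (toy data):** one possible way to apply a missing-value threshold, sketched on a made-up DataFrame. The `toy` frame and its columns are hypothetical. The idea is to compute the null share per column, select the labels above the cutoff, and pass them to `drop(columns=...)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: 'b' is 50% null, 'c' is about 17% null\n",
    "toy = pd.DataFrame({\n",
    "    'a': [1, 2, 3, 4, 5, 6],\n",
    "    'b': [1.0, np.nan, np.nan, 4.0, np.nan, 6.0],\n",
    "    'c': [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],\n",
    "})\n",
    "\n",
    "null_share = toy.isnull().mean()                          # fraction of nulls per column\n",
    "to_drop = null_share[null_share > 0.20].index.tolist()    # columns above the 20% cutoff\n",
    "reduced = toy.drop(columns=to_drop)\n",
    "\n",
    "print(to_drop)\n",
    "print(reduced.columns.tolist())"
   ]
  },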
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 5: Fill Null Values with Mean/Median\n",
    "\n",
    "**Task:** \n",
    "- Fill null values in the 'salary' column with the mean salary\n",
    "- Fill null values in the 'age' column with the median age\n",
    "- Display the filled data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 6: Fill Categorical Nulls\n",
    "\n",
    "**Task:** \n",
    "- Fill null values in the 'department' column with 'Unknown'\n",
    "- Fill null values in the 'name' column with 'Anonymous'\n",
    "- Verify no nulls remain in these columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 7: Forward Fill and Backward Fill\n",
    "\n",
    "**Task:** \n",
    "- Sort the DataFrame by 'join_date'\n",
    "- Use forward fill (`ffill`) to fill null values in 'performance_score'\n",
    "- Then use backward fill (`bfill`) for any remaining nulls\n",
    "- Why might this approach work or not work for this data?"
   ]
  },
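  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Worked example (toy data):** a minimal sketch of sorting and then chaining `ffill` and `bfill`, assuming a made-up DataFrame with invented dates and scores. Forward fill cannot fill a null in the first row because there is no earlier value to carry forward, which is why the backward fill follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not the real dataset\n",
    "toy = pd.DataFrame({\n",
    "    'join_date': pd.to_datetime(['2021-03-01', '2020-01-15', '2022-07-30', '2020-06-10']),\n",
    "    'performance_score': [np.nan, np.nan, 4.0, 3.0],\n",
    "})\n",
    "\n",
    "# Sort chronologically so that 'previous' and 'next' rows are meaningful\n",
    "toy = toy.sort_values('join_date').reset_index(drop=True)\n",
    "\n",
    "# Carry the last known score forward, then fill any leading nulls backward\n",
    "toy['performance_score'] = toy['performance_score'].ffill().bfill()\n",
    "\n",
    "print(toy)"
   ]
  },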
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 8: Conditional Filling\n",
    "\n",
    "**Task:** \n",
    "- Fill null values in 'sales_amount' with a value that depends on the employee's department:\n",
    "  - Sales department: fill with the mean 'sales_amount' of the Sales department\n",
    "  - Other departments: fill with 0\n",
    "- How many values did you fill?"
   ]
  },
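  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Worked example (toy data):** one way to sketch group-dependent filling with a boolean mask and `.loc`, using a made-up DataFrame. The department label `'Sales'` is an assumption about how the real file spells it, so check `df['department'].unique()` before reusing the pattern. Counting nulls before filling records how many values were filled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not the real dataset\n",
    "toy = pd.DataFrame({\n",
    "    'department': ['Sales', 'Sales', 'Sales', 'HR', 'IT'],\n",
    "    'sales_amount': [100.0, 300.0, np.nan, np.nan, 50.0],\n",
    "})\n",
    "\n",
    "n_missing = toy['sales_amount'].isnull().sum()   # count nulls before filling\n",
    "\n",
    "is_sales = toy['department'] == 'Sales'\n",
    "sales_mean = toy.loc[is_sales, 'sales_amount'].mean()   # mean() skips nulls by default\n",
    "\n",
    "# Fill Sales rows with the Sales mean, then every remaining null with 0\n",
    "toy.loc[is_sales, 'sales_amount'] = toy.loc[is_sales, 'sales_amount'].fillna(sales_mean)\n",
    "toy['sales_amount'] = toy['sales_amount'].fillna(0)\n",
    "\n",
    "print(n_missing)\n",
    "print(toy)"
   ]
  },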
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 9: Interpolation\n",
    "\n",
    "**Task:** \n",
    "- Sort the data by 'employee_id'\n",
    "- Use interpolation to fill null values in numeric columns\n",
    "- Compare the results with mean filling"
   ]
  },
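  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Worked example (toy data):** a minimal sketch comparing linear interpolation with mean filling on a made-up DataFrame. Excluding `employee_id` from the numeric columns is a choice made here for illustration, since an identifier is not a measurement. Linear interpolation uses the neighbouring rows, so the result depends on the sort order, whereas the mean fill does not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not the real dataset\n",
    "toy = pd.DataFrame({\n",
    "    'employee_id': [3, 1, 4, 2],\n",
    "    'salary': [np.nan, 40000.0, 70000.0, 50000.0],\n",
    "})\n",
    "toy = toy.sort_values('employee_id').reset_index(drop=True)\n",
    "\n",
    "# Numeric columns, leaving out the identifier\n",
    "numeric_cols = toy.select_dtypes(include='number').columns.drop('employee_id')\n",
    "\n",
    "interpolated = toy.copy()\n",
    "interpolated[numeric_cols] = toy[numeric_cols].interpolate(method='linear')\n",
    "\n",
    "mean_filled = toy.copy()\n",
    "mean_filled[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())\n",
    "\n",
    "# Side-by-side view of the two strategies\n",
    "comparison = pd.DataFrame({\n",
    "    'employee_id': toy['employee_id'],\n",
    "    'original': toy['salary'],\n",
    "    'interpolated': interpolated['salary'],\n",
    "    'mean_filled': mean_filled['salary'],\n",
    "})\n",
    "print(comparison)"
   ]
  },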
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 10: Complete Data Cleaning Pipeline\n",
    "\n",
    "**Task:** \n",
    "Create a complete cleaning pipeline that:\n",
    "1. Loads the original data\n",
    "2. Fills null values appropriately for each column type\n",
    "3. Verifies no nulls remain\n",
    "4. Saves the cleaned data to 'sales_data_cleaned.csv'\n",
    "\n",
    "Choose the best strategy for each column based on what you've learned!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bonus Challenge\n",
    "\n",
    "**Task:** \n",
    "- Analyze which method of handling nulls (dropping vs. filling) is better for this dataset\n",
    "- Calculate summary statistics before and after handling nulls\n",
    "- Discuss potential biases introduced by different filling methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}