# Learning Pandas: DataFrame Operations

This notebook demonstrates how to handle missing data in Pandas DataFrames. Each section below explains a different operation or concept.

In [1]:
import pandas as pd
import numpy as np

## Import Libraries

Import Pandas and NumPy, which are essential for data manipulation and handling missing values.

In [2]:
np.nan

nan

## What is np.nan?

`np.nan` is NumPy's floating-point "Not a Number" value. Pandas uses it as the default marker for a missing value in numeric data.
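
Two properties of `np.nan` are worth checking directly, because they explain why missing values need dedicated helpers. This is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float, not a special object
nan_type = type(np.nan)
print(nan_type)           # <class 'float'>

# NaN never compares equal to anything, not even to itself
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)  # False

# So use pd.isna() rather than == to test for a missing value
nan_detected = pd.isna(np.nan)
print(nan_detected)       # True
```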

In [3]:
data = {'A': [1,2,np.nan,4,5],
        'B': [6,np.nan,7,8,9],
        'C': [11,12,13,np.nan,15]
        }
df = pd.DataFrame(data)
df

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 6.0 | 11.0 |
| 1 | 2.0 | NaN | 12.0 |
| 2 | NaN | 7.0 | 13.0 |
| 3 | 4.0 | 8.0 | NaN |
| 4 | 5.0 | 9.0 | 15.0 |


## Create DataFrame with Missing Values

This section creates a DataFrame containing missing values using `np.nan`.
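
One side effect is visible in the output: the integers are displayed as `1.0`, `2.0`, and so on. Because `np.nan` is a float, a default integer column that contains it is stored as `float64`. The sketch below illustrates this with two hypothetical Series.

```python
import numpy as np
import pandas as pd

with_nan = pd.Series([1, 2, np.nan])  # contains a missing value
without_nan = pd.Series([1, 2, 3])    # integers only

print(with_nan.dtype)     # float64
print(without_nan.dtype)  # int64
```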

In [4]:
df.isnull()

|   | A | B | C |
|---|---|---|---|
| 0 | False | False | False |
| 1 | False | True | False |
| 2 | True | False | False |
| 3 | False | False | True |
| 4 | False | False | False |


## Detect Missing Values

Use `isnull()` to check which values are missing in the DataFrame.
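
For reference, `isna()` is an alias of `isnull()`, and `notna()` returns the inverse mask. The sketch below uses a small hypothetical DataFrame (`df_demo`) and also shows one common way to select only the rows that contain a missing value.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"A": [1, np.nan], "B": [np.nan, 4]})

mask_isnull = df_demo.isnull()  # True where a value is missing
mask_isna = df_demo.isna()      # identical result, different name
mask_notna = df_demo.notna()    # True where a value is present

# Rows that have at least one missing value
rows_with_missing = df_demo[df_demo.isna().any(axis=1)]
print(rows_with_missing)
```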

In [5]:
df.isnull().sum()

A    1
B    1
C    1
dtype: int64

## Count Missing Values

Use `isnull().sum()` to count the number of missing values in each column.
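
The same boolean mask can give a grand total or a per-column fraction. This sketch rebuilds the DataFrame from the earlier cell under the hypothetical name `sample` so that it runs on its own.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                       'B': [6, np.nan, 7, 8, 9],
                       'C': [11, 12, 13, np.nan, 15]})

# Summing twice collapses the per-column counts into one total
total_missing = int(sample.isnull().sum().sum())
print(total_missing)  # 3

# The mean of a boolean mask is the fraction of missing values per column
missing_fraction = sample.isnull().mean()
print(missing_fraction)
```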

In [6]:
# Drop every row that contains at least one missing value
df = df.dropna()
# Equivalent in-place form: df.dropna(inplace=True)

## Drop Rows with Missing Values

Use `dropna()` to remove rows containing missing values from the DataFrame.
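
By default `dropna()` drops a row if any value in it is missing. The `how` and `subset` parameters loosen that rule. The sketch below uses a small hypothetical DataFrame to compare the three behaviours.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"A": [1, np.nan, np.nan],
                       "B": [4, 5, np.nan]})

drop_any = sample.dropna()                 # default how='any': keeps row 0
drop_all = sample.dropna(how="all")        # drops only fully empty rows: keeps rows 0 and 1
drop_subset = sample.dropna(subset=["B"])  # looks only at column B: keeps rows 0 and 1

print(list(drop_any.index))     # [0]
print(list(drop_all.index))     # [0, 1]
print(list(drop_subset.index))  # [0, 1]
```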

In [7]:
df

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 6.0 | 11.0 |
| 4 | 5.0 | 9.0 | 15.0 |


## View DataFrame After Dropping Rows

Display the DataFrame after removing rows with missing values.

In [8]:
df.reset_index(drop=True)

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 6.0 | 11.0 |
| 1 | 5.0 | 9.0 | 15.0 |


## Reset Index After Dropping Rows

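Use `reset_index(drop=True)` to renumber the rows from 0 after dropping rows. The method returns a new DataFrame and leaves `df` unchanged, so assign the result (`df = df.reset_index(drop=True)`) if you want to keep it.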
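
Dropping and renumbering are often chained into a single expression. The sketch below uses a hypothetical one-column DataFrame and also shows what happens without `drop=True`: the old index is kept as a regular column named `index`.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"A": [1, np.nan, 3]})

# Drop the incomplete row, then renumber from 0
cleaned = sample.dropna().reset_index(drop=True)
print(list(cleaned.index))  # [0, 1]

# Without drop=True the old labels survive as a column called 'index'
kept = sample.dropna().reset_index()
print(list(kept.columns))   # ['index', 'A']
print(list(kept["index"]))  # [0, 2]
```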

In [9]:
data1 = {'A': [1,2,3,4,5],
        'B': [6,np.nan,7,8,9],
        'C': [11,12,13,np.nan,15]
        }
df1 = pd.DataFrame(data1)
df1

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | NaN | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 3 | 4 | 8.0 | NaN |
| 4 | 5 | 9.0 | 15.0 |


## Create Another DataFrame with Missing Values

This section creates a new DataFrame with missing values for further operations.

In [10]:
df1 = df1.dropna(axis=1)
df1

|   | A |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |


## Drop Columns with Missing Values

Use `dropna(axis=1)` to remove columns containing missing values from the DataFrame.
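
Dropping every column that has any gap can discard a lot of data. As a hedged sketch, the hypothetical DataFrame below compares the strict default with a more lenient `thresh` rule applied to columns. `axis="columns"` is an equivalent spelling of `axis=1`.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"A": [1, 2, 3, 4],
                       "B": [1, np.nan, 3, 4],
                       "C": [np.nan, np.nan, np.nan, 4]})

# Strict: keep only columns with no missing values
only_complete = sample.dropna(axis="columns")
print(list(only_complete.columns))    # ['A']

# Lenient: keep columns that have at least 3 non-missing values
mostly_complete = sample.dropna(axis="columns", thresh=3)
print(list(mostly_complete.columns))  # ['A', 'B']
```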

In [11]:
data2 = {'A': [1,2,3,4,5],
        'B': [6,np.nan,7,np.nan,9],
        'C': [11,12,13,np.nan,15]
        }
df2 = pd.DataFrame(data2)
df2

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | NaN | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 3 | 4 | NaN | NaN |
| 4 | 5 | 9.0 | 15.0 |


## Create DataFrame for Threshold Example

This section creates a DataFrame to demonstrate dropping rows based on a threshold of non-missing values.

In [12]:
df2 = df2.dropna(thresh=2)
df2

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | NaN | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 4 | 5 | 9.0 | 15.0 |


## Drop Rows Based on Threshold

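Use `dropna(thresh=2)` to keep only rows with at least 2 non-missing values. Row 3 has a single non-missing value (`A`), so it is dropped, while row 1 keeps its one `NaN` because two of its values are present.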
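
To see what `thresh` counts, it can help to compute the number of non-missing values per row by hand. The sketch below rebuilds the DataFrame under the hypothetical name `sample` and checks that an explicit mask selects the same rows.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                       'B': [6, np.nan, 7, np.nan, 9],
                       'C': [11, 12, 13, np.nan, 15]})

# Count the non-missing values in each row
non_missing_per_row = sample.notna().sum(axis=1)
print(list(non_missing_per_row))   # [3, 2, 3, 1, 3]

# Keeping rows with a count of at least 2 matches dropna(thresh=2)
kept_by_mask = sample[non_missing_per_row >= 2]
kept_by_thresh = sample.dropna(thresh=2)
print(list(kept_by_thresh.index))  # [0, 1, 2, 4]
```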

In [13]:
df2 = df2.fillna(0)
df2

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | 0.0 | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 4 | 5 | 9.0 | 15.0 |


## Fill Missing Values with Zero

Use `fillna(0)` to replace all missing values in the DataFrame with zero.
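
A single fill value is not always appropriate for every column. `fillna()` also accepts a dictionary that maps column names to fill values. The sketch below uses a hypothetical DataFrame and arbitrary fill values chosen only for illustration.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({"B": [6, np.nan, 7],
                       "C": [11, 12, np.nan]})

# Fill column B with 0 and column C with -1
filled = sample.fillna({"B": 0, "C": -1})
print(filled)
```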

In [14]:
data3 = {'A': [1,2,3,4,5],
        'B': [6,np.nan,7,np.nan,9],
        'C': [11,12,13,np.nan,15]
        }
df3 = pd.DataFrame(data3)
df3

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | NaN | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 3 | 4 | NaN | NaN |
| 4 | 5 | 9.0 | 15.0 |


## Create DataFrame for Fill Methods

This section creates a DataFrame to demonstrate different methods for filling missing values.

## Fill Missing Values with Mean or Median

Use `fillna(df.mean())` or `fillna(df.median())` to replace missing values with the mean or median of each column.

In [15]:
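# Neither call modifies df3; each returns a new DataFrame.
# A notebook cell displays only its last expression, so the output below is the median fill.
df3.fillna(df3.mean())
df3.fillna(df3.median())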

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | 7.0 | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 3 | 4 | 7.0 | 12.5 |
| 4 | 5 | 9.0 | 15.0 |
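
Mean and median fills give different results on this data, so it can be useful to keep both and compare them. The sketch below rebuilds the DataFrame under the hypothetical name `sample` and stores each result in its own variable.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                       'B': [6, np.nan, 7, np.nan, 9],
                       'C': [11, 12, 13, np.nan, 15]})

mean_filled = sample.fillna(sample.mean())      # per-column mean
median_filled = sample.fillna(sample.median())  # per-column median

# Column C: mean of 11, 12, 13, 15 is 12.75; median is 12.5
print(mean_filled.loc[3, "C"])    # 12.75
print(median_filled.loc[3, "C"])  # 12.5
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for skewed columns.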


## Fill Missing Values with Forward/Backward Fill

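Use `ffill()` to propagate the last valid value forward and `bfill()` to propagate the next valid value backward. The older spelling, `fillna(method='ffill')` and `fillna(method='bfill')`, is deprecated since Pandas 2.1 and emits a `FutureWarning`.

In [16]:
# Only the last expression is displayed, so the output below is the backward fill.
df3.ffill()
df3.bfill()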

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 6.0 | 11.0 |
| 1 | 2 | 7.0 | 12.0 |
| 2 | 3 | 7.0 | 13.0 |
| 3 | 4 | 9.0 | 15.0 |
| 4 | 5 | 9.0 | 15.0 |
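
The two directions behave differently at the edges of the data. A forward fill cannot fill a leading gap, because there is no earlier value to copy. The sketch below uses a hypothetical Series and also shows the `limit` parameter, which caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

forward = s.ffill()         # NaN, 1.0, 1.0, 1.0, 4.0 (leading gap stays)
backward = s.bfill()        # 1.0, 1.0, 4.0, 4.0, 4.0
limited = s.ffill(limit=1)  # NaN, 1.0, 1.0, NaN, 4.0 (one fill per gap)

print(forward.tolist())
print(backward.tolist())
print(limited.tolist())
```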
