## `dropna()` Method


In Pandas, the `dropna()` method removes rows or columns that contain missing values. Pandas treats `NaN` (Not a Number) as missing, along with `None` and `NaT` (the missing-value marker for datetimes). The method's parameters control which axis is checked, how many missing values trigger a drop, and which labels are inspected.

**Syntax**:
```python
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```

**Parameters**:
- `axis`: Specifies the axis along which the `dropna()` method will look for missing values. Use `axis=0` to drop rows containing missing values (default), and `axis=1` to drop columns with missing values.
- `how`: Determines the criteria for dropping rows or columns. Options are:
   - `'any'`: Drop if any of the values in the row or column are missing (default).
   - `'all'`: Drop only if all values in the row or column are missing.
- `thresh`: Specifies the minimum number of non-missing values required to keep the row or column. Rows or columns with fewer non-missing values than the threshold will be dropped.
- `subset`: Labels along the other axis to check for missing values: column names when dropping rows (`axis=0`), or row labels when dropping columns (`axis=1`). Labels that are not listed are ignored.
- `inplace`: If True, the DataFrame will be modified in place, and the method will return `None`. If False (default), the method will return a new DataFrame with missing values dropped.
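
Which values count as "missing"? A minimal sketch, assuming a recent pandas and NumPy: `None`, `np.nan`, and `pd.NaT` are all detected by `isna()` and therefore removed by `dropna()`, while an empty string is an ordinary value. The column names `num`, `obj`, and `ts` are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative frame: row 1 holds a different kind of missing value in each column
df = pd.DataFrame({
    "num": [1.0, np.nan, 3.0],                                 # np.nan in a float column
    "obj": ["", None, "z"],                                    # None in an object column; "" is NOT missing
    "ts": pd.to_datetime(["2023-07-01", None, "2023-07-03"]),  # None becomes NaT
})

print(df.isna())       # True exactly where pandas sees a missing value
cleaned = df.dropna()  # default: drop any row with at least one missing value
print(cleaned)         # rows 0 and 2 survive; the empty string in row 0 is kept
```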

**Examples**:
Assuming we have the following DataFrame with missing values (NaN):

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, 7, 8],
    'C': [9, 10, 11, None]
}

df = pd.DataFrame(data)
```

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```

**1. Drop Rows with Any Missing Values (Default)**:

```python
df_dropped = df.dropna()
```

Output:

```
     A    B    C
0  1.0  5.0  9.0
```

**2. Drop Columns with Any Missing Values**:

```python
df_dropped = df.dropna(axis=1)
```

Output:

```
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
```
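
When `axis=1`, `subset` refers to row labels rather than column names, so only the listed rows are inspected. The sketch below rebuilds this section's `df` and also shows that `axis="columns"` is accepted as an alias for `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, None],
})

# axis="columns" means the same as axis=1: every column has a NaN, so all are dropped
all_dropped = df.dropna(axis="columns")
print(all_dropped.shape)  # (4, 0): the rows remain, the columns are gone

# With axis=1, subset lists row labels: only row 1 is checked,
# so only column "B" (missing in row 1) is dropped
cols_kept = df.dropna(axis=1, subset=[1])
print(cols_kept)
```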

**3. Drop Rows with All Missing Values**:

```python
df_dropped = df.dropna(how='all')
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```
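
No row in `df` is entirely empty, so `how='all'` changes nothing here. A small sketch with hypothetical data containing one fully missing row makes the difference from `how='any'` visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely missing, row 2 only partly
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [4.0, np.nan, np.nan],
})

only_all = df.dropna(how="all")  # drops row 1 only
any_row = df.dropna(how="any")   # drops rows 1 and 2
print(only_all)
print(any_row)
```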

**4. Keep Only Rows with at Least 2 Non-Missing Values**:

```python
df_dropped = df.dropna(thresh=2)
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```
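
Every row in `df` has at least two non-missing values, so `thresh=2` keeps them all. As a sketch, the same rule can be written by hand by counting non-missing values per row with `notna().sum(axis=1)`; raising the threshold to 3 (an illustrative value) keeps only the complete row.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, None],
})

k = 3  # illustrative threshold: require all three values to be present
by_thresh = df.dropna(thresh=k)

# Equivalent hand-written filter: keep rows with at least k non-missing values
by_mask = df[df.notna().sum(axis=1) >= k]
print(by_thresh)
```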

**5. Drop Rows Where Column 'B' Is Missing**:

```python
df_dropped = df.dropna(subset=['B'])
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```
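
Dropping on a single-column `subset` selects the same rows as filtering with that column's `notna()` mask. A minimal sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, None],
})

via_dropna = df.dropna(subset=["B"])
# Same rows selected with a boolean mask on column B
via_mask = df[df["B"].notna()]
print(via_dropna)
```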

The `dropna()` method is useful for cleaning and preparing data before analysis, as it allows you to remove or handle missing values in a flexible way.
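
One detail worth knowing: `dropna()` keeps the original index labels, so the `subset=['B']` result above is labelled 0, 2, 3. If a consecutive index is needed afterwards, one common approach is to chain `reset_index(drop=True)`, sketched here:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, None],
})

# The filtered frame keeps the labels of the surviving rows: 0, 2, 3
kept = df.dropna(subset=["B"])
print(kept.index.tolist())

# drop=True discards the old labels instead of inserting them as a column
renumbered = kept.reset_index(drop=True)
print(renumbered.index.tolist())
```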

```python
import pandas as pd

# Create a DataFrame with some missing values
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, 7, 8],
    'C': [9, 10, 11, None]
}
df = pd.DataFrame(data)
df
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```


```python
# 1. Drop rows with any missing values (default)
df_dropped = df.dropna()
df_dropped
```

Output:

```
     A    B    C
0  1.0  5.0  9.0
```


```python
# 2. Drop columns with any missing values
df_dropped = df.dropna(axis=1)
df_dropped
```

Output:

```
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
```

```python
# 3. Drop rows in which all values are missing
df_dropped = df.dropna(how='all')
df_dropped
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```

No row has all of its values missing, so `how='all'` returns the DataFrame unchanged.

```python
# 5. Drop rows where column 'B' is missing
df_dropped = df.dropna(subset=['B'])
df_dropped
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```


### Let's continue with more examples of using the `dropna()` method in Pandas:

**Example 6: Drop Columns with All Missing Values**
```python
import pandas as pd

data = {
    'A': [None, None, None, None],
    'B': [5, 6, None, None],
    'C': [None, None, None, None]
}

df = pd.DataFrame(data)

# Drop columns with all missing values
df_dropped = df.dropna(axis=1, how='all')
```

Output:
```
     B
0  5.0
1  6.0
2  NaN
3  NaN
```
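
A sketch of the same column filter written with a boolean mask: `df.notna().any()` is `True` for every column that has at least one present value, which is exactly what `how='all'` keeps. The data are those of Example 6.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [None, None, None, None],
    "B": [5, 6, None, None],
    "C": [None, None, None, None],
})

via_dropna = df.dropna(axis=1, how="all")
# Keep only columns with at least one non-missing value
via_mask = df.loc[:, df.notna().any()]
print(via_mask)
```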

**Example 7: Drop Rows with Missing Values in Specific Columns**
```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, 7, 8],
    'C': [9, 10, 11, None]
}

df = pd.DataFrame(data)

# Drop rows with missing values in columns 'A' and 'B'
df_dropped = df.dropna(subset=['A', 'B'])
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
3  4.0  8.0  NaN
```
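
With the default `how='any'`, a multi-column `subset` keeps a row only when every listed column is present. A minimal sketch of that rule as an explicit mask, using Example 7's data:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, None],
})

via_dropna = df.dropna(subset=["A", "B"])
# A row survives only if both A and B are present; C is ignored
via_mask = df[df[["A", "B"]].notna().all(axis=1)]
print(via_mask)
```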

**Example 8: Drop Rows with All Missing Values in Specific Columns**
```python
import pandas as pd

data = {
    'A': [1, 2, None, None],
    'B': [5, None, None, None],
    'C': [9, None, None, None]
}

df = pd.DataFrame(data)

# Drop rows with all missing values in columns 'A' and 'B'
df_dropped = df.dropna(subset=['A', 'B'], how='all')
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
```
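
Combining `subset` with `how='all'` flips the rule: a row is dropped only when every listed column is missing, so it survives if at least one of them is present. A sketch of the equivalent mask, using Example 8's data:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, None],
    "B": [5, None, None, None],
    "C": [9, None, None, None],
})

via_dropna = df.dropna(subset=["A", "B"], how="all")
# Keep rows where at least one of A or B is present
via_mask = df[df[["A", "B"]].notna().any(axis=1)]
print(via_mask)
```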

**Example 9: Modify DataFrame Inplace**
```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, 7, 8],
    'C': [9, 10, 11, None]
}

df = pd.DataFrame(data)

# Drop rows with any missing values and modify DataFrame inplace
df.dropna(inplace=True)
```

Contents of `df` afterwards (the call itself returns `None`):
```
     A    B    C
0  1.0  5.0  9.0
```
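
Because `inplace=True` makes `dropna()` return `None`, assigning its result (for example `df = df.dropna(inplace=True)`) silently replaces the DataFrame with `None`. A minimal sketch contrasting that behavior with the default `inplace=False` plus reassignment; the names `cleaned`, `scratch`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [5, None, 7, 8],
    "C": [9, 10, 11, None],
})

# Default inplace=False: df is untouched and a new DataFrame is returned
cleaned = df.dropna()

# inplace=True mutates the frame it is called on and returns None
scratch = df.copy()
result = scratch.dropna(inplace=True)

print(result)        # None
print(len(df))       # 4: the original still has every row
print(len(scratch))  # 1: only row 0 survived
```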

**Example 10: Handling Missing Values in Time Series Data**
```python
import pandas as pd

data = {
    'Date': ['2023-07-01', '2023-07-02', '2023-07-03'],
    'Value': [10, None, 30]
}

df = pd.DataFrame(data)

# Convert 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Set 'Date' column as the index
df.set_index('Date', inplace=True)

# Drop rows with missing values
df.dropna(inplace=True)
```

Output:
```
            Value
Date
2023-07-01   10.0
2023-07-03   30.0
```
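
When only one column matters, `Series.dropna()` behaves the same way on a single series. A sketch rebuilding the `Value` column of Example 10 as a Series with a `DatetimeIndex`:

```python
import pandas as pd

dates = pd.to_datetime(["2023-07-01", "2023-07-02", "2023-07-03"])
values = pd.Series([10, None, 30], index=dates, name="Value")  # None becomes NaN

clean = values.dropna()  # drops the 2023-07-02 entry
print(clean)
```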

These examples demonstrate different scenarios where the `dropna()` method is useful for handling missing values in a DataFrame. Depending on the specific data cleaning and analysis tasks, you can apply different parameters and options to effectively handle missing data.

```python
# Example 6: Drop columns with all missing values
data = {
    'A': [None, None, None, None],
    'B': [5, 6, None, None],
    'C': [None, None, None, None]
}

df = pd.DataFrame(data)
df
```

Output:

```
      A    B     C
0  None  5.0  None
1  None  6.0  None
2  None  NaN  None
3  None  NaN  None
```


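```python
# Drop rows with all missing values in columns 'A' and 'B'
df_dropped = df.dropna(subset=['A', 'B'], how='all')
df_dropped
```

Output:

```
      A    B     C
0  None  5.0  None
1  None  6.0  None
```

This cell applies the row-wise `subset`/`how='all'` call from Example 8 to Example 6's data; Example 6 itself uses `df.dropna(axis=1, how='all')`, which keeps only column `B`.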


```python
# Example 7: Drop rows with missing values in specific columns
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, 7, 8],
    'C': [9, 10, 11, None]
}

df = pd.DataFrame(data)
df
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```


```python
# Drop rows with missing values in columns 'A' and 'B'
df_dropped = df.dropna(subset=['A', 'B'])
df_dropped
```

Output:

```
     A    B    C
0  1.0  5.0  9.0
3  4.0  8.0  NaN
```


```python
# Example 8: Drop rows with all missing values in specific columns
data = {
    'A': [1, 2, None, None],
    'B': [5, None, None, None],
    'C': [9, None, None, None]
}

df = pd.DataFrame(data)
df
```

Output:

```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
```


```python
# Drop rows with all missing values in columns 'A' and 'B'
df_dropped = df.dropna(subset=['A', 'B'], how='all')
df_dropped
```

Output:

```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
```


```python
# Example 9: Modify the DataFrame in place
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, 7, 8],
    'C': [9, 10, 11, None]
}

df = pd.DataFrame(data)
df
```

Output:

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
```


```python
# Drop rows with any missing values and modify the DataFrame in place
df.dropna(inplace=True)  # returns None; df itself is changed
df
```

Output:

```
     A    B    C
0  1.0  5.0  9.0
```


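Evaluating `df.dropna` without parentheses does not call the method; it only displays the bound method object:

```python
df.dropna
```

Output:

```
<bound method DataFrame.dropna of      A    B    C
0  1.0  5.0  9.0>
```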

```python
# Example 10: Handling missing values in time series data
data = {
    'Date': ['2023-07-01', '2023-07-02', '2023-07-03'],
    'Value': [10, None, 30]
}

df = pd.DataFrame(data)
df
```

Output:

```
         Date  Value
0  2023-07-01   10.0
1  2023-07-02    NaN
2  2023-07-03   30.0
```


```python
# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Set the 'Date' column as the index
df.set_index('Date', inplace=True)

# Display the DataFrame
df
```

Output:

```
            Value
Date
2023-07-01   10.0
2023-07-02    NaN
2023-07-03   30.0
```
