# The dropna function
In the previous lesson, we learned how to check the number of missing values in a column or row. The next step is to handle them. We essentially have two options: drop the rows or columns that contain missing values, or replace the missing values. This lesson covers the first option, which is what the dropna function does. To use this function accurately and efficiently, we first need to learn its parameters.

The axis parameter determines whether rows or columns with missing values are removed. The default value is 0, which indicates rows; a value of 1 indicates columns. The how parameter takes one of two values: any or all. The default value is any, which drops a row or column with at least one missing value. If all is selected, all values must be missing for a row or column to be dropped. Let’s look at some examples to understand how these parameters are used. We’ll create the DataFrame in the image below for the following examples.



![image.png](attachment:fa00c394-7de3-4bd7-8f92-4908a01079c8.png)

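The following code block creates the DataFrame and drops the rows that have at least one missing value. The code block after the DataFrame display does the same, except that it sets the axis parameter to 1 and therefore drops the columns that have at least one missing value.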



In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, np.nan, 7],
    "B": [2.4, np.nan, 5.1, np.nan, 2.6],
    "C": [np.nan, "foo","zoo","bar", np.nan],
    "D": [11.5, np.nan, 6.2, 21.1, 8.7],
    "E": [1, 2, 3, 4, 5]
})

# Drop rows that have at least one missing value
print(df.dropna(axis=0, how="any"))

     A    B    C    D  E
2  3.0  5.1  zoo  6.2  3


In [2]:
df

     A    B    C     D  E
0  1.0  2.4  NaN  11.5  1
1  2.0  NaN  foo   NaN  2
2  3.0  5.1  zoo   6.2  3
3  NaN  NaN  bar  21.1  4
4  7.0  2.6  NaN   8.7  5


In [3]:
print(df.dropna(axis=1, how="any"))

   E
0  1
1  2
2  3
3  4
4  5


In [5]:
print(df.dropna(axis=0, how="all"))

     A    B    C     D  E
0  1.0  2.4  NaN  11.5  1
1  2.0  NaN  foo   NaN  2
2  3.0  5.1  zoo   6.2  3
3  NaN  NaN  bar  21.1  4
4  7.0  2.6  NaN   8.7  5
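
To see the how parameter set to all actually remove something, here is a minimal sketch that uses a separate, hypothetical DataFrame (named demo here; it is not part of the lesson's data) in which one row is entirely missing. It also runs the default any on the same data for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame (not the lesson's df): row 1 is entirely missing
demo = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, np.nan, 6.0],
})

# how="all" drops only the rows in which every value is missing,
# so rows 0 and 2 remain
all_dropped = demo.dropna(axis=0, how="all")
print(all_dropped)

# how="any" also drops row 0, which has a single missing value,
# so only row 2 remains
any_dropped = demo.dropna(axis=0, how="any")
print(any_dropped)
```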


The DataFrame df doesn’t have a row or column full of missing values, so setting the how parameter to all won’t drop any row or column.

Another important parameter of the dropna function is thresh, which sets the minimum number of non-missing values that a row or column must have to be kept. For instance, if we set the thresh parameter to 4, a row must have at least four non-missing values to not be dropped. Because there are five columns in the DataFrame, this means that rows with two or more missing values will be dropped.

In [6]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, np.nan, 7],
    "B": [2.4, np.nan, 5.1, np.nan, 2.6],
    "C": [np.nan, "foo","zoo","bar", np.nan],
    "D": [11.5, np.nan, 6.2, 21.1, 8.7],
    "E": [1, 2, 3, 4, 5]
})

# Drop rows that have fewer than 4 non-missing values
print(df.dropna(thresh=4))

     A    B    C     D  E
0  1.0  2.4  NaN  11.5  1
2  3.0  5.1  zoo   6.2  3
4  7.0  2.6  NaN   8.7  5


As we see in the output, the number of non-missing values in all remaining rows is at least 4, which is our threshold. In the last example, we didn’t use the axis parameter, because we’re concerned with rows and the default value of the axis parameter is 0, which indicates rows.
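
The thresh parameter can also be combined with the axis parameter to apply the threshold to columns instead of rows. The following is a small sketch on the same DataFrame; the variable name cols_kept is chosen here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, np.nan, 7],
    "B": [2.4, np.nan, 5.1, np.nan, 2.6],
    "C": [np.nan, "foo", "zoo", "bar", np.nan],
    "D": [11.5, np.nan, 6.2, 21.1, 8.7],
    "E": [1, 2, 3, 4, 5]
})

# Count the non-missing values in each column (A: 4, B: 3, C: 3, D: 4, E: 5)
print(df.notna().sum())

# Keep only the columns that have at least 4 non-missing values;
# B and C have 3 each, so they are dropped
cols_kept = df.dropna(axis=1, thresh=4)
print(cols_kept.columns.tolist())
```

Because the threshold now counts values down each column, it is compared against the number of rows (five here) rather than the number of columns.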

# The inplace parameter
It’s important to note that dropna returns a new DataFrame and leaves the original unchanged unless we set the inplace parameter to True. For example, df.dropna(thresh=4) returns a DataFrame after dropping the rows that have fewer than four non-missing values. However, it doesn’t modify df. To keep the result, we can either assign the returned DataFrame to a variable or use the inplace parameter to modify df directly. Let’s demonstrate the use of the inplace parameter with a couple of examples.

In [7]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, np.nan, 7],
    "B": [2.4, np.nan, 5.1, np.nan, 2.6],
    "C": [np.nan, "foo","zoo","bar", np.nan],
    "D": [11.5, np.nan, 6.2, 21.1, 8.7],
    "E": [1, 2, 3, 4, 5]
})

# Drop rows that have fewer than 4 non-missing values
df.dropna(thresh=4)

print(df)

     A    B    C     D  E
0  1.0  2.4  NaN  11.5  1
1  2.0  NaN  foo   NaN  2
2  3.0  5.1  zoo   6.2  3
3  NaN  NaN  bar  21.1  4
4  7.0  2.6  NaN   8.7  5


As we see in the output, df isn’t modified when the inplace parameter isn’t used. Let’s run the same example again, this time setting the inplace parameter to True.



In [8]:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, np.nan, 7],
    "B": [2.4, np.nan, 5.1, np.nan, 2.6],
    "C": [np.nan, "foo","zoo","bar", np.nan],
    "D": [11.5, np.nan, 6.2, 21.1, 8.7],
    "E": [1, 2, 3, 4, 5]
})

# Drop rows that have fewer than 4 non-missing values
df.dropna(thresh=4, inplace=True)

print(df)

     A    B    C     D  E
0  1.0  2.4  NaN  11.5  1
2  3.0  5.1  zoo   6.2  3
4  7.0  2.6  NaN   8.7  5


We now see that the changes are saved and df has been modified. The inplace parameter is worth remembering because several other pandas functions accept it as well.
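
The other option mentioned earlier, assigning the returned DataFrame to a variable, can be sketched as follows. The names df_clean and result are chosen here for illustration. The sketch also shows that a call with inplace set to True returns None, so its result should not be assigned back to df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, np.nan, 7],
    "B": [2.4, np.nan, 5.1, np.nan, 2.6],
    "C": [np.nan, "foo", "zoo", "bar", np.nan],
    "D": [11.5, np.nan, 6.2, 21.1, 8.7],
    "E": [1, 2, 3, 4, 5]
})

# Option 1: assign the result to a new variable; df itself is untouched
df_clean = df.dropna(thresh=4)
print(len(df), len(df_clean))

# Option 2: with inplace=True, dropna modifies df and returns None
result = df.dropna(thresh=4, inplace=True)
print(result is None)
print(df.index.tolist())
```

Assigning to a new variable keeps the original data available for comparison, while inplace=True overwrites it.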


