isnull and notnull return Boolean values indicating whether each value is missing:

In [3]:
import pandas as pd
# Create URL
url = 'https://raw.githubusercontent.com/chrisalbon/sim_data/master/titanic.csv'
# Load data
dataframe = pd.read_csv(url)
# Select rows where Age is missing, show the first five
dataframe[dataframe['Age'].isnull()].head()

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
12,"Aubert, Mrs Leontine Pauline",1st,,female,1,1
13,"Barkworth, Mr Algernon H",1st,,male,1,0
14,"Baumann, Mr John D",1st,,male,0,0
29,"Borebank, Mr John James",1st,,male,0,0
32,"Bradley, Mr George",1st,,male,1,0


In [6]:
dataframe.isnull().head()

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,False,False,False,False,False,False
1,False,False,False,False,False,False
2,False,False,False,True,False,False
3,False,False,False,False,False,False
4,False,False,False,True,False,False
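
The cells above only exercise isnull. As a minimal sketch of its complement, notnull, the following uses a small hypothetical DataFrame (the `toy` frame and its values are invented for illustration and are not the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Name and Age columns
toy = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Age': [29.0, np.nan, 35.0]})

# notnull is the complement of isnull: keep rows where Age is present
present = toy[toy['Age'].notnull()]

# Summing the Boolean mask counts the missing values in each column
missing_counts = toy.isnull().sum()
```

Here `present` keeps the two rows with a recorded Age, and `missing_counts` reports one missing value for Age and none for Name.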


pandas uses NumPy’s NaN (“Not a Number”) value to denote missing values, but it is important to note that NaN is not fully implemented natively in pandas: the bare name NaN is not defined by pandas or by Python. For example, if we try to replace every value equal to male with a bare NaN, we get an error:


In [4]:
# Attempt to replace values with NaN
dataframe['Sex'] = dataframe['Sex'].replace('male', NaN)

NameError: name 'NaN' is not defined

In [5]:
# Load NumPy, which provides the NaN value as np.nan
import numpy as np
# Replace values with NaN
dataframe['Sex'] = dataframe['Sex'].replace('male', np.nan)
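
One consequence of using NaN as the missing-value marker is that NaN never compares equal to anything, including itself, which is why missing values are located with isnull rather than `==`. A minimal sketch, using a hypothetical three-element Series in place of the Sex column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Sex column
sex = pd.Series(['male', 'female', 'male'])

# Replace values equal to 'male' with NaN, as in the cell above
replaced = sex.replace('male', np.nan)

# NaN is not equal to itself, so an equality test cannot find it
nan_equals_itself = (np.nan == np.nan)

# isnull is the reliable way to locate the missing entries
missing_mask = replaced.isnull()
```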

Oftentimes a dataset uses a specific value to denote a missing observation, such
as NONE, -999, or a single period (.). pandas’ read_csv includes the na_values
parameter, which lets us specify the values used to indicate missing values:


# Load data, set missing values
dataframe = pd.read_csv(url, na_values=[np.nan, 'NONE', -999])
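
To see the effect of na_values without downloading anything, here is a minimal sketch that reads a small in-memory CSV. The column names and rows are hypothetical, chosen only so that each sentinel (NONE, -999, and a period) appears once:

```python
import io
import pandas as pd

# Hypothetical CSV text; NONE, -999 and '.' all stand for "missing"
csv_text = "Name,Age,Cabin\nA,29,NONE\nB,-999,C85\nC,35,.\n"

# Treat the three sentinels as missing while parsing
df = pd.read_csv(io.StringIO(csv_text), na_values=['NONE', -999, '.'])

# Boolean masks showing which entries were parsed as missing
age_missing = df['Age'].isnull()
cabin_missing = df['Cabin'].isnull()
```

After loading, the -999 in Age and both the NONE and the period in Cabin are treated as missing, so isnull flags them like any other NaN.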