### Find missing values
In data science, it is common to encounter datasets with missing values. This is especially true of time series data, where a value can be missing because a measurement failed to record at a specific timestamp. To count the missing values in each column of a DataFrame called df that contains time series data, you can use the command:

missing_values = df.isnull().sum()
In this exercise, you will learn how to find whether your data contains any missing values.
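As a minimal sketch, the snippet below builds a small toy DataFrame with two unrecorded measurements and applies the command above. The dates, the co2 column name and the values are made up for illustration. It also adds a single yes/no check, which can be handy before deciding whether any imputation is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series with two unrecorded measurements (NaN); toy values only
index = pd.date_range("2000-01-01", periods=6, freq="7D")
df = pd.DataFrame({"co2": [1.0, 2.0, np.nan, 4.0, np.nan, 6.0]}, index=index)

# Count missing values per column (True counts as 1 when summed)
missing_values = df.isnull().sum()
print(missing_values)

# Single yes/no answer: is any value in the DataFrame missing?
has_missing = df.isnull().values.any()
print(has_missing)
```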

In [4]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt 

In [6]:
# Read in the file content in a DataFrame called co2_levels
co2_levels = pd.read_csv('ch2_co2_levels.csv')

# Display the first five lines of the DataFrame
print(co2_levels.head())

    datestamp    co2
0  1958-03-29  316.1
1  1958-04-05  317.3
2  1958-04-12  317.6
3  1958-04-19  317.5
4  1958-04-26  316.4


In [7]:
# Display first seven rows of co2_levels
print(co2_levels.head(7))

    datestamp    co2
0  1958-03-29  316.1
1  1958-04-05  317.3
2  1958-04-12  317.6
3  1958-04-19  317.5
4  1958-04-26  316.4
5  1958-05-03  316.9
6  1958-05-10    NaN


In [8]:
# Set datestamp column as index
co2_levels = co2_levels.set_index('datestamp')

# Print out the number of missing values
print(co2_levels.isnull().sum())

co2    59
dtype: int64
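The count says how many values are missing but not where they are. One way to locate them, sketched here on a hypothetical toy DataFrame rather than the real co2_levels file, is to reuse the isnull() result as a boolean mask so that only the timestamps with a missing measurement are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, indexed by datestamp strings like co2_levels after set_index
df = pd.DataFrame(
    {
        "datestamp": ["2000-01-01", "2000-01-08", "2000-01-15", "2000-01-22"],
        "co2": [1.0, np.nan, 3.0, np.nan],
    }
).set_index("datestamp")

# Boolean mask: True where the co2 measurement was not recorded
mask = df["co2"].isnull()

# Keep only the rows (timestamps) whose measurement is missing
missing_rows = df[mask]
print(missing_rows.index.tolist())
```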


### Handle missing values
To replace missing values in your time series data, you can use the command:

df = df.fillna(method="ffill")
where the method argument specifies the fill strategy. Specifying bfill (backfilling) replaces each missing value with the next valid observation, while ffill (forward-filling) replaces it with the last valid observation. In pandas 2.1 and later, the method argument of fillna is deprecated; df.bfill() and df.ffill() are the equivalent calls.
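A small sketch contrasting the two strategies on a made-up Series, using Series.ffill() and Series.bfill(). It also shows that each strategy can leave a gap at one end: bfill cannot fill a trailing missing value, and ffill likewise cannot fill a leading one, because there is no observation to copy from.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with a gap in the middle and a missing last value
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan], name="co2")

forward = s.ffill()   # each NaN takes the last valid observation before it
backward = s.bfill()  # each NaN takes the next valid observation after it

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0, nan]
```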

Recall from the previous exercise that co2_levels has 59 missing values.

In [9]:
# Impute missing values with the next valid observation
co2_levels = co2_levels.bfill()

# Check each entry for missingness (True would mark a value that is still missing)
print(co2_levels.isnull())

              co2
datestamp        
1958-03-29  False
1958-04-05  False
1958-04-12  False
1958-04-19  False
1958-04-26  False
...           ...
2001-12-01  False
2001-12-08  False
2001-12-15  False
2001-12-22  False
2001-12-29  False

[2284 rows x 1 columns]
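A boolean frame with 2284 rows is hard to inspect by eye; summing it, as in the previous exercise, reduces it to a single count per column. The sketch below, on a hypothetical toy Series, also illustrates a caveat of backfilling: a trailing missing value has no next observation, so bfill leaves it as NaN. Chaining ffill() afterwards is one possible fallback; it is an assumption made for this illustration, not a step of the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series whose last observation is missing
s = pd.Series([1.0, np.nan, 3.0, np.nan])

# bfill alone: the trailing NaN has no next valid observation, so one gap remains
remaining_after_bfill = s.bfill().isnull().sum()
print(remaining_after_bfill)  # 1

# Fall back to forward-filling for any gap bfill could not reach
filled = s.bfill().ffill()
print(filled.isnull().sum())  # 0
```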
