# Testing your data with an 'assert'

The datasets we receive are rarely clean. We have to perform Exploratory Data Analysis (EDA) to check various properties of the data at hand.

We generally check these properties programmatically or by visualizing the data with various types of plots.

For example, after dropping or filling NaNs, we expect 0 missing values.

We can write an 'assert' statement to make sure that no missing data remains.
- By writing assert statements in our analysis scripts, we can catch problems in our data early, before the final report.
- This gives us confidence that our code still runs correctly when we rerun our analysis with updated data.
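As a minimal sketch of this idea, assume a small made-up DataFrame (the names `raw`, `cleaned` and `missing_count`, and all values, are hypothetical and only for illustration). After a cleaning step we assert that the number of missing values is 0, and attach a message so that a failure explains itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a freshly loaded dataset.
raw = pd.DataFrame({"cases": [10.0, np.nan, 30.0], "deaths": [1.0, 2.0, np.nan]})

# Cleaning step: drop every row that contains at least one NaN.
cleaned = raw.dropna()

# Sanity check: after dropna() we expect zero missing values.
missing_count = int(cleaned.isnull().sum().sum())
assert missing_count == 0, f"expected 0 missing values, found {missing_count}"
```

If a later change to the cleaning step lets NaNs through, the script stops at this line instead of silently producing a wrong report.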

### How does assert work?

Here is an example to show how assert statements work.

In [17]:
assert 1 == 1

If the condition evaluates to True, assert does nothing and produces no output.

In [18]:
assert 1 == 2

AssertionError: 

On the other hand, if we give it something that evaluates to False, it raises an AssertionError, as shown in the cell above.
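An assert can also carry an optional message after a comma, which becomes the text of the AssertionError. The sketch below wraps the two examples above in a hypothetical helper named `check` (not part of any library) that catches the error so both outcomes can be printed side by side.

```python
def check(condition, label):
    """Hypothetical helper: run an assert and report the outcome as a string."""
    try:
        # The text after the comma becomes the AssertionError message.
        assert condition, f"{label} is false"
        return "passed"
    except AssertionError as err:
        return f"AssertionError: {err}"


result_true = check(1 == 1, "1 == 1")
result_false = check(1 == 2, "1 == 2")

print(result_true)   # passed
print(result_false)  # AssertionError: 1 == 2 is false
```

Note that Python skips assert statements entirely when run with the `-O` flag, so these checks are best suited to analysis scripts and notebooks run in the default mode.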

Let's apply this method to our 'Ebola' dataset.

In [19]:
import pandas as pd

In [20]:
df = pd.read_csv('Ebola.csv')

In [21]:
#check head of the dataframe
df.head()

Unnamed: 0,Date,Day,Cases_Guinea,Cases_Liberia,Cases_SierraLeone,Cases_Nigeria,Cases_Senegal,Cases_UnitedStates,Cases_Spain,Cases_Mali,Deaths_Guinea,Deaths_Liberia,Deaths_SierraLeone,Deaths_Nigeria,Deaths_Senegal,Deaths_UnitedStates,Deaths_Spain,Deaths_Mali
0,1/5/2015,289,2776.0,,10030.0,,,,,,1786.0,,2977.0,,,,,
1,1/4/2015,288,2775.0,,9780.0,,,,,,1781.0,,2943.0,,,,,
2,1/3/2015,287,2769.0,8166.0,9722.0,,,,,,1767.0,3496.0,2915.0,,,,,
3,1/2/2015,286,,8157.0,,,,,,,,3496.0,,,,,,
4,12/31/2014,284,2730.0,8115.0,9633.0,,,,,,1739.0,3471.0,2827.0,,,,,


Many NaN values are visible in the rows above. Now let's write an assert to check for missing values in one of the columns.

In [22]:
assert df.Cases_Liberia.notnull().all()

AssertionError: 

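I used the notnull() method in the assert statement. notnull() returns True for each value that is present and False for each missing/null value, and .all() then returns True only if every value is present. Here the column contains missing values, so the assert fails.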
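To see what the assert is actually evaluating, here is a small sketch on a hypothetical three-value Series (the name `s` and its values are made up for illustration, not read from Ebola.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([8166.0, np.nan, 8115.0])

# notnull() gives one Boolean per element.
mask = s.notnull()
print(mask.tolist())  # [True, False, True]

# all() collapses the mask into a single Boolean.
has_no_missing = bool(mask.all())
print(has_no_missing)  # False
```

Because the single Boolean is False, `assert s.notnull().all()` would raise an AssertionError for this Series.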

I can fill all the missing values in the dataset by replacing them with 0.

In [23]:
df_withoutMissingValues = df.fillna(value = 0)

Now let's write the same assert statement on the same column of the modified dataset.

In [24]:
assert df_withoutMissingValues.Cases_Liberia.notnull().all()

It didn't raise any error, which means we do not have any missing values in that column.

I chain the .all() method with the .notnull() method to check for missing values in a column. The .all() method returns True if all values are True. When used on a DataFrame, it returns a Series of Booleans, one for each column in the DataFrame. So if you are using it on a DataFrame, you need to chain another .all() method so that you get back a single True or False value.

When the condition in an assert statement is True, nothing is returned and no error is raised: this is how you can confirm that the data you are checking are valid.
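The difference between one .all() and two can be sketched on a hypothetical two-column DataFrame (the name `toy` and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "a" is complete, column "b" has one NaN.
toy = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]})

# One .all() on a DataFrame gives one Boolean per column.
per_column = toy.notnull().all()
print(per_column.tolist())  # [True, False]

# A second .all() collapses the per-column results into a single Boolean.
overall = bool(per_column.all())
print(overall)  # False

# After filling the NaN with 0, the whole-frame check passes.
filled_ok = bool(toy.fillna(0).notnull().all().all())
print(filled_ok)  # True
```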

In [25]:
assert pd.notnull(df_withoutMissingValues).all().all()

There are no null values in df_withoutMissingValues.

In [26]:
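#assert that all numeric values of the dataset are greater than or equal to 0
#(select_dtypes skips the string Date column, which Python 3 cannot compare with 0)
assert (df_withoutMissingValues.select_dtypes(include='number') >= 0).all().all()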

The above expression evaluated to True, meaning all the values checked are greater than or equal to 0.
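When the same checks need to run every time the data is refreshed, they could be bundled into one function. The sketch below is only one way to do it: `validate_counts` is a hypothetical helper, and the toy frame stands in for the real dataset. It restricts the comparison with 0 to numeric columns, since comparing a string column such as a date with 0 raises a TypeError on Python 3.

```python
import numpy as np
import pandas as pd


def validate_counts(frame):
    """Hypothetical helper: raise AssertionError if the frame fails a check."""
    # Check 1: no missing values anywhere in the frame.
    assert frame.notnull().all().all(), "frame still contains missing values"
    # Check 2: every numeric value is greater than or equal to 0.
    numeric = frame.select_dtypes(include="number")
    assert (numeric >= 0).all().all(), "frame contains negative numbers"
    return True


# Hypothetical toy data with one string column and one numeric column.
toy = pd.DataFrame({"Date": ["1/5/2015", "1/4/2015"], "Cases": [2776.0, np.nan]})
clean_toy = toy.fillna(0)

validate_counts(clean_toy)  # passes silently
```

Calling `validate_counts(toy)` on the unfilled frame would raise an AssertionError with the first message.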