You can define a specific index for a Series, but what happens if you have more index labels (rows) than you have data? For example, with the weather data, perhaps the wind speed failed to be recorded during an observation. You can't just fill these gaps with 0s: the wind speed could genuinely be 0, so recording a non-observed wind speed as 0 would distort any calculations you make about wind speed (such as the mean wind speed).

Pandas uses the **NaN** value to fill these spaces.

Pandas also allows you to explicitly define NaN (Not a Number) values and add them to a Series or a DataFrame.
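To make the wind speed argument concrete, here is a minimal sketch using made-up readings (the values and the name `wind_speed` are assumptions for illustration, not part of the weather data). It compares the mean when the missing reading is left as NaN with the mean when it is replaced by 0.

```python
import numpy as np
import pandas as pd

# Hypothetical wind speed readings; the third observation was not recorded
wind_speed = pd.Series([12.0, 0.0, np.nan, 8.0])

# mean() skips NaN by default, so only the three recorded values are averaged
mean_skipping_missing = wind_speed.mean()

# Replacing the missing reading with 0 treats it as a real calm observation
mean_zero_filled = wind_speed.fillna(0).mean()

print('Mean ignoring the missing reading:', mean_skipping_missing)
print('Mean with the missing reading set to 0:', mean_zero_filled)
```

The zero-filled mean is lower because the unrecorded observation is counted as a genuine reading of 0.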

Tip: Be consistent in how you mark missing data. I suggest you use np.nan in all cases.

In [1]:
# creating an index and NaN values

import pandas as pd
import numpy as np

# dictionary with numpy NaN value
data = {'apple': 1.5, 'blueberry': 3.5, 'banana': 2.99, 'orange': 4.3, 'mandarine':np.nan}

# create a series with the dictionary
series1 = pd.Series(data)
print('series1 = ')
print(series1)
print()

# Create a list of index labels with more entries than series1 has
fruits = ['apple','blueberry','banana', 'orange', 'mandarine', 'rockmelon', 'strawberry']

# Create a series using the series1 data and the additional index labels
# series2 has more index labels than data provided,
# so some labels have missing values
series2 = pd.Series(series1, index=fruits)
print('series2 = ')
print(series2)

series1 = 
apple        1.50
blueberry    3.50
banana       2.99
orange       4.30
mandarine     NaN
dtype: float64

series2 = 
apple         1.50
blueberry     3.50
banana        2.99
orange        4.30
mandarine      NaN
rockmelon      NaN
strawberry     NaN
dtype: float64


Note that all the missing values have automatically been filled in with NaN.
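Passing an existing Series together with a longer index, as above, has the same effect as calling `reindex()` on it. The sketch below uses a small hypothetical Series (`prices` and `labels` are illustrative names) to show that both approaches fill the unmatched labels with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three labels, one of them already NaN
prices = pd.Series({'apple': 1.5, 'banana': 2.99, 'mandarine': np.nan})
labels = ['apple', 'banana', 'mandarine', 'rockmelon']

# Both approaches align the data to the new labels and fill the gaps with NaN
from_constructor = pd.Series(prices, index=labels)
from_reindex = prices.reindex(labels)

print(from_reindex)
# equals() treats NaNs in the same positions as equal
print('Same result:', from_constructor.equals(from_reindex))
```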

You can find the NaNs in your Series by checking whether the values are null with the **isnull()** and **notnull()** methods. Let's see what happens with NaNs.

In [2]:
# isnull() returns True where the value is null (None or NaN)
series1.isnull()
# mandarine is True because its value is NaN

apple        False
blueberry    False
banana       False
orange       False
mandarine     True
dtype: bool

In [3]:
# notnull() returns True where the value is not null (neither None nor NaN)
series2.notnull()
# apple, blueberry, banana and orange are True because they all have a value.
# However, mandarine, rockmelon and strawberry are False because they have NaN values

apple          True
blueberry      True
banana         True
orange         True
mandarine     False
rockmelon     False
strawberry    False
dtype: bool
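Pandas also provides `isna()` and `notna()`, which are aliases of `isnull()` and `notnull()`. The sketch below, on a small hypothetical Series `s`, checks that the pairs return the same Boolean mask and that `None` is treated as missing as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing np.nan and None as missing markers
s = pd.Series({'apple': 1.5, 'mandarine': np.nan, 'rockmelon': None})

# isna()/notna() return the same masks as isnull()/notnull()
print('isna matches isnull:', s.isna().equals(s.isnull()))
print('notna matches notnull:', s.notna().equals(s.notnull()))

# Both np.nan and None are reported as missing
print(s.isna())
```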

## Counting and Filtering NaNs in Series

![image.png](attachment:image.png)

In [4]:
# counting notnull and null data
print('Number of null data values in series2 is', series2.isnull().sum())
print('Number of valid data values in series2 is', series2.notnull().sum())


Number of null data values in series2 is 3
Number of valid data values in series2 is 4


In [9]:
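# Caution: count() counts every non-null entry of the Boolean mask returned by isna().
# The mask itself contains only True/False, so this is the total number of elements,
# not the number of nulls. Use sum() to count the nulls.
print('Number of elements in series2 is', series2.isna().count())

Number of elements in series2 is 7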
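It is easy to mix up `sum()` and `count()` on a Boolean mask. This small sketch, on a hypothetical Series `s`, puts the different counts side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two valid values and three NaNs
s = pd.Series([1.5, np.nan, 2.99, np.nan, np.nan])
mask = s.isna()

n_missing = mask.sum()       # True counts as 1, so this is the number of NaNs
n_mask_items = mask.count()  # the mask has no NaNs, so this is the total length
n_valid = s.count()          # count() on the data itself skips NaNs

print('Missing values:', n_missing)
print('Elements in the mask:', n_mask_items)
print('Valid values:', n_valid)
print('Length of the series:', len(s))
```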


In [5]:
# Filtering the valid (not null) and the missing (null) data
print('The valid data in series2 are')
print(series2[series2.notnull()])
print('The missing data in series2 are')
print(series2[series2.isnull()])

The valid data in series2 are
apple        1.50
blueberry    3.50
banana       2.99
orange       4.30
dtype: float64
The missing data in series2 are
mandarine    NaN
rockmelon    NaN
strawberry   NaN
dtype: float64
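As an alternative to filtering with a Boolean mask, `dropna()` returns a copy of the Series without the missing values. A minimal sketch on a hypothetical Series `s`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
s = pd.Series({'apple': 1.5, 'orange': 4.3, 'rockmelon': np.nan})

# dropna() keeps only the valid entries; the original series is not modified
valid = s.dropna()

print(valid)
print('Same as mask filtering:', valid.equals(s[s.notnull()]))
```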


## Practise NaNs