Handling `NaN` and `None` can be tricky in Pandas.

First, the distinction between these two values is not always maintained.  We can see this in the following example.

In [1]:
import numpy
import pandas
from data_algebra.data_ops import *


# some example data
d = pandas.DataFrame({
    'ID': [2, 3, 7, 7, numpy.nan, None],
})

d

Unnamed: 0,ID
0,2.0
1,3.0
2,7.0
3,7.0
4,
5,
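
The conversion can also be checked directly. The following is a minimal sketch in plain Pandas (the names `s` and `s_obj` are ours, introduced only for illustration): in a numeric column both markers end up as floating-point `NaN`, while an explicitly requested `object` column retains the original Python objects.

```python
import numpy
import pandas

# numeric column: None is converted to NaN and the column becomes float64
s = pandas.Series([2, 3, numpy.nan, None])
print(s.dtype)
print(s.isna().tolist())
print(s[3] is None)  # the original None is no longer recoverable

# object column (an explicit choice): the Python objects are kept as given
s_obj = pandas.Series([2, 3, numpy.nan, None], dtype=object)
print(s_obj[3] is None)
```

So whether the distinction survives depends on the column type, which is one reason not to rely on it.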


Next, `NaN` and `None` are not treated as values, so they don't work the same as values in grouped calculations.

In [2]:
d.groupby(['ID']).size()

ID
2.0    1
3.0    1
7.0    2
dtype: int64

Notice that in the above example the rows keyed by `NaN` and `None` are dropped.
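
As an aside, plain Pandas (version 1.1 and later) lets `groupby()` keep missing keys through its `dropna` argument. The sketch below assumes such a Pandas version and uses our own variable name `counts`. It is not something the data algebra does for us.

```python
import numpy
import pandas

d = pandas.DataFrame({'ID': [2, 3, 7, 7, numpy.nan, None]})

# dropna=False keeps the missing keys, collected into a single NaN group
counts = d.groupby(['ID'], dropna=False).size()
print(counts)
```

Note that `NaN` and `None` land in the same group, as both are stored as `NaN` in this column.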

The data algebra doesn't attempt to work around this: `project()` drops the rows with missing keys, and `extend()` leaves their results missing.

In [3]:
ex(
    data(d=d).
        project({'n': '(1).sum()'}, group_by=['ID'])
)

Unnamed: 0,ID,n
0,2.0,1
1,3.0,1
2,7.0,2


In [4]:
ex(
    data(d=d).
        extend({'n': '(1).sum()'}, partition_by=['ID'])
)

Unnamed: 0,ID,n
0,2.0,1.0
1,3.0,1.0
2,7.0,2.0
3,7.0,2.0
4,,
5,,


Our suggestion is to replace such keys with a carefully chosen [sentinel value](https://en.wikipedia.org/wiki/Sentinel_value) prior to grouped calculations (which introduces its own problems, such as the sentinel colliding with a legitimate key!).

In [5]:
ex(
    data(d=d)
        .extend({'ID': 'ID.coalesce(-1)'})
        .project({'n': '(1).sum()'}, group_by=['ID'])
)

Unnamed: 0,ID,n
0,-1.0,2
1,2.0,1
2,3.0,1
3,7.0,2


In [6]:
ex(
    data(d=d)
        .extend({'ID': 'ID.coalesce(-1)'})
        .extend({'n': '(1).sum()'}, partition_by=['ID'])
)

Unnamed: 0,ID,n
0,2.0,1
1,3.0,1
2,7.0,2
3,7.0,2
4,-1.0,2
5,-1.0,2
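
For comparison, the same sentinel replacement can be sketched in plain Pandas. This is a minimal illustration, not part of the data algebra: `SENTINEL`, `d_filled`, and `counts` are our own names, and the value `-1` is assumed never to be a legitimate `ID`. The guard checks that assumption before the keys are replaced.

```python
import numpy
import pandas

d = pandas.DataFrame({'ID': [2, 3, 7, 7, numpy.nan, None]})
SENTINEL = -1  # assumed not to occur as a real ID

# refuse to continue if the sentinel would collide with an existing key
if (d['ID'] == SENTINEL).any():
    raise ValueError("sentinel value already present in ID")

# replace missing keys with the sentinel, then group as usual
d_filled = d.assign(ID=d['ID'].fillna(SENTINEL))
counts = d_filled.groupby(['ID']).size()
print(counts)
```

The guard addresses only the collision problem. Downstream code still has to know that `-1` stands for "missing".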
