In [1]:
import pandas_ip as ip
import numpy as np
import pandas as pd
from pandas.core.internals import BlockManager
import ipaddress

## What currently works:

- Creating IP arrays
- Storing IP arrays in Blocks
- Making Series / DataFrames from blocks


These rely on some changes to pandas: https://github.com/pandas-dev/pandas/compare/master...TomAugspurger:pandas-array

## Creating arrays of IPAddresses

From strings

In [2]:
ip.to_ipaddress(['0.0.0.0', '192.168.1.1', '2001:0db8:85a3:0000:0000:8a2e:0370:7334'])

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>

From integers

In [3]:
ip.to_ipaddress([0, 3232235777, 42540766452641154071740215577757643572])

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>

From bytes

In [4]:
ip.to_ipaddress([
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01',
    b' \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4',
])

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>

Those all return instances of `IPAddress`, an array-like container analogous to `Categorical`.

In [5]:
values = ip.IPAddress.from_pyints(
    [0, 3232235777, 42540766452641154071740215577757643572]
)
values

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>
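The three input forms above are different encodings of the same values. The sketch below uses only the standard-library `ipaddress` module, not pandas_ip itself, to convert one address between them. The 16-byte big-endian layout for IPv4 values is inferred from the bytes example above, and the helper name `as_forms` is hypothetical.

```python
import ipaddress


def as_forms(address):
    """Return (compressed string, integer, 16-byte big-endian bytes).

    Hypothetical helper for illustration; not part of pandas_ip.
    """
    parsed = ipaddress.ip_address(address)
    as_int = int(parsed)
    # Pad every address, IPv4 included, to 128 bits, matching the
    # 16-byte values shown in the bytes example.
    as_bytes = as_int.to_bytes(16, byteorder="big")
    return str(parsed), as_int, as_bytes


forms = [
    as_forms(a)
    for a in (
        "0.0.0.0",
        "192.168.1.1",
        "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
    )
]
for text, number, raw in forms:
    print(text, number, raw)
```

Each tuple pairs the compressed string with the integer and bytes values that the cells above pass to `ip.to_ipaddress`.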

## Pandas Containers

Our `IPAddress` array can be stored in pandas' containers.

In [6]:
s = pd.Series(values)
s

0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip

In [7]:
df = pd.DataFrame({
    "A": [np.nan, 2, 3],
    "B": values
})
df

     A                             B
0  NaN                       0.0.0.0
1  2.0                   192.168.1.1
2  3.0  2001:db8:85a3::8a2e:370:7334


## IP Accessor

We register the `.ip` accessor with pandas.

In [8]:
s.ip.is_ipv4

0     True
1     True
2    False
dtype: bool

In [9]:
s.ip.is_ipv6

0    False
1    False
2     True
dtype: bool
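pandas exposes `pd.api.extensions.register_series_accessor` for attaching namespaces such as `.ip` to a Series. How pandas_ip registers its own accessor is not shown here; the following is a minimal sketch of the mechanism only, using a hypothetical `ipdemo` namespace over a plain Series of strings.

```python
import ipaddress

import pandas as pd


# "ipdemo" is a hypothetical name, chosen so it cannot clash with ".ip".
@pd.api.extensions.register_series_accessor("ipdemo")
class IPDemoAccessor:
    def __init__(self, series):
        self._series = series

    def _version_is(self, version):
        # Parse each element with the standard library and compare versions.
        flags = [
            ipaddress.ip_address(a).version == version for a in self._series
        ]
        return pd.Series(flags, index=self._series.index, dtype=bool)

    @property
    def is_ipv4(self):
        return self._version_is(4)

    @property
    def is_ipv6(self):
        return self._version_is(6)


demo = pd.Series(["0.0.0.0", "192.168.1.1", "2001:db8:85a3::8a2e:370:7334"])
print(demo.ipdemo.is_ipv4)
print(demo.ipdemo.is_ipv6)
```

This version parses strings on every call, which is fine for a demonstration but not how a dedicated array type would be expected to do it.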

In [12]:
s.isna()

0     True
1    False
2    False
dtype: bool
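The output above flags `0.0.0.0` as missing while the other two rows are not. One reading, which is an assumption rather than something stated here, is that the all-zero value serves as the NA sentinel. A stdlib-only sketch of that rule, with a hypothetical `is_missing` helper:

```python
import ipaddress


def is_missing(address):
    # Assumption: the all-zero 128-bit value marks a missing entry.
    return int(ipaddress.ip_address(address)) == 0


flags = [
    is_missing(a)
    for a in ("0.0.0.0", "192.168.1.1", "2001:db8:85a3::8a2e:370:7334")
]
print(flags)
```

Under this rule `::` would count as missing too, since it has the same integer value as `0.0.0.0`.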

## Pandas Methods

Some pandas operations work correctly on IPAddress data.

Indexing:

In [None]:
df.loc[[0, 1], 'B']

In [None]:
df.loc[2, 'B']

In [None]:
df.iloc[1, 1]

Concatenation:

In [None]:
pd.concat([df, df], ignore_index=True)

Null checking:

In [None]:
df.isna()

Many things don't work yet:

In [None]:
df.B == df.B

In [None]:
df.B.values > df.B.values

In [None]:
df.B.sort_values()

In [None]:
df.fillna(method='bfill')  # (0, 'B') should have been filled

In [None]:
df.groupby("B").A.count()
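Part of what makes comparison and sorting non-trivial is that the standard-library address types refuse to order an IPv4 value against an IPv6 value. The sketch below is stdlib-only and is not pandas_ip code; it shows that behaviour and one possible ordering, by integer value, which is an assumption about how these operations might eventually be defined.

```python
import ipaddress

addrs = [
    ipaddress.ip_address(a)
    for a in ("2001:db8:85a3::8a2e:370:7334", "192.168.1.1", "0.0.0.0")
]

# Ordering an IPv6 address against an IPv4 address raises TypeError.
try:
    addrs[0] > addrs[1]
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

# Falling back to the integer value gives a total order across versions.
by_int = sorted(addrs, key=int)
print(mixed_comparison_ok, [str(a) for a in by_int])
```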

## IPAddressIndex

None of this is implemented yet.

In [None]:
pd.Index._engine

In [None]:
df.B.values.value_counts()

In [None]:
idx = pd.Series([10, 5, 0], index=ip.IPAddressIndex(df.B), name='counts')
idx

In [None]:
idx.loc[ipaddress.IPv4Address(0)]