In [2]:
import pandas_ip as ip
import numpy as np
import pandas as pd
from pandas.core.internals import BlockManager
import ipaddress

## What currently works:

- Creating IP arrays
- Storing IP arrays in Blocks
- Making Series / DataFrames from blocks


These rely on some changes to pandas: https://github.com/pandas-dev/pandas/compare/master...TomAugspurger:pandas-array

## Creating arrays of IPAddresses

From strings

In [3]:
ip.to_ipaddress(['0.0.0.0', '192.168.1.1', '2001:0db8:85a3:0000:0000:8a2e:0370:7334'])

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>
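The third address comes back in compressed form. For comparison, Python's standard-library `ipaddress` module normalizes the same strings the same way: it drops leading zeros and collapses a run of zero groups into `::`. This only shows the expected canonical form. It does not say how `to_ipaddress` parses strings internally.

```python
import ipaddress

# The same three strings passed to ip.to_ipaddress above
raw = ['0.0.0.0', '192.168.1.1', '2001:0db8:85a3:0000:0000:8a2e:0370:7334']
parsed = [ipaddress.ip_address(s) for s in raw]

# str() of an IPv6Address strips leading zeros and collapses zero groups to "::"
normalized = [str(a) for a in parsed]
print(normalized)
# ['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334']
```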

From integers

In [4]:
ip.to_ipaddress([0, 3232235777, 42540766452641154071740215577757643572])

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>
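The integers are the numeric values of the same three addresses. A quick check with the standard library confirms the mapping. `ipaddress.ip_address` treats integers below `2**32` as IPv4 and larger ones as IPv6, which matches the output above.

```python
import ipaddress

ints = [0, 3232235777, 42540766452641154071740215577757643572]

# ip_address() returns IPv4Address for values < 2**32, IPv6Address otherwise
addrs = [ipaddress.ip_address(n) for n in ints]
print([str(a) for a in addrs])
# ['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334']

# int() recovers the original integer, so the mapping round-trips
assert [int(a) for a in addrs] == ints

# 192.168.1.1 as a 32-bit integer, built octet by octet
assert 192 * 2**24 + 168 * 2**16 + 1 * 2**8 + 1 == 3232235777
```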

From bytes

In [5]:
ip.to_ipaddress([
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01',
    b' \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4',
])

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>
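Each input here is 16 bytes, and the output is consistent with reading those bytes as one big-endian 128-bit integer and then treating values that fit in 32 bits as IPv4. The sketch below decodes them that way. `from_packed16` is a hypothetical helper, and the "fits in 32 bits means IPv4" rule is an assumption inferred from the output, not something the notebook states. Passing the second byte string straight to `ipaddress.ip_address` would instead yield an IPv6 address.

```python
import ipaddress
from typing import Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# The three 16-byte inputs from the cell above
packed = [
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01',
    b' \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4',
]


def from_packed16(b: bytes) -> IPAddr:
    """Hypothetical decoder: read 16 big-endian bytes as one 128-bit integer."""
    if len(b) != 16:
        raise ValueError(f"expected 16 bytes, got {len(b)}")
    n = int.from_bytes(b, byteorder="big")
    # ip_address() treats integers below 2**32 as IPv4, matching the
    # notebook output for the second entry
    return ipaddress.ip_address(n)


print([str(from_packed16(b)) for b in packed])
# ['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334']
```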

Each of these returns an instance of `IPAddress`, an array-like container analogous to `Categorical`.

In [6]:
values = ip.IPAddress.from_pyints(
    [0, 3232235777, 42540766452641154071740215577757643572]
)
values

<IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>
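To make "array-like container" concrete, here is a deliberately simplified, pure-Python stand-in. `IPArraySketch` is hypothetical and is not how `IPAddress` is implemented. It assumes addresses are stored as integers and boxed back into `ipaddress` objects on scalar access, which matches what `df.loc[2, 'B']` and `df.iloc[1, 1]` return further down. The `from_pyints` name only mirrors the constructor used above.

```python
import ipaddress
from typing import Iterable, List


class IPArraySketch:
    """Hypothetical stand-in for an IPAddress-like container.

    Stores each address as a Python int in [0, 2**128) and converts
    back to IPv4Address / IPv6Address objects on scalar access.
    """

    def __init__(self, values: Iterable[int]):
        self._data: List[int] = [int(v) for v in values]
        for v in self._data:
            if not 0 <= v < 2**128:
                raise ValueError(f"{v} is out of range for a 128-bit address")

    @classmethod
    def from_pyints(cls, values: Iterable[int]) -> "IPArraySketch":
        return cls(values)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Scalar access returns an ipaddress object
            return ipaddress.ip_address(self._data[key])
        if isinstance(key, slice):
            return IPArraySketch(self._data[key])
        # A list of positions returns a new container
        return IPArraySketch(self._data[i] for i in key)

    def __repr__(self) -> str:
        return f"<IPArraySketch({[str(self[i]) for i in range(len(self))]})>"


arr = IPArraySketch.from_pyints([0, 3232235777, 42540766452641154071740215577757643572])
print(arr)
# <IPArraySketch(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])>
```

A real pandas-compatible container would also need vectorized storage (e.g. fixed-width integer arrays) rather than a Python list. The list keeps the sketch short.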

## Pandas Containers

Our `IPAddress` array can be stored in pandas' containers.

In [9]:
s = pd.Series(values)
s

0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip

In [12]:
df = pd.DataFrame({
    "A": [np.nan, 2, 3],
    "B": values
})
df

     A                             B
0  NaN                       0.0.0.0
1  2.0                   192.168.1.1
2  3.0  2001:db8:85a3::8a2e:370:7334


## IP Accessor

We register an `.ip` accessor with pandas, which exposes IP-specific properties such as `is_ipv4` on a `Series`.

In [13]:
s.ip.is_ipv4

0     True
1     True
2    False
dtype: bool

In [14]:
s.ip.is_ipv6

0    False
1    False
2     True
dtype: bool
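Under the integer representation suggested by the constructors above, `is_ipv4` and `is_ipv6` reduce to a threshold test. This is a sketch under that assumption (integers below `2**32` are IPv4), checked against the standard library's own `version` attribute.

```python
import ipaddress

values = [0, 3232235777, 42540766452641154071740215577757643572]

# Assumption: an integer below 2**32 is an IPv4 address, anything else IPv6
is_ipv4 = [v < 2**32 for v in values]
is_ipv6 = [not flag for flag in is_ipv4]

# Same answer from the standard library's scalar objects
assert is_ipv4 == [ipaddress.ip_address(v).version == 4 for v in values]

print(is_ipv4)  # [True, True, False]
print(is_ipv6)  # [False, False, True]
```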

In [15]:
s.ip.isna()

0     True
1    False
2    False
dtype: bool
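Note that the first entry, `0.0.0.0`, is reported as missing. That is consistent with the all-zeros address serving as the NA marker, much as `NaN` does for floats. The trade-off is that a genuine `0.0.0.0` cannot be distinguished from a missing value. The sketch below shows the idea. `NA_SENTINEL`, `to_storage`, and `isna` are hypothetical names, and the sentinel choice is inferred from the output rather than stated in the notebook.

```python
NA_SENTINEL = 0  # assumption: the all-zeros address marks a missing value


def to_storage(items):
    """Hypothetical: map None to the sentinel and keep integers as-is."""
    return [NA_SENTINEL if x is None else int(x) for x in items]


def isna(values):
    """Elementwise missing-value check against the sentinel."""
    return [v == NA_SENTINEL for v in values]


values = [0, 3232235777, 42540766452641154071740215577757643572]
print(isna(values))  # [True, False, False]
print(isna(to_storage([None, 3232235777])))  # [True, False]
```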

In [16]:
s.ip.packed

0    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...
1    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...
2    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...
dtype: object
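The standard library's `.packed` uses the address's native width: 4 bytes for IPv4 and 16 for IPv6. A column of packed values with a fixed width would instead pad IPv4 addresses to 16 bytes, which is the layout of the bytes inputs used earlier. The helper below is a hypothetical sketch of that fixed-width encoding, not the accessor's implementation.

```python
import ipaddress


def packed16(addr) -> bytes:
    """Hypothetical helper: 16-byte big-endian encoding for IPv4 and IPv6 alike."""
    return int(addr).to_bytes(16, byteorder="big")


v4 = ipaddress.IPv4Address('192.168.1.1')
v6 = ipaddress.IPv6Address('2001:db8:85a3::8a2e:370:7334')

print(len(v4.packed), len(v6.packed))  # 4 16
print(packed16(v4))
# b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01'
```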

## Pandas Methods

Some pandas operations work correctly on IPAddress data.

Indexing:

In [18]:
df.loc[[0, 1], 'B']

0        0.0.0.0
1    192.168.1.1
Name: B, dtype: ip

In [19]:
df.loc[2, 'B']

IPv6Address('2001:db8:85a3::8a2e:370:7334')

In [20]:
df.iloc[1, 1]

IPv4Address('192.168.1.1')

Concatenation:

In [21]:
pd.concat([df, df], ignore_index=True)

     A                             B
0  NaN                       0.0.0.0
1  2.0                   192.168.1.1
2  3.0  2001:db8:85a3::8a2e:370:7334
3  NaN                       0.0.0.0
4  2.0                   192.168.1.1
5  3.0  2001:db8:85a3::8a2e:370:7334


Null checking:

In [22]:
df.isna()

       A      B
0   True   True
1  False  False
2  False  False


Many things don't work yet:

In [23]:
df.B == df.B

TypeError: Argument 'values' has incorrect type (expected numpy.ndarray, got IPAddress)

In [27]:
df.B.values > df.B.values  # comparing the raw IPAddress arrays does run

array([False, False, False], dtype=bool)
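Elementwise comparison is straightforward once addresses are integers. The sketch below compares two equal-length sequences with any operator from the `operator` module. `compare` is a hypothetical helper. One design consequence: ordering by integer value places every IPv4 address before every IPv6 address, whereas the standard library raises `TypeError` when comparing an `IPv4Address` with an `IPv6Address`.

```python
import operator


def compare(left, right, op):
    """Hypothetical elementwise comparison on the integer representation."""
    if len(left) != len(right):
        raise ValueError("lengths must match")
    return [op(a, b) for a, b in zip(left, right)]


values = [0, 3232235777, 42540766452641154071740215577757643572]
print(compare(values, values, operator.eq))  # [True, True, True]
print(compare(values, values, operator.gt))  # [False, False, False]
```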

In [24]:
df.B.sort_values()

AttributeError: 'IPAddress' object has no attribute 'argsort'
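The error shows that `sort_values` expects the array to provide `argsort`, i.e. the positions that would sort it. With integer storage, a minimal version is a stable sort over positions keyed by value. `argsort` here is a hypothetical helper. If `0` doubles as the NA sentinel, missing values would sort first unless handled separately; pandas' `sort_values` places them according to `na_position`, which defaults to `'last'`.

```python
def argsort(values):
    """Hypothetical argsort: positions that would sort the integer values.

    sorted() is stable, so equal addresses keep their original order.
    """
    return sorted(range(len(values)), key=values.__getitem__)


values = [3232235777, 42540766452641154071740215577757643572, 0]
order = argsort(values)
print(order)  # [2, 0, 1]
print([values[i] for i in order])
```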

In [28]:
df.fillna(method='bfill')  # B at row 0 (0.0.0.0, reported as missing) should have been back-filled

     A                             B
0  2.0                       0.0.0.0
1  2.0                   192.168.1.1
2  3.0  2001:db8:85a3::8a2e:370:7334
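Column `A` was filled but `B` was not. A backward fill over sentinel-encoded values walks the array from the end and carries the most recent valid value backwards. This sketch assumes, as above, that `0` marks a missing address. `bfill` is a hypothetical helper, and trailing missing values stay missing because there is nothing after them to copy.

```python
NA_SENTINEL = 0  # assumption: the all-zeros address marks a missing value


def bfill(values):
    """Hypothetical backward fill: replace each NA with the next valid value."""
    out = list(values)
    next_valid = NA_SENTINEL
    for i in range(len(out) - 1, -1, -1):
        if out[i] == NA_SENTINEL:
            out[i] = next_valid
        else:
            next_valid = out[i]
    return out


print(bfill([0, 3232235777, 42540766452641154071740215577757643572]))
# [3232235777, 3232235777, 42540766452641154071740215577757643572]
```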


In [37]:
df.groupby("B").A.count()

TypeError: 'IPAddress' object is not callable
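Roughly speaking, pandas groups by first factorizing the keys into integer codes plus an array of unique values; `pd.factorize` marks missing values with the code `-1`. A pure-Python factorize for integer-encoded addresses could look like this. `factorize` here is a hypothetical helper, and the `0` sentinel is the same assumption as above.

```python
NA_SENTINEL = 0  # assumption: the all-zeros address marks a missing value


def factorize(values):
    """Hypothetical factorize: integer codes plus uniques in first-seen order."""
    codes, uniques, seen = [], [], {}
    for v in values:
        if v == NA_SENTINEL:
            codes.append(-1)  # missing keys get code -1, like pd.factorize
            continue
        if v not in seen:
            seen[v] = len(uniques)
            uniques.append(v)
        codes.append(seen[v])
    return codes, uniques


codes, uniques = factorize([0, 3232235777, 42540766452641154071740215577757643572, 3232235777])
print(codes)    # [-1, 0, 1, 0]
print(uniques)  # [3232235777, 42540766452641154071740215577757643572]
```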

## IPAddressIndex

Almost none of this is implemented yet: an `IPAddressIndex` can be constructed, but looking up a label fails.

In [36]:
df.B.values.value_counts()

0.0.0.0                         1
2001:db8:85a3::8a2e:370:7334    1
192.168.1.1                     1
dtype: int64
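Here `0.0.0.0` is counted even though `isna` reports it as missing. pandas' own `Series.value_counts` drops missing values by default (`dropna=True`). A counting sketch that follows that convention, using `collections.Counter` and the same assumed `0` sentinel, is below. `value_counts` is a hypothetical helper.

```python
import ipaddress
from collections import Counter

NA_SENTINEL = 0  # assumption: the all-zeros address marks a missing value


def value_counts(values, dropna=True):
    """Hypothetical value_counts keyed by the address's string form."""
    kept = [v for v in values if not (dropna and v == NA_SENTINEL)]
    return Counter(str(ipaddress.ip_address(v)) for v in kept)


values = [0, 3232235777, 42540766452641154071740215577757643572]
for label, n in value_counts(values).most_common():
    print(label, n)
```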

In [29]:
idx = pd.Series([10, 5, 0], index=ip.IPAddressIndex(df.B), name='counts')  # a Series indexed by IP address
idx

0.0.0.0                         10
192.168.1.1                      5
2001:db8:85a3::8a2e:370:7334     0
Name: counts, dtype: int64

In [30]:
idx.loc[ipaddress.IPv4Address(0)]

KeyError: 'the label [0.0.0.0] is not in the [index]'
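The lookup fails even though `0.0.0.0` is visibly in the index. One way such an index could work is to normalize every label and every key to its integer value and keep a dictionary from value to position, so a string, an integer, or an `ipaddress` object all find the same row. `IPIndexSketch` is hypothetical; its `get_loc` method only borrows the name of the pandas `Index` method.

```python
import ipaddress


def _to_int(key) -> int:
    """Normalize a string, int, or ipaddress object to its integer value."""
    if isinstance(key, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return int(key)
    return int(ipaddress.ip_address(key))


class IPIndexSketch:
    """Hypothetical index: maps each address's integer value to its position."""

    def __init__(self, labels):
        self._positions = {}
        for pos, label in enumerate(labels):
            # setdefault keeps the first position if a label repeats
            self._positions.setdefault(_to_int(label), pos)

    def get_loc(self, key) -> int:
        try:
            return self._positions[_to_int(key)]
        except KeyError:
            raise KeyError(key) from None


index = IPIndexSketch(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
counts = [10, 5, 0]
print(counts[index.get_loc(ipaddress.IPv4Address(0))])  # 10
```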