In [1]:
import ipaddress

import numpy as np
import pandas as pd

import cyberpandas as cp

## What currently works:

- Creating `IPAddress` arrays
- Storing `IPAddress` arrays in pandas containers

These rely on some changes to pandas.

## Creating arrays of IPAddresses

From strings

In [2]:
cp.to_ipaddress(['0.0.0.0', '192.168.1.1', '2001:0db8:85a3:0000:0000:8a2e:0370:7334'])

IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
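
The third string comes back shorter than it went in because IPv6 addresses are displayed in compressed notation. The sketch below uses only the standard-library `ipaddress` module (imported in the first cell), not cyberpandas itself, to show that both spellings name the same address.

```python
import ipaddress

# Parse the long spelling used in the input above.
addr = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

# `compressed` drops leading zeros and collapses the run of zero groups to "::".
short_form = addr.compressed

# `exploded` restores the full eight-group spelling.
long_form = addr.exploded

# Both spellings parse to equal address objects.
same = addr == ipaddress.ip_address(short_form)
print(short_form, long_form, same)
```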

From integers

In [3]:
cp.to_ipaddress([0, 3232235777, 42540766452641154071740215577757643572])

IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
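
To see where integers like these come from, the following sketch converts the same three addresses to integers and back with the standard-library `ipaddress` module. It illustrates the encoding only and is not a description of how cyberpandas parses its input.

```python
import ipaddress

addresses = ["0.0.0.0", "192.168.1.1", "2001:db8:85a3::8a2e:370:7334"]

# int() gives the address as an unsigned integer
# (up to 32 bits for IPv4, up to 128 bits for IPv6).
as_ints = [int(ipaddress.ip_address(a)) for a in addresses]

# ip_address() also accepts an integer; values below 2**32 are read as IPv4.
round_trip = [str(ipaddress.ip_address(i)) for i in as_ints]

print(as_ints)
print(round_trip)
```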

From bytes

In [4]:
cp.to_ipaddress([
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01',
    b' \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4',
])

IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
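
Each byte string above is 16 bytes long, with the IPv4 values left-padded with zeros. The printable characters in the third literal are simply how Python displays bytes such as `0x20` (space) and `0x2e` (`.`). As a sketch, the hypothetical helper `to_16_bytes` below (not part of cyberpandas) produces the same layout from the standard library.

```python
import ipaddress


def to_16_bytes(text):
    """Return the address as 16 big-endian bytes, zero-padded on the left."""
    return int(ipaddress.ip_address(text)).to_bytes(16, "big")


v4 = to_16_bytes("192.168.1.1")
v6 = to_16_bytes("2001:db8:85a3::8a2e:370:7334")

print(v4)
print(v6)
```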

Those all return instances of `IPAddress`, an array-like container analogous to `Categorical`.

In [5]:
values = cp.IPAddress.from_pyints(
    [0, 3232235777, 42540766452641154071740215577757643572]
)
values

IPAddress(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])

## Pandas Containers

Our `IPAddress` array can be stored in pandas' containers.

In [6]:
s = pd.Series(values)
s

0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip

In [7]:
df = pd.DataFrame({
    "A": [np.nan, 2, 3],
    "B": values
})
df

     A                             B
0  NaN                       0.0.0.0
1  2.0                   192.168.1.1
2  3.0  2001:db8:85a3::8a2e:370:7334


## IP Accessor

We register the `.ip` accessor with pandas, which exposes IP-specific properties on a `Series`.

In [8]:
s.ip.is_ipv4

0     True
1     True
2    False
dtype: bool

In [9]:
s.ip.is_ipv6

0    False
1    False
2     True
dtype: bool
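
For comparison, the same two checks can be written element by element with the standard-library `ipaddress` module. This is only a plain-Python sketch of what the accessor reports, not the accessor's implementation.

```python
import ipaddress

addresses = [
    ipaddress.ip_address(a)
    for a in ("0.0.0.0", "192.168.1.1", "2001:db8:85a3::8a2e:370:7334")
]

# Every stdlib address object carries a `version` attribute (4 or 6).
is_ipv4 = [a.version == 4 for a in addresses]
is_ipv6 = [a.version == 6 for a in addresses]

print(is_ipv4)
print(is_ipv6)
```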

In [10]:
s.isna()

0     True
1    False
2    False
dtype: bool
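
The result marks `0.0.0.0` as missing. Assuming, as this output suggests, that an address with integer value zero stands in for a missing value, the check can be sketched in plain Python as follows.

```python
import ipaddress

addresses = [
    ipaddress.ip_address(a)
    for a in ("0.0.0.0", "192.168.1.1", "2001:db8:85a3::8a2e:370:7334")
]

# Assumption (inferred from the output above): integer value 0 means "missing".
is_missing = [int(a) == 0 for a in addresses]

print(is_missing)
```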

## Pandas Methods

Some pandas operations work correctly on IPAddress data.

Indexing:

In [11]:
df.loc[[0, 1], 'B']

0        0.0.0.0
1    192.168.1.1
Name: B, dtype: ip

In [12]:
df.loc[2, 'B']

IPv6Address('2001:db8:85a3::8a2e:370:7334')

In [13]:
df.iloc[1, 1]

IPv4Address('192.168.1.1')
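
Scalar lookups return objects from the standard-library `ipaddress` module, so the usual stdlib properties are available on them. A short sketch, building the two scalars directly rather than pulling them from `df`:

```python
import ipaddress

scalar_v4 = ipaddress.ip_address("192.168.1.1")
scalar_v6 = ipaddress.ip_address("2001:db8:85a3::8a2e:370:7334")

# The concrete class depends on the address family.
kinds = (type(scalar_v4).__name__, type(scalar_v6).__name__)

# Membership tests against a network work on the scalar.
in_lan = scalar_v4 in ipaddress.ip_network("192.168.0.0/16")

# 192.168.0.0/16 is reserved for private use.
private_v4 = scalar_v4.is_private

print(kinds, in_lan, private_v4)
```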

Concatenation:

In [14]:
pd.concat([df, df], ignore_index=True)

     A                             B
0  NaN                       0.0.0.0
1  2.0                   192.168.1.1
2  3.0  2001:db8:85a3::8a2e:370:7334
3  NaN                       0.0.0.0
4  2.0                   192.168.1.1
5  3.0  2001:db8:85a3::8a2e:370:7334


Null checking:

In [15]:
df.isna()

       A      B
0   True   True
1  False  False
2  False  False


Many things don't (yet) work:

In [16]:
df.B >= df.B

0    False
1     True
2     True
Name: B, dtype: bool

In [17]:
df.B.sort_values()

AttributeError: 'IPAddress' object has no attribute 'argsort'
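
Until sorting is supported, one possible workaround is to sort the scalar addresses outside pandas. This is a sketch with the standard library only, not a cyberpandas feature. It sorts on the integer value because stdlib `IPv4Address` and `IPv6Address` objects raise `TypeError` when compared with each other.

```python
import ipaddress

addresses = [
    ipaddress.ip_address(a)
    for a in ("2001:db8:85a3::8a2e:370:7334", "192.168.1.1", "0.0.0.0")
]

# Sort on the integer value so mixed IPv4/IPv6 inputs can be ordered.
ordered = sorted(addresses, key=int)

print([str(a) for a in ordered])
```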

In [18]:
arr = cp.IPAddress([10, 10, 1, 1, 5])

In [19]:
uniques = pd.unique(arr)

AttributeError: 'IPType' object has no attribute 'base'
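
As a stopgap, the raw integers can be deduplicated before they are turned into addresses. The sketch below works on the same values in plain Python and keeps them in order of first appearance. It is a workaround under that assumption, not how `pd.unique` would handle the array.

```python
import ipaddress

raw = [10, 10, 1, 1, 5]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_ints = list(dict.fromkeys(raw))

# Small integers are read as IPv4, so 10 becomes 0.0.0.10.
unique_addresses = [ipaddress.ip_address(i) for i in unique_ints]

print(unique_addresses)
```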

In [22]:
df.fillna(method='bfill')  # (0, 'B') should have been filled

AttributeError: 'IPAddress' object has no attribute 'reshape'

In [23]:
df.groupby("B").A.count()

TypeError: 'IPAddress' object is not callable