`cyberpandas` provides a container, `IPArray`, for holding IP address data.
This array can efficiently process large arrays of IPv4 and IPv6 addresses.

It satisfies pandas' extension array interface, so these arrays can be used inside a pandas Series or DataFrame, just like a regular NumPy array.
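An array like this can be efficient because every element has the same fixed width. The sketch below illustrates that idea; it is not `cyberpandas`' actual implementation. Each address, whether IPv4 or IPv6, is stored as one unsigned 128-bit value split into two 64-bit halves, `hi` and `lo`. Three things here are assumptions made for illustration:

- the 128-bit split itself;
- the rule that values below 2**32 are shown as IPv4, which is inferred from the outputs later on this page;
- the helper names `to_hi_lo` and `from_hi_lo`.

```python
import ipaddress
from typing import Tuple, Union

MASK_64 = (1 << 64) - 1  # selects the low 64 bits


def to_hi_lo(address: str) -> Tuple[int, int]:
    """Hypothetical helper: split an address into (hi, lo) 64-bit halves.

    IPv4 addresses keep their plain 32-bit value, so hi is always 0 for them.
    """
    value = int(ipaddress.ip_address(address))
    return value >> 64, value & MASK_64


def from_hi_lo(hi: int, lo: int) -> Union[ipaddress.IPv4Address, ipaddress.IPv6Address]:
    """Hypothetical helper: rebuild an address object from (hi, lo)."""
    value = (hi << 64) | lo
    # Assumption: anything that fits in 32 bits is displayed as IPv4.
    if value < 2**32:
        return ipaddress.IPv4Address(value)
    return ipaddress.IPv6Address(value)


addresses = ["0.0.0.0", "192.168.1.1", "2001:db8:85a3::8a2e:370:7334"]
pairs = [to_hi_lo(a) for a in addresses]
print(pairs[:2])  # [(0, 0), (0, 3232235777)]
print([str(from_hi_lo(hi, lo)) for hi, lo in pairs])
# ['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334']
```

Because each address reduces to two fixed-size integers, a whole column could live in one contiguous NumPy buffer instead of a list of Python objects. One consequence of the 32-bit display rule is that a small IPv6 value such as `::1` would round-trip as IPv4 in this sketch. Whether `cyberpandas` behaves the same way is not shown here.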

`cyberpandas` also registers the `.ip` accessor on `pandas.Series` for accessing IP-related properties and methods on a Series with IP Address data.

```python
import ipaddress

import numpy as np
import pandas as pd

import cyberpandas
```

## Creating Arrays

An `IPArray` can be created from many sources, including strings, integers, and bytes, with the `cyberpandas.to_ipaddress` parser:

From strings

```python
cyberpandas.to_ipaddress([
    '0.0.0.0',
    '192.168.1.1',
    '2001:0db8:85a3:0000:0000:8a2e:0370:7334'
])
```

```
IPArray(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
```
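The string inputs use ordinary IPv4 and IPv6 notation. The long IPv6 address comes back in compressed form: leading zeros are dropped and the longest run of zero groups becomes `::`. Python's standard `ipaddress` module uses the same canonical form, as this sketch shows. It does not call `cyberpandas`.

```python
import ipaddress

raw = [
    "0.0.0.0",
    "192.168.1.1",
    "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
]

# ip_address() returns an IPv4Address or IPv6Address depending on the text.
parsed = [ipaddress.ip_address(s) for s in raw]

print([a.version for a in parsed])  # [4, 4, 6]

# str() gives the canonical, compressed text form.
print([str(a) for a in parsed])
# ['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334']

# .exploded restores the full, zero-padded IPv6 form.
print(parsed[2].exploded)  # 2001:0db8:85a3:0000:0000:8a2e:0370:7334
```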

From integers

```python
cyberpandas.to_ipaddress([
    0,
    3232235777,
    42540766452641154071740215577757643572
])
```

```
IPArray(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
```
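Each integer is the numeric value of the corresponding address. For IPv4, the four octets are read as a base-256 number; IPv6 works the same way with sixteen octets. A standard-library check of the values used above, again without calling `cyberpandas`:

```python
import ipaddress

# The integer form of an IPv4 address is its four octets read in base 256.
octets = [192, 168, 1, 1]
value = 0
for octet in octets:
    value = value * 256 + octet
print(value)  # 3232235777

# ipaddress converts in both directions.
print(int(ipaddress.IPv4Address("192.168.1.1")))  # 3232235777
print(ipaddress.IPv4Address(3232235777))          # 192.168.1.1

# The same holds for IPv6, with 16 octets instead of 4.
big = 42540766452641154071740215577757643572
print(ipaddress.IPv6Address(big))  # 2001:db8:85a3::8a2e:370:7334
```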

From bytes

```python
cyberpandas.to_ipaddress([
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01',
    b' \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4',
])
```

```
IPArray(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
```
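Each bytes value above is the address written as 16 bytes in big-endian (network) order. The IPv4 entries are zero-padded in their first twelve bytes. This standard-library sketch, independent of `cyberpandas`, decodes the bytes with `int.from_bytes` and checks the IPv6 entry against `IPv6Address.packed`:

```python
import ipaddress

samples = [
    b"\x00" * 16,                          # 0.0.0.0
    b"\x00" * 12 + b"\xc0\xa8\x01\x01",    # 192.168.1.1
    b" \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4",  # 2001:db8:85a3::8a2e:370:7334
]

# Every entry is exactly 16 bytes (128 bits), read in network byte order.
lengths = [len(raw) for raw in samples]
values = [int.from_bytes(raw, "big") for raw in samples]
print(lengths)  # [16, 16, 16]
print(values)   # [0, 3232235777, 42540766452641154071740215577757643572]

# .packed goes the other way: address -> 16 big-endian bytes.
packed = ipaddress.IPv6Address("2001:db8:85a3::8a2e:370:7334").packed
print(packed == samples[2])  # True
```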

From integers again, this time keeping the result as `values` for the examples below:

```python
values = cyberpandas.to_ipaddress([
    0,
    3232235777,
    42540766452641154071740215577757643572
])
values
```

```
IPArray(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
```

## Pandas Integration

An `IPArray` can be stored in pandas containers such as `Series` and `DataFrame`.

```python
s = pd.Series(values)
s
```

```
0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip
```

```python
df = pd.DataFrame({
    "A": [np.nan, 2, 3],
    "B": values
})
df
```

|   | A   | B                            |
|---|-----|------------------------------|
| 0 | NaN | 0.0.0.0                      |
| 1 | 2.0 | 192.168.1.1                  |
| 2 | 3.0 | 2001:db8:85a3::8a2e:370:7334 |

Most pandas methods work with IP address data stored in an `IPArray`.

Indexing

```python
df.loc[[0, 1], 'B']
```

```
0        0.0.0.0
1    192.168.1.1
Name: B, dtype: ip
```

```python
df.loc[2, 'B']
```

```
IPv6Address('2001:db8:85a3::8a2e:370:7334')
```
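As the output shows, scalar access returns the standard library's `IPv6Address` type, so the regular `ipaddress` attributes and operations should apply to it. The sketch below builds the same value directly with `ipaddress`, so it runs without `cyberpandas`, and tries a couple of them:

```python
import ipaddress

# Same value as df.loc[2, 'B'] above, built directly with the standard library.
addr = ipaddress.IPv6Address("2001:db8:85a3::8a2e:370:7334")

print(addr.version)  # 6

# Network membership: 2001:db8::/32 is the IPv6 documentation prefix (RFC 3849).
print(addr in ipaddress.ip_network("2001:db8::/32"))  # True

# IPv4 scalars support the same kind of check.
v4 = ipaddress.IPv4Address("192.168.1.1")
print(v4 in ipaddress.ip_network("192.168.0.0/16"))  # True
```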

Concatenation:

```python
pd.concat([df, df], ignore_index=True)
```

|   | A   | B                            |
|---|-----|------------------------------|
| 0 | NaN | 0.0.0.0                      |
| 1 | 2.0 | 192.168.1.1                  |
| 2 | 3.0 | 2001:db8:85a3::8a2e:370:7334 |
| 3 | NaN | 0.0.0.0                      |
| 4 | 2.0 | 192.168.1.1                  |
| 5 | 3.0 | 2001:db8:85a3::8a2e:370:7334 |

Missing data. `cyberpandas` treats the address `0.0.0.0` as missing, so row 0 of column `B` is flagged here:

```python
df.isna()
```

|   | A     | B     |
|---|-------|-------|
| 0 | True  | True  |
| 1 | False | False |
| 2 | False | False |

```python
df.dropna(subset=['B'])
```

|   | A   | B                            |
|---|-----|------------------------------|
| 1 | 2.0 | 192.168.1.1                  |
| 2 | 3.0 | 2001:db8:85a3::8a2e:370:7334 |

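In the outputs above, `0.0.0.0` is flagged by `isna` and removed by `dropna`. The sketch below shows how sentinel-based missing values can work. It assumes, as those outputs suggest, that the integer value `0` is the missing marker. `isna` and `dropna` here are hypothetical plain-Python stand-ins, not `cyberpandas` or pandas code.

```python
import ipaddress

NA_VALUE = 0  # assumption: the integer 0 (0.0.0.0) marks a missing address

# Addresses stored as plain integers, like column B above.
column = [
    0,                                       # 0.0.0.0, treated as missing
    3232235777,                              # 192.168.1.1
    42540766452641154071740215577757643572,  # 2001:db8:85a3::8a2e:370:7334
]


def isna(values):
    """Hypothetical: flag entries equal to the sentinel."""
    return [v == NA_VALUE for v in values]


def dropna(values):
    """Hypothetical: keep only entries that differ from the sentinel."""
    return [v for v in values if v != NA_VALUE]


print(isna(column))  # [True, False, False]
# ip_address() on an int picks IPv4 for values below 2**32, else IPv6.
print([str(ipaddress.ip_address(v)) for v in dropna(column)])
# ['192.168.1.1', '2001:db8:85a3::8a2e:370:7334']
```

One trade-off of a sentinel is that a genuine `0.0.0.0` cannot be told apart from a missing entry. In exchange, no separate missing-value mask has to be stored.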
## Accessor

`cyberpandas` registers the `.ip` accessor with pandas `Series`.

With this accessor, you can access many IP-address-specific attributes and methods, similar to the `.dt` and `.cat` accessors for datetime and categorical data.

```python
df.B.ip.is_ipv4
```

```
0    1
1    1
2    0
dtype: uint8
```
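The sketch below shows how an `is_ipv4` flag can be computed from the integer representation. It assumes, consistent with the output above, that an address counts as IPv4 when its value fits in 32 bits. The function name is hypothetical, and the 0/1 result mirrors the `uint8` dtype shown. Row 0, the missing `0.0.0.0`, still reports 1, as in the output above.

```python
IPV4_MAX = 2**32 - 1  # largest value that fits in 32 bits

# Integer values of column B: 0.0.0.0, 192.168.1.1, 2001:db8:85a3::8a2e:370:7334
column = [0, 3232235777, 42540766452641154071740215577757643572]


def is_ipv4(values):
    """Hypothetical: 1 if the value fits in 32 bits, else 0."""
    return [1 if v <= IPV4_MAX else 0 for v in values]


print(is_ipv4(column))  # [1, 1, 0]
```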

```python
df.B.ip.isna
```

```
0    1
1    0
2    0
dtype: uint8
```