[cleaner,ipv6] Add support for IPv6 obfuscation
This commit adds a new parser and accompanying map for obfuscating IPv6
addresses.

This new parser attempts to capture valid IPv6 networks and addresses
and produce a mostly randomized obfuscated pair for each. Because an
IPv6 address can take multiple formats, some identifiers are necessary
to preserve relevant information while still obfuscating the actual
addresses and networks.

For example, global unicast addresses that have more than one defined
hextet (a prefix longer than /16) will always generate an obfuscated
address starting with `534f` (the ASCII hex for 'SO', continuing the
style of our MAC address handling, which uses 'sos' as an identifier).
Global addresses with only a single defined hextet (a /16 prefix or
shorter) will instead start with `53` followed by two random hex
characters. Private addresses, which start with `fd`, will generate an
obfuscated address starting with `fd53`, so that the context that this
is a private network/address is retained. Link-local addresses, which
start with `fe80::`, remain that way and only have the device hextets
obfuscated. This again keeps intact the context that the interface is
link-local; otherwise these obfuscations may confuse end users reviewing
an sos report for problems.
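A minimal sketch of the leading-hextet rules above, using only the standard `ipaddress` module. The function name `obfuscated_first_hextet()` and its `used` list are hypothetical; the checks follow the order used by `_obfuscate_network_address()` in the new map (global, then link-local, then private), and `2a00::/16` is just an example of a global network with a single defined hextet. Recent Python releases adjusted `is_private`/`is_global` for some special-purpose ranges, so results for such ranges may vary by version.

```python
import ipaddress
from random import getrandbits


def obfuscated_first_hextet(network, used=None):
    """Return the leading hextet an obfuscated network would start with.

    Hypothetical helper following the rules in this commit message:
    '534f' for global networks with more than one defined hextet, '53xx'
    for global networks with a single defined hextet, 'fd53' for private
    networks, and 'fe80' kept for link-local networks. Returns None for
    anything else, which this sketch leaves unobfuscated.
    """
    if used is None:
        used = ['534f']          # reserved identifier, never reused for 53xx
    net = ipaddress.ip_network(network)
    if net.is_global:
        rest = net.network_address.compressed.split(':')[1:]
        if any(rest):
            return '534f'
        # '53' plus two random hex digits; the leading '53' grows by one
        # for every 256 entries already in ``used``
        start = 53 + len(used) // 256
        candidate = f"{start}{getrandbits(8):02x}"
        while candidate in used:
            candidate = f"{start}{getrandbits(8):02x}"
        used.append(candidate)
        return candidate
    if net.is_link_local:
        return 'fe80'
    if net.is_private:
        return 'fd53'
    return None
```

Only the first hextet is chosen here; in the commit, the remaining defined hextets are randomized separately by `generate_hextets()`.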

Note that the addresses `::1` and `::/0` are explicitly skipped and never
obfuscated, for the same reasons given above.
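The skip behavior can be sketched with the `ignore_matches` patterns from the new `SoSIPv6Map`, which also skip values that are already in obfuscated form (`fd53:` and `53xx:`). The helper `is_ignored()` is hypothetical (the real check lives in `SoSMap.ignore_item()`), and it passes `re.I` as that method now does, so uppercase input is matched too.

```python
import re

# Patterns copied from SoSIPv6Map.ignore_matches in this commit
IGNORE_MATCHES = [r'^::1/.*', r'::/0', r'fd53:.*', r'^53..:']


def is_ignored(item):
    """Return True if ``item`` matches one of the IPv6 skip patterns.

    re.match() only matches at the start of the string, and re.I makes
    the comparison case-insensitive.
    """
    return any(re.match(pattern, item, re.I) for pattern in IGNORE_MATCHES)
```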

Additionally, this parser/map writes data to the default map (and any
per-run private maps) differently than previous parsers. Rather than
simply dumping the obfuscation pairs into the map, the data is organized
by network, with hosts belonging to that network nested inside the
corresponding network entries (still JSON-formatted). Users will also
note that the ipv6 entries in the map have a `version` key, which is
intended for handling future updates to the parser/map when upgrading
from an older sos version to a newer one. This may or may not be carried
over to other parsers in the future.
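To illustrate the nested layout, here is a hypothetical ipv6 section as a Python dict. The `networks`, `obfuscated`, and `hosts` keys mirror what `SoSIPv6Map.conf_update()` reads back; where exactly the `version` key sits and all of the address values are assumptions made for this example.

```python
import json

# Illustrative ipv6 section of a map file. The networks/obfuscated/hosts
# nesting mirrors what conf_update() reads back; the top-level placement of
# 'version' and every address value here are assumptions for the example.
ipv6_section = {
    'version': 1,
    'networks': {
        '2620:52:0:2d80::/64': {
            'obfuscated': '534f:1a2b:3c4d:5e6f::/64',
            'hosts': {
                '2620:52:0:2d80::4fe': '534f:1a2b:3c4d:5e6f::9abc',
            },
        },
    },
}


def flatten(section):
    """Collapse the nested layout back into flat original -> obfuscated
    pairs, the same pairs conf_update() places into the map's dataset."""
    pairs = {}
    for network, entry in section.get('networks', {}).items():
        pairs[network] = entry['obfuscated']
        pairs.update(entry.get('hosts', {}))
    return pairs


# The section is still plain JSON
serialized = json.dumps(ipv6_section, indent=4)
```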

Closes: sosreport#3008
Related: RHBZ#2134906

Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
TurboTurtle committed Nov 30, 2022
1 parent d499922 commit 5306202
Showing 6 changed files with 426 additions and 6 deletions.
4 changes: 2 additions & 2 deletions man/en/sos-clean.1
@@ -60,8 +60,8 @@ Note that using this option is very likely to leave sensitive information in pla
 the target archive, so only use this option when absolutely necessary or you have complete
 trust in the party/parties that may handle the generated report.
 
-Valid values for this option are currently: \fBhostname\fR, \fBip\fR, \fBmac\fR, \fBkeyword\fR,
-and \fBusername\fR.
+Valid values for this option are currently: \fBhostname\fR, \fBip\fR, \fBipv6\fR,
+\fBmac\fR, \fBkeyword\fR, and \fBusername\fR.
 .TP
 .B \-\-keywords KEYWORDS
 Provide a comma-delimited list of keywords to scrub in addition to the default parsers.
11 changes: 8 additions & 3 deletions sos/cleaner/__init__.py
@@ -25,6 +25,7 @@
 from sos.cleaner.parsers.hostname_parser import SoSHostnameParser
 from sos.cleaner.parsers.keyword_parser import SoSKeywordParser
 from sos.cleaner.parsers.username_parser import SoSUsernameParser
+from sos.cleaner.parsers.ipv6_parser import SoSIPv6Parser
 from sos.cleaner.archives.sos import (SoSReportArchive, SoSReportDirectory,
                                       SoSCollectorArchive,
                                       SoSCollectorDirectory)
@@ -54,11 +55,14 @@ class SoSCleaner(SoSComponent):
     that future iterations will maintain the same consistent obfuscation
     pairing.
-    In the case of IP addresses, support is for IPv4 and efforts are made to
-    keep network topology intact so that later analysis is as accurate and
+    In the case of IP addresses, support is for IPv4 and IPv6 - effort is made
+    to keep network topology intact so that later analysis is as accurate and
     easily understandable as possible. If an IP address is encountered that we
     cannot determine the netmask for, a random IP address is used instead.
+    For IPv6, note that IPv4-mapped addresses, e.g. ::ffff:10.11.12.13, are
+    NOT supported currently, and will remain unobfuscated.
     For hostnames, domains are obfuscated as whole units, leaving the TLD in
     place.
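The docstring notes that IPv4-mapped addresses are left unobfuscated. As a sketch (not code from this commit), such addresses can be recognized with the standard library's `IPv6Address.ipv4_mapped` attribute, which is `None` for ordinary IPv6 addresses; `is_ipv4_mapped()` is a hypothetical helper name.

```python
import ipaddress


def is_ipv4_mapped(value):
    """Return True for IPv4-mapped IPv6 addresses such as ::ffff:10.11.12.13.

    Illustrative only: it shows how such addresses can be recognized, not
    how sos handles them.
    """
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False     # not an IP address at all
    return (isinstance(addr, ipaddress.IPv6Address)
            and addr.ipv4_mapped is not None)
```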
@@ -123,6 +127,7 @@ def __init__(self, parser=None, args=None, cmdline=None, in_place=False,
         self.parsers = [
             SoSHostnameParser(self.cleaner_mapping, self.opts.domains),
             SoSIPParser(self.cleaner_mapping),
+            SoSIPv6Parser(self.cleaner_mapping),
             SoSMacParser(self.cleaner_mapping),
             SoSKeywordParser(self.cleaner_mapping, self.opts.keywords,
                              self.opts.keyword_file),
@@ -447,7 +452,7 @@ def compile_mapping_dict(self):
         _map = {}
         for parser in self.parsers:
             _map[parser.map_file_key] = {}
-            _map[parser.map_file_key].update(parser.mapping.dataset)
+            _map[parser.map_file_key].update(parser.get_map_contents())
 
         return _map

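`compile_mapping_dict()` now asks each parser for `get_map_contents()` instead of reading `parser.mapping.dataset` directly, which lets a parser such as the IPv6 one reshape its data before it is written. The parser-side implementations are not shown in this excerpt, so the classes below are hypothetical stand-ins illustrating the hook, assuming a default that returns the flat dataset.

```python
class ExampleMap:
    """Stand-in for a SoSMap: a flat original -> obfuscated dataset."""

    def __init__(self, dataset):
        self.dataset = dataset


class ExampleParser:
    """Hypothetical parser with the assumed default behavior."""
    map_file_key = 'example_map'

    def __init__(self, mapping):
        self.mapping = mapping

    def get_map_contents(self):
        # Default: hand back the flat pairs, as the old code read directly
        return self.mapping.dataset


class ExampleNestedParser(ExampleParser):
    """Hypothetical parser that reshapes its data before it is written."""
    map_file_key = 'example_nested_map'

    def get_map_contents(self):
        return {'version': 1, 'pairs': dict(self.mapping.dataset)}


def compile_mapping_dict(parsers):
    """Same loop as SoSCleaner.compile_mapping_dict(), as a free function."""
    _map = {}
    for parser in parsers:
        _map[parser.map_file_key] = {}
        _map[parser.map_file_key].update(parser.get_map_contents())
    return _map
```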
2 changes: 1 addition & 1 deletion sos/cleaner/mappings/__init__.py
@@ -39,7 +39,7 @@ def ignore_item(self, item):
         if not item or item in self.skip_keys or item in self.dataset.values():
             return True
         for skip in self.ignore_matches:
-            if re.match(skip, item):
+            if re.match(skip, item, re.I):
                 return True
 
     def add(self, item):
282 changes: 282 additions & 0 deletions sos/cleaner/mappings/ipv6_map.py
@@ -0,0 +1,282 @@
# Copyright 2022 Red Hat, Inc. Jake Hunsaker <jhunsake@redhat.com>

# This file is part of the sos project: https://github.com/sosreport/sos
#
# This copyrighted material is made available to anyone wishing to use,
# modify, copy, or redistribute it subject to the terms and conditions of
# version 2 of the GNU General Public License.
#
# See the LICENSE file in the source distribution for further information.

import ipaddress

from random import getrandbits
from sos.cleaner.mappings import SoSMap


def generate_hextets(hextets):
    """Generate a random set of hextets, based on the length of the source
    hextet. If any hextets are compressed, keep that compression.
    E.G. '::1234:bcd' will generate a leading empty '' hextet, followed by two
    4-character hextets.
    :param hextets: The extracted hextets from a source address
    :type hextets: ``list``
    :returns: A set of randomized hextets for use in an obfuscated
              address
    :rtype: ``list``
    """
    return [random_hex(4) if h else '' for h in hextets]


def random_hex(length):
    """Generate a string of size length of random hex characters.
    :param length: The number of characters to generate
    :type length: ``int``
    :returns: A string of ``length`` hex characters
    :rtype: ``str``
    """
    return f"{getrandbits(4*length):0{length}x}"


class SoSIPv6Map(SoSMap):
    """Mapping for IPv6 addresses and networks.
    Much like the IP map handles IPv4 addresses, this map is designed to take
    IPv6 strings and obfuscate them consistently to maintain network topology.
    To do this, addresses will be manipulated by the ipaddress library.
    If an IPv6 address is encountered without a netmask, it is assumed to be a
    /64 address.
    """

    networks = {}

    ignore_matches = [
        r'^::1/.*',
        r'::/0',
        r'fd53:.*',
        r'^53..:'
    ]

    first_hexes = ['534f']

    compile_regexes = False
    version = 1

    def conf_update(self, config):
        """Override the base conf_update() so that we can load the existing
        networks into ObfuscatedIPv6Network() objects for the current run.
        """
        if 'networks' not in config:
            return
        for network in config['networks']:
            _orig = ipaddress.ip_network(network)
            _obfuscated = config['networks'][network]['obfuscated']
            _net = self._get_network(_orig, _obfuscated)
            self.dataset[_net.original_address] = _net.obfuscated_address
            for host in config['networks'][network]['hosts']:
                _ob_host = config['networks'][network]['hosts'][host]
                _net.add_obfuscated_host_address(host, _ob_host)
                self.dataset[host] = _ob_host

    def sanitize_item(self, ipaddr):
        _prefix = ipaddr.split('/')[-1] if '/' in ipaddr else ''
        _ipaddr = ipaddr
        if not _prefix:
            # assume a /64 default per protocol
            _ipaddr += "/64"
        try:
            _addr = ipaddress.ip_network(_ipaddr)
            # ipaddr was an actual network per protocol
            _net = self._get_network(_addr)
            _ipaddr = _net.obfuscated_address
        except ValueError:
            # A ValueError is raised from the ipaddress module when passing
            # an address such as 2620:52:0:2d80::4fe/64, which has host bits
            # '::4fe' set - the /64 is generally interpreted only for network
            # addresses. We use this behavior to properly obfuscate the network
            # before obfuscating a host address within that network
            _addr = ipaddress.ip_network(_ipaddr, strict=False)
            _net = self._get_network(_addr)
            if _net.network_addr not in self.dataset:
                self.dataset[_net.original_address] = _net.obfuscated_address
            # then, get the address within the network
            _hostaddr = ipaddress.ip_address(_ipaddr.split('/')[0])
            _ipaddr = _net.obfuscate_host_address(_hostaddr)

        if _prefix and '/' not in _ipaddr:
            return f"{_ipaddr}/{_prefix}"
        return _ipaddr

    def _get_network(self, address, obfuscated=''):
        """Attempt to find an existing ObfuscatedIPv6Network object from which
        to either find an existing obfuscated match, or create a new one. If
        no such object already exists, create it.
        """
        _addr = address.compressed
        if _addr not in self.networks:
            self.networks[_addr] = ObfuscatedIPv6Network(address, obfuscated,
                                                         self.first_hexes)
        return self.networks[_addr]


class ObfuscatedIPv6Network():
    """An abstraction class that represents a network that is (to be) handled
    by sos.
    Each distinct IPv6 network that we encounter will have a representative
    instance of this class, from which new obfuscated subnets and host
    addresses will be generated.
    This class should be built from an ``ipaddress.IPv6Network`` object. If
    an obfuscation string is not passed, one will be created during init.
    """

    def __init__(self, addr, obfuscation='', used_hexes=None):
        """Basic setup for the obfuscated network. Minor validation on the addr
        used to create the instance, as well as on an optional ``obfuscation``
        which if set, will serve as the obfuscated_network address.
        :param addr: The *un*obfuscated network to be handled
        :type addr: ``ipaddress.IPv6Network``
        :param obfuscation: An optional pre-determined string representation of
                            the obfuscated network address
        :type obfuscation: ``str``
        :param used_hexes: A list of already used hexes for the first hextet
                           of a potential global address obfuscation
        :type used_hexes: ``list``
        """
        if not isinstance(addr, ipaddress.IPv6Network):
            raise Exception('Invalid network: not an IPv6Network object')
        self.addr = addr
        self.prefix = addr.prefixlen
        self.network_addr = addr.network_address.compressed
        self.hosts = {}
        if used_hexes is None:
            self.first_hexes = ['534f']
        else:
            self.first_hexes = used_hexes
        if not obfuscation:
            self._obfuscated_network = self._obfuscate_network_address()
        else:
            if not isinstance(obfuscation, str):
                raise TypeError(f"Pre-determined obfuscated network address "
                                f"must be str, not {type(obfuscation)}")
            self._obfuscated_network = obfuscation.split('/')[0]

    @property
    def obfuscated_address(self):
        return f"{self._obfuscated_network}/{self.prefix}"

    @property
    def original_address(self):
        return self.addr.compressed

    def _obfuscate_network_address(self):
        """Generate the obfuscated pair for the network address. This is
        determined based on the netmask of the network this class was built
        on top of.
        """
        if self.addr.is_global:
            return self._obfuscate_global_address()
        elif self.addr.is_link_local:
            # link-local addresses are always fe80::/64. This is not sensitive
            # in itself, and retaining the information that an address is a
            # link-local address is important for problem analysis, so don't
            # obfuscate this network information.
            return self.network_addr
        elif self.addr.is_private:
            return self._obfuscate_private_address()
        return self.network_addr

    def _obfuscate_global_address(self):
        """Global unicast addresses have a 48-bit global routing prefix and a
        16-bit subnet. We set the global routing prefix to a static
        sos-specific identifier that could never be seen in the wild,
        '534f:'
        We then randomize the subnet hextet.
        """
        _hextets = self.network_addr.split(':')[1:]
        _ob_hex = ['534f']
        if all(not c for c in _hextets):
            # we have only a single defined hextet, e.g. ff00::/64, so we need
            # to not use the standard first-hex identifier or we'll overlap
            # every similar address obfuscation.
            # Set the leading bits to 53, but increment upwards from there for
            # when we exceed 256 networks obfuscated in this manner.
            _start = 53 + (len(self.first_hexes) // 256)
            _ob_hex = f"{_start}{random_hex(2)}"
            while _ob_hex in self.first_hexes:
                # prevent duplicates
                _ob_hex = f"{_start}{random_hex(2)}"
            self.first_hexes.append(_ob_hex)
            _ob_hex = [_ob_hex]
        _ob_hex.extend(generate_hextets(_hextets))
        return ':'.join(_ob_hex)

    def _obfuscate_private_address(self):
        """The first 8 bits will always be 'fd', the next 40 bits are meant
        to be a global ID, followed by 16 bits for the subnet. To keep things
        relatively simple we maintain the first hextet as 'fd53', and then
        randomize any remaining hextets.
        """
        _hextets = self.network_addr.split(':')[1:]
        _ob_hex = ['fd53']
        _ob_hex.extend(generate_hextets(_hextets))
        return ':'.join(_ob_hex)

    def obfuscate_host_address(self, addr):
        """Given an unobfuscated address, generate an obfuscated match for it,
        and save it to this network for tracking during the execution of clean.
        Note: another way to do this would be to convert the obfuscated network
        to bytes, and add a random amount to that based on the number of
        addresses that the network can support and from that new bytes count
        craft a new IPv6 address. This has the advantage of absolutely
        guaranteeing the new address is within the network space (whereas the
        method employed below could *theoretically* generate an overlapping
        address), but would in turn remove any ability to compress obfuscated
        addresses to match the general format/syntax of the address it is
        replacing. For the moment, it is assumed that being able to maintain a
        quick mental note of "unobfuscated device ff00::1 is obfuscated device
        53ad::a1b2" is more desirable than "ff00::1 is now obfuscated as
        53ad::1234:abcd:9876:a1b2:".
        :param addr: The unobfuscated IPv6 address
        :type addr: ``ipaddress.IPv6Address``
        :returns: An obfuscated address within this network
        :rtype: ``str``
        """
        def _generate_address():
            return ''.join([
                self._obfuscated_network,
                ':'.join(generate_hextets(_host.split(':')))
            ])

        if addr.compressed not in self.hosts:
            try:
                _, _host = addr.compressed.split(self.network_addr.rstrip(':'))
            except ValueError:
                # network addr is simply '::'
                _n, _host = addr.compressed.split(self.network_addr)
            _host = _host.lstrip(':')
            _ob_host = _generate_address()
            while _ob_host in self.hosts.values():
                _ob_host = _generate_address()
            self.add_obfuscated_host_address(addr.compressed, _ob_host)
        return self.hosts[addr.compressed]

    def add_obfuscated_host_address(self, host, obfuscated):
        """Adds an obfuscated pair to the class for tracking and ongoing
        consistency in obfuscation.
        """
        self.hosts[host] = obfuscated
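The following is a standalone sketch of how `sanitize_item()` and `obfuscate_host_address()` cooperate: strict parsing separates networks from host addresses, then the host portion is stripped of its network prefix and re-randomized while keeping its compressed shape. The function names `split_item()` and `obfuscate_host()` and the `seen` set are hypothetical, and the obfuscated network `534f:1a2b:3c4d:5e6f::` used below is a made-up value. Like the original, it assumes the network address compresses to a trailing `::`.

```python
import ipaddress
from random import getrandbits


def random_hex(length):
    """Return ``length`` random hex characters, zero-padded."""
    return f"{getrandbits(4 * length):0{length}x}"


def split_item(ipaddr):
    """Return (network, host) for an IPv6 string; host is None for networks.

    A bare address is assumed to be a /64. Strict parsing raises ValueError
    when host bits are set, which marks ``ipaddr`` as a host address.
    """
    text = ipaddr if '/' in ipaddr else f"{ipaddr}/64"
    try:
        return ipaddress.ip_network(text), None
    except ValueError:
        # Raises ValueError again if ``ipaddr`` is not a valid address at all
        net = ipaddress.ip_network(text, strict=False)
        return net, ipaddress.ip_address(text.split('/')[0])


def obfuscate_host(host, net, obfuscated_network, seen):
    """Swap the network part of ``host`` and randomize its host hextets.

    Assumes the network address compresses to a trailing '::' (as a typical
    /64 does, but not the bare '::' network) and that the host part is short
    enough for that '::' to stay valid.
    """
    network_prefix = net.network_address.compressed.rstrip(':')
    _, host_part = host.compressed.split(network_prefix, 1)
    hextets = host_part.lstrip(':').split(':')

    def generate():
        # Empty (compressed) hextets stay empty; the rest become 4 random hex
        return obfuscated_network + ':'.join(
            random_hex(4) if h else '' for h in hextets)

    candidate = generate()
    while candidate in seen:     # never hand out the same obfuscated host twice
        candidate = generate()
    seen.add(candidate)
    return candidate
```

For `2620:52:0:2d80::4fe`, the host part is `4fe`, so the result has the form `534f:1a2b:3c4d:5e6f::xxxx`. Keeping this compressed shape is the trade-off the `obfuscate_host_address()` docstring argues for over a bytes-arithmetic approach.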
