# Serialization

### Insecure Deserialization

A deserialization vulnerability can let an attacker:
- abuse application logic,
- deny service, or
- execute arbitrary code when an object is being deserialized.

> [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)

> Don't compile and run untrusted pull requests.<br>
Don't load pickle files you found on the street.

## Pickle

### Pickling

The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

It is the preferred way to serialize Python objects (compared with the lower-level `marshal` module).

### Pickling Basics

In [102]:
import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

canned_data = pickle.dumps(data)
print(canned_data)

b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01]q\x02(K\x01G@\x00\x00\x00\x00\x00\x00\x00K\x03cbuiltins\ncomplex\nq\x03G@\x10\x00\x00\x00\x00\x00\x00G@\x18\x00\x00\x00\x00\x00\x00\x86q\x04Rq\x05eX\x01\x00\x00\x00bq\x06X\x10\x00\x00\x00character stringq\x07C\x0bbyte stringq\x08\x86q\tX\x01\x00\x00\x00cq\ncbuiltins\nset\nq\x0b]q\x0c(\x89\x88Ne\x85q\rRq\x0eu.'


In [103]:
pickle.loads(canned_data)

{'a': [1, 2.0, 3, (4+6j)],
 'b': ('character string', b'byte string'),
 'c': {False, None, True}}

### Pickle Security

`pickle` security warning at docs.python.org:

> **Warning:** The pickle module **is not secure**. Only unpickle data you trust.<br><br>
It is possible to construct malicious pickle data which will **execute arbitrary code during unpickling**. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

### Pickle Code Execution

A pickle stream can also import modules and call functions:

``` python
import pickle
pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
```
Unpickling this payload runs the shell command `echo hello world`:
``` bash
$ python pickle_system.py
hello world
```
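
To see why this payload runs a command, it can be disassembled without executing it. The sketch below uses the standard-library `pickletools` module; the names `payload`, `ops` and `opcode_names` are chosen here for illustration only.

```python
import pickletools

# Same payload as above; it is only inspected here, never unpickled.
payload = b"cos\nsystem\n(S'echo hello world'\ntR."

# genops() walks the opcode stream without running it.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]
opcode_names = [name for name, _arg in ops]

for name, arg in ops:
    print(name, repr(arg) if arg is not None else "")
```

`GLOBAL` pushes `os.system` onto the unpickler's stack, and `REDUCE` calls it with the tuple built from the string argument. That call is the point where the shell command would run.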

### Secure `pickle` Usage

If you are brave enough and still need to use pickle, here are some techniques that can reduce
the security risks:

1. When loading from disk, ensure strict file permissions.

2. When loading from the network, use a cryptographic signature.
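
The first technique might look roughly like the sketch below. It assumes a POSIX system (`os.getuid` is not available on Windows), and the helper name `load_pickle_if_private` is hypothetical. It checks only the file itself, not the permissions of its parent directories.

```python
import os
import pickle
import stat
import tempfile


def load_pickle_if_private(path):
    """Unpickle `path` only if we own it and nobody else can write to it."""
    with open(path, "rb") as f:
        # fstat the open file so the check and the read use the same file.
        info = os.fstat(f.fileno())
        if info.st_uid != os.getuid():
            raise PermissionError(f"{path} is not owned by the current user")
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            raise PermissionError(f"{path} is writable by group or others")
        return pickle.load(f)


# Usage: write a pickle that only the owner can read and write.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")
    with open(path, "wb") as f:
        pickle.dump({"a": 1}, f)
    os.chmod(path, 0o600)
    loaded = load_pickle_if_private(path)

    # Loosen the permissions and the loader refuses the file.
    os.chmod(path, 0o666)
    try:
        load_pickle_if_private(path)
        rejected = False
    except PermissionError:
        rejected = True

print(loaded, rejected)
```

This only narrows who could have tampered with the file; the pickle is still fully trusted once the check passes.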

### Signing Serialized Data

In [104]:
# pickle some complex object
import pickle
data = pickle.dumps({'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string')})

In [105]:
# assume both parties agreed on some random secret key for the session
import secrets
secret_random_key = secrets.token_bytes(32)

In [106]:
# send the pickled data along with its digest
import hmac
digest = hmac.new(secret_random_key, data, 'sha256').digest()
payload = digest + data
print(payload)

b'R\r\x06\x90\xdav5\x04\xf9\x98.\xd5\x8d\xcd-\x9e\xf3@A\xd9:A\xbfB\xe3\x0e_\x03M50\x83\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01]q\x02(K\x01G@\x00\x00\x00\x00\x00\x00\x00K\x03cbuiltins\ncomplex\nq\x03G@\x10\x00\x00\x00\x00\x00\x00G@\x18\x00\x00\x00\x00\x00\x00\x86q\x04Rq\x05eX\x01\x00\x00\x00bq\x06X\x10\x00\x00\x00character stringq\x07C\x0bbyte stringq\x08\x86q\tu.'


In [107]:
# receive the payload and verify the digest before using the data
def unpickle_payload(payload):
    rcv_digest = payload[:32]
    rcv_data = payload[32:]
    expected_digest = hmac.new(secret_random_key, rcv_data, 'sha256').digest()
    if not secrets.compare_digest(rcv_digest, expected_digest):
        raise Exception('Integrity check failed')
    return pickle.loads(rcv_data)

objects = unpickle_payload(payload)
print(objects)

{'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string')}


In [108]:
# check that unauthorized modification is detected
unpickle_payload(payload + b'hack')

Exception: Integrity check failed

### Restricted Unpickler

Limit the unpickler to 'safe' objects to mitigate the risk of code execution.

> A restricted unpickler does not mitigate all security risks of `pickle.load()`.


In the end this will not fully protect you: a dedicated attacker can often find a way around the restrictions.
Avoid using pickle. Consider JSON or protobuf as safer alternatives.

[python-can-i-safely-unpickle-untrusted-data](https://stackoverflow.com/questions/25353753/python-can-i-safely-unpickle-untrusted-data)

[reverse engineering pickle](https://hackmd.io/@2KUYNtTcQ7WRyTsBT7oePg/BycZwjKNX?print-pdf#/)

In [111]:
import builtins
import pickle

safe_builtins = {
    'range',
    'complex',
    'set',
    'frozenset',
    'slice',
}

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))

In [110]:
import io
def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()

In [112]:
# try loading builtins
restricted_loads(pickle.dumps({'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string')}))

{'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string')}

In [113]:
# try loading dangerous code
restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")

UnpicklingError: global 'os.system' is forbidden
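
Because a restricted unpickler is not a complete defence, this section recommends a data-only format such as JSON. A minimal sketch with the standard-library `json` module follows; the sample data is trimmed to JSON-native types, since complex numbers, bytes and sets have no JSON representation.

```python
import json

# Only JSON-native types are used here.
data = {"a": [1, 2.0, 3], "b": ["character string"], "c": [None, True, False]}

text = json.dumps(data)
restored = json.loads(text)

# Parsing JSON only builds dicts, lists, strings, numbers, booleans and None.
# Input that is not valid JSON is rejected instead of being interpreted.
try:
    json.loads("cos\nsystem\n(S'echo hello world'\ntR.")
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(restored == data, rejected)
```

The trade-off is expressiveness: anything beyond these basic types has to be converted explicitly by the application on both sides.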

### Numpy Loading

NumPy can execute arbitrary code when loading object arrays, because they are stored as pickles.

Prefer using `allow_pickle=False` when saving and loading in NumPy.

In [None]:
import numpy

import io
outfile = io.BytesIO()

numpy.save(outfile, numpy.arange(10), allow_pickle=False)
_ = outfile.seek(0) # Only needed here to simulate closing & reopening file
print(numpy.load(outfile, allow_pickle=False))


### Numpy Loading: CVE-2019-6446 exploit

`allow_pickle` is `True` by default before NumPy version 1.16.3:

In [114]:
import numpy
import io
import pickle
import os

class Test(object):
    def __init__(self):
        self.a = 1

    def __reduce__(self):
        return (os.system,('echo Remote code executed',))
tmpdaa = Test()

numpy.load(io.BytesIO(pickle.dumps(tmpdaa)), allow_pickle=True)


0

## Yaml

### What is YAML?

YAML is a data serialization format.

It is easy to read and edit in plain text, so it is frequently used as a format for configuration files.

### Yaml vs. JSON:

- Yaml:
``` yaml
klocwork_linux:
  os:
    ubuntu18
klocwork_win:
  os:
    windows
```
- JSON:
``` json
{
  "klocwork_linux": {
    "os": "ubuntu18"
  },
  "klocwork_win": {
    "os": "windows"
  }
}
```

TODO: basic types

TODO: advanced python types

### Yaml Safe Loading

Prefer using `yaml.safe_load()` and `yaml.safe_dump()` when loading and saving yaml files.

`yaml.load()` is as powerful as `pickle.load()`.

YAML can construct arbitrary Python objects; `yaml.safe_load()` limits this to
simple Python objects like lists and integers.

In [None]:
import yaml
text = yaml.safe_dump({'a':1})
yaml.safe_load(text)

### Yaml Code Execution

In [146]:
import subprocess
subprocess.check_output('echo EXPLOIT!!!', shell=True)

import yaml
data = b"""
!!python/object/new:type
  args: ["z", !!python/tuple [], {"extend": !!python/name:exec }]
  listitems: "import os; os.system('echo EXPLOIT! >> exploit_yyaml.txt')"
"""
deserialized_data = yaml.load(data, Loader=yaml.FullLoader) # deserializing data


<class 'yaml.constructor.z'>


### Yaml Objects

PyYAML provides hooks and helper classes that let you use its full power safely.

To declare that your object can be loaded safely, inherit it from
`yaml.YAMLObject` and set `yaml_loader = yaml.SafeLoader`.

In [None]:
import yaml

class Monster(yaml.YAMLObject):
    yaml_tag = u'!Monster'
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper
    def __init__(self, name, hp, ac, attacks):
        self.name = name
        self.hp = hp
        self.ac = ac
        self.attacks = attacks
print(yaml.safe_dump(Monster(name='Cave lizard', hp=[3,6], ac=16, attacks=['BITE','HURT'])))
print(yaml.safe_load('!Monster {ac: 16, attacks: [BITE, HURT], hp: [3, 6], name: Cave lizard}'))

### Implicit Yaml Objects

What if you need advanced scalar construction?

``` yaml
tests:
- mark: pytest.mark.xfail
  name: feature_A_exists
- name: feature_B_exists
```

In [147]:
# First check what pytest mark is
import pytest
print(pytest.mark.xfail)
print(pytest.mark.__getattr__('xfail'))

MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={}))
MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={}))


In [126]:
# Now we can register a constructor and a representer:
import re
import yaml
import pytest

def pytest_mark_constructor(loader, node):
    value = loader.construct_scalar(node)
    return pytest.mark.__getattr__(value.rsplit(".", 1)[1])
def pytest_mark_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', u'pytest.mark.%s' % data.name)
yaml.add_constructor(u'!pytest.mark', pytest_mark_constructor, Loader=yaml.SafeLoader)
yaml.add_implicit_resolver(u'!pytest.mark', re.compile(r'^pytest\.mark\.[a-zA-Z]+$'))
yaml.add_representer(type(pytest.mark.xfail), pytest_mark_representer, Dumper=yaml.SafeDumper)
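# try loading a pytest.mark.* scalar
# (the implicit resolver above is registered on the default loaders, not on
# SafeLoader, so safe_load returns a plain string here)
print(yaml.safe_load('{mark: pytest.mark.xfail}'))
print(yaml.safe_load('mark: "pytest.mark.xfail"'))
# check that dumping a pytest.mark.* scalar works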
print(yaml.safe_dump({'mark':pytest.mark.xfail}))
print ({'mark':pytest.mark.xfail})

{'mark': 'pytest.mark.xfail'}
{'mark': 'pytest.mark.xfail'}
mark: pytest.mark.xfail

{'mark': MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={}))}


In [None]:
TODO: add a detailed explanation of constructors, representers, and types