# Files

There are times you want to read data from or save data to a file. In programming, we use libraries that **serialize** data.

> In computing, serialization is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e.g. data streams over computer networks) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. ([Wikipedia ref](https://en.wikipedia.org/wiki/Serialization))

Look at the root of the word: **serial** means one-after-the-other. Basically, we rearrange the data in a program into a flat sequence of bits (in practice, bytes). This makes it easy to write the data to a file.

![](./pics/serial-data.webp)
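To make "one-after-the-other" concrete, here is a minimal sketch that serializes a small dictionary and looks at the result as a flat sequence of byte values. The names `data` and `raw` are only for illustration.

```python
import json

data = {"a": 1}

# json.dumps() produces text; encode() turns that text into raw bytes
raw = json.dumps(data).encode("utf-8")

print(raw)        # b'{"a": 1}'
print(list(raw))  # [123, 34, 97, 34, 58, 32, 49, 125]
```

Each number is one byte, and the bytes sit in a single row, which is exactly what a file or a network connection expects.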

Besides files, serializers can be used to write data to a network connection (TCP or UDP) and send it across the internet.
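As a rough sketch of the network case, the example below sends a serialized dictionary through a pair of connected sockets on the same machine. Using `socket.socketpair()` in place of a real client and server, and reading with a single `recv(4096)`, are simplifying assumptions that only hold for a tiny message like this one.

```python
import json
import socket

message = {"tom": 21, "Salley": [1, 2, 3, 4]}

# socketpair() returns two already-connected sockets, standing in for
# a client and a server without needing a real network
sender, receiver = socket.socketpair()
with sender, receiver:
    # serialize: dict -> JSON text -> bytes
    sender.sendall(json.dumps(message).encode("utf-8"))
    # deserialize: bytes -> JSON text -> dict
    received = json.loads(receiver.recv(4096).decode("utf-8"))

print("Are they the same?", message == received)
```

Real network code usually also needs a way to tell where one message ends and the next begins, which is left out here.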

Here are some simple but useful serializers used to read/write files:

| Type | Pro | Con | Serialize | Deserialize |
|---|---|---|---|---|
| `json` | Used heavily on the internet | No comments in file | `dump()` | `load()` |
| `yaml` | Common for config files and supports comments | Not in the Python standard library (needs PyYAML) | `dump()` | `safe_load()` |
| `pickle` | Binary format, often compact, but not human readable | Python only | `dump()` or `dumps()` | `load()` or `loads()` |

In [1]:
import json
import yaml
import pickle

## JSON

```json
{
    "tom": 21,
    "bob": {
        "a": 1,
        "b": 2
    },
    "Salley": [1, 2, 3, 4]
}
```

In [35]:
j = {}
j["tom"] = 21
j["bob"] = {"a": 1, "b": 2}
j["Salley"] = [1,2,3,4]
print(j)

{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}


In [36]:
s = json.dumps(j) # turn into a JSON string
print(s)
print(type(s))
jj = json.loads(s)
print(jj)
print("Are they the same?", j == jj)

{"tom": 21, "bob": {"a": 1, "b": 2}, "Salley": [1, 2, 3, 4]}
<class 'str'>
{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}
Are they the same? True


In [37]:
# write data to a file ... the "w" mode opens it for writing
with open("test.json","w") as fd:
    json.dump(j, fd)

In [38]:
# show what the test.json file looks like
!!cat test.json

['{"tom": 21, "bob": {"a": 1, "b": 2}, "Salley": [1, 2, 3, 4]}']

In [39]:
with open("test.json","r") as fd:
    jjj = json.load(fd)

print("Are they the same:", j == jjj)
print(jjj)

Are they the same: True
{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}
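The file written above is a single line, while the JSON sample at the top of this section is indented. A small sketch of how that layout can be produced with the `indent` argument of `json.dumps()` (the same argument works for `json.dump()`):

```python
import json

j = {"tom": 21, "bob": {"a": 1, "b": 2}, "Salley": [1, 2, 3, 4]}

# indent=4 puts every key and list item on its own line
pretty = json.dumps(j, indent=4)
print(pretty)

# the extra whitespace does not change the data
print("Are they the same?", j == json.loads(pretty))
```

The indented form is larger on disk but much easier to read and edit by hand.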


## YAML

```yaml
# this is a comment and allows us to remove things
# from a file without deleting them
tom: 21
# Mark: 1234 this line will not be read
bob:
    a: 1
    b: 2
Salley:
    - 1
    - 2
    - 3
    - 4

```

In [40]:
print(j)
s = yaml.dump(j) # turn into a YAML string
print(s)
print(type(s))
jj = yaml.safe_load(s)
print(jj)
print("Are they the same?", j == jj)

{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}
Salley:
- 1
- 2
- 3
- 4
bob:
  a: 1
  b: 2
tom: 21

<class 'str'>
{'Salley': [1, 2, 3, 4], 'bob': {'a': 1, 'b': 2}, 'tom': 21}
Are they the same? True


In [41]:
with open("test.yaml","w") as fd:
    yaml.dump(j, fd)

In [42]:
!!cat test.yaml

['Salley:', '- 1', '- 2', '- 3', '- 4', 'bob:', '  a: 1', '  b: 2', 'tom: 21']

In [43]:
with open("test.yaml","r") as fd:
    jjj = yaml.safe_load(fd)

print("Are they the same:", j == jjj)
print(jjj)

Are they the same: True
{'Salley': [1, 2, 3, 4], 'bob': {'a': 1, 'b': 2}, 'tom': 21}
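Two details from this section can be checked with a short sketch: comments are skipped when YAML is loaded, and `yaml.dump()` sorts keys alphabetically by default, which is why `Salley` comes first in the output above. The `sort_keys=False` argument used here assumes PyYAML 5.1 or newer.

```python
import yaml

text = """
# this is a comment
tom: 21
# Mark: 1234 this line will not be read
bob:
    a: 1
    b: 2
"""

data = yaml.safe_load(text)
print(data)                      # the commented-out "Mark" entry is absent
print("Mark" in data)

# keep the original key order instead of sorting alphabetically
# (assumes PyYAML >= 5.1, where sort_keys was added)
unsorted = yaml.dump(data, sort_keys=False)
print(unsorted)
```

Note that comments are lost on a round trip: loading a file and dumping it again writes only the data.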


## Pickle

Unlike the two previous serializers (json and yaml), this produces a binary result.

```python
# example of pickled data, this is a binary string
b'\x80\x04\x958\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03tom\x94K\x15\x8c\x03bob\x94}\x94(\x8c\x01a\x94K\x01\x8c\x01b\x94K\x02u\x8c\x06Salley\x94]\x94(K\x01K\x02K\x03K\x04eu.'
```

We say this is _not_ human readable because you need to convert the bytes into letters, numbers, or other data before you can make sense of them. Below is an example of a binary dump. The left side is the binary data and the right side shows whatever can be put into a format a human can understand.

![](./pics/binary.jpg)
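A dump like the one pictured can be approximated in a few lines. This is a minimal sketch, not a replacement for a real hex viewer; `hexdump` is a hypothetical helper written for this example.

```python
import pickle

def hexdump(raw, width=16):
    """Return lines with an offset, hex bytes, and printable characters."""
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # printable ASCII is shown as-is, everything else as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return lines

raw = pickle.dumps({"tom": 21})
for line in hexdump(raw):
    print(line)
```

The keys (such as `tom`) show up as text on the right, while the bytes that describe structure and numbers appear as dots.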

In [45]:
print(j)
s = pickle.dumps(j) # turn into pickled bytes
print(s)
print(type(s))
jj = pickle.loads(s)
print(jj)
print("Are they the same?", j == jj)

{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}
b'\x80\x04\x958\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03tom\x94K\x15\x8c\x03bob\x94}\x94(\x8c\x01a\x94K\x01\x8c\x01b\x94K\x02u\x8c\x06Salley\x94]\x94(K\x01K\x02K\x03K\x04eu.'
<class 'bytes'>
{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}
Are they the same? True


In [47]:
# since this is binary, notice the write is now "wb" where
# the 'b' is for binary
with open("test.pkl","wb") as fd:
    pickle.dump(j, fd)

In [48]:
!!cat test.pkl

['�\x04�8\x00\x00\x00\x00\x00\x00\x00}�(�\x03tom�K\x15�\x03bob�}�(�\x01a�K\x01�\x01b�K\x02u�\x06Salley�]�(K\x01K\x02K\x03K\x04eu.']

In [51]:
with open("test.pkl","rb") as fd:
    jjj = pickle.load(fd)

print("Are they the same:", j == jjj)
print(jjj)

Are they the same: True
{'tom': 21, 'bob': {'a': 1, 'b': 2}, 'Salley': [1, 2, 3, 4]}
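The "Python only" trade-off cuts both ways: because pickle only targets Python, it can store Python-specific types that JSON has no equivalent for. A small sketch, where `record` is made-up example data:

```python
import json
import pickle

# a tuple and a set are Python types with no direct JSON equivalent
record = {"point": (1, 2), "tags": {"a", "b"}}

restored = pickle.loads(pickle.dumps(record))
print("Are they the same?", record == restored)
print(type(restored["point"]), type(restored["tags"]))

# json refuses the set outright
try:
    json.dumps(record)
    json_failed = False
except TypeError as err:
    json_failed = True
    print("json error:", err)
```

One caution from the Python documentation: unpickling can run arbitrary code, so only load pickle files that come from a source you trust.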
