# Data Serialization 

**[What is data serialization?](https://docs.python-guide.org/)**

Data serialization is the process of converting structured data into a format that can be shared or stored and later restored to its original structure. In some cases it is also intended to minimize data size, which reduces disk space or bandwidth requirements.

## Flat or Nested
To begin with, we need to identify how the data should be structured, i.e. flat or nested. 

In [3]:
# Flat
flat = {'Type': 'A', 'field_1': 'value_1', 'field_2': 'value_2'}

# Nested
nested = {'A':
    {'field_1': 'value_1', 'field_2': 'value_2'}}
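
The two layouts carry the same information, so one can be converted into the other. The following is a minimal sketch, assuming the outer key of the nested form plays the role of the `Type` field in the flat form. The helper names `to_flat` and `to_nested` are hypothetical and exist only for this illustration.

```python
def to_flat(nested):
    """Turn {'A': {...fields...}} into a list of flat records with a 'Type' key."""
    return [{'Type': type_name, **fields} for type_name, fields in nested.items()]


def to_nested(flat_records):
    """Inverse of to_flat: group each record's fields under its 'Type' value."""
    result = {}
    for record in flat_records:
        fields = {key: value for key, value in record.items() if key != 'Type'}
        result[record['Type']] = fields
    return result


nested = {'A': {'field_1': 'value_1', 'field_2': 'value_2'}}
flat_records = to_flat(nested)

print(flat_records)
print(to_nested(flat_records) == nested)
```

Flat records map naturally onto row-based formats such as CSV, while the nested form suits YAML, JSON, and XML.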

# Serializing Text
## CSV file - with flat data 


In [None]:
# Writing CSV content to a file
import csv

# Example data; any iterable of rows works
rows = [['Type', 'field_1', 'field_2'],
        ['A', 'value_1', 'value_2']]
with open('/tmp/file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Reading CSV content from a file
with open('/tmp/file.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
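
When flat records are stored as dictionaries, `csv.DictWriter` and `csv.DictReader` keep the field names attached to each row. The sketch below assumes two made-up records and uses an in-memory `io.StringIO` buffer instead of a file on disk, so it runs without touching `/tmp`.

```python
import csv
import io

# Hypothetical flat records, one dictionary per row
records = [
    {'Type': 'A', 'field_1': 'value_1', 'field_2': 'value_2'},
    {'Type': 'B', 'field_1': 'value_3', 'field_2': 'value_4'},
]

# Write a header row followed by the records into an in-memory text buffer
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['Type', 'field_1', 'field_2'])
writer.writeheader()
writer.writerows(records)

# Rewind the buffer and read the rows back as dictionaries
buffer.seek(0)
restored = [dict(row) for row in csv.DictReader(buffer)]

print(restored == records)
```

CSV stores every value as text, so numeric fields would come back as strings and need an explicit conversion after reading.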

## YAML file - with nested data 
YAML is a human-readable data-serialization language, commonly used for configuration files and in applications where data is stored or transmitted.

In [6]:
# Reading YAML content
import yaml 
with open('/tmp/file.yaml', 'r', newline='') as f:
    try: 
        # safe_load parses plain YAML only; yaml.load() requires an explicit Loader in PyYAML 6+
        print(yaml.safe_load(f))
    except yaml.YAMLError as ymlexcp: 
        print(ymlexcp)

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/file.yaml'

## JSON file - with nested data 
Python's built-in `json` module can read and write JSON files.


In [None]:
# Reading
import json 
with open('/tmp/file.json', 'r') as f: 
    data = json.load(f)
    
# Writing
import json
with open('/tmp/file.json', 'w') as f: 
    json.dump(data, f, sort_keys=True)
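
The same round trip works on strings through `json.dumps` and `json.loads`, which is convenient when the data travels over a network rather than through a file. This is a small sketch with made-up data; the `scores` key is a hypothetical addition that shows a Python tuple coming back as a list, since JSON has only one array type.

```python
import json

# Nested example data; 'scores' is a tuple on the Python side
nested = {'A': {'field_1': 'value_1', 'field_2': 'value_2'}, 'scores': (1, 2, 3)}

# Serialize to a JSON string with stable key order and readable indentation
text = json.dumps(nested, sort_keys=True, indent=2)

# Deserialize the string back into Python objects
restored = json.loads(text)

print(restored['A'] == nested['A'])
print(type(restored['scores']).__name__)
```

Nested dictionaries survive the round trip unchanged, but types without a JSON equivalent are either converted (tuples) or rejected with a `TypeError` (for example sets).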

## XML file - with nested data
XML (eXtensible Markup Language) is a markup language similar to HTML. It was designed to store and transport data and to be self-descriptive.

In [None]:
import xml.etree.ElementTree as ET 
tree = ET.parse('country_data.xml')
root = tree.getroot()
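
Once the root element is available, child elements, attributes, and text can be read from it. The sketch below parses an inline string with `ET.fromstring` so that it runs without a file; the sample document is a made-up stand-in for `country_data.xml`, and its element and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for country_data.xml
xml_text = """
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
    </country>
    <country name="Singapore">
        <rank>4</rank>
    </country>
</data>
""".strip()

root = ET.fromstring(xml_text)

# Map each country's 'name' attribute to the integer value of its <rank> child
ranks = {country.get('name'): int(country.find('rank').text)
         for country in root.findall('country')}

print(root.tag)
print(ranks)
```

Element text is always a string, so numeric values such as the rank need an explicit conversion.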

# Binary 
## NumPy Array - flat data 


In [7]:
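import numpy as np 

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Converting NumPy array to byte format (dtype and shape are not stored)
byte_output = arr.tobytes()

# Converting byte format back to NumPy array; frombuffer defaults to float64,
# so pass the original dtype and restore the shape explicitly
array_format = np.frombuffer(byte_output, dtype=arr.dtype).reshape(arr.shape)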

## Pickle - nested data 
Pickle is Python's native data serialization module. It converts Python objects (list, dict, etc.) into a byte stream that contains all the information needed to reconstruct the object in another Python script.

In [8]:
import pickle 

# Dictionary 
grades = {'Addy': 90, "Bill":99, 'Cathy': 80}

# Use dumps to convert the object to a serialized bytes object
serial_grades = pickle.dumps(grades)

# Use loads to de-serialize an object 
received_grades = pickle.loads(serial_grades)


In [9]:
grades

{'Addy': 90, 'Bill': 99, 'Cathy': 80}

In [10]:
serial_grades

b'\x80\x03}q\x00(X\x04\x00\x00\x00Addyq\x01KZX\x04\x00\x00\x00Billq\x02KcX\x05\x00\x00\x00Cathyq\x03KPu.'

In [14]:
received_grades == grades

True
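
To share a pickled object with another script, it is usually written to a file with `pickle.dump` and read back with `pickle.load`. The following is a minimal sketch that assumes a temporary directory and a hypothetical file name `grades.pkl`; note the binary modes `'wb'` and `'rb'`.

```python
import os
import pickle
import tempfile

grades = {'Addy': 90, 'Bill': 99, 'Cathy': 80}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'grades.pkl')  # hypothetical file name

    # Pickle writes bytes, so the file is opened in binary mode
    with open(path, 'wb') as f:
        pickle.dump(grades, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Read the bytes back and rebuild the dictionary
    with open(path, 'rb') as f:
        restored_grades = pickle.load(f)

print(restored_grades == grades)
```

Only unpickle data from a trusted source, because loading a pickle can execute arbitrary code.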