# Advanced Python - Building Scalable Applications

### Module 5

#### Marshalling and Data Persistence (contd.)
 - Using ```dill``` for session replication, suspend-resume semantics
 - Parsing common file formats: ```CSV```, ```JSON```, ```YAML```, ```TOML``` and ```MsgPack```
 - Parsing HTML and XML using ```lxml```
 - Using ```shelve``` for persistent dictionaries
 
#### Profiling and Debugging Techniques in Python
 - Using `sys.getsizeof()`, `sys.getrefcount()`, `sys.getswitchinterval()`
 - Memory profiling using `pympler` and `memray`
 - Using `cProfile` and `timeit` modules
 - Using `line_profiler`
 - Using `inspect` and `pdb`
 - Using the `logging` module
 - An overview on `py-spy` and `scalene`





Install using the command: `pip install dill`
```
import dill
dill.dump_session("session.dat")  # saves the interpreter session (the state of __main__) to session.dat
dill.load_session("session.dat")  # restores that state from session.dat
```

In [1]:
!pip install orjson



In [2]:
import orjson as json

In [5]:
# orjson.dumps returns bytes (not str); formatting is controlled through option flags
json.dumps({"name": "Alice", "age": 30}, option=json.OPT_INDENT_2)

b'{\n  "name": "Alice",\n  "age": 30\n}'

In [4]:
json.dumps?

Signature: json.dumps(obj, /, default=None, option=None)
Docstring: Serialize Python objects to JSON.
Type:      builtin_function_or_method

In [7]:
import json

In [11]:
print(json.dumps({"name": "Alice", "age": 30}, indent=4))

{
    "name": "Alice",
    "age": 30
}


In [12]:
json.dumps({10, 20, 30, 40, 50})

TypeError: Object of type set is not JSON serializable

The json library can serialize the following Python objects:

  - None -> null
  - bool (True / False) -> true / false
  - int, float -> JSON Number
  - str -> JSON String
  - tuple -> JSON Array
  - list -> JSON Array
  - dict -> JSON Object
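Types outside this list, such as the `set` that raised the `TypeError` above, can still be serialized by passing a fallback function through the `default` parameter of `json.dumps`. The sketch below is one possible approach; `encode_extra` is a hypothetical helper name, and converting a set to a sorted list is an assumption about the representation you want.

```python
import json

def encode_extra(obj):
    """Fallback called by json.dumps for objects it cannot serialize natively.

    Hypothetical helper: sets are written out as sorted JSON arrays.
    """
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # sets are unordered, so sort for stable output
    # Anything else is still rejected, mirroring the default behaviour
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoded = json.dumps({10, 20, 30, 40, 50}, default=encode_extra)
print(encoded)  # [10, 20, 30, 40, 50]
```

Note that the set information is lost on the way back: `json.loads` returns a plain list.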

In [15]:
import json

d = {
    "name": "Alice",
    "age": 30,
    "visited": ("New York", "Los Angeles", "Chicago"),
    "is_active": True,
    "last_login": None,
}

print(json.dumps(d, indent=4))

{
    "name": "Alice",
    "age": 30,
    "visited": [
        "New York",
        "Los Angeles",
        "Chicago"
    ],
    "is_active": true,
    "last_login": null
}
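Because JSON has only one array type and only string keys, a round trip through `json.dumps` and `json.loads` does not always give back the object you started with. A small sketch with a made-up dictionary (the name `original` and its contents are only for illustration):

```python
import json

# Hypothetical data: a tuple value and an integer key
original = {"visited": ("New York", "Chicago"), 1: "one"}

restored = json.loads(json.dumps(original))

print(restored)              # {'visited': ['New York', 'Chicago'], '1': 'one'}
print(restored == original)  # False: the tuple became a list, the key 1 became '1'
```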


In [16]:
import msgpack  # pip install msgpack

d = {
    "name": "Alice",
    "age": 30,
    "visited": ("New York", "Los Angeles", "Chicago"),
    "is_active": True,
    "last_login": None,
}

print(msgpack.dumps(d))

b'\x85\xa4name\xa5Alice\xa3age\x1e\xa7visited\x93\xa8New York\xabLos Angeles\xa7Chicago\xa9is_active\xc3\xaalast_login\xc0'


In [17]:
m = b'\x85\xa4name\xa5Alice\xa3age\x1e\xa7visited\x93\xa8New York\xabLos Angeles\xa7Chicago\xa9is_active\xc3\xaalast_login\xc0'
print(msgpack.loads(m))

{'name': 'Alice', 'age': 30, 'visited': ['New York', 'Los Angeles', 'Chicago'], 'is_active': True, 'last_login': None}


In [1]:
ls

Day_5_Notebook.ipynb
config.toml
cpu_hog.py
mem_hog.py
memray-flamegraph-mem_hog.py.55777.html
memray-mem_hog.py.55777.bin
session1
shelve_examples/
xml_html_parsing/


In [None]:
import tomllib 
# The tomllib module is available in Python 3.11 and later
# The tomllib module only supports reading TOML files, not writing them

with open("config.toml", "rb") as f:
    config = tomllib.load(f)

print(config)

{'user': {'name': 'John', 'visited': ['Chennai', 'Bengaluru', 'Noida'], 'active': False}, 'info': {'roles': 'user', 'place': 'Delhi'}}


In [1]:
pip install toml

Note: you may need to restart the kernel to use updated packages.


In [2]:
import toml # Supports both reading and writing TOML files

config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "username": "admin",
        "password": "secret"
    },
    "logging": {
        "level": "INFO",
        "file": "app.log"
    },
    "session": {
        "timeout": 30,
        "secure": True
    }
}

with open("server_config.toml", "w") as outs:
    toml.dump(config, outs)
    print("TOML configuration has been written to server_config.toml")

TOML configuration has been written to server_config.toml


In [3]:
with open("server_config.toml") as f:
    config = toml.load(f)
    print(config)
    

{'database': {'host': 'localhost', 'port': 5432, 'username': 'admin', 'password': 'secret'}, 'logging': {'level': 'INFO', 'file': 'app.log'}, 'session': {'timeout': 30, 'secure': True}}


In [8]:
with open("../.git/config") as f:
    config = toml.load(f)
    print(config)

TomlDecodeError: Invalid group name 'remote "origin"'. Try quoting it. (line 8 column 1 char 137)
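The error is expected: `.git/config` is an INI-style file, not TOML, so a header such as `[remote "origin"]` is not a valid TOML table name. One way to read simple files of this kind is the standard-library `configparser` module. The sketch below uses a hypothetical sample rather than the real `../.git/config`, and `configparser` is not a complete Git config parser (Git's quoting rules and repeated keys, for instance, are not handled the way Git handles them).

```python
import configparser

# Hypothetical sample in the style of a .git/config file (not the real one)
GIT_CONFIG = (
    "[core]\n"
    "\trepositoryformatversion = 0\n"
    "\tbare = false\n"
    '[remote "origin"]\n'
    "\turl = https://example.com/repo.git\n"
    "\tfetch = +refs/heads/*:refs/remotes/origin/*\n"
)

# interpolation=None keeps any '%' characters in values literal;
# delimiters=("=",) stops ':' inside values from being treated as a separator
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.read_string(GIT_CONFIG)

print(parser.sections())                  # ['core', 'remote "origin"']
print(parser['remote "origin"']["url"])   # https://example.com/repo.git
print(parser.getboolean("core", "bare"))  # False
```

The section name keeps its embedded quotes, so it has to be looked up as `'remote "origin"'`.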

In [9]:
toml.load?

Signature: toml.load(f, _dict=<class 'dict'>, decoder=None)
Docstring:
Parses named file or files as toml and returns a dictionary

Args:
    f: Path to the file to open, array of files to read into single dict
       or a file descriptor
    _dict: (optional) Specifies the class of the returned toml dictionary
    decoder: The decoder to use

Returns:
    Parsed toml file represented as a dictionary

Raises:
    TypeError -- When f is invalid type
    TomlDecodeError: Error while decoding toml
    IOError / FileNotFoundError -- When an array with no valid (existing)
    (Python 2 / Python 3)          file paths is passed
File:      /opt/anaconda3/envs/myenv/lib/python3.14/site-packages/toml/decoder.py
Type:      function

In [10]:
pip install pyyaml

Note: you may need to restart the kernel to use updated packages.


In [12]:
import yaml

In [14]:
with open("server_config.yaml", "w") as outs:
    yaml.dump(config, outs)  # keys are written in sorted order unless sort_keys=False is passed
    print("YAML configuration has been written to server_config.yaml")

YAML configuration has been written to server_config.yaml


In [16]:
with open("server_config.yaml") as ins:
    cfg = yaml.safe_load(ins)  # safe_load builds only plain Python types; prefer it for untrusted files
    print(cfg)

{'database': {'host': 'localhost', 'password': 'secret', 'port': 5432, 'username': 'admin'}, 'logging': {'file': 'app.log', 'level': 'INFO'}, 'session': {'secure': True, 'timeout': 30}}
