<span>
<img src="http://ash.readthedocs.io/en/latest/_static/ash.png" width="260px" align="right"/>
</span>
<span>
<b>Author:</b> <a href="https://andreafailla.github.io">Andrea Failla</a><br/>
<b>Python version:</b>  3.9<br/>
<b>ASH version:</b>  0.1.0<br/>
<b>Last update:</b> July 2025
</span>

<a id="attributed-stream-hypergraphs-ash"></a>
# Attributed Stream Hypergraphs (ASH) - IO & Serialization

In [13]:
#!pip install -e ../

<a id="table-of-contents"></a>
# Table of Contents

- [Introduction](#introduction)
- [Creating an Example ASH](#creating-example-ash)
- [Writing Node Profiles to CSV](#writing-profiles-csv)
- [Reading Node Profiles from CSV](#reading-profiles-csv)
- [Profiles JSONL (read/write + gzip)](#profiles-jsonl)
- [Hyperedges CSV (structure only)](#hyperedges-csv)
- [Full ASH JSON (read/write + gzip)](#ash-json)
- [HIF Format (Hypergraph Interchange Format)](#hif-format)
- [Integrity & Round-Trip Checks](#integrity-checks)
- [Best Practices & Tips](#best-practices)


<a id="introduction"></a>
## Introduction

This tutorial covers persistence and interchange of ASH objects and their related data (node profiles and hyperedges).
You will learn how to:

1. Export/import time-varying node profiles (CSV & JSONL)
2. Export/import timestamped hyperedges (CSV)
3. Serialize/deserialize a complete ASH (JSON, optional gzip)
4. Use the HIF (Hypergraph Interchange Format) exporter/importer for interoperability
5. Perform integrity checks after round-trips

[🔝 To top](#table-of-contents)

<a id="creating-example-ash"></a>
## Creating an Example ASH

We'll build a small ASH with a few nodes, temporal node attributes, and hyperedges.
This instance will be serialized using the different formats.

[🔝 To top](#table-of-contents)

In [1]:
from ash_model import ASH, NProfile
from ash_model.readwrite import (
    write_profiles_to_csv, read_profiles_from_csv,
    write_profiles_to_jsonl, read_profiles_from_jsonl,
    write_sh_to_csv, read_sh_from_csv,
    write_ash_to_json, read_ash_from_json,
    write_hif, read_hif
)
import os, gzip, json, tempfile, shutil

ash = ASH()
# Node temporal attributes
ash.add_node(1, start=0, end=2, attr_dict={"role": "admin", "score": 10})
ash.add_node(1, start=3, attr_dict={"role": "admin", "score": 12})
ash.add_node(2, start=0, end=3, attr_dict={"role": "user", "score": 5})
ash.add_node(3, start=1, end=3, attr_dict={"role": "guest", "score": 7})

# Hyperedges with attributes & temporal spans
ash.add_hyperedge([1,2], start=0, end=1, weight=1.0, kind="pair")
ash.add_hyperedge([2,3], start=1, end=2, weight=2.0, kind="pair")
ash.add_hyperedge([1,2,3], start=2, end=3, weight=3.0, kind="group")

print(f"Nodes: {list(ash.nodes())}")
print(f"Hyperedges: {list(ash.hyperedges())}")

# Temp working directory for artifacts
workdir = tempfile.mkdtemp(prefix="ash_io_demo_")
workdir

Nodes: [1, 2, 3]
Hyperedges: ['e3', 'e1', 'e2']


'/var/folders/ym/_r1d94191y5bkvmtmz9tv7cw0000gn/T/ash_io_demo_ys24r7li'

<a id="writing-profiles-csv"></a>
## Writing Node Profiles to CSV

`write_profiles_to_csv` exports one row per (node, timestamp) pair with all node attributes.
The header row is `node_id,tid` followed by the attribute names. Values are stored as plain text, so types are inferred again when reading.

[🔝 To top](#table-of-contents)

In [2]:
profiles_csv = os.path.join(workdir, "profiles.csv")
write_profiles_to_csv(ash, profiles_csv)
print(open(profiles_csv).read())

node_id,tid,role,score
1,0,admin,10
1,1,admin,10
1,2,admin,10
1,3,admin,12
2,0,user,5
2,1,user,5
2,2,user,5
2,3,user,5
3,1,guest,7
3,2,guest,7
3,3,guest,7



<a id="reading-profiles-csv"></a>
## Reading Node Profiles from CSV

`read_profiles_from_csv` rebuilds a dict: `{node: {tid: NProfile}}` with type inference.

[🔝 To top](#table-of-contents)

In [3]:
profiles_dict = read_profiles_from_csv(profiles_csv)
# Show reconstructed profile for node 1 over time
for tid, prof in profiles_dict[1].items():
    print(tid, prof.get_attributes())

0 {'role': 'admin', 'score': 10}
1 {'role': 'admin', 'score': 10}
2 {'role': 'admin', 'score': 10}
3 {'role': 'admin', 'score': 12}
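
The type inference mentioned above can be pictured with a small stdlib-only sketch. It does not use `ash_model`: `infer_type` and `parse_profiles_csv` are hypothetical helpers, the result holds plain attribute dicts rather than `NProfile` objects, and the actual rules inside `read_profiles_from_csv` may differ. The only thing taken from the tutorial is the CSV layout (`node_id,tid` followed by attribute columns).

```python
import csv
import io


def infer_type(raw):
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_profiles_csv(text):
    """Return {node_id: {tid: {attr: value}}} from profile CSV text."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(text)):
        node = infer_type(row.pop("node_id"))
        tid = infer_type(row.pop("tid"))
        attrs = {name: infer_type(value) for name, value in row.items()}
        profiles.setdefault(node, {})[tid] = attrs
    return profiles


sample = "node_id,tid,role,score\n1,2,admin,10\n1,3,admin,12\n2,0,user,5\n"
parsed = parse_profiles_csv(sample)
print(parsed[1][3])
```

In this sketch `score` comes back as an integer while `role` stays a string, because casting falls back to the raw text whenever a numeric conversion fails.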


<a id="profiles-jsonl"></a>
## Profiles JSONL (read/write + gzip)

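`write_profiles_to_jsonl` writes one JSON object per line, one for each (node, tid) pair. Pass `compress=True` to gzip the output.
Because every line is a self-contained record, the file can be processed line by line, which makes this format a good fit for large datasets.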

[🔝 To top](#table-of-contents)

In [4]:
profiles_jsonl = os.path.join(workdir, "profiles.jsonl")
write_profiles_to_jsonl(ash, profiles_jsonl)
print("Sample lines:")
print('\n'.join(open(profiles_jsonl).read().splitlines()[:3]))

# Gzip compressed
profiles_jsonl_gz = profiles_jsonl + ".gz"
write_profiles_to_jsonl(ash, profiles_jsonl_gz, compress=True)
with gzip.open(profiles_jsonl_gz, 'rt') as f:
    print("Compressed first line:", f.readline().strip())

# Read back
jsonl_profiles = read_profiles_from_jsonl(profiles_jsonl)
jsonl_profiles_gz = read_profiles_from_jsonl(profiles_jsonl_gz, compress=True)
print("Node 2 tids (plain):", list(jsonl_profiles[2].keys()))
print("Node 2 tids (gzip):", list(jsonl_profiles_gz[2].keys()))

Sample lines:
{"node_id": 1, "attrs": {"role": "admin", "score": 10}, "tid": 0}
{"node_id": 1, "attrs": {"role": "admin", "score": 10}, "tid": 1}
{"node_id": 1, "attrs": {"role": "admin", "score": 10}, "tid": 2}
Compressed first line: {"node_id": 1, "attrs": {"role": "admin", "score": 10}, "tid": 0}
Node 2 tids (plain): [0, 1, 2, 3]
Node 2 tids (gzip): [0, 1, 2, 3]
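
To make the streaming-friendly property concrete, here is a minimal sketch that filters a gzipped JSONL file one line at a time using only the standard library. The record layout is copied from the sample lines above; `iter_profile_records` is a hypothetical helper, not part of `ash_model`.

```python
import gzip
import json
import os
import tempfile


def iter_profile_records(path, compress=False):
    """Yield one decoded record per line without loading the whole file."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


records = [
    {"node_id": 1, "attrs": {"role": "admin", "score": 10}, "tid": 0},
    {"node_id": 1, "attrs": {"role": "admin", "score": 12}, "tid": 3},
]

with tempfile.TemporaryDirectory(prefix="jsonl_sketch_") as tmpdir:
    path = os.path.join(tmpdir, "profiles.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")

    # Full read-back, then a filter applied while streaming.
    loaded = list(iter_profile_records(path, compress=True))
    high = [r["tid"] for r in iter_profile_records(path, compress=True)
            if r["attrs"]["score"] > 10]

print(high)
```

Since the generator holds only one record in memory at a time, the same loop works unchanged on files far larger than this toy example.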


<a id="hyperedges-csv"></a>
## Hyperedges CSV (structure only)

`write_sh_to_csv` exports each hyperedge presence interval as a row: `n1,n2,...\tstart,end` (comma-separated node IDs, a tab, then the interval).
Attributes are NOT preserved, so use this format for lightweight dumps of the temporal structure.

[🔝 To top](#table-of-contents)

In [5]:
hedges_csv = os.path.join(workdir, "hyperedges.csv")
write_sh_to_csv(ash, hedges_csv)
print(open(hedges_csv).read())

ash_from_hedges = read_sh_from_csv(hedges_csv)
print("Recovered hyperedges count:", ash_from_hedges.number_of_hyperedges())

nodes	start,end
1,2,3	2,3
1,2	0,1
2,3	1,2

Recovered hyperedges count: 3
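
Because the row layout is so simple, the dump can also be inspected without `ash_model`. The sketch below parses the text printed above with plain string operations. `parse_hyperedge_rows` is a hypothetical helper, and it assumes integer node IDs and inclusive `end` values, as the profile rows earlier in this tutorial suggest.

```python
def parse_hyperedge_rows(text):
    """Return a list of (frozenset_of_nodes, start, end) tuples."""
    lines = text.strip().splitlines()
    rows = []
    for line in lines[1:]:  # skip the header row
        nodes_part, span_part = line.split("\t")
        nodes = frozenset(int(n) for n in nodes_part.split(","))
        start, end = (int(t) for t in span_part.split(","))
        rows.append((nodes, start, end))
    return rows


dump = "nodes\tstart,end\n1,2,3\t2,3\n1,2\t0,1\n2,3\t1,2\n"
rows = parse_hyperedge_rows(dump)

# Hyperedges present at t=2, assuming inclusive interval ends.
active_at_2 = sorted(
    sorted(nodes) for nodes, start, end in rows if start <= 2 <= end
)
print(active_at_2)
```

Under the inclusive-end assumption, the hyperedges `{1, 2, 3}` and `{2, 3}` are both active at `t=2`.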


<a id="ash-json"></a>
## Full ASH JSON (read/write + gzip)

`write_ash_to_json` serializes the entire structure, including node and hyperedge attributes.
`read_ash_from_json` reconstructs it (re-adding hyperedges over time).
For large networks, pass `compress=True` to write gzip-compressed JSON.

[🔝 To top](#table-of-contents)

In [6]:
ash_json = os.path.join(workdir, "ash.json")
write_ash_to_json(ash, ash_json)
print("JSON size (bytes):", os.path.getsize(ash_json))

ash_json_gz = ash_json + ".gz"
write_ash_to_json(ash, ash_json_gz, compress=True)
print("Gzip JSON size (bytes):", os.path.getsize(ash_json_gz))

ash_loaded = read_ash_from_json(ash_json)
print("Loaded nodes:", ash_loaded.number_of_nodes(), "Loaded hyperedges:", ash_loaded.number_of_hyperedges())

JSON size (bytes): 1513
Gzip JSON size (bytes): 280
Loaded nodes: 3 Loaded hyperedges: 3
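
The size difference reported above is typical for JSON, which repeats key names in every record. The following stdlib-only sketch reproduces the effect on a toy payload. `dump_json` and `load_json` are hypothetical helpers that only mimic the `compress` flag; they say nothing about how `write_ash_to_json` is implemented, and the payload is not the real ASH JSON schema.

```python
import gzip
import json
import os
import tempfile


def dump_json(obj, path, compress=False):
    """Write obj as JSON, through gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as handle:
        json.dump(obj, handle, indent=2)


def load_json(path, compress=False):
    """Read a JSON document written by dump_json."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Toy payload with a repetitive structure (illustrative only).
payload = {
    "hyperedges": [
        {"nodes": [1, 2, 3], "start": t, "end": t + 1, "weight": 1.0}
        for t in range(200)
    ]
}

with tempfile.TemporaryDirectory(prefix="json_sketch_") as tmpdir:
    plain_path = os.path.join(tmpdir, "toy.json")
    gz_path = plain_path + ".gz"
    dump_json(payload, plain_path)
    dump_json(payload, gz_path, compress=True)
    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)
    restored = load_json(gz_path, compress=True)

print("plain:", plain_size, "gzip:", gz_size, "identical:", restored == payload)
```

One caveat when hand-rolling JSON dumps like this: JSON object keys are always strings, so a dict keyed by integer node IDs or timestamps will not compare equal after a round trip unless the keys are converted back.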


<a id="hif-format"></a>
## HIF Format (Hypergraph Interchange Format)

`write_hif` exports a structured JSON containing:
- `incidences`: one record per node–hyperedge membership (with optional weight)
- `nodes`: each node with time-varying attributes collapsed into (start,end,value) spans plus `_presence`
- `edges`: hyperedge attributes plus `_presence` intervals

This enables interoperability while preserving temporal semantics compactly.
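
The span encoding used for `nodes` can be illustrated with a short sketch that collapses a per-timestamp timeline into `[start, end, value]` triples and expands it back. `collapse_to_spans` and `expand_spans` are hypothetical helpers written for this illustration; they assume integer timestamps and inclusive ends, and `write_hif` may implement the idea differently.

```python
def collapse_to_spans(timeline):
    """Merge consecutive timestamps with equal values into [start, end, value]."""
    spans = []
    for tid in sorted(timeline):
        value = timeline[tid]
        if spans and spans[-1][2] == value and spans[-1][1] == tid - 1:
            spans[-1][1] = tid  # extend the current span
        else:
            spans.append([tid, tid, value])
    return spans


def expand_spans(spans):
    """Inverse operation: one value per timestamp, ends inclusive."""
    return {
        tid: value
        for start, end, value in spans
        for tid in range(start, end + 1)
    }


# Timelines of node 1 from the example ASH built earlier.
score = {0: 10, 1: 10, 2: 10, 3: 12}
role = {0: "admin", 1: "admin", 2: "admin", 3: "admin"}

print(collapse_to_spans(score))
print(collapse_to_spans(role))
```

For node 1 this yields two spans for `score` (`[0, 2, 10]` and `[3, 3, 12]`) and a single span for `role` (`[0, 3, "admin"]`), so an attribute that rarely changes costs one triple instead of one entry per timestamp.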

[🔝 To top](#table-of-contents)

In [7]:
hif_path = os.path.join(workdir, "ash.hif.json")
write_hif(ash, hif_path, metadata={"dataset": "demo", "version": 1})
print("HIF excerpt:")
print('\n'.join(open(hif_path).read().splitlines()))

ash_hif_loaded = read_hif(hif_path)
print("HIF loaded nodes:", ash_hif_loaded.number_of_nodes(), "hyperedges:", ash_hif_loaded.number_of_hyperedges())

HIF excerpt:
{
  "network-type": "undirected",
  "metadata": {
    "dataset": "demo",
    "version": 1
  },
  "incidences": [
    {
      "node": 1,
      "edge": "e3",
      "weight": 3.0
    },
    {
      "node": 2,
      "edge": "e3",
      "weight": 3.0
    },
    {
      "node": 3,
      "edge": "e3",
      "weight": 3.0
    },
    {
      "node": 1,
      "edge": "e1"
    },
    {
      "node": 2,
      "edge": "e1"
    },
    {
      "node": 2,
      "edge": "e2",
      "weight": 2.0
    },
    {
      "node": 3,
      "edge": "e2",
      "weight": 2.0
    }
  ],
  "nodes": [
    {
      "node": 1,
      "attrs": {
        "role": [
          [
            0,
            3,
            "admin"
          ]
        ],
        "score": [
          [
            0,
            2,
            10
          ],
          [
            3,
            3,
            12
          ]
        ],
        "_presence": [
          [
            0,
            3
          ]
        ]
      }
   

<a id="integrity-checks"></a>
## Integrity & Round-Trip Checks

We compare basic statistics (node and hyperedge counts, attribute names) after reloading from the different formats.
For deeper checks, you could diff presence intervals and attribute timelines.

[🔝 To top](#table-of-contents)

In [8]:
def summary(h):
    return {
        'nodes': h.number_of_nodes(),
        'hyperedges': h.number_of_hyperedges(),
        'node_attrs': sorted(h.list_node_attributes().keys()),
        'hedge_attrs': sorted(h.list_hyperedge_attributes().keys())
    }

print('Original:', summary(ash))
print('From hedges CSV (no attrs):', summary(ash_from_hedges))
print('From JSON:', summary(ash_loaded))
print('From HIF:', summary(ash_hif_loaded))

Original: {'nodes': 3, 'hyperedges': 3, 'node_attrs': ['role', 'score'], 'hedge_attrs': ['kind', 'weight']}
From hedges CSV (no attrs): {'nodes': 3, 'hyperedges': 3, 'node_attrs': [], 'hedge_attrs': ['weight']}
From JSON: {'nodes': 3, 'hyperedges': 3, 'node_attrs': ['role', 'score'], 'hedge_attrs': ['kind', 'weight']}
From HIF: {'nodes': 3, 'hyperedges': 3, 'node_attrs': ['role', 'score'], 'hedge_attrs': ['kind', 'weight']}
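
A deeper check along these lines could compare attribute timelines entry by entry. The sketch below works on plain nested dicts of the form `{node: {tid: attrs}}`; `diff_timelines` is a hypothetical helper, not an `ash_model` function. With real data, such dicts could be built from the `read_profiles_from_*` results by calling `get_attributes()` on each profile, as in the CSV reading cell.

```python
def diff_timelines(original, reloaded):
    """Return sorted (node, tid, original_attrs, reloaded_attrs) mismatches.

    Both arguments map node -> {tid: attrs}; a missing entry shows up as None.
    """
    mismatches = []
    for node in sorted(set(original) | set(reloaded)):
        before = original.get(node, {})
        after = reloaded.get(node, {})
        for tid in sorted(set(before) | set(after)):
            if before.get(tid) != after.get(tid):
                mismatches.append((node, tid, before.get(tid), after.get(tid)))
    return mismatches


original = {1: {0: {"score": 10}, 3: {"score": 12}}, 2: {0: {"score": 5}}}
same = {1: {0: {"score": 10}, 3: {"score": 12}}, 2: {0: {"score": 5}}}
lossy = {1: {0: {"score": 10}, 3: {"score": "12"}}}  # type drift + missing node

print(diff_timelines(original, same))
for mismatch in diff_timelines(original, lossy):
    print(mismatch)
```

An empty list means the two timelines agree. The `lossy` case shows the two failure modes that count-based summaries miss: a value whose type changed and a node that disappeared.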


<a id="best-practices"></a>
## Best Practices & Tips

- Use JSONL (optionally gzipped) for very large node profile timelines (streaming friendly).
- Use full JSON (gzip) when you need a faithful snapshot including attributes.
- Use HIF for interoperability / archival: compact temporal attribute encoding.
- Hyperedges CSV is lossy (drops attributes); use it only for quick topology experiments.
- Always validate round-trips with basic stats and, optionally, invariants such as the degree distribution.
- Keep metadata (e.g. preprocessing parameters) inside HIF `metadata`.

[🔝 To top](#table-of-contents)

In [9]:
# Optional: cleanup temporary directory when done
shutil.rmtree(workdir)