<span>
<img src="http://ash-model.readthedocs.io/en/latest/_static/ash.png" width="260px" align="right"/>
</span>
<span>
<b>Author:</b> <a href="https://andreafailla.github.io">Andrea Failla</a><br/>
<b>Python version:</b>  3.9<br/>
<b>ASH version:</b>  1.0.0<br/>
<b>Last update:</b> November 2025
</span>

<a id="attributed-stream-hypergraphs-ash"></a>
# Attributed Stream Hypergraphs (ASH) - Generators

In [1]:
#!pip install -e ../

<a id="table-of-contents"></a>
# Table of Contents

- [Introduction](#introduction)
- [Overview of Generators](#overview)
- [Random Static Hypergraph](#random-static)
- [Random Temporal ASH](#random-temporal)
- [Homophily-driven BA Hypergraph](#homophily-ba)
- [Comparing Generators](#comparing)
- [Reproducibility & Seeding](#reproducibility)
- [Cleanup & Next Steps](#cleanup)


<a id="introduction"></a>
## Introduction

This notebook showcases synthetic data generators included in ASH. These tools help you:
- Prototype algorithms on controlled hypergraph structures
- Study effects of attribute homophily & degree bias
- Produce temporal or static datasets with varied node attributes

We cover three primary generators and how to tune their parameters.

[🔝 To top](#table-of-contents)

<a id="overview"></a>
## Overview of Generators

| Generator | Function | Key Parameters | Output Type |
|-----------|----------|----------------|-------------|
| Random Static | `random_hypergraph` | `num_nodes`, `size_distr`, `node_attrs`, `seed` | Single-snapshot ASH (t=0) |
| Random Temporal | `random_ash` | `num_nodes`, `size_distr`, `time_steps`, `node_attrs`, `seed` | Multi-snapshot ASH |
| Homophily BA | `ba_with_homophily` | `num_nodes`, `m`, `homophily_rate`, `minority_size`, `n0`, `size_prob_distr` | Static preferential + homophily |

`size_distr` (static and temporal generators) maps hyperedge size → number of hyperedges of that size.
For the homophily BA generator, hyperedge sizes are instead drawn from a discrete distribution (Pareto by default), and new nodes attach with a preference for both high-degree and same-attribute nodes.
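
To make the size → count convention concrete, here is a minimal plain-Python sketch of turning such a mapping into random hyperedges. It does not call ASH and is not the library's implementation: `sample_hyperedges` is a hypothetical helper, and uniform sampling of distinct nodes is an assumption made for illustration.

```python
import random

def sample_hyperedges(num_nodes, size_distr, seed=None):
    """Hypothetical helper: draw `count` random hyperedges for each `size`."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    nodes = range(num_nodes)
    hyperedges = []
    for size, count in sorted(size_distr.items()):
        for _ in range(count):
            # Each hyperedge is a set of `size` distinct nodes (assumed uniform).
            hyperedges.append(frozenset(rng.sample(nodes, size)))
    return hyperedges

edges = sample_hyperedges(num_nodes=12, size_distr={2: 5, 3: 3, 4: 2}, seed=42)
print(len(edges))                     # 10 hyperedges in total (5 + 3 + 2)
print(sorted(len(e) for e in edges))  # [2, 2, 2, 2, 2, 3, 3, 3, 4, 4]
```

The sizes of the sampled hyperedges follow the mapping exactly; only the node memberships depend on the seed.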

[🔝 To top](#table-of-contents)

<a id="random-static"></a>
## Random Static Hypergraph (`random_hypergraph`)

Creates a single-snapshot hypergraph at time 0.
- `size_distr`: {hyperedge_size: count}
- `node_attrs`: {attr_name: [possible_values]}
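
As an illustration of the `node_attrs` format, the sketch below assigns each node one value per attribute from the listed possibilities. It uses only the standard library; `assign_node_attrs` is a hypothetical helper, and choosing values uniformly and independently is an assumption, not a statement about how ASH samples them.

```python
import random

def assign_node_attrs(num_nodes, node_attrs, seed=None):
    """Hypothetical helper: pick one value per attribute for every node."""
    rng = random.Random(seed)
    return {
        node: {name: rng.choice(values) for name, values in node_attrs.items()}
        for node in range(num_nodes)
    }

attrs = assign_node_attrs(
    12, {'color': ['red', 'blue'], 'dept': ['A', 'B', 'C']}, seed=42
)
print(sorted(attrs[0]))  # ['color', 'dept']: every node gets one value per attribute
```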

[🔝 To top](#table-of-contents)

In [1]:
from ash_model.generators import random_hypergraph
from ash_model import ASH

static_h = random_hypergraph(
    num_nodes=12,
    size_distr={2:5, 3:3, 4:2},
    node_attrs={'color':['red','blue'], 'dept':['A','B','C']},
    seed=42
)
print("Nodes:", static_h.number_of_nodes(), "Hyperedges:", static_h.number_of_hyperedges())
print("Sample hyperedge IDs:", list(static_h.hyperedges())[:5])
print("Node 0 attrs:", static_h.get_node_attributes(0))

Nodes: 12 Hyperedges: 10
Sample hyperedge IDs: ['e3', 'e1', 'e8', 'e10', 'e2']
Node 0 attrs: {0: {'color': 'red', 'dept': 'A'}}


<a id="random-temporal"></a>
## Random Temporal ASH (`random_ash`)
Generates a multi-snapshot ASH in which nodes and hyperedges persist across the whole time range `[0, time_steps-1]`.
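
The persistence described above can be sketched without ASH: if every hyperedge is present over the whole interval, each snapshot contains the same hyperedges and the per-snapshot average equals the total count. The toy hyperedges, the `presence` mapping, and the `snapshot` helper are hypothetical names used only for this illustration.

```python
time_steps = 5
hyperedges = [frozenset({0, 1}), frozenset({1, 2, 3}), frozenset({2, 4})]

# Assumption: every hyperedge is present over the closed interval [0, time_steps - 1].
presence = {e: (0, time_steps - 1) for e in hyperedges}

def snapshot(t):
    """Return the hyperedges whose presence interval contains time t."""
    return [e for e, (start, end) in presence.items() if start <= t <= end]

snapshot_ids = list(range(time_steps))
avg_edges = sum(len(snapshot(t)) for t in snapshot_ids) / time_steps
print(snapshot_ids)  # [0, 1, 2, 3, 4]
print(avg_edges)     # 3.0
```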

[🔝 To top](#table-of-contents)

In [2]:
from ash_model.generators import random_ash

temporal_h = random_ash(
    num_nodes=15,
    size_distr={2:6, 3:4},
    time_steps=5,
    node_attrs={'team':['eng','mkt','sales']},
    seed=7
)
print("Snapshots:", temporal_h.temporal_snapshots_ids())
print("Avg nodes per snapshot:", temporal_h.avg_number_of_nodes())
print("Avg hyperedges per snapshot:", temporal_h.avg_number_of_hyperedges())

Snapshots: [0, 1, 2, 3, 4]
Avg nodes per snapshot: 15.0
Avg hyperedges per snapshot: 9.0


<a id="homophily-ba"></a>
## Homophily-driven BA Hypergraph (`ba_with_homophily`)
Adds nodes sequentially; each new node forms `m` hyperedges with existing nodes.
Attachment probability blends: (degree preference) × (homophily with respect to `color`).
`minority_size` controls the fraction of nodes with the minority color.
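
One common way to combine degree preference with homophily is to weight each existing node by `(degree + 1)` times a same-color or different-color factor. The sketch below shows that formulation in plain Python. The exact formula used by `ba_with_homophily` is not stated here, so `attachment_weights`, the `+ 1` offset, and the color values are assumptions for illustration only.

```python
import random

def attachment_weights(new_color, colors, degrees, homophily_rate):
    """Hypothetical weights: (degree + 1) * homophily factor per existing node."""
    weights = {}
    for node, color in colors.items():
        # Same-color nodes get `homophily_rate`, others get its complement.
        factor = homophily_rate if color == new_color else 1.0 - homophily_rate
        weights[node] = (degrees[node] + 1) * factor  # +1 keeps degree-0 nodes reachable
    return weights

colors = {0: 'red', 1: 'red', 2: 'blue', 3: 'blue'}   # illustrative values
degrees = {0: 3, 1: 0, 2: 3, 3: 0}
w = attachment_weights('red', colors, degrees, homophily_rate=0.75)
print(w)  # {0: 3.0, 1: 0.75, 2: 1.0, 3: 0.25}

# Draw one attachment target with probability proportional to its weight.
rng = random.Random(0)
partner = rng.choices(list(w), weights=list(w.values()), k=1)[0]
```

With `homophily_rate=0.75`, a same-color node is three times as attractive as a different-color node of equal degree.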

[🔝 To top](#table-of-contents)

In [3]:
from ash_model.generators import ba_with_homophily

homo_h = ba_with_homophily(
    num_nodes=40,
    m=3,
    homophily_rate=0.75,
    minority_size=0.3,
    n0=5
)
print("Nodes:", homo_h.number_of_nodes(), "Hyperedges:", homo_h.number_of_hyperedges())
print("Degree distribution (first 10):", list(homo_h.degree_distribution().items())[:10])

Nodes: 40 Hyperedges: 14
Degree distribution (first 10): [(10, 3), (4, 6), (7, 2), (12, 2), (9, 2), (8, 1), (0, 10), (1, 4), (2, 5), (5, 3)]


<a id="comparing"></a>
## Comparing Generators
We contrast basic statistics across the three outputs.

[🔝 To top](#table-of-contents)

In [4]:
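def summary(h):
    # Compute the distribution once instead of calling the method three times.
    deg_dist = h.degree_distribution()
    return {
        'nodes': h.number_of_nodes(),
        'hyperedges': h.number_of_hyperedges(),
        # Caution: if degree_distribution() maps degree -> node count (as the printed
        # output above suggests), this is the mean node count per distinct degree,
        # not the mean node degree.
        'avg_deg': sum(deg_dist.values()) / len(deg_dist) if deg_dist else 0,
        'attr_names': list(h.list_node_attributes().keys())
    }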
print('Static:', summary(static_h))
print('Temporal:', summary(temporal_h))
print('Homophily BA:', summary(homo_h))

Static: {'nodes': 12, 'hyperedges': 10, 'avg_deg': 3.0, 'attr_names': ['color', 'dept']}
Temporal: {'nodes': 15, 'hyperedges': 9, 'avg_deg': 3.0, 'attr_names': ['start', 'end', 'team']}
Homophily BA: {'nodes': 40, 'hyperedges': 14, 'avg_deg': 3.3333333333333335, 'attr_names': ['color']}
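
The `avg_deg` entry in `summary` divides the sum of the values of `degree_distribution()` by the number of distinct keys. If that mapping is degree → node count, as the output of the homophily cell suggests, the result is the number of nodes per distinct degree rather than the mean degree. A weighted mean can be sketched as follows; `mean_degree` and the toy distribution are hypothetical and use no ASH calls.

```python
def mean_degree(degree_distribution):
    """Weighted mean of a {degree: node_count} mapping."""
    total_nodes = sum(degree_distribution.values())
    if total_nodes == 0:
        return 0.0
    return sum(d * c for d, c in degree_distribution.items()) / total_nodes

# Hypothetical distribution: 2 nodes of degree 0, 4 of degree 1, 2 of degree 3.
toy = {0: 2, 1: 4, 3: 2}
print(mean_degree(toy))             # 1.25
print(sum(toy.values()) / len(toy))  # about 2.67: nodes per distinct degree value
```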


<a id="reproducibility"></a>
## Reproducibility & Seeding
Pass the `seed` parameter where a generator exposes one; otherwise, seed the global RNGs (`random.seed()` and NumPy's) before calling the generator to get deterministic runs.

[🔝 To top](#table-of-contents)

In [5]:
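h1 = random_hypergraph(8, {2:4}, seed=123)
h2 = random_hypergraph(8, {2:4}, seed=123)
# Weak check: the edge count mostly follows from `size_distr`, so matching counts
# alone do not show that the two seeded runs produced the same hyperedges.
print("Same edge count?", h1.number_of_hyperedges() == h2.number_of_hyperedges())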

Same edge count? True
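
Comparing edge counts is a weak reproducibility test, since the count largely follows from the arguments rather than from the seed. The sketch below shows a stronger check on a stand-in generator: it compares the hyperedges themselves. `toy_generator` is a hypothetical stdlib-only substitute, not an ASH function.

```python
import random

def toy_generator(num_nodes, num_edges, seed=None):
    """Hypothetical stand-in for a seeded generator of size-2 hyperedges."""
    rng = random.Random(seed)  # a local RNG keeps runs independent of global state
    return [tuple(sorted(rng.sample(range(num_nodes), 2))) for _ in range(num_edges)]

a = toy_generator(8, 4, seed=123)
b = toy_generator(8, 4, seed=123)
c = toy_generator(8, 4, seed=124)

print(len(a) == len(c))  # True: the count is fixed by the arguments, whatever the seed
print(a == b)            # True: same seed, same hyperedges
```

The same idea applies to real generator output: compare node memberships of the hyperedges, not just how many there are.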


<a id="cleanup"></a>
## Cleanup & Next Steps
Consider exporting generated datasets using the IO methods shown in the IO notebook, or combining generators to create hybrid scenarios.

[🔝 To top](#table-of-contents)