# Compute Capacity Analysis

This notebook analyzes the compute capacity of various actors and visualizes the relationship between each actor's estimated int8 operations per second and its number of stakeholders. It reads metadata from Markdown files, compiles it into a CSV file, and then generates an SVG scatter plot with evenly spaced labels.

## Imports and Setup

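In the first code cell we import the modules used throughout the notebook. `os` and `csv` handle file-system traversal and CSV output, `math` provides the base-10 logarithm used when spacing labels, and `matplotlib` draws the final chart. `os`, `csv`, and `math` belong to the Python standard library. `matplotlib` is a third-party package, as is `IPython`, which the last cell uses to display the chart; both are assumed to be preinstalled in this project's environment, so no additional installation should be needed.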

In [1]:

# Import modules for file handling, mathematics, and plotting
import os
import csv
import math
import matplotlib.pyplot as plt


## Collect front matter and build `data.csv`

The Markdown files in the `analysis_compute` directory store metadata about each actor in a YAML front-matter block. This code walks through every `.md` file in that directory and its subfolders and extracts the `name`, `compute` (written out as `ops`), and `stakeholder` (written out as `stakeholders`) fields. It converts `compute` and `stakeholder` to numbers, skips files that lack any of the three fields or report zero compute, and writes the remaining rows to a CSV file called `data.csv`. This CSV is used for plotting in the next step.
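
To make the expected input concrete, the following is a minimal sketch of the extraction step run on an in-memory string rather than on files. The sample front matter and the helper name `parse_front_matter` are assumptions made for illustration; only the field names `name`, `compute`, and `stakeholder` come from the notebook.

```python
# Hypothetical front matter: the field names match those the notebook reads,
# but the values are invented for illustration.
SAMPLE = """---
name: Example Actor
compute: 1e15
stakeholder: 3
---
Body text of the Markdown file.
"""


def parse_front_matter(text):
    """Return the key/value pairs found between the first two '---' lines."""
    fields = {}
    inside = False
    for line in text.splitlines():
        if line.strip() == '---':
            if inside:
                break  # the second delimiter closes the block
            inside = True
            continue
        if inside and ':' in line:
            key, value = line.split(':', 1)
            fields[key.strip()] = value.strip()
    return fields


fields = parse_front_matter(SAMPLE)
# Same conversions as the notebook: compute -> float, stakeholder -> int
row = (fields['name'], float(fields['compute']), int(float(fields['stakeholder'])))
print(row)
```

Running the same helper on the contents of a real file is a quick way to check whether that file would contribute a row to `data.csv`.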

In [2]:

# Define where to look for Markdown files containing compute metadata
base_dir = 'analysis_compute'
# Prepare a list to hold the parsed rows
rows = []

for root, dirs, files in os.walk(base_dir):
    for fname in files:
        if fname.endswith('.md'):
            path = os.path.join(root, fname)
            with open(path, 'r', encoding='utf-8') as f:
                lines = f.readlines()
            # Extract front matter lines between the first two '---'
            in_front_matter = False
            front_lines = []
            for line in lines:
                if line.strip() == '---':
                    if not in_front_matter:
                        in_front_matter = True
                        continue
                    else:
                        # end of front matter
                        break
                if in_front_matter:
                    front_lines.append(line.strip())
            # Parse the key: value pairs
            data = {}
            for l in front_lines:
                if ':' in l:
                    key, value = l.split(':', 1)
                    key = key.strip()
                    value = value.strip()
                    data[key] = value
            # Only process entries that define name, compute, and stakeholder
            name = data.get('name')
            compute = data.get('compute')
            stakeholder = data.get('stakeholder')
            if name is None or compute is None or stakeholder is None:
                continue
            # Parse compute as float (scientific notation allowed)
            try:
                ops_value = float(compute)
            except ValueError:
                # Fall back to normalizing forms such as '3×10^15 int8' to '3e15'
                tmp = compute.lower().replace('int8', '').replace(' ', '')
                tmp = tmp.replace('×10^', 'e').replace('x10^', 'e')
                try:
                    ops_value = float(tmp)
                except ValueError:
                    continue
            try:
                stakeholders_value = int(float(stakeholder))
            except ValueError:
                continue
            # Skip entries with zero compute
            if ops_value == 0:
                continue
            rows.append((name, ops_value, stakeholders_value))

# Write the collected data to a CSV file
csv_path = 'data.csv'
with open(csv_path, 'w', newline='', encoding='utf-8') as file_out:
    writer = csv.writer(file_out)
    writer.writerow(['name', 'ops', 'stakeholders'])
    for row in rows:
        writer.writerow(row)

# Print a preview of the rows collected
print(f"Collected {len(rows)} entries:")
for r in rows:
    print(r)


Collected 0 entries:
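
Compute values are not necessarily written as plain floats. The sketch below shows one way to normalize notations such as `3×10^15 int8` before conversion. It assumes these are the forms that occur in the front matter; `parse_ops` is a hypothetical helper, not part of the notebook.

```python
import re


def parse_ops(raw):
    """Convert strings such as '3×10^15 int8' or '2.5e14' to a float.

    Returns None when the value cannot be interpreted.
    """
    # Drop the unit suffix and any spaces
    text = raw.lower().replace('int8', '').replace(' ', '')
    # Rewrite '<mantissa>×10^<exp>' (also 'x10^' or '*10^') as '<mantissa>e<exp>'
    text = re.sub(r'[×x*]10\^', 'e', text)
    try:
        return float(text)
    except ValueError:
        return None


for sample in ['3×10^15 int8', '2.5e14', 'unknown']:
    print(sample, '->', parse_ops(sample))
```

A bare power such as `10^15` has no mantissa and is still rejected by this sketch; handling it would need an extra rule.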


## Plot compute versus stakeholders

With the CSV in place, the final code cell constructs a scatter plot using Matplotlib. The horizontal axis shows the number of stakeholders on a linear scale, while the vertical axis shows the estimated int8 operations per second on a logarithmic scale. To keep labels legible, the code sorts the points by their y-coordinate and spaces the labels evenly, in log space, between the smallest and largest compute values on the right side of the chart. A thin gray connector line runs from each point to its label. The result is saved as an SVG file (`chart.svg`).
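
The label positions are spaced evenly in log space, so they appear equidistant on the logarithmic axis. A minimal sketch of that calculation in isolation, using invented values and a hypothetical helper name `spaced_label_positions`:

```python
import math


def spaced_label_positions(ops_values):
    """Return label y-positions evenly spaced in log10 space between min and max."""
    ordered = sorted(ops_values)
    n = len(ordered)
    if n == 0:
        return []
    if n == 1:
        return [ordered[0]]
    lo = math.log10(ordered[0])
    hi = math.log10(ordered[-1])
    step = (hi - lo) / (n - 1)
    # Convert each evenly spaced exponent back to a data coordinate
    return [10 ** (lo + i * step) for i in range(n)]


# Invented values, for illustration only
positions = spaced_label_positions([1e12, 1e18, 1e13])
print(positions)
```

For these three sample values the labels land at approximately 1e12, 1e15, and 1e18, regardless of where the middle point actually sits; the connector line is what ties each label back to its true position.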

In [3]:

# Read the data from the CSV file
points = []
with open('data.csv', 'r', encoding='utf-8') as file_in:
    reader = csv.DictReader(file_in)
    for row in reader:
        name = row['name']
        ops = float(row['ops'])
        stakeholders = float(row['stakeholders'])
        points.append((name, ops, stakeholders))

# Ensure there is data to plot
if not points:
    raise ValueError('No data points found in data.csv')

# Compute axis ranges
ops_values = [p[1] for p in points]
stakeholder_values = [p[2] for p in points]
min_ops = min(ops_values)
max_ops = max(ops_values)
min_stakeholder = min(stakeholder_values)
max_stakeholder = max(stakeholder_values)

# Define figure dimensions in pixels and convert to inches for matplotlib
fig_width, fig_height = 800, 500
dpi = 100
fig, ax = plt.subplots(figsize=(fig_width / dpi, fig_height / dpi), dpi=dpi)

# Set up axes
ax.set_xlabel('Number of stakeholders')
ax.set_ylabel('Estimated int8 ops per second')
ax.set_yscale('log')
ax.set_xlim(min_stakeholder - (max_stakeholder - min_stakeholder) * 0.05,
            max_stakeholder + (max_stakeholder - min_stakeholder) * 0.3)
ax.set_ylim(min_ops * 0.9, max_ops * 1.1)

# Plot all points in a single call
ax.scatter(stakeholder_values, ops_values, color='tab:blue', zorder=3)

# Sort points by log-scale y coordinate to determine label placement
sorted_points = sorted(points, key=lambda p: math.log10(p[1]))

# Calculate label y positions evenly spaced along the plotting area
n = len(sorted_points)
if n > 1:
    y_min_log = math.log10(min_ops)
    y_max_log = math.log10(max_ops)
    spacing = (y_max_log - y_min_log) / (n - 1)
    y_positions = [10 ** (y_min_log + i * spacing) for i in range(n)]
else:
    y_positions = [min_ops]

# Place labels on the right side of the figure
label_x = max_stakeholder + (max_stakeholder - min_stakeholder) * 0.15

for (name, ops, stakeholders), label_y in zip(sorted_points, y_positions):
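    # Draw connector line, stopping just short of the label.
    # The gap is a fraction of the x-range so it stays sensible for any data offset.
    connector_end_x = label_x - (max_stakeholder - min_stakeholder) * 0.01
    ax.plot([stakeholders, connector_end_x], [ops, label_y], color='gray', linewidth=0.5, zorder=1)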
    # Draw label text
    ax.text(label_x, label_y, name, va='center', ha='left', fontsize=8, zorder=2)

# Add grid for readability
ax.grid(True, which='both', linestyle='--', linewidth=0.5, zorder=0)

# Save the figure as an SVG file
svg_filename = 'chart.svg'
fig.tight_layout()
fig.savefig(svg_filename, format='svg')

# Display the SVG inline
from IPython.display import SVG, display as display_svg
with open(svg_filename, 'r', encoding='utf-8') as f:
    svg_content = f.read()
display_svg(SVG(svg_content))


ValueError: min() iterable argument is empty