# SLD Jazelle Data Reader Tutorial

## 1. Introduction & Overview

Welcome to the **Jazelle Reader**. This library reads data from the SLD (SLAC Large Detector) experiment at SLAC. The data were originally stored in a legacy format called "Jazelle" (managed via Fortran and Java). This package provides a high-performance C++ reader with Python bindings that converts the data into modern, analysis-friendly formats such as NumPy arrays, Awkward Arrays, Parquet, and HDF5.

### Data Structure Layout

The Jazelle format is hierarchical. Below is a schematic of how the data is structured and how this library maps it to Python objects:

```
[ JazelleFile ] 
      |
      +--- Metadata (Filename, Creation Date, etc.)
      |
      +--- [ Event 0 ]
      |       |
      |       +--- [ IEVENTH ] (Event Header: Run #, Event #, Time)
      |       |
      |       +--- [ Family: MCPART ] (Monte Carlo Particles)
      |       |       |
      |       |       +--- [ Bank 0 ] (Particle 1: px, py, pz, id...)
      |       |       +--- [ Bank 1 ] (Particle 2: px, py, pz, id...)
      |       |       +--- ...
      |       |
      |       +--- [ Family: PHCHRG ] (Charged Tracks)
      |               |
      |               +--- [ Bank 0 ] (Track 1)
      |               +--- ...
      |
      +--- [ Event 1 ] ...
```
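
The hierarchy above can be mimicked with plain Python containers, which may help when reasoning about how events, families, and banks nest. The sketch below is illustrative only: `DemoBank`, `DemoFamily`, and `DemoEvent` are hypothetical stand-ins rather than the library's classes, and the momentum values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DemoBank:
    """One row of a family, e.g. a single particle (hypothetical stand-in)."""
    id: int
    px: float
    py: float
    pz: float


@dataclass
class DemoFamily:
    """A named collection of banks of the same type."""
    name: str
    banks: list = field(default_factory=list)

    def __len__(self):
        return len(self.banks)

    def __getitem__(self, index):
        return self.banks[index]


@dataclass
class DemoEvent:
    """An event header plus a mapping from family name to family."""
    run: int
    event: int
    families: dict = field(default_factory=dict)

    def __getitem__(self, name):
        # Family lookup is case-insensitive in this sketch.
        return self.families[name.upper()]


# Made-up values, for illustration only.
mcpart = DemoFamily("MCPART", [DemoBank(1, 0.1, 0.2, 0.3), DemoBank(2, -0.4, 0.5, 0.6)])
demo_event = DemoEvent(run=1, event=1, families={"MCPART": mcpart})

print(len(demo_event["mcpart"]))   # number of banks in the family
print(demo_event["MCPART"][0].px)  # one field of the first bank
```

The lookup order (event, then family, then bank, then field) is the same one used throughout the rest of this tutorial.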

## 2. Setup and Installation

First, ensure the package is installed.

In [2]:
!pip install jazelle



Import the necessary libraries:

In [1]:
import jazelle
import awkward as ak
import numpy as np
import os

print(f"Jazelle Reader Version: {jazelle.__version__}")

Jazelle Reader Version: 0.2.0


## 3. Opening a Jazelle File

There are two ways to work with an open file. In both, the `jazelle.open()` function returns a `JazelleFile` instance.

### Method A: Using a Context Manager (Best Practice for Scripts)

This automatically closes the file when the block ends, ensuring resources are freed immediately.

In [2]:
filepath = (
    "/global/cfs/projectdirs/m5115/SLD/minidst_renamed/"   # REPLACE THIS with your actual file path
    "qf1065.qf1065.5nrec97v18_mdst_1.7b1.jazelle"
)

if os.path.exists(filepath):
    with jazelle.open(filepath) as temp_f:
        print(f"Temporarily opened: {temp_f.fileName}")
        # File closes automatically here

Temporarily opened: TAPE0101
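
To see why the `with` form is the safer choice in scripts, here is a minimal sketch of the context-manager protocol using a hypothetical `DemoFile` class (not part of the library). It shows that `close()` runs even when the body of the block raises an exception.

```python
class DemoFile:
    """Hypothetical stand-in for a file-like object with open/close state."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when an exception propagates.
        self.close()
        return False  # do not suppress the exception


demo = DemoFile("demo")
try:
    with demo as handle:
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

print(demo.closed)
```

With a manually opened file, the equivalent guarantee would require wrapping the work in `try`/`finally` and calling `close()` in the `finally` branch.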


### Method B: Persistent Instance (Best for Notebooks)

For this tutorial, we will open the file once and assign it to the variable `f`, which we can reuse in all subsequent cells.

> Important: When you are done, you should manually close the file using `f.close()`. We will do this at the end of the tutorial.

In [3]:
# Open the file and keep it open
f = jazelle.open(filepath)

print(f"File opened successfully: {f.fileName}")

File opened successfully: TAPE0101


## 4. Inspecting File Contents

Now that `f` is open, we can inspect it directly.

In [7]:
# 1. Print general file info
f.info()

# 2. Access specific metadata
print(f"Created: {f.creationDate}")
print(f"Modified: {f.modifiedDate}")
print(f"Total Events: {len(f)}")

----------------------------------------
JazelleFile Info
----------------------------------------
File         : TAPE0101
Events       : 9994
Created      : 2001-09-21T11:46:07
Modified     : 2003-10-02T09:25:41
----------------------------------------
Created: 2001-09-21 11:46:07
Modified: 2003-10-02 09:25:41
Total Events: 9994
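
The timestamps shown by `f.info()` are in ISO 8601 form, so they can be parsed with the standard library. The sketch below works from copies of the printed strings. It does not assume anything about the Python type of `f.creationDate` or `f.modifiedDate`, which this tutorial does not state.

```python
from datetime import datetime

# Strings copied from the f.info() output above.
created = datetime.fromisoformat("2001-09-21T11:46:07")
modified = datetime.fromisoformat("2003-10-02T09:25:41")

# Subtracting two datetimes yields a timedelta.
age = modified - created
print(f"Time between creation and last modification: {age.days} days")
```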


### Visualizing Contents (Head & Tail)

You can peek at the beginning or end of the file to see a summary of the events. In Jupyter, this renders as a styled HTML table.

In [8]:
# Show the first 5 events, including counts for specific bank families
display(f.head(n=5, banks=['MCPART', 'PHCHRG']))

# Show the last 3 events
display(f.tail(n=3))

id,run,event,evttime,weight,n_MCPART,n_PHCHRG
1,37418,46,1997-07-13 08:53:10,1.0,0,19
1,37418,71,1997-07-13 08:54:05,1.0,0,33
1,37418,94,1997-07-13 08:55:06,1.0,0,16
1,37418,107,1997-07-13 08:55:27,1.0,0,4
1,37418,124,1997-07-13 08:56:01,1.0,0,14


id,run,event,evttime,weight
1,38003,6791,1997-08-11 13:55:48,1.0
1,38004,25,1997-08-11 14:03:05,1.0
1,38004,126,1997-08-11 14:09:49,1.0


You can also render the same summaries as plain ASCII tables by passing them to the built-in `print` function.

In [10]:
# Show the first 5 events, including counts for specific bank families
print(f.head(n=5, banks=['MCPART', 'PHCHRG']))

print()
# Show the last 3 events
print(f.tail(n=3))

[Events [1 - 5]]
id | run   | event | evttime             | weight  | n_MCPART | n_PHCHRG
---+-------+-------+---------------------+---------+----------+---------
1  | 37418 | 46    | 1997-07-13 08:53:10 | 1.00000 | 0        | 19      
1  | 37418 | 71    | 1997-07-13 08:54:05 | 1.00000 | 0        | 33      
1  | 37418 | 94    | 1997-07-13 08:55:06 | 1.00000 | 0        | 16      
1  | 37418 | 107   | 1997-07-13 08:55:27 | 1.00000 | 0        | 4       
1  | 37418 | 124   | 1997-07-13 08:56:01 | 1.00000 | 0        | 14      

[Events [9992 - 9994]]
id | run   | event | evttime             | weight 
---+-------+-------+---------------------+--------
1  | 38003 | 6791  | 1997-08-11 13:55:48 | 1.00000
1  | 38004 | 25    | 1997-08-11 14:03:05 | 1.00000
1  | 38004 | 126   | 1997-08-11 14:09:49 | 1.00000


## 5. Reading and Navigating Events

### Random Access (Indexing)

We can jump instantly to any event using standard Python indexing.

In [12]:
# Get the first event
event = f[0]
print(f"Loaded Event 0: {event}")

print()
# Get the 10th event
event_10 = f[9]
print(f"Loaded Event 9: {event_10}")

Loaded Event 0: JazelleEvent(Run=37418, Event=46, Type=0, Time=1997-07-13 08:53:10, Weight=1.0)

Family  | Count
--------+------
MCHEAD  | 1    
MCPART  | 0    
PHPSUM  | 61   
PHCHRG  | 19   
PHKLUS  | 55   
PHWIC   | 1    
PHCRID  | 0    
PHKTRK  | 0    
PHKELID | 12   

Loaded Event 9: JazelleEvent(Run=37418, Event=175, Type=0, Time=1997-07-13 08:58:30, Weight=1.0)

Family  | Count
--------+------
MCHEAD  | 1    
MCPART  | 0    
PHPSUM  | 48   
PHCHRG  | 19   
PHKLUS  | 40   
PHWIC   | 0    
PHCRID  | 0    
PHKTRK  | 0    
PHKELID | 9    


### Detailed Event Inspection

Displaying an event in a cell renders a rich summary showing the event header followed by a table of the families it contains and their bank counts.

In [15]:
display(event)

0,1
Run,37418
Event,46
Type,0
Time,1997-07-13 08:53:10
Weight,1.00000

Family,Count
MCHEAD,1
MCPART,0
PHPSUM,61
PHCHRG,19
PHKLUS,55
PHWIC,1
PHCRID,0
PHKTRK,0
PHKELID,12


Calling `display` is optional: evaluating the variable as the last expression of a cell renders the same summary.

In [16]:
event

0,1
Run,37418
Event,46
Type,0
Time,1997-07-13 08:53:10
Weight,1.00000

Family,Count
MCHEAD,1
MCPART,0
PHPSUM,61
PHCHRG,19
PHKLUS,55
PHWIC,1
PHCRID,0
PHKTRK,0
PHKELID,12


### Accessing Families and Banks

Data within an event is organized into Families (e.g., `MCPART`, `PHCHRG`).

In [20]:
# Access a family by attribute or dictionary style
particle_summary = event.phpsum        # Attribute access (lowercase)
charged_tracks = event['PHCHRG']   # Dictionary access (case-insensitive)

print(f"Family: {particle_summary.name}")
print(f"Number of particles: {len(particle_summary)}")

# Display family contents (renders as table)
display(particle_summary)

Family: PHPSUM
Number of particles: 61


id,px,py,pz,x,y,z,charge,status,ptot
1,-0.14409,-0.18608,-0.49087,0.32142,-0.07373,12.10152,-1.00000,0,0.54438
2,-0.60555,-1.82386,0.76399,0.00776,0.17233,0.19764,-1.00000,0,2.06805
3,0.26923,-0.21411,-0.49925,0.00866,0.18475,0.20388,1.00000,0,0.60628
4,0.05635,-2.57435,0.72191,-0.00167,0.17464,0.19816,1.00000,0,2.67424
5,0.83203,-1.87015,-1.67063,-0.00071,0.17408,0.19383,-1.00000,0,2.64211
...,...,...,...,...,...,...,...,...,...
57,0.21757,-0.67692,-0.68508,0.00066,0.17469,0.21435,0.00000,0,0.98737
58,-0.00900,0.00991,-0.01676,0.00066,0.17469,0.21435,0.00000,0,0.02145
59,0.00803,-0.00885,-0.01845,0.00066,0.17469,0.21435,0.00000,0,0.02199
60,0.02961,0.03595,-0.03012,0.00066,0.17469,0.21435,0.00000,0,0.05546


You can drill down to individual **Banks** (rows in the family):

In [25]:
# Get the first particle in the PHPSUM family
particle = particle_summary[0]

print(f"Particle ID: {particle.id}")
print(f"Momentum (px, py, pz): {(particle.px, particle.py, particle.pz)}")
print(f"Charge: {particle.charge}")

Particle ID: 1
Momentum (px, py, pz): (-0.14409300684928894, -0.1860826313495636, -0.4908719062805176)
Charge: -1.0
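
Bank fields are ordinary Python numbers, so derived quantities can be computed directly. As a small check, the sketch below recomputes the momentum magnitude from the components printed above. It assumes that the `ptot` column is the magnitude of `(px, py, pz)`, which is consistent with the first row of the table but is not stated explicitly in this tutorial.

```python
import math

# Values copied from the bank printed above (first PHPSUM row).
px, py, pz = -0.14409300684928894, -0.1860826313495636, -0.4908719062805176

# Euclidean norm of the momentum vector.
ptot = math.sqrt(px**2 + py**2 + pz**2)
print(f"|p| = {ptot:.5f}")
```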


## 6. Iteration and Batching

### Standard Iteration

You can iterate over the file object `f` like a standard Python list.

In [26]:
# Iterate through the first 3 events
for i, evt in enumerate(f):
    print(f"Event {i}: Run {evt.ieventh.run}, Trigger {evt.ieventh.trigger}")
    if i >= 2:
        break

Event 0: Run 37418, Trigger 52
Event 1: Run 37418, Trigger 116
Event 2: Run 37418, Trigger 116
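
The same loop pattern can accumulate simple per-run statistics. The sketch below counts events per run with `collections.Counter`, using `(run, event)` pairs copied from the head and tail output in Section 4 so that it runs without a data file. With an open file, the run number would presumably come from `evt.ieventh.run` as in the cell above.

```python
from collections import Counter

# (run, event) pairs copied from the head/tail output earlier in this tutorial.
headers = [
    (37418, 46), (37418, 71), (37418, 94), (37418, 107), (37418, 124),
    (38003, 6791), (38004, 25), (38004, 126),
]

# Count how many of the listed events belong to each run.
events_per_run = Counter(run for run, _ in headers)
for run, n_events in sorted(events_per_run.items()):
    print(f"Run {run}: {n_events} events")
```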


### High-Performance Batch Iteration

For large files, `iterate()` with a `batch_size` allows the C++ backend to read events in chunks, enabling parallel pre-fetching.

In [27]:
# Iterate in batches of 1000 events
for batch in f.iterate(batch_size=1000):
    print(f"Processing batch of {len(batch)} events...")
    
    # Process events in the batch
    first_evt = batch[0]
    # ... do analysis ...
    break # Stop for demo

Processing batch of 1000 events...
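
For intuition about what batching means, here is a generic pure-Python sketch that groups any iterable into fixed-size chunks. It is not the library's implementation (which reads in C++), and whether `f.iterate` yields a shorter final batch in the same way is an assumption that this tutorial does not confirm.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` consecutive items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for 9994 events; the real reader yields event objects instead.
batch_lengths = [len(batch) for batch in batched(range(9994), 1000)]
print(len(batch_lengths), batch_lengths[-1])
```

In this sketch the last chunk simply holds whatever remains, so analysis code should not assume every batch has exactly `batch_size` entries.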


## 7. Converting to Modern Data Structures

This is the core feature for analysis: converting Jazelle data into **NumPy** or **Awkward Arrays**.

### `to_dict`: The Low-Level Data Representation

With `layout='columnar'`, `to_dict` returns a Python dictionary whose keys are family names (such as `PHCHRG`) and whose values are dictionaries of NumPy arrays, one array per field (columnar format).

In [29]:
# Convert next 50 events to a dictionary
data = f.to_dict(count=50, layout='columnar')

In [34]:
charge_array = data['PHCHRG']['charge']
print(f"Raw charge Array Shape: {charge_array.shape}") # Note: 'charge' is usually flattened here (over the 50 events)

Raw charge Array Shape: (758,)
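
Because the array is flat, per-event structure has to be recovered from the number of banks in each event. The sketch below shows the usual offsets technique on made-up data, with plain lists so that it runs without extra dependencies. It assumes the per-event counts are available separately; how `to_dict` exposes them is not covered in this tutorial.

```python
from itertools import accumulate

# Made-up data: three events with 2, 0, and 3 tracks respectively.
flat_charge = [1.0, -1.0, -1.0, 1.0, 1.0]
tracks_per_event = [2, 0, 3]  # assumed to be known separately

# Offsets mark where each event starts and stops in the flat array.
offsets = [0, *accumulate(tracks_per_event)]
per_event = [flat_charge[start:stop] for start, stop in zip(offsets, offsets[1:])]

print(per_event)
```

With NumPy arrays the same split can be written as `np.split(flat, np.cumsum(counts)[:-1])`.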


### `to_arrays`: Awkward Arrays (Recommended)

Awkward Arrays are ideal for HEP data because they handle the "jagged" nature of events (variable number of particles per event) natively.

In [37]:
# Read 100 events into an Awkward Array
# num_threads=0 enables auto-detected parallelism
events = f.to_arrays(count=100, num_threads=0)
events

In [38]:
events.PHPSUM

In [40]:
# checking the x-momentum of the particles in the first event
events.PHPSUM[0].px

## 8. Parallelism and Performance

The Jazelle Reader is built on C++ threads. You can control the concurrency for heavy operations directly in the method calls.

In [46]:
import time

start = time.time()

# Read using 4 parallel threads
arrays = f.to_arrays(num_threads=4)

print(f"Read {len(arrays)} events in {time.time() - start:.4f} seconds")

Read 9994 events in 0.1316 seconds
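
When comparing thread counts, a small timing helper keeps the measurements consistent. The sketch below uses `time.perf_counter`, a monotonic high-resolution clock intended for measuring intervals. The helper name `timed` and the `demo_workload` function are hypothetical and only stand in for a real read.

```python
import time


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def demo_workload(n):
    # Stand-in for an expensive read; replace with the real call.
    return sum(range(n))


result, elapsed = timed(demo_workload, 100_000)
print(f"Result {result} computed in {elapsed:.4f} seconds")
```

With the file open, a call such as `timed(f.to_arrays, num_threads=n)` for several values of `n` would give comparable numbers. Note that repeated reads may be affected by operating-system file caching.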


## 9. Exporting / Streaming Data

We provide "Streamers" to convert Jazelle files directly into modern storage formats.

Supported formats: `Parquet`, `HDF5`, `JSON`, `Feather`.

### Converting to Parquet (Columnar, Compressed)

In [47]:
output_file = "sld_data.parquet"

# Convert and save directly from the open file instance
f.to_parquet(
    output_file, 
    count=1000,       # Limit count for demo
    compression="zstd", 
    num_threads=4
)

# Verify reading back using standard Awkward/PyArrow
reloaded = ak.from_parquet(output_file)
print(f"Reloaded {len(reloaded)} events from Parquet.")

Reloaded 1000 events from Parquet.


### Converting to HDF5 (Hierarchical)

In [48]:
output_h5 = "sld_data.h5"

f.to_hdf5(output_h5, count=1000)
    
import h5py
with h5py.File(output_h5, 'r') as hf:
    print("HDF5 Keys:", list(hf['jazelle_events'].keys()))

HDF5 Keys: ['IEVENTH', 'MCHEAD', 'PHCHRG', 'PHKELID', 'PHKLUS', 'PHPSUM', 'PHWIC']


You can also use the module-level helper functions to write and read these file formats directly. The intermediate data can be either a dictionary (from `to_dict`) or arrays (from `to_arrays`).

In [50]:
arrays = f.to_arrays()

jazelle.to_hdf5(arrays, output_h5)

reloaded = jazelle.from_hdf5(output_h5)

reloaded

## 10. Customizing Displays

You can control how data tables are rendered in Jupyter using `set_display_options`.

In [51]:
# Configure display: show more rows, fewer array elements
jazelle.set_display_options(
    max_rows=20,
    max_array_elements=2, # Truncate long arrays like [1.0, 2.0, ...]
    float_precision=3     # Cleaner floats
)

display(f.head(3))
display(f[0].phchrg) # Check the charged tracks family with new settings

id,run,event,evttime,weight
1,37418,46,1997-07-13 08:53:10,1.0
1,37418,71,1997-07-13 08:54:05,1.0
1,37418,94,1997-07-13 08:55:06,1.0


id,bnorm,impact,b3norm,impact3,charge,smwstat,status,tkpar0,length,chi2dt,imc,ndfdt,nhit,nhite,nhitp,nmisht,nwrght,nhitv,chi2,chi2v,vxdhit,mustat,estat,dedx,hlxpar,dhlxpar,tkpar,dtkpar
1,25.748,0.406,420.742,5.169,-1,0,3076,24.425,35.231,10.995,0,9,22,24,32,0,0,2,64.173,24.731,2056,100,301,318742740,"[4.053, 4.249, ...]","[0.000, -0.000, ...]","[4.175, -41.560, ...]","[0.000, 0.000, ...]"
2,4.104,0.007,3.832,0.007,-1,0,3076,24.421,76.53,19.375,0,21,78,80,80,0,0,3,6.201,3.253,648,21,13,1202888040,"[4.392, 0.520, ...]","[0.000, -0.000, ...]","[4.402, 9.990, ...]","[0.000, 0.000, ...]"
3,1.234,0.013,1.236,0.013,1,0,3073,24.424,54.595,9.454,0,10,35,35,40,0,0,3,22.482,8.442,321,100,301,551721056,"[5.611, 2.907, ...]","[0.000, 0.000, ...]","[5.546, -35.252, ...]","[0.000, -0.000, ...]"
4,1.717,0.002,1.527,0.002,1,0,3073,24.42,73.856,14.776,0,19,59,80,80,0,0,4,8.292,6.082,2186,21,15,736205657,"[4.734, 0.388, ...]","[0.000, 0.000, ...]","[4.726, 7.001, ...]","[0.000, 0.000, ...]"
5,0.8,0.001,1.658,0.003,-1,0,3074,24.422,91.813,9.936,0,19,71,80,80,0,0,3,9.509,6.352,1089,20,13,1134369301,"[5.131, 0.489, ...]","[0.000, -0.000, ...]","[5.144, -20.027, ...]","[0.000, 0.000, ...]"
6,17.727,0.012,12.252,0.013,-1,0,3076,24.421,49.118,7.007,0,16,49,52,80,0,0,4,2.488,1.393,2186,22,14,668638058,"[4.730, 0.087, ...]","[0.000, -0.000, ...]","[4.731, 8.552, ...]","[0.000, 0.000, ...]"
7,0.905,0.002,0.993,0.002,1,0,3073,24.422,84.841,24.325,0,19,67,80,80,0,0,3,11.428,6.157,1041,0,15,938090142,"[5.051, 0.482, ...]","[0.000, 0.000, ...]","[5.043, -15.851, ...]","[0.000, 0.000, ...]"
8,15.553,0.01,10.056,0.01,1,0,3076,24.421,75.973,16.631,0,19,68,80,80,0,0,3,9.315,5.288,546,22,13,1039343497,"[4.599, 0.069, ...]","[0.000, 0.000, ...]","[4.597, 9.268, ...]","[0.000, 0.000, ...]"
9,1.067,0.005,1.348,0.006,1,0,1025,24.42,71.793,7.204,0,20,61,80,80,0,0,4,7.456,6.436,561,100,200,934091165,"[1.480, 1.674, ...]","[0.000, 0.000, ...]","[1.443, 2.602, ...]","[0.000, 0.000, ...]"
10,58.594,0.067,49.828,0.069,1,0,1028,24.42,71.859,9.832,0,19,71,80,80,0,0,3,4.791,2.465,273,21,15,1050483648,"[1.510, 0.267, ...]","[0.000, 0.000, ...]","[1.507, -3.248, ...]","[0.000, 0.000, ...]"


## 11. Cleanup

Since we opened the file manually in Section 3 (Method B), we should close it now to free up system resources.

It is also fine not to call `close()` explicitly, since the resource is released automatically once the file object goes out of scope.

In [52]:
f.close()
print("File closed.")

File closed.
