# LineListMaker — Chainable Line-List Filtering & Export

This notebook demonstrates the `LineListMaker` class, a fluent builder for filtering spectral line-list data and exporting to CSV, `.par`, or DataFrame.

**Key features:**
- Accepts a `MoleculeLineList` or raw `pd.DataFrame` as input
- Every filter method returns `self` so calls can be chained
- Species label auto-derived from `molecule_id`, with manual override
- Built-in filter inspection, undo (`pop_filter`), and full `reset`
- Export to CSV, `.par`, `DataFrame`, or a new `MoleculeLineList`
- Merge / append multiple species into a single line list

In [1]:
# Imports
import numpy as np
import pandas as pd
from pathlib import Path
from IPython.display import display

# iSLAT imports
from iSLAT.Modules.DataTypes.MoleculeLineList import MoleculeLineList
from iSLAT.Modules.DataProcessing.LineListMaker import LineListMaker
from iSLAT.Modules.FileHandling import hitran_data_folder_path

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version:  {np.__version__}")

Pandas version: 2.3.3
NumPy version:  2.3.5


## 1. Load Molecular Data

Load a HITRAN `.par` file into a `MoleculeLineList`, then wrap it in a `LineListMaker`.

In [None]:
# Load H2O line list from a HITRAN .par file
h2o_lines = MoleculeLineList(
    molecule_id="H2O",
    filename=hitran_data_folder_path / "data_Hitran_H2O.par"
)

# Create a LineListMaker — species is auto-derived from molecule_id
maker = LineListMaker(h2o_lines)

print(maker)                     # __repr__
print(f"Total lines: {len(maker)}")  # __len__
display(maker.df)

<LineListMaker species='H2O' lines=305561 filters=0>
Total lines: 305561


| | species | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 0 | 0_0_0\|10_2_9 | 0_0_0\|9_3_6 | 933.27661 | 321225700000.0 | 6e-06 | 1861.25073 | 1845.83411 | 63 | 57 |
| 1 | H2O | 1 | 0_0_1\|5_1_5 | 0_0_1\|4_2_2 | 928.2218 | 322975000000.0 | 9e-06 | 5865.74316 | 5850.24268 | 33 | 27 |
| 2 | H2O | 2 | 0_2_0\|6_5_1 | 0_2_0\|7_4_4 | 926.64453 | 323524700000.0 | 2.6e-05 | 6039.06494 | 6023.53857 | 13 | 15 |
| 3 | H2O | 3 | 0_1_0\|14_3_12 | 0_1_0\|13_4_9 | 926.56085 | 323554000000.0 | 9e-06 | 6021.03809 | 6005.50977 | 87 | 81 |
| 4 | H2O | 4 | 0_0_0\|5_1_5 | 0_0_0\|4_2_2 | 922.00464 | 325152900000.0 | 1.2e-05 | 469.9411 | 454.33624 | 11 | 9 |
| 5 | H2O | 5 | 1_0_0\|5_1_5 | 1_0_0\|4_2_2 | 917.6756 | 326686700000.0 | 1e-05 | 5722.67236 | 5706.99365 | 11 | 9 |
| 6 | H2O | 6 | 0_2_0\|3_2_1 | 0_2_0\|4_1_4 | 905.37897 | 331123700000.0 | 3.3e-05 | 4881.40918 | 4865.51709 | 21 | 27 |
| 7 | H2O | 7 | 0_1_0\|5_2_3 | 0_1_0\|6_1_6 | 891.63464 | 336227900000.0 | 1.1e-05 | 2955.20264 | 2939.06616 | 33 | 39 |
| 8 | H2O | 8 | 0_0_0\|16_6_11 | 0_0_0\|17_3_14 | 884.22882 | 339044000000.0 | 8e-06 | 5499.35889 | 5483.08691 | 99 | 105 |
| 9 | H2O | 9 | 0_0_0\|17_4_13 | 0_0_0\|16_7_10 | 844.94141 | 354808600000.0 | 9e-06 | 5780.87305 | 5763.8457 | 105 | 99 |

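The `freq` column appears to hold the line frequency in Hz, derived from `lam` in µm via ν = c/λ. This relation is inferred from the displayed values rather than taken from iSLAT's documentation; the quick NumPy check below uses (`lam`, `freq`) pairs copied from the tables in this notebook.

```python
import numpy as np

# Speed of light in µm/s (2.99792458e8 m/s = 2.99792458e14 µm/s)
C_UM_PER_S = 2.99792458e14

# (lam [µm], freq [Hz]) pairs copied from the displayed H2O and CO rows
lam_um = np.array([933.27661, 922.00464, 10.01192, 4.50096])
freq_hz = np.array([321225700000.0, 325152900000.0, 29943560000000.0, 66606340000000.0])

# Frequency implied by each wavelength: nu = c / lambda
freq_from_lam = C_UM_PER_S / lam_um

# The displayed values are rounded to ~7 significant figures, hence the loose tolerance
print(np.allclose(freq_from_lam, freq_hz, rtol=1e-5))  # True
```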

## 2. Chainable Filtering

Every filter returns `self`, so you can chain multiple calls in a single expression. Here we select H₂O lines in the 10–20 µm range with E_up ≤ 5000 K and Einstein-A ≥ 0.01 s⁻¹.

In [None]:
# Chain several filters in one expression
maker = (
    LineListMaker(h2o_lines)
    .filter_wavelength(min_val=10, max_val=20)
    .filter_eup(max_val=5000)
    .filter_astein(min_val=1e-2)
    .sort("lam")
)

print(f"Filtered down to {len(maker)} lines")
display(maker.df)

Filtered down to 265 lines


| | species | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 7588 | 0_1_0\|8_3_6 | 0_0_0\|9_6_3 | 10.01192 | 29943560000000.0 | 0.01572 | 3784.26001 | 2347.19604 | 51 | 57 |
| 1 | H2O | 7504 | 0_0_0\|13_4_9 | 0_0_0\|12_1_12 | 10.24647 | 29258110000000.0 | 0.02053 | 3645.5625 | 2241.39502 | 81 | 75 |
| 2 | H2O | 7492 | 0_0_0\|13_6_8 | 0_0_0\|12_1_11 | 10.27239 | 29184290000000.0 | 0.07314 | 3953.90161 | 2553.27686 | 27 | 25 |
| 3 | H2O | 7487 | 0_1_0\|12_2_11 | 0_0_0\|13_3_10 | 10.29171 | 29129510000000.0 | 0.2925 | 4872.24414 | 3474.2478 | 75 | 81 |
| 4 | H2O | 7462 | 0_0_0\|13_7_7 | 0_0_0\|12_2_10 | 10.34268 | 28985970000000.0 | 0.1148 | 4211.40771 | 2820.30103 | 27 | 25 |
| 5 | H2O | 7459 | 0_1_0\|9_3_7 | 0_0_0\|10_6_4 | 10.3523 | 28959020000000.0 | 0.02124 | 4088.18433 | 2698.37085 | 19 | 21 |
| 6 | H2O | 7458 | 0_1_0\|9_2_8 | 0_0_0\|10_5_5 | 10.35316 | 28956600000000.0 | 0.01886 | 3871.16382 | 2481.46631 | 19 | 21 |
| 7 | H2O | 7445 | 0_0_0\|11_11_0 | 0_0_0\|10_8_3 | 10.39599 | 28837320000000.0 | 0.02128 | 4627.38428 | 3243.41113 | 69 | 63 |
| 8 | H2O | 7444 | 0_0_0\|11_11_1 | 0_0_0\|10_8_2 | 10.396 | 28837290000000.0 | 0.02128 | 4627.38428 | 3243.41187 | 23 | 21 |
| 9 | H2O | 7432 | 0_1_0\|12_1_11 | 0_0_0\|13_4_10 | 10.41823 | 28775750000000.0 | 0.2729 | 4871.77344 | 3490.75537 | 25 | 27 |

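Conceptually, each call in the chain narrows the table with one more boolean condition, so the whole chain behaves like a single combined mask. The plain-pandas sketch below illustrates this on a few rows copied from the outputs in this notebook; it is not iSLAT's implementation, and it assumes the bounds are inclusive, which may differ from how `LineListMaker` handles them.

```python
import pandas as pd

# A few rows copied from the tables in this notebook (subset of columns)
df = pd.DataFrame({
    "nr":      [7588, 7587, 7542, 4, 7487],
    "lam":     [10.01192, 10.01481, 10.12396, 922.00464, 10.29171],
    "a_stein": [0.01572, 0.06886, 0.00031, 1.2e-05, 0.2925],
    "e_up":    [3784.26001, 6861.86133, 4757.021, 469.9411, 4872.24414],
})

# Assumption: inclusive bounds (Series.between defaults to inclusive="both")
mask = (
    df["lam"].between(10, 20)      # like filter_wavelength(min_val=10, max_val=20)
    & (df["e_up"] <= 5000)         # like filter_eup(max_val=5000)
    & (df["a_stein"] >= 1e-2)      # like filter_astein(min_val=1e-2)
)
selected = df[mask].sort_values("lam")  # like .sort("lam")
print(selected["nr"].tolist())  # [7588, 7487]
```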

## 3. Quantum-State Filtering

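Use `filter_quantum()` to select lines by their upper and/or lower quantum-state labels. With `contains=True` the given string is matched as a substring of the label rather than against the whole label. Here two chained calls keep only v1-1 lines, i.e. lines whose upper and lower labels both contain the vibrational state `0_1_0`.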

In [None]:
# Select only v1-1 band lines (substring match on quantum labels)
v1_maker = (
    LineListMaker(h2o_lines)
    .filter_wavelength(min_val=10, max_val=20)
    .filter_quantum(lev_up="0_1_0", contains=True)
    .filter_quantum(lev_low="0_1_0", contains=True)
    .sort("lam")
)

print(f"v1-1 lines (10–20 µm): {len(v1_maker)}")
display(v1_maker.df)

v1-1 lines (10–20 µm): 471


| | species | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 7587 | 0_1_0\|11_10_1 | 0_1_0\|10_7_4 | 10.01481 | 29934920000000.0 | 0.06886 | 6861.86133 | 5425.21143 | 69 | 63 |
| 1 | H2O | 7585 | 0_1_0\|11_10_2 | 0_1_0\|10_7_3 | 10.01494 | 29934520000000.0 | 0.06886 | 6861.86133 | 5425.23047 | 23 | 21 |
| 2 | H2O | 7584 | 0_1_0\|20_4_16 | 0_1_0\|19_3_17 | 10.01604 | 29931230000000.0 | 12.58 | 10027.41309 | 8590.94141 | 41 | 39 |
| 3 | H2O | 7577 | 0_1_0\|13_9_4 | 0_1_0\|12_6_7 | 10.03925 | 29862030000000.0 | 0.4789 | 7365.63818 | 5932.48682 | 81 | 75 |
| 4 | H2O | 7567 | 0_1_0\|13_9_5 | 0_1_0\|12_6_6 | 10.06264 | 29792630000000.0 | 0.4796 | 7365.63721 | 5935.81641 | 27 | 25 |
| 5 | H2O | 7560 | 0_1_0\|13_8_6 | 0_1_0\|13_3_11 | 10.07834 | 29746220000000.0 | 0.001678 | 7009.64453 | 5582.05176 | 27 | 27 |
| 6 | H2O | 7553 | 0_1_0\|16_7_9 | 0_1_0\|15_4_12 | 10.09625 | 29693450000000.0 | 2.076 | 8239.45508 | 6814.39453 | 33 | 31 |
| 7 | H2O | 7548 | 0_1_0\|13_7_7 | 0_1_0\|13_2_12 | 10.10953 | 29654430000000.0 | 0.001822 | 6680.7876 | 5257.59961 | 27 | 27 |
| 8 | H2O | 7542 | 0_1_0\|8_7_1 | 0_1_0\|7_2_6 | 10.12396 | 29612170000000.0 | 0.00031 | 4757.021 | 3335.86133 | 17 | 15 |
| 9 | H2O | 7541 | 0_1_0\|12_6_7 | 0_1_0\|11_1_10 | 10.12637 | 29605120000000.0 | 0.08133 | 5932.48779 | 4511.66553 | 75 | 69 |

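Substring matching can in principle hit the pattern anywhere in a label, not only in the vibrational part before the `|`. If you need the vibrational state to match exactly, a prefix test on `"0_1_0|"` is stricter. The plain-pandas sketch below compares both approaches on labels copied from this notebook; for these rows they agree, and whether any label in the full line list behaves differently is not checked here.

```python
import pandas as pd

# Upper/lower labels copied from rows shown in this notebook
df = pd.DataFrame({
    "nr":      [7588, 7587, 7504, 14914],
    "lev_up":  ["0_1_0|8_3_6", "0_1_0|11_10_1", "0_0_0|13_4_9", "1_0_0|5_2_3"],
    "lev_low": ["0_0_0|9_6_3", "0_1_0|10_7_4", "0_0_0|12_1_12", "0_1_0|6_3_4"],
})

# Substring match, similar in spirit to contains=True (regex=False: literal pattern)
sub = (df["lev_up"].str.contains("0_1_0", regex=False)
       & df["lev_low"].str.contains("0_1_0", regex=False))

# Stricter alternative: the vibrational part (before "|") must be exactly 0_1_0
exact_vib = (df["lev_up"].str.startswith("0_1_0|")
             & df["lev_low"].str.startswith("0_1_0|"))

print(df.loc[sub, "nr"].tolist())        # [7587]
print(df.loc[exact_vib, "nr"].tolist())  # [7587]
```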

## 4. Inspecting State — `summary()`, `.filters`, `len()`

The maker tracks every filter you apply. Use `.summary()` for a human-readable overview, or `.filters` for the raw list of `(name, kwargs)` tuples.

In [5]:
# Pretty-print the current state
print(v1_maker.summary())
print()

# Raw filter log
for i, (name, kwargs) in enumerate(v1_maker.filters):
    print(f"  [{i}] {name}: {kwargs}")

<LineListMaker species='H2O' lines=471 filters=3>
  λ range : 10.01481 – 19.91264 µm  (471 lines)
  Active filters:
    • filter_wavelength(min_val=10, max_val=20)
    • filter_quantum(lev_up='0_1_0', lev_low=None, contains=True)
    • filter_quantum(lev_up=None, lev_low='0_1_0', contains=True)

  [0] filter_wavelength: {'min_val': 10, 'max_val': 20}
  [1] filter_quantum: {'lev_up': '0_1_0', 'lev_low': None, 'contains': True}
  [2] filter_quantum: {'lev_up': None, 'lev_low': '0_1_0', 'contains': True}
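
Because the log is plain `(name, kwargs)` data, it is easy to post-process, for example to record the selection in a file header. The sketch below renders entries in the same style as the `summary()` bullets above; `format_filter` is a hypothetical helper, not part of iSLAT.

```python
# Hypothetical helper: render one (name, kwargs) log entry as "name(key=value, ...)"
def format_filter(name: str, kwargs: dict) -> str:
    args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"{name}({args})"

# Filter log copied from the output above
filters = [
    ("filter_wavelength", {"min_val": 10, "max_val": 20}),
    ("filter_quantum", {"lev_up": "0_1_0", "lev_low": None, "contains": True}),
    ("filter_quantum", {"lev_up": None, "lev_low": "0_1_0", "contains": True}),
]

for name, kwargs in filters:
    print(f"    • {format_filter(name, kwargs)}")
```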


## 5. Undoing Filters — `pop_filter()` and `reset()`

Made a mistake? `pop_filter()` removes the most recent filter and re-applies the remaining ones to the original lines. `reset()` drops all filters and restores the full, unfiltered line list.

In [6]:
# Work on a copy so we don't disturb v1_maker
demo = v1_maker.copy()
print(f"Before pop:  {len(demo)} lines, {len(demo.filters)} filters")

# Remove the last filter (the second quantum filter on lev_low)
demo.pop_filter()
print(f"After pop:   {len(demo)} lines, {len(demo.filters)} filters")

# Full reset — back to all original lines
demo.reset()
print(f"After reset: {len(demo)} lines, {len(demo.filters)} filters")

Before pop:  471 lines, 3 filters
After pop:   799 lines, 2 filters
After reset: 305561 lines, 0 filters
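
The counts above are consistent with replay-based undo: keep an untouched copy of the original lines plus the filter log, and rebuild by re-running the log. The sketch below is a minimal version of that design under this assumption; `ReplayFilter` is a hypothetical toy class, not iSLAT's code, and the rows are copied from this notebook.

```python
import pandas as pd

class ReplayFilter:
    """Toy builder: original data + filter log; undo rebuilds by replaying the log."""

    def __init__(self, df: pd.DataFrame):
        self._original = df
        self._log = []            # list of (name, mask_fn) pairs
        self.df = df

    def filter(self, name, mask_fn):
        self._log.append((name, mask_fn))
        self.df = self.df[mask_fn(self.df)]
        return self               # returning self is what makes chaining work

    def _replay(self):
        self.df = self._original
        for _, mask_fn in self._log:
            self.df = self.df[mask_fn(self.df)]

    def pop_filter(self):
        self._log.pop()           # drop the most recent filter (IndexError if none) ...
        self._replay()            # ... and rebuild from the untouched original
        return self

    def reset(self):
        self._log.clear()
        self.df = self._original
        return self

# (lam, e_up) values copied from rows shown in this notebook
df = pd.DataFrame({"lam": [922.00464, 10.01192, 10.01481, 10.29171],
                   "e_up": [469.9411, 3784.26001, 6861.86133, 4872.24414]})
rf = (ReplayFilter(df)
      .filter("wavelength", lambda d: d["lam"].between(10, 20))
      .filter("eup", lambda d: d["e_up"] <= 5000))
print(len(rf.df))   # 2
rf.pop_filter()
print(len(rf.df))   # 3
rf.reset()
print(len(rf.df))   # 4
```

Replaying instead of storing every intermediate DataFrame keeps memory low, at the cost of re-running the remaining filters on each undo.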


## 6. Custom Filters

Use `filter_custom()` to apply any boolean mask computed by a lambda or function; as in the example below, the callable receives the current DataFrame. The `label` argument is recorded in the filter log for traceability.

In [None]:
# Keep only lines whose Einstein-A is above the median of the current set
strong_lines = (
    LineListMaker(h2o_lines)
    .filter_wavelength(min_val=10, max_val=20)
    .filter_custom(
        lambda df: df["a_stein"] > df["a_stein"].median(),
        label="above_median_astein"
    )
    .sort("a_stein", ascending=False)
)

print(strong_lines.summary())
display(strong_lines.df)

<LineListMaker species='H2O' lines=1319 filters=2>
  λ range : 10.01115 – 19.99860 µm  (1319 lines)
  Active filters:
    • filter_wavelength(min_val=10, max_val=20)
    • filter_custom(label='above_median_astein')


| | species | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 6306 | 0_0_0\|17_17_1 | 0_0_0\|16_16_0 | 14.21061 | 21096380000000.0 | 205.7 | 10062.63574 | 9050.16992 | 35 | 33 |
| 1 | H2O | 6305 | 0_0_0\|17_17_0 | 0_0_0\|16_16_1 | 14.21061 | 21096380000000.0 | 205.6 | 10062.63574 | 9050.16992 | 105 | 99 |
| 2 | H2O | 6443 | 0_0_0\|18_16_2 | 0_0_0\|17_15_3 | 13.76219 | 21783780000000.0 | 190.2 | 10297.83105 | 9252.37305 | 37 | 35 |
| 3 | H2O | 6444 | 0_0_0\|18_16_3 | 0_0_0\|17_15_2 | 13.76219 | 21783780000000.0 | 190.2 | 10297.83105 | 9252.37305 | 111 | 105 |
| 4 | H2O | 6291 | 0_0_0\|17_16_1 | 0_0_0\|16_15_2 | 14.24957 | 21038710000000.0 | 184.8 | 9660.12109 | 8650.42188 | 105 | 99 |
| 5 | H2O | 6290 | 0_0_0\|17_16_2 | 0_0_0\|16_15_1 | 14.24957 | 21038710000000.0 | 184.7 | 9660.12109 | 8650.42188 | 35 | 33 |
| 6 | H2O | 6136 | 0_0_0\|16_16_0 | 0_0_0\|15_15_1 | 14.79079 | 20268860000000.0 | 179.5 | 9050.16992 | 8077.41846 | 33 | 31 |
| 7 | H2O | 6137 | 0_0_0\|16_16_1 | 0_0_0\|15_15_0 | 14.79079 | 20268860000000.0 | 179.5 | 9050.16992 | 8077.41846 | 99 | 93 |
| 8 | H2O | 6551 | 0_0_0\|19_15_5 | 0_0_0\|18_14_4 | 13.41866 | 22341460000000.0 | 174.6 | 10540.89258 | 9468.67188 | 39 | 37 |
| 9 | H2O | 6552 | 0_0_0\|19_15_4 | 0_0_0\|18_14_5 | 13.41866 | 22341460000000.0 | 174.6 | 10540.89258 | 9468.67188 | 117 | 111 |

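Judging by the lambda above, a custom filter is a callable that takes the current DataFrame and returns a boolean Series aligned with it. Because the median is computed on whatever rows are present when the filter runs, the same callable keeps a different set of lines depending on where it sits in the chain. A plain-pandas sketch using Einstein-A values copied from this notebook; `above_median` is a hypothetical name:

```python
import pandas as pd

# Einstein-A values copied from rows shown in this notebook
df = pd.DataFrame({"a_stein": [205.7, 0.01572, 0.2925, 12.58, 0.00031, 0.06886]})

# The median is taken over the rows passed in, i.e. the *current* table
def above_median(d: pd.DataFrame) -> pd.Series:
    return d["a_stein"] > d["a_stein"].median()

mask = above_median(df)
print(int(mask.sum()))  # 3
print(df[mask].sort_values("a_stein", ascending=False)["a_stein"].tolist())  # [205.7, 12.58, 0.2925]
```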

## 7. Overriding the Species Label

The species is auto-derived from `molecule_id`, but you can override it at construction time or change it later via the `.species` property.

In [None]:
# Override at construction
custom = LineListMaker(h2o_lines, species="H2O_custom")
print(f"Species at init: {custom.species}")

# Change later via property
custom.species = "H2O_v1-1"
print(f"Species after:   {custom.species}")
display(custom.df[["species", "lam"]])

Species at init: H2O_custom
Species after:   H2O_v1-1


| | species | lam |
|---|---|---|
| 0 | H2O_v1-1 | 933.27661 |
| 1 | H2O_v1-1 | 928.2218 |
| 2 | H2O_v1-1 | 926.64453 |
| 3 | H2O_v1-1 | 926.56085 |
| 4 | H2O_v1-1 | 922.00464 |


## 8. Export to CSV

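Write the filtered line list to a CSV in the standard iSLAT format (`species, lev_up, lev_low, lam, a_stein, e_up, e_low, g_up, g_low`). Any other columns in the source (here `nr` and `freq`) are appended after the standard ones, as the output below shows. Pass `extended=True` to also include `xmin` / `xmax` columns (see Section 14).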

In [None]:
# Write outputs to ./output, relative to the current working directory (normally the notebook's folder)
output_dir = Path("output").resolve()
output_dir.mkdir(exist_ok=True)

csv_path = v1_maker.to_csv(output_dir / "H2O_v1-1_10-20um.csv")
print(f"Saved {len(v1_maker)} lines → {csv_path.name}")

# Quick verify
reloaded_csv = pd.read_csv(csv_path)
print(f"Columns: {list(reloaded_csv.columns)}")
display(reloaded_csv)

Saved 471 lines → H2O_v1-1_10-20um.csv
Columns: ['species', 'lev_up', 'lev_low', 'lam', 'a_stein', 'e_up', 'e_low', 'g_up', 'g_low', 'nr', 'freq']


| | species | lev_up | lev_low | lam | a_stein | e_up | e_low | g_up | g_low | nr | freq |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 0_1_0\|11_10_1 | 0_1_0\|10_7_4 | 10.01481 | 0.06886 | 6861.86133 | 5425.21143 | 69 | 63 | 7587 | 29934920000000.0 |
| 1 | H2O | 0_1_0\|11_10_2 | 0_1_0\|10_7_3 | 10.01494 | 0.06886 | 6861.86133 | 5425.23047 | 23 | 21 | 7585 | 29934520000000.0 |
| 2 | H2O | 0_1_0\|20_4_16 | 0_1_0\|19_3_17 | 10.01604 | 12.58 | 10027.41309 | 8590.94141 | 41 | 39 | 7584 | 29931230000000.0 |
| 3 | H2O | 0_1_0\|13_9_4 | 0_1_0\|12_6_7 | 10.03925 | 0.4789 | 7365.63818 | 5932.48682 | 81 | 75 | 7577 | 29862030000000.0 |
| 4 | H2O | 0_1_0\|13_9_5 | 0_1_0\|12_6_6 | 10.06264 | 0.4796 | 7365.63721 | 5935.81641 | 27 | 25 | 7567 | 29792630000000.0 |

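The column order of the reloaded file suggests a simple rule: the nine standard columns first, then any remaining columns in their original order. The sketch below reproduces that ordering from the `maker.df` columns displayed in Section 1; the rule is inferred from the output, not taken from iSLAT's source.

```python
# Standard iSLAT CSV columns, as listed in the text above
STANDARD = ["species", "lev_up", "lev_low", "lam", "a_stein", "e_up", "e_low", "g_up", "g_low"]

# Column order of maker.df as displayed in Section 1
source_cols = ["species", "nr", "lev_up", "lev_low", "lam", "freq",
               "a_stein", "e_up", "e_low", "g_up", "g_low"]

# Assumed rule: standard columns first, then leftovers in their original order
ordered = STANDARD + [c for c in source_cols if c not in STANDARD]
print(ordered)
```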

## 9. Export to `.par` File

The `.par` export delegates to `MoleculeLineList.write_par_file()`, preserving the partition function and HITRAN header. This requires the maker to have been initialised from a `MoleculeLineList` (not a raw DataFrame).

In [None]:
# Export to .par (HITRAN format) with an optional header override
header = pd.DataFrame({
    "source": ["Filtered from data_Hitran_H2O.par — v1-1 lines, 10-20 µm"],
})

par_path = v1_maker.to_par(output_dir / "H2O_v1-1_10-20um.par", header=header)
print(f"Saved {len(v1_maker)} lines → {par_path.name}")

# Verify by reloading
reloaded_par = MoleculeLineList(molecule_id="H2O_v1-1", filename=par_path)
print(f"Reloaded lines: {len(reloaded_par)}")
display(reloaded_par.get_pandas_table())

Saved 471 lines → H2O_v1-1_10-20um.par
[CACHE MISS] Parsing H2O_v1-1 from source file...
Molar_mass: 18.010565
[CACHE SAVED] H2O_v1-1 cached for faster loading
Reloaded lines: 471


| | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7587 | 0_1_0\|11_10_1 | 0_1_0\|10_7_4 | 10.01481 | 29934920000000.0 | 0.06886 | 6861.86133 | 5425.21143 | 69 | 63 |
| 1 | 7585 | 0_1_0\|11_10_2 | 0_1_0\|10_7_3 | 10.01494 | 29934520000000.0 | 0.06886 | 6861.86133 | 5425.23047 | 23 | 21 |
| 2 | 7584 | 0_1_0\|20_4_16 | 0_1_0\|19_3_17 | 10.01604 | 29931230000000.0 | 12.58 | 10027.41309 | 8590.94141 | 41 | 39 |
| 3 | 7577 | 0_1_0\|13_9_4 | 0_1_0\|12_6_7 | 10.03925 | 29862030000000.0 | 0.4789 | 7365.63818 | 5932.48682 | 81 | 75 |
| 4 | 7567 | 0_1_0\|13_9_5 | 0_1_0\|12_6_6 | 10.06264 | 29792630000000.0 | 0.4796 | 7365.63721 | 5935.81641 | 27 | 25 |


## 11. Convert Back to a `MoleculeLineList`

Use `to_linelist()` to create a new `MoleculeLineList` from the filtered data. This is useful when you want to feed the result back into other iSLAT processing steps.

In [11]:
# Convert filtered data back to a MoleculeLineList object
new_linelist = v1_maker.to_linelist()

print(f"Type:        {type(new_linelist).__name__}")
print(f"molecule_id: {new_linelist.molecule_id}")
print(f"Lines:       {len(new_linelist)}")

Type:        MoleculeLineList
molecule_id: H2O
Lines:       471


## 12. Merging Multiple Species

Load a second molecule, filter each species independently, then use `LineListMaker.merge()` to combine the filtered makers into a single multi-species `LineListMaker`.

In [None]:
# Load a second species
co_lines = MoleculeLineList(
    molecule_id="CO",
    filename=hitran_data_folder_path / "data_HITEMP_2019_CO.par"
)

# Filter each species independently
h2o_maker = (
    LineListMaker(h2o_lines)
    .filter_wavelength(min_val=4.5, max_val=5.5)
    .filter_eup(max_val=6000)
)

co_maker = (
    LineListMaker(co_lines)
    .filter_wavelength(min_val=4.5, max_val=5.5)
    .filter_eup(max_val=6000)
)

print(f"H2O lines: {len(h2o_maker)}")
print(f"CO  lines: {len(co_maker)}")

# Merge into a single multi-species maker
combined = LineListMaker.merge(h2o_maker, co_maker)
print(f"\nCombined:  {len(combined)} lines")
display(combined.df)

H2O lines: 925
CO  lines: 56

Combined:  981 lines


| | species | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 14851 | 0_1_0\|8_6_3 | 0_0_0\|9_3_6 | 5.49993 | 54508400000000.0 | 0.0005124 | 4461.82471 | 1845.83411 | 51 | 57 |
| 1 | H2O | 14854 | 1_0_0\|2_0_2 | 0_1_0\|3_3_1 | 5.49907 | 54516980000000.0 | 0.003587 | 5360.79883 | 2744.39673 | 5 | 7 |
| 2 | H2O | 14869 | 0_1_0\|6_0_6 | 0_0_0\|4_0_4_q | 5.49533 | 54554070000000.0 | 3.044e-07 | 2937.66626 | 319.48441 | 13 | 9 |
| 3 | H2O | 14883 | 1_0_0\|4_2_3 | 0_1_0\|5_3_2 | 5.49203 | 54586830000000.0 | 0.2111 | 5685.06006 | 3065.30591 | 27 | 33 |
| 4 | H2O | 14890 | 0_1_0\|10_3_7 | 0_0_0\|9_4_6 | 5.49036 | 54603420000000.0 | 2.583 | 4549.78418 | 1929.23413 | 21 | 19 |
| 5 | H2O | 14900 | 0_2_0\|7_3_5 | 0_1_0\|6_2_4 | 5.4872 | 54634890000000.0 | 7.004 | 5803.47119 | 3181.40991 | 15 | 13 |
| 6 | H2O | 14906 | 0_1_0\|10_1_9 | 0_0_0\|9_2_8 | 5.48618 | 54644990000000.0 | 9.508 | 4176.979 | 1554.43347 | 21 | 19 |
| 7 | H2O | 14911 | 1_0_0\|4_1_4 | 0_1_0\|5_2_3 | 5.48364 | 54670350000000.0 | 0.04779 | 5578.96484 | 2955.20264 | 27 | 33 |
| 8 | H2O | 14913 | 0_2_0\|4_4_0 | 0_1_0\|5_1_5 | 5.48262 | 54680550000000.0 | 0.0002475 | 5390.77393 | 2766.52173 | 9 | 11 |
| 9 | H2O | 14914 | 1_0_0\|5_2_3 | 0_1_0\|6_3_4 | 5.48183 | 54688370000000.0 | 0.2135 | 5893.11523 | 3268.4873 | 33 | 39 |

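At the DataFrame level, merging amounts to stacking rows that already carry a `species` label, which is consistent with the combined count of 981 = 925 + 56. A plain-pandas equivalent using `pd.concat` is sketched below on rows copied from this notebook; it is not necessarily how `LineListMaker.merge()` is implemented.

```python
import pandas as pd

# Two rows per species, copied from the tables in this notebook
h2o = pd.DataFrame({"species": ["H2O", "H2O"], "nr": [14851, 14854], "lam": [5.49993, 5.49907]})
co = pd.DataFrame({"species": ["CO", "CO"], "nr": [6489, 6479], "lam": [4.50096, 4.50705]})

# Stack the rows; ignore_index gives the result a fresh 0..n-1 index
combined = pd.concat([h2o, co], ignore_index=True)

print(len(combined))                                  # 4 == len(h2o) + len(co)
print(combined.groupby("species").size().to_dict())  # {'CO': 2, 'H2O': 2}
```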

## 13. Export the Merged List

Save the combined multi-species line list as a CSV. You can also use `filter_species()` on a merged maker to re-select a subset later.

In [None]:
# Save the merged list
merged_csv = combined.sort("lam").to_csv(output_dir / "H2O_CO_4p5-5p5um.csv")
print(f"Saved {len(combined)} lines → {merged_csv.name}")

# Filter the merged maker to just CO
co_only = combined.copy().filter_species("CO").sort("lam")
print(f"\nCO-only subset: {len(co_only)} lines")
display(co_only.df)

Saved 981 lines → H2O_CO_4p5-5p5um.csv

CO-only subset: 56 lines


| | species | nr | lev_up | lev_low | lam | freq | a_stein | e_up | e_low | g_up | g_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CO | 6489 | 1\|R_22 | 0\|R_22 | 4.50096 | 66606340000000.0 | 19.59 | 4593.96172 | 1397.37989 | 47 | 45 |
| 1 | CO | 6479 | 1\|R_21 | 0\|R_21 | 4.50705 | 66516320000000.0 | 19.48 | 4468.30888 | 1276.04738 | 45 | 43 |
| 2 | CO | 6469 | 1\|R_20 | 0\|R_20 | 4.51324 | 66425150000000.0 | 19.37 | 4348.08446 | 1160.1983 | 43 | 41 |
| 3 | CO | 6459 | 1\|R_19 | 0\|R_19 | 4.51952 | 66332840000000.0 | 19.26 | 4233.29301 | 1049.83698 | 41 | 39 |
| 4 | CO | 6439 | 1\|R_18 | 0\|R_18 | 4.52589 | 66239390000000.0 | 19.15 | 4123.93904 | 944.96772 | 39 | 37 |

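On a plain DataFrame, re-selecting one species from a merged table is a boolean comparison on the `species` column followed by a sort. The sketch below mirrors `.filter_species("CO").sort("lam")` on rows copied from this notebook; it illustrates the idea and is not iSLAT's implementation.

```python
import pandas as pd

# Mixed-species rows copied from the tables in this notebook
combined = pd.DataFrame({
    "species": ["H2O", "CO", "H2O", "CO"],
    "nr":      [14851, 6479, 14854, 6489],
    "lam":     [5.49993, 4.50705, 5.49907, 4.50096],
})

# Keep one species, then order by wavelength
co_only = combined[combined["species"] == "CO"].sort_values("lam")
print(co_only["nr"].tolist())  # [6489, 6479]
```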

## 14. Extended CSV & Extra Columns

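Pass `extended=True` to add `xmin` / `xmax` columns (filled with `NaN` by default). You can also inject arbitrary extra columns via `extra_columns`, here a dict mapping column name to fill value. Because `pd.read_csv` treats empty fields as missing, the empty `note` strings are read back as `NaN` too.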

In [None]:
# Extended format with extra "note" column
ext_path = v1_maker.to_csv(
    output_dir / "H2O_v1-1_extended.csv",
    extended=True,
    extra_columns={"note": ""},
)

reloaded_ext = pd.read_csv(ext_path)
print(f"Columns: {list(reloaded_ext.columns)}")
display(reloaded_ext)

Columns: ['species', 'lev_up', 'lev_low', 'lam', 'a_stein', 'e_up', 'e_low', 'g_up', 'g_low', 'xmin', 'xmax', 'note', 'nr', 'freq']


| | species | lev_up | lev_low | lam | a_stein | e_up | e_low | g_up | g_low | xmin | xmax | note | nr | freq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H2O | 0_1_0\|11_10_1 | 0_1_0\|10_7_4 | 10.01481 | 0.06886 | 6861.86133 | 5425.21143 | 69 | 63 | | | | 7587 | 29934920000000.0 |
| 1 | H2O | 0_1_0\|11_10_2 | 0_1_0\|10_7_3 | 10.01494 | 0.06886 | 6861.86133 | 5425.23047 | 23 | 21 | | | | 7585 | 29934520000000.0 |
| 2 | H2O | 0_1_0\|20_4_16 | 0_1_0\|19_3_17 | 10.01604 | 12.58 | 10027.41309 | 8590.94141 | 41 | 39 | | | | 7584 | 29931230000000.0 |
| 3 | H2O | 0_1_0\|13_9_4 | 0_1_0\|12_6_7 | 10.03925 | 0.4789 | 7365.63818 | 5932.48682 | 81 | 75 | | | | 7577 | 29862030000000.0 |
| 4 | H2O | 0_1_0\|13_9_5 | 0_1_0\|12_6_6 | 10.06264 | 0.4796 | 7365.63721 | 5935.81641 | 27 | 25 | | | | 7567 | 29792630000000.0 |

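The column layout above suggests that `xmin` / `xmax` follow the standard columns, then the `extra_columns`, then leftovers such as `nr` and `freq`. The sketch below reproduces that layout with plain pandas (the ordering rule is inferred from the output, not from iSLAT's source) and round-trips it through CSV text to show why every empty field comes back as `NaN`.

```python
import io

import numpy as np
import pandas as pd

# One row copied from the v1-1 table in this notebook
df = pd.DataFrame({
    "species": ["H2O"], "lev_up": ["0_1_0|11_10_1"], "lev_low": ["0_1_0|10_7_4"],
    "lam": [10.01481], "a_stein": [0.06886], "e_up": [6861.86133], "e_low": [5425.21143],
    "g_up": [69], "g_low": [63], "nr": [7587], "freq": [29934920000000.0],
})

STANDARD = ["species", "lev_up", "lev_low", "lam", "a_stein", "e_up", "e_low", "g_up", "g_low"]
extra_columns = {"note": ""}

# Assumed layout: standard, then xmin/xmax, then extra columns, then leftovers
out = df.assign(xmin=np.nan, xmax=np.nan, **extra_columns)
leftover = [c for c in df.columns if c not in STANDARD]
out = out[STANDARD + ["xmin", "xmax", *extra_columns] + leftover]

# Round-trip through CSV text: both NaN and "" are written as empty fields
buf = io.StringIO()
out.to_csv(buf, index=False)
reloaded = pd.read_csv(io.StringIO(buf.getvalue()))
print(list(reloaded.columns))
print(reloaded[["xmin", "xmax", "note"]].isna().all().all())  # True
```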

## 15. Cleanup

Remove the example output files created by this notebook. The cell is off by default so the files can be inspected; set `CLEANUP = True` to delete them.

In [15]:
import shutil

# Set to True to delete the output/ directory and every file written above
CLEANUP = False

if not CLEANUP:
    print(f"Cleanup skipped; files kept in {output_dir}")
elif output_dir.exists():
    shutil.rmtree(output_dir)
    print("Cleaned up output/ directory.")
else:
    print("Nothing to clean up.")