SD pipeline revamp #281

Merged
e-koch merged 21 commits into PhangsTeam:master from e-koch:sd_pipeline_revamp
Feb 9, 2026

Conversation

Collaborator

@e-koch e-koch commented Jan 6, 2026

Updating the pipeline to use the default ALMA SD pipeline with custom selection for baseline fitting.

Collaborator Author

e-koch commented Jan 14, 2026

Local tests working up to revamped SD imaging stage.

Passing hsd_baseline the line frequency ranges per target is working as expected.

Collaborator Author

e-koch commented Jan 14, 2026

@thomaswilliamsastro this is ready for a first pass review.

I'm changing the behaviour from our usual loop so that all products are processed per target; under the hood, this changes the ordering of operations a bit in SingleDishHandler.

Some parts are kludgy but work. Most of the clean-up reflects metadata handling improvements in the ALMA pipeline or the CASA tasks (e.g., tsdimaging handles the units and solving for the beam).

Collaborator Author

e-koch commented Jan 14, 2026

To-dos:

  • Ensure new imaging loop over all products works as intended
  • Add optional image-plane baseline subtraction (seems to be needed for TP data with a fixed-position OFF where the ON and OFF have differing elevations; M33 and the LV LP data need this)
  • Consider how to handle combining TP from multiple projects. (My opinion is this is SD postprocessing and we combine SD cubes in the image plane).

Collaborator Author

e-koch commented Jan 18, 2026

On a full test of ngc7793_c from 2025.1.00576.L, the custom baseline fitting inputs with the ALMA pipeline tasks appear to be working as expected.

Here is SPW 19 with 13CO and C18O masked after baseline subtraction (incl. the default +/-200 km/s padding beyond the target range that is probably overkill):
image

and the pipeline default (no lines detected):
image

Collaborator Author

e-koch commented Jan 18, 2026

And the same for SPW 21 with 12CO:

W/ target freq range from the phangs pipeline and a linear fit:
image

Default ALMA pipeline:
image

@e-koch e-koch force-pushed the sd_pipeline_revamp branch from a0d932f to b435919 January 20, 2026 14:50
@e-koch e-koch marked this pull request as ready for review January 20, 2026 15:06
Collaborator Author

e-koch commented Jan 20, 2026

I have another full test run going locally after merging in #286 and #280. Otherwise this is ready for review.

There are some follow-up steps that can be addressed later after more discussion:

  • exporting calibrated spectra to take advantage of other SD baseline/gridding tools
  • implementation of image baseline subtraction as needed

Collaborator

@low-sky low-sky left a comment

Looks great; I don't see any showstoppers, but added some low utility comments.

width=str(chan_dv_kms)+'km/s',
start=str(start_vel)+'km/s',
veltype ="radio",
outframe='LSRK',
Collaborator

These will be nice to override in the future.


logger.info(f"Using these ASDM files: {EBsnames}")

if len(EBsnames) == 0:
Collaborator

Shouldn't this come before the renaming step?

Collaborator Author

good point. moved it up

vel_line_mask = product_dict[this_product]['vel_line_mask']

# Convert velocity range to frequency range
freq_line_mask = (vel_line_mask * u.km / u.s).to(u.Hz, u.doppler_optical(freq_rest * u.MHz)).value
Collaborator

u.doppler_radio for consistency with the hardwired convention choice?

Collaborator Author

I figured this would be minor, but agreed we should stick to the same velocity convention throughout.
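
For a sense of scale, here is a minimal sketch of the two conventions written out in plain Python rather than through astropy. The function names are hypothetical, and the 12CO(2-1) rest frequency of 230538 MHz and the 200 km/s velocity are illustrative inputs only, not values taken from the pipeline keys.

```python
# Speed of light in km/s.
C_KMS = 299792.458


def radio_velocity_to_freq(vel_kms, rest_freq_mhz):
    """Radio convention: v = c * (f0 - f) / f0, so f = f0 * (1 - v/c)."""
    return rest_freq_mhz * (1.0 - vel_kms / C_KMS)


def optical_velocity_to_freq(vel_kms, rest_freq_mhz):
    """Optical convention: v = c * (f0 - f) / f, so f = f0 / (1 + v/c)."""
    return rest_freq_mhz / (1.0 + vel_kms / C_KMS)


def convention_offset_mhz(vel_kms, rest_freq_mhz):
    """Frequency difference (radio minus optical) for the same velocity."""
    return (radio_velocity_to_freq(vel_kms, rest_freq_mhz)
            - optical_velocity_to_freq(vel_kms, rest_freq_mhz))


# Illustrative inputs: 12CO(2-1) rest frequency and a 200 km/s velocity.
offset = convention_offset_mhz(200.0, 230538.0)
print(f"radio - optical offset: {offset:.4f} MHz")
```

The offset scales as f0 (v/c)^2, so it stays small for nearby galaxies but grows quickly with velocity, which is one reason to keep a single convention between the line mask and the imaging call.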

Collaborator Author

e-koch commented Jan 21, 2026

Added most of the suggestions; thanks @low-sky!

Collaborator Author

e-koch commented Jan 21, 2026

@thomaswilliamsastro I have one more complete test run on-going; the storage is slow so it's taking longer than it should.

I would like it to finish to make sure I caught all the renaming of the SD products.

After that and your review, I'm ready to merge this.

@thomaswilliamsastro
Collaborator

@e-koch I should have some time to run this at the end of the week. Just let me know when your tests are done and I'll check it does what I'd expect on my end.

Collaborator Author

e-koch commented Jan 21, 2026

@thomaswilliamsastro alright my test made it through. Ready for your testing!

Collaborator Author

e-koch commented Jan 22, 2026

Confirmed that there are minimal changes between the legacy TP pipeline output cube and this version.

This is the average 12CO(2-1) spectrum for ngc7793_3:

image

Blue is this version. Green is the legacy TP version.

Collaborator

thomaswilliamsastro commented Jan 23, 2026

My testing is still ongoing but first couple of test cases:

  1. NGC0300_1 using the monolithic CASA pipeline. Runs fine, cube is essentially identical. There are more channels in the revamped pipeline than the old one.
Screenshot 2026-01-23 at 08 37 10
  2. NGC5236_1 using pip-installed CASA pipeline. Pipeline runs but the cube has been trimmed weirdly; half of it seems to have been cut off. For the non-cut-off bit, the spectral profile is pretty much identical. UPDATE: With the new commits, this now also runs fine.
Screenshot 2026-01-25 at 15 59 14
  3. NGC7793_2 using pip-installed CASA pipeline. Works as expected.
  4. NGC7793_1 using pip-installed CASA pipeline. Still issues here, but that might be me not reflecting changes.

Collaborator Author

e-koch commented Jan 23, 2026

I'll check on the larger spectral range. I'm probably passing the padded range for the baseline fitting

@thomaswilliamsastro
Collaborator

@e-koch made some updates above. Only one problem left on my end!

Collaborator Author

e-koch commented Jan 28, 2026

@thomaswilliamsastro - alright I think I've fixed things re: #277 . I have a test run going for ngc7793_1 and _2.

The pipeline is now going to re-run the whole calibration separately for each part, despite being observed together. I'm 99% sure the legacy pipeline was doing the same thing.

It's not the most efficient approach, but having multiple small mosaics in a single EB is likely to remain a corner case for nearby-galaxy ALMA observations. If that changes, we can figure out how to reconcile this via the fname_dict by checking for identical paths in the ms_file_key.txt.
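
A rough sketch of what that identical-path check could look like, if it is ever needed. This assumes a flat, hypothetical mapping from target part to data path as a stand-in for fname_dict; the real dictionary built from ms_file_key.txt is likely structured differently, and the function names and paths here are made up for illustration.

```python
import os
from collections import defaultdict


def group_targets_by_path(fname_dict):
    """Map each normalized path to the sorted list of targets pointing at it.

    `fname_dict` is a hypothetical flat {target: path} mapping, not the
    pipeline's actual data structure.
    """
    groups = defaultdict(list)
    for target, path in fname_dict.items():
        groups[os.path.normpath(path)].append(target)
    return {path: sorted(targets) for path, targets in groups.items()}


def shared_paths(fname_dict):
    """Keep only paths used by more than one target (calibration could be shared)."""
    return {path: targets
            for path, targets in group_targets_by_path(fname_dict).items()
            if len(targets) > 1}


# Made-up example entries: two mosaic parts observed in the same EB.
example = {
    "ngc7793_1": "/data/uid_A/",
    "ngc7793_2": "/data/uid_A",
    "ngc0300_1": "/data/uid_B",
}
print(shared_paths(example))
```

Any group with more than one target would be a candidate for calibrating once and splitting out the parts afterwards.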

@thomaswilliamsastro
Collaborator

The pipeline is now going to re-run the whole calibration separately for each part, despite being observed together. I'm 99% sure the legacy pipeline was doing the same thing.

Exactly, I did the dumb thing. Running this case now

Collaborator Author

e-koch commented Jan 29, 2026

A few more notes from corner cases:

  • 2025.1.00576.L specified fields with offsets from a common target location. The TP sources reflect this central field, not the offset. For TP MSs with multiple sources, the concat step requires respectname=True because the TP fields have the same coordinates in the source/field table.
  • Right now, we concat EBs in TOPO, letting tsdimaging handle the coordinate transforms. This path has no issues. For many EBs, concat (correctly) does not combine SPWs, leading to many SPWs in the concat MS. Deep concat MSs could hit a limit on the number of SPWs. I doubt we'll hit this, but I'm noting it in case someone does hit this issue in the future.

Collaborator Author

e-koch commented Jan 29, 2026

The line wing padding for the baseline fitting is currently set to 200 km/s beyond the source velocity range. This can likely be smaller; as is, it may cause issues when the masked region covers a whole SPW.

We haven't hit an issue with this yet so I suggest not changing it in this PR for consistency with the legacy TP pipeline
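
To make the failure mode concrete, here is a minimal sketch that pads a velocity range, converts it to frequency in the radio convention, and reports how much of an SPW is left for the baseline fit. The function names, the velocity range, and the SPW edges are all hypothetical; only the 200 km/s default padding comes from the discussion above.

```python
# Speed of light in km/s.
C_KMS = 299792.458


def padded_freq_window(vel_range_kms, rest_freq_mhz, pad_kms=200.0):
    """Return (f_lo, f_hi) in MHz for a velocity range widened by pad_kms
    on each side, using the radio convention f = f0 * (1 - v/c)."""
    v_lo = min(vel_range_kms) - pad_kms
    v_hi = max(vel_range_kms) + pad_kms
    f_a = rest_freq_mhz * (1.0 - v_lo / C_KMS)
    f_b = rest_freq_mhz * (1.0 - v_hi / C_KMS)
    return min(f_a, f_b), max(f_a, f_b)


def unmasked_fraction(window_mhz, spw_mhz):
    """Fraction of the SPW bandwidth that lies outside the masked window."""
    w_lo, w_hi = window_mhz
    s_lo, s_hi = spw_mhz
    overlap = max(0.0, min(w_hi, s_hi) - max(w_lo, s_lo))
    return 1.0 - overlap / (s_hi - s_lo)


# Hypothetical source range of 100 to 350 km/s around 12CO(2-1) at 230538 MHz.
window = padded_freq_window((100.0, 350.0), 230538.0)

# A made-up narrow SPW ends up fully masked; a made-up wide one does not.
print(unmasked_fraction(window, (230300.0, 230550.0)))
print(unmasked_fraction(window, (229500.0, 231500.0)))
```

A check along these lines could warn, or fall back to a smaller padding, when the unmasked fraction drops to zero, since there would be no line-free channels left to fit.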

Collaborator

low-sky commented Feb 4, 2026

First, I'm not 100% on the data model being used here so this might be way off.

Having faffed around with this for a while, I'm more convinced that we want to separate the SD calibration step from the gridding step. The pipeline is capable of doing all the calibration for all the SPWs at once, which can put a per-EB calibrated stack of spectra into a directory tree. This can serve as the analogue of the interferometric MS, and we can then pick a stack of MSs with products to do the gridding into a single cube. Then, I think we will want to be throwing all the spectra bundles into single gridding operations for the whole cube on a per-line basis. This also makes it simpler to incorporate archival SD data, provided it can stumble through the pipeline. It does blow up the bookkeeping a bit (potentially arguing for another keyfile analogous to ms_file_key) but gives good flexibility and preserves our data philosophy.

Collaborator Author

e-koch commented Feb 5, 2026

This is a good point and we should do this.

@low-sky do you have an opinion on merging the current PR now as "updated and working with current data model" or including the data model changes together here? I think either could work as it's already a significant change but isn't breaking the API.

Collaborator

low-sky commented Feb 7, 2026

I think probably pull the trigger on this merge and then do a refactor. This is clearly an important increment. I'm not sure that the best practice is stable enough across multiple MS imaging that next steps will be fast.

@e-koch e-koch merged commit 720c128 into PhangsTeam:master Feb 9, 2026
@e-koch e-koch deleted the sd_pipeline_revamp branch February 9, 2026 13:55

Development

Successfully merging this pull request may close these issues.

  • SD calibration doesn't import QA2 flags
  • Handling multiple lines with the single dish pipeline

3 participants