## get_obs_paths

This notebook reads in a CSV table containing a column of PACS obsids, and adds a column containing the path to the level 2.5 high-pass filtered blue-channel map (70 or 100 μm, in FITS format) corresponding to each observation, where available. This is useful if you have a folder full of data downloaded from the *Herschel* Science Archive (HSA) but don't know where the observation you're looking for is located. In particular, the table produced by this notebook can be used as input to the disc modelling code `pacs_model_batch.py`.

The notebook assumes that the data have been downloaded from HSA and placed in some specified root directory, and that the file structure has not been altered following the download: the obsid, level and processing method of a particular image are inferred from its path. It also expects that the input table includes columns `wave` and `star_jy`, containing wavelengths and predicted stellar fluxes for each system at 70 and 100 μm (i.e. there are at least two rows for each system). For the final output table, the appropriate row will be selected, given the wavelength of the obsid as determined from the FITS file.
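As a minimal sketch of how these properties can be read off a path, assuming the layout `<root>/<request>/<obsid>/<level>/<product>/<file>` seen in the outputs below: the helper `parse_hsa_path` and the example path are hypothetical and are not part of the notebook.

```python
from pathlib import PurePosixPath


def parse_hsa_path(path):
    """Return (obsid, level, product) inferred from an HSA-style path."""
    p = PurePosixPath(path)
    # assumed layout: <root>/<request>/<obsid>/<level>/<product>/<file>
    product = p.parents[0].name   # e.g. HPPHPFMAPB
    level = p.parents[1].name     # e.g. level2_5
    obsid = int(p.parents[2].name)
    return obsid, level, product


# hypothetical path, for illustration only
example = ("/data/hsa/AIOURL000000001/1342186612/level2_5/HPPHPFMAPB/"
           "example_map.fits.gz")
info = parse_hsa_path(example)
print(info)
```

If the directory structure has been rearranged after download, the `int(...)` conversion will most likely fail, which is a useful early warning.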

### Setup

In [1]:
import pandas as pd
import numpy as np
from astropy.io import fits
from pathlib import Path

In [2]:
#don't truncate path strings when displaying them in a DataFrame
pd.set_option('display.max_colwidth', None)

#display the full tables
pd.set_option('display.max_rows', None)

### Obtain a list of relevant images

This is achieved by walking recursively through all directories contained in `rootdir` and appending the paths that match the specified pattern to a list. For now, only look for level 2.5 high-pass filtered images.

In [3]:
#all data should be somewhere inside this directory
rootdir = '/data/wyatt_archive/gkennedy/hsa/auto'

#look for files matching this pattern
pattern = '*/level2_5/HPPHPFMAPB/*.fits.gz'
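
To illustrate what this pattern selects, here is a small sketch that builds a throwaway directory tree and runs `rglob` on it. The directory and file names are made up for the example; only the pattern is taken from the cell above.

```python
import tempfile
from pathlib import Path

pattern = '*/level2_5/HPPHPFMAPB/*.fits.gz'

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # one file that should match, and one level 2 product that should not
    wanted = root / 'AIOURL1' / '1342000001' / 'level2_5' / 'HPPHPFMAPB' / 'map.fits.gz'
    other = root / 'AIOURL1' / '1342000001' / 'level2' / 'HPPPMAPB' / 'map.fits.gz'
    for f in (wanted, other):
        f.parent.mkdir(parents=True)
        f.touch()

    # rglob searches every subdirectory of root for the pattern
    matches = sorted(p.relative_to(root).as_posix() for p in root.rglob(pattern))

print(matches)
```

Only the file under `level2_5/HPPHPFMAPB` is returned; the level 2 product is ignored.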

In [4]:
#get the paths of all desired maps within rootdir

paths = []
obsids_from_paths = []
mtimes = []
obs_waves = []

for path in Path(rootdir).rglob(pattern):
    
    full_path = path.resolve()
    paths.append(str(full_path))
    
    #parents[2] is the obsid directory (file -> HPPHPFMAPB -> level2_5 -> obsid), given the file structure provided by HSA
    obsids_from_paths.append(int(full_path.parents[2].name))
    
    #append the last modification time, so that we can find the most up-to-date file if duplicates are found
    mtimes.append(path.lstat().st_mtime)
    
    #extract observation wavelength from FITS file
    with fits.open(str(full_path)) as datafile:
        obs_waves.append(int(datafile[0].header['WAVELNTH']))

In [5]:
#store the results in a DataFrame, so that we can easily join them onto the provided CSV table
df_paths = pd.DataFrame({'obsid': obsids_from_paths, 'obs_wave': obs_waves, 'path': paths, 'mtime': mtimes},
                        columns = ['obsid', 'obs_wave', 'path', 'mtime'])

#sort by obsid, with the oldest files (i.e. smallest mtimes) at the top
df_paths.sort_values(by = ['obsid', 'mtime'], inplace = True)

In [6]:
#if multiple images were found for a particular obsid, keep only the most recently modified file
df_paths.drop_duplicates(subset = ['obsid'], keep = 'last', inplace = True)

#modification times were only necessary for sorting; drop these now
df_paths.drop(['mtime'], axis = 1, inplace = True)
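
The sort-then-deduplicate step can be checked on a toy table. This is only a sketch with invented file names and modification times, written as a method chain rather than with `inplace = True`.

```python
import pandas as pd

# toy data: obsid 1 has two candidate files with different mtimes
toy = pd.DataFrame({
    'obsid': [2, 1, 1],
    'path': ['c.fits.gz', 'a_old.fits.gz', 'a_new.fits.gz'],
    'mtime': [300.0, 100.0, 200.0],
})

latest = (toy.sort_values(by=['obsid', 'mtime'])            # oldest first within each obsid
             .drop_duplicates(subset=['obsid'], keep='last')  # keep the newest per obsid
             .drop(columns=['mtime'])
             .reset_index(drop=True))
print(latest)
```

Because the rows are sorted by `mtime` within each obsid, `keep='last'` retains the most recently modified file.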

In [7]:
#look at the final path table
df_paths

Unnamed: 0,obsid,obs_wave,path
8,1342186612,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292694/1342186612/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0547_m5104_00_v1.0_1470446695527.fits.gz
9,1342186619,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292695/1342186619/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_1837_p3847_00_v1.0_1470459933858.fits.gz
5,1342187075,100,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292685/1342187075/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_2009_m6611_00_v1.0_1470425839613.fits.gz
6,1342187145,100,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292690/1342187145/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_2218_m5338_00_v1.0_1470430547048.fits.gz
10,1342188369,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292702/1342188369/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0018_m6329_00_v1.0_1470497983585.fits.gz
11,1342188370,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292703/1342188370/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_2358_m6418_00_v1.0_1470497507110.fits.gz
12,1342188371,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292705/1342188371/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0240_m6816_00_v1.0_1470497166323.fits.gz
13,1342188372,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292706/1342188372/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0237_m5203_00_v1.0_1470497570637.fits.gz
14,1342188377,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292707/1342188377/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0158_m2154_00_v1.0_1470498863467.fits.gz
15,1342188484,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292710/1342188484/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0035_m6155_00_v1.0_1470504188436.fits.gz


### Read in the CSV table 

In [8]:
csv_filename = 'input/obs_list.csv'
df_in = pd.read_csv(csv_filename)

In [9]:
#look at the input table
df_in

Unnamed: 0,obsid,xid,dist_pc,filter,wave,model_jy,star_jy
0,1342231387,* 19 LMi,28.809477,PACS100,100,0.01036,0.0103601
1,1342231387,* 19 LMi,28.809477,PACS70,70,0.021045,0.0210451
2,1342237460,* 29 Ari,28.688807,PACS100,100,0.005801,0.00580063
3,1342237460,* 29 Ari,28.688807,PACS70,70,0.011767,0.0117672
4,1342233372,2MASS J08101691-4856291,5055.611729,PACS100,100,0.015913,1.79513e-05
5,1342233372,2MASS J08101691-4856291,5055.611729,PACS70,70,0.028901,3.62485e-05
6,1342265601,CD-54 4621,110.619469,PACS100,100,0.001754,0.000249808
7,1342265601,CD-54 4621,110.619469,PACS70,70,0.003425,0.000504832
8,1342265609,CD-52 5008,137.741047,PACS100,100,0.004761,0.000326981
9,1342265609,CD-52 5008,137.741047,PACS70,70,0.007121,0.000662903


### Left join the path table onto the input table

The rows of the resulting table will be the same as those of the input table, with `obs_wave` and `path` columns added. Rows whose obsid has no matching image will have missing values in these columns.

In [10]:
merged = pd.merge(df_in, df_paths, how = 'left', on = 'obsid')
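
A toy version of the join, with invented values, shows the behaviour to expect. The `indicator=True` argument is an optional extra not used in the notebook; it adds a `_merge` column flagging rows with no match.

```python
import pandas as pd

# toy stand-ins for df_in and df_paths
left = pd.DataFrame({'obsid': [1, 1, 2], 'wave': [100, 70, 70]})
right = pd.DataFrame({'obsid': [1], 'obs_wave': [70], 'path': ['a.fits.gz']})

joined = pd.merge(left, right, how='left', on='obsid', indicator=True)
print(joined)
print(joined.dtypes)
```

Every row of `left` survives, and obsid 2 gets missing values for `obs_wave` and `path`. The missing values also force `obs_wave` from integer to float, which is why it is displayed as `70.0` rather than `70` in the merged table.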

In [11]:
#look at the merged table
merged

Unnamed: 0,obsid,xid,dist_pc,filter,wave,model_jy,star_jy,obs_wave,path
0,1342231387,* 19 LMi,28.809477,PACS100,100,0.01036,0.0103601,70.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294084/1342231387/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0958_p4103_00_v1.0_1471855756793.fits.gz
1,1342231387,* 19 LMi,28.809477,PACS70,70,0.021045,0.0210451,70.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294084/1342231387/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0958_p4103_00_v1.0_1471855756793.fits.gz
2,1342237460,* 29 Ari,28.688807,PACS100,100,0.005801,0.00580063,70.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294097/1342237460/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0233_p1502_00_v1.0_1471942106823.fits.gz
3,1342237460,* 29 Ari,28.688807,PACS70,70,0.011767,0.0117672,70.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294097/1342237460/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0233_p1502_00_v1.0_1471942106823.fits.gz
4,1342233372,2MASS J08101691-4856291,5055.611729,PACS100,100,0.015913,1.79513e-05,70.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294133/1342233372/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0810_m4857_00_v1.0_1471903507890.fits.gz
5,1342233372,2MASS J08101691-4856291,5055.611729,PACS70,70,0.028901,3.62485e-05,70.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294133/1342233372/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0810_m4857_00_v1.0_1471903507890.fits.gz
6,1342265601,CD-54 4621,110.619469,PACS100,100,0.001754,0.000249808,100.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294512/1342265601/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_1213_m5520_00_v1.0_1472398231574.fits.gz
7,1342265601,CD-54 4621,110.619469,PACS70,70,0.003425,0.000504832,100.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294512/1342265601/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_1213_m5520_00_v1.0_1472398231574.fits.gz
8,1342265609,CD-52 5008,137.741047,PACS100,100,0.004761,0.000326981,100.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294513/1342265609/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_1223_m5334_00_v1.0_1472398635616.fits.gz
9,1342265609,CD-52 5008,137.741047,PACS70,70,0.007121,0.000662903,100.0,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294513/1342265609/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_1223_m5334_00_v1.0_1472398635616.fits.gz


In [12]:
#look at systems from the CSV for which no image was found
merged[pd.isna(merged.path)]

Unnamed: 0,obsid,xid,dist_pc,filter,wave,model_jy,star_jy,obs_wave,path
10,1342224188,TWA 26,49.66649,PACS100,100,4e-05,4e-05,,
11,1342224188,TWA 26,49.66649,PACS70,70,7.4e-05,7.4e-05,,
18,1342245708,V* AS Dra,43.000765,PACS100,100,0.001589,0.001589,,
19,1342245708,V* AS Dra,43.000765,PACS70,70,0.00323,0.00323,,
98,1342246704,G 263-10,21.906477,PACS100,100,0.006586,0.001,,
99,1342246704,G 263-10,21.906477,PACS70,70,0.002445,0.002023,,
372,1342255513,V* DS Leo,11.937448,PACS100,100,0.002867,0.002867,,
373,1342255513,V* DS Leo,11.937448,PACS70,70,0.005805,0.005805,,
374,1342188367,HD 105,38.476337,PACS100,100,0.162964,0.001491,,
375,1342188367,HD 105,38.476337,PACS70,70,0.13194,0.003022,,


### Clean up the final table

In [13]:
#select appropriate wavelength and remove unnecessary columns
merged = merged.loc[merged.wave == merged.obs_wave, ['obsid', 'xid', 'dist_pc', 'wave', 'star_jy', 'path']].copy()

In [14]:
#convert the stellar flux into mJy, as expected by pacs_model.py
merged.star_jy *= 1000
merged.rename(columns = {"star_jy": "star_mjy"}, inplace = True)
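
The selection and unit conversion can be sketched on a toy table as follows. The values are invented, and the explicit `.copy()` is a suggestion to avoid modifying a view of the original table rather than something the pipeline requires.

```python
import pandas as pd

# two systems, each with a 100 and a 70 micron row; obs_wave comes from the FITS header
toy = pd.DataFrame({
    'obsid': [1, 1, 2, 2],
    'wave': [100, 70, 100, 70],
    'star_jy': [0.010, 0.021, 0.0002, 0.0005],
    'obs_wave': [70.0, 70.0, 100.0, 100.0],
})

# keep only the row whose predicted wavelength matches the observed one
selected = toy.loc[toy['wave'] == toy['obs_wave'], ['obsid', 'wave', 'star_jy']].copy()

# Jy -> mJy, renaming the column at the same time
selected['star_mjy'] = selected.pop('star_jy') * 1000
print(selected)
```

Rows with a missing `obs_wave` never satisfy the equality, so systems without an image drop out at this step.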

In [15]:
#take a look at the result
merged

Unnamed: 0,obsid,xid,dist_pc,wave,star_mjy,path
1,1342231387,* 19 LMi,28.809477,70,21.0451,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294084/1342231387/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0958_p4103_00_v1.0_1471855756793.fits.gz
3,1342237460,* 29 Ari,28.688807,70,11.7672,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294097/1342237460/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0233_p1502_00_v1.0_1471942106823.fits.gz
5,1342233372,2MASS J08101691-4856291,5055.611729,70,0.036248,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294133/1342233372/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0810_m4857_00_v1.0_1471903507890.fits.gz
6,1342265601,CD-54 4621,110.619469,100,0.249808,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294512/1342265601/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_1213_m5520_00_v1.0_1472398231574.fits.gz
8,1342265609,CD-52 5008,137.741047,100,0.326981,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294513/1342265609/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_1223_m5334_00_v1.0_1472398635616.fits.gz
13,1342239416,V* AG Dor,34.188268,70,3.08027,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294105/1342239416/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0407_m5234_00_v1.0_1471964495565.fits.gz
15,1342225244,V* AR Lac,42.7716,70,16.4885,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294068/1342225244/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_2209_p4545_00_v1.0_1471794900970.fits.gz
17,1342238095,HD 8357,44.345898,70,9.47741,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294103/1342238095/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0123_p0725_00_v1.0_1471956108070.fits.gz
21,1342196038,V* AU Mic,9.909821,70,18.129,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292659/1342196038/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_2045_m3120_00_v1.0_1470641895447.fits.gz
23,1342223909,BD+20 307,118.20331,70,0.917525,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278294131/1342223909/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0155_p2118_00_v1.0_1471776213720.fits.gz


### Save the output

In [16]:
#'input' since this will likely be used as input to pacs_model_batch.py!
output_file = 'input/obs_path_list.csv'

In [17]:
#no need to save the indices
merged.to_csv(output_file, index = False)

## Some miscellaneous experimentation below...

In [18]:
#try joining the other way round (i.e. see which systems we have data for, including any that weren't in the input list)
pd.merge(df_paths, df_in, how = 'left', on = 'obsid')

Unnamed: 0,obsid,obs_wave,path,xid,dist_pc,filter,wave,model_jy,star_jy
0,1342186612,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292694/1342186612/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0547_m5104_00_v1.0_1470446695527.fits.gz,* bet Pic,19.440124,PACS100,100.0,9.41445,0.0150242
1,1342186612,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292694/1342186612/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0547_m5104_00_v1.0_1470446695527.fits.gz,* bet Pic,19.440124,PACS70,70.0,15.355,0.0306833
2,1342186619,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292695/1342186619/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_1837_p3847_00_v1.0_1470459933858.fits.gz,* alf Lyr,7.678722,PACS100,100.0,6.39061,0.386178
3,1342186619,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292695/1342186619/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_1837_p3847_00_v1.0_1470459933858.fits.gz,* alf Lyr,7.678722,PACS70,70.0,8.19487,0.786301
4,1342187075,100,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292685/1342187075/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_2009_m6611_00_v1.0_1470425839613.fits.gz,* del Pav,6.108362,PACS100,100.0,0.074044,0.0740444
5,1342187075,100,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292685/1342187075/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_2009_m6611_00_v1.0_1470425839613.fits.gz,* del Pav,6.108362,PACS70,70.0,0.149867,0.149867
6,1342187145,100,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292690/1342187145/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_2218_m5338_00_v1.0_1470430547048.fits.gz,HD 211415,14.016812,PACS100,100.0,0.012716,0.0127157
7,1342187145,100,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292690/1342187145/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_green_2218_m5338_00_v1.0_1470430547048.fits.gz,HD 211415,14.016812,PACS70,70.0,0.025798,0.0257976
8,1342188369,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292702/1342188369/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0018_m6329_00_v1.0_1470497983585.fits.gz,HD 1466,42.955326,PACS100,100.0,0.006227,0.00140431
9,1342188369,70,/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292702/1342188369/level2_5/HPPHPFMAPB/hpacs_25HPPHPFMAPB_blue_0018_m6329_00_v1.0_1470497983585.fits.gz,HD 1466,42.955326,PACS70,70.0,0.010572,0.00286554


In [19]:
#look at obsids with more than one star listed (i.e. more than two rows)
counts = df_in.obsid.value_counts()
df_in[df_in.obsid.isin(counts[counts > 2].index)]
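
An alternative sketch, assuming `xid` uniquely identifies a star: count the distinct `xid` values per obsid instead of counting rows. This does not rely on each star having exactly two rows. The toy table below is invented for illustration.

```python
import pandas as pd

# obsid 10 covers one star, obsid 20 covers two
toy = pd.DataFrame({
    'obsid': [10, 10, 20, 20, 20, 20],
    'xid': ['A', 'A', 'B', 'B', 'C', 'C'],
    'wave': [100, 70, 100, 70, 100, 70],
})

# number of distinct stars associated with each obsid
n_stars = toy.groupby('obsid')['xid'].nunique()

# rows belonging to obsids shared by more than one star
shared = toy[toy['obsid'].isin(n_stars[n_stars > 1].index)]
print(shared)
```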

Unnamed: 0,obsid,xid,dist_pc,filter,wave,model_jy,star_jy
58,1342209059,CD-64 1208,29.239766,PACS100,100,0.002019,0.002019
59,1342209059,CD-64 1208,29.239766,PACS70,70,0.004078,0.004078
100,1342224850,GJ 3305,,PACS100,100,0.001605,0.001605
101,1342224850,GJ 3305,,PACS70,70,0.003248,0.003248
354,1342224850,* c Eri,29.429076,PACS100,100,0.01294,0.005957
355,1342224850,* c Eri,29.429076,PACS70,70,0.023331,0.012128
1012,1342224848,* alf Cen B,1.254831,PACS100,100,0.725506,0.725506
1013,1342224848,* alf Cen B,1.254831,PACS70,70,1.47126,1.47126
2150,1342209059,HD 172555,28.54696,PACS100,100,0.089189,0.007246
2151,1342209059,HD 172555,28.54696,PACS70,70,0.192049,0.014784


In [20]:
#look for level 2 observations
for path in Path(rootdir).rglob('*/level2/HPPPMAPB/*'):
    
    full_path = path.resolve()
    print(full_path)

/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292664/1342187248/level2/HPPPMAPB/hpacs1342187248_20hpppmapb_00_1469224771737.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292666/1342187364/level2/HPPPMAPB/hpacs1342187364_20hpppmapb_00_1469228027740.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292668/1342187365/level2/HPPPMAPB/hpacs1342187365_20hpppmapb_00_1469226870160.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292670/1342187805/level2/HPPPMAPB/hpacs1342187805_20hpppmapb_00_1469231900903.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292672/1342187806/level2/HPPPMAPB/hpacs1342187806_20hpppmapb_00_1469228550974.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292674/1342187807/level2/HPPPMAPB/hpacs1342187807_20hpppmapb_00_1469231950545.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292676/1342187808/level2/HPPPMAPB/hpacs1342187808_20hpppmapb_00_1469228514727.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292678/1342187835/leve

/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292903/1342215474/level2/HPPPMAPB/hpacs1342215474_20hpppmapb_00_1469416693339.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292905/1342215476/level2/HPPPMAPB/hpacs1342215476_20hpppmapb_00_1469415173775.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292909/1342215482/level2/HPPPMAPB/hpacs1342215482_20hpppmapb_00_1469415612142.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292932/1342215623/level2/HPPPMAPB/hpacs1342215623_20hpppmapb_00_1469416636323.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292936/1342216032/level2/HPPPMAPB/hpacs1342216032_20hpppmapb_00_1469417947653.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292938/1342216034/level2/HPPPMAPB/hpacs1342216034_20hpppmapb_00_1469417434125.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292942/1342216422/level2/HPPPMAPB/hpacs1342216422_20hpppmapb_00_1469419547014.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278292945/1342216426/leve

/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278306526/1342243482/level2/HPPPMAPB/hpacs1342243482_20hpppmapb_00_1469591989843.fits.gz
/data/wyatt_archive/gkennedy/hsa/auto/AIOURL278306528/1342243483/level2/HPPPMAPB/hpacs1342243483_20hpppmapb_00_1469592092538.fits.gz
