The code below semi-automatically identifies emission features in a given spectrum by taking advantage of the fact that, while the observed locations of lines change with redshift, the ratios between line positions are redshift-invariant. The ratios of all possible line pairs in an input calibration list are compared with the ratios of all possible pairs of observed lines, and a match is suggested whenever a pair of observed wavelengths has a ratio within a user-specified fractional error of a calibration pair. The suggested identifications for these pairs are then printed, and the redshift of each line (along with an averaged redshift estimate and its uncertainty) can also be printed if desired. Any additions, changes, or improvements to the code below are more than welcome!

-Armaan Goyal
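
The invariance that the routine relies on can be checked directly. The snippet below is a minimal sketch rather than part of the routine itself: the helper `redshifted` is a hypothetical name introduced here, and the two rest wavelengths are copied from the basic line list defined later in this notebook.

```python
# Rest-frame wavelengths (angstroms) copied from the basic list in this notebook.
rest = {"Hβ": 4861.33, "O III-2": 5006.84}

def redshifted(wavelength, z):
    """Observed wavelength of a line emitted at `wavelength` by a source at redshift z.

    Hypothetical helper for illustration only; not used by check_redshift.
    """
    return wavelength * (1.0 + z)

rest_ratio = rest["O III-2"] / rest["Hβ"]
for z in (0.0, 0.45, 2.0):
    # Both lines are stretched by the same factor (1 + z), so it cancels in the ratio.
    obs_ratio = redshifted(rest["O III-2"], z) / redshifted(rest["Hβ"], z)
    print("z = %.2f: observed ratio = %.6f (rest ratio = %.6f)" % (z, obs_ratio, rest_ratio))
```

Because the factor (1 + z) cancels, a matched pair fixes the identification first, and the redshift then follows from either line as observed/rest - 1.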

In [17]:
import numpy as np

def check_redshift(cal_names, cal_wav, obs_wav, offset, redshift = False, **kwargs):
    '''
    Attempts to identify features in a spectrum by comparing ratios of observed lines to those of catalogued rest-frame lines.
    Optionally calculates and outputs redshift.
    
    Parameters:
    -----------
    cal_names (list of str): list of transition names of catalogued lines (e.g. "OII" or "H-beta")
    cal_wav (list of float): list of rest-frame wavelengths of catalogued lines in the same order as cal_names
    obs_wav (list of float): list of observed wavelengths of features in data
    offset (float): Maximum fractional error of ratio, e.g. .01 for 1% (recommend .01 for first pass and .001 afterwards)
    redshift (bool): Toggle for redshift calculation (defaults to False)
    z_low, z_high (float, optional): Optional limits for redshift determination
    clip (float, optional): Optional radius for redshift clipping (only lines with redshifts between median-clip and median+clip will be considered)
    
    Returns:
    --------
    Prints line candidates, ratio values, and redshift estimates (if requested).
    Returns a list of "line objects" where each element is a list organized as 
    [line name, observed wavelength, rest wavelength, redshift (if requested)]
    
    '''
    ratios = []
    ratio_names = []
    error = []
    candidates = []
    zs = []
    output = []
    zs_to_rem = []
    lines_to_rem = []
    z_low = kwargs.get("z_low", 0)
    z_high = kwargs.get("z_high", 10)
    clip = kwargs.get("clip", .1)
    print("LINE PAIR CANDIDATES (WAVELENGTH RATIO WITHIN %.2f%% OF LAB VALUE)"%(100.*offset))
    print("----------------------------------------------------------------------------")
    # Build the ratio (later entry / earlier entry) for every pair of catalogued lines
    for i in range(len(cal_names)):
        for j in range(i + 1, len(cal_names)):
            ratios.append(cal_wav[j]/cal_wav[i])
            ratio_names.append(cal_names[i] + " & " + cal_names[j])
    # Compare every pair of observed lines against every catalogued ratio
    for r in range(len(ratios)):
        for i in range(len(obs_wav)):
            for j in range(i + 1, len(obs_wav)):
                obs_ratio = obs_wav[j]/obs_wav[i]
                percent_error = abs((obs_ratio - ratios[r])/ratios[r])
                if percent_error <= offset:
                    error.append(percent_error)
                    # Store 1-based line numbers so they match the printed labels
                    candidates.append(((i+1, j+1), ratio_names[r]))
                    print("Line %d and %d (ratio %.6f) possible candidates for %s pair (ratio %.6f)"%(i+1, j+1, obs_ratio, ratio_names[r], ratios[r]))
    if redshift:
        print()
        print("INDIVIDUAL LINE REDSHIFT CALCULATIONS (AND REJECTIONS)")
        print("----------------------------------------------------------------------------")
    for pair in candidates:
        selected_lines = pair[0]
        names = pair[1].split(" & ")
        (obs1, obs2) = (obs_wav[selected_lines[0]-1], obs_wav[selected_lines[1]-1])
        (line1, line2) = (cal_wav[cal_names.index(names[0])], cal_wav[cal_names.index(names[1])])
        line_object1 = [names[0], obs1, line1]
        line_object2 = [names[1], obs2, line2]
        if line_object1 not in output:
            output.append(list(line_object1))
        if line_object2 not in output:
            output.append(list(line_object2))
    if redshift:
        for line_obj in output:
            z = (line_obj[1]/line_obj[2]) - 1
            line_obj.append(z)
            zs.append(z)
            print("Identifying line at %.2f ang as %s yields redshift of approx. z = %.4f"%(line_obj[1], line_obj[0], z))
        # Only the median is needed for clipping; the mean and std. dev. are computed after rejection
        med_z = np.median(np.array(zs))
        for line_obj in output:
            if line_obj[3] < med_z - clip or line_obj[3] > med_z + clip:
                zs_to_rem.append(line_obj[3])
                lines_to_rem.append(line_obj)
                print("%s candidate at %.2f ang rejected by redshift clipping (clipping radius = %.2f)"%(line_obj[0], line_obj[1], clip))

            if line_obj[3] < z_low or line_obj[3] > z_high:
                zs_to_rem.append(line_obj[3])
                lines_to_rem.append(line_obj)
                print("%s candidate at %.2f ang rejected by redshift constraints (z between %.2f and %.2f)"%(line_obj[0], line_obj[1], z_low, z_high))

        for z in zs_to_rem:
            if z in zs:
                zs.remove(z)
        for line in lines_to_rem:
            if line in output:
                output.remove(line)  
            
        print()
        print("FINAL REDSHIFT CALCULATION (N = %d)"%(np.size(zs)))
        print("----------------------------------------------------------------------------")
        avg_z = np.mean(np.array(zs))
        std_z = np.std(np.array(zs))
        med_z = np.median(np.array(zs))
        for i in range(len(zs)):
            zs[i] = round(zs[i], 4)
        print("Individual Line Redshifts: " + str(zs)[1:-1])
        print("Mean: z = %.4f, Median: z = %.4f, Std. Dev.: %.4f"%(avg_z, med_z, std_z))
    return output

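Here are a few different calibration line lists for optical spectra of galaxies. The first list contains the five most common lines in galactic spectra, while the other two provide more comprehensive sets of lines. One caution on the labels: most values in the lists named "vac" (for example Hβ at 4861.33 and O III-2 at 5006.84) match the commonly tabulated air wavelengths, while those in the list named "air" (Hβ at 4862.68, Hα at 6564.61) match the vacuum values, so check the convention against your own reference before combining these lists with other catalogues.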

In [18]:
cal_names_vac_basic = ["O II", 'Hβ', 'O III-1', 'O III-2', 'Hα']
cal_wav_vac_basic = [3726.03, 4861.33, 4958.92, 5006.84, 6563]

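# Names must be unique because check_redshift looks wavelengths up by name (first match wins)
cal_names_air = ['Mg II', 'O II', 'Hδ', 'Hγ', 'Hβ', 'O III-1', 'O III-2', 'N II-1', 'Hα', "N II-2", "S II-1", "S II-2"]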
cal_wav_air = [2799.117, 3727.092, 4102.89, 4341.68, 4862.68, 4960.295, 5008.24, 6549.86, 6564.61, 6585.27, 6718.29, 6732.67]

cal_names_vac = ["CaH", "CaK", "O II", 'Ne III', 'Hδ', 'Hγ', 'O III-3', 'Hβ', 'O III-1', 'O III-2', 'Hα']
cal_wav_vac = [3933.663, 3968.468, 3726.03, 3969, 4101.76, 4340.47, 4363, 4861.33, 4958.92, 5006.84, 6563]
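
Two properties of a line list matter for the routine above: it looks up rest wavelengths by name with `cal_names.index`, which returns the first match, so a name that appears twice resolves to the wrong wavelength for its second entry; and it forms each ratio as a later entry divided by an earlier one, so a catalogued pair listed out of wavelength order has a ratio below 1 and cannot match an ascending list of observed lines. The sketch below checks both; `tidy_line_list` is a hypothetical helper introduced here, not part of the original routine.

```python
def tidy_line_list(names, wavs):
    """Return (names, wavs) sorted by wavelength, raising if the list is malformed.

    Hypothetical helper for illustration; check_redshift does not call it.
    """
    if len(names) != len(wavs):
        raise ValueError("names and wavelengths must have the same length")
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError("duplicate line names: " + ", ".join(duplicates))
    # Sort indices by wavelength and apply the same ordering to both lists.
    order = sorted(range(len(wavs)), key=lambda k: wavs[k])
    return [names[k] for k in order], [wavs[k] for k in order]

# Illustrative three-line list, deliberately out of order.
names, wavs = tidy_line_list(["Hβ", "O II", "Hα"], [4861.33, 3726.03, 6563])
print(names, wavs)
```

Passing a list through a check like this before calling `check_redshift` is one way to make sure every catalogued pair can actually be matched.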

A good first pass generally consists of using the basic line list with an offset of around .003-.005 (.3-.5%). Our general benchmark for accurately identified pairs is around .001 (.1%), so keep this in mind as you scan the results to determine which pairs are the best candidates for further runs. An example of this with sample data is shown below:

In [19]:
obs_wav = [5419.57, 7067.71, 7209.96, 7279.55, 9538.86]
offset = .003

output = check_redshift(cal_names_vac_basic, cal_wav_vac_basic, obs_wav, offset, redshift = True)

LINE PAIR CANDIDATES (WAVELENGTH RATIO WITHIN 0.30% OF LAB VALUE)
----------------------------------------------------------------------------
Line 1 and 2 (ratio 1.304109) possible candidates for O II & Hβ pair (ratio 1.304694)
Line 1 and 3 (ratio 1.330356) possible candidates for O II & O III-1 pair (ratio 1.330886)
Line 1 and 4 (ratio 1.343197) possible candidates for O II & O III-2 pair (ratio 1.343747)
Line 1 and 5 (ratio 1.760077) possible candidates for O II & Hα pair (ratio 1.761392)
Line 2 and 3 (ratio 1.020127) possible candidates for Hβ & O III-1 pair (ratio 1.020075)
Line 2 and 4 (ratio 1.029973) possible candidates for Hβ & O III-2 pair (ratio 1.029932)
Line 2 and 5 (ratio 1.349639) possible candidates for Hβ & Hα pair (ratio 1.350042)
Line 3 and 4 (ratio 1.009652) possible candidates for O III-1 & O III-2 pair (ratio 1.009663)
Line 3 and 5 (ratio 1.323012) possible candidates for O III-1 & Hα pair (ratio 1.323474)
Line 4 and 5 (ratio 1.310364) possible candidates for O II

No observed line receives more than one identification (every line ID is single valued), so this first pass looks good. If we tighten the offset to .001 we see that nothing changes:

In [20]:
obs_wav = [5419.57, 7067.71, 7209.96, 7279.55, 9538.86]
offset = .001

output = check_redshift(cal_names_vac_basic, cal_wav_vac_basic, obs_wav, offset, redshift = True)

LINE PAIR CANDIDATES (WAVELENGTH RATIO WITHIN 0.10% OF LAB VALUE)
----------------------------------------------------------------------------
Line 1 and 2 (ratio 1.304109) possible candidates for O II & Hβ pair (ratio 1.304694)
Line 1 and 3 (ratio 1.330356) possible candidates for O II & O III-1 pair (ratio 1.330886)
Line 1 and 4 (ratio 1.343197) possible candidates for O II & O III-2 pair (ratio 1.343747)
Line 1 and 5 (ratio 1.760077) possible candidates for O II & Hα pair (ratio 1.761392)
Line 2 and 3 (ratio 1.020127) possible candidates for Hβ & O III-1 pair (ratio 1.020075)
Line 2 and 4 (ratio 1.029973) possible candidates for Hβ & O III-2 pair (ratio 1.029932)
Line 2 and 5 (ratio 1.349639) possible candidates for Hβ & Hα pair (ratio 1.350042)
Line 3 and 4 (ratio 1.009652) possible candidates for O III-1 & O III-2 pair (ratio 1.009663)
Line 3 and 5 (ratio 1.323012) possible candidates for O III-1 & Hα pair (ratio 1.323474)
Line 4 and 5 (ratio 1.310364) possible candidates for O II
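
The check that every line ID is single valued can also be done on the returned list instead of by eye. The following is a minimal sketch, assuming the list has the layout described in the docstring (name first, observed wavelength second); `ambiguous_lines` is a hypothetical helper, and the example input is hand-written for illustration, not the result of a real run.

```python
from collections import defaultdict

def ambiguous_lines(line_objects):
    """Map each observed wavelength to its candidate names, keeping only multi-valued ones."""
    names_by_wav = defaultdict(set)
    for line_obj in line_objects:
        name, observed = line_obj[0], line_obj[1]
        names_by_wav[observed].add(name)
    return {wav: sorted(names) for wav, names in names_by_wav.items() if len(names) > 1}

# Hand-written stand-in for the list returned by check_redshift (illustrative values).
example = [
    ["Hβ", 7067.71, 4861.33],
    ["O III-1", 7209.96, 4958.92],
    ["CaH", 7209.96, 3933.663],
]
print(ambiguous_lines(example))
```

An empty dictionary means each observed line has a single identification. Note that with redshift = True the routine removes clipped lines before returning, so this check then applies only to the lines that survive clipping.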

Now that we have a good idea of what these lines are, let us double check by running the routine again with one of the more comprehensive lists:

In [21]:
obs_wav = [5419.57, 7067.71, 7209.96, 7279.55, 9538.86]
offset = .001

output = check_redshift(cal_names_vac, cal_wav_vac, obs_wav, offset, redshift = True)

LINE PAIR CANDIDATES (WAVELENGTH RATIO WITHIN 0.10% OF LAB VALUE)
----------------------------------------------------------------------------
Line 3 and 4 (ratio 1.009652) possible candidates for CaH & CaK pair (ratio 1.008848)
Line 3 and 4 (ratio 1.009652) possible candidates for CaH & Ne III pair (ratio 1.008983)
Line 1 and 2 (ratio 1.304109) possible candidates for O II & Hβ pair (ratio 1.304694)
Line 1 and 3 (ratio 1.330356) possible candidates for O II & O III-1 pair (ratio 1.330886)
Line 1 and 4 (ratio 1.343197) possible candidates for O II & O III-2 pair (ratio 1.343747)
Line 1 and 5 (ratio 1.760077) possible candidates for O II & Hα pair (ratio 1.761392)
Line 2 and 3 (ratio 1.020127) possible candidates for Hβ & O III-1 pair (ratio 1.020075)
Line 2 and 4 (ratio 1.029973) possible candidates for Hβ & O III-2 pair (ratio 1.029932)
Line 2 and 5 (ratio 1.349639) possible candidates for Hβ & Hα pair (ratio 1.350042)
Line 3 and 4 (ratio 1.009652) possible candidates for O III-1 & O 

Here we see other possibilities arise, but since the pair formed by lines 3 & 4 is identified more than once, and the first two redshift values are not at all consistent with the others, we can say with confidence that these extra identifications are not correct. We may tighten our sample by restricting the redshift range with z_low and/or z_high, but note here that the discrepant lines were automatically excluded by the default redshift clipping radius of 0.1 (this can be altered with the clip parameter).
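
The rejection step described above can be isolated as follows. This is a sketch that mirrors the logic of the routine (a single median taken before any rejection, then a symmetric window plus the optional limits), not a replacement for it; `clip_about_median` is a hypothetical name, and the trial redshifts are illustrative values chosen to include two obvious outliers.

```python
import numpy as np

def clip_about_median(zs, clip=0.1, z_low=0.0, z_high=10.0):
    """Keep redshifts within `clip` of the median and inside [z_low, z_high]."""
    zs = np.asarray(zs, dtype=float)
    med = np.median(zs)  # computed once, before anything is rejected
    keep = (np.abs(zs - med) <= clip) & (zs >= z_low) & (zs <= z_high)
    return zs[keep]

# Illustrative values: five mutually consistent estimates plus two outliers.
trial = [0.4538, 0.4539, 0.4540, 0.4541, 0.4545, 0.8332, 0.8344]
kept = clip_about_median(trial)
print("kept %d of %d, mean z = %.4f" % (kept.size, len(trial), kept.mean()))
```

Because the window is centred on the median, the clipping is only reliable when correct identifications outnumber spurious ones; if they do not, setting z_low and z_high explicitly is the safer choice.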

In [15]:
obs_wav = [5419.57, 7067.71, 7209.96, 7279.55, 9538.86]
offset = .001

output = check_redshift(cal_names_vac, cal_wav_vac, obs_wav, offset, z_low = .05, z_high = .5, redshift = True)

LINE PAIR CANDIDATES (WAVELENGTH RATIO WITHIN 0.10% OF LAB VALUE)
----------------------------------------------------------------------------
Line 3 and 4 (ratio 1.009652) possible candidates for CaH & CaK pair (ratio 1.008848)
Line 3 and 4 (ratio 1.009652) possible candidates for CaH & Ne III pair (ratio 1.008983)
Line 1 and 2 (ratio 1.304109) possible candidates for O II & Hβ pair (ratio 1.304694)
Line 1 and 3 (ratio 1.330356) possible candidates for O II & O III-1 pair (ratio 1.330886)
Line 1 and 4 (ratio 1.343197) possible candidates for O II & O III-2 pair (ratio 1.343747)
Line 1 and 5 (ratio 1.760077) possible candidates for O II & Hα pair (ratio 1.761392)
Line 2 and 3 (ratio 1.020127) possible candidates for Hβ & O III-1 pair (ratio 1.020075)
Line 2 and 4 (ratio 1.029973) possible candidates for Hβ & O III-2 pair (ratio 1.029932)
Line 2 and 5 (ratio 1.349639) possible candidates for Hβ & Hα pair (ratio 1.350042)
Line 3 and 4 (ratio 1.009652) possible candidates for O III-1 & O 

We see that these line identifications are consistent with those before. For the sake of completeness, we can compare these values to those yielded by a run using the air wavelengths:

In [23]:
obs_wav = [5419.57, 7067.71, 7209.96, 7279.55, 9538.86]
offset = .001

output = check_redshift(cal_names_air, cal_wav_air, obs_wav, offset, z_low = .1, z_high = .5,redshift = True)

LINE PAIR CANDIDATES (WAVELENGTH RATIO WITHIN 0.10% OF LAB VALUE)
----------------------------------------------------------------------------
Line 1 and 3 (ratio 1.330356) possible candidates for Mg II & O II pair (ratio 1.331524)
Line 1 and 2 (ratio 1.304109) possible candidates for O II & Hβ pair (ratio 1.304685)
Line 1 and 3 (ratio 1.330356) possible candidates for O II & O III-1 pair (ratio 1.330875)
Line 1 and 4 (ratio 1.343197) possible candidates for O II & O III-2 pair (ratio 1.343739)
Line 1 and 5 (ratio 1.760077) possible candidates for O II & Hα pair (ratio 1.761322)
Line 2 and 3 (ratio 1.020127) possible candidates for Hβ & O III-1 pair (ratio 1.020074)
Line 2 and 4 (ratio 1.029973) possible candidates for Hβ & O III-2 pair (ratio 1.029934)
Line 2 and 5 (ratio 1.349639) possible candidates for Hβ & Hα pair (ratio 1.349998)
Line 3 and 4 (ratio 1.009652) possible candidates for O III-1 & O III-2 pair (ratio 1.009666)
Line 3 and 5 (ratio 1.323012) possible candidates for O II

We see that these values agree to within a slight discrepancy, which reflects the small difference between the two wavelength conventions. Whether you use the vacuum or air wavelengths is simply a matter of personal or group preference; just make sure you stay consistent throughout your analysis!
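
The size of that discrepancy can be quantified from the calibration lists themselves. The sketch below copies the five lines common to `cal_wav_vac_basic` and `cal_wav_air` into two dictionaries (the names `from_vac_basic` and `from_air` are introduced here for illustration) and compares the ratio of every pair between the two conventions.

```python
from itertools import combinations

# Values copied from cal_wav_vac_basic and cal_wav_air in this notebook (angstroms).
from_vac_basic = {"O II": 3726.03, "Hβ": 4861.33, "O III-1": 4958.92, "O III-2": 5006.84, "Hα": 6563}
from_air = {"O II": 3727.092, "Hβ": 4862.68, "O III-1": 4960.295, "O III-2": 5008.24, "Hα": 6564.61}

worst = 0.0
for first, second in combinations(from_vac_basic, 2):
    ratio_a = from_vac_basic[second] / from_vac_basic[first]
    ratio_b = from_air[second] / from_air[first]
    frac_diff = abs(ratio_a - ratio_b) / ratio_a
    worst = max(worst, frac_diff)
    print("%s & %s: %.6f vs %.6f (fractional difference %.1e)" % (first, second, ratio_a, ratio_b, frac_diff))
print("largest fractional difference: %.1e" % worst)
```

For these lines the ratios differ by far less than the .001 offset used above, which is why the same pairs are identified with either list.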