# **4.** Identify Lines Using HITRAN Manually
***

## Table of Contents
* [**4.1** | Load the baseline corrected spectrum](#4.1---Load-the-original-and-baseline-corrected-spectra)
* [**4.2** | Load the HITRAN linelist for that molecule](#4.2---Load-HITRAN-linelist-and-parse-them)
* [**4.3** | Find the peaks manually using Bokeh, click and print](#4.3---Identify-Peaks-Manually)
    * [**4.3.1** | Select range](#4.3.1---Select-wavelength-range-for-analysis)
    * [**4.3.2** | Find peaks](#4.3.2---Find-the-peaks-manually)
* [**4.4** | Match the HITRAN lines to the peaks](#4.4---Identify-the-lines)
* [**4.5** | Report, plot, and save the results](#4.5---Save-the-results:-Plots,-dfs)

In [1]:
# Import necessary modules
import os

import numpy as np
import pandas as pd

from Xpectra.SpecFitAnalyzer import SpecFitAnalyzer
from Xpectra.LineAssigner import *
from Xpectra.SpecStatVisualizer import plot_fitted_als_bokeh, plot_spectra_errorbar_bokeh

## **4.1** - Load the original and baseline-corrected spectra

$\rightarrow$ In step 2, we corrected the spectral baseline and saved the result as a CSV file in the processed_data directory. Here we load that file into a DataFrame:

In [2]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Import baseline corrected spectrum
corrected_spectrum = pd.read_csv(os.path.join(__reference_data_path__,'processed_data','arpls_baseline_corrected_methane_spectrum.csv'))

# Assign wavenumber (x) and signal (y) arrays
x = corrected_spectrum['original_x'].dropna().to_numpy() 
y = corrected_spectrum['original_y'].dropna().to_numpy()

x_baseline_corr = corrected_spectrum['baseline_corrected_x'].dropna().to_numpy() 
y_baseline_corr = corrected_spectrum['baseline_corrected_y'].dropna().to_numpy()
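
$\rightarrow$ A note on the environment variable: os.getenv returns None when the variable is not set, and os.path.join then fails with a TypeError that does not mention the variable. The sketch below wraps the lookup in a small guard. The helper name get_reference_data_path and the demo variable name are hypothetical and not part of Xpectra.

```python
import os


def get_reference_data_path(var_name="Xpectra_reference_data"):
    """Return the directory stored in var_name, or raise a readable error."""
    path = os.getenv(var_name)
    if path is None:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "point it at the Xpectra reference data directory."
        )
    return path


# Demonstration with a deliberately unset, hypothetical variable name
try:
    get_reference_data_path("XPECTRA_DEMO_UNSET_VARIABLE")
except RuntimeError as err:
    print(err)
```

Failing early this way keeps the error next to its cause instead of surfacing later inside pd.read_csv.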

$\rightarrow$ Visualize the imported spectra:

In [3]:
# Recover the previously fitted baseline by subtracting the corrected signal from the original signal
spectral_baseline = y - y_baseline_corr

plot_fitted_als_bokeh(wavenumber_values = x, 
                      signal_values = y,
                      fitted_baseline = spectral_baseline,
                      baseline_type = 'arpls'
                     )
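
$\rightarrow$ The subtraction y - y_baseline_corr only recovers the baseline if both signals are sampled on the same wavenumber grid. A minimal check, using toy values rather than the methane data, might look like this (same_grid is a hypothetical helper, not an Xpectra function):

```python
import math


def same_grid(x_a, x_b, tol=1e-9):
    """True if both grids have equal length and agree point by point within tol."""
    return len(x_a) == len(x_b) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol) for a, b in zip(x_a, x_b)
    )


# Toy spectrum: original signal and its baseline-corrected counterpart
x_toy = [2900.0, 2900.1, 2900.2]
x_corr_toy = [2900.0, 2900.1, 2900.2]
y_toy = [1.0, 1.5, 1.2]
y_corr_toy = [0.0, 0.5, 0.1]

if same_grid(x_toy, x_corr_toy):
    # Element-wise difference gives the baseline that was removed
    baseline_toy = [a - b for a, b in zip(y_toy, y_corr_toy)]
    print(baseline_toy)
```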

## **4.2** - Load HITRAN linelist and parse them 

$\rightarrow$ The next step is to load the HITRAN line list into a DataFrame. For this, we use the LineAssigner module, instantiating it with the baseline-corrected spectrum and the HITRAN file path.

In [4]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Define path to HITRAN data
input_file = os.path.join(__reference_data_path__, 'datasets','CH4_nu3.par')

# Initialize LineAssigner
assign = LineAssigner(wavenumber_values = x_baseline_corr,
                      signal_values = y_baseline_corr,
                      hitran_file = input_file,
                      absorber_name= 'CH4')

$\rightarrow$ With the class initialized, we now parse the line list to a DataFrame. The default columns converted to the DataFrame are: 'local_iso_id', 'nu', 'sw', 'gamma_air', 'local_upper_quanta', and 'ierr'. 
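
$\rightarrow$ For intuition, here is a minimal sketch of reading such a record by hand. It assumes the file follows the standard 160-character fixed-width HITRAN .par layout and reads only the first ten fields. The record is synthetic (built in the code from illustrative values), and parse_record is a hypothetical helper; how Xpectra parses the file internally is not shown here.

```python
# (name, width in characters, converter) for the first ten fields of a
# 160-character HITRAN .par record (assumed layout)
FIELDS = [
    ("molec_id", 2, int),
    ("local_iso_id", 1, int),
    ("nu", 12, float),
    ("sw", 10, float),
    ("a", 10, float),
    ("gamma_air", 5, float),
    ("gamma_self", 5, float),
    ("elower", 10, float),
    ("n_air", 4, float),
    ("delta_air", 8, float),
]


def parse_record(line):
    """Slice one fixed-width record into a dict of typed values."""
    parsed = {}
    position = 0
    for name, width, convert in FIELDS:
        parsed[name] = convert(line[position:position + width])
        position += width
    return parsed


# Synthetic record, formatted to the field widths above (not read from a file)
SAMPLE_RECORD = (
    f"{6:2d}{2:1d}{2900.000621:12.6f}{1.825e-25:10.3E}{0.02389:10.3E}"
    f"{0.049:5.3f}{0.067:5.3f}{814.6845:10.4f}{0.63:4.2f}{-0.0058:8.5f}"
)

print(parse_record(SAMPLE_RECORD))
```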
    
$\rightarrow$ This function automatically separates the local quanta terms into the J quantum number, the N quantum number, and the symmetry.
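
$\rightarrow$ The separation can be pictured with a small regular expression. This sketch assumes each local quanta string holds J, a symmetry label (A1, A2, E, F1, or F2), and N, in that order; the exact spacing in the file and the parsing Xpectra actually performs may differ. split_local_quanta is a hypothetical helper.

```python
import re

# J (digits), symmetry label (A1, A2, E, F1, F2), N (digits), with optional spaces
QUANTA_PATTERN = re.compile(r"\s*(\d+)\s*([AEF][12]?)\s*(\d+)\s*")


def split_local_quanta(text):
    """Return (J, symmetry, N) or None if the string does not match."""
    match = QUANTA_PATTERN.fullmatch(text)
    if match is None:
        return None
    j_value, symmetry, n_value = match.groups()
    return int(j_value), symmetry, int(n_value)


# Illustrative strings, not copied from CH4_nu3.par
for quanta in ["   12A1  1     ", "   12E   2     ", "   14F2  3     "]:
    print(repr(quanta), "->", split_local_quanta(quanta))
```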

In [5]:
# Parse file to DataFrame
assign.parse_file_to_dataframe()

| | molec_id | local_iso_id | nu | sw | a | gamma_air | gamma_self | elower | n_air | delta_air | ... | iref | line_mixing_flag | gp | gpp | J_low | sym_low | N_low | J_up | sym_up | N_up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2 | 2900.000621 | 1.825000e-25 | 0.023890 | 0.0490 | 0.067 | 814.6845 | 0.63 | -0.005800 | ... | 64 | 3 | 3253433.0 | | 12 | A1 | 1 | 13 | A2 | 9 |
| 1 | 6 | 2 | 2900.005693 | 6.307000e-27 | 0.005030 | 0.0470 | 0.065 | 1096.0334 | 0.62 | -0.005800 | ... | 64 | 3 | 3253433.0 | | 14 | F2 | 3 | 14 | F1 | 40 |
| 2 | 6 | 2 | 2900.022027 | 3.048000e-27 | 0.022620 | 0.0460 | 0.060 | 1593.6378 | 0.61 | -0.005800 | ... | 64 | 3 | 3253433.0 | | 17 | F2 | 2 | 17 | F1 | 47 |
| 3 | 6 | 1 | 2900.027223 | 1.891000e-25 | 0.000465 | 0.0480 | 0.067 | 815.1315 | 0.63 | -0.005800 | ... | 34 | 3 | 3245363.0 | | 12 | F1 | 3 | 13 | F2 | 21 |
| 4 | 6 | 2 | 2900.035027 | 1.905000e-25 | 0.067460 | 0.0400 | 0.067 | 815.0317 | 0.63 | -0.005800 | ... | 64 | 3 | 3253433.0 | | 12 | E | 2 | 12 | E | 25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 41623 | 6 | 2 | 3299.877822 | 1.652000e-29 | 0.000353 | 0.0450 | 0.059 | 1780.0695 | 0.60 | -0.006500 | ... | 54 | 3 | 3253433.0 | | 18 | F2 | 3 | 19 | F1 | 85 |
| 41624 | 6 | 1 | 3299.900527 | 5.946000e-29 | 0.000004 | 0.0380 | 0.061 | 1416.5543 | 0.61 | -0.006500 | ... | 34 | 3 | 3240363.0 | | 16 | E | 1 | 17 | E | 52 |
| 41625 | 6 | 3 | 3299.901848 | 7.204000e-29 | 0.000221 | 0.0589 | 0.077 | 532.9581 | 0.75 | -0.006346 | ... | 44 | 4 | 2243323.0 | | 11 | E | 4 | 11 | E | 2 |
| 41626 | 6 | 1 | 3299.984795 | 2.838000e-25 | 0.035670 | 0.0470 | 0.099 | 1526.2146 | 0.75 | -0.006600 | ... | 32 | 3 | 3333232.0 | | 6 | A2 | 1 | 6 | A1 | 31 |


$\rightarrow$ The HITRAN DataFrame is now accessible through the class attribute hitran_df:

In [6]:
# Display header and first 3 rows
assign.hitran_df.head(3)

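| | molec_id | local_iso_id | nu | sw | a | gamma_air | gamma_self | elower | n_air | delta_air | ... | iref | line_mixing_flag | gp | gpp | J_low | sym_low | N_low | J_up | sym_up | N_up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2 | 2900.000621 | 1.825e-25 | 0.02389 | 0.049 | 0.067 | 814.6845 | 0.63 | -0.0058 | ... | 64 | 3 | 3253433.0 | | 12 | A1 | 1 | 13 | A2 | 9 |
| 1 | 6 | 2 | 2900.005693 | 6.307e-27 | 0.00503 | 0.047 | 0.065 | 1096.0334 | 0.62 | -0.0058 | ... | 64 | 3 | 3253433.0 | | 14 | F2 | 3 | 14 | F1 | 40 |
| 2 | 6 | 2 | 2900.022027 | 3.048e-27 | 0.02262 | 0.046 | 0.06 | 1593.6378 | 0.61 | -0.0058 | ... | 64 | 3 | 3253433.0 | | 17 | F2 | 2 | 17 | F1 | 47 |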


## **4.3** - Identify Peaks Manually 

$\rightarrow$ We move on to identifying the location (in wavenumber) of each peak in our methane spectrum. To accomplish this, we use the LineAssigner module.

### **4.3.1** - Select wavelength range for analysis

$\rightarrow$ Often we are only interested in part of the spectrum, or the full spectrum has too many peaks to process at once. We therefore select a range of wavenumbers for our analysis:

In [7]:
wavenumber_range = (2911.15, 2911.9) # cm^-1
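
$\rightarrow$ Restricting a spectrum to this range amounts to keeping only the points whose wavenumber falls between the two bounds. A minimal sketch with toy data follows; select_range is a hypothetical helper, and treating both endpoints as inclusive is an assumption, since the Xpectra functions may handle the edges differently.

```python
def select_range(x_values, y_values, wavenumber_range):
    """Keep the (x, y) points whose x lies inside the closed range."""
    lower, upper = wavenumber_range
    kept = [(x, y) for x, y in zip(x_values, y_values) if lower <= x <= upper]
    x_kept = [point[0] for point in kept]
    y_kept = [point[1] for point in kept]
    return x_kept, y_kept


# Toy spectrum, not the methane data
x_demo = [2911.0, 2911.2, 2911.5, 2911.95]
y_demo = [0.1, 0.5, 0.2, 0.3]

x_cut, y_cut = select_range(x_demo, y_demo, (2911.15, 2911.9))
print(x_cut, y_cut)
```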

$\rightarrow$ Let's visualize the spectrum within this wavenumber range:

In [8]:
plot_spectra_errorbar_bokeh(wavenumber_values = x_baseline_corr,
                            signal_values = y_baseline_corr,
                            wavenumber_range = wavenumber_range,
                            absorber_name = 'CH4',
                            plot_type = 'line')

### **4.3.2** - Find the peaks manually

$\rightarrow$ Find the spectral peaks manually by clicking on the figure, which prints the coordinates of each selected point:

In [9]:
assign.line_finder_manual(wavenumber_range=wavenumber_range)

$\rightarrow$ Paste the printed peak coordinates into a list, then split it into peak centers and peak heights:

In [10]:
guesses_list = [[2911.187, 0.504], [2911.262, 0.594], [2911.287, 0.403], 
                [2911.350, 0.545], [2911.402, 0.450], [2911.518, 0.160], 
                [2911.623, 0.549], [2911.676, 0.100], [2911.698, 0.195]]

initial_guesses = np.array(guesses_list)

peak_centers = initial_guesses[:,0]
peak_heights = initial_guesses[:,1]
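
$\rightarrow$ Because the coordinates are pasted by hand, a quick sanity check can catch typos before matching. The sketch below verifies that every center lies inside the selected range and that the centers are in increasing order. check_guesses is a hypothetical helper, and only the first three guesses are repeated here for brevity.

```python
def check_guesses(guesses, wavenumber_range):
    """Return the centers outside the range and whether centers are increasing."""
    lower, upper = wavenumber_range
    centers = [center for center, _height in guesses]
    outside = [center for center in centers if not lower <= center <= upper]
    increasing = all(a < b for a, b in zip(centers, centers[1:]))
    return outside, increasing


demo_range = (2911.15, 2911.9)  # cm^-1
demo_guesses = [[2911.187, 0.504], [2911.262, 0.594], [2911.287, 0.403]]

outside_centers, centers_increasing = check_guesses(demo_guesses, demo_range)
print(outside_centers, centers_increasing)
```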

$\rightarrow$ Assign the peak centers to the class instance manually:

In [11]:
assign.peak_centers_manual = peak_centers

## **4.4** - Identify the lines

$\rightarrow$ Compare peaks with known lines
    
$\rightarrow$ Find the closest line from HITRAN line list for each peak in the lab spectrum
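
$\rightarrow$ Conceptually, this is a nearest-neighbour search with a cutoff. The sketch below is one way to do it, not the Xpectra implementation: it assumes the threshold is the largest allowed absolute difference, in cm^-1, between a peak center and a HITRAN line position. closest_line is a hypothetical helper, and the last peak is an invented value added to show a failed match.

```python
from bisect import bisect_left


def closest_line(peak, sorted_lines, threshold):
    """Return the line nearest to peak, or None if it is farther than threshold."""
    index = bisect_left(sorted_lines, peak)
    # Only the neighbours on either side of the insertion point can be nearest
    candidates = sorted_lines[max(index - 1, 0):index + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda nu: abs(nu - peak))
    return best if abs(best - peak) <= threshold else None


# Line positions (cm^-1) in increasing order
hitran_nu = [2911.186061, 2911.261561, 2911.28578, 2911.348367, 2911.40108]

# Five clicked peak centers plus one invented peak with no nearby line
peaks = [2911.187, 2911.262, 2911.287, 2911.350, 2911.402, 2911.100]

matches = [closest_line(peak, hitran_nu, threshold=0.02) for peak in peaks]
for peak, match in zip(peaks, matches):
    print(peak, "->", match)
```

A binary search keeps each lookup cheap even for a line list with tens of thousands of entries, provided the positions are sorted.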

In [12]:
# Filters for the HITRAN line list
filters = {'local_iso_id' : [1,2]} # Restrict the search to isotopologues 1 and 2


# Match found lines, plot them over spectrum, and display DataFrame
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['local_iso_id', 'J_up','nu','peak_center'], # Print over each line
                            wavenumber_range = wavenumber_range,
                            __print__ = True, # Display the fitted HITRAN DataFrame
                            __plot_bokeh__ = True, # Plot interactively with Bokeh
                            __plot_seaborn__ = False
                           )

| | molec_id | local_iso_id | nu | sw | a | gamma_air | gamma_self | elower | n_air | delta_air | ... | line_mixing_flag | gp | gpp | J_low | sym_low | N_low | J_up | sym_up | N_up | peak_center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 1 | 2911.186061 | 5.284e-23 | 0.05794 | 0.0576 | 0.07 | 575.2596 | 0.67 | -0.00758 | ... | 3 | 4345363.0 | | 10 | F1 | 2 | 9 | F2 | 35 | 2911.187 |
| 1 | 6 | 1 | 2911.261561 | 6.751e-23 | 0.07401 | 0.0572 | 0.07 | 575.1841 | 0.67 | -0.00848 | ... | 3 | 4345363.0 | | 10 | F1 | 1 | 9 | F2 | 35 | 2911.262 |
| 2 | 6 | 1 | 2911.28578 | 3.903e-23 | 0.04281 | 0.0576 | 0.07 | 575.2852 | 0.67 | -0.0076 | ... | 3 | 4345363.0 | | 10 | F2 | 3 | 9 | F1 | 36 | 2911.287 |
| 3 | 6 | 2 | 2911.348367 | 5.866e-23 | 0.6027 | 0.0618 | 0.085 | 104.7777 | 0.75 | -0.002122 | ... | 3 | 3335212.0 | | 4 | A1 | 1 | 5 | A2 | 6 | 2911.35 |
| 4 | 6 | 1 | 2911.40108 | 4.331e-23 | 0.04748 | 0.0573 | 0.07 | 575.1699 | 0.67 | -0.00889 | ... | 3 | 4345363.0 | | 10 | F2 | 2 | 9 | F1 | 36 | 2911.402 |
| 5 | 6 | 1 | 2911.51848 | 1.271e-23 | 0.01393 | 0.0583 | 0.07 | 575.0525 | 0.67 | -0.00843 | ... | 3 | 4345363.0 | | 10 | F2 | 1 | 9 | F1 | 36 | 2911.518 |
| 6 | 6 | 1 | 2911.622555 | 5.719e-23 | 0.0376 | 0.0587 | 0.07 | 575.0555 | 0.67 | -0.00833 | ... | 3 | 4345363.0 | | 10 | A2 | 1 | 9 | A1 | 11 | 2911.623 |
| 7 | 6 | 1 | 2911.674563 | 7.653e-24 | 3.939 | 0.039 | 0.062 | 1817.8431 | 0.75 | -0.005823 | ... | 3 | 3333232.0 | | 9 | F2 | 7 | 8 | F1 | 75 | 2911.676 |
| 8 | 6 | 2 | 2911.697399 | 3.172e-29 | 0.000225 | 0.045 | 0.06 | 1594.1021 | 0.61 | -0.0058 | ... | 3 | 3253433.0 | | 17 | F1 | 4 | 18 | F2 | 27 | 2911.698 |
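
$\rightarrow$ A simple way to judge the quality of the matches is to look at the offset between each clicked peak center and its assigned HITRAN position. The sketch below recomputes those offsets from the nu and peak_center columns of the output above; the variable names are illustrative.

```python
# (nu, peak_center) pairs in cm^-1, copied from the matched output above
matched_pairs = [
    (2911.186061, 2911.187),
    (2911.261561, 2911.262),
    (2911.28578, 2911.287),
    (2911.348367, 2911.35),
    (2911.40108, 2911.402),
    (2911.51848, 2911.518),
    (2911.622555, 2911.623),
    (2911.674563, 2911.676),
    (2911.697399, 2911.698),
]

# Offset of each clicked peak from its assigned line position
offsets = [peak_center - nu for nu, peak_center in matched_pairs]
largest_offset = max(abs(offset) for offset in offsets)

for (nu, peak_center), offset in zip(matched_pairs, offsets):
    print(f"{nu:.6f}  {peak_center:.3f}  {offset:+.6f}")
print(f"largest |offset| = {largest_offset:.6f} cm^-1")
```

Every offset here is well below the 0.02 threshold passed to hitran_line_assigner. A small offset alone does not confirm an assignment, though; line 8, for example, has a line intensity sw many orders of magnitude below the others.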


## **4.5** - Save the results: Plots, dfs 

$\rightarrow$ Save the plot by calling the assigner again with the plot-saving option enabled:

In [13]:
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['nu','peak_center'],
                            wavenumber_range = wavenumber_range,
                            __save_plot__ = True, # Save the plot (seaborn version)
                            __reference_data__ = __reference_data_path__)

<Figure size 7000x4200 with 0 Axes>

In [14]:
# Add peak_heights
assign.fitted_hitran['peak_heights'] = peak_heights

$\rightarrow$ Save fitted HITRAN DataFrame to CSV file

In [15]:
df = assign.fitted_hitran

# Define file name
file_name = "closest_hitran_lines_manual.csv"

# Save DataFrame to CSV
df.to_csv(os.path.join(__reference_data_path__,'processed_data',file_name), index=False)