# Elastic Net Regression and Chianti Line List Analysis
-----

This notebook contains the code and step-by-step instructions for reproducing the statistical analysis of the solar EUV spectrum with the Chianti Atomic Database. 

All of the required files (except for arrays_compressed.npz) can be found on GitHub: https://github.com/GriffinKeener-lasp/EUV-Elastic-Net

----
First, import the packages and initialize the NetRegressor class.



In [None]:
import numpy as np
from euvRegression import ElasticNet, ChiantiPrep, NetRegressor, Plot
import seaborn as sns 
import matplotlib.pyplot as plt

# Ignore the warnings here

In [None]:
net = NetRegressor(file_path='arrays_compressed.npz', num_nets=100)

# Prepare the Data and Perform the Elastic Net Regression

## net.prep_data()
----
This function reads in the data and combines the DEM with the EXIS data, so that we can build a model that reproduces the EVE spectrum from the combined DEM + EXIS inputs.
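As a rough illustration of what combining the DEM with the EXIS data can look like, the sketch below stacks two arrays column-wise so that each time step has a single feature vector. The array shapes and the `.npz` keys (`dem`, `exis`, `eve`) are assumptions made for this example; they are not the actual layout of arrays_compressed.npz or the internals of prep_data().

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 50 time steps, 20 DEM temperature bins,
# 5 EXIS channels, 30 EVE wavelength bins.
n_times = 50
dem = rng.random((n_times, 20))
exis = rng.random((n_times, 5))
eve = rng.random((n_times, 30))

# Write and re-read a compressed archive; the key names are placeholders.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_arrays.npz")
    np.savez_compressed(path, dem=dem, exis=exis, eve=eve)
    with np.load(path) as data:
        dem_in, exis_in, eve_in = data["dem"], data["exis"], data["eve"]

# Predictors: DEM and EXIS side by side, one row per time step.
X = np.hstack([dem_in, exis_in])
# Targets: the EVE spectrum that the model should reproduce.
y = eve_in

print(X.shape, y.shape)
```

With these placeholder shapes, X has one column per DEM bin plus one per EXIS channel, and y keeps the EVE wavelength axis.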

## run_regression_pipeline
----
This is where the bulk of the computation happens. (Note: on an 8-core M2 CPU, this process takes ~1 hour to complete.)

### Processes
1. Initialize temperature ranges and empty arrays
2. Train/test split
   * Split the data into training and testing sets. The default test size is 25% of the data.
3. Standardization
   * Uses sklearn.preprocessing.StandardScaler. We call fit_transform() on the training data and transform() on the testing data, so the testing data is scaled with the training statistics.
   * Standardization matters because the model is fit without an intercept (fit_intercept=False) and the elastic net penalty is sensitive to feature scale, so the inputs should have a mean of 0 and a variance of ~1. During the regression, the mean and variance are printed periodically to confirm that this holds.

4. Elastic Net Regression
   * Here we run the regression under the following conditions:
     ElasticNet(fit_intercept=False, max_iter=100, random_state=42, alpha=1e-6, l1_ratio=0.5, selection='random', warm_start=True)
   * A lower value for alpha yields more nonzero coefficients, and a higher value yields fewer. Higher alphas take more time to compute, as they push more coefficients to zero. A more in-depth description can be found in the sklearn docs: https://scikit-learn.org/dev/modules/generated/sklearn.linear_model.ElasticNet.html
  
5. Align results with Chianti
   * For each regression, a Chianti dataframe is created. For each EVE line, the ion that most closely matches it in wavelength is selected, and the coefficients are stored in that ion's row. For example, HeII has an emission wavelength of 304 Å, while the model produces coefficients for a line at 304.4 Å, so those coefficients are stored in the HeII row of the dataframe.
   * Then, the ChiantiPrep class calls functions to determine where each contributing ion is located, based on its temperature in the database.

6. Find the ratios of contribution over time for each temperature range
   * For each temperature zone, we find the fraction of the contribution that comes from the Transition Region (TR) rather than the Corona. For example, if the lower TR contains only 1 ion, and that ion has contributions from 6 Corona ions and 4 TR ions, then the ratio is 4 / (6 + 4) = 0.4 (40% of the contribution comes from the TR).
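Steps 2 through 4 above can be sketched with scikit-learn on synthetic data. This is a minimal illustration, not the code inside run_regression_pipeline: the data, the variable names, and the choice to standardize the target as well as the predictors are assumptions made for the example. Only the ElasticNet settings and the 25% test size are taken from the list above.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-ins: 200 time steps, 25 predictors, 1 target line.
X = rng.normal(size=(200, 25))
true_coef = np.zeros(25)
true_coef[:5] = [2.0, -1.0, 0.5, 1.5, -0.7]
y = X @ true_coef + 0.01 * rng.normal(size=200)

# Step 2: hold out 25% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 3: fit the scalers on the training data only, then reuse them.
# Scaling the target too is an assumption of this sketch.
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X_train_s = x_scaler.fit_transform(X_train)
X_test_s = x_scaler.transform(X_test)
y_train_s = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_s = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

# Step 4: the settings quoted in the list above.
model = ElasticNet(
    fit_intercept=False, max_iter=100, random_state=42, alpha=1e-6,
    l1_ratio=0.5, selection="random", warm_start=True,
)
with warnings.catch_warnings():
    # max_iter=100 may stop before full convergence on some data.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    model.fit(X_train_s, y_train_s)

print("train mean:", X_train_s.mean(), "train var:", X_train_s.var())
print("test R^2:", model.score(X_test_s, y_test_s))
```

Because the scalers are fit on the training split only, the test split will have a mean and variance close to, but not exactly, 0 and 1.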

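Steps 5 and 6 above can be illustrated with plain NumPy. The ion names, wavelengths, and helper function below are hypothetical values chosen for the example, and the fraction is computed by counting contributing ions, as in the worked 0.4 example; the actual ChiantiPrep logic may differ.

```python
import numpy as np

# Hypothetical line list: ion names and approximate wavelengths (Å).
ion_names = np.array(["He II", "Fe IX", "Fe XII", "O V"])
ion_wavelengths = np.array([304.0, 171.1, 195.1, 629.7])

# Hypothetical wavelengths of the lines the model produced coefficients for.
model_wavelengths = np.array([171.0, 195.2, 304.4, 629.9])

# Step 5: for each modelled line, pick the ion closest in wavelength.
diff = np.abs(model_wavelengths[:, None] - ion_wavelengths[None, :])
nearest = diff.argmin(axis=1)
matched = ion_names[nearest]
print(matched)


def tr_fraction(contributor_regions):
    """Fraction of contributors labelled 'TR'; NaN if there are none."""
    regions = np.asarray(contributor_regions)
    if regions.size == 0:
        return float("nan")
    return float(np.mean(regions == "TR"))


# Step 6: 6 Corona contributors and 4 TR contributors.
example = ["Corona"] * 6 + ["TR"] * 4
print(tr_fraction(example))
```

Here the 304.4 Å model line is assigned to the He II entry, and the counting rule reproduces the 40% TR contribution from the example in step 6.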
In [None]:
net.prep_data()

### Inputs for run_regression_pipeline:

1. ion_list_file: The file path to the Chianti atomic database line list
2. wavelength_file: The file path to the wavelength array
3. output_dir: The desired output directory for the modified Chianti dataframes.
   * If None, the dataframes will not be saved. Saving the dataframes is useful if you want to use them, but not necessary for most plotting purposes.

4. t_ranges: The temperature ranges of interest. The default is t_ranges = np.array([[4, 5], [5, 5.5], [5.5, 6], [6, 6.5], [6.5, 7], [7, 8]]). If other temperature ranges are desired, they must use the same format: an array with one [lower, upper] pair per row.
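To make the expected t_ranges format concrete, here is a small sketch that builds the default array and checks a custom one before it is passed in. The check_t_ranges helper is hypothetical and not part of euvRegression; it only encodes the "one [lower, upper] pair per row" format described above.

```python
import numpy as np

# Default ranges quoted above: one [lower, upper] pair per row.
default_t_ranges = np.array(
    [[4, 5], [5, 5.5], [5.5, 6], [6, 6.5], [6.5, 7], [7, 8]]
)


def check_t_ranges(t_ranges):
    """Hypothetical helper: verify an (N, 2) array of increasing bounds."""
    arr = np.asarray(t_ranges, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("t_ranges must have shape (N, 2)")
    if np.any(arr[:, 0] >= arr[:, 1]):
        raise ValueError("each range needs lower < upper")
    return arr


# A custom set of ranges in the same format as the default.
custom = check_t_ranges([[4.5, 5.5], [5.5, 6.5], [6.5, 7.5]])
print(default_t_ranges.shape, custom.shape)
```

An array validated this way could then be passed as t_ranges=custom in place of None.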

In [None]:
fractions_all = net.run_regression_pipeline(
    ion_list_file='Ion_line_list.xlsx',
    wavelength_file='wavelength_arr.npy',
    output_dir=None,
    t_ranges=None,
)

In [None]:
# Save the fractions array; it is required for plotting

np.save('fractions_all.npy', fractions_all)

# Plotting/Analysis 

The code below shows some basic plots. Make sure to run the np.save line in the above cell, as that array is the single output of the entire regression pipeline. 

Inputs
--------
To initialize the plotting class, only the file path to the saved 'fractions_all.npy' array is required.

Then, to make a ratio plot over time, the time_file argument must be passed to the function. This is the file path to the 'gregorian_dates.pkl' file.

In [None]:
plot = Plot('fractions_all.npy')

In [None]:
plot.ratio_hist()

In [None]:
plot.plot_contribution_ratios('gregorian_dates.pkl')