### Part 4: Investigating ways to improve your regression model

Goals: investigate whether you can improve your regression model by using additional fluxes or morphology information.

Specifics: The data that I provided contain some additional information that might be useful for redshift estimation, such as the sizes of the galaxies. In addition, the different flux measurements are sensitive to light from different parts of the galaxies.

You will need to:

1. look at correlations between the various flux measurements and see if you can find additional information that might be useful,
2. try to add this information to a redshift estimation algorithm,
3. look at correlations with additional morphology information, such as the sizes of galaxies, and see if you can find additional information that might be useful,
4. try to add this morphology information to a redshift estimation algorithm.
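
Steps 1 and 3 can be started with a simple correlation check. The sketch below is a minimal illustration on synthetic data, not a reference solution: the `redshift` target, the toy relation between size and redshift, and the helper `correlation_with_target` are all assumptions made for the example. The `r_psfMag` and `r_cModelMag` names follow the `{band}_psfMag` and `{band}_cModelMag` column pattern used in the feature functions of this notebook.

```python
import numpy as np


def correlation_with_target(candidates, target):
    """Pearson correlation of each named candidate feature with the target."""
    return {name: float(np.corrcoef(values, target)[0, 1])
            for name, values in candidates.items()}


# Synthetic stand-in for the catalog (assumption: real columns would come from `train`)
rng = np.random.default_rng(42)
n_obj = 2000
redshift = rng.uniform(0.0, 2.0, n_obj)  # hypothetical target values
# Toy model: more distant objects appear smaller
size = 1.0 / (1.0 + redshift) + rng.normal(0.0, 0.05, n_obj)
r_psf = 22.0 + 1.5 * redshift + rng.normal(0.0, 0.3, n_obj)
# Toy model: extended objects are brighter in cModel than in PSF magnitude
r_cmodel = r_psf - 0.8 * size + rng.normal(0.0, 0.05, n_obj)

candidates = {
    'r_psfMag': r_psf,
    'r_cModelMag - r_psfMag': r_cmodel - r_psf,
}
correlations = correlation_with_target(candidates, redshift)
for name, value in correlations.items():
    print(f'{name:>24s}: {value:+.2f}')
```

A quantity that correlates with redshift and is not just a copy of an existing feature is a reasonable candidate to add to the feature list. Pearson correlation only captures linear trends, so a scatter plot of each candidate against redshift is a useful companion check.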



If you want to see what things should look like, you can have a look:

1. in the notebook [07_GalaxySize.ipynb](https://github.com/KIPAC/MACSS/blob/main/nb/07_GalaxySize.ipynb) to see examples of computing magnitudes and colors.

2. in the notebook [08_Morphology.ipynb](https://github.com/KIPAC/MACSS/blob/main/nb/08_Morphology.ipynb) to see some studies of how the different flux measurements are affected by galaxy size, and how that correlates with redshift.

3. in the notebook [09_CosmoSize.ipynb](https://github.com/KIPAC/MACSS/blob/main/nb/09_CosmoSize.ipynb) to see how a Milky Way-like galaxy might appear at different redshifts.

4. in the notebook [10_SklearnMorphology.ipynb](https://github.com/KIPAC/MACSS/blob/main/nb/10_SklearnMorphology.ipynb) to see an example of adding some morphology information to a redshift estimation algorithm.


In [None]:
import os
import tables_io
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.ensemble import (HistGradientBoostingRegressor, ExtraTreesRegressor, AdaBoostRegressor)
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import HuberRegressor, LinearRegression, QuantileRegressor
from sklearn.svm import NuSVR
from sklearn.neighbors import KNeighborsRegressor, RadiusNeighborsRegressor

#### Change this to match the correct location

In [None]:
HOME = os.environ['HOME']
pz_dir = f'{HOME}/macss'

#### Here we load the test and training data

In [None]:
train = tables_io.read(f'{pz_dir}/data/lsst_cat_matched_nonan_train.hdf5')
test = tables_io.read(f'{pz_dir}/data/lsst_cat_matched_nonan_test.hdf5')


#### Here is an example of adding shape information to the ML "features"

In [None]:
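def get_features_with_sersic_index(t):
    # Size measure from the Sersic fit: sum of squares of the two effective-radius columns
    sersic_x = t['sersic_reff_x']
    sersic_y = t['sersic_reff_y']
    sersic_trace = np.nan_to_num(sersic_x*sersic_x + sersic_y*sersic_y, nan=0)
    sersic_index = np.nan_to_num(t['sersic_index'], nan=0)

    # Sum of the ixx and iyy columns in the g and z bands
    g_trace = np.nan_to_num(t['g_ixx'] + t['g_iyy'], nan=0)
    z_trace = np.nan_to_num(t['z_ixx'] + t['z_iyy'], nan=0)

    # Shape features first, then the six PSF magnitudes (missing values set to 30)
    feature_list = [sersic_index, g_trace, z_trace, sersic_trace]
    feature_list += [np.nan_to_num(t[f'{band}_psfMag'], nan=30) for band in 'ugrizy']

    # One row per object, one column per feature
    return np.vstack(feature_list).T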
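
Shape features and magnitudes sit on quite different numerical scales, and distance-based regressors such as `KNeighborsRegressor` are sensitive to that. One possible remedy, shown here only as a sketch, is to standardize each feature column with `sklearn.preprocessing.StandardScaler`, fitting the scaler on the training features and reusing it on the test features. The two-column arrays below are synthetic stand-ins (an index-like column and a magnitude-like column), not values from the catalog.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for feature matrices (assumption: real ones would come
# from a feature function applied to the train and test tables)
rng = np.random.default_rng(0)
train_features = np.column_stack([
    rng.uniform(0.5, 6.0, 500),   # index-like column, values of order 1
    rng.normal(24.0, 1.5, 500),   # magnitude-like column, values of order 20-30
])
test_features = np.column_stack([
    rng.uniform(0.5, 6.0, 100),
    rng.normal(24.0, 1.5, 100),
])

# Fit the scaling on the training set only, then apply it to both sets
scaler = StandardScaler().fit(train_features)
train_scaled = scaler.transform(train_features)
test_scaled = scaler.transform(test_features)

print('train column means:', train_scaled.mean(axis=0))
print('train column stds: ', train_scaled.std(axis=0))
```

Tree-based regressors such as `HistGradientBoostingRegressor` are largely insensitive to this kind of rescaling, so whether the step helps depends on the estimator you choose.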

#### Here is an example of adding more than one type of flux to the ML "features"

In [None]:
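def get_features_with_two_fluxes(t):
    # Six cModel magnitudes followed by six PSF magnitudes (missing values set to 30)
    feature_list = []
    feature_list += [np.nan_to_num(t[f'{band}_cModelMag'], nan=30) for band in 'ugrizy']
    feature_list += [np.nan_to_num(t[f'{band}_psfMag'], nan=30) for band in 'ugrizy']

    # One row per object, one column per feature
    return np.vstack(feature_list).T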
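
To judge whether an extended feature set actually helps, one option is to train the same regressor on each feature set and compare a scatter metric on the test data. The sketch below does this on a synthetic table. The `redshift` column name, the toy magnitude model, and the helpers `nmad_scatter`, `evaluate_feature_set`, `make_toy_table`, and `stack` are all assumptions for illustration; the normalized median absolute deviation of (z_pred - z_true) / (1 + z_true) is one commonly used scatter measure, not the only possible choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def nmad_scatter(z_true, z_pred):
    """Normalized median absolute deviation of (z_pred - z_true) / (1 + z_true)."""
    dz = (z_pred - z_true) / (1.0 + z_true)
    return 1.4826 * np.median(np.abs(dz - np.median(dz)))


def evaluate_feature_set(train_x, train_z, test_x, test_z, n_neighbors=10):
    """Fit a k-nearest-neighbors regressor and return its scatter on the test set."""
    model = KNeighborsRegressor(n_neighbors=n_neighbors)
    model.fit(train_x, train_z)
    return nmad_scatter(test_z, model.predict(test_x))


def make_toy_table(n_obj, rng):
    """Synthetic table with a hypothetical 'redshift' column and toy magnitudes."""
    z = rng.uniform(0.0, 2.0, n_obj)
    base_mag = rng.normal(23.0, 1.0, n_obj)
    table = {'redshift': z}
    for slope, band in zip(np.linspace(0.0, 1.0, 6), 'ugrizy'):
        psf = base_mag + slope * z + rng.normal(0.0, 0.1, n_obj)
        table[f'{band}_psfMag'] = psf
        # Toy model: the cModel/PSF difference depends on redshift through size
        table[f'{band}_cModelMag'] = psf - 1.0 / (1.0 + z) + rng.normal(0.0, 0.02, n_obj)
    return table


def stack(table, columns):
    """Build an (n_objects, n_features) matrix from the named columns."""
    return np.vstack([table[c] for c in columns]).T


rng = np.random.default_rng(1)
toy_train = make_toy_table(2000, rng)
toy_test = make_toy_table(500, rng)

one_flux = [f'{band}_psfMag' for band in 'ugrizy']
two_fluxes = one_flux + [f'{band}_cModelMag' for band in 'ugrizy']

scores = {}
for label, columns in [('psfMag only', one_flux), ('psfMag + cModelMag', two_fluxes)]:
    scores[label] = evaluate_feature_set(
        stack(toy_train, columns), toy_train['redshift'],
        stack(toy_test, columns), toy_test['redshift'],
    )
    print(f'{label:>20s}: scatter = {scores[label]:.4f}')
```

With the real catalog, the same comparison could be made by passing the output of each feature function for `train` and `test`, together with the true redshifts, to a helper like `evaluate_feature_set`.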