# SDSS eBOSS Data 
## Script for reading and pre-processing the data, and generating a catalogue of the desired galaxy types

This script extracts the relevant data from the spPlate and spAll_redrock FITS files, and generates the required training data set.

1. **Defining input parameters**
2. **Reading and pre-processing the data**
3. **Applying selection cuts**
4. **Generating the training data set**

**Date**: 14th Oct, 2019. <br>
**Author**: Soumya Shreeram <br>
**Supervised by**: Anand Raichoor <br>
**Script adapted from**: S. Ben Nejma


## 1. Defining input parameters

In [None]:
import os
import numpy as np
from astropy.io import fits

# paths on lesta: the spPlate files and the spAll_redrock catalogue file
spPlate_dir = r'/hpcstorage/raichoor/spplatelist_v5_13_0/spPlate'
spAll_redrock_dir = r'/hpcstorage/raichoor/Catalogs/' \
            'spall_redrock_v5_13_0.valid.fits'

## 2. Reading and pre-processing the data

In [None]:
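def setName(data_dir, plate, mjd, filetype='spPlate'):
    """Function builds the path to <filetype>-<plate>-<mjd>.fits inside data_dir"""
    # filetype is the file-name prefix; the 'spPlate' default is an assumption
    file_name = filetype+'-'+str(plate)+'-'+str(mjd)+'.fits'
    data_file = os.path.join(data_dir, file_name)
    return data_file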

def readFile(filename):
    """
    Function opens the file and uses the plate number, p, and MJD, m, to find their programmes
    @param filename :: name of the file
    
    @returns pms_unique, prog_unique :: unique array of p-m, and their programmes
    """
    # the context manager closes the file once the data has been read
    with fits.open(filename) as hdu:
        data = hdu[1].data
    
        # defining the PLATE number, p, and MJD, m, for every row of the file
        pms = np.array([str(p)+'-'+str(m) for p, m in zip(data['PLATE'], data['MJD'])])
    
        # selecting only the unique plate-MJDs, and finding their programmes
        pms_unique, idx = np.unique(pms, return_index=True)
        prog_unique = data['programname'][idx]
    return pms_unique, prog_unique



In [None]:
# reads the spAll_redrock file and generates arrays of unique plate-MJDs and programmes
pms_unique, prog_unique = readFile(spAll_redrock_dir)
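
To illustrate what `readFile` returns, here is a minimal sketch on made-up plate, MJD, and programme values (the toy arrays are assumptions, not values from the catalogue). `np.unique` with `return_index=True` gives the sorted unique plate-MJD labels together with the index of the first row carrying each label, and that index is used to look up the matching programme.

```python
import numpy as np

# toy stand-ins for the PLATE, MJD and programname columns (made-up values)
plate = np.array([9000, 9000, 3586, 9000, 3586])
mjd = np.array([57000, 57000, 55181, 57001, 55181])
programname = np.array(['ELG_NGC', 'ELG_NGC', 'boss', 'eboss', 'boss'])

# one plate-MJD label per spectrum
pms = np.array([str(p) + '-' + str(m) for p, m in zip(plate, mjd)])

# sorted unique labels, plus the index of the first row carrying each label
pms_unique, idx = np.unique(pms, return_index=True)
prog_unique = programname[idx]

print(pms_unique)
print(prog_unique)
```

Note that the labels are strings, so the unique array is sorted lexicographically rather than by plate number.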

## 3. Applying selection cuts

The functions below implement the selection cuts used to obtain the desired data:
* Select plates that observe **E**mission-**L**ine **G**alaxies (ELGs), luminous red galaxies (LRGs), and quasars (QSOs)
* Select wavelengths that are common to all plates
* Remove sky spectra and certain configurations
* Select a redshift range (Zspec fibres)
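
The wavelength cut is not implemented in this notebook yet. One possible approach is sketched below, assuming each plate's grid is described by a starting log10-wavelength and a constant step that is the same on every plate (as the `COEFF0` and `COEFF1` header keywords do in spPlate files). The helper name `common_loglam_grid` and the input values are hypothetical.

```python
import numpy as np

def common_loglam_grid(coeff0s, coeff1, npixs):
    """
    Hypothetical helper: log10-wavelength grid shared by all plates.
    Assumes every plate has loglam = coeff0 + coeff1 * arange(npix),
    with the same pixel step coeff1 on every plate.
    """
    coeff0s = np.asarray(coeff0s, dtype=float)
    npixs = np.asarray(npixs)
    # the common range starts at the largest first pixel...
    lo = coeff0s.max()
    # ...and ends at the smallest last pixel
    hi = (coeff0s + coeff1 * (npixs - 1)).min()
    if hi < lo:
        raise ValueError('plates have no wavelength overlap')
    n_common = int(round((hi - lo) / coeff1)) + 1
    return lo + coeff1 * np.arange(n_common)

# made-up grid parameters for three plates
loglam = common_loglam_grid([3.5500, 3.5520, 3.5510], 1e-4, [4600, 4620, 4640])
wave = 10**loglam  # wavelengths in Angstrom
```

The sketch only intersects the wavelength ranges; it does not check that the plates' pixels fall on the same grid points.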

In [10]:
def galaxyType(pms_unique, prog_unique, names, gal_type, num_p):
    """
    Function chooses the plates based on the desired galaxy type
    @params pms_unique, prog_unique :: unique array of plate nos.-MJD & programmes
    @param names :: array of names of the galaxies/programmes to select/de-select
    @param gal_type :: string to distinguish the desired operations
    @param num_p :: number of plates of each galaxy type to select
    
    @returns sub_plates :: list of names of the selected plates
    """
    if gal_type == 'ELG': # select ELG plates
        mask = (prog_unique == names[0]) | (prog_unique == names[1])
    elif gal_type == 'LRG+QSO': # select LRG+QSO plates
        mask = (prog_unique == names[0]) & (prog_unique != names[1]) & \
               (prog_unique != names[2])
    else: # select boss plates
        mask = prog_unique == names[0]
    # replace=False ensures no plate is drawn twice (needs at least num_p matching plates)
    sub_plates = np.random.choice(pms_unique[mask], size=num_p, replace=False).tolist()
    return sub_plates
 
def selectPlates(pms_unique, prog_unique, num_pl):
    """
    Function that selects plates containing ELGs, LRG+QSOs, and some random BOSS plates.
    @param pms_unique :: array of plate nos. and MJDs
    @param prog_unique :: array of unique programmes (eBOSS/BOSS)
    @param num_pl :: number of plates of each category to select
    
    @returns selected_plates :: list of the plate-MJD names containing desired galaxies
    """
    selected_plates = []
    
    # select num_pl eboss ELG plates
    names_elg = ['ELG_NGC', 'ELG_SGC']
    selected_plates += galaxyType(pms_unique, prog_unique, names_elg, 'ELG', num_pl)
    
    # select num_pl eboss LRG+QSO plates
    names_lrgQso = ['eboss', 'ELG_NGC', 'ELG_SGC']
    selected_plates += galaxyType(pms_unique, prog_unique, names_lrgQso, 'LRG+QSO', num_pl)
    
    # select num_pl random boss plates
    names_boss = ['boss']
    selected_plates += galaxyType(pms_unique, prog_unique, names_boss, 'boss plates', num_pl)
    
    return selected_plates

def writeToFile(pms, outfilename, selected_plates, infilename=spAll_redrock_dir):
    """
    Function extracts the info from desired files and writes to a new file
    @param pms :: complete array of plate nos. and MJD
    @param outfilename :: output file name
    @param selected_plates :: array of all the selected plates
    @param infilename :: input catalogue (assumed to be the spAll_redrock file)
    """    
    # boolean mask of the rows that belong to the selected plate-MJDs
    extract_files = np.isin(pms, selected_plates)
    
    # write the selected rows to a new fits file
    with fits.open(infilename) as hdu:
        hdu[1].data = hdu[1].data[extract_files]
        hdu.writeto(outfilename, overwrite=True)

def selectWavelengths():
    return

def selectSpectralType():
    
    return


def selectTargetType():
    return
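
Because `galaxyType` draws plates with `np.random.choice`, each run can return a different set of plates. A minimal sketch of a repeatable draw is shown below, using a seeded NumPy generator on made-up plate labels; the seed value and the toy arrays are assumptions for illustration only.

```python
import numpy as np

# made-up plate-MJD labels and programmes
pms_unique = np.array(['9000-57000', '9001-57010', '9002-57020',
                       '3586-55181', '3587-55182'])
prog_unique = np.array(['ELG_NGC', 'ELG_SGC', 'ELG_NGC', 'boss', 'boss'])

# a seeded generator makes the draw repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

# candidate ELG plates, selected with a boolean mask
is_elg = np.isin(prog_unique, ['ELG_NGC', 'ELG_SGC'])
candidates = pms_unique[is_elg]

# replace=False guarantees that no plate is picked twice
sub_plates = rng.choice(candidates, size=2, replace=False).tolist()
print(sub_plates)
```

Re-creating the generator with the same seed reproduces the same selection, which makes the resulting training set easier to regenerate.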

In [None]:
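# number of plates to draw per category (the comments in selectPlates mention 4)
num_pl = 4

# select plates containing ELGs, LRGs, QSOs, and some boss plates
selected_plates = selectPlates(pms_unique, prog_unique, num_pl)

# write the info to a new file
# `pms` (plate-MJD label of every row) and `outfilename` must be defined beforehand
writeToFile(pms, outfilename, selected_plates)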
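
The row selection in `writeToFile`, combined with the redshift cut listed in Section 3, can be sketched on a toy table as follows. The column names `pm`, `Z`, and `ZWARN`, the redshift limits, and the use of `ZWARN == 0` as a quality flag are all assumptions made for illustration; they are not taken from the spAll_redrock file.

```python
import numpy as np

# toy table with one row per spectrum (column names and values are made up)
table = np.array([('9000-57000', 0.85, 0),
                  ('3586-55181', 0.55, 0),
                  ('9000-57000', 1.40, 4),
                  ('4000-55500', 0.30, 0)],
                 dtype=[('pm', 'U16'), ('Z', 'f8'), ('ZWARN', 'i4')])
selected_plates = ['9000-57000', '3586-55181']

# rows that belong to one of the selected plate-MJDs
in_plate = np.isin(table['pm'], selected_plates)

# rows inside an assumed redshift window with no redshift warning
good_z = (table['Z'] > 0.5) & (table['Z'] < 1.0) & (table['ZWARN'] == 0)

# keep only the rows that pass both cuts
subset = table[in_plate & good_z]
print(subset)
```

Building each cut as a separate boolean mask keeps them easy to inspect and to combine with `&` before indexing the table once.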