## S82 training set for AGN selection
This directory hosts the Stripe 82 (S82) training set for AGN classification in LSST. At the end of this notebook, we list the tasks we have completed in building the training set and those we still plan to do. We are aware that the SciServer ecosystem is **NOT** identical to the future LSST Science Platform. For now, however, we think SciServer is a good place to share data and code within the AGN SC and to encourage collaboration: it provides free computing resources and uses a Jupyter environment. Please do not hesitate to suggest alternative solutions.

The training data (both catalogs and light curves) are stored in zarr files. Zarr stores data in chunks, which enables parallel reads and writes. Using zarr as the storage backend is an experiment aimed at faster data I/O. To minimize the learning curve, I wrote some functions for exploring and accessing the catalogs and light curves ([Training Set V1](./DataV1_EDA.ipynb)/[Training Set V2](./DataV2_EDA.ipynb)).

### V1

The first version of this training set uses old public data from [Ivezic et al. 2007](http://faculty.washington.edu/ivezic/sdss/catalogs/S82variables.html) and [MacLeod et al. 2010](http://faculty.washington.edu/ivezic/macleod/qso_dr7/Southern.html). The training set consists of two catalogs, a quasar catalog (10696 sources) and a non-AGN variables catalog (59491 sources), and the associated light curves. All quasars are spectroscopically confirmed, but sources in the non-AGN catalog are not guaranteed to be non-AGN, since that would require spectroscopic confirmation. Moreover, the objects in the non-AGN catalog do not have associated CRTS light curves at the moment; only their SDSS light curves are included.

#### Finished:
- Compiled a catalog of quasars and a catalog of non-AGN (not 100% pure) variables using SDSS DR7 and DR14
- Collected SDSS light curves for objects found above
- Collected available CRTS light curves for QSOs in the catalog
- Merged in available SpIES (~90 degree^2) MIR detections for objects in the QSO and non-AGN variables catalogs
- Merged in Gaia DR2 proper motion measurements for objects in the QSO and non-AGN variables catalogs
- Calculated colors, where not already available, for objects in the QSO catalog using best-fit PSF mags
- Wrote functions to perform simple light curve merging (gri & crts+sdss)
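As a rough picture of what simple multi-band light curve merging can look like, here is a minimal sketch. It assumes one common approach: shift each band so that its median magnitude matches a reference band, then combine and sort by time. The function name `merge_bands`, the data layout, and the toy numbers are all hypothetical and are not taken from the notebooks' `utils` module, whose actual merging method may differ.

```python
from statistics import median


def merge_bands(light_curves, reference="r"):
    """Merge single-band light curves into one time-sorted series.

    light_curves maps a band name to a list of (mjd, mag, mag_err) tuples.
    Each band is offset so its median magnitude matches the reference band
    (an assumed, simple alignment scheme).
    """
    ref_median = median(mag for _, mag, _ in light_curves[reference])
    merged = []
    for band, points in light_curves.items():
        offset = ref_median - median(mag for _, mag, _ in points)
        # Keep the band label so the origin of each epoch stays traceable.
        merged.extend((mjd, mag + offset, err, band) for mjd, mag, err in points)
    merged.sort(key=lambda point: point[0])  # sort by MJD
    return merged


# Toy, made-up photometry for illustration only.
toy = {
    "g": [(53000.0, 19.5, 0.02), (53010.0, 19.7, 0.02), (53020.0, 19.6, 0.03)],
    "r": [(53001.0, 19.1, 0.02), (53011.0, 19.3, 0.02), (53021.0, 19.2, 0.02)],
    "i": [(53002.0, 18.9, 0.03), (53012.0, 19.1, 0.03), (53022.0, 19.0, 0.03)],
}
merged = merge_bands(toy, reference="r")
for mjd, mag, err, band in merged:
    print(f"{mjd:.1f}  {mag:.2f} +/- {err:.2f}  ({band})")
```

A median offset ignores color variability, so it is only a first-order alignment. The same idea extends to combining surveys (for example crts+sdss) by treating each survey as a separate key.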

**We are working on a second version of this training set. Once it is completed, this version will be archived.**

- [Access Training Set V1](./DataV1_EDA.ipynb)

### V2

The second version is built completely from scratch, with all SDSS photometry measurements queried directly using SDSS CasJobs. Since many more AGN candidates were confirmed in BOSS and eBOSS, the new AGN catalog now contains ~25k confirmed sources. At the same time, the number of non-AGN variables in our catalog has dropped to ~25k, for two reasons: first, we found that the S82 variables catalog from Ivezic actually covers a larger area than the S82 footprint defined in the SDSS database; and second, the number of confirmed AGNs has increased. In addition to the change in population, we added extinction values in all 5 bands, and the astrometric offset and airmass for every observation in the SDSS light curves. The Gaia proper motion and parallax measurements and the SpIES IR detections are kept unchanged. Lastly, since the second version is still under active development, we currently don't have light curves from surveys other than SDSS.
- [Access Training Set V2](./DataV2_EDA.ipynb)
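
To illustrate how per-band extinction values are typically used together with colors, the sketch below applies the standard correction (dereddened magnitude = observed magnitude minus extinction in that band) and then computes adjacent-band colors. The function names and the numbers are hypothetical placeholders, not values or column names from the V2 catalog.

```python
BANDS = ("u", "g", "r", "i", "z")  # the five SDSS bands


def deredden(mags, extinction):
    """Return extinction-corrected magnitudes: m_corrected = m_observed - A_band."""
    return {band: mags[band] - extinction[band] for band in BANDS}


def adjacent_colors(mags):
    """Return colors between neighboring bands: u-g, g-r, r-i, i-z."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(BANDS, BANDS[1:])}


# Made-up magnitudes and extinction values for a single source.
observed = {"u": 20.10, "g": 19.80, "r": 19.60, "i": 19.50, "z": 19.45}
extinction = {"u": 0.25, "g": 0.19, "r": 0.13, "i": 0.10, "z": 0.07}

corrected = deredden(observed, extinction)
colors = adjacent_colors(corrected)
for name, value in colors.items():
    print(f"{name}: {value:.2f}")
```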

#### Finished:
- Compiled a catalog of quasars and a catalog of non-AGN (not 100% pure) variables using SDSS DR7 and DR14
- Collected SDSS light curves for objects found above
- Merged in available SpIES (~90 degree^2) MIR detections for objects in the QSO and non-AGN variables catalogs
- Merged in Gaia DR2 proper motion measurements for objects in the QSO and non-AGN variables catalogs
- Wrote functions to perform simple light curve merging (gri)

#### To do:
- [ ] Clean up the non-AGN sample (remove contaminating AGNs if possible)
- [ ] Fit DHO model to merged light curves (crts + other surveys)
- [x] Get colors (best-fit mags) for all sources in the two catalogs using CasJobs
- [ ] Get corresponding CRTS light curves (DR3) for all sources

---
**!! Note:**   
We recommend that everyone copy the `Script_Nbs` directory to their own `persistent` storage and work with the notebooks from there. Otherwise, issues might occur when multiple people execute the same notebook at the same time. If you really wish to work offline, you can download the zipped directory. Be aware, however, that the `utils` module assumes the existing directory hierarchy, so moving the data directory elsewhere must be accompanied by an update to the `utils` module (the `qso_path` and `var_path` variables).

To learn more about the zarr file structure: [Zarr](./Zarr.ipynb).