## Environment Setup
#### make venv for project
1. create virtual environment for project
- `python3 -m venv my_venv`
2. activate your virtual environment
- `source my_venv/bin/activate`
- note: to deactivate virtual environment 
    - `deactivate`
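The same environment can also be created from Python with the standard-library `venv` module. This is a minimal sketch; the helper names `make_venv` and `in_virtualenv` are hypothetical. `with_pip=True` is set so it matches the CLI, which installs pip into the new environment by default.

```python
import sys
import venv
from pathlib import Path


def make_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir (same effect as `python3 -m venv env_dir`)."""
    env_path = Path(env_dir)
    venv.create(env_path, with_pip=with_pip)
    return env_path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix
```

For example, `make_venv("my_venv")` creates the environment. You then activate it with `source my_venv/bin/activate` as above.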
#### install extra libraries for project
1. install python libraries via pip
- `pip install my_cool_python_lib`
2. once done, save the environment's dependencies
- `pip freeze > requirements.txt`
- note: to install dependencies from an existing requirements.txt
    - `pip install -r /{PATH_TO}/requirements.txt`
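You can optionally check that an environment matches a `pip freeze` file after `pip install -r`. This is a minimal sketch, assuming the file only contains plain `name==version` pins as written by `pip freeze`. The helper `check_pins` is hypothetical; it uses only `importlib.metadata` from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirements_text: str) -> dict:
    """Compare `name==version` pins with what is installed in the current environment.

    Returns {package: "ok" | "missing" | "installed X, pinned Y"}.
    Blank lines, comments, and anything that is not a simple `==` pin are skipped.
    """
    report = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if "==" not in line:
            continue  # skip blanks, -r/-e lines, and non-exact specifiers
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == pinned else f"installed {installed}, pinned {pinned}"
    return report
```

Usage: `check_pins(open("requirements.txt").read())`.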
#### add venv to vs code
1. point VS Code at the venv's Python interpreter
- `ctrl+shift+p` to open the command palette
- search for "python interpreter"
- select *Python: Select Interpreter*
- select *Enter interpreter path...*
    - enter the absolute path to the virtual environment's python
        - ex: */home/usr/my_venv/bin/python3*
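If you are unsure what absolute path to enter in *Enter interpreter path...*, the running interpreter can report its own location. Run the snippet below with the venv activated, for example as a small script run with `python` (the script name is arbitrary). It is a minimal sketch that relies only on `sys.executable`.

```python
import sys
from pathlib import Path

# sys.executable is the absolute path of the interpreter running this code;
# with the venv activated it points inside the venv (e.g. .../my_venv/bin/python3)
interpreter = Path(sys.executable)
print(interpreter)
```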
#### [markdown pro-tips](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax)

# change local link file paths once this is removed from the main notebook
## Datasets

#### ICESat-2
1. Description
- explain general
- altimeter data with many different data products; see image below
![alt text](./icesat2_data_products.png "ICESat-2 Data Products")
- Read more [here](https://nsidc.org/data/icesat-2/products)
- formats: HDF5 (`.h5`), readable with h5py
2. where to access
- use icepyx to access
    - Read more [here](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html)
- use earthaccess 
    - Read more [here](https://earthaccess.readthedocs.io/en/latest/quick-start/)
- hosted by NASA Earthdata (earthdata.nasa.gov)
    - see more public data [here](https://earthdata.nasa.gov)
3. Specific Processing Tools
- [icepyx](https://github.com/icesat2py/icepyx/blob/main/doc/source/example_notebooks/IS2_data_access.ipynb)
- [earthaccess](https://github.com/nsidc/earthaccess)
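    - add instructions for `.netrc` creation; this requires an Earthdata Login account at [urs.earthdata.nasa.gov](https://urs.earthdata.nasa.gov) (see the [earthaccess quick start](https://earthaccess.readthedocs.io/en/latest/quick-start/))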
- spacepy or h5py
4. Potential applications
- look at elevation changes of land ice, sea ice, vegetation, the atmosphere, etc. (see the data products above)
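For the `.netrc` step noted under earthaccess above, here is a minimal sketch that appends an Earthdata Login entry and reads it back with the standard-library `netrc` module. It assumes the entry format `machine urs.earthdata.nasa.gov login <user> password <pass>` and the default `~/.netrc` location, which earthaccess's netrc login strategy looks for. The helper names are hypothetical, and the password is assumed to contain no whitespace.

```python
import netrc
import os
import stat
from pathlib import Path
from typing import Optional

EARTHDATA_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host


def _netrc_path(path: Optional[Path]) -> Path:
    # default to ~/.netrc when no explicit path is given
    return Path(path) if path is not None else Path.home() / ".netrc"


def write_earthdata_netrc(username: str, password: str, path: Optional[Path] = None) -> Path:
    """Append an Earthdata Login entry to a .netrc file and restrict it to the owner."""
    path = _netrc_path(path)
    existing = path.read_text() if path.exists() else ""
    if EARTHDATA_HOST in existing:
        raise ValueError(f"{path} already has an entry for {EARTHDATA_HOST}")
    if existing and not existing.endswith("\n"):
        existing += "\n"
    path.write_text(existing + f"machine {EARTHDATA_HOST} login {username} password {password}\n")
    # the file holds plaintext credentials, so make it rw for the owner only (600)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


def read_earthdata_login(path: Optional[Path] = None):
    """Return (login, account, password) for the Earthdata host, or None if there is no entry."""
    return netrc.netrc(str(_netrc_path(path))).authenticators(EARTHDATA_HOST)
```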
#### GRACE & GRACE-FO
1. Description
- explain general
    - The GRACE and GRACE-FO “mascons” (e.g. RL06M): add *why* (L3 handbook pg 29-30)
    - Read more [here](./resources/data_docs/GRACE-FO_L3_Handbook_JPL.pdf)
- version: JPL RL06M Version 2.0
    - Read more [here](./resources/data_docs/GRACE_GRACE-FO_ReleaseNotes_JPL_MASCON.txt)
- formats: *netcdf, ascii, geotiff* (land only).
- find other docs [here](https://podaac.jpl.nasa.gov/gravity/gracefo-documentation)
2. Data Access
- access through PO.DAAC at JPL or ISDC at GFZ (L3 GRACE handbook pg 28)
    - [PO.DAAC data here](https://podaac.jpl.nasa.gov/dataset/TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06_V2)
    - note: ocean and land grids are published separately
    - [ISDC data here](https://isdc.gfz.de/homepage/)
3. Specific Processing Tools
- Tool 1: see the L3 handbook for guidance
    - github link
- Tool 2: need something for handling the data formats (*netcdf, ascii, geotiff*)
4. Potential Applications
- Weather Forecasting, Earthquake Observation, Ice Mass Change, etc.
5. Step Summary
- preliminary exploration: explore data using the GRACE(-FO) Data Analysis Tool to find areas of interest
    - [GRACE(-FO) Data Analysis Tool here](https://grace.jpl.nasa.gov/data/data-analysis-tool/)
- download data sets for areas of interest
- pre-processing: multiply mascon by gain factor (also called scale factor) to enhance spatial resolution (L3 handbook pg 30)
- visualize data (see example L3 handbook pg 31)
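The gain-factor step in the summary above is an element-wise multiplication of each monthly mascon grid by a per-cell factor. This is a minimal NumPy sketch with several assumptions. The mascon data is already loaded as a `(time, lat, lon)` array. The gain factors are a single `(lat, lon)` grid applied to every month. The names `lwe_thickness` and `scale_factor` are assumed variable names, so check them against the metadata of the files you download.

```python
import numpy as np


def apply_gain_factors(lwe_thickness: np.ndarray, scale_factor: np.ndarray) -> np.ndarray:
    """Multiply each monthly mascon grid by the per-cell gain (scale) factor.

    lwe_thickness: (time, lat, lon) equivalent water thickness, e.g. in cm
    scale_factor:  (lat, lon) gain factors, broadcast over the time axis
    """
    if lwe_thickness.shape[1:] != scale_factor.shape:
        raise ValueError("scale_factor grid must match the (lat, lon) grid of lwe_thickness")
    # any NaN in scale_factor propagates to NaN in the output for that cell
    return lwe_thickness * scale_factor[np.newaxis, :, :]
```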

## Question Motivation
#### Motivation
1. combine altimeter data (ICESat-2) with gravity measurements
- *Combining observations of sea level from altimeters with GRACE observations of ocean mass change provides a new constraint on the rate of thermal expansion in the global ocean, and hence on ocean heat content change, which enable a more complete estimation of the global sea Level budget*(L3 handbook pg 24)
2. why classify data?
3. why look at ice thickness?
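The constraint quoted in item 1 comes from a budget subtraction. Altimeters see the total sea-level change, and GRACE sees only the mass-driven part. The residual is the steric part, which globally is dominated by thermal expansion. This simplified form ignores corrections such as glacial isostatic adjustment and expresses both terms as equivalent sea-level rates:

$$
\frac{d h_{\text{steric}}}{dt} \approx \frac{d h_{\text{total}}}{dt} - \frac{d h_{\text{mass}}}{dt}
$$

Here $h_{\text{total}}$ comes from altimetry and $h_{\text{mass}}$ from GRACE/GRACE-FO ocean mass change.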
#### Questions
1. What percentage of (area of interest) is land, sea ice, land ice, vegetation, sea (insert other categories)? How does this ratio change over (time period of dataset)?
- classification: k-means? look for existing work that classifies areas using ICESat-2/GRACE datasets
2. What is the current thickness of ice in (area of interest)? Thickness is related to ice mass change (L3 handbook pg 24). How does it change over (time period of dataset) and seasonally?
- use regression to forecast mass/volume over (time period of dataset)
3. How do seasonal meltwater dynamics affect the relationship between mass change and elevation? How do meltwater dynamics over (time period of dataset) in (area of interest) affect future sea level rise?
- use elevation changes of ice from q1, mass/volume change over time from q2, and gravity changes to calculate meltwater.
- use temporal meltwater volume variations from land ice (q3.1), vegetation growth (more plants take up more water) from q1, and elevation changes of the sea from q1.
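Question 1 suggests k-means for labeling surface types. Below is a minimal NumPy sketch of Lloyd's algorithm. It assumes each row is one grid cell or along-track segment and the columns are hypothetical features, such as mean elevation, elevation roughness, or mass-change trend. k-means only finds unlabeled clusters, so mapping clusters to "land ice", "sea ice", etc. would still need reference data.

```python
import numpy as np


def standardize(features) -> np.ndarray:
    """Zero-mean, unit-variance columns so no single feature dominates the distance."""
    x = np.asarray(features, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def _assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # squared Euclidean distance from every sample to every centroid -> shape (n, k)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means on an (n_samples, n_features) array; returns (labels, centroids)."""
    x = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize with k distinct samples chosen at random
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(x, centroids)
        # move each centroid to the mean of its members; keep it in place if its cluster is empty
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return _assign(x, centroids), centroids
```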

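Question 2 asks how mass or volume changes over time and seasonally. One common, simple regression choice is a linear trend plus an annual sinusoid, fit by ordinary least squares. This is an assumption, not a method taken from the L3 handbook. This minimal sketch assumes times in decimal years and a regional anomaly series; `fit_trend_and_season` and `predict` are hypothetical names.

```python
import numpy as np


def fit_trend_and_season(t_years, y) -> dict:
    """Least-squares fit of y(t) = a + b*(t - t0) + c*cos(2*pi*t) + d*sin(2*pi*t).

    t_years: decimal years (e.g. 2019.5); y: regional mass or volume anomaly.
    t0 is the mean time, so `a` is the fitted value near the middle of the record.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    t0 = t.mean()
    design = np.column_stack([
        np.ones_like(t),
        t - t0,
        np.cos(2 * np.pi * t),
        np.sin(2 * np.pi * t),
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    a, b, c, d = coef
    return {
        "offset": float(a),
        "trend_per_year": float(b),          # units of y per year
        "annual_amplitude": float(np.hypot(c, d)),
        "t0": t0,
        "coef": coef,
    }


def predict(fit: dict, t_years) -> np.ndarray:
    """Evaluate the fitted model; extrapolating assumes the trend and seasonal cycle persist."""
    t = np.asarray(t_years, dtype=float)
    a, b, c, d = fit["coef"]
    return a + b * (t - fit["t0"]) + c * np.cos(2 * np.pi * t) + d * np.sin(2 * np.pi * t)
```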
## Dataset Motivation
#### 1. Personal: Understanding processes within earth science and how remote observations can yield insights into the components that make up these systems. Some examples:
- What impact do solar cycles have on conditions in near-Earth space? Specifically, how do solar storms deform Earth's magnetic field, and what does this deformation mean for the reliability and functionality of our satellites, communication systems, and power grids?
- How do asymmetries at the core-mantle boundary (CMB) influence geodynamo currents and mantle convection? If these CMB asymmetries do influence mantle convection (which drives tectonic plate motion), can they be used to help predict earthquakes?
- How can observations from space help us understand the non-linear behavior of the global climate?
#### 2. Professional: I work as a software developer in the space research group at Los Alamos National Laboratory. I help manage the creation of higher-level data products for several satellite constellations and would like to understand when data fusion from multiple sources is appropriate, how it is applied, and how it can help us get the most out of existing equipment.
#### 3. Academic:
- Multi-source remote sensing data fusion: status and trends: Zhang
- A review of practical AI for remote sensing in earth sciences
- The Ice, Cloud, and Land Elevation Satellite 2 Mission: A Global Geolocated Photon Product Derived From the Advanced Topographic Laser Altimeter System: Neumann
- Community estimate of global glacier mass changes from 2000 to 2023: The GlaMBIE Team
- A comparison of coincident GRACE and ICESat data over Antarctica: Gunter
- Integrating Models and Remote Sensing Data for Distributed Glacier Mass balance Estimation: Podsiadlo
- Measuring glacier mass changes from space - a review: Berthier
- Comparing elevation and backscatter retrievals from CryoSat-2 and ICESat-2 over Arctic summer ice: Dawson
- Review article: Multisensor image fusion in remote sensing: concepts, methods, and applications: Pohl
- add GRACE and ICESat-2 docs
- check Scopus for anything missing




```python
from pathlib import Path

import pandas as pd

# read in datasets (one CSV per SDSS class/subclass); skiprows=1 skips the first line of each file
data_dir = Path('/home/scotty/dsc_207_final_project')
csv_files = [
    'sdss_star_ds.csv',
    'sdss_galaxy_agn_ds.csv',
    'sdss_galaxy_broadline_ds.csv',
    'sdss_galaxy_starburst_ds.csv',
    'sdss_galaxy_starforming_ds.csv',
    'sdss_galaxy_unclassified_ds.csv',
    'sdss_qso_ds.csv',
]
frames = [pd.read_csv(data_dir / name, skiprows=1) for name in csv_files]

# combine into one DataFrame with a fresh 0..n-1 index, then save
df = pd.concat(frames, ignore_index=True)
print(df['class'].value_counts())
df.to_csv('sdss_mixed.csv', header=True, index=False)
```

```
class
STAR      50000
GALAXY    50000
QSO       50000
Name: count, dtype: int64
```


```python
# look at top 5 rows
print(df.head())
```

```
#Table1
objid               ra               dec               dered_u  dered_g  dered_r  dered_i  dered_z  run  rerun camcol field specobjid           class redshift      plate mjd   fiberid probPSF    mode
1237651274035560615 128.882532702746 54.928917456568   19.43687 17.76148 17.13218 16.91798 16.80427 1350 301   6      179   3785283816004886528 STAR  -0.0001525885 3362  54939 30      1             1
1237651274035560626 128.871463576431 54.9678830823743  19.22869 18.083   17.66953 17.49974 17.44089 1350 301   6      179   3785282991371165696 STAR  6.084921E-05  3362  54939 27      1             1
1237652630712942658 8.6426989238844  -9.46154357342101 20.478   19.48022 19.40721 19.40105 19.31647 1666 301   5      249   3496060304062783488 STAR  -0.0005251435 3105  54825 513     1             1
```


```python
import pandas as pd

print(df.head())

# convert MJD (Modified Julian Date: days since 1858-11-17) to calendar dates
def mjd_to_datetime(mjd_series):
    # vectorized: convert the whole column at once instead of looping row by row;
    # returns a Series of datetime.date objects
    mjd = pd.Series(mjd_series)
    return pd.to_datetime(mjd, unit='D', origin='1858-11-17').dt.date
```



```
#Table1
objid               ra               dec               dered_u  dered_g  dered_r  dered_i  dered_z  run  rerun camcol field specobjid           class redshift      plate mjd   fiberid probPSF    mode
1237651274035560615 128.882532702746 54.928917456568   19.43687 17.76148 17.13218 16.91798 16.80427 1350 301   6      179   3785283816004886528 STAR  -0.0001525885 3362  54939 30      1             1
1237651274035560626 128.871463576431 54.9678830823743  19.22869 18.083   17.66953 17.49974 17.44089 1350 301   6      179   3785282991371165696 STAR  6.084921E-05  3362  54939 27      1             1
1237652630712942658 8.6426989238844  -9.46154357342101 20.478   19.48022 19.40721 19.40105 19.31647 1666 301   5      249   3496060304062783488 STAR  -0.0005251435 3105  54825 513     1             1
```
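As a cross-check on the pandas conversion, the same MJD arithmetic can be done with only the standard library. MJD counts days since 1858-11-17, so the `mjd` values in the output above (54939 and 54825) can be verified by hand. This is a minimal sketch; `mjd_to_date` is a hypothetical helper name.

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 = 1858-11-17 00:00


def mjd_to_date(mjd: float) -> date:
    """Calendar date for a Modified Julian Date; the fractional part (time of day) is dropped."""
    return MJD_EPOCH + timedelta(days=int(mjd // 1))
```

For example, `mjd_to_date(54939)` returns `date(2009, 4, 18)` and `mjd_to_date(54825)` returns `date(2008, 12, 25)`.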


```python
df = pd.read_csv('sdss_quasar.csv')
print(df.columns)
```

```
Index(['dered_u', 'dered_g', 'dered_r', 'dered_i', 'dered_z', 'mode', 'clean',
       'type', 'probPSF', 'ra', 'dec', 'mjd', 'redshift', 'ObjID', 'class'],
      dtype='object')
```
