This notebook runs the full LandTrendr Optimization (LTOP) workflow implemented in Python. The tool is used to select the 'optimum' version of the LandTrendr change detection algorithm (Kennedy et al., 2010, 2018) for different areas of a landscape.

#### Notes
- You need to authenticate the run with a specific GEE account. The import params statement triggers the ee.Authenticate() protocol, which opens a browser page where you authenticate with your GEE account.
- Subsequent scripts only use the ee.Initialize protocol because you should already be authenticated; authentication should only need to happen once.
- Right now the params are imported from a separate .py file and treated as imports. If we are going to stick with a Jupyter Notebook, it may be easier and more straightforward to put them directly into a cell here.
- If you import a module at the top and then change something in the script being executed, you will need to restart the kernel or otherwise delete the variables, because the change won't be reflected otherwise.
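
The last note is the reason for the `importlib.reload` calls in the import cell. The following is a minimal, self-contained sketch of that behaviour. It writes a throwaway module to a temporary directory; the module name `demo_params` and the variable `sleep_time` are hypothetical and are not part of LTOP.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so the edited source is always re-read in this demo.
sys.dont_write_bytecode = True

# Create a throwaway module (demo_params is a hypothetical name) on the import path.
tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
module_path = Path(tmp_dir) / "demo_params.py"
module_path.write_text("sleep_time = 30\n")
importlib.invalidate_caches()

import demo_params
from demo_params import sleep_time  # binds the current value to a local name

# Simulate editing the script on disk after it has been imported.
module_path.write_text("sleep_time = 120  # edited after import\n")
value_before_reload = demo_params.sleep_time  # the running session still sees 30

# reload() re-executes the module source and updates the module object in place.
importlib.reload(demo_params)
value_after_reload = demo_params.sleep_time  # now 120

# Names bound with `from ... import ...` are not refreshed by reload().
stale_name = sleep_time  # still 30 until the from-import is run again

print(value_before_reload, value_after_reload, stale_name)
```

Note that names pulled in with `from module import name` keep their old binding after a reload, so a `from ... import ...` statement has to be re-run after the corresponding `importlib.reload` call.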

In [5]:
import ee 
import importlib
import pandas as pd 
import ltop
importlib.reload(ltop)
import lt_params
importlib.reload(lt_params)
import run_SNIC_01 as runSNIC
importlib.reload(runSNIC)
import run_kMeans_02_1 as kmeans_1
import run_kMeans_02_2 as kmeans_2
import abstract_sampling_03 as ab_img
import abstract_imager_04 as run_lt_pts
importlib.reload(run_lt_pts)
import ltop_lt_paramater_scoring_01 as param_scoring
importlib.reload(param_scoring)
import generate_LTOP_05 as make_bps
importlib.reload(make_bps)
import run_ltop_complete as runner
from run_ltop_complete import RunLTOPFull,parse_params
import yaml 
importlib.reload(runner)


LTOP version:  0.1.1


<module 'run_ltop_complete' from '/vol/v1/proj/LTOP_FTV_Py/LTOP_FTV_Py/scripts/run_ltop_complete.py'>

In [2]:
#define an aoi 
aoi = ee.FeatureCollection("USDOS/LSIB/2017").filter(ee.Filter.eq('COUNTRY_NA','Cambodia')).geometry()
# aoi = ee.Geometry.Polygon(
#     [[[105.21924736250195, 14.414700359899358],
#       [105.21924736250195, 12.212492266993609],
#       [107.62525322187695, 12.212492266993609],
#       [107.62525322187695, 14.414700359899358]]])


In [3]:
with open("config.yml", "r") as ymlfile:
        cfg = yaml.safe_load(ymlfile)
        cfg = parse_params(aoi,cfg) 
        # single_example = RunLTOPFull(cfg,sleep_time=30)
        # single_example.runner()

In [None]:
#run the first SNIC step 
#note that if you run this thing and the outputs already exist you will get an error in the task generation 
status1,status2 = runSNIC.generate_snic_outputs(cfg)


In the next step we run the kmeans algorithm, which automatically picks up the SNIC outputs. This is not exposed to the user and assumes that you have specified the child and root directories where you want things to go in the params.py file. The following code block checks whether the SNIC process is done and, once it determines that the process has concluded, executes the kmeans step.
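
The wait-then-run behaviour described above can be sketched as a simple polling loop. This is an illustration only, not the LTOP implementation: `wait_for_tasks` and its arguments are hypothetical names, and the state strings are assumed to follow the ones Earth Engine batch tasks report (`READY`, `RUNNING`, `COMPLETED`, `FAILED`, `CANCELLED`).

```python
import time

FINISHED_STATES = {"COMPLETED"}
FAILED_STATES = {"FAILED", "CANCELLED"}


def wait_for_tasks(get_states, sleep_time=30, max_checks=1000):
    """Poll until every task is COMPLETED.

    get_states is a hypothetical callable returning the current state string
    of each upstream task. Returns True when all tasks completed, False if
    any task failed or the number of checks ran out.
    """
    for _ in range(max_checks):
        states = list(get_states())
        if any(state in FAILED_STATES for state in states):
            return False
        if states and all(state in FINISHED_STATES for state in states):
            return True
        time.sleep(sleep_time)  # wait before asking again
    return False


# Simulated status history for two export tasks, standing in for real GEE calls.
fake_history = iter([
    ["READY", "RUNNING"],
    ["RUNNING", "COMPLETED"],
    ["COMPLETED", "COMPLETED"],
])
snic_done = wait_for_tasks(lambda: next(fake_history), sleep_time=0)
if snic_done:
    print("SNIC outputs are ready; the kmeans step can start")
```

In a real run the callable would query Earth Engine for the task states, and `sleep_time` would be tens of seconds rather than zero.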

In [None]:
#similar to snic, if you run this and create the tasks in GEE you will hit an error if those things already exist 
km_status = kmeans_1.generate_snic_outputs(cfg)
    

In the next step we take the kmeans output and draw a stratified random sample to get one point for each cluster id in the kmeans output image. Like the previous step, this one also checks whether the output it depends on is done before executing.
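
To make "one point for each cluster id" concrete, here is a small pure-Python sketch on a toy grid. It is only an illustration of the idea; the function name and the grid are hypothetical. Earth Engine provides `ee.Image.stratifiedSample` for this kind of operation, though whether `kmeans_2` uses it is not stated here.

```python
import random
from collections import defaultdict


def sample_one_point_per_cluster(cluster_grid, seed=0):
    """Return {cluster_id: (row, col)} with one random pixel per cluster id."""
    rng = random.Random(seed)
    pixels_by_cluster = defaultdict(list)
    # Group pixel coordinates by the cluster id stored at each pixel.
    for row, values in enumerate(cluster_grid):
        for col, cluster_id in enumerate(values):
            pixels_by_cluster[cluster_id].append((row, col))
    # Draw exactly one pixel from each group (each stratum).
    return {
        cluster_id: rng.choice(pixels)
        for cluster_id, pixels in sorted(pixels_by_cluster.items())
    }


# Toy stand-in for a kmeans output image with four cluster ids.
kmeans_grid = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 1],
]
points = sample_one_point_per_cluster(kmeans_grid)
print(points)
```

Each returned coordinate lies inside the cluster it represents, so rare clusters receive a point just as common ones do.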

In [None]:
#similar to snic, if you run this and create the tasks in GEE you will hit an error if those things already exist 
km_pts_status = kmeans_2.generate_tasks(cfg)


The next step is to create the abstract images. Prior to the Python implementation, these were created from a CSV which was generated in GEE and then pulled down to a local machine. The actual images were constructed in NumPy and then re-uploaded to GEE. This is a pretty inefficient process, so we are moving it to a process that generates the images as GEE assets. This is based on some code that Jack Kilbride wrote to replace the NumPy scheme and is still in testing as of 10/6/2022.

In [None]:
#this will throw an error in GEE if these already exist 
ab_imgs_status = ab_img.create_abstract_imgs(cfg)

Next we run LandTrendr on the abstract image points, using the indices generated in the previous step as inputs. The scripts for the 04 step should be prepped to handle assets as inputs. They expect an imageCollection of abstract images as well as the points generated in the previous step, which show where the abstract image pixels are located (centroids).

In [None]:
#in the full workflow this happens in GCS. If we're just going to test it we don't really need to go to GCS because we just want to look at the outputs. 
#note that just for testing this has been changed so it will export csvs to a HARDCODED Google Drive folder called LTOP_TESTING. It will not send things to GCS. This
#means that you need to manually inspect and download those data!!
#update the outfile name so we can generate a test output
cfg['outfile'] = cfg['outfile'][:-4] + '_param_testing_from_nb.csv'

cfg['outfile']

lt_pt_status = run_lt_pts.run_LT_abstract_imgs(cfg)
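
The `[:-4]` slice in the cell above assumes the output name ends in a four-character extension such as `.csv`. A minimal sketch of a more general way to build the test file name, using a hypothetical helper that is not part of LTOP:

```python
import os


def add_suffix_to_filename(outfile, suffix):
    """Insert suffix between the file name and its extension."""
    root, ext = os.path.splitext(outfile)
    return root + suffix + ext


# Hypothetical file names, used only to show the behaviour.
csv_name = add_suffix_to_filename("LTOP_selected.csv", "_param_testing_from_nb")
geojson_name = add_suffix_to_filename("LTOP_points.geojson", "_param_testing_from_nb")
print(csv_name)
print(geojson_name)
```

`os.path.splitext` works on the string itself, so the rest of the path is left exactly as it was written.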

Now we incorporate the sections that were previously done in Python to score the LT versions. This likely still needs to be amended. Ideally, we wouldn't have to generate the giant CSV and move it around for scoring. However, moving all of that Python code to GEE is a fairly substantial lift, so for now it will stay as it is; this is a TODO for the future.

In [6]:
#update the inputs and outputs in the params from the yml file to run this manually 
input_dir = "/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_inputs"
outfile = '/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_outputs/LTOP_param_selected_testing_revised.csv'
cfg['param_scoring_inputs'] = input_dir
cfg['outfile'] = outfile

param_scoring.generate_selected_params(cfg) 

# 	main(input_dir,njobs,outfile)

The files we are going to process are ['/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_inputs/LTOP_Cambodia_200_abstractImageSample_lt_144params_NDVI_c2_selected.csv', '/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_inputs/LTOP_Cambodia_200_abstractImageSample_lt_144params_NBR_c2_selected.csv', '/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_inputs/LTOP_Cambodia_200_abstractImageSample_lt_144params_TCG_c2_selected.csv', '/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_inputs/LTOP_Cambodia_200_abstractImageSample_lt_144params_TCW_c2_selected.csv', '/vol/v1/proj/LTOP_FTV_Py/param_selection_testing_inputs/LTOP_Cambodia_200_abstractImageSample_lt_144params_B5_c2_selected.csv']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  these['rankVscore'] = these['vertscore'].rank(method='max')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  these['rankAICc'] = these['AICc'].rank(method='max', ascending=False)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  these['combined'] = (these['rankAICc']*aicWeight)+(these['rankVscore']*vSco

These looks like: 
        cluster_id                                             fitted index  \
115206           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115207           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115208           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115209           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115210           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115211           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115218           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115219           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115220           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115221           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115222           0  [4997.920706493273, 4884.958825486277, 4771.99...    B5   
115223           0  [4997.9207064
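
The warning text above shows part of the scoring logic: `vertscore` and `AICc` are each ranked with `rank(method='max')` (AICc in descending order), and the two ranks are combined with weights. The sketch below reproduces that rank-and-weight idea in plain Python on made-up numbers. The weight values and the name `vscore_weight` are assumptions, since the original expression is truncated in the output, and nothing here says how the final selection is made from the combined score.

```python
def rank_max(values, ascending=True):
    """Rank like pandas rank(method='max'): tied values share the highest rank."""
    if ascending:
        return [sum(1 for other in values if other <= v) for v in values]
    return [sum(1 for other in values if other >= v) for v in values]


# Made-up scores for four LandTrendr parameter sets (illustration only).
vertscore = [0.2, 0.5, 0.5, 0.9]
aicc = [-10.0, -30.0, -20.0, -20.0]

# Hypothetical weights; the real values live in the scoring script.
aic_weight = 0.5
vscore_weight = 0.5

rank_vscore = rank_max(vertscore)             # ascending, as in the output above
rank_aicc = rank_max(aicc, ascending=False)   # descending, as in the output above
combined = [
    a * aic_weight + v * vscore_weight
    for a, v in zip(rank_aicc, rank_vscore)
]
print(rank_vscore, rank_aicc, combined)
```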

In [None]:
#test the output of the gcs file checking function 
files = ['LTOP_cambodia_selected_params.csv','LTOP_cambodia_test_selected_params','something']
test = ltop.check_multiple_gcs_files(files,'ltop_assets_storage')
test

Finally we run the last (05) step in the LTOP workflow. This step takes in the selected LT versions and uses them to generate the LTOP breakpoints.

In [None]:
#pass the same cfg used by the earlier steps; `params` is never defined in this notebook
lt_vertices_status = make_bps.generate_LTOP_breakpoints(cfg)