This notebook runs the full LandTrendr Optimization (LTOP) workflow implemented in Python. The tool is intended to select the 'optimum' version of the LandTrendr change detection algorithm (Kennedy et al., 2010, 2018) for different areas of a landscape.

#### Notes
- You need to authenticate the run with a specific GEE account. Importing params triggers the ee.Authenticate() protocol, which opens a browser page where you authenticate with your GEE account.
- Subsequent scripts only call ee.Initialize(), because you should already have authenticated; authentication should only need to happen once.
- Right now the params are imported from a separate .py file and treated as imports. If we're going to stick with a Jupyter Notebook, it may be easier and more straightforward to put them directly into a cell in this notebook.
- If you import a module at the top of the notebook and then change something in that script, the change won't be reflected until you restart the kernel or reload the module (for example with importlib.reload()).
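The reload behaviour described in the last note can be seen with a throwaway module. This is a minimal, self-contained sketch; the module name `ltop_reload_demo` is hypothetical and stands in for a project file such as params.py. It writes the module to a temporary directory, edits it on disk, and compares a plain re-import with `importlib.reload()`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid writing __pycache__ files so the reload always re-reads the source.
sys.dont_write_bytecode = True

# Write a throwaway module (hypothetical stand-in for params.py) to a temp dir.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "ltop_reload_demo.py"
module_file.write_text("value = 1\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import ltop_reload_demo
first = ltop_reload_demo.value

# Edit the file on disk, as you would when changing a script mid-session.
module_file.write_text("value = 100\n")

# A second plain import is a no-op: the module is cached in sys.modules.
import ltop_reload_demo
cached = ltop_reload_demo.value

# importlib.reload() re-executes the module's source and updates the module object.
importlib.reload(ltop_reload_demo)
reloaded = ltop_reload_demo.value

print(first, cached, reloaded)  # 1 1 100
```

Note that `importlib.reload()` only refreshes the module object itself: names bound earlier with `from module import name` (such as `RunSNIC` from run_SNIC_01) keep pointing at the old objects, so they need to be re-imported after the reload, or the kernel restarted.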

In [9]:
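import importlib
import time

import ee
import pandas as pd

# Project modules. importlib.reload() picks up edits made to these files
# after the kernel has already imported them.
import params
importlib.reload(params)
import ltop
importlib.reload(ltop)
import lt_params
from run_SNIC_01 import RunSNIC                       # step 01: SNIC segmentation
import run_kMeans_02_1 as kmeans_1                    # step 02.1: kmeans clustering
import run_kMeans_02_2 as kmeans_2                    # step 02.2: one sample point per cluster
import abstract_sampling_03 as ab_img                 # step 03: abstract images
import abstract_imager_04 as run_lt_pts               # step 04: LandTrendr on abstract images
import ltop_lt_paramater_scoring_01 as param_scoring  # LT parameter scoring (run locally)
importlib.reload(param_scoring)
import generate_LTOP_05 as make_bps                   # step 05: LTOP breakpoints
importlib.reload(make_bps)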


Running a cell for a large area may cause the cell to hang. This is particularly likely if you're running the notebook in a browser. It's also possible that the cell will just start a job on the GEE server and then finish running here in the notebook while the job continues on the server.

To check whether a task is done, we can get the task status and look at its state. The small function below can be reused for that purpose.

In [None]:
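def check_task_status(task_dict):
    '''
    Return the current state of a GEE task as a string (e.g. 'READY',
    'RUNNING', 'COMPLETED', 'FAILED', 'CANCELLED').
    Input should be a dictionary formatted like the output of task.status(),
    i.e. one that contains an 'id' key.
    '''
    task_id = task_dict['id']
    # getTaskStatus accepts one or more task IDs and returns a list of status
    # dictionaries, so take the first (and only) entry
    task_status = ee.data.getTaskStatus(task_id)[0]

    return task_status['state']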

In [None]:
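# Run the first (SNIC) step. generate_tasks() returns the status dictionaries
# of the two GEE tasks it starts; these are polled in the kmeans step below.
snic = RunSNIC(params)

status1, status2 = snic.generate_tasks()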


In the next step we run the kmeans algorithm, which automatically picks up the SNIC outputs. This is not exposed to the user and assumes that you have specified the child and root directories for the outputs in the params.py file. The following code block checks whether the SNIC process is done and, once it has concluded, executes the kmeans step.

In [None]:

# If you want to run the kmeans process without having run SNIC in this
# session, uncomment the following line of code:
# km_status = kmeans_1.generate_tasks(params.params)
while True:
    try:
        ts_1 = check_task_status(status1)
        ts_2 = check_task_status(status2)
    except NameError:
        print('You did not run the snic step so there is no status to check')
        break

    if ts_1 == 'COMPLETED' and ts_2 == 'COMPLETED':
        print('The previous task is complete')
        km_status = kmeans_1.generate_tasks(params.params)
        break
    elif ts_1 in ('FAILED', 'CANCELLED'):
        print('The first task failed')
        break
    elif ts_2 in ('FAILED', 'CANCELLED'):
        print('The second task failed')
        break

    # wait a minute before polling again so the loop doesn't hammer the GEE API
    time.sleep(60)
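A reusable variant of this polling loop can add a timeout, so a stuck task cannot hang the notebook indefinitely. This is a minimal sketch, assuming the same state strings (COMPLETED, FAILED, CANCELLED) that the loop checks; `wait_for_tasks` is a hypothetical helper, not part of the LTOP scripts. The state lookup is passed in as a function so the logic can be tested without an Earth Engine connection. In this notebook the lookup could be `lambda tid: ee.data.getTaskStatus(tid)[0]['state']` (the same call `check_task_status` uses), with the IDs taken from `status1['id']` and `status2['id']`.

```python
import time
from typing import Callable, Dict, Iterable

# Terminal task states, matching the ones checked in the polling loop.
SUCCESS_STATES = {"COMPLETED"}
FAILURE_STATES = {"FAILED", "CANCELLED"}


def wait_for_tasks(
    task_ids: Iterable[str],
    get_state: Callable[[str], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 3600,
) -> Dict[str, str]:
    """Poll until every task completes; return the final state of each task.

    Raises RuntimeError as soon as any task fails or is cancelled, and
    TimeoutError if tasks are still pending after timeout_seconds.
    """
    pending = list(task_ids)
    final: Dict[str, str] = {}
    deadline = time.monotonic() + timeout_seconds
    while pending:
        # iterate over a copy so finished tasks can be removed safely
        for task_id in list(pending):
            state = get_state(task_id)
            if state in SUCCESS_STATES:
                final[task_id] = state
                pending.remove(task_id)
            elif state in FAILURE_STATES:
                raise RuntimeError(f"Task {task_id} ended in state {state}")
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"Tasks still pending: {pending}")
            time.sleep(poll_seconds)
    return final
```

Failing fast on the first failed task mirrors the loop above, which stops as soon as either SNIC task fails.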
    

In the next step we take the kmeans output and draw a stratified random sample to get one point for each cluster id in the kmeans output image. Unlike the previous step, this cell does not check whether the kmeans task has finished, so wait for that task to complete before running it.

In [None]:
km_pts_status = kmeans_2.generate_tasks(params.params)
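The idea of a stratified sample with one point per cluster can be illustrated locally. This is a minimal pandas sketch, assuming the kmeans output is a 2-D array of integer cluster ids; in the notebook the sampling happens in GEE inside `kmeans_2.generate_tasks`, and `one_point_per_cluster` is a hypothetical helper, not that implementation.

```python
import numpy as np
import pandas as pd


def one_point_per_cluster(cluster_img: np.ndarray, seed: int = 0) -> pd.DataFrame:
    """Return one randomly chosen (row, col) pixel for every cluster id in cluster_img."""
    rows, cols = np.indices(cluster_img.shape)
    pixels = pd.DataFrame({
        "cluster_id": cluster_img.ravel(),
        "row": rows.ravel(),
        "col": cols.ravel(),
    })
    # Each cluster id is a stratum; sample(n=1) draws exactly one pixel from each.
    return (
        pixels.groupby("cluster_id")
        .sample(n=1, random_state=seed)
        .sort_values("cluster_id")
        .reset_index(drop=True)
    )


clusters = np.array([[0, 0, 1],
                     [1, 2, 2],
                     [2, 2, 0]])
print(one_point_per_cluster(clusters))
```

Every cluster contributes exactly one point regardless of its size, which is what keeps small clusters from being missed by a simple random sample.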


The next step is to create the abstract images. Before the Python implementation, these were created from a CSV that was generated in GEE and then downloaded to a local machine; the images themselves were constructed in NumPy and then re-uploaded to GEE. Because that round trip is inefficient, we are moving to a process that generates the abstract images directly as GEE assets. This is based on code that Jack Kilbride wrote to replace the NumPy scheme and is still in testing as of 10/6/2022.

In [17]:
ab_imgs_status = ab_img.create_abstract_imgs(params.params)
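An abstract image packs the time series of the sample points into a small image, so LandTrendr can be run on a handful of pixels instead of the whole landscape. A minimal NumPy sketch of that packing, in the spirit of the earlier NumPy scheme: it assumes one value per point per year and places point i at pixel (i // width, i % width), padding unused pixels with NaN. The helper name and the row-major layout are assumptions for illustration, not necessarily what `ab_img.create_abstract_imgs` produces.

```python
import numpy as np


def build_abstract_images(series: np.ndarray, width: int) -> np.ndarray:
    """Pack per-point time series into a stack of small 'abstract' images.

    series: array of shape (n_years, n_points), one column per sample point.
    Returns an array of shape (n_years, height, width) in which pixel (r, c)
    of every year's image holds the value of point r * width + c.
    Unused trailing pixels are filled with NaN.
    """
    n_years, n_points = series.shape
    height = -(-n_points // width)  # ceiling division
    padded = np.full((n_years, height * width), np.nan)
    padded[:, :n_points] = series
    return padded.reshape(n_years, height, width)


# 3 years x 4 sample points packed into 2 x 3 pixel images
series = np.arange(12, dtype=float).reshape(3, 4)
imgs = build_abstract_images(series, width=3)
print(imgs.shape)  # (3, 2, 3)
```

Because each pixel maps back to a single sample point, the point locations (centroids) are enough to read LandTrendr results back out per cluster.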

In [None]:
# Exploratory test: convert the LandTrendr run parameters (lt_params.runParams)
# to a DataFrame, attach a 'timeseries' column, and convert back to a list of dicts
df = pd.DataFrame.from_records(lt_params.runParams)

# df = df.head(10)  # uncomment to test with only the first 10 parameter sets
df['timeseries'] = ee.ImageCollection([])  # placeholder: an empty ImageCollection
output = df.to_dict(orient='records')

# inspect the first parameter record
output[0]
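A list of parameter records like the one converted above can be built as a grid of candidate values. This is a minimal sketch, assuming runParams is a list of dicts keyed by LandTrendr argument names (maxSegments, spikeThreshold, and recoveryThreshold are arguments of the GEE LandTrendr algorithm); the candidate values below are hypothetical, and the real grid is defined in lt_params.py.

```python
import itertools

import pandas as pd

# Hypothetical candidate values for three LandTrendr arguments.
grid = {
    "maxSegments": [6, 8],
    "spikeThreshold": [0.75, 0.9],
    "recoveryThreshold": [0.25, 0.5, 1.0],
}

# One dict per combination of values: 2 * 2 * 3 = 12 parameter sets.
run_params = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

# Round trip: list of dicts -> DataFrame -> list of dicts.
df = pd.DataFrame.from_records(run_params)
records = df.to_dict(orient="records")
print(len(records), records[0])
```

The DataFrame form is convenient for filtering or adding columns; `to_dict(orient="records")` turns it back into the list-of-dicts shape.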


Next we run LandTrendr on the abstract image points, using the indices generated in the previous step as inputs. The 04 step scripts are set up to take GEE assets as inputs: they expect an ImageCollection of abstract images, as well as the points generated in the previous step that show where the abstract image pixels are located (centroids).

In [23]:
lt_pt_status = run_lt_pts.run_LT_abstract_imgs(params.params)

Next we incorporate the LT version scoring, which was previously done in Python and still runs locally rather than in GEE. This part likely still needs to be amended. Ideally, we wouldn't have to generate one giant CSV and move it around for scoring, but porting all of that Python code to GEE is a substantial lift, so for now it stays as it is (a TODO for the future). To accomplish this task we need to:
1. Download the CSV outputs somewhere. This should be done programmatically, but that is not working correctly because of permissions, so it is something we'll need to come back to and streamline. This also depends on what we do with the scoring scripts.
2. Run the param selection script.
3. Re-upload the outputs to GEE.
4. Run the 05 script to generate the outputs.
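To make step 2 concrete, here is a deliberately simplified stand-in for parameter selection. The actual selection logic lives in `param_scoring.generate_selected_params` and is not reproduced here. This sketch assumes a score table with hypothetical columns `cluster_id`, `param_id`, and `AICc`, and keeps, for each cluster, the parameter configuration with the lowest AICc (lower AICc indicates a better-supported model).

```python
import pandas as pd


def select_best_params(scores: pd.DataFrame) -> pd.DataFrame:
    """For each cluster, keep the parameter configuration with the lowest AICc.

    Simplified illustration; assumes columns 'cluster_id', 'param_id', 'AICc'.
    """
    # idxmin returns the row label of the minimum AICc within each cluster
    best_idx = scores.groupby("cluster_id")["AICc"].idxmin()
    return scores.loc[best_idx].reset_index(drop=True)


scores = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "param_id": ["a", "b", "a", "b"],
    "AICc": [10.0, 5.0, 3.0, 7.0],
})
print(select_best_params(scores))
```

Selecting with `.loc` on the original DataFrame, rather than assigning into a filtered slice, also sidesteps pandas' chained-assignment pitfalls.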

In [27]:
# The param scoring scripts have been combined into a single script.
# input_dir: local directory holding the CSV outputs of the 04 step
input_dir = "/vol/v1/general_files/user_files/ben/LTOP_FTV_py_revised/output_04_lt_runs/"
# njobs: number of parallel jobs passed to the scoring script
njobs = 8
# outfile: CSV of selected LT parameters, to be re-uploaded to GEE for the 05 step
outfile = '/vol/v1/proj/LTOP_mekong/csvs/02_param_selection/selected_param_config_gee_implementation/LTOP_Cambodia_troubleshooting_selected_LT_params_tc.csv'

param_scoring.generate_selected_params(input_dir, njobs, outfile)

The files we are going to process are ['/vol/v1/general_files/user_files/ben/LTOP_FTV_py_revised/output_04_lt_runs/LTOP_servir_comps_revised_abstractImageSample_lt_144params_NBR_c2.csv', '/vol/v1/general_files/user_files/ben/LTOP_FTV_py_revised/output_04_lt_runs/LTOP_servir_comps_revised_abstractImageSample_lt_144params_NDVI_c2.csv', '/vol/v1/general_files/user_files/ben/LTOP_FTV_py_revised/output_04_lt_runs/LTOP_servir_comps_revised_abstractImageSample_lt_144params_TCG_c2.csv', '/vol/v1/general_files/user_files/ben/LTOP_FTV_py_revised/output_04_lt_runs/LTOP_servir_comps_revised_abstractImageSample_lt_144params_TCW_c2.csv', '/vol/v1/general_files/user_files/ben/LTOP_FTV_py_revised/output_04_lt_runs/LTOP_servir_comps_revised_abstractImageSample_lt_144params_B5_c2.csv']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
  these['rankAICc'] = these['AICc'].rank(method='max', ascending=False)

Done generating selected LT params file
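pandas emits the SettingWithCopyWarning printed above when a new column is assigned to a DataFrame that may be a copy of another, such as a filtered subset; the scoring script's `these['rankAICc'] = these['AICc'].rank(...)` line is an example. A minimal reproduction of the usual fix, assuming `these` is a row subset of a larger DataFrame (the toy data is hypothetical): take an explicit `.copy()` of the subset before assigning to it.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"cluster_id": [1, 1, 2], "AICc": [10.0, 5.0, 3.0]})

# .copy() makes the filtered subset an independent DataFrame, so assigning a
# new column to it is unambiguous and leaves df untouched.
these = df[df["cluster_id"] == 1].copy()

with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception
    these["rankAICc"] = these["AICc"].rank(method="max", ascending=False)

print(these)
```

If the goal is instead to write the new values back into the full table, assign with `df.loc[row_mask, "col"] = ...` on the original DataFrame, as the warning message suggests.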


Finally, we run the last (05) step in the LTOP workflow. This step takes in the selected LT versions and uses them to generate the LTOP breakpoints (vertices).

In [10]:
lt_vertices_status = make_bps.generate_LTOP_breakpoints(params.params)