##### The workflow continues in this notebook, where three main actions occur
* Marxan input files and directories are prepared
* The Marxan executable is called
* Summary files and plots are created from the output of the Marxan run

The user must complete a few manual steps to set up the workflow 
components, such as downloading the Marxan executable and saving shapefiles and 
raster files created using other software to specifically named directories 
that were created by this workflow in the first notebook.  This sort of 
manual coordination may be addressed in the future, to provide more automation
throughout the process.

Once the setup steps have been completed, a few cells are offered to set variables 
that determine how the Marxan run will be completed. 

The bulk of the workflow occurs in the final cells of the notebook, which use loops 
both to control the Marxan analysis and to produce summary plots and tables.

The summary output can be reviewed to determine how well it meets the intended goals, 
informing the user of possible adjustments to the input variables until 
optimal results are achieved.

##### The first cells import libraries, and set the working directory 

In [None]:
# Import libraries
import os
import csv
import datetime
import io
import pathlib
from pathlib import Path
import requests
import shutil
import time
from glob import glob


import contextily as cx
import earthpy as et
import earthpy.plot as ep
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
from rasterio.crs import CRS
from rasterio.plot import plotting_extent
import rioxarray as rxr
import seaborn as sns
import subprocess

import kba_thresh_sa_scripts as ks

# set global cache override variable
CACHE_OVERRIDE = False

##### Check if 'kba_thresh_sa' directory exists
* If it does, change working directory to 'earth-analytics/data/kba_thresh_sa' 
* define paths to two directories that should have been created in first notebook
    * 'hex_shp' directory for ecosystem shapefiles
    * 'r_tif' directory for ecosystem rasters 
* verify the two directories are not empty
 
* If directories are missing or empty, prompt user to return to first notebook.

In [None]:
# Define a filepath to working directory
data_path = os.path.normpath(os.path.join(et.io.HOME, 
                                          'earth-analytics', 
                                          'data', 
                                          'kba_thresh_sa'))

# Check if 'kba_thresh_sa' directory exists.  
# If it does, make that the current working directory
if os.path.exists(data_path):
    print('Working directory set to earth-analytics/data/kba_thresh_sa.')
    os.chdir(data_path)
    
    # define a path to the hexfiles directory created in the 1st notebook
    shp_dir_path = os.path.normpath(os.path.join(data_path, 'hex_shp'))
    # check to see if it exists
    if os.path.exists(shp_dir_path):
        # check to see if it contains files
        shp_files = os.listdir(shp_dir_path)
        if len(shp_files) == 0:
            print("Empty 'hex_shp' directory, please add needed .shp files "
                  "to " + shp_dir_path)
    else:
        print("'hex_shp' directory missing, please run 1st notebook "
              "before proceeding")
            
    # define a path to the raster directory created in the 1st notebook
    tif_dir_path = os.path.normpath(os.path.join(data_path, 'r_tif'))
    # check to see if it exists
    if os.path.exists(tif_dir_path):
        tif_files = os.listdir(tif_dir_path)
        # check to see if it contains files
        if len(tif_files) == 0:
            print("Empty 'r_tif' directory, please add needed rasters to " 
                  + tif_dir_path)
    else:
        print("'r_tif' directory missing, please run 1st notebook "
              "before proceeding")
else:
    print("Please go to first notebook in workflow to set up initial "
          "directories")

##### Marxan executable will need to be manually downloaded and saved to 'earth-analytics/data/kba_thresh_sa'. 
A path to this location will be defined in the cell below; user should ensure 
the correct version is saved to the correct location with the correct name.

If using 
* Marxan v2.43 (currently preferred): save to 'earth-analytics/data/kba_thresh_sa' 
dir with filename 'Marxan_x64_243.exe'
* Marxan v4.06: save to 'earth-analytics/data/kba_thresh_sa' dir with filename 
'Marxan_x64.exe'

[link to marxansolutions.org to download v4.06 and v2.43](https://marxansolutions.org/software/)


##### A bit of background on 'target2'...

Multiple versions of Marxan have been developed since its inception in 2000, 
enabling the software to work with increasingly complex multi-species scenarios.
It appears that as development progressed in some areas, some earlier features
stopped being supported.  One such unsupported feature may be 
the 'target2' variable in the 'spec.dat' input file.  

This variable was set to define a minimum size of a selected conservation area, 
and allowed for multiple selections of cells (known as 'clumps') to be generated 
in a single Marxan run in order to meet a larger conservation goal.  This 
functioned nicely to define individual KBAs of a minimum size that could be added 
together to reach a larger overall conservation target.  For example 'target2' 
could be set to the IUCN KBA Threshold size for a Vulnerable ecosytem, which is 
10% of the overall ecosystem extent.  The larger overall target would be set at
30% of the total ecosystem extent. Setting the input parameters in this way would 
result in three clumps @ 10% size, which when added together would reach the 
30% conservation goal.

Multiple attempts were made during this course to find a way to work with 
'target2' using the more recent versions of Marxan, as these versions had more 
robust user manuals and seemed like they would be easier to work with. None of
these attempts were successful, with the Marxan output showing a succession
of zeros instead of calculated results.  

In the end, two workarounds were considered.  One was to rework the input files 
and workflow to match the format required by Marxan 1.8.10.  A second idea was to 
use while loops in the Python workflow developed for Marxan v2.43/v4.06 in order 
to generate multiple Marxan runs for a single test level (without using the 
'target2' variable in the 'spec.dat' input file).  This is what the current 
notebook is attempting to accomplish.

In [None]:
# Define path to the Marxan executable file that has been manually copied 
# over to the 'kba_thresh_sa' directory 

# v4.0.6 (might be causing 'target2' crash? use 2.43 instead)
marxan_path = os.path.join(data_path, "Marxan_x64.exe")

# v2.43 (spec.dat files with 'target2' field also crashed using v2.43, so it
# was decided to stop using 'target2'. Workflow currently uses v2.43 in run)
marxan_243_path = os.path.join(data_path, 'Marxan_x64_243.exe')

##### A table of information about the ecosystems is provided   

The workflow requires an associated table with information about the 
ecosystems to be analyzed. Currently our workflow uses the 'README' .csv file
provided along with the Landfire EVT 2020 raster.  This file has been manually 
uploaded to our GitHub repo as 'assets/data/from_LF_EVT_2020_README.csv'. 
The code below will read that file from its URL into a pandas dataframe, and then 
also save that dataframe locally as a csv. 

For testing purposes, the file was filtered down to 9 of the 856 ecosystem rows 
included in the full file.  Three ecosystems are selected for the initial test, 
and their supporting shapefiles and rasters have been uploaded to the data 
directory on GitHub in two separate subdirectories (one for shapefiles and the 
other for rasters). Lana has generated the individual shapefiles and rasters 
for the remaining six ecosystems included in the filtered README file using 
ArcGIS, but those files have not been uploaded to GitHub due to size constraints.

All ~50 columns of the original README file were kept in the file uploaded to 
GitHub.  These provide a range of information about the 9 ecosystem records, 
including their status as determined by the IUCN Red List of Ecosystems (which 
in turn defines the minimum size for a KBA to be established). One new column was 
added to the file before it was uploaded, assigning each ecosystem a one-word 
'Short_Name' that will be used to identify the ecosystem in this workflow for 
identification and filenaming purposes (ex. 'dome' for 'South Florida Cypress 
Dome', 'dune' for 'Southwest Florida Dune and Coastal Grassland', and 'mesic' 
for 'Crowley's Ridge Mesic Loess Slope Forest').   

#### *IN THE FUTURE -* 

Currently our code will work with the ecosystem raster and hex files that Lana 
created in ArcGIS using the ArcMarxan plugin.  Ultimately we hope to work directly
with the full Landfire EVT 2020 raster, but the file is proving too large to 
effectively manage with our personal laptops. A solution may be found using the 
2016 Landfire data which has an available API (the 2020 data is scheduled to be 
published to the API later this year). Other alternative solutions for working 
with large raster files might be found using Dask, or possibly Amazon Web Services 
to access additional processing capabilities.

If/When our code can generate shp and raster files for individual ecosystems, the 
'from_LF_EVT_2020_README.csv' file saved to our GitHub assets/data directory will 
need to be updated to the full version of the LF_2020_EVT_README file, after first 
assigning 'Short_Name' values to all 856 ecosystems in the file. Once that occurs, 
we could ask for user input to get entries matching the 'Short_Name' values in 
order to select specific ecosystems from the full Landfire EVT 2020 data. That user 
input would be assigned to the list variable 'short_name_filter'. 

If short-naming all 856 ecosystems proves to be cumbersome, the user could be 
prompted to provide the 'Value' values from the LF_EVT_2020_README (ex. 7447 for 
South Florida Cypress Dome), and then be prompted to provide the one word 
'Short_Name' for each selected value. The 'Value' entry would filter the df, and the 
'Short_Name' entry would be added to the resulting dataframe as a new column.
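
The 'Value'-based alternative could look something like the sketch below. It is only an illustration: the function name `filter_by_value` and the `value_to_short_name` dictionary are hypothetical, and the tiny dataframe stands in for the full README table (the second row and both 'US_km2' figures are placeholders, not real data).

```python
import pandas as pd


def filter_by_value(readme_df, value_to_short_name):
    """Keep rows whose 'Value' is in the mapping and add 'Short_Name'.

    Hypothetical helper: 'value_to_short_name' maps a Landfire EVT 'Value'
    (e.g. 7447) to the one-word short name supplied by the user.
    """
    wanted = list(value_to_short_name)
    subset = readme_df[readme_df['Value'].isin(wanted)].copy()
    subset['Short_Name'] = subset['Value'].map(value_to_short_name)
    return subset.set_index('Short_Name')


# Stand-in for the full README table (placeholder numbers, not real data)
demo_df = pd.DataFrame({
    'Value': [7447, 9999],
    'RLE_FINAL': ['VU', 'EN'],
    'US_km2': [100.0, 200.0],
})

demo_subset = filter_by_value(demo_df, {7447: 'dome'})
print(demo_subset)
```

The keys of the dictionary would become the filter, and its values would play the role of the 'short_name_filter' entries used elsewhere in the workflow.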

If a user decided not to use the LF_2020_EVT_README file as their table of 
information, any .csv with the following minimum requirements could be used.

Minimum required columns:
* 'Short_Name' (the one word short name assigned to the ecosystem)
* 'RLE_FINAL' (status in IUCN Red List of Ecosystems, using 2 letter abbreviated 
   format, ex. 'CR' for Critically Endangered, 'EN' for Endangered and 'VU' for Vulnerable)
* 'US_km2' (total extent of ecosystem in km2)

The existing code could be reused if the user were prompted for a URL where 
they have their table stored.

Otherwise:
1. Prompt user to save their table with the minimum required columns to 
'earth-analytics/data/kba_thresh_sa' as 'ecosystem_info.csv'  
2. Check for 'ecosystem_info.csv' in 'earth-analytics/data/kba_thresh_sa'  
3. If found, load to dataframe.  
   If not found, prompt user to "Save 'ecosystem_info.csv' to 
'earth-analytics/data/kba_thresh_sa' directory, then rerun notebook" 
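
A minimal sketch of those three steps is shown below, assuming the table is saved as 'ecosystem_info.csv'. The function name `load_ecosystem_info` and the `REQUIRED_COLUMNS` constant are hypothetical and are not part of the current workflow or of `kba_thresh_sa_scripts`.

```python
import os

import pandas as pd

# Minimum columns listed above
REQUIRED_COLUMNS = ['Short_Name', 'RLE_FINAL', 'US_km2']


def load_ecosystem_info(csv_path):
    """Load a user-supplied table and check its minimum required columns."""
    if not os.path.exists(csv_path):
        raise FileNotFoundError(
            "Save 'ecosystem_info.csv' to the "
            "'earth-analytics/data/kba_thresh_sa' directory, then rerun "
            "the notebook")
    info_df = pd.read_csv(csv_path)
    missing = [col for col in REQUIRED_COLUMNS if col not in info_df.columns]
    if missing:
        raise ValueError('Missing required columns: ' + ', '.join(missing))
    # Use 'Short_Name' as the index, as the rest of the workflow does
    return info_df.set_index('Short_Name')
```

In this notebook it might be called as `load_ecosystem_info(os.path.join(data_path, 'ecosystem_info.csv'))`, with the result assigned to `ecoinfo_df`.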

In [None]:
# Download the csv file stored on GitHub repository 
# (contains info on selected ecosystems taken from LF_EVT_2020_README 
# file, with an added 'Short_Name' field that is used as index)

# Provide the URL (using raw content at GitHub)
ecoinfo_url = ("https://raw.githubusercontent.com/csandberg303/"
               "kba-threshold-sensitivity-analysis/main/assets/data/"
               "from_LF_EVT_2020_README.csv")

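# Create local cache override variable
# (NOTE: 'True or ...' always evaluates to True, so the local copy is 
# rewritten on every run; change True to False to respect CACHE_OVERRIDE)
cache_override = True or CACHE_OVERRIDE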

# Provide the path to local directory
ecoinfo_path = os.path.normpath(
    os.path.join(data_path, 'from_LF_EVT_2020_README.csv'))

# Read csv at URL into pandas dataframe, using 'Short_Name' col as index
ecoinfo_df = pd.read_csv(ecoinfo_url).set_index('Short_Name')

# Save df as csv in local directory if the file is missing or the cache 
# override is set
if not os.path.exists(ecoinfo_path) or cache_override:
    ecoinfo_df.to_csv(ecoinfo_path)
    
ecoinfo_df

##### Two columns are added to the 'ecoinfo_df' dataframe
These will generate the appropriate KBA Threshold Value for each ecosystem, which 
is 5% or 10%, depending on its IUCN RLE Status. 

In [None]:
# Add column 'Type' to assign a number of 1 or 2 based upon the value in text 
# column 'RLE_FINAL' (Type = 1 if 'CR', 'CR (EN-CR)', 'EN' or 'EN (EN-CR)'; 
# Type = 2 if 'VU'; np.select assigns its default of 0 to any other value)

# create a list of conditions
type_conditions = [(ecoinfo_df['RLE_FINAL'] == 'CR'), 
                  (ecoinfo_df['RLE_FINAL'] == 'CR (EN-CR)'),
                  (ecoinfo_df['RLE_FINAL'] == 'EN'),
                  (ecoinfo_df['RLE_FINAL'] == 'EN (EN-CR)'),
                  (ecoinfo_df['RLE_FINAL'] == 'VU')]

# create a list of the values to assign for each condition
type_values = [1, 1, 1, 1, 2]

# create new column using np.select to assign values using lists as arguments
ecoinfo_df['Type'] = np.select(type_conditions, type_values)

# 2nd column - Add column 'Current_IUCN_TH'. Uses np.select to assign a 
# threshold percentage, based upon the column 'Type' (5% if 1, 10% if 2)

# create a list of conditions
current_threshold_conditions = [(ecoinfo_df['Type'] == 1), 
                               (ecoinfo_df['Type'] == 2)]

# create a list of the values to assign for each condition
current_threshold_values = [.05, .10]

# create new column using np.select to assign values using lists as arguments
ecoinfo_df['Current_IUCN_TH'] = np.select(
    current_threshold_conditions, current_threshold_values)

ecoinfo_df

##### The two cells below provide an opportunity for the user to edit variables to control the Marxan analysis run

The first of these cells sets the values of two list variables.

The 'test_threshold' list provides the basis for the sensitivity analysis of the IUCN KBA Thresholds.  The 'test_threshold' value of 1.0  will prompt Marxan to make selections based on the current IUCN Threshold size.  A 'test_threshold' value of 0.50 will mean Marxan will look to make cell selections at 50% of the Current IUCN Threshold size.  Our workflow is currently testing the IUCN Thresholds at four levels - 1.00, 0.75, 0.50 and 0.25.

Currently the 'eco_list' variable has been hard-coded to show the three ecosystems in our initial workflow test ('dome', 'dune' and 'mesic').  

The 'ecoinfo_df' is then filtered by the values in the 'eco_list'.

In [None]:
# CREATE LISTS THAT WILL BE USED LATER IN ITERATION LOOPS

# Create list of threshold values to test
test_threshold = [1.0, 0.75, 0.50, 0.25] 

# Define list variable 'eco_list' to show the 'Short_Name' values of the 
# ecosystems to be analyzed.  
# (NOTE: THESE INDIVIDUAL LINES CAN BE COMMENTED OUT, TO DESELECT THEM FROM 
# THE CURRENT MARXAN ANALYSIS RUN)
eco_list = [
    'dome',
    'dune',
    'mesic'
]
eco_list.sort()

# use 'eco_list' to create a new df with only matching records
eco_subset_df = ecoinfo_df.filter(items = eco_list, axis=0)

eco_subset_df

##### The second variable setting cell is where individual variable values are set

For more information on 'prop', 'spf', 'numreps', 'numitns', 'blm' and 'runmode', 
please refer to the User Manuals and Best Practices documentation available on the Marxan 
website (https://marxansolutions.org/software/) 

In [None]:
# DEFINE VARIABLES TO BE USED IN MARXAN RUN

# provide a testrun_basename (this value can be whatever the user deems 
# important, but should be kept as short as possible and include no spaces);  
# it will be appended to a timestamp when generating a directory name to 
# ensure each run will have a unique directory name.
testrun_basename = 'testrun'

# EPSG value to set as CRS for raster and shapefile
espg = '5070'

# Set 'prop' to show the desired final proportion of ecosystem extent that 
# should be selected by Marxan (must be between 0 and 1, currently the value
# is set at 0.3 for 30% of total extent) 
prop = 0.3

# Species Penalty Factor 
spf = 1

# Number of repeat runs (or solutions) - default value = 100
numreps = 100

# Number of iterations for annealing; default value = 1000000
# (more iterations require longer processing times)
# (NOTE: RUNMODE 1 & 3 did not complete successfully with numitns=10 or 1000)
numitns = 10000

# Boundary Length Modifier (default value from qmarxan code = 1)
blm = 1
    
# Runmode (determines annealing/heuristic properties to be used in Marxan run)
runmode = 1

With the variables set, the main work of the workflow can begin.  This will be done by setting up a series of iterative loops. The highest level loop will be for the three ecosystems, followed by loops set for each of the four threshold test values. 3 ecosystems x 4 test levels result in 12 'ecotest' loops.  

Within each of the 12 ecotest loops, there will need to be a series of internal loops to make multiple selections for a single ecotest. Each of these internal loops represents a single KBA selection. As such, each internal loop will require its own set of directories and input files, all in the required Marxan format.

The number of internal loops will be determined by three factors -
* the test level value from the 'test_threshold' list
* the size of the overall conservation target (expressed as a proportion of the total ecosystem extent)
* the current IUCN threshold of either 5% or 10%, which is determined by the ecosystem's 'Red List of Ecosystems' status (seen in the 'RLE_FINAL' column of the LF_EVT_2020_README file)

##### Setting the internal loop count
The table below shows how the loop counts are set for each ecotest.  The Number of Loops required will be equal to the Number of KBAs needed to reach the Conservation Target.  

For instance, the Dome ecosystem is listed as 'Vulnerable' so the Current IUCN KBA Threshold, or minimum size required for a KBA designation, would be 10% of the overall ecosystem extent.  The test at 1.00 of the current threshold would mean three KBAs will be needed to reach the overall conservation target of 30% (3 KBAs, each @ 10% of the total ecosystem extent would equal 30% of the total ecosystem extent when combined).  The loop count for this test will be 3.

Another example would be the dune ecosystem at the 0.25 test level.  The dune ecosystem is listed as Critically Endangered, so its current IUCN KBA Threshold level is 5%.  At the 0.25 test level, the Target2 size when the IUCN TH is 5% is just 1.25% of the total extent. This means that 24 KBAs would need to be identified to reach the 30% Conservation Target (24 KBAs @ 1.25% each = 30% total), so 24 loops will be needed for that test.
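
The arithmetic behind the two examples above can be sketched as follows. This is an illustrative helper only: the name `kba_loop_count` is hypothetical, and the actual workflow cells may arrive at the loop count in a different way.

```python
import math
from fractions import Fraction


def kba_loop_count(prop, current_iucn_th, test):
    """Number of KBA selections needed to reach the conservation target.

    Fractions built from the decimal strings avoid floating point
    artefacts such as 0.3 / 0.1 evaluating to 2.9999999999999996.
    """
    # Target2 expressed as a share of the total ecosystem extent
    target2_share = Fraction(str(current_iucn_th)) * Fraction(str(test))
    return math.ceil(Fraction(str(prop)) / target2_share)


# The two examples worked through above
print(kba_loop_count(0.3, 0.10, 1.0))   # dome (VU) at the 1.00 test level
print(kba_loop_count(0.3, 0.05, 0.25))  # dune (CR) at the 0.25 test level
```

Rounding up with `math.ceil` is a design choice for cases where the target is not an exact multiple of the Target2 size, so that the combined selections never fall short of the Conservation Target.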

| **RLE_Status** | **IUCN TH** | **Test** | **Target2 (IUCN TH x Test)** | **Conservation Target** | **Num of Loops** |
| :--- | ---: | ---: | ---: | ---: | ---: |
| **Vulnerable (VU)** | 10% | 1.00 | 10.0% | 30% | 3 |
| | 10% | 0.75 | 7.5% | 30% | 4 |
| | 10% | 0.50 | 5.0% | 30% | 6 |
| | 10% | 0.25 | 2.5% | 30% | 12 |
| | | | | | **TOTAL: 25** |
| **Critically Endangered (CR) or Endangered (EN)** | 5% | 1.00 | 5.0% | 30% | 6 |
| | 5% | 0.75 | 3.75% | 30% | 8 |
| | 5% | 0.50 | 2.5% | 30% | 12 |
| | 5% | 0.25 | 1.25% | 30% | 24 |
| | | | | | **TOTAL: 50** |


Each loop count is the Conservation Target divided by the Target2 size (for example, 30% / 7.5% = 4).  Our list of three ecosystems to be analyzed includes one 'Vulnerable' (dome) and two 'Critically Endangered/Endangered' (dune and mesic).  This means that each full analysis run will require 125 Marxan directories (25 + 50 + 50) to be set up, populated and analyzed.

##### Loop through the eco_list, create directories and input files needed by Marxan.

Each time the code below runs, a new timestamped directory is created. Inside will
be subdirectories created from the 'Short_Name' value of the selected ecosystems 
seen in the 'eco_list' variable.

Each of these ecosystem subdirectories will have the following named 
subdirectories -
* input - where files needed by the Marxan analysis are stored (bound.dat, pu.dat, 
puvsp.dat, spec.dat)
* output - where files generated by the Marxan analysis are stored
* pu - seen in the qmarxan setup (purpose tbd)
* report - seen in the qmarxan setup (purpose tbd)
* source_data - where the rasters and PU hex_shp files are stored, after they
are copied from the 'r_tif' and 'hex_shp' folders

A fifth input file 'input.dat' is created and placed in the main ecosystem 
directory.
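
The layout described above can be sketched with `pathlib` as shown below. This is a simplified illustration under stated assumptions: the helper name `make_run_tree` and the `SUBDIRS` constant are hypothetical, and the workflow cell itself builds its directories step by step with `os.makedirs` and `os.chdir`.

```python
import datetime
from pathlib import Path

# Subdirectories listed above for each ecosystem
SUBDIRS = ['input', 'output', 'pu', 'report', 'source_data']


def make_run_tree(base_dir, testrun_basename, eco_list):
    """Create a timestamped run directory with one subtree per ecosystem."""
    stamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
    run_dir = Path(base_dir) / (stamp + '_' + testrun_basename)
    for eco in eco_list:
        for sub in SUBDIRS:
            # parents=True also creates the run and ecosystem directories
            (run_dir / ('eco_' + eco) / sub).mkdir(parents=True,
                                                   exist_ok=True)
    return run_dir
```

Calling `make_run_tree(data_path, testrun_basename, eco_list)` would return the path of the new run directory, which could then be used to place 'input.dat' in each ecosystem directory.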

Of the five required input files, three are created in the workflow (input.dat, 
pu.dat and spec.dat).  The remaining two (bound.dat and puvsp.dat) have been 
created in ArcGIS using the ArcMarxan toolbox plugin (a parallel QMarxan plugin 
is available for QGIS). In these two cases, the workflow uses a function
to copy those files from their saved location in the GitHub repository and save
them to the appropriate input folder.

#### *IN THE FUTURE -* 
* Our project sponsor has said that the required set of Marxan input files is 
commonly prepared using GIS tools, due to the complexity of the spatial 
calculations involved.  If that is the practice, we will continue with it, but the 
process would be a bottleneck without automation.  It may prove useful to learn how to 
work with the Python console window in QGIS, to see if the QMarxan plugin could be 
integrated into the workflow. Creating the input files programmatically rather 
than manually within GIS may allow for easier manipulation of the files.

##### Call Marxan Executable within each loop.  
As the workflow iterates through the loops, information is output to the
screen in order to ensure the loops are progressing logically and are using 
accurate input parameters.

As Marxan completes a run, output files are generated and saved.  Some of this
output is needed as input for the next run, so a pause is generated to allow the 
file writing process to complete before the workflow attempts to access that 
information.  Two options are provided for managing the pause, either by manually 
pressing the 'Enter' key after seeing that each Marxan pop-up 
window has completed execution, or by setting a sleep timer to pause for 
slightly longer than the Marxan execution will take.  Setting the timer is 
less labor-intensive, but runs the risk of wasting time if set too long, or 
generating errors from incomplete files if set too short. 
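
One possible variation on the sleep timer, which is not what the workflow currently does, is to poll for the expected output file in short steps rather than sleeping for one fixed period. The helper name `wait_for_file` and its default values are assumptions made for this sketch.

```python
import os
import time


def wait_for_file(file_path, timeout_s=60.0, poll_s=0.5):
    """Return True once 'file_path' exists and is non-empty, else False.

    Polls every 'poll_s' seconds and gives up after 'timeout_s' seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(file_path) and os.path.getsize(file_path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A non-empty file is not proof that Marxan has finished writing it, so a short extra pause after the function returns True may still be sensible.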

Ideally, the workflow will find the 'best_run' output file generated by Marxan, 
in order to get the list of cells that were selected in that run.  That list of 
cells is used to update two files; one is used as the new input file of the next 
run and the other will keep track of all selected cells in the ecotest so that
each selection is able to be measured individually.  Doing this will mean that 
a cell can only be selected one time for each ecotest, and the remaining loops 
will be forced to find solutions using other cells that are still available.  
As a result, each ecotest will show multiple selections that each meet a minimum 
size requirement, and when added together would reach the overall Conservation 
Target set in the 'prop' variable.
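
The update step described above might be sketched as follows. The 'id', 'cost' and 'status' columns follow the 'pu.dat' layout used in this notebook, but the best-run column names ('PUID', 'SOLUTION') and the function name `record_selection` are assumptions, and the small tables are made-up data for illustration only.

```python
import pandas as pd


def record_selection(updated_pu_dat, pu_selected, best_run, loop_label,
                     id_col='PUID', solution_col='SOLUTION'):
    """Lock out cells chosen in the best run and record the choosing loop.

    'updated_pu_dat' and 'pu_selected' are indexed by planning unit 'id'.
    'id_col' and 'solution_col' are assumed column names for the best-run
    table; adjust them to match the actual Marxan output header.
    """
    chosen = best_run.loc[best_run[solution_col] == 1, id_col].tolist()
    updated_pu_dat = updated_pu_dat.copy()
    pu_selected = pu_selected.copy()
    updated_pu_dat.loc[chosen, 'status'] = 3  # 3 = locked out
    pu_selected.loc[chosen, 'selection'] = loop_label
    return updated_pu_dat, pu_selected


# Tiny made-up example: four planning units, the best run selects 2 and 4
pu_dat = pd.DataFrame({'id': [1, 2, 3, 4], 'cost': 1, 'status': 0})
updated_pu_dat = pu_dat.set_index('id')
pu_selected = pu_dat.set_index('id')
pu_selected['selection'] = 'not selected'
best_run = pd.DataFrame({'PUID': [1, 2, 3, 4], 'SOLUTION': [0, 1, 0, 1]})

updated_pu_dat, pu_selected = record_selection(
    updated_pu_dat, pu_selected, best_run, 'loop_1')
print(updated_pu_dat)
print(pu_selected)
```

Returning copies rather than editing in place keeps each loop's input tables unchanged until the caller reassigns them.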

In [None]:
# ********* NEW TEST FOR WHILE LOOP *********
# USE THIS CELL FOR MARXAN v4.06 AND MARXAN v2.43 (CURRENTLY USING 2.43)

# RUN THIS CELL TO BEGIN AUTOMATED WORKFLOW 
# (1ST CELL OF TWO - BEGIN MARXAN ANALYSIS)

# checks to see if a directory based upon provided 'testrun_basename' has 
# already been made. If so, a number will be added to the end of
# 'testrun_basename' before creating new directory (so that each named 
# directory will maintain a unique ID beyond the timestamp).
testrun_basename_count = 0
testrun_basename_glob = glob(os.path.join(
    data_path, '*' + testrun_basename + '*'))
testrun_basename_count = len(testrun_basename_glob)
if testrun_basename_count>0:
    testrun_basename = (testrun_basename + 
                        f"{(testrun_basename_count + 1):02d}")
print('testrun_basename_count = ' + str(testrun_basename_count))
print('testrun_basename: ' + testrun_basename)

# set new directory name, based upon timestamp and provided 'testrun_basename'
new_dir = os.path.normpath(
    os.path.join(data_path, datetime.datetime.now().strftime('%Y%m%d_%H%M%S') 
                 + '_' + testrun_basename))
os.makedirs(new_dir)
print(new_dir)

# Set 'heurtype' - Determined by runmode entry in input.dat 
# if RUNMODE = 3 then use heurtype = 1 (greedy), else -1 (not used)
# (NOTE: this variable is used for RUNMODE 3 only, and currently this
# multiloop workflow is using RUNMODE 1. Keeping it in notebook in case that 
# may ever change)
if runmode == 3:
    heurtype = 1
else:
    heurtype = -1
print('runmode: ' + str(runmode) +'\nheurtype: ' + str(heurtype))
print('threshold tests: ' + str(test_threshold))

selection_loop_ls = []
test_loop_ls = []
eco_loop_ls = []

### 1ST LOOP BEGINS HERE 

# LOOP THROUGH ECOSYSTEMS (in 'eco_list') 
for eco in eco_list:
    print('\nbegin ecoloop: ' + eco)
    os.chdir(new_dir)
    # create directory for each ecosystem selected for analysis
    os.makedirs('eco_' + eco)
    os.chdir('eco_' + eco)
    eco_data_path = os.path.normpath(os.path.join(new_dir, 'eco_' + eco))
    # create 'source_data' directory to store ArcGIS shp and tif files
    # THESE WILL BE USED BY EACH NEW TEST DIRECTORY CREATED IN THE 'eco' 
    # DIRECTORY
    os.makedirs('source_data')
    os.chdir('source_data')
    source_data_path = os.path.normpath(
        os.path.join(new_dir, 'eco_' + eco, 'source_data'))
    # copy source files that were stored locally to the 'hex_shp' and 'r_tif' 
    # directories after running 1st notebook. Our workflow is currently  
    # using the files Lana created manually using ArcGIS
    ks.get_source_files_targetloops(os.path.join(data_path, "hex_shp"), eco)
    ks.get_source_files_targetloops(os.path.join(data_path, "r_tif"), eco)
     
    # create 'orig_input_files' directory to store the 5 .dat files for the 
    # ecosystem so that each Marxan analysis loop will pick them up from this 
    # location (since much info for the analysis runs will remain constant as 
    # the KBA threshold size is tested) 
    os.chdir(eco_data_path)
    os.makedirs('orig_input_files')
    os.chdir('orig_input_files')
    orig_input_data_path = os.path.normpath(os.path.join(data_path, 
                                                         new_dir, 
                                                         'eco_' + eco, 
                                                         'orig_input_files'))
    
    # CREATE INPUT FILES THAT WILL REMAIN CONSTANT DESPITE TEST LEVELS 
    # (pu.dat, puvsp.dat, bound.dat).  
    
    # CREATE INITIAL PU.DAT FROM ORIGINAL FUNCTION 
    # Provides a record of each planning unit hex cell in the .shp file,  
    # using a default uniform cost of '1', and a status of '0' which 
    # indicates that unit is available to Marxan for selection. As the loops 
    # continue until set proportion target of 30% overall extent is reached, 
    # this pu.dat file will be updated so that selected cells will show a 
    # status value of '3' for unavailable/locked-out.
    ks.create_pu_dat_targetloops(eco, 
                                 eco_data_path)  
    orig_pu_dat_path = os.path.normpath(os.path.join(
        orig_input_data_path, 'pu.dat'))
    pu_dat = pd.read_csv(orig_pu_dat_path)
    
    # THEN CREATE 2 ADDITIONAL DATAFRAMES BASED OFF THE INITIAL 'pu_dat' FILE

    # 1) 'pu_selected' 
    # This df will be used to keep an overall record of which cell was 
    # selected in each loop, so that each loop's selection can be seen and 
    # measured independently.
    pu_selected = pu_dat.set_index('id')
    # Set initial value of 'selection' column to 'not selected'
    pu_selected['selection'] = 'not selected'
    
    
    # 2) 'updated_pu_dat'
    # This df will be used to track selected cells as loops progress, so that 
    # those cells will be locked out of selection in future loops.
    # (THIS BECOMES THE NEW 'pu.dat' INPUT FILE IN FUTURE LOOPS)
    updated_pu_dat = pu_dat.set_index('id')

    
     
    # USE 'get_marxan_input_files_targetloops' FUNCTION TO COPY IN ANY 
    # REMAINING .DAT FILES THAT ARE CREATED IN ArcGIS/QGIS RATHER THAN PYTHON.
    # This function currently is used for 'bound.dat' and 'puvsp.dat'.
    # It will copy files that have been created using the ArcMarxan tool 
    # in ArcGIS then saved to the repository.
    ks.get_marxan_input_files_targetloops(eco, 
                                          ['bound.dat', 
    #                                     "pu.dat", 
                                           'puvsp.dat', 
    #                                     "spec.dat"
                                          ])
    bound_dat_path = os.path.normpath(os.path.join(
        orig_input_data_path, 'bound.dat'))
    bound_dat = pd.read_csv(bound_dat_path)
    puvsp_dat_path = os.path.normpath(os.path.join(
        orig_input_data_path, 'puvsp.dat'))
    puvsp_dat = pd.read_csv(puvsp_dat_path)
    
    # THE MAIN LOOP WILL BEGIN HERE, STARTING IN THE 'eco' DIRECTORY
    os.chdir(eco_data_path)
    
    # create empty list variable that will be used to collect summary info 
    # from loops
    select_summary_ls = []

    # LOOP THROUGH EACH KBA THRESHOLD SIZE TEST 
    # (these are the values set earlier in the 'test_threshold' list)
    for test in test_threshold:
        print('\n  Begin test loop: ' + eco + ' test ' + str(test))
       
        # get Current IUCN KBA Threshold for ecosystem (10% VU, 5% CR or EN)
        current_iucn_th = eco_subset_df.at[eco,'Current_IUCN_TH']
        # get ecosystem extent in km2
        us_km2 = eco_subset_df.at[eco,'US_km2']
        # convert ecosystem extent in km2 to ecosystem extent in m2
        us_m2 = us_km2 * 1000000
        # set 'target2' variable to equal the minimum size requirement (in m2) 
        # for a KBA designation within given ecosystem, at given test level
        # (target2 = ecosystem extent in m2 x Current_IUCN_TH x test level)
        target2 = us_m2 * current_iucn_th * test
        target2 = round(target2)

        # set the 'target' variable used in spec.dat file to equal 'target2'
        # (Doing this will prompt Marxan to find a selection of this size,
        # once for each time the executable is called within a loop)
        target = target2
                
        # create Scenario ID from 'eco' & 'test' variable values
        # (this will be used as prefix in filenames, so any '.' that exists in 
        # 'test' variable will be removed) 
        scen_id = eco + str(test).replace('.', '') + '_run'
               
        print('  ' + scen_id + ': target2 (km2) = ' + str(target2/1000000))
         
        # CREATE INPUT FILES IN THE 'FOR TEST IN TEST THRESHOLD' LOOP WHEN 
        # THEY REQUIRE INFORMATION AT THE TEST LEVEL (input.dat, spec.dat)
             
        # CREATE 'input.dat' FILE USING FORMULA ADAPTED FROM 'qmarxan_toolbox' 
        # (including the 'formatAsME' format as Marxan Exponent function)
        # Some input parameters are provided to the formula, to replace the 
        # default values that were provided in the qmarxan code. 
        ks.create_input_dat(orig_input_data_path, 
                            blm, 
                            numreps, 
                            numitns, 
                            runmode, 
                            heurtype, 
                            scen_id)
        input_dat_path = os.path.normpath(os.path.join(
            orig_input_data_path, "input.dat"))
        input_dat = pd.read_csv(input_dat_path)
        
        # CREATE THE 'spec.dat' FILE FROM v4 FORMULA (includes 'target' only)
        os.chdir(orig_input_data_path)
        ks.create_spec_dat_v4_targetloops(eco_subset_df, eco, target, spf)
        spec_dat_path = os.path.normpath(os.path.join(
            orig_input_data_path, 'spec.dat'))

        # Print initial info statement for test loop and begin creating the 
        # needed directories
        os.chdir(eco_data_path)
        os.makedirs(scen_id) 
        ecotest_data_path = os.path.normpath(os.path.join(data_path, new_dir, 
                                                          'eco_' + eco, 
                                                          scen_id))
        # create path for 'pu_selected' file in 'scen_id' directory
        # (this file will store information from the 'pu_selected' df created
        # earlier, which allows for an overall record to be kept showing which 
        # cell was selected in each loop of a test, so that each selection can 
        # be seen and measured independently)
        pu_selected_path = os.path.normpath(os.path.join(
            new_dir, 
            'eco_' + eco,
            scen_id,
            scen_id + '_pu_selected.csv'))
        
        # create path for 'updated_pu_dat' file in 'scen_id' directory
        updated_pu_dat_path = os.path.normpath(os.path.join(
            new_dir, 
            'eco_' + eco, 
            scen_id, 
            scen_id + '_updated_pu.dat'))

        # SET 'end_count' VALUE TO END MULTILOOP
        # Based upon given 'prop' value of 30% and 'Current_IUCN_TH' 
        # 1st ex: if prop = 30% & eco is VU(KBA 10%); then 3 x 10% KBA = 30% 
        # end_count = 3 (@ 1.0 test), 6 @ 0.50 test and 12 @ 0.25 test
        # 2nd ex: if prop = 30% & eco is CR/EN(KBA 5%); then 6 x 5% KBA = 30% 
        # end_count = 6 (@ 1.0 test), 12 @ 0.50 test and 24 @ 0.25 test
        end_count = round(prop/(current_iucn_th * test))
        print('  ' + scen_id + ': end count = ' + str(end_count) + '\n')
        
        # initialize 'loop_count' variable to 1
        loop_count = 1
        
        # BEGIN MULTILOOP FOR EACH TEST IN EACH ECOSYSTEM DIRECTORY        
        while loop_count <= end_count:
            # create string of loop count, to allow for alphabetization of 
            # the resulting filenames
            loop_count_str = 'loop_' + str(f"{loop_count:02d}")
            print('    ' + scen_id + " " + loop_count_str + ' | end_count = ' 
                  + str(end_count))
            
            # begin loop in the 'ecotest' directory (ex. 'mesic025')
            os.chdir(ecotest_data_path)
            # create directory for loop, to store Marxan input/output files
            os.makedirs(loop_count_str)
            loop_count_path = os.path.normpath(os.path.join(
                ecotest_data_path, loop_count_str))
            os.chdir(loop_count_path)
            
            # COPY IN THE INPUT FILE FROM 'orig_input_files' DIRECTORY
            shutil.copy(input_dat_path, os.getcwd())

            # CREATE INPUT DIRECTORY
            # which is where the four remaining .dat files will be stored
            os.makedirs('input')
            eco_input_data_path = os.path.normpath(os.path.join(
                ecotest_data_path, loop_count_str, 'input'))
            os.chdir(eco_input_data_path)
            
            # COPY IN THE 3 DAT FILES THAT WILL REMAIN UNCHANGED AS LOOPCOUNT
            # PROGRESSES (bound.dat, puvsp.dat and spec.dat) 
            unchanged_dat_files = (bound_dat_path, 
                                   puvsp_dat_path, 
                                   spec_dat_path)
            for file in unchanged_dat_files:
                shutil.copy(file, os.getcwd())
        
            # GET APPROPRIATE 'pu.dat' FILE FOR LOOP_COUNT
            # This is where the 'loop_count' variable determines if the  
            # original 'pu_dat' file should be used (if loop 1), or if the 
            # 'updated_pu_dat' file generated from the previous loops should  
            # be used (for loops 2-End) 
            if loop_count == 1:
                shutil.copy(orig_pu_dat_path, os.getcwd())
                pu_dat = pd.read_csv(orig_pu_dat_path)
            else:
                shutil.copy(updated_pu_dat_path, os.getcwd())
                os.rename(scen_id + '_updated_pu.dat','pu.dat')
            
            # create remaining directories
            os.chdir(loop_count_path)
            os.makedirs('output')
            os.makedirs('report')
            os.makedirs('pu')      

            # BEGIN MARXAN ANALYSIS RUN
            print('    ' + scen_id + " " + loop_count_str + 
                  ': MARXAN ANALYSIS INITIATED')   
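            # call on marxan executable (currently using v2.43)
            # NOTE: os.startfile is Windows-only and returns immediately 
            # without waiting for Marxan to finish, hence the pause below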
            os.startfile(marxan_243_path)
            
            # DEFINE A PAUSE FOR MARXAN EXECUTION, USING ONE OF TWO METHODS
            # This is needed to allow Marxan time to finish writing 
            # output files before the workflow tries to locate them            
            # NOTE: ONE OPTION MUST BE COMMENTED OUT BEFORE RUNNING THE CELL
            # SO THAT IT WILL BE IGNORED DURING WORKFLOW 
            # (To do this, highlight rows of the option that should not be 
            # used, and hit 'CTRL + /' to toggle that selection to show green 
            # text lines beginning with #.  Pause Option 1 is currently 
            # inactive, and the sleep timer is set by 'sleeptime' below)

#             # PAUSE OPTION 1: Hit Enter to Continue 
#             # Wait for Marxan pop-up execution to complete, then press 'Enter'
#             # at prompt in screen output window after 'The End' is seen
#             # (overall quickest, but requires attention)
#             print("Wait to see 'The End' at bottom of Marxan execution "
#                   "pop-up before pressing Enter")
#             input("Press the <ENTER> key to continue...")

            # PAUSE OPTION 2: Set sleep timer length
            # Define a sleep timer so that Python will simply count down that 
            # number of seconds before moving on. Need to ensure that the 
            # sleep time set > filewriting/execution, or errors in reading 
            # output files will occur
            # (automates the workflow, but takes longer time overall)
            # set 'sleeptime' variable for length of pause in seconds
            # (for my system, I use 20 seconds when numitns = 1000000 and 
            # 2 seconds when numitns = 10000)
            sleeptime = 5
            print('    time.sleep(' + str(sleeptime) + ') applied to pause '
                  'workflow execution for ' + str(sleeptime) + ' seconds '
                  'while Marxan output files are written')
            time.sleep(sleeptime) 
          
          
            # WHEN MARXAN COMPLETES, GET BEST RUN SOLUTION AND USE ITS 
            # 'SOLUTION' COLUMN VALUES TO UPDATE THE 'pu_selected' AND 
            # 'updated_pu_dat' dataframes

            # First check for output files, to see if the run had errors
            # open '_best' file created by Marxan and saved to 'output' dir
            globfile_best = glob(os.path.normpath(os.path.join(
                ecotest_data_path, loop_count_str, 'output', '*_best.csv')))            
            # if no file is found, print error message to screen
            if not globfile_best:
                print(scen_id + ": ERROR: '_best' output file not "
                      'found - check output/log. \nWill need to '
                      'resolve error and rerun Marxan if final '
                      "output files haven't completed successfully")
            else:
                # Create list of selected cells from 'best_run' output file
                best_run_file = pd.read_csv(globfile_best[0])
                selected_df = best_run_file[best_run_file['SOLUTION'] == 1]
                selected_cells = selected_df['PUID'].tolist()
                print('    ', selected_cells)
                for puid in selected_cells:
                    # Update the status of those cells in 'pu_selected' df's 
                    # 'selection' column, to show in which run they were 
                    # selected
                    pu_selected.at[puid, 'selection'] = ('Select_' + 
                                                         loop_count_str)
                    # Update the status of those cells in 'updated_pu_dat' 
                    # df's 'status' column, from '0'-available to 
                    # '3'-unavailable/locked-out
                    updated_pu_dat.at[puid, 'status'] = 3
                    
                # save updated 'updated_pu_dat' file for next loop
                updated_pu_dat.to_csv(updated_pu_dat_path)
        
            # add +1 to 'loop_count' for the next pass of the 'while' loop
            # (the end actions below still use this pass's 'loop_count_str')
            loop_count = loop_count+1
            
        
            print('    ' + scen_id + ' ' + loop_count_str + ': start of ' + 
                  scen_id + ' ' + loop_count_str + ' end actions')
            # add additional summary info to 'pu_selected' df
            # THIS SHOULD OCCUR AT END OF ECOTEST, NOT AT END OF LOOP
            pu_selected['dir_path'] = new_dir
            pu_selected['Short_Name'] = eco               
            pu_selected['current_test_level'] = test
        #         pu_selected['test_loop'] = test_loop_str
        #                 pu_selected['lc = selection? clump from lc']
            pu_selected['Current_IUCN_TH'] = current_iucn_th
            pu_selected['US_km2'] = us_km2
            pu_selected['US_m2'] = us_m2
            pu_selected['30% of US_m2'] = us_m2*prop
            pu_selected['KBA @ current test (m_2)'] = target2
            pu_selected['KBA @ current test (km_2)'] = target2/1000000
            pu_selected['BLM'] = blm
            pu_selected['SPF'] = spf
            
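            # Append a copy of 'pu_selected' to 'selection_loop_ls' list, so 
            # that all information from the selection loop will be 
            # incorporated into the '_initial_loop_summary.csv'
            # (a copy is appended because 'pu_selected' is updated in place 
            # on every pass; appending the df itself would store repeated 
            # references to the same, final-state object)
            selection_loop_ls.append(pu_selected.copy())
            print('    ' + scen_id + ' ' + loop_count_str + 
                  " selection loop info appended to selection_loop_ls\n")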
        
        # Concatenate 'selection_loop_ls' and append 'sel_loop_df' to 
        # 'test_loop_ls' list, so that all information from the test loop
        # will be incorporated into the '_initial_loop_summary.csv'
        sel_loop_df = pd.concat(selection_loop_ls)
        test_loop_ls.append(sel_loop_df)
        print(scen_id + '  ' + 
              "test loop info appended to test_loop_ls")
                 

        # Once 'end_count' is reached ('loop_count' = 'end_count'), save the 
        # 'pu_selected' df as a .csv file 
        pu_selected.to_csv(pu_selected_path) 

        print('' + scen_id + ' info from ' + scen_id  
              + ' will be added to final summary\nEnd testloop: ' 
              + scen_id + '\n')

        
    # Concatenate 'test_loop_ls' and append 'test_loop_df' to 
    # 'eco_loop_ls' list, so that all information from the eco loop
    # will be incorporated into the '_initial_loop_summary.csv'
    test_loop_df = pd.concat(test_loop_ls)
    eco_loop_ls.append(test_loop_df)
    print(eco + '  ' + "eco loop info appended to eco_loop_ls\nEnd of " + 
          eco + ' loop\n')

# Concatenate 'eco_loop_ls' as 'testrun_basename_df', and save  
# information from all ecosystem, test and selection loops to 
# '_initial_loop_summary.csv' 
testrun_basename_df = pd.concat(eco_loop_ls)
testrun_basename_df.to_csv(os.path.normpath(os.path.join(
    new_dir, testrun_basename + '_initial_loop_summary.csv')))
print(testrun_basename + '  ' + 'all info should be concatenated into ' + 
      testrun_basename + '_initial_loop_summary.csv')
    
print('End of ' + testrun_basename + ' initial workflow loop\n')
      
# # save info from loop stored in 'select_summary_ls' to '_final_summary.csv'
# final_summary_df = pd.concat(select_summary_ls)

# final_summary_df.to_csv(os.path.normpath(
# os.path.join(new_dir, testrun_basename + 'final_summary.csv')), index=False)
      
# print(testrun_basename + "'final_summary.csv' saved to " + new_dir)
        
os.getcwd()
print('\nloop completed successfully')


##### Inserting a line to break execution between workflow loops
The cell below contains only an undefined name, so running it raises a NameError and 
stops execution at this point. This allows time to review the screen output for any 
issues that would prevent successful execution of the second looping cell, which 
collects summary information and generates plots. 

(To continue with execution of the final cell, click the arrow to the left of the cell.)
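
A more explicit way to stop between the two looping cells is to raise a named exception with a message. The sketch below is only an illustration: `StopExecution` and `pause_for_review` are hypothetical names that do not exist elsewhere in this workflow.

```python
class StopExecution(Exception):
    """Raised on purpose to halt execution between workflow stages."""


def pause_for_review(message="Review the screen output, then run the next cell."):
    # Raising an exception stops the notebook at the calling cell; later
    # cells run only when they are started by hand.
    raise StopExecution(message)
```

Calling `pause_for_review()` as the only line of the break cell would stop a 'Run All' in the same way the undefined name does, while showing a readable message instead of a NameError.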

In [None]:
break_here_for_pause

##### Final cell will loop through the output to generate plots and summary csvs
A plot will be made for each of the 12 ecotest loops, showing the selections made in the 
'best_run' solution. A heatmap plot will also be generated for each ecosystem, which shows
each hexcell's proportion of the total ecosystem extent.  These 15 plots are saved 
individually as .png files, and also combined into a single 'combined plots' pdf.

#### *IN THE FUTURE -* 

**Most importantly, I've noticed how the selection plots only seem to work for the 
test @ 1.00, as other test levels will show an overall selection of more than 30% 
total extent.  I trust there's something in the calculations or input files that 
is off and can be easily corrected once discovered.  I've been focusing on the 
looping logic recently and have just begun to review the results of those loops.  
I would also like to work with the Legend of the selection plot, so that it shows 
outside the map area and can possibly include the size in (km2) of each selected 
area.**

Work still needs to be done to see if adjusting any of the input variables will
help to improve how Marxan makes its selections.  As it 
currently stands, Marxan is not consistently finding cohesive clumps with each 
internal loop.  A selection may be the right size, but when the information is 
plotted it shows as multiple parts, rather than one single connected shape.  This 
is an issue because a selection that is not connected as a single unit of a minimum 
size cannot be designated as a KBA, since it is not recognized as a manageable 
conservation area. If no solution for this problem can be found by adjusting the 
input parameters, perhaps the output files can be reviewed to see if they provide 
any markers or flags to indicate which selections are fully connected.  One thought
is to look at the geometry columns of the selections, to see if they may identify
which selections appear to be grouped in close proximity and which are spread out
across a distance.  If such a solution can be found, then any solutions that don't 
show these markers can be filtered out.
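
One possible way to flag connected selections, sketched here rather than taken from the workflow, is to count connected groups among the selected planning units using shared-boundary pairs. The sketch assumes such pairs are available as `(id1, id2)` tuples, for example from the `id1` and `id2` columns of Marxan's `bound.dat`. The function name `count_clumps` and the toy data are hypothetical.

```python
def count_clumps(selected_ids, boundary_pairs):
    """Count connected groups among the selected planning units.

    selected_ids: iterable of planning-unit ids chosen in one loop
    boundary_pairs: iterable of (id1, id2) tuples for units sharing an edge
    """
    selected = set(selected_ids)
    parent = {pu: pu for pu in selected}

    def find(pu):
        # follow parent links to the root, compressing the path on the way
        while parent[pu] != pu:
            parent[pu] = parent[parent[pu]]
            pu = parent[pu]
        return pu

    for id1, id2 in boundary_pairs:
        # only edges with both ends inside the selection join two units
        if id1 in selected and id2 in selected:
            parent[find(id1)] = find(id2)

    return len({find(pu) for pu in selected})


# toy layout: hexes 1-2-3 touch in a row, hex 7 touches only hex 8
pairs = [(1, 2), (2, 3), (7, 8)]
print(count_clumps([1, 2, 3], pairs))   # hexes 1, 2 and 3 form one clump
print(count_clumps([1, 2, 7], pairs))   # hex 7 is cut off from hexes 1 and 2
```

A selection whose count is greater than one is split into separate parts and could be filtered out before plotting. In the notebook the pairs might be built with `zip(bound_dat['id1'], bound_dat['id2'])`, assuming those column names match the file.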

More analysis of the final summary is needed, to delve into the details and 
provide summary pivots of what is most useful. For instance, the selection plots 
would benefit by showing the measured area of each selection to help validate 
whether or not the Marxan run has found a viable solution.  
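
As a rough sketch of that measurement, the area of each selection can be summed from the per-hexcell `amount` values and compared with the KBA size requirement. The table below is toy data standing in for the merged shapefile table, and the `target2` value is a made-up number for illustration. The column names and the 'not selected' label follow those used in the workflow.

```python
import pandas as pd

# toy stand-in for the merged table: one row per hexcell, with the loop that
# selected it and the ecosystem area (m2) it contains
cells = pd.DataFrame({
    'selection': ['Select_loop_01', 'Select_loop_01', 'Select_loop_02',
                  'not selected'],
    'amount': [4_000_000, 6_000_000, 9_500_000, 3_000_000],
})
target2 = 10_000_000  # hypothetical KBA size requirement in m2

# sum the ecosystem area of the hexcells chosen in each selection loop
chosen = cells[cells['selection'] != 'not selected']
area_by_loop = chosen.groupby('selection')['amount'].sum().to_frame('area_m2')
area_by_loop['area_km2'] = area_by_loop['area_m2'] / 1_000_000
area_by_loop['meets_target'] = area_by_loop['area_m2'] >= target2
print(area_by_loop)
```

The `area_km2` values could then be added to the legend labels or figure title of the selection plot.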

In recent weeks there was a change in direction with this project, abandoning the 
attempt to get the 'target2' variable to function as it's described and instead 
trying to replicate its effect by using loops.  While it's been very rewarding to 
begin to see multiple selections on a single plot, the original looping logic was 
altered significantly in the process, which caused havoc with the csv summaries. 
Marxan provides a voluminous amount of information in its output files, beyond 
what has been collected in the loops and saved to the final summary csvs created
in this workflow.  The challenge of how to put it to best use can be significant.  
The Marxan output files need a closer review, and the workflow-generated .csv 
files also need to be reviewed/validated to be sure they are showing the best 
available information.  Ideally the workflow csv created in each of the looping
cells will be combined into a single file.

This is certainly a work in progress, rich with opportunities for refinements 
both in the workflow itself with the interplay of QGIS and Python, the output 
summaries and plots, and also the review of the output to determine if changes to 
the input parameters could be beneficial in achieving better results. It's a bit 
unfortunate that the learning curve of this project did not dovetail neatly with 
the required timeline of the Earth Analytics Summer 2022 course. There have been 
significant gains despite the tradeoffs in the past few weeks. I will continue 
to work on this project after class ends, and am excited to see where it may go.



In [None]:
# 2ND PART OF CURRENT WORKFLOW - MARXAN 2.43 or 4.06 ONLY

# THIS WILL MERGE INFORMATION FROM THE 'pu_selected' FILE CREATED IN 1ST 
# WORKFLOW TO THE HEX SHAPEFILE, IN ORDER TO BE ABLE TO SHOW WHICH HEXES WERE 
# SELECTED.  THEN THE 'puvsp.dat' INPUT FILE WILL BE MERGED TO THE SHAPEFILE, 
# TO PROVIDE THE AMOUNT OF ECOSYSTEM (in m2) CONTAINED IN EACH HEX.  THIS 
# ALLOWS THE AREA OF THE SELECTION TO BE MEASURED AND PLOTTED, TO DETERMINE 
# IF THE SOLUTION MEETS REQUIREMENTS.  SUMMARY INFORMATION FROM THE RUNS WILL 
# ALSO BE COLLECTED AND SAVED TO A .CSV FILE FOR FURTHER ANALYSIS.

# Create empty lists outside the loop to store information:
# for plot images 
plot_im_list = []

# for info from shapefile, after it's merged with 'puvsp.dat' and 'pu_selected'
test_loop_ls = []


os.chdir(new_dir)
print('\nnew_dir ' + new_dir)

# Define 'ecotestdirs' glob list, to find all directories ending in '*_run' 
# (ex. 'dome025_run')
ecotestdirs = sorted(glob(os.path.join(data_path, '*' + testrun_basename, '*', 
                                       '*_run')))
print('ecotestdirs include ' + str(len(ecotestdirs)) + ' directories')
print('ecotest_data_path ' + ecotest_data_path)

count = 0

for ecotestdir in ecotestdirs:

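    # take 'scen_id' from the directory being looped over ('count' is only 
    # incremented when a 'pu_selected' file is found, so it can fall behind)
    scen_id = os.path.split(ecotestdir)[1]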
    print('\nBegin loop for scen_id: ' + scen_id)
    os.chdir(ecotestdir)
    print(ecotestdir)
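    # derive 'eco' from 'scen_id': drop the digits of the test level, then 
    # drop the trailing '_run' (ex. 'dome025_run' -> 'dome')
    get_eco = (''.join([i for i in scen_id if not i.isdigit()]))
    eco = get_eco[:-len('_run')]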
    eco_data_path = os.path.normpath(os.path.join(new_dir, 'eco_' + eco))
    ecotest_data_path = os.path.normpath(os.path.join(new_dir, 'eco_' + eco, 
                                                      scen_id))
    
    # create 'selected_plot' dir if it doesn't already exist
    # (this is where the plots showing selected hexes will be stored)
    selected_plot_dir_path = os.path.normpath(os.path.join(ecotestdir, 
                                                          'selected_plot'))
    os.makedirs(selected_plot_dir_path, exist_ok=True)
    os.chdir(selected_plot_dir_path)
    print('cwd ' + os.getcwd())
   
    # try to open 'pu_selected' file created in first workflow loop cell 
    globfile_selected = glob(os.path.normpath(
        os.path.join(ecotestdir, '*pu_selected*')))
    if not globfile_selected:
        print(scen_id + ": ERROR: 'pu_selected' file not found")
    else:
        print('globfile_selected contains: ' + globfile_selected[0])
        # If found, merge 'pu_selected' with reprojected copy of the shp file
        # check if reprojected shp in 'source_data' already exists; create if
        # not found
        shp_layer_crs_path = os.path.normpath(os.path.join(
            new_dir, 
            'eco_' + eco, 
            'source_data', 
            eco + "_espg_" + espg +'.shp'))
        print('shp_layer_crs_path: ' + shp_layer_crs_path)
        print('eco: ' + eco)
        if glob(os.path.normpath(os.path.join(
            new_dir, '*', 'source_data', eco + "_espg_" + espg +'.shp'))):
            print('reprojected shp file check = PASS')
        else:
            # open the original shp file from 'eco_data_path/source_data' 
            orig_shp_data_path = glob(os.path.join(new_dir, '*', 
                                                   "source_data", 
                                                   eco + '.shp'))[0]
            print('\n reprojecting source shapefile;\norig_shp_data_path ' 
                  + orig_shp_data_path)
            orig_shp_layer = gpd.read_file(orig_shp_data_path)
            # reproject CRS of shp
            shp_layer_crs = orig_shp_layer.to_crs(epsg=espg)
            # create new .shp file
            shp_layer_crs.to_file(shp_layer_crs_path, index=False)

        # open reprojected shp layer and prepare to merge with other files
        merged_shp = gpd.read_file(shp_layer_crs_path)
        # merge reprojected shp file with 'pu_selected' & 'puvsp_dat' dfs                
        # add 'id' column to enable merge with other files
        # ('id' is kept as a column, since it is selected by name below)
        merged_shp.insert(0, 'id', range(1, 1 + len(merged_shp)))
        # get 'pu_selected' file from 'globfile_selected' list
        pu_selected_path = globfile_selected[0]
        pu_selected = pd.read_csv(pu_selected_path).set_index('id')
        # merge 'pu_selected' to shp layer (adds 'select' column, & more *)
        merged_shp = merged_shp.merge(pu_selected, on='id')
        # open 'puvsp.dat' from input directory
        puvsp_path = glob(os.path.normpath(os.path.join(ecotestdir, '*', 
                                                        'input', 
                                                        'puvsp.dat')))[0]
        # merge with shp layer to get 'amount' from puvsp
        puvsp_dat = pd.read_csv(puvsp_path)
        puvsp_dat = puvsp_dat.rename(columns={'pu': 'id'}).set_index('id')
        merged_shp = merged_shp.merge(puvsp_dat, on='id')
        
        # use 'amount' value to calculate 'percent_of_total' 
        # (the proportion of total ecosystem extent found in each hexcell)
        merged_shp['percent_of_total'] = (
            merged_shp['amount']/merged_shp['US_m2'])
        # save merged shapefile as new file
        # check if file already exists, if not create it
        merged_shp_layer_path = os.path.normpath(os.path.join(
            ecotest_data_path,
            'selected_plot', 
            scen_id + "_merged.shp"))
        if os.path.exists(merged_shp_layer_path):
            print(scen_id + "_merged.shp file check = PASS")
        else:
            # save merged shp with add'l 'selected' info as new shape file
            # THIS WOULD BE THE TIME TO CHECK FOR COLUMN NAMES >10 CHARS
            merged_shp.to_file(merged_shp_layer_path, index=False)
            print (scen_id + ': ' + eco + '.shp merged with ' + scen_id + 
                   " 'pu_selected' and 'puvsp.dat', saved as " + scen_id +
                   "_merged.shp") 
        # verify shp file exists, and print update to screen
        if os.path.exists(merged_shp_layer_path):
            print(scen_id + " : merged shapefile saved to 'selected_plot' "
                  "directory") 
        else:
            print(scen_id + (": Error: merged shapefile was not able"
               " to be saved"))

        # check if reprojected tif in 'source_data' exists, if not create it
        tif_layer_crs_path = os.path.normpath(os.path.join(
            new_dir, 'eco_' + eco, 'source_data', 
            eco + "_espg_" + espg +'.tif'))
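        if os.path.exists(tif_layer_crs_path):
            print('reprojected tif file check = PASS')
            # open the existing reprojected tif, so that 'tif_layer_crs' is 
            # defined for plotting even when no reprojection is needed
            tif_layer_crs = rxr.open_rasterio(tif_layer_crs_path, 
                                              masked=True).squeeze()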
        else:
            # open the tif file saved at 'eco_data_path/source_data' location
            tif_data_path = os.path.join(new_dir, 'eco_' + eco, 'source_data', 
                                         eco + '.tif')
            tif_layer = rxr.open_rasterio(tif_data_path, 
                                          masked=True).squeeze()
            # reproject CRS of tif - 
            # first create a rasterio crs object
            crs_espg = CRS.from_string('EPSG:' + espg)
            # then reproject tif using the crs object
            tif_layer_crs = tif_layer.rio.reproject(crs_espg)
            # create new .tif file
            tif_layer_crs.rio.to_raster(tif_layer_crs_path)
            # verify tif file exists, and print update to screen
            if os.path.exists(tif_layer_crs_path):
                print(scen_id + ': Raster reprojected to EPSG: ' + espg + 
                      " and saved to 'source_data' directory")
            else:
                print(scen_id + (": Error: reprojected raster was not "
                                 "able to be saved"))

        # get data from merged shp, to include in 'final_summary.csv'  
        merged_shp_df = merged_shp[['id', 
                                    'amount',
                                    'percent_of_total',
                                    'selection',
                                    'Short_Name',
                                    'Current_IUCN_TH',
                                    'current_test_level',
                                    'KBA @ current test (m_2)',
                                    'US_km2',
                                    'US_m2',
                                    '30% of US_m2',
                                    'BLM',
                                    'SPF',
                                    'dir_path',]].copy()
        
        merged_shp_df['amount (km_2)'] = merged_shp_df['amount']/1000000   
        
        merged_shp_df['KBA @ current test (km_2)'] = (
            merged_shp_df['KBA @ current test (m_2)']/1000000)
        
        
        # Append 'merged_shp_df' to 'test_loop_ls' list, so that all 
        # information from the test loop will be incorporated into the 
        # '_second_loop_summary.csv'
        test_loop_ls.append(merged_shp_df)
        print('' + scen_id + '  ' + 
              "test loop info appended to to test_loop_ls")

        ###    

        # CREATE PLOT SHOWING MULTIPLE LOOPS' SELECTIONS OVER THE RASTER
        # * VISUALIZATION SHOWING HEXCELL SELECTIONS FROM THE BEST RUN 
        # SOLUTION OF EACH LOOP, SAVED AS A .png IMAGE FILE
        # (the heatmap of hexcell extent as a proportion of total extent is 
        # created in the ecosystem loop that follows this one)
        print ('preparing plots...')

        # define raster extent for plotting
        raster_extent = plotting_extent(tif_layer_crs,
                                         tif_layer_crs.rio.transform())
        
        # get metrics to include in figtitle
        # total amount (m2) of ecosystem included in selection
        selected_m = merged_shp.query(
            "selection!='not selected'")['amount'].sum()
        selected_km = selected_m/1000000
        selected_m_string = str("{:,.2f}".format(selected_m))
        selected_km_string = str("{:,.2f}".format(selected_km))

        # get total extent of ecosystem (from 'US_km2' column, eco_subset_df)
        eco_extent_km = eco_subset_df.at[eco,'US_km2']
        eco_extent_m = eco_extent_km * 1000000
        eco_extent_km_string = str("{:,.2f}".format(eco_extent_km))
        eco_extent_m_string = str("{:,.2f}".format(eco_extent_m))
        
        # get Conservation Target value (currently 30% x total extent)
        conserv_tgt_km = prop * eco_extent_km
        conserv_tgt_km_string = str("{:.2f}".format(conserv_tgt_km))
               
        # get selected proportion of total
        selected_prop = selected_km / eco_extent_km
        sel_prop_string = str("{:.2%}".format(selected_prop))
        
        # set target2 value as string, for inclusion in figure title
#         us_m2 = eco_subset_df.at[eco,'US_km2']*1000000
        test_level = merged_shp_df['current_test_level'].mean()
        test_level_string = str("{:.0%}".format(test_level))
        current_iucn_th = eco_subset_df.at[eco,'Current_IUCN_TH']
        target2_m = (test_level * current_iucn_th * eco_extent_m).mean()
        target2_km = target2_m/1000000
        target2_m_string = str("{:,.2f}".format(target2_m))
        target2_km_string = str("{:,.2f}".format(target2_km))
        
        # print figure title info to screen for validation
        print('selected_km: ' + str(selected_km) + 
              '\neco_extent_km: ' + str(eco_extent_km) + 
              '\nconserv_tgt_km: ' + str(conserv_tgt_km) +
              '\nselected_prop: ' + str(selected_prop) +
              '\ntarget2_km: ' + str(target2_km) + 
              '\ntarget2_m: ' + str(target2_m) + 
              '\ntest_level: ' + str(test_level) +
              '\ncurrent_iucn_th: ' + str(current_iucn_th) +
              '\neco_extent_m: ' + str(eco_extent_m))
    
        # create strings for individual lines in figtitle
        ft1 = (scen_id.upper() + " - Searching for KBA @ " + test_level_string 
               + " Current IUCN Value\n")
        ft2 = ("Total Ecosystem Extent: " + eco_extent_km_string + " sq km\n")
        ft3 = ("Conservation Target: " + conserv_tgt_km_string + " sq km\n")
        ft4 = ("KBA target size: " + target2_km_string + " sq km\n")
        ft5 = ("Total Selected Ecosystem: " + selected_km_string + 
               " sq km\n(" + sel_prop_string + " of Total Extent)")
        
        selected_title_txt = ft1 + ft2 + ft3 + ft4 + ft5
        
        # PLOT (3 LAYERS) - BEST SELECTION, RASTER, AND BASEMAP
        fig, ax = plt.subplots(figsize=(10, 10))
        merged_shp.plot(column='selection',
                        cmap='nipy_spectral_r', # orig used viridis 
                        ax=ax, 
                        alpha=0.50, 
                        legend=True)
                          
        ax.set(title=selected_title_txt)
        ax.axes.xaxis.set_visible(False)
        ax.axes.yaxis.set_visible(False)
        ax.patch.set_edgecolor('black')
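        # use the CRS of 'merged_shp' ('shp_layer_crs' is only defined when 
        # the shapefile had to be reprojected in this pass)
        cx.add_basemap(ax=ax, crs=merged_shp.crs)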
        ax.imshow(tif_layer_crs, cmap='jet', extent=raster_extent, 
                  interpolation='nearest')
        plt.savefig((scen_id + '_pu_selections_over_raster.png'), 
                    facecolor='w', edgecolor='k', dpi=600)
        plt.close(fig)
        print(scen_id + ": _pu_selections_over_raster saved as .png\n")

        # convert 'selected' plot to image and add to 'plot_im_list', so 
        # that it'll be included in final pdf of plot images
        plot_im = glob(os.path.normpath(os.path.join(
            os.getcwd(), scen_id + "_pu_selections_over_raster.png")))
        plot_im = Image.open(plot_im[0])
        plot_im = plot_im.convert('RGB')
        plot_im_list.append(plot_im) 
        print("\nand will be included in 'final_plots.pdf'")
        
        count = count+1

# loop through ecosystem info another time to create the heatmap plots (which
# contain ecosystem level information only, this will not show any information
# from the Marxan analysis runs)
for eco in eco_list:
    print('\nbegin loop for ' + eco)
    eco_data_path = os.path.normpath(os.path.join(new_dir, 'eco_' + eco))

    os.chdir(eco_data_path)
    
    # define paths needed to get correct info for eco plot (no test level 
    # data needed)
    merged_shp_layer_path = glob(os.path.normpath(os.path.join(
        '*','selected_plot', '*_merged.shp')))[0]   
    merged_shp = gpd.read_file(merged_shp_layer_path)
    
    # define raster extent for plotting
    tif_layer_crs_path = os.path.normpath(os.path.join(
            os.getcwd(), 'source_data', eco + "_espg_" + espg +'.tif'))

    tif_layer_crs = rxr.open_rasterio(tif_layer_crs_path, 
                                      masked=True).squeeze()
    raster_extent = plotting_extent(tif_layer_crs,
                                    tif_layer_crs.rio.transform())
    
    # PLOT (3 LAYERS) - 'PERCENT_OF_TOTAL', RASTER, AND BASEMAP
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10))
    fig.suptitle(t=("Percentage of the Total Ecosystem Extent "
                 "Contained in Each Hexcell\n"))
    plt.subplots_adjust(hspace=0.5)
    merged_shp.plot(column='percent_of', 
                    cmap='RdYlGn', 
                    ax=ax, 
                    alpha=0.65, 
                    legend=True)
#                     legend_kwds={'loc': 'middle left'})
    ax.set(title=(eco + ": Percentage of Total Ecosystem Extent "
                  "Contained in Each Hexcell\n"))
    cx.add_basemap(ax=ax, crs=merged_shp.crs, 
                    source=cx.providers.CartoDB.Positron) 
    ax.imshow(tif_layer_crs, cmap='jet', extent=raster_extent,
              interpolation='nearest')
    ax.axes.xaxis.set_visible(False)
    ax.axes.yaxis.set_visible(False)
    ax.set_title(eco.upper())
    plt.savefig((eco + '_hexcell_as%_total_extent.png'), facecolor='w', 
                edgecolor='k', dpi=600)
    print(eco + "_hexcell_as%_total_extent saved as .png\n")
    plt.close(fig)
    
    # open the saved '% of total' plot as an RGB image and add it to 
    # 'plot_im_list', so that it is included in the combined pdf of plots
    plot_im = glob(os.path.normpath(os.path.join(
        os.getcwd(), eco + "_hexcell_as%_total_extent.png")))
    plot_im = Image.open(plot_im[0])
    plot_im = plot_im.convert('RGB')
    plot_im_list.append(plot_im) 
    print("\nand will be included in 'combined_plots.pdf'")

# combine all dfs stored in the 'test_loop_ls' list into one pandas dataframe,
# then save that dataframe as '_shp_summary.csv'
final_shp_summary_df = pd.concat(test_loop_ls)
final_shp_summary_df.to_csv(os.path.normpath(os.path.join(
    new_dir, testrun_basename + '_shp_summary.csv')), index=False)
print("\n'" + testrun_basename + "_shp_summary.csv' saved to " + new_dir)

# save plot images to a single pdf; the first image is the base page, so only
# the remaining images are appended (appending the full list would duplicate
# the first page)
plots_pdf_path = os.path.normpath(os.path.join(new_dir, 
                                               'combined_plots.pdf'))
plot_im_list[0].save(plots_pdf_path, save_all=True, 
                     append_images=plot_im_list[1:])

print("\n'combined_plots.pdf' saved to " + new_dir)

print('\nloop completed successfully')
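
##### Optional: a standalone sketch of the multi-page pdf step
Pillow writes a multi-page pdf by saving one image as the base page and passing the other images through `append_images`. The sketch below shows that pattern on its own, outside the workflow. The helper name `save_images_as_pdf`, the solid-colour placeholder images, and the temporary output path are all hypothetical stand-ins for illustration; they are not part of this notebook or of `kba_thresh_sa_scripts`.

```python
import os
import tempfile

from PIL import Image


def save_images_as_pdf(images, pdf_path):
    """Write a list of PIL images to one pdf, one page per image.

    Hypothetical helper for illustration only.
    """
    if not images:
        raise ValueError("no images to save")
    # pdf pages cannot hold an alpha channel, so convert every image to RGB
    pages = [im.convert("RGB") for im in images]
    # the first image is the base page; append only the remaining images
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
    return pdf_path


# placeholder images stand in for the plot images saved by the workflow
demo_pages = [Image.new("RGB", (60, 40), color)
              for color in ("red", "green", "blue")]
demo_pdf = save_images_as_pdf(
    demo_pages, os.path.join(tempfile.mkdtemp(), "demo_plots.pdf"))
print("demo pdf saved to " + demo_pdf)
```

Slicing with `pages[1:]` also covers the single-image case, because an empty `append_images` list simply produces a one-page pdf.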