# About

This notebook extracts spectral and date features from NAIP images at random points within the polygons in the 'polygons_form_naip_images' folder. Each polygon marks a known iceplant or non-iceplant location within a specific NAIP image.


Once the areas of interest and years are specified, the notebook first samples the polygons labeled as iceplant locations and then the polygons labeled as non-iceplant locations. The `sample_rasters.py` module implements three methods for sampling polygons, and all of them can be used in this notebook:

- sample a fixed fraction of pixels in each polygon,
- sample a fixed fraction of the pixels in each polygon up to a maximum number of points, and
- sample a fixed number of points from each polygon. 

Polygons vary significantly in size. Sampling a fixed fraction of the pixels in each polygon will therefore likely over-sample the bigger polygons (most often those corresponding to non-iceplant locations), which in turn would greatly unbalance the training set towards one label.
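The effect of each method on the number of points drawn per polygon can be sketched as follows. This is only an illustration, not the code in `sample_rasters.py`: the helper `n_points_to_sample` is hypothetical, the mapping of the three methods to the names 'fraction', 'sliding' and 'constant' is assumed from the notebook variables, and rounding up with `ceil` is an assumption.

```python
import math

def n_points_to_sample(n_pixels, param, sample_fraction=0.01,
                       max_sample=10, const_sample=5):
    """Hypothetical helper: how many points to draw from a polygon
    that covers n_pixels pixels, under each sampling method."""
    if param == 'fraction':
        # fixed fraction of the pixels in the polygon
        return math.ceil(n_pixels * sample_fraction)
    if param == 'sliding':
        # fixed fraction of the pixels, capped at max_sample points
        return min(math.ceil(n_pixels * sample_fraction), max_sample)
    if param == 'constant':
        # same number of points for every polygon
        return const_sample
    raise ValueError("param must be 'fraction', 'sliding' or 'constant'")

# compare a small and a large polygon (pixel counts are made up)
for n_pixels in (200, 50000):
    counts = {p: n_points_to_sample(n_pixels, p)
              for p in ('fraction', 'sliding', 'constant')}
    print(n_pixels, counts)
```

With 'fraction' the point count grows with polygon size, while 'sliding' and 'constant' bound the contribution of any single polygon.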

**NOTEBOOK VARIABLES:**

- `aois` (array): These are the areas of interest where we collected the polygons we want to sample. Must be a subset of: `['campus_lagoon','carpinteria','gaviota','capitan','point_conception']`. 

- `years` (array): can be any subset of `[2012, 2014, 2016, 2018, 2020]`. If `aois` includes 'point_conception', 2016 is skipped for that area because there are no NAIP images to sample from for that year.

- `ice_param` and `nonice_param` (str): determine which sampling method is used to create the samples from the iceplant polygons and non-iceplant polygons, respectively. Each must be one of 'fraction', 'sliding' or 'constant'.

- `sample_fraction` (float in (0,1]): fraction of points to sample from each polygon

- `max_sample` (int): maximum number of points to sample from a polygon

- `const_sample` (int): constant number of points to sample from each polygon

- `verbose` (bool): whether to print the stats of how many points were sampled per year and area of interest as the notebook runs

- `save_stats` (bool): whether to save as a csv file the stats of how many points were sampled from each year and area of interest

- `stats_csv_name` (str): where to save the stats from sampled points in the form "file_name.csv"

- `save_pts` (bool): whether to save points as a csv file in temp folder  
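The constraints listed above can be checked before sampling starts. The sketch below is not part of the notebook or of `sample_rasters.py`; `check_notebook_variables` is a hypothetical helper, and the requirement that `max_sample` and `const_sample` be at least 1 is an assumption.

```python
VALID_AOIS = {'campus_lagoon', 'carpinteria', 'gaviota', 'capitan', 'point_conception'}
VALID_YEARS = {2012, 2014, 2016, 2018, 2020}
VALID_PARAMS = {'fraction', 'sliding', 'constant'}

def check_notebook_variables(aois, years, ice_param, nonice_param,
                             sample_fraction, max_sample, const_sample):
    """Hypothetical helper: raise ValueError if a notebook variable is out of range."""
    if not set(aois) <= VALID_AOIS:
        raise ValueError(f'unknown aoi(s): {set(aois) - VALID_AOIS}')
    # int() lets years be given either as 2020 or as '2020'
    if not {int(y) for y in years} <= VALID_YEARS:
        raise ValueError(f'years must be a subset of {sorted(VALID_YEARS)}')
    for name, value in (('ice_param', ice_param), ('nonice_param', nonice_param)):
        if value not in VALID_PARAMS:
            raise ValueError(f'{name} must be one of {sorted(VALID_PARAMS)}')
    if not 0 < sample_fraction <= 1:
        raise ValueError('sample_fraction must be in (0, 1]')
    # assumed: at least one point per polygon
    if max_sample < 1 or const_sample < 1:
        raise ValueError('max_sample and const_sample must be >= 1')

# example call with the values used in this notebook
check_notebook_variables(['capitan'], ['2020'], 'constant', 'sliding', 0.01, 10, 5)
```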


**OUTPUT:**

The output is a data frame of points with the following features:

- x, y: coordinates of point *p* 
- pts_crs: CRS of coordinates x, y
- naip_id: itemid of the NAIP image from which *p* was sampled
- polygon_id: id of the polygon from which *p* was sampled
- iceplant: whether point *p* corresponds to a confirmed iceplant location or a confirmed non-iceplant location (0 = non-iceplant, 1 = iceplant)
- r, g, b, nir: Red, Green, Blue, and NIR values of NAIP scene with naip_id at coordinates of point *p*
- ndvi: computed for each point using the Red and NIR bands
- year, month, day_in_year: year, month, and day of the year when the NAIP image was collected
- aoi: name of the area of interest where the points were sampled from


The data frames are saved in the 'temp' folder as csv files. Filenames have the structure `aoi_points_year.csv`.
The stats are saved in the current working directory under the file name given in `stats_csv_name`.
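The `ndvi` feature is (NIR - Red) / (NIR + Red). The sketch below shows why the bands are cast to a signed integer type before the subtraction. It assumes the bands are stored as unsigned 8-bit integers, which this notebook does not state. The `ndvi` function here is a hypothetical standalone helper, and the NaN guard for NIR + Red = 0 is an addition that the notebook itself does not perform.

```python
import numpy as np

def ndvi(nir, r):
    """Hypothetical helper: NDVI from NIR and Red band values."""
    # cast to a signed type so nir - r can be negative and nir + r cannot wrap
    nir = np.asarray(nir).astype('int16')
    r = np.asarray(r).astype('int16')
    total = nir + r
    # NDVI is undefined where nir + r == 0; leave NaN there
    out = np.full(total.shape, np.nan)
    np.divide(nir - r, total, out=out, where=total != 0)
    return out

nir = np.array([200, 50, 0], dtype=np.uint8)
red = np.array([100, 150, 0], dtype=np.uint8)
print(ndvi(nir, red))

# without the cast, unsigned subtraction wraps around instead of going negative
print(nir - red)
```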

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import geopandas as gpd

import sample_rasters as sr

# Specify notebook variables

In [2]:
# ***************************************************
# ************* NOTEBOOK VARIABLES ******************

#aois = ['campus_lagoon','carpinteria','gaviota','point_conception']
aois = ['capitan']

# years = array of years, can be any subset from [2012, 2014, 2016, 2018, 2020]
#years = [2012, 2014, 2016, 2018, 2020]
years = ['2020']

ice_param = 'constant'
nonice_param = 'sliding'

sample_fraction = 0.01
max_sample = 10
const_sample = 5

# print stats as notebook runs
verbose = True

# save stats
save_stats = False
stats_csv_name = 'stats_sampling_pts_from_polygons.csv'

#save points
save_pts = False

# Sample points

In [3]:
# initialize list of sampling statistics (one row per aoi and year)
stats = []

# sample points
for aoi in aois:
    for year in years:
        
        # there's no data for Point Conception in 2016
        # (int() makes the check work whether years holds 2016 or '2016')
        if not (aoi == 'point_conception' and int(year) == 2016):
            # open polygons
            fp = sr.path_to_polygons(aoi, year)
            polys = gpd.read_file(fp)
            # -------------------------
            # select iceplant polygons and sample points using the ice_param method
            polys_ice = polys.loc[polys.iceplant == 1].reset_index(drop = True)
            
            pts_ice = sr.sample_naip_from_polys_no_warnings(polys = polys_ice,
                                                            class_name = 'iceplant',
                                                            itemid = polys.naip_id[0], 
                                                            param = ice_param,
                                                            sample_fraction = sample_fraction,
                                                            max_sample = max_sample,
                                                            const_sample = const_sample)  

            # -------------------------
            # select non-iceplant polygons and sample points using the nonice_param method
            # (with 'sliding': sample_fraction of pts in each polygon, but at most max_sample points)
            polys_nonice = polys.loc[polys.iceplant==0]
            polys_nonice = polys_nonice.reset_index(drop=True)
            
            pts_nonice = sr.sample_naip_from_polys_no_warnings(polys = polys_nonice,
                                                            class_name = 'iceplant',
                                                            itemid = polys.naip_id[0], 
                                                            param = nonice_param,
                                                            sample_fraction = sample_fraction,
                                                            max_sample = max_sample,
                                                            const_sample = const_sample)  
            # -------------------------            
            # assemble into single dataframe
            pts = pd.concat([pts_ice, pts_nonice])
            pts['aoi'] = aoi
            
            # add ndvi as feature
            # cast to int16 so nir - r can be negative without wrapping (matters if bands are unsigned)
            nir = pts.nir.astype('int16')
            red = pts.r.astype('int16')
            pts['ndvi'] = (nir - red) / (nir + red)

            # -------------------------           
            # save points as csv in temp folder  
            if save_pts:
                # create temp directory if needed
                tmp_path = os.path.join(os.getcwd(), 'temp')
                os.makedirs(tmp_path, exist_ok=True)

                fp = sr.path_to_spectral_pts(aoi, year)
                pts.to_csv(fp, index=False)
            
            # -------------------------
            # print sample statistics
            if verbose:
                print('************ '+aoi+ ' ' +str(year)+' ************')
                sr.iceplant_proportions(pts.iceplant)
                print( '---------------------------------------')
                
            # -------------------------
            # keep track of statistics for saving
            if save_stats:
                n_ice =  pts_ice.shape[0]
                n_nonice =  pts_nonice.shape[0]
                total = n_ice + n_nonice
                
                stat = [aoi, 
                     year, 
                     str(round(n_nonice/n_ice,1))+':1', 
                     round(n_ice/total*100,2),
                     round(n_nonice/total*100,2),
                     n_ice,
                     n_nonice
                    ]
                stats.append(stat)
                
# -------------------------   
# save stats
if save_stats:      
    stats_df = pd.DataFrame(stats, 
                            columns=['aoi', 'year', 'ratio','perc_ice','perc_nonice','n_ice','n_nonice'])    
    fp = os.path.join(os.getcwd(),  
                      stats_csv_name)
    stats_df.to_csv(fp, index=False)

# -------------------------
# final processing message
# stats_df only exists when save_stats is True
if verbose and save_stats:
    print('non-iceplant:iceplant ratio   ', sum(stats_df.n_nonice)/sum(stats_df.n_ice))