# About

This notebook extracts spectral and date features from NAIP images at random points within polygons. 
The polygons used here depict verified iceplant locations within four NAIP images along the Santa Barbara County coast. 

Four methods for sampling points within polygons are implemented in the `sample_rasters.py` module, and all of them can be used in this notebook. These are:

- sample a fixed fraction of pixels in each polygon,
- sample a fixed fraction of the pixels in each polygon up to a maximum number of points,
- sample a fixed number of points from each polygon, and
- specify a total number of points and sample from polygons proportionally to their area.

Polygons vary significantly in size, so keep in mind that sampling a fixed fraction of the pixels in each polygon over-samples the larger polygons.
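
To make the difference between the four methods concrete, here is a minimal sketch of how the number of points per polygon could be derived from each polygon's pixel count. The helper `points_per_polygon` and the pixel counts are hypothetical and are not taken from `sample_rasters.py`; the module's actual rounding and edge-case handling may differ.

```python
import math


def points_per_polygon(pixel_counts, param, sample_fraction=0.01,
                       max_sample=10, const_sample=1, total_pts=140):
    """Return the number of points to draw from each polygon.

    Hypothetical helper for illustration only; it mirrors the four
    sampling methods described above, not the sample_rasters code.
    """
    if param == 'fraction':
        # fixed fraction of the pixels in each polygon
        return [math.ceil(sample_fraction * n) for n in pixel_counts]
    if param == 'sliding':
        # fixed fraction, capped at max_sample points per polygon
        return [min(math.ceil(sample_fraction * n), max_sample)
                for n in pixel_counts]
    if param == 'constant':
        # same number of points from every polygon
        return [const_sample for _ in pixel_counts]
    if param == 'proportional':
        # share of total_pts proportional to polygon size;
        # rounding means the counts may not sum exactly to total_pts
        total = sum(pixel_counts)
        return [round(total_pts * n / total) for n in pixel_counts]
    raise ValueError(f"unknown sampling method: {param!r}")


# made-up pixel counts for a small, a medium, and a large polygon
pixel_counts = [200, 1000, 50000]
for param in ('fraction', 'sliding', 'constant', 'proportional'):
    print(param, points_per_polygon(pixel_counts, param))
```

With the 'fraction' method the large polygon dominates the sample, which is the over-sampling effect described above; 'sliding' and 'constant' bound each polygon's contribution.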

**NOTEBOOK VARIABLES:**

- `aois` (array): These are the areas of interest where we collected the polygons we want to sample. Must be a subset of: `['campus_lagoon','carpinteria','gaviota','capitan']`. 

- `years` (array): can be any subset of `[2012, 2014, 2016, 2018, 2020]`.

- `sample_param` (str): determines which sampling method is used to draw points from the iceplant and non-iceplant polygons. Must be one of 'fraction', 'sliding', 'constant', or 'proportional'. 

- `sample_fraction` (float in (0,1]): fraction of points to sample from each polygon (if using 'fraction' or 'sliding' sample).

- `max_sample` (int): maximum number of points to sample from a polygon (if using 'sliding' sample)

- `const_sample` (int): constant number of points to sample from each polygon (if using 'constant' sample)

- `total_pts` (int): total number of points to sample across all polygons (if using 'proportional' sample)

- `verbose` (bool): whether to print the stats of how many points were sampled per year and area of interest as the notebook runs

- `save_stats` (bool): whether to save as a csv file the stats of how many points were sampled from each year and area of interest

- `stats_csv_name` (str): where to save the stats from sampled points in the form "file_name.csv"

- `save_pts` (bool): whether to save points as a csv file in temp folder  
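
The constraints listed above can be checked before any sampling starts. The following is a minimal sketch, assuming the allowed values are exactly those documented in this cell; the function `check_notebook_variables` is hypothetical and is not part of `sample_rasters.py`.

```python
# allowed values as documented in the variable list above
VALID_AOIS = {'campus_lagoon', 'carpinteria', 'gaviota', 'capitan'}
VALID_YEARS = {2012, 2014, 2016, 2018, 2020}
VALID_PARAMS = {'fraction', 'sliding', 'constant', 'proportional'}


def check_notebook_variables(aois, years, sample_param,
                             sample_fraction, max_sample, const_sample):
    """Raise ValueError if a variable is outside its documented range.

    Hypothetical helper for illustration only.
    """
    if not set(aois) <= VALID_AOIS:
        raise ValueError(f"unknown aois: {sorted(set(aois) - VALID_AOIS)}")
    if not set(years) <= VALID_YEARS:
        raise ValueError(f"unknown years: {sorted(set(years) - VALID_YEARS)}")
    if sample_param not in VALID_PARAMS:
        raise ValueError(f"unknown sample_param: {sample_param!r}")
    # only check the parameters that the chosen method actually uses
    if sample_param in ('fraction', 'sliding') and not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if sample_param == 'sliding' and max_sample < 1:
        raise ValueError("max_sample must be at least 1")
    if sample_param == 'constant' and const_sample < 1:
        raise ValueError("const_sample must be at least 1")


# passes silently for the values used later in this notebook
check_notebook_variables(aois=['capitan', 'gaviota'], years=[2020],
                         sample_param='constant', sample_fraction=0.01,
                         max_sample=10, const_sample=1)
```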


**OUTPUT:**

The output is a data frame of points with the following features:

- x, y: coordinates of point *p* 
- pts_crs: CRS of coordinates x, y
- naip_id: item id of the NAIP image from which *p* was sampled
- polygon_id: id of the polygon from which *p* was sampled
- iceplant: whether point *p* corresponds to a confirmed iceplant location or a confirmed non-iceplant location (0 = non-iceplant, 1 = iceplant)
- r, g, b, nir: Red, Green, Blue, and NIR values of NAIP scene with naip_id at coordinates of point *p*
- ndvi: normalized difference vegetation index, computed for each point from the Red and NIR bands as (nir - r)/(nir + r)
- year, month, day_in_year: year, month, and day of the year when the NAIP image was collected
- aoi: name of the area of interest where the points were sampled from
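
As an illustration of the derived features in this list, the sketch below computes `ndvi` from band values and extracts `year`, `month`, and `day_in_year` from a NAIP item id. Two assumptions are made here: the band values are 8-bit integers, and the collection date is the trailing `YYYYMMDD` token of the item id. The notebook itself obtains these features through `sample_rasters`, which may read the date from the image metadata instead.

```python
from datetime import datetime

import numpy as np


def ndvi(nir, red):
    """NDVI = (nir - red) / (nir + red), computed element-wise."""
    # cast to int16 first: with 8-bit inputs (an assumption) the sum and
    # difference would otherwise wrap around instead of giving the true value
    nir = np.asarray(nir).astype('int16')
    red = np.asarray(red).astype('int16')
    # note: pixels with nir + red == 0 yield nan and a runtime warning
    return (nir - red) / (nir + red)


def date_features(naip_id):
    """Return (year, month, day_in_year) from the trailing date token.

    Assumes the item id ends in '_YYYYMMDD'.
    """
    date = datetime.strptime(naip_id.split('_')[-1], '%Y%m%d')
    return date.year, date.month, date.timetuple().tm_yday


# r = 98, nir = 179 gives (179 - 98) / (179 + 98), roughly 0.2924
print(ndvi(np.array([179], dtype='uint8'), np.array([98], dtype='uint8')))

# 22 May 2020 is day 143 of a leap year
print(date_features('ca_m_3412040_ne_10_060_20200522'))
```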


The data frames are saved in the 'temp' folder as csv files. Filenames have the structure `aoi_points_year.csv`.
The stats are saved in the current working directory with a specified file name.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import geopandas as gpd

import sample_rasters as sr

# Specify notebook variables

In [None]:
### ***************************************************
# ************* NOTEBOOK VARIABLES ******************

#aois = ['campus_lagoon','carpinteria','gaviota','point_conception']
aois = ['capitan','gaviota']

# years = array of years, can be any subset from [2012, 2014, 2016, 2018, 2020]
#years = [2012, 2014, 2016, 2018, 2020]
years = [2020]

sample_param = 'constant'
const_sample = 1
sample_fraction = 0.01
max_sample = 10
total_pts = 140

# print stats as notebook runs
verbose = False

# convert to epsg 4326
convert_crs = True

# save stats
save_stats = False
stats_csv_name = 'stats_sampling_pts_from_polygons.csv'

#save points
save_pts = False

In [3]:
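def path_2_polys(aoi):
    """Return the path to the shapefile with the verified polygons for aoi."""
    fp = os.path.join(os.getcwd(),'FINAL_iceplant_polygons',
                      aoi+'_FINAL_iceplant_polygons',
                      aoi+'_FINAL_iceplant_polygons.shp')

    # check there is a file at filepath;
    # fail here instead of returning None, which gpd.read_file cannot open
    if not os.path.exists(fp):
        raise FileNotFoundError('invalid filepath: no file at ' + fp)
    return fp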
    

# Sample points

In [4]:
# initialize sampling statistics df
# stats = []

if save_pts:
    # create temp directory if needed
    tmp_path = os.path.join(os.getcwd(),'temp')  
    if not os.path.exists(tmp_path):
        os.mkdir(tmp_path)

# sample points
all_pts = []
for aoi in aois:
    for year in years:
        
        # there's no data for Point Conception in 2016
        if ('point_conception' != aoi) or (year != 2016):  
            # open polygons
            #fp = sr.path_to_polygons(aoi, year)
            fp = path_2_polys(aoi)
            polys = gpd.read_file(fp)
            # -------------------------
            # select iceplant polygons and sample points from each one using the sample_param method
            polys_ice = polys.loc[polys.iceplant == 1].reset_index(drop = True)
            
            pts = sr.sample_naip_from_polys_no_warnings(polys = polys_ice,
                                                            class_name = 'iceplant',
                                                            itemid = polys.aoi[0], 
                                                            param = sample_param,
                                                            sample_fraction = sample_fraction,
                                                            max_sample = max_sample,
                                                            const_sample = const_sample,
                                                            total_pts = total_pts)  
            pts['aoi'] = aoi
            # add ndvi as feature (cast bands to int16 before the arithmetic)
            nir = pts.nir.astype('int16')
            red = pts.r.astype('int16')
            pts['ndvi'] = (nir - red)/(nir + red)
            if (not convert_crs) and save_pts:
                fp = sr.path_to_spectral_pts(aoi, year)            
                pts.to_csv(fp, index=False)
            if convert_crs:
                all_pts.append(pts)

# -------------------------
# points are only collected in all_pts when convert_crs is True;
# concatenating an empty list would raise a ValueError
if convert_crs:
    all_pts = pd.concat(all_pts).reset_index(drop=True)

    # save points as csv in temp folder
    # (the temp directory was already created at the top of this cell)
    if save_pts:
        # aoi and year hold the values from the last loop iteration
        fp = sr.path_to_spectral_pts(aoi, year)
        all_pts.to_csv(fp, index=False)

In [5]:
len(pts)

72

In [6]:
pts

Unnamed: 0,x,y,pts_crs,polygon_id,iceplant,r,g,b,nir,year,month,day_in_year,naip_id,aoi,ndvi
0,770000.863562,3.817379e+06,epsg:26910,0,1,98,114,98,179,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.292419
1,769989.386424,3.817384e+06,epsg:26910,0,1,148,143,129,155,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.023102
2,769991.658497,3.817370e+06,epsg:26910,1,1,119,113,99,174,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.187713
3,770063.711583,3.817358e+06,epsg:26910,1,1,126,123,98,176,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.165563
4,770105.021872,3.817355e+06,epsg:26910,2,1,139,131,118,162,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.076412
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67,775786.356493,3.817228e+06,epsg:26910,33,1,114,114,89,145,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.119691
68,775833.915227,3.817219e+06,epsg:26910,34,1,123,121,92,163,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.139860
69,775809.915024,3.817223e+06,epsg:26910,34,1,119,114,93,148,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.108614
70,774216.814207,3.817441e+06,epsg:26910,35,1,130,144,103,182,2020,5,143,ca_m_3412040_ne_10_060_20200522,capitan,0.166667
