# About

This notebook extracts spectral and date features from NAIP images at random points within polygons. 
The polygons used here depict verified iceplant locations within four NAIP images along the Santa Barbara County coast. 

Four methods for sampling points within polygons are implemented in the `sample_rasters.py` module, and all of them can be used in this notebook. These are:

- sample a fixed fraction of pixels in each polygon,
- sample a fixed fraction of the pixels in each polygon up to a maximum number of points,
- sample a fixed number of points from each polygon, and
- specify a total number of points and sample from polygons proportionally to their area.

Polygons vary significantly in size, so sampling a fixed fraction of the pixels in each polygon over-samples the larger polygons.
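
The four sampling methods listed above can be sketched as a count of points per polygon. This is a minimal illustration, not the code in `sample_rasters.py`: the function names `points_per_polygon` and `proportional_allocation` are hypothetical, and rounding up with `ceil` (and rounding to the nearest integer in the proportional case) is an assumption.

```python
import math

def points_per_polygon(n_pixels, method, sample_fraction=0.01,
                       max_sample=10, const_sample=2):
    """Number of points to draw from one polygon that covers n_pixels pixels.

    Hypothetical helper; the rounding rule (ceil) is an assumption.
    """
    if method == "fraction":
        # fixed fraction of the pixels in the polygon
        return math.ceil(sample_fraction * n_pixels)
    if method == "sliding":
        # fixed fraction of the pixels, capped at max_sample
        return min(math.ceil(sample_fraction * n_pixels), max_sample)
    if method == "constant":
        # same number of points for every polygon
        return const_sample
    raise ValueError(f"unknown method: {method}")

def proportional_allocation(areas, total_pts):
    """Split total_pts among polygons proportionally to their areas.

    Hypothetical helper; with rounding, the counts may not add up
    exactly to total_pts.
    """
    total_area = sum(areas)
    return [round(total_pts * area / total_area) for area in areas]

small, big = 50, 5000  # pixels in a small and a large polygon
print(points_per_polygon(small, "fraction"), points_per_polygon(big, "fraction"))
print(points_per_polygon(small, "sliding"), points_per_polygon(big, "sliding"))
```

With a plain fraction, the large polygon contributes 50 times as many points as the small one. The cap in the 'sliding' case limits that imbalance.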

**NOTEBOOK VARIABLES:**

- `aois` (list of str): the areas of interest where the polygons to be sampled were collected. Must be a subset of: `['campus_lagoon','carpinteria','gaviota','capitan']`. 


- `sample_param` (str): determines which sampling method will be used to create the samples from the iceplant and non-iceplant polygons. Must be one of 'fraction', 'sliding', 'constant', or 'proportional'. 

- `sample_fraction` (float in (0,1]): fraction of points to sample from each polygon (if using 'fraction' or 'sliding' sample).

- `max_sample` (int): maximum number of points to sample from a polygon (if using 'sliding' sample).

- `const_sample` (int): constant number of points to sample from each polygon (if using 'constant' sample).

- `verbose` (bool):

- `save_pts` (bool): whether to save points as a csv file in temp folder  

- `convert_crs` (bool): whether to convert the coordinates of the points to EPSG:4326

- `total_pts` (list of int): total number of points to sample in each aoi, listed in the same order as `aois`


**OUTPUT:**

The output is a data frame of points with the following features:

- x, y: coordinates of point *p* 
- pts_crs: CRS of coordinates x, y
- naip_id: itemid of the NAIP scene from which *p* was sampled
- polygon_id: id of the polygon from which *p* was sampled
- iceplant: whether point *p* corresponds to a confirmed iceplant location or a confirmed non-iceplant location (0 = non-iceplant, 1 = iceplant)
- r, g, b, nir: Red, Green, Blue, and NIR values of NAIP scene with naip_id at coordinates of point *p*
- ndvi: computed for each point using the Red and NIR bands
- year, month, day_in_year: year, month, and day of the year when the NAIP image was collected
- aoi: name of the area of interest where the points were sampled from


The data frames are saved in the 'temp' folder as csv files. When `convert_crs` is False, one file per aoi is saved as `<aoi>_iceplant_pts.csv`; when it is True, all points are saved together as `iceplant_pts.csv`.
The stats are saved in the current working directory with a specified file name.
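
The `ndvi` feature listed above is the usual normalized difference, (NIR - Red) / (NIR + Red). The sketch below shows why the bands are cast to `int16` before the arithmetic. It assumes the band values are stored as unsigned 8-bit integers, which the notebook does not state explicitly; the helper name `ndvi` is only for illustration.

```python
import numpy as np

# Two example pixels, stored as unsigned 8-bit integers (assumption)
r = np.array([78, 200], dtype=np.uint8)
nir = np.array([163, 100], dtype=np.uint8)

# uint8 arithmetic wraps around modulo 256, so a negative difference
# (100 - 200) or a sum above 255 (100 + 200) silently gives a wrong value
wrapped_diff = nir - r

def ndvi(nir, r):
    """NDVI computed on int16 copies of the bands to avoid uint8 wrap-around."""
    nir = nir.astype("int16")
    r = r.astype("int16")
    return (nir - r) / (nir + r)

ndvi_vals = ndvi(nir, r)
print(wrapped_diff)  # second entry is not -100
print(ndvi_vals)     # first entry is 85/241, second is negative
```

A point with `nir + r == 0` would produce a division by zero; this sketch does not handle that case.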

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import geopandas as gpd
from rasterio import CRS

import sample_rasters as sr

### ***************************************************
def path_2_polys(aoi):
    """
        Creates a path to the shapefile with polygons collected at the specified aoi. 
            Parameters:
                        aoi (str): name of aoi in polygon's file name
            Return: fp (str): the constructed file path if the file exists, otherwise None
    """    
    
    fp = os.path.join(os.getcwd(),'FINAL_iceplant_polygons',
                      aoi+'_FINAL_iceplant_polygons',
                      aoi+'_FINAL_iceplant_polygons.shp')
    
    # check there is a file at filepath
    if not os.path.exists(fp):
        print('invalid filepath: no file')
        return
    return fp
    

# Specify notebook variables

In [2]:
### ***************************************************
# ************* NOTEBOOK VARIABLES ******************

aois = ['carpinteria','campus_lagoon','capitan','gaviota']

sample_param = 'staggered'
total_pts = [64,173,177,183]

const_sample = 2
sample_fraction = 0.01
max_sample = 10


# convert to epsg 4326
convert_crs = True

#save points
save_pts = True

# Sample points

In [3]:
# initialize sampling statistics df
# stats = []

if save_pts:
    # create temp directory if needed (no error if it already exists)
    tmp_path = os.path.join(os.getcwd(),'temp')  
    os.makedirs(tmp_path, exist_ok=True)

# sample points
all_pts = []
for aoi,total_pts_aoi in zip(aois,total_pts):
    # open polygons
    fp = path_2_polys(aoi)
    polys = gpd.read_file(fp)
    # -------------------------
    # select iceplant polygons and sample points from each one according to sample_param
    polys_ice = polys.loc[polys.iceplant == 1].reset_index(drop = True)

    pts = sr.sample_naip_from_polys_no_warnings(polys = polys_ice,
                                                    class_name = 'iceplant',
                                                    itemid = polys.aoi[0], 
                                                    param = sample_param,
                                                    sample_fraction = sample_fraction,
                                                    max_sample = max_sample,
                                                    const_sample = const_sample,
                                                    total_pts = total_pts_aoi)  
    pts['aoi'] = aoi
    # add ndvi as feature
    pts['ndvi'] = (pts.nir.astype('int16') - pts.r.astype('int16'))/(pts.nir.astype('int16') + pts.r.astype('int16'))
    if (not convert_crs) and save_pts:
        fp = os.path.join(os.getcwd(),'temp',aoi+'_iceplant_pts.csv')
        pts.to_csv(fp, index=False)
    if convert_crs:
        all_pts.append(pts)

if convert_crs:
    same_crs_pts = []
    for df in all_pts:
        aoi = df.aoi[0]

        if aoi in ['campus_lagoon','carpinteria']:
            crs = 26911
        else:
            crs = 26910

        gdf = gpd.GeoDataFrame(df,
                               geometry = gpd.points_from_xy(df.x, df.y),
                               crs = CRS.from_epsg(crs))
        gdf = gdf.to_crs(CRS.from_epsg(4326))
        gdf['x'] = gdf.geometry.x
        gdf['y'] = gdf.geometry.y
        same_crs_pts.append(gdf)

    pts = pd.concat(same_crs_pts, ignore_index=True)
    pts = pts.drop(['x','y','pts_crs'], axis=1)
    pts = pts.assign(x = lambda pt: pt.geometry.x)
    pts = pts.assign(y = lambda pt: pt.geometry.y)
    pts['pts_crs'] = 'EPSG:4326'
    pts = pts.drop(['geometry'], axis=1)

if convert_crs and save_pts:
    fp = os.path.join(os.getcwd(),'temp','iceplant_pts.csv')
    pts.to_csv(fp, index=False)

In [4]:
pts

Unnamed: 0,polygon_id,iceplant,r,g,b,nir,year,month,day_in_year,naip_id,aoi,ndvi,x,y,pts_crs
0,0,1,78,92,74,163,2020,5,142,ca_m_3411936_se_11_060_20200521,carpinteria,0.352697,-119.554111,34.411326,EPSG:4326
1,1,1,71,88,74,161,2020,5,142,ca_m_3411936_se_11_060_20200521,carpinteria,0.387931,-119.553556,34.410872,EPSG:4326
2,3,1,66,88,72,170,2020,5,142,ca_m_3411936_se_11_060_20200521,carpinteria,0.440678,-119.552860,34.410397,EPSG:4326
3,4,1,84,100,75,171,2020,5,142,ca_m_3411936_se_11_060_20200521,carpinteria,0.341176,-119.552515,34.410083,EPSG:4326
4,7,1,110,117,95,174,2020,5,142,ca_m_3411936_se_11_060_20200521,carpinteria,0.225352,-119.551980,34.408956,EPSG:4326
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
592,27,1,120,115,92,155,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.127273,-120.216142,34.472995,EPSG:4326
593,27,1,118,114,90,158,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.144928,-120.216216,34.473028,EPSG:4326
594,27,1,115,116,83,168,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.187279,-120.216189,34.472992,EPSG:4326
595,27,1,124,117,92,160,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.126761,-120.216177,34.473019,EPSG:4326
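
In the rows above, the last part of each `naip_id` looks like the acquisition date (for example `20200521`), and it agrees with the `year`, `month`, and `day_in_year` columns. The sketch below derives the date features from the item id under that assumption. How `sample_rasters.py` actually obtains them is not shown in this notebook, and `date_features` is a hypothetical helper.

```python
from datetime import datetime

def date_features(naip_id):
    """Date features parsed from the trailing YYYYMMDD token of a NAIP item id.

    Assumes the id ends with '_YYYYMMDD' (hypothetical helper).
    """
    date = datetime.strptime(naip_id.split("_")[-1], "%Y%m%d")
    return {
        "year": date.year,
        "month": date.month,
        # day of the year, counting January 1st as day 1
        "day_in_year": date.timetuple().tm_yday,
    }

print(date_features("ca_m_3411936_se_11_060_20200521"))
# {'year': 2020, 'month': 5, 'day_in_year': 142}
```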


In [5]:
64+173+177+183

597