# About

This notebook extracts spectral and date features from NAIP images at random points within polygons. 
The polygons used here depict verified iceplant locations within four NAIP images along the Santa Barbara County coast. 

The `sample_rasters.py` module implements four methods for sampling points within polygons, and any of them can be used in this notebook:

- `'fraction'`: sample a fixed fraction of the pixels in each polygon,
- `'sliding'`: sample a fixed fraction of the pixels in each polygon, up to a maximum number of points,
- `'constant'`: sample a fixed number of points from each polygon, and
- `'staggered'`: specify a total number of points and sample from polygons relative to their area.
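
The four methods differ mainly in how many points they request from each polygon. The sketch below illustrates this with a hypothetical helper, `points_per_polygon`, which is not part of `sample_rasters.py`. Using pixel counts as a stand-in for polygon area, rounding fractions up, and rounding area-proportional counts to the nearest integer are all assumptions made for illustration; the module itself may behave differently.

```python
import math

def points_per_polygon(pixel_counts, param, sample_fraction=0.01,
                       max_sample=10, const_sample=2, total_pts=100):
    """Hypothetical helper: number of points to draw from each polygon.

    pixel_counts holds the number of pixels inside each polygon, used here
    as a stand-in for polygon area (an assumption of this sketch).
    """
    if param == 'fraction':
        # fixed fraction of the pixels in each polygon (rounded up)
        return [math.ceil(n * sample_fraction) for n in pixel_counts]
    if param == 'sliding':
        # fixed fraction of the pixels, capped at max_sample points per polygon
        return [min(math.ceil(n * sample_fraction), max_sample)
                for n in pixel_counts]
    if param == 'constant':
        # same number of points from every polygon
        return [const_sample] * len(pixel_counts)
    if param == 'staggered':
        # split total_pts among polygons in proportion to their size
        total = sum(pixel_counts)
        return [round(total_pts * n / total) for n in pixel_counts]
    raise ValueError(f"unknown sample_param: {param!r}")
```

Because each polygon's share is rounded independently in the `'staggered'` branch of this sketch, the counts need not add up to exactly `total_pts`.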

**NOTEBOOK VARIABLES:**

- `aois` (array): These are the areas of interest where we collected the polygons we want to sample. Must be a subset of: `['campus_lagoon','carpinteria','gaviota','capitan']`. 

- `sample_param` (str): determines which sampling method is used to draw points from the iceplant and non-iceplant polygons. Must be one of `'fraction'`, `'sliding'`, `'constant'`, or `'staggered'`.

- `total_pts` (array of int): the number of points to sample from each aoi when using the 'staggered' sampling method, with one entry per aoi in the same order as `aois`.

- `sample_fraction` (float in (0,1]): fraction of points to sample from each polygon (if using 'fraction' or 'sliding' sample).

- `max_sample` (int): maximum number of points to sample from a polygon (if using 'sliding' sample)

- `const_sample` (int): constant number of points to sample from each polygon (if using 'constant' sample).

- `save_pts` (bool): whether to save points as a csv file in temp folder  

- `convert_crs` (bool): whether to convert all sampled points to the same CRS (EPSG 4326). Otherwise, each point keeps the CRS of the NAIP image it was sampled from.
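
The constraints listed above can be checked before any sampling starts. The sketch below is one way to do so; `check_notebook_variables` is a hypothetical helper, not part of the notebook or of `sample_rasters.py`. Requiring one `total_pts` entry per aoi and positive values for `max_sample` and `const_sample` are assumptions of this sketch.

```python
VALID_AOIS = {'campus_lagoon', 'carpinteria', 'gaviota', 'capitan'}
VALID_PARAMS = {'fraction', 'sliding', 'constant', 'staggered'}

def check_notebook_variables(aois, sample_param, total_pts,
                             sample_fraction, max_sample, const_sample):
    """Hypothetical helper: raise ValueError if a variable breaks a constraint."""
    unknown = set(aois) - VALID_AOIS
    if unknown:
        raise ValueError(f"unknown aois: {sorted(unknown)}")
    if sample_param not in VALID_PARAMS:
        raise ValueError(f"unknown sample_param: {sample_param!r}")
    # only the variables used by the chosen method are checked
    if sample_param == 'staggered' and len(total_pts) != len(aois):
        raise ValueError("total_pts needs one entry per aoi")
    if sample_param in ('fraction', 'sliding') and not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    if sample_param == 'sliding' and max_sample < 1:
        raise ValueError("max_sample must be at least 1")
    if sample_param == 'constant' and const_sample < 1:
        raise ValueError("const_sample must be at least 1")
```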


**OUTPUT:**

The output is a data frame of points with the following features:

- x, y: coordinates of point *p* 
- pts_crs: CRS of coordinates x, y
- naip_id: item ID of the NAIP image from which *p* was sampled
- polygon_id: id of the polygon from which *p* was sampled
- iceplant: whether point *p* corresponds to a confirmed iceplant location or a confirmed non-iceplant location (0 = non-iceplant, 1 = iceplant)
- r, g, b, nir: Red, Green, Blue, and NIR values of NAIP scene with naip_id at coordinates of point *p*
- ndvi: computed for each point using the Red and NIR bands
- year, month, day_in_year: year, month, and day of the year when the NAIP image was collected
- aoi: name of the area of interest where the points were sampled from
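
The `ndvi` feature is the normalized difference of the NIR and Red bands, `(nir - r) / (nir + r)`. The sketch below uses made-up band values to show why the bands are worth casting to a wider signed integer type before the computation. It assumes the bands are stored as unsigned 8-bit integers, which is an assumption about the data and is not stated above.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed after casting to int16."""
    nir = np.asarray(nir).astype('int16')
    red = np.asarray(red).astype('int16')
    return (nir - red) / (nir + red)

# made-up band values stored as unsigned 8-bit integers (an assumption)
nir = np.array([200, 50], dtype='uint8')
red = np.array([100, 120], dtype='uint8')

# uint8 arithmetic wraps around, so 50 - 120 becomes 186 instead of -70
wrapped = nir - red

# after the cast the difference keeps its sign: roughly [0.333, -0.412]
values = ndvi(nir, red)
```

Where both bands are 0, the denominator is 0 and the result is not a finite number, so such points may need to be handled separately.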


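The data frame is saved as a csv file in the 'temp' folder. If `convert_crs` is `True`, all points are saved in a single file named `iceplant_pts.csv`. Otherwise, one file per aoi is saved, with a filename of the form `<aoi>_iceplant_pts.csv`.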
The stats are saved in the current working directory with a specified file name.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import geopandas as gpd
from rasterio import CRS

import sample_rasters as sr

### ***************************************************
def path_2_polys(aoi):
    """
        Creates the path to the shapefile with the polygons collected at the specified aoi.
            Parameters:
                        aoi (str): name of aoi in polygon's file name
            Return: fp (str): the constructed file path
            Raises: FileNotFoundError if there is no file at the constructed path
    """
    name = aoi + '_FINAL_iceplant_polygons'
    fp = os.path.join(os.getcwd(), 'FINAL_iceplant_polygons', name, name + '.shp')

    # check there is a file at filepath and stop early if there is not
    if not os.path.exists(fp):
        raise FileNotFoundError('invalid filepath, no file at: ' + fp)
    return fp
    

# Specify notebook variables

In [2]:
### ***************************************************
# ************* NOTEBOOK VARIABLES ******************

aois = ['carpinteria','campus_lagoon','capitan','gaviota']

sample_param = 'staggered'
total_pts = [64,173,177,183]

const_sample = 2
sample_fraction = 0.01
max_sample = 10


# convert to epsg 4326
convert_crs = True

# save points
save_pts = True

# Sample points

In [3]:
# create temp directory if needed; the final sampled points are saved here
if save_pts:
    tmp_path = os.path.join(os.getcwd(),'temp')
    os.makedirs(tmp_path, exist_ok=True)

# -------------------------
# sample points
all_pts = []
for aoi, total_pts_aoi in zip(aois,total_pts):
    # open polygons
    fp = path_2_polys(aoi)
    polys = gpd.read_file(fp)
    
    # -------------------------
    # select iceplant polygons
    polys_ice = polys.loc[polys.iceplant == 1].reset_index(drop = True)

    # sample points according to parameters
    pts = sr.sample_naip_from_polys_no_warnings(polys = polys_ice,
                                                    class_name = 'iceplant',
                                                    itemid = polys.aoi[0], 
                                                    param = sample_param,
                                                    sample_fraction = sample_fraction,
                                                    max_sample = max_sample,
                                                    const_sample = const_sample,
                                                    total_pts = total_pts_aoi)  
    pts['aoi'] = aoi 
    # add ndvi as feature
    # bands are cast to int16 so that nir - r can be negative and nir + r cannot wrap around
    nir = pts.nir.astype('int16')
    red = pts.r.astype('int16')
    pts['ndvi'] = (nir - red)/(nir + red)
    # -------------------------
    # if we don't need to match the crs of all points, save each file separately
    if (not convert_crs) and save_pts:
        fp = os.path.join(os.getcwd(),'temp',aoi+'_iceplant_pts.csv')
        pts.to_csv(fp, index=False)
    if convert_crs:
        all_pts.append(pts)

# -------------------------
# match crs of all sampled points to EPSG 4326
if convert_crs:
    same_crs_pts = []
    for df in all_pts:
        # -------------------------        
        # find crs of points and create geodataframe
        aoi = df.aoi[0]
        if aoi in ['campus_lagoon','carpinteria']:
            crs = 26911
        else:
            crs = 26910
        gdf = gpd.GeoDataFrame(df,
                               geometry = gpd.points_from_xy(df.x, df.y),
                               crs = CRS.from_epsg(crs))
        # -------------------------        
        # convert to EPSG 4326 crs
        gdf = gdf.to_crs(CRS.from_epsg(4326))
        same_crs_pts.append(gdf)

    # -------------------------        
    # create final dataframe of pts
    pts = pd.concat(same_crs_pts, ignore_index=True)
    # -------------------------        
    # update coordinate and crs columns
    pts = pts.drop(['x','y','pts_crs'], axis=1)
    pts = pts.assign(x = lambda pt: pt.geometry.x)
    pts = pts.assign(y = lambda pt: pt.geometry.y)
    pts['pts_crs'] = 'EPSG:4326'
    pts = pts.drop(['geometry'], axis=1)

# -------------------------        
# save points if needed
if convert_crs and save_pts:
    fp = os.path.join(os.getcwd(),'temp','iceplant_pts.csv')
    pts.to_csv(fp, index=False)
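
In the conversion step, the CRS of each group of points is chosen from the aoi name with an `if`/`else`. The same mapping can also be written as a lookup table, so that an unexpected aoi name raises an error instead of falling into the `else` branch. This is only a sketch of that alternative: `AOI_EPSG` and `epsg_for_aoi` are hypothetical names that do not appear in the notebook or in `sample_rasters.py`, and the EPSG codes are copied from the loop above.

```python
# EPSG code assigned to each aoi, copied from the conversion loop
AOI_EPSG = {
    'campus_lagoon': 26911,
    'carpinteria': 26911,
    'capitan': 26910,
    'gaviota': 26910,
}

def epsg_for_aoi(aoi):
    """Hypothetical helper: EPSG code of the points sampled at the given aoi."""
    try:
        return AOI_EPSG[aoi]
    except KeyError:
        raise ValueError(f"no EPSG code known for aoi {aoi!r}") from None
```

With this helper, the `if`/`else` could be replaced by `crs = epsg_for_aoi(aoi)`.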

In [6]:
pts.groupby(['aoi']).count()

| aoi | polygon_id | iceplant | r | g | b | nir | year | month | day_in_year | naip_id | ndvi | x | y | pts_crs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| campus_lagoon | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 | 173 |
| capitan | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 | 177 |
| carpinteria | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
| gaviota | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 | 183 |
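
The counts above match the values requested in `total_pts` for each aoi (64, 173, 177, and 183). A minimal sketch of how this comparison could be automated follows; `check_counts` is a hypothetical helper, and it assumes the points carry an `aoi` label as described in the output section.

```python
from collections import Counter

def check_counts(aoi_labels, aois, total_pts):
    """Hypothetical helper: map each aoi to (requested, sampled) point counts."""
    sampled = Counter(aoi_labels)
    return {aoi: (requested, sampled.get(aoi, 0))
            for aoi, requested in zip(aois, total_pts)}
```

In the notebook, this could be called as `check_counts(pts.aoi, aois, total_pts)`; any aoi whose two numbers differ received fewer or more points than requested.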
