# About

This notebook extracts spectral and date features of NAIP images at random points within the polygons in the 'polygons_form_naip_images' folder. These polygons mark known ice plant and non-ice plant locations, each within a specific NAIP image.


Once the aoi and years are specified, the notebook first samples points from polygons labeled as ice plant locations and then from polygons labeled as non-ice plant locations. The `extracting_points_from_polygons` module implements two methods for sampling polygons, and this notebook uses both (through their `_no_warnings` variants):

- `naip_sample_proportion` samples a fixed fraction of the points in each polygon.
- `naip_sample_sliding` samples a fixed fraction of the points in each polygon, up to a maximum number of points per polygon.

Polygons vary greatly in size, so sampling the same fraction of points from every polygon would over-sample the bigger polygons (most often those corresponding to non-ice plant locations), which in turn would unbalance the training set towards one label. For this reason, the capped method is applied to the non-ice plant polygons. The parameters used in this notebook were chosen to obtain a final training set with a 3:1 proportion of non-ice plant to ice plant points.
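The effect of the cap can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `extracting_points_from_polygons`: it assumes each polygon has already been rasterized to a boolean mask on the NAIP pixel grid, treats every pixel inside the mask as one candidate point, and rounds the sample size to the nearest integer. `sample_pixels` is a hypothetical name.

```python
import numpy as np


def sample_pixels(mask, fraction, max_sample=None, rng=None):
    """Randomly pick pixels inside a polygon mask (hypothetical helper).

    mask: boolean array, True for pixels inside the polygon (assumed to come
          from rasterizing the polygon on the NAIP grid)
    fraction: share of the polygon's pixels to sample, in (0, 1]
    max_sample: optional cap on the number of sampled pixels
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng

    rows, cols = np.nonzero(mask)          # pixel indices inside the polygon
    n = int(round(fraction * rows.size))   # proportional sample size (assumed rounding)
    if max_sample is not None:
        n = min(n, max_sample)             # cap, as in the sliding method
    if n == 0:
        return rows[:0], cols[:0]

    idx = rng.choice(rows.size, size=n, replace=False)  # no pixel drawn twice
    return rows[idx], cols[idx]


rng = np.random.default_rng(0)
small = np.ones((10, 10), dtype=bool)     # 100-pixel polygon
large = np.ones((200, 200), dtype=bool)   # 40,000-pixel polygon

print(len(sample_pixels(small, 0.9, rng=rng)[0]))                   # 90
print(len(sample_pixels(large, 0.9, rng=rng)[0]))                   # 36000
print(len(sample_pixels(large, 0.9, max_sample=1500, rng=rng)[0]))  # 1500
```

With the cap, the large polygon contributes 1500 points instead of 36000, so it no longer dominates the sample, while the small polygon is unaffected.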

**NOTEBOOK VARIABLES:**

- `aoi` (string): the area of interest where the polygons to be sampled were collected. Must be one of 'campus_lagoon', 'carpinteria', 'gaviota', or 'point_conception'.

- `years` (array): any subset of [2012, 2014, 2016, 2018, 2020]. If aoi = 'point_conception', 2016 must be excluded, since there are no points to sample from in that year's NAIP image.

- `sample_fraction` (float in (0,1]): fraction of points to sample from each polygon

- `max_sample` (int): maximum number of points to sample from a polygon
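These constraints are easy to get wrong, so a quick check before sampling can save a failed run. A minimal sketch, assuming the allowed values listed above are complete; `variable_problems` is a hypothetical helper, not part of `extracting_points_from_polygons`.

```python
VALID_AOIS = {"campus_lagoon", "carpinteria", "gaviota", "point_conception"}
VALID_YEARS = {2012, 2014, 2016, 2018, 2020}


def variable_problems(aoi, years, sample_fraction, max_sample):
    """Return a list of problems with the notebook variables (empty if none)."""
    problems = []
    if aoi not in VALID_AOIS:
        problems.append(f"unknown aoi: {aoi!r}")
    unsupported = sorted(set(years) - VALID_YEARS)
    if unsupported:
        problems.append(f"unsupported years: {unsupported}")
    if aoi == "point_conception" and 2016 in years:
        problems.append("point_conception has no points to sample in 2016")
    if not 0 < sample_fraction <= 1:
        problems.append("sample_fraction must be in (0, 1]")
    if not isinstance(max_sample, int) or max_sample < 1:
        problems.append("max_sample must be a positive integer")
    return problems


print(variable_problems("carpinteria", [2012, 2014, 2016, 2018, 2020], 0.9, 1500))
# []
print(variable_problems("point_conception", [2016, 2018], 0.9, 1500))
# ['point_conception has no points to sample in 2016']
```

Returning a list instead of raising on the first error reports every problem at once.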

**OUTPUT:**

The output is a dataframe of points with the following features:

- geometry: coordinates of point *p* in the NAIP image's CRS
- naip_id: itemid of the NAIP image from which *p* was sampled
- polygon_id: id of the polygon from which *p* was sampled
- iceplant: whether point *p* corresponds to a confirmed ice plant location or a confirmed non-ice plant location (0 = non-ice plant, 1 = ice plant)
- r, g, b, nir: Red, Green, Blue, and NIR band values of the NAIP scene with naip_id at the coordinates of point *p*
- year, month, day: year, month and day when the NAIP image was collected


One dataframe per year is then saved in the 'temp' folder as a CSV file named `<aoi>_points_<year>.csv`.
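Before training, it can help to confirm that a saved table matches this description. A minimal pandas sketch, assuming the column names are exactly those in the OUTPUT list above; `schema_problems` is a hypothetical helper.

```python
import pandas as pd

# Column names as listed in the OUTPUT description above.
EXPECTED_COLUMNS = ["geometry", "naip_id", "polygon_id", "iceplant",
                    "r", "g", "b", "nir", "year", "month", "day"]


def schema_problems(df):
    """Return a list of ways `df` departs from the documented output (empty if none)."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # labels must be the documented 0/1 encoding
    if "iceplant" in df.columns and not df["iceplant"].isin([0, 1]).all():
        problems.append("iceplant labels must be 0 or 1")
    return problems
```

For example, `schema_problems(pd.read_csv(fp, index_col=0))` on one of the saved files should return an empty list if the file matches; `index_col=0` assumes the first CSV column is the index written by `to_csv`.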

In [1]:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import geopandas as gpd

import extracting_points_from_polygons as pp
```

# Specify AOI and years from which to sample points

In [10]:
```python
# ***************************************************
# ************* NOTEBOOK VARIABLES ******************

# aois = ['campus_lagoon','carpinteria','gaviota','point_conception']
aoi = 'carpinteria'

# years = array of years, can be any subset from [2012, 2014, 2016, 2018, 2020]
# except if aoi = 'point_conception' (no pts for 2016)
years = [2012, 2014, 2016, 2018, 2020]

# sample 90% of pts in each polygon
sample_fraction = 0.9

# maximum number of pts to sample in a polygon
max_sample = 1500
# ***************************************************
# ***************************************************
```

In [11]:
```python
all_pts = []

for year in years:

    # open polygons for this aoi and year
    # (all polygons in a file lie within a single NAIP image)
    fp = pp.path_to_polygons(aoi, year)
    polys = gpd.read_file(fp)

    # ice plant polygons: sample sample_fraction of pts in each polygon
    polys_ice = polys.loc[polys.iceplant == 1].reset_index(drop=True)
    pts_ice = pp.naip_sample_proportion_no_warnings(polys_ice,
                                                    polys.naip_id[0],
                                                    sample_fraction)

    print('*** ' + str(year) + ' # ice plant pts sampled')
    print(pts_ice.shape[0], '\n')

    # non-ice plant polygons: sample sample_fraction of pts in each polygon,
    # but at most max_sample pts per polygon
    polys_nonice = polys.loc[polys.iceplant == 0].reset_index(drop=True)
    pts_nonice = pp.naip_sample_sliding_no_warnings(polys_nonice,
                                                    polys.naip_id[0],
                                                    sample_fraction,
                                                    max_sample)

    print('*** ' + str(year) + ' # non ice plant pts sampled')
    print(pts_nonice.shape[0], '\n')

    # combine both labels and report the non-ice plant:ice plant balance
    pts = pd.concat([pts_ice, pts_nonice])
    print('*** ' + str(year) + ' # proportions')
    pp.iceplant_proportions(pts.iceplant)

    all_pts.append(pts)

    print('---------------------------------------')
```

```
*** 2012 # ice plant pts sampled
3433 

*** 2012 # non ice plant pts sampled
15587 

*** 2012 # proportions
no-iceplant:iceplant ratio     4.5 :1
          counts  percentage
iceplant                    
0          15587       81.95
1           3433       18.05

---------------------------------------
*** 2014 # ice plant pts sampled
2221 

*** 2014 # non ice plant pts sampled
15587 

*** 2014 # proportions
no-iceplant:iceplant ratio     7.0 :1
          counts  percentage
iceplant                    
0          15587       87.53
1           2221       12.47

---------------------------------------
*** 2016 # ice plant pts sampled
2973 

*** 2016 # non ice plant pts sampled
21135 

*** 2016 # proportions
no-iceplant:iceplant ratio     7.1 :1
          counts  percentage
iceplant                    
0          21135       87.67
1           2973       12.33

---------------------------------------
*** 2018 # ice plant pts sampled
6388 

*** 2018 # non ice plant pts sampled
24135 

*** 20
```
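The ratio and percentages printed above follow directly from the label counts. The sketch below reproduces that table with pandas; `proportions_table` is a hypothetical stand-in written for illustration, not the implementation of `pp.iceplant_proportions`, and it assumes both labels are present.

```python
import pandas as pd


def proportions_table(labels):
    """Non-ice plant:ice plant ratio plus a counts/percentage table.

    labels: Series of 0 (non-ice plant) and 1 (ice plant) labels;
            both labels are assumed to be present.
    """
    counts = labels.value_counts().sort_index()
    table = pd.DataFrame({
        "counts": counts,
        "percentage": (100 * counts / counts.sum()).round(2),
    })
    table.index.name = labels.name
    ratio = round(counts.loc[0] / counts.loc[1], 1)  # non-ice plant per ice plant point
    return ratio, table


# 2012 counts from the output above
labels_2012 = pd.Series([0] * 15587 + [1] * 3433, name="iceplant")
ratio, table = proportions_table(labels_2012)
print(ratio)  # 4.5
print(table)
```

Computing the same ratio on the concatenation of all years is one way to compare the combined training set against the 3:1 target.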

# Save points

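In [13]:
```python
# save one csv per year in the temp folder (created if missing)
out_dir = os.path.join(os.getcwd(), 'temp')
os.makedirs(out_dir, exist_ok=True)

for year, pts in zip(years, all_pts):
    fp = os.path.join(out_dir, aoi + '_points_' + str(year) + '.csv')
    pts.to_csv(fp)
```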
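To assemble a training set across years, the per-year files can be read back and stacked. A minimal pandas sketch, assuming the file naming used in the cell above and that the first CSV column is the index written by `to_csv`; `load_points` is a hypothetical helper. The geometry column should come back as text (WKT strings) rather than shapely objects, so it would need to be parsed again, for instance with `geopandas.GeoSeries.from_wkt`, before any spatial operations.

```python
import os

import pandas as pd


def load_points(folder, aoi, years):
    """Read <aoi>_points_<year>.csv for each year and stack them into one DataFrame."""
    frames = []
    for year in years:
        fp = os.path.join(folder, aoi + "_points_" + str(year) + ".csv")
        # index_col=0: the first column holds the index written by to_csv
        frames.append(pd.read_csv(fp, index_col=0))
    # per-year indices overlap, so build a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# e.g. load_points(os.path.join(os.getcwd(), "temp"), "carpinteria", [2012, 2014])
```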