# About

This notebook extracts spectral and date features from NAIP images at random points within polygons. 
The polygons used here depict verified iceplant locations within four NAIP images along the Santa Barbara County coast. 

Four methods for sampling points within polygons are implemented in the `sample_rasters.py` module, and all of them can be used in this notebook. These are:

- sample a fixed fraction of pixels in each polygon,
- sample a fixed fraction of the pixels in each polygon up to a maximum number of points,
- sample a fixed number of points from each polygon, and
- specify a total number of points and sample from polygons proportionally to their area.

Polygons vary significantly in size, so keep in mind that sampling a fixed fraction of the pixels in each polygon over-samples the larger polygons.
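
To make the difference between the four strategies concrete, the sketch below computes how many points each one would draw from polygons of very different sizes. The function `points_per_polygon`, its rounding choices, and the pixel counts are hypothetical and only mirror the notebook variables; the actual logic in `sample_rasters.py` may differ.

```python
import numpy as np

def points_per_polygon(n_pixels, param, sample_fraction=0.01,
                       max_sample=10, const_sample=2, total_pts=100):
    """Hypothetical helper: number of points to draw from each polygon."""
    n_pixels = np.asarray(n_pixels)
    if param == 'fraction':
        # fixed fraction of the pixels in each polygon
        return np.ceil(n_pixels * sample_fraction).astype(int)
    if param == 'sliding':
        # fixed fraction, capped at max_sample points per polygon
        return np.minimum(np.ceil(n_pixels * sample_fraction), max_sample).astype(int)
    if param == 'constant':
        # same number of points from every polygon
        return np.full(n_pixels.shape, const_sample, dtype=int)
    if param == 'proportional':
        # share of total_pts proportional to polygon size (rounded down)
        return np.floor(total_pts * n_pixels / n_pixels.sum()).astype(int)
    raise ValueError(f"unknown sampling method: {param!r}")

# made-up pixel counts for a small, a medium, and a large polygon
n_pixels = [200, 1_000, 50_000]
for param in ['fraction', 'sliding', 'constant', 'proportional']:
    print(param, points_per_polygon(n_pixels, param))
```

With 'fraction', almost all points come from the largest polygon, while 'sliding' and 'constant' bound the contribution of any single polygon.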

**NOTEBOOK VARIABLES:**

- `aois` (array): areas of interest where the polygons to sample were collected. Must be a subset of `['campus_lagoon','carpinteria','gaviota','point_conception']`. 

- `years` (array): can be any subset of `[2012, 2014, 2016, 2018, 2020]`.

- `sample_param` (str): determines which sampling method is used to draw samples from the iceplant and non-iceplant polygons. Must be one of 'fraction', 'sliding', 'constant', or 'proportional'. 

- `sample_fraction` (float in (0,1]): fraction of points to sample from each polygon (if using 'fraction' or 'sliding' sample).

- `max_sample` (int): maximum number of points to sample from a polygon (if using 'sliding' sample)

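- `const_sample` (int): constant number of points to sample from each polygon (if using 'constant' sample)

- `total_pts` (int): total number of points to sample across all polygons (if using 'proportional' sample)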

- `verbose` (bool): whether to print the stats of how many points were sampled per year and area of interest as the notebook runs

- `save_stats` (bool): whether to save as a csv file the stats of how many points were sampled from each year and area of interest

- `stats_csv_name` (str): where to save the stats from sampled points in the form "file_name.csv"

- `save_pts` (bool): whether to save points as a csv file in temp folder  


**OUTPUT:**

The output is a data frame of points with the following features:

- x, y: coordinates of point *p* 
- pts_crs: CRS of coordinates x, y
- naip_id: item id of the NAIP image from which *p* was sampled
- polygon_id: id of the polygon from which *p* was sampled
- iceplant: whether point *p* corresponds to a confirmed iceplant location or a confirmed non-iceplant location (0 = non-iceplant, 1 = iceplant)
- r, g, b, nir: Red, Green, Blue, and NIR values of NAIP scene with naip_id at coordinates of point *p*
- ndvi: computed for each point using the Red and NIR bands
- year, month, day_in_year: year, month, and day of the year when the NAIP image was collected
- aoi: name of the area of interest where the points were sampled from


The data frames are saved in the 'temp' folder as csv files. Filenames have the structure `aoi_points_year.csv`.
The stats are saved in the current working directory under the file name given in `stats_csv_name`.
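
As an illustration of the date features, the sketch below derives `year`, `month`, and `day_in_year` from a collection date. It assumes, only for this example, that the last token of a NAIP item id encodes the collection date as `YYYYMMDD`. The helper `date_features_from_itemid` is hypothetical, and the notebook itself may read the date from the item metadata instead.

```python
from datetime import datetime

def date_features_from_itemid(itemid):
    """Hypothetical helper: parse the trailing YYYYMMDD token of an item id."""
    date = datetime.strptime(itemid.split('_')[-1], '%Y%m%d')
    return {'year': date.year,
            'month': date.month,
            'day_in_year': date.timetuple().tm_yday}

print(date_features_from_itemid('ca_m_3412039_nw_10_060_20200522'))
# {'year': 2020, 'month': 5, 'day_in_year': 143}
```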

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import geopandas as gpd

import sample_rasters as sr

### ***************************************************
def path_2_polys(aoi):
    """
        Creates the path to the shapefile with the polygons collected at the specified aoi. 
            Parameters:
                        aoi (str): name of aoi in polygon's file name
            Returns:
                    fp (str): the constructed file path if a file exists there, otherwise None
    """    
    
    fp = os.path.join(os.getcwd(),'FINAL_iceplant_polygons',
                      aoi+'_FINAL_iceplant_polygons',
                      aoi+'_FINAL_iceplant_polygons.shp')
    
    # check there is a file at filepath
    if not os.path.exists(fp):
        print('invalid filepath: no file')
        return
    return fp
    

# Specify notebook variables

In [2]:
### ***************************************************
# ************* NOTEBOOK VARIABLES ******************

#aois = ['campus_lagoon','carpinteria','gaviota','point_conception']
aois = ['gaviota']

# years = array of years, can be any subset from [2012, 2014, 2016, 2018, 2020]
#years = [2012, 2014, 2016, 2018, 2020]
years = [2020]

sample_param = 'staggered'
const_sample = 2
sample_fraction = 0.01
max_sample = 10
total_pts = 100

# print stats as notebook runs
verbose = False

# convert to epsg 4326
convert_crs = False

# save stats
save_stats = False
stats_csv_name = 'stats_sampling_pts_from_polygons.csv'

#save points
save_pts = False

# Sample points

In [3]:
# initialize sampling statistics df
# stats = []

if save_pts:
    # create temp directory if needed
    tmp_path = os.path.join(os.getcwd(),'temp')  
    if not os.path.exists(tmp_path):
        os.mkdir(tmp_path)

# sample points
all_pts = []
for aoi in aois:
    for year in years:
        
        # there is no data for Point Conception in 2016
        if ('point_conception' != aoi) or (year != 2016):  
            # open polygons
            #fp = sr.path_to_polygons(aoi, year)
            fp = path_2_polys(aoi)
            polys = gpd.read_file(fp)
            # -------------------------
            # select iceplant polygons and sample sample_fraction of pts in each polygon 
            polys_ice = polys.loc[polys.iceplant == 1].reset_index(drop = True)
            
            pts = sr.sample_naip_from_polys_no_warnings(polys = polys_ice,
                                                            class_name = 'iceplant',
                                                            itemid = polys.aoi[0], 
                                                            param = sample_param,
                                                            sample_fraction = sample_fraction,
                                                            max_sample = max_sample,
                                                            const_sample = const_sample,
                                                            total_pts = total_pts)  
            pts['aoi'] = aoi
            # add ndvi as feature
            nir = pts.nir.astype('int16')
            red = pts.r.astype('int16')
            pts['ndvi'] = (nir - red)/(nir + red)
            if (not convert_crs) and save_pts:
                fp = sr.path_to_spectral_pts(aoi, year)            
                pts.to_csv(fp, index=False)
            if convert_crs:
                all_pts.append(pts)

[6 4 2 3 3 4 1 5 6 3 3 5 1 3 1 5 4 2 4 1 1 1 5 5 6 6 6 4]
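
The `int16` casts in the NDVI computation guard against integer wrap-around. The sketch below assumes the NAIP bands are stored as unsigned 8-bit integers (an assumption; the dtype is not shown in this notebook) and reuses the `r` and `nir` values of the first sampled point, held in NumPy arrays.

```python
import numpy as np

# band values of the first sampled point, stored as uint8 (assumed dtype)
r = np.array([121], dtype='uint8')
nir = np.array([174], dtype='uint8')

# uint8 arithmetic wraps modulo 256: 174 + 121 = 295 becomes 39
ndvi_wrapped = (nir - r) / (nir + r)

# casting to a wider signed type first gives the intended result
r16, nir16 = r.astype('int16'), nir.astype('int16')
ndvi = (nir16 - r16) / (nir16 + r16)

print(ndvi_wrapped, ndvi)
```

Without the cast the ratio falls outside the valid NDVI range of [-1, 1], which is an easy symptom to check for.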


In [4]:
pts

Unnamed: 0,x,y,pts_crs,polygon_id,iceplant,r,g,b,nir,year,month,day_in_year,naip_id,aoi,ndvi
0,754468.827557,3.817929e+06,epsg:26910,0,1,121,123,89,174,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.179661
1,754464.141891,3.817939e+06,epsg:26910,0,1,106,116,92,171,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.234657
2,754471.286636,3.817935e+06,epsg:26910,0,1,103,117,82,174,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.256318
3,754465.081795,3.817932e+06,epsg:26910,0,1,110,121,100,160,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.185185
4,754467.967108,3.817939e+06,epsg:26910,0,1,85,102,78,172,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.338521
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,758582.815321,3.818015e+06,epsg:26910,26,1,122,125,97,172,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.170068
96,755680.442795,3.818121e+06,epsg:26910,27,1,119,111,88,164,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.159011
97,755685.713693,3.818121e+06,epsg:26910,27,1,124,117,95,156,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.114286
98,755685.208877,3.818122e+06,epsg:26910,27,1,121,117,92,157,2020,5,143,ca_m_3412039_nw_10_060_20200522,gaviota,0.129496


In [None]:
# ----------------------------------
def path_to_spectral_pts(aoi, year):
    # path to the csv of sampled points for this aoi and year, inside the temp folder
    fp = os.path.join(os.getcwd(), 'temp', aoi + '_points_' + str(year) + '.csv')
    # TO DO: maybe change to _spectral_points
    return fp

In [None]:
len(pts)

In [None]:
itemid = 'ca_m_3411936_se_11_060_20200521'
rast_reader = sr.get_raster_from_item(sr.get_item_from_id(itemid))

In [None]:
rast_reader.crs

In [None]:
polys.crs

In [None]:
def count_pixels_in_polygons(polys, rast_reader):
    """
        Counts the approximate number of pixels in a raster covered by each polygon in a list.
        No need to match CRS: to do the count it internally matches the CRS of the polygons and the raster. 
            Parameters:
                        polys (geopandas.geodataframe.GeoDataFrame): 
                            GeoDataFrame with geometry column of type shapely.geometry.polygon.Polygon
                        rast_reader (rasterio.io.DatasetReader):
                            reader to the raster on which we will "overlay" the polygons to count the pixels covered
            Returns:
                    n_pixels (numpy.ndarray): 
                        approximate number of pixels from raster covered by each polygon
            
    """
    # convert to same crs as raster to properly calculate area of polygons
    if polys.crs != rast_reader.crs:
        print('matched crs')
        polys = polys.to_crs(rast_reader.crs)
    
    # area of a single pixel from raster resolution    
    pixel_size = rast_reader.res[0]*rast_reader.res[1]
    
    # get approx number of pixels by dividing polygon area by pixel_size
    n_pixels = polys.geometry.apply(lambda p: int(p.area/pixel_size))
    
    return n_pixels.to_numpy()

In [None]:
n_pixels = count_pixels_in_polygons(polys, rast_reader)

In [None]:
total_pixels = np.sum(n_pixels)
#n_pts = [ total_pts * x/total_pixels for x in n_pixels]
n_pts = n_pixels / total_pixels * total_pts

In [None]:
n_pixels

In [None]:
n_pts_int = n_pts.astype('int')
n_pts_int
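
Truncating `n_pts` to integers, as in the last cell, can leave the total below `total_pts`. One possible fix, not taken from `sample_rasters.py`, is a largest-remainder allocation: round every share down, then hand the leftover points to the polygons with the largest fractional parts. The function name `allocate_points` and the pixel counts below are illustrative.

```python
import numpy as np

def allocate_points(n_pixels, total_pts):
    """Split total_pts among polygons proportionally to n_pixels (largest remainder)."""
    n_pixels = np.asarray(n_pixels)
    shares = total_pts * n_pixels / n_pixels.sum()   # exact, fractional shares
    n_pts = np.floor(shares).astype(int)             # round every share down
    leftover = total_pts - n_pts.sum()               # points still unassigned
    # give one extra point to the polygons with the largest fractional parts
    order = np.argsort(shares - n_pts)[::-1]
    n_pts[order[:leftover]] += 1
    return n_pts

# made-up pixel counts for four polygons
n_pixels = np.array([120, 45, 300, 35])
print(allocate_points(n_pixels, total_pts=10))
```

The result always sums to `total_pts`, and no polygon receives more than one point above its truncated share.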