# About

This notebook adds canopy height information to the points sampled in the notebook `2_sample_pts_from_polygons.ipynb` (the CSVs with the point information are located in the temp folder). The canopy height rasters for Santa Barbara County were obtained from the California Forest Observatory (CFO) using the `1_download_CFO_canopy_height_raster.ipynb` notebook and are located in the SantabarbaraCounty_lidar folder.


This notebook creates four temporary rasters from the CFO canopy height layer *H* to derive additional canopy height features: avg_lidar, max_lidar, min_lidar, and min_max_diff. For a given year, the avg_lidar layer is created by replacing the value of each pixel *p* in *H* with the average of the values of *H* in a 3x3 window centered at *p* (effectively a convolution of the raster *H* with a 3x3 matrix with constant weights 1/9). The max_lidar layer is created by replacing the value of each pixel *p* in *H* with the maximum value of *H* in a 3x3 window centered at *p*. The min_lidar layer is created in the same way, taking the minimum value over the window instead. Finally, the min_max_diff layer is the difference between the max_lidar and min_lidar layers. All the functions that create these raster layers and sample information from them are in `lidar_sampling_functions`.
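
The window operations described above can be illustrated on a small array. This is only a sketch, not the code in `lidar_sampling_functions`: the helper name `window_features` is hypothetical, and repeating edge values at the raster border is an assumption, since the border handling is not stated here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_features(H):
    """Return avg, max, min and max-min over a 3x3 window around each pixel of H."""
    # Assumption: border pixels are handled by repeating the edge values.
    padded = np.pad(np.asarray(H, dtype=float), 1, mode="edge")
    # windows[i, j] is the 3x3 neighbourhood centered at pixel (i, j) of H
    windows = sliding_window_view(padded, (3, 3))
    avg_lidar = windows.mean(axis=(2, 3))
    max_lidar = windows.max(axis=(2, 3))
    min_lidar = windows.min(axis=(2, 3))
    min_max_diff = max_lidar - min_lidar
    return avg_lidar, max_lidar, min_lidar, min_max_diff


# Toy canopy height raster (values are made up for illustration)
H = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
avg_lidar, max_lidar, min_lidar, min_max_diff = window_features(H)
```

For the center pixel of this toy raster the window covers the whole array, so the average is 4, the maximum is 8, the minimum is 0 and the difference is 8.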


**NOTEBOOK VARIABLES:**

- `aoi_year` (int): the year of the points which will have lidar features added. Must be one of 2012, 2014, 2016, 2018 or 2020.

- `aois` (array): the areas of interest of the points which will have lidar features added. Must be a subset of `['campus_lagoon','carpinteria','gaviota', 'point_conception']`.

- `lidar_year` (int): the year of canopy height data from which to sample the lidar features for the points. Must be 2016, 2018, or 2020. Ideally, `aoi_year = lidar_year`, but due to data availability it is recommended to make `lidar_year=2016` when `aoi_year` equals 2014 or 2012. 

- `delete_pts` (bool): whether to delete the files with the original points.

Note: no points were sampled from point_conception in 2016. The notebook automatically excludes this combination.
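
The constraints on the notebook variables can be checked before running anything else. The sketch below is illustrative only: `check_notebook_variables` and `recommended_lidar_year` are hypothetical helpers that encode the options and the recommendation listed above; they are not part of `lidar_sampling_functions`.

```python
VALID_AOI_YEARS = {2012, 2014, 2016, 2018, 2020}
VALID_LIDAR_YEARS = {2016, 2018, 2020}
VALID_AOIS = {'campus_lagoon', 'carpinteria', 'gaviota', 'point_conception'}


def check_notebook_variables(aoi_year, aois, lidar_year):
    """Raise ValueError if a variable is outside the documented options."""
    if aoi_year not in VALID_AOI_YEARS:
        raise ValueError(f'aoi_year must be one of {sorted(VALID_AOI_YEARS)}')
    unknown = set(aois) - VALID_AOIS
    if unknown:
        raise ValueError(f'unknown aois: {sorted(unknown)}')
    if lidar_year not in VALID_LIDAR_YEARS:
        raise ValueError(f'lidar_year must be one of {sorted(VALID_LIDAR_YEARS)}')


def recommended_lidar_year(aoi_year):
    """Use the matching canopy height year when it exists, otherwise 2016."""
    return aoi_year if aoi_year in VALID_LIDAR_YEARS else 2016


aoi_year = 2014
lidar_year = recommended_lidar_year(aoi_year)  # 2016, following the recommendation above
check_notebook_variables(aoi_year, ['gaviota'], lidar_year)
```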


**OUTPUT:**
For each csv of points from the specified year and aoi, the notebook creates a dataframe with the original features from the initial points dataset (see notebook `2_sample_pts_from_polygons`) augmented with the canopy height, avg_lidar, max_lidar, min_lidar, and min_max_diff features sampled at the point locations from the CFO canopy height raster for year `lidar_year`.
Each dataframe is then saved as a csv file in the 'temp' folder.

In [1]:
import numpy as np
import pandas as pd
import os

# Assemble data frame with all sampled points

In [2]:
def path_tp_points_csv(aoi, year):
    # path to the csv of points sampled for the given aoi and year (in the temp folder)
    fp = os.path.join(os.getcwd(),
                      'temp',
                      aoi + '_points_' + str(year) + '.csv')
    return fp

In [3]:
years = [2012,2014,2016,2018,2020]
aois = ['campus_lagoon','carpinteria','gaviota','point_conception']

In [4]:
li = []

for aoi in aois:
    for year in years:
        if ('point_conception' != aoi) or (year != 2016):  # there is no data for Point Conception in 2016
            sample = pd.read_csv(path_tp_points_csv(aoi,year))
            li.append(sample)

df = pd.concat(li, axis=0)
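
An alternative to hard-coding the missing aoi and year combination is to skip any csv that does not exist. This is only a sketch: `load_points` is a hypothetical helper, and the demo writes two tiny made-up files to a temporary directory instead of reading the real `temp` folder.

```python
import os
import tempfile

import pandas as pd


def load_points(folder, aois, years):
    """Concatenate every '<aoi>_points_<year>.csv' found in folder."""
    frames = []
    for aoi in aois:
        for year in years:
            fp = os.path.join(folder, f'{aoi}_points_{year}.csv')
            if os.path.exists(fp):  # silently skip combinations with no file
                frames.append(pd.read_csv(fp))
    # ignore_index=True gives a fresh 0..n-1 index, so no reset_index is needed
    return pd.concat(frames, ignore_index=True)


# Demo with made-up files; point_conception has no csv here and is skipped
with tempfile.TemporaryDirectory() as folder:
    for aoi, year in [('gaviota', 2016), ('gaviota', 2018)]:
        pd.DataFrame({'aoi': [aoi], 'year': [year]}).to_csv(
            os.path.join(folder, f'{aoi}_points_{year}.csv'), index=False)
    demo = load_points(folder, ['gaviota', 'point_conception'], [2016, 2018])
```

Note that `pd.concat` raises a `ValueError` when no file is found at all, and that skipping silently can hide a misnamed file, which is a reason to keep the explicit check when the set of missing combinations is known.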

In [5]:
df.reset_index(drop=True,inplace=True)

In [6]:
df.drop(['Unnamed: 0'],axis=1, inplace=True)
df.head(3)

Unnamed: 0,geometry,iceplant,polygon_id,r,g,b,nir,x,y,year,month,day_in_year,naip_id,aoi
0,POINT (238556.9134408507 3810784.7933008512),1,0,142,121,100,173,238556.913441,3810785.0,2012,5,126,ca_m_3411934_sw_11_1_20120505_20120730,campus_lagoon
1,POINT (238557.1191380456 3810770.398440421),1,0,138,121,104,158,238557.119138,3810770.0,2012,5,126,ca_m_3411934_sw_11_1_20120505_20120730,campus_lagoon
2,POINT (238564.35314380075 3810810.2954679746),1,0,144,126,119,155,238564.353144,3810810.0,2012,5,126,ca_m_3411934_sw_11_1_20120505_20120730,campus_lagoon


In [7]:
df.columns

Index(['geometry', 'iceplant', 'polygon_id', 'r', 'g', 'b', 'nir', 'x', 'y',
       'year', 'month', 'day_in_year', 'naip_id', 'aoi'],
      dtype='object')

In [12]:
iceplant_proportions(df.iceplant)

no-iceplant:iceplant ratio     2.4 :1
          counts  percentage
iceplant                    
0         418741       70.21
1         177691       29.79
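
The same kind of summary can also be obtained with pandas alone. The sketch below uses a made-up label series rather than the notebook's `df`, and is not the `iceplant_proportions` function used above.

```python
import pandas as pd

# Made-up labels: 0 = no iceplant, 1 = iceplant
labels = pd.Series([0, 0, 0, 1], name='iceplant')

counts = labels.value_counts().sort_index()
percentage = (labels.value_counts(normalize=True).sort_index() * 100).round(2)
summary = pd.DataFrame({'counts': counts, 'percentage': percentage})

# Ratio of no-iceplant to iceplant points
ratio = round(counts[0] / counts[1], 1)
```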



## Stats about sample distribution among aois and scenes

In [8]:
#checking all data was loaded
df.aoi.value_counts()

point_conception    224311
campus_lagoon       168714
carpinteria         126141
gaviota              77266
Name: aoi, dtype: int64

In [9]:
df.year.value_counts()

2020    178213
2018    157560
2014    105826
2012     86932
2016     67901
Name: year, dtype: int64

In [10]:
df.naip_id.value_counts()

ca_m_3412037_nw_10_060_20200607             89506
ca_m_3411934_sw_11_060_20180722_20190209    72441
ca_m_3412037_nw_10_1_20140603_20141030      62414
ca_m_3412037_nw_10_1_20120518_20120730      37126
ca_m_3412037_nw_10_060_20180913_20190208    35265
ca_m_3411936_se_11_060_20200521             34682
ca_m_3411934_sw_11_060_20200521             34267
ca_m_3411936_se_11_060_20180724_20190209    30523
ca_m_3411934_sw_11_.6_20160713_20161004     26768
ca_m_3411936_se_11_.6_20160713_20161004     24108
ca_m_3411934_sw_11_1_20120505_20120730      19816
ca_m_3412039_nw_10_060_20200522             19758
ca_m_3412039_nw_10_060_20180724_20190209    19331
ca_m_3411936_se_11_1_20120505_20120730      19020
ca_m_3411936_se_11_1_20140901_20141030      17808
ca_m_3412039_nw_10_.6_20160616_20161004     17025
ca_m_3411934_sw_11_1_20140601_20141030      15422
ca_m_3412039_nw_10_1_20120518_20120730      10970
ca_m_3412039_nw_10_1_20140603_20141030      10182
Name: naip_id, dtype: int64

In [13]:
len(df.naip_id.value_counts())

19
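
A cross-tabulation of aoi against year is a compact way to see which combinations have no points. The sketch below uses a toy dataframe with made-up rows; applied to the notebook's `df`, the call would be `pd.crosstab(df.aoi, df.year)`.

```python
import pandas as pd

# Toy data standing in for the notebook's df
toy = pd.DataFrame({
    'aoi':  ['gaviota', 'gaviota', 'point_conception', 'point_conception'],
    'year': [2016, 2018, 2018, 2020],
})

# Rows are aois, columns are years, cells are point counts (0 when absent)
counts = pd.crosstab(toy.aoi, toy.year)

# aoi and year pairs with no points at all
missing = [(aoi, year)
           for aoi in counts.index
           for year in counts.columns
           if counts.loc[aoi, year] == 0]
```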

## Add NDVI feature

In [16]:
# df['ndvi']=(df.nir.astype('int16') - df.r.astype('int16'))/(df.nir.astype('int16') + df.r.astype('int16'))
# df.head(3)

Unnamed: 0,iceplant,r,g,b,nir,year,month,day,naip_id,polygon_id,geometry,lidar,max_lidar,min_lidar,min_max_diff,avg_lidar,day_in_year,aoi,ndvi
0,1,134,125,103,170,2012,5,5,ca_m_3411934_sw_11_1_20120505_20120730,0,POINT (238565.79498225075 3810768.627232482),2,2,0,2,1.111111,126,campus_lagoon,0.118421
1,1,130,114,101,164,2012,5,5,ca_m_3411934_sw_11_1_20120505_20120730,0,POINT (238553.15545424985 3810802.7926417096),2,3,0,3,0.888889,126,campus_lagoon,0.115646
2,1,132,110,98,160,2012,5,5,ca_m_3411934_sw_11_1_20120505_20120730,0,POINT (238552.77597268307 3810773.0767946127),1,3,0,3,1.222222,126,campus_lagoon,0.09589
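
The NDVI feature is the normalized difference (nir - r) / (nir + r). The sketch below reproduces the `ndvi` values of the three rows displayed above from their `r` and `nir` values. Storing the bands as `uint8` is an assumption made here to show why the formula casts to `int16` first: without the cast, `nir + r` could wrap around at 255.

```python
import pandas as pd

# r and nir values copied from the three rows displayed above;
# the uint8 storage type is an assumption
pts = pd.DataFrame({'r': [134, 130, 132],
                    'nir': [170, 164, 160]}, dtype='uint8')

# Cast to a wider signed type before doing arithmetic on the bands
nir = pts.nir.astype('int16')
r = pts.r.astype('int16')
pts['ndvi'] = (nir - r) / (nir + r)
```

A point with `nir + r == 0` would produce a non-finite value, so such rows would need separate handling.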


In [17]:
# ORDER COLUMNS

df = df[['r','g','b','nir','ndvi',     # spectral
         'year','month','day_in_year', # date
         'lidar', 'max_lidar', 'min_lidar', 'min_max_diff', 'avg_lidar', # lidar
         'iceplant',                  
         'geometry',         # point coords (CRS is one from scene in naip_id)
         'aoi','naip_id', 'polygon_id']]
df.head(3)

Unnamed: 0,r,g,b,nir,ndvi,year,month,day_in_year,lidar,max_lidar,min_lidar,min_max_diff,avg_lidar,iceplant,geometry,aoi,naip_id,polygon_id
0,134,125,103,170,0.118421,2012,5,126,2,2,0,2,1.111111,1,POINT (238565.79498225075 3810768.627232482),campus_lagoon,ca_m_3411934_sw_11_1_20120505_20120730,0
1,130,114,101,164,0.115646,2012,5,126,2,3,0,3,0.888889,1,POINT (238553.15545424985 3810802.7926417096),campus_lagoon,ca_m_3411934_sw_11_1_20120505_20120730,0
2,132,110,98,160,0.09589,2012,5,126,1,3,0,3,1.222222,1,POINT (238552.77597268307 3810773.0767946127),campus_lagoon,ca_m_3411934_sw_11_1_20120505_20120730,0


In [18]:
df.to_csv(os.path.join(os.getcwd(),'samples_for_model.csv'), index=False)

In [11]:
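def iceplant_proportions(labels):
    """Print the no-iceplant:iceplant ratio and the per-class counts and percentages.

    Assumes `labels` is binary (0 = no iceplant, 1 = iceplant) with both classes present.
    """
    unique, counts = np.unique(labels, return_counts=True)
    print('no-iceplant:iceplant ratio    ',round(counts[0]/counts[1],1),':1')
    n = labels.shape[0]
    perc = [round(counts[0]/n*100,2), round(counts[1]/n*100,2)]
    # named `summary` to avoid shadowing the notebook-level `df`
    summary = pd.DataFrame({'iceplant':unique,
             'counts':counts,
             'percentage':perc}).set_index('iceplant')
    print(summary)
    print()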