In [2]:
# Import useful libraries
import datetime
import json

import ee
import ipyfilechooser
import ipywidgets
import pandas as pd
import pydeck as pdk
import sentinel_satellites
from IPython.display import display

In [3]:
# Authenticate to and initialize the Google Earth Engine API
ee.Authenticate()
ee.Initialize()


Successfully saved authorization token.


# Feature Extraction

The aim of the project is to build a Machine Learning model that detects, from satellite data, the dates on which a crop field has been manured.

Before considering any model, or even running an exploratory analysis, we extract all the features (both optical and radar) that the Sentinel satellites provide over the specified crop fields.

Feature extraction is fundamental in Earth Observation (EO) because it turns large and complex satellite data into a manageable set of informative variables. EO involves collecting and analysing data about the Earth's surface and atmosphere from satellites and other airborne sensors. The data acquired by these sensors are usually vast and complex, and the information they contain must be extracted before it can be interpreted.

## Import a JSON file containing crop field details

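The JSON file must be a GeoJSON FeatureCollection in which every feature complies with a few simple rules ([example](../Datasets/main/main-fields.json)):
* its properties contain the **crop field name** (`crop_field_name`), which must be unique, since the datasets are later merged on it
* its properties contain the **manure dates** (`manure_dates`), the list of dates on which the field was manured, used as target labels
* its geometry contains the **set of coordinates** composing the crop field (a closed Polygon)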
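The loading cell below reads `data['features']`, each feature's `properties` (`crop_field_name`, `manure_dates`) and `geometry['coordinates']`, i.e. a GeoJSON-style FeatureCollection of Polygons. Assuming that layout, the following sketch shows a minimal hypothetical file (the field name `FIELD-A` and its coordinates are made up) and a `validate_fields` helper, not part of any library, that checks for unique names and closed exterior rings before anything is sent to GEE.

```python
import json

# Hypothetical minimal input: one Polygon feature per crop field,
# coordinates given as [longitude, latitude] as in GeoJSON
EXAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"crop_field_name": "FIELD-A", "manure_dates": ["2022-05-26"]},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-4.20, 43.39], [-4.19, 43.39], [-4.19, 43.40], [-4.20, 43.39]]]
      }
    }
  ]
}
"""


def validate_fields(data):
    """Return a list of problems found in a crop-fields FeatureCollection (empty if valid)."""
    problems = []
    names = set()
    for i, feature in enumerate(data.get("features", [])):
        name = feature.get("properties", {}).get("crop_field_name")
        if not name:
            problems.append(f"feature {i}: missing crop_field_name")
        elif name in names:
            problems.append(f"feature {i}: duplicate crop_field_name {name!r}")
        else:
            names.add(name)
        geometry = feature.get("geometry") or {}
        rings = geometry.get("coordinates") or []
        if geometry.get("type") != "Polygon" or not rings:
            problems.append(f"feature {i}: geometry is not a Polygon")
            continue
        exterior = rings[0]
        # A closed ring needs at least 4 positions, the last one repeating the first
        if len(exterior) < 4 or exterior[0] != exterior[-1]:
            problems.append(f"feature {i}: exterior ring is not closed")
    return problems


print(validate_fields(json.loads(EXAMPLE)))  # → []
```

On the real file, `validate_fields(data)` can be called right after `json.load`.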

In [4]:
# Choose the file (it must be a JSON file)
file_chooser = ipyfilechooser.FileChooser(path='../Datasets/main/', filename='main-fields.json', select_default=True, use_dir_icons=True, filter_pattern='*.json')
display(file_chooser)


In [5]:
# Load JSON data from file
with open(file_chooser.selected) as fp:
    data = json.load(fp)

# Create DataFrame with all the properties except the 'manure_dates' column
fields_df = pd.DataFrame([{k: v for k, v in feat['properties'].items() if k != 'manure_dates'} for feat in data['features']])

# Add a column with the coordinates of each field
# (only the exterior ring of each Polygon, so that holes cannot misalign the rows)
fields_df['polygon_coordinates'] = [[tuple(c) for c in feat['geometry']['coordinates'][0]] for feat in data['features']]

# Create a dataframe that has only the crop_field_name and manure_dates columns
y_df = pd.DataFrame([{k: feat['properties'][k] for k in ('crop_field_name', 'manure_dates')} for feat in data['features']])

In [6]:
# Show the entire dataframe
entire_df = fields_df.merge(y_df, on='crop_field_name')
entire_df

|   | crop_field_name | polygon_coordinates | manure_dates |
|---|---|---|---|
| 0 | P-BLD | [(-4.202723286616649, 43.39683579015289), (-4.... | [2022-05-26] |
| 1 | P-BLLT1 | [(-4.085622203603083, 43.429605845026266), (-4... | [2022-05-16] |
| 2 | P-BLLT2 | [(-4.084840437376829, 43.430826294936246), (-4... | [2022-05-26] |
| 3 | P-CBRCS1 | [(-4.200826431306206, 43.39067464298489), (-4.... | [2022-05-26] |
| 4 | P-CBRCS2 | [(-4.204911872695676, 43.3876170244562), (-4.2... | [2022-05-26] |
| 5 | P-CLGT | [(-4.111699726693341, 43.39830644556494), (-4.... | [2022-05-16] |
| 6 | P-CLMBRS | [(-4.544769098140127, 43.38040395682432), (-4.... | [2022-05-26] |
| 7 | P-CMNTR | [(-4.147208715069137, 43.40038457218137), (-4.... | [2022-05-16] |
| 8 | P-DR | [(-4.142486752802821, 43.396858931472195), (-4... | [2022-03-21] |
| 9 | P-FNFR | [(-4.265940418729373, 43.38866671614796), (-4.... | [2022-05-16] |


## Show crop fields locations on Earth-map
The objective is to show where the crop fields are located geographically. This provides several benefits:
* **Spatial context:** it makes the geographic distribution of the fields easier to understand, which is especially useful for people who are not familiar with the area or the crops being considered
* **Data exploration:** it makes the data contained in the JSON file easier to explore: users can zoom in and out, pan, and hover over the fields to focus on specific areas
* **Data validation:** users can visually confirm that the crop fields are in the correct locations and spot potential errors or discrepancies in the data (for example, swapped latitude and longitude)
* **Communication:** the map can be shared with stakeholders or the public to help them understand the geographic distribution of the crop fields

In [7]:
# Define the polygon layer (pickable, so hovering a field highlights it and shows a tooltip)
layer = pdk.Layer(
    'PolygonLayer',
    data=entire_df,
    get_polygon='polygon_coordinates',
    get_fill_color=[255, 255, 0, 100],
    get_line_color=[255, 255, 0, 100],
    stroked=True,
    filled=True,
    lineWidthMinPixels=3,
    pickable=True,
    auto_highlight=True,
)

# Define the initial view state of the map
view_state = pdk.ViewState(
    longitude=fields_df.polygon_coordinates[0][0][0],
    latitude=fields_df.polygon_coordinates[0][0][1],
    zoom=7.8
)

# Create the map with the layer, the initial view state and a tooltip showing the field name
r = pdk.Deck(layers=[layer], initial_view_state=view_state, tooltip={'text': '{crop_field_name}'})

# Show the map
r.show()



The map shows that all the fields are located in northern Spain. A model trained only on these fields may therefore not generalize to regions with different climates, soils, or farming practices.
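To put a number on how concentrated the dataset is, a small helper (hypothetical, not part of the notebook) can compute the bounding box of all field vertices from the `polygon_coordinates` column, where each vertex is a `(longitude, latitude)` tuple. Its midpoint would also be a reasonable alternative to the first vertex of the first field as the map's initial view.

```python
def fields_extent(polygons):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) over all vertices of all polygons."""
    lons = [lon for polygon in polygons for lon, _ in polygon]
    lats = [lat for polygon in polygons for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def extent_center(extent):
    """Midpoint (lon, lat) of a bounding box."""
    min_lon, min_lat, max_lon, max_lat = extent
    return (min_lon + max_lon) / 2, (min_lat + max_lat) / 2


# Two made-up fields, each a list of (lon, lat) vertices
example = [
    [(-4.20, 43.39), (-4.19, 43.40), (-4.20, 43.39)],
    [(-4.54, 43.38), (-4.53, 43.38), (-4.54, 43.38)],
]
print(fields_extent(example))  # → (-4.54, 43.38, -4.19, 43.4)
```

In the notebook, the same call would be `fields_extent(fields_df.polygon_coordinates)`.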

## Feature extraction using the sentinel-satellites PyPI library

The objective is to generate a dataset that contains, for each field and for each pass of the Sentinel-1 and Sentinel-2 satellites over it within a user-specified period, all the physical indicators that will later be used to build the final model.

This procedure is designed to run in parallel to exploit the computational power of the machine: each field is independent of the others, so several fields can be processed at the same time (see the `fields_threads` argument in the calls below).
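The independence argument can be sketched with Python's `concurrent.futures`: one task per field, a fixed pool of worker threads, and results collected in input order. This is an illustration under assumptions, not the library's implementation: `extract_field_features` is a hypothetical placeholder, and the `fields_threads` name only mirrors the argument used in the `get_features` calls.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_field_features(field_name):
    """Hypothetical stand-in for the per-field work (e.g. querying GEE for one field)."""
    return {"crop_field_name": field_name, "rows": []}


def extract_all(field_names, fields_threads=4):
    """Run the per-field work concurrently; results keep the input order."""
    # Threads suit this workload because each task mostly waits on network I/O
    with ThreadPoolExecutor(max_workers=fields_threads) as executor:
        return list(executor.map(extract_field_features, field_names))


results = extract_all(["P-BLD", "P-DR", "P-VNS"], fields_threads=2)
print([r["crop_field_name"] for r in results])  # → ['P-BLD', 'P-DR', 'P-VNS']
```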

Why use sentinel-satellites? The [sentinel-satellites](https://pypi.org/project/sentinel-satellites/) PyPI library provides **a useful, user-friendly and powerful toolset for extracting Sentinel data in Python** (built on the GEE APIs), and can be a valuable resource for researchers, analysts, and others working with Earth observation data.

### Select the time span for feature extraction

In [8]:
start_date_widget = ipywidgets.widgets.DatePicker(description='Start date', value=datetime.date(2022, 1, 1), disabled=False)
display(start_date_widget)

end_date_widget = ipywidgets.widgets.DatePicker(description='End date', value=datetime.date(2022, 12, 31), disabled=False)
display(end_date_widget)

DatePicker(value=datetime.date(2022, 1, 1), description='Start date')

DatePicker(value=datetime.date(2022, 12, 31), description='End date')
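Before launching a long extraction it can be worth checking the selected period. The sketch below uses hypothetical helpers, not part of sentinel-satellites: it rejects an empty or inverted range and formats the dates as ISO strings. Earth Engine's `ImageCollection.filterDate` treats its end date as exclusive. Whether `get_features` compensates for this is not stated here, so if acquisitions on the end date itself matter, it is worth checking; `exclusive_end` shows how to shift the bound by one day if needed.

```python
import datetime


def period_to_iso(start, end):
    """Validate a [start, end] period and return it as ISO 'YYYY-MM-DD' strings."""
    # A cleared DatePicker has value None
    if start is None or end is None:
        raise ValueError("both start and end dates must be selected")
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start.isoformat(), end.isoformat()


def exclusive_end(end):
    """Day after `end`, for APIs whose end bound is exclusive."""
    return end + datetime.timedelta(days=1)


print(period_to_iso(datetime.date(2022, 1, 1), datetime.date(2022, 12, 31)))
# → ('2022-01-01', '2022-12-31')
print(exclusive_end(datetime.date(2022, 12, 31)))  # → 2023-01-01
```

With the widgets above: `period_to_iso(start_date_widget.value, end_date_widget.value)`.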

### Sentinel 2 (optical features)

In [9]:
# Get the mean value of every feature for each crop field in the dataframe, within the selected period, using Sentinel-2 data
fields_s2_features_extracted_df = sentinel_satellites.get_features(fields_df, start_date_widget.value, end_date_widget.value, sentinel=2, fields_threads=4)
# Add the manure dates (target labels), matching rows on the field name
fields_s2_features_extracted_df = fields_s2_features_extracted_df.merge(y_df, on='crop_field_name')

In [10]:
# Show the dataframe
fields_s2_features_extracted_df

|     | crop_field_name | s2_acquisition_date | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | ... | CARI1 | CARI2 | MCARI | MCARI1 | MCARI2 | BSI | GLI | ALTERATION | SDI | manure_dates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | P-BLD | 2022-01-06 | 2.260204 | 119.981293 | 550.044218 | 234.045918 | 1055.875850 | 3447.054422 | 3945.947279 | 4264.421769 | ... | 4.454413e+05 | 4982.086081 | -445.160158 | 6251.747919 | 0.000112 | -0.401654 | 0.576730 | 2.146924 | 3446.937773 | ['2022-05-26'] |
| 1 | P-BLD | 2022-01-16 | 77.833333 | 163.544218 | 558.989796 | 246.552721 | 1064.484694 | 3564.574830 | 4160.479592 | 4440.142857 | ... | 4.278992e+05 | 5022.796735 | -478.064673 | 6495.934367 | 0.000106 | -0.401654 | 0.521299 | 2.169754 | 3599.301217 | ['2022-05-26'] |
| 2 | P-BLD | 2022-01-26 | 1092.221088 | 1174.479592 | 1585.828231 | 1284.115646 | 2188.639456 | 4718.545918 | 5334.006803 | 5582.107143 | ... | 3.655315e+05 | 10421.150048 | -1599.747754 | 6645.637753 | 0.000068 | -0.262987 | 0.123052 | 1.496120 | 3780.577055 | ['2022-05-26'] |
| 3 | P-BLD | 2022-02-05 | 1271.739796 | 1333.068027 | 1797.221088 | 1433.763605 | 2455.855442 | 5567.085034 | 6378.239796 | 6670.309524 | ... | 3.806352e+05 | 12805.443716 | -1797.251256 | 8365.554349 | 0.000057 | -0.267715 | 0.134532 | 1.550023 | 4697.868215 | ['2022-05-26'] |
| 4 | P-BLD | 2022-02-10 | 2838.678571 | 2966.909864 | 3020.103741 | 2772.358844 | 3200.421769 | 4275.404762 | 4614.176871 | 4698.974490 | ... | 1.233776e+06 | 7863.844417 | -2495.168966 | 3164.277716 | 0.000045 | -0.170590 | 0.026876 | 1.226493 | 2537.616534 | ['2022-05-26'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 610 | P-VNS | 2022-08-19 | 1253.485202 | 1417.915695 | 1698.800897 | 1708.236771 | 2243.918386 | 3324.786547 | 3697.008072 | 3859.205381 | ... | 3.286727e+05 | 9414.838817 | -1714.781734 | 2654.091272 | 0.000066 | 0.047329 | 0.032852 | 1.379813 | 793.727002 | ['2022-04-23'] |
| 611 | P-VNS | 2022-10-23 | 4059.667265 | 4026.379372 | 3709.677130 | 3366.669058 | 3713.954260 | 4235.782063 | 4364.606278 | 4536.184753 | ... | 1.298689e+06 | 7940.244297 | -2953.701540 | 1992.912193 | 0.000030 | -0.102150 | -0.001769 | 1.107208 | 1347.519631 | ['2022-04-23'] |
| 612 | P-VNS | 2022-11-12 | 1443.568610 | 1602.507623 | 1970.007175 | 1972.767713 | 2587.695964 | 3854.926457 | 4186.630493 | 4406.135426 | ... | 3.672974e+05 | 12126.114263 | -1986.483171 | 4071.528794 | 0.000060 | -0.035251 | 0.063214 | 1.417535 | 1860.949704 | ['2022-04-23'] |
| 613 | P-VNS | 2022-11-17 | 1133.057399 | 1364.994619 | 1726.755157 | 1631.587444 | 2309.713004 | 3881.580269 | 4315.133632 | 4503.217040 | ... | 3.714358e+05 | 10259.538288 | -1723.213210 | 4921.807863 | 0.000069 | -0.084921 | 0.088036 | 1.469676 | 2301.094478 | ['2022-04-23'] |


#### Store the obtained dataset

In [11]:
# Save as gzip-compressed .csv files to take less disk space
filename = file_chooser.selected_path + '/' + file_chooser.selected_filename.split('.')[0]
fields_s2_features_extracted_df.to_csv(filename + '-s2-features-extracted.gz', header=True, index=False, compression='gzip')
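When the file is read back (for example, in the modelling notebook), two columns need care: `to_csv` writes list cells such as `manure_dates` as their string representation, e.g. `"['2022-05-26']"`, and the acquisition dates come back as plain strings. The sketch below is a loader that assumes the column names shown in the tables above; `load_features` is a hypothetical helper.

```python
import ast

import pandas as pd


def load_features(path, date_column):
    """Read a gzip-compressed features CSV, restoring the date and list columns."""
    df = pd.read_csv(path, compression="gzip", parse_dates=[date_column])
    # List cells were written as their repr, e.g. "['2022-05-26']"; parse them back safely
    df["manure_dates"] = df["manure_dates"].apply(ast.literal_eval)
    return df
```

For the file just written: `load_features(filename + '-s2-features-extracted.gz', 's2_acquisition_date')`.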

### Sentinel 1 (radar features)

In [12]:
# Get the mean value of every feature for each crop field in the dataframe, within the selected period, using Sentinel-1 data
fields_s1_features_extracted_df = sentinel_satellites.get_features(fields_df, start_date_widget.value, end_date_widget.value, sentinel=1, fields_threads=4)
# Add the manure dates (target labels), matching rows on the field name
fields_s1_features_extracted_df = fields_s1_features_extracted_df.merge(y_df, on='crop_field_name')

In [13]:
# Show the dataframe
fields_s1_features_extracted_df

|      | crop_field_name | s1_acquisition_date | VV | VH | AVE | DIF | RAT1 | RAT2 | NDI | RVI | manure_dates |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | P-BLD | 2022-01-07 | 0.092826 | 0.024981 | 0.058904 | 0.067845 | 4.433061 | 0.310939 | 0.550754 | 0.898493 | ['2022-05-26'] |
| 1 | P-BLD | 2022-01-08 | 0.048492 | 0.010486 | 0.029489 | 0.038006 | 6.347387 | 0.236885 | 0.633442 | 0.733116 | ['2022-05-26'] |
| 2 | P-BLD | 2022-01-19 | 0.063989 | 0.012711 | 0.038350 | 0.051278 | 7.147208 | 0.241408 | 0.643399 | 0.713202 | ['2022-05-26'] |
| 3 | P-BLD | 2022-01-20 | 0.040962 | 0.012790 | 0.026876 | 0.028172 | 4.073362 | 0.396756 | 0.473732 | 1.052536 | ['2022-05-26'] |
| 4 | P-BLD | 2022-01-31 | 0.053555 | 0.012323 | 0.032939 | 0.041232 | 6.004133 | 0.261394 | 0.619382 | 0.761236 | ['2022-05-26'] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1914 | P-VNS | 2022-11-28 | 0.123947 | 0.027985 | 0.075966 | 0.095962 | 6.340434 | 0.270724 | 0.604252 | 0.791495 | ['2022-04-23'] |
| 1915 | P-VNS | 2022-12-09 | 0.076697 | 0.021648 | 0.049173 | 0.055049 | 5.111079 | 0.342721 | 0.532146 | 0.935708 | ['2022-04-23'] |
| 1916 | P-VNS | 2022-12-10 | 0.089545 | 0.014371 | 0.051958 | 0.075175 | 7.774091 | 0.207425 | 0.682870 | 0.634261 | ['2022-04-23'] |
| 1917 | P-VNS | 2022-12-21 | 0.057072 | 0.012252 | 0.034662 | 0.044821 | 6.242575 | 0.271425 | 0.603557 | 0.792885 | ['2022-04-23'] |
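The radar columns look like simple combinations of the VV and VH backscatter (the values appear to be linear power rather than dB). The linear combinations match the field means directly: in row 0, (0.092826 + 0.024981) / 2 ≈ 0.058904 = AVE and 0.092826 - 0.024981 = 0.067845 = DIF. The ratios do not: VV / VH ≈ 3.72, while RAT1 is 4.43. This is consistent with the indices being computed per pixel and then averaged over the field. The table also satisfies RVI ≈ 2·(1 - NDI), as expected if RVI = 4·VH / (VV + VH) and NDI = (VV - VH) / (VV + VH). The sketch below assumes these standard definitions (the library's exact formulas and units may differ); `s1_indices` and `field_means` are hypothetical helpers.

```python
from statistics import fmean


def s1_indices(vv, vh):
    """Per-pixel radar indices from VV and VH backscatter (assumed standard definitions)."""
    return {
        "AVE": (vv + vh) / 2,
        "DIF": vv - vh,
        "RAT1": vv / vh,
        "RAT2": vh / vv,
        "NDI": (vv - vh) / (vv + vh),
        "RVI": 4 * vh / (vv + vh),
    }


def field_means(pixels):
    """Field-level features: each index averaged over the field's (vv, vh) pixels."""
    per_pixel = [s1_indices(vv, vh) for vv, vh in pixels]
    return {name: fmean(p[name] for p in per_pixel) for name in per_pixel[0]}


# Two made-up pixels: the mean of the ratios (3.0) differs from the ratio of the means (~2.67)
means = field_means([(0.5, 0.25), (0.5, 0.125)])
print(means["RAT1"], 0.5 / 0.1875)
```

Because of this, ratio features recomputed from the stored VV and VH means will not reproduce RAT1, RAT2 or NDI exactly.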


#### Store the obtained dataset

In [14]:
# Save as gzip-compressed .csv files to take less disk space
filename = file_chooser.selected_path + '/' + file_chooser.selected_filename.split('.')[0]
fields_s1_features_extracted_df.to_csv(filename + '-s1-features-extracted.gz', header=True, index=False, compression='gzip')