<a href="https://colab.research.google.com/github/wurDevTim/Workshop_P4P/blob/main/correcting_meassurement_time.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Correcting for measurement time
Systems like the cropreporter can only measure one plant at a time. Including dark/light adaptation, there can be several hours between the measurement of the first and the last plant.

As you can imagine, a lot can happen in 3 hours: plants can grow and diseases can spread. To correct for this, it is advised to use a simple model to 'shift' all measurements to the same moment in time.

The Biometris group at Wageningen developed an R package to do this and to apply other corrections, for example for the location in the greenhouse: https://biometris.github.io/LMMsolver/

For more information on splines, see: https://en.wikipedia.org/wiki/Spline_(mathematics)

An important note: the results can differ between R versions. We also encountered this when preparing the workshop.
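
To make the idea of 'shifting' measurements concrete, the sketch below moves the measurements of one plant onto a set of common times. It is a minimal illustration only: it uses plain linear interpolation as a stand-in for the spline model fitted with LMMsolver, and the function name `shift_to_common_times` and the example numbers are made up for this illustration.

```python
from bisect import bisect_right


def shift_to_common_times(times, values, common_times):
    """Linearly interpolate (times, values) at each time in common_times.

    Stand-in for the spline fit: times must be strictly increasing, contain
    at least two entries, and cover every requested common time.
    """
    shifted = []
    for t in common_times:
        if not times[0] <= t <= times[-1]:
            raise ValueError("common time outside the measured range")
        # Index of the first measurement strictly after t (capped at the last one)
        hi = min(bisect_right(times, t), len(times) - 1)
        lo = hi - 1
        weight = (t - times[lo]) / (times[hi] - times[lo])
        shifted.append(values[lo] + weight * (values[hi] - values[lo]))
    return shifted


# Hypothetical plant measured at 09:00, 10:30 and 12:00 (hours since midnight)
hours = [9.0, 10.5, 12.0]
trait = [0.70, 0.73, 0.79]

# Estimate the trait at 10:00 and 11:00 instead
estimates = shift_to_common_times(hours, trait, [10.0, 11.0])
print(estimates)
```

Once every plant has been evaluated at the same moments, the plants can be compared regardless of when they were actually measured.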



## Setup
Besides Python, R code can also be used in Colab notebooks.
Here we call R from Python with rpy2, so that we can use Python code as well.
An R cell is marked with '%%R' on the first line of the cell.

In [None]:
%load_ext rpy2.ipython

In [None]:
%%R
# The %%R cell magic indicates that you're running R code; it must be the first line of the cell.

# Install packages
install.packages("LMMsolver")


In [None]:
%%R
# Import
library(LMMsolver)

In [None]:
# Mount Google Drive. No R alternative was found, so Python is used instead.
from google.colab import drive
from os import path

datafolder = "/content/drive/My Drive/P4P_workshop_data"
# Check if the data folder is mounted correctly
if not path.exists(datafolder):
  drive.mount('/content/drive')

!ls "$datafolder"

In [None]:
%%R
# Function which uses LMMsolver to compute the spline.
##------------------------------------------------------------
# Inputs:
# - Data frame with all the data (needs the columns 'datenum' and 'date')
# - The name of the column with the unique identifier of each plant
# - A list of traits to fit a spline for
# Return:
# - A data frame with predictions for all days in the measuring period
##------------------------------------------------------------
compute_spline <- function(df, plant_identifier, trait_list)
{
  ### Fit 1D spline per plant
  for (i in c(1:length(unique(df[[plant_identifier]])))){
    plant_id = unique(df[[plant_identifier]])[i]
    one_plant <- df[df[[plant_identifier]] == plant_id,]
    datenum = one_plant[['datenum']]
    preddates <- data.frame(datenum = min(one_plant$date):max(one_plant$date))
    # Each day has 24*60*60 = 86400 seconds
    preddates <- preddates * 86400

    # Fit 1D spline per trait
    for (trait in trait_list){
      # Check for inf values
      if (sum(is.infinite(one_plant[[trait]])) > 0) {
        print(paste('Warning: infinite value encountered for plant: ', plant_id, ', trait: ',trait))
      }
      trait_df <- one_plant[!is.infinite(one_plant[[trait]]),]
      # Need more than 2 rows left after removing the inf values
      if (nrow(trait_df) > 2) {
        # NaN values will be removed, but they do cause warnings.
        m1 <- LMMsolve(fixed = as.formula(paste(trait, "~", 1)),
                       spline = ~spl1D(x = datenum, nseg = 20),
                       data = trait_df)
        #summary(m1)

        # Note: in some cases you might only want to predict from the first measurement to the last,
        # especially when the first/last one was a NaN.
        prediction <- obtainSmoothTrend(m1, newdata = preddates,
                                        includeIntercept = TRUE)
        # Rename ypred column
        names(prediction)[names(prediction) == 'ypred'] <- trait
      } else {
        print(paste('Warning: not enough values to process plant: ', plant_id, ', trait: ',trait))
        prediction <- preddates
        prediction[trait] <- NA
      }
      prediction <- prediction[,c("datenum",trait)]
      # Combine results
      if (trait == trait_list[1]){
        plant_predictions <- prediction
      } else {
        plant_predictions <- merge(plant_predictions, prediction, by='datenum')
      }
    }
    plant_predictions[[plant_identifier]] = plant_id
    if (i == 1){
      all_predictions <- plant_predictions
    } else {
      all_predictions <- rbind(all_predictions, plant_predictions)
    }
  }
  return(all_predictions)
}
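
The data checks inside `compute_spline` can be easier to follow in isolation. The sketch below mirrors them in Python under stated assumptions: the helper name `usable_values` and the example numbers are hypothetical, and the threshold of more than 2 remaining rows is taken from the `nrow(trait_df) > 2` check in the function.

```python
import math


def usable_values(values, min_rows=3):
    """Drop infinite values and report whether enough rows remain to fit.

    NaN values are deliberately kept here, as in the R function, whose
    comments say they are only removed during fitting.
    """
    kept = [v for v in values if not math.isinf(v)]
    return kept, len(kept) >= min_rows


# Hypothetical trait values for one plant; one measurement is infinite
kept, can_fit = usable_values([0.71, float("inf"), 0.74, 0.76])
print(kept, can_fit)

# With too few rows left, the trait would be filled with NA instead of fitted
kept_short, can_fit_short = usable_values([0.71, float("-inf"), 0.74])
print(kept_short, can_fit_short)
```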

# Example
In this example the cropreporter data from Lucia is used, which has been analysed beforehand.

Note: systems like the cropreporter in NPEC use local time. If you are unlucky, your experiment contains both winter and summer time. In that case we advise switching to UTC.
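
To see why mixing winter and summer time matters, the sketch below converts two local noon measurements around a clock change to UTC. It is an illustration with assumptions: the dates are made up, and the offsets +01:00 (winter time) and +02:00 (summer time) are written out by hand rather than looked up in a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for Central European winter and summer time
WINTER = timezone(timedelta(hours=1))
SUMMER = timezone(timedelta(hours=2))

# Two hypothetical measurements, both at 12:00 local time on consecutive days,
# with the switch to summer time in between
before = datetime(2023, 3, 25, 12, 0, tzinfo=WINTER)
after = datetime(2023, 3, 26, 12, 0, tzinfo=SUMMER)

# Converting to UTC removes the jump in the local clock
before_utc = before.astimezone(timezone.utc)
after_utc = after.astimezone(timezone.utc)
print(before_utc.isoformat())
print(after_utc.isoformat())

# The local clock suggests 24 hours between the measurements; in reality it is 23
elapsed_hours = (after - before).total_seconds() / 3600
print(elapsed_hours)
```

Because a spline is fitted against elapsed time, an unnoticed one-hour jump like this would shift every measurement after the clock change.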

In [None]:
%%R
# Load the data
df <- read.csv('/content/drive/My Drive/P4P_workshop_data/cropreporter_traits.csv', sep = ",")
head(df)

In [None]:
%%R
# Converting to datetime object
df[['Datetime']] <- as.POSIXct(df[['Datetime']], format = "%Y-%m-%d, %H:%M:%OS", tz="Europe/Paris")


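# Datenum stored as integer: exact datetime of the measurement, in seconds since 1970-01-01
df[['datenum']] <- as.integer(df[['Datetime']])

# Date stored as the number of days since 1970-01-01.
# In compute_spline it is multiplied by 86400 to get the value at 00:00:00 of each day.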
df[['date']] <- as.numeric(as.Date(df[['Datetime']]))
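
The relation between `datenum` (seconds) and `date` (days) can be checked with a small calculation. The sketch below uses Python's standard library and a made-up timestamp. It assumes the day number is derived in UTC, which may not match the local calendar day for measurements close to midnight.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

# Hypothetical measurement moment, expressed in UTC
measured = datetime(2023, 6, 1, 14, 30, tzinfo=timezone.utc)

# Seconds since 1970-01-01 00:00:00 UTC, comparable to the 'datenum' column
datenum = int(measured.timestamp())

# Whole days since 1970-01-01, comparable to the 'date' column
date = datenum // SECONDS_PER_DAY

# Multiplying the day number by 86400 gives midnight (UTC) of that day in seconds
midnight = date * SECONDS_PER_DAY
print(datetime.fromtimestamp(midnight, tz=timezone.utc).isoformat())

# Time elapsed between midnight and the measurement, in hours
hours_since_midnight = (datenum - midnight) / 3600
print(hours_since_midnight)
```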

In [None]:
%%R
head(df)

In [None]:
%%R
# All columns
colnames(df)

In [None]:
%%R
# List of the columns to interpolate
trait_list <- list('mean_yii', 'MeanChlorophyll', 'MeanNdvi', 'MeanEgreen', 'MeanPsri', 'MeanAri', 'MeanMari')


In [None]:
%%R
# Compute the spline
predictions = compute_spline(df, 'PlantId', trait_list)

In [None]:
%%R
# Convert time back to datetime. Trick question: is this datetime now UTC or Amsterdam local time?
predictions[['Datetime']] <- as.POSIXct(predictions[['datenum']], origin="1970-01-01", tz = "UTC")

In [None]:
%%R
# Look at some results
head(predictions[predictions[['PlantId']] == unique(predictions[['PlantId']])[1],])

In [None]:
%%R
# Save predictions
write.csv(predictions, "/content/drive/My Drive/P4P_workshop_data/pspline_predictions.csv", row.names=FALSE)