<h1 style="font-size:2.0rem; color:green;"> Trend comparison: ESA CCI vs. in-situ </h1>

<div class="alert alert-block alert-success"; background-color:red> In this notebook, we compare monthly trends in the number of snow days over land, computed from the original and the gap-filled ESA CCI scfg data, with trends derived from in-situ snow depth observations at the Ifrane station. Trends are calculated over the period in which the ESA CCI data and the in-situ observations overlap, 2005-2018. The aim is to show which product better approximates the in-situ trends: the original ESA CCI data or the gap-filled data. </div> 

<h1 style="font-size:1.5rem; color:green;"> Load required libraries </h1> 

In [1]:
# autoreload reloads modules automatically before entering the execution of code typed at the IPython prompt
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
import netCDF4 as nc
from netCDF4 import Dataset  
import matplotlib.pyplot as plt
import xarray as xr         
import pymannkendall as mk
import seaborn as sns

In [2]:
# For parallelisation
from dask.distributed import Client
client = Client(n_workers=2, threads_per_worker=2, memory_limit='4GB')
client

Perhaps you already have a cluster running?
Hosting the HTTP server on port 50866 instead

Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:50866/status

Dashboard: http://127.0.0.1:50866/status, Workers: 2
Total threads: 4, Total memory: 7.45 GiB
Status: running, Using processes: True

Comm: tcp://127.0.0.1:50867, Workers: 2
Dashboard: http://127.0.0.1:50866/status, Total threads: 4
Started: Just now, Total memory: 7.45 GiB

Comm: tcp://127.0.0.1:50885, Total threads: 2
Dashboard: http://127.0.0.1:50886/status, Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:50871
Local directory: C:\Users\Hamid\Desktop\scripts\1D analysis\dask-worker-space\worker-c5pe1be2

Comm: tcp://127.0.0.1:50882, Total threads: 2
Dashboard: http://127.0.0.1:50883/status, Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:50870
Local directory: C:\Users\Hamid\Desktop\scripts\1D analysis\dask-worker-space\worker-6t3pzccr


<h1 style="font-size:1.5rem; color:green;"> Importing and preparing in-situ data</h1>

In [3]:
path0 = r'C:\Data\Snow\SD\In-situ\modified_data\V1'             
IFR = pd.read_csv(path0 + r'\IFR_sd_max.csv')                                

In [4]:
IFR = IFR.rename(columns={'date':'date','sd_max':'sd'})

<div class="alert alert-block alert-success"; background-color:red> We convert the date into a format recognized by the Pandas module </div> 

In [5]:
IFR['date'] = pd.to_datetime(IFR['date'], yearfirst=True, dayfirst=True, errors='coerce')

In [6]:
# Drop rows whose date could not be parsed (NaT)
IFR = IFR.dropna(subset=['date'])
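
<div class="alert alert-block alert-success"; background-color:red> The date-cleaning step above can be tried on a small toy table. This is only a sketch: the values and the name <code>toy</code> are made up for illustration, and <code>dropna(subset=...)</code> is used here as a compact way to discard the rows whose date became NaT. </div>

```python
import pandas as pd

# Hypothetical toy data: the second row has a malformed date string
toy = pd.DataFrame({"date": ["2005-01-01", "not a date", "2005-01-03"],
                    "sd": [0, 3, 5]})

# errors='coerce' turns unparseable entries into NaT instead of raising
toy["date"] = pd.to_datetime(toy["date"], errors="coerce")

# Keep only the rows with a valid date
toy = toy.dropna(subset=["date"])
print(toy)
```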

<div class="alert alert-block alert-success"; background-color:red> The data are sometimes stored as strings, so we convert them to a numeric format </div> 

In [7]:
IFR['sd'] = pd.to_numeric(IFR['sd'])

<div class="alert alert-block alert-success"; background-color:red> We set the date column as the index </div> 

In [8]:
IFR = IFR.set_index('date')

<div class="alert alert-block alert-success"; background-color:red> We select the periods of intersection with the other data types  </div> 

In [9]:
IFR = IFR['2005-01-01':'2014-12-31']
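
<div class="alert alert-block alert-success"; background-color:red> As a reminder of how this selection behaves, slicing a DatetimeIndex by date strings includes both endpoints. A minimal sketch on made-up data (the frame <code>toy</code> is hypothetical, and <code>.loc</code> is used for explicitness): </div>

```python
import pandas as pd

# Hypothetical daily series straddling the start of the study period
idx = pd.date_range("2004-12-30", "2005-01-03", freq="D")
toy = pd.DataFrame({"sd": range(len(idx))}, index=idx)

# Label-based slicing on a DatetimeIndex keeps both the start and end dates
sub = toy.loc["2005-01-01":"2005-01-02"]
print(sub)
```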

In [10]:
IFR = IFR.fillna(-9999)

<h1 style="font-size:1.5rem; color:green;"> Importing and preparing the gap-filled and non-gap-filled ESA CCI scfg data at the pixel containing the above-mentioned station </h1>

In [11]:
path1 = r'C:\Data\Snow\SCFG\Satellite\modified_data\V0\extracted_points'   

IFR_scfg = pd.read_csv( path1 + r'\scfg_orig\scfg_orig_IFR.csv', sep=",") 
IFR_scfg_interp = pd.read_csv( path1 + r'\scfg_gf\scfg_gf_IFR.csv', sep=",")

# =======================================================================================================

path2 = r'C:\Data\Snow\SCFG\Satellite\modified_data\V0\extracted_points\cov' 

IFR_cov = pd.read_csv(path2+ '/cov_IFR.csv', sep=",")

In [12]:
IFR_scfg['date'] = pd.to_datetime(IFR_scfg['date'], dayfirst=True, yearfirst=True)
IFR_scfg_interp['date'] = pd.to_datetime(IFR_scfg_interp['date'],  dayfirst=True, yearfirst=True)

<div class="alert alert-block alert-success"; background-color:red> Calculation of the monthly average number of days per month covered by the data  </div> 

In [28]:
IFR_cov['date'] = pd.to_datetime(IFR_cov['date'], yearfirst=True)

In [14]:
IFR_scfg.set_index('date', inplace=True)
IFR_scfg_interp.set_index('date', inplace=True)

In [15]:
IFR_cov.set_index('date', inplace=True)

In [16]:
IFR_scfg = IFR_scfg['2005-01-01':'2014-12-31']

In [17]:
IFR_scfg_interp = IFR_scfg_interp['2005-01-01':'2014-12-31']

In [18]:
IFR_cov = IFR_cov['2005-01-01':'2014-12-31']

In [19]:
IFR_scfg.loc[IFR_scfg['scfg'] > 100, 'scfg'] = -9999

In [20]:
IFR_scfg_interp = IFR_scfg_interp.fillna(-9999)
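
<div class="alert alert-block alert-success"; background-color:red> The two masking steps above both map unusable values to the sentinel -9999. The sketch below applies them to one made-up column; treating values above 100 as non-data codes is an assumption read off the threshold used in the notebook, and the frame <code>toy</code> is hypothetical. </div>

```python
import numpy as np
import pandas as pd

# Hypothetical scfg values: two valid percentages, one code above 100,
# the upper bound 100 itself, and one missing value
toy = pd.DataFrame({"scfg": [0.0, 45.0, 205.0, 100.0, np.nan]})

# Values above 100 are replaced by the sentinel (100 itself is kept)
toy.loc[toy["scfg"] > 100, "scfg"] = -9999

# Remaining NaN values get the same sentinel
toy = toy.fillna(-9999)
print(toy)
```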

<div class="alert alert-block alert-success"; background-color:red> The following two cells are run only if we want to plot the time series </div> 

In [21]:
IFR_scfg = IFR_scfg.replace(-9999, np.nan)

<div class="alert alert-block alert-success"; background-color:red>  An auxiliary function for formatting of the figures  </div> 

In [22]:
def set_size(w,h, ax=None):
    """ w, h: width, height in inches """
    if not ax: ax=plt.gca()
    l = ax.figure.subplotpars.left
    r = ax.figure.subplotpars.right
    t = ax.figure.subplotpars.top
    b = ax.figure.subplotpars.bottom
    figw = float(w)/(r-l)
    figh = float(h)/(t-b)
    ax.figure.set_size_inches(figw, figh)

In [23]:
Snow_esa = IFR_scfg['scfg'].where(IFR_scfg['scfg'] > 0).groupby(by=[IFR_scfg.index.month,IFR_scfg.index.year]).count() 
Sum_esa = IFR_scfg['scfg'].where(IFR_scfg['scfg'] >= 0).groupby(by=[IFR_scfg.index.month,IFR_scfg.index.year]).count() 
Nan_esa = IFR_scfg['scfg'].where(IFR_scfg['scfg'] < 0).groupby(by=[IFR_scfg.index.month,IFR_scfg.index.year]).count()

Snow_esa_gf = IFR_scfg_interp['scfg_interp'].where(IFR_scfg_interp['scfg_interp'] > 0).groupby(by=[IFR_scfg_interp.index.month,IFR_scfg_interp.index.year]).count() 
Sum_esa_gf = IFR_scfg_interp['scfg_interp'].where(IFR_scfg_interp['scfg_interp'] >= 0).groupby(by=[IFR_scfg_interp.index.month,IFR_scfg_interp.index.year]).count() 
Nan_esa_gf = IFR_scfg_interp['scfg_interp'].where(IFR_scfg_interp['scfg_interp'] < 0).groupby(by=[IFR_scfg_interp.index.month,IFR_scfg_interp.index.year]).count()

Snow = IFR['sd'].where(IFR['sd'] > 0).groupby(by=[IFR.index.month,IFR.index.year]).count() 
Sum = IFR['sd'].where(IFR['sd'] >= 0).groupby(by=[IFR.index.month,IFR.index.year]).count() 
Nan = IFR['sd'].where(IFR['sd'] < 0).groupby(by=[IFR.index.month,IFR.index.year]).count()


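years = range(2005, 2019)
for mon in range(1, 13):
    for year in years:
        key = (mon, year)
        # Skip month/year pairs that are absent from the ESA CCI series
        if key not in Nan_esa.index:
            continue
        # Mask months with more than 15 missing ESA CCI days;
        # .loc avoids chained assignment, which may write to a copy
        if Nan_esa.loc[key] > 15:
            Nan_esa.loc[key] = -9999
            Snow_esa.loc[key] = -9999
            if key in Nan.index:
                Nan.loc[key] = -9999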

             
Nan_esa = Nan_esa.replace(-9999, np.nan)
Snow_esa = Snow_esa.replace(-9999, np.nan)
Nan = Nan.replace(-9999, np.nan)
  
IFR_scfg_Sdays =   Snow_esa + Snow_esa*Nan_esa/Sum_esa  
IFR_scfg_interp_Sdays =   Snow_esa_gf + Snow_esa_gf*Nan_esa_gf/Sum_esa_gf 
IFR_Sdays =   Snow + Snow*Nan/Sum   
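
<div class="alert alert-block alert-success"; background-color:red> The estimate above scales the observed snow days in proportion to the missing days: Sdays = Snow + Snow*Nan/Sum. For example, 6 snow days among 20 valid days, with 10 days missing, give 6 + 6*10/20 = 9 estimated snow days. The sketch below reproduces this on a made-up month; all values and variable names in it are hypothetical. </div>

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for 30 days of January 2005; -9999 marks missing days
dates = pd.date_range("2005-01-01", "2005-01-30", freq="D")
values = np.zeros(30)
values[:6] = 50.0      # 6 days with snow
values[20:] = -9999    # 10 missing days
s = pd.Series(values, index=dates)

keys = [s.index.month, s.index.year]
snow = s.where(s > 0).groupby(by=keys).count()     # days with snow
valid = s.where(s >= 0).groupby(by=keys).count()   # days with valid data
missing = s.where(s < 0).groupby(by=keys).count()  # missing days

# Observed snow days plus a proportional share of the missing days
sdays = snow + snow * missing / valid
print(sdays)
```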

In [24]:
IFR = pd.DataFrame(columns=["month","slope_orig","p_orig","slope_gf","p_gf","slope_situ","p_situ"])

In [25]:
slope_orig = []
slope_gf= []
slope_situ= []
p_orig = []
p_gf = []
p_situ = []

for mon in range(1,13):
    # Run each Mann-Kendall test once and reuse its result
    res_orig = mk.original_test(IFR_scfg_Sdays[mon])
    slope_orig.append(res_orig.slope)
    p_orig.append(res_orig.p)

    res_gf = mk.original_test(IFR_scfg_interp_Sdays[mon])
    slope_gf.append(res_gf.slope)
    p_gf.append(res_gf.p)

    res_situ = mk.original_test(IFR_Sdays[mon])
    slope_situ.append(res_situ.slope)
    p_situ.append(res_situ.p)
    
IFR['month'] = [1,2,3,4,5,6,7,8,9,10,11,12]
IFR['slope_orig'] = slope_orig
IFR['p_orig'] = p_orig

IFR['slope_gf'] = slope_gf
IFR['p_gf'] = p_gf

IFR['slope_situ'] = slope_situ
IFR['p_situ'] = p_situ

print(IFR)

    month  slope_orig    p_orig  slope_gf      p_gf  slope_situ    p_situ
0       1   -0.666667  0.035251    -0.875  0.367232   -0.666667  0.104588
1       2    0.000000  0.924941     0.000  0.852892   -0.500000  0.653422
2       3    0.000000  0.120300     0.000  0.195460    0.000000  0.574708
3       4    0.000000  1.000000     0.000  1.000000    0.250000  0.349675
4       5    0.000000  1.000000     0.000  1.000000    0.000000  0.360765
5       6    0.000000  1.000000     0.000  1.000000    0.000000  1.000000
6       7    0.000000  1.000000     0.000  1.000000    0.000000  1.000000
7       8    0.000000  1.000000     0.000  1.000000    0.000000  1.000000
8       9    0.000000  1.000000     0.000  1.000000    0.000000  1.000000
9      10    0.000000  1.000000     0.000  1.000000    0.000000  0.727724
10     11    0.000000  0.163734     0.000  0.220671    0.200000  0.521658
11     12    0.000000  1.000000     0.000  1.000000    0.444444  0.201868
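
<div class="alert alert-block alert-success"; background-color:red> To help read the table, the sketch below shows what a Theil-Sen slope (the median of all pairwise slopes, which is how the pymannkendall documentation describes the <code>slope</code> field) and the Mann-Kendall S statistic look like in plain NumPy. It is written for illustration only and is not the library's implementation; the names <code>sen_slope</code> and <code>mk_s</code> and the sample values are hypothetical, and the input is assumed to be evenly spaced with no missing values. </div>

```python
import numpy as np

def sen_slope(y):
    """Median of the pairwise slopes (y[j] - y[i]) / (j - i) for i < j."""
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)   # all index pairs with i < j
    return float(np.median((y[j] - y[i]) / (j - i)))

def mk_s(y):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs."""
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)
    return int(np.sign(y[j] - y[i]).sum())

# Hypothetical yearly snow-day counts for one calendar month
y = [10, 8, 9, 6, 5, 4]
print(sen_slope(y), mk_s(y))
```

<div class="alert alert-block alert-success"; background-color:red> A negative slope together with a strongly negative S indicates a decreasing series; the p-value reported by the test then tells whether that tendency is statistically significant. </div>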


In [27]:
IFR.to_csv('trend_ESA_VS_Situ_2005-2018_IFR.csv')