# GEV parameter estimates for stations in the same grid cell

This notebook analyzes the GEV parameters estimated for stations located in the same grid cell. We first load the packages needed for the study.

In [1]:
using CSV, DataFrames, Distributions, Random, StatsBase
using Extremes, Dates, Gadfly
using Optim

using IDF

import Plots # for plotting

In [2]:
PROVINCES = ["NB", "NL", "NS", "ON", "PE", "QC"] # provinces considered

DURATION = "24 h"

"24 h"

In [3]:
same_cell = IDF.load_data("same_cell")

first(same_cell, 10)

Row,Name,Province,ID,Lat,Lon,Elevation,GridCell
,String,String,String,Float64,Float64,Int64,Int64
1,ROYAL ROAD,NB,8104480,46.05,-66.72,115,33632
2,ROYAL ROAD WEST,NB,8104482,46.08,-66.73,160,33632
3,PORT WELLER (AUT),ON,6136699,43.25,-79.22,79,20197
4,ST CATHARINES A,ON,6137287,43.2,-79.17,97,20197
5,CAMBRIDGE GALT MOE,ON,6141095,43.33,-80.32,268,18645
6,PRESTON WPCP,ON,6146714,43.38,-80.35,272,18645
7,WATERLOO WELLINGTON A,ON,6149387,43.45,-80.38,317,18645
8,MAPLE,ON,6154950,43.87,-79.48,244,19620
9,TORONTO YORK MILLS,ON,615HHDF,43.75,-79.38,153,19620
10,TORONTO NORTH YORK,ON,615S001,43.78,-79.47,187,19620


Build the list of grid cell numbers:

In [24]:
cells = unique(same_cell[:, :GridCell])

14-element Vector{Int64}:
 33632
 20197
 18645
 19620
 19619
 19813
 19814
 19621
 28003
 28002
 25464
 28572
 26047
 27434
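
Pooling only makes sense for cells that hold more than one station, so it can help to count the stations per cell before going further. The sketch below does this on a toy table (`toy` is a hypothetical stand-in for `same_cell`, built from three rows of the output above); it is an illustration, not part of the original analysis.

```julia
using DataFrames

# Toy stand-in for `same_cell`, using three rows shown in the table above
toy = DataFrame(Name = ["ROYAL ROAD", "ROYAL ROAD WEST", "PORT WELLER (AUT)"],
                GridCell = [33632, 33632, 20197])

# One row per grid cell, with the number of stations it contains
counts = combine(groupby(toy, :GridCell), nrow => :n_stations)
```

The same call applied to `same_cell` would give the number of stations contributing to each pooled fit.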

In [19]:
parameters = CSV.read("C:/Users/leogu/Dropbox/Stage/Perso/Codes/Julia/results/parameters_ex_"*DURATION*".csv", DataFrame)

# Keep only the stations listed in `same_cell`
filter!(row -> row[:StationID] ∈ same_cell[:, :ID], parameters)

# Positional copy: assumes `parameters` rows are in the same order as `same_cell`
same_cell[!, :μ] = parameters[!, :μₑ]
same_cell[!, :ϕ] = parameters[!, :ϕₑ]
same_cell[!, :ξ] = parameters[!, :ξₑ]
same_cell[!, :BIC] = parameters[!, :BIC]


first(same_cell, 10)

Row,Name,Province,ID,Lat,Lon,Elevation,GridCell
,String,String,String,Float64,Float64,Int64,Int64
1,ROYAL ROAD,NB,8104480,46.05,-66.72,115,33632
2,ROYAL ROAD WEST,NB,8104482,46.08,-66.73,160,33632
3,PORT WELLER (AUT),ON,6136699,43.25,-79.22,79,20197
4,ST CATHARINES A,ON,6137287,43.2,-79.17,97,20197
5,CAMBRIDGE GALT MOE,ON,6141095,43.33,-80.32,268,18645
6,PRESTON WPCP,ON,6146714,43.38,-80.35,272,18645
7,WATERLOO WELLINGTON A,ON,6149387,43.45,-80.38,317,18645
8,MAPLE,ON,6154950,43.87,-79.48,244,19620
9,TORONTO YORK MILLS,ON,615HHDF,43.75,-79.38,153,19620
10,TORONTO NORTH YORK,ON,615S001,43.78,-79.47,187,19620
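
The column assignments above copy values by row position, which is only correct if the filtered `parameters` table lists the stations in the same order as `same_cell`. A join on the station identifier avoids that assumption. The sketch below uses two small hypothetical tables (`stations_toy` and `params_toy`, with made-up μ values) to show the idea; it is not the code that produced the results in this notebook.

```julia
using DataFrames

# Hypothetical station table (IDs and cells taken from the output above)
stations_toy = DataFrame(ID = ["8104480", "8104482", "6136699"],
                         GridCell = [33632, 33632, 20197])

# Hypothetical parameter table, deliberately in a different row order;
# the μ values are made up for the illustration
params_toy = DataFrame(StationID = ["6136699", "8104480", "8104482"],
                       μ = [30.0, 10.0, 20.0])

# Match rows on the station identifier instead of on row position
joined = leftjoin(stations_toy, params_toy, on = :ID => :StationID)
```

Each station then carries its own μ regardless of how either table is sorted. Stations without a match would get `missing`.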


We already have the estimates obtained for each station separately. Let us now see what we get when all the data from a single cell are considered together. We start by extracting the relevant data:

In [26]:
dat = DataFrame(StationName = String[],
                StationID = String[],
                GridCell = Int64[],
                Year = Int64[],
                Duration = String[],
                Pcp = Float64[])

for i in 1:(nrow(same_cell))
    df = load_station(same_cell[i,:ID])
    df[!, :StationName] .= same_cell[i,:Name]
    df[!, :StationID] .= same_cell[i, :ID]
    df[!, :GridCell] .= same_cell[i, :GridCell]
    append!(dat, df)
end

filter!(row -> row[:Duration] == DURATION, dat)

first(dat, 10)

Row,StationName,StationID,GridCell,Year,Duration,Pcp
,String,String,Int64,Int64,String,Float64
1,ROYAL ROAD,8104480,33632,1966,24 h,49.0
2,ROYAL ROAD,8104480,33632,1967,24 h,51.8
3,ROYAL ROAD,8104480,33632,1968,24 h,57.1
4,ROYAL ROAD,8104480,33632,1969,24 h,84.6
5,ROYAL ROAD,8104480,33632,1970,24 h,79.5
6,ROYAL ROAD,8104480,33632,1971,24 h,49.8
7,ROYAL ROAD,8104480,33632,1972,24 h,49.5
8,ROYAL ROAD,8104480,33632,1973,24 h,68.8
9,ROYAL ROAD,8104480,33632,1974,24 h,38.6
10,ROYAL ROAD,8104480,33632,1975,24 h,39.1


Next, we estimate the GEV parameters for each cell:

In [27]:
parameters_cell = DataFrame(GridCell = Int64[],
                            μ = Float64[],
                            ϕ = Float64[],
                            ξ = Float64[],
                            BIC = Float64[])

# Return all precipitation values recorded in a given cell as a single vector
function Pcp_cell(cell::Int64)
    return dat[dat[:, :GridCell] .== cell, :Pcp]
end

for cell in cells
    y = Pcp_cell(cell)
    
    # Moment-based fallback values, used only if the fit fails
    μ = mean(y)
    ϕ = log(std(y))
    ξ = 0.0
    
    p = [μ, ϕ, ξ]
    
    try
        p = gevfit(y).θ̂
    catch
        println("The algorithm did not converge for cell $cell")
    end
    
    df = DataFrame(GridCell = cell,
                    μ = p[1],
                    ϕ = p[2],
                    ξ = p[3],
                    BIC = BIC_GEV(y))
    append!(parameters_cell, df)
end

CSV.write("results/parameters_cells_$DURATION.csv", parameters_cell)

parameters_cell

Row,GridCell,μ,ϕ,ξ,BIC
,Int64,Float64,Float64,Float64,Float64
1,33632,49.3909,2.4936,0.128704,326.133
2,20197,43.8642,2.43295,-0.0370084,403.442
3,18645,47.5664,2.52282,0.0529783,608.506
4,19620,38.6082,2.5676,0.0597505,470.269
5,19619,40.2431,2.31697,0.162691,763.872
6,19813,38.4133,2.2714,0.119748,435.511
7,19814,39.1403,2.04699,0.0727439,281.94
8,19621,39.6257,2.39101,0.122522,260.868
9,28003,52.7117,2.71039,0.0473589,660.297
10,28002,54.8539,2.67298,-0.432372,461.107
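
To read these estimates, note that ϕ appears to be the logarithm of the GEV scale parameter: the fallback value in the loop is `log(std(y))`. Under that assumption, the sketch below rebuilds the fitted distribution for cell 33632 with `Distributions.GeneralizedExtremeValue` and computes a return level. The helper `return_level` is a hypothetical name introduced here for illustration.

```julia
using Distributions

# Estimates for cell 33632, copied from the table above
μ, ϕ, ξ = 49.3909, 2.4936, 0.128704

# Assumption: ϕ = log(σ), so the scale is recovered with exp
σ = exp(ϕ)
d = GeneralizedExtremeValue(μ, σ, ξ)

# Level exceeded on average once every T years, i.e. the (1 - 1/T) quantile
return_level(d, T) = quantile(d, 1 - 1 / T)

r100 = return_level(d, 100)
```

Comparing such return levels between the pooled fit and the station fits gives a more tangible view of the parameter differences than the raw values of μ, ϕ and ξ.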


Finally, we compute the differences between the parameters estimated for each station independently and those obtained by pooling the data of its cell:

In [8]:
parameters_δ = DataFrame(StationName = String[],
                        StationID = String[],
                        GridCell = Int64[],
                        δμ = Float64[],
                        δϕ = Float64[],
                        δξ = Float64[])

for i in 1:(nrow(same_cell))
    # Pooled estimates for the cell containing station i
    row_cell = filter(row -> row[:GridCell] == same_cell[i, :GridCell], parameters_cell)
    # `parameters[i, ...]` assumes `parameters` follows the row order of `same_cell`
    df = DataFrame(StationName = same_cell[i, :Name],
                    StationID = same_cell[i, :ID],
                    GridCell = same_cell[i, :GridCell],
                    δμ = parameters[i, :μₒ] - row_cell[1, :μ],
                    δϕ = parameters[i, :ϕₒ] - row_cell[1, :ϕ],
                    δξ = parameters[i, :ξₒ] - row_cell[1, :ξ])
    
    append!(parameters_δ, df)
end

CSV.write("results/parameters_diffcell_$DURATION.csv", parameters_δ)
first(parameters_δ, 10)

Row,StationName,StationID,GridCell,δμ,δϕ,δξ
,String,String,Int64,Float64,Float64,Float64
1,ROYAL ROAD,8104480,33632,0.012361,-0.418667,0.885022
2,ROYAL ROAD WEST,8104482,33632,-8.70694,-0.0931592,-0.207384
3,PORT WELLER (AUT),6136699,20197,10.9346,0.173824,-0.149204
4,ST CATHARINES A,6137287,20197,4.49942,0.226454,0.0508576
5,CAMBRIDGE GALT MOE,6141095,18645,1.70308,0.234921,0.0296546
6,PRESTON WPCP,6146714,18645,4.06941,0.381918,-0.427691
7,WATERLOO WELLINGTON A,6149387,18645,13.1245,0.882951,-0.0529783
8,MAPLE,6154950,19620,16.6704,0.16761,-0.0641418
9,TORONTO YORK MILLS,615HHDF,19620,12.8376,0.291518,-0.00849026
10,TORONTO NORTH YORK,615S001,19620,10.2415,-0.0578067,0.0828874


Next, we compare the BIC computed for each cell with the sum of the BICs of its stations fitted independently:

In [36]:
BIC = Float64[]

for i in 1:(nrow(parameters_cell))
    # Sum the BICs of the stations belonging to cell i
    stations = filter(row -> row[:GridCell] == parameters_cell[i, :GridCell], same_cell)
    s = sum(stations[!, :BIC])
    push!(BIC, s)
end

parameters_cell[!, :BIC_ind] = BIC
parameters_cell[!, :BIC_diff] = parameters_cell[!, :BIC] - parameters_cell[!, :BIC_ind]

parameters_cell

Row,GridCell,μ,ϕ,ξ,BIC,BIC_ind,BIC_diff
,Int64,Float64,Float64,Float64,Float64,Float64,Float64
1,33632,49.3909,2.4936,0.128704,326.133,332.075,-5.94165
2,20197,43.8642,2.43295,-0.0370084,403.442,409.369,-5.92731
3,18645,47.5664,2.52282,0.0529783,608.506,619.197,-10.6908
4,19620,38.6082,2.5676,0.0597505,470.269,891.257,-420.988
5,19619,40.2431,2.31697,0.162691,763.872,327.251,436.62
6,19813,38.4133,2.2714,0.119748,435.511,383.686,51.8251
7,19814,39.1403,2.04699,0.0727439,281.94,202.191,79.749
8,19621,39.6257,2.39101,0.122522,260.868,431.916,-171.048
9,28003,52.7117,2.71039,0.0473589,660.297,672.043,-11.7463
10,28002,54.8539,2.67298,-0.432372,461.107,539.366,-78.2588
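
A lower BIC indicates a better trade-off between fit and complexity, so a negative `BIC_diff` (cell BIC minus the sum of the station BICs) favors the pooled model. The definition used by `BIC_GEV` is not shown in this notebook. The sketch below assumes the usual formula, k·log(n) − 2·log-likelihood with k = 3 GEV parameters, and again assumes ϕ = log(σ). The function `bic_gev` is a hypothetical helper, not the IDF implementation.

```julia
using Distributions

# Assumption: standard BIC, k*log(n) - 2*loglik, with k = 3 parameters (μ, ϕ, ξ)
function bic_gev(y, μ, ϕ, ξ)
    d = GeneralizedExtremeValue(μ, exp(ϕ), ξ)  # assumes ϕ = log(σ)
    return 3 * log(length(y)) - 2 * loglikelihood(d, y)
end

# First five 24 h maxima of ROYAL ROAD shown earlier, used only for illustration
y = [49.0, 51.8, 57.1, 84.6, 79.5]

# BIC of this small sample under the pooled estimates of cell 33632
b = bic_gev(y, 49.3909, 2.4936, 0.128704)
```

Under this definition, summing the station BICs corresponds to a model with three parameters per station, while the cell BIC corresponds to a single set of three parameters shared by all stations of the cell.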


In [29]:
stations = filter(row -> row[:GridCell] == parameters_cell[1, :GridCell], same_cell)

Row,Name,Province,ID,Lat,Lon,Elevation,GridCell,μ
,String,String,String,Float64,Float64,Int64,Int64,Float64
1,ROYAL ROAD,NB,8104480,46.05,-66.72,115,33632,48.8501
2,ROYAL ROAD WEST,NB,8104482,46.08,-66.73,160,33632,50.1493


In [30]:
sum(stations[!, :BIC])

332.074801069891