# First estimates on the weather stations

In this notebook, we get acquainted with Julia, with the data, and with parameter estimation for GEV (generalized extreme value) distributions.

In order, we will:

1. Load and properly format the meteorological data
2. Define the estimation functions
3. Estimate the parameters for each station by maximum likelihood:
    a) first without constraining the parameter $\xi$;
    b) then assuming that all stations share the same parameter $\xi$.
4. Check how close the parameters are for neighbouring stations
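
As a reminder of the notation, and assuming the standard parametrisation is the one used throughout, the GEV distribution with location $\mu$, scale $\sigma > 0$ and shape $\xi$ has the distribution function

$$
G(y) = \exp\left\{-\left[1 + \xi\,\frac{y-\mu}{\sigma}\right]^{-1/\xi}\right\}, \qquad 1 + \xi\,\frac{y-\mu}{\sigma} > 0,
$$

with the Gumbel limit $G(y) = \exp\{-\exp[-(y-\mu)/\sigma]\}$ when $\xi = 0$. The result tables further down report the log-scale $\phi = \log\sigma$ rather than $\sigma$ itself.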

## 0. Loading the required libraries and global variables

In [2]:
using CSV, DataFrames, Distributions, Random, StatsBase
using Extremes, Dates, Gadfly
using Optim

import Plots # for plotting

include("functions.jl")

BIC_GEV (generic function with 1 method)

In [3]:
PROVINCES = ["NB", "NL", "NS", "ON", "PE", "QC"] # provinces considered

6-element Vector{String}:
 "NB"
 "NL"
 "NS"
 "ON"
 "PE"
 "QC"

## 1. Loading and formatting the data

In [4]:
station_list = CSV.read("dat/_station_list.csv", DataFrame)

filter!(row -> row.Province ∈ PROVINCES, station_list) # keep only the stations of interest

first(station_list, 10)

| Row | Name | Province | ID | Lat | Lon | Elevation |
|---|---|---|---|---|---|---|
| | String | String | String | Float64 | Float64 | Int64 |
| 1 | BEECHWOOD | NB | 8100512 | 46.53 | -67.67 | 91 |
| 2 | BELLEDUNE | NB | 8100514 | 47.9 | -65.83 | 7 |
| 3 | BOUCTOUCHE CDA CS | NB | 8100593 | 46.43 | -64.77 | 35 |
| 4 | CHARLO AUTO | NB | 8100885 | 47.98 | -66.33 | 42 |
| 5 | MIRAMICHI RCS | NB | 8100989 | 47.02 | -65.47 | 33 |
| 6 | EDMUNDSTON | NB | 8101303 | 47.42 | -68.32 | 154 |
| 7 | FREDERICTON A | NB | 8101500 | 45.87 | -66.53 | 20 |
| 8 | FREDERICTON CDA CS | NB | 8101605 | 45.92 | -66.62 | 35 |
| 9 | MONCTON INTL A | NB | 8103201 | 46.12 | -64.68 | 70 |
| 10 | ROYAL ROAD | NB | 8104480 | 46.05 | -66.72 | 115 |


In [5]:
dat = DataFrame(StationName = String[],
                StationID = String[],
                Year = Int64[],
                Duration = String[],
                Pcp = Float64[])

for i in 1:(nrow(station_list))
    df = load_station(station_list[i,:ID])
    df[!, :StationName] .= station_list[i,:Name]
    df[!, :StationID] .= station_list[i, :ID]
    append!(dat, df)
end

first(dat, 10)

| Row | StationName | StationID | Year | Duration | Pcp |
|---|---|---|---|---|---|
| | String | String | Int64 | String | Float64 |
| 1 | BEECHWOOD | 8100512 | 1959 | 5 min | 9.7 |
| 2 | BEECHWOOD | 8100512 | 1960 | 5 min | 5.3 |
| 3 | BEECHWOOD | 8100512 | 1961 | 5 min | 7.4 |
| 4 | BEECHWOOD | 8100512 | 1962 | 5 min | 5.6 |
| 5 | BEECHWOOD | 8100512 | 1963 | 5 min | 3.6 |
| 6 | BEECHWOOD | 8100512 | 1964 | 5 min | 3.6 |
| 7 | BEECHWOOD | 8100512 | 1965 | 5 min | 11.7 |
| 8 | BEECHWOOD | 8100512 | 1966 | 5 min | 3.0 |
| 9 | BEECHWOOD | 8100512 | 1967 | 5 min | 7.9 |
| 10 | BEECHWOOD | 8100512 | 1969 | 5 min | 9.7 |


We then keep only the records whose recording duration matches the desired one (stored in the variable `DURATION`).

In [6]:
DURATION = "24 h"

filter!(row -> row[:Duration] == DURATION, dat)
first(dat, 10)

| Row | StationName | StationID | Year | Duration | Pcp |
|---|---|---|---|---|---|
| | String | String | Int64 | String | Float64 |
| 1 | BEECHWOOD | 8100512 | 1959 | 24 h | 118.6 |
| 2 | BEECHWOOD | 8100512 | 1960 | 24 h | 45.2 |
| 3 | BEECHWOOD | 8100512 | 1961 | 24 h | 69.3 |
| 4 | BEECHWOOD | 8100512 | 1962 | 24 h | 56.4 |
| 5 | BEECHWOOD | 8100512 | 1963 | 24 h | 46.0 |
| 6 | BEECHWOOD | 8100512 | 1964 | 24 h | 43.9 |
| 7 | BEECHWOOD | 8100512 | 1965 | 24 h | 74.2 |
| 8 | BEECHWOOD | 8100512 | 1966 | 24 h | 50.3 |
| 9 | BEECHWOOD | 8100512 | 1967 | 24 h | 54.6 |
| 10 | BEECHWOOD | 8100512 | 1969 | 24 h | 92.2 |
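
A three-parameter fit can be unstable on short series, so it may be worth checking how many annual values remain per station once the duration filter has been applied. The sketch below does this on a toy table; the name `toy`, the threshold `min_years` and the values are illustrative assumptions, not part of the original notebook.

```julia
using DataFrames

# Toy stand-in for `dat` after the duration filter (values are illustrative only)
toy = DataFrame(StationID = ["A", "A", "A", "B"],
                Year = [1959, 1960, 1961, 1959],
                Pcp = [118.6, 45.2, 69.3, 50.0])

# Number of annual values available per station
counts = combine(groupby(toy, :StationID), nrow => :n_years)

# Hypothetical threshold: flag stations with fewer than `min_years` observations
min_years = 3
short_series = filter(row -> row.n_years < min_years, counts)
```

Applied to the real `dat`, the same `groupby`/`combine` call would give the series length of every station in one table.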


## 2. Helper functions used below

In [7]:
# Return the precipitation series (column Pcp) of the station with the given ID
function Pcp(stationID::String)
    y = dat[dat[:, :StationID] .== stationID, :Pcp]
    return y
end

Pcp (generic function with 1 method)

We then define two fairly similar functions that, given a vector of observations, estimate the parameters of the corresponding GEV distribution with the Optim package.

The first function estimates all three parameters, while the second takes a fixed $\xi$ as an argument; they correspond to the two estimation modes used below.
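
For reference, and assuming that `logL` (whose definition is not shown in this notebook) implements the standard GEV log-likelihood, the quantity being maximised for observations $y_1, \dots, y_n$ is

$$
\ell(\mu, \sigma, \xi) = -n\log\sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{n}\log\left(1 + \xi z_i\right) - \sum_{i=1}^{n}\left(1 + \xi z_i\right)^{-1/\xi}, \qquad z_i = \frac{y_i - \mu}{\sigma},
$$

which is defined only when $\sigma > 0$ and $1 + \xi z_i > 0$ for every $i$. In the limit $\xi = 0$ it becomes

$$
\ell(\mu, \sigma, 0) = -n\log\sigma - \sum_{i=1}^{n} z_i - \sum_{i=1}^{n} e^{-z_i}.
$$

Both functions below minimise $-\ell$; the second one simply holds $\xi$ fixed and optimises over $(\mu, \sigma)$ only.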

In [9]:
function GEVparameters(Y::Vector{Float64})
    # Objective: negative log-likelihood in p = [μ, σ, ξ]
    # (logL is assumed to come from functions.jl)
    function f(p::Vector{Float64})
        return -logL(Y, p[1], p[2], p[3])
    end

    # Initial values: sample mean and standard deviation, with ξ = 0
    μ₀ = mean(Y)
    σ₀ = std(Y)
    ξ₀ = 0
    p₀ = [μ₀, σ₀, ξ₀]
    p = p₀

    try
        res = optimize(f, p₀)

        if Optim.converged(res)
            p = Optim.minimizer(res)
        else
            @warn "The maximum likelihood algorithm did not find a solution. Maybe try with different initial values or with another method. The returned values are the initial values."
            p = p₀
        end
    catch
        # Any error raised during the optimization leaves p at its initial values
        println("Error of scale with this vector")
    end

    return p
end

GEVparameters (generic function with 1 method)

In [10]:
function GEVparameters_xi(Y::Vector{Float64}, ξ::Real)
    # Objective: negative log-likelihood in p = [μ, σ], with ξ held fixed
    # (logL is assumed to come from functions.jl)
    function f(p::Vector{Float64})
        return -logL(Y, p[1], p[2], ξ)
    end

    # Initial values: sample mean and standard deviation
    μ₀ = mean(Y)
    σ₀ = std(Y)
    p₀ = [μ₀, σ₀]
    p = p₀

    try
        res = optimize(f, p₀)

        if Optim.converged(res)
            p = Optim.minimizer(res)
        else
            @warn "The maximum likelihood algorithm did not find a solution. Maybe try with different initial values or with another method. The returned values are the initial values."
            p = p₀
        end
    catch
        # Any error raised during the optimization leaves p at its initial values
        println("Error of scale with this vector")
    end

    return p
end

GEVparameters_xi (generic function with 1 method)
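
Both functions optimise directly over $\sigma$, so the optimiser is free to try non-positive scales. One possible way around this, sketched below, is to optimise over $\phi = \log\sigma$ instead, which keeps $\sigma = e^{\phi}$ positive by construction. This is an illustrative variant, not the notebook's method: `negloglik_logscale` is a hypothetical name, the likelihood comes from `GeneralizedExtremeValue` in Distributions rather than from `logL`, and the data are simulated.

```julia
using Distributions, Optim, Random, Statistics

# Negative GEV log-likelihood in p = [μ, ϕ, ξ] with ϕ = log(σ).
# logpdf returns -Inf outside the support, so infeasible points get an infinite cost.
function negloglik_logscale(Y::Vector{Float64}, p::Vector{Float64})
    d = GeneralizedExtremeValue(p[1], exp(p[2]), p[3])
    return -sum(logpdf.(d, Y))
end

# Simulated sample with known parameters (μ = 50, σ = 15, ξ = 0.1), for illustration only
Random.seed!(1)
Y_sim = rand(GeneralizedExtremeValue(50.0, 15.0, 0.1), 500)

# Same kind of starting point as in the notebook, but on the log-scale for σ
p₀ = [mean(Y_sim), log(std(Y_sim)), 0.0]
res = optimize(p -> negloglik_logscale(Y_sim, p), p₀)

μ_hat, ϕ_hat, ξ_hat = Optim.minimizer(res)
σ_hat = exp(ϕ_hat)
```

With this parametrisation the estimate of $\phi$ can be stored directly, without taking the logarithm of the fitted scale afterwards.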

## 3. Estimation

We start by estimating the parameters with a separate $\xi$ for each station.

In [37]:
parameters = DataFrame(StationName = String[],
                        StationID = String[],
                        μₒ = Float64[],
                        ϕₒ = Float64[],
                        ξₒ = Float64[])

for i in 1:(nrow(station_list))
    y = Pcp(station_list[i, :ID])
    par_optim = GEVparameters(y)
    df = DataFrame(StationName = station_list[i, :Name],
                    StationID = station_list[i, :ID],
                    μₒ = par_optim[1],
                    ϕₒ = log(par_optim[2]),
                    ξₒ = par_optim[3])
    append!(parameters, df)
end

CSV.write("results/parameters_$DURATION.csv", parameters)
first(parameters, 10)

Error of scale with this vector
Error of scale with this vector
Error of scale with this vector
Error of scale with this vector
Error of scale with this vector
Error of scale with this vector


| Row | StationName | StationID | μₒ | ϕₒ | ξₒ |
|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 |
| 1 | BEECHWOOD | 8100512 | 49.4032 | 2.07493 | 1.01373 |
| 2 | BELLEDUNE | 8100514 | 40.6839 | 2.40044 | -0.0786804 |
| 3 | BOUCTOUCHE CDA CS | 8100593 | 54.7989 | 2.60678 | -0.186213 |
| 4 | CHARLO AUTO | 8100885 | 48.3637 | 2.65941 | 0.0138493 |
| 5 | MIRAMICHI RCS | 8100989 | 49.2695 | 2.75775 | 0.0826329 |
| 6 | EDMUNDSTON | 8101303 | 51.6358 | 2.90474 | -0.374713 |
| 7 | FREDERICTON A | 8101500 | 60.6909 | 3.40578 | 0.0 |
| 8 | FREDERICTON CDA CS | 8101605 | 55.2787 | 2.73521 | -0.00439128 |
| 9 | MONCTON INTL A | 8103201 | 51.4458 | 2.85912 | 0.0512602 |
| 10 | ROYAL ROAD | 8104480 | 48.8498 | 2.5098 | 0.142638 |


In [12]:
ξ = 0 # value of the fixed parameter ξ (to be adjusted)

parameters_xi = DataFrame(StationName = String[],
                        StationID = String[],
                        μ = Float64[],
                        σ = Float64[])

for i in 1:(nrow(station_list))
    y = Pcp(station_list[i, :ID])
    par = GEVparameters_xi(y, ξ)
    df = DataFrame(StationName = station_list[i, :Name],
                    StationID = station_list[i, :ID],
                    μ = par[1],
                    σ = par[2])
    append!(parameters_xi, df)
end

CSV.write("results/parameters_$(DURATION)_$ξ.csv", parameters_xi)

first(parameters_xi, 10)

Error of scale with this vector
Error of scale with this vector


| Row | StationName | StationID | μ | σ |
|---|---|---|---|---|
| | String | String | Float64 | Float64 |
| 1 | BEECHWOOD | 8100512 | 55.1483 | 15.0258 |
| 2 | BELLEDUNE | 8100514 | 40.2053 | 10.9353 |
| 3 | BOUCTOUCHE CDA CS | 8100593 | 53.4378 | 13.0739 |
| 4 | CHARLO AUTO | 8100885 | 48.4717 | 14.3542 |
| 5 | MIRAMICHI RCS | 8100989 | 49.9965 | 16.2436 |
| 6 | EDMUNDSTON | 8101303 | 48.244 | 16.3296 |
| 7 | FREDERICTON A | 8101500 | 60.6909 | 30.1377 |
| 8 | FREDERICTON CDA CS | 8101605 | 55.2422 | 15.3901 |
| 9 | MONCTON INTL A | 8103201 | 51.9356 | 17.7919 |
| 10 | ROYAL ROAD | 8104480 | 49.8363 | 13.12 |


### Using the Extremes package

In [10]:
parameters_ex = DataFrame(StationName = String[],
                        StationID = String[],
                        μₑ = Float64[],
                        ϕₑ = Float64[],
                        ξₑ = Float64[],
                        BIC = Float64[])

for i in 1:(nrow(station_list))
    y = Pcp(station_list[i, :ID])
    
    μ₀ = mean(y)
    ϕ₀ = log(std(y))
    ξ₀ = 0
    p₀ = [μ₀, ϕ₀, ξ₀]
    par_ex = p₀
    BIC_var = 0
    
    try
        par_ex = gevfit(y).θ̂
        BIC_var = BIC_GEV(y)
    catch
        println("L'algorithme n'a pas convergé") # "The algorithm did not converge"
    end
    
    df = DataFrame(StationName = station_list[i, :Name],
                        StationID = station_list[i, :ID],
                        μₑ = par_ex[1],
                        ϕₑ = par_ex[2],
                        ξₑ = par_ex[3],
                        BIC = BIC_var)
    
    append!(parameters_ex, df)
end

CSV.write("results/parameters_ex_$DURATION.csv", parameters_ex)
first(parameters_ex, 10)

└ @ Extremes C:\Users\leogu\.julia\packages\Extremes\il3ma\src\parameterestimation\maximumlikelihood.jl:20


L'algorithme n'a pas convergé

└ @ Extremes C:\Users\leogu\.julia\packages\Extremes\il3ma\src\parameterestimation\maximumlikelihood.jl:20



L'algorithme n'a pas convergé


| Row | StationName | StationID | μₑ | ϕₑ | ξₑ | BIC |
|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 |
| 1 | BEECHWOOD | 8100512 | 49.4036 | 2.07497 | 1.01374 | 90.4445 |
| 2 | BELLEDUNE | 8100514 | 40.6841 | 2.40044 | -0.0786836 | 149.741 |
| 3 | BOUCTOUCHE CDA CS | 8100593 | 54.7987 | 2.60677 | -0.18623 | 146.956 |
| 4 | CHARLO AUTO | 8100885 | 48.3636 | 2.65942 | 0.0138525 | 444.859 |
| 5 | MIRAMICHI RCS | 8100989 | 49.2696 | 2.75775 | 0.0826289 | 335.015 |
| 6 | EDMUNDSTON | 8101303 | 51.6362 | 2.90478 | -0.374743 | 110.217 |
| 7 | FREDERICTON A | 8101500 | 48.8006 | 2.41263 | 0.295614 | 98.7799 |
| 8 | FREDERICTON CDA CS | 8101605 | 55.2794 | 2.73524 | -0.00441144 | 460.303 |
| 9 | MONCTON INTL A | 8103201 | 51.4459 | 2.85912 | 0.0512654 | 611.427 |
| 10 | ROYAL ROAD | 8104480 | 48.8501 | 2.50983 | 0.14263 | 226.661 |
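
The `BIC` column is produced by `BIC_GEV`, which is defined in `functions.jl` and not shown here. Assuming it follows the usual definition of the Bayesian information criterion, the value for a station with $n$ observations would be

$$
\mathrm{BIC} = k\log n - 2\,\ell(\hat\mu, \hat\sigma, \hat\xi), \qquad k = 3,
$$

where $\ell(\hat\mu, \hat\sigma, \hat\xi)$ is the maximised log-likelihood and $k$ counts the fitted GEV parameters. Under that assumption the criterion grows with the series length, so values are not directly comparable between stations with different record lengths.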


In [41]:
parameters_δ = DataFrame(StationName = String[],
                        StationID = String[],
                        δμ = Float64[],
                        δϕ = Float64[],
                        δξ = Float64[])

for i in 1:(nrow(station_list))    
    df = DataFrame(StationName = station_list[i, :Name],
                        StationID = station_list[i, :ID],
                        δμ = parameters[i, :μₒ] - parameters_ex[i, :μₑ],
                        δϕ = parameters[i, :ϕₒ] - parameters_ex[i, :ϕₑ],
                        δξ = parameters[i, :ξₒ] - parameters_ex[i, :ξₑ])
    
    append!(parameters_δ, df)
end

CSV.write("results/parameters_diff_$DURATION.csv", parameters_δ)
first(parameters_δ, 10)

| Row | StationName | StationID | δμ | δϕ | δξ |
|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 |
| 1 | BEECHWOOD | 8100512 | -0.000311312 | -3.15042e-05 | -1.3017e-05 |
| 2 | BELLEDUNE | 8100514 | -0.000132853 | 6.93945e-06 | 3.23456e-06 |
| 3 | BOUCTOUCHE CDA CS | 8100593 | 0.000224572 | 8.44414e-06 | 1.72645e-05 |
| 4 | CHARLO AUTO | 8100885 | 1.58368e-05 | -9.25123e-06 | -3.26586e-06 |
| 5 | MIRAMICHI RCS | 8100989 | -5.93182e-05 | -7.35785e-06 | 4.0352e-06 |
| 6 | EDMUNDSTON | 8101303 | -0.000336122 | -4.0458e-05 | 2.98923e-05 |
| 7 | FREDERICTON A | 8101500 | 11.8903 | 0.993141 | -0.295614 |
| 8 | FREDERICTON CDA CS | 8101605 | -0.000720545 | -2.68403e-05 | 2.01528e-05 |
| 9 | MONCTON INTL A | 8103201 | -8.75599e-05 | -4.67664e-07 | -5.13769e-06 |
| 10 | ROYAL ROAD | 8104480 | -0.000306995 | -3.22487e-05 | 7.66201e-06 |