Drifts 20: individual seasonality analysis #61

DavorJ · 2020-09-27T19:08:15Z

Here we look at the seasonality of the individual series, and whether any patterns can be deduced from it. (This is in contrast to #50, where we looked at the whole dataset.) An example:

Some explanations for the graphs:

The top left is the original timeseries. The middle left is the multi comparison (Drifts 04: Analysis of differences between one barometer and many others (improved) #51) and the bottom left is the reference comparison (Drifts 12: analysis of barometers based on KNMI series as reference #57). The two comparison plots have identically scaled Y-axes, which makes the differences (red curves) easier to compare. Here we see that the KNMI comparison is a bit noisier.
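A minimal sketch of how the two red difference curves can be computed, assuming the multi comparison subtracts the plain mean of all other barometers available at each timestamp (the exact procedure of #51 may differ, e.g. in weighting). The dict-of-timestamps layout and the function names are hypothetical.

```python
from statistics import mean

def multi_difference(target, others):
    """Target minus the mean of all other barometers available at the same timestamp.

    target: dict {timestamp: pressure}; others: list of such dicts (hypothetical layout).
    Timestamps where no other barometer has data are skipped.
    """
    diff = {}
    for ts, value in target.items():
        available = [other[ts] for other in others if ts in other]
        if available:
            diff[ts] = value - mean(available)
    return diff

def reference_difference(target, reference):
    """Target minus a single reference series (e.g. KNMI) at common timestamps."""
    return {ts: value - reference[ts] for ts, value in target.items() if ts in reference}
```

Because the mean is taken over whichever barometers happen to be available, the multi-comparison difference can shift when the set of available barometers changes; a single reference series does not have that problem.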
On each of these 3 timeseries, 2 lasso (glmnet) models are fitted that are only sensitive to periodic movement. The green model is strongly regularized (1-sigma), while the blue model is the best possible one (under 10-fold sequential cross-validation). In plain terms: the blue model will detect more seasonal patterns than the green one. What can we conclude here?
The original timeseries shows no periodicity. That is why the green line is straight and the label in the top left corner reads 'Periods: /'.
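To illustrate what a model that is "only sensitive to periodic movement" can look like, here is a minimal sketch, not the glmnet code used for the plots: one sin/cos pair per candidate period, fitted with a plain coordinate-descent lasso. The candidate periods, the fixed `lam` values and all function names are assumptions. A larger `lam` plays the role of the strongly regularized green model, a smaller one that of the blue model; in the actual analysis the penalty came from 10-fold sequential cross-validation (in `cv.glmnet` terms presumably `lambda.1se` vs. `lambda.min`), which is not reproduced here.

```python
import math

def fourier_features(days, periods):
    """One sin and one cos column per candidate period (periods in days)."""
    return [[f(2 * math.pi * d / p) for p in periods for f in (math.sin, math.cos)]
            for d in days]

def soft_threshold(z, gamma):
    return math.copysign(max(abs(z) - gamma, 0.0), z)

def lasso(X, y, lam, max_sweeps=100, tol=1e-9):
    """Coordinate-descent lasso minimising (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * p
    r = [yi - b0 for yi in y]                                  # current residuals
    sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            if sq[j] == 0:
                continue                                       # all-zero column
            # correlation of column j with the partial residual (b[j] added back)
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + sq[j] * b[j]
            new = soft_threshold(rho, lam) / sq[j]
            step = new - b[j]
            if step:
                for i in range(n):
                    r[i] -= X[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        shift = sum(r) / n                                     # re-centre the intercept
        b0 += shift
        r = [ri - shift for ri in r]
        if max_step < tol:
            break
    return b0, b

def active_periods(b, periods):
    """Periods whose sin or cos coefficient survived the lasso penalty."""
    return [p for k, p in enumerate(periods) if b[2 * k] or b[2 * k + 1]]

def periods_label(periods):
    """Formats a label like the one in the plots; '/' means nothing was selected."""
    return "Periods: " + (", ".join(f"{p:g}" for p in periods) if periods else "/")
```

A period counts as detected when either its sin or its cos coefficient is non-zero; with no surviving coefficients the label falls back to 'Periods: /', as for the original series above.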
The third plot from the left, which is just the original measurements plotted against day of year, shows this:

I.e. the variance (noise) clearly changes from winter to summer, but the level of the measurements does not, hence no periodicity. This was also detected on the whole dataset: Drifts 03: Seasonality effects of air pressure #50.
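The "variance changes, level does not" observation can be checked numerically by grouping readings by season. A minimal sketch, assuming readings are given as parallel lists of `datetime.date` objects and values; the grouping into meteorological winter (DJF) and summer (JJA) and the function names are assumptions.

```python
from statistics import mean, stdev

# Meteorological seasons; spring and autumn months are ignored in this comparison.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   6: "summer", 7: "summer", 8: "summer"}

def day_of_year(d):
    """1-based day of the year, i.e. the x-axis of the per-year-day plot."""
    return d.timetuple().tm_yday

def seasonal_level_and_spread(dates, values):
    """Mean (level) and sample standard deviation (spread) per season."""
    groups = {}
    for d, v in zip(dates, values):
        season = SEASON_OF_MONTH.get(d.month)
        if season is not None:
            groups.setdefault(season, []).append(v)
    return {s: (mean(vs), stdev(vs)) for s, vs in groups.items()}
```

A series like the one above would show roughly equal means for both seasons but a clearly larger standard deviation in winter.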
Both difference plots do exhibit yearly periodicity (around 365.25 days) and even a slight linear trend:
This periodicity is also apparent from the yearly difference plot, where the differences vs. the reference data are clearly lower during summer than during winter:
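To put a number on the yearly periodicity and the slight linear trend of a difference series, one option (an ordinary least-squares harmonic regression, not the glmnet fit used for the plots) is to fit an offset, a linear trend and one 365.25-day sine/cosine pair. A minimal sketch, assuming daily difference values indexed by day number; the function names are hypothetical.

```python
import math

YEAR = 365.25  # days

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_trend_and_annual(days, diffs):
    """Least squares for diff = a + b*t + c*sin(2*pi*t) + s*cos(2*pi*t), t in years."""
    rows = []
    for d in days:
        t = d / YEAR
        rows.append([1.0, t, math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)])
    # normal equations X^T X beta = X^T y (4x4, well conditioned with t in years)
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Xty = [sum(r[i] * y for r, y in zip(rows, diffs)) for i in range(4)]
    a, b, c, s = solve(XtX, Xty)
    return {"offset": a, "trend_per_year": b,
            "amplitude": math.hypot(c, s), "phase": math.atan2(s, c)}
```

The fitted curve equals `offset + trend_per_year * t + amplitude * sin(2*pi*t + phase)` with t in years, so the phase tells in which part of the year the differences are lowest.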
The histograms at the far right are equally scaled and just show the distribution of the differences (red curve), for comparing the multi- and reference analysis. They add nothing to the seasonality analysis.
Then there are the DTFT (discrete-time Fourier transform) plots, computed with the first equation on the Wikipedia page. The DTFT transforms the signal into the frequency domain, which is shown in these graphs:

The Y-axis is the average intensity of the periods, and the red vertical line marks the 365.25-day period. (Note 1: frequency = 1/period.) (Note 2: the DFT/FFT was uninformative here, since its resolution is constant in frequency, not in period: for longer periods the resolution was very low and unusable for plotting.) (Note 3: no windowing function is used: the aliasing effect should normally only affect high frequencies, and we start our DTFT at periods > 2 days.)
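A minimal sketch of evaluating the DTFT directly at chosen periods, as opposed to the fixed FFT bins. The mean removal, the normalization by N ("average intensity") and the function names are assumptions, not necessarily what the plotting code does. Periods are expressed in samples, i.e. in days for daily data.

```python
import cmath
import math

def dtft_intensity(x, periods):
    """|X(w)| / N for the DTFT X(w) = sum_n x[n] * exp(-i*w*n) at w = 2*pi/P per period P."""
    n = len(x)
    mu = sum(x) / n
    xc = [v - mu for v in x]  # assumption: remove the mean so the zero frequency does not dominate
    intensities = []
    for p in periods:
        w = 2 * math.pi / p
        X = sum(v * cmath.exp(-1j * w * k) for k, v in enumerate(xc))
        intensities.append(abs(X) / n)
    return intensities

def period_grid(p_min, p_max, step):
    """Evenly spaced periods, so the resolution is constant in period (unlike the FFT)."""
    count = int((p_max - p_min) / step + 1e-9) + 1
    return [p_min + i * step for i in range(count)]

def fft_periods(n):
    """Periods (in samples) of the FFT bins k = 1 .. n//2, i.e. P_k = n / k."""
    return [n / k for k in range(1, n // 2 + 1)]
```

For ten years of daily data (n = 3653), the FFT bins near one year sit at about 405.9, 365.3 and 332.1 days, and there is no bin between 1826.5 and 3653 days, which illustrates Note 2.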
What do we learn from the DTFT plots? We learn which periods are most intense in the timeseries. In the above case, the 365.25-day period is the most intense, followed by the 2000+ day period, which signifies a very long-period curvature (or trend). The multi-comparison plot has a larger 2000+ day curvature than the reference comparison, which is also why the green model detected it in the multi-comparison plot.
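Picking out "the most intense periods" amounts to ranking the local maxima of the spectrum. A small sketch, assuming the spectrum is available as parallel lists of periods (in ascending order) and intensities; the function name is hypothetical.

```python
def top_peaks(periods, intensity, k=3):
    """The k strongest local maxima of a spectrum, as (period, intensity) pairs."""
    peaks = []
    for i, value in enumerate(intensity):
        left = intensity[i - 1] if i > 0 else float("-inf")
        right = intensity[i + 1] if i + 1 < len(intensity) else float("-inf")
        if value > left and value >= right:  # '>=' keeps the first point of a plateau
            peaks.append((periods[i], value))
    return sorted(peaks, key=lambda pk: pk[1], reverse=True)[:k]
```

A maximum at the longest period in the grid is a boundary peak: it usually reflects a long curvature or trend, like the 2000+ day component above, rather than a true cycle.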

Why do we use both glmnet and DTFT?

The DTFT gives an overall picture across all periods, while glmnet only shows the relevant "peaks". Also, the glmnet approach is more robust when there are discontinuities in the data, which sometimes happen. (This seems apparent from the many plots, but based on theory I do not see why that should be the case.)

Difference in noise between multi- and reference comparison?

I would expect lower noise in the multi-comparison plots, but this is not always the case. It might have to do with barometer availability and differences in height. I.e. if you have 200 barometers as reference and suddenly remove 100 from the coastal region (which is lower than the mainland), this would shift the difference (red) curve. So compensation for height seems necessary. In #62 the barometer availability is analyzed. Anyway, I think that for the drift detection algorithm we should start with a selection of good barometers at the same height, spread all over Flanders, and work with those first.
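One possible form of height compensation is to reduce every reading to a common reference height before averaging. A minimal sketch using the hypsometric equation for an isothermal air column; the fixed 15 °C temperature, the constants and the function name are assumptions, and no particular compensation method was settled on in this thread.

```python
import math

G = 9.80665       # standard gravity, m/s^2
R_DRY = 287.05    # specific gas constant of dry air, J/(kg K)

def reduce_to_reference_height(pressure, height_m, ref_height_m=0.0, temp_k=288.15):
    """Hypsometric reduction of a reading taken at height_m to ref_height_m.

    Assumes an isothermal air column at temp_k; the pressure unit (hPa, cmH2O, ...)
    is preserved because only a multiplicative factor is applied.
    """
    return pressure * math.exp(G * (height_m - ref_height_m) / (R_DRY * temp_k))
```

With identical sea-level pressure, stations at 5 m and 80 m read roughly 9 hPa apart, so dropping the low-lying stations from the reference moves the raw mean by several hPa, while the mean of the reduced values stays put.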

This pdf contains all the plots.

DavorJ · 2020-09-27T19:36:33Z

Some weird cases

Sometimes the seasonality is not 365 days?

Sometimes there is quite some discrepancy between multi- and reference comparison

I wonder whether this could be due to the continual addition of barometers from 2010 to 2015 (cf. #62)? The KNMI reference plot is very stable over a very long timeframe. So this case seems to be a good one for future multi-reference assessment.

Sometimes the multi comparison is much noisier than the KNMI reference comparison

Not sure why...

DavorJ · 2020-09-28T15:06:06Z

So what can one conclude from all this?

That some barometers do indeed show yearly repeating patterns. Are the barometers that exhibit these patterns strongly of the same type? In a similar environment?
The repeating yearly pattern is mostly detected in the differences, not in the original series. The original series usually shows no pattern (the glmnet green curve is flat).
Multi comparisons at a 12h interval must be done exclusively with 12h-interval barometers. Otherwise, the differences will jump between comparisons with 90 barometers and with 60 barometers (cf. Drifts 21: availability of barometer data #62), which will introduce noise of some kind. This will be taken into account in the selection of reference barometers.
All barometers exhibit higher variance during winter, which must be a real effect. (cf. Drifts 03: Seasonality effects of air pressure #50)
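A minimal sketch of restricting the multi comparison to 12h-interval barometers by inferring each barometer's sampling interval from the median spacing of its timestamps. The data layout (barometer id mapped to a list of `datetime` objects) and the function names are hypothetical; the median keeps the inferred interval robust against occasional gaps.

```python
from datetime import timedelta
from statistics import median

def sampling_interval(timestamps):
    """Median spacing between consecutive timestamps of one barometer."""
    ts = sorted(timestamps)
    return median(b - a for a, b in zip(ts, ts[1:]))

def select_by_interval(series, interval=timedelta(hours=12)):
    """Ids of barometers whose typical sampling interval equals `interval`.

    series: dict {barometer_id: [datetime, ...]} (hypothetical layout).
    """
    return sorted(bid for bid, ts in series.items()
                  if len(ts) > 1 and sampling_interval(ts) == interval)
```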

fredericpiesschaert · 2020-10-15T07:38:59Z

When browsing through the literature, the higher variance during winter as well as the seasonal variations turn out to be real effects. Here's an example (no access to the paper, unfortunately), but there are many examples out there.

fredericpiesschaert · 2020-10-15T08:32:17Z

The seasonal variation is usually explained by temperature: cold air is denser and heavier than warm air, hence lower air pressure in summer.
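The "cold air is denser" part of this argument follows from the ideal gas law, ρ = p / (R_d T). A quick sketch for dry air; the constant and the function name are assumptions.

```python
R_DRY = 287.05  # specific gas constant of dry air, J/(kg K)

def dry_air_density(pressure_pa, temp_c):
    """Ideal gas law for dry air: rho = p / (R_d * T), with T in kelvin."""
    return pressure_pa / (R_DRY * (temp_c + 273.15))
```

At the same pressure (101325 Pa), air at 0 °C is about 7% denser than at 20 °C (roughly 1.292 vs. 1.204 kg/m³). This covers only the density part of the argument; whether it translates into lower surface pressure in summer is a separate question.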

fredericpiesschaert · 2020-10-15T12:02:01Z

BAOL033X is indeed an interesting case. I would consider this the 'perfect' timeseries, with seasonality and variance as expected and stable. One seasonality peak in the DTFT plot and no sign of drift at all afterwards. I guess this is how the DTFT of an unsuspicious series should look. Unfortunately, there aren't many cases like this.

DavorJ · 2020-10-15T20:59:58Z

When browsing through literature, the higher variance during winter as well as seasonal variations are indeed real effects. Here's an example, no access to the paper unfortunately, but there are many examples out there

The seasonal variation is usually explained by temperature: cold air is more dense and heavier than warm air, hence lower air pressure in summer

We do indeed see higher variance during the non-summer period, but we do not see the seasonal effect as pronounced as in the example from the paper (which is around 20 cmH2O).

This is a similar plot but with KNMI data (#56) where pressure is averaged over months (Drifts 31):

Higher variance during non-summer is obvious, but there is no dip in pressure during summer. The blue line is the loess-smoothed curve. So although the explanation that colder, thus denser, air during winter results in higher pressure makes sense to me, I fail to see it in our "valid" data. (I.e. if that statement were true, we would see the seasonal effect in all barometers, including KNMI.)
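For reference, a minimal LOWESS-style smoother (local linear fits with tricube weights, without robustness iterations) of the kind that draws a loess curve through monthly means. The smoother and settings actually used for the plot are not stated here, and R's `loess()` defaults (span, polynomial degree) differ from this sketch; `frac` and the function name are assumptions.

```python
def loess(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    k = max(2, int(round(frac * n)))       # neighbours used for each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12          # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx else 0.0  # weighted least-squares line
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

A smaller `frac` follows the monthly means more closely; a larger one approaches a single straight line.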

So, from my understanding up to this point, I can only interpret the seasonal effect as some failure mechanism of the barometer (cf. temperature). (Which would also mean that the Chinese researchers had a failing barometer.)

fredericpiesschaert · 2020-10-16T09:00:41Z

I have no explanation for the summer dip either. Judging by the technical specifications of the barodivers, you would expect temperature effects to be accounted for.

DavorJ added air pressure analysis labels Sep 27, 2020

DavorJ mentioned this issue Nov 1, 2020

Drifts 32/33: detect_drift() v01 #67 (Open)
