Drifts 20: individual seasonality analysis #61

Open
DavorJ opened this issue Sep 27, 2020 · 7 comments

Comments

@DavorJ
Collaborator

DavorJ commented Sep 27, 2020

Here we look at the seasonality of the series, and whether there are any patterns that can be deduced from it. (This is in contrast to #50 where we looked at the whole dataset.) An example:

image

Some explanations for the graphs:

  • The top left is the original timeseries. The middle left is the multi comparison (Drifts 04: Analysis of differences between one barometer and many others (improved) #51) and the bottom left is the reference comparison (Drifts 12: analysis of barometers based on KNMI series as reference #57). The two comparison plots have identically scaled Y-axes, which allows for better comparison of the differences (red curves). Here we see that the KNMI comparison is a bit more noisy.
  • On each of these 3 timeseries, two lasso-glmnet models are fitted that are sensitive only to periodic movement. The green model is strongly regularized (1-sigma) while the blue model is the best possible (under 10-fold sequential cross-validation). In plain terms: the blue model will detect more seasonal patterns than the green one. What can we conclude here?
  • The original timeseries shows no periodicity. That is why the green line is straight, with 'Periods: /' in the top left corner.
    image
  • The third plot from the left, which is just the original measurements plotted per year day, shows this:
    image
    I.e. the variance/noise clearly changes from winter to summer, but the amplitude of the measurements does not, hence no periodicity. This was also detected on the whole dataset: Drifts 03: Seasonality effects of air pressure #50.
  • Both difference plots do exhibit yearly periodicity (around 365.25 days) and even a slight linear trend:
    image
  • This periodicity is also apparent from the yearly difference plot where measurements during summer are clearly lower than during winter, vs. reference data:
    image
  • The histograms at the far right are equally scaled and just show the distribution of the differences (red curve), for a comparison between the multi- and reference analysis. They do not have any added value for seasonality analysis.
  • Then there are the DTFT plots. They are the result of the first equation on the Wikipedia page. The DTFT transforms the signal into the frequency domain, which is shown in these graphs:
    image
    The Y-axis is the average intensity of the periods and the red vertical line marks the 365.25 days period. (Note1: frequency = 1/period.) (Note2: DFT/FFT was uninformative here since its resolution is constant in frequency, not in period: i.e. for higher periods, the resolution was very low and unusable for plotting.) (Note3: no windowing function is used: the aliasing effect should normally only affect high frequencies, and we start our DTFT at period > 2 days.)
  • What do we learn from the DTFT plots? We learn which periods are most intense in the timeseries. In the above case, the 365.25 days period is most intense, followed by the +2000 days period, which signifies a very long period curvature (or trend). The multi-comparison plot has a larger +2000 days curvature than the reference comparison, which is also why the green model detected it in the multi-comparison plot.
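
To make the DTFT bullets above concrete, here is a minimal sketch that evaluates the DTFT sum X(ω) = Σ x[n]·exp(-iωn) on a grid that is evenly spaced in period rather than in frequency. It is not the code behind the plots: the synthetic series, the period grid, the function name `dtft_amplitude` and the normalisation (mean removed, magnitude divided by the number of samples) are all assumptions made for illustration.

```python
import cmath
import math

def dtft_amplitude(values, times, periods):
    """Average DTFT magnitude of the mean-removed series at each period (in days)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    amplitudes = []
    for period in periods:
        omega = 2.0 * math.pi / period  # angular frequency; frequency = 1 / period
        total = sum(x * cmath.exp(-1j * omega * t) for x, t in zip(centered, times))
        amplitudes.append(abs(total) / len(centered))
    return amplitudes

# Hypothetical daily difference series: 10 years with a yearly cycle of amplitude 2.
times = list(range(3653))
values = [2.0 * math.sin(2.0 * math.pi * t / 365.25) for t in times]

# Grid that is evenly spaced in period (not in frequency), starting above 2 days.
periods = list(range(50, 1001, 5))
amplitudes = dtft_amplitude(values, times, periods)
strongest = periods[amplitudes.index(max(amplitudes))]
print(strongest)
```

Because the sum is evaluated directly at any chosen ω, the grid can be as dense as needed at long periods, which is where a plain FFT grid becomes coarse.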

Why do we use both glmnet and DTFT?

The DTFT gives an overall picture, over all periods, while glmnet only shows the relevant "peaks". Also, the glmnet approach is more robust when there are discontinuities in the data, which sometimes happens. (This seems apparent from these many plots, but based on theory I do not see why that should be the case.)
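
As a rough illustration of the "only the relevant peaks" behaviour, the sketch below fits a lasso on sine/cosine columns for a few candidate periods. It is a stand-in, not the glmnet setup used for the plots: the coordinate-descent solver is hand-written, the two penalties are picked by hand instead of by 10-fold sequential cross-validation, and the candidate periods and synthetic data are assumptions.

```python
import math
import random

def periodic_features(times, periods):
    """One cosine and one sine column per candidate period."""
    columns = []
    for period in periods:
        omega = 2.0 * math.pi / period
        columns.append([math.cos(omega * t) for t in times])
        columns.append([math.sin(omega * t) for t in times])
    return columns

def soft_threshold(value, threshold):
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def lasso_coordinate_descent(columns, y, penalty, sweeps=30):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)
    norms = [sum(v * v for v in col) / n for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual, with its own fit added back.
            rho = sum(c * r for c, r in zip(col, residual)) / n + norms[j] * coefs[j]
            new = soft_threshold(rho, penalty) / norms[j]
            if new != coefs[j]:
                delta = new - coefs[j]
                residual = [r - delta * c for r, c in zip(residual, col)]
                coefs[j] = new
    return coefs

def selected_periods(coefs, periods, tol=1e-9):
    """Periods whose cosine or sine coefficient survived the penalty."""
    return [p for k, p in enumerate(periods)
            if abs(coefs[2 * k]) > tol or abs(coefs[2 * k + 1]) > tol]

# Hypothetical difference series: yearly cycle plus noise, mean removed.
rng = random.Random(0)
times = list(range(3650))
raw = [1.5 * math.sin(2.0 * math.pi * t / 365.25) + rng.gauss(0.0, 0.5) for t in times]
mean = sum(raw) / len(raw)
y = [v - mean for v in raw]

periods = [91.3125, 182.625, 365.25, 730.5]  # assumed candidate periods, in days
columns = periodic_features(times, periods)
strong = selected_periods(lasso_coordinate_descent(columns, y, penalty=0.3), periods)
weak = selected_periods(lasso_coordinate_descent(columns, y, penalty=0.02), periods)
print("strongly regularized:", strong)
print("weakly regularized:", weak)
```

The strongly penalised fit plays the role of the green model and the weakly penalised one the role of the blue model: the weaker the penalty, the more candidate periods can survive.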

Difference in noise between multi- and reference comparison?

I would expect lower noise in the multi-comparison plots, but this is not always the case. It might have to do with barometer availability and differences in height. I.e. if you have 200 barometers as reference and suddenly remove 100 from the coastal region (which is lower than the mainland), this would shift the difference (red) curve. So compensation for height seems necessary. In #62 the barometer availability is analyzed. Anyway, I think that for the drift detection algorithm we should start with a selection of good barometers at the same height, spread all over Flanders, and work with that to begin with.
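
The 200-versus-100 barometer argument can be checked with a small worked example. All numbers below are made up for illustration: the readings are hypothetical, and the 2 cmH2O offset between coastal and inland stations is an assumption, not a measured value.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical readings at one timestamp, in cmH2O. Coastal stations sit lower,
# so they read a slightly higher air pressure than inland stations.
inland = [1031.0] * 100
coast = [1033.0] * 100
target = 1032.0  # the barometer under test

difference_all = target - mean(inland + coast)  # reference built from 200 barometers
difference_inland = target - mean(inland)       # after the 100 coastal ones drop out
shift = difference_inland - difference_all
print(difference_all, difference_inland, shift)
```

With these numbers the difference curve jumps by half the coastal offset even though the tested barometer did not change at all, which is the effect a height compensation would have to remove.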

This pdf contains all the plots.

@DavorJ
Collaborator Author

DavorJ commented Sep 27, 2020

Some weird cases

Sometimes the seasonality is not 365 days?

image

Sometimes there is quite some discrepancy between multi- and reference comparison

image

I wonder whether this could be due to the constant addition of barometers from 2010 to 2015 (cf. #62)? The KNMI reference plot is very stable over a very long timeframe. So this case seems to be a good one for future multi-reference assessment.

Sometimes multi-comparison is much more noisy than KNMI reference comparison

image

Not sure why...

@DavorJ
Collaborator Author

DavorJ commented Sep 28, 2020

So what can one conclude from all this?

  • That there are indeed some yearly repeating patterns on some barometers. Are those that exhibit these patterns strongly of the same type? In a similar environment?
  • The repeating yearly pattern is mostly detected on the differences, and not on the original series. The original series usually has no pattern. (glmnet - green curve is flat)
  • Multi comparison on 12h interval must be done exclusively on 12h interval barometers. If not, then the differences will jump between comparisons with 90 barometers and 60 barometers (cf. Drifts 21: availability of barometer data #62), which will introduce noise of some kind. This will be taken into consideration for the selection of reference barometers.
  • All barometers exhibit higher variance during winter, which must be a real effect. (cf. Drifts 03: Seasonality effects of air pressure #50)

@fredericpiesschaert
Collaborator

when browsing through literature, the higher variance during winter as well as seasonal variations are indeed real effects. Here's an example, no access to the paper unfortunately, but there are many examples out there
image

@fredericpiesschaert
Collaborator

the seasonal variation is usually explained by temperature: cold air is more dense and heavier than warm air, hence lower air pressure in summer

@fredericpiesschaert
Collaborator

BAOL033X is indeed an interesting case. I would consider this the 'perfect' timeseries, with seasonality and variance as expected and stable. One seasonality peak in the DTFT plot and no sign of drift at all afterwards. I guess this is what the DTFT of an unsuspicious series should look like. Unfortunately, there aren't many cases like this

image

@DavorJ
Collaborator Author

DavorJ commented Oct 15, 2020

when browsing through literature, the higher variance during winter as well as seasonal variations are indeed real effects. Here's an example, no access to the paper unfortunately, but there are many examples out there

the seasonal variation is usually explained by temperature: cold air is more dense and heavier than warm air, hence lower air pressure in summer

We do indeed see higher variance outside the summer period, but we do not see a seasonal effect as pronounced as in the example from the paper (which is around 20 cmH2O).

This is a similar plot but with KNMI data (#56) where pressure is averaged over months (Drifts 31):

image

Higher variance outside summer is obvious, but there is no dip in pressure during summer. The blue line is the loess-smoothed curve. So although the explanation that colder, and thus denser, air during winter results in higher pressure makes sense to me, I fail to see it in our "valid" data. (I.e. if that statement were true, we would see the seasonal effect in all barometers, including KNMI.)
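
A minimal sketch of the kind of per-month summary described above, on synthetic data only. The series is constructed to have a flat level and noisier winters, so it reproduces the pattern being described rather than proving anything about the real measurements. The loess smoothing is not reproduced, and the function name, base level and noise model are assumptions.

```python
import datetime
import math
import random
import statistics

def monthly_mean_and_sd(dates, values):
    """Group values by calendar month; return {month: (mean, standard deviation)}."""
    by_month = {}
    for day, value in zip(dates, values):
        by_month.setdefault(day.month, []).append(value)
    return {month: (statistics.mean(vals), statistics.stdev(vals))
            for month, vals in sorted(by_month.items())}

# Hypothetical daily pressure in cmH2O: constant level, noise that is larger in winter.
rng = random.Random(1)
start = datetime.date(2010, 1, 1)
dates = [start + datetime.timedelta(days=k) for k in range(3652)]
values = []
for day in dates:
    phase = 2.0 * math.pi * day.timetuple().tm_yday / 365.25
    noise_sd = 10.0 + 6.0 * math.cos(phase)  # assumed: largest around January
    values.append(1033.0 + rng.gauss(0.0, noise_sd))

stats = monthly_mean_and_sd(dates, values)
for month, (level, spread) in stats.items():
    print(month, round(level, 1), round(spread, 1))
```

A table like this separates the two questions in the discussion: the spread column shows the winter/summer change in variance, while the level column shows whether there is any summer dip.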

So, from my understanding so far, I can only interpret the seasonal effect as the result of some failing mechanism (cf. temperature) of the barometer. (Which would then also mean that the Chinese researchers had a failing barometer.)

@fredericpiesschaert
Collaborator

I have no explanation for the summer dip either. If you look at the technical specifications of the barodivers, you would expect that temperature effects are covered
image
