Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R and Python version give different break dates #14

Closed
mirt001 opened this issue Dec 8, 2020 · 10 comments
Closed

R and Python version give different break dates #14

mirt001 opened this issue Dec 8, 2020 · 10 comments
Labels
bug Something isn't working

Comments

@mirt001
Copy link

mirt001 commented Dec 8, 2020

I am using the following Python and R bfast versions at the date of the creation of this issue.

git clone --single-branch --branch develop https://github.com/gieseke/bfast.git
git clone --single-branch --branch master https://github.com/bfast2/bfast.git

To minimize complexity, I created these 2 self-contained python and R scripts, that are almost direct translations:

import datetime
import numpy as np
import bisect
import bfast

indexedRasterTimeSeries = np.array([[[3141]],[[0]],[[2153]],[[0]],[[2791]],[[2795]],[[2109]],[[2139]],[[2077]],[[2200]],[[0]],[[1260]],[[1301]],[[2496]],[[1429]],[[0]],[[0]],[[0]],[[0]],[[3122]],[[2446]],[[2428]],[[1647]],[[1266]],[[0]],[[25]],[[-996]],[[-21]],[[0]],[[0]],[[0]],[[0]],[[466]],[[1454]],[[0]],[[0]],[[2084]],[[2600]],[[0]],[[2133]],[[1865]],[[0]],[[596]],[[410]],[[0]],[[0]],[[0]],[[0]],[[2438]],[[0]],[[0]],[[0]],[[0]],[[3382]],[[3753]],[[4647]],[[0]],[[2989]],[[3356]],[[3223]],[[2732]],[[3536]],[[2332]],[[0]],[[1957]],[[1325]],[[0]],[[1990]],[[0]],[[0]],[[2740]],[[3510]],[[0]],[[0]],[[3019]],[[0]],[[3676]],[[3790]],[[3370]],[[0]],[[3504]],[[0]],[[0]],[[2984]],[[2762]],[[2794]],[[2845]],[[0]],[[0]],[[3076]],[[0]],[[3213]],[[0]],[[0]],[[0]],[[0]],[[0]],[[4071]],[[0]],[[0]],[[0]],[[0]],[[3387]],[[0]],[[3456]],[[3222]],[[3173]],[[2609]],[[2650]],[[2535]],[[2441]],[[2191]],[[2472]],[[2350]],[[2847]],[[0]],[[0]],[[3484]],[[0]],[[4704]],[[3412]],[[3687]],[[0]],[[0]],[[0]],[[0]],[[3946]],[[0]],[[0]],[[0]],[[0]],[[3764]],[[3266]],[[3451]],[[3552]],[[3289]],[[2517]],[[2588]],[[0]],[[0]],[[1487]],[[1746]],[[2084]],[[1884]],[[1917]],[[0]],[[3765]],[[0]],[[0]],[[0]],[[0]],[[3950]],[[3663]],[[3733]],[[4007]],[[4202]],[[3566]],[[3926]],[[3573]],[[3804]],[[3504]],[[3678]],[[3318]],[[3255]],[[0]],[[0]],[[2768]],[[3016]],[[0]],[[0]],[[3641]],[[3685]],[[4161]],[[0]],[[3975]],[[0]],[[0]],[[4019]],[[0]],[[4026]],[[3697]],[[4053]],[[4292]],[[3741]],[[3911]],[[3575]],[[3467]],[[3410]],[[0]],[[3291]],[[3473]],[[0]],[[0]],[[3467]],[[0]],[[3801]],[[3022]],[[0]],[[0]],[[3363]],[[3742]],[[4035]],[[0]],[[0]],[[0]],[[3996]],[[0]],[[0]],[[3848]],[[0]],[[0]],[[3709]],[[3312]],[[3148]],[[2734]],[[2944]],[[0]],[[0]],[[0]],[[0]],[[4086]],[[0]],[[0]],[[0]],[[0]],[[0]],[[0]],[[0]],[[0]],[[0]],[[0]],[[0]],[[4200]]])
observationDates = ['2010-02-13', '2010-03-09', '2010-04-10', '2010-04-18', '2010-05-12', '2010-06-13', '2010-06-29', '2010-07-23', '2010-07-31', '2010-08-08', '2010-08-16', '2010-08-24', '2010-09-09', '2010-09-25', '2010-10-11', '2010-12-06', '2011-04-21', '2011-05-31', '2011-06-16', '2011-06-24', '2011-07-10', '2011-07-18', '2011-07-26', '2011-08-11', '2011-08-19', '2011-08-27', '2011-09-04', '2011-09-28', '2011-10-05', '2011-10-06', '2011-10-22', '2011-10-30', '2011-11-07', '2011-11-15', '2011-12-25', '2012-03-14', '2012-03-30', '2012-04-15', '2012-05-01', '2012-06-02', '2012-06-18', '2012-07-20', '2012-08-05', '2012-09-06', '2012-10-24', '2012-11-09', '2012-12-11', '2013-01-12', '2013-01-28', '2013-03-01', '2013-04-02', '2013-04-05', '2013-04-10', '2013-04-18', '2013-04-26', '2013-05-04', '2013-05-12', '2013-06-05', '2013-06-21', '2013-06-29', '2013-07-07', '2013-07-15', '2013-07-31', '2013-08-08', '2013-08-16', '2013-09-01', '2013-09-09', '2013-09-17', '2013-09-25', '2013-10-03', '2013-10-27', '2013-12-06', '2013-12-30', '2014-02-16', '2014-03-20', '2014-04-05', '2014-05-23', '2014-05-31', '2014-06-08', '2014-06-24', '2014-07-02', '2014-07-18', '2014-07-26', '2014-08-03', '2014-08-11', '2014-08-19', '2014-08-27', '2014-09-04', '2014-09-12', '2014-09-20', '2014-10-14', '2014-10-22', '2014-10-30', '2014-11-23', '2014-12-01', '2014-12-25', '2015-02-03', '2015-02-27', '2015-03-23', '2015-04-08', '2015-04-16', '2015-05-02', '2015-05-26', '2015-06-11', '2015-06-19', '2015-06-27', '2015-07-21', '2015-07-29', '2015-08-06', '2015-08-14', '2015-08-22', '2015-08-30', '2015-09-07', '2015-09-15', '2015-09-23', '2015-10-09', '2015-10-17', '2015-11-10', '2015-11-18', '2015-11-26', '2015-12-04', '2015-12-12', '2015-12-28', '2016-01-13', '2016-01-21', '2016-02-14', '2016-03-17', '2016-03-25', '2016-04-10', '2016-04-18', '2016-04-26', '2016-05-04', '2016-05-28', '2016-06-13', '2016-06-21', '2016-06-29', '2016-07-15', '2016-07-23', '2016-07-31', '2016-08-08', '2016-08-16', '2016-08-24', '2016-09-01', '2016-09-09', '2016-09-17', '2016-10-19', '2016-11-04', '2016-11-12', '2016-11-20', '2016-11-28', '2016-12-14', '2017-04-05', '2017-04-13', '2017-04-29', '2017-05-07', '2017-06-08', '2017-06-16', '2017-06-24', '2017-07-02', '2017-07-10', '2017-07-18', '2017-07-26', '2017-08-03', '2017-08-11', '2017-08-19', '2017-08-27', '2017-09-04', '2017-09-20', '2017-09-28', '2017-10-06', '2017-10-30', '2017-11-15', '2017-12-01', '2017-12-09', '2017-12-17', '2018-01-10', '2018-03-15', '2018-04-24', '2018-05-02', '2018-05-10', '2018-05-18', '2018-05-26', '2018-06-11', '2018-06-19', '2018-06-27', '2018-07-13', '2018-07-29', '2018-08-14', '2018-08-22', '2018-08-30', '2018-09-07', '2018-09-15', '2018-09-23', '2018-10-01', '2018-10-09', '2018-11-02', '2018-11-18', '2018-12-28', '2019-01-05', '2019-02-22', '2019-03-02', '2019-04-11', '2019-04-19', '2019-05-05', '2019-05-21', '2019-05-29', '2019-06-14', '2019-06-22', '2019-06-30', '2019-07-08', '2019-07-16', '2019-08-01', '2019-08-09', '2019-08-17', '2019-08-25', '2019-09-10', '2019-09-17', '2019-09-18', '2019-10-12', '2019-11-13', '2019-12-07', '2020-01-07', '2020-01-08', '2020-01-16', '2020-01-24', '2020-02-17', '2020-03-04', '2020-03-12', '2020-03-20', '2020-04-21', '2020-04-29', '2020-05-30', '2020-05-31']
observationDates = [datetime.datetime.strptime(date, "%Y-%m-%d") for date in observationDates]

monitoringStartDate = datetime.datetime(2018, 1, 1)

mon = bfast.BFASTMonitor(start_monitor=monitoringStartDate, freq=365, k=3, hfrac=0.25, trend=False, level=0.05, backend="python")
mon.fit(indexedRasterTimeSeries, observationDates)

monitoringDates = observationDates[bisect.bisect(observationDates, monitoringStartDate):]
breakDate = monitoringDates[mon.breaks[0, 0]-1]
print(breakDate.timetuple().tm_year,breakDate.timetuple().tm_yday)

2018 122

install.packages("devtools")
devtools::install_github(c("bfast2/strucchange","bfast2/bfast"), quiet = TRUE)

indexedRasterTimeSeries <- c(3141,NA,2153,NA,2791,2795,2109,2139,2077,2200,NA,1260,1301,2496,1429,NA,NA,NA,NA,3122,2446,2428,1647,1266,NA,25,-996,-21,NA,NA,NA,NA,466,1454,NA,NA,2084,2600,NA,2133,1865,NA,596,410,NA,NA,NA,NA,2438,NA,NA,NA,NA,3382,3753,4647,NA,2989,3356,3223,2732,3536,2332,NA,1957,1325,NA,1990,NA,NA,2740,3510,NA,NA,3019,NA,3676,3790,3370,NA,3504,NA,NA,2984,2762,2794,2845,NA,NA,3076,NA,3213,NA,NA,NA,NA,NA,4071,NA,NA,NA,NA,3387,NA,3456,3222,3173,2609,2650,2535,2441,2191,2472,2350,2847,NA,NA,3484,NA,4704,3412,3687,NA,NA,NA,NA,3946,NA,NA,NA,NA,3764,3266,3451,3552,3289,2517,2588,NA,NA,1487,1746,2084,1884,1917,NA,3765,NA,NA,NA,NA,3950,3663,3733,4007,4202,3566,3926,3573,3804,3504,3678,3318,3255,NA,NA,2768,3016,NA,NA,3641,3685,4161,NA,3975,NA,NA,4019,NA,4026,3697,4053,4292,3741,3911,3575,3467,3410,NA,3291,3473,NA,NA,3467,NA,3801,3022,NA,NA,3363,3742,4035,NA,NA,NA,3996,NA,NA,3848,NA,NA,3709,3312,3148,2734,2944,NA,NA,NA,NA,4086,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,4200)
observationDates <- c("2010-02-13","2010-03-09","2010-04-10","2010-04-18","2010-05-12","2010-06-13","2010-06-29","2010-07-23","2010-07-31","2010-08-08","2010-08-16","2010-08-24","2010-09-09","2010-09-25","2010-10-11","2010-12-06","2011-04-21","2011-05-31","2011-06-16","2011-06-24","2011-07-10","2011-07-18","2011-07-26","2011-08-11","2011-08-19","2011-08-27","2011-09-04","2011-09-28","2011-10-05","2011-10-06","2011-10-22","2011-10-30","2011-11-07","2011-11-15","2011-12-25","2012-03-14","2012-03-30","2012-04-15","2012-05-01","2012-06-02","2012-06-18","2012-07-20","2012-08-05","2012-09-06","2012-10-24","2012-11-09","2012-12-11","2013-01-12","2013-01-28","2013-03-01","2013-04-02","2013-04-05","2013-04-10","2013-04-18","2013-04-26","2013-05-04","2013-05-12","2013-06-05","2013-06-21","2013-06-29","2013-07-07","2013-07-15","2013-07-31","2013-08-08","2013-08-16","2013-09-01","2013-09-09","2013-09-17","2013-09-25","2013-10-03","2013-10-27","2013-12-06","2013-12-30","2014-02-16","2014-03-20","2014-04-05","2014-05-23","2014-05-31","2014-06-08","2014-06-24","2014-07-02","2014-07-18","2014-07-26","2014-08-03","2014-08-11","2014-08-19","2014-08-27","2014-09-04","2014-09-12","2014-09-20","2014-10-14","2014-10-22","2014-10-30","2014-11-23","2014-12-01","2014-12-25","2015-02-03","2015-02-27","2015-03-23","2015-04-08","2015-04-16","2015-05-02","2015-05-26","2015-06-11","2015-06-19","2015-06-27","2015-07-21","2015-07-29","2015-08-06","2015-08-14","2015-08-22","2015-08-30","2015-09-07","2015-09-15","2015-09-23","2015-10-09","2015-10-17","2015-11-10","2015-11-18","2015-11-26","2015-12-04","2015-12-12","2015-12-28","2016-01-13","2016-01-21","2016-02-14","2016-03-17","2016-03-25","2016-04-10","2016-04-18","2016-04-26","2016-05-04","2016-05-28","2016-06-13","2016-06-21","2016-06-29","2016-07-15","2016-07-23","2016-07-31","2016-08-08","2016-08-16","2016-08-24","2016-09-01","2016-09-09","2016-09-17","2016-10-19","2016-11-04","2016-11-12","2016-11-20","2016-11-28","2016-12-14","2017-04-05","2017-04-13","2017-04-29","2017-05-07","2017-06-08","2017-06-16","2017-06-24","2017-07-02","2017-07-10","2017-07-18","2017-07-26","2017-08-03","2017-08-11","2017-08-19","2017-08-27","2017-09-04","2017-09-20","2017-09-28","2017-10-06","2017-10-30","2017-11-15","2017-12-01","2017-12-09","2017-12-17","2018-01-10","2018-03-15","2018-04-24","2018-05-02","2018-05-10","2018-05-18","2018-05-26","2018-06-11","2018-06-19","2018-06-27","2018-07-13","2018-07-29","2018-08-14","2018-08-22","2018-08-30","2018-09-07","2018-09-15","2018-09-23","2018-10-01","2018-10-09","2018-11-02","2018-11-18","2018-12-28","2019-01-05","2019-02-22","2019-03-02","2019-04-11","2019-04-19","2019-05-05","2019-05-21","2019-05-29","2019-06-14","2019-06-22","2019-06-30","2019-07-08","2019-07-16","2019-08-01","2019-08-09","2019-08-17","2019-08-25","2019-09-10","2019-09-17","2019-09-18","2019-10-12","2019-11-13","2019-12-07","2020-01-07","2020-01-08","2020-01-16","2020-01-24","2020-02-17","2020-03-04","2020-03-12","2020-03-20","2020-04-21","2020-04-29","2020-05-30","2020-05-31")

monitoringStartDate <- c(2018,1)

pixelTimeSeries <-  bfast::bfastts(indexedRasterTimeSeries, dates = observationDates, type="irregular")
mon <- bfast::bfastmonitor(pixelTimeSeries, monitoringStartDate, formula = response ~ harmon, order = 3, h = 0.25, level = 0.05, history = "all")

year <- floor(mon$breakpoint)
doy <- (mon$breakpoint-year) * 365 + 1
print(c(year,doy))

2018 162

I believe that both runs are run with the same parameters, and they should give the same break date. Am I missing some parameter?

@mortvest
Copy link
Collaborator

mortvest commented Dec 8, 2020

Thank you for reporting this. I'll look into it

@mortvest mortvest added the bug Something isn't working label Dec 8, 2020
@mirt001
Copy link
Author

mirt001 commented Dec 8, 2020

made some minor edits to the initial comment that do not affect the gist of the issue.
In the R version I was wrongly assuming that calendars are 0 based, so I was miscalculating the day of year by 1.

@mirt001
Copy link
Author

mirt001 commented Dec 10, 2020

after some more tests, in which it happened that there was a break in the last day of the monitoring period, I noticed that the breaks property is a 1 based index. Is this correct?
By this I mean that the first date in the monitoring period must have an index of 1, in order for the breaks property to be used correctly as an index. The implication being that my above script is wrong by 1 date. Unfortunately it moves the detected break even further, not closer to the R version.
Am I doing something wrong with the indexing?
I am editing the comment to correct this bug in my script, but please check the edits to better understand what I mean.

@Carolina710
Copy link

Carolina710 commented Dec 15, 2020

Good evening,

I subscribed to this issue since I'm dealing with the same problem for some of my currents tests, both in Python and OpenCL.

I just was reading the paper (https://arxiv.org/pdf/1807.01751.pdf), and take some time seeing the R files implementation (https://www.rdocumentation.org/packages/strucchange/versions/1.5-2/source - monitoring.R), focusing on breaks calculation.

From the paper, I could understand that (among other things) the calculation of the breaks is dependent on the calculation of bounds values. In the paper the bounds are computed with the following expression:
image
with:
image
being t the monitoring period.

After inspecting the R files I found the "OLS-MOSUM" method the most identical to what is done in Python. In the method, the bounds are calculated as:
image

The formula is identical to what is done in Python since 'lam' (critval) is previously multiplied by the sqrt of 2. However, I think that in Python the division value inside the log is calculated differently from the R code. 't', as stated in the paper should be the monitoring period and 'n' the last value of the history, but in Python, you do this:

image

I think you divide by the last mapped_indice of all time serie. What do you think about:

image

Can you give me some feedback about this practice?

Thank you Dmitry!

@mortvest
Copy link
Collaborator

Hi @Carolina710. Python is 0-indexed, in comparison to R, which is 1-indexed. Furthermore, in Python, you can index with -1 to get the last value in the array, hence array[-1] and array[n-1] are equivalent. Se here for more information

I am currently looking into the issue

@Carolina710
Copy link

Hi, thanks for your reply.

I understand that logic, I just thought that "n" or "self.n" (depending on your version) was the number of data images from the period of history, right? Like: "self.n = self._compute_end_history (dates)". I also thought that the mapped indices were calculated for all time series, including the history and the monitoring periods, with a total of "N" data images.

So, in this scenario, you take the last mapped index of all time series, that is also the last data from the monitoring period [-1] or [N-1], not the last one in history [n-1], I think ...? And my question was, shouldn't it be the last mapped index in history? Like, "histsize" in R.

Thank you very much!

@mortvest
Copy link
Collaborator

mortvest commented Dec 17, 2020

Hi @mirt001 and @Carolina710
This particular bug should be fixed now. However, I need to look more thoroughly into the boundary computation soon (including the issue @Carolina710 mentioned).

Thanks everyone!

@mortvest
Copy link
Collaborator

I think that I have misunderstood you answer earlier @Carolina710. You are right, it should be divided by the last index in the history period. I have pushed it to the develop branch. Nice catch!

@mirt001
Copy link
Author

mirt001 commented Dec 21, 2020

Thanks @mortvest and also thanks @Carolina710
I just checked the results using the latest develop branch, on the above Python test, and while the results are closer, they are still different in the sense that the break is detected at a different date.
The new detected date for python is 2018 146 , or 2018-05-26, which is one date too early, when compared to the R version, which detects the break on 2018 162, or 2018-05-26. Are you getting different results?

I will test this more to see if Python is always 1 date away, or if there are more inconsistencies.

@mortvest
Copy link
Collaborator

The returned breakpoint is 0-indexed, so you don't need to subtract 1 from it. I get 2018 163 this way. The small error is due to the way the dates are converted, which is outside of the scope of this implementation. The output breakpoint index is the same as in R version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants