Passing data to temporal_split and other functions #45
Comments
The length of the dataframe or array-like object must correspond to the number of time series in the data set (not the number of samples in a single time series). So if you are working with a single time series, you can just put [ ] around it to make it into a length-1 list. Most data sets have many time series, which is the reason for the convention. These details are explained in the user guide: https://dmbee.github.io/seglearn/user_guide.html Let me know if this fixes your problem. D
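In code, the convention above could look like this (a minimal numpy-only sketch; the array contents are random placeholders standing in for the (1017, 14) data mentioned below):

```python
import numpy as np

# One time series: 1017 samples, 14 features
series = np.random.rand(1017, 14)
target = np.random.rand(1017)

# seglearn expects one entry per time series, so a single
# series must be wrapped in a length-1 list before splitting:
X = [series]   # len(X) == 1, X[0].shape == (1017, 14)
y = [target]   # len(y) == 1, y[0].shape == (1017,)

# Passing `series` directly would be read as 1017 separate
# time series of 14 samples each, which is not what you want.
```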
e.g. a typical X_train with three time series would be shaped like this: [(100, 5), (150, 5), (200, 5)]; an X_train with one time series would look like this: [(100, 5)]. I usually use lists or numpy object arrays. If using pandas, again, you'll want each sample to correspond to a time series.
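A sketch of building such a three-series data set, both as a list and as a numpy object array (zeros used as placeholder data; the shapes match the ones quoted above):

```python
import numpy as np

# Three independent time series of different lengths,
# each with 5 features per sample
X_train = [np.zeros((100, 5)), np.zeros((150, 5)), np.zeros((200, 5))]

# Equivalently as a numpy object array; assigning element by
# element avoids ambiguity with ragged (unequal-length) series
X_obj = np.empty(3, dtype=object)
for i, ts in enumerate(X_train):
    X_obj[i] = ts

print([ts.shape for ts in X_obj])  # [(100, 5), (150, 5), (200, 5)]
```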
Thank you very much for your kind and prompt reply.
Pipe is:
as in your example, and X_test and y_test come from
I assume your X_test, y_test is too small to segment with width 200. I should probably add a check in the transformer for that. I'll put it on the todo list.
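To see why a short series fails: a sliding window of width `w` and fractional overlap `o` advances by `step = max(1, int(w * (1 - o)))` samples, so a series needs at least `w` samples to yield even one segment. A hedged sketch of such a check (the helper name and the check itself are illustrative, not part of seglearn; the step formula is the usual sliding-window convention):

```python
def n_segments(n_samples, width, overlap=0.5):
    """Number of windows a sliding segmenter can cut from one series."""
    step = max(1, int(width * (1 - overlap)))
    if n_samples < width:
        return 0  # series shorter than one window: nothing to segment
    return (n_samples - width) // step + 1

# A series shorter than the window yields no segments at all,
# which is where a transformer check could raise a clear error:
print(n_segments(150, width=200))                # 0
print(n_segments(1017, width=200, overlap=0.5))  # 9
```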
It may be that this package is not right for your application, though, based on the data you are looking at. Generally you should have hundreds or thousands of segments to train and test on. It's hard to know without more detail, but you may want to look at methods like ARIMA if you just have one series.
Thanks again. Actually I have several thousand series to analyse (all with the same characteristics, so I can easily melt them into a single df even though they are mostly independent). Since they are independent and all different, ARIMA would require calculating p and q for each one, which would take too much time; moreover, since they have seasonality, the model would be a SARIMA, with even more calculations. But before training on the whole dataset I wanted to test it with just one series. I reduced the segment size and now I don't get any more errors, but the prediction quality, at least for this one, is not exciting. I will test it on a bunch of series, but in the meantime, what can I do to improve?
I wouldn't expect good results with one series. Sliding window segmentation doesn't make sense for every problem. It's great for things like earthquakes or activity recognition, where there is little or no time dependency outside the window. Generally, you need to make sure the window length is long enough to capture enough of the dynamics for a sensible prediction. It's important to interpolate the samples (if not regularly sampled) to a fixed sampling rate so the window time is constant. Setting a high overlap is a good data augmentation strategy. Concatenating any available heuristics, e.g. season, to the calculated features is also very helpful. Just a few thoughts. Good luck.
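The interpolation step mentioned above could be sketched like this with plain numpy (linear interpolation onto a uniform grid; the function name, the sampling rate, and the sine signal are all made-up illustration, not seglearn API):

```python
import numpy as np

def resample_uniform(t, x, fs):
    """Linearly interpolate an irregularly sampled 1-D signal onto a
    uniform grid with sampling rate fs (samples per time unit)."""
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs)
    return t_uniform, np.interp(t_uniform, t, x)

# 80 irregular timestamps over 10 time units (np.interp needs
# the timestamps sorted in increasing order)
t = np.sort(np.random.uniform(0, 10, size=80))
x = np.sin(t)

t_u, x_u = resample_uniform(t, x, fs=20.0)
# After resampling, a window of a fixed number of samples always
# spans the same amount of time, e.g. 40 samples = 2 time units.
```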
Hi, I was following your example code (simple regression), but I'm stuck. I have a DataFrame of shape (1017, 15). The last column is the target so I created two dfs, one for X (1017, 14) and one for y (1017). I tried to pass those values to
temporal_split
but I always get an error no matter what I do (passing the df, or passing them as lists). For example, passing them as lists gives: If, on the other hand, I pass them as dfs I get:
The same holds true if I manually split the DataFrames and pass them to
seg.fit_transform(X_train, y_train)
I tried to put the date column in the df as well as in the index but the error is still there.
What's wrong?
Info of the Dataframe:
I tried to use it with a date column, with a date index, and as a list. The same for y: I tried it as a Series, as a DataFrame with a date column or date index, and as a list, both with and without the date column. As you can see, there are no NaN values.