[BUG] unique_stixel_id #43

Closed
juangallegozamorano opened this issue May 3, 2024 · 3 comments
@juangallegozamorano

juangallegozamorano commented May 3, 2024

First, thanks for the package! It seems that it would be very useful for my analyses!

I managed to run the example with your data with no problem :) However, when I run it with my own data, the fitting crashes with the error KeyError: 'unique_stixel_id'. Specifically, when I call .fit(), the Generate Ensemble step seems to finish with no problem, but the crash happens later, during training.
I'm working in the EPSG:3035 projection. I tried changing grid_len_upper_threshold and grid_len_lower_threshold to alter the grid size, but I always get the same error. The terminal output for the error is below, and some sample data is attached in case you want to try it yourself. Do you have any idea why the error is happening?

Thanks in advance!

Traceback (most recent call last):
File "", line 1, in
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 577, in fit
self.SAC_training(self.ensemble_df, X_train, verbosity, njobs)
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 526, in SAC_training
for ensemble in output_generator:
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\tqdm\std.py", line 1181, in __iter__
for obj in iterable:
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 506, in
output_generator = (self.SAC_ensemble_training(index_df=ensemble[1], data=data) for ensemble in groups)
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 480, in SAC_ensemble_training
.apply(lambda stixel: self.stixel_fitting(stixel))
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\groupby\groupby.py", line 1846, in apply
return self._python_apply_general(f, self._obj_with_exclusions)
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\groupby\groupby.py", line 1885, in _python_apply_general
values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\groupby\ops.py", line 919, in apply_groupwise
res = f(group)
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 480, in
.apply(lambda stixel: self.stixel_fitting(stixel))
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 403, in stixel_fitting
unique_stixel_id = stixel["unique_stixel_id"].iloc[0]
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
raise KeyError(key) from err
KeyError: 'unique_stixel_id'

Sample data:
traindata_sample.csv

@chenyangkang
Owner

Hi @juangallegozamorano! Thanks for reporting the issue and providing the sample data. I tried training a model on your data and was not able to reproduce the problem. Could you share the code for your model, including the predictor variables, target variable, and model hyperparameters? Here is what I did:

from stemflow.model.AdaSTEM import AdaSTEM, AdaSTEMClassifier, AdaSTEMRegressor
from stemflow.model.Hurdle import Hurdle
from xgboost import XGBClassifier, XGBRegressor

## "hurdle in Ada"
model = AdaSTEMRegressor(
    base_model=Hurdle(
        classifier=XGBClassifier(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),
        regressor=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1)
    ),                                      # hurdle model for zero-inflated problems (e.g., counts)
    save_gridding_plot = True,
    ensemble_fold=10,                       # data are modeled 10 times, each time with jitter and rotation in Quadtree algo
    min_ensemble_required=7,                # Only points covered by > 7 ensembles will be predicted
    grid_len_upper_threshold=2e6,            
    grid_len_lower_threshold=1e6,             
    temporal_start=1,                       # The next 4 params define the temporal sliding window
    temporal_end=366,                            
    temporal_step=20,                       # The window takes steps of 20 DOY (see AdaSTEM demo for details)
    temporal_bin_interval=50,               # Each window will contain data of 50 DOY
    points_lower_threshold=50,              # Only stixels with more than 50 samples are trained and used for prediction
    Spatio1='x',                    # The next three params define the name of 
    Spatio2='y',                     # spatial coordinates shown in the dataframe
    Temporal1='doy',
    use_temporal_to_train=True,             # In each stixel, whether 'DOY' should be a predictor
    njobs=1
)

Note the grid_len_upper_threshold and grid_len_lower_threshold values used here: they are chosen to match the scale of 'x' and 'y' in your data.
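
One way to sanity-check the thresholds against the coordinate scale is to compare them with the extent of the 'x' and 'y' columns. The following is only a minimal sketch and not part of stemflow: the helper name coordinate_extent and the toy coordinates are hypothetical, and only the 2e6 / 1e6 thresholds are taken from the model definition above.

```python
import pandas as pd


def coordinate_extent(df, x_col="x", y_col="y"):
    """Return (x_span, y_span) for the coordinate columns (hypothetical helper)."""
    x_span = df[x_col].max() - df[x_col].min()
    y_span = df[y_col].max() - df[y_col].min()
    return float(x_span), float(y_span)


# Toy coordinates in metres, made up for illustration only.
toy = pd.DataFrame({"x": [2.5e6, 4.0e6, 6.5e6], "y": [1.5e6, 3.0e6, 5.0e6]})
x_span, y_span = coordinate_extent(toy)

# Thresholds copied from the model definition above.
upper, lower = 2e6, 1e6

print(f"x span: {x_span:.0f}, y span: {y_span:.0f}")
print("upper threshold within data extent:", upper <= max(x_span, y_span))
print("lower threshold within data extent:", lower <= max(x_span, y_span))
```

If the thresholds turn out to be orders of magnitude away from the spans (for example, degree-scale thresholds on metre-scale coordinates), that is a hint that they should be rescaled to the projection in use.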

Model fitting:

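## fit
tmp = data  # `data` is the DataFrame holding the sample data (loading step not shown)
# Predictors, including the spatial and temporal columns; missing values are filled with -1
X_train = tmp[['BareArea','BrConFor','BroadFor','ConiFor','ConWater','doy','x','y']].fillna(-1)
# Target variable
y_train = tmp[['count']]
model = model.fit(X_train.reset_index(drop=True), y_train, verbosity=1)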

@juangallegozamorano
Author

Hi @chenyangkang, thanks a lot for the quick reply! After checking your code, I suspect the error occurred because of NAs in the data and how they were filled. With your code I could run the models, and now I just need to find the best grid size, which variables to use, etc. I think we can close this for now :)

@chenyangkang
Owner

@juangallegozamorano Great! I will consider adding warnings for NA input.
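
Until such a warning exists, a check of this kind can be run on the predictors before calling .fit(). This is a rough sketch under assumptions, not stemflow's API: warn_on_na is a hypothetical helper, the toy values are made up, and only the column names and the .fillna(-1) strategy come from this thread.

```python
import warnings

import numpy as np
import pandas as pd


def warn_on_na(X, columns=None):
    """Warn about missing values and return per-column NA counts (hypothetical helper)."""
    cols = list(X.columns) if columns is None else list(columns)
    na_counts = X[cols].isna().sum()
    na_counts = na_counts[na_counts > 0]  # keep only columns that contain NAs
    if not na_counts.empty:
        warnings.warn(f"Missing values found: {na_counts.to_dict()}", UserWarning)
    return na_counts


# Toy predictors (made-up values); column names borrowed from the thread.
X = pd.DataFrame({
    "BareArea": [0.1, np.nan, 0.3],
    "doy": [10, 120, 250],
    "x": [2.5e6, 4.0e6, np.nan],
    "y": [1.5e6, 3.0e6, 5.0e6],
})

# Capture the warning so the example can show that exactly one is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    na_counts = warn_on_na(X)

# Same fill strategy as the fit snippet above; dropping rows with missing
# coordinates may be preferable, depending on the analysis.
X_filled = X.fillna(-1)
```

Passing columns=['x', 'y', 'doy'] restricts the check to the spatial and temporal columns, which are the ones the gridding depends on.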
