[BUG] unique_stixel_id #43

juangallegozamorano · 2024-05-03T08:54:44Z

First, thanks for the package! It seems that it would be very useful for my analyses!

I managed to run the example with your data with no problem :) However, when I run it with my own data, the fitting crashes with the error "unique_stixel_id". Specifically, when I call .fit(), the Generate Ensemble step completes with no problem, but the Training step then crashes.
I'm working in the EPSG:3035 CRS. I tried changing grid_len_upper_threshold and grid_len_lower_threshold to alter the grid size, but I always get the same error. The terminal output for the error is below, and I have attached some sample data in case you want to try it yourself. Do you have any idea why the error is happening?

Thanks in advance!

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 577, in fit
    self.SAC_training(self.ensemble_df, X_train, verbosity, njobs)
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 526, in SAC_training
    for ensemble in output_generator:
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\tqdm\std.py", line 1181, in __iter__
    for obj in iterable:
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 506, in <genexpr>
    output_generator = (self.SAC_ensemble_training(index_df=ensemble[1], data=data) for ensemble in groups)
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 480, in SAC_ensemble_training
    .apply(lambda stixel: self.stixel_fitting(stixel))
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\groupby\groupby.py", line 1846, in apply
    return self._python_apply_general(f, self._obj_with_exclusions)
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\groupby\groupby.py", line 1885, in _python_apply_general
    values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\groupby\ops.py", line 919, in apply_groupwise
    res = f(group)
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 480, in <lambda>
    .apply(lambda stixel: self.stixel_fitting(stixel))
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\stemflow\model\AdaSTEM.py", line 403, in stixel_fitting
    unique_stixel_id = stixel["unique_stixel_id"].iloc[0]
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\XXXX\AppData\Local\miniconda3\envs\r-reticulate\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'unique_stixel_id'

Sample data:
traindata_sample.csv

chenyangkang · 2024-05-03T09:55:15Z

Hi @juangallegozamorano! Thanks for reporting the issue and providing the sample data. I tried training a model on your data and could not reproduce the problem. Could you share your model code, including the predictor variables, target variable, and model hyper-parameters? Here is what I did:

from stemflow.model.AdaSTEM import AdaSTEM, AdaSTEMClassifier, AdaSTEMRegressor
from stemflow.model.Hurdle import Hurdle
from xgboost import XGBClassifier, XGBRegressor

## "hurdle in Ada"
model = AdaSTEMRegressor(
    base_model=Hurdle(
        classifier=XGBClassifier(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),
        regressor=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1)
    ),                                      # hurdle model for zero-inflated problem (e.g., count)
    save_gridding_plot = True,
    ensemble_fold=10,                       # data are modeled 10 times, each time with jitter and rotation in Quadtree algo
    min_ensemble_required=7,                # Only points covered by > 7 ensembles will be predicted
    grid_len_upper_threshold=2e6,            
    grid_len_lower_threshold=1e6,             
    temporal_start=1,                       # The next 4 params define the temporal sliding window
    temporal_end=366,                            
    temporal_step=20,                       # The window takes steps of 20 DOY (see AdaSTEM demo for details)
    temporal_bin_interval=50,               # Each window will contain data of 50 DOY
    points_lower_threshold=50,              # Only stixels with more than 50 samples are trained and used for prediction
    Spatio1='x',                    # The next three params define the name of 
    Spatio2='y',                     # spatial coordinates shown in the dataframe
    Temporal1='doy',
    use_temporal_to_train=True,             # In each stixel, whether 'DOY' should be a predictor
    njobs=1
)

Note the grid_len_upper_threshold and grid_len_lower_threshold values used here: they are chosen to match the scale of 'x' and 'y' in your data.
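As a rough illustration of what "matching the scale" means, the sketch below compares the span of each coordinate with the grid length thresholds. The function name grid_scale_warnings, the max_cells_per_axis cutoff, and the example coordinates are all hypothetical and are not part of stemflow; the heuristic is only an assumption about when the two scales are badly mismatched (for example, thresholds sized for degrees applied to coordinates in metres).

```python
def grid_scale_warnings(x, y, lower, upper, max_cells_per_axis=1000):
    """Return messages for axes whose span looks mismatched with the grid lengths.

    Heuristic only (an assumption, not stemflow's own check):
    - the lower grid length should not exceed the span of the data, and
    - the span should not be more than `max_cells_per_axis` upper grid lengths.
    """
    messages = []
    for name, values in (("x", x), ("y", y)):
        values = list(values)
        span = max(values) - min(values)
        if lower > span:
            messages.append(
                f"{name}: grid_len_lower_threshold ({lower:g}) exceeds the data span ({span:g})"
            )
        elif span / upper > max_cells_per_axis:
            messages.append(
                f"{name}: data span ({span:g}) is far larger than grid_len_upper_threshold ({upper:g})"
            )
    return messages


# Made-up projected coordinates in metres, spanning 3e6 on each axis.
x = [2.5e6, 4.0e6, 5.5e6]
y = [1.5e6, 3.0e6, 4.5e6]

small_grid = grid_scale_warnings(x, y, lower=5, upper=25)      # degree-sized thresholds
large_grid = grid_scale_warnings(x, y, lower=1e6, upper=2e6)   # thresholds used above

for message in small_grid:
    print(message)
```

The function accepts any sequence, so DataFrame columns such as X_train['x'] and X_train['y'] can be passed directly.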

Model fitting:

## fit
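import pandas as pd

data = pd.read_csv('traindata_sample.csv')  # assumption: the sample file attached to this issue
# Select the predictors and replace missing values with -1
X_train = data[['BareArea','BrConFor','BroadFor','ConiFor','ConWater','doy','x','y']].fillna(-1)
y_train = data[['count']]
model = model.fit(X_train.reset_index(drop=True), y_train, verbosity=1)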

juangallegozamorano · 2024-05-03T10:21:50Z

Hi @chenyangkang, thanks a lot for the quick reply! After checking your code, I suspect the error occurred because of NAs in the data and how they were filled. With your new code I could run the models, and now I just need to find the best grid size, which variables to use, etc. I think we can close this for now :)

chenyangkang · 2024-05-03T10:35:45Z

@juangallegozamorano Great! I will consider adding warnings for NA input.
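Until such a warning exists, a check of this kind can be done by hand before calling .fit(). The following is a minimal sketch using plain pandas; the helper summarize_missing and the small example frame are hypothetical, and this is not how stemflow implements its own NA detection.

```python
import numpy as np
import pandas as pd


def summarize_missing(X: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = X.isna().sum()
    return counts[counts > 0]


# Made-up predictors with a few gaps, using column names from this thread.
X = pd.DataFrame(
    {
        "BareArea": [0.1, np.nan, 0.3],
        "doy": [10, 120, 250],
        "x": [2.5e6, np.nan, np.nan],
    }
)

missing = summarize_missing(X)
if not missing.empty:
    print("Columns with missing values:")
    print(missing)

# Fill the gaps explicitly, as in the fitting code above.
X_filled = X.fillna(-1)
```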

juangallegozamorano assigned chenyangkang May 3, 2024

chenyangkang mentioned this issue May 3, 2024

Add Spatial and Temporal Scale Warnings #44

Closed

chenyangkang added a commit that referenced this issue May 3, 2024

add spatial and temporal scale warnings #43 #44

8ee8e7b

chenyangkang closed this as completed May 3, 2024

chenyangkang mentioned this issue May 3, 2024

NAs detection #45

Closed

chenyangkang referenced this issue May 6, 2024

fix bug

36c4555
