KeyError in Stacker #19
Comments
The error reads that the data has different dimensions. Can you please share the train/valid/test data snippets? |
X_train.shape, X_test.shape, df1.shape. The same data was used with LGBMTuner and it worked well. |
what about all True? |
array([ True, True, True, True, True, True, True, True, True, |
Thanks for the data. You didn't mention which is the target column in your data, so I assumed it was I have used your data to replicate the problem you are referring to and didn't catch any errors. Try this code just to make sure that we are doing the same thing. It starts with printing all the dependent libraries versions, so please check if yours are in line with the requirements
These are the libs versions that are required:
|
Thanks @DanilZherebtsov , let me try it and see what it gives. |
I think all library requirements are met |
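A quick way to compare an environment against the requirements is to print the installed versions. The following is a minimal sketch using only the standard library; the package names in `PACKAGES` are assumptions for illustration (only numpy and pandas are named in this thread), so substitute the list from the project's requirements.

```python
from importlib import metadata

# Package names are assumptions for illustration; replace them with the
# packages listed in the project's requirements.
PACKAGES = ["numpy", "pandas", "scikit-learn", "lightgbm", "verstack"]


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into an issue makes version mismatches easier to spot than a statement that the requirements are met.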
Yes, any model will work just fine |
The KeyError issue is resolved; now I am getting the following error:
NotFoundError Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\verstack\tools.py:19, in timer.<locals>.wrapped(*args, **kwargs)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:615, in Stacker.fit_transform(self, X, y)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:574, in Stacker._apply_all_or_extra_layers_to_train(self, X, y)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:510, in Stacker._apply_single_layer(self, layer, X, y)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:478, in Stacker._create_new_feats_in_train(self, X, y, layer, applicable_feats)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:316, in Stacker._get_stack_feat(self, model, X, y)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:303, in Stacker._train_predict_by_model(self, model, X, y)
File ~\anaconda3\lib\copy.py:172, in deepcopy(x, memo, _nil)
File ~\anaconda3\lib\copy.py:270, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
File ~\anaconda3\lib\copy.py:146, in deepcopy(x, memo, _nil)
File ~\anaconda3\lib\copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
File ~\anaconda3\lib\copy.py:153, in deepcopy(x, memo, _nil)
File ~\anaconda3\lib\site-packages\keras\engine\training.py:329, in Model.__deepcopy__(self, memo)
File ~\anaconda3\lib\site-packages\keras\saving\pickle_utils.py:77, in serialize_model_as_bytecode(model)
File ~\anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py:99, in FileIO.size(self)
File ~\anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py:910, in stat(filename)
File ~\anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py:926, in stat_v2(path)
NotFoundError: |
this is the code: from verstack import Stacker
|
I noticed that your numpy and pandas versions are not in line with the requirements. Can you reinstall the following: |
And which python version are you using? |
3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)] |
I've made a clean virtual environment, installed only Python 3.9.12 and verstack, and still didn't get the error. Can you try the same? I presume you might have some conflict in your local environment.
$ cd folder_with_your_data |
Also, judging by your loss while training the stacker, I can see that we are using different data. From the data you provided:
|
You were right. I tried with Google Colab and everything seems to work fine |
You said that I should use the second layer outputs as inputs to the final meta_model. Why not layer 1 or both layers? |
Great. It's good practice to launch new projects in isolated environments; that way you'll be in full control of all the required dependencies. |
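For readers who prefer to script this, here is a minimal sketch of creating an isolated environment with Python's standard `venv` module. The helper name `create_clean_env` and the directory name are hypothetical; the thread itself only says to create a clean environment and install verstack into it.

```python
import venv
from pathlib import Path


def create_clean_env(env_dir, with_pip=True):
    """Create an isolated virtual environment at env_dir and return its path.

    create_clean_env is a hypothetical helper name used for illustration.
    """
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip inside the new environment so that
    # packages can be installed into it afterwards.
    venv.EnvBuilder(with_pip=with_pip).create(env_path)
    return env_path
```

After calling, for example, `create_clean_env("stacker_env")`, activate the environment and install only what the project needs (here, verstack) before rerunning the failing code.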
Not exactly. You can use the features from layer_1 or from layer_2, or from both layers, or combine them with metafeats that stacker generates, or even throw in the original X feats. It is a matter of experimentation and is subject to the final meta model that you are using. |
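The experimentation described above can be sketched as a loop over candidate feature sets and candidate meta models. This is only an illustration under stated assumptions: the data is synthetic, the column names are hypothetical, the `stacked_features` dict is a stand-in for `stacker.stacked_features`, and scikit-learn's `LinearRegression` and `Ridge` are swapped in for the gradient-boosting models used in the thread.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
y = pd.Series(rng.normal(size=n), name="target")

# Stand-in for the frame returned by stacker.fit_transform: one original
# feature plus stacked features. All column names are hypothetical.
X = pd.DataFrame({
    "orig_a": rng.normal(size=n),
    "layer_1_feat_1": y + rng.normal(scale=0.5, size=n),
    "layer_1_feat_2": y + rng.normal(scale=0.7, size=n),
    "layer_2_feat_1": y + rng.normal(scale=0.3, size=n),
})

# Stand-in for stacker.stacked_features (assumed layout: layer name -> columns).
stacked_features = {
    "layer_1": ["layer_1_feat_1", "layer_1_feat_2"],
    "layer_2": ["layer_2_feat_1"],
}

feature_sets = {
    "layer_1": stacked_features["layer_1"],
    "layer_2": stacked_features["layer_2"],
    "both_layers": stacked_features["layer_1"] + stacked_features["layer_2"],
    "all_columns": list(X.columns),
}
meta_models = {"linear": LinearRegression(), "ridge": Ridge(alpha=1.0)}

# Cross-validated MSE for every (feature set, meta model) combination.
results = {}
for set_name, cols in feature_sets.items():
    for model_name, model in meta_models.items():
        scores = cross_val_score(
            model, X[cols], y, cv=5, scoring="neg_mean_squared_error"
        )
        results[(set_name, model_name)] = -scores.mean()

# Print combinations from lowest to highest error.
for key, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(key, round(mse, 4))
```

With real data, `X` would be the stacker output and the winning combination may differ from run to run, which is why this is treated as an experiment rather than a fixed rule.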
Noted, thanks. Can I write the code as below if I want to try 2 different algorithms for the final model, just to avoid training each separately and hence save time?
# get lists of features created in each layer
layer_1_feats = stacker.stacked_features['layer_1']
model = CatBoostClassifier(random_state=1)
# use only the second layer outputs as inputs to the final meta_model
#model.fit(train_X[layer_2_feats], train_y)
|
The code you wrote seems perfectly fine. Only I don't see how you make use of the two models you have in mind.
You have initialised model and model1 and then what? If you want to compare the performance of different final (meta) models, then you can:
Let me know if that was what you were looking for. |
Thanks, that's exactly what I wanted and it worked perfectly |
@chrissny88 How did you resolve the key error as I am facing the same issue? |
Can you provide the code you are using to get the error and an error stack trace?
|
Make sure the indexes are the same in X_train and y_train: X_train.reset_index(drop=True, inplace=True) |
Thanks, this worked! |
You are welcome. |
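Why the index matters can be reproduced on toy data. Judging by the traceback in this issue, `fit_transform` resets the index of `X` (Stacker.py line 614) and then selects fold rows from both `X` and `y` with `.loc` and positional fold indices (lines 299 to 301), so a `y` that still carries its original index fails the lookup. The sketch below uses plain pandas and NumPy with made-up data; resetting `y` as well is an assumption that follows from that reading, since the thread only shows the call for `X_train`.

```python
import numpy as np
import pandas as pd

# Toy data whose index is not 0..n-1, as is typical after a shuffled split.
X = pd.DataFrame({"f": np.arange(6.0)}, index=[7, 2, 9, 4, 11, 5])
y = pd.Series(np.arange(6.0), index=X.index)

# Positional fold indices, like those a K-fold splitter produces.
train_ix = np.array([0, 1, 2, 3])

# X gets a fresh 0..n-1 index, so label-based selection by position works.
X_reset = X.reset_index(drop=True)
X_fold = X_reset.loc[train_ix, :]

# y keeps its original labels; 0, 1 and 3 are not among them.
try:
    y.loc[train_ix]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Fix: give y the same 0..n-1 index before fitting.
y_reset = y.reset_index(drop=True)
y_fold = y_reset.loc[train_ix]
print(y_fold.index.tolist())
```

Resetting both `X_train` and `y_train` with `reset_index(drop=True)` before calling `fit_transform` keeps the two aligned.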
I am using google colab |
This is the code:
from verstack import Stacker
from lightgbm import LGBMRegressor
stacker = Stacker(objective = 'regression', auto = True)
X_train = stacker.fit_transform(X_train, y_train)
X_val = stacker.transform(X_val)
df1 = stacker.transform(df1)
# get lists of features created in each layer
layer_1_feats = stacker.stacked_features['layer_1']
layer_2_feats = stacker.stacked_features['layer_2']
model = LGBMRegressor(random_state=1)
# use only the second layer outputs as inputs to the final meta_model
model.fit(X_train[layer_2_feats], y_train)
pred = model.predict(df1[layer_2_feats])
And below is the error I am getting:
Initiating Stacker.fit_transform
. Optimising model hyperparameters
KeyError Traceback (most recent call last)
File :13, in
File ~\anaconda3\lib\site-packages\verstack\tools.py:19, in timer.<locals>.wrapped(*args, **kwargs)
16 @wraps(func)
17 def wrapped(*args, **kwargs):
18 start = time.time()
---> 19 result = func(*args, **kwargs)
20 end = time.time()
21 elapsed = round(end-start,5)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:615, in Stacker.fit_transform(self, X, y)
613 validate_fit_transform_args(X, y)
614 X_with_stacked_feats = X.reset_index(drop=True).copy()
--> 615 X_with_stacked_feats = self._apply_all_or_extra_layers_to_train(X_with_stacked_feats, y)
616 return X_with_stacked_feats
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:574, in Stacker._apply_all_or_extra_layers_to_train(self, X, y)
572 if layers_added_after_fit_transform:
573 for layer in layers_added_after_fit_transform:
--> 574 X = self._apply_single_layer(layer, X, y)
575 else:
576 # if no extra layers apply all layers on train set
577 X = self._apply_all_layers(X, y)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:510, in Stacker._apply_single_layer(self, layer, X, y)
506 new_feats = self._create_new_feats_in_test(X, y, layer, applicable_feats)
507 # ---------------------------------------------------------------------
508 # create stacked feats in train set
509 else:
--> 510 new_feats = self._create_new_feats_in_train(X, y, layer, applicable_feats)
511 for feat in new_feats:
512 X = pd.concat([X, feat], axis = 1)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:478, in Stacker._create_new_feats_in_train(self, X, y, layer, applicable_feats)
476 for model in self.layers[layer]:
477 feat_name = self._create_feat_name(layer)
--> 478 new_feat = self._get_stack_feat(model, X[applicable_feats], y)
479 # append trained models from buffer to self.trained_models_list for layer/feature
480 self.trained_models[layer][feat_name] = self._trained_models_list_buffer
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:316, in Stacker._get_stack_feat(self, model, X, y)
314 '''Apply stacking features creatin to either train or test set'''
315 if isinstance(y, pd.Series):
--> 316 new_feat = self._train_predict_by_model(model, X, y)
317 else:
318 new_feat = self._predict_by_model(model, X)
File ~\anaconda3\lib\site-packages\verstack\stacking\Stacker.py:300, in Stacker._train_predict_by_model(self, model, X, y)
298 for train_ix, test_ix in kfold.split(X,y):
299 X_train = X.loc[train_ix, :]
--> 300 y_train = y.loc[train_ix]
301 X_test = X.loc[test_ix, :]
302 # create independent model instance for each fold
File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.__getitem__(self, key)
964 axis = self.axis or 0
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)
File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1191, in _LocIndexer._getitem_axis(self, key, axis)
1188 if hasattr(key, "ndim") and key.ndim > 1:
1189 raise ValueError("Cannot index with multidimensional key")
-> 1191 return self._getitem_iterable(key, axis=axis)
1193 # nested tuple slicing
1194 if is_nested_tuple(key, labels):
File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1132, in _LocIndexer._getitem_iterable(self, key, axis)
1129 self._validate_key(key, axis)
1131 # A collection of keys
-> 1132 keyarr, indexer = self._get_listlike_indexer(key, axis)
1133 return self.obj._reindex_with_indexers(
1134 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
1135 )
File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1327, in _LocIndexer._get_listlike_indexer(self, key, axis)
1324 ax = self.obj._get_axis(axis)
1325 axis_name = self.obj._get_axis_name(axis)
-> 1327 keyarr, indexer = ax._get_indexer_strict(key, axis_name)
1329 return keyarr, indexer
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:5782, in Index._get_indexer_strict(self, key, axis_name)
5779 else:
5780 keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 5782 self._raise_if_missing(keyarr, indexer, axis_name)
5784 keyarr = self.take(indexer)
5785 if isinstance(key, Index):
5786 # GH 42790 - Preserve name from an Index
File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:5845, in Index._raise_if_missing(self, key, indexer, axis_name)
5842 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
5844 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 5845 raise KeyError(f"{not_found} not in index")
KeyError: '[3, 10, 16, 20, 21, 23, 25, 35, 38, 40, 41, 43, 47, 48, 60, 61, 65, 74, 77, 85, 86, 93, 98, 100, 110, 113, 121, 126, 128, 129, 133, 134, 135, 136, 142, 143, 149, 150, 151, 158, 159, 162, 165, 169, 172, 175, 177, 183, 188, 195, 203, 204, 219, 221, 223, 228, 229, 231, 236, 249, 264, 271, 274, 275, 283, 290, 296, 303, 304, 308, 311, 319, 323, 330, 331, 332, 338, 343, 354, 358, 359, 363, 366, 372, 379, 383, 385, 386, 388, 394, 399, 416, 419, 421, 424, 426, 436, 441, 442, 445, 449, 462, 468, 471, 474, 480, 489, 501, 507, 510, 530, 542, 547, 551, 555, 557, 559, 567, 572, 576, 582, 584, 585, 588, 609, 610, 611, 613, 615, 618, 619, 620, 622, 625, 627, 633, 641, 642, 660, 663, 668, 669, 670, 671, 674, 675, 690, 692, 693, 695, 703, 706, 710, 714, 716, 726, 730, 733, 739, 742, 749, 755, 756, 757, 761, 762, 763, 765, 768, 771, 772, 784, 793, 804, 806, 815, 821, 824, 831, 834, 835, 836, 848, 858, 861, 871, 874, 875, 878, 880, 884, 885, 892, 895, 896, 897, 904, 909, 912, 917, 921, 923, 925, 926, 928, 931, 932, 937, 938, 948, 950, 952, 962, 964, 974, 984, 987, 991, 996, 998, 1003, 1007, 1011, 1012, 1015, 1026, 1031, 1032, 1033, 1035, 1038, 1040, 1056, 1068, 1070, 1077, 1078, 1082, 1083, 1084, 1085, 1105, 1110, 1112, 1115, 1121, 1126, 1130, 1136, 1137, 1141, 1151, 1154, 1156, 1172, 1175, 1179, 1184, 1185, 1187, 1189, 1192, 1211, 1217, 1218, 1225, 1229, 1231, 1232, 1249, 1256, 1268, 1278, 1280, 1281, 1286, 1287, 1293, 1295, 1298, 1309, 1310, 1311, 1313, 1318, 1323, 1326, 1331, 1332, 1336, 1339, 1344, 1345, 1347, 1348, 1357, 1364, 1365, 1366, 1369, 1372, 1379, 1380, 1383, 1385, 1388, 1392, 1396, 1402, 1404, 1415, 1427, 1429, 1437, 1440, 1441, 1452, 1453, 1457, 1458, 1459, 1463, 1464, 1472, 1473, 1474, 1479, 1482, 1486, 1499, 1507, 1517, 1518, 1521, 1535, 1537, 1539, 1541, 1542, 1544, 1545, 1548, 1551, 1554, 1555, 1556, 1557, 1570, 1577, 1578, 1579, 1587, 1590, 1610, 1613, 1617, 1618, 1624, 1633, 1634, 1635, 1636, 1639, 1643, 1651, 1657, 1673, 1680, 1687, 1692, 1697, 1723, 
1732, 1735, 1736, 1737, 1747, 1750, 1753, 1756, 1757, 1762, 1765, 1772, 1774, 1783, 1788, 1794, 1800, 1815, 1824, 1836, 1843, 1844, 1847, 1849, 1850, 1863, 1870, 1877, 1886, 1888, 1891, 1894, 1897, 1899, 1904, 1910, 1914, 1918, 1923, 1925, 1926, 1935, 1938, 1940, 1949, 1951, 1956, 1959, 1961, 1966, 1972, 1973, 1979, 1987, 1994, 1997, 2002, 2014, 2017, 2018, 2023, 2028, 2032, 2039, 2044, 2045, 2048, 2050, 2072, 2077, 2079, 2081, 2089, 2092, 2101, 2105, 2107, 2108, 2113, 2115, 2116, 2118, 2119, 2125, 2135, 2141, 2142, 2143, 2144, 2147, 2149, 2151, 2160, 2163, 2166, 2167, 2174, 2175, 2180, 2181, 2184, 2195, 2198, 2200, 2207, 2215, 2218, 2221, 2222, 2224, 2236, 2239, 2243, 2245, 2248, 2254, 2256, 2260, 2264, 2268, 2270, 2274, 2275, 2283, 2288, 2289, 2291, 2292, 2297, 2305, 2309, 2317, 2323, 2324, 2327, 2329, 2330, 2333, 2335, 2336, 2338, 2349, 2360, 2361, 2363, 2370, 2373, 2379, 2384, 2388, 2393, 2399, 2403, 2405, 2408, 2411, 2413, 2416, 2419, 2421, 2429, 2435, 2437, 2439, 2443, 2444, 2446, 2449, 2461, 2470, 2473, 2478, 2484, 2485, 2492, 2503, 2504, 2505, 2506, 2507, 2515, 2518, 2531, 2538, 2544, 2545, 2548, 2557, 2560, 2564, 2568, 2569, 2573, 2578, 2583, 2587, 2596, 2623, 2626, 2630, 2633, 2647, 2652, 2659, 2662, 2664, 2670, 2678, 2679, 2682, 2683, 2686, 2689, 2690, 2701, 2702, 2704, 2705, 2708, 2723, 2726, 2732, 2734, 2746, 2750, 2752, 2758, 2761, 2767, 2769, 2770, 2777, 2779, 2781, 2785, 2788, 2790, 2793, 2794, 2800, 2803, 2806, 2807, 2810, 2814, 2830, 2832, 2840, 2850, 2859, 2861, 2862, 2867, 2879, 2882, 2887, 2899, 2901, 2904, 2906, 2911, 2920, 2922, 2924, 2927, 2928, 2929, 2934, 2936, 2941, 2944, 2972, 2977, 2979, 2984, 2985, 2986, 2990, 2991, 2995, 3005, 3008, 3013, 3022, 3028, 3031, 3037, 3038, 3039, 3044, 3046, 3048, 3051, 3056, 3059, 3062, 3063, 3064, 3070, 3074, 3076, 3078, 3079, 3083, 3091, 3094, 3096, 3098, 3112, 3120, 3124, 3126, 3133, 3134, 3135, 3145, 3146, 3159, 3160, 3165, 3172, 3183, 3184, 3186, 3188, 3189, 3190, 3203, 3205, 3211, 3214, 3229, 3236, 
3241, 3253, 3255, 3260, 3273, 3276, 3280, 3283, 3284, 3289, 3292, 3293, 3294, 3301, 3302, 3306, 3312, 3313, 3317, 3326, 3327, 3328, 3339, 3340, 3346, 3350, 3353, 3355, 3356, 3358, 3372, 3373, 3374, 3375, 3380, 3381, 3382, 3384, 3395, 3397, 3405, 3406, 3407, 3409, 3416, 3418, 3420, 3422, 3423, 3424, 3425, 3427, 3432, 3433, 3438, 3443, 3453, 3455, 3456, 3467, 3469, 3470, 3471, 3472, 3474, 3475, 3476, 3494, 3508, 3516, 3517, 3518, 3520, 3522, 3527, 3532, 3534, 3540, 3544, 3546, 3549, 3550, 3551, 3553, 3565, 3570, 3571, 3577, 3579, 3580, 3583, 3584, 3599, 3602, 3611, 3614, 3615, 3625, 3628, 3637, 3642, 3643, 3646, 3650, 3651, 3664, 3677, 3683, 3698, 3705, 3710, 3711, 3714, 3722, 3728, 3729, 3734 ] not in index'