BRCG train fails in copied "Credit Approval Tutorial" code #106

Closed
gaborpelesz opened this issue Oct 8, 2020 · 4 comments

Comments

@gaborpelesz
Contributor

Hi there,

I copied the code, without any modification, from the BRCG part of the "Credit Approval Tutorial" and ran into errors. I'm fairly sure the dataset was loaded correctly, since I also trained a scikit-learn Decision Tree Classifier on it in the same notebook with no problem.

Can someone help me with this issue? Am I missing something or is it an internal problem?

Thanks in advance!

Here is the code and the output.
It was run on Google Colab with pandas 1.1.2 and the latest aix360 release, 0.2.0.

Copied code

import warnings
warnings.filterwarnings('ignore')

# Load FICO HELOC data with special values converted to np.nan
from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing
data = HELOCDataset(custom_preprocessing=nan_preprocessing).data()
# Separate target variable
y = data.pop('RiskPerformance')

# Split data into training and test sets using fixed random seed
from sklearn.model_selection import train_test_split
dfTrain, dfTest, yTrain, yTest = train_test_split(data, y, random_state=0, stratify=y)
dfTrain.head().transpose()

# Binarize data and also return standardized ordinal features
from aix360.algorithms.rbm import FeatureBinarizer
fb = FeatureBinarizer(negations=True, returnOrd=True)
dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
dfTest, dfTestStd = fb.transform(dfTest)
dfTrain['ExternalRiskEstimate'].head()

# Instantiate BRCG with small complexity penalty and large beam search width
from aix360.algorithms.rbm import BooleanRuleCG
br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3, CNF=True)

# Train, print, and evaluate model
br.fit(dfTrain, yTrain)
from sklearn.metrics import accuracy_score
print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
print('Test accuracy:', accuracy_score(yTest, br.predict(dfTest)))
print('Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:')
print(br.explain()['rules'])

Output

Learning CNF rule with complexity parameters lambda0=0.001, lambda1=0.001
Initial LP solved
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
   1001         try:
-> 1002             self._set_with_engine(key, value)
   1003         except (KeyError, ValueError):

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _set_with_engine(self, key, value)
   1032         # fails with AttributeError for IntervalIndex
-> 1033         loc = self.index._engine.get_loc(key)
   1034         validate_numeric_casting(self.dtype, value)

pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()

KeyError: 'ExternalRiskEstimate'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-98-8d81fbd6c0e1> in <module>()
     26 
     27 # Train, print, and evaluate model
---> 28 br.fit(dfTrain, yTrain)
     29 from sklearn.metrics import accuracy_score
     30 print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))

/usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/boolean_rule_cg.py in fit(self, X, y)
    118         UB = min(UB.min(), 0)
    119         v, zNew, Anew = beam_search(r, X, self.lambda0, self.lambda1,
--> 120                                     K=self.K, UB=UB, D=self.D, B=self.B, eps=self.eps)
    121 
    122         while (v < -self.eps).any() and (self.it < self.iterMax):

/usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/beam_search.py in beam_search(r, X, lambda0, lambda1, K, UB, D, B, wLB, eps, stopEarly)
    285             if i[1] == '<=':
    286                 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
--> 287                 colKeep[i[0]] = (Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])
    288             elif i[1] == '>':
    289                 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
   1008             else:
   1009                 # GH#12862 adding an new key to the Series
-> 1010                 self.loc[key] = value
   1011 
   1012         except TypeError as e:

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    668 
    669         iloc = self if self.name == "iloc" else self.obj.iloc
--> 670         iloc._setitem_with_indexer(indexer, value)
    671 
    672     def _validate_key(self, key, axis: int):

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
   1790                 # setting for extensionarrays that store dicts. Need to decide
   1791                 # if it's worth supporting that.
-> 1792                 value = self._align_series(indexer, Series(value))
   1793 
   1794             elif isinstance(value, ABCDataFrame):

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
   1909             # series, so need to broadcast (see GH5206)
   1910             if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
-> 1911                 ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
   1912 
   1913                 # single indexer

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in reindex(self, index, **kwargs)
   4397     )
   4398     def reindex(self, index=None, **kwargs):
-> 4399         return super().reindex(index=index, **kwargs)
   4400 
   4401     def drop(

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4457         # perform the reindex on the axes
   4458         return self._reindex_axes(
-> 4459             axes, level, limit, tolerance, method, fill_value, copy
   4460         ).__finalize__(self, method="reindex")
   4461 

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4480                 fill_value=fill_value,
   4481                 copy=copy,
-> 4482                 allow_dups=False,
   4483             )
   4484 

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4525                 fill_value=fill_value,
   4526                 allow_dups=allow_dups,
-> 4527                 copy=copy,
   4528             )
   4529             # If we've made a copy once, no need to make another one

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate)
   1274         # some axes don't allow reindexing with dups
   1275         if not allow_dups:
-> 1276             self.axes[axis]._can_reindex(indexer)
   1277 
   1278         if axis >= self.ndim:

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
   3283         # trying to reindex on an axis with duplicates
   3284         if not self.is_unique and len(indexer):
-> 3285             raise ValueError("cannot reindex from a duplicate axis")
   3286 
   3287     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):

ValueError: cannot reindex from a duplicate axis
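
For readers unfamiliar with the final error: pandas refuses to reindex a Series whose index contains repeated labels. The snippet below is a minimal sketch of that behavior in isolation. It is not the aix360 code, the name `mask` and its values are made up for illustration, and the exact error message differs between pandas versions. The deduplication step only shows one way to make a reindex succeed; it is not necessarily how the library handles it.

```python
import pandas as pd

# Hypothetical stand-in for a Series whose index repeats a label,
# as can happen when it is built from column-level values.
mask = pd.Series([True, False, True], index=[10, 10, 20])

try:
    # Aligning to a different index requires a reindex, which pandas
    # refuses when the source index contains duplicate labels.
    mask.reindex([10, 20])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Dropping the repeated labels first lets the same reindex succeed.
deduped = mask[~mask.index.duplicated(keep="first")]
aligned = deduped.reindex([10, 20])
print(aligned)
```

In the traceback above, the reindex is triggered indirectly: assigning a Series to a key that does not yet exist makes pandas align the value first, and that alignment is where the duplicate labels are rejected.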
@gaborpelesz
Contributor Author

After analysing the issue further, I found the problem and can see that it has been fixed in #103.
I will leave the issue open because the fix has not yet been released in a new aix360 version. Once the question below is answered, feel free to close this.

Any comment on when we can expect a version update?
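
In the meantime, a quick way to tell whether an environment is still on an affected release is to compare the installed version against 0.2.0. This is a rough sketch using only the standard library (Python 3.8+). The helper names `parse_release` and `lacks_fix` are made up here, and it assumes the fix will ship in a release numbered higher than 0.2.0 with a plain `X.Y.Z` version string.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def lacks_fix(installed, last_affected="0.2.0"):
    """True if the installed release is at or below the last affected one.

    Assumption: every release up to and including 0.2.0 lacks the fix.
    """
    return parse_release(installed) <= parse_release(last_affected)


try:
    installed = version("aix360")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("aix360 is not installed")
else:
    try:
        status = "lacks the fix" if lacks_fix(installed) else "is newer than 0.2.0"
        print(installed, status)
    except ValueError:
        # Development or pre-release strings are not handled by this sketch.
        print(installed, "is not a plain X.Y.Z version string")
```

A source install from the master branch may still report 0.2.0, so this check only makes sense for released packages.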

@dennislwei
Collaborator

@gaborpelesz sorry for the delayed response. I'm glad that you found #103 on your own, as that is what I would have pointed you to.

@vijay-arya do you happen to have an answer about a version update?

@vijay-arya
Collaborator

@gaborpelesz We plan to push a version update to PyPI in about a week or so.

@gaborpelesz
Contributor Author

Thank you @vijay-arya and @dennislwei for your answers! No problem with the delay at all, since this wasn't an unsolved issue and the master branch already has the fix. I'm closing this.
