I have a dataset that includes a 'set' field whose values are arrays of strings. When I attempt to train using this data, I get an error in indexAll within blocking.py.
The backtrace looks like this:
  File "/Users/kball/git/getit/listing-dedupe/deduper.py", line 84, in train
    self.deduper.train(recall=0.90)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/api.py", line 676, in train
    self._trainBlocker(recall, index_predicates)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/api.py", line 684, in _trainBlocker
    block_learner = self._blockLearner(predicate_set)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/api.py", line 815, in _blockLearner
    self.sampled_records)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/training.py", line 104, in __init__
    blocker.indexAll(sampled_records)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/blocking.py", line 95, in indexAll
    in viewvalues(data_d)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/blocking.py", line 96, in <setcomp>
    if record[field]}
TypeError: unhashable type: 'list'
I believe the root cause is that you cannot create a set with lists as the underlying values ({[1]} throws the same error). That said, I'm not sure whether it is sufficient to simply iterate through the values in the 'set' for this purpose, or whether more extensive changes need to happen. It seems like the combination of values could well be important for learning, so simply flattening doesn't seem right.
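For anyone trying to reproduce this without dedupe installed, here is a minimal sketch of the failure. The sample records and the helper name collect_field_values are hypothetical. The comprehension only imitates the shape of the one shown in the traceback and is not the library's actual code.

```python
# Hypothetical sample records; only the field name 'set' comes from the report.
data_d = {
    1: {"set": ["red", "blue"]},
    2: {"set": ["green"]},
}


def collect_field_values(data_d, field):
    # Each non-empty field value becomes an element of a Python set,
    # so every value must be hashable. Lists are not.
    return {record[field] for record in data_d.values() if record[field]}


try:
    collect_field_values(data_d, "set")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The message mentions "unhashable type: 'list'", as in the traceback.
print(error_message)
```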
I don't know exactly what you are trying to do, but you might be able to achieve it by making a set of tuples instead of a set of lists, which, as you note, is not allowed by Python.
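A rough sketch of the tuple suggestion, assuming the records live in a dict keyed by record id, as the traceback's data_d suggests. The helper name lists_to_tuples and the sample data are hypothetical. This only shows that the values become hashable; whether dedupe then compares the field the way you want is not confirmed here.

```python
def lists_to_tuples(data_d, field):
    """Return a copy of data_d with list values in `field` converted to tuples."""
    converted = {}
    for record_id, record in data_d.items():
        new_record = dict(record)  # shallow copy, so the input is left untouched
        value = new_record.get(field)
        if isinstance(value, list):
            new_record[field] = tuple(value)
        converted[record_id] = new_record
    return converted


# Hypothetical sample records using the 'set' field from the report.
data_d = {
    1: {"set": ["red", "blue"]},
    2: {"set": ["green"]},
    3: {"set": []},
}

hashable_d = lists_to_tuples(data_d, "set")

# The same kind of set comprehension now succeeds, and each tuple keeps
# its combination of values together instead of flattening them.
values = {record["set"] for record in hashable_d.values() if record["set"]}
print(values)
```

Tuples keep element order, so ("red", "blue") and ("blue", "red") count as different values. If order should not matter, frozenset(value) in place of tuple(value) is another hashable option.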