Training on data including 'set' field results in error #635

Closed
kball opened this issue Feb 2, 2018 · 1 comment

kball commented Feb 2, 2018

I have a dataset that includes a 'set' field, whose values are arrays of strings. When I attempt to train on this data, I get an error in indexAll within blocking.py.

The backtrace looks like this:

  File "/Users/kball/git/getit/listing-dedupe/deduper.py", line 84, in train
    self.deduper.train(recall=0.90)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/api.py", line 676, in train
    self._trainBlocker(recall, index_predicates)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/api.py", line 684, in _trainBlocker
    block_learner = self._blockLearner(predicate_set)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/api.py", line 815, in _blockLearner
    self.sampled_records)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/training.py", line 104, in __init__
    blocker.indexAll(sampled_records)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/blocking.py", line 95, in indexAll
    in viewvalues(data_d)
  File "/Users/kball/.virtualenvs/listing-dedupe/lib/python3.6/site-packages/dedupe/blocking.py", line 96, in <setcomp>
    if record[field]}
TypeError: unhashable type: 'list'

I believe the root cause is that you cannot create a set whose elements are lists ({[1]} throws the same error). That said, I'm not sure whether it is sufficient to simply iterate through the values in the 'set' for this purpose, or whether more extensive changes are needed. The combination of values could well be important for learning, so simply flattening doesn't seem right.
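
The failure can be reproduced without dedupe. This is a minimal sketch, assuming records shaped like the ones described above. The field name "tags" and the sample values are hypothetical, and the set comprehension only mimics the pattern visible in the traceback; it is not dedupe's actual code.

```python
# Hypothetical records: the "tags" field holds a list of strings.
data_d = {
    1: {"tags": ["red", "large"]},
    2: {"tags": ["blue"]},
    3: {"tags": []},
}


def collect_field_values(records, field):
    """Collect the distinct non-empty values of `field` into a set.

    This mirrors the shape of the set comprehension in the traceback
    (an assumption, not a copy of the library's implementation).
    """
    return {record[field] for record in records.values() if record[field]}


try:
    collect_field_values(data_d, "tags")
    error_message = None
except TypeError as exc:
    # Lists are mutable and therefore unhashable, so they cannot be set elements.
    error_message = str(exc)

print(error_message)  # the message mentions: unhashable type: 'list'
```

Any list-valued field that reaches a set comprehension like this one will raise the same TypeError, regardless of what the list contains.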

fgregg commented Feb 11, 2018

I don't know exactly what you are trying to do, but you might be able to achieve it by making a set of tuples instead of a set of lists, which, as you note, Python does not allow.
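
One way to apply that suggestion is to convert list values to tuples before the records are handed to training. This is a rough sketch under assumptions: the helper `lists_to_tuples`, the field names, and the sample records are all hypothetical, and the thread does not confirm how dedupe treats the converted values.

```python
def lists_to_tuples(record):
    """Return a copy of `record` with every list value replaced by a tuple."""
    return {
        key: tuple(value) if isinstance(value, list) else value
        for key, value in record.items()
    }


# Hypothetical input records keyed by record id.
data_d = {
    1: {"name": "Acme", "tags": ["red", "large"]},
    2: {"name": "Acme Inc", "tags": ["red"]},
}

# Convert every record once, up front.
hashable_data = {
    record_id: lists_to_tuples(record) for record_id, record in data_d.items()
}

# Tuples are hashable, so the same kind of set comprehension now succeeds.
distinct_tags = {
    record["tags"] for record in hashable_data.values() if record["tags"]
}
print(distinct_tags)
```

A tuple keeps the order and any duplicates of the original list. If order and duplicates carry no meaning in the data, `frozenset(value)` is also hashable and may be a closer fit.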

fgregg closed this as completed Feb 11, 2018
github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2022