float division by zero issue #106

MekhledA · 2020-10-28T02:07:25Z

When I try to assess the blocking result using the metrics rr & pc, I get the error message "ZeroDivisionError: float division by zero". How can I fix it?

wilko77 · 2020-10-29T05:11:50Z

I had a look at the assess_blocks_2party function (blocklib/blocklib/evaluation.py, line 5 in b569b3f):

def assess_blocks_2party(filtered_reverse_indices, data):
There are two reasons why you might get a ZeroDivisionError:

1. There are no records in the provided data.
2. There are no true matches in the provided data.

Have a look at your data and make sure you provide it in the right format.
Or provide an example that shows this error and I can try to help.

MekhledA · 2020-10-29T23:45:11Z

I have used two datasets, stored as text in CSV files, each with 5 attributes (id, title, authors, venue, year).

subdata1 = [x[0] for x in data_alice]
subdata2 = [x[0] for x in data_bob]

rr, pc = assess_blocks_2party([filtered_blocks_alice, filtered_blocks_bob],
                              [subdata1, subdata2])

print('RR={}'.format(rr))
print('PC={}'.format(pc))

I'm using the code above, but when I change x[0] to x[1] in both lines it works fine and I get results.
Does changing x[0] to x[1] produce wrong results in this case?

joyceyuu · 2020-11-01T22:19:11Z

Note that subdata1 and subdata2 here hold the entity ids of the two parties, i.e. the ground truth. We use x[0] because the entity id is in the first column of every record; 0 is the column index of the entity id. If your entity id is in the second column, use x[1] in the list comprehension instead. assess_blocks_2party needs these ids to compute the pair completeness. Have a look at the documentation for it here.
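To illustrate the column index, here is a minimal sketch with a couple of illustrative rows. The names records_alice and id_column are hypothetical and not part of blocklib; only the list comprehension mirrors the code in this thread.

```python
# Illustrative records: the entity id sits in column 0 of every row.
records_alice = [
    ["304", "world wide", "lyman ram", "international conference", "1999"],
    ["290", "safe query", "richard lomet", "acm sigmod", "2001"],
]

id_column = 0  # change to 1 only if the entity id is in the second column

# Same list comprehension as above, with the column index made explicit.
subdata1 = [row[id_column] for row in records_alice]
print(subdata1)
```

With id_column = 1 the list would hold titles rather than ids, so any "matches" would be title matches and not ground-truth matches.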

Hope it helps :)

MekhledA · 2020-11-02T01:42:14Z

Here is an example of my dataset:

id, title, authors, venue, year

304, world wide, lyman ram, international conference, 1999
290, safe query, richard lomet, acm sigmod, 2001
279, database, pillip keim, international conference, 1998

The entity id should be x[0], since id is the unique attribute, but with x[0] it doesn't work and gives the error ZeroDivisionError: float division by zero. Is there any way to fix this error other than using x[1]?

joyceyuu · 2020-11-02T23:31:01Z

Given that the entity id is in column 0, I don't think you should put x[1] in the list comprehension. Here are a few ways that might help locate the problem:

1. Check whether the id columns of your two datasets have any values in common.
2. Check whether your filtered_blocks_alice and filtered_blocks_bob are empty.
3. Clone the latest blocklib and install it manually with pip install. Wilko has pushed a PR that catches all float division by zero cases and reports the reason.
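The first two checks could be scripted roughly as follows. This is only a sketch: diagnose_inputs is a hypothetical helper, not a blocklib function, and it assumes each filtered_blocks_* object is a container (for example a dict of block key to record indices) whose emptiness can be tested with `not`.

```python
def diagnose_inputs(ids_a, ids_b, blocks_a, blocks_b):
    """Return a list of human-readable problems found in the inputs."""
    problems = []
    if not ids_a or not ids_b:
        problems.append("at least one dataset has no records")
    # Ids only match if they are equal, so "304" and 304 do not intersect.
    if not set(ids_a) & set(ids_b):
        problems.append("the id columns share no values, so there are no true matches")
    if not blocks_a or not blocks_b:
        problems.append("at least one party has no blocks after filtering")
    return problems


# Example call with made-up ids and blocks.
print(diagnose_inputs(["304", "290"], ["17", "42"], {"key": [0, 1]}, {"key": [0]}))
```

An empty result means none of these particular problems was detected; it does not prove the inputs are otherwise correct.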

MekhledA · 2020-11-03T01:33:21Z

Thanks Wang for your suggestions.

1- I have checked the data types of id in both datasets and they have the same type.
2- filtered_blocks_acm and filtered_blocks_dblp are not empty.
3- I have installed the latest version.

I still have the same issue, and here is the screenshot:

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-46-4b6a340a3700> in <module>
      5 
      6 rr, pc = assess_blocks_2party([filtered_blocks_acm, filtered_blocks_dblp],
----> 7                               [subdata1, subdata2])
      8 
      9 print('RR={}'.format(rr))

~\AppData\Roaming\Python\Python37\site-packages\blocklib\evaluation.py in assess_blocks_2party(filtered_reverse_indices, data)
     45     # pair completeness is the "recall" before matching stage
     46     rr = 1.0 - float(num_cand_rec_pairs) / total_rec
---> 47     pc = float(num_block_true_matches) / num_all_true_matches
     48     return rr, pc

ZeroDivisionError: float division by zero
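The traceback points at line 47, where the denominator is num_all_true_matches. The sketch below repeats the two divisions shown in the traceback and adds explicit checks, so the failing denominator is named in the error. The function rr_pc and its error messages are hypothetical, not blocklib's own code.

```python
def rr_pc(num_cand_rec_pairs, total_rec, num_block_true_matches, num_all_true_matches):
    """Compute reduction ratio and pair completeness with explicit zero checks."""
    if total_rec == 0:
        raise ValueError("total_rec is 0: is one of the datasets empty?")
    if num_all_true_matches == 0:
        raise ValueError(
            "num_all_true_matches is 0: the entity ids of the two datasets do not overlap"
        )
    # Same formulas as lines 46 and 47 in the traceback.
    rr = 1.0 - float(num_cand_rec_pairs) / total_rec
    pc = float(num_block_true_matches) / num_all_true_matches
    return rr, pc


# Made-up counts, just to show the call.
print(rr_pc(20, 100, 3, 4))
```

Since the error here is raised on line 47 and not on line 46, total_rec is presumably non-zero and the missing true matches are the more likely cause.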

kishanpython · 2020-11-12T13:40:11Z

Hello Wang, I am also getting the same error "ZeroDivisionError: float division by zero". Can you suggest something to fix this error?

kishanpython · 2020-11-12T13:43:16Z

Here is the error sample:

kishanpython · 2020-11-13T07:46:46Z

After running the evaluation methods, I figured out that the ground truth values provided are different.
For example, the id in dataset 1 is "conf/sigmod/AbadiC02" and the id in dataset 2 has the form "f2Lea-RN8dsJ". So num_all_true_matches = len(entity1.intersection(entity2)) becomes zero, and "ZeroDivisionError: float division by zero" is raised when the pc value is calculated.

My question is: do the id or ground truth columns of both datasets have to be in the same format?
If my ids are different, what other approach can be used to calculate the rr and pc values?
Can we use the year column for this purpose?

id, title, authors, venue, year

conf/sigmod/AbadiC02, world wide, lyman ram, international conference, 1999
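One possible workaround, sketched under the assumption that a separate list of known matching id pairs is available (for example a ground-truth mapping file that accompanies the datasets), is to translate the ids of one dataset into the ids of the other before building the ground truth lists. The pairs, the extra ids, and the variable names below are hypothetical, and this is not a blocklib feature. A non-unique column such as year would probably not work as ground truth, because equal years do not imply the same entity.

```python
# Hypothetical ground-truth pairs: (id in dataset 1, id in dataset 2).
known_matches = [
    ("conf/sigmod/AbadiC02", "f2Lea-RN8dsJ"),
]

# Made-up id columns; the second id in each list has no known match.
ids_1 = ["conf/sigmod/AbadiC02", "conf/vldb/Example01"]
ids_2 = ["f2Lea-RN8dsJ", "zzUnmatched000"]

# Translate dataset-2 ids into dataset-1 ids where a match is known.
# Unmatched records keep their own id and so never count as a true match.
to_id_1 = {id_2: id_1 for id_1, id_2 in known_matches}
ids_2_mapped = [to_id_1.get(i, i) for i in ids_2]

# Same intersection as quoted above, now on a shared id space.
num_all_true_matches = len(set(ids_1).intersection(ids_2_mapped))
print(ids_2_mapped, num_all_true_matches)
```

The mapped list would then be passed as the second ground truth list in place of the raw ids.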
