Neither export_pairs_to_csv nor export_to_df works on EntityMatching object for clean-clean ER #25

Closed
zmbc opened this issue Aug 26, 2024 · 4 comments
Labels
bug Something isn't working


@zmbc

zmbc commented Aug 26, 2024

Describe the bug
export_pairs_to_csv and export_to_df both fail with KeyError on an EntityMatching object for clean-clean ER.

To Reproduce

```python
import pandas as pd
from pyjedai.datamodel import Data

d1 = pd.read_csv("https://raw.githubusercontent.com/AI-team-UoA/pyJedAI/main/data/ccer/D2/abt.csv", sep='|', engine='python', na_filter=False)
d2 = pd.read_csv("https://raw.githubusercontent.com/AI-team-UoA/pyJedAI/main/data/ccer/D2/buy.csv", sep='|', engine='python', na_filter=False)
gt = pd.read_csv("https://raw.githubusercontent.com/AI-team-UoA/pyJedAI/main/data/ccer/D2/gt.csv", sep='|', engine='python')

data = Data(dataset_1=d1,
            id_column_name_1='id',
            dataset_2=d2,
            id_column_name_2='id',
            ground_truth=gt)

from pyjedai.block_building import StandardBlocking

bb = StandardBlocking()
blocks = bb.build_blocks(data, attributes_1=['name'], attributes_2=['name'])

from pyjedai.matching import EntityMatching

em = EntityMatching(
    metric='cosine',
    tokenizer='char_tokenizer',
    vectorizer='tfidf',
    qgram=3,
    similarity_threshold=0.0
)

pairs_graph = em.predict(blocks, data, tqdm_disable=True)

em.export_pairs_to_csv('foo.csv') # Fails with "KeyError: 0"
em.export_to_df(pairs_graph) # Fails with "KeyError: 1094"
```

Expected behavior

A CSV file or DataFrame of the matched pairs should be generated.

Nikoletos-K self-assigned this Aug 27, 2024
Nikoletos-K added the bug Something isn't working label Aug 27, 2024
@Nikoletos-K
Member

Nikoletos-K commented Aug 27, 2024

Hello there!

Thanks for the detailed report. Yeah, there's a mistake; I have fixed it in dev-stage.
These changes will be available in the next few days (if not today) in a new release.

Also, FYI, export_to_csv will be removed; you should call export_to_df and then pandas' to_csv().

Cheers,
Konstantinos
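The workflow recommended in the reply above (export_to_df, then pandas' to_csv()) can be sketched roughly as follows. This is a minimal sketch, not pyjedai's documented usage: the real DataFrame would come from `em.export_to_df(pairs_graph)` on pyjedai 0.1.9 or later, but to keep the snippet self-contained it uses a hand-built stand-in DataFrame whose column names (`id1`, `id2`) and values are hypothetical placeholders, not pyjedai's actual output schema.

```python
import pandas as pd

# In the real workflow (pyjedai 0.1.9 or later, per this thread) the DataFrame
# would come from the matcher:
#     pairs_df = em.export_to_df(pairs_graph)
# To keep this sketch self-contained, a small stand-in DataFrame is built by hand.
# Its column names and values are hypothetical placeholders, not pyjedai's
# documented output schema.
pairs_df = pd.DataFrame({"id1": ["a1", "a2"], "id2": ["b7", "b9"]})

# pandas' own DataFrame.to_csv() takes the place of export_to_csv,
# which the maintainer says will be removed.
# index=False keeps the pandas row index out of the written file.
pairs_df.to_csv("pairs.csv", index=False)
```

Delegating the file writing to pandas means options such as the separator, encoding, and compression are all controlled through the standard `to_csv()` parameters rather than a library-specific wrapper.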

@Nikoletos-K
Member

@zmbc Check the new release, 0.1.9. If it works for you, ping me and I'll close this issue.

@zmbc
Author

zmbc commented Aug 27, 2024

Thank you so much for the prompt response and fix! I'll test it out.

@zmbc
Author

zmbc commented Aug 28, 2024

It works now, thank you 😃

zmbc closed this as completed Aug 28, 2024
3 participants
@zmbc @Nikoletos-K and others