KP-Miner: why candidate_df is 1 for n-grams except unigram? #214

Open
atalnarayan opened this issue Dec 26, 2022 · 1 comment


atalnarayan commented Dec 26, 2022

In the KP-Miner implementation, n-gram candidates with n > 1 are always assigned candidate_df = 1, because the lookup candidate_df += df.get(k, 0) is only applied to unigrams. See:

...
        # loop through the candidates
        for k, v in self.candidates.items():

            # get candidate document frequency
            candidate_df = 1

            # get the df for unigram only
            if len(v.lexical_form) == 1:
                candidate_df += df.get(k, 0)
...

I do not think the paper intends this. Shouldn't we look up the DF for every candidate, unigram or not, and fall back to a default value of 1 only when the candidate is not present in the vocabulary?
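To make the question concrete, here is a minimal sketch of the two behaviours side by side. It is not the pke code itself: the helper name kpminer_candidate_df, the unigram_only flag, and the toy DF counts are all made up for illustration. It assumes the existing 1 + df.get(k, 0) pattern is kept, so a candidate missing from the table still ends up with a DF of 1.

```python
def kpminer_candidate_df(key, lexical_form, df, unigram_only=True):
    """Return the document frequency used for one candidate.

    Hypothetical helper, not part of pke.
    unigram_only=True mirrors the excerpt quoted above;
    unigram_only=False is the change proposed in this issue.
    """
    # every candidate starts at 1, so unseen candidates keep a DF of 1
    candidate_df = 1

    # look up the DF table for unigrams only, or for all candidates
    if not unigram_only or len(lexical_form) == 1:
        candidate_df += df.get(key, 0)

    return candidate_df


# toy DF table with made-up counts (assumption, not taken from the real file)
toy_df = {"keyphrase": 4, "keyphrase extraction": 2}

bigram_key = "keyphrase extraction"
bigram_form = ["keyphrase", "extraction"]

# current behaviour: the bigram's DF entry is ignored
current_df = kpminer_candidate_df(bigram_key, bigram_form, toy_df)

# proposed behaviour: the bigram's DF entry is used
proposed_df = kpminer_candidate_df(
    bigram_key, bigram_form, toy_df, unigram_only=False
)

# a bigram that is absent from the table falls back to 1 either way
unseen_df = kpminer_candidate_df(
    "graph ranking", ["graph", "ranking"], toy_df, unigram_only=False
)

print(current_df, proposed_df, unseen_df)
```

With the toy table, the bigram gets DF 1 under the current logic and DF 3 under the proposed one, while unigrams and unseen candidates are unaffected.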

atalnarayan (Author) commented

The default DF file, https://github.com/boudinfl/pke/blob/master/pke/models/df-semeval2010.tsv.gz, also contains n-grams (n > 1) with DF > 1.
