KP-Miner: why candidate_df is 1 for n-grams except unigram? #214

atalnarayan · 2022-12-26T07:31:37Z

In the KP-Miner implementation, every n-gram candidate with n>1 is assigned candidate_df=1; the document-frequency table is only consulted for unigrams.
See pke/pke/unsupervised/statistical/kpminer.py, line 143 in 8f1d05d:

candidate_df += df.get(k, 0)

....
        # loop through the candidates
        for k, v in self.candidates.items():

            # get candidate document frequency
            candidate_df = 1

            # get the df for unigram only
            if len(v.lexical_form) == 1:
                candidate_df += df.get(k, 0)
...

I do not think the paper intends this. Shouldn't we look up the DF for every candidate, unigram or not, and fall back to a default value of 1 only when the candidate is not present in the vocabulary?
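To make the question concrete, here is a minimal, standalone sketch that contrasts the behaviour quoted above with the lookup I am suggesting. It does not import pke: the function names `current_df` and `proposed_df`, the `n_words` argument (standing in for `len(v.lexical_form)`), and the toy `df` table are all hypothetical and only for illustration. I am also assuming the `1 +` offset used for unigrams should be kept for longer n-grams, so that an unseen candidate still ends up with a DF of 1.

```python
def current_df(key, n_words, df):
    """Mirror the quoted snippet: only unigrams consult the DF table."""
    candidate_df = 1
    if n_words == 1:
        candidate_df += df.get(key, 0)
    return candidate_df


def proposed_df(key, df):
    """Suggested variant (assumption): look up every candidate.

    Candidates missing from the table fall back to 1, because the
    lookup defaults to 0 and the same +1 offset is applied.
    """
    return 1 + df.get(key, 0)


if __name__ == "__main__":
    # Toy document-frequency table; the counts are made up for illustration.
    df = {"keyphrase": 10, "keyphrase extraction": 4}

    for key in ("keyphrase", "keyphrase extraction", "unseen phrase"):
        n_words = len(key.split())
        print(key, current_df(key, n_words, df), proposed_df(key, df))
```

With a table like this, the two functions agree for unigrams and for candidates missing from the table, and differ only for multi-word candidates that do have an entry.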


atalnarayan · 2022-12-26T07:38:53Z

The default DF file: https://github.com/boudinfl/pke/blob/master/pke/models/df-semeval2010.tsv.gz also contains n-grams (n>1) with df>1.
