Limiting -ne to sentence-initial words #300

Closed
Jmuccigr opened this Issue Jun 1, 2016 · 9 comments


Jmuccigr commented Jun 1, 2016

One way to avoid much of the miscounting of a non-enclitic word-final -ne as the enclitic -ne would be to count as enclitics only those -ne that occur on sentence-initial words. The enclitic can't appear elsewhere, so this would leave all the other word-final -ne alone.

You'd still have ambiguity when the first word in a sentence ends in a non-enclitic -ne, but those cases will be far fewer.

Contributor

diyclassics commented Jun 1, 2016

@kylepjohnson: I am working on this. The current system for handling '-ne' isn't efficient and is subject to far too many exceptions. We could have the tokenizer split the text into sentences first, then tokenize each sentence, and check only the [0] token for '-ne'. Still not perfect, as @Jmuccigr says, but better and headed in the right direction.
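
A minimal sketch of that control flow, for illustration only: it is not the CLTK implementation. The function name `sentence_initial_ne_candidates` is hypothetical, and the regex-based sentence and word splitting is a simplifying assumption standing in for a real sentence tokenizer.

```python
import re


def sentence_initial_ne_candidates(text):
    """Return the first word of each sentence when that word ends in 'ne'.

    Hypothetical helper: only token [0] of each sentence is inspected,
    so word-final 'ne' elsewhere in the sentence is never considered.
    """
    # Assumption: sentences end in '.', '?' or '!' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    candidates = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence)
        # Require more than two letters so a bare "Ne" is not flagged.
        if tokens and len(tokens[0]) > 2 and tokens[0].lower().endswith("ne"):
            candidates.append(tokens[0])
    return candidates


text = "Estne puer bonus? Puer bene legit. Sine cura vivit."
print(sentence_initial_ne_candidates(text))
```

In this sample, "bene" is skipped because it is not sentence-initial, while "Sine" is still flagged, which is the remaining ambiguity noted at the top of the thread.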

Member

kylepjohnson commented Jun 1, 2016

@diyclassics Excellent. Is this ticket covered by your latest PR #303?

Contributor

diyclassics commented Jun 1, 2016

@kylepjohnson No, this will be covered soon, but since it requires a major change to the tokenizer's control flow, it might take a bit longer than the other changes.

Contributor

diyclassics commented Jun 9, 2016

@Jmuccigr @kylepjohnson See https://github.com/diyclassics/cltk/blob/master/cltk/tokenize/word.py for better handling of sentence-initial "-ne". As John suggests, the enclitic is also marked with a hyphen so that it can be distinguished from conj. "ne". Submitted pull request.
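
To illustrate the hyphen marking, here is a rough sketch, not the code in the linked `word.py`. The name `mark_initial_enclitic`, the small `NON_ENCLITIC` exception set, and the choice to place "-ne" directly after its host word are all assumptions made for the example.

```python
# Illustrative, non-exhaustive list of words whose final "ne" is not enclitic.
NON_ENCLITIC = frozenset({"bene", "paene", "sine"})


def mark_initial_enclitic(tokens):
    """Split enclitic '-ne' off the first token of one sentence.

    The enclitic is emitted as "-ne" so it cannot be confused with the
    conjunction "ne". Only tokens[0] is examined; other tokens pass through.
    """
    if not tokens:
        return []
    first = tokens[0]
    lowered = first.lower()
    if len(lowered) > 2 and lowered.endswith("ne") and lowered not in NON_ENCLITIC:
        return [first[:-2], "-ne"] + list(tokens[1:])
    return list(tokens)


print(mark_initial_enclitic(["Timesne", "ne", "veniat"]))
print(mark_initial_enclitic(["Sine", "cura", "vivit"]))
```

The first call yields both "-ne" and "ne" as separate, distinguishable tokens; the second leaves "Sine" intact because of the exception set.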

Member

kylepjohnson commented Jun 9, 2016

Patrick, the PR was #311?

Contributor

diyclassics commented Jun 9, 2016

Yes, #311 includes the "-ne" handling.

Member

kylepjohnson commented Jun 9, 2016

Thanks.

@Jmuccigr and @diyclassics I'll leave it to one of you to close, if this particular issue is wrapped up.

Jmuccigr commented Jun 9, 2016

Gonna let @diyclassics decide that. :-)

Contributor

diyclassics commented Jun 10, 2016

Fixed with commit f60510c
