clean annotations of commentaries from jdsw #10

Closed
thatbudakguy opened this issue May 27, 2022 · 0 comments

copied/adapted notes from 5/27 meeting:

  1. Look through the JDSW and break it up into a key: value store, where each key is the unbroken sequence of characters preceding an annotation and the value is that annotation
  2. For each key: value pair...
    a. Search the source text (same chapter) for the first unbroken instance of the key that occurs after the match for the previous annotation (annotations must be matched in order)
    b. If the key is found in the source text proper (not in a commentary), leave it alone in the JDSW
    c. If the key is found in a commentary (marked with brackets in the SBCK editions), drop it from the JDSW
    d. If the key isn't found at all, log it along with the previous and next annotations so that @GDRom can investigate manually
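
The steps above could be sketched roughly as follows. This is a minimal illustration and not the code from the referenced commits: the function names (`commentary_mask`, `classify`, `missing_log`), the status labels, the `[`/`]` bracket characters, and the sample strings are all assumptions made up for the example. Annotations are passed as a list of pairs rather than a dict so that repeated keys are not collapsed.

```python
def commentary_mask(text, open_b="[", close_b="]"):
    """Return one bool per character: True if it lies inside brackets.

    The bracket characters are an assumption; substitute whatever the
    SBCK edition actually uses to mark commentary.
    """
    mask = []
    depth = 0
    for ch in text:
        if ch == open_b:
            depth += 1
            mask.append(True)
        elif ch == close_b:
            mask.append(True)
            depth = max(depth - 1, 0)
        else:
            mask.append(depth > 0)
    return mask


def classify(annotations, text, open_b="[", close_b="]"):
    """Label each (key, value) pair as 'source', 'commentary', or 'missing'.

    Keys are searched in order; each search starts where the previous
    match ended, so annotations are treated as strictly sequential.
    """
    mask = commentary_mask(text, open_b, close_b)
    cursor = 0
    results = []
    for key, value in annotations:
        # An empty key can never be located meaningfully; treat it as missing.
        pos = text.find(key, cursor) if key else -1
        if pos == -1:
            status = "missing"  # step d: needs manual review
        else:
            # A key contains no brackets, so any match lies wholly
            # inside or wholly outside a commentary span.
            status = "commentary" if mask[pos] else "source"
            cursor = pos + len(key)
        results.append((key, value, status))
    return results


def missing_log(results):
    """Collect missing annotations with their previous and next neighbours."""
    log = []
    for i, (key, value, status) in enumerate(results):
        if status == "missing":
            prev_item = results[i - 1][:2] if i > 0 else None
            next_item = results[i + 1][:2] if i + 1 < len(results) else None
            log.append({"annotation": (key, value),
                        "previous": prev_item,
                        "next": next_item})
    return log


# Made-up sample: main text with one bracketed commentary span.
sample_text = "甲乙丙[丁乙戊]己乙庚"
sample_annotations = [("乙", "x1"), ("乙", "x2"), ("乙", "x3"), ("辛", "x4")]

results = classify(sample_annotations, sample_text)
cleaned = [(k, v) for k, v, status in results if status != "commentary"]
```

In this sample the three 乙 annotations resolve to three different positions (main text, commentary, main text), so only the second is dropped, and 辛 ends up in the log with ("乙", "x3") as its previous annotation. Whether unmatched annotations stay in the cleaned output pending review is a choice made here for the sketch.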

Assumption: If LDM annotates two successive characters, the second annotation refers to the instance of that character that is closest in the source text to the previous character.

This will produce a version of the JDSW that leaves out any annotations referring to commentaries, which we can later align to the 正文 (main text) versions.

thatbudakguy self-assigned this May 27, 2022
thatbudakguy added a commit that referenced this issue May 30, 2022
- Ensure we don't accidentally collapse repeat annotations
- Allow overwriting the input JDSW file with flag
- Add column to file indicating annotation status

See #10