Character counts in the concatenated USFM files for the 73 Bible books #17
Two of the nine instances of NO-BREAK SPACE (U+00A0) are artefacts and may be replaced by a normal space. They are found in these two verses:
The other 7 instances are rightly used between single and double quotation marks. By contrast, there are 7 places where an ordinary space is used between double and single left quotation marks.
There are also 205 places where an ordinary space is used between single and double right quotation marks. These inconsistencies should be fixed.
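One way to locate these cases is sketched below. It is only an illustration, not the method used to obtain the counts above: the function names are hypothetical, the reverse-order pairs are included as a guess at what else might occur, and normalising to NO-BREAK SPACE assumes that is the intended convention between adjacent quotation marks.

```python
import re
from collections import Counter

NBSP = "\u00A0"

# Adjacent quotation marks separated by exactly one ordinary space or one
# NO-BREAK SPACE. The labels and the choice of pairs are assumptions.
PATTERNS = {
    "left double + left single": re.compile("\u201C([ \u00A0])\u2018"),
    "left single + left double": re.compile("\u2018([ \u00A0])\u201C"),
    "right single + right double": re.compile("\u2019([ \u00A0])\u201D"),
    "right double + right single": re.compile("\u201D([ \u00A0])\u2019"),
}


def count_quote_separators(text):
    """Count, per quotation-mark pair, how often each separator is used."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            kind = "NBSP" if match.group(1) == NBSP else "space"
            counts[(label, kind)] += 1
    return counts


def normalise_quote_separators(text):
    """Replace the separator between adjacent quotation marks with NBSP."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: m.group(0)[0] + NBSP + m.group(0)[-1], text)
    return text
```

Running `count_quote_separators` on the concatenated USFM text before and after `normalise_quote_separators` would show whether any ordinary-space cases remain. The two artefact instances of NO-BREAK SPACE are a separate matter and are not touched by this sketch.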
The attached tab-delimited text file may be useful for analysis:
merged.usfm.character.frequency.txt
Observe the difference in counts for characters that are usually in pairs.
This indicates that there may be some unpaired characters, which is often worth checking.
The right single quotation mark is also used as the typographical apostrophe, which helps explain the large difference observed.
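The paired-character comparison described above could be reproduced along these lines. This is a minimal sketch, not the script that produced the attached file; the list of pairs and the function names are assumptions chosen for illustration.

```python
import unicodedata
from collections import Counter

# Characters that normally occur in opening/closing pairs (assumed list).
PAIRS = [
    ("\u201C", "\u201D"),  # double quotation marks
    ("\u2018", "\u2019"),  # single quotation marks (the right one is also the apostrophe)
    ("(", ")"),
    ("[", "]"),
]


def char_frequency(text):
    """Return a Counter mapping each character to its number of occurrences."""
    return Counter(text)


def pair_differences(counts):
    """Return (opener name, opener count, closer name, closer count, difference)."""
    rows = []
    for opener, closer in PAIRS:
        rows.append((
            unicodedata.name(opener),
            counts[opener],
            unicodedata.name(closer),
            counts[closer],
            counts[opener] - counts[closer],
        ))
    return rows
```

A non-zero difference only flags a pair for inspection. For single quotation marks a large negative difference is expected, because the right single quotation mark also serves as the apostrophe.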