Character counts in the concatenated USFM files for the 73 Bible books #17

Open
DavidHaslam opened this issue Dec 6, 2018 · 1 comment

Comments

DavidHaslam commented Dec 6, 2018

The attached tab-delimited text file lists the frequency of every character and may be useful for analysis:

merged.usfm.character.frequency.txt

Observe the difference in counts for characters that normally occur in pairs.

U+0028	(	5,803	LEFT PARENTHESIS
U+0029	)	5,800	RIGHT PARENTHESIS

U+2018	‘	3,771	LEFT SINGLE QUOTATION MARK
U+2019	’	5,085	RIGHT SINGLE QUOTATION MARK

U+201C	“	6,206	LEFT DOUBLE QUOTATION MARK
U+201D	”	6,188	RIGHT DOUBLE QUOTATION MARK

The parentheses differ by 3 and the double quotation marks by 18, which indicates that some of these characters may be unpaired. This is often worth checking.

The right single quotation mark is also used as the typographical apostrophe, which helps explain the large difference observed.
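
A comparison like the one above can be reproduced with a short script. This is only a minimal sketch, not the tool that produced the attached file: the function names and the `PAIRS` list are illustrative assumptions, and the text is assumed to be the concatenated USFM already read into a string.

```python
from collections import Counter
import unicodedata

# Characters that normally occur in pairs (illustrative list, taken from the table above).
PAIRS = [("(", ")"), ("\u2018", "\u2019"), ("\u201C", "\u201D")]


def char_frequencies(text):
    """Count every character in the text."""
    return Counter(text)


def format_row(ch, n):
    """Format one tab-delimited row: code point, character, count, Unicode name."""
    return f"U+{ord(ch):04X}\t{ch}\t{n:,}\t{unicodedata.name(ch, 'UNKNOWN')}"


def pair_imbalances(counts, pairs=PAIRS):
    """Return (left, right, left_count - right_count) for each pair."""
    return [(left, right, counts[left] - counts[right]) for left, right in pairs]
```

A non-zero difference is only a hint: as noted above, the right single quotation mark also serves as the apostrophe, so that pair is expected to be unbalanced.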

@DavidHaslam
Author

Of the 9 instances of NO-BREAK SPACE (U+00A0), 2 are artefacts and may be replaced by a normal space.

They are found in these two verses:

\v 2 ¶ But the earth was empty and unoccupied, and darknesses were over the face of the abyss; and so the Spirit of God was brought over the waters.\f + \fr 1:2 \ft After earth was created, it was empty and unoccupied. Darkness is plural in the Latin. This could symbolize fallen angels, with the abyss symbolizing Hell. The word ‘darknesses’ can also refer to the absences of so many good things, so that God had to continue creating. The Spirit of God was brought or was carried over the waters, passive tense.\fl (Conte)\f*
\v 4 And God saw the light, that it was good; and so he divided the light from the darknesses.\f + \fr 1:4 \ft God divided light from darkness. He also divided Heaven from Hell, once Hell was created (or became necessary due to the angels that fell from grace). Notice that God chooses to create Heaven and Earth (Universe), but Hell comes about as a result of sin. God creates Good, but Evil comes about because of sin. God does not directly create evil or darkness.\fl (Conte)\f*

The other 7 instances are rightly used between single and double quotation marks.
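
To separate the legitimate uses from the artefacts, each NO-BREAK SPACE can be checked against its neighbours. The following is a rough sketch under the assumption that "rightly used" means the character sits directly between two quotation marks; `classify_nbsp` and its return shape are hypothetical names chosen for illustration.

```python
NBSP = "\u00A0"
# Set membership avoids the pitfall that "" is "in" every string.
QUOTES = {"\u2018", "\u2019", "\u201C", "\u201D"}


def classify_nbsp(text, context=10):
    """Return (index, between_quotes, snippet) for every NO-BREAK SPACE.

    between_quotes is True when the characters immediately before and
    after the NO-BREAK SPACE are both curly quotation marks.
    """
    results = []
    for i, ch in enumerate(text):
        if ch != NBSP:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        between = before in QUOTES and after in QUOTES
        snippet = text[max(0, i - context):i + context + 1]
        results.append((i, between, snippet))
    return results
```

Entries with `between_quotes` equal to `False` are the candidates for replacement by a normal space; they still deserve a manual look before any change is made.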

By contrast, there are 7 places where an ordinary space is used between a left double and a left single quotation mark:

\v 17 And he told Joseph that he should command his brothers, saying: “ ‘Burden your beasts, and go into the land of Canaan,
\v 3 “ ‘Prepare the heavy and the light shield, and advance to war!\f + \fr 46:3 \ft In other words, prepare the heavy and light weapons of war.\fl (Conte)\f*
\v 1 ¶ “ ‘For this reason, the Lord our God has fulfilled his word, which he has spoken to us, and to our judges, who have judged Israel, and to our kings, and to our leaders, and to all Israel and Judah.
\v 1 ¶ “ ‘And now, O Lord Almighty, the God of Israel, the soul in anguish and the troubled spirit cry out to you.
\v 1 ¶ “ ‘This is the book of the commandments of God and of the law, which exists in eternity. All those who keep it will attain to life, but those who have forsaken it, to death.
\v 1 ¶ “ ‘Take off, O Jerusalem, the garment of your sorrow and troubles, and put on your beauty and the honor of that eternal glory, which you have from God.
\v 37 Jesus said to him: “ ‘You shall love the Lord your God from all your heart, and with all your soul and with all your mind.’

There are also 205 places where an ordinary space is used between a right single and a right double quotation mark.
That's too many to list here; search for the regexp \x{2019} \x{201D}

These inconsistencies should be fixed.
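
One possible way to find and fix both patterns in bulk is sketched below. It assumes the intended fix is to replace the ordinary space with a NO-BREAK SPACE, matching the 7 instances described as rightly used; that choice is an assumption, not something decided in this issue. Note that the `\x{2019}` notation above is Perl/PCRE style, while Python's `re` module writes the same code point as `\u2019`. The function names are illustrative.

```python
import re

NBSP = "\u00A0"
# Ordinary space between left double and left single quotation marks.
LEFT_PAIR = re.compile("\u201C \u2018")
# Ordinary space between right single and right double quotation marks.
RIGHT_PAIR = re.compile("\u2019 \u201D")


def count_spaced_pairs(text):
    """Return (left_count, right_count) of quote pairs split by an ordinary space."""
    return len(LEFT_PAIR.findall(text)), len(RIGHT_PAIR.findall(text))


def normalize_spaced_pairs(text):
    """Replace the ordinary space inside each matched pair with NO-BREAK SPACE."""
    text = LEFT_PAIR.sub("\u201C" + NBSP + "\u2018", text)
    text = RIGHT_PAIR.sub("\u2019" + NBSP + "\u201D", text)
    return text
```

Running `count_spaced_pairs` before and after `normalize_spaced_pairs` gives a quick check that every occurrence was handled.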

DavidHaslam changed the title from "Character counts in the concatenated 73 USFM files for the Bible books" to "Character counts in the concatenated USFM files for the 73 Bible books" on Dec 6, 2018