(tlg0001.tlg001_tlg0005.tlg001_emdash) Change "--" to em dash #1270

Merged

Contributor

@whoopsedesy whoopsedesy commented May 12, 2021

The Beta Code for em dash is _, but these two texts used -- instead, which remained after the conversion to Unicode.

I spot-checked some of the instances against book scans; the commit messages list the passages and scan pages checked.

The corresponding SEDES commit is sasansom/sedes@ef61834.
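For illustration only, here is a minimal Python sketch of the kind of substitution this pull request describes: replacing "--" with an em dash (U+2014) in the text of an XML file while leaving markup alone. It is not the script used for the SEDES commit; the function name `replace_double_hyphens` and the sample line are made up, and the regular expression assumes simple markup (no CDATA sections and no ">" inside attribute values).

```python
import re

EM_DASH = "\u2014"

# Comments and tags are matched first so that they pass through unchanged
# (the "--" in the "<!--" and "-->" delimiters must not be touched).
# Only a "--" found outside markup is treated as a dash in the text.
_TOKEN = re.compile(r"<!--.*?-->|<[^>]*>|--", re.DOTALL)


def replace_double_hyphens(xml_text: str) -> str:
    """Replace "--" with an em dash in text content, skipping markup."""
    def _sub(match: re.Match) -> str:
        token = match.group(0)
        return EM_DASH if token == "--" else token

    return _TOKEN.sub(_sub, xml_text)


if __name__ == "__main__":
    # Made-up sample line, not taken from either edition.
    sample = '<l n="333">abc-- def</l><!-- editorial note -->'
    print(replace_double_hyphens(sample))
```

Working on the raw string rather than a parsed tree keeps the rest of the file byte-for-byte identical, which makes the resulting diff easy to review against the scans.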

David Fifield added 2 commits May 12, 2021 20:41
I spot-checked against a scan of the 1912 edition by George W. Mooney.
1.333 https://archive.org/details/argonauticaedite00apoluoft/page/90/mode/2up
2.913 https://archive.org/details/argonauticaedite00apoluoft/page/202/mode/2up

A possible cause was "--" wrongly being used in Beta Code to indicate an
em dash (should have been "_").

The corresponding SEDES commit is
sasansom/sedes@ef61834

I spot-checked against a scan of the 1919 edition by R. J. Cholmeley.
1.83 https://archive.org/details/idyllsoftheocrit00theouoft/page/64/mode/2up
2.104 https://archive.org/details/idyllsoftheocrit00theouoft/page/70/mode/2up

A possible cause was "--" wrongly being used in Beta Code to indicate an
em dash (should have been "_").

The corresponding SEDES commit is
sasansom/sedes@ef61834
@lcerrato
Collaborator

Perseus texts used a modified version of beta code that varies from the standard in some cases, so an em dash was encoded either as an entity or, as in these cases, as literal characters that were read as is. The source files use two hyphens; that is how the text was encoded.

The unconverted works should typically have these fixed in the course of the conversion review.

I see two hyphens in the current Perseus editions, so this isn’t a case of the entity being missed. Rather, the early conversion work on the grc2 file was done before the current workflow, when checks on the data were inconsistent.

In all cases, entities such as — are resolved apart from beta code conversion.

@whoopsedesy
Contributor Author

Thanks for the reply. Do I understand correctly, then, that the XML files are the wrong place to make this change (i.e., parsers are supposed to replace -- with an em dash on the fly)? Or is this pull request still a necessary part of conversion, and your comment explains why things currently stand the way they do? We're willing to work with you if any adjustments are necessary.

Right now, I see -- at Argonautica 1.333 in both Hopper and Scaife.
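A check like the one above could be scripted. The following is a rough Python sketch, under the same assumptions about simple markup (no CDATA sections, no ">" inside attribute values), that lists where "--" still occurs in the text of a file; `find_double_hyphens` is a hypothetical helper, not part of any Perseus or SEDES tooling.

```python
import re

# Comments and tags, which are ignored when searching for "--".
_MARKUP = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)


def find_double_hyphens(xml_text: str) -> list[tuple[int, int]]:
    """Return (line number, column) pairs for each "--" outside markup.

    Line numbers are 1-based and columns are 0-based.
    """
    def _blank(match: re.Match) -> str:
        # Overwrite markup with spaces, keeping newlines, so that line
        # numbers and columns still refer to the original file.
        return re.sub(r"[^\n]", " ", match.group(0))

    text_only = _MARKUP.sub(_blank, xml_text)
    hits = []
    for lineno, line in enumerate(text_only.splitlines(), start=1):
        for match in re.finditer("--", line):
            hits.append((lineno, match.start()))
    return hits


if __name__ == "__main__":
    # Made-up sample, not taken from either edition.
    sample = '<l n="1">abc-- def</l>\n<!-- a\ncomment -->\n<l n="3">x --y</l>'
    for lineno, col in find_double_hyphens(sample):
        print(f"line {lineno}, column {col}")
```

Each reported position still needs to be checked by eye against a scan, since a literal "--" may occasionally be intended.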

> The unconverted works should typically have these fixed in the course of the conversion review.

What I understand by "converted" and "unconverted" is conversion of TEI to conform to the requirements of CTS and EpiDoc. So tlg0001.tlg001 is converted and in the Scaife viewer at https://scaife.perseus.org/library/urn:cts:greekLit:tlg0001.tlg001.perseus-grc2/, but tlg0005.tlg001 is still unconverted. Do you prefer not to have pull requests for unconverted files (because they will be batch-processed at some point in the future)?

Thanks again for your quick and helpful responses.

@lcerrato
Collaborator

@whoopsedesy
Thanks for the questions.

No, there is no parser changing the hyphens to em dash (to my knowledge). I was noting that it is not necessarily a mistake in the beta code transformation here (rather it's a choice made by the initial preparer/editor of the electronic text).

It's really a matter of preference. I wouldn't necessarily prioritize unconverted (non-EpiDoc/non-CTS) texts at this time, as this is something we should catch at conversion. I prefer a deep dive on those texts, as there tend to be issues and challenges unique to each file (or set of files).

Even some of these earlier conversions were not done consistently, or were mostly mechanical conversions, so things were certainly missed.

Thanks again — sorry for the delay.

@lcerrato lcerrato merged commit 63ab223 into PerseusDL:master May 28, 2021