Malformed dclp-hybrid values #317

Closed
hcayless opened this issue Aug 17, 2017 · 14 comments

Comments

@hcayless

There are 233 DCLP documents that have broken dclp-hybrid <idno> values. A full list can be found at https://gist.github.com/hcayless/0c99cb6af2b27239f397ca854e52e677. They all seem to be P.Herc. docs.

This error prevents correct indexing of the documents for search.

@jcowey
Contributor

jcowey commented Aug 17, 2017

@HolgerEssler this needs to be sorted as soon as we can manage. Do you want to have all P.Herc. publications collected under one standardised dclp-hybrid so that they resemble e.g. p.oxy;12;1234?
If the answer is yes then we have to change
<idno type="dclp-hybrid">P.Herc. 1120</idno>
into
<idno type="dclp-hybrid">p.herc;;1120</idno> (that assumes no volume)

If the answer is no, then we replace these dclp-hybrid values with the relevant "na;;23456" value, that is "na" (viz. no author) followed by the TM number.

@HolgerEssler

Yes, please change
<idno type="dclp-hybrid">P.Herc. 1120</idno>
into
<idno type="dclp-hybrid">p.herc;;1120</idno>.
I suppose
<idno type="dclp-hybrid">P.Herc. 1043 + 1045</idno>
should then become
<idno type="dclp-hybrid">p.herc;;1043;1045</idno>
and
<idno type="dclp-hybrid">P.Herc. 419, 697, 1634</idno>
should become
<idno type="dclp-hybrid">p.herc;;419;697;1634</idno>.
Would that be ok?

@hcayless
Author

I would recommend something like:

<idno type="dclp-hybrid">P.Herc. 1043 + 1045</idno> -> <idno type="dclp-hybrid">p.herc;;1043+1045</idno>
and
<idno type="dclp-hybrid">P.Herc. 419, 697, 1634</idno> -> <idno type="dclp-hybrid">p.herc;;419,697,1634</idno>
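
A minimal sketch of that mapping in Python, assuming every affected value is the literal prefix `P.Herc.` followed by inventory numbers joined by `+` or `,`. The function name `normalize_pherc` is hypothetical and not part of any DCLP tooling; values in any other shape are rejected rather than guessed at.

```python
import re


def normalize_pherc(label: str) -> str:
    """Map e.g. 'P.Herc. 1043 + 1045' to 'p.herc;;1043+1045' (no volume)."""
    prefix = "P.Herc."
    text = label.strip()
    if not text.startswith(prefix):
        raise ValueError(f"not a P.Herc. label: {label!r}")
    # Drop all whitespace so '1043 + 1045' becomes '1043+1045'
    # and '419, 697, 1634' becomes '419,697,1634'.
    compact = re.sub(r"\s+", "", text[len(prefix):])
    # Assumption: only digits joined by '+' or ',' are expected here.
    if not re.fullmatch(r"\d+(?:[+,]\d+)*", compact):
        raise ValueError(f"unexpected inventory numbers: {label!r}")
    # Empty volume slot between the two semicolons.
    return f"p.herc;;{compact}"
```

Keeping `+` and `,` inside the third field means the value still has exactly two semicolons, like `p.oxy;12;1234`.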

Edelweiss self-assigned this Aug 17, 2017
@paregorios
Member

paregorios commented Aug 17, 2017 via email

@hcayless
Author

@paregorios
Member

@jcowey ?

@jcowey
Contributor

jcowey commented Aug 23, 2017

pretty sure I know how to fix this and will do so

o.frangé;;438 => o.frange;;438; as in ddbdp
o.wångstedt;;80 => o.wangstedt;;80; will have to be added to collection.rdf
p.genève[horssérie];;1 => p.geneve[horsserie];;1; will have to be added to collection.rdf
p.murabba'ât;2;108 => p.mur;2;108; as in ddbdp
p.demarée;;5 => p.demaree;;5; will have to be added to collection.rdf

is that analysis correct @hcayless ?
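
For what it is worth, four of the five rewrites above are plain diacritic stripping, which could be sketched as below. This is only an illustration: `ascii_collection` and the `OVERRIDES` table are hypothetical names, and the `p.murabba'ât` to `p.mur` case is handled as a hand-written override because it is an abbreviation, not a transliteration.

```python
import unicodedata

# Hypothetical override table for collections whose ddbdp form is an
# abbreviation rather than the same name without diacritics.
OVERRIDES = {"p.murabba'ât": "p.mur"}


def ascii_collection(hybrid: str) -> str:
    """Strip diacritics from the collection part of 'collection;volume;number'."""
    collection, sep, rest = hybrid.partition(";")
    # Compose first so the override lookup does not depend on input form.
    collection = unicodedata.normalize("NFC", collection)
    collection = OVERRIDES.get(collection, collection)
    # Decompose, then drop the combining marks (accents, rings, and so on).
    decomposed = unicodedata.normalize("NFKD", collection)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped + sep + rest
```

Square brackets pass through unchanged, so whether `p.geneve[horsserie]` works downstream still has to be checked separately.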

@hcayless
Author

I'm not sure how the square brackets will play. We'll have to see.

@jcowey
Contributor

jcowey commented Aug 23, 2017

So have now created a new issue #324, to keep these two separate.

@jcowey
Contributor

jcowey commented Aug 23, 2017

@hcayless would you please check that
<idno type="dclp-hybrid">p.herc;;.+</idno>
is now fine.
There should now be no more
<idno type="dclp-hybrid">P.Herc.
left in
https://github.com/DCLP/idp.data/tree/master/DCLP

I have made a number of commits to make the required corrections.

jcowey assigned hcayless and unassigned Edelweiss and HolgerEssler Aug 23, 2017
jcowey modified the milestones: Final Release, Merge Two Jan 25, 2018
@Edelweiss

files that still need repair

./63/62411.xml: P.Herc. 228, 403, 407, 1425, 1581
./63/62425.xml: P.Herc. 495
./63/62426.xml: P.Herc. 558
./63/62476.xml: P.Herc. 1471

change to…

./63/62411.xml: p.herc;;228,403,407,1425,1581
./63/62425.xml: p.herc;;495
./63/62426.xml: p.herc;;558
./63/62476.xml: p.herc;;1471

@Edelweiss

Edelweiss commented Mar 7, 2018

A case-sensitive search for P.Herc. using the XPath tei:idno[@type='dclp-hybrid'] found no further idnos of this kind.
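
A check of this kind could also be scripted. The sketch below is an assumption-laden illustration, not the search that was actually run: it assumes the files use the standard TEI namespace, and it assumes a well-formed value is `collection;volume;number` with no whitespace and no upper-case letters in the collection part. `malformed_hybrids` and `WELL_FORMED` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Standard TEI namespace, assumed to be what the DCLP files declare.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed shape: lower-case collection, optional volume, then the number(s).
WELL_FORMED = re.compile(r"^[^;\sA-Z]+;[^;\s]*;[^;\s]+$")


def malformed_hybrids(xml_text: str) -> list:
    """Return the dclp-hybrid idno values that do not match WELL_FORMED."""
    root = ET.fromstring(xml_text)
    values = [
        (el.text or "").strip()
        for el in root.iterfind(".//tei:idno[@type='dclp-hybrid']", TEI_NS)
    ]
    return [value for value in values if not WELL_FORMED.match(value)]
```

Running it over each file under `DCLP/` and printing the non-empty results would list any remaining values such as `P.Herc. 495`.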

@Edelweiss

Edelweiss commented Mar 7, 2018

https://github.com/DCLP/idp.data/tree/issue317

(in development and master)

Edelweiss assigned jcowey and unassigned Edelweiss Mar 8, 2018
@Edelweiss

Files can be viewed on github and will be picked up with the next sync.
