Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add West Old Turkic concept list #1240

Merged
merged 34 commits into from Oct 14, 2022
Merged

Add West Old Turkic concept list #1240

merged 34 commits into from Oct 14, 2022

Conversation

martino-vic
Copy link
Contributor

@martino-vic martino-vic commented Oct 6, 2022

Pull request checklist

  • add new concept list
  • add new metadata
  • add new Concepticon concept sets
    • checked whether the new concept(s) can be applied to existing lists with
      concepticon notlinked --gloss "NEW_GLOSS"
  • add new Concepticon concept relations
  • refine existing Concepticon concept set mappings
  • refine Concepticon glosses
  • refine Concepticon concept relations
  • refine Concepticon concept definitions
  • retire data

Additional information

I've followed the instructions mentioned in this issue
I didn't add new Concepticon concept sets, as that was the outcome of this issue
I'm not sure what the last four bullets mean, would be glad if somebody could give me a pointer
Also, when I run conception test I get ValueError: invalid Conceptset.semanticfield: ['Agriculture and Vegetation'], I couldn't decipher the rest of the error message either, which is:

INFO concepticon/concepticon-data at /home/viktor/Documents/GitHub/concepticon-data
testing 410 concept lists
Traceback (most recent call last):
File "/home/viktor/Documents/cldfvenv3.9/bin/concepticon", line 8, in
sys.exit(main())
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/main.py", line 62, in main
return args.main(args) or 0
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/commands/test.py", line 21, in run
if args.repos.check(*args.clids): # pragma: no cover
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/api.py", line 391, in check
'concepticon_id': set(self.conceptsets.keys()),
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/clldutils/misc.py", line 197, in get
result = instance.dict[self.name] = self.fget(instance)
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/api.py", line 120, in conceptsets
return to_dict(
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/util.py", line 36, in to_dict
for obj in iterobjects:
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/api.py", line 121, in
Conceptset(api=self, **lowercase(d))
File "", line 10, in init
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/models.py", line 69, in valid_key
raise ValueError('invalid {0}.{1}: {2}'.format(
ValueError: invalid Conceptset.semanticfield: ['Agriculture and Vegetation']
...

martino-vic added 3 commits October 5, 2022 17:54
Add new concepts from [West Old Turkic](#1180) -- soon to be added to concepticon
remove extra tab in last line
@AnnikaTjuka
Copy link
Collaborator

The devil is in the details: The semantic field tag needs to be changed from "Agriculture and Vegetation" to "Agriculture and vegetation".

I only see the file concepticon.tsv in the tab "Files changed". Did you want to discuss the error or add a new list? If the latter, you can check out an old PR from @mathildavz to see which files need to be uploaded before we can start the review process. #1188

@martino-vic
Copy link
Contributor Author

martino-vic commented Oct 7, 2022

Okay strange, my GitHub desktop didn't push the changes, now the 5 Files are modified. Sorry for the blooper.

Will correct the "V" zu "v" in a bit.

@xrotwang
Copy link
Contributor

xrotwang commented Oct 7, 2022

From the few interactions I had with GitHub desktop, I found it not very helpful, in particular when it comes to diagnosing bugs/unexpected situations. Using git on the command line might make this more transparent.

@AnnikaTjuka
Copy link
Collaborator

Three more issues:

  • The description of the list in conceptlists.tsv is incomplete.
  • Some columns have quotation marks for the entries. This is usually caused by Excel or LibreOffice. I recommend opening lists only with a text editor (or Google Sheets when working together on a list). I use Sublime Text, but you can also use another text editor.
  • The result of the discussion in issue adding <20 concepts for West Old Turkic #1238 was not to add any new concept sets. Thus, the file concepticon.tsv should not be updated in this PR.

One question:

  • Why do so many glosses occur more than once?

@martino-vic
Copy link
Contributor Author

So now I have:

  • completed the description
  • removed quotation marks
  • removed the 20 new concept sets
  • removed glosses that occurred more than once (they were an accident)

Getting following error message when running concepticon test:

INFO concepticon/concepticon-data at /home/viktor/Documents/GitHub/concepticon-data
testing 410 concept lists
Traceback (most recent call last):
File "/home/viktor/Documents/cldfvenv3.9/bin/concepticon", line 8, in
sys.exit(main())
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/main.py", line 62, in main
return args.main(args) or 0
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/commands/test.py", line 21, in run
if args.repos.check(*args.clids): # pragma: no cover
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/api.py", line 423, in check
for col in cl.metadata.tableSchema.columns:
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/clldutils/misc.py", line 197, in get
result = instance.dict[self.name] = self.fget(instance)
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/models.py", line 234, in metadata
return self.tg.tables[0]
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/clldutils/misc.py", line 197, in get
result = instance.dict[self.name] = self.fget(instance)
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/pyconcepticon/models.py", line 226, in tg
tg = TableGroup.from_file(md)
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 565, in from_file
res = cls.fromvalue(data)
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 202, in fromvalue
return cls(**cls.partition_properties(d))
File "", line 39, in init
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 808, in
converter=lambda v: [Table.fromvalue(vv) for vv in v])
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 808, in
converter=lambda v: [Table.fromvalue(vv) for vv in v])
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 202, in fromvalue
return cls(**cls.partition_properties(d))
File "", line 31, in init
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 545, in
converter=lambda v: Schema.fromvalue(v))
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 517, in fromvalue
return cls(**cls.partition_properties(v))
File "", line 26, in init
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 491, in
converter=lambda v: [Column.fromvalue(c) for c in v])
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 491, in
converter=lambda v: [Column.fromvalue(c) for c in v])
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 202, in fromvalue
return cls(**cls.partition_properties(d))
File "", line 12, in init
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 362, in
converter=lambda v: v if not v else Datatype.fromvalue(v))
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/csvw/metadata.py", line 268, in fromvalue
return cls(**DescriptionBase.partition_properties(v))
File "", line 22, in init
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/attr/validators.py", line 269, in call
self.validator(inst, attr, value)
File "/home/viktor/Documents/cldfvenv3.9/lib/python3.9/site-packages/attr/validators.py", line 306, in call
raise ValueError(
ValueError: 'base' must be in {'any': <class 'csvw.datatypes.anyAtomicType'>, 'string': <class 'csvw.datatypes.string'>, 'anyURI': <class 'csvw.datatypes.anyURI'>, 'binary': <class 'csvw.datatypes.base64Binary'>, 'hexBinary': <class 'csvw.datatypes.hexBinary'>, 'boolean': <class 'csvw.datatypes.boolean'>, 'datetime': <class 'csvw.datatypes.dateTime'>, 'date': <class 'csvw.datatypes.date'>, 'dateTimeStamp': <class 'csvw.datatypes.dateTimeStamp'>, 'time': <class 'csvw.datatypes._time'>, 'duration': <class 'csvw.datatypes.duration'>, 'decimal': <class 'csvw.datatypes.decimal'>, 'integer': <class 'csvw.datatypes.integer'>, 'nonNegativeInteger': <class 'csvw.datatypes.nonNegativeInteger'>, 'float': <class 'csvw.datatypes._float'>, 'number': <class 'csvw.datatypes.number'>, 'double': <class 'csvw.datatypes.double'>, 'QName': <class 'csvw.datatypes.QName'>, 'gDay': <class 'csvw.datatypes.gDay'>, 'gMonth': <class 'csvw.datatypes.gMonth'>, 'gMonthDay': <class 'csvw.datatypes.gMonthDay'>, 'gYear': <class 'csvw.datatypes.gYear'>, 'gYearMonth': <class 'csvw.datatypes.gYearMonth'>, 'xml': <class 'csvw.datatypes.xml'>, 'html': <class 'csvw.datatypes.html'>, 'json': <class 'csvw.datatypes.json'>} (got 'positiveInteger')

@xrotwang
Copy link
Contributor

positiveInteger was only added to csvw recently (see https://github.com/cldf/csvw/blame/master/src/csvw/datatypes.py#L763-L769). So it looks like you'd need to update your csvw package.

@martino-vic
Copy link
Contributor Author

Thanks for the quick help!
I ran pip install csvw --upgrade so that the current version is csvw==3.1.1
now I get following error message:

INFO concepticon/concepticon-data at /home/viktor/Documents/GitHub/concepticon-data
testing 410 concept lists
ERROR:RonaTas-2011-431: expected string or bytes-like object
ERROR inconsistent data in repository /home/viktor/Documents/GitHub/concepticon-data

@xrotwang
Copy link
Contributor

Looks like the software does what it's supposed to do: pointing out problems with the data. Do you think the reported problem is a false positive?

@martino-vic
Copy link
Contributor Author

I'm having troubles understanding what the error message is trying to tell me.
My first question:
Which file contains the error? does ERROR:RonaTas-2011-431: mean the error is in RonaTas-2011-431.tsv?

What I so far understand is that some file somewhere in the repository is not a string or byte-like object. But which one that could be? There are three tsv-files, one json, and one .bib

@xrotwang
Copy link
Contributor

That's a bit tricky to debug, I admit. But NUMBER is acutally checked against a regular expression, here:
https://github.com/concepticon/pyconcepticon/blob/5ce8ad7944f408c006c1014a5c6ca532305dd6bb/src/pyconcepticon/models.py#L119-L120
Since you defined NUMBER as integer, this check raises a TypeError, the stack trace of which is unfortunately swallowed here
https://github.com/concepticon/pyconcepticon/blob/5ce8ad7944f408c006c1014a5c6ca532305dd6bb/src/pyconcepticon/api.py#L486-L488

and implemented rest of the recommendations.
@martino-vic
Copy link
Contributor Author

martino-vic commented Oct 10, 2022

Cool, thanks for the help! Re-defined NUMBER as string and renamed col GLOSS to ENGLISH, also in the metadata. Corrected two faulty entries to which the error messages successfully pointed me. Now the tests are passing:

INFO concepticon/concepticon-data at /home/viktor/Documents/GitHub/concepticon-data
testing 410 concept lists
INFO all integrity tests passed: OK

@AnnikaTjuka
Copy link
Collaborator

Ok, great! It looks like the PR is ready for our usual review process of checking the mappings and metadata. @mathildavz and @Tarotis Could you be a reviewer for this PR?

Copy link
Collaborator

@FredericBlum FredericBlum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll happily take over one of the review. Thank you @martino-vic for adding this concept list. Many times I find the mapping difficult because of the blurry meanings involved with the historical data.This is why I would thus propose to unmap in some cases, as you can see in my comments. I also had doubts on two cases, where I tagged @AnnikaTjuka to provide guidance.

concepticondata/concepticon.tsv Outdated Show resolved Hide resolved
concepticondata/references/references.bib Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
removed 1972 KID once and replaced it once with YOUNG GOAT (KID)
@AnnikaTjuka
Copy link
Collaborator

Many thanks for the review, @Tarotis! I left some comments. @martino-vic In some cases, it would be helpful if you could give us a bit more detail about the list to disambiguate the mappings.

@AnnikaTjuka
Copy link
Collaborator

@martino-vic Could you please run the concepticon test again? The checks seem to fail again.

Changed the description: Removed "Cuman", since this was resolved in LoanpyDataHub/ronataswestoldturkic#13 and mentioned the author of the original source, for clarity.
@martino-vic
Copy link
Contributor Author

Had ERROR:: conceptlist missing in conceptlists.tsv: ignore.tsv. I had placed the ignore.tsv (#1240 (comment)) into the wrong folder. Fixed this in dc9e153. Now the tests should pass, since I get:

testing 410 concept lists
INFO all integrity tests passed: OK

@LinguList
Copy link
Contributor

LinguList commented Oct 13, 2022 via email

@AnnikaTjuka
Copy link
Collaborator

@Tarotis Could you check that all the changes you requested have been implemented and approve the PR if so? I'll do a final check afterward.

Copy link
Collaborator

@FredericBlum FredericBlum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a small amount of comments, primarily with questions and some comments on the bib-file :)

concepticondata/conceptlists.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/conceptlists/RonaTas-2011-431.tsv Outdated Show resolved Hide resolved
concepticondata/references/references.bib Outdated Show resolved Hide resolved
concepticondata/references/references.bib Show resolved Hide resolved
@martino-vic
Copy link
Contributor Author

Thank you for the helpful comments @Tarotis. I implemented the recommendations during the last PR. The tests seem to be passing too.

Copy link
Collaborator

@FredericBlum FredericBlum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work put into this, green light from my side now :)

@martino-vic martino-vic requested review from mathildavz and AnnikaTjuka and removed request for mathildavz and AnnikaTjuka October 14, 2022 11:08
Copy link
Collaborator

@AnnikaTjuka AnnikaTjuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience and for resolving all comments, @martino-vic! Feel free to merge now. And congratulations on submitting your first list!

@LinguList LinguList merged commit 423a4c1 into concepticon:master Oct 14, 2022
@LinguList
Copy link
Contributor

I just merged, so @martino-vic can concentrate on the blogpost, which is important for the dissertation ;-) Thanks to all of you for the patience, great help, and enthusiasm to discuss so many details!

@martino-vic
Copy link
Contributor Author

Thanks everybody for the great cooperation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

West Old Turkic
6 participants