Errors in data by Urban-2011-160 #1338

LinguList · 2023-11-10T19:44:54Z

I tried to be consistent, but a recent check showed spelling errors in my annotation of semantic changes in Urban-2011-160.
@xrotwang, we have a situation in this list that may warrant intervention: we need to decide how to handle these cases. My format is clear and can be parsed, and it does not throw errors apart from the typos in my glosses, but we may want to model these cases in JSON to avoid such problems. If you have an idea how to handle this list, it would be very useful.

LinguList · 2023-11-10T19:50:13Z

To explain: I added a column semantic_change to the data, which in extreme cases looks like this:

[3] «smoke» > «fog/mist» (11 polysemies, 3 overt markings); [7] «smoke» > «dust» (8 polysemies, 4 overt markings); [8] «smoke» > «cloud» (7 polysemies, 2 overt markings)

The first-order split is on ";". Each item of the resulting list describes a semantic change from the base concept ("smoke") to another concept:

[3] «smoke» > «fog/mist» (11 polysemies, 3 overt markings)

Parsing can be done with a regex or by other means, but obviously this representation only works if an explanation is given. My tests also showed that the concept glosses, which serve as keys to other items in the list (fog/mist has a separate row), fail in three cases:

{
    "mirrow": "mirror",
    "straw/hay": "straw",
    "cheeck": "cheek",
}

While these spelling errors are easily corrected, I wonder if we can make a consistent typical network link inside a concept list, that refers to another concept and adds (arbitrary) information. Should I try JSON?
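As a rough illustration of the format described in this comment, the sketch below parses a semantic_change cell with a regex and lists the glosses that have no row of their own. The function names, the field names (`number`, `base`, `other`) and the `known_glosses` set are assumptions made for this example; the meaning of the bracketed number is not stated in the thread, so it is simply kept as an integer.

```python
import re

# One item looks like: [3] «smoke» > «fog/mist» (11 polysemies, 3 overt markings)
# Field names are illustrative assumptions, not part of the original data format.
ITEM = re.compile(
    r"\[(?P<number>\d+)\]\s*"
    r"«(?P<base>[^»]+)»\s*>\s*«(?P<other>[^»]+)»\s*"
    r"\((?P<polysemy>\d+) polysem(?:y|ies), (?P<overt_marking>\d+) overt markings?\)"
)


def parse_semantic_change(cell):
    """Split a semantic_change cell on ';' and parse each item into a dict."""
    items = []
    for chunk in cell.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = ITEM.fullmatch(chunk)
        if match is None:
            raise ValueError(f"cannot parse item: {chunk!r}")
        item = match.groupdict()
        for key in ("number", "polysemy", "overt_marking"):
            item[key] = int(item[key])
        items.append(item)
    return items


def unknown_glosses(items, known_glosses):
    """Return the linked glosses that do not occur among the known row glosses."""
    return sorted({i["other"] for i in items if i["other"] not in known_glosses})


cell = (
    "[3] «smoke» > «fog/mist» (11 polysemies, 3 overt markings); "
    "[7] «smoke» > «dust» (8 polysemies, 4 overt markings); "
    "[8] «smoke» > «cloud» (7 polysemies, 2 overt markings)"
)
parsed = parse_semantic_change(cell)
```

Calling `unknown_glosses(parsed, glosses_in_list)` with the set of glosses that actually have a row would surface typos such as "mirrow" before they break a link.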

xrotwang · 2023-11-10T19:58:32Z

I'll have a look. I think this could be done with JSON and the link syntax from CLDF markdown.

LinguList · 2023-11-24T12:01:32Z

I have a concrete proposal for how to deal with this, using a new example from Winter-2022-102.
My JSON now looks as shown below:

ID	NUMBER	ENGLISH	CONCEPTICON_ID	CONCEPTICON_GLOSS	SOURCES	TARGETS
Winter-2022-102-1	1	cloud	1489	CLOUD	[{"name": "smoke", "id": "Winter-2022-102-1", "overt_marking": 2, "polysemy": 7}, {"name": "sky", "id": "Winter-2022-102-85", "overt_marking": 2, "polysemy": 8}, {"name": "rain", "id": "Winter-2022-102-84", "overt_marking": 2, "polysemy": 4}]	[{"name": "fog/mist", "id": "Winter-2022-102-2", "overt_marking": 7, "polysemy": 24}, {"name": "day", "id": "Winter-2022-102-19", "overt_marking": 3, "polysemy": 2}, {"name": "sky", "id": "Winter-2022-102-85", "overt_marking": 11, "polysemy": 8}, {"name": "rain", "id": "Winter-2022-102-84", "overt_marking": 2, "polysemy": 4}]

LinguList · 2023-11-24T12:02:53Z

So I have links for sources and targets (we could reduce this to one of them). A source node contains the ID of the source (Concepticon-Conceptlist-Entry-ID), the name of the concept, and further properties that would be properties of the edge from the source to the current node.
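To make the node and edge reading of this layout concrete, here is a minimal sketch that loads a shortened version of the row above and turns the SOURCES and TARGETS cells into edges. The `edges` helper and the decision to treat every key other than "id" and "name" as an edge property are assumptions for illustration, not an agreed design.

```python
import csv
import io
import json

# A shortened row in the tab-separated layout shown above (values copied from the thread).
TSV = (
    "ID\tNUMBER\tENGLISH\tSOURCES\tTARGETS\n"
    "Winter-2022-102-1\t1\tcloud\t"
    '[{"name": "sky", "id": "Winter-2022-102-85", "overt_marking": 2, "polysemy": 8}]\t'
    '[{"name": "fog/mist", "id": "Winter-2022-102-2", "overt_marking": 7, "polysemy": 24}]\n'
)


def edges(rows):
    """Yield (source_id, target_id, properties) for each link in SOURCES and TARGETS."""
    for row in rows:
        for link in json.loads(row["SOURCES"]):
            # A source link points from the linked concept to the current row.
            props = {k: v for k, v in link.items() if k not in ("id", "name")}
            yield link["id"], row["ID"], props
        for link in json.loads(row["TARGETS"]):
            # A target link points from the current row to the linked concept.
            props = {k: v for k, v in link.items() if k not in ("id", "name")}
            yield row["ID"], link["id"], props


# QUOTE_NONE keeps the double quotes inside the JSON cells untouched.
reader = csv.DictReader(io.StringIO(TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
result = list(edges(reader))
```

Because links are keyed by entry ID rather than by gloss, a misspelt "name" no longer breaks the link; it only affects the human-readable label.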

eva-dlce-zenodo · 2023-11-24T12:10:34Z

I'd like to make the function of "id" here more explicit, borrowing syntax from CLDF markdown: we could use "ValueTable#cldf-Winter-2022-102-1" as the value for "id", and maybe call the field valueReference?

LinguList · 2023-11-24T12:12:05Z

Ah, okay, easy to do.

LinguList · 2023-11-24T12:12:34Z

I'll prepare a PR for both Urban's previous dataset and Winter-2022-102 when I find time.

xrotwang · 2023-11-24T12:13:24Z

It would need to be FormTable and formReference, though. That's how we model glosses (i.e. items in concept lists) in concepticon-cldf: https://github.com/concepticon/concepticon-cldf/tree/main/cldf#table-glossescsv
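Combining the two suggestions, a link entry could be rewritten roughly as follows. This is only a sketch: the helper name is hypothetical, and the reference string reuses the "#cldf-" form quoted earlier in this thread with FormTable substituted, which may not be the final syntax.

```python
def with_form_reference(link, table="FormTable"):
    """Return a copy of a link dict with "id" replaced by a "formReference" value.

    The "<table>#cldf-<id>" string follows the form quoted in this thread;
    it is an assumption here, not a confirmed CLDF convention.
    """
    updated = {k: v for k, v in link.items() if k != "id"}
    updated["formReference"] = f"{table}#cldf-{link['id']}"
    return updated


link = {"name": "smoke", "id": "Winter-2022-102-1", "overt_marking": 2, "polysemy": 7}
converted = with_form_reference(link)
```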

xrotwang · 2023-11-24T12:21:13Z

Btw: in the current concepticon CLDF data, we have no standard way to refer to a "Concept", i.e. the set of all glosses for the same concept in one concept list. In your example above, that wouldn't be a problem, I think, because referring to the particular gloss (i.e. the concept in a particular language) is the right thing to do. But there may be cases where we want to refer to a concept with many glosses in the same concept list, e.g. https://concepticon.clld.org/values/Luniewska-2016-299-2

LinguList · 2023-11-24T12:49:58Z

Would the tabular representation not restrict the link anyway to a row which is a concept? I think for the Multi-Simlex-Data, we may have another version, where we'd want to link to a cell in the tabular data, which would then be not a concept, but a gloss?

xrotwang · 2023-11-24T13:13:25Z

Ah, yes, if the data is represented in tabular form that could be made explicit through the metadata. I was thinking of something intermediate - i.e. some sort of JSON with some CLDF semantics.

LinguList added question errata labels Nov 10, 2023

xrotwang mentioned this issue Nov 24, 2023

Add a ParameterNetwork component cldf/cldf#140

Closed

LinguList closed this as completed Dec 22, 2023
