Concept list for chaconbaniwa #807

Closed
tresoldi opened this issue Mar 31, 2020 · 14 comments · Fixed by #904

@tresoldi
Contributor

The Lexibank chaconbaniwa dataset currently uses a local concept list, because the data were still unpublished when it was first prepared.

The paper was published last year (https://revistas.unal.edu.co/index.php/formayfuncion/article/view/80814), and the local concept list (https://github.com/lexibank/chaconbaniwa/blob/master/etc/concepts.csv) should now be added to Concepticon.

@chrzyki
Contributor

chrzyki commented Mar 31, 2020

Great, thanks! I'll have a go at this.

@tresoldi
Contributor Author

I was checking, and there are some errors. I'll review everything and paste them here (or would you prefer to ping me during the review?).

@chrzyki
Contributor

chrzyki commented Mar 31, 2020

If you don't mind pasting them here that would be nice, thanks!

@tresoldi
Contributor Author

tresoldi commented Mar 31, 2020

236_withpedro | with pedro | 1340 | WITH |  

This is not "with" but indeed "with Pedro", as is clear from the forms. It is a single concept, but not what we have in Lexibank as a single word/morpheme.

113_homeathome | home (at home) | 1460 | IN | em casa

Same thing here: it is indeed "at/in home".

215_hometohome | home (to home) | 2754 | TOWARDS | casa (para casa

Again, all forms are "to home" and not "to/towards".

152_personcivilized | person, civilized | 2661 | PEOPLE OR PERSON | Pessoa Civilizado

This is closer to "FOREIGN PERSON" than to PEOPLE OR PERSON in general, so it should be unlinked (also considering that it is a single form).

238_worm | worm | 1219 | WORM |  

The correct Portuguese gloss for this one is "verme". It is wrong in the source, but it is clearly an Excel artefact, as it comes right after "vermelho".

155_throw | throw | 1413 | PLAY | jogar

This is an artefact from the Portuguese ("jogar" means both "to play" and "to throw"); in English it is, of course, THROW.

@chrzyki
Contributor

chrzyki commented Apr 1, 2020

Thanks, working on this right now. A couple of questions for you @tresoldi:

  • 9_porrige is unmapped and its English gloss seems to be a typo; judging from the Portuguese gloss, this might be 3651 MUSH (FOOD) or similar. Can you confirm this?
  • The English gloss of 10_sand seems somewhat unrelated to the Portuguese gloss Praia and the BEACH mapping.
  • Can 35_boychild be changed from 2391 BOY OR SON to 1366 BOY?
  • 51_devil is mapped to 1973 DEMON but it seems to better leave this unmapped?
  • 84_jungle being mapped to 420 FOREST makes me think we might want to have a JUNGLE concept? There are other candidates (jungles mapped to FOREST) in Concepticon already.
  • 93_herbgrass might need a HERB OR GRASS gloss? It is mapped to 606 GRASS right now.
  • 100_louse (piolho) is mapped to 310 HEAD LOUSE but it seems more general, i.e. 1392 LOUSE?
  • 108_hot is mapped to HOT OR WARM, could this just be mapped to 1286 HOT?
  • 152_personcivilized is mapped to 2661 PEOPLE OR PERSON but it seems to have the connotation of 'civilised person' (as opposed to 'uncivilised person'). Should this be unmapped?

Sorry if any of these remarks don't make much sense to a native speaker. :)

@chrzyki
Contributor

chrzyki commented Apr 1, 2020

And another question:

The paper states that 220 concepts were used in the study, but the list has 243 items. Do you know where the additional 23 items come from? Unfortunately, the link to the supplements download doesn't work for me.

@tresoldi
Contributor Author

tresoldi commented Apr 1, 2020

  • 9 "mingau" is a type of porridge: https://pt.wikipedia.org/wiki/Mingau
  • On 10 "sand", "praia" is indeed beach. I am not 100% sure, but from the data it does seem to mean beach (even though in the Amazon it may very well refer to a river beach, which I don't think matches the definition in Concepticon). As the list was collected in Portuguese and the authors know Concepticon, I was letting the mapping stay closer to the Portuguese in all cases.
  • On "boy / son", I think it can be changed to "boy": while the same word would be used in most situations, we can't really say they colexify.
  • On "devil", I had the same doubt. Mapping it to DEMON would make sense if it allowed us to compare different lists for the same geographic area, but I'd probably leave it unmapped.
  • For jungle, I agree with you. There is a certain distinction between "mata" and "floresta" in Portuguese, but I don't know how much it is mirrored in the questionnaire, not to mention in the languages.
  • For herb or grass, they colexify in Portuguese. Not sure what would be the best approach -- HERB OR GRASS makes sense, but do we want a new superconcept only for this case?
  • "piolho" is indeed "head louse" in Portuguese (many people can use it for crab louse as well, though... But I think in Northern Brazil it's more common to call crab lice "chato" ["(the) boring (one)"])
  • I think HOT OR WARM is better; they colexify in Portuguese.
  • For the CIVILIZED PERSON, yes, it is better to unmap -- it might actually be just FOREIGNER, but the closest I found in Concepticon is WHITE MAN, which is too far semantically (and I'd prefer not to link CIVILIZED, which is already problematic as a concept, to WHITE MAN)

Feel free to ask more, I probably found lots of things obvious because I am a native speaker. 😉

@tresoldi
Contributor Author

tresoldi commented Apr 1, 2020

I checked the number of items and have no idea either. We could ask Chacon...

@chrzyki
Contributor

chrzyki commented Apr 1, 2020

Pinging @LinguList: do you maybe know about the difference in the number of items in this concept list? I'll keep trying to get the supplementary materials from the website.

@LinguList
Contributor

I think it is best to ask @thiagochacon.

@LinguList
Contributor

But we can also ignore it: if the paper says there are 220 items, we have to call the list Chacon-2019-220, as we did with Matisoff-1978-200, although that list has well over 210 items, as it often has variant a, variant b, etc. So we list all items and mention this in the note field of conceptlists.tsv.

@chrzyki
Contributor

chrzyki commented Apr 1, 2020

Alright, I'll proceed with this then and see whether I can get the supplements, so that detailed information can be included in the concept list's description.

@thiagochacon

Hi guys, I am available should you have any further questions, but I totally agree with the decisions made above.

@LinguList
Contributor

LinguList commented Apr 1, 2020 via email

4 participants