how to change phonology/morphology on SignBank #1219

Open
rosestamp opened this issue Apr 13, 2024 · 28 comments

@rosestamp
Collaborator

new_ISL_lemma_updates (1).csv

I am trying to upload this CSV in order to update existing entries on our ISL dataset with new morphological and phonological information.
When I try "Import CSV Update Existing Glosses", it states: "Attempt to update Lemma translations. Use Import CSV Lemma Update instead."
When I do what it suggests using Lemma Update, it states:
The header row of the csv file looks like this: Handedness, Strong hand, Weak hand, Strong hand letter, Contact type, Location, Movement direction, Movement Shape, Relation between Articulators, Handshape Change, Repeated movement, Alternating movement, 42719, ISL, DUCK, 2s, B, no, , neutral space, to and fro, straight, next to, Yes, yes, 42687, DEER, W, beak2_open_spread, initial, forehead, forwards, arc, No

I do not understand what is wrong with my file.
Please let me know. Thanks!

@susanodd
Collaborator

susanodd commented Apr 15, 2024

@rosestamp the easiest fix is to remove the columns you do not need to update from the CSV file.
Update only needs the Signbank ID and Dataset columns and the specific columns you are updating.
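
A minimal sketch of trimming a CSV down to the columns an update needs, using only the Python standard library. The column names Signbank ID, Dataset, Handedness and Location appear in this thread; the "Lemma" column name and the helper `keep_columns` are hypothetical, and none of this is Signbank's own code.

```python
import csv
import io


def keep_columns(csv_text, wanted):
    """Return csv_text reduced to the columns named in `wanted`, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `wanted`.
    writer = csv.DictWriter(
        out, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Illustrative data; "Lemma" is an assumed name for the column to drop.
sample = (
    "Signbank ID,Dataset,Lemma,Handedness,Location\n"
    "42719,ISL,DUCK,2s,neutral space\n"
)
trimmed = keep_columns(sample, ["Signbank ID", "Dataset", "Handedness", "Location"])
print(trimmed)
```

The same function can be pointed at a real file by reading it into a string first; the spreadsheet program can of course do the same job by deleting columns by hand.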

Here is the file with the headers fixed and that lemma column removed.

new_ISL_lemma_updates.1e.csv

However, it still has capitalisation problems in some of the fields. (You will see this if you run "Import CSV Update Existing Glosses" on the updated file.)

May I ask, how did you enter the data? I haven't seen this before without caps. Did the spreadsheet program do this?

[I will see if I can revise the code to accept choice field values even when the first letter is not capitalised.
It is easy to fix but a bit annoying.]

@susanodd
Collaborator

susanodd commented Apr 16, 2024

IMPLEMENTATION CODE COMMENTS

I found it!! iexact

I'm wondering why we didn't use this from the start. (Although I don't know whether there is actually, e.g., a difference between b and B in the field choices.)
A test will be needed on the existing field choices to detect multiple matching objects, in case case is significant in the choices.
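
A minimal sketch of the kind of test described above, written in plain Python so it runs without a database. Django's `iexact` field lookup does the comparison in the database; here `str.casefold()` only approximates it. The helper names and the lowercase "b" entry are made up for illustration, and the real check would run against the FieldChoice table.

```python
from collections import defaultdict


def case_collisions(choices):
    """Group choice names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in choices:
        groups[name.casefold()].append(name)
    # Only groups with more than one spelling are a problem for iexact-style lookups.
    return {key: names for key, names in groups.items() if len(names) > 1}


def lookup_iexact(choices, value):
    """Return the single choice matching `value` case-insensitively, else raise."""
    matches = [name for name in choices if name.casefold() == value.casefold()]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {value!r}, found {matches}")
    return matches[0]


# Illustrative list; the lowercase "b" is a made-up entry to force a collision.
handshapes = ["B", "b", "Baby_beak", "1_curved"]
print(case_collisions(handshapes))
print(lookup_iexact(["Next-to", "Neutral space"], "next-to"))
```

If `case_collisions` returns an empty dict for every field, a case-insensitive lookup cannot return more than one object for that field.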

BACKGROUND
@Woseseltops @vanlummelhuizen any ideas here? This is the first time this has come up. This is about the case of the field choices. If the input doesn't match the case in the database exactly, the retrieval fails. It looks like others also have this problem, as somebody has implemented a new kind of Django field for this:

https://github.com/iamoracle/django_case_insensitive_field

The Tags model (prefab) is case sensitive as well. If you create new tags that differ in case, they are different tags.
In this issue, the case of the field choice values does not match those in the database because they don't start with a cap or differ in spacing.
This is only a problem with "user" input of the values. All of the Signbank templates use Model Form choices. This will potentially be a problem with the API as well. The API Gloss Update error code needs to report the mismatched values when the case differs, or when case is significant and multiple matches are found.

susanodd self-assigned this Apr 16, 2024
@susanodd
Collaborator

susanodd commented Apr 16, 2024

Hmmmm. The iexact will need to be on a model translation multilingual name field for the FieldChoice model.

@susanodd
Collaborator

I revised the code to use iexact.
Far fewer field choices fail to match now.
But this one needs to be changed:

@rosestamp:
For DUCK (42719), could not find option next to for Relation between Articulators
(There are about 800 rows with this error.)

There is a choice "Next-to". It requires the hyphen. (You can use next-to now, without the cap. But the code is not yet live.)

I'll put the revision up asap.

@susanodd
Collaborator

susanodd commented Apr 16, 2024

@rosestamp there are also rows that update the same gloss. There should only be one row per gloss ID. (This is to prevent problems with conflicting updates in different rows.) You can sort the spreadsheet by Signbank ID to detect these.
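
A minimal sketch of finding repeated gloss IDs without sorting the spreadsheet by hand. The Signbank ID column name and the value 2s come from this thread; the helper `duplicate_ids` and the other sample values are made up for illustration.

```python
import csv
import io
from collections import Counter


def duplicate_ids(csv_text, id_column="Signbank ID"):
    """Return the IDs that occur on more than one row, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column].strip() for row in reader)
    return sorted(gloss_id for gloss_id, n in counts.items() if n > 1)


# Illustrative rows; 42719 appears twice with conflicting Handedness values.
sample = (
    "Signbank ID,Dataset,Handedness\n"
    "42719,ISL,2s\n"
    "42687,ISL,1\n"
    "42719,ISL,2a\n"
)
print(duplicate_ids(sample))
```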

@susanodd
Collaborator

@rosestamp here's another one:

For CORRECT (42658), could not find option downwards + contralateral for Movement Direction

This needs to be a > instead of a + to match.

You could ask @ocrasborn if your research needs this to be different.

susanodd added a commit that referenced this issue Apr 16, 2024
@uklomp
Collaborator

uklomp commented Apr 16, 2024

@susanodd those kinds of changes can come to me now :) There happens to be a difference between the > and the + categories.

@rosestamp I will change downwards + contralateral/ipsilateral (which is a weird category anyway) to downwards + contralateral.

@susanodd
Collaborator

> @susanodd those kinds of changes can come to me now :) There happens to be a difference between the > and the + categories.
>
> @rosestamp I will change downwards + contralateral/ipsilateral (which is a weird category anyway) to downwards + contralateral.

Great! I have no idea which symbols are just syntax and which carry meaning. Thanks.
There are a few more that didn't match any field choices.

susanodd added a commit that referenced this issue Apr 17, 2024
#1219: Case insensitive CSV input field choices, handshapes, semantic…
@susanodd
Collaborator

@rosestamp the CSV import is now case insensitive for the choice fields.
So your file will give far fewer feedback errors: only when no match is found, as in the examples above.

@susanodd
Collaborator

@uklomp are there other fields where the syntax of the choice can vary?
For example, the Next-to above. If the - is only syntax and some people don't use it, I can code it so it also looks that up.
(To match Next-to as well as Next to.)
I modified the code so it's case insensitive for the choices now. What about the use of _ in the names? Could that also be used with a space instead?

@uklomp
Collaborator

uklomp commented Apr 17, 2024

next-to and next to would indeed be the same.
The ">", + and / are not interchangeable in most cases. For the rest, I can't think of any examples where it matters.
The underscore in names also doesn't seem very important, but which names do you mean? Names of the fields?
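
A minimal sketch of a comparison key that treats a hyphen and a space as equivalent while keeping >, + and / distinct, following the distinction drawn above. The helper `choice_key` is hypothetical; underscores are deliberately left alone, and a real implementation would need to check that no two existing choices differ only by a hyphen.

```python
import re


def choice_key(value):
    """Key under which 'Next-to', 'next to' and 'NEXT-TO' compare equal."""
    # Collapse any run of hyphens and whitespace into a single space.
    collapsed = re.sub(r"[-\s]+", " ", value.strip())
    return collapsed.casefold()


print(choice_key("Next-to") == choice_key("next to"))
# The operators are untouched, so these two stay different.
print(choice_key("downwards + contralateral") == choice_key("downwards > contralateral"))
```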

@susanodd
Collaborator

susanodd commented Apr 17, 2024

> next-to and next to would indeed be the same. The ">", + and / are not interchangeable in most cases. For the rest, I can't think of any examples where it matters. The underscore in names also doesn't seem very important, but which names do you mean? Names of the fields?

Like in the choices for e.g., Strong Hand:

1_curved
Baby_beak

...

Do researchers use any other notation for the _ ?

@uklomp
Collaborator

uklomp commented Apr 18, 2024 via email

@rosestamp
Collaborator Author

rosestamp commented Apr 22, 2024

Thank you. I updated 'next to' to 'next-to'. I understood that the other changes were made, but maybe I missed something? When I upload the file now, there are still multiple errors saying that options, for example "fingertips" for "location", are not found. Is there a more general problem I am missing?
For the duplicate rows, I removed them.
Here is my file and here is the screenshot when I tried to upload it:
new_ISL_lemma_updates.1e.csv

Screenshot 2024-04-22 at 12 14 33

@susanodd
Collaborator

susanodd commented Apr 22, 2024

If you look at the page for uploading, there is a scroll bar where it shows a pull-down list of choices for each field.
https://signbank.cls.ru.nl/signs/import_csv_update/
If you see that the syntax is different (as for the example above with the plus sign), then @uklomp can change that or add a choice.

If the choices are in the pull-down, then it could be something with extra spaces or no spaces around the symbols? (I will check this.)

If more than one choice matches, then that needs to be corrected in the system. (The names should be unique. But it could be that we didn't notice there are duplicates.)

If none of those are the case, then there is something going on with the query search. (That would be a bug. There are choices where some are prefixes of others. So it could be that a prefix matches or something and it returns multiple instead of a unique result. It needs to obtain a unique choice.)

The example choice lists are not sorted alphabetically, so this is also not good. (I'll fix that.)

@susanodd
Collaborator

susanodd commented Apr 22, 2024

@rosestamp @uklomp
There is no choice 2n for Handedness.
There is no choice U for Handshape (Strong Hand and Weak Hand).
There is no choice X for Handshape, but there is a choice X for Handedness.

For Location, it's Weak hand: finger tips (not fingertips).
For Weak Hand, it's C2_closed (not C2-closed).
For Weak Hand, it's 1_curved (not 1-curved).

@susanodd
Collaborator

susanodd commented Apr 22, 2024

@rosestamp another place you can see the existing choices for fields is on the Analysis > Frequencies page.
They are sorted there.

For Movement Direction, it's Ipsilateral + up and down (not >) (gloss SPICY).

susanodd added a commit that referenced this issue Apr 22, 2024
@rosestamp
Collaborator Author

Can I just ask whether spaces matter between words like 'upwards' and > or +, etc.?
If yes, can this be changed?

@susanodd
Collaborator

> Can I just ask whether spaces matter between words like 'upwards' and > or +, etc.?
> If yes, can this be changed?

It's because that's how they were defined when created by @ocrasborn.

I shall add some additional parsing to allow them without spaces. (There can only be one form in the list of choices in the interface, in order to allow searching. So internally, the input will be mapped to the internal representation after the spaces around the + and > operators are stripped or added back.) I can see it's quite annoying as it is.

Is this also the case for the _? Do you use a - instead in your research? I'm guessing you write them for publication in a certain way.

If you use a different interface language, you can also check what the translations look like for the field choices, to see if any of those are written differently in practice. (I can only read the English and Dutch.)

At the moment, the CSV uses English for the values.
The API interface allows other languages now.

If you need operators themselves (the + and >) to be modified, @uklomp can do that.
I'll do the spaces.

@susanodd
Collaborator

@rosestamp I modified the code locally to also try to match the "+" and ">" with differing spacing.

But the values in the feedback below really don't match. (Some don't exist. Some have a + instead of a > or vice versa.) Can you browse these and see if you need extra choices, e.g., U or 2n? @uklomp can accommodate or discuss.

Import CSV Update Existing Glosses

For RABBIT (43043), could not find option U for Weak Hand

For BANANA1 (42539), could not find option motivated for Movement Shape

For PASTA (42997), could not find option U for Weak Hand

For GRAPE (42799), could not find option motivated for Movement Shape

For WINDOW (43288), could not find option motivated for Movement Shape

For BABY-CRIB (42530), could not find option ipsilateral > backwards for Movement Direction

For BIN (42559), could not find option upwards + forwards for Movement Direction

For SOAP (43136), could not find option proximal > distal for Movement Direction

For BASKET (42542), could not find option 2n for Handedness

For FLOWER (42769), could not find option upwards + forwards for Movement Direction

For NEIGHBOUR (42964), could not find option U for Weak Hand

For SLEEP (43127), could not find option U for Weak Hand

For WIPE (43291), could not find option proximal > distal for Movement Direction

For RIDE-ANIMAL (43069), could not find option U for Weak Hand

For LATER1 (42889), could not find option U for Weak Hand

For NEAR (42961), could not find option U for Weak Hand

For FIRE (42764), could not find option upwards/downwards for Movement Direction

For ANSWER (42514), could not find option U for Weak Hand

For SHOUT (43115), could not find option Ipsilateral + forwards for Movement Direction

For INSULTED (42858), could not find option upwards/downwards for Movement Direction

For IRON (42863), could not find option U for Weak Hand

For RETURN (44237), could not find option U for Weak Hand

For SPICY (43150), could not find option ipsilateral > up and down for Movement Direction

For DEODORANT (42689), could not find option ipsilateral and contralateral/downwards for Movement Direction

For EYEBROWS (42746), could not find option motivated for Movement Shape

For LECTURER (42894), could not find option unsure for Location

For SEWING-PIN (43101), could not find option backwards > upwards for Movement Direction

For WIG (43286), could not find option unsure for Location

For JUNE (42872), could not find option 2n for Handedness

For AUGUST (42524), could not find option 2n for Handedness

For SEPTEMBER (43099), could not find option 2n for Handedness

For SIP (43122), could not find option unsure for Location

For IMPOSSIBLE (42851), could not find option U for Weak Hand

For COME-CLOSER (42648), could not find option U for Weak Hand

For WRING (43296), could not find option forwards/backwards for Movement Direction

For IMPORTANT-NOT (42850), could not find option U for Weak Hand

For RUMOUR (43077), could not find option U for Weak Hand

For CHEERS (42621), could not find option contralateral > upwards for Movement Direction

For CHAIRPERSON (42615), could not find option unsure for Location

For REPRESENTATIVE (43062), could not find option 2n for Handedness

For DEPENDENT (42690), could not find option X for Weak Hand

For DIACRITICS (42693), could not find option motivated for Movement Shape
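
A minimal sketch of condensing feedback like the list above into distinct (field, option) pairs with counts, which makes it easier to see that only a handful of values are actually missing. The message format is taken from the lines above; the helper `summarise` and the regular expression are assumptions, not part of Signbank.

```python
import re
from collections import Counter

# Matches lines such as:
#   For RABBIT (43043), could not find option U for Weak Hand
# The greedy option group ends at the last " for " on the line.
PATTERN = re.compile(
    r"^For (?P<gloss>.+?) \((?P<id>\d+)\), "
    r"could not find option (?P<option>.+) for (?P<field>.+)$"
)


def summarise(feedback_lines):
    """Count how often each (field, option) pair is reported as not found."""
    counts = Counter()
    for line in feedback_lines:
        match = PATTERN.match(line.strip())
        if match:
            counts[(match["field"], match["option"])] += 1
    return counts


feedback = [
    "For RABBIT (43043), could not find option U for Weak Hand",
    "For PASTA (42997), could not find option U for Weak Hand",
    "For BASKET (42542), could not find option 2n for Handedness",
    "For BIN (42559), could not find option upwards + forwards for Movement Direction",
]
for (field, option), n in summarise(feedback).most_common():
    print(f"{field}: {option!r} ({n} rows)")
```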

@susanodd
Collaborator

FYI

input:       mouth>weak hand
normalised:  mouth > weak hand
input:       mouth>weak hand
normalised:  mouth > weak hand
input:       eye>neutral space
normalised:  eye > neutral space
input:       Chin>neutral space
normalised:  Chin > neutral space
input:       mouth>weak hand
normalised:  mouth > weak hand
input:       forehead>neutral space
normalised:  forehead > neutral space
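
A minimal sketch of a normalisation that would produce pairs like the ones above, assuming the internal form has exactly one space on each side of + and >. The function name `normalise_operators` is hypothetical and this is not the code from the commit. The / is left untouched, since values such as upwards/downwards appear in this thread without spaces.

```python
import re


def normalise_operators(value):
    """Put exactly one space on each side of '+' and '>'."""
    spaced = re.sub(r"\s*([+>])\s*", r" \1 ", value)
    # Collapse any doubled spaces and trim the ends.
    return re.sub(r"\s+", " ", spaced).strip()


for raw in ["mouth>weak hand", "eye>neutral space", "upwards  +forwards"]:
    print(f"input:       {raw}")
    print(f"normalised:  {normalise_operators(raw)}")
```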

@uklomp
Collaborator

uklomp commented May 2, 2024

Hi @rosestamp. I can change or add options to the drop-down menus, but I'd like to do that only in cases where it is necessary, and not e.g. a mismatch with the available options. To go through the errors:

  • The type '2n' for two-handed signs does not exist anymore. I believe we emailed about this previously. I feel it would be better to adapt to the types that exist now, for comparison between data sets. Do you agree?
  • Also, I'm not sure what handshape 'U' looks like, but could this be the 'N' in Signbank? If not, please let me know.
  • I'm surprised that 'motivated' for movement shape did not work. This should work, actually. @susanodd could you take another look at this one?
  • 'unsure' in location indeed does not work. I would suggest picking a location, or leaving the field empty.
  • 'X' for weak hand --> is this also a handshape or was something else meant here ?
  • I will go through the options where multiple movements are combined with + or > or / and see if they are available. If not, I will add them.

@rosestamp
Collaborator Author

Thank you, I managed to solve all of the errors now so thank you for your help and for solving these issues.

@uklomp
Collaborator

uklomp commented May 2, 2024

so, just to clarify, do I still need to look into the fields with > and + etc or did you find these as well?
And should Susan still look into the 'motivated' form ?

@rosestamp
Collaborator Author

Sorry, I didn't manage to keep up with all of the questions... What is the question about > and +? And the motivated form?

@susanodd
Collaborator

susanodd commented May 3, 2024

For this one,

For EYEBROWS (42746), could not find option motivated for Movement Shape

It should be "Motivated shape"

(You can see the choices in the Import CSV update example pull-downs. Those are computed dynamically when you view the page.)

@uklomp
Collaborator

uklomp commented May 7, 2024

> Sorry, I didn't manage to keep up with all of the questions... What is the question about > and +? And the motivated form?

See my last message with the bullet point list. I went through all the errors and described whether we needed to do something about each one, or whether you needed to change the input in the fields. Then you said you managed to solve everything, and my question is whether this means I don't need to check things like 'backwards > upwards' for movement direction anymore.

@rosestamp
Collaborator Author

Thanks! So I think it's all resolved. Yes, 'motivated' should have been 'motivated shape'. And yes, sometimes some > and + combinations don't exist, and if they don't, I guess they do need to be added. They are not interchangeable. But it's possible that a combination doesn't appear in NGT but does in ISL.
