En language in wordlist #331

Closed
monzug opened this issue Nov 18, 2021 · 22 comments
Comments

@monzug

monzug commented Nov 18, 2021

I had never seen the en language id in wordlist before yesterday, with the latest FF builds. Note that this happens only on the Scaife site.
As soon as I install the new FF build and do some lookups/double clicks, the words are saved in wordlist with the en language id. Irina, do you have any idea where this en language id comes from, and why only in Scaife? Is it the page language defined on the site? See attachment below.
As soon as I reset the page language to Greek or Latin in Options, I no longer get these errors.

english

By clicking on any word saved under the en language id, I get a bunch of errors. See below.

greek-data

@irina060981
Member

irina060981 commented Nov 18, 2021

Yes, it has en as the page language.
It is easy to check: open the context menu and select View Page Source.

image

You will see the page in non-rendered mode:
image

You can see that it is defined as en:
<html lang="en">

So it is not a bug, as the page language has priority.
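
For anyone who wants to check this without opening the page source by hand, here is a minimal sketch of reading the declared page language. The helper name `pageLanguage` is hypothetical and is not taken from the Alpheios code; inside a browser the same value is available as `document.documentElement.lang`.

```typescript
// Hypothetical helper (not part of Alpheios): read the lang attribute
// from the opening <html> tag of raw page source.
// In a browser, document.documentElement.lang gives the same value.
function pageLanguage(html: string): string | null {
  const match = /<html\b[^>]*\blang\s*=\s*["']?([A-Za-z-]+)/i.exec(html)
  return match ? match[1].toLowerCase() : null
}

// A page that declares English, as described above.
const samplePage = '<!DOCTYPE html>\n<html lang="en">\n<head></head>'
const detected = pageLanguage(samplePage) // "en"

// A page without a lang attribute yields null.
const undeclared = pageLanguage('<html><head></head></html>')
```

If the result is a code outside the supported list (such as en), that is the value that ended up as the language id in wordlist in this report.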

@monzug
Author

monzug commented Nov 19, 2021 via email

@irina060981
Member

No, I don't remember the number. Maybe you could search for it in the issues list?

@monzug
Author

monzug commented Nov 19, 2021 via email

@irina060981
Member

Yes, I think you are quite right

@monzug
Author

monzug commented Nov 19, 2021

Another one: alpheios-project/alignment-editor-new#328
But I think this is the one I was looking for: alpheios-project/alpheios-core#639

@monzug
Author

monzug commented Nov 19, 2021

I do not understand why words are saved in wordlist in both the en language list and the latin language list. Something is not quite right here.

Screen Shot 2021-11-19 at 12 55 10 PM

Screen Shot 2021-11-19 at 2 20 24 PM

@monzug monzug assigned irina060981 and unassigned monzug Nov 19, 2021
@monzug monzug added bug and removed question labels Nov 19, 2021
@monzug
Author

monzug commented Nov 19, 2021

Also, the show contexts icon is available for words in the en language list but not for the same words in the lat language list. See below.

show-context

@monzug monzug reopened this Nov 19, 2021
@irina060981
Member

I described the same problem here
#330

@irina060981
Member

Bug is fixed.
New Release is published.

@monzug
Author

monzug commented Nov 22, 2021

I still see the same issue: words are being saved in en and also in the Alpheios page language in wordlist, and the show contexts icon is available only for words in the en language list.
See the example of the same word repeated 3 times in wordlist: en as page language in Scaife, Greek as page language in Alpheios, Latin when I changed the page language to Latin.
@irina060981, is there anything that can be done to improve this? If English is the page language of the site but it's a language that we do not support (it's not Latin, Greek, Persian, Arabic, Chinese, Syriac or Ge'ez), would it be possible to not show it in wordlist?

English page language
Screen Shot 2021-11-22 at 2 49 16 PM

Italian page language
Screen Shot 2021-11-22 at 3 07 35 PM

What I do not like:

  1. Having both the page language list of words + the supported language list of words can cause loooong wordlists.
  2. The eng list of words has the show contexts icon but is missing the lemma, while the latin list of words has the lemma and the # of occurrences, but not show contexts.
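
To make the request above concrete, here is a rough sketch of hiding unsupported languages from the list. Everything in it is an assumption for illustration: the `WordItem` shape, the `visibleWordItems` name, and the three-letter codes chosen for the supported languages are not taken from the Alpheios code.

```typescript
// Hypothetical wordlist entry; the real Alpheios word item has more fields.
interface WordItem {
  languageCode: string
  targetWord: string
}

// Assumed codes for the supported languages listed above
// (Latin, Greek, Persian, Arabic, Chinese, Syriac, Ge'ez).
const SUPPORTED_LANGUAGES = new Set(['lat', 'grc', 'per', 'ara', 'zho', 'syr', 'gez'])

// Keep only entries whose language is in the supported set.
function visibleWordItems(items: WordItem[]): WordItem[] {
  return items.filter(item => SUPPORTED_LANGUAGES.has(item.languageCode))
}

const saved: WordItem[] = [
  { languageCode: 'en', targetWord: 'arma' },
  { languageCode: 'lat', targetWord: 'arma' }
]
const shown = visibleWordItems(saved) // only the lat entry remains
```

A filter like this would only hide the en list. It would not merge the duplicate entries or restore the missing lemma and show contexts data.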

@monzug monzug assigned irina060981 and unassigned monzug Nov 22, 2021
@irina060981
Member

I found a small bug in the language definition (I hope it is the last one). It is fixed now and I can no longer reproduce the problem.

Anyway, I will describe the source of the problem, in case you run into some other edge case.

There are two aspects:

  1. When the application prepares a word to get morphology data, it creates two objects: one for word information (TextSelector) and one for context information (TextQuoteSelector). The previous bug occurred when the application decided to change the word language (according to Add ability to define language of the text from the character's set alpheios-core#639) but in fact changed it only for one object (TextSelector) and not for the second (TextQuoteSelector). So the application first created a worditem for one object (TextQuoteSelector) and then for the other (TextSelector). In the normal case they would be merged automatically and we would end up with one worditem. So my fix is: the language is now updated in both objects.

  2. Each object (TextSelector) has two properties for storing language data: one for the plain language name (as is) and one for the formatted value from the supported list. (This happened historically, and our team has had no time to remove the duplication, but it is planned: refactor languageId to remove Symbols alpheios-core#6.) There was a case where a site defined a language that is not in the supported list (like en), so the application decided to use the Page language but updated only one language value, while the second was still en. The wordlist again registered two different words, one for the homonym and one for the context, because they use different language properties. So again my fix is to update both language values.

Now I hope I have found all the bugs for this issue: alpheios-project/alpheios-core#639
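
The two aspects above can be sketched roughly as follows. This is not the actual Alpheios code: the interfaces are simplified stand-ins for TextSelector and TextQuoteSelector, and the property names, `overrideLanguage` and `wordKey` are hypothetical.

```typescript
// Simplified stand-in for TextSelector: it stores the language twice.
interface WordSelector {
  text: string
  languageCode: string // plain code as found on the page, e.g. "en"
  languageId: string   // value from the supported list, e.g. "lat"
}

// Simplified stand-in for TextQuoteSelector (context information).
interface ContextSelector {
  text: string
  languageCode: string
}

// Apply a language override to both objects and both properties together,
// so that no part of the lookup keeps the stale page language.
function overrideLanguage(word: WordSelector, context: ContextSelector, code: string): void {
  word.languageCode = code
  word.languageId = code
  context.languageCode = code
}

// Hypothetical wordlist key: language plus word form.
const wordKey = (languageCode: string, text: string): string => `${languageCode}:${text}`

const word: WordSelector = { text: 'arma', languageCode: 'en', languageId: 'en' }
const context: ContextSelector = { text: 'arma', languageCode: 'en' }
overrideLanguage(word, context, 'lat')

// Both keys now agree, so the two records can be merged into one worditem.
const sameItem = wordKey(word.languageId, word.text) === wordKey(context.languageCode, context.text)
```

If only one of the three assignments were made, the keys would differ and the same word would be stored under two languages, which matches the duplicated lists reported in this issue.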

@monzug
Author

monzug commented Nov 23, 2021 via email

@monzug
Author

monzug commented Nov 23, 2021

No more en language. Great. Thanks.
Still to be retested in Chrome and Safari.

@monzug
Author

monzug commented Nov 30, 2021

Actually, I just experienced this problem in Loeb Classics. In FF/PC I used loebclassic for the first time in a really long time: I selected a Latin text, clicked on a few words, checked wordlist, and the words are saved in two different language lists, la and lat. The la language list has the show context link, while the lat language list has the number of occurrences and the lemmas.
See attachment.
duplicate language

Also, the link from wordlist for any word saved under la does not work: the "lexical data is loading" pop-up is generated and I need to kill the pop-up. @irina060981, any idea?
mettalique

@monzug monzug reopened this Nov 30, 2021
@monzug monzug assigned irina060981 and unassigned monzug Nov 30, 2021
@monzug monzug removed the verified label Nov 30, 2021
@monzug
Author

monzug commented Nov 30, 2021

Another example of the la and lat language lists:

due-la-lat-languages

@irina060981
Member

irina060981 commented Dec 1, 2021

It is because the block language is defined as "la", a shorter variant of the "lat" that we use.
I will check why it is not handled correctly.

@monzug
Author

monzug commented Dec 1, 2021

Thanks. I figured that la is the short form of lat. Is there a more general way to prevent similar scenarios from happening?

@irina060981
Member

Such problems come from our using two different forms of the language internally (I described it in previous comments here).
I added normalization for the language code; it will work for all similar cases.

I will upload it to releases later.
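
A language code normalization of this kind might look roughly like the sketch below. The alias table and the `normalizeLanguageCode` name are assumptions for illustration only, not the table or function used in Alpheios.

```typescript
// Illustrative alias table (an assumption, not the Alpheios list):
// two-letter codes mapped to the three-letter codes used internally.
const LANGUAGE_ALIASES = new Map<string, string>([
  ['la', 'lat'],
  ['ar', 'ara'],
  ['fa', 'per'],
  ['zh', 'zho']
])

// Hypothetical normalizer: trim, lower-case, drop any region subtag,
// then resolve a known alias; unknown codes pass through unchanged.
function normalizeLanguageCode(code: string): string {
  const base = code.trim().toLowerCase().split(/[-_]/)[0]
  return LANGUAGE_ALIASES.get(base) ?? base
}

const fromBlock = normalizeLanguageCode('la')  // "lat"
const fromLookup = normalizeLanguageCode('lat') // "lat"
```

With both spellings reduced to one code before a word is stored, la and lat entries would land in the same language list.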

@irina060981
Member

Fixed

@monzug
Author

monzug commented Dec 1, 2021

Tested in FF (PC and Mac) and Chrome/Mac. Fixed.

@monzug monzug removed their assignment Dec 1, 2021
@monzug monzug removed the testcase label Dec 1, 2021