Alternate script missing for Chinese sentences #1784

Closed
trang opened this Issue Feb 16, 2019 · 6 comments

@trang
Member

trang commented Feb 16, 2019

Reported by sharris123 on the Wall:

Also, it would be really helpful if simplified and traditional Chinese could be somehow separated. A lot of otherwise simple sentences take me a while to translate because I have to change it to simplified chinese on google translate first.

My reply:

Actually we used to automatically generate the sentences in the other script for Chinese.
Like this sentence for instance: https://tatoeba.org/eng/sentences/show/7776103

If the sentence was in simplified, it would have the traditional version in grey. If it was in traditional, it would have the simplified version in grey.

I don't think we ever decided to remove this feature, so it must have broken at some point...

@jiru jiru added the regression label Feb 18, 2019

@jiru

Member

jiru commented Feb 24, 2019

This bug only affects some of the Chinese sentences. Chinese transcriptions stored in the transcriptions table are displayed correctly. However, for Chinese sentences without transcriptions, we used to generate their transcription on the fly, using the afterFind() method of the Transcriptable behavior. That code isn’t working any more.

This means the update to CakePHP3 uncovered a lot of Chinese sentences that were lacking a transcription. Such sentences are problematic because they cannot be found by searching from their alternative script. So I wonder if it’s a good idea to keep the previous behavior of generating transcriptions on the fly, because it has been hiding a problem for a long time. Instead, I think it’s better to find a way to make sure that transcriptions are always being added no matter what.
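
To illustrate why a missing stored transcription hurts search, here is a minimal sketch. It is not Tatoeba's search code: the in-memory dictionaries, the `search` helper and the plain substring matching are all assumptions made for illustration.

```python
# Toy model: a sentence is only findable through text that is actually stored.
# Names and data layout are hypothetical, not Tatoeba's schema or search engine.

sentences = {
    1: "我愛你",  # traditional script, has a stored simplified transcription
    2: "我愛貓",  # traditional script, stored transcription is missing
}
transcriptions = {
    1: "我爱你",  # simplified version of sentence 1
}


def search(query):
    """Return ids of sentences whose text or stored transcription contains query."""
    hits = []
    for sentence_id, text in sentences.items():
        stored = transcriptions.get(sentence_id, "")
        if query in text or query in stored:
            hits.append(sentence_id)
    return hits


print(search("愛"))  # traditional query: matches both sentences
print(search("爱"))  # simplified query: only the sentence with a stored transcription
```

Generating the alternate script only at display time would make sentence 2 look fine on the page, while the simplified query would still miss it.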

I don’t know why we’re lacking so many transcriptions in Chinese right now, but I know that a transcription may not be added after adding or modifying a sentence if the underlying program that generates it (sinoparserd) fails for some reason.

@trang

Member Author

trang commented Feb 24, 2019

I'm also in favor of generating and saving the transcriptions upon the creation of the sentence. In addition to ensuring that sentences can be searched from their alternative script, it will simplify the logic of displaying sentences with alternate scripts.

@jiru

Member

jiru commented Feb 25, 2019

> I'm also in favor of generating and saving the transcriptions upon the creation of the sentence

Just to make things clear: this is already what we’re doing. But this alone is not enough, because the generation of the transcription may fail for whatever reason, for example when autotranscriptions are turned off in app_local.php.

@jiru jiru self-assigned this Mar 6, 2019

jiru added a commit that referenced this issue Mar 6, 2019

jiru added a commit that referenced this issue Mar 6, 2019

jiru added a commit that referenced this issue Mar 6, 2019

Fix batchOperation() used from TranscriptionsShell
Due to the migration to CakePHP 3.

Refs #1784.

jiru added a commit that referenced this issue Mar 6, 2019

Migrate transcriptions shell (script) to CakePHP3
`cake transcriptions script sentences`

Refs #1784.

jiru added a commit that referenced this issue Mar 6, 2019

Migrate transcriptions shell (script) to CakePHP3
`cake transcriptions script contributions`

Refs #1784.

@jiru

Member

jiru commented Mar 6, 2019

I migrated the transcriptions shell to CakePHP 3. We just need to run the following commands to restore missing Chinese transcriptions:

bin/cake transcriptions script sentences cmn
bin/cake transcriptions autogen cmn

It took 13 minutes to run on dev.tatoeba.org; it should be similar on tatoeba.org. Just to be sure, you can run the following SQL command to check that there are no missing Chinese transcriptions any more (the result should be zero):

select count(*) from sentences s
left join transcriptions t on s.id = t.sentence_id
where s.lang = 'cmn' and t.sentence_id is null;
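
For readers unfamiliar with the pattern, the check above is a left-join anti-join: a sentence with no matching row in transcriptions comes back with a NULL `t.sentence_id`, so counting those rows counts the missing transcriptions. The sketch below reproduces this with Python's built-in `sqlite3` on a cut-down, hypothetical version of the two tables (only the columns the query touches). The production database is not SQLite, and the sample rows are made up.

```python
import sqlite3

# In-memory database with hypothetical, simplified tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sentences (id integer primary key, lang text, text text);
    create table transcriptions (id integer primary key, sentence_id integer, text text);
    insert into sentences values (1, 'cmn', '我愛你'), (2, 'cmn', '我愛貓'), (3, 'eng', 'Hello.');
    insert into transcriptions values (1, 1, '我爱你');
""")

# Same query as in the comment above.
MISSING_COUNT = """
    select count(*) from sentences s
    left join transcriptions t on s.id = t.sentence_id
    where s.lang = 'cmn' and t.sentence_id is null
"""


def count_missing(connection):
    """Number of Chinese sentences that have no transcription row at all."""
    return connection.execute(MISSING_COUNT).fetchone()[0]


before = count_missing(conn)  # sentence 2 has no transcription yet
conn.execute("insert into transcriptions values (2, 2, '我爱猫')")
after = count_missing(conn)   # every Chinese sentence is now covered
print(before, after)
```

The English sentence never counts as missing because of the `s.lang = 'cmn'` filter.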

jiru added a commit that referenced this issue Mar 6, 2019

Remove dead code
This code used to generate transcriptions on the fly in CakePHP 2.
We don’t want that behaviour any more because it’s hiding missing
transcriptions. Such missing transcriptions are harmful because
they can’t be searched.

Refs #1784.

@jiru

Member

jiru commented Mar 6, 2019

As for future transcriptions that might not be created when a sentence is added or updated, I think we could just have a cron job that regularly checks for missing transcriptions and adds them. We just need to add such a task to the transcriptions shell: something like 'autogen', but without removing existing transcriptions first.
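
A rough sketch of what such a "fill in only what is missing" task might look like. It is written in Python with `sqlite3` rather than as a CakePHP shell task, and `fill_missing`, `fake_transcribe` and the table layout are hypothetical stand-ins (the real generator is sinoparserd).

```python
import sqlite3


def fill_missing(conn, lang, transcribe):
    """Add a transcription for every sentence of `lang` that lacks one.

    Existing transcriptions are left untouched. `transcribe` is any callable
    returning the transcription text, or None when generation fails.
    Returns the number of transcriptions added.
    """
    rows = conn.execute(
        """
        select s.id, s.text from sentences s
        left join transcriptions t on s.id = t.sentence_id
        where s.lang = ? and t.sentence_id is null
        """,
        (lang,),
    ).fetchall()
    added = 0
    for sentence_id, text in rows:
        result = transcribe(text)
        if result is None:
            continue  # generation failed; a later run will try again
        conn.execute(
            "insert into transcriptions (sentence_id, text) values (?, ?)",
            (sentence_id, result),
        )
        added += 1
    conn.commit()
    return added


def fake_transcribe(text):
    """Hypothetical generator: knows one sentence, fails (None) on the rest."""
    return {"我愛貓": "我爱猫"}.get(text)


# Made-up data: sentence 1 is already transcribed, 2 and 3 are not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sentences (id integer primary key, lang text, text text);
    create table transcriptions (id integer primary key, sentence_id integer, text text);
    insert into sentences values (1, 'cmn', '我愛你'), (2, 'cmn', '我愛貓'), (3, 'cmn', '你好嗎');
    insert into transcriptions (sentence_id, text) values (1, '我爱你');
""")

first_run = fill_missing(conn, "cmn", fake_transcribe)   # adds sentence 2 only
second_run = fill_missing(conn, "cmn", fake_transcribe)  # sentence 3 still fails
print(first_run, second_run)
```

Because the task only selects sentences without a transcription, running it repeatedly from cron is harmless: already-transcribed sentences are skipped, and failures are simply retried on the next run.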

@trang trang added this to the 2019-03-09 milestone Mar 7, 2019

@trang

Member Author

trang commented Mar 9, 2019

Thanks for the fix @jiru. I ran the commands and there are no more missing transcriptions for Chinese.

@trang trang closed this Mar 9, 2019
