[RFC][11.0] Translation of Imported Data (ICD-10, etc.) #136

Closed
lasley opened this issue Dec 5, 2016 · 1 comment
Labels
enhancement question stale PR/Issue without recent activity, it'll be soon closed automatically.
Milestone

Comments

lasley commented Dec 5, 2016

The current plan in #134 is to remove the giant data files that we have in oe_medical_emr_data outright, in favor of import systems that update the data dynamically. These XML files are more than 10 MiB each and are updated yearly, so this is essentially the only way to handle them in a forward-efficient manner.

This brings up an interesting problem that I will not be solving in that PR - translations.

Most of the code data files are available in other languages, so obtaining translations for the data itself won't be an issue. What will be an issue is when two languages come into play for the same dataset. For example:

  • English ICD-10-CM is imported
  • French ICD-10-CM is needed

To import the French ICD-10 data using standard record creates/writes, the English data would have to be either replaced or duplicated. This is because the records themselves have no notion of language.

My only idea is to convert all translatable text in imports into a unique identifier string for the field (UUIDv4 or something similar). The import would then create/update a record containing the non-translatable data plus all of the identifier strings. It would then add the actual text to ir.translation so that the identifiers are translated by the system.
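To make the idea concrete, here is a minimal sketch in plain Python rather than real Odoo code. Everything in it is an assumption for illustration: the `records` and `translations` dicts stand in for the imported model and for ir.translation, and the names `import_rows`, `display_name`, and `name_id` are hypothetical. The language codes and the sample row are only examples.

```python
import uuid

# Hypothetical in-memory stand-ins; a real import would target Odoo models.
records = {}        # code -> {"code": ..., "name_id": ...}
translations = {}   # (identifier, lang) -> text, standing in for ir.translation


def import_rows(rows, lang):
    """Create or update records for ``rows`` and store their text under ``lang``."""
    for code, name in rows:
        record = records.get(code)
        if record is None:
            # First time this code is seen: mint an opaque identifier for the field.
            record = {"code": code, "name_id": str(uuid.uuid4())}
            records[code] = record
        # The human-readable text lives only in the translation table.
        translations[(record["name_id"], lang)] = name


def display_name(code, lang, fallback="en_US"):
    """Resolve the identifier to text, falling back to another language if needed."""
    name_id = records[code]["name_id"]
    return translations.get((name_id, lang), translations.get((name_id, fallback)))


# Importing the same code in two languages leaves one record with two translations.
import_rows([("A00", "Cholera")], "en_US")
import_rows([("A00", "Choléra")], "fr_FR")
```

The second import neither replaces nor duplicates the record: it reuses the existing identifier and only adds a row to the translation table. Note that the sketch has to look the record up by its code to reuse the identifier, since a random UUID cannot be recomputed from the data.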

The disadvantage here is that translations essentially become impossible to maintain manually, although I'm not sure this was ever an option given the size of the data. There will also likely be duplicated data, because the identifier system does not take word lemmas into account.

I think this disadvantage is negligible, though, and outweighed by the advantage of not storing and maintaining these giant XML files in source control.

I'm wondering if anyone has some ideas or strategies for the translations that I'm not thinking of?

@lasley lasley added this to the 11.0 milestone Dec 5, 2016
@lasley lasley changed the title [RFC] Translation of Imported Data (ICD-10, etc.) [RFC][11.0] Translation of Imported Data (ICD-10, etc.) Dec 6, 2016

github-actions bot commented Nov 6, 2022

There hasn't been any activity on this issue in the past 6 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this issue to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Nov 6, 2022