Example database configurations for i18n of dynamic user data.
For each example this document contains
- Description
- Discussion of the Pros & Cons of this choice
- When you would be likely to want to use it.
- A diagram of the database schema
- A diagram with some sample data
- Commands to run in order to install and populate the sample database
For each example there is a sub-directory which contains SQL scripts which
- Create a sample schema
- Populate this schema with some example data
- A separate table for each translated field. This is the way to go for most frameworks and ORMs
- A single table which is used for all translations. This is more complicated; your code has to do more work
We are talking here about dynamic user data stored in the database, not static presentation data, which would normally be handled by creating text file translations that the application framework or some i18n plugin then loads and switches in and out, i.e. the functionality available in Struts, Struts2, Rails or Zend. In some cases framework-based solutions also exist for this dynamic data, but they are by no means available for all languages or frameworks.
In both of these examples, if you want to reduce the number of joins needed to retrieve the values of the multi-lingual columns, the locales table could be removed and the string code for each locale duplicated in the translation table. This is not recommended, however, because it does not remove the need for some kind of administration of the available locales, and it means that related data must be stored in at least two places (the database plus a config file on disk or a constants class).
Separate tables are created and used for each field that is to be translated. This is the route that most frameworks take, and in general it is the way that I would always tend to do it because it fits best with the way that most ORMs (Object Relational Mappers) work. You simply define a one-to-many relationship from the main (entity) table to the table that represents each translated field; the objects that represent these entities then contain, for each translated field, a structure of the form: { "es" => "Hola", "en" => "Hello", "eu" => "Kaixo", "fr" => "Bonjour" }
At the database level it is vital to use standard naming conventions so that it is easy to see from their names which tables are related. In the example schema given here, for instance, it would be better to start the names of the two translation tables with the name of the "main" table, in this case Person.
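The one-to-many design described above can be sketched as follows, using an in-memory SQLite database for illustration. The table and column names (person, person_greeting_trans, locale) are assumptions made for this sketch, not necessarily the names used by the SQL scripts in the sub-directory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE locale (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE           -- e.g. 'en', 'es', 'eu', 'fr'
);
CREATE TABLE person (
    id INTEGER PRIMARY KEY
);
-- One table per translated field; every such table has the same shape.
CREATE TABLE person_greeting_trans (
    id_person INTEGER NOT NULL REFERENCES person(id) ON DELETE CASCADE,
    id_locale INTEGER NOT NULL REFERENCES locale(id),
    value     TEXT NOT NULL,
    PRIMARY KEY (id_person, id_locale)  -- one translation per locale
);
""")

conn.executemany("INSERT INTO locale (id, code) VALUES (?, ?)",
                 [(1, "en"), (2, "es"), (3, "eu"), (4, "fr")])
conn.execute("INSERT INTO person (id) VALUES (1)")
conn.executemany(
    "INSERT INTO person_greeting_trans VALUES (?, ?, ?)",
    [(1, 1, "Hello"), (1, 2, "Hola"), (1, 3, "Kaixo"), (1, 4, "Bonjour")])

# Load the translated field as a locale-keyed structure, as an ORM would.
greetings = dict(conn.execute("""
    SELECT l.code, t.value
    FROM person_greeting_trans t
    JOIN locale l ON l.id = t.id_locale
    WHERE t.id_person = 1
"""))

# The ON DELETE CASCADE guarantees no unused translations survive.
conn.execute("DELETE FROM person WHERE id = 1")
remaining = conn.execute(
    "SELECT COUNT(*) FROM person_greeting_trans").fetchone()[0]
```

Note how the database itself, via the foreign keys and the cascade, keeps the translations consistent; the application code only reads and writes them.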
Best fit with frameworks and ORMs.
Allows you to use the database (primary and foreign keys) to guarantee the integrity of your data, instead of having to program the logic yourself.
The fact that the translation tables all have the same structure means that, if you follow some standard naming conventions, you should be able to use the same logic to process all of the translated fields.
You can use an "ON DELETE CASCADE" type relationship to guarantee that no unused translations continue to exist when an entity is deleted.
If there are lots of translated fields then you are going to have a lot of very similar tables.
Translation information is spread throughout the system so it's not easy to get a global view of the translations that exist. For example, it's not easy to create a single command that can show you which fields have not been translated to a particular language.
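The point above about identical table structure is worth a sketch: because every translation table shares one shape, and a naming convention ties it to its entity (the convention &lt;entity&gt;_&lt;field&gt;_trans is assumed here for illustration), a single generic function can load any translated field. SQLite in memory again, with illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locale (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
CREATE TABLE person (id INTEGER PRIMARY KEY);
-- Two translated fields, two tables, identical shape.
CREATE TABLE person_name_trans (
    id_person INTEGER NOT NULL REFERENCES person(id),
    id_locale INTEGER NOT NULL REFERENCES locale(id),
    value     TEXT NOT NULL
);
CREATE TABLE person_bio_trans (
    id_person INTEGER NOT NULL REFERENCES person(id),
    id_locale INTEGER NOT NULL REFERENCES locale(id),
    value     TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO locale VALUES (?, ?)", [(1, "en"), (2, "es")])
conn.execute("INSERT INTO person VALUES (1)")
conn.execute("INSERT INTO person_name_trans VALUES (1, 1, 'John')")
conn.execute("INSERT INTO person_name_trans VALUES (1, 2, 'Juan')")
conn.execute("INSERT INTO person_bio_trans VALUES (1, 1, 'A farmer')")

def load_field(conn, entity, field, entity_id):
    # The table name is derived purely from the naming convention,
    # so the same SQL works for every translated field in the system.
    table = f"{entity}_{field}_trans"
    return dict(conn.execute(
        f"SELECT l.code, t.value FROM {table} t "
        f"JOIN locale l ON l.id = t.id_locale WHERE t.id_{entity} = ?",
        (entity_id,)))
```

Usage: load_field(conn, "person", "name", 1) returns the same locale-keyed structure for any field, which is what makes the per-field design cheap to process despite the number of tables.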
When you are going to use an ORM (Hibernate, Propel, Doctrine, Active Record or something similar). When the fact that the translation data will be spread across a lot of tables is not a big inconvenience. When translations are going to be edited as part of the entity represented by the separate table rather than in some sort of centralised translation tool which is used to process all the translations that exist in the system at the same time.
table_per_field_trans/table_per_field_trans.jpg
table_per_field_trans/example_data.jpg
./run localhost table_per_field_trans
.\run.ps1 localhost table_per_field_trans
An improvement to this design would be to add a translation_group table between the table whose data is being translated and the translation table (which contains the actual translations). As you will be able to see from the diagram (example_data_trans_group.jpg), this would enable us to use the primary key of the translation_group table both in the action column of the table Person (the translated column) and in the field column of the translation table (which we could rename to translation_group). The database could then be used to validate the integrity of these relationships, avoiding the need to manually manage the id of each group of fields. In both examples we could use a unique index on translation.field and translation.id_locale to make sure that a field can only be translated once per locale.
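The translation_group variant can be sketched like this, again with SQLite in memory and illustrative names. It shows both halves of the improvement: real foreign keys into translation_group, and the unique index rejecting a second translation of the same group for the same locale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE locale (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
CREATE TABLE translation_group (
    id INTEGER PRIMARY KEY               -- the database generates group ids
);
CREATE TABLE person (
    id     INTEGER PRIMARY KEY,
    action INTEGER NOT NULL REFERENCES translation_group(id)
);
CREATE TABLE translation (
    translation_group INTEGER NOT NULL REFERENCES translation_group(id),
    id_locale         INTEGER NOT NULL REFERENCES locale(id),
    value             TEXT NOT NULL,
    UNIQUE (translation_group, id_locale) -- one translation per locale
);
""")
conn.executemany("INSERT INTO locale VALUES (?, ?)", [(1, "en"), (2, "es")])
conn.execute("INSERT INTO translation_group (id) VALUES (1)")
conn.execute("INSERT INTO person VALUES (1, 1)")
conn.execute("INSERT INTO translation VALUES (1, 1, 'Run')")
conn.execute("INSERT INTO translation VALUES (1, 2, 'Correr')")

# A second 'es' translation of the same group violates the unique index,
# so the database rejects it without any application-level checking.
try:
    conn.execute("INSERT INTO translation VALUES (1, 2, 'Ejecutar')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```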
This design allows us to minimise duplication of the translations that are present in the system. Since they are all located in the same table, the code that is responsible for managing them can optimise for words or phrases that already exist.
This is a more "explicit" solution in that you can see which multi-language columns a table contains just by looking at its definition, because they physically exist in that table, although a join will still be necessary to see their translations.
As there are fewer tables, in theory less database maintenance is required. When a new field requires multi-lingual capabilities there is no need to add a new translation table to the system.
As you can see from the diagram, the table that contains the translated fields (in this case person) must have a column for each field that is to be translated. This column is then used to link to one or more rows in the translation table (using the field column as the referenced key). Since neither of these columns is a primary key, we cannot enforce the integrity of the relationship at the database level; we have to manage it in our code.
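Because the database cannot enforce this link, the integrity check has to live in application code. A sketch of what that looks like, using SQLite in memory and illustrative names (the base design, without the translation_group improvement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locale (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
CREATE TABLE person (
    id     INTEGER PRIMARY KEY,
    action INTEGER NOT NULL     -- plain integer: no FK can reference it
);
CREATE TABLE translation (
    field     INTEGER NOT NULL, -- matches person.action by convention only
    id_locale INTEGER NOT NULL REFERENCES locale(id),
    value     TEXT NOT NULL
);
""")
conn.execute("INSERT INTO locale VALUES (1, 'en')")
conn.execute("INSERT INTO person VALUES (1, 100)")
conn.execute("INSERT INTO translation VALUES (100, 1, 'Hello')")
conn.execute("INSERT INTO translation VALUES (999, 1, 'Orphan')")  # dangling

def orphan_translation_groups(conn):
    """An integrity check the database cannot do for us: find translation
    rows whose field id is referenced by no translated column."""
    rows = conn.execute("""
        SELECT DISTINCT t.field FROM translation t
        WHERE t.field NOT IN (SELECT action FROM person)
    """).fetchall()
    return [r[0] for r in rows]
```

In a real system this query would have to union in every translated column of every table, which is exactly the extra work your code has to do under this design.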
The translation management mechanism needs to be informed of each new field so that it can manage it.
As the design stands, it would need a sequence or something similar to generate the new ids for each translation group.
The translation table is likely to become a bottleneck.
Care needs to be taken to allow the system to cater for more than one translation for words or phrases whose meaning changes depending on the context of their use.
When you want to have a centralised translations repository that you can use to manage all the translations in the system, and when the way that you are persisting your data in the database has the necessary flexibility.
single_table_for_all_trans/single_table_for_all_trans.jpg
single_table_for_all_trans/example_data.jpg
./run localhost single_table_for_all_trans
.\run.ps1 localhost single_table_for_all_trans