Adapted performance figures
Nils Hammerla committed Feb 7, 2018
1 parent 222f0f5 commit 78215a3
Showing 1 changed file with 44 additions and 43 deletions.
87 changes: 44 additions & 43 deletions README.md
@@ -57,80 +57,80 @@ First things first, let's test the translation performance from English into eve
|-----------------|--------------|--------------|---------------|
| fr | 0.73 | 0.86 | 0.88 |
| pt | 0.73 | 0.86 | 0.89 |
| es | 0.73 | 0.85 | 0.88 |
| it | 0.71 | 0.86 | 0.89 |
| nl | 0.69 | 0.83 | 0.86 |
| no | 0.68 | 0.85 | 0.88 |
| ca | 0.67 | 0.82 | 0.86 |
| da | 0.66 | 0.83 | 0.88 |
| es | 0.72 | 0.85 | 0.88 |
| it | 0.70 | 0.86 | 0.89 |
| nl | 0.68 | 0.83 | 0.86 |
| no | 0.68 | 0.85 | 0.89 |
| da | 0.66 | 0.84 | 0.88 |
| ca | 0.66 | 0.81 | 0.86 |
| sv | 0.65 | 0.82 | 0.86 |
| cs | 0.64 | 0.81 | 0.85 |
| ro | 0.63 | 0.81 | 0.85 |
| hu | 0.62 | 0.80 | 0.85 |
| pl | 0.62 | 0.79 | 0.82 |
| fi | 0.61 | 0.79 | 0.84 |
| de | 0.61 | 0.75 | 0.78 |
| ru | 0.61 | 0.77 | 0.82 |
| de | 0.62 | 0.75 | 0.78 |
| pl | 0.62 | 0.79 | 0.83 |
| hu | 0.61 | 0.80 | 0.84 |
| fi | 0.61 | 0.80 | 0.84 |
| eo | 0.61 | 0.80 | 0.85 |
| ru | 0.60 | 0.78 | 0.82 |
| gl | 0.60 | 0.77 | 0.82 |
| id | 0.58 | 0.81 | 0.86 |
| mk | 0.58 | 0.79 | 0.84 |
| bg | 0.58 | 0.77 | 0.82 |
| id | 0.58 | 0.81 | 0.86 |
| bg | 0.57 | 0.77 | 0.82 |
| ms | 0.57 | 0.81 | 0.86 |
| sh | 0.56 | 0.77 | 0.82 |
| uk | 0.56 | 0.75 | 0.79 |
| uk | 0.57 | 0.75 | 0.79 |
| sh | 0.56 | 0.77 | 0.81 |
| hr | 0.56 | 0.75 | 0.80 |
| tr | 0.56 | 0.77 | 0.81 |
| sl | 0.56 | 0.77 | 0.82 |
| hr | 0.55 | 0.75 | 0.80 |
| el | 0.55 | 0.75 | 0.80 |
| el | 0.54 | 0.75 | 0.80 |
| sk | 0.54 | 0.75 | 0.81 |
| et | 0.53 | 0.73 | 0.78 |
| sk | 0.53 | 0.75 | 0.81 |
| sr | 0.53 | 0.72 | 0.77 |
| af | 0.52 | 0.75 | 0.80 |
| lt | 0.50 | 0.72 | 0.79 |
| ar | 0.48 | 0.69 | 0.75 |
| bs | 0.48 | 0.70 | 0.76 |
| bs | 0.47 | 0.70 | 0.77 |
| lv | 0.47 | 0.68 | 0.75 |
| eu | 0.46 | 0.68 | 0.75 |
| fa | 0.45 | 0.68 | 0.75 |
| hy | 0.43 | 0.66 | 0.73 |
| be | 0.43 | 0.64 | 0.71 |
| sq | 0.42 | 0.65 | 0.71 |
| zh | 0.41 | 0.67 | 0.74 |
| sq | 0.43 | 0.65 | 0.71 |
| be | 0.43 | 0.64 | 0.70 |
| zh | 0.40 | 0.68 | 0.75 |
| ka | 0.40 | 0.63 | 0.71 |
| hi | 0.40 | 0.58 | 0.63 |
| cy | 0.39 | 0.63 | 0.71 |
| hi | 0.39 | 0.58 | 0.63 |
| az | 0.38 | 0.60 | 0.67 |
| ko | 0.37 | 0.58 | 0.66 |
| te | 0.36 | 0.56 | 0.63 |
| ko | 0.36 | 0.58 | 0.66 |
| kk | 0.35 | 0.60 | 0.68 |
| he | 0.33 | 0.45 | 0.48 |
| fy | 0.33 | 0.52 | 0.61 |
| vi | 0.31 | 0.53 | 0.61 |
| fy | 0.33 | 0.52 | 0.60 |
| vi | 0.31 | 0.53 | 0.62 |
| ta | 0.31 | 0.50 | 0.56 |
| bn | 0.30 | 0.49 | 0.56 |
| ur | 0.29 | 0.52 | 0.61 |
| is | 0.28 | 0.51 | 0.59 |
| is | 0.29 | 0.51 | 0.59 |
| tl | 0.28 | 0.51 | 0.59 |
| kn | 0.28 | 0.43 | 0.46 |
| gu | 0.25 | 0.44 | 0.51 |
| mn | 0.25 | 0.48 | 0.58 |
| mn | 0.25 | 0.49 | 0.58 |
| uz | 0.24 | 0.43 | 0.51 |
| si | 0.22 | 0.40 | 0.45 |
| ml | 0.21 | 0.35 | 0.39 |
| th | 0.21 | 0.33 | 0.38 |
| ky | 0.20 | 0.40 | 0.49 |
| mr | 0.20 | 0.37 | 0.44 |
| th | 0.20 | 0.33 | 0.38 |
| la | 0.19 | 0.34 | 0.42 |
| ja | 0.18 | 0.43 | 0.56 |
| ja | 0.18 | 0.44 | 0.56 |
| ne | 0.16 | 0.33 | 0.38 |
| pa | 0.16 | 0.32 | 0.38 |
| tg | 0.15 | 0.31 | 0.39 |
| km | 0.12 | 0.26 | 0.31 |
| my | 0.10 | 0.20 | 0.23 |
| lb | 0.10 | 0.18 | 0.21 |
| mg | 0.07 | 0.19 | 0.25 |
| ceb | 0.06 | 0.14 | 0.18 |
| tg | 0.14 | 0.31 | 0.39 |
| km | 0.12 | 0.26 | 0.30 |
| my | 0.10 | 0.19 | 0.23 |
| lb | 0.09 | 0.18 | 0.21 |
| mg | 0.07 | 0.18 | 0.25 |
| ceb | 0.06 | 0.13 | 0.18 |

As you can see, the alignment is consistently much better than random! In general, the procedure works best for other European languages like French, Portuguese and Spanish. We use 2500 word pairs because, of the 5000 words in the test dictionary, not all of the translations found by the Google Translate API are actually present in the fastText vocabulary.
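The precision @k figures above can be illustrated with a short nearest-neighbour check. The sketch below is ours, not the repository's evaluation code: it assumes the vectors are already aligned into a shared space, ranks the whole target vocabulary by cosine similarity, and simply skips dictionary pairs whose words are missing from either vocabulary (the README does not say exactly how the 2500 usable pairs were selected). The function names `cosine` and `precision_at_k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def precision_at_k(pairs, src_vecs, tgt_vecs, k):
    """Fraction of (source, target) test pairs whose target word is among the
    k target words closest to the (already aligned) source vector.

    Assumption: pairs with out-of-vocabulary words are dropped, which is one
    way a 5000-word dictionary shrinks to fewer usable pairs.
    """
    usable = [(s, t) for s, t in pairs if s in src_vecs and t in tgt_vecs]
    if not usable:
        return 0.0
    hits = 0
    for s, t in usable:
        # Brute-force ranking of the full target vocabulary by similarity.
        ranked = sorted(
            tgt_vecs,
            key=lambda w: cosine(src_vecs[s], tgt_vecs[w]),
            reverse=True,
        )
        if t in ranked[:k]:
            hits += 1
    return hits / len(usable)


# Toy usage with hand-made 2-d "aligned" vectors.
src = {"chat": [1.0, 0.0], "chien": [0.0, 1.0]}
tgt = {"cat": [0.9, 0.1], "dog": [0.1, 0.9], "car": [0.7, 0.7]}
print(precision_at_k([("chat", "car"), ("chien", "dog")], src, tgt, 1))  # 0.5
print(precision_at_k([("chat", "car"), ("chien", "dog")], src, tgt, 2))  # 1.0
```

Real fastText vocabularies have hundreds of thousands of words, so a practical version would normalise the vectors once and use a matrix product instead of sorting per query; the toy version keeps the logic visible.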

@@ -147,23 +147,24 @@ Intriguingly, even though we only directly aligned the languages to English, som
| Language 1 | Language 2 | Inter-pair precision @1 | English-pair precision @1 |
|:----------:|:----------:|:-----------------------:|:-------------------------:|
| bs | sh | 0.88 | 0.52 |
| ru | uk | 0.84 | 0.59 |
| ru | uk | 0.84 | 0.58 |
| ca | es | 0.82 | 0.69 |
| cs | sk | 0.82 | 0.59 |
| ca | es | 0.82 | 0.70 |
| hr | sh | 0.78 | 0.56 |
| be | uk | 0.77 | 0.49 |
| be | uk | 0.77 | 0.50 |
| gl | pt | 0.76 | 0.66 |
| bs | hr | 0.74 | 0.52 |
| be | ru | 0.73 | 0.52 |
| sr | sh | 0.73 | 0.54 |
| be | ru | 0.73 | 0.51 |
| da | no | 0.73 | 0.67 |
| sr | sh | 0.73 | 0.54 |
| pt | es | 0.72 | 0.72 |
| ca | pt | 0.70 | 0.69 |
| gl | es | 0.70 | 0.66 |
| hr | sr | 0.69 | 0.54 |
| ca | gl | 0.68 | 0.63 |
| bs | sr | 0.67 | 0.50 |
| mk | sr | 0.57 | 0.55 |
| mk | sr | 0.56 | 0.55 |
| kk | ky | 0.30 | 0.28 |
| kk | uz | 0.29 | 0.29 |

All of these language pairs share very close linguistic roots. For instance, the first pair above is Bosnian and Serbo-Croatian; Bosnian is a variant of Serbo-Croatian. The second pair is Russian and Ukrainian, both East Slavic languages. It seems that the more similar two languages are, the more similar the geometry of their fastText vectors, leading to improved translation performance.
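The inter-pair column relies on both languages having been aligned to English only. A minimal sketch, assuming each language's vectors were mapped into the English space by its own orthogonal matrix (an assumption on our part, since the alignment step is not part of this diff): translating from language 1 to language 2 is then a nearest-neighbour search in the shared space. Because an orthogonal matrix's inverse is its transpose, the two alignments can also be composed into one direct map `W1 @ W2^T`. All names (`matvec`, `compose`, `translate`, the toy words) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matvec(v, m):
    """Multiply row vector v by matrix m (v @ m)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def compose(w_src, w_tgt):
    """Direct source-to-target map w_src @ w_tgt^T.

    Only valid if w_tgt is orthogonal, so that its inverse equals its transpose.
    """
    w_tgt_inv = transpose(w_tgt)
    return [matvec(row, w_tgt_inv) for row in w_src]


def translate(word, src_vecs, tgt_vecs, w_src, w_tgt):
    """Nearest target word after mapping both languages into the English space."""
    query = matvec(src_vecs[word], w_src)
    shared_tgt = {w: matvec(v, w_tgt) for w, v in tgt_vecs.items()}
    return max(shared_tgt, key=lambda w: cosine(query, shared_tgt[w]))


# Toy example: language A's space differs from English by a 90-degree rotation,
# language B's space already coincides with English.
W_A = [[0, 1], [-1, 0]]
W_B = [[1, 0], [0, 1]]
vecs_a = {"a1": [0.0, -1.0], "a2": [1.0, 0.0]}
vecs_b = {"b1": [1.0, 0.0], "b2": [0.0, 1.0]}
print(translate("a1", vecs_a, vecs_b, W_A, W_B))  # b1
print(translate("a2", vecs_a, vecs_b, W_A, W_B))  # b2
```

In this picture, two closely related languages whose vector spaces already share much of their geometry would each need similar rotations to reach English, which is consistent with (though not a proof of) the high inter-pair scores above.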
