Commit
* Add ITN pt
  Signed-off-by: Guilherme Steinmann <guist@linse.ufsc.br>
* Fix style
  Signed-off-by: Guilherme Steinmann <guist@linse.ufsc.br>
* Fix style
  Signed-off-by: Guilherme Steinmann <guist@linse.ufsc.br>
* Update copyright year to 2022 on ITN pt rules and tests
  Signed-off-by: Guilherme Steinmann <guist@linse.ufsc.br>
1 parent 1f97094, commit 2089016
Showing 87 changed files with 3,386 additions and 7 deletions.
17 changes: 17 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from nemo_text_processing.inverse_text_normalization.pt.taggers.tokenize_and_classify import ClassifyFst | ||
from nemo_text_processing.inverse_text_normalization.pt.verbalizers.verbalize import VerbalizeFst | ||
from nemo_text_processing.inverse_text_normalization.pt.verbalizers.verbalize_final import VerbalizeFinalFst |
5 changes: 5 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/currency_plural.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
€ euros | ||
£ libras esterlinas | ||
US$ dólares americanos | ||
$ dólares | ||
R$ reais |
5 changes: 5 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/currency_singular.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
€ euro | ||
£ libra esterlina | ||
US$ dólar americano | ||
$ dólar | ||
R$ real |
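Note: the two currency tables pair each symbol with a spoken singular or plural form. The sketch below is a plain-Python illustration of how such rows could be turned into a spoken-form-to-symbol lookup. It is not the code added by this commit, which compiles the data into finite-state grammars. The names `build_currency_lookup` and `find_currency` are hypothetical, the rows are inlined rather than read from disk, and the columns are assumed to be tab-separated.

```python
from typing import Dict, Optional

# Rows copied from currency_singular.tsv and currency_plural.tsv
# (assumption: the two columns are separated by a tab).
CURRENCY_SINGULAR = "€\teuro\n£\tlibra esterlina\nUS$\tdólar americano\n$\tdólar\nR$\treal\n"
CURRENCY_PLURAL = "€\teuros\n£\tlibras esterlinas\nUS$\tdólares americanos\n$\tdólares\nR$\treais\n"


def build_currency_lookup(*tables: str) -> Dict[str, str]:
    """Map every spoken form (singular or plural) to its currency symbol."""
    lookup: Dict[str, str] = {}
    for table in tables:
        for line in table.splitlines():
            if not line.strip():
                continue
            symbol, spoken = line.split("\t", 1)
            lookup[spoken.strip()] = symbol.strip()
    return lookup


def find_currency(text: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the symbol of the longest spoken form that ends `text`, or None."""
    for spoken in sorted(lookup, key=len, reverse=True):
        if text == spoken or text.endswith(" " + spoken):
            return lookup[spoken]
    return None


lookup = build_currency_lookup(CURRENCY_SINGULAR, CURRENCY_PLURAL)
print(find_currency("dez dólares americanos", lookup))
print(find_currency("um real", lookup))
```

Trying the longest spoken form first keeps "dólares americanos" from being matched as plain "dólares".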
13 changes: 13 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/electronic/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
26 changes: 26 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/electronic/domain.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
com | ||
es | ||
uk | ||
fr | ||
net | ||
br | ||
in | ||
ru | ||
de | ||
it | ||
edu | ||
co | ||
ar | ||
bo | ||
cl | ||
co | ||
ec | ||
fk | ||
gf | ||
fy | ||
pe | ||
py | ||
sr | ||
ve | ||
uy | ||
pt |
11 changes: 11 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/electronic/server_name.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
gmail g mail | ||
gmail | ||
nvidia n vidia | ||
nvidia | ||
outlook | ||
hotmail | ||
yahoo | ||
aol | ||
live | ||
msn | ||
live |
6 changes: 6 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/electronic/symbols.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
. ponto | ||
- traço | ||
- hífen | ||
_ traço baixo | ||
_ underscore | ||
/ barra |
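Note: symbols.tsv lists each written symbol next to the word or words used to say it. As a rough illustration only, the sketch below replaces spoken symbol names with their symbols and joins the pieces. The function `join_spoken_symbols` is hypothetical and is not part of this commit, which handles electronic addresses in its grammar code; the dictionary is a hand-inverted copy of the six rows above.

```python
# Spoken name -> symbol, inverted by hand from symbols.tsv.
SYMBOLS = {
    "ponto": ".",
    "traço": "-",
    "hífen": "-",
    "traço baixo": "_",
    "underscore": "_",
    "barra": "/",
}


def join_spoken_symbols(spoken: str) -> str:
    """Replace spoken symbol names with symbols and concatenate the tokens."""
    words = spoken.split()
    pieces = []
    i = 0
    while i < len(words):
        two_words = " ".join(words[i:i + 2])
        if two_words in SYMBOLS:
            # Try two-word names such as "traço baixo" before single words.
            pieces.append(SYMBOLS[two_words])
            i += 2
        elif words[i] in SYMBOLS:
            pieces.append(SYMBOLS[words[i]])
            i += 1
        else:
            pieces.append(words[i])
            i += 1
    return "".join(pieces)


print(join_spoken_symbols("nvidia ponto com"))
print(join_spoken_symbols("meu traço baixo site ponto com ponto br"))
```

Checking the two-word name first matters because "traço" alone maps to a hyphen while "traço baixo" maps to an underscore.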
56 changes: 56 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/measurements_plural.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
h horas | ||
min minutos | ||
s segundos | ||
ms milissegundos | ||
ns nanossegundos | ||
μs microssegundos | ||
t toneladas | ||
kg quilos | ||
kg quilogramas | ||
g gramas | ||
mg miligramas | ||
μm micrômetros | ||
nm nanômetros | ||
mm milímetros | ||
cm centímetros | ||
cm² centímetros quadrados | ||
cm³ centímetros cúbicos | ||
m metros | ||
m² metros quadrados | ||
m³ metros cúbicos | ||
km quilômetros | ||
km² quilômetros quadrados | ||
ha hectares | ||
kph quilômetros por hora | ||
mph milhas por hora | ||
m/s metros por segundo | ||
l litros | ||
ml mililitros | ||
kgf quilogramas forças | ||
kgf quilogramas força | ||
% por cento | ||
°F fahrenheit | ||
°C celsius | ||
°F graus fahrenheit | ||
°C graus celsius | ||
Hz hertz | ||
kHz quilo hertz | ||
MHz mega hertz | ||
GHz giga hertz | ||
W watts | ||
kW quilowatts | ||
MW megawatts | ||
GW gigawatts | ||
Wh watts hora | ||
kWh quilowatts hora | ||
MWh megawatts hora | ||
GWh gigawatts hora | ||
kV quilovolts | ||
V volts | ||
mV milivolts | ||
A amperes | ||
mA miliamperes | ||
rpm rotações por minuto | ||
db decibéis | ||
cal calorias | ||
kcal quilocalorias |
55 changes: 55 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/measurements_singular.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
h hora | ||
min minuto | ||
s segundo | ||
ms milissegundo | ||
ns nanossegundo | ||
μs microssegundo | ||
t tonelada | ||
kg quilo | ||
kg quilograma | ||
g grama | ||
mg miligrama | ||
μm micrômetro | ||
nm nanômetro | ||
mm milímetro | ||
cm centímetro | ||
cm² centímetro quadrado | ||
cm³ centímetro cúbico | ||
m metro | ||
m² metro quadrado | ||
m³ metro cúbico | ||
km quilômetro | ||
km² quilômetro quadrado | ||
ha hectare | ||
kph quilômetro por hora | ||
mph milha por hora | ||
m/s metro por segundo | ||
l litro | ||
ml mililitro | ||
kgf quilograma força | ||
% por cento | ||
°F fahrenheit | ||
°C celsius | ||
°F grau fahrenheit | ||
°C grau celsius | ||
Hz hertz | ||
kHz quilo hertz | ||
MHz mega hertz | ||
GHz giga hertz | ||
W watt | ||
kW quilowatt | ||
MW megawatt | ||
GW gigawatt | ||
Wh watt hora | ||
kWh quilowatt hora | ||
MWh megawatt hora | ||
GWh gigawatt hora | ||
kV quilovolt | ||
V volt | ||
mV milivolt | ||
A ampere | ||
mA miliampere | ||
rpm rotação por minuto | ||
db decibel | ||
cal caloria | ||
kcal quilocaloria |
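Note: the plural table has 56 rows and the singular table has 55; the extra row is the second `kgf` variant in the plural file. A small consistency check along the following lines can surface that kind of difference. This is a sketch, not part of the commit: `spoken_forms_by_unit` is a hypothetical helper, only a few rows are inlined as samples, and the columns are assumed to be tab-separated.

```python
from collections import defaultdict
from typing import Dict, List

# A few rows copied from the two measurement tables (tab-separated by assumption).
SINGULAR_SAMPLE = (
    "kg\tquilo\nkg\tquilograma\nkgf\tquilograma força\n"
    "°C\tcelsius\n°C\tgrau celsius\n"
)
PLURAL_SAMPLE = (
    "kg\tquilos\nkg\tquilogramas\nkgf\tquilogramas forças\nkgf\tquilogramas força\n"
    "°C\tcelsius\n°C\tgraus celsius\n"
)


def spoken_forms_by_unit(tsv_text: str) -> Dict[str, List[str]]:
    """Group the spoken forms in a unit table by their written abbreviation."""
    forms = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        unit, spoken = line.split("\t", 1)
        forms[unit.strip()].append(spoken.strip())
    return dict(forms)


singular = spoken_forms_by_unit(SINGULAR_SAMPLE)
plural = spoken_forms_by_unit(PLURAL_SAMPLE)

# Units whose number of spoken variants differs between the two tables.
mismatched = sorted(
    unit
    for unit in singular.keys() | plural.keys()
    if len(singular.get(unit, [])) != len(plural.get(unit, []))
)
print(mismatched)
```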
12 changes: 12 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/months.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
janeiro | ||
fevereiro | ||
março | ||
abril | ||
maio | ||
junho | ||
julho | ||
agosto | ||
setembro | ||
outubro | ||
novembro | ||
dezembro |
13 changes: 13 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
11 changes: 11 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/digit.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
um 1 | ||
uma 1 | ||
dois 2 | ||
duas 2 | ||
três 3 | ||
quatro 4 | ||
cinco 5 | ||
seis 6 | ||
sete 7 | ||
oito 8 | ||
nove 9 |
17 changes: 17 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/hundreds.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
cento 1 | ||
duzentos 2 | ||
duzentas 2 | ||
trezentos 3 | ||
trezentas 3 | ||
quatrocentos 4 | ||
quatrocentas 4 | ||
quinhentos 5 | ||
quinhentas 5 | ||
seiscentos 6 | ||
seiscentas 6 | ||
setecentos 7 | ||
setecentas 7 | ||
oitocentos 8 | ||
oitocentas 8 | ||
novecentos 9 | ||
novecentas 9 |
1 change: 1 addition & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/onehundred.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
cem 100 |
11 changes: 11 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/teen.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
dez 10 | ||
onze 11 | ||
doze 12 | ||
treze 13 | ||
catorze 14 | ||
quatorze 14 | ||
quinze 15 | ||
dezesseis 16 | ||
dezessete 17 | ||
dezoito 18 | ||
dezenove 19 |
8 changes: 8 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/ties.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
vinte 2 | ||
trinta 3 | ||
quarenta 4 | ||
cinquenta 5 | ||
sessenta 6 | ||
setenta 7 | ||
oitenta 8 | ||
noventa 9 |
9 changes: 9 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/twenties.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
vinte um 21 | ||
vinte dois 22 | ||
vinte três 23 | ||
vinte quatro 24 | ||
vinte cinco 25 | ||
vinte seis 26 | ||
vinte sete 27 | ||
vinte oito 28 | ||
vinte nove 29 |
1 change: 1 addition & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/numbers/zero.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
zero 0 |
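Note: in the number tables, digit.tsv and teen.tsv map words to full values, while ties.tsv and hundreds.tsv map words to a single tens or hundreds digit. The sketch below shows, in plain Python, one way those pieces could combine into a value from 0 to 999. It is an illustration under stated assumptions, not the grammar added by this commit: `words_to_number` is a hypothetical function, the connective "e" is simply dropped, and the dictionaries are hand-copied from the tables above.

```python
DIGIT = {"um": 1, "uma": 1, "dois": 2, "duas": 2, "três": 3, "quatro": 4,
         "cinco": 5, "seis": 6, "sete": 7, "oito": 8, "nove": 9}
TEEN = {"dez": 10, "onze": 11, "doze": 12, "treze": 13, "catorze": 14,
        "quatorze": 14, "quinze": 15, "dezesseis": 16, "dezessete": 17,
        "dezoito": 18, "dezenove": 19}
# ties.tsv and hundreds.tsv store only the leading digit.
TIES = {"vinte": 2, "trinta": 3, "quarenta": 4, "cinquenta": 5,
        "sessenta": 6, "setenta": 7, "oitenta": 8, "noventa": 9}
HUNDREDS = {"cento": 1, "duzentos": 2, "duzentas": 2, "trezentos": 3,
            "trezentas": 3, "quatrocentos": 4, "quatrocentas": 4,
            "quinhentos": 5, "quinhentas": 5, "seiscentos": 6,
            "seiscentas": 6, "setecentos": 7, "setecentas": 7,
            "oitocentos": 8, "oitocentas": 8, "novecentos": 9,
            "novecentas": 9}


def words_to_number(text: str) -> int:
    """Convert a spoken Portuguese cardinal (0 to 999) to an integer."""
    words = [word for word in text.split() if word != "e"]
    if not words:
        raise ValueError("empty input")
    if words == ["zero"]:
        return 0
    if words == ["cem"]:
        return 100
    total = 0
    stage = 3  # 3: hundreds allowed, 2: tens allowed, 1: units allowed, 0: done
    for word in words:
        if word in HUNDREDS and stage == 3:
            total += HUNDREDS[word] * 100
            stage = 2
        elif word in TIES and stage >= 2:
            total += TIES[word] * 10
            stage = 1
        elif word in TEEN and stage >= 2:
            total += TEEN[word]
            stage = 0
        elif word in DIGIT and stage >= 1:
            total += DIGIT[word]
            stage = 0
        else:
            raise ValueError(f"unexpected word: {word!r}")
    return total


print(words_to_number("duzentos e trinta e quatro"))
print(words_to_number("vinte um"))
```

The `stage` counter enforces the hundreds, tens, units order, so a sequence such as "quatro vinte" is rejected instead of being summed.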
13 changes: 13 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/ordinals/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
18 changes: 18 additions & 0 deletions
nemo_text_processing/inverse_text_normalization/pt/data/ordinals/digit.tsv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
primeiro 1 | ||
primeira 1 | ||
segundo 2 | ||
segunda 2 | ||
terceiro 3 | ||
terceira 3 | ||
quarto 4 | ||
quarta 4 | ||
quinto 5 | ||
quinta 5 | ||
sexto 6 | ||
sexta 6 | ||
sétimo 7 | ||
sétima 7 | ||
oitavo 8 | ||
oitava 8 | ||
nono 9 | ||
nona 9 |