Commit
Improve tests. Blackify and isortify. Optimize the code.
1 parent a1ac470, commit 20e0341
Showing 10 changed files with 146 additions and 44 deletions.
@@ -0,0 +1 @@
input,output,mistake
@@ -0,0 +1,13 @@
input,output
i payed one hundred and twenty seven dollars and twenty six cents, i payed 127$ and 26¢
was your birthday on march twenty ninth, was your birthday on march 29th
my phone number is nine four nine six eight two seventy fourteen, my phone number is 9496827014
i have a minus one hundred point four three balance, i have a -100.43 balance
calling to place an order of three hundred thousand sixty four hundred and eighteen parts, calling to place an order of 30006418 parts
my order id is seven eighteen fourteen fifteen nine eight zero, my order id is 7181415980
my date of birth is three seven fifty four, my date of birth is 3754
what is eighty percent of negative point nine four, what is 80% of -.94
seems to cost a thousand or more maybe a few hundred or so, seems to cost a 1000 or more maybe a few 100 or so
double zero, 00
triple zero, 000
quadruple zero, 0000
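A minimal sketch of how rows like these could be read with only the standard library, assuming each row contains exactly one comma separating the spoken input from the expected written output. The `load_cases` name and the inline `SAMPLE` text are hypothetical and used only for illustration; the commit itself reads the file with pandas.

```python
import csv
import io

# Hypothetical inline sample mirroring a few rows of the CSV above.
SAMPLE = """input,output
was your birthday on march twenty ninth, was your birthday on march 29th
double zero, 00
triple zero, 000
"""


def load_cases(text):
    """Return (spoken, written) pairs from CSV text with an input,output header."""
    reader = csv.DictReader(io.StringIO(text))
    # Strip whitespace so the space after each comma does not leak into the values.
    return [(row["input"].strip(), row["output"].strip()) for row in reader]


cases = load_cases(SAMPLE)
for spoken, written in cases:
    print(f"{spoken!r} -> {written!r}")
```

Keeping every value as a string matters for rows such as `double zero, 00`: parsed as an integer, `00` would lose its leading zero, which is presumably why the test requests `object` dtype for both columns.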
@@ -0,0 +1,27 @@
import pandas as pd

import itnpy.vocab as vocab

__all__ = (
    "get_word2number_dict",
    "error_message",
)


def get_word2number_dict(path="assets/vocab.csv"):
    df = vocab.get_dataframe(path)
    return vocab.get_word2number_dict(df)


def error_message(spoken, written, output):
    df = [
        {
            "[spoken]".upper(): spoken,
            "[written]".upper(): written,
            "[output]".upper(): output,
        }
    ]
    df = pd.DataFrame(df)
    df = df.set_index("[spoken]".upper())
    df = df.T
    return "\n" + df.to_string()
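For illustration, a pandas-free sketch that builds a comparable three-row message with plain string formatting. `format_mismatch` is a hypothetical name, and the layout only approximates what `DataFrame.to_string()` prints; it is not the helper from this commit.

```python
def format_mismatch(spoken, written, output):
    """Build a small aligned label/value listing for an assertion message."""
    rows = [("[SPOKEN]", spoken), ("[WRITTEN]", written), ("[OUTPUT]", output)]
    # Pad every label to the widest one so the values line up in a column.
    width = max(len(label) for label, _ in rows)
    lines = [f"{label:<{width}}  {value}" for label, value in rows]
    # The leading newline keeps the listing off the "AssertionError:" line.
    return "\n" + "\n".join(lines)


message = format_mismatch("double zero", "00", "0 0")
print(message)
```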
@@ -0,0 +1,31 @@
import pandas as pd
import pytest

import itnpy

from .helpers import error_message, get_word2number_dict


@pytest.mark.parametrize(
    "path",
    [
        "tests/assets/vocab/passing.csv",
        "tests/assets/vocab/failing.csv",
    ],
)
def test_vocab(path):
    df = pd.read_csv(path, dtype={"input": object, "output": object})
    df = df.fillna("")
    # --- Get the vocab for converting spoken-form text into written-form text
    word2number = get_word2number_dict()

    for _, row in df.iterrows():
        tokens = row["input"].strip()
        output = row["output"].strip()
        # NOTE: This can be modified depending on your needs
        spoken2 = itnpy.preprocess(tokens.split(), word2number)
        # --- Convert spoken-form tokens to written-form tokens
        digit = itnpy.inverse_normalize_numbers(spoken2, word2number)
        # --- Convert tokens to string
        digit = " ".join(digit)
        assert output == digit, error_message(tokens, digit, output)
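The loop above stops at the first failing row. As a sketch of an alternative, the comparison can collect every mismatch before reporting. Everything here is an assumption for illustration: `toy_normalize` is a deliberately naive stand-in and not the `itnpy` API, and `WORD2DIGIT`, `REPEATS`, and `find_mismatches` are hypothetical names.

```python
# Hypothetical stand-in vocabulary; the real mapping is loaded from a vocab CSV.
WORD2DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
REPEATS = {"double": 2, "triple": 3, "quadruple": 4}


def toy_normalize(tokens):
    """Naive converter: maps single digit words and 'double/triple/quadruple <digit>'."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in REPEATS and i + 1 < len(tokens) and tokens[i + 1] in WORD2DIGIT:
            # "double zero" -> "00": repeat the following digit.
            out.append(WORD2DIGIT[tokens[i + 1]] * REPEATS[tok])
            i += 2
        elif tok in WORD2DIGIT:
            out.append(WORD2DIGIT[tok])
            i += 1
        else:
            # Non-numeric tokens pass through unchanged.
            out.append(tok)
            i += 1
    return out


def find_mismatches(cases, normalize):
    """Return (spoken, expected, actual) for every case whose output differs."""
    mismatches = []
    for spoken, expected in cases:
        actual = " ".join(normalize(spoken.split()))
        if actual != expected:
            mismatches.append((spoken, expected, actual))
    return mismatches


cases = [
    ("double zero", "00"),
    ("triple zero", "000"),
    ("dial nine one one", "dial 911"),
]
mismatches = find_mismatches(cases, toy_normalize)
for spoken, expected, actual in mismatches:
    print(f"{spoken!r}: expected {expected!r}, got {actual!r}")
```

The toy converter does not merge adjacent digits, so the third case is reported as a mismatch while the first two pass. Collecting mismatches this way trades the early exit of a bare `assert` for a complete list of failing rows in one run.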