Parse number feature #9

arnavkapoor · 2020-06-18T12:20:56Z

Added the feature 'parse_number' #7 that parses a single number.

noviluni

Good job @arnavkapoor! That was quick!

I added some comments to improve the code, let me know if you need to clarify something.

Don't worry too much if you see a lot of comments. These early stages are really important for establishing a good codebase, as they will have a big impact on the future of this library.

On another note, I would ask you to add a simple one-line docstring to each function explaining what it does. It's not necessary to document the arguments, etc., just the description. That way, the code will be easier to understand and we can find better names for some functions (like check_validity, which I don't think is a good name). If you see that any of these functions does too many things, you can move part of the logic into another function (even if it's not reused) to improve readability (single responsibility).

And again, good job! I think we really have a good PoC! Keep it up! 🚀

noviluni · 2020-06-18T17:02:52Z

number_parser/parser.py

-            if previous_token in MTENS:
-                return False
+        if previous_token in MTENS:
+            return False


There are some unnecessarily nested ifs. It would be better to rewrite them with an and to avoid too many levels:

Example:

    if current_token in HUNDRED and previous_token in MTENS:
        return False
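
A minimal sketch of the flattening suggested above, for comparison. The contents of HUNDRED and MTENS are placeholder assumptions, and `is_valid_nested` / `is_valid_flat` are hypothetical names used only for illustration; they are not the functions in number_parser/parser.py.

```python
# Placeholder word sets (assumed); the real constants in parser.py may differ.
HUNDRED = {"hundred"}
MTENS = {"twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"}


def is_valid_nested(current_token, previous_token):
    """Nested form: two levels of indentation for a single rule."""
    if current_token in HUNDRED:
        if previous_token in MTENS:
            return False
    return True


def is_valid_flat(current_token, previous_token):
    """Flattened form: the same rule expressed with a single `and`."""
    if current_token in HUNDRED and previous_token in MTENS:
        return False
    return True
```

Both functions return the same result for every pair of tokens; the flat version just removes one level of nesting.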

noviluni · 2020-06-18T17:07:28Z

number_parser/parser.py

@@ -75,8 +72,9 @@ def number_builder(token_list):
    value_list = []

    for each_token in token_list:


I would rename each_token to simply token.

noviluni · 2020-06-18T17:14:27Z

number_parser/parser.py

-        # Basically implying beginning of a new number hence resetting values.
+        if each_token.isspace():
+            continue
+        valid = check_validity(each_token, previous_token)
        if not valid:
            total_value += current_grp_value
            value_list.append(str(total_value))


(This comment refers to the code below, but I can't attach a comment there)

A token can't be (or shouldn't be) in more than one category (BASE_NUMBERS, MTENS, HUNDRED, MULTIPLIERS) at the same time, so I would change the if statements to elif to avoid unnecessary additional checks.

Example:

    if each_token in BASE_NUMBERS:
        ...
    elif each_token in MTENS:
        ...
    elif each_token in HUNDRED:
        ...
    elif each_token in MULTIPLIERS:
        ...
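
As a runnable illustration of the elif chain, here is a small sketch. The dictionary contents are assumed placeholders and `classify_token` is a hypothetical helper, not part of the library; the point is only that evaluation stops at the first matching branch.

```python
# Placeholder lookup tables (assumed); the real ones live in parser.py.
BASE_NUMBERS = {"one": 1, "two": 2, "three": 3}
MTENS = {"twenty": 20, "thirty": 30}
HUNDRED = {"hundred": 100}
MULTIPLIERS = {"thousand": 1000, "million": 1000000}


def classify_token(token):
    """Return the name of the category containing the token, or None."""
    kind = None
    if token in BASE_NUMBERS:
        kind = "base"
    elif token in MTENS:        # only checked when the token is not a base number
        kind = "mtens"
    elif token in HUNDRED:      # only checked when neither of the above matched
        kind = "hundred"
    elif token in MULTIPLIERS:
        kind = "multiplier"
    return kind
```

With independent if statements, all four membership tests would run for every token even after a match; with elif, a base number such as "two" is resolved after a single test.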

noviluni · 2020-06-18T17:25:34Z

number_parser/parser.py

-def parse(input_string):
-    # Fails when two numbers have no SENTENCE_SEPERATORS or words between them
-    # eg) 'one two three' doesn't work but 'one,two,three' and 'one apple , two mango , three' works.
+def tokeniser(input_string):


Conceptually, it's not a class/object but a function (an action), so it should be renamed to tokenise. On the other hand, dateparser and other libraries such as nltk use the "tokenize/tokenizer" spelling, so I would change this function name to tokenize().

Sure, makes sense. I will think a bit more before naming variables from now on; I didn't use to give it much thought. Updated.

noviluni · 2020-06-18T17:49:33Z

number_parser/parser.py

+
+def parse_number(input_string):
+    if input_string.isnumeric():
+        return int(input_string)


I think that this approach is not good enough, as it doesn't work with float numbers.

Please, add these examples to the tests and fix it:

    >>> parse_number("1,000,000")
    1000000
    >>> parse_number("4.3")
    4.3
    >>> parse_number("4")
    4
    >>> parse_number("1,000,000.25")
    1000000.25

Something like this could work:

    input_string = input_string.replace(',', '')
    if input_string.replace('.', '').isnumeric():
        try:
            return int(input_string)
        except ValueError:
            return float(input_string)

But it will fail if the input contains multiple points (example: parse_number("10.00.25")). Maybe you can find a better approach. We should probably add a setting or option to change the , and . behavior, as in some locales the comma is used as the decimal separator.

    >>> parse_number("1.000,25", decimal_separator=',')
    1000.25

But we don't need to spend time on this right now. You can open a new issue to handle this in the future.
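
For reference when the follow-up issue is picked up, here is one possible sketch of the idea discussed above. It is not the implementation merged in this PR: the `decimal_separator` parameter is only the option proposed in this comment, returning None for invalid input is an assumption, and `str.isdecimal()` is used instead of `isnumeric()` so that characters `int()` cannot parse are rejected. Negative numbers are not handled.

```python
def parse_number(input_string, decimal_separator="."):
    """Parse a numeric string into an int or float; return None if invalid."""
    # Assumption: the grouping separator is whichever of ',' / '.' is not
    # used as the decimal separator.
    thousands_separator = "," if decimal_separator == "." else "."

    # Drop grouping separators, then normalise the decimal separator to '.'.
    cleaned = input_string.replace(thousands_separator, "")
    cleaned = cleaned.replace(decimal_separator, ".")

    # Reject inputs with more than one decimal point, e.g. "10.00.25".
    if cleaned.count(".") > 1:
        return None
    # Everything apart from the (optional) single point must be a digit.
    if not cleaned.replace(".", "", 1).isdecimal():
        return None

    return float(cleaned) if "." in cleaned else int(cleaned)
```

With this sketch, the four examples requested above give 1000000, 4.3, 4 and 1000000.25, `parse_number("1.000,25", decimal_separator=",")` gives 1000.25, and `parse_number("10.00.25")` gives None instead of raising.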

Yes, I did realize that it won't work with float numbers, but I was planning to fix it in a later iteration once support for decimals is added. The main parse function would need to handle the 'point' keyword, and the number_builder logic would also change. Further modifications would be needed for negative numbers, as ' - ' (the minus sign) would need to be treated separately. I guess it's better to raise a separate issue for decimal support, as you suggested.

You are right, open a new ticket and it will be fixed in a future iteration. If you can, please link to this comment (URL: https://github.com//pull/9#discussion_r442400344).

noviluni · 2020-06-18T17:53:07Z

number_parser/parser.py


+    all_tokens = tokeniser(input_string)
+    for index, each_token in enumerate(all_tokens):


I would rename all_tokens to tokens and each_token to token.

noviluni

I think that we will rewrite most of these functions when implementing the data from CLDR, but for now, I think it works and looks pretty good.

Good job @arnavkapoor! 💪

arnavkapoor added 5 commits June 18, 2020 15:29

removing redundant comments

22fcdfa

added parse_number functionality

f2f7e95

added tests for parse_number

cb42d22

removing redundant prints

70d309c

removing redundant if condition

7455938

arnavkapoor mentioned this pull request Jun 18, 2020

Add a parse_number() function #7

Closed

noviluni suggested changes Jun 18, 2020


arnavkapoor added 2 commits June 20, 2020 02:59

refactoring code for readability / efficiency

1ac4ad1

remove comments , add docstring for main parse fn

7caaf02

arnavkapoor mentioned this pull request Jun 19, 2020

Adding support for decimal and negative numbers #11

Open

noviluni approved these changes Jun 22, 2020


arnavkapoor merged commit a26a452 into master Jun 22, 2020
