## Introduction: Processing Numbers in Text Files

This notebook processes text files to extract numeric sequences and convert them into their word equivalents. 

*Assumptions:*
1. Numeric sequences should be standalone numbers (e.g., "123").
2. Invalid numbers include:
   - Numbers with commas, spaces, or invalid separators (e.g., "1,234", "12.34").
3. The program relies on a regular expression (regex) to identify valid numbers.

*Functionality:*
1. Convert numbers to words using the `number_to_words` function.
2. Process input files with `process_input_file` to extract and validate numbers.

## Imports and Environment Setup

In [23]:
# Import required libraries
import re  # For regular expressions
import sys  # For system information and module management
import os  # For directory management. The `os` module is used to interact with the file system.

# Confirm the module is loaded
print(dir(re))  # Display list of attributes and methods in the 're' module
print(dir(sys))  # Display list of attributes and methods in the 'sys' module

# Display documentation for modules
print(re.__doc__)  # Documentation for the `re` module
print(sys.__doc__)  # Documentation for the `sys` module

#set directory
os.chdir("/Users/paige_macmillan/Documents/Ninety One/Interview Process/Technical Assessment")

['A', 'ASCII', 'DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'Match', 'NOFLAG', 'Pattern', 'RegexFlag', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X', '_MAXCACHE', '_MAXCACHE2', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_cache', '_cache2', '_casefix', '_compile', '_compile_template', '_compiler', '_constants', '_parser', '_pickle', '_special_chars_map', '_sre', 'compile', 'copyreg', 'enum', 'error', 'escape', 'findall', 'finditer', 'fullmatch', 'functools', 'match', 'purge', 'search', 'split', 'sub', 'subn', 'template']
['__breakpointhook__', '__displayhook__', '__doc__', '__excepthook__', '__interactivehook__', '__loader__', '__name__', '__package__', '__spec__', '__stderr__', '__stdin__', '__stdout__', '__unraisablehook__', '_base_executable', '_clear_type_cache', '_current_exceptions', '_current_frames', '_debugmallocstats', '_framework', 

## Core Functionality: Number-to-Words Conversion

**Number-to-Words Conversion Function**
This function converts an integer into its word equivalent. It handles numbers from zero to billions.

In [16]:
#Define Number to Words conversion function
def number_to_words(num):
    """
    Converts an integer to its word equivalent.

    Parameters:
    num (int): The integer to be converted.

    Returns:
    str: The word equivalent of the integer.
    """
    if num == 0:
        return "zero"

    def ones(n): # Base case for single digits
        return ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"][n]

    def teens(n):  # Words for 10 through 19
        return ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
                "seventeen", "eighteen", "nineteen"][n-10]

    def tens(n):
        return ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"][n]

    def helper(n):  # Recursive dispatcher: picks the rule for the range n falls in
        if n < 10:
            return ones(n)
        elif 10 <= n < 20:
            return teens(n)
        elif n < 100:
            return tens(n // 10) + ('' if n % 10 == 0 else '-' + ones(n % 10))
        elif n < 1000:
            return ones(n // 100) + " hundred" + ('' if n % 100 == 0 else ' and ' + helper(n % 100))
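        else:
            for idx, word in enumerate(["thousand", "million", "billion"], 1):
                if n < 1000 ** (idx + 1):
                    return helper(n // (1000 ** idx)) + f" {word}" + \
                           ('' if n % (1000 ** idx) == 0 else ', ' + helper(n % (1000 ** idx)))
            # Values of one trillion or more are outside the supported range
            raise ValueError("number too large to convert")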

    return helper(num)

**number_to_words function**
1. Nested helper functions
   - `ones(n)`, `teens(n)`, and `tens(n)` look up the words for single digits, the teens, and multiples of ten.

2. The main function
   - `helper(n)` isolates the logic for each number range (below 10, the teens, below 100, below 1000, and larger).
   - For thousands, millions, and billions, `helper` calls itself on the leading group and on the remainder, so each call solves a smaller instance of the same problem.
   - Because the same code handles every scale, there is no separate branch for each range to duplicate or maintain.
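
The way the recursion breaks a number into smaller chunks can be illustrated in isolation. The following is a minimal sketch, not part of the notebook's code: `split_into_groups` is a hypothetical helper that peels off three-digit groups by dividing by 1000, which is the same arithmetic `helper` relies on for thousands, millions, and billions.

```python
def split_into_groups(n):
    """Split a non-negative integer into (value, scale_word) pairs, largest scale first.

    Hypothetical helper for illustration only; it mirrors how helper()
    divides by powers of 1000 but is not part of the notebook's code.
    """
    scales = ["", "thousand", "million", "billion"]
    if n < 0 or n >= 1000 ** len(scales):
        raise ValueError("number outside the supported range")

    groups = []
    idx = 0
    while n > 0:
        # Peel off the lowest three digits; what is left moves up one scale
        n, remainder = divmod(n, 1000)
        if remainder:
            groups.append((remainder, scales[idx]))
        idx += 1
    return list(reversed(groups))


print(split_into_groups(234567))  # [(234, 'thousand'), (567, '')]
```

Each pair is a value below 1000 plus its scale word, so only the sub-1000 rules are needed to spell out every group.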

## Core Functionality: Processing Input Files

**Processing Input Files**
This function reads a text file line by line, extracts numeric sequences, validates them, and converts them into words.

In [21]:
#SOLUTION E**
#define function named process_input_file
def process_input_file(input_file):  #function takes a single parameter, input_file
    """
    This function processes the 
    contents of a text file to extract
    and convert numbers into words

    Parameters:
    input_file (str): The path to the text file to be processed.

    Returns:
    None: Prints the word equivalent of valid numbers or 'number invalid' for invalid sequences. 
    """
    with open(input_file, 'r') as lines:  # Open the file in read mode; 'lines' is the file object
        for line in lines:  # Iterate over the file one line at a time
            # Find all occurrences of the regex pattern in the line
            # and store them in the list 'matches'
            matches = re.findall(r'(?<![^\s])\d+(?![^\s.]|\s?\d|\s?,|\.(?!\s|$))', line)

            if matches:  # At least one numeric sequence was found in this line
                for match in matches:
                    try:
                        num = int(match)  # Convert the matched digits to an integer
                        print(number_to_words(num))  # Print the word equivalent
                    except ValueError:  # Conversion failed
                        print("number invalid")
            else:
                # No valid number on this line
                print("number invalid")

## Notes:
- The regex ensures that only valid standalone numbers are matched.
- The `try-except` block ensures that a match which cannot be converted to an integer does not crash the program.
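
To see what the regex accepts and rejects, it can be run on a few strings outside the file-processing loop. This is a small sketch: the pattern is copied from `process_input_file`, while the name `NUMBER_PATTERN` and the sample strings are made up for illustration and are not taken from `test_file_3.txt`.

```python
import re

# Same pattern as in process_input_file, compiled once for reuse
NUMBER_PATTERN = re.compile(r'(?<![^\s])\d+(?![^\s.]|\s?\d|\s?,|\.(?!\s|$))')

# Illustrative inputs only (assumed, not from the real test file)
samples = [
    "123",                # standalone number
    "The total is 400.",  # number followed by a sentence-ending period
    "room 40 and 567",    # two standalone numbers
    "1,234",              # comma separator
    "12.34",              # decimal point
    "abc123",             # digits attached to letters
]

for text in samples:
    print(repr(text), "->", NUMBER_PATTERN.findall(text))
```

The first three samples produce matches (`['123']`, `['400']`, and `['40', '567']`); the last three produce an empty list, which `process_input_file` reports as "number invalid".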

## Execution or "Running the App"

__Two parts:__
   1.  Check that the input file exists.
   2.  Run the application (i.e., process the input file if it was found).


In [27]:
#This section tests if the input file exists and processes it if found.

#Verifies the input file's existence before proceeding to avoid runtime errors.
if os.path.exists("test_file_3.txt"):
    print("File found!")
else:
    print("File not found. Check the file path.")

File found!
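
The existence check and the processing step can also be combined so that processing only runs when the file is present. The sketch below is one possible way to do that, not the notebook's approach: `run_if_exists` is a hypothetical helper, and `processor` stands in for `process_input_file`.

```python
import os


def run_if_exists(path, processor):
    """Call processor(path) only when path is an existing file.

    Hypothetical helper for illustration; `processor` stands in for
    process_input_file. Returns True if the processor was called.
    """
    if not os.path.isfile(path):
        print(f"File not found: {path}. Check the file path.")
        return False
    processor(path)
    return True
```

In the notebook this could be called as `run_if_exists("test_file_3.txt", process_input_file)`, assuming both functions are defined in earlier cells.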


In [29]:
# Run the main function
if __name__ == "__main__":
    # Specify the name of your input file in the current working directory
    input_file = "test_file_3.txt"  
    process_input_file(input_file)

four hundred
number invalid
number invalid
number invalid
seven hundred and eighty-nine
nine thousand, eight hundred and seventy-six
number invalid
five hundred and sixty-seven
one hundred
one hundred and twenty-three
number invalid
number invalid
number invalid
forty
number invalid
two hundred and thirty-four thousand, five hundred and sixty-seven
number invalid
number invalid
number invalid
number invalid


## Appendix 

Instructions necessary to run the Application:

**Optional Tests**

In [103]:
# Test `re` by using a regex function
result = re.match(r"hello", "hello world")
print(result)  # Should print a match object if successful

<re.Match object; span=(0, 5), match='hello'>
