# Researching different Word to Digit Libraries
<b>Goal of the notebook:</b> The goal of this notebook is to look into different libraries that can convert number words to digits. I have found two libraries that look quite promising in this field:
- wordtodigits -> https://pypi.org/project/wordtodigits/
- text_to_num -> https://pypi.org/project/text2num/

I will test each library on a few examples that contain number words, to see how well it converts them to digits.
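Comparing long expected and actual sentences by eye is error-prone, so a small helper that lists only the words that differ can make the comparisons in this notebook easier to read. This is a minimal sketch and not part of either library: `word_diff` is a hypothetical helper name, and it relies only on Python's standard `difflib` module. It works on any pair of strings, so it can be applied to the output of `wordtodigits.convert` or `alpha2digit` alike.

```python
import difflib


def word_diff(expected: str, actual: str) -> list:
    """Return (expected, actual) pairs for every run of words that differs."""
    exp_words = expected.split()
    act_words = actual.split()
    matcher = difflib.SequenceMatcher(a=exp_words, b=act_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            diffs.append((" ".join(exp_words[i1:i2]), " ".join(act_words[j1:j2])))
    return diffs


# Expected vs. actual output for the last sentences of lunch example 1.
expected = "Lunch starts at 12:30. Please also bring some vegetarian sandwiches."
actual = "Lunch starts at 42. Please also bring some vegetarian sandwiches."
for want, got in word_diff(expected, actual):
    print(f"expected {want!r}, got {got!r}")
```

Splitting on whitespace keeps the comparison at word level, which matches how the results are discussed here; punctuation stays attached to its word, so "42." and "42" would count as different.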

# Importing Libraries

In [2]:
import wordtodigits
from text_to_num import alpha2digit

# 1. Working with the wordtodigits Library

## 1.1. Testing the Library
Here I test the wordtodigits library on the example text from the library's documentation to check that it works.

In [3]:
text1 = 'A car accelerates for five point two one seconds for a distance of one hundred and ten meters.'
print(wordtodigits.convert(text1))

A car accelerates for 5.21 seconds for a distance of 110 meters.


As can be seen above, the library successfully converts the number words in the text into digits, including the decimal "five point two one" (5.21).

## 1.2. Lunch Form Full Context
In this section of the notebook I test four lunch-form validation examples that we created, to see how well the library converts the number words.

### 1.2.1. Lunch Example
<b> Context</b>

"I would like to arrange a lunch on __*March the twenty fourth*__ in the main hall. I expect about __*twenty five to thirty*__ people. Lunch starts at __*twelve thirty*__. Please also bring some vegetarian sandwiches."

<b>I expect the text to be converted to the following:</b>

"I would like to arrange a lunch on __*March 24*__ in the main hall. I expect about __*25 to 30*__ people. Lunch starts at __*12:30*__. Please also bring some vegetarian sandwiches."


In [17]:
lunch_eg_1 = 'I would like to arrange a lunch on March the twenty fourth in the main hall. I expect about twenty five to thirty people. Lunch starts at twelve thirty. Please also bring some vegetarian sandwiches.'
print(wordtodigits.convert(lunch_eg_1))

I would like to arrange a lunch on March the 20 fourth in the main hall. I expect about 25 to 30 people. Lunch starts at 42. Please also bring some vegetarian sandwiches.


Let's rephrase the context (splitting "fourth" into "four th" and writing "twelve : thirty") to see whether the library interprets it differently.

In [16]:
lunch_eg_1 = 'I would like to arrange a lunch on March the twenty four th in the main hall. I expect about twenty five to thirty people. Lunch starts at twelve : thirty. Please also bring some vegetarian sandwiches.'
print(wordtodigits.convert(lunch_eg_1))

I would like to arrange a lunch on March the 24 th in the main hall. I expect about 25 to 30 people. Lunch starts at 12 : 30. Please also bring some vegetarian sandwiches.


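As can be seen above, the library converts plain numbers but not dates or times: with the original phrasing, "twenty fourth" became "20 fourth" and "twelve thirty" was summed to 42. Only the unnatural rephrasing produced "24 th" and "12 : 30".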
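One possible workaround for the date problem is to turn spelled-out ordinals into numeric ordinals before the text reaches wordtodigits. The sketch below is an assumption, not a tested part of this notebook: `ordinals_to_digits` is a hypothetical helper, it only covers ordinals up to the thirties (enough for days of the month), and it converts every ordinal word it finds, not only those in dates (for example "first" in "first floor"). We have not checked how wordtodigits treats tokens such as "24th" afterwards.

```python
import re

# Hypothetical pre-processing step: spelled-out ordinals -> numeric ordinals.
UNIT_ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9,
}
STANDALONE_ORDINALS = {
    "tenth": 10, "eleventh": 11, "twelfth": 12, "thirteenth": 13,
    "fourteenth": 14, "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
    "eighteenth": 18, "nineteenth": 19, "twentieth": 20, "thirtieth": 30,
}
TENS = {"twenty": 20, "thirty": 30}

# Optional tens word ("twenty", "thirty") followed by a unit ordinal ("fourth").
COMPOUND = re.compile(
    r"\b(?:(" + "|".join(TENS) + r")[ -])?(" + "|".join(UNIT_ORDINALS) + r")\b",
    re.IGNORECASE,
)
STANDALONE = re.compile(r"\b(" + "|".join(STANDALONE_ORDINALS) + r")\b", re.IGNORECASE)


def ordinal_suffix(n: int) -> str:
    if 10 <= n % 100 <= 20:
        return "th"  # 11th, 12th, 13th, ...
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def ordinals_to_digits(text: str) -> str:
    def compound(match):
        tens = TENS[match.group(1).lower()] if match.group(1) else 0
        n = tens + UNIT_ORDINALS[match.group(2).lower()]
        return f"{n}{ordinal_suffix(n)}"

    def standalone(match):
        n = STANDALONE_ORDINALS[match.group(1).lower()]
        return f"{n}{ordinal_suffix(n)}"

    return STANDALONE.sub(standalone, COMPOUND.sub(compound, text))


print(ordinals_to_digits("a lunch on March the twenty fourth in the main hall"))
```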

### 1.2.2. Lunch Example
<b>Context</b>

"Next Thursday I have an appointment with __*six*__ visitors. I'd like to offer them lunch. I have reserved the room __*one point two*__ for this. Preferably no fish sandwiches. Orange juice is always there, right?"

<b>I expect the text to be converted to the following:</b>

"Next Thursday I have an appointment with __*6*__ visitors. I'd like to offer them lunch. I have reserved the room __*1.2*__ for this. Preferably no fish sandwiches. Orange juice is always there, right?"

In [18]:
lunch_eg_2 = "Next Thursday I have an appointment with six visitors. I'd like to offer them lunch. I have reserved the room one point two for this. Preferably no fish sandwiches. Orange juice is always there, right?"
print(wordtodigits.convert(lunch_eg_2))

Next Thursday I have an appointment with 6 visitors. I'd like to offer them lunch. I have reserved the room 1.2 for this. Preferably no fish sandwiches. Orange juice is always there, right?


As we can see, the numbers in the text above are converted correctly.

### 1.2.3. Lunch Example
<b>Context</b>

"For an event with <b>ninety </b>students I would like to organize a meeting with some drinks and snacks. It is scheduled for <b>March thirtieth</b> in the auditorium. The drink is at <b>four pm</b>, but the event starts at <b>three pm</b>. It may cost a maximum of <b>five hundred euros</b>, will that work?"


<b>I expect the text to be converted to the following:</b>

"For an event with <b>90</b> students I would like to organize a meeting with some drinks and snacks. It is scheduled for <b>March 30 </b>in the auditorium. The drink is at <b>16:00</b>, but the event starts at <b>15:00</b>. It may cost a maximum of <b> 500 euros</b>, will that work?"

In [20]:
lunch_eg_3 = "For an event with ninety students I would like to organize a meeting with some drinks and snacks. It is scheduled for March thirtieth in the auditorium. The drink is at four pm, but the event starts at three pm. It may cost a maximum of five hundred euros, will that work?"
print(wordtodigits.convert(lunch_eg_3))

For an event with 90 students I would like to organize a meeting with some drinks and snacks. It is scheduled for March thirtieth in the auditorium. The drink is at 4 pm, but the event starts at 3 pm. It may cost a maximum of 500 euros, will that work?


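This text contains five numbers in total. Three of them are part of a date or a time, which we already observed the library cannot handle: "thirtieth" is left as a word, and "four pm" and "three pm" become "4 pm" and "3 pm" rather than 16:00 and 15:00. The other two number words (ninety and five hundred) are converted successfully.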
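The am/pm times are the part of this example where the converted output ("4 pm", "3 pm") is closest to what we want. A hypothetical post-processing step, sketched below with Python's `re` module, could rewrite such times into the 24-hour format of the expected output. `to_24h` is an illustrative name, and the sketch assumes times are always written as a number followed by "am" or "pm".

```python
import re

# A 1-2 digit hour followed by "am" or "pm", e.g. "4 pm" or "11am".
CLOCK_12H = re.compile(r"\b(\d{1,2})\s*(am|pm)\b", re.IGNORECASE)


def to_24h(text: str) -> str:
    """Rewrite times such as '4 pm' as '16:00'; other text is left unchanged."""

    def repl(match):
        hour = int(match.group(1))
        if not 1 <= hour <= 12:
            return match.group(0)  # not a valid 12-hour clock value
        if match.group(2).lower() == "pm":
            hour = hour % 12 + 12  # 12 pm -> 12, 4 pm -> 16
        else:
            hour = hour % 12  # 12 am -> 0, 4 am -> 4
        return f"{hour:02d}:00"

    return CLOCK_12H.sub(repl, text)


print(to_24h("The drink is at 4 pm, but the event starts at 3 pm."))
```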

### 1.2.4. Lunch Example
<b>Context</b>

"Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, __*the twenty fourth of March, two thousand and twenty three*__, from __*twelve noon until one o'clock*__ in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid __*one hundred fifty three, Five Thousand six hundred fifty one GW*__ Eindhoven. We have a total of __*fifteen*__ people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."


<b>I expect the text to be converted to the following:</b>
    
"Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, March <b>24th, 2023</b>, from <b>12:00 p.m.</b> until <b>1:00 p.m.</b> The event will take place at Fontys TQ, located at Achtseweg Zuid 153, 5651 GW Eindhoven. We have a total of <b>15</b> people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."

In [29]:
lunch_eg_4 = "Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, the twenty fourth of March, two thousand and twenty three, from twelve noon until one o'clock in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid one hundred fifty three, Five Thousand six hundred fifty one GW Eindhoven. We have a total of fifteen people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."
print(wordtodigits.convert(lunch_eg_4))

Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, the 20 fourth of March, 2020 three, from 12 noon until 1 o'clock in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid 150 three, 5651 GW Eindhoven. We have a total of 15 people attending, some of whom have allergies and special dietary preferences. 2 of our members require a halal meal, and another requires a vegan meal. Additionally, 1 of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options.


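As stated, the library does not handle times, but it also mishandles the street number: the final <b>three</b> in "one hundred fifty three" is not merged into the number, giving "150 three". The year has the same problem ("2020 three"). In both cases the number word is directly followed by a comma, which appears to be the cause, as the example below without the commas shows.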

In [28]:
lunch_eg_4 = "Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday the twenty fourth of March two thousand and twenty three from twelve noon until one o'clock in the afternoon. The event will take place at Fontys TQ located at Achtseweg Zuid one hundred fifty three Five Thousand six hundred fifty one GW Eindhoven. We have a total of fifteen people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."
print(wordtodigits.convert(lunch_eg_4))

Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday the 20 fourth of March 2023 from 12 noon until 1 o'clock in the afternoon. The event will take place at Fontys TQ located at Achtseweg Zuid 158651 GW Eindhoven. We have a total of 15 people attending, some of whom have allergies and special dietary preferences. 2 of our members require a halal meal, and another requires a vegan meal. Additionally, 1 of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options.


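Removing the commas fixes the year (2023), but now the street number and the postcode run together and are merged into a single wrong number ("158651"). As a workaround, I rephrased the sentence by adding the word <b>postcode</b> between them.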

In [31]:
lunch_eg_4 = "Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday the twenty fourth of March two thousand and twenty three from twelve noon until one o'clock in the afternoon. The event will take place at Fontys TQ located at Achtseweg Zuid one hundred fifty three postcode Five Thousand six hundred fifty one GW Eindhoven. We have a total of fifteen people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."
print(wordtodigits.convert(lunch_eg_4))

Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday the 20 fourth of March 2023 from 12 noon until 1 o'clock in the afternoon. The event will take place at Fontys TQ located at Achtseweg Zuid 153 postcode 5651 GW Eindhoven. We have a total of 15 people attending, some of whom have allergies and special dietary preferences. 2 of our members require a halal meal, and another requires a vegan meal. Additionally, 1 of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options.


After removing the commas and rephrasing the sentence, the library converts the street number (153) and the postcode (5651) correctly. The date is still not handled, though: "twenty fourth" still becomes "20 fourth".
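The merged number "158651" above is the kind of error that can slip through unnoticed. One simple safeguard, sketched here as an assumption rather than part of the tested workflow, is to check that the converted text still contains a postcode in the Dutch format: four digits (the first never 0), an optional space, then two capital letters. `find_postcodes` is a hypothetical helper built on Python's `re` module, and the pattern is a simplification of the real postcode rules.

```python
import re

# Simplified Dutch postcode: four digits (the first is never 0), an optional
# space, then two capital letters, e.g. "5651 GW".
DUTCH_POSTCODE = re.compile(r"\b[1-9]\d{3} ?[A-Z]{2}\b")


def find_postcodes(text: str) -> list:
    """Return every Dutch-style postcode found in the (converted) text."""
    return DUTCH_POSTCODE.findall(text)


print(find_postcodes("located at Achtseweg Zuid 153 postcode 5651 GW Eindhoven."))
print(find_postcodes("located at Achtseweg Zuid 158651 GW Eindhoven."))
```

An empty result for an address that should contain a postcode signals that the conversion merged or dropped digits.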

## 1.3. Security Form Full Context

In this section of the notebook I test one security-form validation example that we created, to see how well the library converts the number words.

### 1.3.1. Security Example

I expect the text to be converted to the following:
    
"I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is 0000."

In [None]:
security_eg_1 = "I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is zero zero zero zero."
print(wordtodigits.convert(security_eg_1))

I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is 0.


In [None]:
security_eg_1 = "I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is zero , zero , zero , zero."
print(wordtodigits.convert(security_eg_1))

I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is 0 , 0 , 0 , 0.


As we can see from the outputs above, the library does not convert four spoken zeros into the four-digit code 0000: written without separators they collapse into a single 0, and separated by commas they stay four separate zeros ("0 , 0 , 0 , 0").
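A possible pre-processing step is to join runs of spoken digits into one digit string before calling the library. The sketch below is a hypothetical helper (`join_digit_runs`), not a feature of wordtodigits. It only joins runs of three or more digit words, so that "three, Five Thousand" in the lunch address is not merged into "35"; lists such as "one, two, three" would still be joined, so this is a heuristic.

```python
import re

DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
DIGIT_WORD = "(?:" + "|".join(DIGITS) + ")"
# Three or more digit words in a row, separated by spaces or by commas.
DIGIT_RUN = re.compile(
    rf"\b{DIGIT_WORD}(?:(?:\s+|\s*,\s*){DIGIT_WORD}){{2,}}\b", re.IGNORECASE
)


def join_digit_runs(text: str) -> str:
    """Turn 'zero zero zero zero' into '0000' (hypothetical pre-processing step)."""

    def repl(match):
        words = re.findall(DIGIT_WORD, match.group(0), flags=re.IGNORECASE)
        return "".join(DIGITS[word.lower()] for word in words)

    return DIGIT_RUN.sub(repl, text)


print(join_digit_runs("The stick is protected with a password, but that is zero zero zero zero."))
```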

## 1.4. Conclusion / Findings
The wordtodigits library fails to convert number words correctly in the following cases:
- A <b>, (comma)</b> directly follows the number word.
- Times.
- Dates.
- PIN codes consisting of multiple zeros.
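Several of these failures leave the same trace in the output: a run of digits immediately followed by a number word ("20 fourth", "2020 three", "150 three"). A heuristic check for this pattern, sketched below, could flag such outputs automatically. `find_partial_conversions` is a hypothetical helper; it will not catch every failure (for example the summed time "42" or the collapsed "0"), and it can give false positives such as "3 first".

```python
import re

NUMBER_WORDS = (
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty "
    "thirty forty fifty sixty seventy eighty ninety hundred thousand "
    "first second third fourth fifth sixth seventh eighth ninth "
    "tenth twentieth thirtieth"
).split()

# Digits immediately followed by a spelled-out number word, e.g. "20 fourth".
PARTIAL = re.compile(r"\b\d+ (?:" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def find_partial_conversions(text: str) -> list:
    """Return spots where a converter appears to have stopped half-way through a number."""
    return PARTIAL.findall(text)


output = (
    "this Friday, the 20 fourth of March, 2020 three, from 12 noon until 1 o'clock "
    "in the afternoon. The event will take place at Fontys TQ, located at "
    "Achtseweg Zuid 150 three, 5651 GW Eindhoven."
)
print(find_partial_conversions(output))
```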

# 2. Working with text_to_num Library

## 2.1. Testing the Library
Here I test the text_to_num library on the example text from the library's documentation to check that it works.

In [41]:
# Example text from the text2num documentation (French); the "..." REPL prompts are removed so the cell runs
sentence = (
    "Huit cent quarante-deux pommes, vingt-cinq chiens, mille trois chevaux, "
    "douze mille six cent quatre-vingt-dix-huit clous.\n"
    "Quatre-vingt-quinze vaut nonante-cinq. On tolère l'absence de tirets avant les unités : "
    "soixante seize vaut septante six.\n"
    "Nombres en série : douze quinze zéro zéro quatre vingt cinquante-deux cent trois cinquante deux "
    "trente et un.\n"
    "Ordinaux: cinquième troisième vingt et unième centième mille deux cent trentième.\n"
    "Décimaux: douze virgule quatre-vingt dix-neuf, cent vingt virgule zéro cinq ; "
    "mais soixante zéro deux."
)

In [43]:
print(alpha2digit(sentence, "fr", ordinal_threshold=0))

842 pommes, 25 chiens, 1003 chevaux, 12698 clous.
95 vaut 95. On tolère l'absence de tirets avant les unités : 76 vaut 76.
Nombres en série : 12 15 004 20 52 103 52 31.
Ordinaux: 5ème 3ème 21ème 100ème 1230ème.
Décimaux: 12,99, 120,05 ; mais 60 02.


In [44]:
sentence = (
    "Eight hundred and forty-two apples, twenty-five dogs, one thousand three horses, twelve thousand six hundred and ninety-eight nails."
 )

In [45]:
print(alpha2digit(sentence, "en", ordinal_threshold=0))

800 and 42 apples, 25 dogs, 1003 horses, 12600 and 98 nails.


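The library converts most of the number words to digits, but in English it does not combine numbers joined by "and": "Eight hundred and forty-two" became "800 and 42" and "twelve thousand six hundred and ninety-eight" became "12600 and 98". Besides English, it also supports other languages, including Russian, French, Spanish, Portuguese and German.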
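Since the "and" is what breaks these English numbers, one option is to remove an "and" that sits directly between a scale word ("hundred", "thousand", "million") and another number word before calling `alpha2digit`. This is a hypothetical pre-processing helper (`drop_number_and`), not part of text_to_num. In the lunch example 4 test later in this notebook, the year written without "and" ("two thousand twenty three") was converted to 2023, which suggests the approach could help, but we have not tested it on the sentences above.

```python
import re

SCALE_WORD = r"(?:hundred|thousand|million)"
NUMBER_WORD = (
    r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|"
    r"thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|"
    r"twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)"
)
# An "and" directly between a scale word and another number word.
AND_IN_NUMBER = re.compile(rf"\b({SCALE_WORD})\s+and\s+(?={NUMBER_WORD}\b)", re.IGNORECASE)


def drop_number_and(text: str) -> str:
    """'eight hundred and forty-two' -> 'eight hundred forty-two' (hypothetical pre-processing)."""
    return AND_IN_NUMBER.sub(r"\1 ", text)


print(drop_number_and("Eight hundred and forty-two apples, twenty-five dogs."))
```

Ordinary uses of "and", such as "bread and butter", are left untouched because the pattern requires a scale word before it and a number word after it.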

## 2.2. Lunch Form Full Context
In this section of the notebook I test four lunch-form validation examples that we created, to see how well the library converts the number words.

### 2.2.1. Lunch Example
<b> Context</b>

"I would like to arrange a lunch on __*March the twenty fourth*__ in the main hall. I expect about __*twenty five to thirty*__ people. Lunch starts at __*twelve thirty*__. Please also bring some vegetarian sandwiches."

<b>I expect the text to be converted to the following:</b>

"I would like to arrange a lunch on __*March 24*__ in the main hall. I expect about __*25 to 30*__ people. Lunch starts at __*12:30*__. Please also bring some vegetarian sandwiches."


In [10]:
lunch_eg_1 = 'I would like to arrange a lunch on March the twenty fourth in the main hall. I expect about twenty five to thirty people. Lunch starts at twelve thirty. Please also bring some vegetarian sandwiches.'
print(alpha2digit(lunch_eg_1, "en", ordinal_threshold=0))

I would like to arrange a lunch on March the 24th in the main hall. I expect about 25 to 30 people. Lunch starts at 12 30. Please also bring some vegetarian sandwiches.


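Not only is the library able to convert number words to digits, it also handles the date better than wordtodigits: "twenty fourth" becomes "24th" instead of "20 fourth", and "twelve thirty" becomes "12 30" instead of being summed to 42. The time is still not written as 12:30, though.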
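Because alpha2digit keeps the hour and minutes apart ("12 30"), a small post-processing step could join them into clock notation. The sketch below is hypothetical (`join_split_times` is an illustrative name) and assumes a time is introduced by "at", "from" or "until" and that the minutes are always written with two digits; it leaves other number pairs, such as "25 to 30", alone.

```python
import re

# "at 12 30" -> "at 12:30": an hour and a two-digit minute value that were
# left as two separate numbers after "at", "from" or "until".
SPLIT_TIME = re.compile(r"\b(at|from|until) (\d{1,2}) (\d{2})\b", re.IGNORECASE)


def join_split_times(text: str) -> str:
    def repl(match):
        hour, minute = int(match.group(2)), int(match.group(3))
        if hour > 23 or minute > 59:
            return match.group(0)  # not a plausible clock time; leave it alone
        return f"{match.group(1)} {hour}:{match.group(3)}"

    return SPLIT_TIME.sub(repl, text)


print(join_split_times("Lunch starts at 12 30. Please also bring some vegetarian sandwiches."))
```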

### 2.2.2. Lunch Example
<b>Context</b>

"Next Thursday I have an appointment with __*six*__ visitors. I'd like to offer them lunch. I have reserved the room __*one point two*__ for this. Preferably no fish sandwiches. Orange juice is always there, right?"

<b>I expect the text to be converted to the following:</b>

"Next Thursday I have an appointment with __*6*__ visitors. I'd like to offer them lunch. I have reserved the room __*1.2*__ for this. Preferably no fish sandwiches. Orange juice is always there, right?"

In [13]:
lunch_eg_2 = "Next Thursday I have an appointment with six visitors. I'd like to offer them lunch. I have reserved the room one point two for this. Preferably no fish sandwiches. Orange juice is always there, right?"
print(alpha2digit(lunch_eg_2, "en", ordinal_threshold=0))

Next Thursday I have an appointment with 6 visitors. I'd like to offer them lunch. I have reserved the room 1.2 for this. Preferably no fish sandwiches. Orange juice is always there, right?


As we can see, the numbers in the text above are converted correctly.

### 2.2.3. Lunch Example
<b>Context</b>

"For an event with <b>ninety </b>students I would like to organize a meeting with some drinks and snacks. It is scheduled for <b>March thirtieth</b> in the auditorium. The drink is at <b>four pm</b>, but the event starts at <b>three pm</b>. It may cost a maximum of <b>five hundred euros</b>, will that work?"


<b>I expect the text to be converted to the following:</b>

"For an event with <b>90</b> students I would like to organize a meeting with some drinks and snacks. It is scheduled for <b>March 30 </b>in the auditorium. The drink is at <b>16:00</b>, but the event starts at <b>15:00</b>. It may cost a maximum of <b> 500 euros</b>, will that work?"

In [16]:
lunch_eg_3 = "For an event with ninety students I would like to organize a meeting with some drinks and snacks. It is scheduled for March thirtieth in the auditorium. The drink is at four pm, but the event starts at three pm. It may cost a maximum of five hundred euros, will that work?"
print(alpha2digit(lunch_eg_3, "en", ordinal_threshold=0))

For an event with 90 students I would like to organize a meeting with some drinks and snacks. It is scheduled for March 30th in the auditorium. The drink is at 4 pm, but the event starts at 3 pm. It may cost a maximum of 500 euros, will that work?


### 2.2.4. Lunch Example
<b>Context</b>

"Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, __*the twenty fourth of March, two thousand and twenty three*__, from __*twelve noon until one o'clock*__ in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid __*one hundred fifty three, Five Thousand six hundred fifty one GW*__ Eindhoven. We have a total of __*fifteen*__ people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."


<b>I expect the text to be converted to the following:</b>
    
"Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, March <b>24th, 2023</b>, from <b>12:00 p.m.</b> until <b>1:00 p.m.</b> The event will take place at Fontys TQ, located at Achtseweg Zuid 153, 5651 GW Eindhoven. We have a total of <b>15</b> people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."

In [20]:
lunch_eg_4 = "Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, the twenty fourth of March, two thousand twenty three, from twelve noon until one o'clock in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid one hundred fifty three, Five Thousand six hundred fifty one GW Eindhoven. We have a total of fifteen people attending, some of whom have allergies and special dietary preferences. Two of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options."
print(wordtodigits.convert(lunch_eg_4))

Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, the 20 fourth of March, 2020 three, from 12 noon until 1 o'clock in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid 150 three, 5651 GW Eindhoven. We have a total of 15 people attending, some of whom have allergies and special dietary preferences. 2 of our members require a halal meal, and another requires a vegan meal. Additionally, 1 of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options.


In [21]:
print(alpha2digit(lunch_eg_4, "en", ordinal_threshold=0))

Hello, my name is Jane Doe and I am from the Software department at Fontys Applied University. I would like to organize a lunch event with my team this Friday, the 24th of March, 2023, from 12 noon until one o'clock in the afternoon. The event will take place at Fontys TQ, located at Achtseweg Zuid 153, 5651 GW Eindhoven. We have a total of 15 people attending, some of whom have allergies and special dietary preferences. 2 of our members require a halal meal, and another requires a vegan meal. Additionally, one of our members is allergic to mushrooms, so we would like to request that mushrooms be excluded from the menu options.


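As we can see above, alpha2digit converts the date (24th of March, 2023), the street number (153), the postcode (5651) and the number of attendees (15) correctly, even with the commas in place. The exceptions are the time <b>one o'clock</b>, which is left as words, and "one of our members", which is also not converted. Note that in this cell the year was written as "two thousand twenty three", without "and".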
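The remaining "one o'clock" could be handled by a small post-processing step. The sketch below is a hypothetical helper (`normalize_oclock`) that accepts both the spelled-out form left by alpha2digit and the "1 o'clock" form produced by wordtodigits. It does not decide between a.m. and p.m.; phrases such as "in the afternoon" are left as they are.

```python
import re

HOUR_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
# Either a spelled-out hour or a 1-2 digit hour, followed by "o'clock".
OCLOCK = re.compile(r"\b(" + "|".join(HOUR_WORDS) + r"|\d{1,2}) o'clock\b", re.IGNORECASE)


def normalize_oclock(text: str) -> str:
    """Rewrite "one o'clock" or "1 o'clock" as "1:00" (hypothetical post-processing)."""

    def repl(match):
        token = match.group(1).lower()
        hour = HOUR_WORDS[token] if token in HOUR_WORDS else int(token)
        if not 1 <= hour <= 12:
            return match.group(0)  # not a valid clock hour
        return f"{hour}:00"

    return OCLOCK.sub(repl, text)


print(normalize_oclock("from 12 noon until one o'clock in the afternoon"))
```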

## 2.3. Security Form Full Context

In this section of the notebook I test one security-form validation example that we created, to see how well the library converts the number words.

### 2.3.1. Security Example

I expect the text to be converted to the following:
    
"I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is 0000."

In [24]:
security_eg_1 = "I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is zero zero zero zero."
print(alpha2digit(security_eg_1, "en", ordinal_threshold=0))

I lost my USB stick, maybe I lost it, but I'm not sure. I used it just yesterday, I think it fell out of my coat pocket. It contains very sensitive information. The stick is protected with a password, but that is 0000.


Unlike wordtodigits, this library converts the four spoken zeros into the four-digit code 0000.

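## 2.4. Conclusion / Findings
The text_to_num library converted number words to digits in almost all of our test cases, provided the phrasing was suitable. Unlike wordtodigits, it can mostly handle:
- A <b>, (comma)</b> directly after the number word.
- Times, although "twelve thirty" becomes "12 30" rather than 12:30 and "one o'clock" is left as words.
- Dates, including ordinals ("24th", "30th") and years ("2023").
- PIN codes consisting of multiple zeros ("0000").

Its main limitation in English is that numbers joined with "and" are not combined ("Eight hundred and forty-two" became "800 and 42").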

# Overall Conclusion
To conclude, comparing both libraries on different edge-case examples from the Lunch and Security forms shows that the text_to_num library handles these edge cases better than the wordtodigits library.
Although converting times and dates was not the goal of this research, it handles most of them, and it also copes with commas directly after number words. Overall it performed best in our tests, with the added benefit of supporting multiple languages.