# Example usage

## `count_keywords`

`count_keywords` counts how often each keyword occurs in a text. The caller supplies the text as a string and the keywords as a list of strings. For this demonstration, we use the Unlimited Blade Works chant from the popular anime series Fate/Stay Night, with some adjustments to show off particular features of the function.

In [10]:
from wordwright.count_keywords import count_keywords

ubw_chant = '''I am the Bone of my Sword.
               Steel is My Body and Fire is my Blood. 
               I've created over a Thousand Blades, 
               Unknown to Death, 
               Nor known to Life. 
               Have withstood Pain to create many Weapons.
               Yet those Hands will never hold Anything.
               So, as I Pray--
               Unlimited Blade Works.'''
count_keywords(ubw_chant, ['my', 'life', 'Pray', "i've"])

{'my': 3, 'life': 1, 'pray': 1, "i've": 1}

The function counts these words correctly. Notice that it forces lower case and ignores punctuation, except for the apostrophes used in contractions. This is because the function first preprocesses the input, converting it to lower case and removing all punctuation other than apostrophes, before it counts words. Hyphens are removed too, so two words joined by a hyphen merge into one (for example, "to-do" is treated as "todo"). These rules apply to both the text and the keywords:

In [15]:
test = "I can't seem to find the to-do list, cant you help me find the todo list?"
count_keywords(test, ["can't", 'todo'])

{"can't": 1, 'todo': 2}

Here we see that "can't" is treated as a different word from "cant" (a count of 1), whereas "to-do" is treated as the same word as "todo" (a count of 2).
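The preprocessing rule described above can be approximated in a few lines. This is only a minimal sketch, not wordwright's actual implementation: the helper name `normalize` is hypothetical, and it assumes that "punctuation" means the characters in Python's `string.punctuation`.

```python
import string

# Hypothetical helper, not part of wordwright.
# Translation table that deletes every punctuation character except the apostrophe.
_DROP = str.maketrans("", "", string.punctuation.replace("'", ""))


def normalize(s: str) -> str:
    """Lower-case `s` and remove all punctuation other than apostrophes."""
    return s.lower().translate(_DROP)


print(normalize("to-do"))  # todo
print(normalize("Can't"))  # can't
print(normalize("cant"))   # cant
```

Because the hyphen is deleted rather than replaced with a space, "to-do" collapses into "todo", while the preserved apostrophe keeps "can't" distinct from "cant".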

If the same keyword appears more than once in the keyword list (after preprocessing), the output shows the count for that word only once. Its position in the output is determined by the word's first appearance in the keyword list:

In [16]:
test = 'Where can I find a phone?'
count_keywords(test, ['find', 'phone', 'phonE!', 'phone'])

{'find': 1, 'phone': 1}
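One way to get this "first appearance wins" ordering is to rely on the insertion order of a Python dict. The sketch below is an assumption about how such deduplication could be done, not a description of the library's code; `unique_keywords` is a hypothetical name.

```python
import string

# Hypothetical helper, not part of wordwright.
# Deletes every punctuation character except the apostrophe.
_DROP = str.maketrans("", "", string.punctuation.replace("'", ""))


def unique_keywords(keywords):
    """Normalize each keyword, then drop duplicates while keeping first-seen order."""
    cleaned = (k.lower().translate(_DROP).strip() for k in keywords)
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(cleaned))


print(unique_keywords(['find', 'phone', 'phonE!', 'phone']))  # ['find', 'phone']
```

Here 'phonE!' and the second 'phone' both normalize to 'phone', so they collapse into the entry created by the first 'phone'.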

Another important case is a string that contains only white space or punctuation. Because word counts are evaluated after preprocessing, the function treats such strings as empty:

In [18]:
test = 'abcdefg.'
count_keywords(test, [' ', '', "'-["])

{'': 0}

We see that a string containing only punctuation is treated as an empty string, even if it includes an apostrophe. Since all three keywords are considered empty strings, the output shows the count for the empty string only once. The same applies to the text:

In [19]:
test = '     /....,.!'
count_keywords(test, ['', 'a', 'b'])

{'': 0, 'a': 0, 'b': 0}

Here the text itself contains only white space and punctuation, so the function treats it as an empty string and evaluates every count as 0, including that of the empty string.
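Putting the rules so far together, the behaviour shown in these examples can be reproduced with a short stand-in. This is a hedged sketch under stated assumptions, not the real `count_keywords`: the names `_clean` and `count_keywords_sketch` are hypothetical, the text is assumed to be split on white space, and any string left without a letter or digit after cleaning is assumed to count as empty.

```python
import string
from collections import Counter

# Hypothetical stand-in for illustration; wordwright's own code may differ.
# Deletes every punctuation character except the apostrophe.
_DROP = str.maketrans("", "", string.punctuation.replace("'", ""))


def _clean(s):
    """Lower-case, strip punctuation (except apostrophes) and outer white space."""
    cleaned = s.lower().translate(_DROP).strip()
    # Nothing but apostrophes or white space left: treat the string as empty.
    return cleaned if any(ch.isalnum() for ch in cleaned) else ""


def count_keywords_sketch(text, keywords):
    # Clean each white-space-separated word and discard the ones that end up empty.
    tokens = [t for t in (_clean(w) for w in text.split()) if t]
    counts = Counter(tokens)
    # dict.fromkeys deduplicates keywords while keeping first-seen order.
    unique = dict.fromkeys(_clean(k) for k in keywords)
    return {k: counts[k] for k in unique}


print(count_keywords_sketch('abcdefg.', [' ', '', "'-["]))     # {'': 0}
print(count_keywords_sketch('     /....,.!', ['', 'a', 'b']))  # {'': 0, 'a': 0, 'b': 0}
```

Because empty tokens are discarded before counting, the empty-string keyword always reports 0 in this sketch, which matches the outputs above.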

Finally, a word trapped inside punctuation is extracted and then evaluated like any other word. This applies to both the text and the keywords:

In [20]:
test = '!y-o,u'
count_keywords(test, ['y.o.u!!!!!'])

{'you': 1}

We see that the word "you" is trapped within punctuation, but the function extracts the letters and pieces the word back together.