### [Most Common Word](https://leetcode.com/problems/most-common-word/)

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words.  It is guaranteed there is at least one word that isn't banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation.  Words in the paragraph are not case sensitive.  The answer is in lowercase.

**Example:**
```
Input: 
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.
```

**Note:**
```
1 <= paragraph.length <= 1000.
0 <= banned.length <= 100.
1 <= banned[i].length <= 10.
The answer is unique, and written in lowercase (even if its occurrences in paragraph may have uppercase symbols, and even if it is a proper noun.)
paragraph only consists of letters, spaces, or the punctuation symbols !?',;.
There are no hyphens or hyphenated words.
Words only consist of letters, never apostrophes or other punctuation symbols.
```

In [1]:
class Solution:
    def mostCommonWord(self, paragraph, banned):
        """
        :type paragraph: str
        :type banned: List[str]
        :rtype: str
        """
        
        # Inputs: a paragraph and a list of banned words.
        # Goal: the most frequent word in the paragraph that is not banned.
        
        # The paragraph may contain punctuation, and its words are not case
        # sensitive, so it can mix lowercase and uppercase letters.
        
        # brute force:
        #   lowercase the paragraph
        #   split it into words on whitespace and the punctuation [!?',;.]
        #   update counter: word -> count
        
        #   sort the words by their frequency in descending order
        #   return the first word that is not banned (a set makes this lookup fast)
        
        # edge cases
        #   empty paragraph (ruled out by the constraint paragraph.length >= 1)
        #   empty list of banned words
        
        
        # we have to convert the paragraph into lower case for comparisons
        paragraph = paragraph.lower()
        
        # convert banned words into a set for faster comparisons
        banned_words = set(banned)
        
        # punctuation symbols that may appear in the paragraph
        punctuations = set("!?',;.")
        counter = {}
        
        # count the frequency of individual words
        start = 0
        for index, char in enumerate(paragraph):
            if char.isspace() or char in punctuations:
                word = paragraph[start:index]
                if word:
                    counter[word] = counter.get(word, 0) + 1
                start = index + 1
            
        # make sure the last word is accounted for
        word = paragraph[start:]
        if word:
            counter[word] = counter.get(word, 0) + 1
            
        # find the most common word that is not in the banned words
        for word in sorted(counter, key=lambda x: counter[x], reverse=True):
            if word not in banned_words:
                return word
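
The final step above sorts every word by frequency just to take the first non-banned one. As a hedged alternative (not the approach used in this cell), a single pass over the counts gives the same result without sorting. `pick_most_frequent` is a hypothetical helper name introduced only for this sketch.

```python
def pick_most_frequent(counter, banned_words):
    # Hypothetical helper: scan the counts once and keep the best
    # non-banned word, instead of sorting every word by frequency.
    best_word, best_count = None, 0
    for word, count in counter.items():
        if word not in banned_words and count > best_count:
            best_word, best_count = word, count
    return best_word


# Partial counts taken from the example paragraph
counts = {"bob": 1, "hit": 3, "ball": 2, "a": 1}
print(pick_most_frequent(counts, {"hit"}))  # ball
```

This runs in time linear in the number of distinct words, whereas the sort is O(n log n). With at most 1000 characters of input the difference is negligible, so this is a matter of taste.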
        

The first solution scans the paragraph by hand, without helpers such as `re` or `collections.Counter`. After reading some posts in the discussion forum, I found a much more concise way to do this using regular expressions.
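
To illustrate what the regular expression buys us, here is a minimal sketch of the word-splitting step on the example paragraph. `tokenize` is a hypothetical helper name used only for this illustration; it assumes, as the problem guarantees, that words consist of letters only.

```python
import re


def tokenize(paragraph):
    # Hypothetical helper: lowercase the text, then pull out every maximal
    # run of word characters (\w matches letters, digits, and underscore).
    return re.findall(r"\w+", paragraph.lower())


words = tokenize("Bob hit a ball, the hit BALL flew far after it was hit.")
print(words)
# ['bob', 'hit', 'a', 'ball', 'the', 'hit', 'ball', 'flew', 'far', 'after', 'it', 'was', 'hit']
```

The single `findall` call replaces the manual index bookkeeping: punctuation and spaces are never matched by `\w+`, so they act as separators automatically.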

In [3]:
import re
from collections import Counter
from typing import List

class Solution:
    
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        
        # edge cases
        if not paragraph:
            return ""
         
        # convert banned_words into set for easy look up
        banned_words = set(banned)
        
        # lowercase the paragraph and extract every run of word characters
        words = re.findall(r'\w+', paragraph.lower())
        # count only the words that are not banned
        word_counter = Counter(word for word in words if word not in banned_words)
        
        # we could iterate over every word in the counter to find the most frequent one,
        # or use the built-in max function.
        return max(word_counter, key=word_counter.get)
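
As a quick sanity check, the sketch below restates the regex approach as a standalone function and runs it on the example from the problem statement. `most_common_word` is a hypothetical name used only here, and `Counter.most_common(1)` is shown as one possible alternative to `max`; it assumes at least one non-banned word exists, as the problem guarantees.

```python
import re
from collections import Counter


def most_common_word(paragraph, banned):
    banned_words = set(banned)
    words = re.findall(r"\w+", paragraph.lower())
    counts = Counter(w for w in words if w not in banned_words)
    # most_common(1) returns a one-element list of (word, count) pairs
    return counts.most_common(1)[0][0]


paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
print(most_common_word(paragraph, ["hit"]))  # ball
```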