# A program to print the top 5 most used words in a paragraph

We will write a Python program that prints the five most common words in a paragraph.

For example:

**Input**

```
When programmable computers were first created, they rapidly overtook humans in solving problems that could be described by a list of formal mathematical rules, such as crunching numbers. The main obstacle to computers and artificial intelligence proved to be the tasks that are easy for human beings but difficult to formalize as a set of mathematical rules. The tasks such as recognizing spoken words or differentiating objects in images require intuition and do not translate to simple mathematical rules.
```

**Output**

```
"to": 4 times.
"mathematical": 3 times.
"rules": 3 times.
"as": 3 times.
"the": 3 times.
```


## Simple and understandable solution

In [1]:
alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
current_word = ""
word_counter = {}
paragraph = input().lower() + ' '
for character in paragraph:
    if character in alphabets:
        current_word += character
    elif len(current_word)>1 or current_word == "a":
        if word_counter.get(current_word):
            word_counter[current_word] += 1
        else:
            word_counter[current_word] = 1
        current_word = ""
sorted_counter = sorted(word_counter.items(), key=lambda x:x[1], reverse = True)
for word, frequency in sorted_counter[:5]:
    print(f'"{word}": {frequency} times.')


When programmable computers were first created, they rapidly overtook humans in solving problems that could be described by a list of formal mathematical rules, such as crunching numbers. The main obstacle to computers and artificial intelligence proved to be the tasks that are easy for human beings but difficult to formalize as a set of mathematical rules. The tasks such as recognizing spoken words or differentiating objects in images require intuition and do not translate to simple mathematical rules.
"to": 4 times.
"mathematical": 3 times.
"rules": 3 times.
"as": 3 times.
"the": 3 times.


Let's break down the code into small chunks to understand it better.

In [None]:
alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
current_word = ""
word_counter = {}
paragraph = input().lower() + ' '

So far, we have only initialized variables.

1. `alphabets` contains all the lowercase letters of the English alphabet.

2. `current_word` holds the word that is currently being built, one character at a time. I will explain this later.

3. `word_counter` is a dictionary with each word as a key and its frequency as the value.

4. `paragraph` takes input from the user and converts the entire paragraph to lower case using the `lower()` method. Notice that a space `' '` is appended to `paragraph`. Try to figure out why. I will explain it later, along with the rest of the code.
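As a side note, the standard library already provides the lowercase letters as `string.ascii_lowercase`. The short sketch below checks that it matches the hand-written list and shows what `lower()` does; the `sample` string is made up for illustration.

```python
import string

alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']

# string.ascii_lowercase is the string 'abcdefghijklmnopqrstuvwxyz'.
same_letters = list(string.ascii_lowercase) == alphabets
print(same_letters)

# lower() returns a new string with every letter converted to lower case.
sample = "The Tasks"
print(sample.lower())
```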

In [None]:
for character in paragraph:
    if character in alphabets:
        current_word += character

This `for` loop iterates over the string `paragraph`, taking one character at a time. The `if` statement checks whether the current character is in the `alphabets` list. If it is, the character is appended to `current_word`; otherwise the `if` block is skipped and control moves on to the `elif` block.
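To see the accumulation step in isolation, here is a small sketch that runs the same `if` test on a fixed string instead of user input. The sample text and the `snapshots` list are illustrative additions, not part of the original program.

```python
import string

# Same letters as the `alphabets` list, taken from the standard library.
alphabets = list(string.ascii_lowercase)

current_word = ""
snapshots = []  # hypothetical helper: records current_word after each letter

for character in "hi, yo":
    if character in alphabets:
        current_word += character
        snapshots.append(current_word)

print(snapshots)
```

Because nothing resets `current_word` here, the letters of both words run together. Separating them is the job of the `elif` block that follows.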

In [None]:
    elif len(current_word)>1 or current_word == "a":
        if word_counter.get(current_word):
            word_counter[current_word] += 1
        else:
            word_counter[current_word] = 1
        current_word = ""

The block under the `elif` condition adds or updates a key:value pair in the `word_counter` dictionary.
If the word is already in the dictionary, it adds 1 to its value, which represents the word's frequency; if the word is new, it sets its value to 1.
After the word is recorded, `current_word` is reset to an empty string so that the next word can be built.

Why `elif` and not `else`?
Two non-letter characters often appear next to each other, for example a comma followed by a space, or the closing punctuation in:
"Hey, What are you doing there**?"**
When the second of those characters is reached, `current_word` is already empty. With a plain `else` and no condition, the empty string `""` would be added to `word_counter`. The `elif` condition requires `current_word` to be longer than one character (or to be exactly `"a"`), which solves the problem. Characters are added to `current_word` only in the previous block, and only if they are in `alphabets`, so even with hundreds of punctuation marks in a row, `current_word` stays empty and `len(current_word)` stays 0.
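One thing to watch for in this block: `current_word` is cleared only when the `elif` condition is true, so a one-letter word other than "a" (for example "i") stays in `current_word` and gets glued onto the next word. The sketch below is one possible variation, not the original program. The hypothetical function `count_words` clears `current_word` on every non-letter while keeping the same rule about which words are counted.

```python
import string


def count_words(paragraph):
    """Count words like the loop above, but reset current_word on every non-letter."""
    alphabets = string.ascii_lowercase
    current_word = ""
    word_counter = {}
    for character in paragraph.lower() + " ":
        if character in alphabets:
            current_word += character
        else:
            # Count the word only if it passes the same length test as before.
            if len(current_word) > 1 or current_word == "a":
                word_counter[current_word] = word_counter.get(current_word, 0) + 1
            # Reset on every non-letter, even when nothing was counted.
            current_word = ""
    return word_counter


print(count_words("I am a reader"))
```

With the original loop, the same input would record "iam" instead of "am", because the "i" is never cleared.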

Why did we add `" "` to the `paragraph` variable when we initialized it?
If the input paragraph does not end with punctuation, the last word is still sitting in `current_word` when the `for` loop ends, so it never gets added to `word_counter`. Appending `" "` (or `"."`) to the paragraph forces one more iteration, which records that last word.
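A quick way to check this is to run the same loop with and without the trailing space. The sketch below wraps the loop in a hypothetical helper `count` that takes the text as an argument instead of calling `input()`; the sample text is made up.

```python
import string


def count(paragraph):
    """Run the tutorial's loop on the given text exactly as passed in."""
    alphabets = list(string.ascii_lowercase)
    current_word = ""
    word_counter = {}
    for character in paragraph:
        if character in alphabets:
            current_word += character
        elif len(current_word) > 1 or current_word == "a":
            if word_counter.get(current_word):
                word_counter[current_word] += 1
            else:
                word_counter[current_word] = 1
            current_word = ""
    return word_counter


text = "simple mathematical rules"
without_space = count(text)     # last word is still in current_word when the loop ends
with_space = count(text + " ")  # the extra space triggers one more pass through elif
print(without_space)
print(with_space)
```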

In [None]:
sorted_counter = sorted(word_counter.items(), key=lambda x:x[1], reverse = True)

The `sorted()` function returns a new sorted list from any iterable, such as a list, a tuple, or a dictionary. Here we pass `word_counter.items()`, so the result is a list of `(key, value)` tuples.

We are sorting by the value in each key:value pair, so we pass `key=lambda x:x[1]` as an argument. If we were sorting by key, we would use `key=lambda x:x[0]` instead.

Also, since we want the most common words, we need to sort in descending order, hence `reverse = True`.

If neither of these arguments were given, `sorted()` would order the pairs alphabetically by key, in ascending order.
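Here is a small sketch of these options on a made-up dictionary (the counts are illustrative, not taken from the example paragraph). It also shows that words with equal counts keep their original relative order, because Python's sort is stable.

```python
word_counter = {"the": 3, "to": 4, "as": 3, "of": 2}

# Sort by value (index 1 of each tuple), largest first.
by_frequency = sorted(word_counter.items(), key=lambda x: x[1], reverse=True)
print(by_frequency)

# No key and no reverse: tuples are compared by their first element first,
# so the result is alphabetical by word.
by_word = sorted(word_counter.items())
print(by_word)
```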

In [None]:
for word, frequency in sorted_counter[:5]:
    print(f'"{word}": {frequency} times.')

It prints each word and its frequency in a readable format using f-strings.
We simply loop over the first five items of the sorted list and print each key:value pair with some additional text.

## Geeky and short solution
I won't be describing each and every line of this solution in detail like I did for the previous one. I will only explain what each line does, and it's your job to find out why.

In [None]:
alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
word = [''.join([y for y in x if y in alphabets]) for x in input().lower().split()]
frequency = {k:word.count(k) for k in set(word)}
print(sorted(frequency.items(), key=lambda x: x[1], reverse=True)[:5])

Now we will look at each line of the code.

In [None]:
alphabets = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']

It assigns all the lowercase letters of the alphabet to the `alphabets` variable.

In [None]:
word = [''.join([y for y in x if y in alphabets]) for x in input().lower().split()]

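It creates a list of words from the input paragraph: the input is lowercased and split on whitespace, and every character that is not in `alphabets` is removed from each piece.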
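To make the comprehension easier to follow, the sketch below runs it on a fixed string instead of `input()`. The sample `text` is made up for illustration.

```python
import string

alphabets = list(string.ascii_lowercase)
text = "Hey, what's up - reader?"

# Outer loop: one piece per whitespace-separated chunk.
# Inner loop: keep only the characters that are letters.
word = [''.join([y for y in x if y in alphabets]) for x in text.lower().split()]
print(word)
```

Note that a piece made only of punctuation, such as the lone dash, becomes an empty string in the list, and an apostrophe is simply dropped, so "what's" turns into "whats".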

In [None]:
frequency = {k:word.count(k) for k in set(word)}

It uses a dictionary comprehension to count how many times each distinct word appears in the list `word` and stores each word:frequency pair in the dictionary `frequency`.
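A small sketch of the same comprehension on a hand-written list (the words are illustrative):

```python
word = ["to", "be", "or", "not", "to", "be"]

# set(word) removes duplicates, so each distinct word is counted once.
frequency = {k: word.count(k) for k in set(word)}

# Dictionaries compare equal regardless of key order.
print(frequency == {"to": 2, "be": 2, "or": 1, "not": 1})
```

Since a set has no guaranteed order, the keys of `frequency` may come out in a different order from run to run, which means words with equal counts can appear in a different order after sorting.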

In [None]:
print(sorted(frequency.items(), key=lambda x: x[1], reverse=True)[:5])

It sorts the dictionary's items in descending order by value and prints the first 5 items of the sorted list as `(word, frequency)` tuples.
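For comparison, the standard library's `collections.Counter` can handle both the counting and the top-n selection. This is an alternative sketch, not the solution described above: the function name `top_words` and the regular expression are assumptions made for illustration, and unlike the first solution it counts every one-letter word.

```python
import re
from collections import Counter


def top_words(paragraph, n=5):
    """Return the n most common words as a list of (word, count) tuples."""
    # Every run of lowercase letters counts as one word.
    words = re.findall(r"[a-z]+", paragraph.lower())
    # most_common() sorts by count, largest first; ties keep first-seen order.
    return Counter(words).most_common(n)


print(top_words("To be, or not to be.", 2))
```

`Counter` goes through the word list once, whereas `word.count(k)` rescans the whole list for every distinct word, which may matter for long inputs.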