# Exercise notebook 6 - Frequency distributions

This exercise notebook complements the notebook **Frequency Distributions in Python Counting**. <br> See that [notebook](https://github.com/dtaantwerp/dtaantwerp.github.io/blob/53c91013df4514a943a9fad441fb5f28dc6f6bab/notebooks/07_W2_Mon_Frequency_Distribution_in_Python_Counting_Text.ipynb) for a complete explanation of the theory.

Try to prepare the <ins>underlined exercises</ins> for the exercise session. 
<br>
<br>
<br>

We know `Counter()` is helpful, but please use it only when the exercise explicitly asks for it.

1. <ins>Create a **character-based** frequency dictionary of the string below.</ins>
- Make sure that uppercase and lowercase letters are interpreted as the same letter.


In [None]:
Dorian = '''The artist is the creator of beautiful things. To reveal art and
conceal the artist is art’s aim. The critic is he who can translate
into another manner or a new material his impression of beautiful
things.

The highest as the lowest form of criticism is a mode of autobiography.
Those who find ugly meanings in beautiful things are corrupt without
being charming. This is a fault.

Those who find beautiful meanings in beautiful things are the
cultivated. For these there is hope. They are the elect to whom
beautiful things mean only beauty.

There is no such thing as a moral or an immoral book. Books are well
written, or badly written. That is all.

The nineteenth century dislike of realism is the rage of Caliban seeing
his own face in a glass.'''

#SOLUTION

letters = {}

for char in Dorian.lower():
    if char in letters:
        letters[char] += 1
    else:
        letters[char] = 1
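
The same counting pattern can be written more compactly with `dict.get`, which returns a default value when a key is missing. A minimal sketch, assuming a short stand-in string (`sample` and `letters_get` are hypothetical names, not part of the exercise):

```python
# Hypothetical sample text; the exercise itself uses the Dorian string
sample = "Art and the Artist"

letters_get = {}
for char in sample.lower():
    # get() returns 0 when char has not been seen yet, so no if/else is needed
    letters_get[char] = letters_get.get(char, 0) + 1

print(letters_get)
```

Because the text is lowercased first, `'A'` and `'a'` end up under the same key.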


2. <ins>Create another frequency dictionary in which you count **only the uppercase** letters in the text above.</ins>

In [None]:
#SOLUTION

uppercase = {}
for char in Dorian:
    if char.isupper():
        if char in uppercase:
            uppercase[char] += 1
        else:
            uppercase[char] = 1

uppercase

3. <ins>Create another character-level frequency dictionary based on the Dorian excerpt, but this time:</ins>
- make sure that uppercase and lowercase characters are interpreted as the same,
- ignore punctuation,
- ignore whitespaces and linebreaks.

In [None]:
#SOLUTION 1
# Raw string, so the backslash is kept literally and Python does not warn about an invalid escape
punct = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~’"""
freq_dic = {}

for char in Dorian.lower():
    if char not in punct and char != ' ' and char != '\n':
        if char in freq_dic:
            freq_dic[char] += 1
        else:
            freq_dic[char] = 1
      
        

In [None]:
#SOLUTION 2
result = {}
for char in Dorian.lower():
    if char in 'abcdefghijklmnopqrstuvwxyz':
        if char in result:
            result[char] += 1
        else:
            result[char] = 1
result


4. <ins>Write a piece of code that identifies all the letters of the alphabet that **do not** occur in the text.</ins>

In [None]:
#SOLUTION

letters = set(Dorian.lower())
alphabet = set('abcdefghijklmnopqrstuvwxyz')

absent = alphabet - letters

print(absent)
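
The alphabet does not have to be typed by hand: the standard library provides it as `string.ascii_lowercase`. A small sketch of the same set difference, assuming a hypothetical sample sentence instead of the Dorian text:

```python
import string

sample = "the quick brown fox"  # hypothetical sample text

# Set difference: letters of the alphabet minus the characters that occur
absent = sorted(set(string.ascii_lowercase) - set(sample.lower()))
print(absent)
```

Sorting the result is optional; it only makes the output easier to read, since sets have no fixed order.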

5. <ins>Can you recreate the piece of code you wrote in exercise 1, but this time using 'try' and 'except' statements?</ins>

In [None]:
#SOLUTION

result = {}
for char in Dorian.lower():
    try:
        result[char] += 1
    except KeyError:
        result[char] = 1
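
Why the `except` branch runs: looking up a key that is not yet in the dictionary raises a `KeyError`. The sketch below, using a hypothetical three-character string, records which characters triggered the exception:

```python
counts = {}
first_seen = []  # characters that triggered the KeyError branch

for char in "aba":  # hypothetical sample string
    try:
        counts[char] += 1      # fails with KeyError on the first occurrence
    except KeyError:
        counts[char] = 1       # initialise the count instead
        first_seen.append(char)

print(counts)      # {'a': 2, 'b': 1}
print(first_seen)  # ['a', 'b']
```

Catching only `KeyError` keeps other mistakes, such as a misspelled variable name, visible instead of silently swallowing them.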




6. <ins>Create a **word-based** frequency dictionary of the following string.</ins>

In [None]:
Seuss = """Congratulations! Today is your day. You're off to Great Places! You're off and away! 
            You have brains in your head. You have feet in your shoes. You can steer yourself Any direction you choose.
            You're on your own. And you know what you know. And YOU are the guy who'll decide where to go.
            You'll look up and down streets. Look 'em over with care. About some you will say, "I don't choose to go there."
            With your head full of brains and your shoes full of feet, You're too smart to go down any not-so-good street.
            And you may not find any You'll want to go down. In that case, of course, You'll head straight out of town."""

#SOLUTION

words = {}

for word in Seuss.lower().split():
    if word in words:
        words[word] += 1
    else:
        words[word] = 1


6. Below you'll find a list of strings. Try to convert each item in the list to an integer. You'll notice that this raises an error; work your way around it by using 'try' and 'except'.

In [None]:
a_list = ['9', '5', 'N/A', '6', '8']
converted_list = []

#SOLUTION 

for item in a_list:
    try:  
        converted_list.append(int(item))
    except ValueError:
        converted_list.append(item)

converted_list


7. By making use of the Counter() object, count how often each character occurs in the Seuss excerpt. 

In [None]:
#SOLUTION

from collections import Counter
count = Counter()
count.update(Seuss.lower())
print(count)

8. Update the counter with the characters of the Dorian excerpt. 

In [None]:
#SOLUTION
count.update(Dorian.lower())

9. Print the five most common characters of the combined count of the Dorian and Seuss excerpts.

In [None]:
#SOLUTION
print(count.most_common(5))
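
To see that `update()` adds to the existing counts rather than replacing them, here is a small sketch with hypothetical sample strings (`demo` is not part of the exercise):

```python
from collections import Counter

demo = Counter("aab")     # hypothetical sample: a=2, b=1
demo.update("abc")        # adds to the existing counts: a=3, b=2, c=1

print(demo.most_common(2))  # [('a', 3), ('b', 2)]
```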

10. *A bit more difficult*: create a word-based frequency dictionary of the following text. 
- Lowercase the text
- Make sure that punctuation is separated from the word, i.e. 'accent,' in the first line should be interpreted as the word 'accent' followed by a comma, not as "accent,".
- Once the preprocessing steps are completed you can create the frequency dictionary with the Counter object.
- Print the 10 most common tokens in the text. 

The output should look like this: 
`[(',', 22),
 ('the', 20),
 ('of', 9),
 ('and', 7),
 ('in', 7),
 (';', 6),
 ('has', 4),
 ('to', 4),
 ('a', 4),
 ('is', 4)]`

In [None]:
text = """The hour when history speaks with its free and venerable accent, has
not yet sounded for him; the moment has not come to pronounce a
definite judgment on this king; the austere and illustrious historian
Louis Blanc has himself recently softened his first verdict; Louis
Philippe was elected by those two almosts which are called the 221
and 1830, that is to say, by a half-Parliament, and a half-revolution;
and in any case, from the superior point of view where philosophy must
place itself, we cannot judge him here, as the reader has seen above,
except with certain reservations in the name of the absolute democratic
principle; in the eyes of the absolute, outside these two rights, the
right of man in the first place, the right of the people in the second,
all is usurpation; but what we can say, even at the present day, that
after making these reserves is, that to sum up the whole, and in
whatever manner he is considered, Louis Philippe, taken in himself, and
from the point of view of human goodness, will remain, to use the
antique language of ancient history, one of the best princes who ever
sat on a throne."""

#SOLUTION

all_words = []

for word in text.lower().split():
    # A trailing non-alphanumeric character is punctuation; numbers such as '221' stay intact
    if not word[-1].isalnum():
        all_words.append(word[0:-1])
        all_words.append(word[-1])
    else:
        all_words.append(word)

word_freqs = Counter(all_words)
word_freqs.most_common(10)
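
The solution above splits off a single trailing character, which is enough for this text. A token such as `there."` in the Seuss excerpt ends in two punctuation marks, though. The sketch below shows one possible way to peel off all of them; `split_trailing_punctuation` is a hypothetical helper name, and treating every non-alphanumeric character as punctuation is an assumption:

```python
def split_trailing_punctuation(word):
    """Hypothetical helper: split 'there."' into ['there', '.', '"']."""
    trailing = []
    # Remove characters from the end while they are not letters or digits
    while word and not word[-1].isalnum():
        trailing.append(word[-1])
        word = word[:-1]
    tokens = [word] if word else []
    # Punctuation was collected right-to-left, so restore reading order
    tokens.extend(reversed(trailing))
    return tokens

print(split_trailing_punctuation('there."'))  # ['there', '.', '"']
print(split_trailing_punctuation('accent,'))  # ['accent', ',']
print(split_trailing_punctuation('221'))      # ['221']
```

Leading punctuation, such as an opening quotation mark, is left attached to the word in this sketch.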

11. Write a piece of code that takes in a list of integers and returns a dictionary where the keys are the unique integers in the list and the values are their frequency in the list.

In [None]:
#SOLUTION

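# Convert the input to integers so that the dictionary keys are integers, not strings
the_nrs = [int(nr) for nr in input('Give me some random numbers separated by a whitespace: ').split()]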
freq_count = {}

for item in the_nrs:
    if item in freq_count:
        freq_count[item] += 1
    else:
        freq_count[item] = 1

print(freq_count)
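
Since the exercise speaks of code that "takes in" a list and "returns" a dictionary, the same logic can also be wrapped in a function. A minimal sketch; the function name `count_integers` and the sample list are hypothetical:

```python
def count_integers(numbers):
    """Return a dictionary mapping each unique integer to its frequency."""
    freq_count = {}
    for number in numbers:
        if number in freq_count:
            freq_count[number] += 1
        else:
            freq_count[number] = 1
    return freq_count

print(count_integers([3, 1, 3, 2, 3, 1]))  # {3: 3, 1: 2, 2: 1}
```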

12. Convert a frequency dictionary to a relative frequency dictionary.

- Create a character-level frequency dictionary of the text below.
- Make a copy of the frequency dictionary using the `copy()` method.
- Calculate the total number of characters in the text.
- Iterate over the frequency dictionary and replace each count in the copy with its relative frequency (the count divided by the total).
- Print the relative frequencies.

In [None]:
flowers = """I can buy myself flowers
Write my name in the sand
Talk to myself for hours
Say things you don't understand
I can take myself dancing
And I can hold my own hand
Yeah, I can love me better than you can"""

#SOLUTION

freq_dic = {}

for char in flowers.lower():
    if char in freq_dic:
        freq_dic[char] += 1
    else:
        freq_dic[char] = 1

print(freq_dic)

relative_frequency = freq_dic.copy()
size = len(flowers)

for c in freq_dic:
    relative_frequency[c] /= size
    
print(relative_frequency)
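
A quick sanity check for a relative frequency dictionary is that its values add up to 1 (up to floating-point rounding). A minimal sketch on a hypothetical sample string:

```python
import math

sample = "banana"  # hypothetical sample text

freqs = {}
for char in sample:
    freqs[char] = freqs.get(char, 0) + 1

total = sum(freqs.values())  # equals len(sample)
relative = {char: count / total for char, count in freqs.items()}

print(relative["a"])  # 0.5
print(math.isclose(sum(relative.values()), 1.0))  # True
```

`math.isclose` is used instead of `==` because sums of floating-point numbers are not always exactly 1.0.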

13. Recreate the frequency dictionary from the previous exercise using the Counter() object and then print the 5 least common letters.

In [None]:
#SOLUTION

# Lowercase first so the counts match the dictionary from the previous exercise
freq_dic = Counter(flowers.lower())

print(freq_dic.most_common()[-5:])

14. Write a piece of code that takes in a list of words and returns the length of the longest word in the list.

In [None]:
#SOLUTION

the_list = input('Give me some random words separated by a whitespace: ').split()
lengths = []

for item in the_list:
    lengths.append(len(item))

print(max(lengths))
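
The built-in `max()` also accepts a `key` function, which makes it possible to find the longest word directly without building a separate list of lengths. A short sketch with a hypothetical sample list:

```python
words = ["frequency", "of", "characters"]  # hypothetical sample list

longest = max(words, key=len)   # the longest word itself
print(longest, len(longest))    # characters 10
```

Note that `max()` raises a `ValueError` on an empty list, so this assumes at least one word is given.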
