# Handin Exercise 6

Create a module containing a class `TextComparer` with the following methods:
1. `__init__(self, url_list)`
2. `download(url, filename)` stores the file on disk and raises `NotFoundException` when the url returns 404
3. `multi_download()` uses threads to download multiple urls as text and stores the filenames on a property of the `TextComparer` object (Hint: use the `download()` method and create the filenames from the url or from the response object)
4. `__iter__()` returns an iterator
5. `__next__()` returns the next filename (and stops when there are no more)
6. `urllist_generator()` returns a generator to loop through the urls
7. `avg_vowels(text)` returns the average number of vowels per word in the text, as a rough estimate of readability
8. `hardest_read()` reads all text from the files in `filenames` and returns the filename of the text with the highest vowel score (use all the CPU cores on the computer for this work)
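
Item 7 asks for an average per word rather than a total. The following is a minimal sketch of one way to compute it. It assumes that words are whitespace-separated tokens and that only the ASCII letters a, e, i, o, u count as vowels; the exercise defines neither, and the standalone function name `avg_vowels_per_word` is hypothetical.

```python
def avg_vowels_per_word(text):
    """Return the mean number of vowels per whitespace-separated word."""
    # Assumption: only ASCII a, e, i, o, u (either case) count as vowels
    vowels = set("aeiouAEIOU")
    words = text.split()
    if not words:
        # Avoid dividing by zero for empty or whitespace-only text
        return 0.0
    total_vowels = sum(1 for char in text if char in vowels)
    return total_vowels / len(words)


print(avg_vowels_per_word("There are vowels in this sentence"))
```

The sample sentence has 11 vowels spread over 6 words, so the result is roughly 1.83.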

### Ex 2
Create a notebook and import your module from above
1. Find 10 books on https://www.gutenberg.org/browse/scores/top and download them using an object of the class you created above
2. Test the different methods of your class
3. Make a bar plot with a sorted list of books on x-axis and avg vowels on y-axis
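
For step 3, one possible way to sort the scores and draw the bar plot is sketched here. It assumes matplotlib is installed in the notebook environment. The helper names `sorted_scores` and `plot_scores` are hypothetical, and the sample values are placeholders, not measured results.

```python
def sorted_scores(scores):
    """Return (title, score) pairs sorted by score, lowest first."""
    return sorted(scores.items(), key=lambda item: item[1])


def plot_scores(scores):
    """Draw a bar plot with book titles on the x-axis and scores on the y-axis."""
    # Imported here so the sorting helper also works without matplotlib installed
    import matplotlib.pyplot as plt

    pairs = sorted_scores(scores)
    titles = [title for title, _ in pairs]
    values = [value for _, value in pairs]

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(titles, values)
    ax.set_xlabel("Book")
    ax.set_ylabel("Average vowels per word")
    # Long book titles are easier to read when rotated
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    plt.show()


# Placeholder values for illustration only, not measured results
example_scores = {"Book A": 1.9, "Book B": 1.7, "Book C": 1.8}
print(sorted_scores(example_scores))
```

In the notebook, `plot_scores` would be called with a dict mapping each downloaded book to the score computed by the class.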

In [26]:
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import requests

class NotFoundException(Exception):
    pass

class TextComparer():
    
    def __init__(self, url_list):
        self.url_list = url_list
        self.filenames = []
        self.count = 0
    
    def download(self, url, filename):
        r = requests.get(url)
        if (r.status_code == 404):
            raise NotFoundException("URL returned 404")
        else:
            with open(filename, 'wb') as f:
                f.write(r.content)
            self.filenames.append(filename)
    
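    def multi_download(self):
        # url_list is a dict mapping each url to the filename it is stored under
        with ThreadPoolExecutor(5) as ex:
            # Consume the results so exceptions raised in download() are not silently dropped
            list(ex.map(self.download, self.url_list.keys(), self.url_list.values()))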
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.count < len(self.filenames):
            self.count += 1
            return self.filenames[self.count-1]
        else:
            # Signal the end of iteration instead of returning None
            raise StopIteration
    
    def urllist_generator(self):
        for url in self.url_list:
            yield url
    
    def avg_vowels(self, text):
        # Note: this returns the total number of vowels in the text, not a per-word average
        vowels = "AaEeIiOoUu"
        checker = [each for each in text if each in vowels]
        return len(checker)
    
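    def hardest_read(self):
        # Score every downloaded file (sequentially; this does not yet use multiple cores)
        score_dict = {}
        for filename in self.filenames:
            with open(filename) as file:
                content = file.read()
                score_dict[filename] = self.avg_vowels(content)
        # Pick the filename with the highest score, not the last one seen in the loop
        hardest = max(score_dict, key=score_dict.get)
        highest_score = {
            "Name": hardest,
            "Amount of Vowels": score_dict[hardest]
        }

        # sorted(score_dict.items(), key=lambda x: x[1], reverse=True) would give the scores sorted by value
        return highest_score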
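
The exercise asks `hardest_read()` to use all the CPU cores. The following is a hedged sketch of one way to do that with `ProcessPoolExecutor`, written as module-level functions because those are straightforward to hand to worker processes. The names `vowel_score_of_file` and `executor_cls` are hypothetical, and the per-word scoring and UTF-8 decoding are assumptions rather than part of the class above.

```python
from concurrent.futures import ProcessPoolExecutor

VOWELS = set("aeiouAEIOU")


def vowel_score_of_file(filename):
    """Read one file and return its average number of vowels per word."""
    # Assumption: the files are UTF-8 text; undecodable bytes are skipped
    with open(filename, encoding="utf-8", errors="ignore") as file:
        text = file.read()
    words = text.split()
    if not words:
        return 0.0
    return sum(1 for char in text if char in VOWELS) / len(words)


def hardest_read(filenames, executor_cls=ProcessPoolExecutor):
    """Return the filename with the highest score, scoring files in parallel."""
    if not filenames:
        raise ValueError("no files to compare")
    # With no max_workers argument, ProcessPoolExecutor sizes the pool
    # from the number of processors on the machine
    with executor_cls() as executor:
        scores = list(executor.map(vowel_score_of_file, filenames))
    # map() preserves input order, so scores line up with filenames
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return filenames[best_index]
```

Calling `hardest_read(tc.filenames)` would score each file in its own worker process. The `executor_cls` parameter is only a convenience for swapping in `ThreadPoolExecutor` where starting processes is awkward, for example when the functions are defined in a notebook cell instead of an importable module.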

In [27]:
urls = {
    "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm": "Pride and Prejudice, by Jane Austen",
    "https://www.gutenberg.org/files/84/84-h/84-h.htm": "Frankenstein, by Mary Wollstonecraft (Godwin) Shelley",
    "https://www.gutenberg.org/files/11/11-h/11-h.htm": "Alice’s Adventures in Wonderland, by Lewis Carroll",
    "https://www.gutenberg.org/files/2701/2701-h/2701-h.htm": "Moby-Dick; or The Whale, by Herman Melville",
    "https://www.gutenberg.org/files/1952/1952-h/1952-h.htm": "The Yellow Wallpaper, by Charlotte Perkins Gilman"
}

tc = TextComparer(urls)

tc.multi_download()

iterator = iter(tc)


try:
    element1 = next(iterator)
    element2 = next(iterator)
    element3 = next(iterator)
    element4 = next(iterator)
    element5 = next(iterator)
    print(element1)
    print(element2)
    print(element3)
    print(element4)
    print(element5)
except StopIteration as e:
    print(type(e))
    
print("There are ", tc.avg_vowels("There are vowels in this sentence"), " vowels in this sentence")

print(tc.hardest_read())

The Yellow Wallpaper, by Charlotte Perkins Gilman
Alice’s Adventures in Wonderland, by Lewis Carroll
Pride and Prejudice, by Jane Austen
Frankenstein, by Mary Wollstonecraft (Godwin) Shelley
Moby-Dick; or The Whale, by Herman Melville
There are  11  vowels in this sentence
{'Name': 'Moby-Dick; or The Whale, by Herman Melville', 'Amount of Vowels': 381862}
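
The `try`/`except StopIteration` test above only behaves as intended if `__next__` raises `StopIteration` once the filenames run out; that is also what lets a plain `for` loop terminate. This is a minimal sketch of the protocol, using a hypothetical `FilenameIterator` class and made-up filenames in place of the downloaded books.

```python
class FilenameIterator:
    """Hypothetical stand-in for the iteration part of TextComparer."""

    def __init__(self, filenames):
        self.filenames = list(filenames)
        self.count = 0

    def __iter__(self):
        # The object is its own iterator
        return self

    def __next__(self):
        if self.count >= len(self.filenames):
            # Signals the end of iteration to for loops and next()
            raise StopIteration
        filename = self.filenames[self.count]
        self.count += 1
        return filename


# Made-up filenames for illustration
names = FilenameIterator(["a.txt", "b.txt"])
collected = [name for name in names]
print(collected)
```

Because the counter is stored on the object, the iterator is exhausted after one pass; a second loop over `names` yields nothing unless `count` is reset.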
