## Task 1: ##
Load the Word Anagrams dataset, provided to you as the "words.zip" file,
into the Jupyter Notebook. Note that this notebook requires Python 3.6 or higher.
Type "ls words" to list all files under the unzipped words folder.

In [1]:
import zipfile
import collections


In [2]:
with zipfile.ZipFile('Words.zip') as archive:
    archive.extractall('.')

In [3]:
ls words


words.txt
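
The `ls words` line above relies on IPython's shell shortcuts. As a minimal sketch of the same extract-and-list step in plain Python, the helper below uses only `zipfile` and `os.listdir`. The function name `extract_and_list` and the throwaway demo archive are assumptions made for illustration; they are not part of the assignment or the course data.

```python
import os
import tempfile
import zipfile

def extract_and_list(zip_path, dest_dir, folder):
    """Extract zip_path into dest_dir and return the sorted file names under folder."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(os.listdir(os.path.join(dest_dir, folder)))

# Demonstration on a throwaway archive (hypothetical contents, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("words/words.txt", "A\na\naa\n")
    demo_listing = extract_and_list(demo_zip, tmp, "words")
    print(demo_listing)  # ['words.txt']
```

In the notebook itself, the equivalent call would presumably be `extract_and_list('Words.zip', '.', 'words')`.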


## Task 2: ##
Read a dictionary of English words from the file 'words/words.txt' into
a Python list called wordlist.
Then clean the wordlist as follows:

(a) Remove all leading and trailing whitespace, including the '\n' character, using the
strip() method.

(b) Convert all words to lowercase.

(c) Remove all duplicates.

(d) Finally, sort the words in lexicographic (alphabetical) order and save the result
in a new list called wordclean.

To do the above steps, you can reuse the class coding demo file available on the Blackboard
course website.
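
As a compact illustration of steps (a) to (d), the sketch below runs the whole cleaning pipeline on a small hand-made sample. The function name `clean_words` and the sample lines are hypothetical and chosen only for this example; the notebook cells that follow perform the same steps one at a time on the real file.

```python
def clean_words(raw_lines):
    """Strip, lowercase, de-duplicate, and sort a list of raw lines."""
    stripped = (line.strip() for line in raw_lines)   # (a) drop surrounding whitespace and '\n'
    lowered = (word.lower() for word in stripped)     # (b) lowercase every word
    unique = set(lowered)                             # (c) a set keeps one copy of each word
    return sorted(unique)                             # (d) lexicographic order

# Hypothetical sample: 'A' and 'a' collapse into one entry after lowercasing.
sample = ["A\n", "a\n", "  Aaron \n", "aal\n", "aa\n"]
print(clean_words(sample))  # ['a', 'aa', 'aal', 'aaron']
```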

In [4]:
# Step 1: load the data (the with-block closes the file automatically)
with open('words/words.txt', 'r') as word:
    wordlist = word.readlines()
wordlist[:10]

['A\n',
 'a\n',
 'aa\n',
 'aal\n',
 'aalii\n',
 'aam\n',
 'Aani\n',
 'aardvark\n',
 'aardwolf\n',
 'Aaron\n']

In [5]:
len(wordlist)

235886

In [6]:
# Step 2: clean the data
# (a) strip leading and trailing whitespace, including '\n'
wordlist = [w.strip() for w in wordlist]

In [7]:
# (b) make all words lowercase
wordlist = [w.lower() for w in wordlist]

In [8]:
# (c) remove duplicates by converting to a set, since a set cannot hold repeated elements
wordlist = list(set(wordlist))


In [9]:
# (d) sort in alphabetical order
wordlist = sorted(wordlist)

In [10]:
wordclean = wordlist
wordclean[:10]

['a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron',
 'aaronic']

## Task 3: ##
From the generated list wordclean, separate the words into classes of words
with the same length. This involves creating a dictionary called words_bylength from the
wordclean list, where each key is a word length and the value is the set of words having that
length.
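
The grouping described above can also be sketched with `collections.defaultdict`, which creates the empty set for a new length automatically. The function name `group_by_length` and the sample words are assumptions for illustration only; the notebook cell below builds the same structure with an explicit membership check.

```python
from collections import defaultdict

def group_by_length(words):
    """Map each word length to the set of words having that length."""
    by_length = defaultdict(set)
    for word in words:
        by_length[len(word)].add(word)
    return dict(by_length)  # return a plain dict so missing keys raise KeyError

# Hypothetical sample taken from the first few cleaned words.
sample_groups = group_by_length(["a", "aa", "ab", "aal"])
print({length: sorted(group) for length, group in sample_groups.items()})
# {1: ['a'], 2: ['aa', 'ab'], 3: ['aal']}
```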

In [11]:
words_bylength = {}

for word in wordclean:
    length = len(word)
    if length not in words_bylength:
        words_bylength[length] = set()
    words_bylength[length].add(word)

words_bylength


{1: {'a',
  'b',
  'c',
  'd',
  'e',
  'f',
  'g',
  'h',
  'i',
  'j',
  'k',
  'l',
  'm',
  'n',
  'o',
  'p',
  'q',
  'r',
  's',
  't',
  'u',
  'v',
  'w',
  'x',
  'y',
  'z'},
 2: {'aa',
  'ab',
  'ad',
  'ae',
  'ah',
  'ai',
  'ak',
  'al',
  'am',
  'an',
  'ao',
  'ar',
  'as',
  'at',
  'aw',
  'ax',
  'ay',
  'ba',
  'be',
  'bo',
  'bu',
  'by',
  'ca',
  'ce',
  'da',
  'de',
  'di',
  'do',
  'ea',
  'ed',
  'eh',
  'el',
  'em',
  'en',
  'er',
  'es',
  'eu',
  'ex',
  'ey',
  'fa',
  'fe',
  'fi',
  'fo',
  'fu',
  'ga',
  'ge',
  'gi',
  'go',
  'ha',
  'he',
  'hi',
  'ho',
  'hu',
  'hy',
  'id',
  'ie',
  'if',
  'in',
  'io',
  'is',
  'it',
  'ji',
  'jo',
  'ju',
  'ka',
  'ko',
  'la',
  'li',
  'lo',
  'lu',
  'ly',
  'ma',
  'me',
  'mi',
  'mo',
  'mr',
  'mu',
  'my',
  'na',
  'ne',
  'ni',
  'no',
  'nu',
  'od',
  'oe',
  'of',
  'og',
  'oh',
  'ok',
  'om',
  'on',
  'or',
  'os',
  'ow',
  'ox',
  'pa',
  'pi',
  'po',
  'pu',
  'ra',
  're',
  ...

## Task 4: ##
For each class of words of the same length, find all anagrams. This involves
creating another dictionary called anagrams_bylength from the words_bylength
dictionary, where each key is a word from the words_bylength dictionary and the
value is the list of all possible anagrams of that key.
For example, from the above output, words_bylength[2] = [aa, ab, ad, ae, ...], where, say, ab
will be a key of the dictionary anagrams_bylength and its value will be all of its possible
anagrams, i.e., [ab, ba].

Note: to find all anagrams, you can reuse the anagram_fast(myword) method as shown in
class. Also, to get the same output as mine, you must keep only the words that satisfy the
condition len(anagram_fast(myword)) > 1.
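
Two words are anagrams exactly when their sorted letters are identical, so one possible way to build the mapping is to group words by that sorted-letter signature in a single pass and then keep only groups with more than one member. The sketch below is an assumption-laden illustration, not the `anagram_fast` method from class: the name `anagrams_by_signature` and the sample data are hypothetical, and each anagram list is sorted here, so the order inside the lists may differ from the notebook output.

```python
from collections import defaultdict

def anagrams_by_signature(words):
    """Map each word to its anagram group, keeping only groups of size > 1."""
    groups = defaultdict(list)
    for word in words:
        signature = "".join(sorted(word))  # anagrams share the same sorted letters
        groups[signature].append(word)
    return {
        word: sorted(group)
        for group in groups.values()
        if len(group) > 1                  # same filter as len(anagram_fast(myword)) > 1
        for word in group
    }

# Hypothetical sample shaped like words_bylength: {length: set of words}.
sample_bylength = {2: {"aa", "ab", "ba", "ad", "da"}, 3: {"aal", "ala", "cat"}}
sample_anagrams = {
    length: anagrams_by_signature(words)
    for length, words in sample_bylength.items()
}
print(sample_anagrams[2]["ab"])  # ['ab', 'ba']
```

Grouping by signature touches each word once per length class, whereas comparing every word against every other word of the same length grows quadratically with the size of the class.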

In [12]:
import pprint

def anagram_fast(myword, word_list):
    """Find all anagrams of myword (including myword itself) in word_list."""
    sorted_word = "".join(sorted(myword))
    return [word for word in word_list if "".join(sorted(word)) == sorted_word]

def generate_anagrams_bylength(words_bylength):
    """Generate a dictionary mapping words to their anagrams for each word length."""
    anagrams_bylength = {}

    for length, words in words_bylength.items():
        # Each call scans every word of this length, so the cost per
        # length class grows quadratically with the number of words.
        anagrams = {word: anagram_fast(word, words) for word in words}

        # Keep only words that have at least one anagram besides themselves
        anagrams_bylength[length] = {k: v for k, v in anagrams.items() if len(v) > 1}

    return anagrams_bylength


# Generate anagrams by length
anagrams_bylength = generate_anagrams_bylength(words_bylength)

# Print result
pprint.pprint(anagrams_bylength)

{1: {},
 2: {'ab': ['ba', 'ab'],
     'ad': ['ad', 'da'],
     'ae': ['ae', 'ea'],
     'ah': ['ha', 'ah'],
     'ak': ['ka', 'ak'],
     'al': ['la', 'al'],
     'am': ['am', 'ma'],
     'an': ['na', 'an'],
     'ar': ['ar', 'ra'],
     'as': ['sa', 'as'],
     'at': ['at', 'ta'],
     'aw': ['aw', 'wa'],
     'ay': ['ay', 'ya'],
     'ba': ['ba', 'ab'],
     'da': ['ad', 'da'],
     'de': ['de', 'ed'],
     'di': ['id', 'di'],
     'do': ['od', 'do'],
     'ea': ['ae', 'ea'],
     'ed': ['de', 'ed'],
     'eh': ['eh', 'he'],
     'em': ['em', 'me'],
     'en': ['en', 'ne'],
     'er': ['re', 'er'],
     'es': ['es', 'se'],
     'ey': ['ey', 'ye'],
     'fi': ['if', 'fi'],
     'fo': ['of', 'fo'],
     'go': ['go', 'og'],
     'ha': ['ha', 'ah'],
     'he': ['eh', 'he'],
     'ho': ['ho', 'oh'],
     'id': ['id', 'di'],
     'if': ['if', 'fi'],
     'in': ['in', 'ni'],
     'is': ['is', 'si'],
     'it': ['it', 'ti'],
     'ka': ['ka', 'ak'],
     'ko': ['ko', 'ok'],
     'la': ['la',