## Project Background:
In chapter two, we reviewed Python lists, dicts, and comprehensions. We're now going to put them to work in a simple, practical project: finding anagrams in a dictionary of English words. As you know, two words are anagrams of each other when the letters of one can be rearranged to spell the other. For instance, "Elvis" and "lives" are anagrams of each other. In this project, I will also introduce you to a few useful Python idioms. Idioms are expressive constructs for tasks that come up often in real-world code, and they will become second nature to you as you gain experience with Python.
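
The definition above translates directly into code if we ignore capitalization: two words are anagrams when sorting their letters gives the same sequence. Here is a minimal sketch of that check; the helper name `is_anagram` is hypothetical and is not part of the project code.

```python
def is_anagram(first: str, second: str) -> bool:
    """Return True if the two words use exactly the same letters.

    Hypothetical helper: compares case-insensitively by sorting letters.
    """
    return sorted(first.lower()) == sorted(second.lower())


print(is_anagram("Elvis", "lives"))  # True
print(is_anagram("Elvis", "level"))  # False
```

Sorting costs a little more than counting letters, but it makes the "same letters, different order" idea very explicit.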

One such idiom is loading words from a text file into a Python list. So we will first see how to load an English dictionary into Python, and then we will create a Python dict of anagrams. Finally, I'm going to give you a challenge: find the number of anagrams of each length. Let's begin.

## Project Goals:
* Load a dictionary of English words into a Python list
* Create a Python dict of anagrams, indexed by anagrammed word
* Challenge: Group dictionary words by their length and then find the total number of anagrams in each group
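
Goal 2 does not spell out what the key of the anagram dict should be. One common choice, assumed here, is to key each group by the word's letters in sorted order, so that all anagrams of one another share a single key. A minimal sketch on a small made-up word list; `group_anagrams` and `sample` are hypothetical names.

```python
from collections import defaultdict


def group_anagrams(words):
    """Map each sorted-letter signature to the list of words that share it."""
    groups = defaultdict(list)
    for word in words:
        signature = "".join(sorted(word))  # e.g. "elvis" -> "eilsv"
        groups[signature].append(word)
    return dict(groups)


sample = ["elvis", "lives", "tar", "rat", "art", "python"]
anagrams = group_anagrams(sample)
print(anagrams["eilsv"])  # ['elvis', 'lives']
print(anagrams["art"])    # ['tar', 'rat', 'art']
```

With the real dictionary, you would pass in the cleaned word list built below instead of `sample`.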


## Loading the Dictionary
We begin by loading a dictionary of English words into Python. To do so, we open a Python notebook from the shell as we did before, navigate to the location in the file system where we keep our exercise files, and select the exercise file for this video, which begins empty. The exercise file directory contains a file named "words" with English words from the 1934 Webster's Dictionary. We're going to use it to find our anagrams.

Let's start by taking a look at this file. In Python, you open a file with the built-in function open, passing the file name and the mode in which you're going to access the file; here we use 'r' because we only want to read it. Now word is a file object that can be used in many ways. For instance, we may use its readlines method, which returns a list of all the lines in the file. We use the notebook's tab completion to type faster.
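
A file opened with a plain call to open, as in the cells below, stays open until it is closed explicitly or garbage-collected. A minimal sketch of the common alternative, a with block, which closes the file for you; a small temporary file stands in for "words" so the example runs anywhere, and its contents are made up.

```python
import os
import tempfile

# Hypothetical stand-in for the 'words' file, so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("A\na\naa\naal\nAaron\n")
    path = tmp.name

# The with statement closes the file when the block ends,
# even if an exception is raised while reading.
with open(path, "r") as words_file:
    wordlist = words_file.readlines()

os.remove(path)  # clean up the temporary file
print(wordlist)           # ['A\n', 'a\n', 'aa\n', 'aal\n', 'Aaron\n']
print(words_file.closed)  # True
```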

In [2]:
word = open('words','r')

In [3]:
word

<_io.TextIOWrapper name='words' mode='r' encoding='UTF-8'>

In [4]:
wordlist = word.readlines()

Let's look at the first few lines using the slice notation for lists that we learned in chapter 2. How many words do we have? We can just count the elements in the list with len. The list we just obtained is almost usable in our quest to study anagrams.

In [5]:
wordlist[:10]

['A\n',
 'a\n',
 'aa\n',
 'aal\n',
 'aalii\n',
 'aam\n',
 'Aani\n',
 'aardvark\n',
 'aardwolf\n',
 'Aaron\n']

In [6]:
len(wordlist)

235886

However, we should get rid of the newline characters at the end of each string, shown as \n, since we don't need them for our work. We should also convert all the words to lowercase, because capitalization doesn't matter for anagrams.

In [7]:
'Aaron\n'.strip()

'Aaron'

In [8]:
'Aaron'.lower()

'aaron'

We can do both operations on a single word using the string methods strip and lower. For instance, take "Aaron\n": strip removes the trailing newline, and lower gives us a lowercase version. So we can create the clean list of words with a handy list comprehension that chains the methods strip and lower on each string.
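
To see exactly what the chained methods do, here is a small sketch on made-up lines (not taken from the words file). Note that strip() with no argument removes all surrounding whitespace, not just the newline; rstrip('\n') is the narrower alternative if you only want to drop line endings.

```python
raw_lines = ["Aaron\n", "  Elvis \n", "lives"]

# strip() removes leading and trailing whitespace (spaces, tabs, newlines);
# lower() then returns a lowercase copy of the stripped string.
cleaned = [line.strip().lower() for line in raw_lines]
print(cleaned)  # ['aaron', 'elvis', 'lives']

# rstrip("\n") removes only trailing newline characters.
print(repr("  Elvis \n".rstrip("\n")))  # '  Elvis '
```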

In [9]:
wordclean = [word.strip().lower() for word in wordlist]

In [10]:
wordclean[:10]

['a',
 'a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron']

Let's have a look at the first few. OK, this is exactly what we need. However, we now see that our list has duplicates, such as the letter "a", which appears twice. One way to remove them is to turn the list into a Python set, a container that holds at most one instance of any given object, and then convert it back to a list. So we create wordunique by passing wordclean to the set constructor and wrapping that with the list constructor.
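
A set makes no promise about the order of its elements, which is why the result comes back scrambled. As an aside not covered in the video, here is a minimal sketch of an order-preserving alternative: dictionary keys are unique and, since Python 3.7, keep insertion order, so dict.fromkeys removes duplicates while keeping each word's first occurrence. The small list is made up for illustration.

```python
wordclean = ["a", "a", "aa", "aal", "aaron", "aa"]

# A set drops duplicates but does not keep the original order.
unique_unordered = set(wordclean)

# dict.fromkeys() keeps the first occurrence of each word, in order.
unique_ordered = list(dict.fromkeys(wordclean))
print(unique_ordered)  # ['a', 'aa', 'aal', 'aaron']
```

With this approach, you would only need to sort afterwards if the input is not already in the order you want.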

In [11]:
wordunique = list(set(wordclean))

In [12]:
wordunique[:10]

['synchondoses',
 'welwitschia',
 'tertial',
 'taleful',
 'unprovoke',
 'tunu',
 'shufflecap',
 'bacterially',
 'standardization',
 'stylopization']

Well, this seems to have worked. However, we've lost the alphabetical ordering of the words, so we need to sort the list again. We can do it in place with the sort method. We're almost back where we started, but now we have a clean list of unique, lowercase words. After you do this kind of reading and parsing a few times, you'll find it convenient to do it in a single step, which you can do with a list comprehension. Indeed, it's possible to embed the file reading operation within a comprehension.

In [13]:
wordunique.sort()

In [14]:
wordunique[:10]

['a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron',
 'aaronic']

We'll exploit the fact that one can iterate through the lines of a file with a for loop. We can write wordclean = [word.strip().lower() for word in open('words','r')]. If we look at the result, we see that it is the same clean list we built step by step earlier, duplicates included.

In [15]:
wordclean = [word.strip().lower() for word in open('words','r')]

In [16]:
wordclean[:10]

['a',
 'a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron']
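
The one-liner above leaves closing the file to the garbage collector. A minimal sketch of the same idiom inside a with block, which closes the file explicitly; an io.StringIO object with made-up contents stands in for open('words','r') so the example is self-contained.

```python
import io

# io.StringIO stands in for open('words', 'r'); like a real file object,
# it yields one line at a time when iterated.
with io.StringIO("A\na\naa\naal\nAaron\n") as words_file:
    wordclean = [word.strip().lower() for word in words_file]

print(wordclean)  # ['a', 'a', 'aa', 'aal', 'aaron']
```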

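If we want to be very concise, we can even embed this expression within a set constructor to remove duplicates, a list constructor to go back to a list, and the built-in function sorted to return a sorted list of the words. (The list constructor is actually optional, because sorted accepts any iterable, including a set, and always returns a new list.)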

In [17]:
wordclean = sorted(list(set([word.strip().lower() for word in open('words','r')])))

In [18]:
wordclean[:10]

['a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron',
 'aaronic']
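
A slightly more compact variant of the same one-liner, offered as a sketch rather than the method used in the video: a set comprehension (curly braces) removes duplicates directly, and sorted() turns the set into a sorted list. The sketch also contrasts sorted() with the sort() method used earlier, which sorts in place and returns None. A short in-memory list stands in for the lines of the words file.

```python
lines = ["A\n", "a\n", "aardvark\n", "aa\n", "Aaron\n"]

# A set comprehension removes duplicates as it goes;
# sorted() accepts any iterable and returns a new list.
wordclean = sorted({line.strip().lower() for line in lines})
print(wordclean)  # ['a', 'aa', 'aardvark', 'aaron']

# list.sort() sorts the list in place and returns None.
result = wordclean.sort()
print(result)  # None
```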

Using a list comprehension to go through the lines of a file, as we just did, is one example of a Python idiom. Idioms are language constructs that encapsulate especially convenient or expressive solutions to common problems. Because they're used often, they will become familiar to you.

**`wordlist = [transform(word) for word in open(filename, 'r') if condition(word)]`**

That familiarity makes it easier for you to spot bugs, and easier for others to read and understand your code. There is still room for you to choose which idioms you like best and which ones you don't.
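
To make the template concrete, here is a minimal sketch in which `transform` and `condition` are hypothetical helper functions: one cleans each line, the other keeps only lines made entirely of letters. An io.StringIO object with made-up contents stands in for the opened file so that the example is self-contained.

```python
import io


def transform(word):
    """Hypothetical transform: turn a raw line into a clean, lowercase word."""
    return word.strip().lower()


def condition(word):
    """Hypothetical filter: keep only lines made entirely of letters."""
    return word.strip().isalpha()


# io.StringIO stands in for open(filename, 'r') in the template.
with io.StringIO("Elvis\nlives\nrock-n-roll\n\nAaron\n") as lines:
    wordlist = [transform(word) for word in lines if condition(word)]

print(wordlist)  # ['elvis', 'lives', 'aaron']
```

The empty line and the hyphenated entry are dropped by the filter, because isalpha() returns False for the empty string and for any string containing non-letters.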