# Data Types and Structures in Python

Instructor notes:
- This document contains solutions.  Remove solutions before assigning to students.
- This work flow could be given as a 2 hour lab where students work through the material and small tasks, and then attempt the culminating task.
- This work flow could also be split so the instructor covers the material over a lecture or two, and then assigns the tasks as homework.
- This material should be presented within the first 2 weeks of DATA 271.  A solid understanding and comfort with this material is imperative for understanding future topics in the course.

Notes for the student:
- This document contains examples and small tasks ("appetizers") for you to make sure you understand the examples.  The culminating task ("main course") at the end of the document is more complex, and uses most of the topics you will have worked through.

- The plus-equal (+=) operator provides a convenient way to add a value to an existing variable and assign the new value back to the same variable.
- character methods, relate to strings
- sorted()
- lambda

## Learning Goals
- Become familiar with Python's data types and structures.
- Become familiar with common functions and methods for these data structures.
- Understand indexing and slicing.
- Write short programs to solve small problems using these data structures, functions and methods.

## Overview

A data structure is an abstract description of a way of organizing data to allow certain operations on it to be performed efficiently. For example, a binary tree is a data structure. Theoreticians describe data structures and prove their properties in order to show that certain algorithms or problems can be solved efficiently under certain assumptions.

A data type is a (potentially infinite) class of concrete objects that all share some property. For example, "integer" is a data type containing all of the infinitely many integers, "string" is a data type containing all of the infinitely many strings, and "32-bit integer" is a data type containing all integers expressible in thirty-two bits. 

A nice analogy from Stack Overflow is that a data type is like an atom, while data structures are like molecules, meaning data types can't be further reduced, whereas a data structure may consist of multiple fields of different data.

The basic Python data structures include lists, sets, tuples, and dictionaries. Each of these data structures has its own distinguishing properties, and we will investigate these properties in this activity. Data structures are “containers” that organize and group data according to type.

The data structures differ based on mutability and order. Mutability refers to the ability to change an object after its creation. Mutable objects can be modified, added, or deleted after they’ve been created, while immutable objects cannot be modified after their creation. Order, in this context, relates to whether the position of an element can be used to access the element.

Lists, sets, and dictionaries are **mutable**.
Strings and tuples are **immutable**.
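
The difference between mutable and immutable objects can be seen in a short sketch. The variable names here (numbers, word, new_word) are made up for illustration and do not appear elsewhere in this document.

```python
# Lists are mutable: an element can be reassigned in place.
numbers = [1, 2, 3]
numbers[0] = 99
print(numbers)  # [99, 2, 3]

# Strings are immutable: item assignment raises a TypeError.
word = 'cat'
try:
    word[0] = 'b'
    string_changed = True
except TypeError as err:
    string_changed = False
    print('TypeError:', err)

# To "change" a string, build a new one instead.
new_word = 'b' + word[1:]
print(new_word)  # bat
```

The original string word is still 'cat' afterwards; only the new variable holds the modified text.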


In this activity we will work with six built-in Python data types:
- numeric (int, float, etc.)
- string
- list
- tuple
- set
- dictionary

These are all **objects** and have properties and methods.  


## Type and Converting Types

If you would like to know the type of a value, you can use type().

In [3]:
type(7)

int

In [4]:
type(7.1)

float

In [5]:
type('Hello World')

str

You can convert values from one type to another using built-in Python functions.

In [7]:
s = '32'
type(s)

str

In [8]:
t = int(s)
type(t)
# this creates a new variable called t that is an int, and does not change original variable s.

int
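
Not every conversion succeeds. The following is a minimal sketch, with values chosen only for illustration, of how int(), float(), and str() behave, including a conversion that raises an error.

```python
# int() truncates toward zero; it does not round.
print(int(7.8))     # 7
print(int(-7.8))    # -7

# Numeric strings convert cleanly.
print(float('2.5'))  # 2.5
print(str(42))       # 42 (now a string)

# A string that does not look like an integer raises a ValueError.
try:
    value = int('2.5')
except ValueError:
    value = None
    print("int('2.5') raised a ValueError")

# One way around it: convert to float first, then to int.
value_via_float = int(float('2.5'))
print(value_via_float)  # 2
```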

### Task: 
Convert 3.99 to an int.  What do you get?
Consider '3.14159'.  What is its type?  Convert it to a float.  What do you get?

### Solution

In [125]:
int(3.99)

3

In [126]:
type('3.14159')

str

In [127]:
float('3.14159')

3.14159

## Strings

A string is a sequence of characters, and we can access the characters using bracket notation with an integer inside the bracket.  In Python, indexing starts at $0$.  Strings are immutable.
We can't modify a string once we make it.  An empty string can be created with either single or double quotes ('' or "").

In [11]:
fruit = 'pineapple'
letter = fruit[2]
print(letter)

n


### Slicing
We can slice a string, too.  For example, if we want to print the "ap" in the middle of the word pineapple, we can give the index where the slice should start and the index one beyond where it should end.  If these numbers are the same, you will get an empty string. If the slice starts at the beginning of the string, you can omit the first index; if it goes all the way to the end, you can omit the second index.

Indexing is a good thing to pay attention to because we will generalize it when working in data frames.

In [12]:
print(fruit[4:6])

ap


In [13]:
print(fruit[:6]) # from the start up to, but not including, index 6

pineap


In [14]:
print(fruit[4:]) # from index 4 to the end

apple


In [15]:
print(fruit[-1]) # notice the negative index.  This will print the last character

e
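
Negative indices count from the end of the string. As a small sketch of how they relate to positive indices (the variable n is introduced here only for illustration):

```python
fruit = 'pineapple'
n = len(fruit)  # 9

# A negative index -k refers to the same character as index n - k.
print(fruit[-1], fruit[n - 1])    # e e
print(fruit[-5:], fruit[n - 5:])  # apple apple

# If the start of a slice is not to the left of the stop, the result is empty.
print(repr(fruit[4:4]))    # ''
print(repr(fruit[-2:-5]))  # ''
```

In a slice with negative indices, the start still has to come before the stop, so the more negative number goes first.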


### Task:
Consider the string "A quick brown fox jumps over the lazy dog."
- print the word dog using slicing and positive indices
- print the word dog using slicing and negative indices
- print "fox jumps over the lazy dog."

### Solution


In [131]:
phrase = "A quick brown fox jumps over the lazy dog."
print(phrase[38:41])
print(phrase[-4:-1])
print(phrase[14:])

dog
dog
fox jumps over the lazy dog.


### Methods 
Python has built in functions and methods that work on strings.  
Methods are functions which are built into the object and are available to any instance of the object.
For a whole list, do a quick google search or type dir(str) or dir(fruit) which will list the methods available for strings.  For today, we will highlight
- len() which returns the length of a string
- .upper() which will convert the string to uppercase
- .lower() which will convert the string to lowercase

Calling a function is done with the function name, parentheses and whatever argument(s) it takes.
Calling a method is similar to calling a function, but the syntax is different.  It is the variable_name.method_name (the period is a delimiter).  Calling a method is called an invocation (we are invoking the method on the variable).

In [16]:
# len is a function
len(fruit) # how long is the word pineapple

9

We are invoking upper on fruit.

In [24]:
# upper is a method
print(fruit.upper())


PINEAPPLE


In [25]:
# note, the original fruit variable is unchanged
print(fruit)

pineapple


Remember, strings are immutable, so if we want the uppercase word PINEAPPLE, we will need to save it to a variable.

In [26]:
newfruit = fruit.upper()
print(newfruit)

PINEAPPLE


We are invoking find on fruit to find the index of the letter e in the word pineapple.

In [27]:
index = fruit.find('e')
print(index)

3


We can also find substrings as well as a single character.  If find() returns $-1$, it means the substring was not found.

In [28]:
index = fruit.find('ine')
print(index)

1
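
Because $-1$ is itself a valid index (it refers to the last character), it is worth checking the result of find() before using it in a slice. A minimal sketch, using a made-up loop variable called target:

```python
fruit = 'pineapple'

for target in ['apple', 'mango']:
    index = fruit.find(target)
    if index == -1:
        # -1 means "not found", so do not use it as a slice position
        print(target, 'not found')
    else:
        print(target, 'starts at index', index)

# The in operator answers the yes/no question directly.
print('apple' in fruit)  # True
print('mango' in fruit)  # False
```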


### Task:

- Consider again the string "A quick brown fox jumps over the lazy dog."
- What is the index of the word brown?
- Is the find method case-sensitive?
- What happens if you try to find the word coyote in this string?

### Solution

In [133]:
phrase = "A quick brown fox jumps over the lazy dog."
index = phrase.find('brown')
index

8

In [135]:
index = phrase.find('Brown')
index
# this shows find() is case sensitive.  

-1

In [137]:
index = phrase.find('coyote')
index

-1

### Task: 
Consider the Python code below which stores a string.  Use find() and string slicing to extract the portion of the string after the colon and then use float to convert the extracted string to a floating point number.

In [None]:
text = 'K-ARGT-Russe: 5.43' # named text so the built-in str is not overwritten
# add your code here

### Solution


In [138]:
text = 'K-ARGT-Russe: 5.43' # named text so the built-in str is not overwritten
index = text.find(':') # find the colon
number = float(text[index + 1:]) # float() ignores the leading space
number

5.43

## Lists

A list is a sequence of values (potentially of different types).  Lists are created with the square bracket notation.  Values are called elements, and elements are accessed with bracket notation (just as they are for strings).  In general, we can view lists as a mapping between indices and elements.  Lists are **mutable.**

In [31]:
l1 = ['fox', 7, 3.1, [1,2]] # this list has a string, an int, a float, and a list as its elements
print(l1[1])

7


In [32]:
l1[1] = 'apple' # reassign the second element of the list
print(l1)

['fox', 'apple', 3.1, [1, 2]]


### Traversing a List
It can be convenient to use the functions range() and len() to move through a list and update elements.  For example, consider the list of numbers below; the loop replaces each number with its square.

In [36]:
l2 = [1, 2, 3, 4]
for i in range(len(l2)):
    l2[i] = l2[i] ** 2
print(l2)    # note, l2 is the list of squares and the original list has been overwritten.

[1, 4, 9, 16]
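
If the original list should be kept, one option is to build a genuinely new list instead of overwriting the old one. The sketch below uses a list comprehension (the same syntax used in a hint later in this document); the names original and squares are illustrative.

```python
original = [1, 2, 3, 4]

# Build a new list; original is left untouched.
squares = [x ** 2 for x in original]

print(original)  # [1, 2, 3, 4]
print(squares)   # [1, 4, 9, 16]
```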


### Task:
Use the functions range() and len() to update the following list so that each number is divided by 10.  Have each element be an integer, not a float.

In [None]:
l2 = [100, 200, 300, 400]
# add your code here

### Solution

In [142]:
l2 = [100, 200, 300, 400]
for i in range(len(l2)):
    l2[i] = l2[i]/10
    l2[i] = int(l2[i])
print(l2)    
type(l2[1])

[10, 20, 30, 40]


int

### Methods

Lists have a number of methods, such as append(), sort(), copy(), insert(), pop(), remove(), and reverse(), among others.  For example, the method pop() takes the index of the item you want to delete as an argument and returns the element removed from the list.  The method remove() takes as an argument the element you want to remove (not the index).  These methods **modify** the list.  If you might need the original values, make sure to make a copy first.

Python also has a number of built-in functions that work on lists and save you from writing a loop.  Examples include len(), max(), min(), and sum().


In [50]:
l3 = [47, 3, 7, 8]
l3.sort()
print(l3)


[3, 7, 8, 47]


In [51]:
l3.append(54)
print(l3)

[3, 7, 8, 47, 54]


In [52]:
max(l3)

54
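
The pop(), remove(), copy(), insert(), and reverse() methods described above can be sketched as follows. The list names scores and backup are made up for this illustration.

```python
scores = [47, 3, 7, 8]
backup = scores.copy()     # independent copy of the original values

removed = scores.pop(1)    # remove by index; returns the removed element
print(removed)             # 3
print(scores)              # [47, 7, 8]

scores.remove(47)          # remove by value; returns None
print(scores)              # [7, 8]

scores.insert(0, 100)      # insert 100 at index 0
scores.reverse()           # reverse the list in place
print(scores)              # [8, 7, 100]
print(backup)              # [47, 3, 7, 8]
```

Because backup was made with copy() before any changes, it still holds the original values.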


### Lambda 
Lambda is a Python keyword that creates an anonymous function, so you don't have to write (define) an entire named function. Lambda functions are typically created where they are needed, used once, and then discarded, so they don't clutter your code with definitions that will only ever be used once.  Lambda syntax is as follows: lambda input_variable(s): tasty one liner.  In the example below, the function is called right as it is created.


In [93]:
# example lambda function to divide two numbers
(lambda x, y: x / y)(10, 2)

5.0
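
To see what a lambda stands in for, here is a sketch comparing it with an ordinary named function. The names divide and divide_lambda are hypothetical; note that style guides such as PEP 8 generally prefer def when a function needs a name, so the assignment below is for comparison only.

```python
# A named function defined with def ...
def divide(x, y):
    return x / y

# ... and the equivalent lambda, bound to a name here only for comparison.
divide_lambda = lambda x, y: x / y

print(divide(10, 2))         # 5.0
print(divide_lambda(10, 2))  # 5.0
```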

### Sorting Lists and Using Lambda
The function sorted() returns a new list of sorted elements. If we want to sort in a particular way, or if we want to sort a complex list of elements (e.g., nested lists or a list of tuples), we can use the key argument.
The key argument takes a function that tells sorted() what to sort by. When it says key=, what it really means is: as I iterate through the list, one element at a time, I'm going to pass the current element to the function specified by the key argument and use the returned values to decide the order of the final sorted list.

With no key specified, sorted() will return elements in ascending order.


In [94]:
mylist = [3, 6, 3, 2, 4, 8, 23]  # an example list
sorted(mylist)

[2, 3, 3, 4, 6, 8, 23]

If we instead wanted to separate out even and odd numbers in the list, we can use a key and a lambda function.  Our lambda function checks to see if a number is even (no remainder when dividing by 2).  It might seem strange that the odd numbers are returned before the even numbers, but the expression x % 2 == 0 evaluates to False for odd numbers and True for even numbers, and False sorts before True (they behave like 0 and 1).  You might also notice that the even numbers are not sorted in ascending order.  This is because sorted() compares only the key values, and the sort is stable: elements with equal keys remain in their original order relative to each other.

In [95]:
sorted(mylist, key = lambda x: x % 2 == 0)

[3, 3, 23, 6, 2, 4, 8]
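
If we also want each group sorted in ascending order, one possible approach (a sketch, not the only way) is to have the key return a tuple, a data structure covered later in this document. Tuples are compared entry by entry, so the second entry breaks ties in the first.

```python
mylist = [3, 6, 3, 2, 4, 8, 23]

# The key returns a Boolean; False sorts before True, just as 0 sorts before 1.
print(sorted([True, False, True]))  # [False, True, True]

# Returning a tuple sorts by the first entry, then breaks ties with the second.
result = sorted(mylist, key=lambda x: (x % 2 == 0, x))
print(result)  # [3, 3, 23, 2, 4, 6, 8]
```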

As another example, if we want to sort a list of lists (or a list of tuples) by the second element, we can use the second element as the key.  The lambda function takes the second element of each inner list for the sorting.

In [108]:
mylist2 = [[3, 5, 8], [6, 2, 8], [2, 9, 4], [6, 8, 5]]
sorted(mylist2, key=lambda x: x[1])

[[6, 2, 8], [3, 5, 8], [6, 8, 5], [2, 9, 4]]

### Task
- Sort the list by length of string, where the list is ['aaac', 'ccb', 'd', 'ba'].  
- Sort the list by the last letter of the string using a lambda function.

### Solution

In [105]:
mylist3 = ['aaac', 'ccb', 'd', 'ba']
sorted(mylist3, key = len)

['d', 'ba', 'ccb', 'aaac']

In [106]:
sorted(mylist3, key=lambda x: x[-1])

['ba', 'ccb', 'aaac', 'd']

### Split and Join
These methods conveniently allow us to convert a string to a list and vice versa.  (They can work on other data structures, but we focus on strings and lists here.)

In [122]:
# turn string into list of words
phrase = 'It is a capital mistake to theorize before one has data'
listofwords = phrase.split(' ') # split on the space
listofwords

['It',
 'is',
 'a',
 'capital',
 'mistake',
 'to',
 'theorize',
 'before',
 'one',
 'has',
 'data']

In [124]:
# turn list of words into a string with a delimiter like a space
list4 = ['Live', 'what', 'you', 'love']
delimiter = ' ' # put space between words
quote = delimiter.join(list4)
quote

'Live what you love'

### Task
Create a list of 50 random integers between 1 and 100.  Create a second list from the first list containing only the numbers in the original list which are divisible by $3$.  Repeat the experiment 3 times.  Then calculate the average difference in length between the two lists.  Hint: first import the random library (write import random).  Note that random.randint(a, b) includes both endpoints.  To make a list of 5 random integers between 1 and 10, this code will work:

In [46]:
import random 
randomlist = [random.randint(1,10) for x in range(0,5)]
print(randomlist)

[3, 4, 9, 4, 3]


### Solution

In [45]:
import random 

numberexperiments = 3 
# make list to hold difference for each experiment
difference = []
for i in range(0, numberexperiments): 
    # one experiment
    randomlist = [random.randint(1,100) for x in range(0,50)] # 50 integers from 1 to 100, endpoints included
    # print(randomlist)
    divisiblelist = [x for x in randomlist if x %3 == 0] # % is the modulus and returns the remainder after division
    difference.append(len(randomlist) - len(divisiblelist))

# calculate the mean
sum(difference)/len(difference)

33.333333333333336

## Dictionaries

A dictionary is similar to a list, but more general.  In a list, the index positions must be integers, but in a dictionary there is more freedom for the type of the indices.  We think of a dictionary as a mapping between keys and values.  Items are looked up by key rather than by position.  (Since Python 3.7, dictionaries do remember the order in which keys were inserted.)  For example, we could consider our favorite Hollywood actors (keys) and their respective ages (values).  We could also consider a literal dictionary, where keys are English words and values are the Spanish equivalent.  In this example, both keys and values are strings.  We create an empty dictionary using dict() or {} (empty curly braces).  We use the square brackets to add items.

- the dictionary method values() returns a view object of the values, which can be converted to a list.
- the dictionary method fromkeys() creates a new dictionary from a given sequence of keys.  It can take two parameters: the keys and an optional value that is assigned to every key (the default is None).
- the dictionary method keys() has no parameters and returns a view object that displays the keys.
- the dictionary method items() returns a view object of the key-value pairs as tuples, which can be converted to a list.
- the in operator works on dictionaries.

Dictionaries can be quite useful as a set of counters.  For example, if you want to count the number of times each word appears in a text, you could create a dictionary and the first time you see a word, you could add it to the dictionary with the corresponding value of one.  If you see the word again, you will increment the value.  An advantage of this implementation is that we don't have to know ahead of time what words we will see.
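
The counting idea just described can be sketched on a short, made-up sentence (the variable names text and counts are illustrative):

```python
text = 'the cat and the hat and the bat'
counts = {}  # empty dictionary of counters

for word in text.split():
    if word in counts:
        counts[word] += 1   # seen before: increment the counter
    else:
        counts[word] = 1    # first time: start at one

print(counts)  # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```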

In [89]:
fruitdict = {'a': 'apple', 'b': 'banana', 'c': 'cantaloupe'}
print(fruitdict.keys())

dict_keys(['a', 'b', 'c'])


In [92]:
print(fruitdict.items())

dict_items([('a', 'apple'), ('b', 'banana'), ('c', 'cantaloupe')])


In [159]:
list(fruitdict.values())

['apple', 'banana', 'cantaloupe']

In [162]:
'b' in fruitdict

True

In [163]:
'banana' in fruitdict # this is checking to see if banana is a key, not a value!

False

In [164]:
'banana' in list(fruitdict.values()) # this checks to see if banana is a value

True
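
The fromkeys() method from the list above has not been shown yet. A minimal sketch, with a made-up list called keys:

```python
keys = ['a', 'b', 'c']

# With one argument, every key gets the value None.
print(dict.fromkeys(keys))     # {'a': None, 'b': None, 'c': None}

# With two arguments, every key gets the same given value.
print(dict.fromkeys(keys, 0))  # {'a': 0, 'b': 0, 'c': 0}
```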

### Sorted Dictionaries 
We can also use a key to sort a dictionary and use a lambda function to sort on the value instead of the key.  The items() method retrieves a dictionary's keys and values as (key, value) pairs.

In [187]:
actor = {'Keanu': 58, 'Hugh' : 54, 'Jason': 43, 'Mark' : 51}
sorted_by_name = sorted(actor) # sort on name, this returns a list
sorted_by_age = sorted(actor.items(), key = lambda x: x[1]) # sort on age: the lambda function tells sorted() to use the second item of each (name, age) tuple as the key
# this returns a list of tuples
type(sorted_by_age)

list
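
To make the results concrete, the sketch below prints both sorted lists and then rebuilds a dictionary from the list sorted by age. The name actor_by_age is introduced here for illustration, and the last step relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
actor = {'Keanu': 58, 'Hugh': 54, 'Jason': 43, 'Mark': 51}

sorted_by_name = sorted(actor)  # iterating over a dictionary yields its keys
print(sorted_by_name)  # ['Hugh', 'Jason', 'Keanu', 'Mark']

sorted_by_age = sorted(actor.items(), key=lambda x: x[1])
print(sorted_by_age)   # [('Jason', 43), ('Mark', 51), ('Hugh', 54), ('Keanu', 58)]

# dict() turns the sorted list of (key, value) tuples back into a dictionary.
actor_by_age = dict(sorted_by_age)
print(actor_by_age)    # {'Jason': 43, 'Mark': 51, 'Hugh': 54, 'Keanu': 58}
```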

## Tuples

A tuple is a sequence of values.  The values can be any type and they are indexed by integers.  The important difference between tuples and lists is that tuples are **immutable** whereas lists, as we have seen, are **mutable**.  Tuples are comparable, so we can sort lists of tuples, and they can be used as keys in Python dictionaries (as long as their elements are themselves immutable).  Tuples are created with parentheses and then listing values inside, separated with a comma.  An empty tuple is created with ().  Elements are accessed with square brackets, just like for lists.

In [165]:
tup1 = ('fox', 7, 3.1, [1,2])

In [166]:
tup1[0]

'fox'

In [167]:
# we can't reassign values in a tuple because it is immutable
tup1[0] = 'coyote'

TypeError: 'tuple' object does not support item assignment

The comparison operator with tuples works by comparing the first element of each.  If they are equal, it goes on to the next element and continues until it finds an element that differs.  Once it finds an element that differs, it does not continue.  Consider the comparison below-- Python looks at the first element of each tuple, and since $0<1$ is true, it's done and never considers the next elements.

In [169]:
(0, 7, 2000) < (1, 0, 0)

True
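
A few more comparisons, chosen for illustration, show what happens when the leading elements tie:

```python
print((1, 7, 2000) < (1, 8, 0))      # True: first elements tie, then 7 < 8
print((1, 7, 2000) < (1, 7, 5))      # False: first two tie, and 2000 < 5 is False
print((1, 7) < (1, 7, 0))            # True: a shorter tuple that matches so far sorts first
print(('apple', 3) < ('banana', 1))  # True: 'apple' comes before 'banana'
```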

### Sorting Tuples and DSU
The comparability of tuples can be used for sorting tasks.  This is part of a pattern termed DSU, which stands for decorate-sort-undecorate.  The idea is to "decorate" a sequence by creating a list of tuples with a sort key before the element in the sequence (for example, if the sequence is a list of words, we would "decorate" this by creating a tuple (word length, word) for each).  We then sort the tuples.  We then "undecorate" by extracting the sorted elements of the sequence.

In [174]:
quote = 'The future belongs to those who believe in the beauty of their dreams'
words = quote.split() # split Eleanor Roosevelt's quote into a list of words
t = list() # create an empty list
for word in words:
    t.append((len(word), word)) # for each word, store a tuple with the word length and the word in the list t

t.sort(reverse = True) # sort the list with the longest word first

result = list() # create an empty list and populate it with words in order of length
for length, word in t:
    result.append(word)
    
print(result)    
    

['belongs', 'believe', 'future', 'dreams', 'beauty', 'those', 'their', 'who', 'the', 'The', 'to', 'of', 'in']
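
The same ordering can also be produced with the key argument of sorted() from earlier in this document, which performs the decorate and undecorate steps for us. This is a sketch of an alternative, not a replacement for understanding DSU; the name result_key is made up.

```python
quote = 'The future belongs to those who believe in the beauty of their dreams'
words = quote.split()

# The key builds the same (length, word) tuple used in the decorate step.
result_key = sorted(words, key=lambda w: (len(w), w), reverse=True)
print(result_key)
```

Using key=len alone would also put the longest words first, but words of equal length would then stay in their original order instead of being ordered by the word itself.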


A convenient syntactic feature of Python is that we can have tuples on the **left** side of an assignment.  This allows us to assign more than one variable at a time.  This also gives us a shortcut to swap values of variables.

In [176]:
tup2 = ('apple', 'banana')
x, y = tup2
print(x)

apple


In [177]:
y, x = x, y
print(x)

banana


### Sorting Dictionaries Revisited

We mentioned above that dictionaries have a method called items(), which returns the key-value pairs as tuples of the form (key, value); list() converts them to a list of tuples.  Dictionaries keep their items in insertion order rather than sorted order, so we do not expect this list to be sorted.  However, we can sort this list, and this gives us a way to sort the contents of a dictionary by key.  Compare this to the sorting above.  This works for sorting by the key, but not the value.

In [178]:
actor = {'Keanu': 58, 'Hugh' : 54, 'Jason': 43, 'Mark' : 51}
t = list(actor.items()) # returns a list of tuples of key-value pairs
t

[('Keanu', 58), ('Hugh', 54), ('Jason', 43), ('Mark', 51)]

In [180]:
t.sort()
t

[('Hugh', 54), ('Jason', 43), ('Keanu', 58), ('Mark', 51)]

## Sets

A set, mathematically speaking, is a collection of distinct things.  Each element is listed only once.  This can be useful in coding if we want to eliminate duplicate entries in a list.

In [181]:
fruitlist = ['apple', 'banana', 'apple', 'pear']

In [183]:
fruitlist_unique = list(set(fruitlist)) # this turns the list into a set to eliminate duplicates and then turns that set back into a list; sets are unordered, so the order of the result may vary
fruitlist_unique

['apple', 'banana', 'pear']
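
Because a set does not promise any particular order, it can help to control the order explicitly. The sketch below shows two options on a made-up list called fruits: sorting the set, or using dictionary keys (which are unique and, in Python 3.7 and later, keep insertion order) to preserve the order of first appearance.

```python
fruits = ['pear', 'apple', 'pear', 'banana']

# Option 1: sort the unique items.
print(sorted(set(fruits)))          # ['apple', 'banana', 'pear']

# Option 2: keep the first occurrence of each item in its original position.
print(list(dict.fromkeys(fruits)))  # ['pear', 'apple', 'banana']
```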

## Reading files

When you open a file, you are asking the operating system to find the file by its name and make sure it exists. If the open is successful (if the file exists and you have the proper permissions to read the file), the operating system returns a file handle.  This is not the actual data in the file, but a handle that can be used to read the data. Python has a built in function open().
If the file is relatively small compared to the size of your main memory, you can read the whole file into one string using the read method on the file handle.
In your culminating task below, you will be working with a .txt file and reading its contents into a string.  This code opens the file (which needs to be in the same directory as your Jupyter notebook or .py file), reads it and saves its contents to the string called multiline, and then closes the file.

We will learn more about reading files later in this course.

In [188]:
fhand = open('PrideandPredjudice.txt', 'r') # file handle, open in read mode
multiline = fhand.read()
fhand.close()
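
A common alternative to calling close() yourself is the with statement, which closes the file automatically when the block ends. The sketch below is self-contained: it first writes a small throwaway file (sample.txt is a hypothetical name, not the course file) and then reads it back.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere.
# 'sample.txt' is a hypothetical file name, not the course file.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fhand:
    fhand.write('first line\nsecond line\n')

# The with statement closes the file automatically, even if an error occurs.
with open(path, 'r') as fhand:
    contents = fhand.read()

print(fhand.closed)    # True
print(repr(contents))  # 'first line\nsecond line\n'
```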


### Task (Main Course): 
In this task, we will analyze a multiline string and generate a unique word count.  We will use the first chapter of Jane Austen's book Pride and Prejudice.  We provide a preprocessed .txt file of Chapter 1 for your use in the course repository.  This task relies on using
- file reading
- characters, strings, lists, and dictionaries as well as their methods
- sets to create a list of unique items
- sorting on a key
- lambda functions


The steps are
- get the multiline text and save it to a Python variable called multiline 
- eliminate all new lines and all special characters using string methods (e.g., replace())
- find all unique words and their occurrences in the string (you need to split the string into words and then create a list that contains only the unique words, then count the number of times each unique word appeared in the list and store the key and value in a dictionary.  Recall that set is useful for generating lists of unique items.)


Specific questions to answer are:
- How many words are in Chapter 1? (You can count "Chapter 1" in your word count.)
- How many **unique** words are in Chapter 1?
- What are the 25 most used words?
- What word has the highest frequency and what is its frequency?
- What if words are not case sensitive?  Repeat the exercise with that assumption.  (Hint: strings have methods like upper() and lower() that convert cases.)


In [53]:
pwd

'/Users/kamilalarripa'

In [54]:
%cd '/Users/kamilalarripa/Desktop' # navigate to where the txt file is located

/Users/kamilalarripa/Desktop


In [81]:
fhand = open('PrideandPredjudice.txt', 'r') # file handle, open in read mode
multiline = fhand.read()
fhand.close()
multiline


'It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.\n\n\n\n\nHowever little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.\n\n\n\n\n"My dear Mr. Bennet," said his lady to him one day, "have you heard that Netherfield Park is let at last?"\n\n\n\n\nMr. Bennet replied that he had not.\n\n\n\n\n"But it is," returned she; "for Mrs. Long has just been here, and she told me all about it."\n\n\n\n\nMr. Bennet made no answer.\n\n\n\n\n"Do you not want to know who has taken it?" cried his wife impatiently.\n\n\n\n\n"You want to tell me, and I have no objection to hearing it."\n\n\n\n\nThis was invitation enough.\n\n\n\n\n"Why, my dear, you must know, Mrs. Long says that Netherfield is taken by a young man of large fortune from the north

In [82]:
multiline = multiline.replace('\n', "") # eliminate new lines

In [76]:
multiline

'It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters."My dear Mr. Bennet," said his lady to him one day, "have you heard that Netherfield Park is let at last?"Mr. Bennet replied that he had not."But it is," returned she; "for Mrs. Long has just been here, and she told me all about it."Mr. Bennet made no answer."Do you not want to know who has taken it?" cried his wife impatiently."You want to tell me, and I have no objection to hearing it."This was invitation enough."Why, my dear, you must know, Mrs. Long says that Netherfield is taken by a young man of large fortune from the north of England; that he came down on Monday in a chaise and four to see the place, and was so

In [83]:
# remove special characters and punctuation
cleaned_multiline = "" # define empty string
for char in multiline:
    # keep spaces, letters, and numbers, and replace all other characters with a space
    if char == " ":
        cleaned_multiline += char # += adds a value to existing variable and assigns new value back to same variable
    elif char.isalnum():  # using the isalnum() method of strings.
        cleaned_multiline += char
    else:
        cleaned_multiline += " "
cleaned_multiline

'It is a truth universally acknowledged  that a single man in possession of a good fortune  must be in want of a wife However little known the feelings or views of such a man may be on his first entering a neighbourhood  this truth is so well fixed in the minds of the surrounding families  that he is considered the rightful property of some one or other of their daughters  My dear Mr  Bennet   said his lady to him one day   have you heard that Netherfield Park is let at last  Mr  Bennet replied that he had not  But it is   returned she   for Mrs  Long has just been here  and she told me all about it  Mr  Bennet made no answer  Do you not want to know who has taken it   cried his wife impatiently  You want to tell me  and I have no objection to hearing it  This was invitation enough  Why  my dear  you must know  Mrs  Long says that Netherfield is taken by a young man of large fortune from the north of England  that he came down on Monday in a chaise and four to see the place  and was so

In [86]:
# generate list of words by splitting string
list_of_words = cleaned_multiline.split()
len(list_of_words)

852

In [87]:
# Use set to get unique words
unique_words_as_list = list(set(list_of_words))
len(unique_words_as_list) # number of unique words

340

In [90]:
# Create a dictionary with unique words as keys
unique_words_as_dict = dict.fromkeys(list_of_words)
len(list(unique_words_as_dict.keys())) # the number of unique words
# populate number of occurrences of each word in next step

340

In [91]:
# populate values in dictionary by looping through words
for word in list_of_words:
    if unique_words_as_dict[word] is None:
        unique_words_as_dict[word] = 1
    else:
        unique_words_as_dict[word] += 1
unique_words_as_dict

{'It': 3,
 'is': 12,
 'a': 20,
 'truth': 2,
 'universally': 1,
 'acknowledged': 1,
 'that': 15,
 'single': 3,
 'man': 4,
 'in': 11,
 'possession': 2,
 'of': 29,
 'good': 3,
 'fortune': 3,
 'must': 7,
 'be': 11,
 'want': 3,
 'wife': 4,
 'However': 1,
 'little': 3,
 'known': 1,
 'the': 17,
 'feelings': 1,
 'or': 5,
 'views': 1,
 'such': 5,
 'may': 5,
 'on': 3,
 'his': 11,
 'first': 1,
 'entering': 1,
 'neighbourhood': 3,
 'this': 1,
 'so': 8,
 'well': 1,
 'fixed': 1,
 'minds': 1,
 'surrounding': 1,
 'families': 1,
 'he': 11,
 'considered': 1,
 'rightful': 1,
 'property': 1,
 'some': 2,
 'one': 5,
 'other': 2,
 'their': 1,
 'daughters': 4,
 'My': 3,
 'dear': 8,
 'Mr': 10,
 'Bennet': 6,
 'said': 1,
 'lady': 1,
 'to': 22,
 'him': 4,
 'day': 1,
 'have': 7,
 'you': 24,
 'heard': 2,
 'Netherfield': 2,
 'Park': 1,
 'let': 1,
 'at': 2,
 'last': 2,
 'replied': 3,
 'had': 3,
 'not': 9,
 'But': 6,
 'it': 11,
 'returned': 1,
 'she': 6,
 'for': 12,
 'Mrs': 2,
 'Long': 2,
 'has': 5,
 'just': 1,
 'been

In [113]:
# sort the dictionary based on descending frequency of count
top_words = sorted(unique_words_as_dict.items(), key=lambda x: x[1], reverse=True) # sort words based on frequency as key value, and return with highest frequency first

In [114]:
# return top 25 words
top_words[:25] 

[('of', 29),
 ('you', 24),
 ('to', 22),
 ('a', 20),
 ('the', 17),
 ('and', 17),
 ('I', 17),
 ('that', 15),
 ('is', 12),
 ('for', 12),
 ('in', 11),
 ('be', 11),
 ('his', 11),
 ('he', 11),
 ('it', 11),
 ('them', 11),
 ('Mr', 10),
 ('my', 10),
 ('not', 9),
 ('will', 9),
 ('so', 8),
 ('dear', 8),
 ('was', 8),
 ('are', 8),
 ('must', 7)]
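
For the last question (treating words as not case sensitive), one possible approach is to convert the text to lowercase before splitting. The sketch below uses a short made-up sentence rather than the chapter file, and it uses the dictionary method get(), which is not covered above: counts.get(word, 0) returns the current count, or 0 if the word has not been seen yet.

```python
sample = 'It is a truth universally acknowledged that it is so'

# Lowercase everything first so that 'It' and 'it' count as the same word.
words = sample.lower().split()

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1  # get() returns 0 for unseen words

top = sorted(counts.items(), key=lambda x: x[1], reverse=True)
print(top[:2])  # [('it', 2), ('is', 2)]
```

Applied to the cleaned chapter text, the same lower() call would be made once before split(), and the rest of the steps above would stay the same.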

### References
- Project Gutenberg.  https://www.gutenberg.org/files/1342/old/pandp12p.pdf
- Python for Everybody: Exploring Data in Python 3 by Charles Severance
- Data Wrangling with Python by Tirthajyoti Sarkar and Shubhadeep Roychowdhury (the word counting exercise came from this text).
