# Section 2.1.9 Dictionaries

### 1. What is a dictionary?

Hold onto your hats, because things are about to get confusing.

A **string** is a sequence of characters that is *immutable*. You can reference a character in a **string** using an *index number*. Quotation marks -- '' or "" -- are used to create **strings**. The order in which the characters appear in a **string** is important.

A **list** is a sequence of elements that is *mutable*. You can reference an element in a **list** using an *index number*. Square brackets -- [] -- are used with **lists**. The order in which the elements appear in a **list** is important.

A **dictionary** is a collection of values that is *mutable*. You can reference a value in a **dictionary** by using a *key*. A *key* can be more than just a number. Curly brackets -- {} -- are used with **dictionaries**. You never look up a value by its position, so the order in which the *keys* and *values* appear in a **dictionary** is unimportant for finding them.

In essence, a **dictionary** is a set of *key-value* pairs. The *key* is mapped to the *value*. You find a *value* in a **dictionary** by using the *key*.

#### Creating a New Dictionary

The **dict()** built-in function can be used to create a new **dictionary**.

In [7]:
# Method 1 to create a new, empty dictionary.

dictionary = dict()
print(dictionary)

# Method 2 to create a new, empty dictionary.

dictionary2 = {}
print(dictionary2)

{}
{}


#### Adding New Keys and Values to a Dictionary

You can add *keys* and *values* to a **dictionary** one at a time:

In [5]:
# Takes the dictionary named "dictionary", adds the key "pork", and assigns the value
# "bacon" to it. Then prints the ENTIRE dictionary.

dictionary["pork"] = "bacon"
print(dictionary)

{'pork': 'bacon'}


In [6]:
# Takes the dictionary named "dictionary" and adds the key "cat", which points to a
# list (the value) containing Pippin, Jasper ... Then prints the ENTIRE dictionary.

dictionary["cat"] = ["Pippin", "Jasper", "Ben", "Riley", "Jasmine"]
print(dictionary)

{'pork': 'bacon', 'cat': ['Pippin', 'Jasper', 'Ben', 'Riley', 'Jasmine']}


In [9]:
# Create a new dictionary AND add keys:values at the same time.

dictionary3 = {1: "dog", 2:"bird", 3:"fish"}
print(dictionary3)

{1: 'dog', 2: 'bird', 3: 'fish'}


In [15]:
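# Add dictionary entries that mix integer and string keys and values. Then print the ENTIRE dictionary.
# The last line looks up the list stored under key 123, takes its first element, and upper-cases it.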

dictionary2[123] = ["fleece"]
dictionary2["major"] = 658
dictionary2[435] = 875
print(dictionary2)

print(dictionary2[123][0].upper())

{123: ['fleece'], 'major': 658, 435: 875}
FLEECE


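Or, you can supply several *keys* and *values* at once. Note that assigning a new set of curly brackets to an existing name replaces whatever the **dictionary** held before.

In [16]:
# Assign more than one key:value pair at the same time. This REPLACES the old contents of dictionary2.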

dictionary2 = {234:"bob", 83746:4748749, "susan":"sheep"}
print(dictionary2)

{234: 'bob', 83746: 4748749, 'susan': 'sheep'}
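
The cell above assigns a brand-new dictionary to the name dictionary2, so the earlier entries are gone. If you want to add a group of pairs to an existing dictionary and keep what is already there, the **update()** method is one option. The following is a minimal sketch; the dictionary named `farm` is made up for illustration and does not appear elsewhere in this section.

```python
# Hypothetical example dictionary, used only for illustration.
farm = {"pork": "bacon"}

# update() merges the new pairs into the existing dictionary
# instead of replacing it.
farm.update({"cow": "beef", "chicken": "eggs"})

print(farm)
```

If a key passed to **update()** already exists, its value is overwritten with the new one.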


In Python 3.7 and later, a **dictionary** keeps its *key-value* pairs in the order you inserted them, which is why they appeared in the same order as we entered them in the above examples. In older versions of Python the order was not guaranteed, so do not be surprised if you experience something different there. Either way, you look up a *value* by its *key*, never by its position.

### 2. Searching a Dictionary

If you want to find a specific *value* in a **dictionary** and you know its *key*, you can put the *key* in square brackets, much like using an *index number* with **strings** and **lists**.

In [17]:
print(dictionary2[234])

bob


If the *key* you specified isn't in the **dictionary**, you'll get an error message called a **KeyError**.
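
Here is a minimal sketch of what that error looks like and one way to handle it without the program stopping. The dictionary named `pets` is made up for illustration.

```python
# Hypothetical example dictionary, used only for illustration.
pets = {"cat": "Pippin", "dog": "Rex"}

try:
    # "hamster" is not a key in pets, so this line raises a KeyError.
    print(pets["hamster"])
except KeyError as error:
    # The missing key is stored on the error object.
    missing_key = error.args[0]
    print("No such key:", missing_key)
```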

If you aren't sure whether a *key* is in the **dictionary**, you can check for it using the **in** operator.

In [18]:
"pine" in dictionary2

False

In [19]:
234 in dictionary2

True

You can also use the **len()** function to determine how many *key-value* pairs exist in a **dictionary**. However, if you think about it, the result may not actually be that useful: it tells you how many *keys* there are, but nothing about the *values* they hold.
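
As a quick illustration, here is **len()** applied to a small dictionary. The name `inventory` and its contents are made up for this sketch.

```python
# Hypothetical example dictionary, used only for illustration.
inventory = {"apples": 3, "pears": 0, "plums": 12}

# len() counts the key-value pairs; it ignores what the values are.
pair_count = len(inventory)
print(pair_count)
```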

Unfortunately, there's no way to look up a **dictionary** entry directly by its *value* the way you can by its *key*. Instead, you have to get creative by using the **values()** method to get a view of all the *values*. Then you can search that view with the **in** operator.

In [22]:
# values() returns a view of the dictionary's values (a dict_values object, not a list).
values = dictionary2.values()
print(values)
type(values)

dict_values(['bob', 4748749, 'sheep'])


dict_values

In [21]:
"bob" in values

True
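
Knowing that a value exists does not tell you which key it belongs to. One possible way to find the matching key or keys is to loop over the **items()** method, which hands back each key together with its value. This is only a sketch; the dictionary named `animals` is a stand-in copy so the example runs on its own.

```python
# Stand-in dictionary so this example is self-contained.
animals = {234: "bob", 83746: 4748749, "susan": "sheep"}

# Check whether the value exists at all.
has_bob = "bob" in animals.values()

# Collect every key whose value matches.
# A value can appear under more than one key, so we build a list.
matching_keys = []
for key, value in animals.items():
    if value == "bob":
        matching_keys.append(key)

print(has_bob)
print(matching_keys)
```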

**Fun Fact:** 

*The in operator works differently for a dictionary than it does for a list. When the in operator is looking through a list for the value you specified, it searches each entry one-by-one. If your list is really big, it may take a long time to find the value and may even slow down your computer.*

*A dictionary stores its keys in a hash table. The hash table lets the in operator find any key in roughly the same amount of time, no matter how big the dictionary is. This only applies to keys; searching through the values is still done one-by-one. Hash tables are really interesting, very valuable, and extremely confusing! If you want to learn more about hash tables, check out the Wikipedia article --> https://en.wikipedia.org/wiki/Hash_table.*

### 3. Dictionaries as Counters

One common use of **dictionaries** is to count how many times each word appears in a piece of text. This can be done with a *for loop*. We'll use the same text file from the Files section in this example.

In [23]:
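word_count = {}

# The with statement closes the file automatically when the block ends.
with open("SenseAndSensibility.txt", 'r') as text_file:
    for line in text_file:
        line = line.split()
        for word in line:
            # First time we see a word, start its count at 1. Otherwise, add 1.
            if word not in word_count:
                word_count[word] = 1
            else:
                word_count[word] = word_count[word] + 1

print(word_count)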



#### The Get Method

Looking up a *key* with square brackets causes an error if the *key* is missing. The **get()** method is a gentler alternative: it returns the *value* for a *key* if the *key* exists, and something else if it doesn't.

You can leave the second argument in the **get()** method blank, and it will return None if the *key* isn't found. Or you can specify something for it to return if the *key* doesn't exist.
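
A minimal sketch of both forms of **get()**. The dictionary named `word_count_demo` is made up for illustration.

```python
# Hypothetical example dictionary, used only for illustration.
word_count_demo = {"sense": 3, "sensibility": 1}

found = word_count_demo.get("sense")        # key exists, so its value is returned
missing = word_count_demo.get("pride")      # key is missing, so None is returned
fallback = word_count_demo.get("pride", 0)  # key is missing, so the default 0 is returned

print(found, missing, fallback)
```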

Now we can modify our word count script to use the **get()** method instead. Because we can tell the **get()** method to return a 0 for a word we haven't seen yet, adding 1 to the result gives the right count either way, and we can eliminate the entire *if statement*.
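
Here is a sketch of that idea. To keep it runnable on its own, it counts words in a short made-up list of lines named `sample_lines` rather than in the text file.

```python
# A short stand-in for the text file, so the example runs on its own.
sample_lines = ["the cat sat", "the cat ran"]
word_count_get = {}

for line in sample_lines:
    for word in line.split():
        # get() returns 0 for a new word, so no if statement is needed.
        word_count_get[word] = word_count_get.get(word, 0) + 1

print(word_count_get)
```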

**Note:** In the word count script, we used something called a *nested for loop*. The *outer loop* iterates through each line in the text file, while the *inner loop* iterates through each word in the line.

### 4. Looping through Dictionaries

If you put a **dictionary** in a *for loop*, Python iterates over its *keys*.

If you name the loop variable key, it represents each of the *keys* in the **dictionary** in turn. That key can be printed with its associated *value* as the loop iterates. The result looks a lot better than simply printing the dictionary as we did previously.
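
A minimal sketch of such a loop, using a small made-up dictionary named `word_count_demo` in place of the full word count. The list `keys_seen` is only there to record the order in which the keys are visited.

```python
# Hypothetical example dictionary, used only for illustration.
word_count_demo = {"the": 2, "cat": 2, "sat": 1}

keys_seen = []
for key in word_count_demo:
    # key holds one of the dictionary's keys on each pass through the loop.
    keys_seen.append(key)
    print(key, word_count_demo[key])
```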

In our word count example, we know that each *key* is a **string** which represents a word, and that each *value* is an **integer** which is the count of the number of times that word appears.

Because the *value* is an **integer**, we can apply all sorts of integer-related operators to it.
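
For instance, we could compare each count against a threshold, or add all the counts together. This is a sketch with a made-up dictionary named `word_count_demo` and an arbitrary threshold of 5.

```python
# Hypothetical example dictionary, used only for illustration.
word_count_demo = {"the": 12, "cat": 2, "sat": 1, "and": 9}

# Keep only the words that appear more than 5 times.
frequent_words = []
for key in word_count_demo:
    if word_count_demo[key] > 5:
        frequent_words.append(key)
        print(key, word_count_demo[key])

# Add up every count to get the total number of words.
total_words = sum(word_count_demo.values())
print(total_words)
```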

We can also get creative and use a list to put the *keys* in alphabetical order.
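
One way to do this is to copy the keys into a list, sort the list, and then loop through it. The dictionary named `word_count_demo` below is made up for illustration.

```python
# Hypothetical example dictionary, used only for illustration.
word_count_demo = {"the": 2, "cat": 2, "sat": 1}

# Copy the keys into a list, then sort that list alphabetically.
sorted_keys = list(word_count_demo.keys())
sorted_keys.sort()

for key in sorted_keys:
    print(key, word_count_demo[key])
```

The dictionary itself is not changed; only the list of keys is sorted.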

### 5. Advanced Text Parsing

What have you noticed about our word count output so far?

1. It includes punctuation as part of certain words.
2. It treats upper and lower case letters as different, so "The" and "the" are counted separately.

**Pop Quiz:** *What can we do to eliminate these problems?*

Remember, when we look at lines and words, they're **strings**. That means we can apply all sorts of *string methods* to these lines and words.

We can remove all punctuation using the **translate()** method to 'translate' all punctuation marks into nothing. Luckily, Python has a built-in **string** of punctuation marks, **string.punctuation**, as part of the *string module*.

In [24]:
import string
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Then, we can change all upper case letters to lower case by using the **lower()** method.

In [29]:
word_count2 = {}

# Build the translation table once. It deletes every punctuation character.
table = str.maketrans('', '', string.punctuation)

with open("SenseAndSensibility.txt", 'r') as text_file:
    for line in text_file:
        # Strip punctuation, convert to lower case, then split into words.
        line = line.translate(table).lower().split()
        for word in line:
            if word not in word_count2:
                word_count2[word] = 1
            else:
                word_count2[word] = word_count2[word] + 1

print(word_count2)



In [30]:
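word_count2 = {}

# Build the translation table once. It deletes every punctuation character.
table = str.maketrans('', '', string.punctuation)

with open("SenseAndSensibility.txt", 'r') as text_file:
    for line in text_file:
        line = line.translate(table).lower().split()
        for word in line:
            # Same count as before, using get() instead of an if statement.
            word_count2[word] = word_count2.get(word, 0) + 1

print(word_count2)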



If you're like me, you're probably wondering where the heck the **maketrans()** method came from. It's a built-in *string method* that builds the translation table that **translate()** needs.

As per the Python Standard Library, instructions for using the **translate()** method are as follows:

*str.translate(table)*
Return a copy of the string in which each character has been mapped through the given translation table. The table must be an object that implements indexing via __getitem__(), typically a mapping or sequence. When indexed by a Unicode ordinal (an integer), the table object can do any of the following: return a Unicode ordinal or a string, to map the character to one or more other characters; return None, to delete the character from the return string; or raise a LookupError exception, to map the character to itself.

You can use **str.maketrans()** to create a translation map from character-to-character mappings in different formats.

And the instructions for using the **maketrans()** method are as follows:

*static str.maketrans(x[, y[, z]])*
This static method returns a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or None. Character keys will then be converted to ordinals.

If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.

What does this all mean in plain English?

It means, just follow the example and you'll be fine!
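
If you'd like a little more than that, here is a sketch of the three-argument form on a single made-up sentence. The first two arguments are empty, so no character is swapped for another. Every character in the third argument is mapped to None, which tells **translate()** to delete it.

```python
import string

# Three-argument form: swap nothing, delete every punctuation character.
table = str.maketrans("", "", string.punctuation)

# A made-up sentence, used only for illustration.
sample = "Hello, World! It's a test."

# Delete the punctuation, then convert to lower case.
cleaned = sample.translate(table).lower()

print(cleaned)
print(cleaned.split())
```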

### 6. Tips for Datasets

As you can see, there is a lot to think about when it comes to parsing and analyzing datasets. Here are a few tips that you might find useful in the future:

1. **Reduce the Data** -- Make your dataset as small as possible. Or parse and analyze it in chunks, rather than all at once. It is much easier to find an error in a small amount of data than in a huge amount of data.

2. **Cross-check Results** -- It's always useful to cross-check your results with what you *think* is supposed to be happening. For example, instead of printing an entire dataset, look at summaries. Or, if you're getting errors, check that the various values are of the correct types.

3. **Write Self-checks** -- A common method used by professional programmers is to write self-checks or sanity-checks into the code. These checks essentially make sure there's nothing funny going on. For example, if you know there are x number of words in a dataset, yet the sum of the values in your word count dictionary is greater than that number, there might be a problem.

4. **Make the Output Pretty** -- In other words, don't create error messages that make no sense, or are too vague. I think we've all experienced those types of error messages when using commercial software and know how annoying they are. If you're writing the code, make sure your error messages are detailed enough to help you find and solve the problem.
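
As a rough sketch of tips 3 and 4 together, the example below counts the words in a made-up sentence and then checks that the counts add up to the number of words. It uses Python's **assert** statement, which stops the program with the given message if the condition is false. The names `sample_text` and `word_count_check` are made up for this example.

```python
# A made-up sentence, used only for illustration.
sample_text = "the cat sat on the mat"
words = sample_text.split()

word_count_check = {}
for word in words:
    word_count_check[word] = word_count_check.get(word, 0) + 1

# Self-check: the counts should add up to the number of words we started with.
expected_total = len(words)
actual_total = sum(word_count_check.values())
assert actual_total == expected_total, (
    "Word count mismatch: expected " + str(expected_total)
    + " words but the dictionary sums to " + str(actual_total)
)

print("Self-check passed:", actual_total, "words counted")
```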

### Now, let's re-explore files and dictionaries using regular expressions.