Part of learning the “Art of Python” or “Thinking Pythonically” is realizing that
Python often has built-in capabilities for many common data analysis problems.
Over time, you will see enough example code and read enough of the documentation
to know where to look to see if someone has already written something that makes
your job much easier.


A <b>dictionary</b> is an associative array: a mapping between a set of indices (which are
called keys) and a set of values. Each key maps to a value. In a list, the index positions have to
be integers, but in a dictionary the indices can be (almost) any type. Since Python 3.7, a dictionary
also remembers the order in which its keys were inserted.
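
The "(almost) any type" qualifier comes down to hashability: a key must be an immutable, hashable object. The short sketch below illustrates this with a made-up dictionary named `lookup` (the name and its contents are only for illustration).

```python
# Keys may be any hashable object: strings, numbers, tuples, ...
lookup = dict()
lookup["GOOG"] = "string key"
lookup[42] = "integer key"
lookup[(40.7, -74.0)] = "tuple key"

print(lookup[42])
print(lookup[(40.7, -74.0)])

# A list is mutable, hence unhashable, so it cannot be used as a key.
try:
    lookup[[1, 2]] = "list key"
except TypeError as err:
    print("TypeError:", err)
```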

The association of a key and a value is called a key-value pair or sometimes an item.
As an example, we’ll build a dictionary that maps stock ticker symbols to a tuple holding the
number of shares and the price.

In [2]:
#use a constructor to instantiate a dictionary object
shares = dict()

shares["GOOG"] = (100, 490.10)
mystocks = dict(shares)
print(mystocks)

print(mystocks["GOOG"])

numShares, price = shares["GOOG"]
print(f"number of shares: {numShares} at price {price:.2f}")

shares["IBM"] = (20, 91.50)
mystocks["IBM"] = shares["IBM"]  #add it to stocks I own

for s in mystocks:
    print (mystocks[s])
    
portfolio = list(mystocks)  #dictionary keys as a list
print(f"my portfolio: {portfolio}")

del mystocks["IBM"]
print(f"my portfolio: {mystocks}")

{'GOOG': (100, 490.1)}
(100, 490.1)
number of shares: 100 at price 490.10
(100, 490.1)
(20, 91.5)
my portfolio: ['GOOG', 'IBM']
my portfolio: {'GOOG': (100, 490.1)}


In [5]:
word = 'brontosaurus'
#key is the letter, value is count
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
    print(d)
print(d)

{'b': 1}
{'b': 1, 'r': 1}
{'b': 1, 'r': 1, 'o': 1}
{'b': 1, 'r': 1, 'o': 1, 'n': 1}
{'b': 1, 'r': 1, 'o': 1, 'n': 1, 't': 1}
{'b': 1, 'r': 1, 'o': 2, 'n': 1, 't': 1}
{'b': 1, 'r': 1, 'o': 2, 'n': 1, 't': 1, 's': 1}
{'b': 1, 'r': 1, 'o': 2, 'n': 1, 't': 1, 's': 1, 'a': 1}
{'b': 1, 'r': 1, 'o': 2, 'n': 1, 't': 1, 's': 1, 'a': 1, 'u': 1}
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 1, 'a': 1, 'u': 1}
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 1, 'a': 1, 'u': 2}
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
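
The if/else counting pattern above is common enough that Python offers shortcuts for it. As a sketch of two alternatives, the block below repeats the letter count with `dict.get` and then with `collections.Counter` from the standard library; both should produce the same histogram as the loop above.

```python
from collections import Counter

word = 'brontosaurus'

# dict.get returns the current count, or 0 if the letter is not yet a key
d = dict()
for c in word:
    d[c] = d.get(c, 0) + 1
print(d)

# Counter builds the same histogram in one call
counts = Counter(word)
print(dict(counts) == d)
```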


## Dictionaries with Lists and Tuples

Dictionaries have a method called <b>items</b> that returns a view of the key-value pairs, where each
pair is a tuple. Passing that view to list produces a list of tuples.

A dictionary keeps its items in insertion order, which is usually not sorted order.
However, since the list of tuples is a list, and tuples are comparable, we can
sort the list of tuples. Converting a dictionary to a list of tuples is a way for us to
output the contents of a dictionary sorted by key.

In [3]:
# dictionary value as a list data type
d = dict()
d["f"] = ["a","b"]
d["a"] = [1,2,3,4]
print(d)

l = sorted(d)
print(type(l))
print(l)

{'f': ['a', 'b'], 'a': [1, 2, 3, 4]}
<class 'list'>
['a', 'f']


Write a program that parses the romeo.txt file and generates an index of each word found in the file.

In [7]:
import string

def indexGenerator(fhandle):
    d = dict()
    lineNum = 1
    for line in fhandle:
        
        # scrub the line
        line = line.strip()
        line = line.translate(line.maketrans("","",string.punctuation))
        
        words = line.split()
        for word in words:
            if word in d:
                d[word].append(lineNum)
            else:
                d[word] = [lineNum]
        lineNum += 1
    return d

# create raw index dictionary
fhand = open("romeo.txt")
indx = indexGenerator(fhand)

# sort in key order
l = list(indx.keys())
l.sort()
for key in l:
    print(key, indx[key])
fhand.close()

        

Arise [3]
But [1]
It [2]
Juliet [2]
Who [4]
already [4]
and [2, 3, 4]
breaks [1]
east [2]
envious [3]
fair [3]
grief [4]
is [2, 2, 4]
kill [3]
light [1]
moon [3]
pale [4]
sick [4]
soft [1]
sun [2, 3]
the [2, 2, 3]
through [1]
what [1]
window [1]
with [4]
yonder [1]
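
The "append if present, else create a list" branch in the index generator can also be written with `collections.defaultdict`. The following is a minimal sketch under a few assumptions: `build_index` is a hypothetical name, and the two sample lines stand in for an open file handle so the example runs without romeo.txt.

```python
import string
from collections import defaultdict

def build_index(lines):
    """Map each word to the list of line numbers on which it appears."""
    index = defaultdict(list)   # a missing key starts out as an empty list
    strip_punct = str.maketrans("", "", string.punctuation)
    for line_num, line in enumerate(lines, start=1):
        for word in line.translate(strip_punct).split():
            index[word].append(line_num)
    return index

# Hypothetical sample text standing in for an open file handle
sample = ["It is the east, and Juliet is the sun.",
          "Arise, fair sun, and kill the envious moon,"]
index = build_index(sample)

# sorted() on a dictionary returns its keys in sorted order
for key in sorted(index):
    print(key, index[key])
```

Because iterating over a file handle yields lines just as iterating over a list of strings does, the same function should also accept the result of `open("romeo.txt")`.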


In [2]:
d = {'c':10, 'b':1, 'a':22}
t = list(d.items())
print(t)

t.sort()
print(t)

[('c', 10), ('b', 1), ('a', 22)]
[('a', 22), ('b', 1), ('c', 10)]


## Multiple assignment with dictionaries

Combining items, tuple assignment, and for, you can see a nice code pattern for
traversing the keys and values of a dictionary in a single loop

In [3]:
d = {'c':10, 'b':1, 'a':22}
for key, val in d.items():
    print(val, key)

10 c
1 b
22 a


## The most common words

We can augment our program to use this technique to print the ten most
common words in a text file.

In [6]:
import string
fhand = open('romeo-full.txt')
counts = dict()
for line in fhand:
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
        
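# Sort the dictionary by value: build (count, word) tuples
# so that the sort compares the counts first
lst = list()
for key, val in counts.items():
    lst.append((val, key))
lst.sort(reverse=True)

# each tuple in lst is (count, word)
for count, word in lst[:10]:
    print(count, word)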

61 i
42 and
40 romeo
34 to
34 the
32 thou
32 juliet
30 that
29 my
24 thee
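
Swapping each pair into a (count, word) tuple is one way to sort by value. As a sketch of two alternatives, the block below sorts the items directly with a `key` function and then uses `Counter.most_common`. The small `counts` dictionary is hypothetical data standing in for the counts built from the file.

```python
from collections import Counter

# Hypothetical word counts standing in for the counts built from the file
counts = {'the': 5, 'and': 4, 'is': 3, 'sun': 2, 'moon': 1}

# Sort (word, count) pairs by the count, largest first
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for word, count in by_count[:3]:
    print(count, word)

# Counter.most_common returns the top-n (word, count) pairs
top = Counter(counts).most_common(3)
print(top)
```

One difference to keep in mind: with (count, word) tuples, ties in the count are broken by comparing the words, whereas sorting with a `key` function leaves tied items in their original relative order.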


## Sequences: strings, lists, and tuples

Mutable sequences such as lists support additional operations that allow in-place modification of the object. Strings and tuples are immutable sequence types: such objects cannot be modified once created. The operations defined on mutable sequence types are listed in the Python documentation:

https://docs.python.org/2/library/stdtypes.html#mutable-sequence-types

In many contexts, the different kinds of sequences (strings, lists, and tuples) can
be used interchangeably. So how and why do you choose one over the others?
To start with the obvious, strings are more limited than other sequences because
the elements have to be characters. They are also immutable. If you need the
ability to change the characters in a string (as opposed to creating a new string),
you might want to use a list of characters instead.
Lists are more common than tuples, mostly because they are mutable. But there
are a few cases where you might prefer tuples:

1. In some contexts, like a return statement, it is syntactically simpler to create
a tuple than a list. In other contexts, you might prefer a list.
2. If you want to use a sequence as a dictionary key, you have to use an immutable
type like a tuple or string.
3. If you are passing a sequence as an argument to a function, using tuples
reduces the potential for unexpected behavior due to aliasing.
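
The three cases above can be sketched in a few lines. The function names `min_max` and `append_zero` and the `visits` dictionary are made up for illustration.

```python
# 1. Returning "a, b" builds a tuple without any extra syntax
def min_max(values):
    return min(values), max(values)

low, high = min_max([3, 1, 4, 1, 5])
print(low, high)

# 2. A tuple can serve as a dictionary key; a list cannot
visits = {(0, 0): "start", (2, 3): "goal"}
print(visits[(2, 3)])

# 3. Aliasing: a function can modify a list argument in place ...
def append_zero(seq):
    seq.append(0)

data = [1, 2]
append_zero(data)
print(data)   # the caller's list was changed

# ... but a tuple argument cannot be modified by the function
try:
    append_zero((1, 2))
except AttributeError as err:
    print("AttributeError:", err)
```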

Because tuples are immutable, they don’t provide methods like sort and reverse,
which modify existing lists. However, Python provides the built-in functions sorted
and reversed, which take any sequence as a parameter. sorted returns a new list
with the elements in sorted order, and reversed returns an iterator that yields the
elements in reverse order.
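
A brief illustration with a tuple: sorted hands back a list and reversed hands back an iterator, so either result needs a call to tuple if a tuple is wanted. The original tuple is left untouched.

```python
t = (3, 1, 2)

print(sorted(t))             # a new list in sorted order
print(tuple(sorted(t)))      # converted back to a tuple
print(reversed(t))           # an iterator object, not a sequence
print(tuple(reversed(t)))    # the elements in reverse order
print(t)                     # the original tuple is unchanged
```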

https://docs.python.org/3/howto/sorting.html

