# Hash table

Previously we learned about sequence objects such as: *string*, *list*, *tuple*. Now we are going to learn about a new object which implements the notion of **hash table**.

Why do we need this new object? 

Sequences are a great tool, but they have one big limitation: the time needed to find a specific value grows linearly with the number of elements, because each element may have to be checked in turn:

In [None]:
'x' in range(100)

In [None]:
%timeit -n100 'x' in range(100)

In [None]:
%timeit -n100 'x' in range(100_000)

In [None]:
%timeit -n100 'x' in range(1_000_000)

This is a real problem because the membership test is a very useful and common operation. So we would like a structure whose lookup time does not depend on the number of elements.

## Another limitation

We would also like to associate a key with an element, for example the name alice with the age 35. We cannot do that with a list or any other sequence object, because sequences are indexed by integer position only.

We will get an error if we try:

In [None]:
t = []
t['alice'] = 35

The hash table structure is the answer to these two limitations, and in Python it is implemented by the 'dictionary' object.
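
As a rough illustration of the first limitation, the sketch below times a membership test on a list and on a dictionary holding the same elements as keys. The sizes and variable names are arbitrary choices made for this example, and the measured times depend on the machine. The only expectation is that the list scan grows with the number of elements while the dictionary lookup stays roughly constant.

```python
import timeit

for n in (1_000, 100_000):
    as_list = list(range(n))
    as_dict = dict.fromkeys(as_list)  # same elements, used as keys

    # -1 is absent, so the list has to be scanned to the end
    t_list = timeit.timeit(lambda: -1 in as_list, number=100)
    t_dict = timeit.timeit(lambda: -1 in as_dict, number=100)
    print(f'n={n}: list {t_list:.6f} s, dict {t_dict:.6f} s')
```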

# Dictionary

## Creation

<p>Lists are created by using square brackets <b>[ ]</b></p>
<p>Dictionaries are created by using curly brackets <b>{ }</b></p>

In [None]:
d = {}  # an empty dictionary

The simplest way to create a dictionary with an initial key-value pair is:

In [None]:
d = {'keyname': 'keyvalue'}
print(d)

Following the previous example, we can create a hash table, a Python dictionary, using the name of the person as *key* and the age as *value*:

In [None]:
d = {'alice': 35, 'bob': 18}

In [None]:
print(d)

**Note: To go deeper**

A dictionary object can also be created with the function *dict*, similar to the *list* function used to create a list. It accepts either a sequence of (key, value) pairs or keyword arguments.

In [None]:
d2 = dict([('alice', 35), ('jane', 24), ('bob',18)])
print(d2)

or:

In [None]:
d3 = dict(bob=18, alice=35, jane=24)
print(d3)
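
When the keys and the values already live in two separate lists, *dict* can be combined with the built-in *zip*. This is only a small sketch; the lists `names` and `ages` are made up for the illustration.

```python
names = ['alice', 'jane', 'bob']
ages = [35, 24, 18]

# zip pairs each name with the age at the same position
d4 = dict(zip(names, ages))
print(d4)
```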

## Accessing elements

To access an element of the dictionary:

In [None]:
key = 'alice'
value = d[key]

print('The name of the person is used as key:', key)
print('The value associated to that key is:', value)
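
Asking for a key that does not exist raises a *KeyError*. The sketch below shows this, together with the method *get*, which returns *None* (or a default of your choice) instead of raising. The dictionary `ages` is defined here only to keep the example self-contained.

```python
ages = {'alice': 35, 'bob': 18}

try:
    ages['mark']
except KeyError as err:
    print('Missing key:', err)

# get returns None (or the given default) instead of raising
print(ages.get('mark'))
print(ages.get('mark', 0))
```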

## Adding an element

Adding an element to a dictionary is done by creating a new *key* and assigning a value to it.

In [None]:
print(d)

In [None]:
d['jane'] = 24

In [None]:
print(d)

It is also possible to add several elements at once using the method **update**:

In [None]:
d2 = {'tom': 54, 'david': 87}

d.update(d2)
print(d)

**Note:**

It is not possible to use the operator *+* to concatenate dictionaries. 

In [None]:
{'alice': 35} + {'bob': 18}
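
If you need a new, merged dictionary without modifying either original, one possible approach is to unpack both into a new literal. The names `first`, `second` and `merged` are chosen for this sketch only.

```python
first = {'alice': 35}
second = {'bob': 18}

# Unpack both dictionaries into a new one; the originals are unchanged
merged = {**first, **second}
print(merged)
```

From Python 3.9 onwards, the expression `first | second` gives the same result.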

Keys have to be unique; you cannot have two keys with the same name. If you assign a value to a key that already exists, you overwrite the previous value.

In [None]:
print(d)
d['alice'] = 12
print(d)

## Equality between dictionaries

Two dictionaries are equal when they contain exactly the same key-value pairs: every pair of the first dictionary must be present in the second, and no others.

The **position** (ordering) is not important. 

In [None]:
d1 = {'alice': 12, 'bob': 18, 'jane': 24, 'tom': 54, 'david': 87}
d2 = {'tom': 54, 'david': 87}
d3 = {'bob': 18, 'alice': 35, 'jane': 24}
d4 = {'alice': 35, 'bob': 18, 'jane': 24}


print('dictionary 1:', d1)
print('dictionary 2:', d2)
print('dictionary 3:', d3)
print('dictionary 4:', d4)

print()
print('Are dictionary 1 and dictionary 2 equal?', d1 == d2)
print('Are dictionary 1 and dictionary 3 equal?', d1 == d3)
print('Are dictionary 3 and dictionary 4 equal?', d3 == d4)

## Useful dictionary methods

Dictionaries have their own methods. Two of the most useful are *keys* and *values* which, as their names suggest, return all the keys and all the values of the dictionary as iterable view objects.

In [None]:
d.keys()

In [None]:
d.values()

## Dictionaries are iterable (it is possible to use loops with them)

It is possible to perform some operations on a dictionary by iterating on the elements using the *keys* method:

In [None]:
for key in d.keys():
    print(key)

**Note: To go deeper**

You can iterate over a dictionary directly. This is equivalent to asking for the keys. It saves some typing but at the cost of readability.

In [None]:
for k in d:
    print(k)

In [None]:
keys = list(d3.keys())
print('Keys from dictionary 3:', keys)

# We can sort them.
keys.sort()
print(keys)
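
Besides *keys* and *values*, dictionaries also provide the method *items*, which yields each key together with its value. The sketch below uses a small dictionary named `ages`, defined here so the example stands alone.

```python
ages = {'bob': 18, 'alice': 35, 'jane': 24}

for name, age in ages.items():
    print(name, 'is', age)

# sorted() on a dictionary returns its keys in sorted order
print(sorted(ages))
```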

## Presence (or not) of an element inside a dictionary

It is possible to test if a *key* is present in the dictionary (or not) using the keyword **in**

In [None]:
'alice' in d

In [None]:
'mark' in d

or not:

In [None]:
'mark' not in d

**Warning:**

The **in** keyword only looks at the keys, so it cannot be used directly to test for the presence of a value:

In [None]:
12 in d

But you can use the method *values*, which returns a view of the values that supports the **in** test:

In [None]:
12 in d.values()
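
Going one step further, you may want to know which keys hold a given value. A minimal sketch, assuming a small dictionary `ages` invented for the example, is to loop over the key-value pairs:

```python
ages = {'alice': 12, 'bob': 18, 'jane': 12}

# Collect every key whose value is 12
matches = [name for name, age in ages.items() if age == 12]
print(matches)
```

Unlike a key lookup, this has to visit every element, so its cost grows with the size of the dictionary.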

## Composite dictionary

It is possible to have composite objects in a dictionary. A key has to be a hashable object, which in practice means an immutable one such as a string, an integer, a float or a tuple of these. It cannot be a mutable object like a list or a numpy array. The value, however, can be any valid Python object:

In [None]:
import numpy as np

d = {'key1': 1}
d['key2'] = 2
d['key3'] = {'subdic': 3}
d['key4'] = np.arange(10)

for key in d:
    print(key, ':', d[key])
    
print()
print('A composite dictionary:', d)
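
The restriction on keys can be checked with a short experiment. In this sketch (the name `grid` is arbitrary), a tuple is accepted as a key while a list is rejected with a *TypeError*:

```python
grid = {}
grid[(0, 1)] = 'a tuple works as a key'
print(grid)

try:
    grid[[0, 1]] = 'a list does not'
except TypeError as err:
    print('TypeError:', err)
```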
    

## Dictionary and string formatting

Strings and dictionaries are closely linked, and using dictionaries extends what you can do with the print function.

In [None]:
d = {'a': 1, 'b':2, 'c': 3}

print('1st:', d['a'], '2nd:', d['b'], '3rd:', d['c'])

print('1st:', d['a'], ', 2nd:', d['b'], ', 3rd:', d['c'])

print('1st: {}, 2nd: {}, 3rd: {}'.format(d['a'], d['b'],d['c']))

print('1st: {0}, 2nd: {2}, 3rd: {1}'.format(d['a'], d['c'], d['b']))

print('1st: {var[a]}, 2nd: {var[b]}, 3rd: {var[c]}'.format(var=d))

print('1st: {first}, 2nd: {second}, 3rd: {third}'.format(first=1, second=2, third=3))


The format method allows you to manipulate your string in a very powerful way. You can find more documentation [here](https://pyformat.info/).
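
Two further variants are worth knowing. The sketch below reuses a dictionary with the same content as `d` above: the first line unpacks it into keyword arguments for *format*, the second uses an f-string (available from Python 3.6).

```python
d = {'a': 1, 'b': 2, 'c': 3}

# ** unpacks the dictionary into keyword arguments for format
print('1st: {a}, 2nd: {b}, 3rd: {c}'.format(**d))

# f-strings evaluate the expressions inside the braces
print(f"1st: {d['a']}, 2nd: {d['b']}, 3rd: {d['c']}")
```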

When you want to save information in a file, this is also how you will format your output:

In [None]:
line1 = '1st: {var[a]}, 2nd: {var[b]}, 3rd: {var[c]}\n'.format(var=d)
line2 = '1st: {first}, 2nd: {second}, 3rd: {third}\n'.format(first=1, second=2, third=3)
line3 = '1st: {var[a]:04d}, 2nd: {var[b]:3.4f}, 3rd: {var[c]}\n'.format(var=d)

# The with statement closes the file automatically
with open('myfile.txt', 'w') as f:
    f.write(line1)
    f.write(line2)
    f.write(line3)

In [None]:
%more myfile.txt

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<H2> Challenges </H2>
<li>
    Read the file *agelist.txt* and copy the data into a dictionary similar to the one above.
    <br>
    Hint: Slice the data to skip the first line. 
</li>
</div>

In [None]:
with open('data/agelist.txt') as f:
    data = f.readlines()

print(data)

In [None]:
data[1].strip()

In [None]:
data[1].strip().split()

In [None]:
with open('data/agelist.txt') as f:
    data = f.readlines()

data = data[1:]  # Remove the header
d = {}

for line in data:
    fields = line.strip().split()
    d[fields[0]] = int(fields[1])
print(d)
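
The loop above assumes that every line after the header holds exactly a name and an age. A slightly more defensive sketch is shown below. The list `data` is a hypothetical stand-in for the result of *readlines*, since the actual content of *agelist.txt* is not reproduced here.

```python
# Hypothetical lines standing in for the result of f.readlines()
data = ['name age\n', 'alice 35\n', 'bob 18\n', '\n']

ages_from_file = {}
for line in data[1:]:          # skip the header
    fields = line.split()
    if len(fields) != 2:       # ignore blank or malformed lines
        continue
    ages_from_file[fields[0]] = int(fields[1])
print(ages_from_file)
```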

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<H2> Challenges </H2>
<li>
    Create a list containing the names of the *inflammations* CSV files that are present in the *data* directory.
    <br>
    Hint: Use the "glob" library.
</li>
<br>
<li>
Create a dictionary which has the file names (from the list created above) as keys and the data within the files as values.
</li>
Hint: You can use the function 'loadtxt' provided by the library 'numpy'
</div>

### Solution Challenge 1

### Solution Challenge 2