### [PYTHON-DATA-STRUCTURES](https://docs.python.org/3/tutorial/datastructures.html)

## Maps:

The defining characteristic of a map is its key-value structure. You look up a key to get the value stored under it. Maps are also called dictionaries for a similar reason: you look up a word (the key) to find its definition (the value).
* A `map` is a set-based data structure, in the same way that an `array` is a list-based data structure.
* $map = <key: value>$... The `keys` of a map form a `set`, so each key exists only once in a map.

**So what are maps used for other than dictionaries?**
<br>We can use a map for a lot of things that have unique names. We can store a bunch of data as values for each unique name or item we care about, using maps. In Python, the map concept appears as a built-in data type called a dictionary. A dictionary contains key-value pairs.
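
A minimal sketch of the key-value idea with a Python dictionary. The `capitals` name and its contents are made up for illustration and are not part of the exercise below.

```python
# Hypothetical example data: country name -> capital city.
capitals = {"India": "New Delhi", "Egypt": "Cairo"}

# Look up a value by its key.
print(capitals["Egypt"])                  # Cairo

# Keys are unique: assigning to an existing key overwrites its value
# instead of creating a second entry.
capitals["Egypt"] = "Al-Qahirah"
print(len(capitals))                      # 2

# .get() returns a default instead of raising KeyError for a missing key.
print(capitals.get("China", "unknown"))   # unknown
```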

**Exercise:**
```
"""Time to play with Python dictionaries!
You're going to work on a dictionary that
stores cities by country and continent.
One is done for you - the city of Mountain 
View is in the USA, which is in North America.

You need to add the cities listed below by
modifying the structure.
Then, you should print out the values specified
by looking them up in the structure.

Cities to add:
Bangalore (India, Asia)
Atlanta (USA, North America)
Cairo (Egypt, Africa)
Shanghai (China, Asia)"""

locations = {'North America': {'USA': ['Mountain View']}}
```

In [1]:
locations = {'North America': {'USA': ['Mountain View']}}

In [2]:
locations['Asia'] = {'India':['bangalore']}
locations['North America']['USA'].append('Atlanta')
locations['Africa'] = {'Egypt':['Cairo']}
locations['Asia']['China'] = ['Shanghai']
print('Done')

Done


In [3]:
locations

{'North America': {'USA': ['Mountain View', 'Atlanta']},
 'Asia': {'India': ['bangalore'], 'China': ['Shanghai']},
 'Africa': {'Egypt': ['Cairo']}}

In [4]:
print(f"2\n{list(locations['Asia'].keys())[0]} \
- {list(locations['Asia'].values())[0][0]}\n{list(locations['Asia'].keys())[1]} \
- {list(locations['Asia'].values())[1][0]}")

2
India - bangalore
China - Shanghai


In [5]:
list(locations['Asia'].keys())

['India', 'China']

```
"""Print the following (using "print").
1. A list of all cities in the USA in
alphabetic order.
2. All cities in Asia, in alphabetic
order, next to the name of the country.
In your output, label each answer with a number
so it looks like this:
1
American City
American City
2
Asian City - Country
Asian City - Country"""
```

In [6]:
for key, value in locations.items():
    if key == 'North America':
        details = sorted(locations[key]['USA'])
        print(f'1\n{details[0]}\n{details[1]}')
    if key == 'Asia':
        cities = list(locations[key].values())
        countries = list(locations[key].keys())
        print(f'2\n{cities[0][0]} - {countries[0]}')
        print(cities[1][0], '-', countries[1])
        

1
Atlanta
Mountain View
2
bangalore - India
Shanghai - China
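
The cell above indexes positions `[0]` and `[1]` directly, so it only works when each answer has exactly two cities, and it never sorts the Asian cities. A more general sketch follows, assuming the same continent -> country -> cities layout. The dictionary is rebuilt so the block runs on its own, and `usa_cities` and `asia_pairs` are names introduced here for illustration. Python's default string sort is case-sensitive, which would place the lowercase `'bangalore'` after `'Shanghai'`, so the sort keys are lowercased.

```python
# Same layout as the notebook: continent -> country -> list of cities.
locations = {
    'North America': {'USA': ['Mountain View', 'Atlanta']},
    'Asia': {'India': ['bangalore'], 'China': ['Shanghai']},
    'Africa': {'Egypt': ['Cairo']},
}

# 1. All cities in the USA, sorted alphabetically (ignoring case).
usa_cities = sorted(locations['North America']['USA'], key=str.lower)

# 2. (city, country) pairs for every Asian country, sorted by city name.
asia_pairs = sorted(
    ((city, country)
     for country, cities in locations['Asia'].items()
     for city in cities),
    key=lambda pair: pair[0].lower(),
)

print(1)
for city in usa_cities:
    print(city)
print(2)
for city, country in asia_pairs:
    print(f'{city} - {country}')
```

This version keeps working when more cities or countries are added, because nothing depends on a fixed position in a list.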


## Hashing

Using a data structure that employs a hash function allows us to do look-ups in constant time, $O(1)$. Let that sink in: lists do look-ups in linear time. Stacks and queues do constant-time look-ups only for the element at the top or the head. Priority queues do so only for the element with the top priority; any other element requires a linear-time search. The ability to do constant-time look-ups can make many algorithms substantially faster.

### Hash Functions
* The purpose of a hash function is to transform some value into one that can be stored and retrieved easily. <br>We give it some value; it converts the value using some formula and produces a coded version of the value (the hash value) that is often used as an index into an array.


* One common pattern in hash functions is to take the last few digits of a big or random number, divide it by some consistent number, and use the remainder from that division to find a place to store that number in our array.


* When numbers are big and random, we need some way to convert those numbers to array indices quickly. That's how the constant-time look-up works: you give your big number to a hash function, which spits out a hash code that is used as the index into an array. Then we can go to the array and get our original value in constant time.


* Why use the last few digits of a big number? In most cases the last few digits are the most random.
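
The remainder pattern described above can be sketched as follows. The function name `hash_index` and the choice of 100 slots are assumptions made for illustration. With 100 slots, the remainder is exactly the last two decimal digits of the number.

```python
def hash_index(number, num_slots=100):
    """Map a non-negative integer to a slot index in range(num_slots)."""
    return number % num_slots

# With 100 slots the remainder equals the last two decimal digits.
print(hash_index(8675309))   # 9
print(hash_index(1234567))   # 67

# Store and retrieve a value by jumping straight to its slot,
# without scanning the rest of the array.
slots = [None] * 100
slots[hash_index(1234567)] = 1234567
print(slots[67])             # 1234567
```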

### Collision:

As it turns out there's a flaw in the system. There are times when a hash function will spit out the same hash value for 2 different inputs. This situation is called a collision.
* There are two main ways to handle a collision. The first is to change the value used in the hash function (for example, the number we divide by), or to change the hash function completely, so that we have more than enough slots to store potential values.


* We can also keep the original hash function, but change the structure of the array. So, instead of storing one hash value in each slot, we can store a list that contains all values hashed at that spot. These lists are generally called **`Buckets`**.


* The downside of the `Bucket` approach is that we still need to iterate through a list of items every time we want a value from that bucket. Ideally, therefore, a bucket shouldn't hold more than about 3 items.


* Hash table look-ups have a constant-time best case and average case of $O(1)$, but because of the bucket system, the worst case is linear time, $O(n)$, which happens when every value lands in the same bucket.
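
A minimal sketch of the bucket approach, assuming integer values and a remainder-based hash. The names `NUM_BUCKETS`, `store`, and `contains` are introduced here for illustration, and the table is kept deliberately tiny so that a collision is easy to see.

```python
NUM_BUCKETS = 10   # deliberately small to force a collision
buckets = [[] for _ in range(NUM_BUCKETS)]

def store(value):
    """Append value to the bucket chosen by value % NUM_BUCKETS."""
    buckets[value % NUM_BUCKETS].append(value)

def contains(value):
    """Jump to one bucket by index, then scan only that bucket."""
    return value in buckets[value % NUM_BUCKETS]

for v in (12, 22, 35):
    store(v)

# 12 and 22 both hash to index 2, so they share a bucket.
print(buckets[2])     # [12, 22]
print(contains(22))   # True
print(contains(42))   # False (bucket 2 is scanned, 42 is not in it)
```

The scan inside `contains` is the linear part: if every value hashed to the same index, a look-up would have to walk through all of them.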

### Note:

There's no one perfect way to build a hash function. We need to consider all of these trade-offs and build a system that makes the most sense for our data and limitations.

Often we have to choose between a hash function that spreads out the values nicely but uses a lot of space, and one that uses fewer buckets but might require some searching within each bucket.

Hashing questions are popular in interviews because there's often no perfect solution. We're expected to talk about the upsides and downsides of whatever approach we use, so we must do our best to optimize the hash function and communicate our reasoning clearly to the interviewer.

### Load Factor:

When we're talking about hash tables, we can define a "load factor":
```
Load Factor = Number of Entries / Number of Buckets
```
The purpose of a load factor is to give us a sense of how "full" a hash table is. For example, if we're trying to store 10 values in a hash table with 1000 buckets, the load factor would be 0.01, and the majority of buckets in the table will be empty. We end up wasting memory by having so many empty buckets, so we may want to rehash, or come up with a new hash function with fewer buckets. We can use our load factor as an indicator for when to rehash: **the closer the load factor is to 0, the more empty, or sparse, our hash table is**.

On the flip side, the closer our load factor is to 1 (meaning the number of values equals the number of buckets), the better it would be for us to rehash and add more buckets. **Any table with a load factor greater than 1 is guaranteed to have collisions**, because there are more entries than buckets.
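
The load factor formula and the effect of rehashing can be sketched as follows. This assumes a table stored as a list of bucket lists and a remainder-based hash. The helper names `load_factor` and `rehash` are made up for this example.

```python
def load_factor(buckets):
    """Number of stored entries divided by number of buckets."""
    entries = sum(len(bucket) for bucket in buckets)
    return entries / len(buckets)

def rehash(buckets, new_size):
    """Redistribute every value into new_size buckets via value % new_size."""
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for value in bucket:
            new_buckets[value % new_size].append(value)
    return new_buckets

# 10 values in 1000 buckets: a sparse table, as in the example above.
sparse = [[] for _ in range(1000)]
for v in range(10):
    sparse[v % 1000].append(v)
print(load_factor(sparse))    # 0.01

# Rehash into fewer buckets to waste less memory.
compact = rehash(sparse, 20)
print(load_factor(compact))   # 0.5

# With only 5 buckets the load factor exceeds 1, so at least one
# bucket must hold more than one value.
crowded = rehash(sparse, 5)
print(load_factor(crowded))   # 2.0
```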

### Exercise:
```
"""Write a HashTable class that stores strings
in a hash table, where keys are calculated
using the first two letters of the string."""

The Hash Value Function Formula is:
Hash Value = (ASCII Value of First Letter * 100) + ASCII Value of Second Letter 

You can assume that the string will have at least two letters, and the first two characters are uppercase letters (ASCII values from 65 to 90). You can use the Python function `ord()` to get the ASCII value of a letter, and `chr()` to get the letter associated with an ASCII value.

You'll create a HashTable class, methods to store and lookup values, and a helper function to calculate a hash value given a string. You cannot use a Python dictionary— only lists! And remember to store lists at each bucket, and not just the string itself. For example, you can store "UDACITY" at index 8568 as ["UDACITY"].
```

In [7]:
ord('A')

65

In [8]:
chr(65)

'A'

In [9]:
ord('a')

97

In [15]:
chr(97)

'a'

In [16]:
class HashTable(object):
    def __init__(self):
        # 10000 slots cover the largest possible hash value: 90 * 100 + 90 = 9090.
        self.table = [None] * 10000

    def store(self, string):
        """Input a string that's stored in
        the table."""
        hash_value = self.calculate_hash_value(string)

        # If the bucket already exists, append the new string;
        # otherwise create the bucket with the present string.
        if self.table[hash_value]:
            self.table[hash_value].append(string)
        else:
            self.table[hash_value] = [string]

    def lookup(self, string):
        """Return the hash value if the
        string is already in the table.
        Return -1 otherwise."""
        hash_value = self.calculate_hash_value(string)
        bucket = self.table[hash_value]

        # An empty slot holds None, so check that the bucket exists
        # before searching it.
        if bucket is not None and string in bucket:
            return hash_value
        return -1

    def calculate_hash_value(self, string):
        """Helper function to calculate a
        hash value from a string.
        This function implements the Hash Value
        Function Formula from above."""
        # Index by position rather than string.index(), which returns the
        # first occurrence and so miscounts strings such as 'AARON'.
        return ord(string[0]) * 100 + ord(string[1])

In [18]:
# Setup
hash_table = HashTable()

# Let's print the total length of the hash table
print(len(hash_table.table))

10000


In [19]:
# Test calculate_hash_value
# Should be 8568
print(hash_table.calculate_hash_value('UDACITY'))

8568


In [21]:
# Test lookup edge case
# Should be -1 as UDACITY is not yet stored in hash_table
print(hash_table.lookup('UDACITY'))

-1


In [22]:
# Test store
hash_table.store('UDACITY')

# Should be 8568
print(hash_table.lookup('UDACITY'))

8568


In [23]:
# Test store edge case
hash_table.store('UDACIOUS')

# Should be 8568
print(hash_table.lookup('UDACIOUS'))

8568


In [24]:
print(hash_table.table[8568])

['UDACITY', 'UDACIOUS']
