# Hash Tables



## 1 Plant Information System


### 1.1 OSISoft PI

[OSISoft PI System](https://www.osisoft.cn/)

The PI System enables digital transformation through trusted, high-quality operations data. Collect, enhance, and deliver data in real time in any location. Empower engineers and operators. Accelerate the work of analysts and data scientists. Support new business opportunities.


![](./img/ds/osisoft.png)

Each analysis task needs to **search** for its measuring points:


* <p style="font-size:20px; color:blue">Typically a few, or a few dozen, measuring points are selected from tens of thousands.</p>

### 1.2 Building Energy Management System

![](./img/ds/buildingsystem.png)

![](./img/ds/buildingcode.png)


### 1.3 The Simple Example: Measurement Tags of VCC

The table stores the measurement tags of the VCC; every tag record has a unique `TagID`.

Refrigerant 134a is the working fluid in an ideal vapor-compression refrigeration cycle that
communicates thermally with a cold region at 0°C and a warm region at 26°C. 

Saturated vapor enters the compressor at 0°C and saturated liquid leaves the condenser at 26°C.

The mass flow rate of the refrigerant is 0.08 kg/s.



![](./img/vcr/ivcr-ts.jpg)


In [None]:
%%file ./data/VCC1_Tag.csv
TagID,Tag,Desc,Unit,Value
600,CompressorIPortM,压缩机入口流量,kg/s,0.08
616,CompressorOPortP,压缩机出口压力,MPa,0.6854
613,CompressorOPortT,压缩机出口温度,°C,29.27
714,CondenserOPortT,冷凝器出口温度,°C,26
708,CondenserOPortX,冷凝器出口干度,-,0
814,ExpansionValveOPortT,膨胀阀出口温度,°C,26
808,ExpansionValveOPortX,膨胀阀出口干度,-,0
914,EvaporatorValveOPortT,蒸发器出口温度,°C,0
908,EvaporatorValveOPortX,蒸发器出口干度,-,1

**Data Structure of Tags**

```python

tag = (id, (tag, desc, unit, value))  # one record: a tuple

VCC1_TagTable = []  # the table: a list of records

VCC1_TagTable = [(id, (tag, desc, unit, value)), ...]
```

In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)
VCC1_TagTable=[]
for line in csvdata:
    id = int(line['TagID']) # convert to int
    tag=line['Tag']
    desc=line['Desc']
    unit=line['Unit']
    value=float(line['Value'])
    VCC1_TagTable.append((id,(tag,desc,unit,value)))
csvfile.close()  

In [None]:
for item in  VCC1_TagTable:
    print(item)

Get the tags of the compressor by TagID using a linear search:

In [None]:
CompressorTagIDList=[600,616,613,914,908]
for tagid in CompressorTagIDList:
    for item in VCC1_TagTable:
        if tagid==item[0]:
            print(item[1])       

Linear search takes $O(N)$ time per lookup.

The key lookup complexities of two basic data structures are:

| Data structure   |  lookup  |
| ----------------- |:--------:|
| Array         |  O(N)    |
| Sorted array    |   O(logN) |
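
As a rough illustration of the sorted-array row, the sketch below looks up a tag ID by binary search with Python's standard `bisect` module. The names `sorted_ids` and `find_tag` are made up for this example; the IDs are the ones stored in `VCC1_Tag.csv`.

```python
from bisect import bisect_left

# Tag IDs from VCC1_Tag.csv, sorted once up front (illustrative data)
sorted_ids = sorted([600, 616, 613, 714, 708, 814, 808, 914, 908])

def find_tag(ids, tagid):
    """Return the position of tagid in the sorted list ids, or -1 if absent."""
    pos = bisect_left(ids, tagid)        # binary search: O(log N) comparisons
    if pos < len(ids) and ids[pos] == tagid:
        return pos
    return -1

print(find_tag(sorted_ids, 613))  # position of 613 in the sorted list
print(find_tag(sorted_ids, 614))  # -1: there is no tag 614
```

The list has to stay sorted for this to work, so each insertion costs extra; that trade-off is one reason to look for something better.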
 
Is there a data structure that can do better? It turns out that there is: **the hash table**, one of the best and most useful data structures.

In Python, dictionaries (type <font color="blue">dict</font>) use <b>hashing</b> to do <b>lookups in time</b>

* that is nearly `independent` of the `size` of the dictionary
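
A minimal sketch of this with the built-in `dict`, using three records from the tag table above. The variable name `tag_dict` and the English descriptions are assumptions of this example, not part of the original notebook.

```python
# Key: TagID, value: (tag, desc, unit, value); three sample records only
tag_dict = {
    600: ("CompressorIPortM", "compressor inlet mass flow", "kg/s", 0.08),
    616: ("CompressorOPortP", "compressor outlet pressure", "MPa", 0.6854),
    613: ("CompressorOPortT", "compressor outlet temperature", "°C", 29.27),
}

print(tag_dict[616])       # direct lookup by key, no scan over the records
print(tag_dict.get(614))   # None: .get() avoids a KeyError for a missing key
print(613 in tag_dict)     # the membership test also uses hashing
```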


The basic idea behind hashing is to

* **convert the key to an integer, and then use that integer to index into a list**,

which can be done in `constant` time. 

**Hash function**: any function that can be used to map data of `arbitrary` size to `fixed-size` values.

* `CurTagID%ListSize` (the division method, `k mod m`: the remainder of the key `k` divided by the table length `m`)
![](./img/ds/hash1.png)

**Hash value**: the value returned by a hash function
    
* `Index_TagID=CurTagID%ListSize`

**Hash table**: the data structure that maps keys to values with hashing

* `VCC1_TagTable=[None for i in range(ListSize)]`

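>A hash table accesses a record by mapping its `key` to `one position` in the table, which speeds up the search. The mapping function is called the hash function, and the array that stores the records is called the hash table.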


For example, we use the remainder `key%ListSize` as the index into the list:

In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)

# set the size of the store list
ListSize=30;
# the store table 
VCC1_TagTable=[None for i in range(ListSize)]
for line in csvdata:
    id = int(line['TagID'])
    tag=line['Tag']
    desc=line['Desc']
    unit=line['Unit']
    value=float(line['Value'])
    # convert the key to an integer:  address in the list
    address= id%ListSize
    # put the record in the address of the list
    VCC1_TagTable[address]=(id,(tag,desc,unit,value))
    print(id,address)
csvfile.close() 

In [None]:
for  i,item in  enumerate(VCC1_TagTable):
    print(i,item)

Search for one tag by TagID using its `unique` address `Index_TagID`:

* The lookup is done in **constant** time, `independent` of the `size` of `VCC1_TagTable`.

The complexity is $O(1)$.

* If one position is reserved for every key, the table can be **addressed directly**, and the time complexity is O(1).
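
To make the difference concrete, the sketch below counts key comparisons instead of measuring time. It builds a synthetic table of `n` records with the keys `0..n-1` (an assumption chosen so that `key % n` never collides) and looks up the last key both ways. The function names are hypothetical.

```python
def linear_lookup(records, key):
    """Scan the list; return (value, number of key comparisons)."""
    comparisons = 0
    for k, v in records:
        comparisons += 1
        if k == key:
            return v, comparisons
    return None, comparisons

def direct_lookup(table, key):
    """Index the list with key % len(table); always one comparison."""
    slot = table[key % len(table)]
    if slot is not None and slot[0] == key:
        return slot[1], 1
    return None, 1

for n in (10, 1000, 100000):
    records = [(k, str(k)) for k in range(n)]   # synthetic data
    table = [None] * n
    for k, v in records:
        table[k % n] = (k, v)
    _, c_lin = linear_lookup(records, n - 1)
    _, c_dir = direct_lookup(table, n - 1)
    print(n, c_lin, c_dir)
```

The linear count grows with `n`, while the direct-addressing count stays at one.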

In [None]:
CompressorTagIDList=[600,616,613,914,908]
for tagid in CompressorTagIDList:
    address=tagid%ListSize
    print(address,VCC1_TagTable[address])   

## 2 Collision 

### 2.1 Collision 

If the space of possible `outputs` is **smaller** than the space of possible `inputs`, 

* a hash function is a `many`-to-`one` mapping. 

When different keys are mapped to the same hash value, it is called a <b>collision</b>. 

>Hash collision: different key values map to the same storage position in the hash table.


**For example**

* the number of input keys is 9

* the number of possible hash values (`ListSize`) is 5

The hash function: `id%ListSize`


In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)

# set the size of the store list
ListSize=5;
# the store table 
for line in csvdata:
    id = int(line['TagID'])
    # convert the key to an integer: address of the list
    address= id%ListSize
    print(id,address)
csvfile.close()  

**Many collisions!**
```
613 3
708 3
808 3
908 3
```
```
714 4
814 4
914 4
```

**If keys collide**, each record stored at an address overwrites the previous one:

In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)

# set the size of the store list
ListSize=5;
# the store table 
VCC1_TagTable=[None for i in range(ListSize)]
for line in csvdata:
    id = int(line['TagID'])
    tag=line['Tag']
    desc=line['Desc']
    unit=line['Unit']
    value=float(line['Value'])
    # convert the key to an integer:address of the list
    address= id%ListSize
    # put the record in the index of the list
    VCC1_TagTable[address]=(id,(tag,desc,unit,value))
csvfile.close() 

In [None]:
CompressorTagIDList=[600,616,613,914,908]
for tagid in CompressorTagIDList:
    address=tagid%ListSize
    print(VCC1_TagTable[address]) 

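The lookups for 613 and 908 both print the record of tag 908:
```
(908, ('EvaporatorValveOPortX', '蒸发器出口干度', '-', 1.0))
(908, ('EvaporatorValveOPortX', '蒸发器出口干度', '-', 1.0))
```
because these four keys collide at address 3, and each stored record overwrote the previous one: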
```
613 3
708 3
808 3
908 3
```

What if the space of possible outputs is bigger than the space of possible inputs?

* **no collisions**?

For example: `ListSize=20`

In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)

# set the size of the store list
ListSize=20;
# the store table 
VCC1_TagTable=[None for i in range(ListSize)]
for line in csvdata:
    id = int(line['TagID'])
    tag=line['Tag']
    desc=line['Desc']
    unit=line['Unit']
    value=float(line['Value'])
    # convert the key to an integer: address of the list
    address= id%ListSize
    # put the record in the index of the list
    VCC1_TagTable[address]=(id,(tag,desc,unit,value))
    print(id, address)
csvfile.close() 

**Collisions still occur:**

```
714 14
814 14
914 14

708 8
808 8
908 8
```
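
A larger list does not by itself guarantee fewer collisions; what matters is how the keys spread over the addresses. The sketch below counts collisions for several list sizes without reading the CSV file. The helper `count_collisions` and the list `tag_ids` are names introduced for this example.

```python
from collections import Counter

# The nine TagIDs from VCC1_Tag.csv
tag_ids = [600, 616, 613, 714, 708, 814, 808, 914, 908]

def count_collisions(keys, list_size):
    """Number of keys that land on an address already taken by an earlier key."""
    used = Counter(key % list_size for key in keys)
    return sum(n - 1 for n in used.values())

for size in (5, 20, 30):
    print(size, count_collisions(tag_ids, size))
```

With these nine IDs it reports 5 collisions for a size of 5, 4 for a size of 20, and none for a size of 30.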


### 2.2 Handle the collision 

There are two ways to handle collisions in a hash table:

1. **Minimize collisions**: 

   * a `good` hash function produces a **uniform distribution**: every output in the range is equally probable, which `minimizes` the probability of `collisions`
    
   * the `sweet spot` size of the hash table


2. **Collision resolution**: Separate Chaining, Open Addressing
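
Separate chaining is developed in the sections that follow; open addressing is only named here. As a hedged sketch of the idea, the code below uses linear probing, one common open-addressing scheme: on a collision, it tries the next slot until a free one is found. The function names `probe_insert` and `probe_search` are hypothetical, and deletion is deliberately left out.

```python
def probe_insert(table, key, value):
    """Insert (key, value) with linear probing; raise if the table is full."""
    size = len(table)
    start = key % size
    for step in range(size):
        addr = (start + step) % size          # try the next slot on a collision
        if table[addr] is None or table[addr][0] == key:
            table[addr] = (key, value)
            return addr
    raise RuntimeError("hash table is full")

def probe_search(table, key):
    """Return the value stored for key, or None if the key is absent."""
    size = len(table)
    start = key % size
    for step in range(size):
        addr = (start + step) % size
        if table[addr] is None:               # empty slot: key was never stored
            return None
        if table[addr][0] == key:
            return table[addr][1]
    return None

table = [None] * 5
for tagid in [613, 708, 808, 908]:            # all four hash to address 3
    print(tagid, probe_insert(table, tagid, "tag%d" % tagid))

print(probe_search(table, 808))               # found after probing past 613 and 708
print(probe_search(table, 600))               # None: 600 was never stored
```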


### 2.3 Choice of hash table size

Assuming you have a good hash function, making the hash table large enough **reduces** the number of collisions sufficiently to allow us to treat the complexity as O(1).

Let’s think about the extreme case:

* You create a hash table with 1,000,000 buckets and you add 1,000 items to it. Very few keys, if any, will collide, and this will perform amazingly.

* With an array large enough to reserve one position for every possible key, the table **can** be **addressed directly**, and the time complexity is O(1).

But such a table will **waste a lot of space**. Therefore, you need to find the `“sweet spot”` for the size of the hash table vs. the number of items you plan to put into it. 

The choice of hash table size depends in part on the choice of hash function and on the collision resolution strategy.

But a good general **rule of thumb** is:

* The hash table should be an array with a length of about **1.3** times the maximum number of keys that will actually be in the table, and
* the size of the hash table array should be a **prime** number.
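
The rule of thumb can be sketched as a small helper that returns the smallest prime not below 1.3 times the expected number of keys. The names `is_prime` and `suggest_table_size` are hypothetical, and trial division is used only because it is short; it is a sketch, not a tuned implementation.

```python
import math

def is_prime(n):
    """Trial division; adequate for the small sizes used in this notebook."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def suggest_table_size(max_keys, factor=1.3):
    """Smallest prime >= factor * max_keys."""
    size = max(2, math.ceil(factor * max_keys))
    while not is_prime(size):
        size += 1
    return size

print(suggest_table_size(9))      # table size suggested for the nine VCC tags
```

A prime size does not rule out collisions: with the suggested 13 slots, the tag IDs 600 and 613 still share address 2.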



## 3 Separate Chaining

### 3.1 Separate Chaining
There are different ways to resolve a collision. We will look at a method called **Separate Chaining**.

**Chain hashing** does not avoid collisions; it resolves them. The idea is to make each cell of the hash table point to a linked list of records (a `bucket`) that have the same hash value.

* All elements that hash to the same value are kept in one `linked list`.

**bucket**: a linked list of records with the same hash value

The hash table is a list of `hash buckets`. 

For Example:
```
keys :   [36,18,72,43,6,10,5,15]
tab size : 8
hash function : key % tab size
```
![](./img/ds/hashtable_separatechaining.gif)



### 3.2 Hash Table in Python

The basic idea is to represent the hash table by a list where **each item** is a list of **key/value** pairs that have the `same` hash index

```python
[

    [bucket of the same hash value1],

    [bucket of the same hash value2]
,...
]
```

Every key/value pair in a bucket is a tuple:
```python
(key, value)
```

In [None]:
keyvalues = [(36, "赵"), (18, "钱"), (72, "孙"), (43, "李"), (6, "周"), (10, "吴"), (5, "郑"), (15, "王")]
num_buckets=8

buckets=[[] for i in range(num_buckets)]

print("Key","The address in buckets","\n"+20*"-")
for item in keyvalues:
    #hash function: key % num_buckets
    address= item[0] % num_buckets
    buckets[address].append(item)
    print(item[0],address)

print("\nNo.","Bucket","\n"+20*"-")   
for  i,bucket in  enumerate(buckets):
    print(i,bucket)

### 3.3 Search

In [None]:
key=10
hashvalue=key % num_buckets
for item in buckets[hashvalue]:
    if item[0]==key:
        print(key,item[1])  

## 4 The Class of Separate Chaining

* **key** is integer or string

* **hash function**: `key % numBuckets` for integer keys and djb2 for string keys

> Hash function for string
>
>* http://www.cse.yorku.ca/~oz/hash.html
>
>**djb2**
>
>This algorithm (k=33) was first reported by `Dan Bernstein` many years ago in comp.lang.c.
>
>The magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
>```python
> hash = 5381
> for c in dictKey:
>     hash = (hash * 33) + ord(c)
> hash % numBuckets
>```
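
As a sketch, djb2 can also be written as a standalone function. The 32-bit mask is an assumption of this example that mimics a fixed-width unsigned integer in C; Python integers would otherwise grow without bound. The function name `djb2` and its parameters are chosen here for illustration.

```python
def djb2(text, num_buckets):
    """djb2 string hash (hash = hash * 33 + c), reduced to a bucket index."""
    h = 5381
    for ch in text:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF   # keep the running hash within 32 bits
    return h % num_buckets

for name in ("CompressorIPortM", "CompressorOPortP", "CompressorOPortT"):
    print(name, djb2(name, 5))
```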


In [None]:
class hashTable:
    """A dictionary with integer and string keys"""
    
    def __init__(self, numBuckets):
        """Create an empty dictionary
           buckets is initialized to a list of numBuckets empty lists.
        """
        self.numBuckets=numBuckets
        self.buckets=[[] for i in range(self.numBuckets)] 
            
    def getHashValue(self, dictKey):
        """Returns the bucket index for an int or str key"""
        if isinstance(dictKey, int):
            return dictKey % self.numBuckets
        if isinstance(dictKey,str):
            # djb2 hash algorithm by Dan Bernstein: hash = hash*33 + c
            hash = 5381
            for c in dictKey:
                hash = (hash * 33) + ord(c)
            return hash % self.numBuckets
        raise TypeError("dictKey must be an int or a str")

    
    def addEntry(self, dictKey, dictVal):
        """Assumes dictKey is an int or a str.  Adds an entry, or replaces the value if the key exists."""
        hashBucket = self.buckets[self.getHashValue(dictKey)]
        for i in range(len(hashBucket)):
            if hashBucket[i][0] == dictKey:
                hashBucket[i] = (dictKey, dictVal) #if one was found,replace
                return
        hashBucket.append((dictKey, dictVal)) # append a new entry (dictKey, dictVal) to the bucket if none was found.
        
    def getValue(self, dictKey):
        """Returns entry associated with the key dictKey"""
        hashBucket = self.buckets[self.getHashValue(dictKey)]
        for e in hashBucket:
            if e[0] == dictKey: # key
                return e[1]     # the tuple of value 
        return None
    
    def __str__(self):
        entries = []
        for b in self.buckets:
            for e in b:
                entries.append(str(e[0]) + ':' + str(e[1]))
        return '{' + ','.join(entries) + '}' # also correct for an empty table


### 4.1  Init the hash table

```python
def __init__(self, numBuckets):
    """
    The instance variable buckets is initialized to a list of numBuckets empty lists
    """
    self.numBuckets = numBuckets
    self.buckets=[[] for i in range(self.numBuckets)] 
```

### 4.2  hash function

```python
def getHashValue(self, dictKey):
    if isinstance(dictKey, int):
        return dictKey % self.numBuckets
    if isinstance(dictKey,str):
        # djb2 hash algorithm by Dan Bernstein: hash = hash*33 + c
        hash = 5381
        for c in dictKey:
            hash = (hash * 33) + ord(c)
        return hash % self.numBuckets

```


### 4.3 addEntry

By making each bucket a list, we handle collisions by storing all of the values that hash to the same bucket in the list. 

```python
def addEntry(self, dictKey, dictVal):
    """
    Stores an entry with the key dictKey
    """ 
    hashBucket = self.buckets[self.getHashValue(dictKey)] # locate the bucket for dictKey in self.buckets
    for i in range(len(hashBucket)):
        if hashBucket[i][0] == dictKey: # each item in a bucket is a tuple: (dictKey, dictVal)
            hashBucket[i] = (dictKey, dictVal) # if one was found, replace
            return
    hashBucket.append((dictKey, dictVal)) # append a new entry (dictKey, dictVal) to the bucket if none was found.
```
   
We use the hash function to convert dictKey into an integer, and use that integer to index into `self.buckets`
```python  
hashBucket = self.buckets[self.getHashValue(dictKey)] # the bucket that dictKey hashes to
```    
to find the hash bucket associated with **dictKey**. We then compare the key of each entry in that bucket 
```python
hashBucket[i][0] == dictKey
```
with **dictKey**. If <b>a value is to be stored</b>, then  

* if one was found:  <b>replace</b> the value in the existing entry,  

* if none was found: <b>append</b> a new entry to the bucket
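
The replace-or-append rule can be tried on a single bucket in isolation. The sketch below is a standalone function, not the class method; the name `add_entry` and the sample values are assumptions of this example.

```python
def add_entry(bucket, key, value):
    """Replace the value if key is already in the bucket, else append a new pair."""
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)   # key found: replace the existing entry
            return
    bucket.append((key, value))        # key not found: append a new entry

bucket = []
add_entry(bucket, 613, "first")
add_entry(bucket, 708, "second")
add_entry(bucket, 613, "updated")      # same key again: replaced, not duplicated
print(bucket)
```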


### 4.4 getValue

```python 

def getValue(self, dictKey):
```
As in `addEntry`, we first locate the bucket that dictKey hashes to. We then search that bucket (which is a list) linearly to see if there is an entry with the key dictKey.

```python 
for e in hashBucket:
    if e[0] == dictKey:  # key
        return e[1]      # value
```

If we are doing <b>a lookup</b> and there is an entry with the key, we simply return the value stored with that key. 

If there is no entry with that key, we return None. 




### 4.5 Measurement Tags of VCC

#### 4.5.1 Integer keys
The hash table for Measurement Tags of VCC


In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)
Entrys=[]
for line in csvdata:
    id = int(line['TagID'])
    tag=line['Tag']
    desc=line['Desc']
    unit=line['Unit']
    value=float(line['Value'])
    Entrys.append((id,(tag,desc,unit,value))) 
csvfile.close()  

**A hash table with fewer buckets than entries: collisions**

* numBuckets=5

In [None]:
numBuckets=5
# numBuckets (5) < number of entries (9)
D = hashTable(numBuckets)
for item in Entrys:
    D.addEntry(item[0],item[1])

print('The hashTable(integer key) is:')
print(D)

print('\n', 'The hash buckets are:')
for i,hashBucket in enumerate(D.buckets):
    print('BucketID',i,'  ', hashBucket)


Each bucket holds **zero or more tuples** (here, between zero and four), depending upon <b>the number of collisions</b> that occurred.

In [None]:
CompressorTagIDList=[600,616,613,914,908]
for tagid in CompressorTagIDList:
    thebucket=D.getValue(tagid)   
    print(tagid,thebucket)

In [None]:
tagid=808
thebucket=D.getValue(tagid)  
print(tagid,thebucket)

#### 4.5.2 String keys

In [None]:
import  csv
filename="./data/VCC1_Tag.csv"
csvfile = open(filename, 'r',encoding="utf-8")
csvdata = csv.DictReader(csvfile)
Entrys=[]
for line in csvdata:
    tag=line['Tag']
    desc=line['Desc']
    unit=line['Unit']
    value=float(line['Value'])
    Entrys.append((tag,(desc,unit,value))) 
csvfile.close()  

In [None]:
numBuckets=5
# numBuckets (5) < number of entries (9)
D = hashTable(numBuckets)
for item in Entrys:
    D.addEntry(item[0],item[1])

print('The hashTable(String key) is:')
print(D)

print('\n', 'The hash buckets are:')
for i,hashBucket in enumerate(D.buckets):
    print('BucketID',i,'  ', hashBucket)


In [None]:
tagid='CompressorOPortP'
thebucket=D.getValue(tagid)  
print(tagid,thebucket)

### 4.6 Analysis Example 

#### 4.6.1 Tags with StringID

In [None]:
%%file ./data/VCC_TagStringID.csv
Tag,Desc,Unit,Value
CompressorIPortM,压缩机入口流量,kg/s,0.08
CompressorOPortP,压缩机出口压力,MPa,0.6854
CompressorOPortT,压缩机出口温度,°C,29.27
CondenserOPortT,冷凝器出口温度,°C,26
CondenserOPortX,冷凝器出口干度,-,0
ExpansionValveOPortT,膨胀阀出口温度,°C,26
ExpansionValveOPortX,膨胀阀出口干度,-,0
EvaporatorValveOPortT,蒸发器出口温度,°C,0
EvaporatorValveOPortX,蒸发器出口干度,-,1

In [None]:
import csv
def tags_key_str(filename):
    csvfile = open(filename, 'r', encoding="utf-8")
    csvdata = csv.DictReader(csvfile)
    Entrys = []
    for line in csvdata:
        tag = line['Tag']
        desc = line['Desc']
        unit = line['Unit']
        value = float(line['Value'])
        Entrys.append((tag, (desc, unit, value)))
    csvfile.close()
    return Entrys

In [None]:
Entrys= tags_key_str("./data/VCC_TagStringID.csv")
for item in Entrys:
    print(item)

#### 4.6.2 hash_table 

* hash_table: `getValue` returns only `value` from (desc, unit, value)

In [None]:
class hash_table(hashTable):

    def getValue(self, dictKey):
        """Returns the value of entry associated with the key dictKey"""
        hashBucket = self.buckets[self.getHashValue(dictKey)]
        for e in hashBucket:
            if e[0] == dictKey:  #  key
                return e[1][2]   # the value :(desc, unit, value)
        return None

In [None]:
def thehashtable(entrys):
    numBuckets = 5
    tagtable=hash_table(numBuckets)
    for item in entrys:
        tagtable.addEntry(item[0], item[1])
    return tagtable

#### 4.6.3 Analysis module with TagID

Class `Tag` to get data with `tagid`

```python
class Tag:
    def __init__(self, tagid=None):
        """ create the tag object"""
        self.tagid = tagid
        self.v = None
```

In [None]:
import CoolProp.CoolProp as cp

class Tag:
    def __init__(self, tagid=None):
        """ create the tag object"""
        self.tagid = tagid
        self.v = None

class Port:
    def __init__(self, dictPort):
        """ create the Port object"""
        self.__dict__.update({'p': Tag(), 't': Tag(), 'x': Tag(), 'mdot': Tag(), "h": Tag()})

        self.GetTaglist=[]
        for key in dictPort:
            setattr(self,key,Tag(dictPort[key]))
            self.GetTaglist.append(getattr(self,key))
           
    def get_state(self):
        if self.t.v is not None and self.x.v is not None:
            self.h.v = cp.PropsSI('H', 'T', self.t.v+273.15, 'Q',
                                self.x.v, 'R134a')/1000
        
        if self.p.v is not None and self.t.v is not None:
            self.h.v = cp.PropsSI('H', 'P', self.p.v*1.0e6, 'T',
                                self.t.v+273.15, 'R134a')/1000

class Compressor:
    """ compression of the refrigerant"""

    def __init__(self, dictDev,taghashtable):
        """  Initializes """
        self.taghashtable = taghashtable
        self.iPort = Port(dictDev['iPort'])
        self.oPort = Port(dictDev['oPort'])
        self.Wc = None
 
    def get_data(self):
        """ get data from external data sources"""
        for item in self.iPort.GetTaglist:
            item.v = self.taghashtable.getValue(item.tagid)
        for item in self.oPort.GetTaglist:
            item.v = self.taghashtable.getValue(item.tagid)
        
    def cal_performance(self):
        """  energy   """
        self.iPort.get_state()
        self.oPort.get_state()
        self.Wc = self.iPort.mdot.v * (self.oPort.h.v - self.iPort.h.v)
  
    def __str__(self):
        result = '\nWc(kW): {:>.2f}'.format(self.Wc)
        return result

In [None]:
if __name__ == "__main__":
    """
    'CompressorIPortM', '压缩机入口流量', 'kg/s', 0.08)
    'CompressorOPortP', '压缩机出口压力', 'MPa', 0.6854)
    'CompressorOPortT', '压缩机出口温度', '°C', 29.27)
    'EvaporatorValveOPortT', '蒸发器出口温度', '°C', 0.0)
    'EvaporatorValveOPortX', '蒸发器出口干度', '-', 1.0)
    """
    filename = "./data/VCC_TagStringID.csv"
    entrys = tags_key_str(filename)
    curTagTable = thehashtable(entrys)
    
    dictCompTags = {"iPort": {"t": 'EvaporatorValveOPortT',  "x": 'EvaporatorValveOPortX', "mdot": 'CompressorIPortM'},
                    "oPort": {"p": 'CompressorOPortP', "t": 'CompressorOPortT'}
                    }
    curcomp = Compressor(dictCompTags, curTagTable)
    curcomp.get_data()
    curcomp.cal_performance()
    print(curcomp)

## 5 Hash in C


* key is integer
* value is char

### 5.1 intDict in C

* intDict.h/c

* mainintDict.c

In [None]:
%%file ./demo/include/intDict.h
#ifndef INTDICT_H
#define INTDICT_H

typedef struct _node
{
	int key;
	char value;
	struct _node *next;
} Node;

typedef struct _hashtable
{
	int numBuckets;
	Node **buckets; //the linked list stack
} Hashtable;

// Create hash table
Hashtable *createHash(int numBuckets);

// free hash table
void *freeHash(Hashtable *hTable);

// hash function for int keys
int inthash(int key, int numBuckets);

// Add Entry to table - keyed by int
void addEntry(Hashtable *hTable, int key, char value);

// Lookup  by int key
Node *searchEntry(Hashtable *hTable, int key);

// Get by int key
char getValue(Hashtable *hTable, int key);

#endif


In [None]:
%%file ./demo/src/intDict.c

#include <stdio.h>
#include <stdlib.h>
#include "intDict.h"

// Create hash table
Hashtable *createHash(int numBuckets)
{
	Hashtable *table = (Hashtable *)malloc(sizeof(Hashtable));
	if (!table)
	{
		return NULL;
	}

	table->buckets = (Node **)malloc(sizeof(Node *) * numBuckets);
	if (!table->buckets)
	{
		free(table);
		return NULL;
	}

	table->numBuckets = numBuckets;
	// initialize the head pointer of the bucket stack to NULL
	for (int i = 0; i < table->numBuckets; i++)
		table->buckets[i] = NULL;

	return table;
}

void *freeHash(Hashtable *hTable)
{
	Node *b, *p;
	for (int i = 0; i < hTable->numBuckets; i++)
	{
		b = hTable->buckets[i];
		while (b != NULL)
		{
			p = b->next;
			free(b);
			b = p;
		}
	}
	free(hTable->buckets);
	free(hTable);
	return NULL; // the function is declared as void *, so return a value
}

// hash function for int key
int inthash(int key, int numBuckets)
{
	return key % numBuckets;
}

// Lookup by int key
Node *searchEntry(Hashtable *hTable, int key)
{
	Node *p;
	int addr = inthash(key, hTable->numBuckets);
	p = hTable->buckets[addr];

	while (p && p->key != key)
		p = p->next;

	return p;
}

// Add Entry to table - keyed by int
void addEntry(Hashtable *hTable, int key, char value)
{
	int addr;
	Node *p, *entry;
	p = searchEntry(hTable, key);
	if (p)
	{
		return; // key already present: keep the existing entry
	}
	else
	{ /*
          add a new item on the top of the linked list stack 
          and a pointer to the top element.  
       */
		addr = inthash(key, hTable->numBuckets);
		entry = (Node *)malloc(sizeof(Node));
		if (!entry)
			return; // out of memory: leave the table unchanged
		entry->key = key;
		entry->value = value;
		entry->next = hTable->buckets[addr];
		hTable->buckets[addr] = entry;
	}
}

// Get by int key; returns '\0' if the key is not in the table
char getValue(Hashtable *hTable, int key)
{
	Node *p;
	p = searchEntry(hTable, key);
	if (p)
	{
		return p->value;
	}
	return '\0';
}


In [None]:
%%file ./demo/src/mainintDict.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "intDict.h"

int main()
{
	int key[8]={36,18,72,43,6,10,5,15};
	char value[8]={'A','B','C','D','E','F','G','H'};

   	int numBuckets = 8;
	int numEntries = 8;
	Hashtable *hTable;

	hTable = createHash(numBuckets);
	for (int i = 0; i < numEntries; i++)
	{
		addEntry(hTable, key[i], value[i]);
		printf("(%d %c)\n", key[i], value[i]);
	}

	printf("\nThe buckets(the linked list stack) are: \n");
	for (int i = 0; i < hTable->numBuckets; i++)
	{
		Node *b, *p;
		b = hTable->buckets[i];
		printf("bucket %d :", i);
		if (b)
		{
			for (p = b; p != NULL; p = p->next)
				printf(" (%d %c) ", p->key, p->value);
			printf("\n");
		}
		else
			printf("\n");
	}

	printf("\nHash search:");
    int curkey=18;
    char curval = getValue(hTable,curkey);
	printf("%d -> %c \n", curkey,curval);

	freeHash(hTable);
	return 0;
}


In [None]:
!gcc -o ./demo/bin/mainintDict ./demo/src/mainintDict.c ./demo/src/intDict.c -I./demo/include

In [None]:
!.\demo\bin\mainintDict 

###  5.2 The Linked list

* add a new item on the top of the linked list(`Stack`)

![](./img/ds/linked-list-stack.png)

In [None]:
%%file ./demo/src/demoLinkedlist_stack.c
#include <stdio.h>
#include <stdlib.h>

typedef struct _node
{
	int val;
	struct _node *next;
} node;

void push(node **head, int val)
/* add a new item on the top of the linked list*/
{
	node *new_node=(node *)malloc(sizeof(node));
	new_node->val = val;
	new_node->next = *head;
	*head = new_node;
}

void print_list(node *head)
{
	node *current = head;
	while (current != NULL)
	{
		printf("%d\n", current->val);
		current = current->next;
	}
}

int main()
{
	node *test_list = NULL;
	push(&test_list, 8);
	push(&test_list, 88);
    push(&test_list, 98);
	print_list(test_list);
}

In [None]:
!gcc -o ./demo/bin/demoLinkedlist_stack ./demo/src/demoLinkedlist_stack.c 

In [None]:
!.\demo\bin\demoLinkedlist_stack

* Adding an item to the end of the list(`queue`)


![](./img/ds/linked-list-queue.png)

In [1]:
%%file ./demo/src/demoLinkedlist_queue.c
#include <stdio.h>
#include <stdlib.h>

typedef struct _node
{
	int val;
	struct _node *next;
} node;

void push(node **head, node **tail, int val)
/*Adding an item to the end of the list*/
{
	node *new_node  = (node *)malloc(sizeof(node));
	new_node->val = val;
	new_node->next = NULL;
	if (*head == NULL)
	{
		*head = new_node;
        *tail=new_node;
	}
	else
	{
        (*tail)->next = new_node;
        *tail = new_node;
    }		
}

void print_list(node *head)
{
	node *current = head;
	while (current != NULL)
	{
		printf("%d\n", current->val);
		current = current->next;
	}
}

int main()
{
	node *test_list = NULL;
    node *tail = NULL;

    push(&test_list, &tail, 8);
    push(&test_list, &tail,88);
    push(&test_list, &tail,98);
    print_list(test_list);
}

Overwriting ./demo/src/demoLinkedlist_queue.c


In [2]:
!gcc -o ./demo/bin/demoLinkedlist_queue ./demo/src/demoLinkedlist_queue.c 

In [3]:
!.\demo\bin\demoLinkedlist_queue 

8
88
98


### 5.3 Unordered Map(C++11)

Unordered maps are associative containers that store elements formed by the combination of a key value and a mapped value, and which allows for fast retrieval of individual elements based on their keys.

In an unordered_map, the key value is generally used to uniquely identify the element, while the mapped value is an object with the content associated to this key. Types of key and mapped value may differ.

Internally, the elements in the unordered_map are not sorted in any particular order with respect to either their key or mapped values, but organized into buckets depending on their hash values to allow for fast access to individual elements directly by their key values (with constant time complexity on average).

In [4]:
%%file ./demo/src/demo1_unordered_map.cpp

#include <iostream>
#include <string>
#include <tuple>
#include <unordered_map>
 
using namespace std;
typedef tuple<string,string,string,float> tupTag;
 
int main()
{  
    unordered_map<int, tupTag> tags;
    tags[600] = tupTag("CompressorIPortM", "压缩机入口质量流量", "kg/s", 0.08f);
    cout << "Tag 600:  " <<get<0>(tags[600]) <<"\t"<< get<1>(tags[600])
         << "\t"<<get<2>(tags[600])<< "\t"<<get<3>(tags[600])<<endl;
    return 0;
}

Overwriting ./demo/src/demo1_unordered_map.cpp


In [5]:
!g++ -fexec-charset=GBK -o ./demo/bin/demo1_unordered_map.exe ./demo/src/demo1_unordered_map.cpp 

In [6]:
!.\demo\bin\demo1_unordered_map 

Tag 600:  CompressorIPortM	压缩机入口质量流量	kg/s	0.08


## Further Reading

* Yan Weimin, Li Dongmei, Wu Weimin. Data Structures (C Language Edition), 2nd ed. Posts & Telecom Press, February 2015

* Mark Allen Weiss. Data Structures and Algorithm Analysis in C


* [Redis: an in-memory database with a key-value data model](https://github.com/redis/redis)