Extendible Hash Table For Databases

Makefile and main function

Makefile : The makefile is used to compile the program. It contains the following commands:
- make : Compiles the program.
- make run : Runs the program.
- make clean : Deletes the executable file and the object files.
- make remove_db : Deletes the database file.
src/main.c : The main function that we have created is used to test all the functions that we have implemented. It creates multiple databases and tests all the possible functions. Many of the test setting are confgurable at the top of the src/main.c file.

Functions, structs and design choices

General

In order to have a better understanding of what is going on below, we will first explain some general things about the project.

The structure that we created was made with the following things in mind:

<a id="reliability_subsection"></a> Reliability  We wanted to make sure that the data as well as the table are secure. So while we use the hashtable directly from the memory of the system, we also keep an exact copy of it in the disk. This way, if the system crashes, we can recover the data from the disk.
Performance  We also kept in mind that this structure may be used for tons of data so no slow downs are allowed. For this reason we keep the hashtable in the system's memory and only update it to the disk when it is necessary. This way we ensure the fast speed of both reading and writing data for the majority of the actions.

Another optimization that we made is to always keep the metadata of the file pinned in the memory. This way we avoid the overhead of pinning and unpinning it every time we want to use it. ( It is used alot XD )
Space efficiency  We wanted to make sure that we use the minimum amount of space possible. To achieve this, we try to utilize the allocated blocks to the maximum. For example, when writing the hash table to the disk, we first write in the pre allocated blocks that we already had the old hash table written on. If there is not enough space in these blocks, we allocate new ones and write the hash table there.

This forces as to use a list structure for the hash table blocks so that we can easily add new blocks to the end of the list as well as to keep track of the blocks that we have already used as they are among all the data blocks of the file.

`<a id="user_function_subsection"></a>` User Functions

NOTE : Any function that returns an int returns HT_OK on success and HT_ERROR on failure.

HT_Init   This function is used to initialize the BF layer which we will use for writting to the disk. It also creates a table in the memory with the information of the files that are open. These information include the file descriptor, the name of the file and a pointer to the hash table of the file in the memory. The table is global to the file with all HT functions because we want only these functions to have access to it.

The structure of the file table is the following:
```
typedef struct {
 int file_desc;
 char* filename;
 HashTableCell* hash_table;
}
```
HT_CreateIndex   This function is used for the creation of a new Hash Table file. It creates a new file with the given name and then initializes the metadata of the file which are stored in the first block of the file.

The file metadata structure is the following:
```
typedef struct {
 // General info
 FileType type;
 int number_of_records_per_block;
 int global_depth;

 // Hash table info
 int first_hash_table_block_id;
 int number_of_hash_table_blocks;
 int cells_per_hash_block;
} HT_info;
```
In any case of failure (ex. File is already open) an error message is printed in stderr and the the function returns HT_ERROR
HT_OpenIndex   This function is responsible for loading an Index with a specific name from the disk. For later use, it pins the first block of the file which contains the metadata of the file and it leaves it pinned until the file is closed. Also if the file contains a hash table, it loads it in the memory from the file using the LoadTableFromDisk. Finally, it returns the index of the file in the file table

Along with all the above, the function performs some checks to see if the file is already open or if the file is a hash table file. If any of these checks fail, the function returns HT_ERROR.
HT_CloseFile   This function is used to close an open Hash Table file with a given index in the file table. It sets as dirty the metadata block of the file and unpins it. After that, it closes the file and removes it from the file table, freeing the memory that was used.
<a id="function_ht_insertentry"></a> HT_InsertEntry   This function inserts a new record in the Hash Table file with a given index in the file table. It uses a FNV-1a hash_function to hash the record ID

Of this hash value only the N most significant bits are used, where N is the global depth of the hash table. This way we can use the same hash function for all the hash tables and we can change the global depth of the hash table without having to rehash all the records.

There are 3 cases when inserting a record:
- Case 1 : There is no block for the hash value so we allocate a new one and insert the record there.
- Case 2 : The block for the hash value has space for the record so we insert it there.
- Case 3 : The block for the hash value is full so we split it with SplitBlock and rehash all the records that were in this block using RehashRecords. Τhe new record gets rehashed as well and inserted in the right block. When splitting a block, we may have to double the size of the hash table using DoubleHashTable. We repeatidly do this until until we either can insert the record in a block or MAX_SPLITS amout of splits have reached.
 
 In the latter case we give up because this can lead to an infinite loop of splits if all the records hash in the same value repeaditly. This is a very rare case but we have to take it into account.
After inserting the record, we set as dirty the block that we inserted the record in and we unpin it. If anything goes wrong, we return HT_ERROR.
HT_PrintAllEntries   The function is used to print all the records of a file with record.id equal to the given id. If the id is NULL then it prints all the records of the file. This is done by iterating through all the blocks of the hash table.
HashStatistics   Given a name, print some statistics for the file.

In case the file is not open , the function opens it to get the statistics and then closes it again.

The statistics include the following:
- Number of blocks that the file has
 - how many of them are used for the hash table
 - how many for the records
 - how many for metadata
- Maximum amout of records that a block contains
- Minimum amount of records that a block contains
- Average amount of records stored in a block among all the blocks.

`<a id="internal_function_subsection"></a>` Internal Functions

<a id="function_splitblock"></a> SplitBlock   This function splits buddies into two blocks. It is used when we have to split a block because it is full. This function also calls DoubleHashTable if the global depth of the hash table is equal to the local depth of the block that we are splitting. In other words we double the size of the hash table if the block that fills up does not have any buddies. 

After the allocation of the new block we call the RehashRecords function to rehash all the records of the old block to the old and new one. Finally, we update the metadata of the file and we set as dirty the old and the new block.
<a id="function_rehashrecords"></a> RehashRecords  

This function is responsible for rehashing all the records of a block. It is used when we split a block with SplitBlock and we rehash all the records of the old block to the old and new one.

An important technical note is that this function takes as input two pointers to the block data in order to avoid opening and closing the blocks again.
<a id="function_doublehashtable"></a> DoubleHashTable  

This function is used to double the size of the hash table. It is used when we need to split a block (SplitBlock) but the block does not have any buddies. In this case we have to double the size of the hash table so that we can split the block.

This function is also used to create a hash table if the old_hash_table pointer is NULL but we recomend using the function CreateHashTable for that operation.

NOTE : As said above (Reliability section), this function updates both the memory hash table and the disk hash table.
<a id="function_createhashtable"></a> CreateHashTable  

This function is used to create a hash table both on the disk and the memory. It is used when we add the first entry to a file (HT_InsertEntry) and we need to create a hash table for the file.

TECHNICAL NOTE : This is not an actual function. It actually calls the function DoubleHashTable with the arguments a little bit changed.
<a id="function_loadtablefromdisk"></a> LoadTableFromDisk   This is a helper function responsible for loading the hash table stored in the file to the main mamory when the file gets opened.

IMPORTANT: This function will return NULL if any error occurs and an error message will be shown. BUT in the current state of the program this is wanted whenever a new file is opened. This is because we want to initailize the pointer to the hash table to NULL in the new file. So be careful when reading errors from this function.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
build		build
examples		examples
include		include
lib/linux		lib/linux
src		src
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Extendible Hash Table For Databases

Makefile and main function

Functions, structs and design choices

General

`<a id="user_function_subsection"></a>` User Functions

`<a id="internal_function_subsection"></a>` Internal Functions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Extendible Hash Table For Databases

Makefile and main function

Functions, structs and design choices

General

<a id="user_function_subsection"></a> User Functions

<a id="internal_function_subsection"></a> Internal Functions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`<a id="user_function_subsection"></a>` User Functions

`<a id="internal_function_subsection"></a>` Internal Functions

Packages