-
Makefile : The makefile is used to compile the program. It contains the following commands:
-
make : Compiles the program.
-
make run : Runs the program.
-
make clean : Deletes the executable file and the object files.
-
make remove_db : Deletes the database file.
-
-
src/main.c : The main function that we have created is used to test all the functions that we have implemented. It creates multiple databases and tests all the possible functions. Many of the test setting are confgurable at the top of the src/main.c file.
In order to have a better understanding of what is going on below, we will first explain some general things about the project.
The structure that we created was made with the following things in mind:
-
<a id="reliability_subsection"></a>Reliability<br>We wanted to make sure that the data as well as the table are secure. So while we use the hashtable directly from the memory of the system, we also keep an exact copy of it in the disk. This way, if the system crashes, we can recover the data from the disk. -
Performance
<br>We also kept in mind that this structure may be used for tons of data so no slow downs are allowed. For this reason we keep the hashtable in the system's memory and only update it to the disk when it is necessary. This way we ensure the fast speed of both reading and writing data for the majority of the actions.Another optimization that we made is to always keep the metadata of the file pinned in the memory. This way we avoid the overhead of pinning and unpinning it every time we want to use it. ( It is used alot XD )
-
Space efficiency
<br>We wanted to make sure that we use the minimum amount of space possible. To achieve this, we try to utilize the allocated blocks to the maximum. For example, when writing the hash table to the disk, we first write in the pre allocated blocks that we already had the old hash table written on. If there is not enough space in these blocks, we allocate new ones and write the hash table there.This forces as to use a list structure for the hash table blocks so that we can easily add new blocks to the end of the list as well as to keep track of the blocks that we have already used as they are among all the data blocks of the file.
NOTE : Any function that returns an int returns
HT_OKon success andHT_ERRORon failure.
-
HT_Init
<br>This function is used to initialize the BF layer which we will use for writting to the disk. It also creates a table in the memory with the information of the files that are open. These information include the file descriptor, the name of the file and a pointer to the hash table of the file in the memory. The table is global to the file with all HT functions because we want only these functions to have access to it.The structure of the file table is the following:
typedef struct { int file_desc; char* filename; HashTableCell* hash_table; }
-
HT_CreateIndex
<br>This function is used for the creation of a new Hash Table file. It creates a new file with the given name and then initializes the metadata of the file which are stored in the first block of the file.The file metadata structure is the following:
typedef struct { // General info FileType type; int number_of_records_per_block; int global_depth; // Hash table info int first_hash_table_block_id; int number_of_hash_table_blocks; int cells_per_hash_block; } HT_info;
In any case of failure (ex. File is already open) an error message is printed in stderr and the the function returns
HT_ERROR
-
HT_OpenIndex
<br>This function is responsible for loading an Index with a specific name from the disk. For later use, it pins the first block of the file which contains the metadata of the file and it leaves it pinned until the file is closed. Also if the file contains a hash table, it loads it in the memory from the file using the LoadTableFromDisk. Finally, it returns the index of the file in the file tableAlong with all the above, the function performs some checks to see if the file is already open or if the file is a hash table file. If any of these checks fail, the function returns
HT_ERROR.
-
HT_CloseFile
<br>This function is used to close an open Hash Table file with a given index in the file table. It sets as dirty the metadata block of the file and unpins it. After that, it closes the file and removes it from the file table, freeing the memory that was used.
-
<a id="function_ht_insertentry"></a>HT_InsertEntry<br>This function inserts a new record in the Hash Table file with a given index in the file table. It uses a FNV-1a hash_function to hash the record IDOf this hash value only the N most significant bits are used, where N is the global depth of the hash table. This way we can use the same hash function for all the hash tables and we can change the global depth of the hash table without having to rehash all the records.
There are 3 cases when inserting a record:
-
Case 1 : There is no block for the hash value so we allocate a new one and insert the record there.
-
Case 2 : The block for the hash value has space for the record so we insert it there.
-
Case 3 : The block for the hash value is full so we split it with SplitBlock and rehash all the records that were in this block using RehashRecords. Τhe new record gets rehashed as well and inserted in the right block. When splitting a block, we may have to double the size of the hash table using DoubleHashTable. We repeatidly do this until until we either can insert the record in a block or
MAX_SPLITSamout of splits have reached.In the latter case we give up because this can lead to an infinite loop of splits if all the records hash in the same value repeaditly. This is a very rare case but we have to take it into account.
After inserting the record, we set as dirty the block that we inserted the record in and we unpin it. If anything goes wrong, we return
HT_ERROR.
-
-
HT_PrintAllEntries
<br>The function is used to print all the records of a file with record.id equal to the given id. If the id isNULLthen it prints all the records of the file. This is done by iterating through all the blocks of the hash table.
-
HashStatistics
<br>Given a name, print some statistics for the file.In case the file is not open , the function opens it to get the statistics and then closes it again.
The statistics include the following:
- Number of blocks that the file has
- how many of them are used for the hash table
- how many for the records
- how many for metadata
- Maximum amout of records that a block contains
- Minimum amount of records that a block contains
- Average amount of records stored in a block among all the blocks.
- Number of blocks that the file has
-
<a id="function_splitblock"></a>SplitBlock<br>This function splits buddies into two blocks. It is used when we have to split a block because it is full. This function also calls DoubleHashTable if the global depth of the hash table is equal to the local depth of the block that we are splitting. In other words we double the size of the hash table if the block that fills up does not have any buddies.<br>After the allocation of the new block we call the RehashRecords function to rehash all the records of the old block to the old and new one. Finally, we update the metadata of the file and we set as dirty the old and the new block.
-
<a id="function_rehashrecords"></a>RehashRecords<br>This function is responsible for rehashing all the records of a block. It is used when we split a block with SplitBlock and we rehash all the records of the old block to the old and new one.
An important technical note is that this function takes as input two pointers to the block data in order to avoid opening and closing the blocks again.
-
<a id="function_doublehashtable"></a>DoubleHashTable<br>This function is used to double the size of the hash table. It is used when we need to split a block (SplitBlock) but the block does not have any buddies. In this case we have to double the size of the hash table so that we can split the block.
This function is also used to create a hash table if the
old_hash_tablepointer isNULLbut we recomend using the function CreateHashTable for that operation.NOTE : As said above (Reliability section), this function updates both the memory hash table and the disk hash table.
-
<a id="function_createhashtable"></a>CreateHashTable<br>This function is used to create a hash table both on the disk and the memory. It is used when we add the first entry to a file (HT_InsertEntry) and we need to create a hash table for the file.
TECHNICAL NOTE : This is not an actual function. It actually calls the function DoubleHashTable with the arguments a little bit changed.
-
<a id="function_loadtablefromdisk"></a>LoadTableFromDisk<br>This is a helper function responsible for loading the hash table stored in the file to the main mamory when the file gets opened.IMPORTANT: This function will return NULL if any error occurs and an error message will be shown. BUT in the current state of the program this is wanted whenever a new file is opened. This is because we want to initailize the pointer to the hash table to NULL in the new file. So be careful when reading errors from this function.