# Lightweight DBMS with B+ Tree Index - Report

## Introduction

This report documents the implementation and performance analysis of a lightweight database management system (DBMS) that uses a B+ Tree index for data storage and retrieval. The B+ Tree is a self-balancing tree structure that stores all values in its leaf nodes and keeps search, insertion, and deletion at O(log n), which suits both disk-based and in-memory data management.

The main components of our implementation include:
1. A B+ Tree implementation with insertion, deletion, search, and range query operations
2. A brute force approach for comparison
3. A database manager for creating and managing tables
4. Performance analysis tools for comparing the B+ Tree with the brute force approach
5. Visualization tools for the B+ Tree structure

In [None]:
# Import necessary libraries
import sys
import os
import random
import time
import matplotlib.pyplot as plt
import numpy as np

# Import our modules
from database.bplustree import BPlusTree
from database.bruteforce import BruteForceDB
from database.performance_analyzer import PerformanceAnalyzer
from database.visualizer import BPlusTreeVisualizer
from database.db_manager import Database
from database.table import Table

## Implementation

### B+ Tree Implementation

Our B+ Tree implementation consists of the following classes:
- `BPlusTreeNode`: Base class for B+ Tree nodes
- `BPlusTreeLeafNode`: Leaf node that stores keys and values
- `BPlusTreeInternalNode`: Internal node that stores keys and child nodes
- `BPlusTree`: Main class that provides the B+ Tree functionality

The B+ Tree supports the following operations:
- Insertion: Insert a key-value pair into the tree
- Deletion: Delete a key-value pair from the tree
- Search: Find a value by key
- Range Query: Find all key-value pairs within a range
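
To make the search and range query operations concrete, here is a minimal sketch over a hand-built two-level tree. The `Leaf` and `Internal` classes and the `find_leaf`, `search`, and `range_query` helpers are illustrative names, not the project's `BPlusTreeLeafNode` or `BPlusTree` API. The sketch assumes that leaves are chained with a `next` pointer, and it omits insertion and deletion (node splits and merges) entirely.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    """Hypothetical leaf node: sorted keys, parallel values, link to the next leaf."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None


class Internal:
    """Hypothetical internal node: separator keys and len(keys) + 1 children."""

    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def find_leaf(node, key):
    # Descend one level at a time; keys equal to a separator go to the right child.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_query(root, low, high):
    # Locate the leaf that would hold `low`, then walk the leaf chain in key order.
    leaf = find_leaf(root, low)
    matches = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return matches
            if k >= low:
                matches.append((k, v))
        leaf = leaf.next
    return matches


# Hand-built tree holding keys 0..5 (no insertion logic in this sketch).
leaves = [Leaf([0, 1], ["v0", "v1"]), Leaf([2, 3], ["v2", "v3"]), Leaf([4, 5], ["v4", "v5"])]
leaves[0].next, leaves[1].next = leaves[1], leaves[2]
root = Internal([2, 4], leaves)

print(search(root, 3))
print(search(root, 7))
print(range_query(root, 1, 4))
```

Only the range query relies on the leaf chain: once the starting leaf is found, no further descent from the root is needed.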

Let's create a small B+ Tree and visualize it:

In [None]:
# Create a B+ Tree
tree = BPlusTree(order=4)

# Insert some key-value pairs
for i in range(10):
    tree.insert(i, f"value_{i}")

# Visualize the tree
visualizer = BPlusTreeVisualizer(tree)
visualizer.visualize('b_plus_tree_example')

# Display the image
from IPython.display import Image
Image(filename='b_plus_tree_example.png')
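
When rendering an image is not convenient, a text dump of the levels gives a similar view of the structure. The sketch below is a stand-in under stated assumptions: it represents nodes as plain dicts with a `keys` entry and an optional `children` entry (a hypothetical layout, not the internals of `BPlusTree` or `BPlusTreeVisualizer`) and walks them breadth-first.

```python
from collections import deque


def levels(root):
    """Return one list per tree level; each entry is the key list of one node."""
    result = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(result):
            result.append([])
        result[depth].append(node["keys"])
        for child in node.get("children", []):
            queue.append((child, depth + 1))
    return result


# Hypothetical tree: one internal node with separators 2 and 4 above three leaves.
example = {
    "keys": [2, 4],
    "children": [{"keys": [0, 1]}, {"keys": [2, 3]}, {"keys": [4, 5]}],
}

for depth, nodes in enumerate(levels(example)):
    print(f"level {depth}: " + " | ".join(str(keys) for keys in nodes))
```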

### Database Implementation

Our database implementation consists of the following classes:
- `Table`: Represents a table in the database
- `Database`: Manages tables and provides persistence

Let's create a database and some tables:

In [None]:
# Create a database
db = Database('test_db')

# Create a users table
users_schema = {
    'id': 'int',
    'name': 'str',
    'email': 'str',
    'age': 'int'
}
users_table = db.create_table('users', users_schema, 'id')

# Insert some users
users_table.insert({'id': 1, 'name': 'Alice', 'email': 'alice@example.com', 'age': 30})
users_table.insert({'id': 2, 'name': 'Bob', 'email': 'bob@example.com', 'age': 25})
users_table.insert({'id': 3, 'name': 'Charlie', 'email': 'charlie@example.com', 'age': 35})

# Create a posts table
posts_schema = {
    'id': 'int',
    'user_id': 'int',
    'title': 'str',
    'content': 'str'
}
posts_table = db.create_table('posts', posts_schema, 'id')

# Insert some posts
posts_table.insert({'id': 1, 'user_id': 1, 'title': 'First Post', 'content': 'Hello, world!'})
posts_table.insert({'id': 2, 'user_id': 1, 'title': 'Second Post', 'content': 'This is my second post.'})
posts_table.insert({'id': 3, 'user_id': 2, 'title': "Bob's Post", 'content': "This is Bob's post."})

# Save the database
db.save()

# List tables
print(f"Tables in the database: {db.list_tables()}")

# Query some data
print("\nUser with ID 2:")
print(users_table.select(2))

print("\nPosts by user with ID 1:")
for post in posts_table.select_where(lambda p: p['user_id'] == 1):
    print(post)
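
The two query styles used above differ in how they reach the data: a lookup by primary key can go through the index, while a predicate query has to examine every row. The sketch below illustrates that difference with a hypothetical `MiniTable` class in which a plain dict stands in for the B+ Tree index. The class, the `TYPE_NAMES` mapping, and the validation rules are assumptions for illustration, not the project's `Table` implementation.

```python
# Assumed mapping from the schema's type names to Python types.
TYPE_NAMES = {"int": int, "str": str}


class MiniTable:
    """Hypothetical table: a dict plays the role of the primary-key index."""

    def __init__(self, schema, primary_key):
        if primary_key not in schema:
            raise ValueError(f"primary key {primary_key!r} is not in the schema")
        self.schema = schema
        self.primary_key = primary_key
        self.rows = {}  # primary key -> row

    def insert(self, row):
        # Reject rows whose columns or value types do not match the schema.
        if set(row) != set(self.schema):
            raise ValueError("row columns do not match the schema")
        for column, type_name in self.schema.items():
            if not isinstance(row[column], TYPE_NAMES[type_name]):
                raise TypeError(f"column {column!r} expects {type_name}")
        key = row[self.primary_key]
        if key in self.rows:
            raise KeyError(f"duplicate primary key {key!r}")
        self.rows[key] = dict(row)

    def select(self, key):
        # Point lookup through the index; returns None when the key is absent.
        return self.rows.get(key)

    def select_where(self, predicate):
        # Full scan: every row is tested against the predicate.
        return [row for row in self.rows.values() if predicate(row)]


posts = MiniTable({"id": "int", "user_id": "int", "title": "str"}, "id")
posts.insert({"id": 1, "user_id": 1, "title": "First Post"})
posts.insert({"id": 2, "user_id": 1, "title": "Second Post"})
posts.insert({"id": 3, "user_id": 2, "title": "Bob's Post"})

print(posts.select(2))
print(posts.select_where(lambda p: p["user_id"] == 1))
```

A secondary index on `user_id` would let the second query avoid the full scan.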

## Performance Analysis

We now benchmark the B+ Tree against the brute force approach on insertion, search, range query, deletion, a random mix of operations, and memory usage, using data sizes from 100 to 10,000 keys:

In [None]:
# Create a performance analyzer
analyzer = PerformanceAnalyzer(b_plus_tree_order=4)

# Define the data sizes to benchmark
sizes = [100, 500, 1000, 5000, 10000]

# Run the benchmarks
results = analyzer.run_benchmarks(sizes, num_samples=3)

# Plot the results
analyzer.plot_results(sizes, save_path='performance_results.png')

# Display the image
Image(filename='performance_results.png')
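
For readers who want to reproduce this kind of measurement without the project code, the following sketch shows one way to average wall-clock time over several samples. It is not the `PerformanceAnalyzer` implementation: `average_time`, `linear_lookup`, and `sorted_lookup` are hypothetical helpers, a plain list scan stands in for the brute force approach, and a binary search over a sorted list stands in for an indexed lookup.

```python
import random
import time
from bisect import bisect_left


def average_time(operation, num_samples=3):
    """Run `operation` num_samples times and return the mean wall-clock seconds."""
    total = 0.0
    for _ in range(num_samples):
        start = time.perf_counter()
        operation()
        total += time.perf_counter() - start
    return total / num_samples


def linear_lookup(keys, target):
    # Brute force stand-in: compare against every key until a match is found.
    for i, k in enumerate(keys):
        if k == target:
            return i
    return -1


def sorted_lookup(keys, target):
    # Index stand-in: binary search over an already sorted list.
    i = bisect_left(keys, target)
    return i if i < len(keys) and keys[i] == target else -1


for size in [100, 1000, 10000]:
    keys = list(range(size))
    targets = random.sample(keys, 50)
    scan = average_time(lambda: [linear_lookup(keys, t) for t in targets])
    indexed = average_time(lambda: [sorted_lookup(keys, t) for t in targets])
    print(f"n={size:>6}  scan={scan:.6f}s  indexed={indexed:.6f}s")
```

Absolute timings depend on the machine, so only the trend across sizes is meaningful.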

### Analysis of Results

Let's analyze the performance results:

1. **Insertion Time**: B+ Tree insertion is generally slower than the brute force approach at small data sizes, because each insert must keep the keys ordered and may split nodes. As the data size increases, the gap narrows.

2. **Search Time**: The B+ Tree significantly outperforms the brute force approach for search operations, and the gap widens as the data size increases. A B+ Tree search visits one node per level, giving O(log n) time, while the brute force approach has a time complexity of O(n).

3. **Range Query Time**: The B+ Tree also outperforms the brute force approach for range queries, especially for large data sizes. It locates the first key of the range in O(log n) time and then reads the remaining keys in order by traversing the leaf nodes.

4. **Deletion Time**: B+ Tree deletion is generally slower than the brute force approach due to the overhead of maintaining the tree structure. The gap is smaller than for insertion.

5. **Random Operations**: For a mix of random operations, the B+ Tree generally outperforms the brute force approach, especially as the data size increases.

6. **Memory Usage**: The B+ Tree generally uses more memory than the brute force approach because it stores internal nodes and child pointers in addition to the data. The difference is small at small data sizes.
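
The O(log n) search cost can be made tangible with a rough estimate of how many levels a tree needs. The sketch below uses an idealized model in which every level multiplies the number of reachable keys by a fixed fanout. `levels_needed` is a hypothetical helper, and the fanouts 2 and 4 are assumed bounds for an order-4 tree; the project's exact definition of `order` and its node fill may differ.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels h such that fanout ** h >= num_keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Data sizes taken from the benchmark above.
for n in [100, 500, 1000, 5000, 10000]:
    full = levels_needed(n, 4)       # every node full
    half_full = levels_needed(n, 2)  # every node at an assumed minimum fill
    print(f"n={n:>6}  levels: {full} to {half_full}  brute force worst case: {n} comparisons")
```

Under this model, growing the data from 100 to 10,000 keys adds only a handful of levels, while the worst case of a linear scan grows a hundredfold.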

## Visualization

Let's visualize the B+ Tree structure of our database tables:

In [None]:
# Visualize the users table index
users_visualizer = BPlusTreeVisualizer(users_table.index)
users_visualizer.visualize('users_table_index')

# Display the image
Image(filename='users_table_index.png')

In [None]:
# Visualize the posts table index
posts_visualizer = BPlusTreeVisualizer(posts_table.index)
posts_visualizer.visualize('posts_table_index')

# Display the image
Image(filename='posts_table_index.png')

## Conclusion

In this project, we implemented a lightweight database management system with a B+ Tree index. We compared the performance of the B+ Tree with a brute force approach and found that the B+ Tree generally outperforms the brute force approach for search and range query operations, especially as the data size increases.

The main advantages of the B+ Tree are:
1. Efficient search operations with a time complexity of O(log n)
2. Efficient range queries by traversing the leaf nodes
3. Self-balancing structure: all leaves stay at the same depth, so performance remains predictable as the data size increases

The main disadvantages are:
1. Higher memory usage due to the overhead of storing the tree structure
2. Higher insertion and deletion times due to the overhead of maintaining the tree structure

Overall, the B+ Tree is a good choice for database indexing, especially for applications that require efficient search and range query operations.

## Future Improvements

Some potential improvements to our implementation include:
1. Support for secondary indexes
2. Support for transactions
3. Support for more complex queries (e.g., joins)
4. Improved persistence with write-ahead logging
5. Better memory management to reduce the memory overhead of the B+ Tree