## Problem



- As a senior backend engineer at Jovian, you are tasked with developing a fast in-memory data structure to manage profile information (username, name and email) for 100 million users.

- It should allow the following operations to be performed efficiently:


> 1. **Insert** the profile information for a new user.

> 2. **Find** the profile information of a user, given their username.

> 3. **Update** the profile information of a user, given their username.

> 4. **List** all the users of the platform, sorted by username.

> You can assume that usernames are unique.


## The Method

1. State the problem clearly. Identify the input & output formats.

2. Come up with some example inputs & outputs. Try to cover all edge cases.

3. Come up with a correct solution for the problem. State it in plain English.

4. Implement the solution and test it using example inputs. Fix bugs, if any.

5. Analyze the algorithm's complexity and identify inefficiencies, if any.

6. Apply the right technique to overcome the inefficiency. Repeat steps 3 to 6.

## 1. State the problem clearly. Identify the input & output formats.

#### Problem

> We need to create a data structure which can store 100 million records and perform insertion, search, update and list operations efficiently.

#### Input

- User profiles, each consisting of:
    - the username,
    - the name, and
    - the email of a user.

A Python _class_ would be a great way to represent the information for a user.

In [1]:
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email
        print('User created!')
        
user2 = User('john', 'John Doe', 'john@doe.com')
user2

User created!


<__main__.User at 0x7b549c501c70>

We can access the properties of the object using the `.` notation.

In [2]:
user2.name

'John Doe'

In [3]:
user2.email, user2.username

('john@doe.com', 'john')

You can also define custom methods inside a class.

In [4]:
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def introduce_yourself(self, guest_name):
        print(f"Hi {guest_name}, I'm {self.name}! Contact me at {self.email} .")

user3 = User('jane', 'Jane Doe', 'jane@doe.com')
user3.introduce_yourself('David')

Hi David, I'm Jane Doe! Contact me at jane@doe.com .


In [5]:
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def __repr__(self):
        return f"User (username='{self.username}', name='{self.name}', email='{self.email}')"

    def __str__(self):
        return self.__repr__()


user4 = User('jane', 'Jane Doe', 'jane@doe.com')
user4

User (username='jane', name='Jane Doe', email='jane@doe.com')

## 2. Come up with some example inputs & outputs.

In [6]:
peter = User('peter', 'peter', 'peter@example.com')
joseph = User('joseph', 'joseph', 'joseph@example.com')
simon = User('simon', 'peter', 'simon@example.com')
marion = User('marion', 'william', 'marion@example.com')
joy = User('joy', 'gloria', 'joy@example.com')
samuel = User('samuel', 'sam', 'samuel@example.com')
stephen = User('stephen', 's', 'stephen@example.com')

users = [peter, joseph, simon, marion, joy, samuel, stephen]

We can access different fields within a user profile object using the `.` (dot) notation.

In [7]:
samuel.username, samuel.email, samuel.name

('samuel', 'samuel@example.com', 'sam')

We can also view a string representation of the object, since we defined the `__repr__` and `__str__` methods.

In [8]:
print(samuel)

User (username='samuel', name='sam', email='samuel@example.com')


In [9]:
users

[User (username='peter', name='peter', email='peter@example.com'),
 User (username='joseph', name='joseph', email='joseph@example.com'),
 User (username='simon', name='peter', email='simon@example.com'),
 User (username='marion', name='william', email='marion@example.com'),
 User (username='joy', name='gloria', email='joy@example.com'),
 User (username='samuel', name='sam', email='samuel@example.com'),
 User (username='stephen', name='s', email='stephen@example.com')]

## 3. Come up with a correct solution. State it in plain English.

We store the `User` objects in a list sorted by username.

The various functions can be implemented as follows:


1. **Insert**: Loop through the list and add the new user at a position that keeps the list sorted.

2. **Find**: Loop through the list and find the user object with the username matching the query.

3. **Update**: Loop through the list, find the user object matching the query and update its details.

4. **List**: Return the list of user objects.

We can use the fact that usernames, which are strings, can be compared using the `<`, `>` and `==` operators in Python.

In [10]:
'biraj' < 'hemanth'

True
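
Python compares strings character by character using Unicode code points, so the ordering is case-sensitive. The short check below illustrates this with made-up usernames; lowercasing the sort key is only one possible convention and is not something the problem statement asks for.

```python
# Strings are compared character by character using Unicode code points.
print('biraj' < 'hemanth')   # True: 'b' comes before 'h'
print('sam' < 'samuel')      # True: a prefix sorts before the longer string
print('Zed' < 'aakash')      # True: uppercase letters sort before lowercase ones

# Hypothetical usernames; lowercasing the key gives a case-insensitive order.
names = ['simon', 'Zed', 'aakash']
print(sorted(names))                 # ['Zed', 'aakash', 'simon']
print(sorted(names, key=str.lower))  # ['aakash', 'simon', 'Zed']
```

If usernames may contain mixed case, the same convention would have to be applied consistently on insert, find and update.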

## 4. Implement the solution and test it using example inputs.

The code for implementing the above solution is also fairly straightforward.

In [11]:
class UserDatabase:
    def __init__(self):
        self.users = []

    def insert(self, user):
        i = 0
        while i < len(self.users):
            # Find the first username greater than the new user's username
            if self.users[i].username > user.username:
                break
            i += 1
        self.users.insert(i, user)

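    def find(self, username):
        for user in self.users:
            if user.username == username:
                return user
        return None  # no user with this username

    def update(self, user):
        target = self.find(user.username)
        if target is None:
            # Fail with a clear error instead of an AttributeError on None
            raise KeyError(f"No user with username '{user.username}'")
        target.name, target.email = user.name, user.email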

    def list_all(self):
        return self.users

We can create a new database of users by _instantiating_ an object of the `UserDatabase` class.

In [12]:
database = UserDatabase()

database.insert(samuel)
database.insert(joy)
database.insert(stephen)

user = database.find('joy')
user

User (username='joy', name='gloria', email='joy@example.com')

Let's try changing the information for a user.

In [13]:
database.update(User(username='joy', name='mj', email='mj@example.com'))

user = database.find('joy')
user

User (username='joy', name='mj', email='mj@example.com')

In [14]:
database.list_all()

[User (username='joy', name='mj', email='mj@example.com'),
 User (username='samuel', name='sam', email='samuel@example.com'),
 User (username='stephen', name='s', email='stephen@example.com')]

Let's verify that a new user is inserted into the correct position.

In [15]:
database.insert(simon)
database.list_all()

[User (username='joy', name='mj', email='mj@example.com'),
 User (username='samuel', name='sam', email='samuel@example.com'),
 User (username='simon', name='peter', email='simon@example.com'),
 User (username='stephen', name='s', email='stephen@example.com')]

## 5. Analyze the algorithm's complexity and identify inefficiencies

1. Insert: **O(N)**

2. Find: **O(N)**

3. Update: **O(N)**

4. List: **O(1)**
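
The linear scans are the main inefficiency here: with 100 million users, a single find may have to examine every record. Because the list is kept sorted, lookups could use binary search instead. The sketch below is one possible illustration and not part of the `UserDatabase` class above; the `find_sorted` helper and the parallel `usernames` list are assumptions introduced for this example, and it relies on the standard-library `bisect` module.

```python
from bisect import bisect_left

class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

def find_sorted(usernames, sorted_users, username):
    """Binary search over parallel lists sorted by username (hypothetical helper)."""
    i = bisect_left(usernames, username)  # leftmost index where username could go
    if i < len(usernames) and usernames[i] == username:
        return sorted_users[i]
    return None  # not found

sorted_users = sorted(
    [User('simon', 'peter', 'simon@example.com'),
     User('joy', 'gloria', 'joy@example.com'),
     User('samuel', 'sam', 'samuel@example.com')],
    key=lambda u: u.username,
)
usernames = [u.username for u in sorted_users]

print(find_sorted(usernames, sorted_users, 'samuel').email)  # samuel@example.com
print(find_sorted(usernames, sorted_users, 'nobody'))        # None
```

This brings find (and therefore update) down to O(log N), but insertion into a Python list still costs O(N) because later elements have to be shifted.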

## 6. An Optimized solution

### A Python-Friendly Treemap

We are now ready to return to our original problem statement.

> **QUESTION 1**: As a senior backend engineer at Jovian, you are tasked with developing a fast in-memory data structure to manage profile information (username, name and email) for 100 million users. It should allow the following operations to be performed efficiently:
>
> 1. **Insert** the profile information for a new user.
> 2. **Find** the profile information of a user, given their username.
> 3. **Update** the profile information of a user, given their username.
> 4. **List** all the users of the platform, sorted by username.
>
> You can assume that usernames are unique.



We can create a generic class `TreeMap` which supports all the operations specified in the original problem statement in a Python-friendly manner.

In [16]:

# Binary Search tree class (to store key value pairs)
class BSTNode():
    '''
    A node of a binary search tree that stores a key-value pair, e.g.
    key   ---> 'jadhesh'
    value ---> User(username='jadhesh', name='Jadhesh Verma', email='jadhesh@example.com')
    '''

    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = None

    # Insert nodes into the tree.
    # Called as BSTNode.insert(node, key, value), so `self` may be None.
    def insert(self, key, value):
        if self is None:
            self = BSTNode(key, value)
        elif key < self.key:
            self.left = BSTNode.insert(self.left, key, value)
            self.left.parent = self
        elif key > self.key:
            self.right = BSTNode.insert(self.right, key, value)
            self.right.parent = self
        return self

    # Display the tree
    def display_keys(self, space='\t', level=0):
        # If the node is empty
        if self is None:
            print(space*level + '|')
            return

        # If the node is a leaf
        if self.left is None and self.right is None:
            print(space*level + str(self.key))
            return

        # If the node has children
        BSTNode.display_keys(self.right, space, level+1)
        print(space*level + str(self.key))
        BSTNode.display_keys(self.left, space, level+1)

    # Height of the tree
    def tree_height(self):
        if self is None:
            return 0
        return 1 + max(BSTNode.tree_height(self.left), BSTNode.tree_height(self.right))

    # Find a node in the tree
    def find(self, key):
        if self is None:
            return None
        if key == self.key:
            return self
        if key < self.key:
            return BSTNode.find(self.left, key)
        if key > self.key:
            return BSTNode.find(self.right, key)

    # Update the tree
    def update(self, key, value):
        target = BSTNode.find(self, key)
        if target is not None:
            target.value = value

    # List key value pairs
    def list_all(self):
        if self is None:
            return []
        return BSTNode.list_all(self.left) + [(self.key, self.value)] + BSTNode.list_all(self.right)

    # Find if tree is balanced
    def is_balanced(self):
        if self is None:
            return True, 0
        balanced_l, height_l = BSTNode.is_balanced(self.left)
        balanced_r, height_r = BSTNode.is_balanced(self.right)
        balanced = balanced_l and balanced_r and abs(height_l - height_r) <= 1
        height = 1 + max(height_l, height_r)
        return balanced, height

    @staticmethod
    def make_balanced_bst(data, lo=0, hi=None, parent=None):
        if hi is None:
            hi = len(data) - 1
        if lo > hi:
            return None

        mid = (lo + hi) // 2
        key, value = data[mid]

        root = BSTNode(key, value)
        root.parent = parent
        root.left = BSTNode.make_balanced_bst(data, lo, mid-1, root)
        root.right = BSTNode.make_balanced_bst(data, mid+1, hi, root)

        return root

    def balance_bst(self):
        '''
        Rebuilds the tree as a balanced BST:
        list_all returns the key-value pairs sorted by key, and
        make_balanced_bst builds a balanced tree from that sorted list.
        '''
        return BSTNode.make_balanced_bst(BSTNode.list_all(self))
    
    def tree_size(self):
        if self is None:
            return 0
        return 1 + BSTNode.tree_size(self.left) + BSTNode.tree_size(self.right)


In [17]:
class TreeMap():
    def __init__(self):
        self.root = None

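    def __setitem__(self, key, value):
        node = BSTNode.find(self.root, key)
        if node is None:
            # New key: insert it, then rebuild the whole tree to keep it balanced
            self.root = BSTNode.insert(self.root, key, value)
            self.root = BSTNode.balance_bst(self.root)
        else:
            # Existing key: overwrite the stored value
            BSTNode.update(self.root, key, value)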


    def __getitem__(self, key):
        node = BSTNode.find(self.root, key)
        return node.value if node else None

    def __iter__(self):
        return (x for x in BSTNode.list_all(self.root)) # this creates a generator

    def __len__(self):
        return BSTNode.tree_size(self.root)

    def display(self):
        return BSTNode.display_keys(self.root)

**Exercise**: What is the time complexity of `__len__`? Can you reduce it to **O(1)**? Hint: Modify the `BSTNode` class.

Let's try using the `TreeMap` class below.

In [18]:
users

[User (username='peter', name='peter', email='peter@example.com'),
 User (username='joseph', name='joseph', email='joseph@example.com'),
 User (username='simon', name='peter', email='simon@example.com'),
 User (username='marion', name='william', email='marion@example.com'),
 User (username='joy', name='mj', email='mj@example.com'),
 User (username='samuel', name='sam', email='samuel@example.com'),
 User (username='stephen', name='s', email='stephen@example.com')]

In [19]:
treemap = TreeMap()

In [20]:
treemap.display()

|


In [21]:
aakash = User('aakash', 'Aakash Rai', 'aakash@example.com')
biraj = User('biraj', 'Biraj Das', 'biraj@example.com')
hemanth = User('hemanth', 'Hemanth Jain', 'hemanth@example.com')
jadhesh = User('jadhesh', 'Jadhesh Verma', 'jadhesh@example.com')
siddhant = User('siddhant', 'Siddhant Sinha', 'siddhant@example.com')
sonaksh = User('sonaksh', 'Sonaksh Kumar', 'sonaksh@example.com')
vishal = User('vishal', 'Vishal Goel', 'vishal@example.com')

users = [aakash, biraj, hemanth, jadhesh, siddhant, sonaksh, vishal]

In [22]:
treemap['aakash'] = aakash
treemap['jadhesh'] = jadhesh
treemap['sonaksh'] = sonaksh

In [23]:
treemap.display()

	sonaksh
jadhesh
	aakash


In [24]:
treemap['jadhesh']

User (username='jadhesh', name='Jadhesh Verma', email='jadhesh@example.com')

In [25]:
len(treemap)

3

In [26]:
treemap['biraj'] = biraj
treemap['hemanth'] = hemanth
treemap['siddhant'] = siddhant
treemap['vishal'] = vishal

In [27]:
treemap.display()

		vishal
	sonaksh
		siddhant
jadhesh
		hemanth
	biraj
		aakash


In [28]:
for key, value in treemap:
    print(key, value)

aakash User (username='aakash', name='Aakash Rai', email='aakash@example.com')
biraj User (username='biraj', name='Biraj Das', email='biraj@example.com')
hemanth User (username='hemanth', name='Hemanth Jain', email='hemanth@example.com')
jadhesh User (username='jadhesh', name='Jadhesh Verma', email='jadhesh@example.com')
siddhant User (username='siddhant', name='Siddhant Sinha', email='siddhant@example.com')
sonaksh User (username='sonaksh', name='Sonaksh Kumar', email='sonaksh@example.com')
vishal User (username='vishal', name='Vishal Goel', email='vishal@example.com')


In [29]:
list(treemap)

[('aakash',
  User (username='aakash', name='Aakash Rai', email='aakash@example.com')),
 ('biraj',
  User (username='biraj', name='Biraj Das', email='biraj@example.com')),
 ('hemanth',
  User (username='hemanth', name='Hemanth Jain', email='hemanth@example.com')),
 ('jadhesh',
  User (username='jadhesh', name='Jadhesh Verma', email='jadhesh@example.com')),
 ('siddhant',
  User (username='siddhant', name='Siddhant Sinha', email='siddhant@example.com')),
 ('sonaksh',
  User (username='sonaksh', name='Sonaksh Kumar', email='sonaksh@example.com')),
 ('vishal',
  User (username='vishal', name='Vishal Goel', email='vishal@example.com'))]

In [30]:
# update
treemap['aakash'] = User(username='aakash', name='Aakash N S', email='aakashns@example.com')

In [31]:
treemap['aakash']

User (username='aakash', name='Aakash N S', email='aakashns@example.com')
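
To see why the `TreeMap` takes the trouble to keep its tree balanced, the following standalone sketch compares the height of a BST built by inserting already-sorted keys one at a time with the height of a tree built from the middle element outwards. It uses simplified, hypothetical helpers (`Node`, `insert`, `height`, `build_balanced`) and integer keys rather than the classes above.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    # Plain BST insertion with no rebalancing.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def build_balanced(keys, lo=0, hi=None):
    # Build a balanced BST from a sorted list by picking the middle key as root.
    if hi is None:
        hi = len(keys) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid - 1)
    node.right = build_balanced(keys, mid + 1, hi)
    return node

keys = list(range(200))  # already sorted: the worst case for plain insertion

skewed = None
for k in keys:
    skewed = insert(skewed, k)

balanced = build_balanced(keys)

print(height(skewed))    # 200
print(height(balanced))  # 8
```

A lookup descends one level per comparison, so the height bounds the work per find: 200 steps in the skewed tree versus 8 in the balanced one for this example.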