Binary Search Trees, Traversals and Balancing

Let's create user profiles and a data structure that can store 100 million records and perform insert, search, update and list operations efficiently.

In [2]:
#simple example; generic blueprint of a user
class user:
    pass

In [40]:
#instance of a user
user1 = user()

In [4]:
#how to call the user and verify its type with the following two calls.
user

__main__.user

In [5]:
type(user1)

__main__.user

We need to use a constructor to add useful information to the user class. The class is a blueprint for our people, who are considered objects in Python. Yes, Python objectifies people. It's nothing personal. Except the introduce_yourself method. That's literally one person talking to another person. That's personal.

In [42]:
class User:
    def __init__(self, username, name, email) -> None:
        self.username = username
        self.name = name
        self.email = email
        print("There you go. You made a user. Treat your user well.")

    def introduce_yourself(self, guest_name):
        print("Hi {}, I'm {}! Contact me at {} .".format(guest_name, self.name, self.email))

In [35]:
user2 = User('Paddy', 'Paddy the Baddy', 'paddy@bakerman.com')

We can inspect user2 the same way we inspected user1.

In [8]:
user2

<__main__.User at 0x10be00a60>

We can access one of the properties with a '.' followed by the name of the property:

In [23]:
user2.name 

'Paddy the Baddy'

In [34]:
user3 = User('Patty', 'Patty Cakes', 'patty@cakes.com')

In [33]:
user3.introduce_yourself('Chad')

Hi Chad, I'm Patty Cakes! Contact me at patty@cakes.com .


The user was automatically passed as self above, but you can explicitly state the user in the parentheses as well by calling the method on the class, as in User.introduce_yourself(user3, 'Chad'). Let's add a helper method to our User class.
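Before adding it, here is a minimal sketch of the two call styles side by side. DemoUser is a hypothetical stand-in for the User class (it is not part of the notebook), and its method returns the greeting instead of printing it so the two results can be compared.

```python
# DemoUser is a hypothetical stand-in for the User class above.
class DemoUser:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def introduce_yourself(self, guest_name):
        # Return the greeting (rather than print it) so the results can be compared.
        return "Hi {}, I'm {}! Contact me at {} .".format(guest_name, self.name, self.email)


patty = DemoUser('Patty', 'Patty Cakes', 'patty@cakes.com')

# Usual call: Python passes `patty` as `self` automatically.
implicit = patty.introduce_yourself('Chad')

# Same call made through the class, with the instance stated explicitly.
explicit = DemoUser.introduce_yourself(patty, 'Chad')

print(implicit)
print(implicit == explicit)
```

Both calls run the same method; the first form is simply shorthand for the second.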

In [47]:
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email
        
    def __repr__(self):
        return "User(username='{}', name='{}', email='{}')".format(self.username, self.name, self.email)
    
    def __str__(self):
        return self.__repr__()

In [30]:
user4 = User('sumguy', 'Sumguy Sumone', 'sumguy@nobody.com')
user4

User(username='sumguy', name='Sumguy Sumone', email='sumguy@nobody.com')
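To see what the two new methods change, a short sketch may help. DemoUser and Plain are hypothetical names used only here. The sketch relies on standard Python behaviour: print() and str() use __str__, while a list displays its items with __repr__.

```python
# Plain has no __repr__, so Python falls back to the default display.
class Plain:
    pass


# DemoUser is a hypothetical stand-in for the User class above.
class DemoUser:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def __repr__(self):
        return "User(username='{}', name='{}', email='{}')".format(
            self.username, self.name, self.email)

    def __str__(self):
        return self.__repr__()


u = DemoUser('sumguy', 'Sumguy Sumone', 'sumguy@nobody.com')

print(repr(Plain()))  # default: class name plus a memory address
print(repr(u))        # uses __repr__
print(str(u))         # uses __str__, which delegates to __repr__
print([u])            # lists display their items with __repr__
```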

Now we can see the keys more clearly for the values that we entered. Think of some ways that this could be helpful (UI/UX to name a couple). Next, we'll make a user database for our example users.

In [62]:
class UserDatabase:
    def insert(self, user):
        pass
    def find(self, username):
        pass
    def update(self, user):
        pass
    def list_all(self):
        pass

In [55]:
user1 = User('God', 'God Allah', 'god@heaven.com')

user1, user2, user3, user4

(User(username='God', name='God Allah', email='god@heaven.com'),
 User(username='Paddy', name='Paddy the Baddy', email='paddy@bakerman.com'),
 User(username='Patty', name='Patty Cakes', email='patty@cakes.com'),
 User(username='sumguy', name='Sumguy Sumone', email='sumguy@nobody.com'))

Remember that if you created your first users before adding __repr__, like I did, you have to run the updated class and then create those users again in the notebook. Otherwise, your user will output an address, and people don't like to be called 0x123456678, so God probably wouldn't like that either. We cannot put the names in a list of users like below, though...

In [56]:
users = [God, Paddy, Patty, sumguy]

NameError: name 'God' is not defined

We need to assign each User object to a variable named after its username first. Let's do that with sample data from Jovian, because it's quicker than thinking of more names off the top of my head:

In [57]:
aakash = User('aakash', 'Aakash Rai', 'aakash@example.com')
biraj = User('biraj', 'Biraj Das', 'biraj@example.com')
hemanth = User('hemanth', 'Hemanth Jain', 'hemanth@example.com')
jadhesh = User('jadhesh', 'Jadhesh Verma', 'jadhesh@example.com')
siddhant = User('siddhant', 'Siddhant Sinha', 'siddhant@example.com')
sonaksh = User('sonaksh', 'Sonaksh Kumar', 'sonaksh@example.com')
vishal = User('vishal', 'Vishal Goel', 'vishal@example.com')

In [58]:
users = [aakash, biraj, hemanth, jadhesh, siddhant, sonaksh, vishal]

Now we have clean samples to work with instead of our sloppy data. You can access a property of any user by writing the variable name, a '.', and then the property you want.

In [59]:
#for example
aakash.email

'aakash@example.com'

Or print all of the information:

In [60]:
aakash


User(username='aakash', name='Aakash Rai', email='aakash@example.com')

In [61]:
users

[User(username='aakash', name='Aakash Rai', email='aakash@example.com'),
 User(username='biraj', name='Biraj Das', email='biraj@example.com'),
 User(username='hemanth', name='Hemanth Jain', email='hemanth@example.com'),
 User(username='jadhesh', name='Jadhesh Verma', email='jadhesh@example.com'),
 User(username='siddhant', name='Siddhant Sinha', email='siddhant@example.com'),
 User(username='sonaksh', name='Sonaksh Kumar', email='sonaksh@example.com'),
 User(username='vishal', name='Vishal Goel', email='vishal@example.com')]

We can only list sample outputs once we implement our data structure. That'll happen shortly.

Let's come up with a simple solution first. Implement the various functions:

1. Insert: Loop through the list and add the user at a position that keeps it sorted.
2. Find: Loop through the list and find the user whose username matches the query.
3. Update: Loop through the list, find the user object matching the query and update it with the new details.
4. List: Return the list whenever you want to list all of the users.

Tip: since usernames are strings, we can compare them using <, >, or ==. This will allow us to implement the functions easily. The code will be pretty simple as well.
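As a quick check of that tip, here is a small sketch of how Python compares strings. Comparison goes character by character using Unicode code points, so uppercase letters sort before lowercase ones. The name 'Zed' is made up for the example.

```python
# Strings compare character by character, in code point order.
print('aakash' < 'biraj')    # True: 'a' comes before 'b'
print('hemanth' > 'biraj')   # True: 'h' comes after 'b'
print('biraj' == 'biraj')    # True: identical strings

# Uppercase letters have smaller code points than lowercase ones.
print('Zed' < 'aakash')      # True: 'Z' (90) is less than 'a' (97)

# sorted() relies on the same comparisons.
usernames = ['hemanth', 'aakash', 'biraj']
ordered = sorted(usernames)
print(ordered)               # ['aakash', 'biraj', 'hemanth']
```

If usernames can mix upper and lower case, it may be worth normalizing them (for example with .lower()) before comparing.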

In [74]:
class UserDatabase:
    def __init__(self):
        self.users = []
    
    def insert(self, user):
        i = 0
        while i < len(self.users):
            #stop at the first existing username that is greater than the new username
            if self.users[i].username > user.username:
                break
            i += 1
        self.users.insert(i, user)
    
    def find(self, username):
        #returns None if no user has this username
        for user in self.users:
            if user.username == username:
                return user
    
    def update(self, user):
        target = self.find(user.username)
        #fail with a clear error instead of an AttributeError on None
        if target is None:
            raise KeyError(user.username)
        target.name, target.email = user.name, user.email
        
    def list_all(self):
        return self.users

Instantiate the class to create a new user database. Note that you can't insert the sample users before running the sample code that creates them, and if you haven't indented your methods inside the class, you won't be able to call insert either!

In [75]:
database = UserDatabase()

In [76]:
database.insert(hemanth)
database.insert(aakash)
database.insert(biraj)

Retrieve one of them and display it:

In [78]:
user = database.find('hemanth')
user

User(username='hemanth', name='Hemanth Jain', email='hemanth@example.com')
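One detail worth knowing: a loop-based find like the one above returns None when no username matches, so any caller, including update, has to account for that case. A minimal sketch, using a hypothetical namedtuple DemoUser and a hypothetical function find_user in place of the notebook's classes:

```python
from collections import namedtuple

# DemoUser and find_user are hypothetical stand-ins for User and UserDatabase.find.
DemoUser = namedtuple('DemoUser', ['username', 'name', 'email'])


def find_user(users, username):
    for user in users:
        if user.username == username:
            return user
    # Falling off the end of the loop returns None implicitly.


users = [
    DemoUser('aakash', 'Aakash Rai', 'aakash@example.com'),
    DemoUser('hemanth', 'Hemanth Jain', 'hemanth@example.com'),
]

found = find_user(users, 'hemanth')
missing = find_user(users, 'nobody')

print(found.name)
print(missing)

# Check for None before touching any attributes.
if missing is None:
    print("No user named 'nobody'")
```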

In [81]:
database.update(User(username = 'hemanth', name = 'Hemanth J', email = 'hemanth@anotherexample.com'))

In [82]:
database.list_all()

[User(username='aakash', name='Aakash Rai', email='aakash@example.com'),
 User(username='biraj', name='Biraj Das', email='biraj@example.com'),
 User(username='hemanth', name='Hemanth J', email='hemanth@anotherexample.com')]

In [84]:
database.insert(siddhant)

In [85]:
database.list_all()

[User(username='aakash', name='Aakash Rai', email='aakash@example.com'),
 User(username='biraj', name='Biraj Das', email='biraj@example.com'),
 User(username='hemanth', name='Hemanth J', email='hemanth@anotherexample.com'),
 User(username='siddhant', name='Siddhant Sinha', email='siddhant@example.com')]
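Notice that the list stays sorted even though the users were inserted out of order. A minimal sketch of that property, using plain username strings and a hypothetical helper insert_sorted that mirrors the loop in UserDatabase.insert:

```python
def insert_sorted(names, new_name):
    """Insert new_name before the first name that is greater than it."""
    i = 0
    while i < len(names):
        if names[i] > new_name:
            break
        i += 1
    names.insert(i, new_name)


names = []
for name in ['hemanth', 'aakash', 'siddhant', 'biraj']:
    insert_sorted(names, name)

print(names)
print(names == sorted(names))
```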

You can test and use more methods by adding a new cell of code to run the various methods and update properties, or add your own! Now we should analyze the complexity and identify the inefficiencies. 

Time complexities of our various operations are:
1. Insert: O(N)
2. Find: O(N)
3. Update: O(N)
4. List all: O(1)

Insert, find and update all have linear complexity, because in the worst case each one loops through the entire list. Listing all users is the exception: it returns the existing list without looping through it, so it is a constant time operation. Space is O(1) for all operations, but is time complexity optimized enough? No, because there are 100 million users on the platform.
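To make the linear growth concrete, here is a sketch that counts how many usernames a loop-based search inspects. The function comparisons_to_find and the generated usernames are made up for this illustration.

```python
def comparisons_to_find(usernames, target):
    """Count how many usernames a linear scan inspects before it stops."""
    steps = 0
    for name in usernames:
        steps += 1
        if name == target:
            break
    return steps


for n in (10, 1000, 100000):
    # Zero-padded names so the generated list is already in sorted order.
    usernames = ['user{:06d}'.format(i) for i in range(n)]
    # Worst case: the target is the last username in the list.
    worst = comparisons_to_find(usernames, usernames[-1])
    print(n, worst)
```

In the worst case the number of comparisons equals the number of users, which is what O(N) means here.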

In [86]:
%%time
for i in range(100000000):
    j = i * 1

CPU times: user 7.27 s, sys: 30.9 ms, total: 7.3 s
Wall time: 7.34 s
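Note that %%time is an IPython cell magic, so it only works inside a notebook. Outside of one, a rough equivalent can be sketched with time.perf_counter. The sketch below times a smaller loop and scales the result up, which assumes the running time grows linearly with the number of iterations. The numbers will differ from machine to machine.

```python
import time

sample_size = 1_000_000      # small enough to run quickly
target_size = 100_000_000    # the scale discussed above

start = time.perf_counter()
for i in range(sample_size):
    j = i * 1
elapsed = time.perf_counter() - start

# Assumption: time scales linearly with the number of iterations.
estimate = elapsed * (target_size / sample_size)

print("{} iterations took {:.3f} s".format(sample_size, elapsed))
print("rough estimate for {} iterations: {:.1f} s".format(target_size, estimate))
```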


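That is roughly 7 seconds just to loop over 100 million numbers without doing any real work, and comparing usernames on every step would likely take longer still. We would never want 10-15 second profile loads. People would stop using the application, so we need to optimize this. Let's choose a better data structure, so we can be senior engineers.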