## Trees and Heaps 
Trees and heaps help with data organization, efficient searching, and prioritization. 

Trees are used in hierarchical data representations (e.g., decision trees in machine learning). Binary Search Trees (BSTs) are a special class of trees that allow efficient searching, insertion, and deletion, provided the tree stays reasonably balanced. Today we'll look at an example. Tree traversals (preorder, inorder, postorder) are useful in many algorithms. They are most naturally written recursively, although an explicit stack works as well.

Heaps are efficient for priority queues (used in scheduling).  You can think of priority queues as "lines" indicating the task or data to process next. Min-heaps and max-heaps help in quick retrieval of the smallest/largest elements.  Heaps are used in important algorithms like Dijkstra’s shortest path and Heap Sort.

### Binary Search Tree (BST)

A BST is a hierarchical data structure where:

* Each node has at most two children.
* Every key in a node's left subtree is less than the node's key.
* Every key in a node's right subtree is greater than the node's key (the implementation below also sends duplicate keys to the right).

When to use BSTs: fast searching and sorting, e.g., searching for a record in a sorted dataset, decision trees, indexing data, and efficient searches within ranges. (Note: for exact-match lookups, a BST is not as fast as a hash table/dictionary, but unlike a hash table it keeps the keys in order.)







In [9]:
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, key):
        self.root = self._insert_recursive(self.root, key)

    def _insert_recursive(self, root, key):
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = self._insert_recursive(root.left, key)
        else:
            root.right = self._insert_recursive(root.right, key)
        return root

    def inorder_traversal(self):
        if self.root:
            self.inorder_traversal_aux(self.root)


    def inorder_traversal_aux(self, subroot):
        if subroot:
            self.inorder_traversal_aux(subroot.left)
            print(subroot.key, end=" ")
            self.inorder_traversal_aux(subroot.right)


# Example usage:
bst = BST()
for num in [50, 30, 70, 20, 40, 60, 80]:
    bst.insert(num)

bst.inorder_traversal()  # Outputs: 20 30 40 50 60 70 80

20 30 40 50 60 70 80 
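
The class above inserts and traverses but does not search, and its traversal is recursive. As a minimal sketch (the standalone functions `insert`, `search`, and `inorder_iterative` are illustrative names, not part of the class above), here is how a lookup follows one branch per comparison, and how the same inorder traversal can be written with an explicit stack instead of recursion:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the subtree root; duplicates go to the right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return True if key is in the tree, following one branch per comparison."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def inorder_iterative(root):
    """Return the keys in sorted order using an explicit stack."""
    keys, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:   # walk as far left as possible
            stack.append(node)
            node = node.left
        node = stack.pop()        # visit the leftmost unvisited node
        keys.append(node.key)
        node = node.right         # then explore its right subtree
    return keys


root = None
for num in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, num)

print(search(root, 60))         # True
print(search(root, 65))         # False
print(inorder_iterative(root))  # [20, 30, 40, 50, 60, 70, 80]
```

Each step of `search` moves one level down the tree, so the number of comparisons is bounded by the height of the tree.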

In [None]:
#What is the big-O?


### Heaps (Priority Queues)

A heap is a specialized tree-based data structure:

* A Min-Heap ensures that the smallest element is always at the root.
* A Max-Heap ensures that the largest element is always at the root.

Heaps are used for priority queues, sorting, and efficient data retrieval.  Behind the scenes a heap is usually stored as an array.  You'd learn more about this in a CS class.

Advantage: the smallest (or largest) element is always available at the root, which makes heaps a good fit for finding the smallest/largest k elements, scheduling tasks, and efficient sorting.
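
To make the "stored as an array" remark concrete, here is a small sketch. In the usual array layout, the children of the element at index i sit at indices 2i + 1 and 2i + 2. The helper `is_min_heap` is a hypothetical function written for this illustration; it is not part of heapq.

```python
import heapq


def is_min_heap(items):
    """Check that every parent is <= its children in the array layout."""
    n = len(items)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and items[i] > items[child]:
                return False
    return True


data = [15, 3, 8, 20, 1]
print(is_min_heap(data))   # False: 15 is larger than its child 3

heapq.heapify(data)        # rearranges the list in place
print(is_min_heap(data))   # True
print(data[0])             # 1, the smallest element sits at index 0
```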

In [13]:
import heapq

# Create a Min-Heap
heap = []
heapq.heappush(heap, 10)
heapq.heappush(heap, 5)
heapq.heappush(heap, 30)
heapq.heappush(heap, 20)

print(heapq.heappop(heap))  # Outputs: 5 (smallest element)

# Convert a list into a heap
data = [15, 3, 8, 20, 1]
heapq.heapify(data)

print(heapq.heappop(data))  # Outputs: 1 (smallest element)



5
1
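
The heapq functions used above always treat the list as a Min-Heap. One common workaround when you want Max-Heap behavior is to store negated values and flip the sign back when popping. The sketch below assumes the values are plain numbers:

```python
import heapq

values = [10, 5, 30, 20]

# Negate each value so the largest original value becomes the smallest
max_heap = [-v for v in values]
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)
print(largest)  # 30

second = -heapq.heappop(max_heap)
print(second)   # 20
```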


### More heap operations

**heapq.heapify()** transforms a list into a Min-Heap in place; it rearranges the existing list rather than returning a new one.

**heapq.heappop()** removes the smallest element from the heap and returns it.


In [17]:
import heapq

# Product prices dataset
prices = [100, 50, 30, 20, 80, 60, 10, 90]

# Convert list to a min-heap
heapq.heapify(prices)

# Extract 3 cheapest products
cheapest_products = [heapq.heappop(prices) for i in range(3)]
print("3 Cheapest Products:", cheapest_products)  # Outputs: [10, 20, 30]

3 Cheapest Products: [10, 20, 30]


###  How and when to use a heap in data processing

Heaps are useful when dealing with large datasets where efficiency is critical. Here are some common examples that come to mind:

1. Finding the Top K Elements
   * Example: Top 10 highest-grossing movies in a large dataset.
   * Solution: Use a Max-Heap to efficiently retrieve the top K elements.

2. Finding the Smallest K Elements
   * Example: Finding the cheapest 5 products in an online marketplace.
   * Solution: Use a Min-Heap for efficient extraction.

3. Internet Data Processing
   * Example: Tracking the top 10 most frequent words in live tweets.
   * Solution: Use a Heap-based priority queue to maintain a list of the top K elements.

4. Dijkstra’s Algorithm for Shortest Path (more on this later)
   * Example: Google Maps shortest route calculation.
   * Solution: Use a Min-Heap to always expand the shortest available path.
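
For the streaming case in item 3, one possible approach is to keep a Min-Heap of at most K elements and evict the smallest whenever something larger arrives. This is a sketch under assumptions: `top_k_stream` is a hypothetical helper and the numbers are made up for illustration.

```python
import heapq


def top_k_stream(stream, k):
    """Return the k largest values seen, keeping at most k items in memory."""
    heap = []  # min-heap: heap[0] is the smallest of the current top k
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Pop the current smallest and push the new value in one step
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


# Made-up values standing in for a large stream
grosses = [120, 45, 980, 310, 75, 640, 15, 500]
print(top_k_stream(grosses, 3))  # [980, 640, 500]
```

Because the heap never grows beyond K elements, the full stream does not have to be stored or sorted.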

Heaps help us quickly extract important information from large datasets. We'll look at an example below to find the most frequent words in customer reviews.  This turns out to be an essential step in sentiment analysis, keyword extraction, and text mining.



In [None]:
from collections import Counter
import heapq


reviews = [
    "great product excellent quality",
    "terrible experience not recommended",
    "excellent quality and great price",
    "not great not terrible",
    "quality is good but not excellent"
]



# Count word frequency
word_counts = Counter(" ".join(reviews).split())  

print(word_counts)



# Extract top 5 most frequent words using a heap
top_k = 5
heap = heapq.nlargest(top_k, word_counts.items(), key=lambda x: x[1])

print("Top 5 most frequent words:", heap)

In [46]:
help(heapq.nlargest)




Help on function nlargest in module heapq:

nlargest(n, iterable, key=None)
    Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]



### More on heapq.nlargest()

The function heapq.nlargest(n, iterable, key=None) returns the n largest elements from an iterable using a heap.

In the example above we passed three arguments:

1. top_k

   This is the number of largest elements we want. In our example, top_k = 5, so we are finding the 5 most frequent words.

2. word_counts.items()

   word_counts is a Counter (a dictionary subclass) whose keys are words and whose values are their frequencies. .items() returns a view of (word, frequency) tuples, for example ('great', 3), ('excellent', 3), ('quality', 3), ('not', 4), ('terrible', 2), ...

3. key=lambda x: x[1]

   * The key parameter tells Python which value to compare elements by.
   * In lambda x: x[1], x is each (word, frequency) tuple and x[1] is the frequency.
   * This tells heapq.nlargest() to rank the tuples by frequency, not by the word itself.
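
As a quick check of the equivalence quoted in the help text, the sketch below compares heapq.nlargest() with the sorted-and-slice form and with Counter.most_common(). The tiny word list is made up so that there are no ties.

```python
from collections import Counter
import heapq

words = "great quality great price quality great".split()
word_counts = Counter(words)

# Three ways to get the 2 most frequent words
by_heap = heapq.nlargest(2, word_counts.items(), key=lambda x: x[1])
by_sort = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:2]
by_counter = word_counts.most_common(2)

print(by_heap)                           # [('great', 3), ('quality', 2)]
print(by_heap == by_sort == by_counter)  # True
```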

### lambda

Okay, cool.  I think I got it but what in the world is lambda?

A lambda function in Python is an unnamed function that you can define in a single line. It’s often used for short, simple functions that don’t need a full def statement.

The syntax is: **lambda arguments: expression**, where lambda is the keyword that starts the function, the arguments are the input values, and the expression is the value the function returns.


In [5]:
double = lambda x: x * 2
print(double(5))  # Output: 10

# Using lambda as a sort key
data = [("apple", 3), ("banana", 1), ("cherry", 2)]
sorted_data = sorted(data, key=lambda x: x[1])
print(sorted_data)  # Output: [('banana', 1), ('cherry', 2), ('apple', 3)]

# Using lambda with filter
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers)  # Output: [2, 4, 6]

# Using lambda with map
numbers = [1, 2, 3, 4]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16]

10
[('banana', 1), ('cherry', 2), ('apple', 3)]
[2, 4, 6]
[1, 4, 9, 16]


### Sorting versus Heaps
Comparison-based sorting takes $O(n \log n)$ time in general.  A heap has an advantage when you want to extract min/max values repeatedly, because it can be updated dynamically: each insertion or removal costs only $O(\log n)$, so there is no need to re-sort as elements are added.
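
Here is a small sketch of that dynamic behavior, using a heap as a task queue. The (priority, task) tuples are invented for illustration, and a lower number means more urgent. New work can arrive between pops and the heap still hands back the most urgent item.

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "clean inbox"))

first = heapq.heappop(tasks)
print(first)   # (1, 'fix outage')

# A more urgent task arrives after processing has already started
heapq.heappush(tasks, (0, "security patch"))
second = heapq.heappop(tasks)
print(second)  # (0, 'security patch')

# Drain whatever is left, in priority order
remaining = [heapq.heappop(tasks) for _ in range(len(tasks))]
print(remaining)  # [(2, 'write report'), (3, 'clean inbox')]
```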

In [38]:
# Let's talk about the efficiency of heaps in greater detail
# heapify(): Creating a heap: O(n)
# Delete Max/Min: O(log n)
# Add an element: O(log n)