## HashMaps in Python: Dictionaries

- Dictionary values can be of any type, mutable or immutable, but dictionary keys must be hashable. In practice this means immutable types such as strings, numbers, or tuples whose elements are themselves hashable.

- **Accessing Elements**: You can retrieve a book's title directly by its key: `library_catalog['book1']` returns `'A Tale of Two Cities'`. Accessing a key that isn't present in the dictionary raises a `KeyError`.
    - To avoid this error, use the `get()` method. It returns the value for the given key if the key exists, and `None` otherwise.
    - `get(key, default)` takes an optional second argument: if the key is missing, it returns `default` instead of `None`. This is a compact alternative to checking whether the key exists before reading it.
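To illustrate the point about key types, here is a minimal sketch. The `shelf_locations` dictionary and its contents are made up for this example and are not part of the library catalog used in the rest of the notebook.

```python
# Keys must be hashable: strings, numbers, and tuples of hashable items all work.
shelf_locations = {
    'book1': 'A1',           # string key
    42: 'B7',                # integer key
    ('fiction', 3): 'C2',    # tuple key
}

# A list is mutable and therefore unhashable, so using it as a key raises TypeError.
try:
    shelf_locations[['fiction', 3]] = 'D4'
    list_key_rejected = False
except TypeError:
    list_key_rejected = True

print(shelf_locations[('fiction', 3)])  # C2
print(list_key_rejected)                # True
```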

In [1]:
# Creating a catalog for the library using dictionaries
library_catalog = {'book1': 'A Tale of Two Cities', 
                   'book2': 'To Kill a Mockingbird', 
                   'book3': '1984'}

# Using get() to access a book's title
book1 = library_catalog.get('book1')
print(book1)  # Output: "A Tale of Two Cities"

# Using get() to access a nonexistent key
nonexistent_book = library_catalog.get('book100')
print(nonexistent_book)  # Output: None

A Tale of Two Cities
None
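The cell above only uses the one-argument form of `get()`. The sketch below contrasts square-bracket access with the two-argument form. It works on a separate `demo_catalog` so the main catalog is untouched, and the fallback string `'Unknown title'` is an arbitrary choice for illustration.

```python
demo_catalog = {'book1': 'A Tale of Two Cities',
                'book2': 'To Kill a Mockingbird',
                'book3': '1984'}

# Square-bracket access raises KeyError when the key is missing
try:
    demo_catalog['book100']
    key_error_raised = False
except KeyError:
    key_error_raised = True

# get() with a second argument returns that fallback instead of None
fallback_title = demo_catalog.get('book100', 'Unknown title')

print(key_error_raised)  # True
print(fallback_title)    # Unknown title
```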


- **Adding or Updating Elements**: Whether you're adding a new book to the catalog or updating an existing book's title, you use the assignment operator (`=`). The same syntax both updates existing key-value pairs and creates new ones.
    
    - If the specified key exists in the dictionary, the assigned value replaces the existing one. For updating a title: `library_catalog['book1'] = 'The Tell-Tale Heart'`.

    - If the key doesn't exist in the dictionary yet, the operation creates a new key-value pair. For adding a new book: `library_catalog['book4'] = 'Pride and Prejudice'`.

In [2]:
library_catalog['book4'] = 'Pride and Prejudice'

- **Removing Elements**: If `'book1'` is no longer part of the catalog, you can remove its entry using `del library_catalog['book1']`. Note that `del` raises a `KeyError` if the key is not in the dictionary.

In [3]:
del library_catalog['book1']

In [4]:
library_catalog

{'book2': 'To Kill a Mockingbird',
 'book3': '1984',
 'book4': 'Pride and Prejudice'}
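Because `del` raises a `KeyError` for a missing key, it can help to know `pop()`, which removes a key and returns its value, and accepts a default for the missing-key case. This is a small sketch on a throwaway `demo_catalog`, not on the catalog above.

```python
demo_catalog = {'book2': 'To Kill a Mockingbird', 'book3': '1984'}

# pop() removes the key and returns its value
removed_title = demo_catalog.pop('book3')

# With a default, pop() returns the default instead of raising when the key is absent
missing_title = demo_catalog.pop('book100', None)

print(removed_title)  # 1984
print(missing_title)  # None
print(demo_catalog)   # {'book2': 'To Kill a Mockingbird'}
```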

## Dictionary Methods: `items()`, `keys()`, `values()`, and others

- **Checking for a Key**: Check whether a given book is present in your catalog using `'book1' in library_catalog`, which evaluates to `True` or `False`.

- **Accessing all Key-Value Pairs:** Use the `items()` method to retrieve all key-value pairs in the dictionary as tuples in a list-like object. This will come in handy when you need to examine all the data you have stored.

- **Accessing all Keys and Values**: The `keys()` and `values()` methods return list-like objects consisting of all keys and all values in the dictionary, respectively.

- Keep in mind that these methods return "list-like" view objects, not actual lists. A view is dynamic: any later change to the dictionary is reflected in it. If you need a real list, convert the view with `list()`, for example `list(library_catalog.keys())`.
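The difference between a dynamic view and a list snapshot can be sketched as follows, again on a separate `demo_catalog` so the main catalog is not modified.

```python
demo_catalog = {'book2': 'To Kill a Mockingbird', 'book3': '1984'}

keys_view = demo_catalog.keys()            # dynamic view of the keys
keys_snapshot = list(demo_catalog.keys())  # independent list, frozen at this moment

# Add a book after both objects were created
demo_catalog['book4'] = 'Pride and Prejudice'

print('book4' in demo_catalog)   # True  (membership test on the dictionary)
print('book4' in keys_view)      # True  (the view follows the dictionary)
print('book4' in keys_snapshot)  # False (the list does not)
```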

In [5]:
# Getting all key-value pairs
all_books = library_catalog.items()

# Getting all keys
all_keys = library_catalog.keys()
# all_keys now holds: dict_keys(['book2', 'book3', 'book4'])

# Getting all values
all_values = library_catalog.values()
# all_values now holds: dict_values(['To Kill a Mockingbird', '1984', 'Pride and Prejudice'])

In [6]:
# Looping over the dictionary
for key, value in library_catalog.items():
    print(key, ":", value)

book2 : To Kill a Mockingbird
book3 : 1984
book4 : Pride and Prejudice


## Traversals: Know these traversals very well !!

In [7]:
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']

In [8]:
color_dict = {}
for color in colors:
    if color not in color_dict:
        color_dict[color] = 0
    color_dict[color] += 1 # can save time by directly using color_dict.get(color, 0) + 1 (as pointed out below!)

color_dict

{'red': 2, 'blue': 3, 'green': 1}

In [9]:
# Reset the dictionary so that counts from the previous cell are not counted twice
color_dict = {}

# Iterate over each color
for color in colors:
    # Read the current count (0 if the color is new), then increment it by 1
    color_dict[color] = color_dict.get(color, 0) + 1

# The last expression in the cell is displayed as its output
color_dict

{'red': 2, 'blue': 3, 'green': 1}
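For this particular counting pattern, the standard library also offers `collections.Counter`, a dictionary subclass that does the tallying for you. The sketch below is an alternative to the manual loops, not a replacement you must use. The name `color_counts` is just for this example.

```python
from collections import Counter

colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Counter tallies each element of the iterable
color_counts = Counter(colors)

print(color_counts['blue'])    # 3
print(color_counts['purple'])  # 0 (a missing key counts as zero instead of raising KeyError)
print(dict(color_counts))      # {'red': 2, 'blue': 3, 'green': 1}
```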

## Data Aggregations for Quick Summary Statistics

- Data aggregation with HashMaps is useful across many data analysis tasks, such as report generation and decision support.
- `max_key = max(my_dict, key=my_dict.get)` returns the key whose value is largest.
- `min_key = min(my_dict, key=my_dict.get)` returns the key whose value is smallest.

In [10]:
fruit_basket = {"apples": 5, "bananas": 4, "oranges": 8}
# A dictionary representing our fruit basket

total_fruits = sum(fruit_basket.values())
# Sums up the fruit quantities
print("The total number of fruits in the basket is:", total_fruits)
# It outputs: "The total number of fruits in the basket is: 17"

count_fruits = len(fruit_basket)  # The count operation
print("The number of fruit types in the basket is:", count_fruits)
# It outputs: "The number of fruit types in the basket is: 3"

# Using the `max` function
max_fruit = max(fruit_basket, key=fruit_basket.get)   # returns the key with the largest value
# The expression for finding the maximum
print("The fruit with the most quantity is:", max_fruit)
# It outputs: "The fruit with the most quantity is: oranges"

# Using the `min` function
min_fruit = min(fruit_basket, key=fruit_basket.get)   # returns the key with the smallest value
# The expression for finding the minimum
print("The fruit with the least quantity is:", min_fruit)
# It outputs: "The fruit with the least quantity is: bananas"

average_fruits = sum(fruit_basket.values()) / len(fruit_basket)
# The expression for finding the average
print("The average number of each type of fruit in the basket is:", average_fruits)
# It outputs: "The average number of each type of fruit in the basket is: 5.666666666666667"

The total number of fruits in the basket is: 17
The number of fruit types in the basket is: 3
The fruit with the most quantity is: oranges
The fruit with the least quantity is: bananas
The average number of each type of fruit in the basket is: 5.666666666666667
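Two related points are worth a quick sketch: ranking all entries by value with `sorted()`, and what `max()` does when values tie. The `tied_basket` dictionary is invented for this illustration.

```python
fruit_basket = {"apples": 5, "bananas": 4, "oranges": 8}

# Rank (fruit, quantity) pairs from largest to smallest quantity
ranked = sorted(fruit_basket.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('oranges', 8), ('apples', 5), ('bananas', 4)]

# When several keys share the largest value, max() returns the first one it encounters,
# which for a dictionary means the one inserted first
tied_basket = {"pears": 2, "plums": 2}
first_of_tie = max(tied_basket, key=tied_basket.get)
print(first_of_tie)  # pears
```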


---

## Real World Problems - 1

- You are provided with log data from a library's digital system, stored in string format. The log represents books' borrowing activities, including the book ID and the time a book is borrowed and returned. The structure of a log entry is as follows: 
    - `<book_id> borrow <time>, <book_id> return <time>`.

- The time is given in the `HH:MM` 24-hour format, and the book ID is a positive integer between 1 and 500. The logs are separated by a comma, followed by a space (", ").

- Your task is to create a Python function named `solution()` that takes a string of logs as input and returns a list of tuples representing the books with the **longest** borrowed duration.
    
- Each tuple contains two items: the book ID (an integer) and the book's borrowed duration, formatted as an `'HH:MM'` string. By "borrowed duration" we mean the period from when the book was borrowed until it was returned. If a book has been borrowed and returned multiple times, its borrowed duration is the cumulative sum of those periods. If multiple books share the same longest borrowed duration, the function should return all of them in ascending order of their IDs.

- For example, if we have a log string as follows: `"1 borrow 09:00, 2 borrow 10:00, 1 return 12:00, 3 borrow 13:00, 2 return 15:00, 3 return 16:00"`,
the function will return: `[(2, '05:00')]`.

- Note: You can safely assume that all borrowing actions for a given book will have a corresponding return action in the log, and vice versa. Also, the logs are sorted by the time of the action.

In [11]:
from datetime import datetime, timedelta

def solution(logs):

    books_meta_data = logs.split(", ")
    
    borrows   = {}
    durations = {}
    time_format = '%H:%M'  # The expected timestamp format
    
    for book_info in books_meta_data:
        
        book_id, status, time = book_info.split()
        book_id = int(book_id)
        time = datetime.strptime(time, time_format)  # Parse the timestamp string into a datetime object
        
        if status == "borrow":
            # Remember when this book was borrowed
            borrows[book_id] = time
        else:
            # Add the length of this loan to the book's cumulative total
            durations[book_id] = durations.get(book_id, timedelta()) + (time - borrows[book_id])
            del borrows[book_id]
    
    maxDuration = max( durations.values() )
    answer = []
    
    for book_id, duration in durations.items():
        
        if duration == maxDuration:
            
            totalMinutes = duration.total_seconds() // 60
            totalHours   = int( totalMinutes // 60 ) 
            minutes      = int( totalMinutes % 60 ) 
            new_duration = f"{str(totalHours).zfill(2)}:{str(minutes).zfill(2)}"
            
            answer.append( (book_id, new_duration) )
            
    answer.sort(key = lambda item: item[0])
    
    return answer
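As an alternative to `datetime`, the time arithmetic can also be done on plain minute counts, since all timestamps fall within a single day. The helpers `to_minutes` and `format_minutes` below are hypothetical names introduced for this sketch and are not used by the solution above.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def format_minutes(total_minutes):
    """Format a non-negative minute count as a zero-padded 'HH:MM' string."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"


# Book 2 in the example log is borrowed at 10:00 and returned at 15:00
borrowed_for = to_minutes("15:00") - to_minutes("10:00")
print(format_minutes(borrowed_for))  # 05:00
```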

In [12]:
import unittest

class SolutionTest(unittest.TestCase):
    def test_case1(self):
        logs = "1 borrow 09:00, 2 borrow 10:00, 1 return 12:00, 3 borrow 13:00, 2 return 15:00, 3 return 16:00"
        self.assertEqual(solution(logs), [(2, '05:00')])

    def test_case2(self):
        logs = "1 borrow 09:00, 2 borrow 10:00, 1 return 16:00, 3 borrow 13:00, 2 return 15:00, 3 return 16:00"
        self.assertEqual(solution(logs), [(1, '07:00')])

    def test_case3(self):
        logs = "1 borrow 05:00, 1 return 18:00, 2 borrow 08:00, 2 return 17:00"
        self.assertEqual(solution(logs), [(1, '13:00')])

    def test_case4(self):
        logs = "1 borrow 06:00, 2 borrow 07:00, 3 borrow 08:00, 1 return 12:00, 2 return 13:00, 3 return 14:00"
        self.assertEqual(solution(logs), [(1, '06:00'), (2, '06:00'), (3, '06:00')])

    def test_case5(self):
        logs = "1 borrow 09:00, 1 return 09:01, 2 borrow 09:02, 2 return 09:03"
        self.assertEqual(solution(logs), [(1, '00:01'), (2, '00:01')])

    def test_case6(self):
        logs = "1 borrow 12:00, 1 return 18:00, 2 borrow 06:00, 2 return 12:00, 3 borrow 00:00, 3 return 06:00"
        self.assertEqual(solution(logs), [(1, '06:00'), (2, '06:00'), (3, '06:00')])

    def test_case7(self):
        logs = "1 borrow 01:00, 1 return 04:00, 2 borrow 02:00, 2 return 05:00"
        self.assertEqual(solution(logs), [(1, '03:00'), (2, '03:00')])

    def test_case8(self):
        logs = "1 borrow 01:00, 1 return 02:00, 2 borrow 03:00, 2 return 05:00, 1 borrow 06:00, 1 return 10:00"
        self.assertEqual(solution(logs), [(1, '05:00')])

# Run only the new test cases
suite = unittest.TestLoader().loadTestsFromTestCase(SolutionTest)
unittest.TextTestRunner().run(suite)

........
----------------------------------------------------------------------
Ran 8 tests in 0.009s

OK


<unittest.runner.TextTestResult run=8 errors=0 failures=0>

---
## Real World Problems - 2

- You are given an array of integers and must select one integer, `k`, from it. Removing all occurrences of `k` splits the array into several contiguous blocks (the remaining sub-arrays). Choose `k` so that the **maximum length among these blocks is minimized**, and return that `k`.

- For instance, consider the array `[1, 2, 2, 3, 1, 4, 4, 4, 1, 2, 5]`. 

    - If we eliminate all instances of 2 (our k), the remaining blocks would be `[1], [3, 1, 4, 4, 4, 1], [5]`, with the longest containing 6 elements. 
    - Now, if we instead remove all instances of 1, the new remaining blocks would be `[2, 2, 3], [4, 4, 4], [2, 5]`, the longest of which contains 3 elements. As such, the function should return 1 in this case, as it leads to a minimal maximum block length.
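To make the notion of "blocks" concrete, the following sketch lists the non-empty blocks that remain after removing a chosen `k`. The helper name `split_blocks` is an assumption made for this illustration. It is only a way to visualize the example, not a solution to the problem.

```python
def split_blocks(arr, k):
    """Return the non-empty contiguous blocks left after removing every k from arr."""
    blocks, current = [], []
    for value in arr:
        if value == k:
            # k ends the current block (if it has any elements)
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(value)
    # Keep the trailing block, if any
    if current:
        blocks.append(current)
    return blocks


example = [1, 2, 2, 3, 1, 4, 4, 4, 1, 2, 5]
print(split_blocks(example, 2))  # [[1], [3, 1, 4, 4, 4, 1], [5]]
print(split_blocks(example, 1))  # [[2, 2, 3], [4, 4, 4], [2, 5]]
```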

In [13]:
# brute force approach
def minimal_max_block_bruteforce(arr):
    min_max_block_size = float('inf')
    min_num = None

    for num in set(arr):  # Avoid duplicates.
        indices = [i for i, x in enumerate(arr) if x == num]  # Indices where 'num' appears.
        indices = [-1] + indices + [len(arr)]  # Add artificial indices at the ends.
        max_block_size = max(indices[i] - indices[i-1] - 1 for i in range(1, len(indices)))  # Calculate max block size.
        
        if max_block_size < min_max_block_size:
            min_max_block_size = max_block_size
            min_num = num

    return min_num

# optimal approach
def minimal_max_block(arr):
    last_occurrence = {}
    max_block_sizes = {}

    for i, num in enumerate(arr):
        if num not in last_occurrence:
            max_block_sizes[num] = i
        else:
            block_size = i - last_occurrence[num] - 1
            max_block_sizes[num] = max(max_block_sizes[num], block_size)
        last_occurrence[num] = i

    for num, pos in last_occurrence.items():
        block_size = len(arr) - pos - 1
        max_block_sizes[num] = max(max_block_sizes[num], block_size)

    min_num = min(max_block_sizes, key=max_block_sizes.get)

    return min_num

---
## Real World Problems - 3

- The goal is to partition the string s into the **largest possible number of contiguous substrings** (or "chapters") such that each letter appears in only one of these substrings.

- Here's how you can think about it:

    - Unique Occurrence: Each letter should only appear in one chapter. This means if a letter appears multiple times, all its occurrences must be within the same chapter.

    - Order Matters: The chapters must follow the order of the original string.

    - Output: You need to return a list of integers representing the lengths of these chapters.

    - For example, if the string is `"abacdcd"`, you can split it into `"aba"` and `"cdcd"`, resulting in chapter lengths `[3, 4]`.
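The "unique occurrence" rule can be checked mechanically. The sketch below verifies that a proposed list of chapter lengths covers the whole string and that no letter appears in two chapters. It does not check that the number of chapters is the largest possible. `is_valid_partition` is a hypothetical helper name.

```python
def is_valid_partition(s, lengths):
    """Check that the chapters cover s exactly and that no letter spans two chapters."""
    if sum(lengths) != len(s):
        return False
    seen_in_earlier_chapters = set()
    start = 0
    for length in lengths:
        chapter_letters = set(s[start:start + length])
        # A letter already used by an earlier chapter violates the rule
        if chapter_letters & seen_in_earlier_chapters:
            return False
        seen_in_earlier_chapters |= chapter_letters
        start += length
    return True


print(is_valid_partition("abacdcd", [3, 4]))  # True
print(is_valid_partition("abacdcd", [2, 5]))  # False ('a' would appear in both chapters)
```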

In [14]:
def string_partition(s):
    
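    # Record the last index at which each character appears
    last_occurrence = {}
    for index, char in enumerate(s):
        last_occurrence[char] = index
        
    max_last_index = float("-inf")
    answer = []
    chapter_start = 0
    
    for index, char in enumerate(s):
        
        # The current chapter must reach at least the last occurrence of every character seen in it
        last_occur_index = last_occurrence[char]
        max_last_index = max(max_last_index, last_occur_index)
        
        # Once we arrive at that furthest index, the chapter can be closed
        if max_last_index == index:
            answer.append(index - chapter_start + 1)
            chapter_start = index + 1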
    
    return answer

In [15]:
class SolutionTests(unittest.TestCase):
    def test1(self):
        self.assertEqual(string_partition("abacbc"), [6])

    def test2(self):
        self.assertEqual(string_partition("a"), [1])

    def test3(self):
        self.assertEqual(string_partition("abc"), [1, 1, 1])

    def test4(self):
        self.assertEqual(string_partition("aaabbbccc"), [3, 3, 3])
        
    def test5(self):
        self.assertEqual(string_partition("zabacbcz"), [8])
        
    def test6(self):
        self.assertEqual(string_partition("abcabcabc"), [9])
        
    def test7(self):
        self.assertEqual(string_partition("abacdcd"), [3, 4])
        
    def test8(self):
        self.assertEqual(string_partition("feepplkpadaasdr"), [1, 2, 5, 6, 1])

    def test9(self):
        self.assertEqual(string_partition("a" * 1000000), [1000000])
        
# Run only the new test cases
suite = unittest.TestLoader().loadTestsFromTestCase(SolutionTests)
unittest.TextTestRunner().run(suite)

.........
----------------------------------------------------------------------
Ran 9 tests in 0.319s

OK


<unittest.runner.TextTestResult run=9 errors=0 failures=0>

---
## Real World Problems - 4

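- Given a string of whitespace-separated words, the goal is to find the character whose removal results in the **maximum number of unique words being broken**, where a word counts as broken if it contains that character. If multiple characters break the same number of words, choose the one that appears first in the string. The function returns a tuple containing the character and the number of unique words it breaks.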
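Reading "broken" as "contains the character", which is how the solution in this section counts words, the set of words a character would break can be listed directly. `words_containing` is a hypothetical helper used only for this illustration.

```python
def words_containing(s, char):
    """Return the unique whitespace-separated words of s that contain char, sorted."""
    return sorted(word for word in set(s.split()) if char in word)


# Punctuation stays attached to the words because the split is on whitespace only
print(words_containing("Hello, world!", "l"))  # ['Hello,', 'world!']
print(words_containing("Hello, world!", "H"))  # ['Hello,']
```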

In [16]:
def solution(s):
    # Create a dictionary to map each character to the number of unique words it appears in
    char_to_word_count = {}
    unique_words = set(s.split())

    # Populate the dictionary
    for word in unique_words:
        seen_chars = set()
        for char in word:
            if char not in seen_chars:
                if char not in char_to_word_count:
                    char_to_word_count[char] = 0
                char_to_word_count[char] += 1
                seen_chars.add(char)

    # Find the character with the maximum count
    max_char = None
    max_count = -1

    for char in s:
        if char in char_to_word_count and char_to_word_count[char] > max_count:
            max_char = char
            max_count = char_to_word_count[char]

    return (max_char, max_count)

In [17]:
class TestSolution(unittest.TestCase):

    def test_case_1(self):
        self.assertEqual(solution("Hello, world!"), ('l', 2))

    def test_case_2(self):
        self.assertEqual(solution("Life is like a box of chocolates"), ('i', 3))

    def test_case_3(self):
        self.assertEqual(solution("1... 2... 3... Go!"), ('.', 3))

    def test_case_4(self):
        self.assertEqual(solution("A quick brown fox jumps over the lazy dog."), ('o', 4))

    def test_case_5(self):
        self.assertEqual(solution("Python is fun!"), ('n', 2))

    def test_case_6(self):
        self.assertEqual(solution("To be, or not to be: that is the question."), ('o', 5))

    def test_case_7(self):
        self.assertEqual(solution("Winners never quit and quitters never win."), ('i', 4))

    def test_case_8(self):
        self.assertEqual(solution("May the force be with you."), ('e', 3))

    def test_case_9(self):
        self.assertEqual(solution("In the end, it's not the years in your life that count. It's the life in your years."), ('t', 6))

    def test_case_10(self):
        self.assertEqual(solution("Whether you think you can or you think you can’t, you’re right."), ('t', 4))
        
# Run only the new test cases
suite = unittest.TestLoader().loadTestsFromTestCase(TestSolution)
unittest.TextTestRunner().run(suite)        

..........
----------------------------------------------------------------------
Ran 10 tests in 0.011s

OK


<unittest.runner.TextTestResult run=10 errors=0 failures=0>

---
## Real World Problems - 5

- Input: A list of `n` words. Each word consists of lowercase and uppercase English letters and is between 1 and 50 characters long.
- Output: A dictionary where:
    - Each key is a word that appears more than once in the list.
    - Each value is the shortest distance between any two occurrences of that word in the list.
- Distance Calculation: The distance between two occurrences of a word is the difference between their indices.
- Constraints: If a word appears only once, it should not be included in the output dictionary.

In [18]:
def solution(word_list):
    last_occurrence = {}
    min_distances = {}

    for i, word in enumerate(word_list):
        if word in last_occurrence:
            distance = i - last_occurrence[word]
            if word in min_distances:
                min_distances[word] = min(min_distances[word], distance)
            else:
                min_distances[word] = distance
        last_occurrence[word] = i

    return min_distances
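One way to sanity-check the expected output is a more explicit variant that first collects every index of each word and then takes the smallest gap between neighbouring indices. This is a sketch under the problem statement above. `min_distances_by_positions` and the sample word list are made up for illustration.

```python
def min_distances_by_positions(word_list):
    """Map each repeated word to the smallest gap between two of its occurrences."""
    # Collect all indices of each word (indices are appended in increasing order)
    positions = {}
    for i, word in enumerate(word_list):
        positions.setdefault(word, []).append(i)

    result = {}
    for word, indices in positions.items():
        if len(indices) > 1:
            # The closest pair is always a pair of neighbouring indices
            result[word] = min(b - a for a, b in zip(indices, indices[1:]))
    return result


sample_words = ["apple", "pear", "apple", "fig", "pear", "apple", "apple"]
print(min_distances_by_positions(sample_words))  # {'apple': 1, 'pear': 3}
```

Compared with the single-pass solution, this version uses extra memory for the index lists, but it makes the definition of "distance" easy to inspect.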

---
## Real World Problems - 6

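- Given a list of friendship connections, each a pair of person IDs, find the "influencer": the person who can reach the most people within two degrees of friendship (direct friends plus friends of friends, excluding the person themselves). If several people tie, return the smallest ID.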

In [19]:
def find_influencer(connections):
    # Create an adjacency list to store direct friends
    friend_network = {}
    
    for i, j in connections:
        if i not in friend_network:
            friend_network[i] = set()
        friend_network[i].add(j)
        
        if j not in friend_network:
            friend_network[j] = set()
        friend_network[j].add(i)
        
    # Dictionary to count friends within two degrees
    counts = {}
    for person, direct_friends in friend_network.items():
        # Start with direct friends
        overall_set = set(direct_friends)
        # Add friends of friends
        for friend in direct_friends:
            overall_set.update(friend_network[friend])
        
        # Exclude the person themselves
        overall_set.discard(person)
        counts[person] = len(overall_set)
        
    # Find the person with the maximum network size
    max_value = max(counts.values())
    # Return the smallest person ID with the maximum network
    return min(person for person, count in counts.items() if count == max_value)
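To see what "within two degrees" means on a small graph, the sketch below computes the reachable set for each person, using `collections.defaultdict` to build the adjacency sets. The function name `two_degree_reach` and the `sample_connections` data are assumptions for this example. It mirrors the idea of the function above rather than replacing it.

```python
from collections import defaultdict


def two_degree_reach(connections):
    """Map each person to the set of people within two friendship hops."""
    friends = defaultdict(set)
    for a, b in connections:
        friends[a].add(b)
        friends[b].add(a)

    reach = {}
    for person in list(friends):
        nearby = set(friends[person])       # direct friends
        for friend in friends[person]:
            nearby |= friends[friend]       # friends of friends
        nearby.discard(person)              # a person is not their own contact
        reach[person] = nearby
    return reach


# A simple chain: 1 - 2 - 3 - 4 - 5
sample_connections = [(1, 2), (2, 3), (3, 4), (5, 4)]
reach = two_degree_reach(sample_connections)
print(sorted(reach[3]))  # [1, 2, 4, 5]

# Smallest ID among those with the largest two-degree network
best_size = max(len(people) for people in reach.values())
influencer = min(p for p, people in reach.items() if len(people) == best_size)
print(influencer)  # 3
```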