## Module 1 - Python Essentials for MLOps
## Top 3 Key Points

-> Lists store ordered, mutable items

-> Dictionaries map unique keys to values

-> Sets contain only unique elements

## Reflection Questions

-> When would you want to use a tuple over a list?

-> What real-world examples can be modeled with dictionaries?

-> How could sets help improve the efficiency of a program?

-> What issues can arise from nested data structures?

-> Why is proper data structure selection important when coding?

## Challenge Exercises

-> Implement a phone book as a dictionary.

-> Find the most frequent words in a text file using sets.

-> Build a histogram from word counts using dictionaries.

-> Convert a deeply nested list to a flat list.



In [20]:
# Lists store ordered, mutable items
num = [1, 2, 3, 4, 5, 6, 7]
num[:4]    # first four elements (indices 0 to 3)
num[2:5]   # elements at indices 2, 3, and 4 (the stop index 5 is excluded)
num[::3]   # every third element, starting at index 0
num[0:-1]  # all elements except the last one
num[-1]    # last element
num[-2]    # second-to-last element
num[::-1]  # a reversed copy of the list
num[5:7]   # last two elements

[6, 7]

In [27]:
practice=["Introduction", "Python", "DataCamp", "R", "SQL", "Data Science"]
print(practice[-6:-2])

['Introduction', 'Python', 'DataCamp', 'R']
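
The cells above show slicing, which only reads from a list. The "mutable" part of the first key point can be illustrated with a minimal sketch like the one below; the variable names `first_four` and `alias` are illustrative and do not appear in the original notebook.

```python
num = [1, 2, 3, 4, 5, 6, 7]

# Slicing returns a new list, so changing the slice leaves num untouched
first_four = num[:4]
first_four[0] = 100

# Item assignment and append() change the list in place
num[0] = 10
num.append(8)

# A plain assignment creates a second name for the same list object
alias = num
alias.pop()  # removes the 8 that was just appended, visible through both names

print(num)           # [10, 2, 3, 4, 5, 6, 7]
print(first_four)    # [100, 2, 3, 4]
print(alias is num)  # True
```

Because `alias` and `num` refer to one object, a change made through either name is visible through the other; a slice such as `num[:]` is one way to get an independent copy of the outer list.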


In [None]:
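# Dictionaries map unique keys to values
# Note: the name `dict` shadows Python's built-in dict type; a name such as `student` is safer in real code
dict = {"name": "James", "school_id": 1235}
dict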

{'name': 'James', 'school_id': 1235}

In [21]:
p = [[20, "A", "B"], ["C", 22, 18], ["D", "E", "F"]]
print(p[1][:2])

['C', 22]


In [3]:
dict.keys()

dict_keys(['name', 'school_id'])

In [4]:
dict.values()

dict_values(['James', 1235])

In [5]:
dict["name"]

'James'

In [6]:
dict["school_id"]

1235

In [7]:
dict.update({"name": "John"})

In [8]:
dict["name"]

'John'

In [9]:
dict.pop("school_id")

1235

In [10]:
dict

{'name': 'John'}

In [11]:
dict.update({"school_id":1235})

In [12]:
dict

{'name': 'John', 'school_id': 1235}

In [13]:
# Sets contain only unique elements
# A simple list with duplicate elements
my_list = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6]
print(f"Original list with duplicates: {my_list}")

# Convert the list to a set. A set automatically removes duplicates.
my_set = set(my_list)
print(f"Set with unique elements: {my_set}")

# You can also convert the set back to a list if needed
unique_list = list(my_set)
print(f"List with unique elements: {unique_list}")

# You can also add elements to a set, but if the element already exists, it is ignored
my_set.add(5)
my_set.add(7)
print(f"\nSet after trying to add 5 (duplicate) and 7 (new): {my_set}")



Original list with duplicates: [1, 2, 2, 3, 4, 4, 4, 5, 6, 6]
Set with unique elements: {1, 2, 3, 4, 5, 6}
List with unique elements: [1, 2, 3, 4, 5, 6]

Set after trying to add 5 (duplicate) and 7 (new): {1, 2, 3, 4, 5, 6, 7}


In [14]:
my_set.remove(2)
my_set.discard(10)  # discard does not raise an error if the element is not found
my_set.add(8)

In [15]:
my_set

{1, 3, 4, 5, 6, 7, 8}
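
As a small extension of the cells above, the sketch below contrasts `remove()` with `discard()` on a missing element and shows three common set operations. The sets `train_ids` and `test_ids` are made-up example data, not part of the original notebook.

```python
# Hypothetical example data: record IDs used for training and testing
train_ids = {1, 2, 3, 4}
test_ids = {3, 4, 5}

overlap = train_ids & test_ids     # intersection: IDs present in both sets
all_ids = train_ids | test_ids     # union: IDs present in either set
train_only = train_ids - test_ids  # difference: IDs only in train_ids

# discard() silently ignores a missing element ...
train_ids.discard(99)

# ... while remove() raises KeyError for a missing element
try:
    train_ids.remove(99)
    remove_raised = False
except KeyError:
    remove_raised = True

print(overlap)        # {3, 4}
print(all_ids)        # {1, 2, 3, 4, 5}
print(train_only)     # {1, 2}
print(remove_raised)  # True
```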

## Answers to Reflection Questions:

**1. When would you want to use a tuple over a list?**
You would use a tuple when you need an immutable collection of items that won't change, like coordinates or database records.
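
A minimal sketch of this answer, assuming a made-up coordinate pair (`office_location`) and a made-up lookup table (`site_names`): the tuple rejects in-place changes and, unlike a list, can be used as a dictionary key.

```python
# Hypothetical example: a fixed (latitude, longitude) pair stored as a tuple
office_location = (51.5074, -0.1278)

# Tuples are hashable when their items are, so they can serve as dictionary keys
site_names = {office_location: "London office"}

# Tuples cannot be modified in place; item assignment raises TypeError
try:
    office_location[0] = 0.0
    mutation_allowed = True
except TypeError:
    mutation_allowed = False

# A list holding the same values is mutable (and therefore not hashable)
location_as_list = list(office_location)
location_as_list[0] = 0.0

print(site_names[(51.5074, -0.1278)])  # London office
print(mutation_allowed)                # False
print(location_as_list)                # [0.0, -0.1278]
```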

**2. What real-world examples can be modeled with dictionaries?**
Real-world examples for dictionaries include a contact list where names map to phone numbers, or a product catalog where product IDs map to item details.
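
The product catalog mentioned here could be sketched as follows. The product IDs, names, and prices are invented purely for illustration.

```python
# Hypothetical product catalog: product ID -> item details
catalog = {
    "P100": {"name": "Keyboard", "price": 45.0},
    "P200": {"name": "Mouse", "price": 20.0},
}

# Look up details by ID; .get() returns None for a missing key instead of raising KeyError
keyboard = catalog.get("P100")
missing = catalog.get("P999")

# Iterate over the values to compute a derived quantity
total_price = sum(item["price"] for item in catalog.values())

print(keyboard["name"])  # Keyboard
print(missing)           # None
print(total_price)       # 65.0
```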

**3. How could sets help improve the efficiency of a program?**
Sets make membership checks fast: because sets are hash-based, `x in my_set` takes $O(1)$ time on average, whereas `x in my_list` scans the list and takes $O(n)$ time.
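
One common way to use this property is to keep a set of items already seen while walking through a list. The helper below, `unique_in_order`, is a hypothetical name introduced here as a sketch; it removes duplicates while keeping the original order, and each `in` check against the set is constant time on average.

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first-seen order."""
    seen = set()   # fast membership checks
    result = []    # preserves the original ordering
    for item in items:
        if item not in seen:  # average O(1), versus O(n) for a list
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Checking `item not in result` instead would give the same answer, but each check would scan the growing list, so the whole loop would take quadratic rather than linear time.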

**4. What issues can arise from nested data structures?**
Nested data structures add complexity: long index chains such as `p[1][0]` are easy to get wrong, and the code becomes harder to read, debug, and maintain.
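
One concrete issue of this kind is that a shallow copy of a nested structure still shares its inner objects with the original. The sketch below uses an invented `config` dictionary to show the difference between `dict.copy()` and `copy.deepcopy()`.

```python
import copy

# Hypothetical nested configuration used only for illustration
config = {"model": {"layers": [64, 32]}, "name": "baseline"}

shallow = config.copy()       # copies only the outer dictionary
deep = copy.deepcopy(config)  # copies every nested level

# Changing the nested list through the original ...
config["model"]["layers"].append(16)

# ... is visible through the shallow copy, but not through the deep copy
print(shallow["model"]["layers"])  # [64, 32, 16]
print(deep["model"]["layers"])     # [64, 32]
```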

**5. Why is proper data structure selection important when coding?**
Proper data structure selection is crucial because it directly impacts a program's performance, memory usage, and overall readability.

## Challenge Exercise 01

# Implement a phone book as a dictionary.

In [18]:
# Create an empty dictionary to represent the phone book
phone_book = {}

# 1. Add contact details to the phone book
# The name is the key and the phone number is the value
phone_book["Alice"] = "555-1234"
phone_book["James"] = "565-5678"
phone_book["Bob"] = "555-8765"
print("-------------Current Phone Book -----------------")
print(phone_book)

# 2. Look up a contact's number
# You can access a value using its key
alice_number = phone_book['Alice']
print(f"Alice's number is: {alice_number}")

# 3. Check whether a contact exists in the phone book
if "Bob" in phone_book:
    print("Bob is in the phone book")
else:
    print("Bob is not in the phone book")

if "Eve" in phone_book:
    print("Eve is in the phone book")
else:
    print("Eve is not in the phone book")

# 4. Remove a contact
# Use the 'del' keyword to remove a key-value pair
del phone_book["James"]
print("\n--- Phone Book after removing James ---")
print(phone_book)

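# 5. Add or update a contact
# Assigning to a key overwrites its value if the key exists, or adds it if not
# (James was removed in step 4, so this re-adds him with a new number)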
phone_book["James"] = "123-5678"
print("\n---Phone number of James is updated")
print(phone_book)

-------------Current Phone Book -----------------
{'Alice': '555-1234', 'James': '565-5678', 'Bob': '555-8765'}
Alice's number is: 555-1234
Bob is in the phone book
Eve is not in the phone book

--- Phone Book after removing James ---
{'Alice': '555-1234', 'Bob': '555-8765'}

---Phone number of James is updated
{'Alice': '555-1234', 'Bob': '555-8765', 'James': '123-5678'}
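
Looking up or deleting a name that is not in the phone book with `phone_book[name]` or `del phone_book[name]` raises `KeyError`. A minimal sketch of the safer alternatives, `get()` and `pop()` with a default, is shown below; the `"not listed"` fallback text is an arbitrary choice for this example.

```python
phone_book = {"Alice": "555-1234", "Bob": "555-8765"}

# get() returns a default value instead of raising KeyError for a missing name
eve_number = phone_book.get("Eve", "not listed")

# pop() with a default removes the entry if present and never raises KeyError
removed_number = phone_book.pop("Eve", None)

# Print the contacts in alphabetical order of name
for name in sorted(phone_book):
    print(f"{name}: {phone_book[name]}")

print(eve_number)      # not listed
print(removed_number)  # None
```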


## Challenge Exercise 02
# Find the most frequent words in a text file using sets.

In [22]:
import re
from collections import Counter

def find_most_frequent_words(text):
    """
    Finds the most frequent word(s) in a given text string.

    Args:
        text (str): The input text to analyze.
    
    Returns:
        tuple: The maximum frequency and a list of the word(s) that occur that often.
    """
    # 1. Normalize the text: convert to lowercase and drop punctuation
    # re.findall() with \w+ returns every run of letters, digits, and underscores.

    words = re.findall(r'\b\w+\b', text.lower())

    # 2. Use a set to get all the unique words from the text
    unique_words = set(words)
    print(f"Total number of words: {len(words)}")
    print(f"Total number of unique words (using a set): {len(unique_words)}\n")

    # 3. Count word frequencies using a dictionary (Counter).
    # The Counter class is a specialized dictionary designed for this task.
    word_counts = Counter(words)

    # 4. Find the maximum frequency.
    # We get all the values (counts) from our dictionary and find the maximum.
    if not word_counts:
        return 0, []
    max_count = max(word_counts.values())

    # 5. Find all words that have the maximum frequency.
    # We iterate through the dictionary's items (key-value pairs)
    # and create a new list of words that match the max_count.
    most_frequent_words = [word for word, count in word_counts.items() if count == max_count]

    return max_count, most_frequent_words

def create_histogram(word_counts, bar_char='*'):
    """
    Prints a simple text-based histogram of word frequencies.
    """
    print("\n--- Word Frequency Histogram ---")
    # Sort the words by frequency in descending order for a better visual
    sorted_words = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
    
    # Find the longest word length for aligning the bars
    if sorted_words:
        max_word_length = max(len(word) for word, count in sorted_words)
    else:
        max_word_length = 0

    for word, count in sorted_words:
        # Pad the word to align the histogram bars
        print(f"{word.ljust(max_word_length)}: {bar_char * count}")  


In [23]:
# --- Main Part of the script ----
# This is the sample text that would typically be a file.
sample_text = """Data structures are a way of organizing and storing data in a computer
so that it can be accessed and modified efficiently. Lists, sets, and
dictionaries are common data structures in Python. Sets contain unique
elements. Lists allow for duplicates. Dictionaries store data in
key-value pairs. Dictionaries are very efficient for lookups.
"""
# Call the function to find the most frequent words
max_freq, top_words = find_most_frequent_words(sample_text)

print(f"The most frequent word(s) occur {max_freq} time(s).")
print(f"The most frequent word(s) are: {top_words}")

Total number of words: 53
Total number of unique words (using a set): 37

The most frequent word(s) occur 4 time(s).
The most frequent word(s) are: ['data']
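
The exercise asks for a text file, while the cells above work on an in-memory string. The sketch below is one possible way to read the words from a file and rank them with `Counter.most_common()`. The function name `top_words_in_file` and the temporary `sample.txt` file are assumptions made for this example only.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def top_words_in_file(path, n=3):
    """Return the n most common (word, count) pairs in the text file at `path`."""
    text = Path(path).read_text(encoding="utf-8")
    words = re.findall(r"\b\w+\b", text.lower())
    return Counter(words).most_common(n)


# Usage with a throwaway file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text("Sets and lists. Sets are unique. Sets!", encoding="utf-8")
    top_word = top_words_in_file(sample_path, n=1)

print(top_word)  # [('sets', 3)]
```

Unlike `find_most_frequent_words`, `most_common(n)` returns exactly `n` entries, so words tied for the top count may be cut off.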


## Challenge Exercise 03
# Build a histogram from word counts using dictionaries.

In [24]:
# Call the new function to print the histogram
word_counts_for_histogram = Counter(re.findall(r'\b\w+\b', sample_text.lower()))
create_histogram(word_counts_for_histogram)

print("\n--- Most Frequent Words Analysis ---")
print(f"The most frequent word(s) occur {max_freq} time(s).")
print(f"The most frequent word(s) are: {top_words}")


--- Word Frequency Histogram ---
data        : ****
are         : ***
and         : ***
in          : ***
dictionaries: ***
structures  : **
a           : **
lists       : **
sets        : **
for         : **
way         : *
of          : *
organizing  : *
storing     : *
computer    : *
so          : *
that        : *
it          : *
can         : *
be          : *
accessed    : *
modified    : *
efficiently : *
common      : *
python      : *
contain     : *
unique      : *
elements    : *
allow       : *
duplicates  : *
store       : *
key         : *
value       : *
pairs       : *
very        : *
efficient   : *
lookups     : *

--- Most Frequent Words Analysis ---
The most frequent word(s) occur 4 time(s).
The most frequent word(s) are: ['data']
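
The cells above build the counts with `Counter`. Since the exercise asks for dictionaries, the same histogram can also be sketched with a plain `dict` and `dict.get()`. The helper names `count_words` and `histogram_lines` are hypothetical, and the word list is made-up sample data.

```python
def count_words(words):
    """Count occurrences of each word using a plain dictionary."""
    counts = {}
    for word in words:
        # get() supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


def histogram_lines(counts, bar_char="*"):
    """Return one aligned histogram line per word, most frequent first."""
    width = max((len(word) for word in counts), default=0)
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{word.ljust(width)}: {bar_char * count}" for word, count in ordered]


counts = count_words(["data", "sets", "data", "lists", "data", "sets"])
for line in histogram_lines(counts):
    print(line)
# data : ***
# sets : **
# lists: *
```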


## Challenge Exercise 04
# Convert a deeply nested list to a flat list.

In [25]:
def flatten_list(nested_list):
    """
    Recursively flattens a deeply nested list into a single flat list.

    Args:
        nested_list (list): The list to flatten, which may contain sub-lists.

    Returns:
        list: A new list with all elements from the nested list in a single dimension.
    """
    flat_list = []
    # Loop through each item in the provided list
    for item in nested_list:
        # Check if the item is a list itself
        if isinstance(item, list):
            # If it's a list, call the function again (recursion)
            # and extend the current flat list with the results
            flat_list.extend(flatten_list(item))
        else:
            # If it's not a list, it's a regular element, so add it
            flat_list.append(item)
    return flat_list

# Simple example: a list with varying levels of nesting
nested_list_example = [1, 2, [3, 4, [5, 6]], 7, [8, 9]]

# Call the function to flatten the list
flat_list_example = flatten_list(nested_list_example)

# Print the original and flattened lists to see the result
print(f"Original nested list: {nested_list_example}")
print(f"Flattened list: {flat_list_example}")


Original nested list: [1, 2, [3, 4, [5, 6]], 7, [8, 9]]
Flattened list: [1, 2, 3, 4, 5, 6, 7, 8, 9]
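
The recursive version above uses one function call per nesting level, so a list nested more deeply than Python's recursion limit (1000 by default in CPython) would raise `RecursionError`. The sketch below is an alternative that swaps recursion for an explicit stack of iterators; the name `flatten_iteratively` is introduced here for illustration.

```python
def flatten_iteratively(nested_list):
    """Flatten a nested list without recursion, using a stack of iterators."""
    flat_list = []
    stack = [iter(nested_list)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                # Descend into the sub-list; resume the parent iterator later
                stack.append(iter(item))
                break
            flat_list.append(item)
        else:
            # The iterator on top of the stack is exhausted
            stack.pop()
    return flat_list


print(flatten_iteratively([1, 2, [3, 4, [5, 6]], 7, [8, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each iterator remembers its position, so after a sub-list is finished the loop continues with the next item of the parent list and the element order matches the recursive version.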
