# <a id='toc1_'></a>[Day 4: Python Lists and Sets 🐍](#toc0_)


Welcome to Day 4 of our course! In this lesson, we will explore Python lists and sets in depth. We'll cover their properties, built-in methods, time complexities, and practical examples. Enjoy the journey through theory, examples, and hands-on problems! 😊

## <a id='toc1_1_'></a>[Table of Contents 📖](#toc0_)


- [Table of Contents 📖](#toc1_1_)    
- [Objectives](#toc1_2_)    
- [Introduction & Theory 📚](#toc1_3_)    
  - [Python Lists 📝](#toc1_3_1_)    
  - [Python Sets 🧩](#toc1_3_2_)    
- [Built-in Methods & Complexity Tables](#toc1_4_)    
  - [List Methods 🔄](#toc1_4_1_)    
  - [Set Methods 🔍](#toc1_4_2_)    
- [Examples & Timing Comparisons ⏱️](#toc1_5_)    
  - [Membership Testing 🔎](#toc1_5_1_)    
  - [Duplicate Removal 🔄](#toc1_5_2_)    
  - [Insertion & Other Operations ⚡](#toc1_5_3_)    
- [Interesting Problems 🤔](#toc1_6_)    
  - [Problem 1: Remove Duplicates](#toc1_6_1_)    
  - [Problem 2: Membership Test Comparison Function](#toc1_6_2_)    
  - [Problem 3: Set Operations (Union & Intersection)](#toc1_6_3_)    
- [Exercises 📝](#toc1_7_)    
- [Reflection Questions](#toc1_8_)    
- [Additional Resources](#toc1_9_)    
- [Final Summary 🏁](#toc1_10_)    
  - [Bonus Content: Visualizing Performance with Matplotlib 📊](#toc1_10_1_)    

<!-- vscode-jupyter-toc-config
numbering=false
anchor=true
flat=false
minLevel=1
maxLevel=6
/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_2_'></a>[Objectives](#toc0_)

- Understand the characteristics of Python lists (ordered, mutable, duplicates allowed) and sets (unordered, unique elements), and when to use each. 🛠️
- Learn about the time complexities of various built-in methods for lists and sets. 🕰️
- Explore practical examples with timing comparisons. 📊
- Solve interesting problems using lists and sets. 🧩
- Deepen your understanding through extended theory and visualizations. 📚
- Have fun coding and experimenting with these data structures! 🚀

## <a id='toc1_3_'></a>[Introduction & Theory 📚](#toc0_)


In this section, we'll introduce the basic properties and differences between Python lists and sets. Understanding these differences will help you choose the right data structure for your problem.


### <a id='toc1_3_1_'></a>[Python Lists 📝](#toc0_)

- **Ordered:** Elements maintain the order in which they are inserted.
- **Mutable:** Elements can be changed after creation.
- **Duplicates Allowed:** The same element can appear more than once.
- **Use Cases:** Ideal when order matters or when you need to access elements by index.
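The properties above can be seen in a few lines. This is a small illustrative sketch; the variable name `fruits` is hypothetical:

```python
# A list keeps insertion order and allows duplicates
fruits = ["apple", "banana", "apple"]
print(fruits[0])              # index access -> apple

# Mutable: elements can be replaced and the list can grow
fruits[1] = "cherry"
fruits.append("banana")
print(fruits)                 # ['apple', 'cherry', 'apple', 'banana']

# Duplicates are kept as separate entries
print(fruits.count("apple"))  # 2
```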

### <a id='toc1_3_2_'></a>[Python Sets 🧩](#toc0_)


- **Unordered:** There is no guaranteed order of elements.
- **Unique Elements:** Automatically removes duplicate elements.
- **Hash-Based:** Offers O(1) average-case performance for membership testing.
- **Use Cases:** Best when you need to ensure all elements are unique or require fast membership testing.
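A short sketch of these properties, with hypothetical names. It also shows two facts not listed above: sets are mutable (elements can be added and removed), and because they are hash-based, every element must be hashable, so lists cannot be stored in a set while tuples can:

```python
# Duplicates collapse automatically when the set is built
colors = {"red", "green", "red", "blue"}
print(len(colors))          # 3
print("green" in colors)    # hash lookup -> True

# Sets are mutable: elements can be added and removed
colors.add("yellow")
colors.discard("red")

# Elements must be hashable: a list cannot go into a set
try:
    invalid = {[1, 2], [3]}
except TypeError as exc:
    print("TypeError:", exc)

# Tuples are hashable, so they work as set elements
points = {(0, 0), (1, 2), (0, 0)}
print(len(points))          # 2
```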



Both data structures are powerful tools, and understanding their trade-offs is key to writing efficient Python code.

## <a id='toc1_4_'></a>[Built-in Methods & Complexity Tables](#toc0_)



Below are tables summarizing common methods for lists and sets, along with their time complexities. Understanding these complexities will help you write efficient code and choose the right data structure for your problem.

### <a id='toc1_4_1_'></a>[List Methods 🔄](#toc0_)



Lists in Python are versatile and allow various operations. The table below provides a quick reference to some of the most common methods and their complexities:

| **Method**       | **Description**                                | **Time Complexity**                  |
|------------------|------------------------------------------------|--------------------------------------|
| `append(x)`      | Add element to the end                         | O(1) amortized                       |
| `insert(i, x)`   | Insert element `x` at index `i`                | O(n)                                 |
| `pop()`          | Remove and return the last element             | O(1)                                 |
| `pop(i)`         | Remove and return element at index `i`         | O(n)                                 |
| `remove(x)`      | Remove first occurrence of `x`                 | O(n)                                 |
| `index(x)`       | Return index of first occurrence of `x`        | O(n)                                 |
| `sort()`         | Sort the list                                  | O(n log n) (Timsort)                 |
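
Here is a sketch that runs each method from the table on a small list (`nums` is an illustrative name). The comments note why the O(n) methods are linear: they either search through the list or shift the elements after the affected position.

```python
nums = [5, 2, 9]
nums.append(7)        # O(1) amortized -> [5, 2, 9, 7]
nums.insert(0, 1)     # O(n): every element shifts right -> [1, 5, 2, 9, 7]
last = nums.pop()     # O(1): removes 7 from the end
first = nums.pop(0)   # O(n): removes 1, remaining elements shift left
nums.remove(9)        # O(n): linear search for 9, then shift
pos = nums.index(2)   # O(n): linear search
nums.sort()           # O(n log n)
print(nums, last, first, pos)  # [2, 5] 7 1 1
```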

### <a id='toc1_4_2_'></a>[Set Methods 🔍](#toc0_)


Sets offer operations that take advantage of hash tables. In the table below, `s1` is the set a method is called on and `s2` is the argument:

| **Method**                       | **Description**                                                        | **Time Complexity**          |
|----------------------------------|------------------------------------------------------------------------|------------------------------|
| `add(x)`                         | Add element `x` to the set                                             | O(1) average-case            |
| `remove(x)`                      | Remove element `x`; raises `KeyError` if not present                   | O(1) average-case            |
| `discard(x)`                     | Remove element `x` if present; does nothing otherwise                  | O(1) average-case            |
| `pop()`                          | Remove and return an arbitrary element; raises `KeyError` if empty     | O(1)                         |
| `union(s2)` / `\|`               | Return the union of two sets                                           | O(len(s1) + len(s2))         |
| `intersection(s2)` / `&`         | Return the intersection of two sets                                    | O(min(len(s1), len(s2)))     |
| `difference(s2)` / `-`           | Return elements in `s1` that are not in `s2`                           | O(len(s1))                   |
| `symmetric_difference(s2)` / `^` | Return elements in either set, but not both                            | O(len(s1) + len(s2))         |


## <a id='toc1_5_'></a>[Examples & Timing Comparisons ⏱️](#toc0_)


Let's see these concepts in action! In the following examples, we will compare lists and sets through timing experiments and practical code examples.

### <a id='toc1_5_1_'></a>[Membership Testing 🔎](#toc0_)


Membership testing in a list is O(n) because it must scan through each element. In contrast, sets use a hash table and typically perform membership tests in O(1) average-case time. Check out the following timing comparison:

In [18]:
import timeit
from IPython.display import HTML, display

# Setup code: create a list and a corresponding set of 10,000 numbers
setup_code = "lst = list(range(10000)); s = set(lst)"
list_test = "9999 in lst"
set_test = "9999 in s"

# Time the membership testing
list_time = timeit.timeit(stmt=list_test, setup=setup_code, number=10000)
set_time = timeit.timeit(stmt=set_test, setup=setup_code, number=10000)

# Prepare the HTML output
html_membership = f'''\
<div style="border: 2px solid #4CAF50; border-radius: 10px; padding: 15px; background-color: #f9fff9; font-family: Arial, sans-serif;">
    <h3 style="color: #2E7D32;">Membership Testing Timing</h3>
    <p><strong>List:</strong> {list_time:.6f} sec (10,000 iterations)</p>
    <p><strong>Set:</strong> {set_time:.6f} sec (10,000 iterations)</p>
</div>
'''

display(HTML(html_membership))


This output demonstrates the significant speedup sets offer for membership testing. In the run recorded for this lesson, switching from the list to the set cut the time from 0.248011 seconds to 0.000085 seconds for 10,000 lookups. Your exact numbers will differ by machine.

### <a id='toc1_5_2_'></a>[Duplicate Removal 🔄](#toc0_)


Converting a list to a set automatically removes duplicate elements, since sets only store unique items. Converting the set back to a list gives a duplicate-free list, but its order is not guaranteed to match the original. Consider the following example:

In [None]:
sample_list = [1, 2, 2, 3, 4, 4, 4, 5]
unique_list = list(set(sample_list))
print("Original List:", sample_list)
print("After Removing Duplicates:", unique_list)

Original List: [1, 2, 2, 3, 4, 4, 4, 5]
After Removing Duplicates: [1, 2, 3, 4, 5]
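
The sorted-looking result above comes from how CPython hashes small integers, not from any ordering guarantee; with strings the order can change from run to run. If you need to keep the first-seen order, one common alternative (not used elsewhere in this lesson) is `dict.fromkeys`, which relies on dictionaries preserving insertion order since Python 3.7. The list `words` is illustrative:

```python
words = ["pear", "apple", "pear", "fig", "apple"]

# set() removes duplicates, but the order of the result is arbitrary
print(list(set(words)))

# dict keys are unique and keep insertion order (Python 3.7+)
ordered_unique = list(dict.fromkeys(words))
print(ordered_unique)  # ['pear', 'apple', 'fig']
```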


### <a id='toc1_5_3_'></a>[Insertion & Other Operations ⚡](#toc0_)


Next, we compare the time it takes to insert 10,000 random numbers into a list versus a set. This example illustrates the performance differences during insertion operations.

In [26]:
import random
import timeit
from IPython.display import HTML, display

def insert_into_list():
    lst = []
    for _ in range(10000):
        lst.append(random.randint(0, 10000))
    return lst

def insert_into_set():
    s = set()
    for _ in range(10000):
        s.add(random.randint(0, 10000))
    return s

# Time the insertion operations (10 runs each)
list_insertion_time = timeit.timeit(insert_into_list, number=10)
set_insertion_time = timeit.timeit(insert_into_set, number=10)

# Prepare the HTML output for insertion timings
html_insertion = f'''\
<div style="border: 2px solid #1976D2; border-radius: 10px; padding: 15px; background-color: #e3f2fd; font-family: Arial, sans-serif;">
    <h3 style="color: #0D47A1;">Insertion Timing Comparison</h3>
    <p><strong>List Insertion:</strong> {list_insertion_time:.6f} sec (10 runs)</p>
    <p><strong>Set Insertion:</strong> {set_insertion_time:.6f} sec (10 runs)</p>
</div>
'''

display(HTML(html_insertion))


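Both operations take constant time per element: `list.append` is O(1) amortized, and `set.add` is $O(1)$ on average because sets are built on `hash tables` (a data structure we will cover in later lessons). The two timings therefore tend to be of the same order, and a sizeable share of each is spent in `random.randint` rather than in the insertion itself. The set also does extra work per element (hashing the value and checking whether it is already present), and it ends up with fewer than 10,000 elements because repeated random values are stored only once.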
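To time insertion on its own, one option is to generate the random values once, before timing. The sketch below does that; `fill_list`, `fill_set`, and `values` are hypothetical names, and the timings will vary by machine, so no particular winner is assumed:

```python
import random
import timeit

# Pre-generate the data so the timing measures only insertion,
# not the cost of calling random.randint
values = [random.randint(0, 10000) for _ in range(10000)]

def fill_list(data):
    lst = []
    for v in data:
        lst.append(v)   # O(1) amortized
    return lst

def fill_set(data):
    s = set()
    for v in data:
        s.add(v)        # O(1) average: hash the value, store it if new
    return s

print("list:", timeit.timeit(lambda: fill_list(values), number=100))
print("set: ", timeit.timeit(lambda: fill_set(values), number=100))

# The list keeps every value; the set keeps each distinct value once
print(len(fill_list(values)), len(fill_set(values)))
```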

## <a id='toc1_6_'></a>[Interesting Problems 🤔](#toc0_)


Now, let's tackle some interesting problems that help solidify your understanding of lists and sets.

### <a id='toc1_6_1_'></a>[Problem 1: Remove Duplicates](#toc0_)



Write a function that removes duplicates from a list by converting it to a set and then back to a list. Compare the execution time of this approach with a manual duplicate removal method using a loop.

In [6]:
def remove_duplicates_set(lst):
    """Remove duplicates using set conversion."""
    return list(set(lst))

print('remove_duplicates_set defined')

remove_duplicates_set defined


In [28]:
def remove_duplicates_manual(lst):
    """Remove duplicates manually while preserving order."""
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result

In [8]:
import random
import timeit
large_list = [random.randint(0, 1000) for _ in range(10000)]

time_set = timeit.timeit(lambda: remove_duplicates_set(large_list), number=100)
time_manual = timeit.timeit(lambda: remove_duplicates_manual(large_list), number=100)

print("Set-based duplicate removal time:", time_set)
print("Manual duplicate removal time:", time_manual)

Set-based duplicate removal time: 0.006360791041515768
Manual duplicate removal time: 0.037637875066138804


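The output shows that the `set`-based approach was roughly six times faster than the manual loop in this run (about 0.0064 s vs 0.0376 s for 100 calls), because `list(set(lst))` runs entirely in C while the manual version executes a Python-level loop. In exchange, the manual version keeps the elements in their original order, which the set-based version does not guarantee.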

### <a id='toc1_6_2_'></a>[Problem 2: Membership Test Comparison Function](#toc0_)



Write a function that accepts a collection size `n` and a target element, builds a list and a set containing `range(n)`, and returns the time taken to check for membership in each. Use this function to compare performance on collections of various sizes.

In [27]:
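import timeit

def compare_membership_times(n, target):
    """Return (list_time, set_time) for 10,000 membership checks of `target`."""
    lst = list(range(n))
    s = set(lst)
    list_time = timeit.timeit(lambda: target in lst, number=10000)  # linear scan
    set_time = timeit.timeit(lambda: target in s, number=10000)     # hash lookup
    return list_time, set_time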

In [10]:
lt, st = compare_membership_times(10000, 9999)
print(f"List membership time: {lt:.6f} sec")
print(f"Set membership time: {st:.6f} sec")

List membership time: 0.247998 sec
Set membership time: 0.000216 sec


With `9999` as the target (the last element, and therefore the worst case for a list scan), the set lookup was more than 1,000 times faster in this run (0.247998 s vs 0.000216 s). This gap matters most for large collections or frequent lookups.
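The problem statement asks for several sizes, so here is one way to print a small comparison table. It is a sketch that redefines `compare_membership_times` with an extra `number` parameter (our addition, defaulting to the original 10,000) so the loop stays quick; the sizes chosen are arbitrary:

```python
import timeit

def compare_membership_times(n, target, number=10000):
    """Time `target in lst` and `target in s` for collections of size n."""
    lst = list(range(n))
    s = set(lst)
    list_time = timeit.timeit(lambda: target in lst, number=number)
    set_time = timeit.timeit(lambda: target in s, number=number)
    return list_time, set_time

# Target n - 1 is the last list element: the worst case for a list scan
print(f"{'n':>8} | {'list (s)':>10} | {'set (s)':>10}")
for n in [100, 1_000, 10_000]:
    lt, st = compare_membership_times(n, n - 1, number=1000)
    print(f"{n:>8} | {lt:>10.6f} | {st:>10.6f}")
```

You should see the list column grow roughly in proportion to `n` while the set column stays roughly flat.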

### <a id='toc1_6_3_'></a>[Problem 3: Set Operations (Union & Intersection)](#toc0_)



Implement functions for union and intersection using both list-based approaches and built-in set operations. Compare their performance using timing experiments.

In [29]:
def union_lists(lst1, lst2):
    result = lst1.copy()
    for item in lst2:
        if item not in result:
            result.append(item)
    return result

In [30]:
def intersection_lists(lst1, lst2):
    return [item for item in lst1 if item in lst2]

In [31]:
def union_set(lst1, lst2):
    return list(set(lst1) | set(lst2))

In [32]:
def intersection_set(lst1, lst2):
    return list(set(lst1) & set(lst2))

In [33]:
lst1 = list(range(5000))
lst2 = list(range(2500, 7500))

union_list_time = timeit.timeit(lambda: union_lists(lst1, lst2), number=100)
union_set_time = timeit.timeit(lambda: union_set(lst1, lst2), number=100)
intersection_list_time = timeit.timeit(lambda: intersection_lists(lst1, lst2), number=100)
intersection_set_time = timeit.timeit(lambda: intersection_set(lst1, lst2), number=100)

print("Union (list):", union_list_time)
print("Union (set):", union_set_time)
print("Intersection (list):", intersection_list_time)
print("Intersection (set):", intersection_set_time)

Union (list): 5.594400957925245
Union (set): 0.013508958043530583
Intersection (list): 3.3591173340100795
Intersection (set): 0.008633249904960394


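The output shows that the `set` operations were roughly 400 times faster than the `list`-based versions in this run, for both union and intersection. The list versions run `in` against a list inside a loop, and each of those checks is a linear scan, so they take roughly O(len(lst1) × len(lst2)) time overall; the set versions need only O(len(lst1) + len(lst2)) on average.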
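The set-based versions above return a list in arbitrary order. If you want to keep the order of the input lists and still avoid the quadratic cost, a middle ground is to keep the list output but do the lookups against a set. This is a sketch; `intersection_hybrid` and `union_hybrid` are hypothetical names, and `union_hybrid` mirrors the behaviour of `union_lists`:

```python
import timeit

def intersection_hybrid(lst1, lst2):
    """Keep lst1's order, but check membership against a set built from lst2."""
    lookup = set(lst2)                                # built once: O(len(lst2))
    return [item for item in lst1 if item in lookup]  # O(1) average per check

def union_hybrid(lst1, lst2):
    """Order-preserving union: all of lst1, then unseen items from lst2."""
    seen = set(lst1)
    result = list(lst1)
    for item in lst2:
        if item not in seen:  # O(1) average instead of a list scan
            result.append(item)
            seen.add(item)
    return result

lst1 = list(range(5000))
lst2 = list(range(2500, 7500))
print("Intersection (hybrid):", timeit.timeit(lambda: intersection_hybrid(lst1, lst2), number=100))
print("Union (hybrid):", timeit.timeit(lambda: union_hybrid(lst1, lst2), number=100))
```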

## <a id='toc1_7_'></a>[Exercises 📝](#toc0_)



Try solving these exercises to further cement your understanding of lists and sets:

1. **Exercise 1:** Implement a function that removes duplicates from a list by converting it to a set and back, then compare its performance with a manual duplicate removal method.
2. **Exercise 2:** Write a function that tests membership of a target element in both a list and a set for various collection sizes. Present your findings in a well-formatted table.
3. **Exercise 3:** Create functions for the union and intersection of two collections using list-based and set-based approaches. Benchmark the performance differences.
4. **Exercise 4:** Find all unique pairs in a list that sum to a target value. Compare how the solution differs when using lists versus sets.

## <a id='toc1_8_'></a>[Reflection Questions](#toc0_)



- What are the primary differences between lists and sets in Python?
- How does the underlying implementation (array vs. hash table) affect performance?
- In which scenarios would you prefer a list over a set, and vice versa?
- How do the built-in method complexities influence your choice of data structure?
- What trade-offs do you observe when converting between lists and sets (e.g., order preservation vs. speed)?

## <a id='toc1_9_'></a>[Additional Resources](#toc0_)

- [Python Lists Documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
- [Python Sets Documentation](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset)


## <a id='toc1_10_'></a>[Final Summary 🏁](#toc0_)



Today, we explored the fascinating world of Python lists and sets. We learned that:

- **Lists** are ordered, mutable, and allow duplicates, but membership testing is O(n).
- **Sets** are unordered, store only unique elements, and offer O(1) average-case membership testing.

We also compared the performance of various operations using timing experiments and discussed when to use each data structure. Keep experimenting with these examples, try out the exercises, and deepen your understanding by exploring the additional resources provided.

Happy coding and see you in the next lesson! 🚀

### <a id='toc1_10_1_'></a>[Bonus Content: Visualizing Performance with Matplotlib 📊](#toc0_)

For a more visual approach, you can plot the membership testing times for lists and sets across different collection sizes. Try running the following code to generate a plot:

In [None]:
import timeit
import matplotlib.pyplot as plt

sizes = [1000, 5000, 10000, 50000, 100000]
list_times = []
set_times = []

for n in sizes:
    lst = list(range(n))
    s = set(lst)
    # Search for the last element so the list scan grows with n
    # (a fixed target such as 999 is found early and hides the O(n) cost)
    target = n - 1
    lt = timeit.timeit(lambda: target in lst, number=1000)
    st = timeit.timeit(lambda: target in s, number=1000)
    list_times.append(lt)
    set_times.append(st)

plt.figure(figsize=(8, 5))
plt.plot(sizes, list_times, marker='o', label='List Membership')
plt.plot(sizes, set_times, marker='s', label='Set Membership')
plt.xlabel('Size of Collection')
plt.ylabel('Time (sec)')
plt.title('Membership Testing Performance')
plt.legend()
plt.grid(True)
plt.show()

In [15]:
%%html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Moving Average Algorithm Steps</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600&family=Poppins:wght@400;500;600&display=swap" rel="stylesheet">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
  <style>
    body {
      font-family: 'Inter', sans-serif;
      margin: 0;
      padding: 20px;
    }
    .container {
      max-width: 800px;
      margin: 0 auto;
    }
    h2 {
      font-family: 'Poppins', sans-serif;
      font-weight: 600;
      color: #2c3e50;
      margin-bottom: 30px;
      text-align: center;
      margin-left: auto;
      margin-right: auto;
    }
    .visualization-container {
      position: relative;
      width: 448px;
      height: 320px;
      margin: 40px auto;
    }
    .cell {
      width: 64px;
      height: 64px;
      background-color: #d8eeee;
      border-right: 1px solid white;
      display: flex;
      align-items: center;
      justify-content: center;
      font-size: 1.75rem; /* Increased font size */
      font-family: 'Poppins', sans-serif;
      font-weight: 500;
      color: #2c3e50;
    }
    .bottom-cell {
      width: 64px;
      height: 64px;
      background-color: #d8eeee;
      border-right: 1px solid white;
      display: flex;
      align-items: center;
      justify-content: center;
      font-size: 1.75rem; /* Increased font size */
      font-family: 'Poppins', sans-serif;
      font-weight: 500;
      color: #2c3e50;
      transition: all 0.3s ease;
    }
    .step-text {
      width: 448px;
      margin: 40px auto;
      padding: 15px 30px;
      font-size: 1.3rem;
      color: #2c3e50;
      text-align: center;
      background-color: #f8f9fa;
      border-radius: 8px;
      box-shadow: 0 2px 4px rgba(0,0,0,0.1);
      font-weight: 300;
      letter-spacing: 0.5px;
      font-family: 'Inter', sans-serif;
    }
    .nav-buttons {
      position: absolute;
      width: 100%;
      top: 50%;
      transform: translateY(-50%);
      display: flex;
      justify-content: space-between;
      padding: 0 20px;
    }
    .nav-button {
      padding: 15px;
      font-size: 24px;
      border: none;
      background-color: rgba(0, 0, 0, 0.1);
      color: #2c3e50;
      cursor: pointer;
      transition: all 0.3s ease;
      border-radius: 50%;
      width: 50px;
      height: 50px;
      display: flex;
      align-items: center;
      justify-content: center;
    }
    .nav-button:disabled {
      opacity: 0.25;
      cursor: not-allowed;
    }
    .nav-button:hover:not(:disabled) {
      background-color: rgba(0, 0, 0, 0.2);
      color: #34495e;
    }
    .moving-elements {
      position: absolute;
      transition: all 0.3s ease;
    }
    .fraction {
      font-family: 'Poppins', sans-serif;
      font-size: 1.5rem;
      color: #2c3e50;
      text-align: center;
    }
    .fraction-line {
      width: 7rem;
      border-top: 2px solid #2c3e50;
      margin: 4px 0;
    }
  </style>
</head>
<body>

<div class="container">
  <h2>Moving Average Algorithm Steps</h2>  
  <div style="position: relative;">
    <div class="visualization-container">
      <!-- Static top row -->
      <div style="display: flex;">
        <div class="cell">4</div>
        <div class="cell">3</div>
        <div class="cell">8</div>
        <div class="cell">1</div>
        <div class="cell">5</div>
        <div class="cell">6</div>
        <div class="cell">3</div>
      </div>

      <!-- Moving elements container -->
      <div id="movingElements" class="moving-elements">
        <!-- Bracket -->
          <img src="https://www.pngkit.com/png/full/100-1005823_open-thin-curly-bracket-png.png" 
             alt="Curly Bracket" 
             style="top: 70px; left: 306px; width: 172px; height: 32px;"/>
        
        <!-- Fraction -->
        <div class="fraction" style="position: absolute; top: 50px; transform: translateX(-50%);
                    display: flex; flex-direction: column; align-items: center;">
          <div id="fractionTop" style="font-weight: 500;">4 + 3 + 8</div>
          <div class="fraction-line"></div>
          <div id="fractionBottom" style="font-weight: 500;">3</div>
        </div>

        <!-- Arrow -->
        <svg width="30" height="80" style="position: absolute; top: 120px; transform: translateX(-50%);">
          <line x1="15" y1="0" x2="15" y2="70" stroke="#2c3e50" stroke-width="2" stroke-dasharray="4,4"></line>
          <polygon points="10,70 20,70 15,80" fill="#2c3e50"></polygon>
        </svg>
      </div>

      <!-- Static bottom row container -->
      <div style="position: absolute; top: 280px; left: 64px; display: flex; width: 320px;">
        <div id="result1" class="bottom-cell">5</div>
        <div id="result2" class="bottom-cell"></div>
        <div id="result3" class="bottom-cell"></div>
        <div id="result4" class="bottom-cell"></div>
        <div id="result5" class="bottom-cell"></div>
      </div>
    </div>

    <!-- Navigation buttons moved outside visualization container but inside relative container -->
    <div class="nav-buttons">
      <button id="prevBtn" class="nav-button">
        <span class="glyphicon glyphicon-chevron-left"></span>
      </button>
      <button id="nextBtn" class="nav-button">
        <span class="glyphicon glyphicon-chevron-right"></span>
      </button>
    </div>
  </div>

  <div class="step-text" id="stepText">
    Step 1: Calculate average of first three numbers (4, 3, 8)
  </div>
</div>

<script>
const steps = [
  {
    left: 96,  // Adjusted positions
    fraction: ['4 + 3 + 8', '3'],
    results: ['5', '', '', '', ''],
    text: 'Step 1: Calculate average of first three numbers (4, 3, 8)'
  },
  {
    left: 160,
    fraction: ['3 + 8 + 1', '3'],
    results: ['5', '4', '', '', ''],
    text: 'Step 2: Calculate average of numbers 2-4 (3, 8, 1)'
  },
  {
    left: 224,
    fraction: ['8 + 1 + 5', '3'],
    results: ['5', '4', '4.67', '', ''],
    text: 'Step 3: Calculate average of numbers 3-5 (8, 1, 5)'
  },
  {
    left: 288,
    fraction: ['1 + 5 + 6', '3'],
    results: ['5', '4', '4.67', '4', ''],
    text: 'Step 4: Calculate average of numbers 4-6 (1, 5, 6)'
  },
  {
    left: 352,
    fraction: ['5 + 6 + 3', '3'],
    results: ['5', '4', '4.67', '4', '4.67'],
    text: 'Step 5: Calculate average of last three numbers (5, 6, 3)'
  }
];

let currentStep = 0;

function updateStep(step) {
  const movingElements = document.getElementById('movingElements');
  movingElements.style.left = `${steps[step].left}px`;
  
  document.getElementById('fractionTop').textContent = steps[step].fraction[0];
  document.getElementById('fractionBottom').textContent = steps[step].fraction[1];
  
  steps[step].results.forEach((result, i) => {
    document.getElementById(`result${i + 1}`).textContent = result;
  });
  
  document.getElementById('stepText').textContent = steps[step].text;
  
  document.getElementById('prevBtn').disabled = step === 0;
  document.getElementById('nextBtn').disabled = step === steps.length - 1;
}

document.getElementById('prevBtn').addEventListener('click', () => {
  if (currentStep > 0) {
    currentStep--;
    updateStep(currentStep);
  }
});

document.getElementById('nextBtn').addEventListener('click', () => {
  if (currentStep < steps.length - 1) {
    currentStep++;
    updateStep(currentStep);
  }
});

// Initialize first step
updateStep(0);
</script>


</body>
</html>