# Arrays/Lists: Slicing, Sorting, Array Manipulation, Multi-dimensional Arrays

## Learning Objectives
By the end of this notebook, you will be able to:
- Create and manipulate Python lists
- Use list slicing techniques effectively
- Sort lists using various methods
- Work with multi-dimensional arrays (nested lists)
- Perform common array operations for data engineering tasks

## 1. Creating and Basic List Operations

In [3]:
from typing import List, Any

# Creating lists with type hints (for better code documentation)
numbers: List[int] = [1, 2, 3, 4, 5]
names: List[str] = ["Alice", "Bob", "Charlie"]
mixed_data: List[Any] = [1, "hello", 3.14, True]

# Creating lists without type hints (hints are optional and not enforced at runtime)
simple_numbers = [1, 2, 3, 4, 5]
simple_names = ["Alice", "Bob", "Charlie"]

print(f"Numbers: {numbers}")
print(f"Names: {names}")
print(f"Mixed data: {mixed_data}")

# List length and basic operations
print(f"\nLength of numbers: {len(numbers)}")
print(f"First element: {numbers[0]}")
print(f"Last element: {numbers[-1]}")

Numbers: [1, 2, 3, 4, 5]
Names: ['Alice', 'Bob', 'Charlie']
Mixed data: [1, 'hello', 3.14, True]

Length of numbers: 5
First element: 1
Last element: 5
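List literals are not the only way to create a list. The sketch below shows a few other common constructions; the variable names are illustrative and do not appear elsewhere in this notebook.

```python
# Build a list from any iterable with list()
zero_to_four = list(range(5))

# Build a list with a comprehension
squares = [n * n for n in range(1, 6)]

# Repeat a value to pre-fill a list of known length
placeholders = [0] * 3

# Split a string into a list of fields (common when parsing delimited text)
fields = "id,name,city".split(",")

print(f"From range: {zero_to_four}")
print(f"Squares: {squares}")
print(f"Placeholders: {placeholders}")
print(f"Fields: {fields}")
```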


## 2. List Slicing Techniques

In [4]:
# Sample data for slicing demonstrations
data_points = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(f"Original data: {data_points}")
print()

# Basic slicing: [start:end] (start is inclusive, end is exclusive)
first_three = data_points[:3]
print(f"First three elements: {first_three}")

last_three = data_points[-3:]
print(f"Last three elements: {last_three}")

middle_elements = data_points[2:7]
print(f"Middle elements (index 2-6): {middle_elements}")

# Step slicing: [start:end:step]
every_second = data_points[::2]
print(f"Every second element: {every_second}")

every_third_from_index_1 = data_points[1::3]
print(f"Every third from index 1: {every_third_from_index_1}")

# Reverse slicing
reversed_list = data_points[::-1]
print(f"Reversed list: {reversed_list}")

Original data: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

First three elements: [10, 20, 30]
Last three elements: [80, 90, 100]
Middle elements (index 2-6): [30, 40, 50, 60, 70]
Every second element: [10, 30, 50, 70, 90]
Every third from index 1: [20, 50, 80]
Reversed list: [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
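Three slicing behaviors are easy to miss: out-of-range bounds are clamped instead of raising `IndexError`, a slice is a new list (a shallow copy), and assigning to a slice replaces that range in place. A minimal sketch, using a shortened `data_points` list and illustrative variable names:

```python
data_points = [10, 20, 30, 40, 50]

# Out-of-range slice bounds are clamped; no IndexError is raised
beyond_end = data_points[3:100]
print(f"Slice past the end: {beyond_end}")

# A slice is a new list, so changing it does not affect the original
first_two = data_points[:2]
first_two[0] = -1
print(f"Modified slice: {first_two}")
print(f"Original unchanged: {data_points}")

# Slice assignment replaces a range in place (the lengths may differ)
patched = data_points.copy()
patched[1:3] = [21, 22, 23]
print(f"After slice assignment: {patched}")
```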


## 3. Sorting Operations

In [5]:
# Sample unsorted data
unsorted_numbers = [64, 34, 25, 12, 22, 11, 90]
unsorted_names = ["Charlie", "Alice", "Bob", "David"]

print(f"Original numbers: {unsorted_numbers}")
print(f"Original names: {unsorted_names}")
print()

# Using sorted() function (creates new list)
sorted_ascending = sorted(unsorted_numbers)
sorted_descending = sorted(unsorted_numbers, reverse=True)

print(f"Sorted ascending: {sorted_ascending}")
print(f"Sorted descending: {sorted_descending}")
print(f"Original unchanged: {unsorted_numbers}")
print()

# Using list.sort() method (modifies original list)
numbers_copy = unsorted_numbers.copy()
numbers_copy.sort()
print(f"After sort() method: {numbers_copy}")

# Sorting strings
sorted_names = sorted(unsorted_names)
print(f"Sorted names: {sorted_names}")

# Sorting by length
sorted_by_length = sorted(unsorted_names, key=len)
print(f"Sorted by length: {sorted_by_length}")

Original numbers: [64, 34, 25, 12, 22, 11, 90]
Original names: ['Charlie', 'Alice', 'Bob', 'David']

Sorted ascending: [11, 12, 22, 25, 34, 64, 90]
Sorted descending: [90, 64, 34, 25, 22, 12, 11]
Original unchanged: [64, 34, 25, 12, 22, 11, 90]

After sort() method: [11, 12, 22, 25, 34, 64, 90]
Sorted names: ['Alice', 'Bob', 'Charlie', 'David']
Sorted by length: ['Bob', 'Alice', 'David', 'Charlie']
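The `key` argument also handles multi-field sorting. The sketch below sorts hypothetical `(name, department, salary)` records, made up for illustration, by returning a tuple from the key function. It also relies on the fact that Python's sort is stable, so records with equal keys keep their original relative order.

```python
# Hypothetical (name, department, salary) records for illustration
employees = [
    ("Charlie", "Sales", 5000),
    ("Alice", "Engineering", 7000),
    ("Bob", "Sales", 6000),
    ("David", "Engineering", 7000),
]

# Sort by department ascending, then salary descending (negate the number)
by_dept_then_salary = sorted(employees, key=lambda rec: (rec[1], -rec[2]))
for record in by_dept_then_salary:
    print(record)

# Stable sort: Alice and David tie on salary and stay in their original order
by_salary = sorted(employees, key=lambda rec: rec[2], reverse=True)
names_by_salary = [rec[0] for rec in by_salary]
print(f"Names by salary (descending): {names_by_salary}")
```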


## 4. List Manipulation Operations

In [6]:
# Starting with an empty list
dynamic_list = []

# Adding elements with append() - adds single element to end
dynamic_list.append(10)
dynamic_list.append(20)
print(f"After append operations: {dynamic_list}")

# Adding several elements with extend(), then one at a specific index with insert()
dynamic_list.extend([30, 40])  # Append each element of the iterable
dynamic_list.insert(1, 15)  # Insert 15 before index 1
print(f"After extend and insert: {dynamic_list}")

# Removing elements with pop() - removes and returns element
last_element = dynamic_list.pop()  # Remove and return last element
print(f"Popped last element: {last_element}")
print(f"List after pop(): {dynamic_list}")

element_at_index = dynamic_list.pop(1)  # Remove and return element at index 1
print(f"Popped element at index 1: {element_at_index}")
print(f"List after pop(1): {dynamic_list}")

# Other removal methods
dynamic_list.append(30)  # Add a duplicate 30 for demonstration
dynamic_list.remove(30)  # Removes only the first occurrence of the value
print(f"After remove(30): {dynamic_list}")

# Practical example: Stack operations using append() and pop()
stack = []
print("\nStack operations (Last In, First Out):")

# Push operations (append)
stack.append("first")
stack.append("second")
stack.append("third")
print(f"Stack after pushes: {stack}")

# Pop operations
item = stack.pop()
print(f"Popped: {item}, Stack now: {stack}")

# Finding elements
search_list = [1, 2, 3, 2, 4, 2, 5]
print(f"\nSearch list: {search_list}")
print(f"Index of first '2': {search_list.index(2)}")
print(f"Count of '2': {search_list.count(2)}")
print(f"Is 3 in list? {3 in search_list}")

After append operations: [10, 20]
After extend and insert: [10, 15, 20, 30, 40]
Popped last element: 40
List after pop(): [10, 15, 20, 30]
Popped element at index 1: 15
List after pop(1): [10, 20, 30]
After remove(30): [10, 20, 30]

Stack operations (Last In, First Out):
Stack after pushes: ['first', 'second', 'third']
Popped: third, Stack now: ['first', 'second']

Search list: [1, 2, 3, 2, 4, 2, 5]
Index of first '2': 1
Count of '2': 3
Is 3 in list? True
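Several of these methods raise exceptions on bad input: `remove()` and `index()` raise `ValueError` when the value is absent, and `pop()` raises `IndexError` on an empty list. A minimal sketch of guarding against these cases; the `inventory` list and the `-1` sentinel are assumptions made for illustration.

```python
inventory = ["bolt", "nut", "washer"]

# remove() raises ValueError when the value is absent, so check membership first
if "screw" in inventory:
    inventory.remove("screw")

# Alternatively, catch the exception; index() behaves the same way
try:
    position = inventory.index("screw")
except ValueError:
    position = -1  # Sentinel chosen for this sketch
print(f"Position of 'screw': {position}")

# pop() on an empty list raises IndexError
empty = []
try:
    empty.pop()
    pop_failed = False
except IndexError:
    pop_failed = True
print(f"pop() on empty list failed: {pop_failed}")
```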


## 5. Multi-dimensional Arrays (Nested Lists)

In [7]:
# Creating a 2D array (matrix)
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

print("2D Matrix:")
for row in matrix:
    print(row)

# Accessing elements in 2D array
print(f"\nElement at row 1, column 2: {matrix[1][2]}")
print(f"First row: {matrix[0]}")
print(f"First column: {[row[0] for row in matrix]}")

# Data engineering example: Sales data by store and month
sales_data = [
    ["Store", "Jan", "Feb", "Mar"],  # Header row
    ["Store A", 1000, 1200, 1100],
    ["Store B", 800, 900, 950],
    ["Store C", 1500, 1600, 1400]
]

print("\nSales Data:")
for row in sales_data:
    print(row)

# Extract specific data
store_names = [row[0] for row in sales_data[1:]]  # Skip header
jan_sales = [row[1] for row in sales_data[1:]]  # January sales

print(f"\nStore names: {store_names}")
print(f"January sales: {jan_sales}")
print(f"Total January sales: {sum(jan_sales)}")

2D Matrix:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]

Element at row 1, column 2: 6
First row: [1, 2, 3]
First column: [1, 4, 7]

Sales Data:
['Store', 'Jan', 'Feb', 'Mar']
['Store A', 1000, 1200, 1100]
['Store B', 800, 900, 950]
['Store C', 1500, 1600, 1400]

Store names: ['Store A', 'Store B', 'Store C']
January sales: [1000, 800, 1500]
Total January sales: 3300
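Row and column totals are a common next step with tabular nested lists. The sketch below redefines the same `sales_data` table so it runs on its own, and uses `zip(*rows)` to transpose rows into columns. The names `store_totals` and `month_totals` are illustrative.

```python
sales_data = [
    ["Store", "Jan", "Feb", "Mar"],  # Header row
    ["Store A", 1000, 1200, 1100],
    ["Store B", 800, 900, 950],
    ["Store C", 1500, 1600, 1400],
]

# Separate the header from the data rows
header, *rows = sales_data

# Total per store: sum the numeric cells of each row
store_totals = {row[0]: sum(row[1:]) for row in rows}
print(f"Totals per store: {store_totals}")

# Total per month: zip(*rows) turns rows into columns (a transpose)
columns = list(zip(*rows))
month_totals = {month: sum(col) for month, col in zip(header[1:], columns[1:])}
print(f"Totals per month: {month_totals}")
```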


## 6. Practical Data Engineering Examples

In [8]:
# Example 1: Processing sensor readings
sensor_readings = [23.5, 24.1, 22.8, 25.0, 23.9, 24.3, 22.5]

print(f"Sensor readings: {sensor_readings}")
print(f"Average temperature: {sum(sensor_readings) / len(sensor_readings):.2f}")
print(f"Min temperature: {min(sensor_readings)}")
print(f"Max temperature: {max(sensor_readings)}")

# Find readings above average
average = sum(sensor_readings) / len(sensor_readings)
above_average = [temp for temp in sensor_readings if temp > average]
print(f"Readings above average: {above_average}")

# Example 2: Processing batch data with append/pop operations
batch_queue = []  # Start with empty queue

# Add batches as they arrive (append)
batch_queue.append(100)
batch_queue.append(250)
batch_queue.append(180)
print(f"Batch queue: {batch_queue}")

# Process batches in FIFO order by popping from the front
# (list.pop(0) is O(n); collections.deque.popleft() is O(1) for large queues)
processed_batch = batch_queue.pop(0)  # Remove first batch
print(f"Processed batch size: {processed_batch}")
print(f"Remaining batches: {batch_queue}")

# Example 3: Data validation with dynamic list building
data_with_errors = [1, 2, None, 4, "error", 6, 7]
clean_data = []

for item in data_with_errors:
    # isinstance() checks whether an object is an instance of the given type.
    # Note that bool is a subclass of int, so True and False would also pass.
    if isinstance(item, int):
        clean_data.append(item)  # Build clean list dynamically

print(f"\nOriginal data: {data_with_errors}")
print(f"Clean data: {clean_data}")
print(f"Removed {len(data_with_errors) - len(clean_data)} invalid items")

Sensor readings: [23.5, 24.1, 22.8, 25.0, 23.9, 24.3, 22.5]
Average temperature: 23.73
Min temperature: 22.5
Max temperature: 25.0
Readings above average: [24.1, 25.0, 23.9, 24.3]
Batch queue: [100, 250, 180]
Processed batch size: 100
Remaining batches: [250, 180]

Original data: [1, 2, None, 4, 'error', 6, 7]
Clean data: [1, 2, 4, 6, 7]
Removed 2 invalid items
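Two refinements to the examples above may help on larger or messier data. This is a sketch, not a prescribed approach: `collections.deque` gives constant-time removal from the front of a queue, and an extra `isinstance` check keeps booleans out of integer data. The `raw_values` list is made up for illustration.

```python
from collections import deque

# FIFO queue: popleft() is O(1), whereas list.pop(0) shifts every remaining element
batch_queue = deque([100, 250, 180])
processed_batch = batch_queue.popleft()
print(f"Processed batch size: {processed_batch}")
print(f"Remaining batches: {list(batch_queue)}")

# Stricter validation: bool is a subclass of int, so exclude it explicitly
raw_values = [1, True, None, 4, "error", False, 7]
clean_values = [
    value for value in raw_values
    if isinstance(value, int) and not isinstance(value, bool)
]
print(f"Clean values: {clean_values}")
```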


## Practice Exercises

Complete the following exercises to reinforce your understanding:

In [None]:
# Exercise 1: Create a function to find the second largest number in a list
def find_second_largest(numbers):
    """Find the second largest number in a list.

    Args:
        numbers: List of integers

    Returns:
        Second largest number
    """

    # TODO: Implement this function
    # Hint: You can sort the list or use max() with removal
    pass

# Test your function
test_numbers = [64, 34, 25, 12, 22, 11, 90]
# Expected result: 64
# result = find_second_largest(test_numbers)
# print(f"Second largest: {result}")

In [None]:
# Exercise 2: Extract a specific column from a 2D matrix
def extract_column(matrix, column_index):
    """Extract a specific column from a 2D matrix.

    Args:
        matrix: 2D list
        column_index: Index of column to extract

    Returns:
        List containing the column values
    """
    
    # TODO: Implement this function
    # Hint: Use list comprehension or a loop to get matrix[row][column_index]
    pass

# Test your function
test_matrix = [
    ["Name", "Age", "City"],
    ["Alice", 25, "New York"],
    ["Bob", 30, "London"],
    ["Charlie", 35, "Tokyo"]
]
# Expected result for column 1: ["Age", 25, 30, 35]
# ages = extract_column(test_matrix, 1)
# print(f"Age column: {ages}")

In [None]:
# Exercise 3: Process sales data to find top performers
def find_top_performers(sales_data, top_n=3):
    """Find top N performers based on total sales.

    Args:
        sales_data: List where each inner list is [name, q1_sales, q2_sales, q3_sales, q4_sales]
        top_n: Number of top performers to return

    Returns:
        List of top performers with their total sales
    """

    # TODO: Implement this function
    # Hint: Calculate total sales for each person, then sort by total
    # You can use append() to build the result list
    pass

# Test data
quarterly_sales = [
    ["Alice", 1000, 1200, 1100, 1300],
    ["Bob", 800, 900, 950, 1000],
    ["Charlie", 1500, 1600, 1400, 1700],
    ["David", 1200, 1100, 1300, 1400],
    ["Eve", 900, 1000, 1050, 1100]
]

# Expected: Charlie should be #1 with highest total
# top_performers = find_top_performers(quarterly_sales, 3)
# print(f"Top 3 performers: {top_performers}")