# Arrays/Lists: Slicing, Sorting, Array Manipulation, Multi-dimensional Arrays

## Learning Objectives
By the end of this notebook, you will be able to:
- Create and manipulate Python lists
- Use list slicing techniques effectively
- Sort lists using various methods
- Work with multi-dimensional arrays (nested lists)
- Perform common array operations for data engineering tasks

## 1. Creating and Basic List Operations

In [None]:
from typing import List, Any

# Creating lists with type hints
numbers: List[int] = [1, 2, 3, 4, 5]
names: List[str] = ["Alice", "Bob", "Charlie"]
mixed_data: List[Any] = [1, "hello", 3.14, True]

print(f"Numbers: {numbers}")
print(f"Names: {names}")
print(f"Mixed data: {mixed_data}")

# List length and basic operations
print(f"\nLength of numbers: {len(numbers)}")
print(f"First element: {numbers[0]}")
print(f"Last element: {numbers[-1]}")
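
Lists are also often built from other iterables rather than typed out as literals. The following is a minimal sketch of three common patterns; the names `evens`, `squares`, and `zeros` are illustrative and not part of the notebook above.

```python
from typing import List

# Build a list from a range: even numbers from 0 up to (not including) 10
evens: List[int] = list(range(0, 10, 2))

# Build a list with a comprehension: squares of 1 through 5
squares: List[int] = [n * n for n in range(1, 6)]

# Build a fixed-size list of repeated values (safe for immutable items such as ints)
zeros: List[int] = [0] * 4

print(f"Evens: {evens}")
print(f"Squares: {squares}")
print(f"Zeros: {zeros}")
```

Note that the type hints are not enforced at runtime; they document intent and help static checkers.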

## 2. List Slicing Techniques

In [None]:
# Sample data for slicing demonstrations
data_points: List[int] = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(f"Original data: {data_points}")
print()

# Basic slicing: [start:end]
first_three: List[int] = data_points[:3]
print(f"First three elements: {first_three}")

last_three: List[int] = data_points[-3:]
print(f"Last three elements: {last_three}")

middle_elements: List[int] = data_points[2:7]
print(f"Middle elements (index 2-6): {middle_elements}")

# Step slicing: [start:end:step]
every_second: List[int] = data_points[::2]
print(f"Every second element: {every_second}")

every_third_from_index_1: List[int] = data_points[1::3]
print(f"Every third from index 1: {every_third_from_index_1}")

# Reverse slicing
reversed_list: List[int] = data_points[::-1]
print(f"Reversed list: {reversed_list}")
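
Slicing also gives a compact way to split a list into fixed-size batches, a common step before bulk processing. The sketch below assumes a hypothetical helper named `chunk_list` (not part of the notebook). It relies on the fact that a slice running past the end of a list is truncated rather than raising an error.

```python
from typing import List


def chunk_list(items: List[int], size: int) -> List[List[int]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Each slice starts at a multiple of `size`; the last chunk may be shorter
    return [items[start:start + size] for start in range(0, len(items), size)]


data_points: List[int] = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
batches: List[List[int]] = chunk_list(data_points, 4)
print(f"Batches of 4: {batches}")
```

Each slice creates a new list, so changing a batch does not alter `data_points`.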

## 3. Sorting Operations

In [None]:
# Sample unsorted data
unsorted_numbers: List[int] = [64, 34, 25, 12, 22, 11, 90]
unsorted_names: List[str] = ["Charlie", "Alice", "Bob", "David"]

print(f"Original numbers: {unsorted_numbers}")
print(f"Original names: {unsorted_names}")
print()

# Using sorted() function (creates new list)
sorted_ascending: List[int] = sorted(unsorted_numbers)
sorted_descending: List[int] = sorted(unsorted_numbers, reverse=True)

print(f"Sorted ascending: {sorted_ascending}")
print(f"Sorted descending: {sorted_descending}")
print(f"Original unchanged: {unsorted_numbers}")
print()

# Using list.sort() method (modifies original list)
numbers_copy: List[int] = unsorted_numbers.copy()
numbers_copy.sort()
print(f"After sort() method: {numbers_copy}")

# Sorting strings
sorted_names: List[str] = sorted(unsorted_names)
print(f"Sorted names: {sorted_names}")

# Sorting by length
sorted_by_length: List[str] = sorted(unsorted_names, key=len)
print(f"Sorted by length: {sorted_by_length}")
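
The `key` argument accepts any function, including a `lambda`, which helps when sorting records rather than plain values. The sketch below uses made-up `(name, score)` records and a made-up `mixed_case` list, both purely illustrative.

```python
from typing import List, Tuple

# Hypothetical (name, score) records for illustration
records: List[Tuple[str, int]] = [
    ("Charlie", 82),
    ("Alice", 95),
    ("Bob", 82),
    ("David", 70),
]

# Sort by score, highest first; ties are broken alphabetically by name
by_score: List[Tuple[str, int]] = sorted(records, key=lambda r: (-r[1], r[0]))
print(f"By score (desc), then name: {by_score}")

# Case-insensitive sort of strings using str.lower as the key
mixed_case: List[str] = ["bob", "Alice", "charlie", "David"]
case_insensitive: List[str] = sorted(mixed_case, key=str.lower)
print(f"Case-insensitive: {case_insensitive}")
```

Returning a tuple from the key function is one way to express "sort by this, then by that" in a single pass.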

## 4. List Manipulation Operations

In [None]:
# Starting with an empty list
dynamic_list: List[int] = []

# Adding elements
dynamic_list.append(10)  # Add single element
dynamic_list.extend([20, 30, 40])  # Add multiple elements
dynamic_list.insert(1, 15)  # Insert at specific position

print(f"After additions: {dynamic_list}")

# Removing elements
dynamic_list.remove(30)  # Remove first occurrence of value
popped_element: int = dynamic_list.pop()  # Remove and return last element
popped_at_index: int = dynamic_list.pop(1)  # Remove and return element at index

print(f"After removals: {dynamic_list}")
print(f"Popped element: {popped_element}")
print(f"Popped at index 1: {popped_at_index}")

# Finding elements
search_list: List[int] = [1, 2, 3, 2, 4, 2, 5]
print(f"\nSearch list: {search_list}")
print(f"Index of first '2': {search_list.index(2)}")
print(f"Count of '2': {search_list.count(2)}")
print(f"Is 3 in list? {3 in search_list}")
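
Elements can also be modified in place, which the cell above does not show. A minimal sketch, using an illustrative list named `values`:

```python
from typing import List

values: List[int] = [10, 15, 20, 30, 40]

# Replace a single element by index
values[1] = 16

# Replace a range of elements with slice assignment (lengths may differ)
values[2:4] = [21, 31, 35]

# Delete an element by index without returning it
del values[0]

print(f"After modifications: {values}")
```

Unlike `pop()`, the `del` statement does not return the removed element.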

## 5. Multi-dimensional Arrays (Nested Lists)

In [None]:
from typing import Any, List

# Creating a 2D array (matrix)
matrix: List[List[int]] = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

print("2D Matrix:")
for row in matrix:
    print(row)

# Accessing elements in 2D array
print(f"\nElement at row index 1, column index 2: {matrix[1][2]}")
print(f"First row: {matrix[0]}")
print(f"First column: {[row[0] for row in matrix]}")

# Data engineering example: Sales data by store and month
sales_data: List[List[Any]] = [
    ["Store", "Jan", "Feb", "Mar"],  # Header row
    ["Store A", 1000, 1200, 1100],
    ["Store B", 800, 900, 950],
    ["Store C", 1500, 1600, 1400]
]

print("\nSales Data:")
for row in sales_data:
    print(row)

# Extract specific data
store_names: List[str] = [row[0] for row in sales_data[1:]]  # Skip header
jan_sales: List[int] = [row[1] for row in sales_data[1:]]  # January sales

print(f"\nStore names: {store_names}")
print(f"January sales: {jan_sales}")
print(f"Total January sales: {sum(jan_sales)}")
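
The same row and column access patterns extend to totals across both dimensions. The sketch below repeats the sales table so it runs on its own; `store_totals`, `columns`, and `month_totals` are illustrative names. It uses `zip(*rows)` to transpose rows into columns.

```python
from typing import Any, List

sales_data: List[List[Any]] = [
    ["Store", "Jan", "Feb", "Mar"],  # Header row
    ["Store A", 1000, 1200, 1100],
    ["Store B", 800, 900, 950],
    ["Store C", 1500, 1600, 1400],
]

header, rows = sales_data[0], sales_data[1:]

# Total per store: sum the numeric part of each row (skip the name column)
store_totals: List[List[Any]] = [[row[0], sum(row[1:])] for row in rows]

# Total per month: zip(*rows) turns rows into columns (as tuples)
columns = list(zip(*rows))
month_totals: List[int] = [sum(col) for col in columns[1:]]  # skip the name column

print(f"Store totals: {store_totals}")
print(f"Months: {header[1:]}")
print(f"Month totals: {month_totals}")
```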

## 6. Practical Data Engineering Examples

In [None]:
# Example 1: Processing sensor readings
sensor_readings: List[float] = [23.5, 24.1, 22.8, 25.0, 23.9, 24.3, 22.5]

print(f"Sensor readings: {sensor_readings}")

# Compute the average once and reuse it below
average: float = sum(sensor_readings) / len(sensor_readings)
print(f"Average temperature: {average:.2f}")
print(f"Min temperature: {min(sensor_readings)}")
print(f"Max temperature: {max(sensor_readings)}")

# Find readings above average
above_average: List[float] = [temp for temp in sensor_readings if temp > average]
print(f"Readings above average: {above_average}")

# Example 2: Processing batch data
batch_sizes: List[int] = [100, 250, 180, 300, 220, 150]
batch_sizes.sort(reverse=True)  # Sort largest first

print(f"\nBatch sizes (largest first): {batch_sizes}")
print(f"Top 3 batches: {batch_sizes[:3]}")
print(f"Total records: {sum(batch_sizes)}")

# Example 3: Data validation
data_with_errors: List[Any] = [1, 2, None, 4, "error", 6, 7]
clean_data: List[int] = []

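for item in data_with_errors:
    # Keep only real integers; bool is a subclass of int, so exclude it explicitly
    if isinstance(item, int) and not isinstance(item, bool):
        clean_data.append(item)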

print(f"\nOriginal data: {data_with_errors}")
print(f"Clean data: {clean_data}")
print(f"Removed {len(data_with_errors) - len(clean_data)} invalid items")
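
Slicing and comprehensions can be combined to smooth noisy readings. The following is a sketch of a simple moving average over the sensor data; the window size of 3 and the names `window` and `moving_averages` are assumptions chosen for illustration.

```python
from typing import List

sensor_readings: List[float] = [23.5, 24.1, 22.8, 25.0, 23.9, 24.3, 22.5]
window: int = 3  # hypothetical window size

# Average each run of `window` consecutive readings, rounded to 2 decimals
moving_averages: List[float] = [
    round(sum(sensor_readings[i:i + window]) / window, 2)
    for i in range(len(sensor_readings) - window + 1)
]

print(f"Moving averages (window={window}): {moving_averages}")
```

The result has `len(sensor_readings) - window + 1` entries, because only full windows are averaged.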

## Practice Exercises

Complete the following exercises to reinforce your understanding:

In [None]:
# Exercise 1: Create a function to find the second largest number in a list
def find_second_largest(numbers: List[int]) -> int:
    """
    Find the second largest number in a list.
    
    Args:
        numbers: List of integers
    
    Returns:
        Second largest number
    """
    # TODO: Implement this function
    pass

# Test your function
test_numbers: List[int] = [64, 34, 25, 12, 22, 11, 90]
# Expected result: 64
# result = find_second_largest(test_numbers)
# print(f"Second largest: {result}")

In [None]:
# Exercise 2: Extract a specific column from a 2D matrix
def extract_column(matrix: List[List[Any]], column_index: int) -> List[Any]:
    """
    Extract a specific column from a 2D matrix.
    
    Args:
        matrix: 2D list
        column_index: Index of column to extract
    
    Returns:
        List containing the column values
    """
    # TODO: Implement this function
    pass

# Test your function
test_matrix: List[List[Any]] = [
    ["Name", "Age", "City"],
    ["Alice", 25, "New York"],
    ["Bob", 30, "London"],
    ["Charlie", 35, "Tokyo"]
]
# Expected result for column 1: ["Age", 25, 30, 35]
# ages = extract_column(test_matrix, 1)
# print(f"Age column: {ages}")

In [None]:
# Exercise 3: Process sales data to find top performers
def find_top_performers(sales_data: List[List[Any]], top_n: int = 3) -> List[List[Any]]:
    """
    Find top N performers based on total sales.
    
    Args:
        sales_data: List where each inner list is [name, q1_sales, q2_sales, q3_sales, q4_sales]
        top_n: Number of top performers to return
    
    Returns:
        List of the top N performers with their total sales, highest total first
    """
    # TODO: Implement this function
    # Hint: Calculate total sales for each person, then sort by total
    pass

# Test data
quarterly_sales: List[List[Any]] = [
    ["Alice", 1000, 1200, 1100, 1300],
    ["Bob", 800, 900, 950, 1000],
    ["Charlie", 1500, 1600, 1400, 1700],
    ["David", 1200, 1100, 1300, 1400],
    ["Eve", 900, 1000, 1050, 1100]
]

# Expected: Charlie should be #1 with highest total
# top_performers = find_top_performers(quarterly_sales, 3)
# print(f"Top 3 performers: {top_performers}")

## Summary

In this notebook, you learned:

1. **List Creation**: How to create lists with type hints for different data types
2. **Slicing**: Various slicing techniques for extracting data subsets
3. **Sorting**: Different methods to sort lists and custom sorting criteria
4. **Manipulation**: Adding, removing, and modifying list elements
5. **Multi-dimensional Arrays**: Working with nested lists for structured data
6. **Practical Applications**: Real-world data engineering scenarios

These skills are fundamental for data engineering tasks like:
- Processing batch data
- Cleaning and validating datasets
- Extracting specific data subsets
- Organizing data for analysis

**Next Steps**: Practice the exercises above and move on to the Dictionary notebook to learn about key-value data structures.