# Metadata

**L1 Taxonomy** - Computing Paradigms

**L2 Taxonomy** - Functional Programming

**Subtopic** - Higher-Order Functions and their Applications

**Use Case** - Create a Python module that uses higher-order functions to process and transform data from a CSV file. The module should include functions for filtering rows based on certain criteria, mapping values to transform the data, and reducing the data to aggregate results.

**Programming Language** - Python

**Target Model** - GPT-4o

# Setup

```requirements.txt
```


# Prompt
## Problem Description

You are tasked with creating a Python module that implements a flexible data processing pipeline using functional programming principles. The pipeline will process data from a CSV file by applying a sequence of operations such as filtering, mapping, and reducing.

The core of the solution will be a main function that takes a file path and a list of processing steps. It must use higher-order functions to dynamically apply these steps, demonstrating a clean separation between the data and the logic that transforms it.
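"Higher-order" works in two directions in this design. The pipeline takes functions as arguments, and a function can also return another function, which is a convenient way to build `filter_func` values. The sketch below is a minimal illustration under those assumptions. `make_threshold_predicate` and `apply_to_all` are hypothetical names that are not part of the required API.

```python
from typing import Callable, List


def make_threshold_predicate(threshold: int) -> Callable[[str], bool]:
    """Hypothetical helper: returns a predicate that closes over `threshold`."""
    def predicate(value: str) -> bool:
        # CSV values arrive as strings, so convert before comparing
        return int(value) > threshold
    return predicate


def apply_to_all(func: Callable[[str], bool], values: List[str]) -> List[bool]:
    """Hypothetical helper: takes a function as an argument and applies it to each value."""
    return list(map(func, values))


is_large_order = make_threshold_predicate(20)
flags = apply_to_all(is_large_order, ['10', '50', '30', '100'])
print(flags)  # [False, True, True, True]
```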

## Input Format

The main function will accept two arguments:
1.  file_path: A string representing the path to the input CSV file. The CSV will have a header row.
2.  operations: A list of tuples, where each tuple defines a processing step.

The operation tuples will have the following structure:
-   Filter: ('filter', column_name, filter_func)
-   Map: ('map', column_name, map_func)
-   Reduce: ('reduce', reduce_func, initial_value)
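All three operation shapes are 3-tuples whose first element names the step. That makes a shape check straightforward, and the check also produces the ValueError the edge cases call for. A minimal sketch follows. `validate_operation` and `VALID_OPERATIONS` are hypothetical names used only for illustration.

```python
from typing import Any, Tuple

VALID_OPERATIONS = {'filter', 'map', 'reduce'}


def validate_operation(operation: Any) -> Tuple:
    """Hypothetical helper: checks an operation tuple's shape.

    Raises ValueError for a non-tuple, a tuple of the wrong length,
    or an unknown operation name; otherwise returns the tuple unchanged.
    """
    if not isinstance(operation, tuple) or len(operation) != 3:
        raise ValueError(f"Operation must be a 3-tuple, got {operation!r}")
    if operation[0] not in VALID_OPERATIONS:
        raise ValueError(f"Unknown operation type: {operation[0]!r}")
    return operation


ok = validate_operation(('filter', 'category', lambda x: x == 'Electronics'))
print(ok[0])  # filter
```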

## Output Format

The function should return the final result after all operations have been applied.
-   If the last operation is a filter or map, or if the operations list is empty, the output will be a list of dictionaries.
-   If the last operation is a reduce, the output will be the single aggregated value.

## Example

Input CSV (sales.csv):
```
product,category,quantity,price
Laptop,Electronics,10,1200
Mouse,Electronics,50,25
Keyboard,Electronics,30,75
T-Shirt,Apparel,100,15
```
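`csv.DictReader` uses the header row as the dictionary keys and returns every field as a string. For this reason, numeric comparisons and arithmetic inside `filter_func` and `map_func` need explicit conversion, which is why the example operations call `int()`. The quick check below reads the sample data from an in-memory buffer instead of a real `sales.csv` file.

```python
import csv
import io

# The sample sales.csv contents, held in memory so no file is needed
SALES_CSV = """product,category,quantity,price
Laptop,Electronics,10,1200
Mouse,Electronics,50,25
Keyboard,Electronics,30,75
T-Shirt,Apparel,100,15
"""

# Each data row becomes a dict keyed by the header; all values are strings
rows = list(csv.DictReader(io.StringIO(SALES_CSV)))
print(rows[0])
# On Python 3.8+: {'product': 'Laptop', 'category': 'Electronics', 'quantity': '10', 'price': '1200'}
```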

Input Operations:
```python
operations = [
    ('filter', 'category', lambda x: x == 'Electronics'),
    ('map', 'total_price', lambda row: int(row['quantity']) * int(row['price'])),
    ('reduce', lambda acc, row: acc + row['total_price'], 0)
]
```

Example Execution: `process_csv_pipeline('sales.csv', operations)`

Explanation:
1.  Filter: Keeps only the rows where the category is 'Electronics'.
2.  Map: Creates a new field total_price for each of these rows.
3.  Reduce: Sums the total_price of the filtered and mapped rows.
    -   Laptop: 10 * 1200 = 12000
    -   Mouse: 50 * 25 = 1250
    -   Keyboard: 30 * 75 = 2250
    -   Total = 12000 + 1250 + 2250 = 15500

Final Output: 15500
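The three steps above map directly onto Python's built-in `filter()` and `map()` and onto `functools.reduce()`. The sketch below reproduces the worked arithmetic without the pipeline function. The rows are hard-coded to stand in for the parsed `sales.csv`.

```python
from functools import reduce

# The rows of sales.csv, already parsed into dicts of strings
rows = [
    {'product': 'Laptop', 'category': 'Electronics', 'quantity': '10', 'price': '1200'},
    {'product': 'Mouse', 'category': 'Electronics', 'quantity': '50', 'price': '25'},
    {'product': 'Keyboard', 'category': 'Electronics', 'quantity': '30', 'price': '75'},
    {'product': 'T-Shirt', 'category': 'Apparel', 'quantity': '100', 'price': '15'},
]

# Step 1: keep only rows whose category is Electronics
electronics = filter(lambda row: row['category'] == 'Electronics', rows)
# Step 2: produce new dicts with an added total_price key
priced = map(
    lambda row: {**row, 'total_price': int(row['quantity']) * int(row['price'])},
    electronics,
)
# Step 3: fold the totals into one number, starting from 0
total = reduce(lambda acc, row: acc + row['total_price'], priced, 0)
print(total)  # 15500
```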


# Requirements
### Explicit Requirements

1.  Main Function Signature: Implement a main function with the following signature, where Any is typing.Any (the lowercase builtin any is a function, not a type):
    def process_csv_pipeline(file_path: str, operations: list) -> Any:

2.  CSV Reading: The function must read the data from the CSV file specified by file_path. The first row should be treated as the header, and each subsequent row should be loaded as a dictionary.

3.  Higher-Order Function for Pipeline: The main function must iterate through the operations list and apply each transformation sequentially. The output of one step serves as the input to the next.

4.  Filter Operation: When it encounters a ('filter', ...) tuple, it must call the provided filter_func on the value of column_name in each row, keeping only the rows for which the function returns True.

5.  Map Operation: When it encounters a ('map', ...) tuple, it must apply the map_func to each row and add the result as a new key-value pair in that row's dictionary, with the key being column_name.

6.  Reduce Operation: When it encounters a ('reduce', ...) tuple, it must use the reduce_func and initial_value to aggregate the entire dataset into a single value. This should be the final operation in the pipeline if present.
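Requirement 3 can itself be read as a fold. `functools.reduce` runs over the operations list and threads the data through each step in turn. The sketch below follows that reading under a few assumptions. `apply_step` and `run_pipeline` are hypothetical names. The per-row error skipping described under Edge Cases is deliberately left out so the dispatch stays visible. A reduce step is assumed to be last, because any later step would receive a scalar instead of a list of rows.

```python
from functools import reduce
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def apply_step(data: Any, operation: Tuple) -> Any:
    """Hypothetical helper: applies one operation tuple to the current data."""
    kind = operation[0]
    if kind == 'filter':
        _, column, predicate = operation
        # The predicate receives only the named column's value
        return list(filter(lambda row: predicate(row[column]), data))
    if kind == 'map':
        _, column, func = operation
        # Build new dicts so the input rows are never mutated
        return list(map(lambda row: {**row, column: func(row)}, data))
    if kind == 'reduce':
        _, func, initial = operation
        return reduce(func, data, initial)
    raise ValueError(f"Unknown operation type: {kind!r}")


def run_pipeline(rows: List[Row], operations: Iterable[Tuple]) -> Any:
    """Hypothetical helper: each step's output becomes the next step's input."""
    return reduce(apply_step, operations, rows)


rows = [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
result = run_pipeline(rows, [
    ('filter', 'age', lambda x: int(x) > 26),
    ('map', 'age_int', lambda row: int(row['age'])),
    ('reduce', lambda acc, row: acc + row['age_int'], 0),
])
print(result)  # 30
```

With an empty operations list, `reduce` returns the initializer unchanged. The rows therefore come back as-is, which matches the "no operations" behavior.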

### Implicit Requirements

-   The solution must be implemented using a functional style. Avoid using imperative loops where standard functional tools like map(), filter(), and functools.reduce() would be more appropriate.
-   The functions provided in the operations list (e.g., lambdas) must be treated as first-class citizens.
-   Lazy evaluation (for example, via iterators or generators) is encouraged where possible but is not a strict requirement.
-   The original data from the file should be treated as immutable. Each map or filter operation should produce a new collection of data.
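Python's built-in `map()` and `filter()` already return lazy iterators. Building each output row with `{**row, ...}` creates a new dictionary and leaves the input dictionaries untouched. The sketch below shows both properties. `lazy_map` and `to_int_salary` are hypothetical names. A lazily built pipeline still needs to be materialized, for example with `list()`, before the pipeline returns a list of dictionaries. Rows read from a file must also be consumed before the file is closed.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def lazy_map(rows: Iterable[Row], column: str, func: Callable[[Row], Any]) -> Iterator[Row]:
    """Hypothetical helper: lazily yields new rows with `column` added."""
    # map() returns an iterator, so func runs only when a row is requested
    return map(lambda row: {**row, column: func(row)}, rows)


original = [{'name': 'Alice', 'salary': '70000'}, {'name': 'Bob', 'salary': '50000'}]
calls = []  # records which rows the map function has actually processed


def to_int_salary(row: Row) -> int:
    calls.append(row['name'])
    return int(row['salary'])


pending = lazy_map(original, 'salary_int', to_int_salary)
print(calls)           # [] : nothing has been evaluated yet
first = next(pending)  # evaluates exactly one row
print(calls)           # ['Alice']
print(first)           # {'name': 'Alice', 'salary': '70000', 'salary_int': 70000}
print(original[0])     # {'name': 'Alice', 'salary': '70000'} : input row unchanged
```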

### Edge Cases

Your implementation must correctly handle the following situations:

-   The input file_path does not point to an existing file. A FileNotFoundError should be raised.
-   The input CSV file is empty or contains only a header. The pipeline should still run and return an empty list, or the initial value if the pipeline ends with a reduce operation.
-   An operation tuple in the operations list is malformed (e.g., wrong number of elements or invalid operation name). A ValueError should be raised.
-   A function (filter or map) applied to a row causes an exception (e.g., a KeyError if a column is missing, or a TypeError on an unexpected data type). The row that caused the error should be skipped, and the processing should continue with the next row.
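Two of these edge cases can be shown without the full pipeline. In the first, a row whose map function raises is replaced by a `None` marker and then dropped. In the second, a header-only file and a completely empty file both parse to zero rows, so a trailing reduce returns its initial value. The sketch below is minimal. `safe_map_row` is a hypothetical helper, and it catches `Exception` broadly to match the "causes an exception" wording.

```python
import csv
import io
from functools import reduce
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]


def safe_map_row(row: Row, column: str, func: Callable[[Row], Any]) -> Optional[Row]:
    """Hypothetical helper: new row with `column` added, or None if func raises."""
    try:
        return {**row, column: func(row)}
    except Exception:  # e.g. KeyError for a missing column, ValueError for bad data
        return None    # None marks the row as skipped


rows = [{'age': '30'}, {'age': 'invalid'}, {}]
candidates = (safe_map_row(row, 'age_int', lambda r: int(r['age'])) for row in rows)
mapped = [row for row in candidates if row is not None]
print(mapped)  # [{'age': '30', 'age_int': 30}]

# Header-only and completely empty inputs both yield no data rows
header_only = list(csv.DictReader(io.StringIO("product,category,quantity,price\n")))
empty = list(csv.DictReader(io.StringIO("")))
print(header_only, empty)                                # [] []
print(reduce(lambda acc, row: acc + 1, header_only, 0))  # 0
```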

### Constraints

-   Only the Python standard library may be used; modules such as csv and functools are permitted.
-   No input validation is required for the file_path or operations parameters themselves, but the function must handle the runtime errors specified in the edge cases.


In [None]:
# code

import csv
from functools import reduce
from typing import List, Dict, Any, Callable, Tuple, Optional


def _safe_apply_filter(row: Dict[str, Any], column: str, func: Callable) -> bool:
    """
    Safely applies a filter function to a single row.
    Returns False if the column doesn't exist or if the function raises an exception.
    """
    if column not in row:
        return False
    try:
        # Coerce to bool so the predicate's truthiness decides whether the row is kept
        return bool(func(row[column]))
    except Exception:
        # Skip the row if any error occurs during filter function execution
        return False


def _safe_apply_map(row: Dict[str, Any], new_col_name: str, func: Callable) -> Optional[Dict[str, Any]]:
    """
    Safely applies a map function to a single row.
    Returns a new row dict (the input row is not mutated), or None if the function raises.
    """
    try:
        new_value = func(row)
        return {**row, new_col_name: new_value}
    except Exception:
        # Skip the row by returning None if any error occurs
        return None


def process_csv_pipeline(file_path: str, operations: List[Tuple]) -> Any:
    """
    Processes data from a CSV file using a functional pipeline of operations.

    Args:
        file_path: The path to the input CSV file.
        operations: A list of tuples, each defining a processing step
                    ('filter', 'map', or 'reduce').

    Returns:
        The final processed data, which can be a list of dictionaries or a single
        aggregated value if a 'reduce' operation is the final step.

    Raises:
        FileNotFoundError: If the specified file_path does not exist.
        ValueError: If an operation tuple is malformed or an unknown operation
                    type is provided.
    """
    try:
        with open(file_path, mode='r', newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            data: List[Dict[str, Any]] = list(reader)
    except FileNotFoundError:
        raise FileNotFoundError(f"The file {file_path} does not exist.")

    # Process each operation in the pipeline sequentially
    for operation in operations:
        if not isinstance(operation, tuple) or not operation:
            raise ValueError(
                "Invalid operation format: Each operation must be a non-empty tuple.")

        op_type = operation[0]

        if op_type == 'filter':
            if len(operation) != 3:
                raise ValueError(
                    "Filter operation requires 3 elements: ('filter', column_name, filter_func).")
            _, column_name, filter_func = operation
            data = list(filter(
                lambda row: _safe_apply_filter(row, column_name, filter_func), data))

        elif op_type == 'map':
            if len(operation) != 3:
                raise ValueError(
                    "Map operation requires 3 elements: ('map', new_column_name, map_func).")
            _, new_column_name, map_func = operation

            # Apply map_func to every row with map(); rows that raise come back
            # as None and are dropped, so processing continues with the next row
            mapped_rows = map(
                lambda row: _safe_apply_map(row, new_column_name, map_func), data)
            data = [row for row in mapped_rows if row is not None]

        elif op_type == 'reduce':
            if len(operation) != 3:
                raise ValueError(
                    "Reduce operation requires 3 elements: ('reduce', reduce_func, initial_value).")
            _, reduce_func, initial_value = operation
            # The reduce operation is terminal and returns the final aggregated value
            return reduce(reduce_func, data, initial_value)

        else:
            raise ValueError(
                f"Unknown operation type: '{op_type}'. Must be 'filter', 'map', or 'reduce'.")

    return data

In [None]:
# tests
"""Unit tests for the CSV processing pipeline module."""

import unittest
import tempfile
import os
import csv
from main import process_csv_pipeline


class TestProcessCSVPipeline(unittest.TestCase):
    """Test suite for validating process_csv_pipeline behavior."""

    def setUp(self):
        """Create a temporary CSV file for testing."""
        self.temp_file = tempfile.NamedTemporaryFile(
            delete=False, mode='w', newline='', encoding='utf-8'
        )
        writer = csv.DictWriter(
            self.temp_file, fieldnames=['name', 'age', 'salary']
        )
        writer.writeheader()
        writer.writerows([
            {'name': 'Alice', 'age': '30', 'salary': '70000'},
            {'name': 'Bob', 'age': '25', 'salary': '50000'},
            {'name': 'Charlie', 'age': '35', 'salary': '80000'},
            {'name': 'Dana', 'age': 'invalid', 'salary': '40000'},
        ])
        self.temp_file.close()
        self.file_path = self.temp_file.name

    def tearDown(self):
        """Remove the temporary CSV file after testing."""
        os.remove(self.file_path)

    def test_filter_valid_numeric_column(self):
        """Test filtering rows with numeric age > 30."""
        ops = [('filter', 'age', lambda x: x.isdigit() and int(x) > 30)]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(len(result), 1)
        self.assertEqual(result[0]['name'], 'Charlie')

    def test_filter_invalid_column(self):
        """Test filtering using a nonexistent column."""
        ops = [('filter', 'nonexistent', lambda x: True)]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result, [])

    def test_map_valid_operation(self):
        """Test mapping a column to create a new computed column."""
        ops = [('map', 'double_salary', lambda row: int(row['salary']) * 2)]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result[0]['double_salary'], 140000)
        self.assertEqual(result[1]['double_salary'], 100000)

    def test_map_raises_exception_skips_row(self):
        """Test that rows raising exceptions during map are skipped."""
        ops = [('map', 'parsed_age', lambda row: int(row['age']))]
        result = process_csv_pipeline(self.file_path, ops)
        for row in result:
            self.assertIn('parsed_age', row)
            self.assertNotEqual(row['name'], 'Dana')
        self.assertEqual(len(result), 3)

    def test_reduce_sum_salary(self):
        """Test reducing mapped numeric salary values to their sum."""
        ops = [
            ('map', 'numeric_salary', lambda row: int(row['salary'])),
            ('reduce', lambda acc, row: acc + row['numeric_salary'], 0),
        ]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result, 70000 + 50000 + 80000 + 40000)

    def test_reduce_returns_initial_on_empty_data(self):
        """Test reduce on filtered-out data returns initial value."""
        ops = [
            ('filter', 'age', lambda x: False),
            ('reduce', lambda acc, row: acc + 1, 0),
        ]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result, 0)

    def test_invalid_file_path(self):
        """Test error raised on missing input file."""
        with self.assertRaises(FileNotFoundError):
            process_csv_pipeline("nonexistent.csv", [])

    def test_invalid_operation_format(self):
        """Test malformed operation raises ValueError."""
        with self.assertRaises(ValueError):
            process_csv_pipeline(self.file_path, [None])

    def test_unknown_operation_type(self):
        """Test unknown operation raises ValueError."""
        with self.assertRaises(ValueError):
            process_csv_pipeline(
                self.file_path, [('transform', 'col', lambda x: x)]
            )

    def test_invalid_filter_length(self):
        """Test ValueError for improperly sized filter tuple."""
        with self.assertRaises(ValueError):
            process_csv_pipeline(self.file_path, [('filter', 'age')])

    def test_invalid_map_length(self):
        """Test ValueError for improperly sized map tuple."""
        with self.assertRaises(ValueError):
            process_csv_pipeline(self.file_path, [('map', 'new_col')])

    def test_invalid_reduce_length(self):
        """Test ValueError for improperly sized reduce tuple."""
        with self.assertRaises(ValueError):
            process_csv_pipeline(
                self.file_path, [('reduce', lambda x, y: x)]
            )

    def test_pipeline_no_operations(self):
        """Test pipeline returns unmodified data if no ops given."""
        result = process_csv_pipeline(self.file_path, [])
        self.assertEqual(len(result), 4)

    def test_chained_filter_and_map(self):
        """Test combined filtering and mapping operations."""
        ops = [
            ('filter', 'age', lambda x: x.isdigit() and int(x) < 33),
            ('map', 'greeting', lambda row: f"Hello {row['name']}"),
        ]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(len(result), 2)
        self.assertIn('greeting', result[0])

    def test_filter_all_invalid_rows(self):
        """Test filtering out all rows results in empty output."""
        ops = [('filter', 'age', lambda x: False)]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result, [])

    def test_map_add_constant_column(self):
        """Test adding a constant value column using map."""
        ops = [('map', 'country', lambda row: 'India')]
        result = process_csv_pipeline(self.file_path, ops)
        for row in result:
            self.assertEqual(row['country'], 'India')

    def test_reduce_to_list_of_names(self):
        """Test reducing to a list of names."""
        ops = [('reduce', lambda acc, row: acc + [row['name']], [])]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertListEqual(result, ['Alice', 'Bob', 'Charlie', 'Dana'])

    def test_reduce_on_filtered_rows(self):
        """Test reduction after filtering based on salary."""
        ops = [
            ('filter', 'salary', lambda x: int(x) > 60000),
            ('reduce', lambda acc, row: acc + [row['name']], []),
        ]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result, ['Alice', 'Charlie'])

    def test_map_add_length_of_name(self):
        """Test mapping to create name length column."""
        ops = [('map', 'name_len', lambda row: len(row['name']))]
        result = process_csv_pipeline(self.file_path, ops)
        for row in result:
            self.assertEqual(row['name_len'], len(row['name']))

    def test_map_skips_row_on_key_error(self):
        """Test map skips rows if referenced column is missing."""
        ops = [('map', 'missing_col', lambda row: row['unknown'])]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(result, [])

    def test_filter_with_exception_returns_false(self):
        """Test filter returns False on exception (invalid int)."""
        ops = [('filter', 'age', lambda x: int(x) > 0)]
        result = process_csv_pipeline(self.file_path, ops)
        self.assertEqual(len(result), 3)


# Model Breaking Proof

#### Model Breaking Task URL: https://rlhf-v3.turing.com/prompt/2f2bbdf8-98b2-4b9a-ad58-a2d05726dbf3

#### Model code:

```python
import csv
from functools import reduce

def process_csv_pipeline(file_path: str, operations: list) -> any:
    # Step 1: Read the CSV file, handle FileNotFoundError
    try:
        with open(file_path, newline='') as csvfile:
            reader = csv.DictReader(csvfile)
            data = list(reader)
    except FileNotFoundError:
        raise FileNotFoundError(f"The file {file_path} does not exist.")
    
    # Step 2: Process each operation
    for operation in operations:
        if not isinstance(operation, tuple) or not operation:
            raise ValueError("Invalid operation format.")

        op_type = operation[0]

        # <Issue>: The error handling for the 'filter' operation is incorrect. The prompt requires that if an exception occurs when applying the `filter_func` to a single row, that specific row should be skipped. This implementation wraps the entire `filter` operation in a single `try...except` block. If any row causes an error, the `except` block is triggered, which then applies a completely different and incorrect fallback logic (`lambda row: column_name in row and True`) to the *entire dataset*, instead of skipping only the problematic row.
        if op_type == 'filter':
            if len(operation) != 3:
                raise ValueError("Filter operation requires exactly 3 elements.")
            column_name, filter_func = operation[1], operation[2]
            try:
                data = list(filter(lambda row: (column_name in row and filter_func(row[column_name])), data))
            except Exception as e:
                data = list(filter(lambda row: column_name in row and True, data))  # skip error rows

        elif op_type == 'map':
            if len(operation) != 3:
                raise ValueError("Map operation requires exactly 3 elements.")
            column_name, map_func = operation[1], operation[2]
            try:
                data = [{**row, column_name: map_func(row)} for row in data]
            except Exception as e:
                data = [row for row in data if column_name in row]  # skip error rows

        elif op_type == 'reduce':
            if len(operation) != 3:
                raise ValueError("Reduce operation requires exactly 3 elements.")
            reduce_func, initial_value = operation[1], operation[2]
            try:
                return reduce(reduce_func, data, initial_value)
            except Exception as e:
                continue  # skip reduce if error occurs

        else:
            raise ValueError(f"Unknown operation type: {op_type}")

    # Step 3: Return the transformed data or initial value, depending on last operation
    return data
```
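The failure described in the `<Issue>` annotation can be reproduced in isolation. The sketch below applies the model's filter-then-fallback pattern to three hypothetical rows. It then contrasts that pattern with a per-row guard in the spirit of the reference solution's `_safe_apply_filter`. The row data and the `is_older_than_26` and `keep` names are illustrative only.

```python
from typing import Any, Dict

Row = Dict[str, Any]

rows = [
    {'name': 'Alice', 'age': '30'},
    {'name': 'Dana', 'age': 'invalid'},
    {'name': 'Bob', 'age': '25'},
]


def is_older_than_26(value: str) -> bool:
    return int(value) > 26  # raises ValueError on 'invalid'


# Model's pattern: one try/except around the whole filter. The first failing
# row aborts the filter, and the fallback keeps every row that has the column.
try:
    batch = list(filter(lambda row: 'age' in row and is_older_than_26(row['age']), rows))
except Exception:
    batch = list(filter(lambda row: 'age' in row and True, rows))


# Per-row pattern: each row is guarded individually, so only Dana is dropped.
def keep(row: Row) -> bool:
    try:
        return bool(is_older_than_26(row['age']))
    except Exception:
        return False


per_row = list(filter(keep, rows))
print([r['name'] for r in batch])    # ['Alice', 'Dana', 'Bob']
print([r['name'] for r in per_row])  # ['Alice']
```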