# Metadata

**L1 Taxonomy** - Backend Development

**L2 Taxonomy** - API Development

**Subtopic** - Building a RESTful API with Flask

**Use Case** - Develop a Python module that reads a CSV file and transforms it into a JSON object

**Programming Language** - Python

**Target Model** - GPT-4o

# Setup

```requirements.txt
Flask==3.1.1
```


# Prompt

## Problem Overview

You are tasked with building a **modular and extensible Python-based RESTful API** using **Flask** that can ingest a CSV file through an HTTP POST request, parse it, and return a JSON response. The transformation should support both **flat and nested JSON formats** based on user input. The system must ensure robustness, correctness, and scalability to support varying CSV schemas. The solution should be structured in a way that allows the CSV-to-JSON logic to function independently from the Flask interface and be easily tested or reused in other systems.


## Input Format

1. **HTTP POST Request** to endpoint `/transform`.
2. The request must be in **multipart/form-data** format and contain:

   * `file`: The uploaded CSV file.
   * `format`: A string, either `"flat"` or `"nested"`, specifying the transformation style.
3. CSV headers can include dot notation (e.g., `person.name`, `person.age`) for nested output.

Example request body:

```
POST /transform
Content-Type: multipart/form-data
- file: data.csv
- format: nested
```


## Output Format

A **JSON object** containing:

* `success`: Boolean indicating success or failure.
* `data`: The transformed records as a list of JSON objects, or `null` on failure.
* `error`: An error message on failure, or `null` on success.

**Successful response example**:

```json
{
  "success": true,
  "data": [
    {
      "person": {
        "name": "Alice",
        "age": "30"
      },
      "city": "New York"
    },
    {
      "person": {
        "name": "Bob",
        "age": "25"
      },
      "city": "Los Angeles"
    }
  ],
  "error": null
}
```

**Failure response example**:

```json
{
  "success": false,
  "data": null,
  "error": "Missing 'format' parameter or invalid CSV structure"
}
```


## Examples

### Input CSV (`flat` format):

```
name,age,city
Alice,30,New York
Bob,25,Los Angeles
```

### Output (`flat`):

```json
[
  {"name": "Alice", "age": "30", "city": "New York"},
  {"name": "Bob", "age": "25", "city": "Los Angeles"}
]
```

### Input CSV (`nested` format with dot notation):

```
person.name,person.age,location.city
Alice,30,New York
Bob,25,Los Angeles
```

### Output (`nested`):

```json
[
  {
    "person": {"name": "Alice", "age": "30"},
    "location": {"city": "New York"}
  },
  {
    "person": {"name": "Bob", "age": "25"},
    "location": {"city": "Los Angeles"}
  }
]
```
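The dot-notation expansion shown in the example can be sketched in a few lines. This is only an illustration under simplifying assumptions: the helper name `nest_record` is hypothetical, and the sketch does not detect conflicting headers such as `a` and `a.b`.

```python
from typing import Any, Dict


def nest_record(flat: Dict[str, str]) -> Dict[str, Any]:
    """Expand dotted keys such as 'person.name' into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on demand; setdefault returns the
            # existing child if an earlier column already created it.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


row = {
    "person.name": "Alice",
    "person.age": "30",
    "location.city": "New York",
}
print(nest_record(row))
```

Because each segment of the header becomes one level, the same loop handles any depth without special cases.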



# Requirements

## Explicit Requirements

1. The Flask API must only accept `POST` requests with a file and format type.
2. CSV rows must be converted to dictionaries.
3. If the format is `nested`, headers with dot notation should form nested JSON objects.
4. The module must handle:

   * Empty files
   * Inconsistent row lengths
   * Invalid `format` values (anything other than `flat` or `nested`)
5. Error messages must be clear and returned in the response JSON.
6. JSON keys should always be strings. Numeric values can be strings for consistency.
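
Detecting inconsistent row lengths needs some care, because `csv.DictReader` does not raise on them. By default it stores surplus cells in a list under the key `None` (`restkey`) and fills absent cells with the value `None` (`restval`). A minimal sketch of a check built on those defaults follows; the helper name `check_row_lengths` and the problem labels are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def check_row_lengths(text: str) -> List[Tuple[int, str]]:
    """Return (row_number, problem) pairs for rows that mismatch the header."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    for row_num, row in enumerate(reader, start=1):
        if None in row:
            # Surplus cells were collected under the default restkey (None).
            problems.append((row_num, "too many columns"))
        elif any(value is None for value in row.values()):
            # Absent cells were filled with the default restval (None).
            problems.append((row_num, "too few columns"))
    return problems


sample = "name,age,city\nAlice,30,New York\nBob,25\nCara,41,Paris,extra\n"
print(check_row_lengths(sample))
# [(2, 'too few columns'), (3, 'too many columns')]
```

An empty cell (for example `Bob,,Paris`) is read as an empty string, not `None`, so it is not reported as a short row.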


## Implicit Requirements

* The implementation must follow PEP8 and include docstrings for all functions and classes.
* The module must be independently testable (i.e., functions should be decoupled from Flask).
* File size should be limited to 2MB and validated in the Flask layer.
* Rows should be processed in a memory-efficient manner using generators if possible.
* Flask should return an appropriate HTTP status code on error: 400 for invalid input and 500 for unexpected server failures.
* Only standard Python libraries and Flask should be used.
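
One way to read the generator requirement is to expose the rows lazily and let the caller decide when to materialize them. The sketch below assumes a hypothetical helper named `iter_records`; the final JSON response still needs the full list, so the benefit is limited to avoiding intermediate copies while validating or transforming rows.

```python
import csv
import io
from typing import IO, Dict, Iterator


def iter_records(stream: IO[str]) -> Iterator[Dict[str, str]]:
    """Yield one flat record per CSV row without building a list first."""
    reader = csv.DictReader(stream)
    for row in reader:
        yield dict(row)


records = iter_records(io.StringIO("name,age\nAlice,30\nBob,25\n"))
print(next(records))  # {'name': 'Alice', 'age': '30'}
```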


## Function Signature(s) and/or Class Definition(s)

You are expected to implement the following module structure:

```python
def parse_csv(file_stream: IO[str], format_type: str) -> List[Dict[str, Any]]:
    """
    Parse the uploaded CSV file and return a list of JSON records.

    :param file_stream: A text stream of the CSV file.
    :param format_type: 'flat' or 'nested' to specify transformation style.
    :return: List of dictionaries representing JSON rows.
    :raises ValueError: If the format_type is invalid or CSV parsing fails.
    """
```

**`app.py`**:

```python
@app.route("/transform", methods=["POST"])
def transform():
    """
    Flask route to accept a CSV file and return a JSON response.
    Validates input, delegates to the transformer module, and structures response.
    """
```


## Edge Cases

* A CSV with no rows (just headers).
* A CSV with missing values in some cells.
* Headers that conflict when transformed into nested keys (e.g., `a.b`, `a`).
* Non-UTF-8 encoded CSV files.
* Headers that are empty or contain illegal characters.
* Rows with more columns than headers or fewer columns than headers.
* Repeated headers (should raise an error).
* Missing `format` field or invalid value.
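
The non-UTF-8 case has a subtlety worth illustrating: `UnicodeDecodeError` is a subclass of `ValueError`, so it has to be caught before any broader `except ValueError` clause if it is to get its own message. The following is a minimal sketch; the function name `decode_upload` and the error text are assumptions, not part of the task.

```python
from typing import Optional, Tuple


def decode_upload(raw: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, error); exactly one of the two is None."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        # Must come before any `except ValueError`, which would also match.
        return None, "CSV file must be UTF-8 encoded."


print(decode_upload(b"name\nAlice\n"))     # valid UTF-8
print(decode_upload(b"name\nZo\xeb\n"))    # Latin-1 bytes, not valid UTF-8
```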


## Constraints

* Python version must be 3.9 or above (the pinned Flask 3.1.1 does not support earlier versions).
* Use only Flask and standard Python libraries (`csv`, `io`, `typing`, etc).
* Maximum file size: 2 MB.
* CSV delimiter is always a comma (`,`).
* No external CSV parsing libraries like `pandas` allowed.
* Nested transformation must support arbitrary depth via dot notation.
* Response must not exceed 2 seconds for files with up to 10,000 rows.
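
A rough way to sanity-check the 10,000-row budget is to time a parse of synthetic data. The sketch below is an assumption-laden illustration: `build_csv` and `time_parse` are hypothetical helpers, the column values are made up, and only a plain `csv.DictReader` pass is timed, not the full request cycle.

```python
import csv
import io
import time
from typing import Tuple


def build_csv(rows: int) -> str:
    """Generate an in-memory CSV with dotted headers and `rows` data rows."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["person.name", "person.age", "location.city"])
    for i in range(rows):
        writer.writerow([f"user{i}", str(20 + i % 50), "New York"])
    return out.getvalue()


def time_parse(text: str) -> Tuple[int, float]:
    """Return (record_count, elapsed_seconds) for one DictReader pass."""
    start = time.perf_counter()
    records = list(csv.DictReader(io.StringIO(text)))
    return len(records), time.perf_counter() - start


count, elapsed = time_parse(build_csv(10_000))
print(f"parsed {count} rows in {elapsed:.3f}s")
```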

Write the solution in a single module/file.

In [None]:
# code

"""
Flask RESTful API to transform uploaded CSV files into flat or nested
JSON format. The transformation logic is encapsulated in helper functions
within this module to preserve modularity.
"""

from flask import Flask, request, jsonify
import csv
import io
from typing import List, Dict, Any, IO
from werkzeug.exceptions import RequestEntityTooLarge

app = Flask(__name__)

# Enforce 2MB upload limit
app.config['MAX_CONTENT_LENGTH'] = 2 * 1024 * 1024


@app.route("/transform", methods=["POST"])
def transform():
    """
    Flask route to accept a CSV file and return a JSON response.
    Validates input, delegates to transformation logic, and returns a
    structured JSON result.
    """
    try:
        # Accessing request.files parses the body, which is where Werkzeug
        # raises RequestEntityTooLarge, so it must happen inside the try.
        if 'file' not in request.files or 'format' not in request.form:
            return _error_response(
                "Missing 'file' or 'format' parameter.", 400
            )

        uploaded_file = request.files['file']
        format_type = request.form['format']

        # Decode eagerly (the upload is capped at 2MB): wrapping Werkzeug's
        # SpooledTemporaryFile in io.TextIOWrapper fails before Python 3.11.
        file_stream = io.StringIO(
            uploaded_file.read().decode('utf-8'), newline=''
        )
        data = parse_csv(file_stream, format_type)
        return jsonify(success=True, data=data, error=None)

    except RequestEntityTooLarge:
        return _error_response("File too large, must be under 2MB.", 413)

    except UnicodeDecodeError:
        # UnicodeDecodeError subclasses ValueError, so it is handled first.
        return _error_response("CSV file must be UTF-8 encoded.", 400)

    except ValueError as ve:
        return _error_response(str(ve), 400)

    except Exception:
        return _error_response(
            "An unexpected error occurred while processing the file.", 500
        )


def parse_csv(file_stream: IO[str], format_type: str) -> List[Dict[str, Any]]:
    """
    Parse the uploaded CSV file and return a list of JSON records.

    :param file_stream: A text stream of the CSV file.
    :param format_type: 'flat' or 'nested' to specify transformation style.
    :return: List of dictionaries representing JSON rows.
    :raises ValueError: If the format_type is invalid or CSV parsing fails.
    """
    if format_type not in ('flat', 'nested'):
        raise ValueError("Invalid format_type. Must be 'flat' or 'nested'.")

    try:
        reader = csv.DictReader(file_stream)

        if not reader.fieldnames:
            raise ValueError("CSV file is empty or missing headers.")

        if len(set(reader.fieldnames)) != len(reader.fieldnames):
            raise ValueError("CSV headers contain duplicates.")

        _validate_headers(reader.fieldnames)

        data = []

        for row_num, row in enumerate(reader, start=1):
            if None in row:
                raise ValueError(
                    f"Row {row_num} has more columns than headers."
                )

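            # DictReader pads short rows with None (its default restval),
            # so len(row) never differs; look for the padding instead.
            if any(value is None for value in row.values()):
                raise ValueError(
                    f"Row {row_num} has fewer columns than headers."
                )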

            str_row = {k: str(v) for k, v in row.items()}

            if format_type == 'flat':
                data.append(str_row)
            else:
                nested = _to_nested_dict(str_row)
                data.append(nested)

        return data

    except csv.Error as e:
        raise ValueError(f"CSV parsing error: {str(e)}") from e


def _validate_headers(headers: List[str]) -> None:
    """
    Validate that headers are non-empty, legal, and non-conflicting.

    :param headers: List of header strings from the CSV file.
    :raises ValueError: If headers are invalid or conflicting.
    """
    header_set = set(headers)

    for header in headers:
        if not header or header.strip() == "":
            raise ValueError("Empty header found.")

        parts = header.split('.')

        # A header may not also be a parent path of another header
        # (e.g. 'a' and 'a.b'), regardless of column order.
        for depth in range(1, len(parts)):
            prefix = '.'.join(parts[:depth])

            if prefix in header_set:
                raise ValueError(
                    f"Conflicting nested key: '{prefix}' conflicts with "
                    f"another header."
                )


def _to_nested_dict(flat_dict: Dict[str, str]) -> Dict[str, Any]:
    """
    Convert a flat dictionary with dotted keys into a nested dictionary.

    :param flat_dict: Dictionary with possibly dotted keys.
    :return: Nested dictionary.
    :raises ValueError: If there are key conflicts in nesting.
    """
    nested = {}

    for key, value in flat_dict.items():
        parts = key.split('.')
        sub_dict = nested

        for part in parts[:-1]:
            if part not in sub_dict:
                sub_dict[part] = {}
            elif not isinstance(sub_dict[part], dict):
                raise ValueError(
                    f"Key conflict while nesting: '{key}' cannot be created."
                )
            sub_dict = sub_dict[part]

        if parts[-1] in sub_dict:
            raise ValueError(
                f"Duplicate nested key detected: '{key}'"
            )

        sub_dict[parts[-1]] = value

    return nested


def _error_response(message: str, status_code: int):
    """
    Helper function to format error JSON responses.

    :param message: Error message to return.
    :param status_code: HTTP status code.
    :return: Flask response with JSON body.
    """
    return (
        jsonify(success=False, data=None, error=message),
        status_code
    )


if __name__ == "__main__":
    app.run(debug=True)


In [None]:
# tests
"""Unit tests for parse_csv and the /transform Flask route."""

import io
import unittest

from main import app, parse_csv


class TestParseCsv(unittest.TestCase):
    """Test suite for the Flask-independent parse_csv function."""

    def test_flat_format(self):
        """Test that flat mode returns string values keyed by header."""
        stream = io.StringIO("name,age,city\nAlice,30,New York\n")
        expected = [{"name": "Alice", "age": "30", "city": "New York"}]
        self.assertEqual(parse_csv(stream, "flat"), expected)

    def test_nested_format(self):
        """Test that dotted headers become nested objects."""
        stream = io.StringIO(
            "person.name,person.age,location.city\nAlice,30,New York\n"
        )
        expected = [{
            "person": {"name": "Alice", "age": "30"},
            "location": {"city": "New York"},
        }]
        self.assertEqual(parse_csv(stream, "nested"), expected)

    def test_nested_arbitrary_depth(self):
        """Test that nesting follows every dot in the header."""
        stream = io.StringIO("a.b.c.d\n1\n")
        expected = [{"a": {"b": {"c": {"d": "1"}}}}]
        self.assertEqual(parse_csv(stream, "nested"), expected)

    def test_header_only_returns_empty_list(self):
        """Test that a CSV with headers but no rows yields an empty list."""
        self.assertEqual(parse_csv(io.StringIO("name,age\n"), "flat"), [])

    def test_missing_cell_is_empty_string(self):
        """Test that an empty cell is kept as an empty string."""
        stream = io.StringIO("name,age\nAlice,\n")
        self.assertEqual(
            parse_csv(stream, "flat"), [{"name": "Alice", "age": ""}]
        )

    def test_invalid_format_raises(self):
        """Test that an unknown format_type raises ValueError."""
        with self.assertRaises(ValueError):
            parse_csv(io.StringIO("name\nAlice\n"), "xml")

    def test_empty_file_raises(self):
        """Test that a completely empty file raises ValueError."""
        with self.assertRaises(ValueError):
            parse_csv(io.StringIO(""), "flat")

    def test_duplicate_headers_raise(self):
        """Test that repeated headers raise ValueError."""
        with self.assertRaises(ValueError):
            parse_csv(io.StringIO("name,name\nAlice,Bob\n"), "flat")

    def test_empty_header_raises(self):
        """Test that an empty header cell raises ValueError."""
        with self.assertRaises(ValueError):
            parse_csv(io.StringIO("name,,city\nAlice,30,Paris\n"), "flat")

    def test_conflicting_headers_raise(self):
        """Test that 'a' and 'a.b' cannot both be headers when nesting."""
        with self.assertRaises(ValueError):
            parse_csv(io.StringIO("a.b,a\n1,2\n"), "nested")

    def test_row_with_extra_columns_raises(self):
        """Test that a row longer than the header raises ValueError."""
        with self.assertRaises(ValueError):
            parse_csv(io.StringIO("name,age\nAlice,30,extra\n"), "flat")


class TestTransformRoute(unittest.TestCase):
    """Test suite for the /transform endpoint."""

    def setUp(self):
        """Create a Flask test client."""
        self.client = app.test_client()

    def _post(self, content, fmt="flat"):
        """POST content as the 'file' part, with an optional format."""
        data = {"file": (io.BytesIO(content), "data.csv")}
        if fmt is not None:
            data["format"] = fmt
        return self.client.post(
            "/transform", data=data, content_type="multipart/form-data"
        )

    def test_success(self):
        """Test the success envelope for a valid upload."""
        response = self._post(b"name,age\nAlice,30\n")
        self.assertEqual(response.status_code, 200)
        self.assertEqual(response.get_json(), {
            "success": True,
            "data": [{"name": "Alice", "age": "30"}],
            "error": None,
        })

    def test_missing_format_returns_400(self):
        """Test that omitting 'format' yields a 400 error envelope."""
        response = self._post(b"name\nAlice\n", fmt=None)
        self.assertEqual(response.status_code, 400)
        self.assertFalse(response.get_json()["success"])

    def test_invalid_format_returns_400(self):
        """Test that an unknown format value yields a 400 response."""
        response = self._post(b"name\nAlice\n", fmt="xml")
        self.assertEqual(response.status_code, 400)

    def test_non_utf8_returns_400(self):
        """Test that a non-UTF-8 upload yields a 400 response."""
        response = self._post(b"name\n\xff\xfe\n")
        self.assertEqual(response.status_code, 400)

    def test_oversized_upload_returns_413(self):
        """Test that uploads above the 2MB limit are rejected."""
        response = self._post(b"a\n" + b"x" * (2 * 1024 * 1024))
        self.assertEqual(response.status_code, 413)

    def test_get_not_allowed(self):
        """Test that only POST is accepted."""
        self.assertEqual(self.client.get("/transform").status_code, 405)


if __name__ == "__main__":
    unittest.main()


# Model Breaking Proof

#### Model Breaking Task URL: https://rlhf-v3.turing.com/prompt/3d6ef542-d380-4505-a09d-5ef6bedf10d2?origin=https%3A%2F%2Flabeling-ms.turing.com&redirect_url=https%3A%2F%2Flabeling-ms.turing.com%2Fconversations%2F74836%2Fview

#### Model code:

```python
from flask import Flask, request, jsonify
import csv
import io
from typing import List, Dict, Any, IO

app = Flask(__name__)

def parse_csv(file_stream: IO[str], format_type: str) -> List[Dict[str, Any]]:
    """
    Parse the uploaded CSV file and return a list of JSON records.

    :param file_stream: A text stream of the CSV file.
    :param format_type: 'flat' or 'nested' to specify transformation style.
    :return: List of dictionaries representing JSON rows.
    :raises ValueError: If the format_type is invalid or CSV parsing fails.
    """
    # Read the CSV file into a DictReader
    try:
        reader = csv.DictReader(file_stream) # <Issue>: The code does not validate reader.fieldnames for duplicates, which may lead to overwriting or undefined behavior.
        if format_type not in ('flat', 'nested'):
            raise ValueError(f"Invalid format_type {format_type}. Must be 'flat' or 'nested'.")
        
        data = []
        
        for row in reader: # <Issue>: csv.DictReader silently pads short rows with None and stores surplus values under the None key. The code fails to detect rows with more or fewer values than headers.
            if format_type == 'flat':
                data.append(row) # <Issue>: csv.DictReader yields strings, but cells missing from short rows are filled with None and reach the JSON as null. This violates the expectation that all values in the JSON are strings.
            else:  # format_type == 'nested'
                nested_row = {}
                for key, value in row.items():
                    keys = key.split('.')
                    sub_dict = nested_row
                    for part in keys[:-1]:
                        if part not in sub_dict:
                            sub_dict[part] = {}
                        sub_dict = sub_dict[part] # <Issue>: The code blindly inserts into nested structures without verifying if the current key already exists as a non-dict value, which could lead to incorrect nesting or crashes.
                    sub_dict[keys[-1]] = value
                data.append(nested_row)

        return data
    except Exception as e:
        raise ValueError(f"Error processing CSV: {str(e)}")

@app.route("/transform", methods=["POST"])
def transform():
    """
    Flask route to accept a CSV file and return a JSON response.
    Validates input, delegates to the transformer module, and structures response.
    """
    if 'file' not in request.files or 'format' not in request.form:
        return jsonify(success=False, data=None, error="Missing file or format parameter"), 400

    file = request.files['file']
    format_type = request.form['format']

    # Limit file size to 2MB
    file.stream.seek(0, io.SEEK_END)
    file_length = file.stream.tell()
    file.stream.seek(0)
    if file_length > 2 * 1024 * 1024:
        return jsonify(success=False, data=None, error="File too large, must be less than 2MB"), 400

    try:
        file_stream = io.TextIOWrapper(file.stream, encoding='utf-8') # <Issue>: Decoding happens lazily inside parse_csv, whose broad except clause rewraps a UnicodeDecodeError as a generic "Error processing CSV" message. There is no explicit encoding check, which violates the requirement for clear error messages.
        data = parse_csv(file_stream, format_type)
        return jsonify(success=True, data=data, error=None)
    except ValueError as e:
        return jsonify(success=False, data=None, error=str(e)), 400
    except Exception as e:
        return jsonify(success=False, data=None, error="An unexpected error occurred"), 500

if __name__ == "__main__":
    app.run(debug=True)
```