### Exercises

#### Question 1

Alongside this notebook, four CSV files are provided (one of them is actually a TSV file).

For each file, load it using the `csv` module and find the smallest and largest numbers in the data.

All of these files contain only lists of numbers, except for a possible header row.

#### Question 1 – Solution

We will:

1. Use the built-in **`csv`** module (as required).
2. Use **`pathlib.Path`** to locate all `.csv` and `.tsv` files in the current directory.
3. For each file:
   * Select the correct delimiter based on the extension (`','` for CSV, `'\t'` for TSV).
   * Read all cells and try to convert them to `float`.
   * Ignore non-numeric values automatically (this skips header rows and any stray text).
   * Track the **minimum** and **maximum** in a single pass, seeding both with the first numeric value so that no sentinel such as `float('inf')` is needed.
4. Collect the results in a dictionary `{filename: (min_value, max_value)}` and print them.

This handles header rows, multi-column files, and stray text, and it splits the logic into small, reusable functions.
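Step 3 relies on `float()` to decide what counts as numeric. A quick check in plain Python shows what it accepts and rejects. The sample strings are illustrative only and are not taken from the exercise files. Note that `float()` also accepts `"inf"` and `"nan"`, and NaN breaks ordinary comparisons.

```python
import math

# Cells that float() accepts; surrounding whitespace is allowed,
# and "inf" / "nan" parse as numbers too.
candidates = [" 42 ", "-3.3", "1e3", "inf", "nan"]
parsed = [float(text) for text in candidates]
print(parsed)  # [42.0, -3.3, 1000.0, inf, nan]

# Cells that float() rejects with ValueError, so a try/except skips them.
rejected = []
for text in ["value", "", "1,000"]:
    try:
        float(text)
    except ValueError:
        rejected.append(text)
print(rejected)  # ['value', '', '1,000']

# NaN compares False with everything, so min() depends on argument order.
print(min(float("nan"), 1.0), min(1.0, float("nan")))  # nan 1.0
print(math.isnan(parsed[4]))  # True
```

If a data file could contain the text `nan`, it is worth filtering it out before the running min/max comparison.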

In [1]:
from __future__ import annotations

from pathlib import Path
import csv
from typing import Iterable, Tuple, Dict


def _detect_delimiter(path: Path) -> str:
    """Return the appropriate delimiter for a given file based on its suffix.

    *.csv -> ','
    *.tsv -> '\\t'
    otherwise defaults to ','
    """
    suffix = path.suffix.lower()
    if suffix == ".tsv":
        return "\t"
    return ","


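def _iter_numeric_values(reader: Iterable[Iterable[str]]) -> Iterable[float]:
    """Yield numeric values (as floats) from a CSV reader.

    Any cell that cannot be converted to float is ignored. This naturally skips
    header rows and any non-numeric junk in the file. NaN values are skipped
    too, because NaN compares False against every number and would corrupt
    the running min/max.
    """
    for row in reader:
        for cell in row:
            text = cell.strip()
            if not text:
                # Skip empty cells
                continue
            try:
                value = float(text)
            except ValueError:
                # Non-numeric (e.g., header names) -> just ignore
                continue
            if value != value:
                # Only NaN is unequal to itself; float() accepts "nan", so drop it here.
                continue
            yield value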


def find_min_max_in_file(path: Path) -> Tuple[float, float]:
    """Return (min_value, max_value) for all numeric cells in a CSV/TSV file.

    Raises ValueError if the file does not contain any numeric values.
    """
    delimiter = _detect_delimiter(path)

    with path.open(mode="r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        values_iter = _iter_numeric_values(reader)

        try:
            first_value = next(values_iter)
        except StopIteration as exc:
            raise ValueError(f"No numeric data found in {path}") from exc

        # Initialize min and max with the first numeric value
        min_value = max_value = first_value

        for value in values_iter:
            if value < min_value:
                min_value = value
            if value > max_value:
                max_value = value

    return min_value, max_value


def find_min_max_for_all_data_files(root: Path | str = Path(".")) -> Dict[str, Tuple[float, float]]:
    """Find min and max values for all CSV/TSV files in the given directory.

    Parameters
    ----------
    root:
        Directory to search for data files. Defaults to current directory.

    Returns
    -------
    Dict[str, Tuple[float, float]]
        A mapping from file name to a tuple of (min_value, max_value).
    """
    root_path = Path(root)

    data_files = sorted(root_path.glob("*.csv")) + sorted(root_path.glob("*.tsv"))

    results: Dict[str, Tuple[float, float]] = {}
    for path in data_files:
        try:
            min_value, max_value = find_min_max_in_file(path)
        except ValueError as exc:
            # If a file has no numeric data, we report and skip it.
            print(f"Skipping {path.name}: {exc}")
            continue
        results[path.name] = (min_value, max_value)

    return results


# Example usage (run this cell once the data files are in the same directory):
results = find_min_max_for_all_data_files()
for filename, (min_val, max_val) in results.items():
    print(f"{filename}: min = {min_val}, max = {max_val}")

Skipping file4.csv: No numeric data found in file4.csv
Skipping test.csv: No numeric data found in test.csv
file1.csv: min = 10.0, max = 80.0
file2.csv: min = -3.3, max = 500.0
file3.tsv: min = 10.0, max = 300.0
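The run above skipped `file4.csv` with "No numeric data found", although the exercise states that every file contains numbers. One plausible cause, which is an assumption that has not been checked against the actual file, is that its delimiter does not match its extension (the exercise mentions that one file is really a TSV). In that case each line is read as a single cell such as `'10\t20'`, which `float()` rejects. The sketch below detects the delimiter from the file's content with the standard library's `csv.Sniffer` instead of from the suffix. The helper names `sniff_delimiter` and `detect_delimiter_from_file`, the candidate delimiters, and the 4096-character sample size are all hypothetical choices.

```python
import csv
from pathlib import Path


def sniff_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the delimiter of a CSV-like text sample (hypothetical helper).

    Only the characters in ``candidates`` are considered. Falls back to ','
    when csv.Sniffer cannot decide (e.g. a single-column file).
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        return ","
    return dialect.delimiter


def detect_delimiter_from_file(path: Path, sample_size: int = 4096) -> str:
    """Sniff the delimiter from the first few KB of a file (hypothetical helper)."""
    with path.open(mode="r", encoding="utf-8", newline="") as f:
        return sniff_delimiter(f.read(sample_size))


print(repr(sniff_delimiter("x\ty\n1\t2\n3\t4\n")))  # '\t'
print(repr(sniff_delimiter("a,b\n1,2\n3,4\n")))     # ','
print(repr(sniff_delimiter("10\n20\n30\n")))        # ',' (fallback)
```

Swapping `_detect_delimiter(path)` for `detect_delimiter_from_file(path)` in `find_min_max_in_file` would be one way to try this. Sniffing is heuristic, so the suffix-based rule may still be preferable when file names are trustworthy.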


#### Question 2

Given this data structure, a list of dictionaries, write a function that writes the data out to a CSV file. The column headers (in the first row) should be based on the dictionary keys, and each dictionary should be flattened into one row, with each value under its corresponding column header.

Note that not all dictionaries contain the same keys, and the keys that are present are not necessarily in the same order.

For "missing" values, your function should write an empty string.

For example, given this `data`:

In [2]:
data = [
    {'a': '1_a', 'b': '1_b', 'c': '1_c'},
    {'c': '2_c', 'd': '2_d'},
    {'a': '3_a', 'c': '3_c', 'e': '3_e'}
]

the written file should look like this:

```
a,b,c,d,e
1_a,1_b,1_c,,
,,2_c,2_d,
3_a,,3_c,,3_e
```

The order of the columns and rows does not matter, as long as each value appears under its corresponding column header.

#### Question 2 – Solution

We want a reusable function that:

1. Accepts a **list of dictionaries** and an **output file path**.
2. Computes the **union of all keys** across all dictionaries.
3. Writes those keys as the **header row**.
4. Writes one row per dictionary. For keys that are missing in a particular dictionary, an **empty string** should appear in the CSV.
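These four steps translate almost directly into code even without `DictWriter`. The sketch below uses a plain `csv.writer` and writes to an in-memory `io.StringIO` buffer rather than a file; both choices, and the name `example_rows`, are for illustration only. Missing keys are filled with `row.get(key, "")`.

```python
import csv
import io

example_rows = [
    {"a": "1_a", "b": "1_b", "c": "1_c"},
    {"c": "2_c", "d": "2_d"},
    {"a": "3_a", "c": "3_c", "e": "3_e"},
]

# Step 2: iterating a dict yields its keys, so set().union(*rows) is the union of all keys.
header = sorted(set().union(*example_rows))

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)  # Step 3: header row
for row in example_rows:  # Step 4: one row per dict, '' for any missing key
    writer.writerow([row.get(key, "") for key in header])

print(buffer.getvalue())
```

This prints exactly the expected output shown in the question.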

We can do this cleanly with `csv.DictWriter`:

* `fieldnames` will be the sorted union of all keys.
* Writing each dictionary with `writer.writerow(row)` fills missing keys with `DictWriter`'s `restval` parameter, which defaults to an empty string.
* An empty input list raises a `ValueError` instead of producing a file with no columns.
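The two `DictWriter` behaviours that matter here can be checked in isolation: `restval` supplies the value for missing keys, and `extrasaction` (default `"raise"`) controls what happens to keys that are not in `fieldnames`. This is a small standalone sketch writing to an in-memory buffer; the field names and values are made up for the demo.

```python
import csv
import io

buffer = io.StringIO(newline="")
# restval="" is already the default; it is spelled out to make the behaviour explicit.
writer = csv.DictWriter(buffer, fieldnames=["a", "b", "c"], restval="", lineterminator="\n")
writer.writeheader()
writer.writerow({"a": "1", "c": "3"})  # 'b' is missing -> written as restval ('')

# extrasaction defaults to "raise": a key that is not in fieldnames is rejected
# with ValueError, and nothing is written for that row.
try:
    writer.writerow({"a": "1", "z": "unexpected"})
    extra_rejected = False
except ValueError as exc:
    extra_rejected = True
    print(f"ValueError: {exc}")

print(buffer.getvalue())
```

In the solution, `fieldnames` is the union of all keys, so no row can contain an extra key and the `extrasaction` setting never comes into play.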

Below is the implementation, plus a small demo using the provided `data`.

In [3]:
from __future__ import annotations  # allows `Path | str` annotations before Python 3.10

from pathlib import Path
import csv
from typing import Iterable, Mapping, Any, List


def write_dicts_to_csv(
    rows: Iterable[Mapping[str, Any]],
    output_path: Path | str,
) -> None:
    """Write a list of dictionaries to a CSV file.

    Parameters
    ----------
    rows:
        An iterable of dictionaries. Keys will become column headers.
    output_path:
        File path to write the CSV to.

    Raises
    ------
    ValueError
        If ``rows`` is empty.

    Notes
    -----
    * The set of headers is the union of all keys across all dictionaries.
    * Missing keys for a particular row are written as empty strings.
    """
    output_path = Path(output_path)

    # Materialize the rows so we can iterate multiple times safely
    row_list: List[Mapping[str, Any]] = list(rows)

    if not row_list:
        raise ValueError("Cannot write CSV: no data rows provided.")

    # Collect union of all keys
    all_keys = set()
    for row in row_list:
        all_keys.update(row.keys())

    # Sort keys to make the output deterministic and easy to test
    fieldnames = sorted(all_keys)

    with output_path.open(mode="w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in row_list:
            # Missing keys are written as DictWriter's restval, which defaults to ''
            writer.writerow(row)


# --- Demo using the provided `data` structure ---
data = [
    {"a": "1_a", "b": "1_b", "c": "1_c"},
    {"c": "2_c", "d": "2_d"},
    {"a": "3_a", "c": "3_c", "e": "3_e"},
]

output_file = Path("output.csv")
write_dicts_to_csv(data, output_file)

print(f"Wrote CSV to: {output_file.resolve()}")
print("\nFile contents:\n----------------")
print(output_file.read_text(encoding="utf-8"))

Wrote CSV to: D:\_Udemy_course_PRACTICE\Python_3_Fundamentals_Udemy_by_Fred_Baptiste\22_CSV_Module\06_Exercises\output.csv

File contents:
----------------
a,b,c,d,e
1_a,1_b,1_c,,
,,2_c,2_d,
3_a,,3_c,,3_e
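The question says column order does not matter, and the solution sorts the keys to get deterministic output. If you would rather keep the order in which keys first appear, `dict.fromkeys` offers one option: dicts preserve insertion order (Python 3.7+), so it removes duplicates while keeping that order. The sketch below also reads the result back with `csv.DictReader` as a round-trip check. The helper `ordered_union_of_keys` and the `sample_rows` data are hypothetical, and an in-memory buffer stands in for the file.

```python
import csv
import io
from typing import Any, Iterable, List, Mapping


def ordered_union_of_keys(rows: Iterable[Mapping[str, Any]]) -> List[str]:
    """Union of all keys, in first-seen order (hypothetical helper)."""
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(key for row in rows for key in row))


sample_rows = [
    {"c": "1_c", "a": "1_a"},
    {"b": "2_b", "c": "2_c"},
]

fieldnames = ordered_union_of_keys(sample_rows)
print(fieldnames)  # ['c', 'a', 'b']

# Write to an in-memory buffer, then read it back to check the round trip.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(sample_rows)

buffer.seek(0)
round_trip = list(csv.DictReader(buffer))
print(round_trip)
```

Each row read back has every column, and the missing values come back as empty strings.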

