## Advanced Exercises – Reading CSV Files

The following exercises build on the `csv` module and focus on:

- Using `csv.DictReader`
- Converting string fields to proper Python types
- Handling different delimiters and quote characters
- Skipping comments and empty lines

Try to solve each exercise **before** looking at the provided solution.

In [7]:
from __future__ import annotations

import csv
import io


### Exercise 1 – Reading into Dictionaries with `DictReader`

We have the following CSV text (same structure as `actors.csv`), but provided as a string instead of a file:

```text
First Name,Last Name,DOB,Sketches
John,Cleese,10/27/39,"The Cheese Shop, Ministry of Silly Walks, It's the Arts"
Eric,Idle,3/29/43,"The Cheese Shop, Nudge Nudge, ""Spam"""
Peter,O'Toole,8/2/32,Lawrence of Arabia
```

Write a function `read_actors(csv_text: str) -> list[dict]` that:

- Uses `io.StringIO` to create a file-like object from the string
- Uses `csv.DictReader` to read the rows
- Returns a `list` of dictionaries, one per row (excluding the header)

Each dictionary should have the keys: `'First Name'`, `'Last Name'`, `'DOB'`, `'Sketches'`.

Do **not** hardcode the field names – let `DictReader` pick them up from the header row.

In [8]:
def read_actors(csv_text: str) -> list[dict]:
    '''YOUR CODE HERE

    Hint:
    - Use io.StringIO(csv_text) to get a file-like object
    - Pass that object to csv.DictReader
    - Convert the DictReader iterator to a list
    '''
    raise NotImplementedError()


#### Exercise 1 – Solution

In [9]:
def read_actors(csv_text: str) -> list[dict]:
    """Read actor records from CSV text using csv.DictReader.

    Returns a list of dictionaries with keys taken from the header row.
    """
    f = io.StringIO(csv_text)
    reader = csv.DictReader(f)
    return list(reader)


sample_actors_csv = (
    "First Name,Last Name,DOB,Sketches\n"
    "John,Cleese,10/27/39,\"The Cheese Shop, Ministry of Silly Walks, It's the Arts\"\n"
    "Eric,Idle,3/29/43,\"The Cheese Shop, Nudge Nudge, \"\"Spam\"\"\"\n"
    "Peter,O'Toole,8/2/32,Lawrence of Arabia\n"
)

actors = read_actors(sample_actors_csv)
for actor in actors:
    print(actor)


{'First Name': 'John', 'Last Name': 'Cleese', 'DOB': '10/27/39', 'Sketches': "The Cheese Shop, Ministry of Silly Walks, It's the Arts"}
{'First Name': 'Eric', 'Last Name': 'Idle', 'DOB': '3/29/43', 'Sketches': 'The Cheese Shop, Nudge Nudge, "Spam"'}
{'First Name': 'Peter', 'Last Name': "O'Toole", 'DOB': '8/2/32', 'Sketches': 'Lawrence of Arabia'}
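
`DictReader` also has to decide what to do when a data row is shorter or longer than the header. The sketch below illustrates this and is not part of the exercise. `inspect_rows` and `ragged_csv` are hypothetical names introduced here, while `fieldnames`, `restkey`, and `restval` are standard `csv.DictReader` features.

```python
from __future__ import annotations

import csv
import io


def inspect_rows(csv_text: str) -> tuple[list[str], list[dict]]:
    """Return the header names and the parsed rows (hypothetical helper)."""
    reader = csv.DictReader(
        io.StringIO(csv_text),
        restkey="extra",  # surplus values on long rows are collected under this key
        restval="",       # missing values on short rows are filled with this value
    )
    rows = list(reader)
    # fieldnames is taken from the header row; it is None for empty input
    return list(reader.fieldnames or []), rows


# Assumed sample: the first data row is too short, the second too long.
ragged_csv = "a,b,c\n1,2\n1,2,3,4\n"

names, rows = inspect_rows(ragged_csv)
print(names)
for row in rows:
    print(row)
```

With the defaults (`restkey=None`, `restval=None`), a short row yields `None` for the missing columns, so code that calls string methods such as `.strip()` on every value may need to guard against `None`.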


### Exercise 2 – Converting Field Types

Consider the following CSV text where all fields are strings, but we really want
`age` as an `int` and `height_m` as a `float`:

```text
name,age,height_m
Alice,30,1.65
Bob,  41 ,1.80
Charlie,not_available,1.75
Diana,25,invalid
```

Write a function `read_people_with_types(csv_text: str) -> list[dict]` that:

- Uses `csv.DictReader`
- Strips whitespace around the `age` and `height_m` fields before converting
- Converts `age` to an `int` and `height_m` to a `float`
- **Skips** any row where conversion fails (for either `age` or `height_m`)

The returned list should only contain rows that were successfully converted, with
the dictionary values already typed (`int` and `float`).

The function should raise a `ValueError` with a helpful message if no rows at all could be parsed successfully.

In [10]:
def read_people_with_types(csv_text: str) -> list[dict]:
    '''YOUR CODE HERE

    Hint:
    - Use csv.DictReader
    - For each row, try to convert age and height_m
    - If conversion fails, skip the row (continue)
    - At the end, if the result list is empty, raise ValueError
    '''
    raise NotImplementedError()


#### Exercise 2 – Solution

In [11]:
def read_people_with_types(csv_text: str) -> list[dict]:
    """Read people from CSV text and convert age/height_m to proper types.

    - age → int
    - height_m → float

    Rows that cannot be converted are skipped.
    Raises ValueError if *no* valid rows are found.
    """
    f = io.StringIO(csv_text)
    reader = csv.DictReader(f)
    result: list[dict] = []

    for row in reader:
        try:
            age = int(row["age"].strip())
            height = float(row["height_m"].strip())
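        except (KeyError, ValueError, AttributeError):
            # KeyError: the column is missing from the header
            # ValueError: the text is not a valid int/float
            # AttributeError: the row is short, so the value is None
            continue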

        result.append({
            "name": row.get("name"),
            "age": age,
            "height_m": height,
        })

    if not result:
        raise ValueError("No valid rows were found in CSV text.")

    return result


sample_people_csv = (
    "name,age,height_m\n"
    "Alice,30,1.65\n"
    "Bob,  41 ,1.80\n"
    "Charlie,not_available,1.75\n"
    "Diana,25,invalid\n"
)

people = read_people_with_types(sample_people_csv)
for p in people:
    print(p, type(p["age"]), type(p["height_m"]))


{'name': 'Alice', 'age': 30, 'height_m': 1.65} <class 'int'> <class 'float'>
{'name': 'Bob', 'age': 41, 'height_m': 1.8} <class 'int'> <class 'float'>
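
When rows are skipped silently, it can help to know which ones were dropped. The following is a minimal sketch of one possible variation, not the reference solution. `read_typed` and `CONVERTERS` are hypothetical names, and the converter mapping is an assumption about how the column types might be declared. It records `reader.line_num` for every skipped row instead of raising when nothing could be parsed.

```python
from __future__ import annotations

import csv
import io
from typing import Callable

# Hypothetical mapping from column name to converter; other columns stay strings.
CONVERTERS = {"age": int, "height_m": float}


def read_typed(
    csv_text: str, converters: dict[str, Callable[[str], object]]
) -> tuple[list[dict], list[int]]:
    """Return (typed rows, line numbers of the rows that were skipped)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows: list[dict] = []
    skipped: list[int] = []
    for row in reader:
        try:
            # Strip every value, then apply the column's converter (str by default).
            typed = {
                key: converters.get(key, str)(value.strip())
                for key, value in row.items()
            }
        except (ValueError, AttributeError):
            # ValueError: text is not a valid number.
            # AttributeError: value is None (short row) or a list (long row).
            skipped.append(reader.line_num)
            continue
        rows.append(typed)
    return rows, skipped


people_csv = (
    "name,age,height_m\n"
    "Alice,30,1.65\n"
    "Bob,  41 ,1.80\n"
    "Charlie,not_available,1.75\n"
    "Diana,25,invalid\n"
)

typed_rows, skipped_lines = read_typed(people_csv, CONVERTERS)
print(typed_rows)
print("skipped lines:", skipped_lines)
```

Here `line_num` counts lines read from the source, including the header, so the numbers match what a text editor would show for this input.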


### Exercise 3 – Custom Delimiter and Quote Character

Not all CSV files use commas and double quotes. Suppose we have a file that uses:

- `;` (semicolon) as the field delimiter
- `'` (single quote) as the quote character

Example:

```text
first;last;comment
John;Cleese;'Loves "The Cheese Shop" sketch'
Eric;Idle;'Writes songs; also does comedy'
```

Write a function `read_semicolon_csv(csv_text: str) -> list[list[str]]` that:

- Uses `csv.reader`
- Configures `delimiter=';'` and `quotechar="'"`
- Returns a list of rows, where each row is a list of strings

Check that embedded double quotes and semicolons inside quoted fields are handled correctly.

In [12]:
def read_semicolon_csv(csv_text: str) -> list[list[str]]:
    '''YOUR CODE HERE

    Hint:
    - Use io.StringIO(csv_text)
    - Configure csv.reader with delimiter=';' and quotechar="'"
    - Convert the iterator of rows to a list
    '''
    raise NotImplementedError()


#### Exercise 3 – Solution

In [13]:
def read_semicolon_csv(csv_text: str) -> list[list[str]]:
    """Read a semicolon-separated file with single-quoted fields.

    Returns a list of rows; each row is a list of strings.
    """
    f = io.StringIO(csv_text)
    reader = csv.reader(f, delimiter=';', quotechar="'")
    return list(reader)


sample_semicolon_csv = (
    "first;last;comment\n"
    "John;Cleese;'Loves \"The Cheese Shop\" sketch'\n"
    "Eric;Idle;'Writes songs; also does comedy'\n"
)

rows = read_semicolon_csv(sample_semicolon_csv)
for r in rows:
    print(r)


['first', 'last', 'comment']
['John', 'Cleese', 'Loves "The Cheese Shop" sketch']
['Eric', 'Idle', 'Writes songs; also does comedy']
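
To see why `quotechar` matters, the sketch below parses the same line with and without it. The variable names and the third sample line are assumptions made for illustration; `delimiter`, `quotechar`, and the default `doublequote=True` behaviour are standard `csv` formatting parameters.

```python
import csv
import io

line = "Eric;Idle;'Writes songs; also does comedy'\n"

# Default quotechar ('"'): single quotes are ordinary characters,
# so the semicolon inside the comment splits the field in two.
default_quote = next(csv.reader(io.StringIO(line), delimiter=";"))

# quotechar="'": the comment is read as one field and the quotes are removed.
single_quote = next(csv.reader(io.StringIO(line), delimiter=";", quotechar="'"))

# With doublequote=True (the default), a doubled quote character inside a
# quoted field stands for one literal quote character. A quote character in
# the middle of an unquoted field (O'Toole) is kept as-is.
escaped_line = "Peter;O'Toole;'It''s the Arts'\n"
escaped = next(csv.reader(io.StringIO(escaped_line), delimiter=";", quotechar="'"))

print(default_quote)
print(single_quote)
print(escaped)
```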


### Exercise 4 – Skipping Comments and Empty Lines

Sometimes CSV files contain:

- Comment lines starting with `#`
- Completely empty lines

Example:

```text
# This is a comment and should be ignored
first,last
John,Cleese

# Another comment
Eric,Idle
```

Write a function `read_csv_skip_comments(csv_text: str) -> list[list[str]]` that:

- Uses `csv.reader` with the default comma delimiter
- Skips any line that is empty *after* stripping whitespace
- Skips any line that starts with `#` after stripping whitespace
- Returns the remaining rows as a list of lists of strings

Hint: you can pre-filter lines before feeding them into `csv.reader` by creating
a generator or using a list comprehension.

In [14]:
def read_csv_skip_comments(csv_text: str) -> list[list[str]]:
    '''YOUR CODE HERE

    Hint:
    - Wrap StringIO(csv_text) in a generator that yields only non-comment
      and non-empty lines
    - Pass that generator to csv.reader
    - Collect and return the rows in a list
    '''
    raise NotImplementedError()


#### Exercise 4 – Solution

In [15]:
def read_csv_skip_comments(csv_text: str) -> list[list[str]]:
    """Read CSV text, skipping comment and empty lines.

    - Lines starting with '#' after stripping are ignored
    - Empty lines (after stripping) are ignored
    """
    f = io.StringIO(csv_text)

    def filtered_lines():
        for line in f:
            stripped = line.strip()
            if not stripped:
                # skip empty lines
                continue
            if stripped.startswith('#'):
                # skip comments
                continue
            yield line

    reader = csv.reader(filtered_lines())
    return list(reader)


sample_with_comments = (
    "# This is a comment and should be ignored\n"
    "first,last\n"
    "John,Cleese\n"
    "\n"
    "# Another comment\n"
    "Eric,Idle\n"
)

rows = read_csv_skip_comments(sample_with_comments)
for r in rows:
    print(r)


['first', 'last']
['John', 'Cleese']
['Eric', 'Idle']
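
Filtering physical lines before parsing has one limitation: a quoted field that spans several lines may itself contain a blank line or a line starting with `#`, and those lines would be dropped from the field. One possible alternative, sketched below under that assumption, is to parse first and filter the resulting rows. The function name `read_csv_skip_comments_after_parse` and the sample text are hypothetical.

```python
from __future__ import annotations

import csv
import io


def read_csv_skip_comments_after_parse(csv_text: str) -> list[list[str]]:
    """Parse first, then drop blank rows and rows whose first cell starts with '#'."""
    # newline="" keeps line endings inside quoted fields untouched
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    rows: list[list[str]] = []
    for row in reader:
        # Blank lines parse as [], whitespace-only lines as ['   '].
        # Note that rows consisting only of empty cells (",,") are dropped too.
        if not any(cell.strip() for cell in row):
            continue
        if row[0].lstrip().startswith("#"):
            continue
        rows.append(row)
    return rows


multiline_sample = (
    "first,note\n"
    "# comment\n"
    'John,"line one\n'
    "\n"
    '# not a comment"\n'
    "   \n"
    "\n"
    "Eric,ok\n"
)

for parsed_row in read_csv_skip_comments_after_parse(multiline_sample):
    print(parsed_row)
```

The trade-off is that comment lines now pass through the CSV parser, so a comment containing a field that begins with a double quote could be misread as the start of a quoted field.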
