## Advanced Exercises - CSV Dialects in Python

These exercises assume you already know how to use `csv.reader` and `csv.writer` and how to register custom dialects.

Each exercise comes with a reference solution implemented in Python. Read the exercise description first, then try to solve it before looking at the solution code.

### Exercise 1 - Detect and use a dialect automatically

You are given CSV data where the delimiter and quoting style are not known ahead of time.

1. Use `csv.Sniffer` to detect the dialect from a sample of the data.
2. Reset the stream and read all rows using the detected dialect.
3. Print the rows and assert that the header has the expected column names.

Hint: wrap the raw text in an `io.StringIO` object so you can treat it like a file.

In [1]:
import csv
import io
import textwrap

raw_data = textwrap.dedent("""\
name;age;city
'John Doe';31;"New York"
'Jane Smith';29;"San Francisco"
'Foo Bar';40;"London"
""")

buffer = io.StringIO(raw_data)

sample = buffer.read(200)
buffer.seek(0)

sniffer = csv.Sniffer()
detected_dialect = sniffer.sniff(sample)

print(f"Detected delimiter: {detected_dialect.delimiter!r}")
print(f"Detected quotechar: {detected_dialect.quotechar!r}")

reader = csv.reader(buffer, dialect=detected_dialect)
rows = list(reader)

for row in rows:
    print(row)

header = rows[0]
assert header == ['name', 'age', 'city'], f"Unexpected header: {header!r}"

print("\nExercise 1: header assertion passed.")


Detected delimiter: ';'
Detected quotechar: "'"
['name', 'age', 'city']
['John Doe', '31', '"New York"']
['Jane Smith', '29', '"San Francisco"']
['Foo Bar', '40', '"London"']

Exercise 1: header assertion passed.
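
The city values in the output above keep their double quotes because a dialect has exactly one `quotechar`: the sniffer settled on `'`, so `"` is parsed as ordinary data. The following is a minimal sketch of one way to normalise such mixed input, assuming it is acceptable to strip stray double quotes after parsing. The `clean` helper is hypothetical and not part of the `csv` module.

```python
import csv
import io

mixed = "'John Doe';31;\"New York\"\n"

# Only the single quote acts as a quote character here, mirroring the
# dialect detected above; the double quotes are read as ordinary data.
row = next(csv.reader(io.StringIO(mixed), delimiter=';', quotechar="'"))
print(row)

def clean(value):
    """Hypothetical helper: drop one pair of surrounding double quotes."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value

cleaned = [clean(value) for value in row]
print(cleaned)
```

If the producer of the data can be changed, using one quote character consistently is usually simpler than post-processing.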


### Exercise 2 - Define and reuse a custom dialect

You receive log-like data files with the following properties:

- Fields are separated by a vertical bar (`|`).
- Fields may be surrounded by double quotes.
- Inside quoted fields, double quotes are escaped by doubling them (`""`).
- Leading spaces after the delimiter should be ignored.

1. Register a dialect named `logpipe` that captures this format.
2. Use that dialect to read the rows into a list of dictionaries using `csv.DictReader`.
3. Assert that the list has 3 records and that the last record has `level == 'ERROR'`.

Best practice: register dialects once near the start of your program or module.
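
One way to follow that practice is to describe the format as a class and register it once at import time. This is only a sketch under stated assumptions: the class name `LogPipeDialect` and the registry name `logpipe_class` are illustrative, and the class inherits from `csv.excel` so that `quotechar='"'` and `doublequote=True` come from the standard dialect.

```python
import csv
import io

class LogPipeDialect(csv.excel):
    """Hypothetical class-based description of a pipe-delimited format."""
    delimiter = '|'
    skipinitialspace = True
    # quotechar='"' and doublequote=True are inherited from csv.excel.

# Registered once, at module import time.
csv.register_dialect('logpipe_class', LogPipeDialect)

sample = 'a| "x ""y"""|c\n'
row = next(csv.reader(io.StringIO(sample), dialect='logpipe_class'))
print(row)
```

A class keeps all format parameters in one named place, which can be easier to review than keyword arguments scattered across call sites.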

In [2]:
import csv
import io
import textwrap

raw_logs = textwrap.dedent("""\
level | message                    | module
INFO  | "Started worker"           | core
WARNING | "High latency (""db"")" | network
ERROR | "Failed to connect"       | network
""")

csv.register_dialect(
    'logpipe',
    delimiter='|',
    quotechar='"',
    doublequote=True,
    skipinitialspace=True
)

buffer = io.StringIO(raw_logs)

reader = csv.DictReader(buffer, dialect='logpipe')

# The header cells are padded for alignment, so strip the field names
# before they are used as dictionary keys.
reader.fieldnames = [name.strip() for name in reader.fieldnames]

records = list(reader)

for record in records:
    print(record)

assert len(records) == 3, f"Expected 3 records, got {len(records)}"
assert records[-1]["level"].strip() == "ERROR", "Last record is not ERROR"

print("\nExercise 2: assertions passed.")


{'level': 'INFO  ', 'message': 'Started worker           ', 'module': 'core'}
{'level': 'WARNING ', 'message': 'High latency ("db") ', 'module': 'network'}
{'level': 'ERROR ', 'message': 'Failed to connect       ', 'module': 'network'}

Exercise 2: assertions passed.
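
The printed values keep their trailing padding because `skipinitialspace` only removes whitespace that directly follows a delimiter. A minimal sketch of stripping keys and values after parsing follows, assuming padding is never significant in this format and every row has exactly as many fields as the header. The `strip_record` helper is hypothetical.

```python
import csv
import io

raw = 'level | message | module\nERROR | "Failed to connect"   | network\n'

def strip_record(record):
    """Hypothetical helper: strip whitespace from every key and value."""
    return {key.strip(): value.strip() for key, value in record.items()}

reader = csv.DictReader(io.StringIO(raw), delimiter='|', skipinitialspace=True)
records = [strip_record(record) for record in reader]
print(records)
```

With this post-processing step the comparison `record["level"] == "ERROR"` works without an extra `.strip()` at the call site.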


### Exercise 3 - Round-tripping data with a dialect

Suppose you want to ensure that when you write CSV data and then read it back with the same dialect, your data structure is preserved.

1. Register a dialect `semicolon_backslash` with these rules:
   - Delimiter: `;`
   - Quote character: `"`
   - Escape character: `\` (written `'\\'` in Python source)
   - `quoting=csv.QUOTE_MINIMAL`
2. Given a list of rows (as lists of strings), write them to an in-memory buffer using `csv.writer`.
3. Reset the buffer and read the data back using `csv.reader` and the same dialect.
4. Assert that the original and the round-tripped rows are exactly the same.

Hint: use `io.StringIO` to avoid touching the file system.

In [3]:
import csv
import io

csv.register_dialect(
    'semicolon_backslash',
    delimiter=';',
    quotechar='"',
    escapechar='\\',
    quoting=csv.QUOTE_MINIMAL
)

original_rows = [
    ['id', 'name', 'comment'],
    ['1', 'Alice', 'Loves "Python"; uses\\scripts'],
    ['2', 'Bob', 'Enjoys data; hates "bugs"']
]

buffer = io.StringIO()
writer = csv.writer(buffer, dialect='semicolon_backslash')
writer.writerows(original_rows)

buffer.seek(0)

reader = csv.reader(buffer, dialect='semicolon_backslash')
round_tripped_rows = list(reader)

print("Written and read back rows:")
for row in round_tripped_rows:
    print(row)

assert round_tripped_rows == original_rows, "Round-tripped rows do not match the original!"

print("\nExercise 3: round-trip assertion passed.")


Written and read back rows:
['id', 'name', 'comment']
['1', 'Alice', 'Loves "Python"; uses\\scripts']
['2', 'Bob', 'Enjoys data; hates "bugs"']

Exercise 3: round-trip assertion passed.
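
To see why the round trip works, it can help to look at the serialized text itself. The sketch below passes the same format parameters inline instead of through a registered dialect and uses a shortened data set; both choices are assumptions made to keep the block self-contained.

```python
import csv
import io

rows = [
    ['id', 'name', 'comment'],
    ['2', 'Bob', 'Enjoys data; hates "bugs"'],
]

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=';',
    quotechar='"',
    escapechar='\\',
    quoting=csv.QUOTE_MINIMAL,
)
writer.writerows(rows)

# QUOTE_MINIMAL quotes only the field that contains the delimiter or the
# quote character; embedded quotes are doubled because doublequote
# defaults to True.
text = buffer.getvalue()
print(repr(text))
```

Note that each line ends with `\r\n`, the default `lineterminator` of the writer.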


### Exercise 4 - Context manager for temporary dialects

Sometimes you want to temporarily register a dialect for a small block of code, and then automatically clean it up so it does not pollute the global registry.

1. Implement a context manager `temporary_dialect(name, **params)` that:
   - Registers a dialect with the given `name` and parameters on entry.
   - Remembers whether a dialect with that name already existed.
   - Restores the previous dialect (if any) or unregisters the new one on exit.
2. Use the context manager to read a small in-memory CSV with a custom delimiter.
3. After the `with` block, assert that either the old dialect is back or that the new one is no longer registered.

Hint: use `contextlib.contextmanager` and `csv.list_dialects()` / `csv.get_dialect()`.

In [4]:
import csv
import io
import textwrap
from contextlib import contextmanager

@contextmanager
def temporary_dialect(name: str, **params):
    # Remember any dialect that is already registered under this name.
    dialect_existed = name in csv.list_dialects()
    old_dialect = csv.get_dialect(name) if dialect_existed else None

    csv.register_dialect(name, **params)

    try:
        yield csv.get_dialect(name)
    finally:
        if dialect_existed:
            # Registering under an existing name overwrites the entry,
            # which puts the previous dialect back.
            csv.register_dialect(name, old_dialect)
        else:
            csv.unregister_dialect(name)

raw_data = textwrap.dedent("""\
a|b|c
1|2|3
4|5|6
""")

before = set(csv.list_dialects())

with temporary_dialect('temp_pipe', delimiter='|'):
    buffer = io.StringIO(raw_data)
    reader = csv.reader(buffer, dialect='temp_pipe')
    rows = list(reader)
    print("Rows read with temporary dialect:")
    for row in rows:
        print(row)

after = set(csv.list_dialects())

assert before == after, "Dialect registry changed after using temporary_dialect!"

print("\nExercise 4: dialect registry restored correctly.")


Rows read with temporary dialect:
['a', 'b', 'c']
['1', '2', '3']
['4', '5', '6']

Exercise 4: dialect registry restored correctly.
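
The run above only exercises the case where the name was not registered beforehand, and comparing sets of names cannot tell whether an overridden dialect got its old parameters back. A minimal sketch of that second case follows. The function is a compact restatement of the solution so that the block runs on its own, and the dialect name `shared_demo` is illustrative.

```python
import csv
from contextlib import contextmanager

@contextmanager
def temporary_dialect(name, **params):
    # Keep the previous dialect object, if the name is already taken.
    previous = csv.get_dialect(name) if name in csv.list_dialects() else None
    csv.register_dialect(name, **params)
    try:
        yield csv.get_dialect(name)
    finally:
        if previous is not None:
            csv.register_dialect(name, previous)
        else:
            csv.unregister_dialect(name)

# A dialect that already exists before the with-block.
csv.register_dialect('shared_demo', delimiter=',')

with temporary_dialect('shared_demo', delimiter='|') as inner:
    inner_delimiter = inner.delimiter

restored_delimiter = csv.get_dialect('shared_demo').delimiter
print(inner_delimiter, restored_delimiter)

# Clean up the demo entry so the registry is left as it was found.
csv.unregister_dialect('shared_demo')
```

Checking a parameter such as `delimiter` after the block is a stricter test than comparing the registry's names.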


---
All exercises above are self-contained and use only in-memory text. You can experiment by modifying the dialect parameters and observing how the parsing changes.
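
As a starting point for such experiments, the sketch below parses the same line twice and toggles only `skipinitialspace`. The sample line is made up for illustration.

```python
import csv
import io

line = 'a, b, "c d"\n'

# Default settings: the space after each comma is part of the field, so
# the quote in the third field is no longer at the start and stays literal.
default_row = next(csv.reader(io.StringIO(line)))

# With skipinitialspace=True the space is dropped and the quotes are
# recognised as field quoting.
trimmed_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(default_row)
print(trimmed_row)
```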