# Structural pattern matching for data science

An exploration of potential use cases for structural pattern matching, a feature introduced in Python 3.10, in data science code. Time will tell whether these or other use cases will be adopted by the community. Stay tuned for updates here.

## Matching a string to parse malformed CSV

Messy data is everywhere. Let's say we have a string, `bad_csv`, containing malformed CSV that could, for example, have been read from a file and looks like this:

In [1]:
bad_csv = """
0,1,2
1,2,3
1,2
0
1
"""

We now want to convert `bad_csv` to a rectangular list of lists according to the following rules:

- keep lines with three values
- for lines with two values only, add a `None`
- skip empty lines
- for lines with one value only that is a `1`, replace the line with the integers 1, 2 and 3
- for lines with one value only that is not a `1`, add `None` and `None`

This can easily be translated into a structural pattern matching (`match`) statement:

In [2]:
values = []
for line in bad_csv.split("\n"):
    match line.split(","):
        case [x, y, z]:
            values.append([x, y, z])
        case [x, y]:
            values.append([x, y, None])
        case [""]:
            continue
        case ["1"]:
            values.append([1, 2, 3])
        case [x]:
            values.append([x, None, None])
        case _:  # matches if nothing above matches, e.g. lines with more than three values
            raise Exception("This should not happen. We want to handle every case explicitly.")

Implementing the above with if-else blocks would require multiple comparisons against `len(line.split(","))` and probably nested if-blocks. This would be harder to understand at first glance, especially if the parsing rules became more complicated.
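For comparison, here is a minimal sketch of what such an if-else version could look like. The name `values_if_else` is only used for illustration, and `bad_csv` is repeated so that the snippet runs on its own. It uses `ValueError` for the unexpected case, which is a choice made for this sketch:

```python
bad_csv = """
0,1,2
1,2,3
1,2
0
1
"""

values_if_else = []
for line in bad_csv.split("\n"):
    fields = line.split(",")
    if len(fields) == 3:
        # keep lines with three values
        values_if_else.append(fields)
    elif len(fields) == 2:
        # pad lines with two values
        values_if_else.append([fields[0], fields[1], None])
    elif len(fields) == 1:
        # the single-value rules need a nested block
        if fields[0] == "":
            continue  # skip empty lines
        elif fields[0] == "1":
            values_if_else.append([1, 2, 3])
        else:
            values_if_else.append([fields[0], None, None])
    else:
        raise ValueError(f"Unexpected number of values in line: {line!r}")
```

The result is the same as with the `match` version, but the shape of each accepted line has to be reconstructed from the length checks and index accesses instead of being visible in the pattern itself.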

In [3]:
values

[['0', '1', '2'],
 ['1', '2', '3'],
 ['1', '2', None],
 ['0', None, None],
 [1, 2, 3]]

Note that the order of the cases matters: the first case that matches wins, and there is no fall-through once a case matches.
Compare:

In [4]:
match ["a"]:
    case [x]:
        print("x")
    case ["a"]:
        print("a")

x


to this:

In [5]:
match ["a"]:
    case ["a"]:
        print("a")
    case [x]:
        print("x")

a


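Before going all-in on structural pattern matching, note that at the time of writing the popular code formatter **black did not support reformatting structural pattern matching and failed with an error when encountering such a construct**. Later releases of black have added support for `match` statements, so check the changelog of the version you use.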

## Matching a dataclass

In [6]:
from dataclasses import dataclass

# TODO

## TODO further use cases

- matching a REST response
- matching a machine learning model
- matching a pandas DataFrame

## Further reading

[PEP 636 -- Structural Pattern Matching: Tutorial](https://www.python.org/dev/peps/pep-0636/)