### Dialects

We have seen that CSV files can be formatted in different ways - from the field separator and the field delimiter (the quote character) that are used, to whether a field delimiter appearing inside a field is doubled up or escaped.

These settings, grouped together, form a **dialect**.

In fact, Python has a few dialects pre-defined for us:

In [1]:
import csv

In [2]:
csv.list_dialects()

['excel', 'excel-tab', 'unix']
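
To see what these pre-defined dialects actually contain, we can look each one up by name. The following is a minimal sketch using `csv.get_dialect` from the standard library; it prints only a handful of the available settings:

```python
import csv

# Look up each pre-defined dialect by name and show a few of its settings.
for name in ('excel', 'excel-tab', 'unix'):
    dialect = csv.get_dialect(name)
    print(
        name,
        repr(dialect.delimiter),
        repr(dialect.quotechar),
        repr(dialect.lineterminator),
    )
```

The `repr` calls are there so that characters such as tabs and line terminators show up as visible escape sequences.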

You can also create your own dialects. This is useful when you need to customize several of these parameters and want to re-use the same settings in multiple places in your code, so that you don't have to respecify them every time you load a similarly formatted CSV file.

For example, consider this file:

In [3]:
with open('actors.pdv') as f:
    for row in f:
        print(row.strip())

FIRST_NAME| LAST_NAME| DOB| SKETCHES
John|Cleese| 10/27/39| 'The Cheese Shop, Ministry of Silly Walks, It\'s the Arts'
Eric| Idle| 3/29/43| 'The Cheese Shop, Nudge Nudge, "Spam"'
Peter| 'O\'Toole'| 8/2/32| Lawrence of Arabia


Notice a few things:
1. the fields are separated by pipe characters (`|`)
2. most separators are immediately followed by white space
3. single quotes are used to optionally delimit fields
4. single quotes inside a field are escaped, not by doubling up the single quote, but by prefixing it with a `\`

So, we'll need to customize our settings for importing such a file.

Before we do that, we need to see how to create a string literal that contains an actual `\` character. Remember that in Python, a `\` in a string literal "escapes" the next character, i.e. gives it a special meaning:

In [4]:
print("1\t2\t3\t4")

1	2	3	4


As you can see, `\t` is interpreted as a tab character. So we cannot just use `\` by itself in a string literal. To define an actual `\` character, we use `\\`:

In [5]:
print('\\')

\
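
As a quick check that `'\\'` really is a single character, here is a small sketch. The variable name `backslash` is just for illustration, and the raw-string comparison is an extra aside:

```python
backslash = '\\'

# Two characters in the source code, but only one character in the string.
print(len(backslash))        # 1
print(backslash == chr(92))  # True: 92 is the code point of the backslash

# A raw string literal (r prefix) turns off escape processing,
# so r'\t' is a backslash followed by the letter t.
print(len('\t'), len(r'\t'))  # 1 2
```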


Ok, now that we have that covered, let's go ahead and define all the settings we need in order to parse that file:

In [6]:
with open('actors.pdv', newline='') as f:  # newline='' is what the csv docs recommend
    reader = csv.reader(
        f,
        delimiter='|',
        quotechar="'",
        skipinitialspace=True,
        escapechar="\\"
    )
    for row in reader:
        print(row)

['FIRST_NAME', 'LAST_NAME', 'DOB', 'SKETCHES']
['John', 'Cleese', '10/27/39', "The Cheese Shop, Ministry of Silly Walks, It's the Arts"]
['Eric', 'Idle', '3/29/43', 'The Cheese Shop, Nudge Nudge, "Spam"']
['Peter', "O'Toole", '8/2/32', 'Lawrence of Arabia']
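
To see why `skipinitialspace` matters for this format, we can parse the same line with and without it. This is a small sketch, assuming a made-up sample line in the style of `actors.pdv`; the names `lines` and `settings` are only for illustration. Note that `csv.reader` accepts any iterable of strings, not just a file:

```python
import csv

# Hypothetical sample line in the same style as actors.pdv.
lines = ["Eric| 'Idle'| 3/29/43"]

# Settings shared by both readers.
settings = dict(delimiter='|', quotechar="'", escapechar='\\')

with_skip = list(csv.reader(lines, skipinitialspace=True, **settings))
without_skip = list(csv.reader(lines, **settings))

print(with_skip)     # [['Eric', 'Idle', '3/29/43']]
print(without_skip)  # [['Eric', " 'Idle'", ' 3/29/43']]
```

Without `skipinitialspace`, the leading space becomes part of the field, so the single quote is no longer the first character of the field and is treated as ordinary data instead of a field delimiter.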


Now if we have to parse another file with the same format, we are going to have to re-type all those arguments, again and again.

Instead, we can store all these settings in a dialect once, somewhere at the start of our program, and just specify that dialect when we need it.

To create a new dialect, we **register** it under a name of our choosing, as follows:

In [7]:
csv.register_dialect(
    'pdv',
    delimiter='|',
    quotechar="'",
    skipinitialspace=True,
    escapechar="\\"
)

And we can see our dialect has been registered:

In [8]:
csv.list_dialects()

['excel', 'excel-tab', 'unix', 'pdv']
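
We can also look at what was stored under that name. The sketch below repeats the registration so that it runs on its own, then inspects the result with `csv.get_dialect`; settings we did not specify should simply keep the module defaults:

```python
import csv

# Same registration as above, repeated so this snippet is self-contained.
csv.register_dialect(
    'pdv',
    delimiter='|',
    quotechar="'",
    skipinitialspace=True,
    escapechar='\\',
)

pdv = csv.get_dialect('pdv')

# The settings we specified...
print(repr(pdv.delimiter), repr(pdv.quotechar), repr(pdv.escapechar))
print(pdv.skipinitialspace)

# ...and a couple we left alone, which keep their default values.
print(pdv.doublequote, repr(pdv.lineterminator))
```

A dialect that is no longer needed can be removed again with `csv.unregister_dialect`.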

And now we can use that dialect instead of specifying the parsing parameters individually:

In [9]:
with open('actors.pdv', newline='') as f:  # newline='' is what the csv docs recommend
    reader = csv.reader(f, dialect='pdv')
    for row in reader:
        print(row)

['FIRST_NAME', 'LAST_NAME', 'DOB', 'SKETCHES']
['John', 'Cleese', '10/27/39', "The Cheese Shop, Ministry of Silly Walks, It's the Arts"]
['Eric', 'Idle', '3/29/43', 'The Cheese Shop, Nudge Nudge, "Spam"']
['Peter', "O'Toole", '8/2/32', 'Lawrence of Arabia']
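
As an alternative to passing keyword arguments, `csv.register_dialect` also accepts a dialect class. Here is a minimal sketch, assuming we are happy to inherit every unspecified setting from `csv.excel`. The class name `PipeDelimited`, the dialect name `'pdv_class'`, and the sample line are all made up for illustration:

```python
import csv


class PipeDelimited(csv.excel):
    """Hypothetical dialect class; unspecified settings come from csv.excel."""
    delimiter = '|'
    quotechar = "'"
    skipinitialspace = True
    escapechar = '\\'


# Register under a separate (made-up) name so it does not replace 'pdv'.
csv.register_dialect('pdv_class', PipeDelimited)

# Sample line in the style of actors.pdv; the r prefix keeps the backslash.
lines = [r"Peter| 'O\'Toole'| 8/2/32| Lawrence of Arabia"]

rows = list(csv.reader(lines, dialect='pdv_class'))
print(rows)  # [['Peter', "O'Toole", '8/2/32', 'Lawrence of Arabia']]
```

A class like this can live in its own module and be imported wherever it is needed, which may be convenient when several programs share the same file format.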
