### Project 5

#### Goal 1

For this goal, you are given a number of CSV files, each of which has the field names in its first row.

Your goal is to create a context manager that you can use to produce the data from each file as named tuples whose field names correspond to the header row field names.

You should use the `csv` module's `reader` function to help with parsing the data.

Your context manager should be generic: it should need only the file name, with no other configuration or hardcoded functionality. You do not need to worry about data types for this goal; just return every field as a string.

In addition, your context manager should produce lazy iterators.

Implement this using a class that implements the context manager protocol.
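
As a reminder of what the context manager protocol involves, here is a minimal sketch that is unrelated to CSV parsing. The `Tracked` class and its `events` list are hypothetical names used only for illustration; the point is that `__enter__` runs on entry, and `__exit__` always runs on exit, even when the body raises.

```python
class Tracked:
    """Hypothetical resource that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this is the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # a falsy return value lets exceptions propagate


resource = Tracked()
try:
    with resource as r:
        r.events.append('body')
        raise RuntimeError('boom')
except RuntimeError:
    resource.events.append('propagated')

print(resource.events)
```

The same shape applies to a file-based context manager: acquire the resource in `__enter__`, release it in `__exit__`.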

#### Goal 2

The goal is to reproduce the work you did in Goal 1, but using a generator function and the `contextlib` `contextmanager` decorator.
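
For orientation, the general shape of a generator-based context manager can be sketched as follows. The `tracked` function and the `events` list are hypothetical and only illustrate the pattern: code before `yield` plays the role of `__enter__`, the yielded value is what `as` binds, and the `finally` block plays the role of `__exit__`.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    events.append(f'enter {name}')      # setup, like __enter__
    try:
        yield name.upper()              # value bound by "as"
    finally:
        events.append(f'exit {name}')   # cleanup, like __exit__


with tracked('demo') as value:
    events.append(f'body {value}')

print(events)
```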

### Notes

The files included with this project are:
* `cars.csv`
* `personal_info.csv`

In addition, you might find the following useful.

##### Reading and rewinding data from a File

The file object supports reading a specified amount of data with the `read` method, and repositioning the "read head" with the `seek` method.

Let's take a look:

In [1]:
with open('cars.csv') as f:
    print('---', f.read(100))
    print('---', f.read(100))
    f.seek(0)
    print('---', f.read(100))
    

--- Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu
--- ;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth 
--- Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu


As you can see, we can read the file by calling the `read` method, which reads data and advances the "read head". We can then rewind and start again, either reading directly or iterating through the rows using the file's iterator:

In [2]:
from itertools import islice

with open('cars.csv') as f:
    print('---', f.read(100))
    print('---', f.read(100))
    print('--------------------')
    print('rewinding to 0...')
    f.seek(0)
    for row in islice(f, 5):
        print(row, end='')

--- Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu
--- ;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth 
--------------------
rewinding to 0...
Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US
AMC Rebel SST;16.0;8;304.0;150.0;3433.;12.0;70;US
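
The same read-then-rewind behavior can be tried without any file on disk. The sketch below uses an `io.StringIO` object as a hypothetical in-memory stand-in for a small CSV file, and also uses `tell` to inspect where the "read head" currently is.

```python
import io

# Hypothetical in-memory stand-in for a small CSV file.
f = io.StringIO('Car;MPG\nBuick Skylark 320;15.0\nFord Torino;17.0\n')

header_part = f.read(7)   # reads 'Car;MPG' and advances the read head
position = f.tell()       # the read head is now at offset 7
f.seek(0)                 # rewind to the start

# Iterating after the rewind starts again from the first line.
rows = [line.rstrip('\n') for line in f]

print(header_part, position)
print(rows)
```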


##### Sniffing the CSV dialect

The dialect of a CSV file refers to some of the specifics used to define data in a CSV file. The separators can differ (for example, some files use a comma, some use a semicolon, some use a tab, etc.).

Also, as we have seen before, a field is sometimes delimited using single quotes, double quotes, or maybe some entirely different character.

When we have to deal with files that may be encoded using different dialects, it can require quite a bit of work to determine what those specifics are. This is where the `Sniffer` class from the `csv` module can be useful. Given a sample of the CSV file, it analyzes the sample and makes a best guess as to the dialect that was used. We can then pass that dialect to the `csv.reader` function.

Let's see how to use it with one of our files: `personal_info.csv`:

In [3]:
import csv

with open('personal_info.csv') as f:
    sample = f.read(2000)
    dialect = csv.Sniffer().sniff(sample)
print(vars(dialect))

{'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n', 'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',', 'quotechar': '"', 'skipinitialspace': False}


We can now use this dialect to open the csv reader:

In [4]:
with open('personal_info.csv') as f:
    reader = csv.reader(f, dialect)
    for row in islice(reader, 5):
        print(row)

['ssn', 'first_name', 'last_name', 'gender', 'language']
['100-53-9824', 'Sebastiano', 'Tester', 'Male', 'Icelandic']
['101-71-4702', 'Cayla', 'MacDonagh', 'Female', 'Lao']
['101-84-0356', 'Nomi', 'Lipprose', 'Female', 'Yiddish']
['104-22-0928', 'Justinian', 'Kunzelmann', 'Male', 'Dhivehi']
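
To see the sniffer distinguish between dialects, here is a small sketch that runs it on two in-memory samples. The sample strings are copied from the first rows of the two project files shown above; the `parse` helper is a hypothetical name introduced for this illustration.

```python
import csv
import io

# First rows of each project file, held in memory for the demonstration.
cars_sample = (
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
    'Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n'
)
people_sample = (
    'ssn,first_name,last_name,gender,language\n'
    '100-53-9824,Sebastiano,Tester,Male,Icelandic\n'
    '101-71-4702,Cayla,MacDonagh,Female,Lao\n'
)


def parse(text):
    """Sniff the dialect of text, then parse it with that dialect."""
    dialect = csv.Sniffer().sniff(text)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


cars_delim, cars_rows = parse(cars_sample)
people_delim, people_rows = parse(people_sample)

print(repr(cars_delim), cars_rows[1][:2])
print(repr(people_delim), people_rows[1][:2])
```

Because the sniffer only makes a best guess from the sample, very short or unusual samples may not be detected reliably.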


In [5]:
# Goal 1

In [6]:
import csv
from collections import namedtuple

In [7]:
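class FileParser:
    def __init__(self, f_name):
        self.f_name = f_name

    def __enter__(self):
        # Keep the file object on the instance so __exit__ can close it.
        self._f = open(self.f_name)
        # Sniff the dialect from a sample, then rewind to the start.
        dialect = csv.Sniffer().sniff(self._f.read(1024))
        self._f.seek(0)
        self.csvreader = csv.reader(self._f, dialect)
        # Normalize the header names, e.g. 'First Name' -> 'first_name'.
        fieldnames = ['_'.join(fieldname.lower().split()) for fieldname in next(self.csvreader)]
        self.Record = namedtuple('Record', fieldnames, rename=True)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._f.close()
        return False  # do not suppress exceptions

    def __iter__(self):
        return self

    def __next__(self):
        # Parse lazily, one row per call.
        return self.Record(*next(self.csvreader))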
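
The field-name handling in `__enter__` can be examined in isolation. In the sketch below the `header` list is hypothetical and chosen to include awkward cases: a name containing spaces, a Python keyword, and a duplicate. With `rename=True`, `namedtuple` replaces names it cannot use with positional names instead of raising an error.

```python
from collections import namedtuple

# Hypothetical header row with a multi-word name, a keyword, and a duplicate.
header = ['Car', 'Miles Per Gallon', 'class', 'Car']

# Same normalization as in the class: lowercase, whitespace -> underscore.
fieldnames = ['_'.join(name.lower().split()) for name in header]

# rename=True swaps invalid or repeated names for '_<position>'.
Record = namedtuple('Record', fieldnames, rename=True)

print(fieldnames)
print(Record._fields)
```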

In [8]:
from itertools import islice

with FileParser('cars.csv') as data:
    for row in islice(data, 10):
        print(row)

Record(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Record(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Record(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Record(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Record(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')
Record(car='Ford Galaxie 500', mpg='15.0', cylinders='8', displacement='429.0', horsepower='198.0', weight='4341.', acceleration='10.0', model='70', origin='US')
Record(car='Chevrolet Im

In [9]:
with FileParser('personal_info.csv') as data:
    for row in data:
        print(row)

Record(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Record(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Record(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Record(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Record(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')
Record(ssn='105-27-5541', first_name='Federico', last_name='Aggett', gender='Male', language='Chinese')
Record(ssn='105-85-7486', first_name='Angelina', last_name='McAvey', gender='Female', language='Punjabi')
Record(ssn='105-91-5022', first_name='Moselle', last_name='Apfel', gender='Female', language='Latvian')
Record(ssn='105-91-7777', first_name='Audi', last_name='Roach', gender='Female', language='Estonian')
Record(ssn='106-35-1938', first_name='Mackenzie', las

Record(ssn='190-53-4517', first_name='Hadlee', last_name='Rohan', gender='Male', language='Swahili')
Record(ssn='191-15-3194', first_name='Calhoun', last_name='Mugford', gender='Male', language='Thai')
Record(ssn='192-54-0695', first_name='Trev', last_name='Ruppeli', gender='Male', language='Maori')
Record(ssn='192-67-3201', first_name='Edvard', last_name='Cattel', gender='Male', language='Hebrew')
Record(ssn='194-76-5816', first_name='Jacquelynn', last_name='Shurey', gender='Female', language='Amharic')
Record(ssn='195-23-5731', first_name='Talyah', last_name='Canny', gender='Female', language='Portuguese')
Record(ssn='196-69-3796', first_name='Marijo', last_name='Tester', gender='Female', language='Oriya')
Record(ssn='198-70-6443', first_name='Judah', last_name='Duggan', gender='Male', language='Macedonian')
Record(ssn='199-45-0804', first_name='Consolata', last_name='Pigram', gender='Female', language='Guarana')
Record(ssn='199-54-4639', first_name='Ellene', last_name='Dowrey', gend

In [10]:
# Goal 2

In [11]:
from contextlib import contextmanager

In [12]:
@contextmanager
def parsed_data(f_name):
    f = open(f_name)
    try:
        dialect = csv.Sniffer().sniff(f.read(1024))
        f.seek(0)
        csvreader = csv.reader(f, dialect)
        fieldnames = ['_'.join(fieldname.lower().split()) for fieldname in next(csvreader)]
        Record = namedtuple('Record', fieldnames, rename=True)
        yield (Record(*row) for row in csvreader)
    finally:
        f.close()
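
One consequence of yielding a lazy generator is that rows can only be consumed while the `with` block is still open. The sketch below illustrates this with a temporary file. It uses a simplified, hypothetical `parsed_rows` function that assumes a fixed `;` delimiter rather than sniffing, so it is not a replacement for `parsed_data`.

```python
import csv
import os
import tempfile
from collections import namedtuple
from contextlib import contextmanager


@contextmanager
def parsed_rows(f_name):
    # Simplified variant for illustration: the ';' delimiter is an assumption.
    f = open(f_name, newline='')
    try:
        reader = csv.reader(f, delimiter=';')
        Record = namedtuple('Record', next(reader), rename=True)
        yield (Record(*row) for row in reader)
    finally:
        f.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    with open(path, 'w', newline='') as out:
        out.write('car;mpg\nBuick;15.0\nFord;17.0\n')

    with parsed_rows(path) as data:
        first = next(data)   # works: the file is still open

    try:
        next(data)           # the file was closed when the block exited
        error = None
    except ValueError as exc:
        error = exc

print(first)
print(type(error).__name__)
```

If the rows are needed after the block exits, materialize them inside it, for example with `list(data)`, at the cost of losing laziness.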

In [13]:
f_names = 'cars.csv', 'personal_info.csv'
for f_name in f_names:
    with parsed_data(f_name) as data:
        for row in islice(data, 5):
            print(row)
    print('-------------------')

Record(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Record(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Record(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Record(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Record(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')
-------------------
Record(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Record(ssn='101-71-4702', first_name='Cayla', last_name='M

In [14]:
for f_name in f_names:
    with parsed_data(f_name) as data:
        for row in data:
            print(row)
    print('-------------------')

Record(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Record(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Record(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Record(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Record(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')
Record(car='Ford Galaxie 500', mpg='15.0', cylinders='8', displacement='429.0', horsepower='198.0', weight='4341.', acceleration='10.0', model='70', origin='US')
Record(car='Chevrolet Im

Record(ssn='418-72-1888', first_name='Roddy', last_name='Worman', gender='Male', language='Bosnian')
Record(ssn='418-99-6659', first_name='Grenville', last_name='Gulliver', gender='Male', language='Ndebele')
Record(ssn='420-46-7789', first_name='Steffie', last_name='Britto', gender='Female', language='New Zealand Sign Language')
Record(ssn='420-52-3813', first_name='Arnaldo', last_name='Balsdone', gender='Male', language='Zulu')
Record(ssn='421-01-2511', first_name='Mikaela', last_name='Andrivot', gender='Female', language='English')
Record(ssn='421-13-9794', first_name='Minne', last_name='Measey', gender='Female', language='Danish')
Record(ssn='421-20-3448', first_name='Alie', last_name='Gerauld', gender='Female', language='Khmer')
Record(ssn='421-88-7316', first_name='Cherie', last_name='Yeudall', gender='Female', language='Dutch')
Record(ssn='422-67-4000', first_name='Corby', last_name='Boyington', gender='Male', language='Sotho')
Record(ssn='423-62-1820', first_name='Ruddy', last_n

Record(ssn='705-15-6989', first_name='Minta', last_name='Beecroft', gender='Female', language='Maltese')
Record(ssn='705-98-0229', first_name='Vic', last_name='Elsley', gender='Male', language='Czech')
Record(ssn='709-73-6600', first_name='Stillman', last_name='Fairbrass', gender='Male', language='German')
Record(ssn='714-55-2111', first_name='Elysee', last_name='Bardsley', gender='Female', language='Lao')
Record(ssn='715-89-3174', first_name='Nicholas', last_name='Crack', gender='Male', language='Greek')
Record(ssn='717-91-7919', first_name='Ches', last_name='Rablan', gender='Male', language='Thai')
Record(ssn='718-89-7490', first_name='Jessamyn', last_name='Hamill', gender='Female', language='Icelandic')
Record(ssn='718-91-6110', first_name='Emory', last_name='Ambrus', gender='Male', language='Papiamento')
Record(ssn='719-05-6621', first_name='Susanne', last_name='Dilger', gender='Female', language='Punjabi')
Record(ssn='719-96-8751', first_name='Sallie', last_name='Gosnold', gender=