### Project 5

#### Project Setup

In this project you are provided two CSV files:
- cars.csv
- personal_info.csv

The first row contains the field names

The basic goal will be to create a context manager that only requires the file name and provides us an iterator we can use to iterate over the data in those files

The iterator should yield named tuples with field names based on the header row in the CSV file

For simplicity, we assume all fields are just strings
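
As a minimal sketch of the idea, a header row can be turned into a named tuple class and each data row unpacked into it. The `header` and `row` lists below are hypothetical stand-ins for rows that have already been split into fields (the values are taken from personal_info.csv), and the name `Person` is just an illustrative choice

```python
from collections import namedtuple

# Hypothetical header and data rows, as they might look once split into fields
header = ['ssn', 'first_name', 'last_name']
row = ['100-53-9824', 'Sebastiano', 'Tester']

# Build the named tuple class from the header, then unpack a row into it
Person = namedtuple('Person', header)
person = Person(*row)

print(person.first_name)
print(person)
```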

#### Goal 1

For this goal, implement the context manager using a context manager class

ie, a class that implements the context manager protocol

In [None]:
# the context manager protocol consists of two methods:
# __enter__ and __exit__

Make sure that your iterator uses lazy evaluation

If you can, try to create a single class that implements both the **context manager** protocol and the **iterator** protocol
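
To make the two protocols concrete, here is a toy sketch of one class that implements both. It is not a CSV solution: the class name `Resource` and its list-based "resource" are assumptions made purely for illustration

```python
class Resource:
    """Toy class implementing both the context manager and iterator protocols."""

    def __init__(self, items):
        self._items = items
        self._iter = None

    def __enter__(self):
        # "Acquire" the resource: here, just create an iterator over the items
        self._iter = iter(self._items)
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # "Release" the resource; returning False lets exceptions propagate
        self._iter = None
        return False

    def __iter__(self):
        return self

    def __next__(self):
        # Once the context has exited there is nothing left to iterate
        if self._iter is None:
            raise StopIteration
        return next(self._iter)


with Resource(['a', 'b', 'c']) as res:
    collected = list(res)

after_exit = list(res)  # the context has exited, so this is empty
print(collected, after_exit)
```

Items are only pulled from the underlying iterator when `__next__` is called, which is the lazy behavior asked for above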

#### Goal 2

For this goal, re-implement what you did in Goal 1, but using a generator function instead

You'll have to use the @contextmanager decorator from the contextlib module
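
As a reminder of how the decorator behaves, here is a small sketch: the code before the `yield` plays the role of `__enter__`, the yielded value is what `as` binds to, and the `finally` block plays the role of `__exit__`. The `managed` function and the `events` list are hypothetical names used only to record the order of execution

```python
from contextlib import contextmanager

events = []


@contextmanager
def managed(name):
    events.append(f'enter {name}')     # runs when the with block is entered
    try:
        yield name.upper()             # value bound by "as"
    finally:
        events.append(f'exit {name}')  # runs when the with block exits


with managed('demo') as value:
    events.append(f'body {value}')

print(events)
```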

##### Tips

File objects implement the iterator protocol:

In [None]:
with open(f_name) as f:
    for row in f:
        print(row)

But file objects also support reading data directly using the read method. We specify how many characters to read (which can span multiple rows)

When we do this, a "read head" is maintained, and we can reposition this read head using seek()

In [None]:
with open(f_name) as f:
    print(f.read(100)) # reads the first 100 characters, read head is now at 100 (read head starts at 0)
    print(f.read(100)) # reads the next 100 characters, read head is now at 200
    f.seek(0) # the read head is now at 0 (moves the read head back to the beginning of the file)
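
The same read head behavior can be tried without a file on disk. This sketch assumes an in-memory `io.StringIO` object as a stand-in for an open text file, which supports the same `read`, `tell` and `seek` methods

```python
import io

# In-memory text stream standing in for an open text file
f = io.StringIO('abcdefghij')

first = f.read(4)    # reads 'abcd', read head moves to 4
second = f.read(4)   # reads 'efgh', read head moves to 8
position = f.tell()  # current position of the read head
f.seek(0)            # move the read head back to the start
again = f.read(4)    # reads 'abcd' a second time

print(first, second, position, again)
```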

CSV files can be read using csv.reader, but CSV files can be written in different "styles", called dialects, e.g.

john,cleese,42 vs john;cleese;42 vs john|cleese|42 vs john\tcleese\t42 vs "john","cleese","42" vs 'john';'cleese';'42' etc...

The csv module has a *Sniffer* class we can use to auto-determine the specific dialect
- need to provide it a sample of the CSV file

i.e.:

In [None]:
import csv

with open(f_name) as f:
    sample = f.read(2000)  # sample used to guess the dialect
    dialect = csv.Sniffer().sniff(sample)

with open(f_name) as f:  # reopen so the reader starts at the first row
    reader = csv.reader(f, dialect)

This way we can be a little more generic with respect to the types of CSV files we can handle
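
A self-contained sketch of sniffing, assuming an in-memory sample instead of a file. The `sample` string below is a hypothetical stand-in built from the first rows of cars.csv

```python
import csv
import io

# Hypothetical in-memory sample mirroring the first rows of cars.csv
sample = (
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
    'Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US\n'
    'Plymouth Satellite;18.0;8;318.0;150.0;3436.;11.0;70;US\n'
)

# Let the Sniffer guess the dialect, then parse the same text with it
dialect = csv.Sniffer().sniff(sample)
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))
print(rows[1])
```

Sniffing is a heuristic, so it may guess wrong on very small or unusual samples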

#### Goal 1 Solution

In [2]:
f_names = 'cars.csv', 'personal_info.csv'

In [3]:
for f_name in f_names:
    with open(f_name) as f:
        print(next(f), end='')
        print(next(f), end='')
    print('\n-------------------------------------')

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US

-------------------------------------
ssn,first_name,last_name,gender,language
100-53-9824,Sebastiano,Tester,Male,Icelandic

-------------------------------------


In [4]:
import csv
from itertools import islice

with open(f_names[0]) as f:
    dialect = csv.Sniffer().sniff(f.read(1000))
print(vars(dialect))

{'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n', 'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ';', 'quotechar': '"', 'skipinitialspace': False}


In [5]:
with open(f_names[1]) as f:
    dialect = csv.Sniffer().sniff(f.read(1000))
print(vars(dialect))

{'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n', 'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',', 'quotechar': '"', 'skipinitialspace': False}


In [6]:
def get_dialect(f_name):
    with open(f_name) as f:
        return csv.Sniffer().sniff(f.read(1000))

In [7]:
from collections import namedtuple

In [11]:
class FileParser:
    def __init__(self, f_name):
        self.f_name = f_name
        
    def __enter__(self):
        self._f = open(self.f_name, 'r')
        self._reader = csv.reader(self._f, get_dialect(self.f_name))
        headers = map(lambda s: s.lower(), next(self._reader))
        self._nt = namedtuple('Data', headers)
        return self
    
    def __exit__(self, exc_type, exc_value, exc_tb):
        self._f.close()
        return False
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._f.closed:
            raise StopIteration
        else:
            return self._nt(*next(self._reader))

In [12]:
with FileParser('cars.csv') as data:
    for row in islice(data, 10):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')
Data(car='Ford Galaxie 500', mpg='15.0', cylinders='8', displacement='429.0', horsepower='198.0', weight='4341.', acceleration='10.0', model='70', origin='US')
Data(car='Chevrolet Impala', mpg='14

In [14]:
with FileParser('personal_info.csv') as data:
    for row in islice(data, 10):
        print(row)

Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')
Data(ssn='105-27-5541', first_name='Federico', last_name='Aggett', gender='Male', language='Chinese')
Data(ssn='105-85-7486', first_name='Angelina', last_name='McAvey', gender='Female', language='Punjabi')
Data(ssn='105-91-5022', first_name='Moselle', last_name='Apfel', gender='Female', language='Latvian')
Data(ssn='105-91-7777', first_name='Audi', last_name='Roach', gender='Female', language='Estonian')
Data(ssn='106-35-1938', first_name='Mackenzie', last_name='Nussey', gen

Goal 1 achieved! (Also, this is dynamic enough to work with just about any csv file)
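
One caveat on "just about any": namedtuple only accepts field names that are valid Python identifiers, so a header containing a space or a keyword would raise a ValueError. A possible workaround, sketched here with a hypothetical header row, is the `rename=True` argument of namedtuple, which swaps invalid names for positional ones

```python
from collections import namedtuple

# Hypothetical header row: a name with a space and a Python keyword are both invalid
headers = ['first name', 'class', 'age']

try:
    namedtuple('Data', headers)
    failed = False
except ValueError:
    failed = True

# rename=True replaces each invalid name with _<position>
Data = namedtuple('Data', headers, rename=True)

print(failed)
print(Data._fields)
```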

#### Goal 2 Solution

In [17]:
def parsed_data_iter(data_iter, nt):
    for row in data_iter:
        yield nt(*row)

In [16]:
from contextlib import contextmanager

In [20]:
@contextmanager
def parsed_data(f_name):
    f = open(f_name, 'r')
    try:
        reader = csv.reader(f, get_dialect(f_name))
        headers = map(lambda s: s.lower(), next(reader))
        nt = namedtuple('Data', headers)
        yield parsed_data_iter(reader, nt)
    finally:
        f.close()

In [21]:
with parsed_data('personal_info.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')


In [22]:
with parsed_data('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [23]:
@contextmanager
def parsed_data(f_name):
    def get_dialect(f_name):
        with open(f_name) as f:
            return csv.Sniffer().sniff(f.read(1000))


    def parsed_data_iter(data_iter, nt):
        for row in data_iter:
            yield nt(*row)
            
            
    f = open(f_name, 'r')
    try:
        reader = csv.reader(f, get_dialect(f_name))
        headers = map(lambda s: s.lower(), next(reader))
        nt = namedtuple('Data', headers)
        yield parsed_data_iter(reader, nt)
    finally:
        f.close()

In [24]:
with parsed_data('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [25]:
@contextmanager
def parsed_data(f_name):
    def get_dialect(f_name):
        with open(f_name) as f:
            return csv.Sniffer().sniff(f.read(1000))

        
    f = open(f_name, 'r')
    try:
        reader = csv.reader(f, get_dialect(f_name))
        headers = map(lambda s: s.lower(), next(reader))
        nt = namedtuple('Data', headers)
        yield (nt(*row) for row in reader)
    finally:
        f.close()

In [26]:
with parsed_data('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [27]:
@contextmanager
def parsed_data(f_name):
    def get_dialect():
        with open(f_name) as f:
            return csv.Sniffer().sniff(f.read(1000))

        
    f = open(f_name, 'r')
    try:
        reader = csv.reader(f, get_dialect())
        headers = map(lambda s: s.lower(), next(reader))
        nt = namedtuple('Data', headers)
        yield (nt(*row) for row in reader)
    finally:
        f.close()

In [28]:
with parsed_data('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [31]:
@contextmanager
def parsed_data(f_name):      
    f = open(f_name, 'r')
    try:
        dialect = csv.Sniffer().sniff(f.read(1000))
        f.seek(0)
        reader = csv.reader(f, dialect)
        headers = map(lambda s: s.lower(), next(reader))
        nt = namedtuple('Data', headers)
        yield (nt(*row) for row in reader)
    finally:
        f.close()

In [32]:
with parsed_data('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [33]:
with parsed_data('personal_info.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')
