### Project

For this project you are given a file that contains some parking ticket violations for NYC.

(It's just a tiny extract!)

If you're wondering where I get these data sets, Kaggle is an **excellent** source of data sets on a wide variety of topics:
https://www.kaggle.com/

You have to sign up, but it's free.

If you want the full data set, it's available here: https://www.kaggle.com/new-york-city/nyc-parking-tickets/version/2#

For this sample data set, the file is named: 
```
nyc_parking_tickets_extract.csv
```

Your goals are as follows:

##### Goal 1
Create a lazy iterator that returns a named tuple of the data in each row. The data types should be appropriate: if a column holds dates, store dates in the named tuple; if a field is an integer, store it as an integer; and so on.

##### Goal 2

Calculate the number of violations by car make.

##### Note:
Try to use lazy evaluation as much as possible. It may not always be possible, though! That's OK, as long as eager evaluation is kept to a minimum.
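
As a rough illustration of the note above, here is a minimal sketch of lazy versus eager reading. The function name `read_lines_lazily` and the in-memory sample data are made up for this example; they are not part of the project file.

```python
import io


def read_lines_lazily(file_obj):
    """Yield one stripped line at a time; nothing is read until requested."""
    for line in file_obj:
        yield line.rstrip('\n')


# An in-memory stand-in for a real file (hypothetical sample data).
sample = io.StringIO("a,b\n1,2\n3,4\n")

lines = read_lines_lazily(sample)
first = next(lines)  # only the first line has been consumed so far
rest = list(lines)   # calling list() is the eager step: it reads everything left
```

The generator does no work until `next` is called, so the whole file never has to sit in memory unless something like `list` forces it.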

In [1]:
from collections import namedtuple, Counter
from datetime import datetime

In [2]:
file_name = 'nyc_parking_tickets_extract.csv'

In [3]:
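def determine_dtype(string):
    """Return a parser for the value: int, float, date (MM/DD/YYYY), else str."""
    try:
        int(string)
        return int
    except ValueError:
        try:
            float(string)
            return float
        except ValueError:
            try:
                datetime.strptime(string, '%m/%d/%Y')
                # Return a converter that yields a date rather than a datetime.
                return lambda x: datetime.strptime(x, '%m/%d/%Y').date()
            except ValueError:
                return str


def safe_parse(parser, val_str):
    """Apply the parser to val_str; return None if the value does not fit."""
    try:
        return parser(val_str)
    except ValueError:
        return None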
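
The cell above infers each column's type from a single sample value, so a later value that does not fit the inferred type comes back as `None`. The sketch below restates the same idea in a compact, self-contained form to show that behaviour. The names `parse_date` and `infer_parser` and the all-digit plate are hypothetical, chosen only for this illustration.

```python
from datetime import date, datetime


def parse_date(text):
    """Parse an MM/DD/YYYY string into a date object."""
    return datetime.strptime(text, '%m/%d/%Y').date()


def infer_parser(text):
    """Return the first of int, float, parse_date that accepts text, else str."""
    for candidate in (int, float, parse_date):
        try:
            candidate(text)
            return candidate
        except ValueError:
            continue
    return str


def safe_parse(parser, text):
    """Apply parser to text, returning None when the value does not fit."""
    try:
        return parser(text)
    except ValueError:
        return None


# A plate made only of digits (hypothetical) would be inferred as an int column...
plate_parser = infer_parser('1234567')
# ...so a later alphanumeric plate in that column cannot be parsed.
mixed = safe_parse(plate_parser, 'VAD7274')
```

If that matters for a given data set, one option is to sample more than one row before settling on the parsers, at the cost of a little laziness.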

In [4]:
def parsed_data(file_name):
    """Lazily yield one Ticket named tuple per data row of the CSV file."""
    with open(file_name) as f:
        # Field names: lower-cased header names with whitespace replaced by underscores.
        fieldnames = ['_'.join(fieldname.split()) for fieldname in f.readline().lower().split(',')]
        Ticket = namedtuple('Ticket', fieldnames, rename=True)
        # Infer one parser per column from the first data row.
        first_row_values_str = [value.strip() for value in f.readline().split(',')]
        dtype_parsers = [determine_dtype(string) for string in first_row_values_str]
        first_row_parsed = Ticket(*[safe_parse(parser, val_str) for val_str, parser in zip(first_row_values_str, dtype_parsers)])
        yield first_row_parsed
        # Iterate over the file object directly so rows are read lazily, one at a time.
        for row in f:
            row_values_str = [value.strip() for value in row.split(',')]
            yield Ticket(*[safe_parse(parser, val_str) for val_str, parser in zip(row_values_str, dtype_parsers)])
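
Splitting each line on `','` works as long as no field contains a comma. Whether the extract has such fields is not something I have checked, so treat the following as a sketch of an alternative, not a required change: the standard library's `csv.reader` handles quoted fields and is still lazy. The function name `csv_rows`, the `Row` tuple, and the sample data are invented for this example.

```python
import csv
import io
from collections import namedtuple


def csv_rows(file_obj):
    """Lazily yield one named tuple of stripped strings per CSV data row."""
    reader = csv.reader(file_obj)  # reads one row at a time
    header = next(reader)
    fieldnames = ['_'.join(name.lower().split()) for name in header]
    Row = namedtuple('Row', fieldnames, rename=True)
    for values in reader:
        yield Row(*(value.strip() for value in values))


# Hypothetical sample with a quoted field that contains a comma.
sample = io.StringIO(
    'Plate ID,Violation Description\n'
    'ABC123,"NO STANDING, BUS STOP"\n'
)

first = next(csv_rows(sample))
```

Here the quoted description stays in one field, whereas a plain `split(',')` would break it into two values and the named tuple construction would fail. Type conversion could be layered on top in the same way as in the cell above.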

In [5]:
parsed_rows = parsed_data(file_name)
for _ in range(5):
    print(next(parsed_rows))

Ticket(summons_number=4006478550, plate_id='VAD7274', registration_state='VA', plate_type='PAS', issue_date=datetime.date(2016, 10, 5), violation_code=5, vehicle_body_type='4D', vehicle_make='BMW', violation_description='BUS LANE VIOLATION')
Ticket(summons_number=4006462396, plate_id='22834JK', registration_state='NY', plate_type='COM', issue_date=datetime.date(2016, 9, 30), violation_code=5, vehicle_body_type='VAN', vehicle_make='CHEVR', violation_description='BUS LANE VIOLATION')
Ticket(summons_number=4007117810, plate_id='21791MG', registration_state='NY', plate_type='COM', issue_date=datetime.date(2017, 4, 10), violation_code=5, vehicle_body_type='VAN', vehicle_make='DODGE', violation_description='BUS LANE VIOLATION')
Ticket(summons_number=4006265037, plate_id='FZX9232', registration_state='NY', plate_type='PAS', issue_date=datetime.date(2016, 8, 23), violation_code=5, vehicle_body_type='SUBN', vehicle_make='FORD', violation_description='BUS LANE VIOLATION')
Ticket(summons_number=4

In [6]:
def violation_counts_by_make():
    # Counter consumes the generator row by row; most_common() sorts by count, descending.
    return dict(Counter(row.vehicle_make for row in parsed_data(file_name)).most_common())

violation_counts_by_make()

{'TOYOT': 112,
 'HONDA': 106,
 'FORD': 104,
 'CHEVR': 76,
 'NISSA': 70,
 'DODGE': 45,
 'FRUEH': 44,
 'ME/BE': 38,
 'GMC': 35,
 'HYUND': 35,
 'BMW': 34,
 'LEXUS': 26,
 'INTER': 25,
 'JEEP': 22,
 'NS/OT': 18,
 'SUBAR': 18,
 'INFIN': 13,
 'LINCO': 12,
 'CHRYS': 12,
 'ACURA': 12,
 'AUDI': 12,
 'VOLVO': 12,
 'MITSU': 11,
 'ISUZU': 10,
 'CADIL': 9,
 'KIA': 8,
 'VOLKS': 8,
 'HIN': 6,
 'KENWO': 5,
 '': 5,
 'ROVER': 5,
 'BUICK': 5,
 'MAZDA': 5,
 'MERCU': 4,
 'JAGUA': 3,
 'SMART': 3,
 'PORSC': 3,
 'WORKH': 2,
 'SATUR': 2,
 'SCION': 2,
 'SAAB': 2,
 'HINO': 2,
 'FIR': 1,
 'OLDSM': 1,
 'PETER': 1,
 'CITRO': 1,
 'GEO': 1,
 'YAMAH': 1,
 'BSA': 1,
 'MINI': 1,
 'PONTI': 1,
 'SPRI': 1,
 'PLYMO': 1,
 'UPS': 1,
 'FIAT': 1,
 'UD': 1,
 'UTILI': 1,
 'GMCQ': 1,
 'STAR': 1,
 'AM/T': 1,
 'MI/F': 1}