For this project, you are given a file containing a sample of NYC parking ticket violations.

(It's just a tiny extract!)

If you're wondering where I get these data sets, Kaggle is an **excellent** source of data sets on a wide variety of topics:
https://www.kaggle.com/

You have to sign up, but it's free.

If you want the full data set, it's available here: https://www.kaggle.com/new-york-city/nyc-parking-tickets/version/2#

In [1]:
data = '/mnt/data-ubuntu/Projects/Learning_PY_hardway/data/deep_dive/nyc_parking_tickets_extract.csv'

## Goal 1
Create a lazy iterator that returns a named tuple of the data in each row. The data types should be appropriate: if a column holds dates, store dates in the named tuple; if a field is an integer, store it as an integer; and so on.

In [31]:
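# Module-level imports: names bound in a class body are not visible
# inside that class's methods, so `datetime` must live here.
from collections import namedtuple
from datetime import datetime


class Tickets:
    # One record per CSV row; field order matches the column order in the file.
    Ticket = namedtuple('Ticket',
                        '''
                        summond_num,
                        plate_id,
                        reg_state,
                        plate_type,
                        issue_date,
                        violation_code,
                        vehicle_body_type,
                        maker,
                        violation_descr
                        ''')

    def __init__(self, data):
        self.data = data  # path to the CSV file

    def __iter__(self):
        # Return a fresh generator each time, so the object can be iterated repeatedly.
        return Tickets.ticket(self.data)

    @staticmethod
    def ticket(data):
        with open(data) as f:
            next(f)  # skip the header row
            for row in f:
                # Naive split: assumes no field contains an embedded comma.
                lst_t = row.split(',')
                t = Tickets.Ticket(int(lst_t[0]),
                                   lst_t[1],
                                   lst_t[2],
                                   lst_t[3],
                                   datetime.strptime(lst_t[4], '%m/%d/%Y').date(),
                                   int(lst_t[5]),
                                   lst_t[6],
                                   lst_t[7],
                                   lst_t[8].strip()
                                  )
                yield t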
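
The generator above splits each line on commas by hand, which only works if no field contains a comma. As a hedged alternative, here is a minimal sketch of the same lazy iterator built on the standard `csv` module, with one converter per column. The names `CONVERTERS`, `iter_tickets` and `SAMPLE` are made up for illustration, the header text is assumed, and the second sample row is invented to show a quoted field; only the first row is taken from the printed output in this notebook.

```python
import csv
from collections import namedtuple
from datetime import datetime
from io import StringIO

Ticket = namedtuple(
    'Ticket',
    'summond_num plate_id reg_state plate_type issue_date '
    'violation_code vehicle_body_type maker violation_descr')

# One converter per column, in column order (hypothetical helper).
CONVERTERS = (
    int, str, str, str,
    lambda s: datetime.strptime(s, '%m/%d/%Y').date(),
    int, str, str, str.strip,
)


def iter_tickets(lines):
    """Lazily yield one Ticket per CSV row from an iterable of lines."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    for row in reader:
        yield Ticket(*(conv(value) for conv, value in zip(CONVERTERS, row)))


# Assumed header and illustrative rows; the second row is invented.
SAMPLE = StringIO(
    'Summons Number,Plate ID,Registration State,Plate Type,Issue Date,'
    'Violation Code,Vehicle Body Type,Vehicle Make,Violation Description\n'
    '4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION\n'
    '1,ABC123,NY,PAS,01/02/2017,14,SUBN,FORD,"NO STANDING, DAY/TIME LIMITS"\n'
)

rows = list(iter_tickets(SAMPLE))
for r in rows:
    print(r)
```

With a real file, the same function could be fed an open file object (opened with `newline=''`, as the `csv` documentation recommends) in place of `SAMPLE`.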

In [32]:
tickets = Tickets(data)

In [37]:
for t in tickets:
    print(t)

Ticket(summond_num=4006478550, plate_id='VAD7274', reg_state='VA', plate_type='PAS', issue_date=datetime.date(2016, 10, 5), violation_code=5, vehicle_body_type='4D', maker='BMW', violation_descr='BUS LANE VIOLATION')
Ticket(summond_num=4006462396, plate_id='22834JK', reg_state='NY', plate_type='COM', issue_date=datetime.date(2016, 9, 30), violation_code=5, vehicle_body_type='VAN', maker='CHEVR', violation_descr='BUS LANE VIOLATION')
Ticket(summond_num=4007117810, plate_id='21791MG', reg_state='NY', plate_type='COM', issue_date=datetime.date(2017, 4, 10), violation_code=5, vehicle_body_type='VAN', maker='DODGE', violation_descr='BUS LANE VIOLATION')
Ticket(summond_num=4006265037, plate_id='FZX9232', reg_state='NY', plate_type='PAS', issue_date=datetime.date(2016, 8, 23), violation_code=5, vehicle_body_type='SUBN', maker='FORD', violation_descr='BUS LANE VIOLATION')
Ticket(summond_num=4006535600, plate_id='N203399C', reg_state='NY', plate_type='OMT', issue_date=datetime.date(2016, 10, 19

## Goal 2

Calculate the number of violations by car make.

In [40]:
tickets = Tickets(data)
viol_counter = {}
for t in tickets:
    if t.maker not in viol_counter:
        viol_counter[t.maker] = 1
    else:
        viol_counter[t.maker] += 1

In [41]:
print(viol_counter)

{'BMW': 34, 'CHEVR': 76, 'DODGE': 45, 'FORD': 104, 'FRUEH': 44, 'HONDA': 106, 'LINCO': 12, 'TOYOT': 112, 'CADIL': 9, 'CHRYS': 12, 'FIR': 1, 'GMC': 35, 'HYUND': 35, 'JAGUA': 3, 'JEEP': 22, 'LEXUS': 26, 'ME/BE': 38, 'MERCU': 4, 'MITSU': 11, 'NISSA': 70, 'HIN': 6, 'NS/OT': 18, 'WORKH': 2, 'ACURA': 12, 'AUDI': 12, 'INTER': 25, 'ISUZU': 10, 'KENWO': 5, 'KIA': 8, 'OLDSM': 1, 'SUBAR': 18, 'VOLVO': 12, 'SATUR': 2, 'SMART': 3, 'INFIN': 13, 'PETER': 1, '': 5, 'CITRO': 1, 'ROVER': 5, 'BUICK': 5, 'GEO': 1, 'MAZDA': 5, 'PORSC': 3, 'VOLKS': 8, 'YAMAH': 1, 'BSA': 1, 'MINI': 1, 'PONTI': 1, 'SPRI': 1, 'PLYMO': 1, 'SCION': 2, 'UPS': 1, 'FIAT': 1, 'UD': 1, 'UTILI': 1, 'GMCQ': 1, 'SAAB': 2, 'HINO': 2, 'STAR': 1, 'AM/T': 1, 'MI/F': 1}
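
The manual dictionary tally above can also be sketched with `collections.Counter`, which additionally makes it easy to rank makes by count. The `makers` list below is a small made-up sample standing in for `(t.maker for t in tickets)`; it is not taken from the data set.

```python
from collections import Counter

# Hypothetical sample of `maker` values; in the notebook this would be
# the generator expression (t.maker for t in tickets).
makers = ['BMW', 'CHEVR', 'FORD', 'FORD', 'TOYOT', 'FORD', 'TOYOT']

counts = Counter(makers)       # counts each distinct make in one pass
top = counts.most_common(2)    # the two most frequent makes, highest first
print(top)  # [('FORD', 3), ('TOYOT', 2)]
```

`Counter` is a `dict` subclass, so `counts['FORD']` works just like the hand-built dictionary, and a missing make returns `0` instead of raising `KeyError`.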
