### Project
For this project you are given a file that contains some parking ticket violations for NYC.

(It's just a tiny extract!)

If you're wondering where I get these data sets, Kaggle is an excellent source of data sets in a whole variety of topics: https://www.kaggle.com/

You have to sign up, but it's free.

If you want the full data set, it's available here: https://www.kaggle.com/new-york-city/nyc-parking-tickets/version/2#

For this sample data set, the file is named:

nyc_parking_tickets_extract.csv
Your goals are as follows:

### Goal 1
Create a lazy iterator that will return a named tuple of the data in each row. The data types should be appropriate - i.e. if the column is a date, you should be storing dates in the named tuple, if the field is an integer, then it should be stored as an integer, etc.

### Goal 2
Calculate the number of violations by car make.

Note:
Try to use lazy evaluation as much as possible - it may not always be possible though! That's OK, as long as it's kept to a minimum.





In [196]:
file_name = 'nyc_parking_tickets_extract-1.csv'

In [197]:
# A peak in data

with open(file_name) as f:
    for row in range(10):
        print(next(f))

Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description

4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION

4006462396,22834JK,NY,COM,9/30/2016,5,VAN,CHEVR,BUS LANE VIOLATION

4007117810,21791MG,NY,COM,4/10/2017,5,VAN,DODGE,BUS LANE VIOLATION

4006265037,FZX9232,NY,PAS,8/23/2016,5,SUBN,FORD,BUS LANE VIOLATION

4006535600,N203399C,NY,OMT,10/19/2016,5,SUBN,FORD,BUS LANE VIOLATION

4007156700,92163MG,NY,COM,4/13/2017,5,VAN,FRUEH,BUS LANE VIOLATION

4006687989,MIQ600,SC,PAS,11/21/2016,5,VN,HONDA,BUS LANE VIOLATION

4006943052,2AE3984,MD,PAS,2/1/2017,5,SW,LINCO,BUS LANE VIOLATION

4007306795,HLG4926,NY,PAS,5/30/2017,5,SUBN,TOYOT,BUS LANE VIOLATION



In [198]:
with open(file_name) as f:
    column_headers = next(f).strip('\n').split(',')
    sample_data = next(f).strip('\n').split(',')

In [199]:
column_headers

['Summons Number',
 'Plate ID',
 'Registration State',
 'Plate Type',
 'Issue Date',
 'Violation Code',
 'Vehicle Body Type',
 'Vehicle Make',
 'Violation Description']

In [200]:
with open(file_name) as f:
    col_names = next(f)

In [201]:
col_names

'Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description\n'

In [202]:
[col.strip(" ") for col in col_names.strip("\n").split(",")]

['Summons Number',
 'Plate ID',
 'Registration State',
 'Plate Type',
 'Issue Date',
 'Violation Code',
 'Vehicle Body Type',
 'Vehicle Make',
 'Violation Description']

In [203]:
with open(file_name) as f:
    column_names =[col.strip(" ") for col in next(f).strip("\n").split(",")]
    data = [row.strip(" ") for row in next(f).strip("\n").split(",")]

In [204]:
list(column_names)

['Summons Number',
 'Plate ID',
 'Registration State',
 'Plate Type',
 'Issue Date',
 'Violation Code',
 'Vehicle Body Type',
 'Vehicle Make',
 'Violation Description']

In [205]:
data

['4006478550',
 'VAD7274',
 'VA',
 'PAS',
 '10/5/2016',
 '5',
 '4D',
 'BMW',
 'BUS LANE VIOLATION']

In [206]:
list(zip(column_names, data))

[('Summons Number', '4006478550'),
 ('Plate ID', 'VAD7274'),
 ('Registration State', 'VA'),
 ('Plate Type', 'PAS'),
 ('Issue Date', '10/5/2016'),
 ('Violation Code', '5'),
 ('Vehicle Body Type', '4D'),
 ('Vehicle Make', 'BMW'),
 ('Violation Description', 'BUS LANE VIOLATION')]

### data types to be set for each of these fields:

1. Summons Number: integers
2. Plate ID: string
3. Registration State: string
4. Plate Type: string
5. Issue Date: dates
6. Violation Code: integers
7. Vehicle Body Type: string
8. Vehicle Make: string
9. Violation Description: string

In [207]:
def read_data():
    with open(file_name) as f:
        next(f)
        yield from f

In [208]:
raw_data = read_data()
for _ in range(5):
    print(next(raw_data))

4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION

4006462396,22834JK,NY,COM,9/30/2016,5,VAN,CHEVR,BUS LANE VIOLATION

4007117810,21791MG,NY,COM,4/10/2017,5,VAN,DODGE,BUS LANE VIOLATION

4006265037,FZX9232,NY,PAS,8/23/2016,5,SUBN,FORD,BUS LANE VIOLATION

4006535600,N203399C,NY,OMT,10/19/2016,5,SUBN,FORD,BUS LANE VIOLATION



### Column parser to set appropriate format

In [209]:
def parse_int(value,default=None):
    
    
    try:
        return int(value)
    except ValueError:
        return default

In [210]:
from datetime import datetime

def parse_date(value,default=None):
    
    date_format='%m/%d/%Y'
    try:
        return datetime.strptime(value, date_format).date()
    except ValueError:
        return default

In [211]:
def parse_string(value,default=None):
    
    try:
        clean = str(value).strip()
        if not clean:
            return default
        else:
            return clean
    except ValueError:
        return default

In [212]:
# default is None

column_parsers = (parse_int,  
                  parse_string, 
                  parse_string,  
                  parse_string, 
                  parse_date,  
                  parse_int,  
                  parse_string, 
                  parse_string, 
                  parse_string 
                 )

### row parser to applied above column parser to each column`s value

In [213]:
def parse_row(row):
    
    fields = row.strip('\n').split(',')
    parsed_data = (func(field) 
                   for func, field in zip(column_parsers, fields))
    return parsed_data

In [214]:
rows = read_data()

for _ in range(5):
    row = next(rows)
    parsed_data = parse_row(row)
    print(list(parsed_data))

[4006478550, 'VAD7274', 'VA', 'PAS', datetime.date(2016, 10, 5), 5, '4D', 'BMW', 'BUS LANE VIOLATION']
[4006462396, '22834JK', 'NY', 'COM', datetime.date(2016, 9, 30), 5, 'VAN', 'CHEVR', 'BUS LANE VIOLATION']
[4007117810, '21791MG', 'NY', 'COM', datetime.date(2017, 4, 10), 5, 'VAN', 'DODGE', 'BUS LANE VIOLATION']
[4006265037, 'FZX9232', 'NY', 'PAS', datetime.date(2016, 8, 23), 5, 'SUBN', 'FORD', 'BUS LANE VIOLATION']
[4006535600, 'N203399C', 'NY', 'OMT', datetime.date(2016, 10, 19), 5, 'SUBN', 'FORD', 'BUS LANE VIOLATION']


## create generator 

In [215]:
def parse_rows_gen():
    for row in read_data():
        yield zip(column_names,parse_row(row))

In [216]:
parsed_rows = parse_rows_gen()
for i in range(5):
    print(list(next(parsed_rows)))

[('Summons Number', 4006478550), ('Plate ID', 'VAD7274'), ('Registration State', 'VA'), ('Plate Type', 'PAS'), ('Issue Date', datetime.date(2016, 10, 5)), ('Violation Code', 5), ('Vehicle Body Type', '4D'), ('Vehicle Make', 'BMW'), ('Violation Description', 'BUS LANE VIOLATION')]
[('Summons Number', 4006462396), ('Plate ID', '22834JK'), ('Registration State', 'NY'), ('Plate Type', 'COM'), ('Issue Date', datetime.date(2016, 9, 30)), ('Violation Code', 5), ('Vehicle Body Type', 'VAN'), ('Vehicle Make', 'CHEVR'), ('Violation Description', 'BUS LANE VIOLATION')]
[('Summons Number', 4007117810), ('Plate ID', '21791MG'), ('Registration State', 'NY'), ('Plate Type', 'COM'), ('Issue Date', datetime.date(2017, 4, 10)), ('Violation Code', 5), ('Vehicle Body Type', 'VAN'), ('Vehicle Make', 'DODGE'), ('Violation Description', 'BUS LANE VIOLATION')]
[('Summons Number', 4006265037), ('Plate ID', 'FZX9232'), ('Registration State', 'NY'), ('Plate Type', 'PAS'), ('Issue Date', datetime.date(2016, 8, 23

## Goal 2: 
Calculating Number of Violations by Car Make

In [217]:
parsed_rows = parse_rows_gen()
for i in range(10):
    print(list(next(parsed_rows))[7])

('Vehicle Make', 'BMW')
('Vehicle Make', 'CHEVR')
('Vehicle Make', 'DODGE')
('Vehicle Make', 'FORD')
('Vehicle Make', 'FORD')
('Vehicle Make', 'FRUEH')
('Vehicle Make', 'HONDA')
('Vehicle Make', 'LINCO')
('Vehicle Make', 'TOYOT')
('Vehicle Make', 'TOYOT')


In [218]:
violations_counts= {}

In [219]:
for rows in parsed_rows:
    data = list(next(parsed_rows))[7]
    #print(data)
    if data[1] in violations_counts:
        #print(data[0])
        violations_counts[data[1]] += 1
    else:
        violations_counts[data[1]] = 1
        

In [220]:
violations_counts

{'CHEVR': 40,
 'CHRYS': 7,
 'DODGE': 21,
 'FIR': 1,
 'FORD': 51,
 'HONDA': 51,
 'HYUND': 18,
 'JAGUA': 3,
 'JEEP': 11,
 'LEXUS': 13,
 'ME/BE': 18,
 'MERCU': 3,
 'NISSA': 34,
 'TOYOT': 53,
 'HIN': 5,
 'WORKH': 1,
 'AUDI': 7,
 'BMW': 15,
 'FRUEH': 21,
 'GMC': 18,
 'INTER': 11,
 'ISUZU': 4,
 'KENWO': 3,
 'NS/OT': 9,
 'OLDSM': 1,
 'SUBAR': 10,
 'VOLVO': 6,
 'SATUR': 2,
 'INFIN': 6,
 'PETER': 1,
 'ACURA': 7,
 'CADIL': 4,
 'KIA': 3,
 'BUICK': 3,
 'LINCO': 6,
 'MAZDA': 2,
 'SMART': 2,
 'VOLKS': 4,
 'YAMAH': 1,
 'ROVER': 2,
 'MINI': 1,
 'SPRI': 1,
 'PLYMO': 1,
 'SCION': 1,
 'MITSU': 4,
 'PORSC': 1,
 'UPS': 1,
 'UD': 1,
 None: 1,
 'STAR': 1,
 'SAAB': 1,
 'AM/T': 1,
 'HINO': 1,
 'MI/F': 1}