<br>
Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

reading CSV file:

In [25]:
import csv

%precision 2

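# a forward slash avoids the invalid "\m" escape; newline="" is recommended by the csv docs
with open("datasets/mpg.csv", newline="") as csvFile: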
    mpg = list(csv.DictReader(csvFile))
    
mpg[:3]

[{'': '1',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'auto(l5)',
  'drv': 'f',
  'cty': '18',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '2',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'manual(m5)',
  'drv': 'f',
  'cty': '21',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '3',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '2',
  'year': '2008',
  'cyl': '4',
  'trans': 'manual(m6)',
  'drv': 'f',
  'cty': '20',
  'hwy': '31',
  'fl': 'p',
  'class': 'compact'}]

writing CSV file:

In [26]:
# %precision 2

# column_names= ['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class']

# data = [{'' : "235", 'manufacturer' : "volkswagen", 'model': "passat", 'displ' : "3.5", 'year' : "2007", 'cyl' : "4", 'trans' : "auto(s6)", 'drv' : "f", 'cty' : "18", 'hwy' : "28", 'fl' : "p", 'class' : "midsize"}]
# note: opening mpg.csv in 'w' mode replaces the existing file contents
# with open("datasets/mpg.csv", 'w', newline='') as csvFile:
#     writing= csv.DictWriter(csvFile, fieldnames= column_names)
#     writing.writeheader()
#     for record in data:
#         writing.writerow(record)
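
The commented-out cell above opens `mpg.csv` itself in `'w'` mode, which would replace the dataset with the single new record. Here is a minimal sketch of a safer round trip, assuming we write to an in-memory buffer (`io.StringIO`) instead of the real file. The two records are copied from the output shown earlier; the names `sample_rows`, `buffer` and `round_trip` are illustrative only.

```python
import csv
import io

column_names = ['', 'manufacturer', 'model', 'displ', 'year', 'cyl',
                'trans', 'drv', 'cty', 'hwy', 'fl', 'class']

# Two records copied from the notebook output (all values are strings).
sample_rows = [
    {'': '1', 'manufacturer': 'audi', 'model': 'a4', 'displ': '1.8',
     'year': '1999', 'cyl': '4', 'trans': 'auto(l5)', 'drv': 'f',
     'cty': '18', 'hwy': '29', 'fl': 'p', 'class': 'compact'},
    {'': '2', 'manufacturer': 'audi', 'model': 'a4', 'displ': '1.8',
     'year': '1999', 'cyl': '4', 'trans': 'manual(m5)', 'drv': 'f',
     'cty': '21', 'hwy': '29', 'fl': 'p', 'class': 'compact'},
]

# Write to an in-memory buffer so the real dataset is never touched.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=column_names)
writer.writeheader()
writer.writerows(sample_rows)

# Rewind and read the text back to confirm nothing was lost.
buffer.seek(0)
round_trip = list(csv.DictReader(buffer))
```

To write a real file instead, the same `DictWriter` calls work on a file opened with `open(path, 'w', newline='')`, ideally with a path different from the source dataset.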

<br>
`csv.DictReader` has read in each row of our csv file as a dictionary. `len` shows that our list consists of 234 dictionaries.

In [27]:
len(mpg)

234

<br>
`keys` gives us the column names of our csv.

In [28]:
mpg[0].keys()

dict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

Items in a set are unique, so duplicates are dropped.

In [29]:
set([1,2,3,4,1])

{1, 2, 3, 4}

Keys in a dictionary are unique, so each manufacturer keeps only the last cty value that was read.

In [30]:
x= {record['manufacturer'] : record['cty'] for record in mpg}
x

{'audi': '16',
 'chevrolet': '17',
 'dodge': '11',
 'ford': '14',
 'honda': '21',
 'hyundai': '17',
 'jeep': '11',
 'land rover': '11',
 'lincoln': '12',
 'mercury': '13',
 'nissan': '12',
 'pontiac': '16',
 'subaru': '20',
 'toyota': '16',
 'volkswagen': '17'}
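
Because a repeated key overwrites the earlier value, the comprehension above keeps one cty value per manufacturer. Here is a minimal sketch of how all the values could be kept instead, using `collections.defaultdict`. The `sample` list is a small stand-in for `mpg` (assumed here so the block runs on its own), and the names `last_seen` and `all_values` are illustrative.

```python
from collections import defaultdict

# Small stand-in for `mpg`: a list of dicts whose values are strings.
sample = [
    {'manufacturer': 'audi', 'cty': '18'},
    {'manufacturer': 'audi', 'cty': '21'},
    {'manufacturer': 'audi', 'cty': '20'},
    {'manufacturer': 'honda', 'cty': '28'},
    {'manufacturer': 'honda', 'cty': '24'},
]

# Dict comprehension: each repeated key overwrites the previous value.
last_seen = {record['manufacturer']: record['cty'] for record in sample}

# defaultdict(list): append every value instead of keeping only the last one.
all_values = defaultdict(list)
for record in sample:
    all_values[record['manufacturer']].append(record['cty'])

print(last_seen)
print(dict(all_values))
```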

<br>
This is how to find the average cty fuel economy across all cars.

#### note:
All the values in our dictionaries are strings, so we need to convert them to float before performing mathematical operations.

In [31]:
sum(float(record['cty']) for record in mpg) / len(mpg)

16.86


Similarly, this is how to find the average hwy fuel economy across all cars.

In [32]:
sum(float(record['hwy']) for record in mpg) / len(mpg)

23.44
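
The two averages above differ only in the column name, so the pattern can be wrapped in one helper. This is a minimal sketch, assuming a hypothetical function name `average` and a three-row `sample` in place of the full `mpg` list; it uses `statistics.mean` from the standard library.

```python
from statistics import mean

def average(records, field):
    """Return the mean of a numeric column whose values are stored as strings."""
    # float() converts each string before the values are averaged
    return mean(float(record[field]) for record in records)

# Three-row stand-in for `mpg`, restricted to the two columns used here.
sample = [
    {'cty': '18', 'hwy': '29'},
    {'cty': '21', 'hwy': '29'},
    {'cty': '20', 'hwy': '31'},
]

city_avg = average(sample, 'cty')
highway_avg = average(sample, 'hwy')
```

Note that `statistics.mean` raises `StatisticsError` on an empty input, whereas the `sum(...) / len(...)` form raises `ZeroDivisionError`.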

list comprehension

In [33]:
x= [(record['manufacturer'] , record['cty'], record['cyl'])  for record in mpg if record['cyl'] == '4']
x


[('audi', '18', '4'),
 ('audi', '21', '4'),
 ('audi', '20', '4'),
 ('audi', '21', '4'),
 ('audi', '18', '4'),
 ('audi', '16', '4'),
 ('audi', '20', '4'),
 ('audi', '19', '4'),
 ('chevrolet', '19', '4'),
 ('chevrolet', '22', '4'),
 ('dodge', '18', '4'),
 ('honda', '28', '4'),
 ('honda', '24', '4'),
 ('honda', '25', '4'),
 ('honda', '23', '4'),
 ('honda', '24', '4'),
 ('honda', '26', '4'),
 ('honda', '25', '4'),
 ('honda', '24', '4'),
 ('honda', '21', '4'),
 ('hyundai', '18', '4'),
 ('hyundai', '18', '4'),
 ('hyundai', '21', '4'),
 ('hyundai', '21', '4'),
 ('hyundai', '19', '4'),
 ('hyundai', '19', '4'),
 ('hyundai', '20', '4'),
 ('hyundai', '20', '4'),
 ('nissan', '21', '4'),
 ('nissan', '19', '4'),
 ('nissan', '23', '4'),
 ('nissan', '23', '4'),
 ('subaru', '18', '4'),
 ('subaru', '18', '4'),
 ('subaru', '20', '4'),
 ('subaru', '19', '4'),
 ('subaru', '20', '4'),
 ('subaru', '18', '4'),
 ('subaru', '21', '4'),
 ('subaru', '19', '4'),
 ('subaru', '19', '4'),
 ('subaru', '19', '4'),
 ('s

dictionary comprehension

In [34]:
x= {record['manufacturer'] : record['cty']  for record in mpg if record['cyl'] == '4'}
x


{'audi': '19',
 'chevrolet': '22',
 'dodge': '18',
 'honda': '21',
 'hyundai': '20',
 'nissan': '23',
 'subaru': '20',
 'toyota': '17',
 'volkswagen': '21'}

In [35]:
len(x)

9

In [36]:
# filter keeps the records for which the function returns True
z= filter(lambda record: record['cyl'] == '4', mpg)
list(z)

[{'': '1',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'auto(l5)',
  'drv': 'f',
  'cty': '18',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '2',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'manual(m5)',
  'drv': 'f',
  'cty': '21',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '3',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '2',
  'year': '2008',
  'cyl': '4',
  'trans': 'manual(m6)',
  'drv': 'f',
  'cty': '20',
  'hwy': '31',
  'fl': 'p',
  'class': 'compact'},
 {'': '4',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '2',
  'year': '2008',
  'cyl': '4',
  'trans': 'auto(av)',
  'drv': 'f',
  'cty': '21',
  'hwy': '30',
  'fl': 'p',
  'class': 'compact'},
 {'': '8',
  'manufacturer': 'audi',
  'model': 'a4 quattro',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'manual(m5)',
  'drv': '4',
  'cty': '18',
  'hwy

In [37]:
x1= [record for record in mpg if record['cyl'] == '4']
print(len(x1))

x2= [record for record in mpg if record['cyl'] == '5']
print(len(x2))

x3= [record for record in mpg if record['cyl'] == '6']
print(len(x3))

x4= [record for record in mpg if record['cyl'] == '8']
print(len(x4))

81
4
79
70


In [38]:
# the numbers of n cylinder cars
#4    5   6    8  = n
81 + 4 + 79 + 70

234
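
The four separate list comprehensions above can also be replaced by a single pass over the data. This is a minimal sketch using `collections.Counter`; the five-row `sample` is made up for illustration and stands in for `mpg`.

```python
from collections import Counter

# Made-up stand-in for `mpg`; only the 'cyl' column matters here.
sample = [{'cyl': '4'}, {'cyl': '4'}, {'cyl': '6'}, {'cyl': '8'}, {'cyl': '4'}]

# Count how many records fall under each cylinder value in one pass.
cyl_counts = Counter(record['cyl'] for record in sample)

# The counts must add up to the number of records.
total = sum(cyl_counts.values())
```

Running the same `Counter` expression on `mpg` should give the four counts printed above without naming each cylinder value by hand.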

#### example 1
<br>
Here's a more complex example where we group the cars by number of cylinders and find the average cty mpg for each group.


solution 1 for ex 1 :

In [39]:
def group_n_cylinder_cars(cyl: int):
    # 'cyl' values in the csv are strings, so compare against str(cyl)
    return [record for record in mpg if record['cyl'] == str(cyl)]

<br>
the MPG of n cylinder cars in city
<br>
n : 4 = 21.01,   5 = 20.50,   6 = 16.22,  8 = 12.57

In [40]:
def avg_ctyMPG_n_cyl(cyl: int):
    filterd_li= group_n_cylinder_cars(cyl)
    avg = sum(float(record["cty"]) for record in filterd_li) / len(filterd_li)
    return avg

avg_ctyMPG_n_cyl(8)

12.57

<br>
the MPG of n cylinder cars in highway
<br>
n : 4 = 28.80,   5 = 28.75,   6 = 22.82,  8 = 17.63

In [41]:
def avg_hwyMPG_n_cyl(cyl: int):
    filterd_li= group_n_cylinder_cars(cyl)
    avg = sum(float(record["hwy"]) for record in filterd_li) / len(filterd_li)
    return avg

avg_hwyMPG_n_cyl(8)

17.63

<br>
Use `set` to return the unique values for the number of cylinders the cars in our dataset have.

In [42]:
cylinders= set(record['cyl'] for record in mpg)
cylinders

{'4', '5', '6', '8'}

solution 2 for ex 1 :

In [43]:
ctyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    sumMPg = 0
    cylTypeCount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            sumMPg += float(d['cty']) # add the cty mpg
            cylTypeCount += 1 # increment the count
    ctyMpgByCyl.append((c, sumMPg / cylTypeCount)) # append the tuple ('cylinder', 'avg mpg')

ctyMpgByCyl.sort(key=lambda x: x[0]) # sort the list from the lowest number of cylinders to highest
ctyMpgByCyl

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
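
Solution 2 scans the whole list once per cylinder value. Here is a minimal sketch of a single-pass alternative, assuming a hypothetical helper `average_by` and a made-up `sample` in place of `mpg`; the sort uses `int` so the groups are ordered numerically rather than as text.

```python
from collections import defaultdict

def average_by(records, group_field, value_field):
    """Average value_field for each distinct value of group_field."""
    totals = defaultdict(float)  # running sum per group
    counts = defaultdict(int)    # number of records per group
    for record in records:
        key = record[group_field]
        totals[key] += float(record[value_field])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Made-up stand-in for `mpg` with only the two columns used here.
sample = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '4', 'cty': '21'},
    {'cyl': '6', 'cty': '16'},
]

# Sort by the numeric cylinder count, lowest first.
cty_by_cyl = sorted(average_by(sample, 'cyl', 'cty').items(),
                    key=lambda item: int(item[0]))
```

The same helper could be called as `average_by(mpg, 'class', 'hwy')` to group by vehicle class instead.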

<br>
Use `set` to return the unique values for the class types in our dataset.

In [44]:
vehicleClasses = set(record["class"] for record in mpg)
vehicleClasses

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

#### example 2
<br>
And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.

solution 1 for ex 2 :

In [45]:
def group_type_car_class(cls: str):
    # keep only the records whose 'class' field matches cls
    return [record for record in mpg if record['class'] == cls]

<br> 
the MPG of car classes in city
<br>
class type : 2seater= 15.40, compact= 20.13, midsize= 18.76, minivan= 15.82, pickup= 13.00, subcompact= 20.37, suv= 13.50
<br>      
              

In [46]:
def avg_ctyMPG_type_cls(cls: str):
    filterd_li= group_type_car_class(cls)
    try:
        avg = sum(float(record["cty"]) for record in filterd_li) / len(filterd_li)
    except ZeroDivisionError:
        return 0
    else:
        return avg

avg_ctyMPG_type_cls("suv")

13.50

<br> 
the MPG of car classes in highway
<br>
class type : 2seater= 24.80,   compact= 28.30,   midsize= 27.29,   minivan= 22.36,   pickup= 16.88,   subcompact= 28.14,   suv= 18.13
<br>      

In [47]:
def avg_hwyMPG_type_cls(cls: str):
    filterd_li= group_type_car_class(cls)
    try:
        # average the highway ('hwy') values, not the city ('cty') ones
        avg = sum(float(record["hwy"]) for record in filterd_li) / len(filterd_li)
    except ZeroDivisionError:
        # no cars found for this class
        return 0
    else:
        return avg

avg_hwyMPG_type_cls("subcompact")

28.14

solution 2 for ex 2 :

In [48]:
hwyMpgByClass = []

for t in vehicleClasses: # iterate over all the vehicle classes
    sumMPg = 0
    vClassCount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            sumMPg += float(d['hwy']) # add the hwy mpg
            vClassCount += 1 # increment the count
    hwyMpgByClass.append((t, sumMPg / vClassCount)) # append the tuple ('class', 'avg mpg')

hwyMpgByClass.sort(key=lambda x: x[1]) # sort the list from the lowest avg mpg to the highest
hwyMpgByClass

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]