## Class 2 Python for Data Science
### Python Dictionary
### List Comprehension
### Reading a CSV file and fixing data errors

One of Python's built-in datatypes is the dictionary, which defines one-to-one relationships between keys and values.

"Like lists dictionaries can easily be changed, can be shrunk and grown ad libitum at run time. They shrink and grow without the necessity of making copies. Dictionaries can be contained in lists and vice versa. But what's the difference between lists and dictionaries? Lists are ordered sets of objects, whereas dictionaries are <b>unordered sets.</b> But the main difference is that items in dictionaries are accessed via keys and not via their position."

Note: since Python 3.7, dictionaries preserve insertion order, so "unordered" here means only that items are looked up by key rather than by position.

<br>
A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key: value pairs to the dictionary; this is also the way dictionaries are written on output.
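
The cell that defines `dict1` is not shown in this notebook. A minimal sketch of both forms described above, assuming a definition of `dict1` reconstructed from the output printed below:

```python
# A pair of braces creates an empty dictionary
empty = {}

# Assumed definition of dict1, reconstructed from the notebook output
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

print(empty)
print(dict1)
```

Values can be of any type: here one value is a list and the other is a string.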

In [8]:
dict1

{'fruit': [75, 'orange'], 'vegetable': 'onion, mushroom, lettuce'}

### i. Keys

Get the keys from "dict1"

In [10]:
dict1.keys()

dict_keys(['fruit', 'vegetable'])

### Indexing With Keys?

What happens if you try to run "<b>dict1[0]</b>"? Why?


In [11]:
dict1["fruit"]

[75, 'orange']

OR

In [12]:
dict1.get("fruit")

[75, 'orange']
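
The two lookups above behave differently when the key is missing. A short sketch, assuming the same `dict1` as above (the key `"meat"` is made up for illustration):

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

# Square brackets raise KeyError when the key is absent.
# 0 is not a key of dict1, and dictionaries are not indexed by position.
try:
    dict1[0]
    result = "found"
except KeyError:
    result = "KeyError"
print(result)

# .get() returns None, or a default you supply, instead of raising
missing = dict1.get("meat")
fallback = dict1.get("meat", "not stocked")
print(missing, fallback)
```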

### ii. Values

Get the values from "dict1"

In [13]:
dict1.values()

dict_values([[75, 'orange'], 'onion, mushroom, lettuce'])

### Indexing With Values?
A little more complicated

In [19]:
V = 'orange, watermelon, grape'

print(dict1)
K = "Error : Key is not found."
for key, value in dict1.items():
    if value == V:
        K = key
print(K)

{'fruit': [75, 'orange'], 'vegetable': 'onion, mushroom, lettuce'}
Error : Key is not found.
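
Several keys can share the same value, and the loop above keeps only the last match. One possible variation collects every matching key; `keys_for_value` is a hypothetical helper name, not something from the original notebook:

```python
def keys_for_value(d, target):
    """Return every key in d whose value equals target (hypothetical helper)."""
    return [key for key, value in d.items() if value == target]

dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

found = keys_for_value(dict1, "onion, mushroom, lettuce")
not_found = keys_for_value(dict1, "orange, watermelon, grape")
print(found)
print(not_found)
```

An empty list signals that no key holds the value, which avoids mixing an error message with real keys.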


### iii. Length of Dictionary

Returns the number of stored entries, i.e. the number of (key,value) pairs.

In [20]:
len(dict1)

2

### iv. Remove key and value

In [21]:
del dict1["vegetable"]
print(dict1)

{'fruit': [75, 'orange']}
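
As an alternative to `del`, the `pop()` method removes a key and hands back its value. A brief sketch, assuming the same `dict1` as before (the key `"meat"` is made up):

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

# pop() removes the key and returns the value that was stored under it
removed = dict1.pop("vegetable")

# With a default, pop() does not raise if the key is missing
absent = dict1.pop("meat", None)

print(removed)
print(absent)
print(dict1)
```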


### v. Add new value

In [22]:
dict1["new"] = 0
print(dict1)

{'fruit': [75, 'orange'], 'new': 0}


### vi. Concatenating Dictionaries
<i>*Note: Keys must be unique. If a key appears in both dictionaries, update() keeps the value from the second one.</i>

In [283]:
dict1 = {"fruit" : "orange, watermelon, grape", "vegetable":"onion, mushroom, lettuce"}
dict2 = {"fruit1": [5,6,7]}
dict1.update(dict2)
dict1

{'fruit': 'orange, watermelon, grape',
 'vegetable': 'onion, mushroom, lettuce',
 'fruit1': [5, 6, 7]}
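
To see what happens when the two dictionaries share a key, here is a small sketch with made-up values. It copies `dict1` first so the original is left untouched:

```python
dict1 = {"fruit": "orange", "vegetable": "onion"}
dict2 = {"fruit": "grape", "grain": "rice"}

merged = dict(dict1)   # shallow copy, so dict1 itself is not modified
merged.update(dict2)   # "fruit" exists in both: the value from dict2 wins

print(merged)
print(dict1)
```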

### <font color = "coral">Exercise 1: Create a new dictionary</font>
<font color = "coral">Your keys should be "Country","State","City","ZipCode"

Fill in the values according to the keys.

In [24]:
#Your code here
newdict = {"Country" : "US, Canada, Mexico", "State" : "CA, CanS1, Mex1", "City" : "San Jose, CanCity1, MexCity1", "ZipCode" : "95134, 12345, 54321"}
newdict

{'Country': 'US, Canada, Mexico',
 'State': 'CA, CanS1, Mex1',
 'City': 'San Jose, CanCity1, MexCity1',
 'ZipCode': '95134, 12345, 54321'}

## Multi-dimensional Array

In [285]:
a = [[0,  1, 2, 3, 4, 5],
     [10,11,12,13,14,15],
     [20,21,22,23,24,25],
     [30,31,32,33,34,35],
     [40,41,42,43,44,45],
     [50,51,52,53,54,55]]

In [286]:
a[0]

[0, 1, 2, 3, 4, 5]

In [287]:
a[4:6]

[[40, 41, 42, 43, 44, 45], [50, 51, 52, 53, 54, 55]]

In [288]:
a[5][5]

55
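
Slicing a nested list selects whole rows. Picking out a column or a rectangular block takes one more step, since each row is a separate list. A sketch on a smaller, made-up grid:

```python
a = [[0, 1, 2],
     [10, 11, 12],
     [20, 21, 22]]

# Column 1: take element 1 from every row
column1 = [row[1] for row in a]

# Rows 1..2, columns 0..1: slice the rows, then slice each row
block = [row[0:2] for row in a[1:3]]

print(column1)
print(block)
```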

### List Comprehensions

In [289]:
[x**2 for x in range(15)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]

The list comprehension above is equivalent to the loop below:

In [290]:
original = list(range(15))

squares = []

for x in original:
    squares.append(x**2)
    
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]

#### What is happening in this loop?

In [291]:
new = []
for x in squares:
    if x < 100:
        new.append(x**2)
new

[0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]

In [292]:
new = [i**2 for i in squares if i < 100]
new 

[0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]

### <font color = "coral">Exercise 2:
<font color = "coral">
Turn this for loop into a list comprehension (it should be only one line).

In [293]:
mystery = []
for i in range(1000):
    if i%5 == 0:
        mystery.append(i)
print(mystery)

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995]


In [71]:
#Your code here
#new_list = [expression(i) for i in old_list if filter(i)]
mystery1 = [i for i in range(1000) if i%5 == 0]
print(mystery1)

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995]


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 3:</h1></font>

<font color = "coral">
Now that you have all this knowledge of different operators, data types, and loops, create a loop that removes all the unwanted information from our list.

<b>1) Create a loop where you get rid of all the odd numbers.
<br><br>
2) Put all the numbers in order from smallest to largest.<br><br>
3) Once you only have a list of ordered even numbers convert all these integers into strings.<br><br>
4) Now print your number strings as a single string with comma separation.</b>



In [34]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

#Your code here

# 1) Create a loop where you get rid of all the odd numbers.
newlst = []
for i in range(len(lst)):
    if lst[i] % 2 != 0:
        continue
    newlst.append(lst[i])

# 2) Put all the numbers in order from smallest to largest.
newlst.sort()
print("Original sorted even number list")
print(newlst)

# 3) Once you only have a list of ordered even numbers convert all these integers into strings.
for i in range(len(newlst)):
    newlst[i] = str(newlst[i])
print("Converted list for even number string")
print(newlst)

# 4) Now print your number strings as a single string with comma separation.
newstr = ""
for i in range(len(newlst)):
    newstr = newstr + newlst[i]
    if i != len(newlst)-1:
        newstr = newstr + ","
print("Single string for even number list.")
print(newstr)


Original sorted even number list
[2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 34, 54, 76, 76, 454, 876, 1212, 5432, 23432, 234234, 123212245346342]
Converted list for even number string
['2', '2', '4', '4', '4', '6', '6', '6', '8', '8', '34', '54', '76', '76', '454', '876', '1212', '5432', '23432', '234234', '123212245346342']
Single string for even number list.
2,2,4,4,4,6,6,6,8,8,34,54,76,76,454,876,1212,5432,23432,234234,123212245346342
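
The same four steps can be written more compactly with a comprehension, `sorted()`, and `str.join()`. This is one possible shorter version, shown on a shortened sample list rather than the full exercise data:

```python
lst = [4, 6, 3, 2, 9, 8, 23, 4]  # shortened sample list for illustration

# 1) and 2): keep the even numbers and sort them
evens = sorted(x for x in lst if x % 2 == 0)

# 3): convert every integer to a string
as_strings = [str(x) for x in evens]

# 4): join() places the separator between items only, so no trailing comma
joined = ",".join(as_strings)
print(joined)
```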


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 4:</h1></font>

<font color = "coral">If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 10,000.

In [50]:
#Your answer here
mul35 = []
for i in range(1,10000):
    if i%3 == 0 or i%5 == 0:
        mul35.append(i)
print("The sum of all the multiples of 3 or 5 below 10,000.")
print(sum(mul35))


The sum of all the multiples of 3 or 5 below 10,000.
23331668
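
The loop result can be cross-checked without iterating, using the arithmetic series formula 1 + 2 + ... + n = n(n+1)/2. The function names below are hypothetical and are meant as a sketch of the idea, not as the expected exercise answer:

```python
def sum_of_multiples(k, limit):
    """Sum of the positive multiples of k that are strictly below limit."""
    n = (limit - 1) // k           # how many multiples of k lie below limit
    return k * n * (n + 1) // 2    # k * (1 + 2 + ... + n)

def sum_multiples_3_or_5(limit):
    # Multiples of 15 are counted by both the 3-term and the 5-term,
    # so subtract them once (inclusion-exclusion).
    return (sum_of_multiples(3, limit)
            + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))

print(sum_multiples_3_or_5(10))
print(sum_multiples_3_or_5(10000))
```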


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 5:</h1></font>

<font color = "coral">
Calculate all square numbers (1,4,9,16,...) below 1,000. What's their sum?

In [57]:
#Your answer here
snum_list = []
for i in range(1,1000):
    snum = i * i
    if snum >= 1000:
        break
    snum_list.append(snum)
print("Square number list below 1,000")
print(snum_list)
print("Sum of all the square numbers below 1,000")
print(sum(snum_list))

Square number list below 1,000
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961]
Sum of all the square numbers below 1,000
10416


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 6:</h1></font>

<font color = "coral">
Write a function to calculate the mean (average) of "lst". Do not use the built-in "mean" functions that Python offers.

In [63]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55]
           
#Your answer here
total = 0  # named "total" so the built-in sum() is not shadowed
for i in range(len(lst)):
    total = total + lst[i]
avg = total / len(lst)
print("Average value for the list")
print(avg)


Average value for the list
121.77272727272727
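
Since the exercise asks for a function, the same calculation could be wrapped as below. This is a sketch; raising `ValueError` for an empty list is an assumption about the desired behaviour, not something the exercise specifies:

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    total = 0
    for v in values:
        total += v
    return total / len(values)

print(mean([4, 6, 3, 2]))
```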


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 7:</h1></font>

<font color = "coral">
Write a function to calculate the median of "lst". Do not use the built-in "median" functions that Python offers.

In [68]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

#Your answer here
lst.sort()
mid_index = len(lst) // 2  # integer division gives the middle index directly

if len(lst) == 0:
    median_value = 0
elif len(lst) % 2 != 0:
    # Odd number of elements: the median is the middle element
    median_value = lst[mid_index]
else:
    # Even number of elements: the median is the mean of the two middle elements
    median_value = (lst[mid_index-1] + lst[mid_index]) / 2

print("The list has %d elements." % len(lst))
print("The median value from the list is %s." % median_value)  # %s keeps a fractional median intact

The list has 45 elements.
The median value from the list is 67.
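
A function version of the median calculation might look like the sketch below. It sorts a copy with `sorted()` so the caller's list is not reordered, and it raises `ValueError` on an empty list, which is an assumption that differs from the cell above:

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)   # sorted() returns a new list; the input is unchanged
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd length: middle element
    return (ordered[mid - 1] + ordered[mid]) / 2   # even length: mean of the two middle elements

print(median([7, 3, 5]))
print(median([4, 1, 3, 2]))
```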


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 8:</h1></font>

<font color = "coral">Write a function to calculate the mode of "lst". Do not use the built-in "mode" functions that Python offers.

In [80]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

#Your answer here
INDEX_LIST_VALUE = 1
INDEX_LIST_CNT = 0

cnt_lst = []
for i in range(len(lst)):
    found = 0
    for j in range (len(cnt_lst)):
        if cnt_lst[j][INDEX_LIST_VALUE] == lst[i]:
            found = 1
            cnt_lst[j][INDEX_LIST_CNT] = cnt_lst[j][INDEX_LIST_CNT] + 1
    if found != 1:
        cnt_lst.append([1, lst[i]])

print("Original list")
print(cnt_lst)
print("Sorted list")
cnt_lst.sort()  # sorts by count, then by value, so the last entry has the highest count
print(cnt_lst)
print("Most frequent data value : %d" % cnt_lst[-1][INDEX_LIST_VALUE])
print("The number of the frequent value count : %d" % cnt_lst[-1][INDEX_LIST_CNT])
    

Original list
[[3, 4], [3, 6], [1, 3], [2, 2], [2, 8], [1, 9], [2, 7], [1, 23], [1, 465], [1, 454], [1, 5], [1, 876], [1, 567], [1, 54], [2, 76], [1, 34], [1, 55], [1, 33], [1, 7653], [1, 234234], [1, 7857], [1, 23432], [1, 4353], [1, 345], [1, 4667], [1, 23235], [1, 1212], [1, 221], [1, 335], [1, 2323], [1, 21], [1, 45], [1, 5432], [1, 54645645], [1, 123212245346342], [1, 67], [1, 34563]]
Sorted list
[[1, 3], [1, 5], [1, 9], [1, 21], [1, 23], [1, 33], [1, 34], [1, 45], [1, 54], [1, 55], [1, 67], [1, 221], [1, 335], [1, 345], [1, 454], [1, 465], [1, 567], [1, 876], [1, 1212], [1, 2323], [1, 4353], [1, 4667], [1, 5432], [1, 7653], [1, 7857], [1, 23235], [1, 23432], [1, 34563], [1, 234234], [1, 54645645], [1, 123212245346342], [2, 2], [2, 7], [2, 8], [2, 76], [3, 4], [3, 6]]
Most frequent data value : 6
The number of the frequent value count : 3
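
In the counts above, both 4 and 6 occur three times, and the sorted-pairs approach reports only the larger of the two. A dictionary makes the counting shorter and allows every tied value to be returned. The function name `modes` is hypothetical:

```python
def modes(values):
    """Return (list of most frequent values, their count)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1   # start at 0 the first time v is seen
    if not counts:
        return [], 0
    best = max(counts.values())
    return sorted(v for v, c in counts.items() if c == best), best

print(modes([4, 6, 3, 4, 6, 2]))
```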


# Reading CSV file -- bayarea_home_prices data

In [81]:
"""
Dataset description
1) HomeID = Home ID number
2) HomeAge = Age of home in years
3) HomeSqft = Square footage of home
4) LotSize = LotSize
5) BedRooms = Num bedrooms as per county data
6) HighSchoolAPI = API for nearest high school
7) ProxFwy = Distance in miles to Freeway
8) CarGarage = Number of cars in garage; 0 = no garage
9) ZipCode = Postal zip code for the home
10)HomePriceK = Home price in $K (Target)
-------------------------------------------
9 X Variables; 1 Y variable (Target)
Data Points = 100

Data errors:
1) A few ZipCode values start with 8; the first digit should be 9
2) A few HighSchoolAPI scores have only two digits; the trailing 0 is missing
3) A few CarGarage values were entered as the letter "l"; they should be the integer 1
"""

'\nDataset description\n1) HomeID = Home ID number\n2) HomeAge = Age of home in years\n3) HomeSqft = Square footage of home\n4) LotSize = LotSize\n5) BedRooms = Num bedrooms as per county data\n6) HighSchoolAPI = API for nearest high school\n7) ProxFwy = Distance in miles to Freeway\n8) CarGarage = Number of cars in garage; 0 = no garage\n9) ZipCode = Postal zip code for the home\n10)HomePriceK = Home price in $K (Target)\n-------------------------------------------\n9 X Variables; 1 Y variable (Target)\nData Points = 100\n\nData errors:\n1) A few ZipCode values start with 8; the first digit should be 9\n2) A few HighSchoolAPI scores have only two digits; the trailing 0 is missing\n3) A few CarGarage values were entered as the letter "l"; they should be the integer 1\n'

In [91]:
## Reading csv files
def read_file(filename):
    data_array = []
    # "with" closes the file automatically, even if an error occurs
    with open(filename, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:
                continue  # skip the header row
            line_no_newline = line.rstrip()  # strip trailing whitespace, including the newline
            line_split = line_no_newline.split(",")
            data_array.append(line_split)
    return data_array

In [92]:
housing_data = read_file("bayarea_home_prices.csv")
print(housing_data[0:6])

[['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '84085', '867']]
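
Python's standard-library `csv` module can do the splitting shown in `read_file`. The sketch below reads from an in-memory string so that it runs without the data file. The header names are assumed from the dataset description, and the two data rows are copied from the output above:

```python
import csv
import io

# In-memory stand-in for bayarea_home_prices.csv (header names are an assumption)
sample = io.StringIO(
    "HomeID,HomeAge,HomeSqft,LotSize,BedRooms,HighSchoolAPI,ProxFwy,CarGarage,ZipCode,HomePriceK\n"
    "1,24,1757,6056,2,899,3,3,94085,894\n"
    "6,18,1589,6148,2,920,3,0,84085,867\n"
)

reader = csv.reader(sample)
header = next(reader)   # consume the header row
rows = list(reader)     # every remaining row as a list of strings

print(header)
print(rows)
```

With a real file, `sample` would be replaced by a file object opened with `open(filename, newline="")`.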


In [93]:
len_housing_data = len(housing_data)
print(len_housing_data)

100


In [305]:
list_HomeAge = []
# for all rows, extract only column 1
for k in range(0,len_housing_data):
    list_HomeAge.append(housing_data[k][1])    

In [306]:
print(list_HomeAge) 
# they are still strings, cannot do numerical calculations with strings 

['24', '10', '14', '14', '24', '18', '13', '19', '17', '24', '12', '22', '15', '25', '10', '20', '23', '16', '10', '13', '17', '10', '15', '10', '21', '12', '13', '10', '17', '24', '10', '18', '11', '19', '12', '14', '13', '22', '22', '15', '23', '21', '17', '11', '15', '11', '21', '22', '12', '19', '19', '25', '23', '12', '10', '11', '11', '19', '22', '19', '13', '19', '25', '12', '14', '25', '24', '12', '21', '16', '19', '24', '25', '17', '14', '12', '17', '25', '17', '11', '18', '19', '24', '25', '22', '19', '18', '22', '21', '14', '16', '18', '25', '21', '13', '11', '10', '21', '19', '11']
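
Because the values are still strings, they need converting before any arithmetic. A list comprehension is one compact way to do that, shown here on the first three ages only:

```python
list_HomeAge = ['24', '10', '14']   # first three values from the output above

# int() converts each string to an integer
ages = [int(a) for a in list_HomeAge]

average_age = sum(ages) / len(ages)
print(ages)
print(average_age)
```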


In [307]:
# How to convert zipcodes from text to numbers
for k in range(0,len_housing_data):
    housing_data[k][8] = int(housing_data[k][8])  # convert to integer data type and over-write

In [308]:
print(housing_data[0:5]) # Zipcode is without quotes and not strings; they are now integers

[['1', '24', '1757', '6056', '2', '899', '3', '3', 94085, '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', 94085, '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', 94085, '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', 94085, '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', 94085, '890']]


In [180]:
## Reading csv files, how to fix errors in data, e.g. replace 84085 with 94085
def read_file_housing(filename):
    # known zip code typos (leading 9 entered as 8)
    zip_fixes = {"84085": "94085", "84087": "94087",
                 "85014": "95014", "85051": "95051"}
    data_array = []
    # "with" closes the file automatically, even if an error occurs
    with open(filename, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:
                continue  # skip the header row
            line_split = line.rstrip().split(",")
            # fix the zip code in column 8 only, so other columns are never touched
            line_split[8] = zip_fixes.get(line_split[8], line_split[8])
            # fix api value less than 100 (trailing 0 missing)
            if int(line_split[5]) < 100:
                line_split[5] = str(int(line_split[5]) * 10)
            # fix wrong car garage value ('l' to 1)
            if line_split[7] == "l":
                line_split[7] = "1"
            data_array.append(line_split)
    return data_array

In [181]:
housing_data2 = read_file_housing("bayarea_home_prices.csv")
print(housing_data2[0:6])

[['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '94085', '867']]


In [182]:
len_housing_data2 = len(housing_data2)
print(len_housing_data2)

100


In [183]:
list_ZipCode2 = []
# for all rows, extract all zipcodes
for k in range(0,len_housing_data2):
    list_ZipCode2.append(int(housing_data2[k][8]))  

In [184]:
print(list_ZipCode2) # Converted to numbers

[94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 95051, 94085, 94085, 95051, 94085, 94085, 94085, 94085, 95051, 94085, 94085, 94085, 95051, 94085, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 94087, 94087, 95051, 95051, 95051, 94087, 95051, 94087, 94087, 95051, 95051, 95051, 95051, 94087, 95051, 94087, 94087, 94087, 95051, 94087, 94087, 94087, 94087, 94087, 94087, 95014, 94087, 94087, 94087, 94087, 95014, 94087, 95014, 95014, 94087, 94087, 95014, 94087, 94087, 94087, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014]


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 9:</h1></font>

In [185]:
"""
The above example shows correcting 84085 -> 94085
Perform other zip code corrections: 
84087 -> 94087,
85014 -> 95014,
85051 -> 95051
Create a table for zip code distribution after corrections:
After:Zipcode,House_Count
94085,25
94087,25
95051,25
95014,25
"""
#Your answer here
zip_cnt = {}
for i in range(len(housing_data2)):
    found = 0
    zipcode = housing_data2[i][8]
    for k, v in zip_cnt.items():
        if zipcode == k:
            zip_cnt[k] = v + 1
            found = 1
            break
    if found == 0:
        zip_cnt[zipcode] = 1
print(zip_cnt)
for k, v in zip_cnt.items():
    print("%s,%d" %(k,v))

{'94085': 25, '95051': 25, '94087': 25, '95014': 25}
94085,25
95051,25
94087,25
95014,25
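
The inner loop over `zip_cnt.items()` is not strictly needed, because a dictionary can look a key up directly. A possible shorter pattern uses `dict.get()` with a default of 0. The helper name `count_by_column` and the sample rows are made up for illustration:

```python
def count_by_column(rows, col):
    """Count how often each value appears in column col (hypothetical helper)."""
    counts = {}
    for row in rows:
        key = row[col]
        counts[key] = counts.get(key, 0) + 1   # 0 is used the first time a key is seen
    return counts

rows = [["1", "94085"], ["2", "94085"], ["3", "95051"]]  # made-up sample rows
zip_counts = count_by_column(rows, 1)
print(zip_counts)
```

The same helper would also cover the garage count, by passing column 7 of the housing data.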


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 10:</h1></font>

In [186]:
"""
Modify function read_file_housing to multiply incorrect SchoolAPI by 10.
Assume API value to be incorrect if it is a two digit number.
Calculate average School API by zipcode. Print the following:
Average_SchoolAPI,Cnt_of_homes,ZipCode
xyz,mn,abc
"""
#Your answer here
zip_cnt_api = {}
for i in range(len(housing_data2)):
    found = 0
    zipcode = housing_data2[i][8]
    api = int(housing_data2[i][5])
    for k, v in zip_cnt_api.items():
        if zipcode == k:
            v[0] = v[0] + api
            v[1] = v[1] + 1
            zip_cnt_api[k] = v
            found = 1
            break
    if found == 0:
        zip_cnt_api[zipcode] = [api, 1]
print(zip_cnt_api)
for k, v in zip_cnt_api.items():
    print("%d,%d,%s" %(v[0]/v[1],v[1],k))

{'94085': [22675, 25], '95051': [22917, 25], '94087': [22481, 25], '95014': [22370, 25]}
907,25,94085
916,25,95051
899,25,94087
894,25,95014
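
The sum-and-count bookkeeping used here recurs in the later price exercise, so it could be factored into one function. The following is a sketch with a hypothetical name and made-up sample rows, not the notebook's own solution:

```python
def average_by_key(rows, key_col, value_col):
    """Map each key to (average of value_col, number of rows) (hypothetical helper)."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[key_col], (0, 0))
        totals[row[key_col]] = (total + int(row[value_col]), count + 1)
    return {key: (total / count, count) for key, (total, count) in totals.items()}

rows = [["94085", "900"], ["94085", "910"], ["95051", "920"]]  # made-up sample rows
averages = average_by_key(rows, 0, 1)
for zipcode, (avg, cnt) in averages.items():
    print("%d,%d,%s" % (avg, cnt, zipcode))
```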


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 11:</h1></font>

In [188]:
"""
Modify function read_file_housing to replace CarGarage value of 'l' with integer 1
Calculate and print the following:
Car_Garage,Cnt_of_homes
0,m
1,n
2,o
3,p
"""
garage_cnt = {}
for i in range(len(housing_data2)):
    found = 0
    garage = housing_data2[i][7]
    for k, v in garage_cnt.items():
        if garage == k:
            garage_cnt[k] = v + 1
            found = 1
            break
    if found == 0:
        garage_cnt[garage] = 1
print(garage_cnt)
for k, v in sorted(garage_cnt.items()):
    print("%s,%d" %(k, v))

{'3': 32, '2': 19, '1': 18, '0': 31}
0,31
1,18
2,19
3,32


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 12:</h1></font>

In [192]:
"""
Find the average price of a home in each of these four zip codes.
Zipcode,Avg_Price,Cnt_of_homes
"""
zip_cnt_price = {}
for i in range(len(housing_data2)):
    found = 0
    zipcode = housing_data2[i][8]
    price = int(housing_data2[i][9])
    for k, v in zip_cnt_price.items():
        if zipcode == k:
            v[0] = v[0] + price
            v[1] = v[1] + 1
            zip_cnt_price[k] = v
            found = 1
            break
    if found == 0:
        zip_cnt_price[zipcode] = [price, 1]
print(zip_cnt_price)
print("Zipcode,Avg_Price(K$),Cnt_of_homes")
for k, v in zip_cnt_price.items():
    print("%s,%d,%d" %(k, v[0]/v[1], v[1]))

{'94085': [22149, 25], '95051': [25580, 25], '94087': [28787, 25], '95014': [31583, 25]}
Zipcode,Avg_Price(K$),Cnt_of_homes
94085,885,25
95051,1023,25
94087,1151,25
95014,1263,25


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 13:</h1></font>

In [195]:
"""
Find the average price of a home in Sunnyvale (94087 and 94085).
Print the output as follows:
The average house price in Sunnyvale based on xx homes is $yyy (thousands).
"""
cnt = 0
total = 0  # named "total" to avoid shadowing the built-in sum()
for i in range(len(housing_data2)):
    zipcode = housing_data2[i][8]
    price = int(housing_data2[i][9])
    if zipcode == "94087" or zipcode == "94085":
        cnt = cnt + 1
        total = total + price
print("The average house price in Sunnyvale based on %d homes is $%d (thousands)." % (cnt, total/cnt))

The average house price in Sunnyvale based on 50 homes is $1018 (thousands).
