## Class 2 Python for Data Science
### Python Dictionary
### List Comprehension
### Reading CSV file and fix data errors

One of Python's built-in datatypes is the dictionary, which defines one-to-one relationships between keys and values.

"Like lists dictionaries can easily be changed, can be shrunk and grown ad libitum at run time. They shrink and grow without the necessity of making copies. Dictionaries can be contained in lists and vice versa. But what's the difference between lists and dictionaries? Lists are ordered sets of objects, whereas dictionaries are <b>unordered sets.</b> But the main difference is that items in dictionaries are accessed via keys and not via their position."

<br>
A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key: value pairs to the dictionary; this is also the way dictionaries are written on output.
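
As a rough sketch of the points above (the names `empty`, `prices`, and `shopping_list` are made up for illustration and are not part of the class data): a list is read by position, a dictionary by key. Since Python 3.7, dictionaries do preserve insertion order, but their items are still looked up by key rather than by position.

```python
# Hypothetical example data, not part of the class dataset.
empty = {}                                   # a pair of braces creates an empty dictionary
prices = {"apple": 0.5, "pear": 0.75}        # comma-separated key: value pairs
shopping_list = ["apple", "pear", "plum"]    # a list, for comparison

first_item = shopping_list[0]    # lists are accessed by position
pear_price = prices["pear"]      # dictionaries are accessed by key

# Both grow and shrink in place at run time.
prices["plum"] = 1.2
del prices["apple"]

# Dictionaries can be contained in lists and vice versa.
basket = {"fruit": shopping_list}
records = [prices, empty]

print(prices)   # {'pear': 0.75, 'plum': 1.2}
```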

In [30]:
dict1 = {"fruit" : [75,"orange"], "vegetable":"onion, mushroom, lettuce"}
dict1

{'fruit': [75, 'orange'], 'vegetable': 'onion, mushroom, lettuce'}

### i. Keys

Get the keys from "dict1"

In [31]:
dict1.keys()

dict_keys(['fruit', 'vegetable'])

### Indexing With Keys?

What happens if you try to run "<b>dict1[0]</b>"? Why?


In [3]:
# dict1 is a dictionary, not a list, so it cannot be indexed by position.
# 0 is treated as a key, and because dict1 has no key 0, Python raises a KeyError.
dict1[0]

KeyError: 0

In [32]:
dict1["fruit"]

[75, 'orange']

OR

In [33]:
dict1.get("fruit")

[75, 'orange']
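
The two forms behave differently when the key is missing. A minimal sketch, reusing `dict1` from above with a key `"meat"` that is assumed not to exist:

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

# .get() returns None (or a default you supply) when the key is missing.
missing = dict1.get("meat")
fallback = dict1.get("meat", "not stocked")

# Square brackets raise a KeyError for a missing key.
try:
    dict1["meat"]
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)   # None not stocked True
```

Use `.get()` when a missing key is an expected case, and square brackets when a missing key should be treated as an error.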

### ii. Values

Get the values from "dict1"

In [34]:
dict1.values()

dict_values([[75, 'orange'], 'onion, mushroom, lettuce'])

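### Indexing With Values?
Looking up a key by its value is a little more complicated, because a dictionary has no direct value-to-key lookup. One way is to loop over the items and compare each value: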

In [35]:
V = 'onion, mushroom, lettuce'

for key, value in dict1.items():
    if value == V:
        K = key
print(K)

vegetable
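
The same reverse lookup can be sketched as a list comprehension. Because several keys may share one value, this version collects every matching key; `target` and `matching_keys` are illustrative names.

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}
target = "onion, mushroom, lettuce"

# Collect every key whose value equals the target (there may be none or several).
matching_keys = [key for key, value in dict1.items() if value == target]
print(matching_keys)   # ['vegetable']
```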


### iii. Length of Dictionary

Returns the number of stored entries, i.e. the number of (key,value) pairs.

In [36]:
len(dict1)

2

### iv. Remove key and value

In [37]:
# dict_name[key]
del dict1["vegetable"]
print(dict1)

{'fruit': [75, 'orange']}


### v. Add new key and value

In [38]:
# dict_name[key] = value
dict1["new"] = 0
print(dict1)

{'fruit': [75, 'orange'], 'new': 0}


### vi. Concatenating Dictionaries
<i>*Note: Keys must be unique. If a key exists in both dictionaries, update() keeps the value from the dictionary passed in.</i>

In [39]:
# To add dict2 to dict1

dict1 = {"fruit" : "orange, watermelon, grape", "vegetable":"onion, mushroom, lettuce"}
dict2 = {"fruit1": [5,6,7]}
dict1.update(dict2)
dict1

{'fruit': 'orange, watermelon, grape',
 'vegetable': 'onion, mushroom, lettuce',
 'fruit1': [5, 6, 7]}
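
A small sketch of what happens when the two dictionaries share a key; `stock` and `extra` are made-up names. The value from the dictionary passed to `update()` replaces the existing one.

```python
# Hypothetical dictionaries that share the key "fruit".
stock = {"fruit": "orange", "vegetable": "onion"}
extra = {"fruit": "grape", "nut": "almond"}

stock.update(extra)   # "fruit" is overwritten, "nut" is added
print(stock)          # {'fruit': 'grape', 'vegetable': 'onion', 'nut': 'almond'}
```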

### <font color = "coral">Exercise 1: Create a new dictionary</font>
<font color = "coral">Your keys should be "Country","State","City","ZipCode"

Fill in the values according to the keys.

In [40]:
# Your code here
# Open question: is a list of several dictionaries needed, or is one dictionary enough?

dict_geo = {"Country": "USA", "State": "California", "City": "San Francisco", "ZipCode": "94122"}

## Multi-dimensional Array

In [41]:
a = [[0,  1, 2, 3, 4, 5],
     [10,11,12,13,14,15],
     [20,21,22,23,24,25],
     [30,31,32,33,34,35],
     [40,41,42,43,44,45],
     [50,51,52,53,54,55]]

In [42]:
a[0]

[0, 1, 2, 3, 4, 5]

In [43]:
a[4:6]

[[40, 41, 42, 43, 44, 45], [50, 51, 52, 53, 54, 55]]

In [44]:
a[5][5]

55
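
Rows of a nested list can be sliced directly, but a column has to be collected row by row. A minimal sketch on a smaller, made-up grid:

```python
# Smaller hypothetical grid with the same layout as `a` above.
grid = [[0, 1, 2],
        [10, 11, 12],
        [20, 21, 22]]

column_1 = [row[1] for row in grid]                 # second element of every row
diagonal = [grid[i][i] for i in range(len(grid))]   # element i of row i

print(column_1)   # [1, 11, 21]
print(diagonal)   # [0, 11, 22]
```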

In [4]:
list(range(15))


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

### List Comprehensions

In [4]:
bbb = [x**2 for x in range(15)]
bbb

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]

This is the same as the loop below:

In [8]:
original = list(range(15))
print(original)
squares = []

for x in original:
    squares.append(x**2)
    
print(squares)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]


#### What is happening in this loop?

In [9]:
new = []
for x in squares:
    if x < 100:
        new.append(x**2)
new

[0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]

In [12]:
new = [i**2 for i in squares if i < 100]
new 

[0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]
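
One way to read the loop and the comprehension above is as two steps: first keep only the values below 100, then square what is left. The sketch below splits them apart; `kept` is an illustrative name.

```python
squares = [x**2 for x in range(15)]

kept = [x for x in squares if x < 100]   # step 1: filter
new = [x**2 for x in kept]               # step 2: transform

print(kept)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(new)    # [0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]
```

The general shape is `[expression for item in iterable if condition]`: the condition decides which items are kept, and the expression decides what is stored for each of them.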

### <font color = "coral">Exercise 2:
<font color = "coral">
Turn this for loop into a list comprehension (it should be only one line).

In [49]:
mystery = []
for i in range(1000):
    if i%5 == 0:
        mystery.append(i)
print(mystery)

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995]


In [8]:
outer_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

nt = [i for i in range(len(outer_list[0]))]

transpose = [[row[i] for row in outer_list] for i in range(len(outer_list[0]))]

print(nt)
print(transpose)

[0, 1, 2]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
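
As an alternative sketch of the same transpose, the built-in `zip` can pair up the i-th elements of every row. This assumes all rows have the same length.

```python
outer_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*outer_list) yields one tuple per column; list() turns each tuple into a list.
transpose = [list(column) for column in zip(*outer_list)]
print(transpose)   # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```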


In [23]:
# Your code here
# Note: use print() if you want the output to wrap; without print() the notebook displays the list as one long column.
print([i for i in range(1000) if i % 5 == 0])

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995]


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 3:</h1></font>

<font color = "coral">
Now that you have all this knowledge of different operators, data types, and loops, create a loop that removes all the unwanted information from our list.

<b>1) Create a loop where you get rid of all the odd numbers.
<br><br>
2) Put all the numbers in order from smallest to largest.<br><br>
3) Once you only have a list of ordered even numbers convert all these integers into strings.<br><br>
4) Now print your number strings as a single string with comma separation.</b>



In [25]:
# list for Ex 3
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

In [66]:
# Your code here
# Note: the output shown below was produced with a shorter, 22-element version of lst.
print("lst =", lst)
EvenLst = [x for x in lst if x % 2 == 0]    # 1) drop the odd numbers
print("Even List =", EvenLst)
EvenLst.sort()                              # 2) sort ascending, in place
print("Sorted Even List =", EvenLst)
StrEvenLst = [str(x) for x in EvenLst]      # 3) convert each integer to a string
print("Type String Sorted Even List =", StrEvenLst)
S = ', '.join(StrEvenLst)                   # 4) join into one comma-separated string
print("The sorted even number list from lst as a string is: ", S)

lst = [4, 6, 3, 2, 6, 8, 9, 7, 23, 4, 465, 7, 6, 8, 454, 5, 876, 567, 54, 76, 34, 55]
Even List = [4, 6, 2, 6, 8, 4, 6, 8, 454, 876, 54, 76, 34]
Sorted Even List = [2, 4, 4, 6, 6, 6, 8, 8, 34, 54, 76, 454, 876]
Type String Sorted Even List = ['2', '4', '4', '6', '6', '6', '8', '8', '34', '54', '76', '454', '876']
The sorted even number list from lst as a string is:  2, 4, 4, 6, 6, 6, 8, 8, 34, 54, 76, 454, 876


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 4:</h1></font>

<font color = "coral">If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 10,000.

In [65]:
# Your answer here
# The exercise asks for multiples below 10,000, so the range stops at 10000.
multiples = [i for i in range(1, 10000) if i % 3 == 0 or i % 5 == 0]
print(multiples)
print("\nThe sum of the multiples of 3 or 5 below 10,000 is: ", sum(multiples))

[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24, 25, 27, 30, 33, 35, 36, 39, 40, 42, 45, 48, 50, 51, 54, 55, 57, 60, 63, 65, 66, 69, 70, 72, 75, 78, 80, 81, 84, 85, 87, 90, 93, 95, 96, 99, 100, 102, 105, 108, 110, 111, 114, 115, 117, 120, 123, 125, 126, 129, 130, 132, 135, 138, 140, 141, 144, 145, 147, 150, 153, 155, 156, 159, 160, 162, 165, 168, 170, 171, 174, 175, 177, 180, 183, 185, 186, 189, 190, 192, 195, 198, 200, 201, 204, 205, 207, 210, 213, 215, 216, 219, 220, 222, 225, 228, 230, 231, 234, 235, 237, 240, 243, 245, 246, 249, 250, 252, 255, 258, 260, 261, 264, 265, 267, 270, 273, 275, 276, 279, 280, 282, 285, 288, 290, 291, 294, 295, 297, 300, 303, 305, 306, 309, 310, 312, 315, 318, 320, 321, 324, 325, 327, 330, 333, 335, 336, 339, 340, 342, 345, 348, 350, 351, 354, 355, 357, 360, 363, 365, 366, 369, 370, 372, 375, 378, 380, 381, 384, 385, 387, 390, 393, 395, 396, 399, 400, 402, 405, 408, 410, 411, 414, 415, 417, 420, 423, 425, 426, 429, 430, 432, 435, 438, 440, 441, 444, 445, 447, 450,

 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 5:</h1></font>

<font color = "coral">
Calculate all square numbers (1,4,9,16,...) below 1,000. What's their sum?

In [64]:
# COMPLETED
#Your answer here
print([i**2 for i in range(1000) if i**2 < 1000])
# FINAL ANSWER
print("\nThe sum of all squares below 1000 is: ", sum([i**2 for i in range(1000) if i**2 < 1000]) )

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961]

The sum of all squares below 1000 is:  10416


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 6:</h1></font>

<font color = "coral">
Write a function to calculate the mean (average) of "lst". Do not use the built-in "mean" functions that Python offers.

In [374]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55]

# Your answer here
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

print( "The average of lst is: ", round(mean(lst), 2) )

The average of lst is:  121.77


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 7:</h1></font>

<font color = "coral">
Write a function to calculate the median of "lst". Do not use the built-in "median" functions that Python offers.

In [375]:
# COMPLETED
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

# Your answer here
def median(values):
    """Return the median without using a built-in median function."""
    ordered = sorted(values)     # sorted copy; the caller's list is left unchanged
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]      # odd length: the middle element
    # even length: average the two middle elements (indices mid - 1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2

print("\nThe Median of lst = ", median(lst))


The Median of lst =  67


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 8:</h1></font>

<font color = "coral">Write a function to calculate the mode of "lst". Do not use the built-in "mode" functions that Python offers.

In [376]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

# Your answer here
def mode(values):
    """Return a list of (count, value) tuples for the most frequent value(s)."""
    counts = {}
    for v in values:                      # 1. Build a dictionary of value: count
        counts[v] = counts.get(v, 0) + 1
    # 2. Make (count, value) tuples and sort them in descending order by count
    pairs = sorted(((c, v) for v, c in counts.items()), reverse=True)
    # 3. Keep every tuple that ties with the highest count
    return [p for p in pairs if p[0] == pairs[0][0]]

print("The Mode of lst is : ", mode(lst))

The Mode of lst is :  [(3, 6), (3, 4)]


# Reading CSV file -- bayarea_home_prices data

In [58]:
"""
Dataset description
1) HomeID = Home ID number
2) HomeAge = Age of home in years
3) HomeSqft = Square footage of home
4) LotSize = LotSize
5) BedRooms = Num bedrooms as per county data
6) HighSchoolAPI = API for nearest high school
7) ProxFwy = Distance in miles to Freeway
8) CarGarage = Number of cars in garage; 0 = no garage
9) ZipCode = Postal zip code for the home
10)HomePriceK = Home price in $K (Target)
-------------------------------------------
9 X Variables; 1 Y variable (Target)
Data Points = 100

Data errors:
1) Few ZipCode have starting digit to be 8, it should be 9
2) Few HighSchoolApi scores have two digits, the ending digit 0 is missing
3) Few CarGarage numbers were entered as letter "l", it should be integer 1 
"""

'\nDataset description\n1) HomeID = Home ID number\n2) HomeAge = Age of home in years\n3) HomeSqft = Square footage of home\n4) LotSize = LotSize\n5) BedRooms = Num bedrooms as per county data\n6) HighSchoolAPI = API for nearest high school\n7) ProxFwy = Distance in miles to Freeway\n8) CarGarage = Number of cars in garage; 0 = no garage\n9) ZipCode = Postal zip code for the home\n10)HomePriceK = Home price in $K (Target)\n-------------------------------------------\n9 X Variables; 1 Y variable (Target)\nData Points = 100\n\nData errors:\n1) Few ZipCode have starting digit to be 8, it should be 9\n2) Few HighSchoolApi scores have two digits, the ending digit 0 is missing\n3) Few CarGarage numbers were entered as letter "l", it should be integer 1 \n'

In [158]:
## Reading csv files
def read_file(filename):
    data_array = []
    with open(filename, "r") as file_open:   # the file is closed automatically
        for line in file_open:
            if "HomeID" in line:             # skip the header row
                continue
            line_no_newline = line.rstrip()  # strip trailing whitespace, including the newline
            line_split = line_no_newline.split(",")
            # print(line_split)
            data_array.append(line_split)
    return data_array
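
For comparison, the standard-library `csv` module can do the splitting. The sketch below is not the class's method: it reads from an in-memory string so it runs without the data file, `read_rows` is a hypothetical name, and the header line is an assumption built from the column names in the dataset description.

```python
import csv
import io

# Assumed header (taken from the dataset description) plus the first two data rows.
sample_text = (
    "HomeID,HomeAge,HomeSqft,LotSize,BedRooms,HighSchoolAPI,ProxFwy,CarGarage,ZipCode,HomePriceK\n"
    "1,24,1757,6056,2,899,3,3,94085,894\n"
    "2,10,1563,6085,2,959,4,3,94085,861\n"
)

def read_rows(file_obj):
    """Return every data row as a list of strings, skipping the header row."""
    reader = csv.reader(file_obj)
    next(reader)                  # skip the header
    return [row for row in reader]

rows = read_rows(io.StringIO(sample_text))
print(rows[0])
```

With a real file, the same function could be called as `read_rows(open_file)` inside a `with open(filename, newline="") as open_file:` block.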

In [159]:
housing_data = read_file("bayarea_home_prices.csv")
print(housing_data[0:6])

[['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '84085', '867']]


In [160]:
len_housing_data = len(housing_data)
print(len_housing_data)

100


In [163]:
list_HomeAge = []
# for all rows, extract only column 1   <--- REMEMBER THIS
for k in range(0,len_housing_data):
    list_HomeAge.append(housing_data[k][1])    

In [164]:
print(list_HomeAge) 
# they are still strings, cannot do numerical calculations with strings 

['24', '10', '14', '14', '24', '18', '13', '19', '17', '24', '12', '22', '15', '25', '10', '20', '23', '16', '10', '13', '17', '10', '15', '10', '21', '12', '13', '10', '17', '24', '10', '18', '11', '19', '12', '14', '13', '22', '22', '15', '23', '21', '17', '11', '15', '11', '21', '22', '12', '19', '19', '25', '23', '12', '10', '11', '11', '19', '22', '19', '13', '19', '25', '12', '14', '25', '24', '12', '21', '16', '19', '24', '25', '17', '14', '12', '17', '25', '17', '11', '18', '19', '24', '25', '22', '19', '18', '22', '21', '14', '16', '18', '25', '21', '13', '11', '10', '21', '19', '11']


In [165]:
# How to convert zipcodes from text to numbers
for k in range(0,len_housing_data):
    housing_data[k][8] = int(housing_data[k][8])  # convert to integer data type and over-write

In [166]:
print(housing_data[0:5]) # Zipcode is without quotes and not strings; they are now integers

[['1', '24', '1757', '6056', '2', '899', '3', '3', 94085, '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', 94085, '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', 94085, '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', 94085, '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', 94085, '890']]
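
The same conversion can be sketched for every column at once. Plain `int()` raises a `ValueError` on entries such as the letter `l` mentioned in the data errors, so this version uses a hypothetical helper, `to_int_or_none`, that returns `None` for anything that is not a valid integer.

```python
# First two rows of the housing data, as printed above (all fields are strings).
rows = [['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'],
        ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861']]

def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

numeric_rows = [[to_int_or_none(field) for field in row] for row in rows]
print(numeric_rows[0])
print(to_int_or_none("l"))   # None
```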


In [167]:
## Reading csv files, how to fix errors in data, replace 84085 with 94085
def read_file_housing(filename):
    data_array = []
    with open(filename, "r") as file_open:   # the file is closed automatically
        for line in file_open:
            if "HomeID" in line:             # skip the header row
                continue
            line_no_newline = line.rstrip()
            # Further .replace() calls can be chained after this one to fix more values
            line2 = line_no_newline.replace("84085","94085")
            line_split = line2.split(",")
            data_array.append(line_split)
    return data_array

In [168]:
housing_data2 = read_file_housing("bayarea_home_prices.csv")
print(housing_data2[0:6])

[['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '94085', '867']]


In [169]:
len_housing_data2 = len(housing_data2)
print(len_housing_data2)

100


In [170]:
list_ZipCode2 = []
# for all rows, extract all zipcodes
for k in range(0,len_housing_data2):
    list_ZipCode2.append(int(housing_data2[k][8]))  

In [171]:
print(list_ZipCode2) # Converted to numbers

[94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 95051, 94085, 94085, 95051, 94085, 94085, 94085, 94085, 95051, 94085, 94085, 94085, 95051, 94085, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 85051, 95051, 95051, 95051, 94087, 94087, 95051, 95051, 95051, 94087, 95051, 94087, 94087, 95051, 95051, 95051, 85051, 94087, 95051, 94087, 94087, 94087, 95051, 94087, 94087, 94087, 94087, 94087, 94087, 95014, 94087, 94087, 94087, 94087, 95014, 94087, 95014, 95014, 84087, 84087, 95014, 94087, 94087, 94087, 95014, 95014, 95014, 95014, 85014, 95014, 95014, 95014, 85014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014]


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 9:</h1></font>

In [368]:
# COMPLETED 
"""
The above example shows correcting 84085 -> 94085
Perform other zip code corrections: 
84087 -> 94087,
85014 -> 95014,
85051 -> 95051
Create a table for zip code distribution after corrections:
After:Zipcode,House_Count
94085,25
94087,25
95051,25
95014,25
"""
#Your answer here

def read_file_housingmy(filename):
    data_array = []
    with open(filename, "r") as file_open:   # the file is closed automatically
        for line in file_open:
            if "HomeID" in line:             # skip the header row
                continue
            line_no_newline = line.rstrip()
            # Calls to .replace() can be chained to apply several corrections
            line2 = (line_no_newline.replace("84085", "94085")
                                    .replace("84087", "94087")
                                    .replace("85014", "95014")
                                    .replace("85051", "95051"))
            line_split = line2.split(",")
            data_array.append(line_split)
    return data_array

housing_datamy = read_file_housingmy("bayarea_home_prices.csv")
# print(type(housing_datamy[0][8]))

Zlist = [row[8] for row in housing_datamy]   # column 8 holds the zip code
# print(Zlist)

print("Zipcode \t House_Count\n")
Slist = []   # zip codes that have already been printed
for z in Zlist:
    if z not in Slist:
        Slist.append(z)
        print(z, "\t\t", Zlist.count(z))

Zipcode 	 House_Count

94085 		 25
95051 		 25
94087 		 25
95014 		 25
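
A frequency table like this can also be sketched with `collections.Counter` from the standard library. The zip codes below are a short made-up sample, not the real column.

```python
from collections import Counter

# Made-up sample of zip codes (the real list has 100 entries).
zip_codes = ["94085", "95051", "94085", "94087", "95051", "94085"]

counts = Counter(zip_codes)   # maps each zip code to the number of times it appears

print("Zipcode \t House_Count\n")
for zip_code, count in counts.items():
    print(zip_code, "\t\t", count)
```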


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 10:</h1></font>

In [265]:
# MY NOTES. SOLUTION IN NEXT CELL - COMPLETED - SEE NEXT CELL
"""
Modify function read_file_housing to multiply incorrect SchoolAPI by 10.
Assume API value to be incorrect if it is a two digit number.
Calculate average School API by zipcode. Print the following:
Average_SchoolAPI,Cnt_of_homes,ZipCode
xyz,mn,abc
"""
#Your answer here
# [x+1 if x >= 5 else x+5 for x in l]
# def correct_API(t):
#     for i in range(t):
#         print( len(t[i][5]) )

# #correct_API(housing_datamy)
# [int(x[5])*10 if int(x[5])/100 < 1 else int(x[5]) for x in housing_datamy ]

#l = [['1', '24', '1757', '6056', '2', '89', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '95', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '94085', '867']]
#l.replace(int(x[5])*10 if int(x[5])/100 < 1 else int(x[5]) for x in housing_datamy)

# FINAL ANSWER BELOW
# [ housing_datamy for i in range(len(housing_datamy)) if int(housing_datamy[i][5])/100 < 1   housing_datamy[i][5] = (int(housing_datamy[i][5])*10)   ] # WHY??

# for i in range(len(housing_datamy)):
#     if int(housing_datamy[i][5])/100 < 1:
#         housing_datamy[i][5] = int(housing_datamy[i][5])*10
#     else:
#         continue
# print ([x[5] for x in housing_datamy]) # OBSERVE THERE ARE NO TWO DIGIT APIs

['899', '959', '865', '959', '877', '920', '959', '905', '884', 950, '931', '904', '872', '857', '884', '894', '965', '935', '853', '851', '922', '904', '851', '911', '900', '856', '966', '892', 900, '933', '918', '854', '959', '939', '974', '925', '960', '975', '942', '891', '917', '867', '913', '859', 960, '916', '901', '890', '889', '949', '948', '867', '857', '888', 890, '876', '950', '883', '953', '903', 900, '868', '928', '868', 850, '902', '857', '940', '962', '881', '889', '893', 850, '875', '940', '954', '852', '927', '851', '883', '894', '877', '941', '931', '861', '924', '881', '929', '881', '879', '941', 920, '942', '862', '856', '912', '942', '915', '857', '857']


In [373]:
# COMPLETED

# Multiply two-digit API scores by 10 (the trailing 0 is missing)
for row in housing_datamy:
    if int(row[5]) < 100:
        row[5] = int(row[5]) * 10
print ([x[5] for x in housing_datamy]) # OBSERVE THERE ARE NO TWO DIGIT APIs

# Unique zip codes, in order of first appearance
Ziplist = []
for row in housing_datamy:
    if row[8] not in Ziplist:
        Ziplist.append(row[8])

print("\nZipcode\t\t Count of Homes \t Average API\n")
APINumZip = []   # one [zip code, count, average API] entry per zip code, kept for later use

for myzip in Ziplist:
    API = [int(row[5]) for row in housing_datamy if row[8] == myzip]
    c = len(API)
    print(myzip, "\t\t", c, "\t\t\t", sum(API)/c)
    APINumZip.append([myzip, c, sum(API)/c])

['899', '959', '865', '959', '877', '920', '959', '905', '884', 950, '931', '904', '872', '857', '884', '894', '965', '935', '853', '851', '922', '904', '851', '911', '900', '856', '966', '892', 900, '933', '918', '854', '959', '939', '974', '925', '960', '975', '942', '891', '917', '867', '913', '859', 960, '916', '901', '890', '889', '949', '948', '867', '857', '888', 890, '876', '950', '883', '953', '903', 900, '868', '928', '868', 850, '902', '857', '940', '962', '881', '889', '893', 850, '875', '940', '954', '852', '927', '851', '883', '894', '877', '941', '931', '861', '924', '881', '929', '881', '879', '941', 920, '942', '862', '856', '912', '942', '915', '857', '857']

Zipcode		 Count of Homes 	 Average API

94085 		 25 			 907.0
95051 		 25 			 916.68
94087 		 25 			 899.24
95014 		 25 			 894.8
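
The group-then-average step can also be sketched with `collections.defaultdict`, which removes the need to build the list of unique zip codes first. The `(zip code, API)` pairs below are made up for illustration.

```python
from collections import defaultdict

# Made-up (zip code, API score) pairs standing in for columns 8 and 5 of the data.
pairs = [("94085", 899), ("94085", 959), ("95051", 894), ("95051", 900)]

api_by_zip = defaultdict(list)      # a missing key starts out as an empty list
for zip_code, api in pairs:
    api_by_zip[zip_code].append(api)

averages = {z: sum(scores) / len(scores) for z, scores in api_by_zip.items()}
print(averages)   # {'94085': 929.0, '95051': 897.0}
```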


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 11:</h1></font>

In [369]:
# COMPLETED - CHECK HOW TO DO IT IN SINGLE LINE
# a=[x.append('a') or x for x in a]
"""
Modify function read_file_housing to replace CarGarage value of 'l' with integer 1
Calculate and print the following:
Car_Garage,Cnt_of_homes
0,m
1,n
2,o
3,p
"""

# [x[7] for x in housing_datamy] gives the CarGarage values
# [x[7] for x in housing_datamy].count('l') gives the number of 'l' entries in the dataset
for row in housing_datamy:
    if row[7] == 'l':
        row[7] = '1'

CarList = [row[7] for row in housing_datamy]
# print(CarList)

print("# Car_Garage \t Cnt_of_homes\n")
Glist = []   # garage sizes that have already been printed
for g in CarList:
    if g not in Glist:
        Glist.append(g)
        print(g, "\t\t", CarList.count(g))

# Car_Garage 	 Cnt_of_homes

3 		 32
2 		 19
1 		 18
0 		 31


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 12:</h1></font>

In [370]:
# COMPLETED
# a=[x.append('a') or x for x in a]
"""
Find the average price of a home in each of these four zip codes.
Zipcode,Avg_Price,Cnt_of_homes
"""
# Unique zip codes, in order of first appearance
Ziplist = []
for row in housing_datamy:
    if row[8] not in Ziplist:
        Ziplist.append(row[8])

print("Zipcode\t\t Count of Homes \t Average Price($K)\n")
SmallList = []   # one [zip code, count, average price] entry per zip code, needed for the next exercise

for myzip in Ziplist:
    Price = [int(row[9]) for row in housing_datamy if row[8] == myzip]
    c = len(Price)
    print(myzip, "\t\t", c, "\t\t\t", sum(Price)/c)
    SmallList.append([myzip, c, sum(Price)/c])

Zipcode		 Count of Homes 	 Average Price($K)

94085 		 25 			 885.96
95051 		 25 			 1023.2
94087 		 25 			 1151.48
95014 		 25 			 1263.32


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 13:</h1></font>

In [359]:
# COMPLETED
"""
Find the average price of a home in Sunnyvale (94087 and 94085).
Print the output as follows:
The average house price in Sunnyvale based on xx homes is $yyy (thousands).
"""
# print(SmallList)
NumHomes = 0
TotalPrice = 0
for zip_code, count, avg_price in SmallList:
    if zip_code in ('94087', '94085'):
        NumHomes += count
        TotalPrice += avg_price * count   # weight each zip code's average by its number of homes
print( "The average house price in Sunnyvale based on", NumHomes, "homes is $", round(TotalPrice/NumHomes, 2), "(K)" )

The average house price in Sunnyvale based on 50 homes is $ 1018.72 (K)
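
Averaging the two zip-code averages directly gives the right answer here only because both zip codes contain the same number of homes. A small sketch with made-up, unequal groups shows the difference between the plain and the count-weighted average:

```python
# Made-up (label, number of homes, average price in $K) groups of unequal size.
groups = [("A", 10, 800.0), ("B", 30, 1000.0)]

plain_average = sum(avg for _, _, avg in groups) / len(groups)
total_homes = sum(count for _, count, _ in groups)
weighted_average = sum(count * avg for _, count, avg in groups) / total_homes

print(plain_average)      # 900.0
print(weighted_average)   # 950.0
```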
