# Week 11

This week's topics:
* More OOP
* Advanced Files, CSVs, and Modules

## Question 1

Create a class named `DecimalConvert`. Each object should have an attribute called `decimal`. Then write a method (call it `convert`) that converts the positive base-10 integer stored in the `decimal` attribute to binary. The method should return a string of 0s and 1s corresponding to the binary value of the integer. You may have to do a bit of research to figure out how to convert a base-10 integer to binary. The method should just return `None` if the input is not a positive integer.

Then outside the class definition, write a program to prompt the user for input, then create an instance of `DecimalConvert` to convert the user input to binary, and print the results. 

In [16]:
class DecimalConvert:
    def __init__(self, n):
        self.decimal = n

    def convert(self):
        # return None unless the attribute is a positive integer
        if not isinstance(self.decimal, int) or self.decimal <= 0:
            return None

        bStr = ''
        temporary_variable = self.decimal # avoid modifying the attribute

        while temporary_variable > 0:
            # the remainder is the next binary digit (least significant first)
            bStr += str(temporary_variable % 2)
            temporary_variable //= 2

        return bStr[::-1] # return the reversed string

### Execution

In [17]:
test_decimal = DecimalConvert(25)
print(test_decimal.convert())

11001
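
One way to sanity-check a hand-written conversion is to compare it with Python's built-in `format(value, 'b')`, which produces the same string of 0s and 1s for positive integers. The sketch below is only an illustration: `to_binary` is a hypothetical standalone helper that follows the same repeated-division idea as the solution above.

```python
def to_binary(n):
    """Repeated division by 2; returns None unless n is a positive integer."""
    if not isinstance(n, int) or n <= 0:
        return None
    digits = ''
    while n > 0:
        digits += str(n % 2)  # remainder is the next digit, least significant first
        n //= 2
    return digits[::-1]       # reverse so the most significant digit comes first


# Compare against the built-in formatter for a range of positive integers
for value in range(1, 1025):
    assert to_binary(value) == format(value, 'b')

print(to_binary(25))  # 11001
```

If the loop finishes without an `AssertionError`, the two approaches agree on every value tested.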


## Question 2

If you want to buy a house, you need to find houses that are for sale. The multiple listing service (MLS) is a real estate listings service in Toronto that is updated throughout the day with new listings. The service has limited capabilities, with housing listed by district rather than by particular streets. It would be much more convenient for the user to be able to input a list of streets and see all the houses for sale on those streets. Your goal is to take the current listings and create such an algorithm.

You have basic MLS real estate data in a CSV file called `real_estate.csv`. Write an algorithm that does the following:
```
    Enter a street name (type exit when done): wells
    Enter a street name (type exit when done): perth
    Enter a street name (type exit when done): exit
    
    Houses on wells
    No houses on wells
    Houses on perth
    
    Address: 436 perth Size: 900 Price: $ 479900
    Address: 115 perth Size: 1100 Price: $ 699900
    Address: 516 perth Size: 800 Price: $ 498900
    Address: 288 perth Size: 1300 Price: $ 699000
    Address: 179 perth Size: 1100 Price: $ 699000
```

In other words, you want the user to input a number of street names and you should print out the information on available houses on each street.

A very high-level Algorithm Plan could be the following:
* Read in the data and store it somehow
* Get the user input and store it somehow
* For each street entered, find houses on the street
* Print-out the address, size, and price of each house found

You should try to write your algo in two ways:
* Store the real estate data in a list.
* Store the real estate data in a dictionary where the key is the street name and the value is a list of the data for each house.
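
The plan above can be tried on a small in-memory sample before touching the real file. This is a minimal sketch of the list-based approach only. The sample text, its four-column layout, and the helper names `read_rows` and `houses_on` are assumptions made for illustration; `real_estate.csv` has more columns.

```python
import csv
import io

# Hypothetical sample standing in for real_estate.csv (reduced to four columns)
SAMPLE_CSV = """number,street,size,price
436,perth,900,479900
12,symington,1000,550000
115,perth,1100,699900
"""

def read_rows(csvfile):
    """Return the data rows (header skipped) as a list of lists of strings."""
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row
    return list(reader)

def houses_on(street, rows):
    """Scan the whole list and keep the rows whose street column matches."""
    return [row for row in rows if row[1] == street]

rows = read_rows(io.StringIO(SAMPLE_CSV))
for street in ["wells", "perth"]:
    matches = houses_on(street, rows)
    print("Houses on", street)
    if not matches:
        print("No houses on", street)
    for number, name, size, price in matches:
        print("Address:", number, name, "Size:", size, "Price: $", price)
```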

In [18]:
import csv

### Using a list

In [19]:
def print_house(house):
    # display showing house address, size and price
    # Entries are: number, street, type, size, floors, bedrooms, bathrooms, 
    #              lot-size, parking, facing, age, taxes, price

    print("Address:", house[0], house[1], "Size:",
          house[3], "Price:", house[-1])


def get_MLS_data(filename):
    '''
    (str)->list of lists of string
    Opens <filename> as a CSV file, reads in each row and returns the list of rows
    '''
    
    # read in the database
    MLS_data = [] 
    with open(filename, 'r') as csvfile:
        real_estate_reader = csv.reader(csvfile)

        for row in real_estate_reader:
            MLS_data.append(row)
            
    return MLS_data

def get_street_queries():
    '''
    None -> set of strings
    Prompts user to enter street names to query database
    '''
    street_set = set()

    street = input("Enter a street name (type exit to end): ")
    while street != "exit":
        street_set.add(street)
        street = input("Enter a street name (type exit to end): ")
        
    return street_set

def process_queries(streets, MLS):
    '''
    (set of str, list of lists) -> None
    Looks up each entry in streets in MLS and prints the house info or an error message
    '''
    MLS_street_index = 1
    for street in streets:
        print("Houses on", street)
        found_house = False        # this is the "flag"
        for house in MLS:
            if house[MLS_street_index] == street:
                print_house(house)
                found_house = True

        if not found_house:  # if the flag is still False, then we didn't find a house
            print("No houses on", street)

### Execution

In [20]:
# Read in MLS data as a list of rows
MLS_data = get_MLS_data("real_estate.csv")

# Get the streets from the user and store them in a set
street_set = get_street_queries()

# Run the queries on the MLS database
process_queries(street_set, MLS_data) 

Houses on wells
No houses on wells
Houses on perth
Address: 436 perth Size: 900 Price: 479900
Address: 115 perth Size: 1100 Price: 699900
Address: 516 perth Size: 800 Price: 498900
Address: 288 perth Size: 1300 Price: 699000
Address: 179 perth Size: 1100 Price: 699000


### Using a dictionary

In [21]:
def print_house(house):
    # display showing house address, size and price
    # Entries are: number, street, type, size, floors, bedrooms, bathrooms, 
    #              lot-size, parking, facing, age, taxes, price

    print("Address:", house[0], house[1], "Size:",
          house[3], "Price:", house[-1])

def MLS_to_dict(MLS_list):
    """ (list of lists) -> (dict of list of lists)
    creates a dict from MLS_list with key being the street name
    and the value being the list of lists of data for each house
    on the street
    """  

    house_dict = {}
    for house in MLS_list:
        if house[1] in house_dict:
            house_dict[house[1]].append(house)
        else:
            house_dict[house[1]] = [house]
            
    return house_dict

def get_MLS_data(filename):
    '''
    (str)->list of lists of string
    Opens <filename> as a CSV file, reads in each row and returns the list of rows
    '''
    
    # read in the database
    MLS_data = [] 
    with open(filename, 'r') as csvfile:
        real_estate_reader = csv.reader(csvfile)

        for row in real_estate_reader:
            MLS_data.append(row)
            
    return MLS_data

def get_street_queries():
    '''
    None -> set of strings
    Prompts user to enter street names to query database
    '''
    street_set = set()

    street = input("Enter a street name (type exit to end): ")
    while street != "exit":
        street_set.add(street)
        street = input("Enter a street name (type exit to end): ")
        
    return street_set

def process_queries(streets, MLS):
    '''
    (set of str, dictionary of lists of list) -> None
    Looks up each entry in streets in MLS and prints the house info or an error message
    '''
    for street in streets:
        if street in MLS:
            print("Houses on", street)
            for house in MLS[street]:
                print_house(house)
        else:
            print("No houses on", street)

### Execution

In [22]:
# Read in MLS data and convert to dictionary
MLS_data = get_MLS_data("real_estate.csv")
MLS_dict = MLS_to_dict(MLS_data[1:])

# Get the streets from the user and store them in a set
street_set = get_street_queries()

# Run the queries on the MLS database
process_queries(street_set, MLS_dict) 

No houses on wells
Houses on perth
Address: 436 perth Size: 900 Price: 479900
Address: 115 perth Size: 1100 Price: 699900
Address: 516 perth Size: 800 Price: 498900
Address: 288 perth Size: 1300 Price: 699000
Address: 179 perth Size: 1100 Price: 699000
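
The `if`/`else` membership test used to build the dictionary can also be written with `collections.defaultdict`, which creates the empty list the first time a street is seen. This is an alternative sketch, not the solution above. The function name `group_by_street` and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_by_street(rows, street_index=1):
    """Return {street: [row, ...]} without an explicit membership test."""
    by_street = defaultdict(list)
    for row in rows:
        by_street[row[street_index]].append(row)
    return dict(by_street)  # plain dict, so a lookup never creates new keys

# Hypothetical rows: number, street, size, price
rows = [
    ["436", "perth", "900", "479900"],
    ["12", "symington", "1000", "550000"],
    ["115", "perth", "1100", "699900"],
]
grouped = group_by_street(rows)
print(sorted(grouped))        # ['perth', 'symington']
print(len(grouped["perth"]))  # 2
print("wells" in grouped)     # False
```

Converting back to a plain `dict` before returning keeps `street in MLS` checks meaningful, because indexing a `defaultdict` with a missing street would silently add that street as a key.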


## Question 3

In the real estate problem above, it is more natural to represent each house as an object
(of class House) and have each of the pieces of data represented by attributes (i.e., a street
number attribute, a street attribute, etc.).
Redo Q2 in an object-oriented way. Start with the code you already have and
* create a `House` class
* write an `__init__` function that accepts one entry from the MLS_list and fills in the attributes appropriately
* edit the functions you wrote above to deal with this new way of representing the houses. For example, rather than setting up a dictionary (indexed by street name) whose values are lists of lists, you should set up a dictionary whose values are lists of `House` objects.
* Bonus question: research the `__str__` function and then write one for the `House` class so that printing a `House` object shows the information about the house.
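
As a starting point for the bonus question, here is a minimal sketch of how `__str__` behaves. The `Room` class is hypothetical and used only for illustration. The key points are that `__str__` must return a string and that `print()` and `str()` call it automatically.

```python
class Room:
    """Hypothetical class used only to illustrate __str__."""

    def __init__(self, name, area):
        self.name = name
        self.area = area

    def __str__(self):
        # Must return a string; print() and str() call this method
        return "Room: " + self.name + " Area: " + str(self.area)


kitchen = Room("kitchen", 120)
print(kitchen)       # Room: kitchen Area: 120
text = str(kitchen)  # same string, without printing
```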

In [23]:
class House:
  
  def __init__(self, MLS_entry):
    '''(House, list) -> None
    Create a House object representing all the data in the MLS_entry
    '''
    self.number = MLS_entry[0]
    self.street = MLS_entry[1]
    self.type = MLS_entry[2]
    self.size = MLS_entry[3]
    self.floors = MLS_entry[4]
    self.bedrooms = MLS_entry[5]
    self.bathrooms = MLS_entry[6]
    self.lot = MLS_entry[7]
    self.parking = MLS_entry[8]
    self.facing = MLS_entry[9]
    self.age = MLS_entry[10]
    self.taxes = MLS_entry[11]
    self.price = MLS_entry[12]

  def __str__(self):
    '''(House) -> str
    Returns string showing house address, size and price
    '''
    return 'Address: ' + str(self.number) + ' ' + str(self.street) + \
           ' Size: ' + str(self.size) + ' Price: $' + str(self.price)
  
  
def get_MLS_data(filename):
  '''
  (str)->list of lists of string
  Opens <filename> as a CSV file, reads in each row and returns the list of rows
  '''
  
  # read in the database
  MLS_data = [] 
  with open(filename, 'r') as csvfile:
    real_estate_reader = csv.reader(csvfile)
    
    for row in real_estate_reader:
      MLS_data.append(row)
      
  return MLS_data
    
def MLS_to_Houses(MLS_list):
  """ (list of lists) -> (list of House)
  create new list where each entry is a House object.
  """  
  
  house_list = []

  # for each row of MLS data, create a House object and add it to the list
  for house in MLS_list:
    house_list.append(House(house))

  return house_list

def create_street_searchable_houses(house_list):
  """ (list of House) -> (dictionary of list of House)
  create new dictionary indexed by street name where each entry is a 
  list of the Houses on that street.
  """  
  
  # create a new dictionary with street as key and a list of houses
  # as the value 
  street_dict = {}
  
  # go through all the houses and organize them by street
  for house in house_list:
    if house.street not in street_dict:
      # A new street that we haven't yet used as a key - create the entry 
      street_dict[house.street] = [house]
    else:
      # If the street already exists, append the new house onto the 
      # corresponding list
      street_dict[house.street].append(house)
        
  return street_dict


### Execution

In [24]:
# Convert MLS list to a list of houses 
MLS_data = get_MLS_data("real_estate.csv")
house_list = MLS_to_Houses(MLS_data[1:])

# Convert list of houses into dictionary indexed by street
houses_by_street = create_street_searchable_houses(house_list)

# Prompt user for list of streets and store them in a list
street_list = []
done = False
while not done:
  street = input('Enter a street name (type exit when done): ')
  if street !=  'exit':
    street_list.append(street)
  else:
    done = True

# print out the info for each house in the streets indicated by the user
for street in street_list:
  print("\nHouses on", street)
  if street in houses_by_street.keys():
    for house in houses_by_street[street]:
      print(house)
  else:
    print('No houses on', street)


Houses on wells
No houses on wells

Houses on perth
Address: 436 perth Size: 900 Price: $479900
Address: 115 perth Size: 1100 Price: $699900
Address: 516 perth Size: 800 Price: $498900
Address: 288 perth Size: 1300 Price: $699000
Address: 179 perth Size: 1100 Price: $699000


## Question 4
This is a problem that arose a few years ago in the processing of the marks for Midterm 1 because Blackboard (the equivalent of Quercus that we were using at the time) and Crowdmark did not talk to each other perfectly.

You are given two CSV files: a Blackboard file (`bb.csv`) containing an entry for each student and a Crowdmark file (`crowdmarks.csv`) also containing a set of entries for each student. Both contain the mark for Midterm #1 (see the files) but unfortunately, due to some system mismatches:
1. Not all students in the Blackboard file are in the Crowdmark file, and
2. Some of the marks for Midterm #1 do not match.

You need to write code to read in both files and print out the ID of each student who is not in the Crowdmark file and each student whose marks do not match. For example, the beginning of your output may look like:
```
    Error: Mark mismatch for 277ccd0d-0efb-40c5-a6a2-22cab7601823 41.0 2.0
    Error: Mark mismatch for b8ca83f6-a4e7-41e8-97d9-85588fb4108f 43.0 9.0
    Error: Mark mismatch for 48755b18-6f1b-4c4c-8ff2-420dd1be402c 34.0 38.0
    Error: Mark mismatch for 67a42c7b-18cd-4762-86e6-25b6ee3cd1c7 38.0 2.0
    Error: Mark mismatch for 62a597fa-ef5e-42d0-9475-843be3c6b473 33.0 32.0
    Error: Mark mismatch for c2c950af-ddf4-47ef-978a-c125e96b529e 9.0 37.0
    67fe5434-2eb9-41ec-b977-f508bd819f22 not in Crowdmark file.
    Error: Mark mismatch for 66eb7226-7582-4f52-b366-15351abb53c6 24.0 33.0
```

The order of the students in each file may be different.
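
Because the row order may differ, one option is to key each file's marks by student ID, so the comparison no longer depends on where a student appears in either file. This is a minimal sketch with made-up data: the IDs, the marks, and the function name `compare_marks` are all hypothetical.

```python
def compare_marks(bb_marks, crowdmark_marks):
    """Return (missing_ids, mismatches) for two {id: mark} dictionaries."""
    missing = []
    mismatches = []
    for student_id, bb_mark in bb_marks.items():
        if student_id not in crowdmark_marks:
            missing.append(student_id)
        elif bb_mark != crowdmark_marks[student_id]:
            mismatches.append((student_id, bb_mark, crowdmark_marks[student_id]))
    return missing, mismatches

# Hypothetical data; the ids are made up and the two dicts are in different orders
bb = {"id-a": 41.0, "id-b": 30.0, "id-c": 25.0}
cm = {"id-c": 25.0, "id-a": 2.0}

missing, mismatches = compare_marks(bb, cm)
print(missing)     # ['id-b']
print(mismatches)  # [('id-a', 41.0, 2.0)]
```

Each lookup in a dictionary takes roughly constant time, so the whole comparison is a single pass over the Blackboard entries.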
 
You probably want to define and implement a number of functions. Try to write clear and understandable code.

Need a hint? Post some ideas to Piazza and we'll give you some hints.

In [25]:
def load_bb_file(bb_file):
    ''' (str) -> dictionary {id : mark}
    Return a dictionary containing a subset of the elements of 
    the CSV file bb_file indexed by row[id_index]
    '''
    
    id_index = 1
    mark_index = 3
    
    print("*** Processing", bb_file)
    
    database = {}
    with open(bb_file, 'r') as csvfile:
        grades_reader = csv.reader(csvfile)
    
        for row in grades_reader:  
            if row[0] != "Last Name": # skip header
                
                # get the two fields that we want and stick them into the dictionary
                student_id = row[id_index]  # avoid shadowing the built-in id()
                mark = float(row[mark_index])
                #print(student_id, mark)
                
                database[student_id] = mark
    
    print("*** Done processing", bb_file)
    return database

def load_crowdmark_file(crowdmark_file):
    ''' (str) -> dictionary {id : mark}
    Return a dictionary containing a subset of the elements of 
    the CSV file crowdmark_file indexed by row[id_index]
    '''
    
    print("*** Processing", crowdmark_file)
    
    id_index = 2
    mark_index = 11
    
    database = {}
    with open(crowdmark_file, 'r') as csvfile:
        grades_reader = csv.reader(csvfile)
    
        for row in grades_reader:  
            if row[0] != "Crowdmark ID": # skip header

                # get the two fields that we want and stick them into the dictionary
                student_id = row[id_index]  # avoid shadowing the built-in id()
                mark = float(row[mark_index])
                #print(student_id, mark)
                
                database[student_id] = mark
    
    print("*** Done processing", crowdmark_file)
    return database
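
The two loaders differ only in their column indices and in the header text they test for. A possible simplification, sketched here under the assumption that each file has exactly one header row, is a single function that discards the header with `next()` and takes the indices as parameters. The name `load_marks`, the sample text, and its column positions are hypothetical.

```python
import csv
import io

# Hypothetical file contents; the real files have more columns
SAMPLE = "Name,ID,Mark\nA. Student,id-a,41\nB. Student,id-b,30\n"

def load_marks(csvfile, id_index, mark_index):
    """Return {id: mark} from an open CSV file, skipping one header row."""
    reader = csv.reader(csvfile)
    next(reader, None)  # discard the header; None avoids an error on an empty file
    return {row[id_index]: float(row[mark_index]) for row in reader}

marks = load_marks(io.StringIO(SAMPLE), id_index=1, mark_index=2)
print(marks)  # {'id-a': 41.0, 'id-b': 30.0}
```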

### Execution

In [26]:
# read in the two files 
crowdmarks = load_crowdmark_file("crowdmarks.csv")
bb_marks = load_bb_file("bb.csv")

for student in bb_marks:
    # for each student in bb_marks, check existence and mark accuracy in the
    # crowdmarks file
    if student not in crowdmarks:
        print(student, "not in Crowdmark file.")
    elif bb_marks[student] != crowdmarks[student]:
        print("Error: Mark mismatch for", student, bb_marks[student], crowdmarks[student])

*** Processing crowdmarks.csv
*** Done processing crowdmarks.csv
*** Processing bb.csv
*** Done processing bb.csv
Error: Mark mismatch for 1aa40c83-e1c6-4cc5-9c9c-4c234b7f6a75 34.0 23.0
Error: Mark mismatch for f734ae2e-8b37-4218-8e76-b247bfdbfa3e 44.0 34.0
Error: Mark mismatch for 8fd01d64-de14-482b-bf5a-a51d4dcd3f62 23.0 4.0
Error: Mark mismatch for fb207a9a-d9af-4cb2-9752-918e68920b42 28.0 12.0
Error: Mark mismatch for 765adfe8-e8da-4e6c-b4d6-f0f34cb9be15 19.0 35.0
Error: Mark mismatch for 64ab0233-0044-44bf-8295-7a8dfb924261 0.0 20.0
Error: Mark mismatch for 83a45bca-c9a3-4bd5-8cb9-23d575162623 17.0 5.0
Error: Mark mismatch for 5db0c84a-9272-4601-a713-2451e788d7d1 39.0 14.0
Error: Mark mismatch for 8c0b8bb6-946c-4b75-9f8a-c3295d833d2f 1.0 24.0
Error: Mark mismatch for 63efdd06-e46b-43a2-8fc9-167efe93ec60 25.0 3.0
Error: Mark mismatch for 60963b14-a01a-409a-942e-9af2d642a82a 42.0 8.0
Error: Mark mismatch for 386ebc92-9773-473a-a590-f434c6b60f77 27.0 21.0
Error: Mark mismatch for b8c