# CSCI2000U - Scientific Data Analysis
## Tutorial 04: Functional Programming 

**Goal**
1. Apply functional programming to analyze the Boston housing dataset

In this tutorial we will explore the Boston housing dataset. You are given the code that loads the `json` file with the data. After loading the data, solve each of the tasks outlined in this document to complete your tutorial assignment.

*Please note that you need to upload the dataset (the `json` file) to your Jupyter notebook server and modify `path` to match the location where the file was uploaded.*

- Importing libraries

In [None]:
import json
from functools import reduce

- loading the dataset into `data` from the `path`

In [None]:
path = 'boston_housing.json'
with open(path) as f:
    data = json.load(f)

- exploring the keys of `data`

In [None]:
data.keys()

- Accessing the data field `descr` which shows the name and description of each of the data columns. We can refer to them as *attributes*.

In [None]:
data['descr']

- accessing the data field `rows`

In [None]:
rows = data['rows']
len(rows)

- To access a data field/column of a given row, we can use 
```
rows[<row index>][<attribute name>]
``` 
See example:

In [None]:
rows[0]['PRICE']
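
The same access pattern can be applied to every row with `map` to pull out a whole column. The following is a minimal sketch; `sample_rows` and `get_price` are hypothetical names used only for illustration, and the two rows merely mimic the shape of `data['rows']`.

```python
# Hypothetical sample rows that mimic the shape of data['rows'].
sample_rows = [
    {'RM': 6.575, 'PRICE': 24.0},
    {'RM': 6.421, 'PRICE': 21.6},
]

def get_price(row):
    # Same access as rows[<row index>][<attribute name>], applied to one row.
    return row['PRICE']

# map is lazy, so wrap it in list() to see the values.
prices = list(map(get_price, sample_rows))
print(prices)
```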

### TASK 1

**Compute the average price of all houses**

 We will use `reduce` because we are converting a list of elements into a single number.

 We want to compute the total first, and divide by `len(rows)`. Store the result in `average_price`.
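
As an illustration of how `reduce` folds a list of rows into one number, here is a small sketch on made-up data. The names `sample_rows`, `add_price`, `sample_total`, and `sample_average` are assumptions for this example and are not part of the dataset.

```python
from functools import reduce

# Hypothetical sample rows; the real rows come from data['rows'].
sample_rows = [{'PRICE': 10.0}, {'PRICE': 20.0}, {'PRICE': 30.0}]

def add_price(running_total, row):
    # The accumulator is a number, while each element is a dict.
    return running_total + row['PRICE']

# The initial value 0 is needed: without it, reduce would use the
# first dict as the accumulator and the addition would fail.
sample_total = reduce(add_price, sample_rows, 0)
sample_average = sample_total / len(sample_rows)
print(sample_average)
```

Because the accumulator (a number) and the list elements (dicts) have different types, the third argument to `reduce` is what makes the fold work.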

In [2]:
# Your solution
import json
from functools import reduce

path = 'boston_housing.json'
with open(path) as f:
    data = json.load(f)

rows = data['rows']
def get_prices(x, y):
    return x + y['PRICE']
    
total = reduce(get_prices, rows, 0)

average_price = total / len(rows)

In [3]:
print("AVERAGE PRICE: ", average_price)

AVERAGE PRICE:  22.532806324110698


### TASK 2
**Find the houses that are *under* the average price.**

We should create a predicate function to test whether the price is under `average_price`.

Then, use `filter` with that predicate and display the first 3 results as a list.
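
Since `filter` returns a lazy iterator, the first few matches can also be taken without building the full list. The sketch below uses `itertools.islice` on made-up rows; `sample_rows`, `sample_average`, and `is_under` are hypothetical names, and the threshold of 20.0 is an assumed value, not the dataset's average.

```python
from itertools import islice

# Hypothetical rows and threshold, for illustration only.
sample_rows = [
    {'PRICE': 12.0},
    {'PRICE': 30.0},
    {'PRICE': 18.0},
    {'PRICE': 9.5},
    {'PRICE': 25.0},
]
sample_average = 20.0

def is_under(row):
    # Predicate: True when the row's price is below the threshold.
    return row['PRICE'] < sample_average

# islice stops after three matches, so later rows are never tested.
first_three = list(islice(filter(is_under, sample_rows), 3))
print(first_three)
```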

In [4]:
# Find the houses that are *under* the average price.
def is_under_avg(n):
    return n['PRICE'] < average_price

under_avg_houses = list(filter(is_under_avg, rows))
    

In [5]:
# List three houses that are UNDER the average price.
print(under_avg_houses[:3])

[{'CRIM': 0.02731, 'ZN': 0.0, 'INDUS': 7.07, 'CHAS': 0.0, 'NOX': 0.469, 'RM': 6.421, 'AGE': 78.9, 'DIS': 4.9671, 'RAD': 2.0, 'TAX': 242.0, 'PTRATIO': 17.8, 'B': 396.9, 'LSTAT': 9.14, 'PRICE': 21.6}, {'CRIM': 0.21124, 'ZN': 12.5, 'INDUS': 7.87, 'CHAS': 0.0, 'NOX': 0.524, 'RM': 5.631, 'AGE': 100.0, 'DIS': 6.0821, 'RAD': 5.0, 'TAX': 311.0, 'PTRATIO': 15.2, 'B': 386.63, 'LSTAT': 29.93, 'PRICE': 16.5}, {'CRIM': 0.17004, 'ZN': 12.5, 'INDUS': 7.87, 'CHAS': 0.0, 'NOX': 0.524, 'RM': 6.004, 'AGE': 85.9, 'DIS': 6.5921, 'RAD': 5.0, 'TAX': 311.0, 'PTRATIO': 15.2, 'B': 386.71, 'LSTAT': 17.1, 'PRICE': 18.9}]


### TASK 3
**Find the houses with prices *between* `average_price-std_dev` and `average_price+std_dev` where `std_dev` is the standard deviation.**

>standard deviation = square root of `variance`

> variance = [sum of (`house["PRICE"]` - `average_price`) ** 2] / `len(houses)`

Strategy
1. We will use `reduce` because we are converting a list of elements into a single number `sum_variance`.
2. We want to compute `sum_variance` first, and divide it by `len(rows)`. Store the result in `variance_price`.
3. We can use the `variance_price` to calculate the `std_dev`.
4. We should create a predicate function to test whether the price lies between `average_price - std_dev` and `average_price + std_dev`.

Then, display the first 3 results as a list using `filter`.

In [12]:
# Calculate the variance / std_dev
import math
def sum_fn(x, y):
    return x + (y['PRICE'] - average_price)**2
    
sum_variance = (reduce(sum_fn, rows, 0))
variance_price = sum_variance / len(rows)
std_dev = math.sqrt(variance_price)
print(std_dev)

9.188011545278206
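
One way to sanity-check a hand-written `reduce` computation is to compare it with `statistics.pstdev`, which computes the population standard deviation (dividing by `n`, as in the variance formula above). This is only a sketch on made-up prices; `sample_rows`, `sample_mean`, `add_squared_dev`, and `sample_std` are hypothetical names.

```python
import math
import statistics
from functools import reduce

# Hypothetical prices, chosen so the result is easy to verify by hand.
sample_rows = [{'PRICE': p} for p in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]]

sample_mean = reduce(lambda acc, row: acc + row['PRICE'], sample_rows, 0) / len(sample_rows)

def add_squared_dev(acc, row):
    # Accumulate the squared deviation of each price from the mean.
    return acc + (row['PRICE'] - sample_mean) ** 2

sample_variance = reduce(add_squared_dev, sample_rows, 0) / len(sample_rows)
sample_std = math.sqrt(sample_variance)

# pstdev divides by n; statistics.stdev would divide by n - 1 instead.
reference = statistics.pstdev(row['PRICE'] for row in sample_rows)
print(sample_std, reference)
```

If the two values disagree noticeably on the real data, a likely cause is mixing the population formula (divide by `n`) with the sample formula (divide by `n - 1`).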


In [13]:
# Find house prices between average_price -/+ std_dev
def is_within_std_dev(n):
    return average_price - std_dev < n['PRICE'] < average_price + std_dev

In [14]:
# List first 3 houses with prices between average_price-std_dev and average_price+std_dev
average_houses = list(filter(is_within_std_dev, rows))
print(average_houses[:3])

[{'CRIM': 0.00632, 'ZN': 18.0, 'INDUS': 2.31, 'CHAS': 0.0, 'NOX': 0.538, 'RM': 6.575, 'AGE': 65.2, 'DIS': 4.09, 'RAD': 1.0, 'TAX': 296.0, 'PTRATIO': 15.3, 'B': 396.9, 'LSTAT': 4.98, 'PRICE': 24.0}, {'CRIM': 0.02731, 'ZN': 0.0, 'INDUS': 7.07, 'CHAS': 0.0, 'NOX': 0.469, 'RM': 6.421, 'AGE': 78.9, 'DIS': 4.9671, 'RAD': 2.0, 'TAX': 242.0, 'PTRATIO': 17.8, 'B': 396.9, 'LSTAT': 9.14, 'PRICE': 21.6}, {'CRIM': 0.02985, 'ZN': 0.0, 'INDUS': 2.18, 'CHAS': 0.0, 'NOX': 0.458, 'RM': 6.43, 'AGE': 58.7, 'DIS': 6.0622, 'RAD': 3.0, 'TAX': 222.0, 'PTRATIO': 18.7, 'B': 394.12, 'LSTAT': 5.21, 'PRICE': 28.7}]


### TASK 4

**Generate a report of CRIME, ROOMS, and PRICE for the 297 houses that are below average.**

The `houses` should be sorted by `PRICE`. 

The result should be reported using the format: `"CRIME: %.2f, ROOMS: %d, PRICE: %.2f"`

Strategy:
 - Use filter from **Task 2** to get the `houses` below average.
 - Sort the `houses` using `sorted(<iterable>, key=<key>)`.
 - Use `map` to map each of the houses to a report message.
 
 Reference for `sorted` function: https://www.w3schools.com/python/ref_func_sorted.asp
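
The strategy above can be sketched on a few rows as follows. The three houses reuse values from the output shown in Task 2; `sample_houses`, `template`, and `report_lines` are hypothetical names, and `operator.itemgetter` is used here as one possible key function rather than the required one.

```python
from operator import itemgetter

# Three rows with values taken from the Task 2 output, reduced to the
# attributes the report needs.
sample_houses = [
    {'CRIM': 0.21124, 'RM': 5.631, 'PRICE': 16.5},
    {'CRIM': 0.02731, 'RM': 6.421, 'PRICE': 21.6},
    {'CRIM': 0.17004, 'RM': 6.004, 'PRICE': 18.9},
]

template = "CRIME: %.2f, ROOMS: %d, PRICE: %.2f"

# itemgetter('PRICE') builds a key function equivalent to
# lambda house: house['PRICE'].
by_price = sorted(sample_houses, key=itemgetter('PRICE'))

def to_report(house):
    # %d truncates the float RM value (5.631 becomes 5).
    return template % (house['CRIM'], house['RM'], house['PRICE'])

report_lines = list(map(to_report, by_price))
for line in report_lines:
    print(line)
```

Note that `%d` drops the fractional part of `RM` instead of rounding it, so a value such as 5.631 is reported as 5 rooms.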

In [25]:
# report message
report = "CRIME: %.2f, ROOMS: %d, PRICE: %.2f"

In [28]:
# 1. Get the list of houses
def is_under_avg(n):
    return n['PRICE'] < average_price

under_avg_houses = list(filter(is_under_avg, rows))

# 2. Sort by their price
def sort_by_price(n):
    return n['PRICE']

price_sorted = sorted(under_avg_houses, key=sort_by_price)

# 3. Generate the report of the houses
def get_report(n):
    return report % (n['CRIM'], n['RM'], n['PRICE'])
    
houses_w_reports = map(get_report, price_sorted)
print(list(houses_w_reports))

['CRIME: 38.35, ROOMS: 5, PRICE: 5.00', 'CRIME: 67.92, ROOMS: 5, PRICE: 5.00', 'CRIME: 25.05, ROOMS: 5, PRICE: 5.60', 'CRIME: 9.92, ROOMS: 5, PRICE: 6.30', 'CRIME: 45.75, ROOMS: 4, PRICE: 7.00', 'CRIME: 0.18, ROOMS: 5, PRICE: 7.00', 'CRIME: 16.81, ROOMS: 5, PRICE: 7.20', 'CRIME: 14.24, ROOMS: 6, PRICE: 7.20', 'CRIME: 18.08, ROOMS: 6, PRICE: 7.20', 'CRIME: 22.60, ROOMS: 5, PRICE: 7.40', 'CRIME: 10.83, ROOMS: 6, PRICE: 7.50', 'CRIME: 0.21, ROOMS: 5, PRICE: 8.10', 'CRIME: 24.80, ROOMS: 5, PRICE: 8.30', 'CRIME: 15.86, ROOMS: 5, PRICE: 8.30', 'CRIME: 11.81, ROOMS: 6, PRICE: 8.40', 'CRIME: 13.68, ROOMS: 5, PRICE: 8.40', 'CRIME: 7.67, ROOMS: 5, PRICE: 8.50', 'CRIME: 41.53, ROOMS: 5, PRICE: 8.50', 'CRIME: 15.18, ROOMS: 6, PRICE: 8.70', 'CRIME: 20.08, ROOMS: 4, PRICE: 8.80', 'CRIME: 73.53, ROOMS: 5, PRICE: 8.80', 'CRIME: 9.34, ROOMS: 6, PRICE: 9.50', 'CRIME: 14.42, ROOMS: 6, PRICE: 9.60', 'CRIME: 11.58, ROOMS: 5, PRICE: 9.70', 'CRIME: 17.87, ROOMS: 6, PRICE: 10.20', 'CRIME: 14.33, ROOMS: 4, PRI

## Tutorial Report

At the **end of this tutorial session**, you will deliver a report via Canvas. 

Your report will be the compiled version of this notebook with your solution. You **MUST** submit:
- both the `ipynb` and `PDF` (`File > Download as > PDF`) versions of this notebook,
- both named `<lastname-firstname>-tutorial04`,
- both containing your full name and student ID.


*Late tutorial submission policy:*
- All tutorial reports are due at the **end of your tutorial session**.
- Late tutorial reports will be accepted without penalty before your next tutorial session. **No late reports will be accepted after that.**

*TA grading and feedback inquiries*
- Your report grades will be posted via Canvas using the rubric provided by the instructor. Once your grades are published, you are encouraged to ask your TA on MS Teams for feedback as needed.
