# Boston housing data
1.  CRIM      per capita crime rate by town
2.  ZN        proportion of residential land zoned for lots over 25,000 sq.ft.
3.  INDUS     proportion of non-retail business acres per town
4.  CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5.  NOX       nitric oxides concentration (parts per 10 million)
6.  RM        average number of rooms per dwelling
7.  AGE       proportion of owner-occupied units built prior to 1940
8.  DIS       weighted distances to five Boston employment centres
9.  RAD       index of accessibility to radial highways
10. TAX      full-value property-tax rate per $10,000
11. PTRATIO  pupil-teacher ratio by town
12. B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT    % lower status of the population
14. MEDV     Median value of owner-occupied homes in $1000's

## Julia
- Install Julia from https://julialang.org/downloads/
- To add packages, switch to package mode by typing `]` in the REPL, then run `add <package>`
- Reopen VS Code and select the Julia kernel
- In package mode, `activate home_project` switches to that environment; adding packages there creates its `Project.toml` and `Manifest.toml` files

In [2]:
# setup environment using toml files in current directory
cd(@__DIR__)
using Pkg
Pkg.activate(".")
Pkg.add("DataFrames")
Pkg.add("Plots")
Pkg.add("PyPlot")

#Pkg.add("StatsPlots")

[32m[1m  Activating[22m[39m environment at `~/ManningLiveProjects/julia/Project.toml`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/ManningLiveProjects/julia/Project.toml`
[32m[1m  No Changes[22m[39m to `~/ManningLiveProjects/julia/Manifest.toml`
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39m[90mPlots[39m
[32m  ✓ [39mStatsPlots
  2 dependencies successfully precompiled in 75 seconds (172 already precompiled)
[32m[1m   Resolving[22m[39m package versions...
[32m[1m    Updating[22m[39m `~/ManningLiveProjects/julia/Project.toml`
 [90m [91a5bcdd] [39m[92m+ Plots v1.19.4[39m
[32m[1m  No Changes[22m[39m to `~/ManningLiveProjects/julia/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m PyPlot ─ v2.9.0
[32m[1m   Installed[22m[39m PyCall ─ v1.92.3
[32m[1m    Updating[22m[39m `~/ManningL

1.1 Download dataset

In [2]:
using Downloads

# download the data file and its description unless the data file is already present
function f()
    if !isfile("housing.data")
        Downloads.download("https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data","./housing.data")
        Downloads.download("https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.names","./housing.names")
    end
end

f()
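The cell above only checks for `housing.data`, so a missing `housing.names` would not be fetched again. A minimal sketch of a per-file variant follows; `download_if_missing` and its `downloader` keyword are hypothetical names introduced here for illustration, not part of the `Downloads` API.

```julia
using Downloads

# Hypothetical helper: fetch `url` into `path` only when `path` does not exist yet.
# The `downloader` keyword is an assumption added so the function can be exercised
# without network access; it defaults to Downloads.download.
function download_if_missing(url::AbstractString, path::AbstractString;
                             downloader = Downloads.download)
    isfile(path) || downloader(url, path)
    return path
end
```

In the notebook it could be called once per file, for example `download_if_missing(url, "./housing.names")` with the same URLs as above, so each file is checked independently.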

1.2 Compute and verify hash of downloaded file


In [3]:
using SHA

expected = bytes2hex([0xad, 0xfa, 0x6b, 0x6d, 0xca,
0x24, 0xa6, 0x3f, 0xe1, 0x66,
0xa9, 0xe7, 0xfa, 0x01, 0xce,
0xe4, 0x33, 0x58, 0x57, 0xd1])

#open downloaded file and compute hash
open("housing.data") do f
    result = bytes2hex(sha1(f))
    println("hash match: ", result == expected, " = ", result)    
end

hash match: true = adfa6b6dca24a63fe166a9e7fa01cee4335857d1
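The same check can be wrapped in a small reusable function that takes the expected digest as a hex string. This is a sketch only; `sha1_matches` is a hypothetical name, and the comparison assumes the expected value is a plain hexadecimal SHA-1 digest.

```julia
using SHA

# Hypothetical helper: compare the SHA-1 digest of a file with an expected hex string.
function sha1_matches(path::AbstractString, expected_hex::AbstractString)
    actual = open(path) do io
        bytes2hex(sha1(io))   # sha1 accepts an IO stream and reads it to the end
    end
    return actual == lowercase(expected_hex)
end
```

For the file above, the call would be `sha1_matches("housing.data", "adfa6b6dca24a63fe166a9e7fa01cee4335857d1")`.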


1.3 Loading data into dataframe: https://github.com/bkamins/Julia-DataFrames-Tutorial

In [4]:
# need to install DataFrames package for the following

using DataFrames, Random, DelimitedFiles

df = DataFrame(readdlm("housing.data", Float64), [:CRIM,:ZN,:INDUS,:CHAS,:NOX,:RM,:AGE,:DIS,:RAD,:TAX,:PTRATIO,:B,:LSTAT,:MEDV])



| Row | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD |
|-----|------|----|-------|------|-----|----|-----|-----|-----|
|     | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 |
| 2 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 |
| 3 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 |
| 4 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 |
| 5 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 |
| 6 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.43 | 58.7 | 6.0622 | 3.0 |
| 7 | 0.08829 | 12.5 | 7.87 | 0.0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5.0 |
| 8 | 0.14455 | 12.5 | 7.87 | 0.0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5.0 |
| 9 | 0.21124 | 12.5 | 7.87 | 0.0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5.0 |
| 10 | 0.17004 | 12.5 | 7.87 | 0.0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5.0 |
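`readdlm` with only an element type splits each line on runs of whitespace, which is why no delimiter is passed in the cell above. The sketch below illustrates this on the first two rows and first nine columns shown in the output, and adds a column-count check before any names are attached; `sample` and `sample_names` are illustrative names, not part of the notebook.

```julia
using DelimitedFiles

# First two rows and first nine columns, copied from the output above.
sample = """
0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.09 1.0
0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0
"""
sample_names = [:CRIM, :ZN, :INDUS, :CHAS, :NOX, :RM, :AGE, :DIS, :RAD]

# With only an element type given, readdlm treats runs of whitespace as the separator.
m = readdlm(IOBuffer(sample), Float64)

# Fail early if the parsed layout and the list of column names disagree.
size(m, 2) == length(sample_names) ||
    error("expected $(length(sample_names)) columns, got $(size(m, 2))")

println(size(m))
```

For the full file the same check would compare `size(m, 2)` against the 14 names passed to `DataFrame`.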


In [5]:
describe(df, :all)

| Row | variable | mean | std | min | q25 | median | q75 | max | nunique |
|-----|----------|------|-----|-----|-----|--------|-----|-----|---------|
|     | Symbol | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Nothing |
| 1 | CRIM | 3.61352 | 8.60155 | 0.00632 | 0.082045 | 0.25651 | 3.67708 | 88.9762 | |
| 2 | ZN | 11.3636 | 23.3225 | 0.0 | 0.0 | 0.0 | 12.5 | 100.0 | |
| 3 | INDUS | 11.1368 | 6.86035 | 0.46 | 5.19 | 9.69 | 18.1 | 27.74 | |
| 4 | CHAS | 0.06917 | 0.253994 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | |
| 5 | NOX | 0.554695 | 0.115878 | 0.385 | 0.449 | 0.538 | 0.624 | 0.871 | |
| 6 | RM | 6.28463 | 0.702617 | 3.561 | 5.8855 | 6.2085 | 6.6235 | 8.78 | |
| 7 | AGE | 68.5749 | 28.1489 | 2.9 | 45.025 | 77.5 | 94.075 | 100.0 | |
| 8 | DIS | 3.79504 | 2.10571 | 1.1296 | 2.10018 | 3.20745 | 5.18843 | 12.1265 | |
| 9 | RAD | 9.54941 | 8.70726 | 1.0 | 4.0 | 5.0 | 24.0 | 24.0 | |
| 10 | TAX | 408.237 | 168.537 | 187.0 | 279.0 | 330.0 | 666.0 | 711.0 | |
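To make the columns of the summary concrete, the sketch below computes the same kinds of statistics with the `Statistics` standard library for a single column. It uses the RM values of the first ten rows shown earlier as a stand-in for `df.RM`; `summarize` is a hypothetical helper, and the assumption that it roughly mirrors what `describe` reports has not been checked against the DataFrames source.

```julia
using Statistics

# RM values of the first ten rows shown earlier (a stand-in for df.RM).
rm_sample = [6.575, 6.421, 7.185, 6.998, 7.147, 6.43, 6.012, 6.172, 5.631, 6.004]

# Hypothetical helper returning the statistics named in the describe output.
function summarize(x::AbstractVector{<:Real})
    return (mean = mean(x), std = std(x), min = minimum(x),
            q25 = quantile(x, 0.25), median = median(x),
            q75 = quantile(x, 0.75), max = maximum(x))
end

stats = summarize(rm_sample)
println(stats)
```

Because only ten rows are used, the numbers differ from the RM row of the table above, which covers the whole data set.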


1.4 Variable characteristics
- Nominal variables
    - CHAS
- Continuous variables
    - all the rest
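One quick way to sanity-check the nominal classification is to test whether a column contains only 0 and 1, as a dummy variable should. A minimal sketch, where `is_binary` is a hypothetical helper and the sample vectors are the CHAS and NOX values of the first rows shown earlier:

```julia
# Hypothetical helper: true when every value is 0 or 1, as expected for a dummy variable.
is_binary(col) = all(in((0, 1)), col)

chas_sample = [0.0, 0.0, 0.0, 0.0, 0.0]            # CHAS, first five rows shown earlier
nox_sample  = [0.538, 0.469, 0.469, 0.458, 0.458]  # NOX, first five rows shown earlier

println(is_binary(chas_sample))
println(is_binary(nox_sample))
```

In the notebook the same call would be `is_binary(df.CHAS)`.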

In [6]:
Pkg.add("StatsPlots")
p = histogram(df)

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/ManningLiveProjects/julia/Project.toml`
[32m[1m  No Changes[22m[39m to `~/ManningLiveProjects/julia/Manifest.toml`


LoadError: UndefVarError: histogram not defined
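
The `UndefVarError` arises because `Pkg.add` only installs a package; `histogram` becomes available once a plotting package is loaded with `using Plots` (or `using StatsPlots`, which re-exports Plots). Passing a whole DataFrame to `histogram` may not work either, so the sketch below assumes one histogram per column was the intent. It uses a small stand-in table built from the first rows shown earlier; `sample_df` and `panels` are illustrative names.

```julia
using DataFrames, Plots

# Stand-in table built from the first rows shown earlier; in the notebook,
# the full `df` would be used instead.
sample_df = DataFrame(RM  = [6.575, 6.421, 7.185, 6.998, 7.147],
                      AGE = [65.2, 78.9, 61.1, 45.8, 54.2])

# One histogram per column, titled with the column name.
panels = [histogram(sample_df[!, name]; title = String(name), legend = false)
          for name in names(sample_df)]

# Arrange the panels side by side in a single figure.
p = plot(panels...; layout = (1, length(panels)))
```

For all 14 columns a grid layout such as `layout = (4, 4)` would be a reasonable starting point.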