
The Boston Housing Dataset

The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of [Boston, MA](http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html). Its 14 columns are:

*  CRIM - per capita crime rate by town
*  ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
*  INDUS - proportion of non-retail business acres per town.
*  CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
*  NOX - nitric oxides concentration (parts per 10 million)
*  RM - average number of rooms per dwelling
*  AGE - proportion of owner-occupied units built prior to 1940
*  DIS - weighted distances to five Boston employment centres
*  RAD - index of accessibility to radial highways
*  TAX - full-value property-tax rate per \$10,000
*  PTRATIO - pupil-teacher ratio by town
*  B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
*  LSTAT - % lower status of the population
*  MEDV - Median value of owner-occupied homes in \$1000's [Target Variable]
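The `B` column stores a transformed quantity, not the raw proportion $B_k$. Working the formula at a few values of $B_k$ shows why column 11 tops out at about 396.9 in the `describe()` output below:

$$
B = 1000\,(B_k - 0.63)^2
$$

$$
B_k = 0 \;\Rightarrow\; B = 1000 \times 0.63^2 = 1000 \times 0.3969 = 396.9
$$

$$
B_k = 0.63 \;\Rightarrow\; B = 0, \qquad B_k = 1 \;\Rightarrow\; B = 1000 \times 0.37^2 = 136.9
$$

Because the square is symmetric around 0.63, the transform is not monotonic in $B_k$: values of $B_k$ on either side of 0.63 can map to the same $B$.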



In [115]:
from google.colab import files
uploaded = files.upload()

In [116]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn


In [117]:
# housing.csv is whitespace-delimited with no header row; read every value as float32
dataset = pd.read_csv('housing.csv', sep=r'\s+', header=None, dtype='float32')
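Because the file has no header row, pandas labels the columns 0 to 13. A minimal sketch of reading the same whitespace-delimited layout while attaching the names from the column list above, assuming the file's column order matches that list. `COLUMN_NAMES` and `SAMPLE` are hypothetical names; the two sample rows are copied from the `head()` output below so the block runs without `housing.csv`:

```python
import io

import pandas as pd

# Hypothetical constant: column names in the order of the list above
COLUMN_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Two rows in the same space-separated layout as housing.csv
SAMPLE = """\
0.00632  18.00  2.310  0  0.5380  6.5750  65.20  4.0900  1  296.0  15.30  396.90  4.98  24.00
0.02731   0.00  7.070  0  0.4690  6.4210  78.90  4.9671  2  242.0  17.80  396.90  9.14  21.60
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",          # any run of whitespace counts as one delimiter
    header=None,         # the data has no header row
    names=COLUMN_NAMES,  # use names instead of the default 0..13 labels
    dtype="float32",
)
print(df[["RM", "LSTAT", "MEDV"]])
```

Named columns make later selections such as `df.drop(columns="MEDV")` self-documenting.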

In [118]:
dataset.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.199997 | 4.09 | 1.0 | 296.0 | 15.3 | 396.899994 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.900002 | 4.9671 | 2.0 | 242.0 | 17.799999 | 396.899994 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.099998 | 4.9671 | 2.0 | 242.0 | 17.799999 | 392.829987 | 4.03 | 34.700001 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.799999 | 6.0622 | 3.0 | 222.0 | 18.700001 | 394.630005 | 2.94 | 33.400002 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.200001 | 6.0622 | 3.0 | 222.0 | 18.700001 | 396.899994 | 5.33 | 36.200001 |


In [119]:
dataset.describe()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613523 | 11.363636 | 11.136797 | 0.06917 | 0.554696 | 6.284636 | 68.574921 | 3.795043 | 9.549407 | 408.237152 | 18.455584 | 356.674561 | 12.653064 | 22.532806 |
| std | 8.601545 | 23.32239 | 6.860355 | 0.253993 | 0.115878 | 0.702617 | 28.148869 | 2.105711 | 8.707269 | 168.53717 | 2.164946 | 91.294838 | 7.141063 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.377487 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.440002 | 11.36 | 21.200001 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.074999 | 5.188425 | 24.0 | 666.0 | 20.200001 | 396.225006 | 16.954999 | 25.0 |
| max | 88.976196 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.899994 | 37.970001 | 50.0 |
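The summary shows feature scales that differ by orders of magnitude: column 9 (TAX) reaches 711 while column 4 (NOX) stays below 0.9. Gradient descent on a linear model usually behaves better when features share a comparable scale, so one common option, which this notebook does not apply, is to standardize each feature column. A minimal sketch with NumPy; the helper name `standardize` and the three input rows are made up for illustration:

```python
import numpy as np


def standardize(features: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Scale each column to zero mean and unit standard deviation.

    Returns the scaled array plus the per-column mean and std, so the
    same transform can be reused on data seen later.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (features - mean) / std, mean, std


# Made-up rows: one NOX-like column and one TAX-like column
x = np.array([[0.5, 300.0], [0.6, 400.0], [0.7, 500.0]], dtype=np.float32)
scaled, mean, std = standardize(x)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Only the feature columns would be scaled; the target (column 13, MEDV) can stay in its original units of \$1000's.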


In [120]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       506 non-null    float32
 1   1       506 non-null    float32
 2   2       506 non-null    float32
 3   3       506 non-null    float32
 4   4       506 non-null    float32
 5   5       506 non-null    float32
 6   6       506 non-null    float32
 7   7       506 non-null    float32
 8   8       506 non-null    float32
 9   9       506 non-null    float32
 10  10      506 non-null    float32
 11  11      506 non-null    float32
 12  12      506 non-null    float32
 13  13      506 non-null    float32
dtypes: float32(14)
memory usage: 27.8 KB


In [128]:
# Features: every column except the last (0-12, CRIM through LSTAT)
input = dataset.iloc[:, :-1]
# Target: the last column (13, MEDV)
output = dataset.iloc[:, -1]
print(input.shape)
print(output.head())

(506, 13)
0    24.000000
1    21.600000
2    34.700001
3    33.400002
4    36.200001
Name: 13, dtype: float32


In [122]:
from torch.utils.data import TensorDataset

In [127]:
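# torch.tensor() cannot take a DataFrame directly, so convert through NumPy first.
# The target is reshaped to (N, 1) to match the (N, 1) predictions of a
# single-output linear layer.
input = torch.tensor(input.to_numpy())
output = torch.tensor(output.to_numpy()).unsqueeze(1)

train_ds = TensorDataset(input, output)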
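`TensorDataset` wraps tensors that share the same first dimension, and indexing it returns one row from each tensor as a tuple. A small self-contained check, using random stand-in tensors with the notebook's shapes (506 samples, 13 features, targets of shape (N, 1)) rather than the housing data:

```python
import torch
from torch.utils.data import TensorDataset

# Random stand-ins with the notebook's shapes
features = torch.randn(506, 13)
targets = torch.randn(506, 1)

ds = TensorDataset(features, targets)
x0, y0 = ds[0]  # one sample: a (features, target) pair

print(len(ds), x0.shape, y0.shape)
```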

In [None]:
from torch.utils.data import DataLoader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
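With `batch_size = 5`, the loader yields mini-batches of 5 rows, and because 506 is not a multiple of 5 the final batch is smaller. A sketch with random stand-in data of the same shapes; the seed is only there to make the shuffle reproducible:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # reproducible shuffle order

# Random stand-ins with the notebook's shapes
ds = TensorDataset(torch.randn(506, 13), torch.randn(506, 1))
dl = DataLoader(ds, batch_size=5, shuffle=True)

xb, yb = next(iter(dl))    # first shuffled mini-batch
print(xb.shape, yb.shape)  # torch.Size([5, 13]) torch.Size([5, 1])
print(len(dl))             # 102: 101 full batches plus a final batch of 1 row
```

Passing `drop_last=True` to `DataLoader` would discard that last one-row batch instead.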

In [None]:
# in_features = number of feature columns; out_features = 1 (the predicted MEDV)
model = nn.Linear(input.shape[1], 1)
print(model.weight)
print(model.bias)
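`nn.Linear(in_features, out_features)` stores a weight of shape `(out_features, in_features)` and computes `x @ weight.T + bias`. A short check for a 13-feature, single-output layer, using random inputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One output (the predicted MEDV) from 13 input features
model = nn.Linear(13, 1)
print(model.weight.shape, model.bias.shape)  # torch.Size([1, 13]) torch.Size([1])

# The layer's forward pass matches the explicit affine formula
x = torch.randn(4, 13)
manual = x @ model.weight.T + model.bias
print(torch.allclose(model(x), manual))  # True

# 13 weights + 1 bias = 14 trainable parameters
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 14
```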

In [None]:
list(model.parameters())

In [None]:
import torch.nn.functional as F
loss_fn = F.mse_loss
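With its default `reduction="mean"`, `F.mse_loss` averages the squared differences between predictions and targets. A worked check on three made-up predictions against the first three MEDV values from the `head()` output, both shaped `(N, 1)` like the output of a single-output `nn.Linear`:

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[22.0], [30.0], [18.0]])    # made-up predictions, shape (3, 1)
targets = torch.tensor([[24.0], [21.6], [34.7]])  # MEDV values, shape (3, 1)

# Squared errors: 2.0**2 + 8.4**2 + 16.7**2, averaged over 3 rows
loss = F.mse_loss(preds, targets)
manual = ((preds - targets) ** 2).mean()
print(loss, manual)
```

If the targets had shape `(3,)` instead of `(3, 1)`, the subtraction would broadcast to a `(3, 3)` matrix and PyTorch would emit a `UserWarning` about mismatched sizes, so keeping both tensors at `(N, 1)` matters.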

In [None]:
print(output)
#loss = loss_fn(model(input), output)
#print(loss)
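The notebook stops before training. A minimal sketch of one common next step, plain mini-batch gradient descent with `torch.optim.SGD`, assuming standardized features and targets shaped `(N, 1)`. The data here is synthetic (a fixed linear rule plus noise) so the block runs on its own; the helper name `fit`, the learning rate, and the epoch count are illustrative assumptions, not tuned values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 506 rows of 13 standardized features,
# with targets from a fixed linear rule plus a little noise
features = torch.randn(506, 13)
true_w = torch.randn(13, 1)
targets = features @ true_w + 22.5 + 0.1 * torch.randn(506, 1)

train_dl = DataLoader(TensorDataset(features, targets), batch_size=5, shuffle=True)
model = nn.Linear(13, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)


def fit(num_epochs: int) -> list[float]:
    """Run mini-batch SGD and return the full-dataset MSE after each epoch."""
    history = []
    for _ in range(num_epochs):
        for xb, yb in train_dl:
            loss = F.mse_loss(model(xb), yb)
            loss.backward()  # compute gradients for weight and bias
            opt.step()       # take one gradient step
            opt.zero_grad()  # clear gradients before the next batch
        with torch.no_grad():
            history.append(F.mse_loss(model(features), targets).item())
    return history


losses = fit(20)
print(f"MSE after epoch 1: {losses[0]:.3f}, after epoch 20: {losses[-1]:.3f}")
```

On the real data, the same loop would use `train_dl` and `model` from the cells above in place of the synthetic tensors.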