# **Description of Dataset**

1. Title: Boston Housing Data

2. Sources:
   (a) Origin:  This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University.
   (b) Creator:  Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.
   (c) Date: July 7, 1993

3. Past Usage:
   - Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley,
     1980. N.B. Various transformations are used in the table on
     pages 244-261.
   - Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning.
     In Proceedings of the Tenth International Conference on Machine
     Learning, 236-243, University of Massachusetts, Amherst. Morgan
     Kaufmann.

4. Relevant Information:

   Concerns housing values in suburbs of Boston.

5. Number of Instances: 506

6. Number of Attributes: 13 continuous attributes (including "class" attribute "MEDV"), 1 binary-valued attribute.

7. Attribute Information:

    1. CRIM-      per capita crime rate by town
    2. ZN-        proportion of residential land zoned for lots over 25,000 sq.ft.
    3. INDUS-     proportion of non-retail business acres per town
    4. CHAS-      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    5. NOX-       nitric oxides concentration (parts per 10 million)
    6. RM-        average number of rooms per dwelling
    7. AGE-       proportion of owner-occupied units built prior to 1940
    8. DIS-       weighted distances to five Boston employment centres
    9. RAD-       index of accessibility to radial highways
    10. TAX-      full-value property-tax rate per $10,000
    11. PTRATIO-  pupil-teacher ratio by town
    12. B-        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
    13. LSTAT-    % lower status of the population
    14. MEDV-     Median value of owner-occupied homes in $1000's

8. Missing Attribute Values:  None.
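
The attribute summary above (one binary column, the rest continuous, no missing values) can be checked directly once the data is in a DataFrame. The following is a minimal sketch that uses only the first five records printed later in this notebook. The names `sample`, `missing_per_column`, and `chas_is_binary` are hypothetical; on the full data the same checks would be run against `df`.

```python
import pandas as pd

columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
           'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']

# First five records, copied from the DataFrame output shown in this notebook.
rows = [
    [0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.0900, 1.0, 296.0, 15.3, 396.90, 4.98, 24.0],
    [0.02731, 0.0, 7.07, 0.0, 0.469, 6.421, 78.9, 4.9671, 2.0, 242.0, 17.8, 396.90, 9.14, 21.6],
    [0.02729, 0.0, 7.07, 0.0, 0.469, 7.185, 61.1, 4.9671, 2.0, 242.0, 17.8, 392.83, 4.03, 34.7],
    [0.03237, 0.0, 2.18, 0.0, 0.458, 6.998, 45.8, 6.0622, 3.0, 222.0, 18.7, 394.63, 2.94, 33.4],
    [0.06905, 0.0, 2.18, 0.0, 0.458, 7.147, 54.2, 6.0622, 3.0, 222.0, 18.7, 396.90, 5.33, 36.2],
]
sample = pd.DataFrame(rows, columns=columns)

# Count missing values per column; the description claims there are none.
missing_per_column = sample.isna().sum()

# CHAS is described as a 0/1 dummy variable.
chas_is_binary = sample['CHAS'].isin([0, 1]).all()

print(missing_per_column.sum())
print(chas_is_binary)
```

Five rows cannot confirm the claims for all 506 instances; the point is only to show which checks apply.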





# **Importing Libraries**

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

# **Loading Data**

In [3]:
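# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
boston = load_boston()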
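
On scikit-learn releases where `load_boston` is unavailable (it was removed in 1.2), the data can be rebuilt from the original StatLib text file. The sketch below assumes the file layout referenced in scikit-learn's deprecation notice: a 22-line header, then each record wrapped over two lines (11 values, then 3). The function name `parse_statlib_boston`, the `FEATURE_NAMES` list, and the URL in the comment are assumptions for illustration, not part of the original notebook. The demo parses two records in that layout, with values taken from the output shown in this notebook, so that it runs without network access.

```python
import io

import numpy as np
import pandas as pd

FEATURE_NAMES = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS',
                 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']


def parse_statlib_boston(source, skiprows=22):
    """Parse the two-lines-per-record StatLib layout (assumed format).

    `source` is a path, URL, or file-like object accepted by pd.read_csv.
    """
    raw = pd.read_csv(source, sep=r"\s+", skiprows=skiprows, header=None)
    values = raw.values
    # Even rows hold the first 11 features, odd rows the last 2 plus the target.
    data = np.hstack([values[::2, :], values[1::2, :2]])
    target = values[1::2, 2]
    return (pd.DataFrame(data, columns=FEATURE_NAMES),
            pd.Series(target, name='MEDV'))


# Offline demo: two records written in the assumed two-line layout.
demo_text = "\n".join([
    "0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30",
    "396.90 4.98 24.00",
    "0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80",
    "396.90 9.14 21.60",
])
X_demo, y_demo = parse_statlib_boston(io.StringIO(demo_text), skiprows=0)
print(X_demo.shape, y_demo.tolist())

# With network access, the full file could be parsed the same way (URL assumed):
# X, y = parse_statlib_boston("http://lib.stat.cmu.edu/datasets/boston")
```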

# **Transforming Data to Dataframe**

In [4]:
# Features (13 columns) and target (MEDV, median home value in $1000's) as separate DataFrames
df_x = pd.DataFrame(data=boston.data, columns=boston.feature_names)
df_y = pd.DataFrame(data=boston.target, columns=["Price per $1000"])

In [5]:
df = pd.concat([df_x, df_y],ignore_index=True,axis = 1)

In [6]:
# ignore_index=True with axis=1 replaced the column labels with 0..13, so restore them here
df.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
              'TAX', 'PTRATIO', 'B', 'LSTAT', 'Price per $1000']
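
The relabeling step is needed only because `ignore_index=True` combined with `axis=1` discards the column names. A small sketch of the difference, using a hypothetical two-row frame (`features`, `target`, `relabeled`, and `kept` are illustrative names):

```python
import pandas as pd

features = pd.DataFrame({'CRIM': [0.00632, 0.02731], 'RM': [6.575, 6.421]})
target = pd.DataFrame({'MEDV': [24.0, 21.6]})

# ignore_index=True resets the labels along the concatenation axis (the columns here).
relabeled = pd.concat([features, target], ignore_index=True, axis=1)

# Without ignore_index, the original column names are preserved.
kept = pd.concat([features, target], axis=1)

print(list(relabeled.columns))  # [0, 1, 2]
print(list(kept.columns))       # ['CRIM', 'RM', 'MEDV']
```

Dropping `ignore_index=True` from the `pd.concat` call would therefore make the manual column assignment unnecessary.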

In [7]:
df

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | Price per $1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |


In [8]:
df.to_csv("/content/drive/MyDrive/Project/House_Price_Prediction/Day0/boston_data.csv")
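
By default `to_csv` also writes the row index as an unnamed first column, so reading the file back with `pd.read_csv` yields an extra column named `Unnamed: 0`. The sketch below shows the effect and the `index=False` alternative on a hypothetical two-row frame. It writes to in-memory buffers rather than to the Drive path above, and `frame` and the buffer names are illustrative.

```python
import io

import pandas as pd

frame = pd.DataFrame({'CRIM': [0.00632, 0.02731], 'MEDV': [24.0, 21.6]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# index=False writes only the data columns.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(list(reread_with_index.columns))     # ['Unnamed: 0', 'CRIM', 'MEDV']
print(list(reread_without_index.columns))  # ['CRIM', 'MEDV']
```

If the index carries no information, as with the default 0..505 range here, passing `index=False` keeps the saved file to the 14 data columns. Alternatively, `pd.read_csv(..., index_col=0)` restores the index on load.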