# Descriptive Analysis of Housing Dataset

In [1]:
#Importing the required libraries

import pandas as pd
import numpy as np

### Importing the Dataset 

In [2]:
#Importing the "HousingNew" dataset

df=pd.read_csv("C:\\Users\\91988\\Downloads\\HousingNew.csv")
df

Unnamed: 0,ID,price,lotsize,bedrooms,bathrms,stories,garagepl,recroom,fullbase,airco
0,1,42000.0,5850,3,1,2,1,no,yes,no
1,2,38500.0,4000,2,1,1,0,no,no,no
2,3,49500.0,3060,3,1,1,0,no,no,no
3,4,60500.0,6650,3,1,2,0,yes,no,no
4,5,61000.0,6360,2,1,1,0,no,no,no
...,...,...,...,...,...,...,...,...,...,...
541,542,91500.0,4800,3,2,4,0,yes,no,yes
542,543,94000.0,6000,3,2,4,0,no,no,yes
543,544,103000.0,6000,3,2,4,1,yes,no,yes
544,545,105000.0,6000,3,2,2,1,yes,no,yes


<span style="color:red">Understanding the raw Dataset</span>

<span style="color:blue">From the output above, we can see that the dataset has 546 rows and 10 columns.</span>

<span style="color:blue">Price is the dependent (target) variable, that is, the value we want to predict from the dataset.</span>

<span style="color:blue">Of the other 9 columns, ID is only a row identifier; the remaining 8 (lotsize, bedrooms, bathrms, stories, garagepl, recroom, fullbase and airco) are the predictor variables.</span>
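As a minimal sketch of how these counts can be checked in code, the snippet below rebuilds only the first five rows displayed above (a stand-in for the full CSV, so the row count is 5 rather than 546) and separates the target from the predictors. The names `sample`, `target` and `predictors` are illustrative and not part of the original notebook.

```python
import pandas as pd

# First five rows as displayed above; a stand-in for the full 546-row file.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "price": [42000.0, 38500.0, 49500.0, 60500.0, 61000.0],
    "lotsize": [5850, 4000, 3060, 6650, 6360],
    "bedrooms": [3, 2, 3, 3, 2],
    "bathrms": [1, 1, 1, 1, 1],
    "stories": [2, 1, 1, 2, 1],
    "garagepl": [1, 0, 0, 0, 0],
    "recroom": ["no", "no", "no", "yes", "no"],
    "fullbase": ["yes", "no", "no", "no", "no"],
    "airco": ["no", "no", "no", "no", "no"],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape

target = "price"
# ID only labels the rows, so it is left out of the predictor list.
predictors = [c for c in sample.columns if c not in (target, "ID")]

print(n_rows, n_cols)
print(predictors)
```

On the full dataframe, `df.shape` would report the 546 rows and 10 columns noted above.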

In [3]:
#Using pandas "set_index" to make "ID" the index, since it is only an identifier and is not used in the further analysis

df.set_index("ID", inplace = True)
df.head()   

ID,price,lotsize,bedrooms,bathrms,stories,garagepl,recroom,fullbase,airco
1,42000.0,5850,3,1,2,1,no,yes,no
2,38500.0,4000,2,1,1,0,no,no,no
3,49500.0,3060,3,1,1,0,no,no,no
4,60500.0,6650,3,1,2,0,yes,no,no
5,61000.0,6360,2,1,1,0,no,no,no
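An alternative, sketched under the assumption that the CSV stores `ID` as an ordinary column: `read_csv` can set the index while loading through its `index_col` parameter, which avoids the separate `set_index` step. The inline CSV text below reuses three of the rows shown above in place of the real file, and `sample` is an illustrative name.

```python
import io

import pandas as pd

# Three rows copied from the output above, standing in for HousingNew.csv.
csv_text = """ID,price,lotsize,bedrooms,bathrms,stories,garagepl,recroom,fullbase,airco
1,42000.0,5850,3,1,2,1,no,yes,no
2,38500.0,4000,2,1,1,0,no,no,no
3,49500.0,3060,3,1,1,0,no,no,no
"""

# index_col moves the ID column into the index at load time.
sample = pd.read_csv(io.StringIO(csv_text), index_col="ID")

print(sample.index.name)
print(sample.shape)
```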


### Checking Data types of the columns to get the basic insights from the data 

In [4]:
#Using the pandas "dtypes" attribute to get the data type of every column in the dataframe

df.dtypes

price       float64
lotsize       int64
bedrooms      int64
bathrms       int64
stories       int64
garagepl      int64
recroom      object
fullbase     object
airco        object
dtype: object
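The three `object` columns hold yes/no text, so they are skipped by numeric summaries such as the mean or the correlation matrix. One possible way to bring them into those summaries, shown here only as a sketch and not as a step of the original analysis, is to map them to 1/0. The frame `flags` is rebuilt from the first five rows shown earlier, and both `flags` and `encoded` are illustrative names.

```python
import pandas as pd

# yes/no columns from the first five rows of the dataset.
flags = pd.DataFrame({
    "recroom": ["no", "no", "no", "yes", "no"],
    "fullbase": ["yes", "no", "no", "no", "no"],
    "airco": ["no", "no", "no", "no", "no"],
})

# Map each yes/no column to 1/0, column by column.
encoded = flags.apply(lambda col: col.map({"yes": 1, "no": 0}))

print(encoded.dtypes)
print(encoded.sum())  # number of "yes" values per column
```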

### Missing Values

__It is important to check for missing values, because a dataframe with missing values can lose expressiveness, which can lead to weak and biased analyses.__

In [5]:
#Using pandas "isnull" function to check whether there are any missing values in the dataframe

df.isnull().sum()

price       0
lotsize     0
bedrooms    0
bathrms     0
stories     0
garagepl    0
recroom     0
fullbase    0
airco       0
dtype: int64

<span style="color:blue">From the above results, it can be said that there are no missing values in the dataframe</span>

## Key Statistics  

### Describe Function  

In [6]:
# Using the pandas "describe" method to get the overall statistical summary of the given dataframe

df.describe(include="all")

Unnamed: 0,price,lotsize,bedrooms,bathrms,stories,garagepl,recroom,fullbase,airco
count,546.0,546.0,546.0,546.0,546.0,546.0,546,546,546
unique,,,,,,,2,2,2
top,,,,,,,no,no,no
freq,,,,,,,449,355,373
mean,68121.59707,5150.265568,2.965201,1.285714,1.807692,0.692308,,,
std,26702.670926,2168.158725,0.737388,0.502158,0.868203,0.861307,,,
min,25000.0,1650.0,1.0,1.0,1.0,0.0,,,
25%,49125.0,3600.0,2.0,1.0,1.0,0.0,,,
50%,62000.0,4600.0,3.0,1.0,2.0,0.0,,,
75%,82000.0,6360.0,3.0,2.0,2.0,1.0,,,


We get the statistical summary of the dataset using the describe function of the pandas library.


<span style="color:red"> **KEY INSIGHTS** </span>


<span style="color:blue">From the above result, we can infer that the average house price is 68121.6, with a minimum of 25000 and a maximum of 190000. A quarter of the houses are priced at or below 49125 and three quarters at or below 82000, so the middle 50% of house prices fall within the 49125 to 82000 range.</span>
    
<span style="color:blue">The average lotsize is 5150.26 square meters. With a minimum of 1650 and a standard deviation of 2168.16, lot sizes range from fairly small to quite large.</span>

<span style="color:blue">The standard deviation of bathrms is relatively small (about 0.50 around a mean of 1.29), which indicates that the number of bathrooms stays close to the mean for most houses.</span>

<span style="color:blue">We can also see that houses have at most 6 bedrooms and 4 bathrooms, and at least one of each. Some houses have as many as 3 garage places, whereas others have none.</span>
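The spread described above can be made concrete with a little arithmetic. The sketch below uses only figures already reported (the quartiles, minimum and maximum of price, and the mean and standard deviation of bathrms) to compute the interquartile range, the usual 1.5 × IQR outlier fences, and a coefficient of variation. The 1.5 × IQR rule is a common convention brought in for illustration, not something the original analysis applies.

```python
# Values copied from the describe() output and the text above.
q1, q3 = 49125.0, 82000.0
min_price, max_price = 25000.0, 190000.0

# Interquartile range: the width of the middle 50% of prices.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(iqr)          # 32875.0
print(lower_fence)  # -187.5
print(upper_fence)  # 131312.5
print(min_price < lower_fence, max_price > upper_fence)  # False True

# Coefficient of variation for bathrms (std / mean).
cv_bathrms = 0.502158 / 1.285714
print(round(cv_bathrms, 2))  # 0.39
```

Under this convention the maximum price of 190000 lies above the upper fence, so the price column has at least one unusually expensive house, while no house is unusually cheap.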

In [7]:
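#Using pandas "mean" function to calculate the mean of the numeric columns

#numeric_only=True skips the yes/no text columns, which pandas 2.0+ would otherwise raise an error on
df.mean(numeric_only=True)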

price       68121.597070
lotsize      5150.265568
bedrooms        2.965201
bathrms         1.285714
stories         1.807692
garagepl        0.692308
dtype: float64

<span style="color:blue">From the above, we can see that the average price of the houses is about 68121, while the average lotsize is about 5150 square meters.</span>

In [8]:
#Using pandas "mode" function to get the mode of the given dataframe

df.mode()

Unnamed: 0,price,lotsize,bedrooms,bathrms,stories,garagepl,recroom,fullbase,airco
0,50000.0,6000.0,3.0,1.0,2.0,0.0,no,no,no
1,60000.0,,,,,,,,


<span style="color:blue">From the above, we can see that most houses do not have a recreation room, a full basement or air conditioning. The price column has two modes, 50000 and 60000.</span>
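To quantify "most", the counts of the top category ("no") from the `freq` row of the `describe` output can be turned into shares of the 546 houses. This is a small sketch using only those reported counts; `no_counts` and `shares` are illustrative names.

```python
n_houses = 546
# Frequency of the most common value ("no") per column, from describe().
no_counts = {"recroom": 449, "fullbase": 355, "airco": 373}

shares = {name: count / n_houses for name, count in no_counts.items()}

for name, share_no in shares.items():
    print(f"{name}: {share_no:.1%} no, {1 - share_no:.1%} yes")
# recroom: 82.2% no, 17.8% yes
# fullbase: 65.0% no, 35.0% yes
# airco: 68.3% no, 31.7% yes
```

On the dataframe itself, `df["recroom"].value_counts(normalize=True)` would give the same proportions directly.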

In [9]:
#Using pandas "median" function to get the median of the numeric dataframe variables

df.median(numeric_only=True)

price       62000.0
lotsize      4600.0
bedrooms        3.0
bathrms         1.0
stories         2.0
garagepl        0.0
dtype: float64

<span style="color:blue">The median house price is 62000. Comparing it with the average price of 68121 suggests that the price variable is right-skewed, since the mean is greater than the median.</span>
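The mean-versus-median comparison can be expressed as a single number with Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. The sketch below applies it to the price figures reported above; it is a rough indicator brought in for illustration and is not the same quantity as the moment-based skewness that `df["price"].skew()` returns.

```python
# Price statistics copied from the describe() and median() outputs above.
mean_price = 68121.59707
median_price = 62000.0
std_price = 26702.670926

# Pearson's second skewness coefficient: positive values point to a right skew.
pearson_skew = 3 * (mean_price - median_price) / std_price

print(round(pearson_skew, 2))  # 0.69
```

A positive value of about 0.69 is consistent with a moderately right-skewed price distribution.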

In [11]:
#Using pandas "corr" function to check the correlation among the numeric variables

df.corr(numeric_only=True)

Unnamed: 0,price,lotsize,bedrooms,bathrms,stories,garagepl
price,1.0,0.535796,0.366447,0.516719,0.42119,0.383302
lotsize,0.535796,1.0,0.151851,0.193833,0.083675,0.352872
bedrooms,0.366447,0.151851,1.0,0.373769,0.407974,0.139117
bathrms,0.516719,0.193833,0.373769,1.0,0.324066,0.178178
stories,0.42119,0.083675,0.407974,0.324066,1.0,0.043412
garagepl,0.383302,0.352872,0.139117,0.178178,0.043412,1.0


<span style= "color:blue"> From the above correlation results, we can infer that every numeric variable has a **positive correlation** with price, with lotsize (0.54) and bathrms (0.52) showing the strongest relationships. In other words, house prices tend to be higher for larger lots and for houses with more bathrooms.</span>
    
<span style="color:blue">Thus, among the numeric independent variables, lotsize and bathrms have the strongest linear association with price and are likely to play an important role in predicting it.</span>
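To make the ranking explicit, the sketch below sorts the correlations with price taken from the `price` row of the matrix above and squares each one. The squared value is the share of price variance a single-variable linear fit would account for, which is a standard reading of r² added here for illustration; `price_corr` and `ranked` are illustrative names.

```python
# Correlation of each numeric predictor with price, copied from the matrix above.
price_corr = {
    "lotsize": 0.535796,
    "bedrooms": 0.366447,
    "bathrms": 0.516719,
    "stories": 0.42119,
    "garagepl": 0.383302,
}

# Sort predictors from strongest to weakest correlation with price.
ranked = sorted(price_corr.items(), key=lambda item: item[1], reverse=True)

for name, r in ranked:
    print(f"{name:9s} r={r:.3f}  r^2={r * r:.3f}")
```

On the dataframe, the same ordering could be obtained with `df.corr(numeric_only=True)["price"].drop("price").sort_values(ascending=False)`. Even the strongest predictor, lotsize, accounts for under 30% of the variance in price on its own, so the correlations are moderate rather than strong.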

### KEY INSIGHTS 

1. The given dataset has __546 rows__ and __10 columns__.

2. __Price__ is the __dependent variable__ or __target value__, that is, the value we want to predict from the dataset.

3. Of the __9 other columns__, ID is only a row identifier; the remaining __8__ are the __predictor variables__.

4. There are __no missing values__ in the dataset.

5. __The average price__ of a house is __68121.6__, with a __minimum price__ of __25000__ and a __maximum price__ of __190000__. A quarter of the houses are priced at or below 49125 and three quarters at or below 82000, so __the middle 50% of house prices fall within the 49125 to 82000 range.__

6. The __average lotsize__ is __5150.26__ square meters. With a minimum of 1650 and a standard deviation of 2168.16, __lot sizes range from fairly small to quite large.__

7. The __standard deviation__ of bathrms is relatively small (about 0.50 around a mean of 1.29), which indicates that the number of bathrooms stays close to the mean for most houses.

8. Houses have __at most 6 bedrooms__ and __4 bathrooms__, and __at least__ one of each. Some houses have as many as __3 garage places__, whereas __others have none__.

9. Most houses __do not__ have a __recreation room, a full basement or air conditioning__.

10. __The median price__ of houses is __62000__. Comparing it with the __average price__ of __68121__ suggests that the __price variable__ is __right-skewed__, since the mean is greater than the median.

11. Every numeric variable has a __positive correlation__ with price, and __lotsize and bathrms__ show the strongest relationships: __house prices tend to be higher for larger lots__ and __for houses with more bathrooms__.

12. Among the numeric independent variables, __lotsize__ and __bathrms__ have the strongest linear association with price and are likely to play an important role in predicting it.

# THANK YOU