# Data Analysis for Melbourne Houses
This project analyses houses sold in the Melbourne area. It examines how house prices relate to suburb, property attributes, year of construction, seller and other recorded information, and explores time-series trend analysis and price prediction where the data allows. The analysis covers both the time and space dimensions, combining statistical and exploratory analysis to give sellers, buyers, real estate agents and other stakeholders data to support their decisions.

## Data Source
- melb_data.csv
- suburb-10-vic.geojson

## Change Log
- 2023-01-05: started project

In [25]:
## load packages

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import re
import plotly

In [41]:
## load data
file = '/Users/lq/PycharmProjects/DA_Course/DataSource/DataSource_Melbourne_house/melb_data.csv'
houses = pd.read_csv(file)

houses.head()

| | Suburb | Address | Rooms | Type | Price | Method | SellerG | Date | Distance | Postcode | ... | Bathroom | Car | Landsize | BuildingArea | YearBuilt | CouncilArea | Lattitude | Longtitude | Regionname | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbotsford | 85 Turner St | 2 | h | 1480000.0 | S | Biggin | 3/12/2016 | 2.5 | 3067.0 | ... | 1.0 | 1.0 | 202.0 | NaN | NaN | Yarra | -37.7996 | 144.9984 | Northern Metropolitan | 4019.0 |
| 1 | Abbotsford | 25 Bloomburg St | 2 | h | 1035000.0 | S | Biggin | 4/02/2016 | 2.5 | 3067.0 | ... | 1.0 | 0.0 | 156.0 | 79.0 | 1900.0 | Yarra | -37.8079 | 144.9934 | Northern Metropolitan | 4019.0 |
| 2 | Abbotsford | 5 Charles St | 3 | h | 1465000.0 | SP | Biggin | 4/03/2017 | 2.5 | 3067.0 | ... | 2.0 | 0.0 | 134.0 | 150.0 | 1900.0 | Yarra | -37.8093 | 144.9944 | Northern Metropolitan | 4019.0 |
| 3 | Abbotsford | 40 Federation La | 3 | h | 850000.0 | PI | Biggin | 4/03/2017 | 2.5 | 3067.0 | ... | 2.0 | 1.0 | 94.0 | NaN | NaN | Yarra | -37.7969 | 144.9969 | Northern Metropolitan | 4019.0 |
| 4 | Abbotsford | 55a Park St | 4 | h | 1600000.0 | VB | Nelson | 4/06/2016 | 2.5 | 3067.0 | ... | 1.0 | 2.0 | 120.0 | 142.0 | 2014.0 | Yarra | -37.8072 | 144.9941 | Northern Metropolitan | 4019.0 |


In [42]:
houses.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13580 entries, 0 to 13579
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Suburb         13580 non-null  object 
 1   Address        13580 non-null  object 
 2   Rooms          13580 non-null  int64  
 3   Type           13580 non-null  object 
 4   Price          13580 non-null  float64
 5   Method         13580 non-null  object 
 6   SellerG        13580 non-null  object 
 7   Date           13580 non-null  object 
 8   Distance       13580 non-null  float64
 9   Postcode       13580 non-null  float64
 10  Bedroom2       13580 non-null  float64
 11  Bathroom       13580 non-null  float64
 12  Car            13518 non-null  float64
 13  Landsize       13580 non-null  float64
 14  BuildingArea   7130 non-null   float64
 15  YearBuilt      8205 non-null   float64
 16  CouncilArea    12211 non-null  object 
 17  Lattitude      13580 non-null  float64
 18  Longti

In [43]:
houses.describe()

| | Rooms | Price | Distance | Postcode | Bedroom2 | Bathroom | Car | Landsize | BuildingArea | YearBuilt | Lattitude | Longtitude | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 13580.0 | 13580.0 | 13580.0 | 13580.0 | 13580.0 | 13580.0 | 13518.0 | 13580.0 | 7130.0 | 8205.0 | 13580.0 | 13580.0 | 13580.0 |
| mean | 2.937997 | 1075684.0 | 10.137776 | 3105.301915 | 2.914728 | 1.534242 | 1.610075 | 558.416127 | 151.96765 | 1964.684217 | -37.809203 | 144.995216 | 7454.417378 |
| std | 0.955748 | 639310.7 | 5.868725 | 90.676964 | 0.965921 | 0.691712 | 0.962634 | 3990.669241 | 541.014538 | 37.273762 | 0.07926 | 0.103916 | 4378.581772 |
| min | 1.0 | 85000.0 | 0.0 | 3000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1196.0 | -38.18255 | 144.43181 | 249.0 |
| 25% | 2.0 | 650000.0 | 6.1 | 3044.0 | 2.0 | 1.0 | 1.0 | 177.0 | 93.0 | 1940.0 | -37.856822 | 144.9296 | 4380.0 |
| 50% | 3.0 | 903000.0 | 9.2 | 3084.0 | 3.0 | 1.0 | 2.0 | 440.0 | 126.0 | 1970.0 | -37.802355 | 145.0001 | 6555.0 |
| 75% | 3.0 | 1330000.0 | 13.0 | 3148.0 | 3.0 | 2.0 | 2.0 | 651.0 | 174.0 | 1999.0 | -37.7564 | 145.058305 | 10331.0 |
| max | 10.0 | 9000000.0 | 48.1 | 3977.0 | 20.0 | 8.0 | 10.0 | 433014.0 | 44515.0 | 2018.0 | -37.40853 | 145.52635 | 21650.0 |
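
The `count` row above is lower than the 13580 total rows for `Car`, `BuildingArea` and `YearBuilt`, which means those columns contain missing values. As a rough sketch, the gap can be quantified from the reported counts alone. The variable names `total_rows`, `non_null` and `summary` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Total number of rows, taken from the RangeIndex reported by houses.info().
total_rows = 13580

# Non-null counts copied from the `count` row of houses.describe().
non_null = pd.Series({
    'Car': 13518,
    'BuildingArea': 7130,
    'YearBuilt': 8205,
})

# Missing values per column, as absolute counts and as a share of all rows.
missing = total_rows - non_null
missing_pct = (missing / total_rows * 100).round(1)

summary = pd.DataFrame({'missing': missing, 'missing_pct': missing_pct})
print(summary.sort_values('missing', ascending=False))
```

On the loaded DataFrame itself, `houses.isna().sum()` should give the same missing counts directly, including for non-numeric columns such as `CouncilArea`.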


In [44]:
houses.columns

Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')

In [45]:
houses.index

RangeIndex(start=0, stop=13580, step=1)

In [48]:
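## dates are stored day-first (e.g. 3/12/2016 is 3 December 2016), so pass the format explicitly;
## infer_datetime_format is redundant when format is given and is deprecated in pandas 2.x
houses['Date'] = pd.to_datetime(houses['Date'], format='%d/%m/%Y')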

In [49]:
houses['Date'].max(), houses['Date'].min()

(Timestamp('2017-09-23 00:00:00'), Timestamp('2016-01-28 00:00:00'))
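
The explicit `format` matters because many of these date strings are also valid when read month-first. The following minimal sketch, using sample values copied from the `Date` column shown by `houses.head()`, compares the two readings. The names `sample`, `parsed` and `month_first` are illustrative only.

```python
import pandas as pd

# Sample values copied from the Date column in the houses.head() output.
sample = pd.Series(['3/12/2016', '4/02/2016', '4/03/2017', '4/06/2016'])

# Day-first parsing, as used in the notebook: '3/12/2016' is 3 December 2016.
parsed = pd.to_datetime(sample, format='%d/%m/%Y')

# Month-first parsing of the same strings, shown only for comparison.
# It also succeeds, but silently yields different dates.
month_first = pd.to_datetime(sample, format='%m/%d/%Y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
print(month_first.dt.strftime('%Y-%m-%d').tolist())
```

Because both calls succeed on these values, a wrong format would not raise an error here; checking the resulting minimum and maximum dates, as done above, is a reasonable sanity check.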

## Step 1: Problem Statement and Use Case Analysis

### 1. Problem Statement

1. Research the current state of the housing market in Melbourne, including prices, house types and areas.

2. Understand how housing is distributed and which factors explain differences in housing density between areas.

3. Identify the best areas in which to invest, sell and build homes by comparing price differences between homes across areas and property attributes.

4. Study housing market trends and forecast future price and type trends in the housing market.

### 2. Use Cases (User Stories)
1. As a real estate agent, I want to be aware of trends in the Melbourne housing market, including changes in prices, house types and areas, so that I can provide a better service to my clients.

2. As a home buyer, I want to know the trends in the Melbourne housing market, including how prices have changed over the last few years, so that I can decide whether to buy now.

3. As a home buyer, I want to know the price differences between areas so that I can compare prices in different areas and choose the most affordable option.

4. As an investor, I want to know the price differences between areas so that I can compare the return on investment in different areas and choose the best-value option.

5. As a real estate agent, I want to know the price differences between areas so that I can provide my clients with up-to-date information on the housing market and help them make informed decisions.

6. As a real estate agent, I want to be able to predict future developments in the housing market, including changes in price and demand, so that I can provide better service to my clients.

7. As a home buyer, I want to be able to predict future developments in the housing market so that I can decide whether to buy a home now.

## Step 2: Data Preprocessing

### 1. Data Cleaning