# ImmoEliza Data Analysis (test)

**This repository contains a data analysis for "ImmoEliza", a fictional real estate company. I created it as part of my [BeCode](https://www.becode.org) AI Bootcamp training in 2023.**

The data used in this project comes from the repository [ImmoEliza: Collecting Data](https://github.com/DeFre/ImmoEliza-collecting-data), which was used to collect data on 10,000 properties from Immoweb.

In [41]:
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn
import pandas as pd
import time
```

## Import Data

In [21]:
```python
properties_raw = pd.read_csv("scraped_data_10.csv")
print(properties_raw.head())
```

```
      Price               Address  Bedrooms Energy class  \
0         €        Grote Markt 22         4            C   
1   €469000  Heidestatiestraat 26         3            D   
2  €1395000      Rue de Wavre, 27         5            C   
3   €285000  Avenue de Longwy 340         2            D   
4   €285000  Avenue de Longwy 340         2            D   

  Primary energy consumption Furnished Terrace  Terrace surface  \
0                        218        No       0               30   
1                        390        No     Yes                0   
2                        178         0       0               60   
3                        299         0       0                0   
4                        299         0       0                0   

   Surface of the plot  Living room surface  ...  Building condition  \
0                    0                    0  ...              As new   
1                  760                   34  ...                   0   
2                 64
```
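The `Price` column holds text such as `€469000`, and row 0 shows a bare `€`, apparently where no price was scraped, so the column cannot be summarised numerically as-is. A minimal sketch of a conversion, assuming every price is a euro sign followed by plain digits (no thousands separators); the helper name `parse_price` is hypothetical:

```python
import pandas as pd


def parse_price(price: pd.Series) -> pd.Series:
    """Convert scraped price strings such as "€469000" to floats (hypothetical helper).

    Assumes prices are a euro sign followed by plain digits; anything else becomes NaN.
    """
    digits = price.astype(str).str.replace("€", "", regex=False).str.strip()
    numbers = pd.to_numeric(digits, errors="coerce")  # "" and other non-numbers -> NaN
    # Assumption: a price of 0 means "not scraped", not a free property.
    return numbers.mask(numbers == 0)


# Values copied from the head() output above
print(parse_price(pd.Series(["€", "€469000", "€1395000", "€285000"])).tolist())
```

Anything that is not a clean number, including the bare `€`, becomes `NaN`, which pandas skips in summaries such as `.describe()`. Treating a price of 0 as missing is a further assumption.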

## Export Data

In [38]:
```python
"""Run this code block to choose which dataframe the export cells below write to a file."""
data_to_save = properties_raw
```

In [39]:
```python
"""This code block OVERWRITES THE EXISTING FILE in the same folder as this notebook."""
reference = ""  # add/change reference (data source/user); add a leading underscore for readability
output_filename = "saved_data" + reference + ".csv"  # assemble filename
data_to_save.to_csv(output_filename)
```

In [43]:
```python
"""This code block saves the dataframe in a NEW FILE WITH A TIMESTAMP in the datadump/ folder."""
import os

# data_to_save.to_csv("saved_data.csv")  # uncomment to also overwrite saved_data.csv
reference = ""  # add/change reference (data source/user); add a leading underscore for readability
timestamp = time.strftime("%Y%m%d-%H%M%S")  # date and time of creation
output_path = "datadump/"  # leave empty to save the file in the same folder as this notebook
if output_path:
    os.makedirs(output_path, exist_ok=True)  # to_csv does not create missing folders
output_filename = output_path + "saved_data" + reference + "_" + timestamp + ".csv"  # assemble filename
data_to_save.to_csv(output_filename)
```
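The timestamped export can also be wrapped in a reusable function. A sketch under stated assumptions: `save_snapshot` is a hypothetical name, and the `saved_data<reference>_<timestamp>.csv` naming scheme is an assumption rather than a convention of this project:

```python
import os
import time

import pandas as pd


def save_snapshot(df: pd.DataFrame, reference: str = "", output_path: str = "datadump/") -> str:
    """Write df to a timestamped CSV and return its path (hypothetical helper).

    The file name follows the assumed pattern saved_data<reference>_<YYYYmmdd-HHMMSS>.csv.
    """
    if output_path:
        # to_csv does not create missing folders, so create the target folder first.
        os.makedirs(output_path, exist_ok=True)
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    filename = os.path.join(output_path, f"saved_data{reference}_{timestamp}.csv")
    df.to_csv(filename)
    return filename
```

For example, `save_snapshot(data_to_save, reference="_immoweb")` would write a file named like `datadump/saved_data_immoweb_<timestamp>.csv` and return that path. Returning the path makes it easy to log or reload exactly the file that was written.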

## Checking Data

In [22]:
```python
print(properties_raw.isna().any())
# properties_raw.isna().sum().plot(kind="bar")
```

```
Price                         False
Address                       False
Bedrooms                      False
Energy class                  False
Primary energy consumption    False
Furnished                     False
Terrace                       False
Terrace surface               False
Surface of the plot           False
Living room surface           False
Number of frontages           False
Construction year             False
Building condition            False
Outdoor parking space         False
Bathrooms                     False
Shower rooms                  False
Office                        False
Toilets                       False
Kitchen type                  False
Heating type                  False
immo_code                     False
postal code                   False
dtype: bool
```
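`isna()` reports no missing values, but the outputs in this notebook show that gaps are encoded as placeholder values instead: a bare `€` price, `0` in text columns such as `Furnished` and `Building condition`, and `Not specified` in `Energy class`. A sketch that counts such tokens per column; the token list is an assumption drawn from those outputs, and `0` will also count legitimate zeros (for example a `Terrace surface` of 0), so treat the result as an upper bound. `count_placeholders` is a hypothetical helper:

```python
import pandas as pd

# Assumed placeholder tokens, taken from values visible in this notebook's outputs.
PLACEHOLDERS = ["0", "€", "Not specified"]


def count_placeholders(df: pd.DataFrame, placeholders=PLACEHOLDERS) -> pd.Series:
    """Count, per column, the cells whose text form is one of the placeholder tokens."""
    as_text = df.astype(str).apply(lambda col: col.str.strip())
    return as_text.isin(placeholders).sum().sort_values(ascending=False)


# Made-up toy rows modelled on the head() output (subset of columns)
toy = pd.DataFrame({
    "Price": ["€", "€469000", "€285000"],
    "Furnished": ["No", "0", "0"],
    "Energy class": ["C", "Not specified", "D"],
})
print(count_placeholders(toy))
```

On the real data this would be called as `count_placeholders(properties_raw)`; converting to text first lets the same token list match both integer and string zeros.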


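In [23]:
```python
# Remove lines full of 0. Price, Address and Furnished are text (object) columns, so compare the
# string form of each field: comparing text with the integer 0 never matches.
key_fields = ["Price", "Address", "Bedrooms", "Furnished"]
all_zero = (properties_raw[key_fields].astype(str) == "0").all(axis=1)
properties_raw.drop(properties_raw[all_zero].index, inplace=True)
```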


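Comparing the text columns `Price`, `Address` and `Furnished` with the integer `0` never matches, because the string `"0"` is not equal to the integer `0`. `read_csv` can also leave one object column holding both `0` and `"0"` when it parses a large file in chunks, which is why comparing the string form is the safer test. A toy illustration with made-up data:

```python
import pandas as pd

# Made-up frame: "Furnished" is a text (object) column, as in the scraped file.
toy = pd.DataFrame({
    "Bedrooms": [0, 3],
    "Furnished": pd.Series(["0", "No"], dtype=object),
})
print(toy["Furnished"].dtype)

# Comparing text with the integer 0 matches nothing.
print((toy["Furnished"] == 0).tolist())
# Comparing the string form finds the placeholder.
print((toy["Furnished"].astype(str) == "0").tolist())

# The string comparison also copes with a column that mixes 0 and "0".
mixed = pd.Series([0, "0", "No"], dtype=object)
print((mixed.astype(str) == "0").tolist())
```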

In [25]:
```python
display(properties_raw)
```

| | Price | Address | Bedrooms | Energy class | Primary energy consumption | Furnished | Terrace | Terrace surface | Surface of the plot | Living room surface | ... | Building condition | Outdoor parking space | Bathrooms | Shower rooms | Office | Toilets | Kitchen type | Heating type | immo_code | postal code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | € | Grote Markt 22 | 4 | C | 218 | No | 0 | 30 | 0 | 0 | ... | As new | 0 | 1 | 0 | 0 | 2 | 0 | Gas | 10666429 | 2300 |
| 1 | €469000 | Heidestatiestraat 26 | 3 | D | 390 | No | Yes | 0 | 760 | 34 | ... | 0 | 0 | 1 | 0 | 0 | 2 | Installed | 0 | 10666428 | 2910 |
| 2 | €1395000 | Rue de Wavre, 27 | 5 | C | 178 | 0 | 0 | 60 | 6468 | 50 | ... | Good | 0 | 2 | 3 | Yes | 4 | Hyper equipped | Fuel oil | 10666426 | 1301 |
| 3 | €285000 | Avenue de Longwy 340 | 2 | D | 299 | 0 | 0 | 0 | 0 | 0 | ... | Good | 0 | 2 | 0 | 0 | 2 | Installed | Electric | 10666424 | 6700 |
| 4 | €285000 | Avenue de Longwy 340 | 2 | D | 299 | 0 | 0 | 0 | 0 | 0 | ... | Good | 0 | 2 | 0 | 0 | 2 | Installed | Electric | 10666423 | 6700 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10133 | €1199000 | Donksesteenweg 212 | 6 | B | 199 | No | Yes | 0 | 2790 | 0 | ... | As new | 0 | 2 | 0 | No | 3 | Installed | Gas | 10642946 | 2970 |
| 10134 | €450000 | Herbert Hooverplein 14 | 3 | B | 120 | No | 0 | 9 | 0 | 0 | ... | Good | 0 | 2 | 0 | 0 | 0 | Hyper equipped | Gas | 3360 | 20 |
| 10135 | €699000 | Avenue Alphonse Allard 288 | 3 | Not specified | Not specified | No | 0 | 0 | 0 | 0 | ... | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1420 | 27 |
| 10136 | €340000 | Markeplaats 9 | 4 | B | 161 | 0 | Yes | 0 | 621 | 0 | ... | 0 | 0 | 0 | 0 | No | 0 | 0 | 0 | 10642937 | 8560 |


In [37]:
```python
# Value counts for Building condition
print(properties_raw[["Building condition"]].value_counts())
```

```
Building condition
0                     3478
Good                  2727
As new                1879
To renovate            885
To be done up          730
Just renovated         388
To restore              51
Name: count, dtype: int64
```
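The most frequent value, `0`, is most likely a placeholder for listings without a stated condition: 3478 of the 10138 rows, roughly a third. A sketch that reports each condition as a share of all rows, with the placeholder shown as missing, assuming `0` and `"0"` are the only forms the placeholder takes; `condition_shares` is a hypothetical helper:

```python
import pandas as pd


def condition_shares(condition: pd.Series) -> pd.Series:
    """Share of each building condition, with the placeholder 0 shown as missing (NaN).

    Hypothetical helper; assumes 0 and "0" are the only forms the placeholder takes.
    """
    cleaned = condition.mask(condition.astype(str) == "0")
    # dropna=False keeps the missing share visible instead of silently dropping it.
    return cleaned.value_counts(normalize=True, dropna=False)
```

On the real data this would be `condition_shares(properties_raw["Building condition"])`. Keeping the missing share in the output avoids overstating how common each known condition is.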


In [None]:
```python
# Value counts for Energy class + Primary energy consumption
print(properties_raw[["Energy class", "Primary energy consumption"]].value_counts())
```
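Counting pairs of `Energy class` and `Primary energy consumption` yields one row per distinct combination, which is hard to read when consumption is a free-form number. One alternative, sketched here under the assumption that non-numeric entries such as `Not specified` and a value of 0 both mean "unknown", is to summarise consumption per energy class; `consumption_by_energy_class` is a hypothetical helper:

```python
import pandas as pd


def consumption_by_energy_class(df: pd.DataFrame) -> pd.DataFrame:
    """Count, median, min and max of primary energy consumption per energy class.

    Hypothetical helper. Assumes "Not specified" (or any other non-number) and 0 mean "unknown".
    """
    consumption = pd.to_numeric(df["Primary energy consumption"], errors="coerce")
    consumption = consumption.mask(consumption == 0)  # assumed placeholder
    return consumption.groupby(df["Energy class"]).agg(["count", "median", "min", "max"])
```

Calling `consumption_by_energy_class(properties_raw)` would give one row per energy class; the `count` column shows how many listings actually report a number for that class.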

### .info() & .describe()

In [35]:
```python
properties_raw.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10138 entries, 0 to 10137
Data columns (total 22 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   Price                       10138 non-null  object
 1   Address                     10138 non-null  object
 2   Bedrooms                    10138 non-null  int64 
 3   Energy class                10138 non-null  object
 4   Primary energy consumption  10138 non-null  object
 5   Furnished                   10138 non-null  object
 6   Terrace                     10138 non-null  object
 7   Terrace surface             10138 non-null  int64 
 8   Surface of the plot         10138 non-null  int64 
 9   Living room surface         10138 non-null  int64 
 10  Number of frontages         10138 non-null  int64 
 11  Construction year           10138 non-null  int64 
 12  Building condition          10138 non-null  object
 13  Outdoor parking space       10138 non-null  in
```

In [36]:
```python
properties_raw.describe()
```

| | Bedrooms | Terrace surface | Surface of the plot | Living room surface | Number of frontages | Construction year | Outdoor parking space | Bathrooms | Shower rooms | Toilets | immo_code | postal code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 | 10138.0 |
| mean | 2.7391 | 8.106924 | 788.0428 | 14.508384 | 1.849872 | 1122.162754 | 0.0 | 1.122213 | 0.308739 | 1.20507 | 9588205.0 | 4829.721641 |
| std | 1.896267 | 21.906764 | 11824.45 | 30.83819 | 1.538096 | 979.419282 | 0.0 | 1.071369 | 1.824922 | 1.306326 | 3181988.0 | 3981.973255 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -1.0 | 0.0 | 0.0 | 2.0 |
| 25% | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 10645680.0 | 1480.0 |
| 50% | 3.0 | 0.0 | 65.0 | 0.0 | 2.0 | 1930.0 | 0.0 | 1.0 | 0.0 | 1.0 | 10653180.0 | 3920.0 |
| 75% | 3.0 | 9.0 | 490.75 | 30.0 | 3.0 | 1985.0 | 0.0 | 1.0 | 0.0 | 2.0 | 10659850.0 | 8430.0 |
| max | 60.0 | 663.0 | 1090481.0 | 2340.0 | 26.0 | 2025.0 | 0.0 | 27.0 | 116.0 | 20.0 | 10667180.0 | 100000.0 |
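Several of these statistics look distorted by placeholder zeros: `Construction year` has a minimum and 25th percentile of 0, and `Living room surface` has a median of 0. A sketch that masks 0 as missing in selected columns before calling `.describe()`; which columns to treat this way is an assumption (a `Surface of the plot` or `Terrace surface` of 0 may be genuine), and `describe_without_placeholders` is a hypothetical name:

```python
import pandas as pd

# Assumption: in these columns a 0 means "not scraped" rather than a real value.
ZERO_AS_MISSING = ["Construction year", "Living room surface"]


def describe_without_placeholders(df: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.DataFrame:
    """Return df.describe() after replacing 0 with NaN in the given columns (hypothetical helper)."""
    cleaned = df.copy()  # leave the caller's dataframe untouched
    for col in columns:
        cleaned[col] = cleaned[col].mask(cleaned[col] == 0)
    return cleaned.describe()
```

With the zeros masked, `count` reports how many listings actually have a value in each column, and the percentiles describe only those listings.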
