# Exploration of all buildings in the Netherlands
## by Yannick Mariman

## Preliminary Wrangling

> This notebook is a preliminary exploration of buildings situated in the Netherlands. The data and outcomes of this notebook should not be shared.

In [8]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from pathlib import Path

%matplotlib inline

> Load in your dataset and describe its properties through the questions below.
Try and motivate your exploration goals through this section.

In [52]:
df_raw = pd.read_csv(
    r'D:\OneDrive - MVGM\WerkbestandenYannick\Werkmap_Python\Projecten\wh\data\batch_huur_20210623\input\factiararems-23-06-21.csv',
    dtype={'SoortGarage': str, 'Land': str, 'HuurprijsConditie': str, 'HuurprijsSpecificatie': str,
           'KadastraalEigendom': str, 'KadastraalOmvang': str, 'MakelaarNaam': str, 'STATUS': str,
           'StatusVerhuurd': str})

df_raw = df_raw[['Bron', 'Bestemming', 'Bouwjaar', 'EnergieLabel', 'GebruiksOppervlakte', 'AanmeldDatum', 'Postcode',
       'TypeWoning', 'Looptijd', 'Ingetrokken', 'TransactieHuurPrijs',
       'OorspronkelijkeHuurPrijs', 'SoortAppartement', 'SoortObject',
       'TransactieDatumOndertekeningAkte',
       'Inhoud', 'OnderhoudsNiveauBinnen', 'OnderhoudsNiveauBuiten',
       'BuurtNaam', 'WijkNaam','GemeenteNaam']]
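
The cell above parses the whole file and only then selects the columns of interest. As a possible alternative, `pd.read_csv` accepts a `usecols` argument that restricts parsing to the named columns. The sketch below illustrates this on a tiny in-memory CSV. The sample rows and the `KEEP_COLUMNS` subset are made up for illustration; only the column names come from the notebook.

```python
import io
import pandas as pd

# Hypothetical miniature of the input file; the values are invented.
sample_csv = io.StringIO(
    "Bron,Bouwjaar,EnergieLabel,Postcode,MakelaarNaam\n"
    "TIARA,1997,B,2851CH,Example\n"
    "REMS,,C,6942AJ,Example\n"
)

# Illustrative subset of the columns selected in the notebook.
KEEP_COLUMNS = ["Bron", "Bouwjaar", "EnergieLabel", "Postcode"]

# usecols limits parsing to the listed columns; dtype pins Postcode to text.
df_small = pd.read_csv(sample_csv, usecols=KEEP_COLUMNS, dtype={"Postcode": str})

print(df_small.columns.tolist())
print(df_small.dtypes)
```

On a file of this size, skipping unused columns at read time may reduce memory use, although the effect has not been measured here.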

In [53]:
# Work on a deep copy so that df_raw stays untouched
df = df_raw.copy(deep=True)

# Some data wrangling so that the data can be used for visualisation
df.EnergieLabel = df.EnergieLabel.fillna('C')  # fill missing labels with 'C', the median label
# Collapse all A+ variants into a single 'A' label
df.EnergieLabel = df.EnergieLabel.replace({"A+++++":"A", "A++++":"A", "A+++":"A", "A++":"A", "A+":"A"})
df.GebruiksOppervlakte = df.GebruiksOppervlakte.round(0)
df.Inhoud = df.Inhoud.round(0)
df.AanmeldDatum = pd.to_datetime(df.AanmeldDatum)
df.TransactieDatumOndertekeningAkte = pd.to_datetime(df.TransactieDatumOndertekeningAkte)
df.TypeWoning = df.TypeWoning.where(df.TypeWoning.notna(), df.SoortAppartement)
df.drop(columns=['SoortAppartement'], inplace=True)

# Ordinal categorical variables, ordered from worst to best
ordinal_var_dict = {'EnergieLabel': ['G', 'F', 'E', 'D', 'C', 'B', 'A']}

for var in ordinal_var_dict:
    ordered_var = pd.api.types.CategoricalDtype(ordered=True,
                                                categories=ordinal_var_dict[var])
    df[var] = df[var].astype(ordered_var)
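
Casting `EnergieLabel` to an ordered categorical means the labels sort and compare in the declared order rather than alphabetically. A minimal sketch of that behaviour, using a handful of made-up labels instead of the real data:

```python
import pandas as pd

# Same category order as in the notebook: G is the worst label, A the best.
label_order = ["G", "F", "E", "D", "C", "B", "A"]
energy_dtype = pd.api.types.CategoricalDtype(categories=label_order, ordered=True)

# Made-up sample labels for illustration.
labels = pd.Series(["B", "G", "A", "D"]).astype(energy_dtype)

# Sorting follows the declared order, not the alphabet.
sorted_labels = labels.sort_values().tolist()
print(sorted_labels)

# Comparisons against a category work because the dtype is ordered.
at_least_c = (labels >= "C").tolist()
print(at_least_c)

# min and max are defined for ordered categoricals as well.
print(labels.min(), labels.max())
```

This ordering is also what plotting libraries typically use for the axis order of a categorical variable.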

In [55]:
df.OnderhoudsNiveauBinnen.value_counts()

goed                   317305
uitstekend             118123
Goed                    64996
Uitstekend              37097
goed tot uitstekend     32642
Goed tot uitstekend     21096
Redelijk                15599
redelijk                13759
redelijk tot goed       10087
Slecht                   4309
Redelijk tot goed        3360
matig                     668
matig tot redelijk        583
Matig tot redelijk        106
Matig                      74
slecht                     61
slecht tot matig           32
Slecht tot matig            4
Name: OnderhoudsNiveauBinnen, dtype: int64
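
The counts above list most maintenance levels twice, once capitalised and once in lower case (for example `goed` and `Goed`). One possible clean-up, assuming both spellings mean the same thing, is to trim and lower-case the values before counting. The sketch below uses a small made-up Series. The ranking in `niveau_order` is an assumption about how the nine levels should be ordered and is not stated in the data.

```python
import pandas as pd

# Toy sample mimicking the mixed capitalisation seen in OnderhoudsNiveauBinnen.
niveau = pd.Series(["goed", "Goed", "Uitstekend", "uitstekend", "Redelijk tot goed", None])

# Trim whitespace and lower-case; missing values stay missing.
niveau_clean = niveau.str.strip().str.lower()
print(niveau_clean.value_counts())

# Assumed ranking from worst to best (an assumption, not taken from the source data).
niveau_order = ["slecht", "slecht tot matig", "matig", "matig tot redelijk", "redelijk",
                "redelijk tot goed", "goed", "goed tot uitstekend", "uitstekend"]
niveau_dtype = pd.api.types.CategoricalDtype(categories=niveau_order, ordered=True)
niveau_ordered = niveau_clean.astype(niveau_dtype)
print(niveau_ordered.min(), niveau_ordered.max())
```

The same steps would presumably apply to `OnderhoudsNiveauBuiten`, if it shows the same mixed spelling.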

### What is the structure of your dataset?

In [39]:
print(df.shape)
print(df.dtypes)
display(df)

(737083, 18)
Bron                                        object
Bestemming                                  object
Bouwjaar                                   float64
EnergieLabel                                object
GebruiksOppervlakte                        float64
AanmeldDatum                        datetime64[ns]
Postcode                                    object
TypeWoning                                  object
Looptijd                                     int64
OorspronkelijkeHuurPrijs                   float64
SoortObject                                 object
TransactieDatumOndertekeningAkte    datetime64[ns]
Inhoud                                     float64
OnderhoudsNiveauBinnen                      object
OnderhoudsNiveauBuiten                      object
BuurtNaam                                   object
WijkNaam                                    object
GemeenteNaam                                object
dtype: object


| | Bron | Bestemming | Bouwjaar | EnergieLabel | GebruiksOppervlakte | AanmeldDatum | Postcode | TypeWoning | Looptijd | OorspronkelijkeHuurPrijs | SoortObject | TransactieDatumOndertekeningAkte | Inhoud | OnderhoudsNiveauBinnen | OnderhoudsNiveauBuiten | BuurtNaam | WijkNaam | GemeenteNaam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TIARA | wonen | 1997.0 | B | 150.0 | 2006-01-10 | 2851CH | 2-onder-1-kapwoning | 23 | 1275.00 | Woonhuis | 2006-02-02 | 350.0 | goed | goed | Stein | Wijk 08 Haastrecht | Krimpenerwaard |
| 1 | TIARA | Woonruimte | 1937.0 | C | 130.0 | 2006-01-02 | 2025RB | tussenwoning | 25 | 950.00 | Woonhuis | 2006-01-27 | 375.0 | goed | goed | Rivierenbuurt | Delftwijk | Haarlem |
| 2 | TIARA | woonhuis | 2000.0 | A | 96.0 | 2006-01-02 | 3071AS | portiekflat | 34 | 1600.00 | Appartement | 2006-02-05 | 235.0 | goed | goed | Kop van Zuid - Entrepot | Feijenoord | Rotterdam |
| 3 | TIARA | woonhuis | 1785.0 | C | 60.0 | 2006-01-10 | 2312XV | bovenwoning | 163 | 675.00 | Appartement | 2006-06-22 | 165.0 | goed | goed | Marewijk | Binnenstad-Noord | Leiden |
| 4 | TIARA | Woonruimte | 1937.0 | D | 110.0 | 2006-01-10 | 9722GP | bovenwoning | 133 | 850.00 | Appartement | 2006-05-23 | 300.0 | goed | goed | Helpman | Helpman e.o. | Groningen |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 737078 | REMS | 3-K Flatwoning | | C | 61.0 | 2014-09-01 | 6942AJ | 3-K Flatwoning | 0 | 651.06 | Appartement | 2014-09-01 | | | | Didam-Zuid | Wijk 02 Didam | Montferland |
| 737079 | REMS | 3-K Flatwoning | | C | 61.0 | 2018-05-01 | 6942AJ | 3-K Flatwoning | 199 | 651.06 | Appartement | 2018-11-16 | | | | Didam-Zuid | Wijk 02 Didam | Montferland |
| 737080 | REMS | 3-K Flatwoning | | A | 61.0 | 2010-09-09 | 6942AJ | 3-K Flatwoning | 0 | 575.37 | Appartement | 2010-09-09 | | | | Didam-Zuid | Wijk 02 Didam | Montferland |
| 737081 | REMS | 3-K Flatwoning | 1989.0 | A | 61.0 | 2010-09-09 | 6942AJ | 3-K Flatwoning | 0 | 561.34 | Appartement | 2010-09-09 | | | | Didam-Zuid | Wijk 02 Didam | Montferland |
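

In the rows shown above, the REMS records have empty values for `Inhoud` and both maintenance columns, and mostly for `Bouwjaar`. One way to check whether this holds across the full dataset might be to compute the share of missing values per column and per source. The sketch below uses a made-up four-row frame; on the real data, `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Toy frame reusing the notebook's column names; the values are invented.
toy = pd.DataFrame({
    "Bron": ["TIARA", "TIARA", "REMS", "REMS"],
    "Bouwjaar": [1997.0, 1937.0, np.nan, 1989.0],
    "Inhoud": [350.0, 375.0, np.nan, np.nan],
})

# Share of missing values per column, split by data source.
missing_share = toy.drop(columns="Bron").isna().groupby(toy["Bron"]).mean()
print(missing_share)
```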


### What is/are the main feature(s) of interest in your dataset?

> Your answer here!

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> Your answer here!

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.

> Make sure that, after every plot or related series of plots, you
include a Markdown cell with comments about what you observed and what
you plan on investigating next.
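
As a possible first step for a single numeric variable such as `OorspronkelijkeHuurPrijs`, the sketch below flags unusually high values with the common 1.5 × IQR rule of thumb and builds logarithmically spaced bin edges for a histogram. The rent values are illustrative, the 25000 entry is an invented outlier, and whether the real rents are skewed enough to justify log bins is an assumption that would need checking.

```python
import numpy as np
import pandas as pd

# Illustrative monthly rents, including one invented extreme value.
rent = pd.Series([575.37, 651.06, 675.0, 850.0, 950.0, 1275.0, 1600.0, 25000.0],
                 name="OorspronkelijkeHuurPrijs")

# Quartiles and the 1.5 * IQR upper fence for flagging high outliers.
q1, q3 = rent.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = rent[rent > upper_fence]
print(outliers)

# Log-spaced bin edges between the smallest and largest value.
# The same edges could be passed to plt.hist(rent, bins=edges) for plotting.
edges = np.geomspace(rent.min(), rent.max(), num=6)
counts, _ = np.histogram(rent, bins=edges)
print(counts)
```

Whether a flagged value is an error or a genuine luxury rental cannot be decided from the number alone; that would need a closer look at the rows in question.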

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).
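
One way to start on a pair of variables, for example rent against energy label, is a grouped summary that can later be turned into a box or violin plot. The sketch below is only an illustration on invented rows and says nothing about the actual relationship in the data.

```python
import pandas as pd

# Invented rows; column names follow the notebook.
toy = pd.DataFrame({
    "EnergieLabel": ["A", "A", "C", "C", "D"],
    "OorspronkelijkeHuurPrijs": [1600.0, 575.37, 950.0, 675.0, 850.0],
})

# Use the same ordered categories as the notebook so groups sort from G to A.
label_order = ["G", "F", "E", "D", "C", "B", "A"]
toy["EnergieLabel"] = pd.Categorical(toy["EnergieLabel"],
                                     categories=label_order, ordered=True)

# observed=True keeps only the labels that actually occur in the data.
summary = (toy.groupby("EnergieLabel", observed=True)["OorspronkelijkeHuurPrijs"]
              .agg(["count", "median"]))
print(summary)
```

Reporting the count next to the median helps to spot labels with too few observations to compare reliably.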

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.
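
For three variables, one option is to pivot two categorical variables against an aggregated numeric one. The resulting table could then be passed to a heatmap such as `sb.heatmap(pivot, annot=True)`. The sketch below uses invented rows and only shows the reshaping step; the choice of `SoortObject`, `EnergieLabel` and median rent is an assumption about what might be worth comparing.

```python
import pandas as pd

# Invented rows; column names follow the notebook.
toy = pd.DataFrame({
    "SoortObject": ["Woonhuis", "Woonhuis", "Appartement", "Appartement", "Appartement"],
    "EnergieLabel": ["B", "C", "A", "C", "A"],
    "OorspronkelijkeHuurPrijs": [1275.0, 950.0, 1600.0, 675.0, 575.37],
})

# Median rent per object type (rows) and energy label (columns).
# Combinations without any observation show up as NaN.
pivot = toy.pivot_table(index="SoortObject", columns="EnergieLabel",
                        values="OorspronkelijkeHuurPrijs", aggfunc="median")
print(pivot)
```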

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

> At the end of your report, make sure that you export the notebook as an
html file from the `File > Download as... > HTML` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!