### 1. Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

### 2. Loading the Data
Load the data set `data/aarhusbolig_2023-04-19.csv` into a dataframe.
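
As a minimal sketch of this step, `pd.read_csv` reads a CSV source into a dataframe. The in-memory sample below and its column names (`listing_type`, `price`, `rooms`) are assumptions made only so the example runs on its own; the real file may look different.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; the real column names may differ.
sample_csv = io.StringIO(
    "company,listing_type,price,rooms\n"
    "Company A,apartment,5200,2\n"
    "Company B,room,3100,1\n"
)

# pd.read_csv accepts a file path or a file-like object.
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)  # (number of rows, number of columns)
```

In the notebook itself, you would pass the path instead of the buffer, for example `df = pd.read_csv("data/aarhusbolig_2023-04-19.csv")`.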

### 3. Analysing the Data
Show the dataframe to ensure that the data looks correct.

Show the details of the dataframe. You can use the `show_details(dataframe)` function defined below.<br>
Once that block has run, you can call `show_details(dataframe)` at any point in this notebook to see what your dataframe looks like.<br>
You can also use other methods that you know or find to get more details about the dataframe.

In [None]:
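# Helper for agg(): counts the missing values in one column.
def isNull(x):
    return x.isna().sum()

# Summarises every column: data type, non-null count, distinct values, null count.
def show_details(dataframe):
    return dataframe.agg(['dtype', 'count', 'nunique', isNull])

# Requires `df` to be loaded in step 2.
show_details(df)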
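
If you prefer to build the same kind of overview from explicit dataframe attributes, the following sketch is one possible alternative. The name `show_details_alt` and the sample data are hypothetical and not part of the notebook.

```python
import pandas as pd

# Hypothetical sample data with one missing price.
sample_df = pd.DataFrame({
    "company": ["Company A", "Company B", "Company B"],
    "price": [5200.0, None, 3100.0],
})

def show_details_alt(dataframe):
    # One entry per statistic, each computed for every column.
    summary = pd.DataFrame({
        "dtype": dataframe.dtypes,
        "count": dataframe.count(),      # non-null values
        "nunique": dataframe.nunique(),  # distinct non-null values
        "isNull": dataframe.isna().sum(),
    })
    # Transpose so that statistics are rows and the original columns stay columns.
    return summary.T

details = show_details_alt(sample_df)
print(details)
```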

### 4. Cleaning the Data
Clean the data using the following steps (not all of them may be needed):
- `.drop(columns, axis=1)` Remove __columns__ that you think are irrelevant.
- `.dropna(subset=columns)` Remove __rows__ with __null values__ in those columns if you think that the row NEEDS a value there.
- `.fillna(dictionary)` Fill null values with __default values__ if you think a default value makes sense.
- `.astype(type)` Convert data into a __different type__ (e.g. string to int) so that you can work with properly typed values. This should be your last step!
- `...` Apply other cleaning steps if you feel they are needed, e.g. cleaning specific strings, rounding values, etc.

Save the cleaned dataframe as a new file, `data/aarhusbolig_cleaned.csv`, so you don't have to repeat the whole cleaning process.<br>
> Make sure to set `index=False` when calling `.to_csv(...)` to avoid having index numbers as a new column.
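
The cleaning steps listed above can be chained as in the following sketch. The sample data, the column names, and the chosen default values are assumptions for illustration; which columns to drop or fill in the real dataset is your own decision.

```python
import io
import pandas as pd

# Hypothetical raw data: everything is read as strings, with some gaps.
raw = pd.DataFrame({
    "company": ["Company A", "Company B", None, "Company C"],
    "price": ["5200", "3100", "4000", None],
    "rooms": ["2", None, "3", "1"],
    "internal_note": ["x", "y", "z", "w"],
})

cleaned = (
    raw.drop(["internal_note"], axis=1)                # remove an irrelevant column
       .dropna(subset=["price"])                       # a listing NEEDS a price
       .fillna({"company": "unknown", "rooms": "1"})   # assumed default values
       .astype({"price": int, "rooms": int})           # convert types as the last step
)

# Written to an in-memory buffer here; in the notebook you would pass
# the path "data/aarhusbolig_cleaned.csv" instead.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Converting types last matters here: `astype(int)` would fail on the price and room columns while they still contain null values.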

### 5. Exploratory Data Analysis (EDA)
Get insight on each column depending on its type:
- For strings: What are the different values? Is each row unique? Are they nominal or ordinal values?
- For numbers: Are the values discrete or continuous? What are the ranges (min, max, mean)?
- For dates/times: What are the ranges (lowest, highest)? How detailed are they?
- Can you see an index column, i.e. a column which identifies each row uniquely? It does not need to be called *id*.
> __Notice:__ Since you saved the cleaned dataset as a new file, you don't need to run steps 2-4 again to clean it every time. <br>
> You can just import the libraries, load the cleaned dataset, and run your analysis and queries on that.
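
The questions above map onto a handful of pandas calls, sketched here on hypothetical sample data. All column names (`listing_id`, `listing_type`, `price`, `available_from`) are assumptions.

```python
import pandas as pd

# Hypothetical cleaned data.
sample_df = pd.DataFrame({
    "listing_id": [101, 102, 103, 104],
    "listing_type": ["apartment", "room", "apartment", "house"],
    "price": [5200, 3100, 6400, 9800],
    "available_from": pd.to_datetime(
        ["2023-05-01", "2023-06-15", "2023-05-01", "2023-08-01"]
    ),
})

# Strings: which values exist, and how often?
type_counts = sample_df["listing_type"].value_counts()

# Numbers: the range and the mean.
price_stats = sample_df["price"].agg(["min", "max", "mean"])

# Dates: the earliest and the latest value.
date_range = (sample_df["available_from"].min(), sample_df["available_from"].max())

# Index candidates: columns in which every value is unique.
id_candidates = [col for col in sample_df.columns if sample_df[col].is_unique]

print(type_counts, price_stats, date_range, id_candidates, sep="\n\n")
```

Note that `price` also shows up as a candidate in this small sample. Uniqueness is necessary for an index column but does not prove that a column was meant as an identifier.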

### 6. Visualisation
To create visuals, write further queries and plot their results.<br>
Plot information that is relevant for the housing market. Here are some ideas for plots:
- How many listings per listing type exist?
- What does the price distribution for all listings look like?
- What does the price distribution per listing type look like?
- What does the price distribution by room numbers look like?
- What does the price distribution by room numbers for only apartments look like?
- What is the price distribution per company?
- Hard question: Where are the apartments located? Show a map.

Choose the ones that interest you the most and plot them with an appropriate chart type (bar? horizontal bar? box-plot? etc.)

In [None]:
# This line applies a specific visual style to the plots. For a full list, inspect `plt.style.available` (a list, not a function).
plt.style.use("seaborn-v0_8")

In [None]:
# With this, you can annotate the bars in a bar plot with their values.
# It needs the Axes object returned by the plotting call, so save that in a variable (called `plot` below).
def show_bar_values(plot):
    for container in plot.containers:
        plot.bar_label(container, padding=5)

In [None]:
# This is an example for one query and plot. It shows how many listings each company has.
plot = df.groupby("company") \
    .size() \
    .sort_values() \
    .plot.barh(title="Listings per company")

show_bar_values(plot)
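
As a second illustration of matching the chart type to the question, a histogram suits the distribution of a single numeric column. This is only a sketch on hypothetical sample data; the column name `price` and the number of bins are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical prices; replace with the price column of your own dataframe.
sample_df = pd.DataFrame({"price": [3100, 4200, 5200, 5400, 6400, 9800]})

# Create the figure explicitly and let pandas draw onto its Axes.
fig, ax = plt.subplots()
sample_df["price"].plot.hist(bins=5, ax=ax, title="Price distribution (sample data)")
ax.set_xlabel("price")
```

For a distribution split by a category, such as price per listing type, a box plot per group is usually easier to compare than several overlapping histograms.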

In [None]:
# How many listings per listing type exist?

In [None]:
# What does the price distribution for all listings look like?

In [None]:
# What does the price distribution per listing type look like?

In [None]:
# What does the price distribution by room numbers look like?

In [None]:
# What does the price distribution by room numbers for only apartments look like?

In [None]:
# What is the price distribution per company?

In [None]:
# Hard question: Where are the apartments located? Show a map.

### 7. Interpretation
With the results from your EDA and your plots, you can now interpret the data and the visuals and write your report. You can save the plots by simply dragging them into your folder. The report is usually not written in this notebook; instead, you would use a Word document, Google Docs, LaTeX, or similar. For this homework, use the box below to write down what you see and what it means for a company, or for someone who is looking for a place to live in Aarhus.
> Avoid "pretty tools" like Canva! These will make you waste time on the looks and end up looking unprofessional if you are not skilled yet.
> Most companies have their own templates, which are usually Word documents.