# Analyzing Data with Built-in Features of Python

In this small project, you use standard Python features to analyze the Miete data set. The data set contains the data of 1082 households interviewed for the Munich rent standard. A description of the attributes (column titles) is available in [RDocumentation](https://www.rdocumentation.org/packages/kknn/versions/1.3.1/topics/miete). Note that our data set contains only 16 attributes, not 18.

In [None]:
# Format
A data frame with 1082 observations on the following 18 variables.

- `nm`: Net rent in DM.
- `wfl`: Floor space in sqm.
- `bj`: Year of construction.
- `bad0`: Bathroom in apartment? 1: no, 0: yes.
- `zh`: Central heating? 1: yes, 0: no.
- `ww0`: Hot water supply? 1: no, 0: yes.
- `badkach`: Tiled bathroom? 1: yes, 0: no.
- `fenster`: Window type. 1: plain windows, 0: state-of-the-art windows.
- `kueche`: Kitchen type. 1: well equipped kitchen, 0: plain kitchen.
- `mvdauer`: Lease duration in years.
- `bjkat`: Age category of the building (`bj` categorized). 1: built before 1919, 2: built between 1919 and 1948, 3: built between 1949 and 1965, 4: built between 1966 and 1977, 5: built between 1978 and 1983, 6: built after 1983.
- `wflkat`: Floor space category (`wfl` categorized). 1: less than 50 sqm, 2: between 51 sqm and 80 sqm, 3: at least 81 sqm.
- `nmqm`: Net rent per sqm.
- `rooms`: Number of rooms in household.
- `nmkat`: Net rent category (`nm` categorized). 1: less than 500 DM, 2: between 500 DM and 675 DM, 3: between 675 DM and 850 DM, 4: between 850 DM and 1150 DM, 5: at least 1150 DM.
- `adr`: Address type. 1: bad, 2: average, 3: good.
- `wohn`: Residential type. 1: bad, 2: average, 3: good.

## Initialization

Run the next code cell first, which loads the data set from file `Miete.csv` and converts the data into a nested list `data`. The .csv file and this notebook file should be in the same folder. 

As `data` is a nested list, you can apply list indexing, slicing, comprehensions and enumeration. 

- List indexing is `0`-based.
- `data[0]` returns the attribute names. 
- `data[i]` returns the data of the `i-th` house. For instance, `data[1000]` returns the attributes of the 1000-th house. 
- `data[i][j]` returns the `(j+1)-th` attribute of the `i-th` house. For instance, `data[80][2]` returns the year in which the 80-th house was built.
- `for v in data[1:]` iterates over all houses in sequence. For instance, `[v[0] for v in data[1:] if v[3] == 0]` returns a list of the rents of the houses with a bath (`Bath` is coded `0` for yes and `1` for no).
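The access patterns listed above can be tried on a small stand-in list before running the real cell. The following is only an illustrative sketch: `sample` is a hypothetical miniature of `data` with four columns and made-up rows, and the variable names are not part of the notebook.

```python
# Hypothetical miniature of `data`: a header row followed by three observations.
sample = [
    ["Rent", "FloorSpace", "YearBuilt", "Bath"],
    [693.29, 50, 1971, 0],
    [736.6, 70, 1971, 0],
    [1295.14, 55, 1893, 0],
]

header = sample[0]            # attribute names
first_house = sample[1]       # all attributes of the first house
year_of_second = sample[2][2] # third attribute (YearBuilt) of the second house

# Slicing with [1:] skips the header row.
rents = [v[0] for v in sample[1:]]

# A comprehension with a condition keeps only the matching houses.
old_rents = [v[0] for v in sample[1:] if v[2] < 1900]

# enumerate pairs each house with its 1-based position in the data set.
for i, v in enumerate(sample[1:], start=1):
    print(i, v[0])
```

Starting `enumerate` at 1 keeps the printed position aligned with the index used in `sample[i]`.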

In [2]:
import pandas as pd

# Load the CSV file and convert it into a nested list:
# the header row first, then one list per household
df = pd.read_csv('Miete.csv')
data = [df.columns.values.tolist()] + df.values.tolist()
# Data preprocessing: truncate Rent (column 0) to two decimals and
# NetRentPerSqm (column 12) to three decimals; cast all other columns to int
for i in range(1, len(data)):
    for j in range(len(data[i])):
        if j == 0:
            data[i][j] = int(data[i][j] * 100) / 100
        elif j == 12:
            data[i][j] = int(data[i][j] * 1000) / 1000
        else:
            data[i][j] = int(data[i][j])
# Output the column titles and the first four observations
for i in range(5):
    print(data[i])

['Rent', 'FloorSpace', 'YearBuilt', 'Bath', 'CentralHeating', 'HotWater', 'TiledBath', 'WindowType', 'KitchenType', 'LeaseDuration', 'AgeCategory', 'FloorSpaceCategory', 'NetRentPerSqm', 'Rooms', 'AddressType', 'ResidenceType']
[693.29, 50, 1971, 0, 1, 0, 0, 0, 0, 2, 4, 1, 13.865, 1, 2, 2]
[736.6, 70, 1971, 0, 1, 0, 0, 0, 0, 26, 4, 2, 10.522, 3, 2, 2]
[732.23, 50, 1971, 0, 1, 0, 0, 0, 0, 1, 4, 1, 14.644, 1, 2, 2]
[1295.14, 55, 1893, 0, 1, 0, 0, 0, 0, 0, 1, 2, 23.548, 3, 2, 2]


## Data Retrieval

Use the output of a Python snippet to answer each question. For instance, the output of the snippet `data[1][9]` answers the question "How long was the lease duration of the first house?" Write your code under the `pass` statement in a TODO code cell.

### Question 1

In which year was the first house in the data set built?

In [5]:
### TODO 1 ####
pass
data[1][2]

1971

### Question 2

How much was the rent of the tenth house in the data set?

In [3]:
### TODO 2 ####
pass
data[10][0]

657.44

### Question 3

How many rooms were there in the last house in the data set?

In [4]:
### TODO 3 ####
pass
data[-1][-3]

2

### Question 4

How much was the rent of each house built before 1880? Return a list of the rents. Sample output: `[644.4, 441.52, 837.9, ...]`. Hint: Use list comprehensions. Note that you should begin with `data[1]`, not `data[0]` as `data[0]` returns attribute names, not attribute values. 

In [7]:
#### TODO 4 ####
pass
[v[0] for v in data[1:] if v[2] < 1880]

[644.4, 441.52, 837.9, 1091.09, 808.25]

## Data Aggregation

Use the output of a Python snippet to answer each question. Write your code under the `pass` statement in a TODO code cell. You may call the Python built-in functions `sum`, `len`, `min` and `max`. For instance, `max([v[0] for v in data[1:] if v[3]])` returns the highest rent of the houses without a bathroom (`Bath` is coded `1` for no), while `sum([v[0] for v in data[1:]]) / len([v[0] for v in data[1:]])` returns the average rent. Do **not** use NumPy or pandas methods.
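As a minimal sketch of these aggregations, the comprehension can be stored in a variable once and reused, which avoids writing it twice for the average. The list `sample` and all variable names below are hypothetical, and the values are made up for illustration.

```python
# Hypothetical miniature of `data`: columns are Rent, YearBuilt, Bath (0 = yes, 1 = no).
sample = [
    ["Rent", "YearBuilt", "Bath"],
    [600.0, 1971, 0],
    [450.0, 1950, 1],
    [900.0, 1985, 0],
]

# Collect the rents once, then aggregate with built-in functions only.
rents = [v[0] for v in sample[1:]]
average = sum(rents) / len(rents)
lowest = min(rents)
highest = max(rents)

# The same pattern with a filter: rents of houses with a bath (Bath == 0).
rents_with_bath = [v[0] for v in sample[1:] if v[2] == 0]
average_with_bath = sum(rents_with_bath) / len(rents_with_bath)

print(average, lowest, highest, average_with_bath)
```

If a filter can match no house at all, `len` returns 0 and the division fails, so it is worth checking that the filtered list is non-empty first.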

### Question 5

What was the lowest rent of all houses in the data set?

In [9]:
#### TODO 5 ####
pass
min(v[0] for v in data[1:])

127.06

### Question 6

What was the standard deviation of the rents?

In [13]:
#### TODO 6 ####
pass
mean = sum(v[0] for v in data[1:])/len(data[1:])
var = sum(((v[0]- mean)**2) for v in data[1:])/len(data[1:])
var**(.5)

408.1449040452738
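The cell above divides the sum of squared deviations by the number of observations, which gives the population standard deviation. As a hedged cross-check on a toy list (the values are made up and `values` is a hypothetical name), the same computation can be compared with `statistics.pstdev` from the standard library. The sample standard deviation, `statistics.stdev`, divides by `n - 1` instead and is therefore slightly larger.

```python
import statistics

# Made-up values, used only to illustrate the computation.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Manual population standard deviation with built-ins only.
n = len(values)
mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / n
manual_std = variance ** 0.5

# Standard-library equivalents for comparison.
population_std = statistics.pstdev(values)  # divides by n
sample_std = statistics.stdev(values)       # divides by n - 1

print(manual_std, population_std, sample_std)
```

Which of the two definitions is appropriate depends on whether the 1082 households are treated as the whole population or as a sample from it.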

### Question 7

Variable `AgeCategory` has six levels:  1 - built before 1919  2 - built between 1919 and 1948  3 - built between 1949 and 1965  4 - built between 1966 and 1977 5 - built between 1978 and 1983  6 - built after 1983. What was the average rent of the houses built between 1966 and 1977? 

In [29]:
#### TODO 7 ####
pass
rents = [v[0] for v in data[1:] if v[10] == 4]
sum(rents) / len(rents)

829.791902654867

### Question 8

Variable `Rooms` shows the number of rooms in a house. Variable `KitchenType` has two levels: 1 - well equipped kitchen  0 - plain kitchen. What was the highest rent of a house with 3 rooms and a well-equipped kitchen?

In [30]:
#### TODO 8 ####
pass
max([v[0] for v in data[1:] if v[8]==1 and v[13]==3])

2603.63

### Question 9

Variable `CentralHeating` has two levels: 1 - yes   0 - no. How many houses had central heating?

In [36]:
len([v[4] for v in data[1:] if v[4]==1])

880

### Question 10

In which year was the house with the highest rent constructed?

In [7]:
#### TODO 10 ####
pass
# max() with a key returns the whole row of the highest-rent house;
# this avoids comparing floats with `is`, which tests object identity, not equality
most_expensive = max(data[1:], key=lambda v: v[0])
print(most_expensive[2])

1933


### Question 11

Derive the frequency distribution of the houses in terms of Rooms. Hint: apply the `list.count` method. Sample output:

|\# Rooms   |Frequency |
|:---------:|:---------:|
|1       |137    |
|2       |374    |
|...     |...    |
|8       |0      |
|9       |1      |

In [50]:
#### TODO 11 ####
pass

# Build the list of room counts once, then count each value
rooms = [v[13] for v in data[1:]]
for i in range(1, 10):
    print(i, "rooms   " + str(rooms.count(i)))

1 rooms   137
2 rooms   374
3 rooms   384
4 rooms   136
5 rooms   40
6 rooms   9
7 rooms   1
8 rooms   0
9 rooms   1
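The `list.count` approach can also collect the frequencies in a dictionary, so that they are available for later lookups instead of only being printed. The following sketch uses a made-up list of room counts; `rooms_sample` and `freq` are hypothetical names.

```python
# Made-up room counts for six houses.
rooms_sample = [1, 3, 1, 3, 2, 3]

# list.count(x) returns how many times x occurs in the list.
freq = {n: rooms_sample.count(n) for n in range(1, 5)}

for n, count in freq.items():
    print(n, "rooms", count)
```

Iterating over a fixed range rather than over the distinct values keeps levels with a frequency of zero in the result, as in the sample output table.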


### Question 12

Variable `YearBuilt` shows in which year each house was constructed. What was the average rent of the houses constructed in the 1990s?

In [59]:
#### TODO 12 ####
pass
rents_1990s = [v[0] for v in data[1:] if 1990 <= v[2] <= 1999]
sum(rents_1990s) / len(rents_1990s)

1356.7849999999999

## Exploratory Data Analysis

In this section, you develop three questions about the `Miete` data set and then answer these questions with Python code. Write each question in the `Question` Markdown cell, and then write code under the `pass` statement in the corresponding TODO code cell. 

### Question 13

How many houses were built in each year of the 1990s?

In [4]:
#### TODO 13 ####
pass
# Build the list of construction years once, then count each year of the 1990s
years = [v[2] for v in data[1:]]
for year in range(1990, 2000):
    print(year, "houses built:", years.count(year))

1990 houses built: 0
1991 houses built: 9
1992 houses built: 5
1993 houses built: 0
1994 houses built: 0
1995 houses built: 0
1996 houses built: 0
1997 houses built: 0
1998 houses built: 0
1999 houses built: 0


### Question 14

In which year was the house with the longest lease duration built?

In [5]:
#### TODO 14 ####
pass
# max() with a key returns the row of the house with the longest lease duration (column 9)
longest_lease = max(data[1:], key=lambda v: v[9])
print(longest_lease[2])

1893


### Question 15

What was the lowest rent of a house with a bath and central heating?

In [6]:
#### TODO 15 ####
pass
# Bath is coded 0 for yes; CentralHeating is coded 1 for yes
min(v[0] for v in data[1:] if v[3] == 0 and v[4] == 1)

130.69