#### What's Tabular Data?
<img src="tabularDAta.png" width="500"/>

##### Working with Lists

```python
house_0_list = [115910, 128, 4]
house_0_list
```

```
[115910, 128, 4]
```

```python
house_0_price = house_0_list[0]
house_0_area = house_0_list[1]
house_0_price_m2 = house_0_price / house_0_area
house_0_price_m2
```

```
905.546875
```

```python
# Append an item to the end of a list
house_0_list.append(house_0_price_m2)
house_0_list
```

```
[115910, 128, 4, 905.546875]
```

##### Nested List

```python
houses_nested_list = [
    [115910.26, 128.0, 4.0],
    [48718.17, 210.0, 3.0],
    [28977.56, 58.0, 2.0],
    [36932.27, 79.0, 3.0],
    [83903.51, 111.0, 3.0],
]
houses_nested_list
```

```
[[115910.26, 128.0, 4.0],
 [48718.17, 210.0, 3.0],
 [28977.56, 58.0, 2.0],
 [36932.27, 79.0, 3.0],
 [83903.51, 111.0, 3.0]]
```

```python
# Append the price per m² to each observation (modifies the inner lists in place)
for house in houses_nested_list:
    price_m2 = house[0] / house[1]
    house.append(price_m2)
houses_nested_list
```

```
[[115910.26, 128.0, 4.0, 905.54890625],
 [48718.17, 210.0, 3.0, 231.9912857142857],
 [28977.56, 58.0, 2.0, 499.61310344827587],
 [36932.27, 79.0, 3.0, 467.4970886075949],
 [83903.51, 111.0, 3.0, 755.8874774774774]]
```

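The loop above modifies the inner lists in place, so re-running that cell in a notebook appends a second price per m² to every row. A minimal sketch of a non-mutating alternative, assuming you want to keep the original rows untouched, builds new rows with a list comprehension (the names `houses` and `houses_with_price_m2` are illustrative):

```python
houses = [
    [115910.26, 128.0, 4.0],
    [48718.17, 210.0, 3.0],
    [28977.56, 58.0, 2.0],
]

# Build new rows instead of appending to the existing ones:
# `row + [...]` creates a new list, so `houses` is left unchanged.
houses_with_price_m2 = [row + [row[0] / row[1]] for row in houses]

print(houses[0])                # [115910.26, 128.0, 4.0]
print(houses_with_price_m2[0])  # [115910.26, 128.0, 4.0, 905.54890625]
```

Running this cell several times always produces rows of length 4, because nothing is appended to the source data.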
##### Working with Dictionaries

Lists store only values, so it’s hard to know what each number means. For example, `[115910.26, 128.0, 4]` doesn’t tell us which value is the price, which is the area, and which is the number of rooms. A dictionary is clearer because each value is stored under a descriptive key:

```python
house_0 = {"price": 115910.26, "area": 128.0, "rooms": 4}
```

This way, each value is labeled with what it represents.

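To see the difference, here is a minimal sketch that computes the price per m² both ways, using the short `house_0` keys from the example above:

```python
house_0_list = [115910.26, 128.0, 4]
house_0 = {"price": 115910.26, "area": 128.0, "rooms": 4}

# With a list you must remember that position 0 is the price and 1 is the area
price_m2_from_list = house_0_list[0] / house_0_list[1]

# With a dictionary the keys say what each value is
price_m2_from_dict = house_0["price"] / house_0["area"]

print(price_m2_from_list == price_m2_from_dict)  # True
```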

```python
house_0_dict = {
    "price_approx_usd": 115910.26,
    "surface_covered_in_m2": 128,
    "rooms": 4,
}
house_0_dict
```

```
{'price_approx_usd': 115910.26, 'surface_covered_in_m2': 128, 'rooms': 4}
```

```python
# Add a "price_per_m2" key-value pair to `house_0_dict`
house_0_dict["price_per_m2"] = (
    house_0_dict["price_approx_usd"] / house_0_dict["surface_covered_in_m2"]
)
house_0_dict
```

```
{'price_approx_usd': 115910.26,
 'surface_covered_in_m2': 128,
 'rooms': 4,
 'price_per_m2': 905.54890625}
```

```python
# Declare variable `houses_rowwise`: one dictionary per house
houses_rowwise = [
    {
        "price_approx_usd": 115910.26,
        "surface_covered_in_m2": 128,
        "rooms": 4,
    },
    {
        "price_approx_usd": 48718.17,
        "surface_covered_in_m2": 210,
        "rooms": 3,
    },
    {
        "price_approx_usd": 28977.56,
        "surface_covered_in_m2": 58,
        "rooms": 2,
    },
    {
        "price_approx_usd": 36932.27,
        "surface_covered_in_m2": 79,
        "rooms": 3,
    },
    {
        "price_approx_usd": 83903.51,
        "surface_covered_in_m2": 111,
        "rooms": 3,
    },
]

houses_rowwise
```

```
[{'price_approx_usd': 115910.26, 'surface_covered_in_m2': 128, 'rooms': 4},
 {'price_approx_usd': 48718.17, 'surface_covered_in_m2': 210, 'rooms': 3},
 {'price_approx_usd': 28977.56, 'surface_covered_in_m2': 58, 'rooms': 2},
 {'price_approx_usd': 36932.27, 'surface_covered_in_m2': 79, 'rooms': 3},
 {'price_approx_usd': 83903.51, 'surface_covered_in_m2': 111, 'rooms': 3}]
```

```python
# Add a "price_per_m2" key to each house dictionary
for house in houses_rowwise:
    house["price_per_m2"] = house["price_approx_usd"] / house["surface_covered_in_m2"]
houses_rowwise
```

```
[{'price_approx_usd': 115910.26,
  'surface_covered_in_m2': 128,
  'rooms': 4,
  'price_per_m2': 905.54890625},
 {'price_approx_usd': 48718.17,
  'surface_covered_in_m2': 210,
  'rooms': 3,
  'price_per_m2': 231.9912857142857},
 {'price_approx_usd': 28977.56,
  'surface_covered_in_m2': 58,
  'rooms': 2,
  'price_per_m2': 499.61310344827587},
 {'price_approx_usd': 36932.27,
  'surface_covered_in_m2': 79,
  'rooms': 3,
  'price_per_m2': 467.4970886075949},
 {'price_approx_usd': 83903.51,
  'surface_covered_in_m2': 111,
  'rooms': 3,
  'price_per_m2': 755.8874774774774}]
```

```python
# Collect every price into a list, then compute the mean
house_prices = []
for house in houses_rowwise:
    house_prices.append(house["price_approx_usd"])
house_mean_price = sum(house_prices) / len(house_prices)
house_mean_price
```

```
62888.35399999999
```

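The loop-and-append pattern works, but a shorter alternative pulls the prices out with a generator expression inside `sum()`, or passes them to the standard-library `statistics.mean`. This is a minimal sketch that assumes every dictionary has a `"price_approx_usd"` key:

```python
from statistics import mean

houses_rowwise = [
    {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
    {"price_approx_usd": 28977.56, "surface_covered_in_m2": 58, "rooms": 2},
    {"price_approx_usd": 36932.27, "surface_covered_in_m2": 79, "rooms": 3},
    {"price_approx_usd": 83903.51, "surface_covered_in_m2": 111, "rooms": 3},
]

# Generator expression: no intermediate list variable needed
mean_price = sum(h["price_approx_usd"] for h in houses_rowwise) / len(houses_rowwise)

# statistics.mean accepts any iterable of numbers
mean_price_stats = mean(h["price_approx_usd"] for h in houses_rowwise)
```

`statistics.mean` sums with exact arithmetic internally, so its last digits may differ slightly from `sum(...) / len(...)`.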
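Calculations like this become easier if we organize the data by feature (column) instead of by observation (row). We’ll still use dictionaries and lists, just structured differently: a single dictionary whose keys are the feature names and whose values are lists holding that feature for every house.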


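Rather than retyping the values, a row-wise list can be converted into this feature-oriented layout. A minimal sketch with a dict comprehension, assuming every dictionary has the same keys as the first one:

```python
houses_rowwise = [
    {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128.0, "rooms": 4.0},
    {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210.0, "rooms": 3.0},
    {"price_approx_usd": 28977.56, "surface_covered_in_m2": 58.0, "rooms": 2.0},
]

# One key per feature; each value is the list of that feature across all rows.
# Assumes every row has the same keys as the first one.
houses_columnwise = {
    key: [house[key] for house in houses_rowwise]
    for key in houses_rowwise[0]
}

print(houses_columnwise["rooms"])  # [4.0, 3.0, 2.0]
```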
```python
# Declare variable `houses_columnwise`: one list per feature
houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}
houses_columnwise
```

```
{'price_approx_usd': [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
 'surface_covered_in_m2': [128.0, 210.0, 58.0, 79.0, 111.0],
 'rooms': [4.0, 3.0, 2.0, 3.0, 3.0]}
```

```python
# Mean house price: the whole column is already a list
mean_house_price = sum(houses_columnwise["price_approx_usd"]) / len(houses_columnwise["price_approx_usd"])
mean_house_price
```

```
62888.35399999999
```

```python
# Create a "price_per_m2" column in houses_columnwise
price = houses_columnwise["price_approx_usd"]
area = houses_columnwise["surface_covered_in_m2"]

price_per_m2 = []
for p, a in zip(price, area):  # pair each price with its area
    price_per_m2.append(p / a)
houses_columnwise["price_per_m2"] = price_per_m2
houses_columnwise
```

```
{'price_approx_usd': [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
 'surface_covered_in_m2': [128.0, 210.0, 58.0, 79.0, 111.0],
 'rooms': [4.0, 3.0, 2.0, 3.0, 3.0],
 'price_per_m2': [905.54890625,
  231.9912857142857,
  499.61310344827587,
  467.4970886075949,
  755.8874774774774]}
```

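The same column can be built in one expression with a list comprehension over `zip`. This is a minimal sketch equivalent to the loop above:

```python
houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}

# zip pairs the i-th price with the i-th area; the comprehension divides each pair
houses_columnwise["price_per_m2"] = [
    price / area
    for price, area in zip(
        houses_columnwise["price_approx_usd"],
        houses_columnwise["surface_covered_in_m2"],
    )
]
```

`zip` stops at the shortest input, so columns of unequal length would be silently truncated; on Python 3.10+ you can pass `strict=True` to raise an error instead.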
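The two layouts have opposite trade-offs. In the row-wise layout (a list of dictionaries, structured like JSON records), each dictionary is one row, so row-wise tasks such as computing the price per m² are easy, but column-wise tasks such as the mean house price take extra steps: first gather all prices into a list, then calculate the mean. In the column-wise layout the mean is a one-liner, but a row-wise task like the price per m² requires pairing values from separate lists with `zip`.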

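Converting between the two layouts is mechanical. A minimal sketch that rebuilds one dictionary per house from a column-wise dictionary, assuming all columns have the same length:

```python
houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56],
    "surface_covered_in_m2": [128.0, 210.0, 58.0],
    "rooms": [4.0, 3.0, 2.0],
}

columns = list(houses_columnwise)  # feature names, in insertion order

# zip(*values) yields one tuple per house: (price, area, rooms)
houses_rowwise = [
    dict(zip(columns, row_values))
    for row_values in zip(*houses_columnwise.values())
]

print(houses_rowwise[0])
# {'price_approx_usd': 115910.26, 'surface_covered_in_m2': 128.0, 'rooms': 4.0}
```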
##### Tabular Data and pandas DataFrames

```python
import pandas as pd

data = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}
df_houses = pd.DataFrame(data)
df_houses
```

|   | price_approx_usd | surface_covered_in_m2 | rooms |
|---|---|---|---|
| 0 | 115910.26 | 128.0 | 4.0 |
| 1 | 48718.17 | 210.0 | 3.0 |
| 2 | 28977.56 | 58.0 | 2.0 |
| 3 | 36932.27 | 79.0 | 3.0 |
| 4 | 83903.51 | 111.0 | 3.0 |

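With a DataFrame, both row-wise and column-wise tasks become one-liners. A minimal sketch using standard pandas column operations on the same data as above:

```python
import pandas as pd

df_houses = pd.DataFrame(
    {
        "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
        "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
        "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
    }
)

# Column-wise: aggregate an entire column
mean_price = df_houses["price_approx_usd"].mean()

# Row-wise: arithmetic between columns is applied element by element
df_houses["price_per_m2"] = (
    df_houses["price_approx_usd"] / df_houses["surface_covered_in_m2"]
)

# A list of row dictionaries (like houses_rowwise) also works as input
df_from_rows = pd.DataFrame(
    [
        {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128.0, "rooms": 4.0},
        {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210.0, "rooms": 3.0},
    ]
)
```

The constructor accepts either layout, so existing row-wise or column-wise Python data can be loaded directly.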