## Analysis of apartment listings on Bolha


In this document we analyse data on apartment sale listings collected from Bolha. The plan is as follows:
* First, we fix some errors that were introduced when the data was transferred from the website.
* Next, we add a few new columns that will help with the analysis.
* Finally, we carry out the analysis and draw conclusions about the current state of the market.

In [138]:
import pandas as pd
import os.path

In [139]:
%matplotlib

stanovanja = pd.read_csv(os.path.join('podatki','stanovanja_Bolha.csv'))

Using matplotlib backend: TkAgg


Let us display what the table currently looks like.

In [140]:
stanovanja

Unnamed: 0,price,location,size,date
0,,"Postojna, Postojna",Unknown,14.11.2022
1,680.0,"Ljubljana Šiška, Šiška",45,14.11.2022
2,,"Maribor, Tabor",37,14.11.2022
3,600.0,"Primorsko-goranska, Rijeka",60,14.11.2022
4,700.0,"Ljubljana Bežigrad, Bežigrad",Unknown,13.11.2022
...,...,...,...,...
436,0.0,"Maribor, Tabor",60,07.10.2020
437,,"Maribor, Kamnica",40,30.05.2020
438,160.0,"Laško, Laško",60,15.02.2020
439,120.0,"Kočevje, Kočevje",300,30.08.2019
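The preview shows that `size` contains the text `Unknown` and that `date` is stored as a day.month.year string. As a minimal sketch, assuming `size` is measured in square metres, these columns could be converted before any new columns are derived from them. The names `sample`, `size_m2`, `date_parsed` and `price_per_m2` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Toy rows copied from the preview above; this is not the full dataset.
sample = pd.DataFrame({
    "price": [680.0, 600.0, 700.0],
    "size": ["45", "60", "Unknown"],
    "date": ["14.11.2022", "14.11.2022", "13.11.2022"],
})

# "Unknown" cannot be parsed as a number, so errors="coerce" turns it into NaN.
sample["size_m2"] = pd.to_numeric(sample["size"], errors="coerce")

# Dates are written as day.month.year.
sample["date_parsed"] = pd.to_datetime(sample["date"], format="%d.%m.%Y")

# Hypothetical derived column: price per square metre (NaN where size is unknown).
sample["price_per_m2"] = sample["price"] / sample["size_m2"]
```

Rows with an unknown size keep a missing value rather than a made-up number, so they drop out of any later average automatically.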


Let us check the price of the most expensive apartment.

In [141]:
stanovanja[["price"]].max()

price    995.0
dtype: float64

The maximum is rather low: we would expect the most expensive apartment to cost more than 1000 euros. We suspect that apartments costing more than 1000 euros were recorded with a much lower price. We will try to correct this.
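One plausible cause, offered here only as an assumption, is that the website writes large prices with a dot as the thousands separator (for example `1.000`), which a numeric parser reads as a decimal point. The sketch below uses made-up strings in a hypothetical `raw` series to show the effect and one way to avoid it.

```python
import pandas as pd

# Hypothetical raw price strings as they might appear on the website.
raw = pd.Series(["680", "1.000", "2.700", "950"])

# Naive conversion treats the dot as a decimal point.
naive = raw.astype(float)

# Removing the dot first treats it as a thousands separator instead.
cleaned = pd.to_numeric(raw.str.replace(".", "", regex=False))
```

This cleaning only makes sense for a column where a dot never means a decimal point; applying it to a column such as `size`, which contains values like `122.9`, would corrupt it.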

In [142]:
stanovanja = stanovanja.sort_values(by="price")

stanovanja

Unnamed: 0,price,location,size,date
282,0.0,"Izola, Jagodje",Unknown,23.10.2022
341,0.0,"Maribor, Center",Unknown,16.10.2022
245,0.0,"Sevnica, Sevnica",130,26.10.2022
31,0.0,"Lendava, Lendavske Gorice",70,11.11.2022
237,0.0,"Vransko, Vransko",35,27.10.2022
...,...,...,...,...
356,,"Šempeter pri Gorici, Šempeter pri Gorici",25,15.10.2022
371,,"Radovljica, Radovljica",Unknown,27.09.2022
431,,"Maribor, Tabor",58,04.01.2021
433,,"Maribor, Center",1,27.10.2020


In the next step we assign a price of 0 to apartments whose price is undefined.

In [143]:
stanovanja["price"] = stanovanja["price"].fillna(0)

We sort the apartments again.

In [144]:
stanovanja = stanovanja.sort_values(by="price")
stanovanja

Unnamed: 0,price,location,size,date
282,0.0,"Izola, Jagodje",Unknown,23.10.2022
108,0.0,"Žalec, Žalec",26,07.11.2022
109,0.0,"Šoštanj, Šoštanj",55,07.11.2022
111,0.0,"Kranjska Gora, Gozd Martuljek",40,07.11.2022
136,0.0,"Logatec, Grčarevec",80,05.11.2022
...,...,...,...,...
397,950.0,"Ljubljana Center, Center",49,12.09.2022
411,950.0,"Bled, Bohinjska Bela",Unknown,30.07.2022
9,980.0,"Ljubljana Center, Stara Ljubljana",25,13.11.2022
97,990.0,"Maribor, Tabor",114,08.11.2022


Now we create a separate table that contains only the apartments with a non-zero price.

In [145]:
st = stanovanja[stanovanja.price > 0]
st

Unnamed: 0,price,location,size,date
256,1.0,"Maribor, Center",122.9,25.10.2022
358,1.0,"Primorsko-goranska, Rijeka",94,12.10.2022
377,1.0,"Primorsko-goranska, Rijeka",90,23.09.2022
348,1.0,"Izola, Jagodje",80,16.10.2022
379,1.0,"Primorsko-goranska, Kostrena",70,22.09.2022
...,...,...,...,...
397,950.0,"Ljubljana Center, Center",49,12.09.2022
411,950.0,"Bled, Bohinjska Bela",Unknown,30.07.2022
9,980.0,"Ljubljana Center, Stara Ljubljana",25,13.11.2022
97,990.0,"Maribor, Tabor",114,08.11.2022
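A side note on the filter: a comparison with a missing value evaluates to `False`, so `price > 0` would exclude rows without a price even if they had not been filled with 0 first. A minimal sketch on a hypothetical toy table `prices`:

```python
import numpy as np
import pandas as pd

# Toy data: one missing price, one zero price, two real prices.
prices = pd.DataFrame({"price": [np.nan, 680.0, 0.0, 600.0]})

# Variant 1: fill missing prices with 0, then keep positive prices.
with_fill = prices.assign(price=prices["price"].fillna(0))
filtered_after_fill = with_fill[with_fill["price"] > 0]

# Variant 2: filter directly; NaN > 0 is False, so missing prices drop out too.
filtered_directly = prices[prices["price"] > 0]
```

Both variants keep the same two rows. Filling with 0 is still useful when the missing and zero prices should be counted or inspected together.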


We assign new indices to the apartments in this new table.

In [146]:
st = st.reset_index(drop=True)
st

Unnamed: 0,price,location,size,date
0,1.0,"Maribor, Center",122.9,25.10.2022
1,1.0,"Primorsko-goranska, Rijeka",94,12.10.2022
2,1.0,"Primorsko-goranska, Rijeka",90,23.09.2022
3,1.0,"Izola, Jagodje",80,16.10.2022
4,1.0,"Primorsko-goranska, Kostrena",70,22.09.2022
...,...,...,...,...
322,950.0,"Ljubljana Center, Center",49,12.09.2022
323,950.0,"Bled, Bohinjska Bela",Unknown,30.07.2022
324,980.0,"Ljubljana Center, Stara Ljubljana",25,13.11.2022
325,990.0,"Maribor, Tabor",114,08.11.2022


As noted earlier, the highest apartment price in the table is below 1000 euros. Browsing the website shows that prices range from roughly 10 to 9000 euros. We conclude that the program misread the first few rows, where the price is below 10 euros, so we multiply those values by 1000.

In [147]:
st.loc[st["price"] < 10, "price"] = st.price.mul(1000)
st

Unnamed: 0,price,location,size,date
0,1000.0,"Maribor, Center",122.9,25.10.2022
1,1000.0,"Primorsko-goranska, Rijeka",94,12.10.2022
2,1000.0,"Primorsko-goranska, Rijeka",90,23.09.2022
3,1000.0,"Izola, Jagodje",80,16.10.2022
4,1000.0,"Primorsko-goranska, Kostrena",70,22.09.2022
...,...,...,...,...
322,950.0,"Ljubljana Center, Center",49,12.09.2022
323,950.0,"Bled, Bohinjska Bela",Unknown,30.07.2022
324,980.0,"Ljubljana Center, Stara Ljubljana",25,13.11.2022
325,990.0,"Maribor, Tabor",114,08.11.2022
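The assignment above works because pandas aligns the right-hand side by index: `st.price.mul(1000)` is computed for every row, but only the rows selected by the mask receive a new value. An equivalent formulation that touches only the selected rows, sketched on a hypothetical toy table `toy`, is:

```python
import pandas as pd

# Toy prices: two values below 10 that stand for misread thousands.
toy = pd.DataFrame({"price": [1.0, 2.7, 10.0, 950.0]})

# Boolean mask of the rows assumed to be misread.
mask = toy["price"] < 10

# Multiply only the masked rows in place.
toy.loc[mask, "price"] *= 1000
```

Naming the mask makes it easy to check how many rows are affected, for example with `mask.sum()`, before changing any values.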


We sort the table `st` by price again and refresh its indices.

In [148]:
st = st.sort_values(by="price")
st = st.reset_index(drop=True)
st

Unnamed: 0,price,location,size,date
0,10.0,"Kidričevo, Spodnji Gaj pri Pragerskem",Unknown,06.11.2022
1,10.0,"Kidričevo, Kungota pri Ptuju",45,20.10.2022
2,30.0,"Hrastnik, Hrastnik",80,24.10.2022
3,30.0,"Celje, Center",35,25.10.2022
4,45.0,"Trebnje, Trebnje",24,09.11.2022
...,...,...,...,...
322,2700.0,"Ljubljana Center, Center",151.4,01.02.2022
323,2800.0,"Ljubljana Center, Stara Ljubljana",Unknown,28.09.2022
324,2900.0,"Ljubljana Moste Polje, Zelena jama",234.3,26.10.2022
325,3100.0,"Ljubljana Vič Rudnik, Vič - Dolgi most",140,28.06.2022
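Sorting and renumbering can also be done in a single call. The sketch below assumes pandas 1.0 or newer, where `sort_values` accepts `ignore_index`; the table `toy` is again hypothetical.

```python
import pandas as pd

# Toy table that is out of order after a price correction.
toy = pd.DataFrame({"price": [1000.0, 10.0, 950.0]})

# Sort by price and renumber the rows 0..n-1 in one step.
toy_sorted = toy.sort_values(by="price", ignore_index=True)
```

The result is the same as calling `sort_values` followed by `reset_index(drop=True)`.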
