# Exploring Housing Prices using array manipulation

For this tutorial, we'll be using the [housing prices dataset from Kaggle](https://www.kaggle.com/datasets/yasserh/housing-prices-dataset). The dataset lists the price, area and other attributes for a collection of houses from different areas.

We'll first load our library along with NumPy and pandas, which make the dataset easier to manipulate.

In [1]:
import sys
import os
import numpy as np
import pandas as pd
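# Add the package's src folder to the import path (os.path.join keeps this portable across operating systems)
module_path = os.path.abspath(os.path.join('..'))
sys.path.append(os.path.join(module_path, "src"))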

#Load package functions
from mds_array_manipulation.search_array import search_array
from mds_array_manipulation.argmax import argmax
from mds_array_manipulation.sort_array import sort_array
from mds_array_manipulation.count_nonzero_elements import count_nonzero_elements

In [2]:
housing_data = pd.read_csv("Housing.csv")
housing_data.head()

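|   | price | area | bedrooms | bathrooms | stories | mainroad | guestroom | basement | hotwaterheating | airconditioning | parking | prefarea | furnishingstatus |
|---|-------|------|----------|-----------|---------|----------|-----------|----------|-----------------|-----------------|---------|----------|------------------|
| 0 | 13300000 | 7420 | 4 | 2 | 3 | yes | no | no | no | yes | 2 | yes | furnished |
| 1 | 12250000 | 8960 | 4 | 4 | 4 | yes | no | no | no | yes | 3 | no | furnished |
| 2 | 12250000 | 9960 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | yes | semi-furnished |
| 3 | 12215000 | 7500 | 4 | 2 | 2 | yes | no | yes | no | yes | 3 | yes | furnished |
| 4 | 11410000 | 7420 | 4 | 1 | 2 | yes | yes | yes | no | yes | 2 | no | furnished |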


The data is loaded from our docs folder. We'll explore a different column of the dataframe with each function, starting with the number of bedrooms in each house.

In [3]:
bedroom_data = housing_data['bedrooms'].to_numpy()
search_array(bedroom_data, 6)

112

We convert the `bedrooms` column to a NumPy array so that we can search it with our `search_array` function, then check whether any house fits our criterion of 6 bedrooms. The result, `112`, is the index of a matching house.

In [4]:
housing_data.loc[112]

price                 6083000
area                     4300
bedrooms                    6
bathrooms                   2
stories                     2
mainroad                  yes
guestroom                  no
basement                   no
hotwaterheating            no
airconditioning            no
parking                     0
prefarea                   no
furnishingstatus    furnished
Name: 112, dtype: object

We can index back into our original dataframe to find additional information on this house, to see if it's otherwise suitable for us. What about even more bedrooms?

In [5]:
search_array(bedroom_data, 7)

-1

There are no houses with 7 bedrooms, so `search_array` returns `-1` instead of an index.
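
To make the found / not-found behaviour concrete, here is a minimal sketch in plain NumPy of a search that returns an index on a match and `-1` otherwise. The helper name `first_index_of` and the toy array are hypothetical, and the sketch assumes the first match is the one reported; the package's own `search_array` may be implemented differently.

```python
import numpy as np


def first_index_of(values, target):
    """Return the index of the first element equal to target, or -1 if absent.

    Hypothetical helper for illustration only; not the package's search_array.
    """
    # Indices of every position where the value matches the target
    matches = np.flatnonzero(np.asarray(values) == target)
    return int(matches[0]) if matches.size else -1


# Toy bedroom counts, not taken from the housing dataset
toy_bedrooms = np.array([4, 4, 3, 6, 2, 6])

found = first_index_of(toy_bedrooms, 6)
missing = first_index_of(toy_bedrooms, 7)
print(found, missing)
```

Because `-1` is a sentinel rather than a valid row label, it is worth checking for it before passing the result to `housing_data.loc[...]`.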

Next, we load the `parking` column into a variable to find out how many houses have parking spaces.

In [10]:
parking_data = housing_data['parking'].to_numpy()

parking_houses = count_nonzero_elements(parking_data)
noparking_houses = len(parking_data) - parking_houses['Total Non-Zero Elements in Array']
print("Houses with Parking space : " ,parking_houses['Total Non-Zero Elements in Array'])
print("Houses with No Parking Space : " ,noparking_houses)

Houses with Parking space :  246
Houses with No Parking Space :  299


We have 246 houses with parking spaces and 299 houses with no parking.
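
As a cross-check, the same split can be sketched with NumPy's built-in `np.count_nonzero`. The toy array below is made up for illustration and is not the dataset's `parking` column; on the real data you would pass `parking_data` instead.

```python
import numpy as np

# Toy parking-space counts, not taken from the housing dataset
toy_parking = np.array([2, 3, 0, 0, 1, 0])

# Houses with at least one parking space are the non-zero entries
with_parking = int(np.count_nonzero(toy_parking))
# Everything else has no parking
without_parking = toy_parking.size - with_parking

print("Houses with parking space:", with_parking)
print("Houses with no parking space:", without_parking)
```

The two counts always sum to the length of the array, which matches the result above: 246 + 299 = 545 houses.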