# Example Usage

In this tutorial, you'll learn a few techniques for array manipulation using `mds_array_manipulation`.

This package provides array manipulation functions for searching, sorting, counting non-zero elements, and finding the indices of the maximum value.

For this tutorial, we'll be using the [housing prices dataset from Kaggle](https://www.kaggle.com/datasets/yasserh/housing-prices-dataset). The dataset lists the price, area and other attributes for a collection of houses from different areas.

## Imports

We'll first load our library along with NumPy and pandas, which make the dataset easier to manipulate.

In [1]:
import sys
import os
import numpy as np
import pandas as pd
module_path = os.path.abspath(os.path.join('..'))
# Build the path with os.path.join so it works on any operating system
sys.path.append(os.path.join(module_path, "src"))

# Load package functions
from mds_array_manipulation.search_array import search_array
from mds_array_manipulation.argmax import argmax
from mds_array_manipulation.sort_array import sort_array
from mds_array_manipulation.count_nonzero_elements import count_nonzero_elements



## Load Housing Prices Data

We'll load housing price data into a pandas DataFrame and take a quick overview of the dataset's structure and initial entries.

We are going to explore different columns in the DataFrame using each of the functions in the `mds_array_manipulation` package.

In [2]:
housing_data = pd.read_csv("Housing.csv")
housing_data.head()

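|   | price | area | bedrooms | bathrooms | stories | mainroad | guestroom | basement | hotwaterheating | airconditioning | parking | prefarea | furnishingstatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13300000 | 7420 | 4 | 2 | 3 | yes | no | no | no | yes | 2 | yes | furnished |
| 1 | 12250000 | 8960 | 4 | 4 | 4 | yes | no | no | no | yes | 3 | no | furnished |
| 2 | 12250000 | 9960 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | yes | semi-furnished |
| 3 | 12215000 | 7500 | 4 | 2 | 2 | yes | no | yes | no | yes | 3 | yes | furnished |
| 4 | 11410000 | 7420 | 4 | 1 | 2 | yes | yes | yes | no | yes | 2 | no | furnished |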
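Beyond `head()`, a few standard pandas attributes give a quick view of the structure. The sketch below rebuilds the five rows shown above from inline text so it runs without `Housing.csv`; the names `sample` and `numeric_columns` are only for illustration.

```python
import io
import pandas as pd

# The five rows shown above, re-typed as CSV text (index column omitted).
sample_csv = io.StringIO(
    "price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,"
    "hotwaterheating,airconditioning,parking,prefarea,furnishingstatus\n"
    "13300000,7420,4,2,3,yes,no,no,no,yes,2,yes,furnished\n"
    "12250000,8960,4,4,4,yes,no,no,no,yes,3,no,furnished\n"
    "12250000,9960,3,2,2,yes,no,yes,no,no,2,yes,semi-furnished\n"
    "12215000,7500,4,2,2,yes,no,yes,no,yes,3,yes,furnished\n"
    "11410000,7420,4,1,2,yes,yes,yes,no,yes,2,no,furnished\n"
)
sample = pd.read_csv(sample_csv)

# Number of rows and columns
print(sample.shape)

# Data type of each column
print(sample.dtypes)

# Columns holding numbers, i.e. the ones suited to array manipulation
numeric_columns = sample.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```

The numeric columns are the natural candidates for the array functions used in the rest of this tutorial.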


## Search Array

Imagine you're searching for a house suitable for you. With many kids, you need a house with 6 bedrooms.

We can start by looking at the bedroom count (i.e., column `bedrooms` in the housing price dataset) for the first 30 houses.

In [3]:
bedroom_data = housing_data['bedrooms'].to_numpy()
bedroom_data[0:30]

array([4, 4, 3, 4, 4, 3, 4, 5, 4, 3, 3, 4, 4, 4, 3, 4, 4, 3, 3, 3, 3, 3,
       3, 3, 3, 4, 3, 3, 5, 4])

Because we converted the `bedrooms` column to a NumPy array, we can search it with our `search_array` function to see whether any house fits our criteria (i.e., 6 bedrooms).

In [4]:
search_array(bedroom_data, 6)

112

We can index back into our original dataframe to find additional information on this house, to see if it's otherwise suitable for us.

In [5]:
housing_data.loc[112]

price                 6083000
area                     4300
bedrooms                    6
bathrooms                   2
stories                     2
mainroad                  yes
guestroom                  no
basement                   no
hotwaterheating            no
airconditioning            no
parking                     0
prefarea                   no
furnishingstatus    furnished
Name: 112, dtype: object

The information above confirms that our search result is accurate. The house with index 112 does indeed have 6 bedrooms.

What about even more bedrooms?

In [6]:
search_array(bedroom_data, 7)

-1

There are no houses with 7 bedrooms, so we get `-1` for the index.
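For comparison, the same "index or `-1`" behaviour can be sketched with plain NumPy. This is not the package's implementation: `first_index_of` is a hypothetical helper, and returning the *first* match when several exist is an assumption made here. The sample holds the first ten bedroom counts from the output above.

```python
import numpy as np

def first_index_of(values, target):
    """Return the index of the first element equal to target, or -1 if absent."""
    matches = np.flatnonzero(np.asarray(values) == target)
    return int(matches[0]) if matches.size else -1

# First ten bedroom counts from the output shown earlier
bedrooms = np.array([4, 4, 3, 4, 4, 3, 4, 5, 4, 3])

print(first_index_of(bedrooms, 5))  # index of the first 5-bedroom house
print(first_index_of(bedrooms, 7))  # -1, since no house has 7 bedrooms

# np.flatnonzero on its own lists every matching index, not just the first
three_bedroom_indices = np.flatnonzero(bedrooms == 3)
print(three_bedroom_indices)
```

Listing every match is useful when you want to compare several candidate houses rather than inspect only one.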

## Sort Array

Imagine you want to know the lowest and highest house prices, as well as the smallest and largest areas, to get a rough idea of the housing market in that region.

We can find the house prices and areas in the `price` and `area` columns respectively. Let's take a look at the first 30 entries.

In [7]:
price_data = housing_data['price'].to_numpy()
price_data[0:30]

array([13300000, 12250000, 12250000, 12215000, 11410000, 10850000,
       10150000, 10150000,  9870000,  9800000,  9800000,  9681000,
        9310000,  9240000,  9240000,  9100000,  9100000,  8960000,
        8890000,  8855000,  8750000,  8680000,  8645000,  8645000,
        8575000,  8540000,  8463000,  8400000,  8400000,  8400000])

In [8]:
area_data = housing_data['area'].to_numpy()
area_data[0:30]

array([ 7420,  8960,  9960,  7500,  7420,  7500,  8580, 16200,  8100,
        5750, 13200,  6000,  6550,  3500,  7800,  6000,  6600,  8500,
        4600,  6420,  4320,  7155,  8050,  4560,  8800,  6540,  6000,
        8875,  7950,  5500])

Then, we apply the array sorting function to these two columns. (For simplicity, only the first 30 entries are shown.)

In [9]:
price_data_sorted = sort_array(price_data)
price_data_sorted[0:30]

array([1750000, 1750000, 1750000, 1767150, 1820000, 1855000, 1890000,
       1890000, 1960000, 2100000, 2100000, 2100000, 2135000, 2233000,
       2240000, 2275000, 2275000, 2275000, 2310000, 2345000, 2380000,
       2380000, 2380000, 2408000, 2450000, 2450000, 2450000, 2450000,
       2450000, 2450000])

In [10]:
area_data_sorted = sort_array(area_data)
area_data_sorted[0:30]

array([1650, 1700, 1836, 1905, 1950, 1950, 2000, 2015, 2135, 2145, 2145,
       2145, 2145, 2145, 2145, 2160, 2175, 2176, 2275, 2325, 2398, 2400,
       2400, 2430, 2475, 2500, 2520, 2550, 2610, 2610])

As expected, the `sort_array` function arranges the values of these two columns in ascending order, from smallest to largest.

We can index into each sorted array to get its first and last elements, which are the lowest and highest values, respectively.

In [11]:
print("Lowest house price: " , price_data_sorted[0], "dollars")
print("Highest house price: " , price_data_sorted[-1], "dollars")
print("Lowest house area: " , area_data_sorted[0], "sq ft")
print("Highest house area: " , area_data_sorted[-1], "sq ft")

Lowest house price:  1750000 dollars
Highest house price:  13300000 dollars
Lowest house area:  1650 sq ft
Highest house area:  16200 sq ft


Awesome! We have obtained the lowest and highest house prices, as well as the smallest and largest areas, which is exactly what we wanted.
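As a cross-check, NumPy's own `np.sort`, `min` and `max` should agree with the first and last elements of a sorted array. The sketch below uses the first nine areas from the output above rather than the full column; it illustrates the idea and says nothing about how `sort_array` is implemented.

```python
import numpy as np

# First nine areas from the output shown earlier
areas = np.array([7420, 8960, 9960, 7500, 7420, 7500, 8580, 16200, 8100])

# np.sort returns a sorted copy in ascending order and leaves the input unchanged
areas_sorted = np.sort(areas)
print(areas_sorted)

# The ends of the sorted array are the extremes of the data
smallest, largest = int(areas_sorted[0]), int(areas_sorted[-1])
print(smallest, largest)

# min() and max() give the same answers without sorting at all
print(smallest == areas.min(), largest == areas.max())
```

If only the extremes are needed, `min()` and `max()` avoid the cost of a full sort; sorting pays off when you also want to inspect the values in between.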

## Count Non-zero Elements

Suppose you want to know how many houses have parking spaces, to help you plan for accommodating additional cars at your own property. To find out, take the `parking` column from the `Housing.csv` data we loaded earlier. Each entry is a non-negative number describing a house's parking, with 0 indicating no parking.

To count the houses that have parking, you can use `count_nonzero_elements` from the `mds_array_manipulation` package. It counts the non-zero entries, i.e., the houses with parking, and we read that count from the result under the key `'Total Non-Zero Elements in Array'`. Subtracting it from the total number of houses gives the number of houses without parking.


In [12]:
parking_data = housing_data['parking'].to_numpy()

parking_houses = count_nonzero_elements(parking_data)
noparking_houses = len(parking_data) - parking_houses['Total Non-Zero Elements in Array']
print("Houses with Parking space : " ,parking_houses['Total Non-Zero Elements in Array'])
print("Houses with No Parking Space : " ,noparking_houses)

Houses with Parking space :  246
Houses with No Parking Space :  299


This analysis gives us a clear picture of parking availability in `Housing.csv`. Out of 545 houses in total, 246 have parking space and 299 do not.
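The same kind of count can be cross-checked with NumPy's built-in `np.count_nonzero`. The sketch below uses a tiny array built from values visible earlier (the `parking` entries of the five `head()` rows, plus house 112, which has none), so the numbers it prints describe the sample only, not the full dataset.

```python
import numpy as np

# Parking values of the five head() rows, followed by house 112 (no parking)
parking_sample = np.array([2, 3, 2, 3, 2, 0])

# Count the entries that are not zero, i.e. houses with parking
with_parking = int(np.count_nonzero(parking_sample))

# Everything else has no parking
without_parking = parking_sample.size - with_parking

print("With parking:", with_parking)
print("Without parking:", without_parking)

# The two counts reported for the full dataset add up to its 545 rows
print(246 + 299 == 545)
```

Checking that the two counts sum to the total number of rows is a cheap way to catch off-by-one or missing-value mistakes.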

## Finding Indices of Maximum Value (argmax)