# Assumptions & Approach - Group 15

## Approach

The main goal of our project was to forecast rental prices across Victoria for the next 3 years. Our approach was tailored to the needs of our primary stakeholders: potential renters, real estate investors, property managers, and urban planners. These groups rely on rental price predictions to support informed decision-making related to livelihood, investment, development, and urban infrastructure planning. With this in mind, we took deliberate steps to ensure that our modelling addressed their specific concerns.

1. **Data Collection and Processing:**

    We sourced data from multiple platforms to identify features that could affect the rental price of a property.
    
    * **Rental Property Data from Domain:**
    We scraped data from the [domain.com.au](https://www.domain.com.au) website. This provided us with information on properties that are currently available on the market, including their availability, pricing, and other features, which are of particular interest to renters, property managers, and investors.
    
    * **Historical Median Rent Pricing:**
    We accessed this data from the [Department of Families, Fairness, and Housing](https://www.dffh.vic.gov.au/), which provided historical median rent prices for different suburbs that we were unable to access from Domain. This was essential for forecasting rental prices, as it let us understand long-term rental trends.

    * **Crime Data:**
    Crime severity and crime rates were factored in, as safety is a critical concern for both renters and property managers alike. This data was obtained from the [Crime Statistics Agency](https://www.crimestatistics.vic.gov.au/).

    * **Population and Income Data for Demographics:**
    Population and income data were taken from the 2016 and 2021 Censuses to account for population growth, affordability, and rental demand across different suburbs, which are essential in determining future rental conditions.

    * **Urban Landmarks and Public Transport:**
    Using openrouteservice, we calculated the distances between rental properties and public transport hubs, as well as key urban landmarks such as shopping centers and employment hubs. These factors are crucial for assessing the desirability of rental properties for future tenants. Data for the landmarks and public transport options were sourced from OpenStreetMap.

    * **Schools and Education:**
    The proximity and rankings for educational institutions were considered, as rental demand from students and families wanting to live near a well-regarded school is a significant factor for stakeholders.


2. **Data Imputation and Forecasting:**

    Some datasets, such as population, income, and crime rates, lacked future projections. We therefore extrapolated these values using statistical forecasting techniques and used them as inputs to our model, enabling stakeholders to anticipate changes that may influence rental trends.


3. **Feature Engineering:**

    Feature engineering was a crucial step in preparing the data for modelling and ensuring that key stakeholder priorities were reflected. For example, real estate investors and property managers are highly interested in a property's proximity to public transport as well as crime rates, as these factors significantly influence rental demand. 
    
    Population growth and income data were engineered to capture the economic conditions most relevant to urban planners and local government.


4. **Model Fitting:**



## Assumptions

The following are the assumptions we made for our project:

### Education

1. School density representing education quality: we assume that a higher density of education institutions (primary, secondary, and tertiary) indicates better educational opportunities, regardless of the actual quality or performance of those institutions.

2. Relatively static analysis over time: our analysis is based on data from 2023, making the assumption that the educational landscape remains mostly stable. We presume that key factors such as funding, policies, and infrastructure development will not experience significant changes during the analysis period. 

3. Influence of non-academic factors: it is assumed that residents of the suburb have full access to schools within that suburb. That is, external factors such as catchment zones and private versus public school policies are not taken into consideration.
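
As a rough illustration of assumptions 1 and 3, the sketch below counts every institution towards its own suburb and ignores catchment zones and school type. The input layout, the function name `education_density`, and the choice to normalise by suburb area in km² are all assumptions made for this example; the exact density definition used in the project is not stated here.

```python
from collections import Counter


def education_density(schools, suburb_area_km2):
    """Return education institutions per square kilometre for each suburb.

    schools: iterable of (institution_name, suburb) pairs (hypothetical layout).
    suburb_area_km2: dict mapping suburb -> area in km^2 (hypothetical input).
    Normalising by area is an assumption made for this sketch.
    """
    # Every institution counts towards its own suburb, whatever its
    # catchment zone, sector, or performance (assumptions 1 and 3).
    counts = Counter(suburb for _, suburb in schools)
    return {
        suburb: counts.get(suburb, 0) / area
        for suburb, area in suburb_area_km2.items()
        if area > 0  # skip suburbs with a missing or zero area
    }


if __name__ == "__main__":
    schools = [("School A", "Carlton"), ("School B", "Carlton"), ("School C", "Parkville")]
    areas = {"Carlton": 2.0, "Parkville": 4.0, "Docklands": 3.0}
    print(education_density(schools, areas))
```
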

### Domain Data

1. We expressed all rents as weekly rent.
    * Converting from monthly to weekly rent: multiply by 84 and divide by 365. This inverts the monthly rent calculation: weekly rent ÷ 7 (days) × 365 (days) ÷ 12 (months).
    * Converting from yearly to weekly rent: multiply by 7 and divide by 365.

2. We assumed that the minimum monthly rent should be $500; anything below that is most likely a car space or storage option.

3. We assumed the maximum weekly rent would be $5000 and removed listings above that.
    * If the rental price does not specify weekly or monthly, and it is less than $5000, we assume that it is weekly.
    * As a result, we removed properties that are rented 'by the season'.

4. We determine whether a property is furnished from its description or its initial feature list. From the feature list, we only treat a property as furnished if the feature is verified (i.e. no *), because features marked with * on the Domain website are unverified.

5. When a listing specifies prices for both unfurnished and furnished, we always take the furnished price, which we take to be the higher number (assuming that furnished properties are always more expensive than unfurnished properties).

6. For weekly rent that specified a range (e.g. 650 - 700 a week), we take the average of the two numbers. 

7. We also got rid of properties that rented per night as we are only considering properties that are rented for long term stays. 

8. We decided to map different property types into just 2 categories, House and Apartment.

    ```python
    mapping = {
        'House': 'House',
        'Apartment / Unit / Flat': 'Apartment',
        'Townhouse': 'House',
        'Studio': 'Apartment',
        'Villa': 'House',
        'New House & Land': 'House',
        'New Apartments / Off the Plan': 'Apartment',
        'Semi-Detached': 'House',
        'Duplex': 'House',
        'Terrace': 'House',
    }
    ```

9. We removed properties that do not specify rental price, and do not have any information on the number of rooms (bed and bath). We also removed properties with 0 beds or 0 baths. 

10. We removed properties that had more than 12 parking spaces. We manually inspected the properties with more than 5 parking spaces and found that most of them were still within reasonable ranges, while properties with more than 12 parking spaces were outliers.
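
The price rules above (assumptions 1 to 3 and 5 to 7) can be sketched as a single parsing function. This is a minimal illustration, not the project's actual cleaning code: the function names, the keyword matching used to detect the billing period, and the choice to apply the $500 monthly minimum after converting it to a weekly figure are all assumptions made for this example.

```python
import re

MIN_MONTHLY_RENT = 500   # assumption 2: cheaper listings are likely car spaces or storage
MAX_WEEKLY_RENT = 5000   # assumption 3: listings above this are removed


def to_weekly(amount, period):
    """Convert a rent amount to weekly rent using the factors in assumption 1."""
    if period == "week":
        return amount
    if period == "month":
        return amount * 84 / 365
    if period == "year":
        return amount * 7 / 365
    raise ValueError(f"unsupported period: {period}")


def detect_period(text):
    """Guess the billing period from listing text (hypothetical keyword matching)."""
    lowered = text.lower()
    if "night" in lowered or "season" in lowered:
        return None  # nightly and seasonal listings are excluded (assumptions 3 and 7)
    if "month" in lowered or "pcm" in lowered:
        return "month"
    if "year" in lowered or "annum" in lowered:
        return "year"
    return "week"  # unspecified prices are treated as weekly (assumption 3)


def parse_weekly_rent(text):
    """Return the weekly rent for a listing price string, or None if it is filtered out."""
    period = detect_period(text)
    if period is None:
        return None
    amounts = [float(m.replace(",", "")) for m in re.findall(r"\d[\d,]*(?:\.\d+)?", text)]
    if not amounts:
        return None  # no rental price specified
    if "unfurnished" in text.lower():
        amount = max(amounts)  # furnished and unfurnished both given: take the higher
    else:
        first_two = amounts[:2]
        amount = sum(first_two) / len(first_two)  # a range is averaged (assumption 6)
    weekly = to_weekly(amount, period)
    if weekly < to_weekly(MIN_MONTHLY_RENT, "month") or weekly > MAX_WEEKLY_RENT:
        return None
    return weekly


if __name__ == "__main__":
    for raw in ["$650 - $700 a week", "$3,650 per month", "$150 per night", "$550"]:
        print(raw, "->", parse_weekly_rent(raw))
```

Billing periods not mentioned in the assumptions are not handled by this sketch and would need their own rules.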

### Income

We interpolated the income data between the 2016 and 2021 Censuses to obtain a value for each intervening year, assuming that income changed linearly between the two Census years.
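
A minimal sketch of this linear interpolation for a single suburb is shown below. The function name `interpolate_income` and the example income figures are hypothetical; only the two Census years and the linearity assumption come from the text above.

```python
def interpolate_income(income_2016, income_2021):
    """Return {year: income} for 2016 to 2021, assuming a linear change between Censuses."""
    start_year, end_year = 2016, 2021
    # Constant yearly change implied by the two Census values.
    step = (income_2021 - income_2016) / (end_year - start_year)
    return {
        year: income_2016 + step * (year - start_year)
        for year in range(start_year, end_year + 1)
    }


if __name__ == "__main__":
    # Hypothetical incomes for one suburb.
    for year, income in interpolate_income(1000, 1500).items():
        print(year, income)
```
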

## Limitations

### Education

1. Ranking criteria differences across school stages: ranking primary and secondary schools together was challenging due to the differences in their educational objectives and outcomes. A secondary school's ranking might influence real estate more significantly due to its relevance to ATAR/VCE scores, while primary schools often lack performance-based rankings. Tertiary institutions are often highly selective and profession-specific in the courses they offer, making it difficult to rank them due to the diversity of the content provided. Hence, education density was used to identify top suburbs instead.

2. Quality vs. Quantity dilemma: the use of "education density" measures the quantity of schools but fails to account for the quality of education provided by each institution. A suburb with many low-performing schools could appear more favorable than one with fewer but higher-performing schools.