So the motivation for this was, to be entirely honest, kind of selfish. I'm buying a house and I want to see how much its value *could* increase over 'N' years. Basically, it's an experiment. It's also a handy way to practise time-series analysis, general data science, and some ML.

The project is broken down into a few parts:
1. Web scraping: gather historical sale data from the [RightMove.co.uk](https://www.rightmove.co.uk) website, which is then used to try to predict how house prices will increase.
2. Analysis: explore the data as-is and see whether any trends stand out.
3. Forecasting: see whether basic forecasting on the data alone is enough, or whether an ML model can be set up to make the prediction instead.
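The question behind stage 3 is what a price could be after N years. The simplest baseline, before any ML, is compound growth: `P_N = P_0 * (1 + g)^N`, with the annual rate `g` estimated as a compound annual growth rate (CAGR) from two observed prices. The sketch below is a minimal illustration under that assumption; the function names and figures are hypothetical rather than taken from the scraped data, and a constant growth rate is a strong simplification.

```python
def cagr(start_price: float, end_price: float, years: float) -> float:
    """Compound annual growth rate between two prices observed `years` apart."""
    if start_price <= 0 or years <= 0:
        raise ValueError("start_price and years must be positive")
    return (end_price / start_price) ** (1.0 / years) - 1.0


def project_price(current_price: float, annual_growth: float, years: int) -> float:
    """Project a price forward, assuming the same growth rate every year."""
    return current_price * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical: a house sold for £200,000, then resold for £242,000 two years later.
    g = cagr(200_000, 242_000, 2)
    print(f"Estimated annual growth: {g:.1%}")
    # Apply that (assumed constant) rate to a hypothetical £250,000 purchase over 5 years.
    print(f"Projected value in 5 years: £{project_price(250_000, g, 5):,.0f}")
```

Anything fancier in the forecasting stage (time-series or ML models) can be compared against this baseline to check that it actually adds something.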
 
# 1. Importing and Pre-Processing
## 1.1 Imports

In [25]:
random_code = 42  # Fixed random seed so results can be replicated on every run.

# Data Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Geography imports
import plotly.express as px


# Local import: the project's own scraper module
from Dataset_Snapshot import RightMoveScraper

Run the next cell only when you need a new CSV to work with. Every run generates a new CSV, which is not ideal, so the call is left disabled here; as a basic tool, running the scraper from a separate Python file is enough.
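One way to stop a new CSV piling up on every run is to reuse the newest snapshot already on disk and only scrape when there isn't one. This is a minimal sketch, assuming the scraper writes files named like `rightmove_housing_data_YYYYMMDD_HHMMSS.csv` (inferred from the file name loaded in the cell below), so that sorting by name also sorts by time. `latest_snapshot`, `load_or_scrape` and `SNAPSHOT_PATTERN` are hypothetical helpers, and `scrape` is any zero-argument callable that writes a snapshot.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed naming scheme: the timestamp suffix sorts lexicographically in time order.
SNAPSHOT_PATTERN = "rightmove_housing_data_*.csv"


def latest_snapshot(directory: str = ".") -> Optional[Path]:
    """Return the newest snapshot CSV in `directory`, or None if there isn't one."""
    files = sorted(Path(directory).glob(SNAPSHOT_PATTERN))
    return files[-1] if files else None


def load_or_scrape(directory: str, scrape: Callable[[], object]) -> Path:
    """Reuse the newest snapshot; only call `scrape` (which should write a CSV) if none exists."""
    path = latest_snapshot(directory)
    if path is None:
        scrape()
        path = latest_snapshot(directory)
        if path is None:
            raise FileNotFoundError("scrape() did not produce a snapshot CSV")
    return path
```

In the notebook, something like `data_file = load_or_scrape(".", lambda: RightMoveScraper(num_pages=10).run())` could replace the hard-coded file name, assuming `run()` writes its CSV into the current directory.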

In [3]:
# Uncomment to scrape a fresh snapshot (note: writes a new CSV on every run)
# scraper = RightMoveScraper(num_pages=10)  # Adjust the number of pages as needed
# scraper.run()

In [4]:
data_file = 'rightmove_housing_data_20250330_233719.csv'
df = pd.read_csv(data_file)

## 1.2 Data Clean Up
Tidy up the data, as it isn't in a usable state just yet.

In [19]:
print(df.dtypes)  # Look at the types of each column

address           object
propertyType      object
bedrooms         float64
bathrooms        float64
latitude         float64
longitude        float64
display_price    float64
date_sold          int64
dtype: object


In [None]:
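# Display price: strip anything that isn't a digit or '.', then convert 'str' to 'float' (unparseable -> NaN)
df['display_price'] = pd.to_numeric(
    df['display_price'].astype(str).str.replace(r'[^0-9.]', '', regex=True), errors='coerce')
# Date sold: parse DD-MM-YYYY strings (dayfirst=True, so 05-03-2024 is 5 March, not 3 May)
df['date_sold'] = pd.to_datetime(df['date_sold'], dayfirst=True, errors='coerce')
# Store as a YYYYMMDD integer; a nullable 'Int64' column keeps unparseable dates as <NA> instead of crashing astype(int)
df['date_sold'] = pd.to_numeric(df['date_sold'].dt.strftime('%Y%m%d'), errors='coerce').astype('Int64')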
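To check that the two conversions behave as intended, they can be run on a small hand-made frame first. The sample rows below are made up, and the `£`-prefixed prices and `DD-MM-YYYY` dates are assumptions about what the raw scraped strings look like. The sketch passes `dayfirst=True` so the first date is read as 5 March, and uses a nullable `Int64` column so a bad date becomes `<NA>` rather than an error.

```python
import pandas as pd

# Made-up rows mimicking raw scraped strings (assumed format).
raw = pd.DataFrame({
    "display_price": ["£250,000", "£1,200,500", None],
    "date_sold": ["05-03-2024", "28-11-2023", "not a date"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric prices and YYYYMMDD integer sale dates."""
    out = frame.copy()
    # Drop currency symbols and thousands separators, then convert; unparseable -> NaN.
    out["display_price"] = pd.to_numeric(
        out["display_price"].astype(str).str.replace(r"[^0-9.]", "", regex=True),
        errors="coerce",
    )
    # Day-first parsing; unparseable -> NaT, which ends up as <NA> in the Int64 column.
    dates = pd.to_datetime(out["date_sold"], dayfirst=True, errors="coerce")
    out["date_sold"] = pd.to_numeric(dates.dt.strftime("%Y%m%d"), errors="coerce").astype("Int64")
    return out


cleaned = clean(raw)
print(cleaned)
print(cleaned.dtypes)
```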

## 1.3 Some Basic Analysis

In [46]:
color_scale = [(0, 'yellow'), (1,'red')]

fig = px.scatter_mapbox(df, 
                        lat="latitude", 
                        lon="longitude", 
                        hover_name="address", 
                        hover_data=["display_price"],  # address is already shown via hover_name
                        color="display_price",
                        color_continuous_scale=color_scale,
                        size="display_price",
                        zoom=13, 
                        height=600,
                        width=1200)

fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
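Since stage 2 is about spotting trends, a natural next step after the map is to look at prices over time. Because `date_sold` is stored as a `YYYYMMDD` integer, it has to be turned back into a datetime before grouping. This is a minimal sketch with made-up rows (illustrative values, not results from the scraped data); on the real data you would pass `df` instead of `sample`. The helper name `yearly_median` is hypothetical.

```python
import pandas as pd

# Made-up rows in the cleaned format: YYYYMMDD integers and float prices.
sample = pd.DataFrame({
    "date_sold": [20210615, 20211201, 20220310, 20220920, 20230505],
    "display_price": [200000.0, 220000.0, 230000.0, 250000.0, 260000.0],
})


def yearly_median(frame: pd.DataFrame) -> pd.DataFrame:
    """Median sale price per year, plus the year-on-year percentage change."""
    # Convert YYYYMMDD integers back to datetimes; missing or bad values become NaT.
    sold = pd.to_datetime(frame["date_sold"].astype(str), format="%Y%m%d", errors="coerce")
    out = (
        frame.assign(year=sold.dt.year)
        .groupby("year")["display_price"]
        .median()
        .to_frame("median_price")
    )
    out["yoy_change_pct"] = out["median_price"].pct_change() * 100
    return out


print(yearly_median(sample))
```

The median is used rather than the mean because a handful of very expensive sales in one year can drag the mean around far more than the typical house price moves.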