# SA2 Growth Rate Prediction

When making predictions for the SA2 regions, we need to address two questions:
1. What algorithm to use to predict the growth rate of SA2
2. How to proceed with the analysis

## What algorithm to use to predict the growth rate of SA2

We initially considered both a plain RNN and an LSTM as our model. After analysis, we found that the LSTM is more suitable.

LSTM networks are an extension of recurrent neural networks (RNNs), introduced mainly to handle situations where RNNs fail. An RNN processes the current input while taking the previous output into account (feedback), which it keeps in memory for a short period of time (short-term memory). Its most popular applications are in speech processing, non-Markovian control, and music composition.

RNNs nevertheless have drawbacks. First, they fail to store information for long periods. Predicting the current output sometimes requires information that was seen a long time ago, and plain RNNs handle such "long-term dependencies" poorly. Second, there is no fine control over which part of the context is carried forward and how much of the past is "forgotten". RNNs also suffer from exploding and vanishing gradients, which occur when the network is trained with backpropagation through time.

Long Short-Term Memory (LSTM) was introduced to address these issues. It is designed so that the vanishing gradient problem is largely mitigated, while the training procedure is left unaltered. LSTMs can bridge long time lags and also handle noise, distributed representations, and continuous values. Unlike a hidden Markov model (HMM), an LSTM does not need a finite number of states fixed in advance. LSTMs are reported to work well over a wide range of parameters, such as learning rates and input and output biases, so they need little fine-tuning. The complexity of updating each weight is O(1), similar to that of Back Propagation Through Time (BPTT), which is an advantage.
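For reference, the gating mechanism described above is commonly written as the equations below. This is the standard textbook formulation with a forget gate, not something taken from our code; the symbols ($x_t$ for the input, $h_t$ for the hidden state, $c_t$ for the cell state, $W$, $U$, $b$ for the weights and biases) are notation introduced here.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The forget gate $f_t$ decides how much of the previous cell state is kept, and the input gate $i_t$ decides how much new information is written. Because $c_t$ is updated additively, gradients can flow across many time steps without shrinking as quickly as in a plain RNN.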

Because LSTMs are well suited to time-series tasks, in this section we use an LSTM to predict the growth rate of each SA2 region.


![](../plots/sa2_predict/LSTM.png)

## How to proceed with the analysis

With the algorithm chosen, and since we need a prediction for each SA2 region, this chapter is roughly divided into two parts:
- The first part uses a single SA2 as an example to make a prediction
- The second part predicts all SA2 regions and then sorts them


![](../plots/sa2_predict/analysis_part.png)

### Single SA2 Analysis

In this section, we take SA2 201011001 as an example and work through the following four parts:
1. Load Dataset And Show Base Info
2. Data visualization
3. Feature Engineering
4. Model Prediction

#### Load Dataset And Show Base Info

First, we read the historical data and get a basic understanding of it through summary statistics such as variance and null-value counts.

![](../plots/sa2_predict/base_info.png)
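A minimal sketch of how such a summary could be produced with pandas. The helper name `base_info`, the column names, and the values are all hypothetical and only illustrate the idea; they are not taken from our dataset.

```python
import numpy as np
import pandas as pd


def base_info(df):
    """Summarize each numeric column: mean, variance, and number of nulls."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "variance": numeric.var(),  # sample variance (ddof=1), NaNs are skipped
        "null_count": numeric.isnull().sum(),
    })


# Hypothetical quarterly history for one SA2; the values are made up.
history = pd.DataFrame({
    "population": [5010.0, 5034.0, np.nan, 5081.0, 5102.0],
    "median_income": [812.0, 815.0, 820.0, 826.0, 829.0],
})

print(base_info(history))
```

A column with a high null count or near-zero variance is a candidate for imputation or removal before modeling.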

#### Data visualization

Then, we analyzed the relationship between each feature and the label. The relationship is roughly a trend that fluctuates within a certain range.

![](../plots/sa2_predict/visualization.png)

#### Feature Engineering

There are two main difficulties in predicting the future:
- How to get future features
- How to predict house prices

For the first difficulty, how to get future features: because our data is strongly time-dependent, we explored the autoregressive (AR) model.

Autoregressive (AR) modeling is one of the techniques used for time-series analysis. An autoregressive model is a time-series model that describes how a particular variable’s past values influence its current value. In other words, an AR model attempts to predict the next value in a series by incorporating the most recent past values and using them as input data. Autoregressive models are based on the idea that past events can help us predict future events. For example, if we know that the stock market has been going up for the past few days, we might expect it to continue going up in the future. Or, if we know that there has been a lot of rain lately, we might expect more rain in the future.

Autoregressive modeling trains a regression model on past values of the response variable itself. The name combines "auto" (self) and "regressive", that is, a linear regression of the series on itself. In the context of time-series forecasting, this means the response variable Y depends on its own previous values at predetermined, constant time lags. The time lag can be daily (or 2, 3, 4… days), weekly, monthly, etc. For example, to predict the stock price at 12 pm tomorrow, an AR model would regress it on the prices observed at 12 pm today and on the preceding days, just as ordinary linear regression regresses a target on its predictors. AR models can be used to model any series that has some degree of autocorrelation, meaning that observations at adjacent time steps are correlated. The most common use case for this type of modeling is stock market prices, where the price today (t) is highly correlated with the price one day ago (t-1).

![](../plots/sa2_predict/autoregressive-model.jpg)

Since our data is quarterly, we set the order to 4, i.e. AR(4), so that each forecast uses the previous four quarters (one year).
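Written out, an AR(4) model takes the standard form below. The symbols are notation introduced here: $y_t$ is the value of a feature in quarter $t$, $c$ is a constant, $\varphi_1, \dots, \varphi_4$ are the coefficients to be estimated, and $\varepsilon_t$ is a noise term.

$$
y_t = c + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \varphi_3 y_{t-3} + \varphi_4 y_{t-4} + \varepsilon_t
$$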

The figure below shows the change in population over time as predicted by our AR model.

We use the same method to predict the other 6 features.

![](../plots/sa2_predict/population.png)

The **red part** of the curve is our forecast.
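The AR(4) forecasting step described above can be sketched as follows. This is a minimal illustration using ordinary least squares in NumPy, not necessarily the routine used in our notebook; the function names `fit_ar` and `forecast_ar` and the synthetic population series are assumptions made for the example.

```python
import numpy as np


def fit_ar(series, order=4):
    """Fit an AR(order) model by ordinary least squares.

    Returns (intercept, coefs), where coefs[k] multiplies y[t-1-k].
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Column k holds the lag-(k+1) values for every target y[order:], most recent lag first.
    lags = np.column_stack([y[order - k - 1 : n - k - 1] for k in range(order)])
    design = np.column_stack([np.ones(len(lags)), lags])
    beta, *_ = np.linalg.lstsq(design, y[order:], rcond=None)
    return beta[0], beta[1:]


def forecast_ar(series, intercept, coefs, steps):
    """Roll the fitted model forward, feeding each forecast back in as a lag."""
    history = [float(v) for v in series]
    order = len(coefs)
    forecasts = []
    for _ in range(steps):
        recent = history[-1 : -order - 1 : -1]  # last `order` values, most recent first
        nxt = float(intercept + np.dot(coefs, recent))
        history.append(nxt)
        forecasts.append(nxt)
    return np.array(forecasts)


# Synthetic quarterly population (made-up numbers): trend + yearly cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(40)
population = 5000 + 20 * t + 30 * np.sin(2 * np.pi * t / 4) + rng.normal(0, 5, size=40)

intercept, coefs = fit_ar(population, order=4)
forecast = forecast_ar(population, intercept, coefs, steps=4)  # next four quarters
print(forecast)
```

Because each forecast is fed back in as an input, errors accumulate over longer horizons, so forecasts further into the future should be treated with more caution.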

#### Feature Engineering and Model Prediction

In this section, we first split the dataset.
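One common way to divide a time series for an LSTM is sketched below. It assumes sliding windows of 4 quarters and a chronological 80/20 split; neither choice is stated in this report, and the helper names `make_windows` and `chronological_split` are hypothetical.

```python
import numpy as np


def make_windows(features, labels, window=4):
    """Turn a (time, n_features) array into overlapping input windows.

    Each sample is `window` consecutive time steps; its target is the
    label of the time step immediately after the window.
    """
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start : start + window])
        y.append(labels[start + window])
    return np.array(X), np.array(y)


def chronological_split(X, y, train_ratio=0.8):
    """Split without shuffling so the test samples come after the training samples."""
    cut = int(len(X) * train_ratio)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Made-up data: 40 quarters, 7 features per quarter.
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 7))
labels = np.arange(40, dtype=float)

X, y = make_windows(features, labels, window=4)
(X_train, y_train), (X_test, y_test) = chronological_split(X, y)
print(X_train.shape, X_test.shape)
```

Splitting by time rather than at random keeps future observations out of the training set, which would otherwise make the test loss look better than it should.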

We then define a helper, `create_batch_dataset`, that builds batched datasets to improve performance.

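```python
import tensorflow as tf


def create_batch_dataset(X, y, train=True, buffer_size=1000, batch_size=128):
    # Pair each input sample with its label as a tf.data.Dataset.
    batch_data = tf.data.Dataset.from_tensor_slices((tf.constant(X), tf.constant(y)))
    if train:
        # Cache in memory, shuffle, then batch the training data.
        return batch_data.cache().shuffle(buffer_size).batch(batch_size)
    else:
        # Keep the original order for validation and test data.
        return batch_data.batch(batch_size)
```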

We then built the model with `tensorflow.keras`. The model structure is as follows:

![](../plots/sa2_predict/model.png)

With historical and future data available, we can use the LSTM for training and prediction.
The model we built is shown in the figure above.
It uses two LSTM layers, followed by a Dense layer that outputs the prediction.
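A model of this shape could be defined roughly as below. This is only a sketch: the layer widths, the window length of 4, the 7 input features (population plus the other 6), the optimizer, and the loss are assumptions chosen for illustration, and the function name `build_lstm_model` is hypothetical. The actual configuration is the one shown in the figure.

```python
import tensorflow as tf


def build_lstm_model(window=4, n_features=7, units=(64, 32)):
    """Two stacked LSTM layers followed by a Dense layer with one output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window, n_features)),
        # return_sequences=True passes the full sequence on to the second LSTM.
        tf.keras.layers.LSTM(units[0], return_sequences=True),
        tf.keras.layers.LSTM(units[1]),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_lstm_model()
model.summary()
```

The first LSTM layer must return its full output sequence, because the second LSTM layer expects a three-dimensional input of shape (batch, time steps, features).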

Here are some of the outputs produced during training:
- model structure: `models/model.png`
- model training logs: `models/logs`
- the best model: `models/best_model.hdf5`

The plot below shows the model loss. We can see that the loss does decrease during training.

![](../plots/sa2_predict/output.png)

### All SA2 Prediction

Based on the above analysis, we can forecast the growth rate of all SA2 regions.

![](../plots/sa2_predict/all_predict.png)
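The sorting step from the second part of the analysis can be sketched as follows. The growth rate is assumed here to be the relative change between the last observed value and the predicted value, which this report does not define explicitly; the function names and all numbers are made up for illustration.

```python
def growth_rate(last_observed, predicted):
    """Relative change from the last observed value to the predicted value."""
    return (predicted - last_observed) / last_observed


def rank_by_growth(predictions):
    """Sort SA2 codes by predicted growth rate, highest first.

    `predictions` maps an SA2 code to a (last_observed, predicted) pair.
    """
    rates = {
        code: growth_rate(last, pred)
        for code, (last, pred) in predictions.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only; these are not model outputs.
example = {
    "201011001": (5100.0, 5253.0),
    "201011002": (12000.0, 12120.0),
    "201011003": (800.0, 840.0),
}

ranking = rank_by_growth(example)
for code, rate in ranking:
    print(code, f"{rate:.2%}")
```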