# Predicting future demand from SSDA903 returns

This is a quick walkthrough of using SSDA903 data to predict future demand for children's services placements.

For more detailed documentation and examples look at the main repository:

https://github.com/data-to-insight/cs-demand-model


In [None]:
import piplite
await piplite.install('cs-demand-model')
await piplite.install('openpyxl')
await piplite.install('tqdm')

At this point it is safe to go offline and disable your network access.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as pp
from cs_demand_model import Config, DemandModellingDataContainer, ModelPredictor, PopulationStats, fs_datastore

## Configuration

The code is [configurable](https://github.com/data-to-insight/cs-demand-model/blob/master/docs/configuration.ipynb) in terms of the different levels and categories used for prediction. For this example, we're simply going to use the standard default configuration.

In [None]:
config = Config()

print(f"{config.name} - version {config.version}")

## Load data

We need some files to work on. The package includes sample files, but the datastore can also load from a local folder or a networked filesystem using the
[PyFilesystem2](https://docs.pyfilesystem.org/en/latest/) library.

In [None]:
datastore = fs_datastore("sample://v1.zip")
list(datastore.files)

## Merge files and add model-specific fields

We then need to [shape the data](https://github.com/SocialFinanceDigitalLabs/csdm-py/blob/master/docs/data-container.ipynb)
for analysis. This involves merging all the relevant data files and using the configuration to group entries into suitable bins.


In [None]:
dc = DemandModellingDataContainer(datastore, config)
dc.enriched_view
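
As a rough illustration of what merging and binning mean here, the sketch below joins a toy header table to a toy episodes table and assigns each episode to an age bracket with `pandas.cut`. The data, the bracket boundaries, and every label other than `TEN_TO_SIXTEEN` are assumptions made for this example. The library derives its bins from the `Config` object, and its own implementation may differ.

```python
import pandas as pd

# Toy stand-ins for the header and episodes files (made-up values).
header = pd.DataFrame(
    {"CHILD": [1, 2], "DOB": pd.to_datetime(["2008-06-01", "2018-03-15"])}
)
episodes = pd.DataFrame(
    {
        "CHILD": [1, 1, 2],
        "DECOM": pd.to_datetime(["2019-01-10", "2019-09-01", "2019-02-01"]),
    }
)

# Merge: attach each child's date of birth to their episodes.
merged = episodes.merge(header, on="CHILD", how="left")

# Approximate age in years at the start of each episode.
merged["age"] = (merged["DECOM"] - merged["DOB"]).dt.days / 365.25

# Bin: hypothetical age brackets, closed on the left ([10, 16) and so on).
edges = [0, 1, 5, 10, 16, 18]
labels = [
    "BIRTH_TO_ONE",
    "ONE_TO_FIVE",
    "FIVE_TO_TEN",
    "TEN_TO_SIXTEEN",
    "SIXTEEN_TO_EIGHTEEN",
]
merged["age_bin"] = pd.cut(merged["age"], bins=edges, labels=labels, right=False)

print(merged[["CHILD", "DECOM", "age_bin"]])
```

Each row now carries a label that can be used as a grouping key, which is the role the bins play in the rest of this walkthrough.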

## Calculate model statistics

Next we need to calculate the [data metrics](https://github.com/SocialFinanceDigitalLabs/csdm-py/blob/master/docs/data-analysis.ipynb)
required to run the predictive model. The model is a stock and flow model, so the first steps involve creating daily population counts in each bin (stock) and transitions between bins (flow).


In [None]:
stats = PopulationStats(dc.enriched_view, config)
display(stats.stock)
display(stats.transitions)
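
To make the stock and flow idea concrete, here is a minimal sketch on three made-up episodes. It is not the library's implementation: the column names (`child`, `bin`, `start`, `end`) and the data are assumptions for illustration only.

```python
import pandas as pd

# Toy placement episodes (made-up data; column names are illustrative).
episodes = pd.DataFrame(
    {
        "child": ["A", "A", "B"],
        "bin": ["FOSTERING", "RESIDENTIAL", "FOSTERING"],
        "start": pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-02"]),
        "end": pd.to_datetime(["2019-01-03", "2019-01-05", "2019-01-04"]),
    }
)
days = pd.date_range("2019-01-01", "2019-01-05", freq="D")
bins = ["FOSTERING", "RESIDENTIAL"]


def daily_counts(date_column):
    """Count episodes per day and bin, using 0 for days with no events."""
    counts = episodes.groupby([date_column, "bin"]).size().unstack(fill_value=0)
    return counts.reindex(index=days, columns=bins, fill_value=0)


# Stock: +1 when an episode starts, -1 when it ends, then a running total.
stock = (daily_counts("start") - daily_counts("end")).cumsum()

# Flow: an episode followed by another episode for the same child is a move.
ordered = episodes.sort_values(["child", "start"])
ordered["next_bin"] = ordered.groupby("child")["bin"].shift(-1)
flow = ordered.dropna(subset=["next_bin"]).groupby(["end", "bin", "next_bin"]).size()

print(stock)
print(flow)
```

In this toy data child A moves from fostering to residential care on 3 January, so the fostering stock drops by one that day while the residential stock rises by one, and the flow table records a single move.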

We can plot the stock to see how the population in each bin changes over time:

In [None]:
stats.stock.groupby('bin', axis=1).sum().plot()
pp.show()

Plotting the transitions is not quite as interesting, but it gives a feel for how many individual moves there are.

In [None]:
stats.transitions.groupby('start_bin', axis=1).sum().cumsum().plot()
pp.show()

These are the overall trends, but we want to look at the behaviour over a smaller part of the dataset and use this to predict future behaviour. For this we need to set some dates. 

So let's look at the data from 2019 and see if we can use that to predict behaviour going forward (showing only for one age group here to simplify):

In [None]:
# Dates we use for window (you can try different values for these)
start_date, end_date = pd.to_datetime('2019-01-01'), pd.to_datetime('2019-12-31')

sub_group = ("TEN_TO_SIXTEEN", "FOSTERING")

# Plot the fostering population for the 10 to 16 age bracket.
stats.stock[sub_group].plot()

# Plot the 'calculation window'
pp.axvline(end_date, alpha=0.4)
pp.axvspan(start_date, end_date, alpha=0.1)
pp.show()

We can use this window to look at the probability of a child moving from one placement to another over that period:

In [None]:
stats.raw_transition_rates(start_date, end_date)[sub_group]
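
One simple way to think about a daily transition rate is the number of moves observed in the window divided by the number of child-days spent in the originating bin over the same window. The sketch below shows that arithmetic on made-up numbers. Whether `raw_transition_rates` uses exactly this formula is an assumption, so treat it as an illustration of the idea and not a description of the library.

```python
import pandas as pd

days = pd.date_range("2019-01-01", periods=6, freq="D")
# Toy daily stock of one bin and daily moves out of it into another bin.
stock = pd.Series([10, 10, 9, 9, 8, 8], index=days)
moves = pd.Series([0, 1, 0, 1, 0, 0], index=days)

# Restrict both series to the calculation window (inclusive at both ends).
start, end = pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-04")
window_stock = stock.loc[start:end]
window_moves = moves.loc[start:end]

# Moves per child per day: total moves divided by total child-days.
daily_rate = window_moves.sum() / window_stock.sum()
print(round(daily_rate, 4))
```

Changing `start` and `end` changes the estimate, which is why the choice of window matters for the prediction.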

We also have to consider new individuals entering care over this period:

In [None]:
stats.daily_entrants(start_date, end_date)
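
A plausible reading of "daily entrants" is the average number of children entering each bin per day of the window. The sketch below computes that on made-up entry records. The column names and the inclusive day count are assumptions, and the library's own calculation may differ.

```python
import pandas as pd

start, end = pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-10")

# Toy records of children entering care (made-up data).
entries = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-02", "2019-01-02", "2019-01-07", "2019-02-01"]
        ),
        "bin": ["FOSTERING", "RESIDENTIAL", "FOSTERING", "FOSTERING"],
    }
)

# Keep only entries that fall inside the window (both ends inclusive).
in_window = entries[entries["date"].between(start, end)]

# Average entrants per day for each bin.
n_days = (end - start).days + 1
daily = in_window.groupby("bin").size() / n_days
print(daily)
```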

## Prediction

Those rates, plus a few other factors such as the probability of moving into the next age bracket, are all we need to take a daily population and
[predict](https://github.com/SocialFinanceDigitalLabs/csdm-py/blob/master/docs/predict.ipynb)
what the next timestep will look like. We can create a "predictor" directly from the stats object:

In [None]:
predictor = ModelPredictor.from_model(stats, start_date, end_date)

The predictor has an "initial population" used to calculate the future state. 

In [None]:
predictor.initial_population

We can now ask for the next population:

In [None]:
predictor = predictor.next()
predictor.initial_population

You can run the above block multiple times to see the population change. To reset, go back and create a new predictor from the initial state. 
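
Conceptually, one step of a stock and flow model redistributes the current population according to the daily rates and then adds the daily entrants. The sketch below shows that update with NumPy on two bins. All numbers are made up, ageing between brackets is left out, and this is not the library's code.

```python
import numpy as np

bins = ["FOSTERING", "RESIDENTIAL"]
population = np.array([100.0, 20.0])

# rates[i, j]: made-up daily probability of going from bins[i] to bins[j].
# The diagonal is the probability of staying put; each row sums to less
# than 1, and the remainder is the share leaving care that day.
rates = np.array(
    [
        [0.990, 0.004],
        [0.002, 0.993],
    ]
)
entrants = np.array([0.5, 0.1])  # made-up average daily entrants per bin


def step(current):
    """Return the expected population one day later."""
    return current @ rates + entrants


next_population = step(population)
print(dict(zip(bins, next_population.round(2))))
```

Because the update works on expected values, the populations are fractional and not whole children.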

Since we usually want to run multiple iterations in one simple operation, there is also a utility method on the predictor to run *n* generations and return a dataframe of all the populations. You can add `progress=True` to get a progress bar so you have something to enjoy watching while you make a cup of tea...

In [None]:
predictor = ModelPredictor.from_model(stats, start_date, end_date)
predicted_pop = predictor.predict(720, progress=True)  # Predict 720 days forward
predicted_pop
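
Running *n* generations amounts to repeating a one-day update and keeping each day's population as a row. The following sketch does that for a toy two-bin model and returns a date-indexed dataframe. The rates, entrants, starting population and the helper name `predict` are all hypothetical and stand in for what the library computes from your data.

```python
import numpy as np
import pandas as pd

bins = ["FOSTERING", "RESIDENTIAL"]
rates = np.array([[0.990, 0.004], [0.002, 0.993]])  # made-up daily rates
entrants = np.array([0.5, 0.1])  # made-up daily entrants


def predict(population, first_day, n_days):
    """Apply the one-day update n_days times; return one row per day."""
    rows = []
    for _ in range(n_days):
        population = population @ rates + entrants
        rows.append(population)
    index = pd.date_range(first_day, periods=n_days, freq="D")
    return pd.DataFrame(rows, index=index, columns=bins)


predicted = predict(np.array([100.0, 20.0]), "2020-01-01", 720)
print(predicted.iloc[[0, -1]].round(2))
```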

We can plot the historical stock together with the predictions:

In [None]:
stock, predicted_pop = stats.stock.align(predicted_pop, axis=1)

# Plot original data
ax = stock[[sub_group]].plot(legend=True)

# Reset colours and plot predictions
pp.gca().set_prop_cycle(None)
predicted_pop[[sub_group]].plot(ax=ax, linestyle='dashed', legend=False)

# Plot window
pp.axvline(end_date, alpha=0.4)
pp.axvspan(start_date, end_date, alpha=0.1)
pp.show()

## Loading your own data

If you now feel ready to try with your own data, you can upload your own files. The simplest way is to create a zip file with a set of SSDA903 header and episodes CSV files. Please create a separate folder for each year, so you get a structure that looks like:

mydata.zip
  * 2019
    * header.csv
    * episodes.csv
  * 2020
    * header.csv
    * episodes.csv
  * 2021
    * header.csv
    * episodes.csv
    
You can then drag and drop that file into the sidebar of this page - this will not upload anything and you can even do this while disconnected from the internet.
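
If your returns already sit on disk with one folder per year, a short script along these lines could build the archive on your own machine before you drag it in. It uses only the Python standard library. The function name `zip_returns` and the folder name in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_returns(source_dir, zip_path):
    """Zip every <year>/<file>.csv under source_dir, keeping the year folders."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for csv_file in sorted(source_dir.glob("*/*.csv")):
            # Store paths relative to source_dir, e.g. "2019/header.csv".
            archive.write(csv_file, csv_file.relative_to(source_dir).as_posix())
    # Read the archive back to report what it contains.
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For example, `zip_returns("my_returns", "mydata.zip")` would return the list of archived paths, which you can compare against the structure shown above.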

If the following line works, then you can go back up to the section named **Merge files and add model-specific fields** and run from there with your own data.

In [None]:
datastore = fs_datastore("mydata.zip")
list(datastore.files)

## Exporting your stats

You can also export your stats and predictions to an Excel workbook:

In [None]:
dc = DemandModellingDataContainer(datastore, config)
stats = PopulationStats(dc.enriched_view, config)
stats.to_excel("analysis.xlsx", start_date, end_date)