# Learning the evolutions with control measures

## Introduction

In [previous work](https://kipre.github.io/files/internship/reports/data-driven-covid-modeling/data-driven-covid-modeling.html) we explained how we used [sparse identification of nonlinear dynamics](#) to identify dynamics from the number of cumulative cases in different countries. We identified a function $f$ describing the dynamics of $\mathbf{x}$, the number of cumulative cases:

$$\mathbf{x_{t+1}} = f(\mathbf{x_t})$$

In this work, we add government measures to the model as control inputs. A group of researchers from Oxford tracks the government measures applied worldwide to contain the pandemic and publishes this information in [their repository](https://github.com/OxCGRT/covid-policy-tracker).

| Name | Description |
|:--- |:--- |
| School closing | Record closings of schools and universities |
| Workplace closing | Record closings of workplaces |
| Cancel public events | Record cancelling public events |
| Restrictions on gatherings | Record limits on private gatherings |
| Close public transport | Record closing of public transport | 
| Stay at home requirements | Record orders to "shelter-in-place" and otherwise confine to the home | 
| Restrictions on internal movement | Record restrictions on internal movement between cities/regions |
| International travel controls | Record restrictions on international travel <br/><br/><small>Note: this records policy for foreign travellers, not citizens</small> | 

These measures are coded as ordinal values. For example, the values for school closing are interpreted as follows:

| Value | Meaning |
| ---- | ---- |
| 0 | no measures |
| 1 | recommend closing |
| 2 | require closing <br/>(only some levels or categories, eg just <br/>high school, or just public schools) |
| 3 | require closing all levels | 
| Blank | no data | 

For the other indicators the ordinal scale is similar and reflects the severity of the measures being taken. Additionally, each of these indicators has a flag (a boolean value) that tells whether the measure is applied locally or across the whole country. Indeed, in countries where local authorities have more power, the measures can vary a lot between regions. Similarly, some countries chose to apply stricter measures in the most affected regions, and this flag is designed to capture that.

Now, we can treat these measures as time-dependent variables $(h_1(t), h_2(t), h_3(t), ..., h_k(t))$ of our system, and the dynamics become:

$$\mathbf{x_{t+1}} = f(\mathbf{x_t}, h_1(t), h_2(t), h_3(t), ..., h_k(t))$$
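
To make the controlled system concrete, here is a minimal sketch of how such a fit could look, assuming a sequentially thresholded least-squares solver and a small hand-written library of candidate functions. The helper names `stlsq` and `library`, the choice of candidate functions, and the synthetic data are all illustrative assumptions, not the code used for the results in this report.

```python
import numpy as np

def stlsq(theta, target, cutoff=1e-10, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coef) >= cutoff
        coef[~keep] = 0.0
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef

def library(x, h):
    """Hypothetical candidate functions: 1, x, each h_i, and each x * h_i."""
    return np.column_stack([np.ones_like(x), x, h, x[:, None] * h])

# Synthetic data (not real case counts): growth slows once measure h_1 is active.
T = 60
h = np.zeros((T, 1))
h[30:, 0] = 2.0                          # ordinal level 2 from day 30 onwards
x = np.empty(T)
x[0] = 10.0
for t in range(T - 1):
    x[t + 1] = 1.2 * x[t] - 0.05 * x[t] * h[t, 0]

theta = library(x[:-1], h[:-1])          # features evaluated at time t
coef = stlsq(theta, x[1:], cutoff=1e-3)  # targets are the values at time t+1
print(coef)                              # coefficients for [1, x, h_1, x * h_1]
```

On this toy trajectory the fit should keep only the `x` and `x * h_1` terms, which is the kind of sparse, interpretable structure we hope to find on real data.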


The plot below shows these variables as well as the number of cumulative cases for all available countries.


Since these variables are encoded as ordinal values, we can try to use them as they are in our system.

## Fitting

### Delay

COVID-19 has strong delay effects in its dynamics: the incubation period is up to two weeks. For this experiment it is important to take this delay into account. We propose to fit models for delays ranging from 0 to 19 days to check whether the effect of the delay is visible in the fit.
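
One way to introduce such a delay is to pair the cases at day $t$ and $t+1$ with the measures that were in force at day $t - d$. The sketch below assumes this convention; the helper `align_with_delay` is hypothetical and the arrays are toy data chosen only to show the index alignment.

```python
import numpy as np

def align_with_delay(x, h, delay):
    """Return (x_t, h_{t-delay}, x_{t+1}) for every t where all three exist."""
    t = np.arange(delay, len(x) - 1)
    return x[t], h[t - delay], x[t + 1]

# Toy data: x counts days, h is a single measure equal to 10 * day.
x = np.arange(10, dtype=float)
h = 10.0 * np.arange(10, dtype=float).reshape(-1, 1)

x_t, h_delayed, x_next = align_with_delay(x, h, delay=3)
print(x_t)              # cases from day 3 to day 8
print(h_delayed[:, 0])  # measures from day 0 to day 5
print(x_next)           # cases from day 4 to day 9
```

Note that a delay of $d$ days removes the first $d$ samples from the fit, so models with different delays are trained on slightly different amounts of data.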

The following plot shows the mean squared deviation across all countries for delays ranging from 0 to 19 days. All of the models were fitted with a cutoff value of $10^{-10}$.

<img src='mse_vs_delay.svg' width='600'/>

We can see that, unfortunately, there is no minimum in this plot. Ideally, we would have a global minimum at some value around 14 days, which would mean that the optimization found some synchronization between the evolution of the cases and the control measures. The absence of such a minimum already suggests that we should not expect very good results.

The clearest limitation of this result is that we used a fixed cutoff value for all countries and all delays. We chose $10^{-10}$ because such a low value lets the model be as complex as its candidate functions allow, which usually results in a better fit. But this very low cutoff also allows the model to overfit, since it is effectively no longer regularized. The best way to handle this problem would be to fit the models for a range of cutoff values and then choose a model that is reasonably sparse while still performing well. This would increase the computation time of this plot to about one hour.
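
A cutoff sweep of this kind could be sketched as follows. The selection rule used here (take the sparsest model whose error stays within 10% of the best error) is only one possible choice and is an assumption, as are the helper names `stlsq` and `sweep_cutoffs` and the synthetic regression problem.

```python
import numpy as np

def stlsq(theta, target, cutoff, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coef) >= cutoff
        coef[~keep] = 0.0
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef

def sweep_cutoffs(theta, target, cutoffs, tolerance=1.1):
    """Fit one model per cutoff and return the sparsest one whose MSE is
    at most `tolerance` times the best MSE (a hypothetical selection rule)."""
    results = []
    for cutoff in cutoffs:
        coef = stlsq(theta, target, cutoff)
        mse = np.mean((theta @ coef - target) ** 2)
        results.append((cutoff, np.count_nonzero(coef), mse, coef))
    best_mse = min(r[2] for r in results)
    acceptable = [r for r in results if r[2] <= tolerance * best_mse + 1e-12]
    return min(acceptable, key=lambda r: r[1]), results

# Synthetic problem: only two of the six candidate functions are active.
rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 6))
true_coef = np.array([1.5, 0.0, -2.0, 0.0, 0.0, 0.0])
target = theta @ true_coef + 0.01 * rng.normal(size=200)

(chosen_cutoff, n_terms, mse, coef), results = sweep_cutoffs(
    theta, target, np.logspace(-10, 0, 11)
)
print(chosen_cutoff, n_terms, mse)
```

Because every cutoff requires a full refit for each country and each delay, the cost grows linearly with the number of cutoffs tried, which is where the longer computation time comes from.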

From this plot it is unclear which delay we should use in the next steps of this work. We can start with a delay of 0 and then also try a delay of 10, for example, to see whether there is any difference.


### Trajectories

We will start by fitting the system as described in the introduction.

