## Short Term Modeling Approach

### Goal

<!-- For a fixed category $\mathcal{C}$ and a forecasting horizon $h$, we want to predict the number of submissions tagged with $\mathcal{C}$ on the day $t_0+h$. -->

For a stakeholder who has a paper with tags $\mathcal{C}_1,\dots,\mathcal{C}_n$ and wants to submit it within the next $h$ days, we will suggest the best day(s) within that period to submit in order to maximize visibility.

Some questions/remarks:
- ~~Is there a relation between the number of daily submissions and the daily usage?~~ A lot of people subscribe to the mailing list and won't actually visit the arXiv webpage.
- Is there an interesting distribution of hourly usage (perhaps differing across days of the week)?
- Need to figure out a way to exclude weekends.

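One possible way to exclude weekends, sketched with the standard library only. The `{date: count}` mapping and the function name `drop_weekends` are assumptions made for illustration; the real data format may differ.

```python
from datetime import date, timedelta


def drop_weekends(daily_counts):
    """Keep only the Monday-Friday entries of a {date: count} mapping."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return {d: n for d, n in daily_counts.items() if d.weekday() < 5}


# Hypothetical counts for the week starting Monday 2024-01-01.
start = date(2024, 1, 1)
counts = {start + timedelta(days=i): 10 + i for i in range(7)}
weekday_counts = drop_weekends(counts)
```

Dropping weekends this way turns the weekly cycle into a 5-step cycle, which would have to be reflected in the seasonal period of any model fitted afterwards.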

### Steps

#### Step 1: Data Preparation
We will need to fix the following parameters:
- A fixed category $\mathcal{C}$
- A forecasting horizon $h$

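A minimal sketch of this preparation step, assuming the raw data can be read as `(submission date, set of tags)` pairs. The record layout, the category names, and the helpers `daily_series` and `holdout_split` are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def daily_series(submissions, category, start, end):
    """Count submissions tagged with `category` for each day in [start, end]."""
    counts = Counter(day for day, tags in submissions if category in tags)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Days without any matching submission get an explicit zero.
    return days, [counts.get(d, 0) for d in days]


def holdout_split(values, h):
    """Reserve the last h observations for validation (h must be >= 1)."""
    if h < 1:
        raise ValueError("h must be at least 1")
    return values[:-h], values[-h:]


# Hypothetical records: (submission date, set of category tags).
records = [
    (date(2024, 1, 1), {"math.AG", "math.NT"}),
    (date(2024, 1, 1), {"math.AG"}),
    (date(2024, 1, 3), {"cs.LG"}),
    (date(2024, 1, 4), {"math.AG"}),
]
days, y = daily_series(records, "math.AG", date(2024, 1, 1), date(2024, 1, 4))
train, valid = holdout_split(y, h=1)
```

Holding out the last $h$ days, rather than a random sample, keeps the validation set in the future relative to the training data.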
#### Step 2: Baseline Model
There are several choices of time series models for us to form the baseline model:
- [Holt-Winters (Triple Exponential Smoothing)](https://www.statsmodels.org/devel/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html)  
    Use cross-validation or a validation set to find the best combination of the hyperparameters `trend`, `damped_trend`, `seasonal`.  
    - Additive version
    - Multiplicative version

- Seasonal Autoregressive Integrated Moving Average (SARIMA) or ARIMA
- Some regression models ❓

#### Step 3: Modification
Given the result of the baseline modeling, we can modify it to give a better prediction. Furthermore, we can use some frameworks which allow us to run multiple models:
- [Facebook Prophet](https://github.com/facebook/prophet) or [NeuralProphet](https://github.com/ourownstory/neural_prophet)  
    NeuralProphet offers an "_iterative human-in-the-loop model building_".

#### Step 4: Suggestion Output
For a stakeholder who has a paper with tags $\mathcal{C}_1,\dots,\mathcal{C}_n$ and wants to submit it within the next $h$ days, we can produce the suggestion with the following steps:
1. Assign a weight $\lambda_i$ to each category $\mathcal{C}_i$. (audience, customizable, long-term model❓)
1. Run the short-term model for all the categories $\mathcal{C}_i$ and forecasting horizon $h$. 
1. Suggest the best option(s) within $h$ days for all the categories $\mathcal{C}_i$ with the given weight $\lambda_i$. (other score systems❓)

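The three steps above can be sketched as a weighted score per day. This sketch assumes that a lower forecast submission volume means better visibility, which is not settled in this plan, and it uses a plain weighted sum as the score. The forecasts, weights, and the helper `rank_days` are hypothetical.

```python
def rank_days(forecasts, weights, top_k=1):
    """Rank day offsets 1..h by weighted forecast volume, lowest first.

    forecasts: {category: [forecast for day 1, ..., forecast for day h]}
    weights:   {category: lambda_i}
    """
    h = len(next(iter(forecasts.values())))
    scores = []
    for d in range(h):
        # Assumed score: weighted sum of forecast submissions on day d + 1.
        score = sum(weights[c] * forecasts[c][d] for c in forecasts)
        scores.append((score, d + 1))
    return [day for _, day in sorted(scores)[:top_k]]


# Hypothetical short-term forecasts for h = 3 days and two categories.
forecasts = {"math.AG": [30, 22, 25], "math.NT": [12, 18, 10]}
weights = {"math.AG": 0.7, "math.NT": 0.3}
best_days = rank_days(forecasts, weights, top_k=2)
```

If higher traffic turns out to be the better proxy for visibility, the same function applies with the sort order reversed.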

We could also combine this with the long-term model to give a more comprehensive suggestion:
1. For each $\mathcal{C}_i$, use the long-term model to predict its next move $\mathcal{D}_i$ within the time period of $h$. 
1. Assign a weight to each category $\mathcal{C}_i$ and $\mathcal{D}_i$. (audience, customizable, long-term model❓)
1. Run the short-term model for all the categories $\mathcal{C}_i$ and $\mathcal{D}_i$ and forecasting horizon $h$.
1. Suggest the best option(s) within $h$ days for all the categories $\mathcal{C}_i$ and $\mathcal{D}_i$ with the given weight. (other score systems❓)


### Resources

- Lectures on _Time Series_ in the data science bootcamp
- [Time-related feature engineering](https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html#time-related-feature-engineering)