# Model Prototyping
Geoff Pidcock | PacifImpact | Sept 2020

## Scope
- Review prior examples of "Nowcasting"
  - what algorithms
  - how were they evaluated
  - were there any gold standards
- Apply method to toy data example

# Review

#### Macroeconomic Nowcasting with Kalman Filtering, Jason Yip (Medium)
[link](https://towardsdatascience.com/macroeconomic-nowcasting-with-kalman-filtering-557926dbc737)
- References `Macroeconomic Nowcasting and Forecasting with Big Data` (Brandyn Bok et al, 2018) [link](https://www.annualreviews.org/doi/abs/10.1146/annurev-economics-080217-053214) (behind a paywall)
   - [version that's open](https://www.newyorkfed.org/medialibrary/media/research/staff_reports/sr830.pdf)
   - 55 citations

> Nowcasting relies on State-space Representation in Systems Theory to represent the evolution of a variable through time in a way that depends on its past values and the evolution of other variables. It is a natural representation for handling mixed frequencies (monthly/quarterly/yearly) and nonsynchronicity of data releases. The Kalman Filter and Smoothing algorithm is then used to make the nowcast. It extracts co-movements in the timeseries data as a latent factor, use it to estimate past and present values of the observed data, make corrections when new data comes in, and nowcast the current state and values of the variables.

- implemented in `statsmodels.tsa.statespace.dynamic_factor`

- Some interesting choices in numeric transformations

- Full code can be found [here](https://github.com/jasonyip184/nowcast/blob/master/nowcast.ipynb)
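
To make the predict/correct cycle described in the quote concrete, here is a minimal sketch of a scalar Kalman filter. It is a deliberately simpler stand-in (a one-variable local-level model in plain Python), not the dynamic factor model from `statsmodels` used in the article. The function name, the variances, and the toy series are all assumptions for illustration. `None` marks a period where no data has been released yet, which is how the filter copes with nonsynchronous releases.

```python
def local_level_filter(observations, process_var=1.0, obs_var=1.0,
                       initial_mean=0.0, initial_var=1e6):
    """Scalar Kalman filter for a local-level (random walk plus noise) model.

    observations: list of floats, with None marking periods with no release.
    Returns one (filtered_mean, filtered_variance) tuple per period.
    """
    mean, var = initial_mean, initial_var
    results = []
    for y in observations:
        # Predict: the state is a random walk, so the mean carries over
        # and the uncertainty grows by the process variance.
        var = var + process_var
        if y is not None:
            # Correct: blend prediction and new observation via the Kalman gain.
            gain = var / (var + obs_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        results.append((mean, var))
    return results


# Hypothetical indicator with gaps where nothing has been published yet.
series = [10.0, 10.4, None, None, 11.1, None]
for period, (mean, var) in enumerate(local_level_filter(series)):
    print(f"t={period} estimate={mean:.2f} variance={var:.2f}")
```

In periods with no observation the estimate is carried forward and its variance grows; when a value arrives the variance shrinks again. The last entry is the "nowcast" for a period that has no release yet.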

#### Analysing port and shipping operations using big data, Bonham et al (2018)
[link](https://datasciencecampus.ons.gov.uk/wp-content/uploads/sites/10/2020/04/Report_Analysing_port_and_shipping_operations_using_big_data_June2018.pdf)

```
tl;dr: interesting study with a very different motivation to our own (they want to classify port delays, not "now-cast" indicators). Nice description of AIS data. Data prep methods are not reusable for us as they used a different provider and backend (Hadoop).
```

- What is their motivation
  - UK relies heavily on shipping - tonnage per day; cost of delays in shipping.
  - They want to better understand shipping

- What are their outputs
  - classifying ships into one of 6 categories using k-means clustering
  - Using it to inform
    - port traffic and utilisation
    - shipping movements
    - port network analysis
    - delays at port
    - movement of hazardous materials
  - classifying delays (tree, XGBoost)
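
As a rough illustration of the k-means step mentioned above, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the report's implementation: the features (mean speed in knots, length in metres), the toy values, and the choice of `k=2` instead of the report's 6 categories are all assumptions made to keep the example small.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical ships: (mean speed in knots, length in metres).
ships = [(8.0, 90.0), (9.0, 100.0), (8.5, 95.0),
         (14.0, 300.0), (15.0, 320.0), (14.5, 310.0)]
labels, centroids = kmeans(ships, k=2)
print(labels)
print(centroids)
```

With unscaled features like these, length dominates the distance, so in practice the features would normally be standardised first.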

- How did they evaluate performance?
  - Accuracy of identifying a port delay over a set of test cases. 

- Do they have open code?
  - Yes, yes they do: https://github.com/datasciencecampus/off-course

- Are they trying to do something similar to us?
  - Nope
  - In addition, they leverage the CERS for valuable metadata, which is absent in the Pacific (as used by the Malta study)


#### Nowcasting Trade Flows in Real Time, Arslanalp et al (2019), IMF working paper - Geoff
[link](https://www.imf.org/en/Publications/WP/Issues/2019/12/13/Big-Data-on-Vessel-Traffic-Nowcasting-Trade-Flows-in-Real-Time-48837)

```
tl;dr: a motivation we can copy-paste. Exploring correlations between cargo number and trade volume may be enough. Create a dashboard with "economy health".
That stated, we should emphasize the trade-off between timeliness and reliability, and explore a way we can validate our AIS stats (i.e. is there a port calls dataset for FJ?).
Great section on conclusion and policy implications.
```
- What is their motivation
  - Enable statistical agencies to complement existing data sources on trade and introduce more timely (real-time) new statistics that measure trade flows. This, in turn, could facilitate faster detection of turning points in the economic cycle.
  - some countries are more dependent on maritime trade than others. 
  - early detection of risks, closing data gaps, improving timeliness of official stats

  - The use of vessel traffic data can improve the timeliness of official trade statistics. This could sharpen policymakers’ ability to detect emerging risks in trade flows and, possibly, help identify turning points in the business cycle—especially for small open economies that rely heavily on seaborne trade for either imports or exports.
  - The more granular (ship-by-ship and port-by-port) data may reveal emerging patterns in international trade, including those associated with global trade tensions.
  - The data could close data gaps, especially for countries whose international trade is mostly seaborne and whose statistical capacity is weak (e.g., small island states).
  - Port call data are available on a daily basis in real time, while official trade statistics often appear on a monthly basis, with a one-to-three month lag at best. In countries with weak statistical capacity, they could even be published on an annual basis, with a lag of one year or more after reference period.
  
- What are their outputs
  - filter to identify cargo ships related to generating trade activity (i.e. at ports)
  - weekly indicators of trade activity - weekly is the key advantage
     - cargo number - incoming vessels
     - cargo load
- How did they evaluate performance
  - evaluated port calls against a gold standard - % valid
  
- Do they have open code
  - no

- Are they trying to do something similar to us/how are they different
  - Yes!
  - Different data provider (no port calls data item - a key feature for their method)
  - We are also missing deadweight and draught

- methods we should copy
  - seasonal analysis
  - three month moving averages of quarterly stats
  - correlation coefficient of gold standard to derived data
    - cargo number vs port calls 
    - cargo load vs trade volume 
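
The moving-average and correlation checks listed above could be sketched as follows. This is a minimal plain-Python version under stated assumptions: the function names are hypothetical, the two monthly series are made-up numbers rather than real AIS or port data, and a three-period trailing window is used as one reading of the paper's smoothing.

```python
import math


def moving_average(values, window=3):
    """Trailing moving average; returns one value per full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly series: AIS-derived cargo number vs recorded port calls.
cargo_number = [42, 38, 45, 50, 47, 53, 49, 56]
port_calls = [40, 39, 44, 48, 49, 51, 50, 55]

smoothed_derived = moving_average(cargo_number)
smoothed_official = moving_average(port_calls)
print(f"correlation of smoothed series: "
      f"{pearson(smoothed_derived, smoothed_official):.3f}")
```

The same two functions apply unchanged to the cargo load vs trade volume comparison.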

- other interesting stuff
  - on delta between inferred vs recorded port calls
     - ships can arrive and not depart - bad data, long stays, etc.
     - ships can stay outside of port for trade activity (i.e. smaller boats carry to and fro)
     - not all movements contribute to trade activity (i.e. movement to and from an offshore oil rig)
  - at least in Malta, 80% of ships stay for <= 1 day.
  - air traffic can confound things
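
To check the "ships stay for <= 1 day" observation against our own data, a minimal sketch along these lines might help. The record layout (`arrival` / `departure` keys), the function names, and the sample port calls are all hypothetical. Calls with no recorded departure are skipped rather than guessed, which is one possible way to handle the "arrive and not depart" cases noted above.

```python
from datetime import datetime, timedelta


def stay_durations(port_calls):
    """Return stay lengths for calls that have both an arrival and a departure."""
    durations = []
    for call in port_calls:
        if call["departure"] is None:
            continue  # arrival without departure: skipped, not imputed
        durations.append(call["departure"] - call["arrival"])
    return durations


def share_within(durations, limit=timedelta(days=1)):
    """Fraction of stays no longer than `limit` (0.0 if there are no stays)."""
    if not durations:
        return 0.0
    return sum(d <= limit for d in durations) / len(durations)


# Hypothetical inferred port calls.
calls = [
    {"arrival": datetime(2020, 9, 1, 6), "departure": datetime(2020, 9, 1, 18)},
    {"arrival": datetime(2020, 9, 2, 4), "departure": datetime(2020, 9, 3, 0)},
    {"arrival": datetime(2020, 9, 3, 9), "departure": datetime(2020, 9, 6, 9)},
    {"arrival": datetime(2020, 9, 4, 7), "departure": None},
]
print(f"share of stays <= 1 day: {share_within(stay_durations(calls)):.2f}")
```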