# MONET2030 - Data Scraping/ETL

```python
# Stdlib imports
import re
from pathlib import Path

# Third-party imports
import pandas as pd
import numpy as np

# Local imports
from pymonet import monet_scraper as scraper
from pymonet import monet_consts as const
```

## 1) List of all MONET2030 indicators

First, let's scrape a list of all indicators together with their metadata (e.g. the URLs of the indicator-specific subpages), store it in a DataFrame and save it to disk.

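```python
# Scrape the overview page of all MONET2030 indicators and save the table to disk.
itl = scraper.IndicatorTableLoader(const.url_all_monet2030_indicators,
                                   const.indicator_table_path)
# Top-level `await` works in Jupyter/IPython; in a plain script, use asyncio.run().
await itl.get_table()
```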
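`IndicatorTableLoader` belongs to the project's own `pymonet` package, and its internals are not shown in this notebook. As a rough, hypothetical sketch of what step 1 involves, the code below parses an HTML table in which each indicator appears as a link inside a table cell, resolves every link against a base URL and writes the result to CSV. The table layout, the column names `indicator` and `url`, and the helper names are all assumptions for illustration. The real MONET2030 page may be structured differently. Only the standard library is used, so the sketch runs without extra dependencies. In the notebook the records could go into a DataFrame with `pd.DataFrame(records)`.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class IndicatorLinkParser(HTMLParser):
    """Collect (link text, href) pairs from <a> tags inside <td> cells.

    Hypothetical: assumes each indicator is a link in a table cell.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # collected (text, href) tuples
        self._in_cell = False   # currently inside a <td>?
        self._href = None       # href of the <a> being read, if any
        self._text = []         # text chunks of the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def scrape_indicator_table(html, base_url, out_path):
    """Parse indicator links from `html`, save them as CSV, return the records."""
    parser = IndicatorLinkParser()
    parser.feed(html)
    records = [
        {"indicator": text, "url": urljoin(base_url, href)}
        for text, href in parser.rows
    ]
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["indicator", "url"])
        writer.writeheader()
        writer.writerows(records)
    return records
```

Links outside table cells (navigation, footers) are ignored, and relative links such as `ind2.html` become absolute URLs, which is what a later step needs in order to visit each subpage.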

## 2) List of all data files for all MONET2030 indicators

Given the list of indicator subpages from Step 1, we can now go one step further and scrape each of these subpages. This yields a new set of URLs that point to the actual indicator-specific data files, and the data in these files is what we are ultimately interested in.

```python
# Scrape every indicator subpage from Step 1 and collect the URLs of its data files.
mitl = scraper.MetaInfoTableLoader(itl.table,
                                   const.metainfo_table_path)
await mitl.get_table()
```
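Judging by the `await` calls, the loaders are asynchronous, which lets many subpages be fetched concurrently. The following is a hypothetical illustration of step 2, not `MetaInfoTableLoader`'s actual implementation. It fetches subpages concurrently with `asyncio.gather`, caps simultaneous requests with an `asyncio.Semaphore`, and keeps links whose path ends in a data-file extension. The `fetch` coroutine is injected so the sketch does not depend on any particular HTTP client. The extension list and the regex-based link extraction are assumptions kept deliberately simple.

```python
import asyncio
import re
from urllib.parse import urljoin, urlparse

# Assumed set of data-file extensions; adjust to what the site actually serves.
DATA_EXTENSIONS = (".xlsx", ".xls", ".csv")

# Crude href extractor; adequate for a sketch, not a full HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_data_links(html, page_url):
    """Return absolute URLs of links in `html` that look like data files."""
    links = []
    for href in HREF_RE.findall(html):
        url = urljoin(page_url, href)
        if urlparse(url).path.lower().endswith(DATA_EXTENSIONS):
            links.append(url)
    return links


async def collect_data_links(page_urls, fetch, max_concurrency=5):
    """Fetch all subpages concurrently; map each page URL to its data-file URLs.

    `fetch` is any coroutine function that takes a URL and returns HTML text.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape(url):
        async with semaphore:  # at most `max_concurrency` fetches at once
            html = await fetch(url)
        return url, extract_data_links(html, url)

    results = await asyncio.gather(*(scrape(u) for u in page_urls))
    return dict(results)
```

In a notebook you can `await collect_data_links(urls, fetch)` directly; in a script, call `asyncio.run(collect_data_links(urls, fetch))`. The semaphore keeps the scraper from opening dozens of connections to the same server at once.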

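## 3) Download all the data files

Finally, let's download every data file listed in the meta-information table from Step 2. `DataFileLoader` takes two output locations, `const.raw_data_path` and `const.processed_data_path`, presumably one for the downloaded raw files and one for their processed versions.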

```python
dfl = scraper.DataFileLoader(mitl.table, const.raw_data_path, const.processed_data_path)
```

```python
dfl.get_data()
```
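At its core, step 3 fetches each data-file URL and saves the response locally. Since `DataFileLoader`'s internals are not shown here, the following is a hypothetical, synchronous sketch that uses only `urllib.request`. It takes the local file name from the last segment of the URL path and skips files that already exist, so re-running the cell does not download everything again. Both behaviours are assumptions about what a sensible loader might do, not a description of `pymonet`.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def download_files(urls, raw_dir, overwrite=False):
    """Download each URL into `raw_dir` and return the local paths.

    Files that already exist are skipped unless `overwrite` is True.
    """
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        # Use the last path segment (e.g. "data.xlsx") as the local file name.
        name = Path(unquote(urlparse(url).path)).name
        target = raw_dir / name
        if overwrite or not target.exists():
            # Stream the response to disk instead of loading it into memory.
            with urllib.request.urlopen(url) as response, target.open("wb") as out:
                shutil.copyfileobj(response, out)
        paths.append(target)
    return paths
```

One caveat of naming files by their last path segment is that two different URLs ending in the same file name would collide, and the second one would be skipped; a real loader might prefix each file with an indicator ID to avoid this.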

```python
# Inspect the list of processed data files.
dfl.processed_data_list
```