This repository aims to establish a data-driven pipeline for cooperatives of the Solidarity Economy Network of Catalonia (Xarxa d’Economia Solidària de Catalunya - XES).
The general schema proposal to date
Currently, data from different sources must be centralized so that it can be queried from a data visualization application and kept in a standardized format. Some of the sources are non-standard and particular to the energy sector.
The extractor part of this tool obtains data from different data sources, stores it in a raw format and then loads it for transformation.
There are two modules, the datasources and the pipeline:
- Datasources: extracts through crawlers and saves raw data in a database.
- Pipeline: from the raw data it makes the transformation and performs the most complex operations.
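The split between the two modules can be illustrated with a minimal sketch. It is not the repository's code: the table name `raw_data`, the functions `save_raw` and `transform_prices`, the source name and the record fields are all hypothetical, and an in-memory SQLite database stands in for the real database.

```python
import json
import sqlite3
from datetime import datetime, timezone


def save_raw(conn, source, payload):
    """Datasources step: store the payload untouched, tagged with its source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_data (source TEXT, fetched_at TEXT, payload TEXT)"
    )
    conn.execute(
        "INSERT INTO raw_data VALUES (?, ?, ?)",
        (source, datetime.now(timezone.utc).isoformat(), json.dumps(payload)),
    )


def transform_prices(conn, source):
    """Pipeline step: read raw rows back and reshape them into a standard format."""
    rows = conn.execute(
        "SELECT payload FROM raw_data WHERE source = ?", (source,)
    ).fetchall()
    standardized = []
    for (payload,) in rows:
        for record in json.loads(payload):
            # Cast the scraped strings into typed, consistently named fields.
            standardized.append(
                {"hour": int(record["h"]), "price_eur_mwh": float(record["p"])}
            )
    return standardized


conn = sqlite3.connect(":memory:")
# Hypothetical crawler output: values arrive as strings, as scraped data often does.
save_raw(conn, "example_source", [{"h": "1", "p": "41.5"}, {"h": "2", "p": "39.0"}])
prices = transform_prices(conn, "example_source")
print(prices)
```

Keeping the raw payload unchanged means a transformation can be corrected and re-run later without crawling the source again.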
The following commands require virtualenvwrapper.
mkvirtualenv dades
pip install -r requirements.txt
cp dbconfig.example.py dbconfig.py
The following commands require pyenv, pipx and poetry.
Assuming poetry and pyenv are already installed, you can install the dependencies with the following commands:
pyenv install 3.8.12
pyenv virtualenv 3.8.12 somenergia-kpis
pyenv activate somenergia-kpis
poetry install
You should be able to access the project's CLI as in
$ somenergia-kpis-cli --help
Usage: somenergia-kpis-cli [OPTIONS] FUNCTION:{meff_update_closing_prices_day|
meff_update_closing_prices_month|omie_get_historica
l_hour_price|omie_update_latest_hour_price|omie_upd
ate_historical_hour_price|omie_get_historical_energ
y_buy|omie_update_energy_buy|neuro_update_energy_pr
ediction|hs_update_conversations|pipe_hourly_energy
_budget|pipe_omie_garantia}...
Arguments:
FUNCTION:{meff_update_closing_prices_day|meff_update_closing_prices_month|omie_get_historical_hour_price|omie_update_latest_hour_price|omie_update_historical_hour_price|omie_get_historical_energy_buy|omie_update_energy_buy|neuro_update_energy_prediction|hs_update_conversations|pipe_hourly_energy_budget|pipe_omie_garantia}...
Choose which function you want to run.
[required]
Options:
-v, --verbose Increase verbosity [default: 2]
-l, --list-functions List available functions
-s, --dry-run Show dataframes, but don't save to db
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to
copy it or customize the installation.
-h, --help Show this message and exit.
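The `--dry-run` behaviour shown in the help output can be sketched as follows. This is an illustration under assumptions, not the project's implementation: it uses `argparse` from the standard library rather than whatever framework the real CLI is built on, and the `FUNCTIONS` registry, the `example_update` function and the `SAVED` list (standing in for the database) are hypothetical.

```python
import argparse

# Hypothetical registry; the real CLI exposes functions such as
# omie_update_latest_hour_price, which are not reproduced here.
FUNCTIONS = {
    "example_update": lambda: [{"hour": 1, "price": 41.5}],
}

SAVED = []  # stands in for the database in this sketch


def run(argv):
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("function", nargs="+", choices=sorted(FUNCTIONS))
    parser.add_argument(
        "-s", "--dry-run", action="store_true",
        help="Show results, but don't save them",
    )
    args = parser.parse_args(argv)
    for name in args.function:
        result = FUNCTIONS[name]()
        print(name, result)  # always shown, dry run or not
        if not args.dry_run:
            SAVED.append((name, result))  # skipped on a dry run
    return args


run(["--dry-run", "example_update"])  # prints the result, stores nothing
run(["example_update"])               # prints and stores
```

A dry run is a convenient way to inspect what a function would write before letting it touch the database.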
Read the following sections for more details.
- Install pyenv using pyenv-installer and follow their instructions. Pay special attention to the following steps:
  - Extending your .bashrc file, documented in the pyenv-installer README, so that pyenv is available in your shell.
  - Install pyenv dependencies for installing python versions as per the pyenv wiki.
- Install the required python version with pyenv install 3.8.12, and wait until it finishes.
- Create a new virtual environment of your choice. If you used pyenv-installer you should be able to create one with pyenv virtualenv 3.8.12 somenergia-kpis. Here, somenergia-kpis is the name of the virtual environment.
- Activate the virtual environment with pyenv activate somenergia-kpis. Alternatively, you can use pyenv shell somenergia-kpis to activate the virtual environment for the current shell session. You can also set the python version locally with pyenv local somenergia-kpis. This will create a file called .python-version which is used by pyenv to automatically activate the virtual environment when you cd into the directory.
As per their documentation, poetry can be installed with
curl -sSL https://install.python-poetry.org | python3 -
Follow the instructions in their documentation to install poetry.
Alternatively, if you have pipx installed, you can install it with pipx install poetry. You can read more about this in their installation documentation for pipx.
pipx acts as a global interface to pip, meant to consume and install python packages that expose CLIs, e.g. poetry. To install it, follow the instructions in their documentation.
At the root level of the repository you will find a poetry.toml file with some configurations that are valid at the project level only. More specifically,
[virtualenvs]
in-project = false
create = false
the poetry.toml file tells poetry not to create a virtual environment for the project and to use the virtual environment created with pyenv instead. Read more about this in the poetry documentation.
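Because poetry is configured not to create its own environment, it can help to confirm that a virtual environment is actually active before installing anything. The check below is a small sketch based on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtualenv` is hypothetical and the check assumes the environment was created with `venv` or a recent `virtualenv`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv/virtualenv environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if not in_virtualenv():
    print("No virtual environment seems to be active; activate one before running poetry install.")
```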
This means that if you have the environment activated, you can now use the poetry CLI to manage dependencies.
poetry add <package-name> will add the package to the pyproject.toml file. You can also add a package as a development dependency with poetry add --group dev <package-name>. You can read more about this in the poetry documentation.
The commands below export the dependencies listed in pyproject.toml to requirements.txt and requirements-dev.txt files. This is useful for deployment in production environments. Read more about this in the poetry documentation.
poetry export -f requirements.txt --only main --output requirements.txt --without-hashes
For development dependencies:
poetry export -f requirements.txt --only dev --output requirements-dev.txt --without-hashes
Edit dbconfig.py with your data. You will have to create a database.
The visualization user has to have access to new tables by default, because some scripts replace the table each day. You can set this with:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO username;
$ pip install dbt-postgres
$ dbt init
Edit the ~/.dbt/profiles.yml with your connection details. You can use dbt_profile.exemple.yml as an example.
Set the schema to your user as dbt_<name>
The unpivot macro comes from dbt_utils, whose macros can be installed with
dbt deps --project-dir dbt_kpis
python main.py --help
or python main.py --list-functions
$ dbt run --target testing --project-dir dbt_kpis
Testing will require installing b2btest, which in turn requires lxml to be installed manually via pip.
Create an empty testing database and configure it in dbconfig.py at the test_db entry.
$ dbt test --target testing --project-dir dbt_kpis