# SQL normalization FTW

* Author: Bartolomeus Haeussling Loewgren
* Kernel: `bw2extdb`
* License: [CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)

## Import routine for `bw2extdb`

This example imports the mobility example dataset that was exported to a SQLite database in "Export routine for `bw2extdb`" (`export_routine.ipynb`). Run `export_routine.ipynb` first, because this notebook depends on the database it creates.

In [None]:
import bw2extdb.exportImport.importer as importer
import bw2extdb.exportImport.database as database
import bw2data
from bw2io import bw2setup
import pathlib

## Create SQL connection
The SQL connection is managed by the `engine`, a `sqlalchemy` object that serves as the "home base" for the actual database. It should be created only once per database and then reused; see https://docs.sqlalchemy.org/en/20/core/engines.html

We use SQLite to demonstrate the import routine. Alternatively, the engine can be created for any other SQL database supported by `sqlalchemy` (https://docs.sqlalchemy.org/en/20/core/engines.html#supported-databases), e.g., PostgreSQL, MySQL, or Microsoft SQL Server. The `bw2extdb.exportImport.database` module wraps engine creation for some of these in methods such as `create_sqlite_engine` or `create_MSsql_engine`.

Set `sqlite_file_path_abs` to the absolute file path of the SQLite database created in the export routine.

In [None]:
sqlite_file_path_abs: str = 'database.db'
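
Note that `'database.db'` is a relative path, so it resolves against the notebook's working directory. The following is a minimal sketch, using only the standard library, of turning such a path into an absolute one and checking that the file exists. The helper name `resolve_sqlite_path` is hypothetical and not part of `bw2extdb`.

```python
import pathlib


def resolve_sqlite_path(path_str: str) -> pathlib.Path:
    """Return the absolute path of an existing SQLite file (hypothetical helper)."""
    # expanduser() handles "~", resolve() makes the path absolute.
    path = pathlib.Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"No SQLite database at {path}; run export_routine.ipynb first."
        )
    return path
```

It could be used as `sqlite_file_path_abs = str(resolve_sqlite_path('database.db'))`, which fails early with a readable message if the export routine has not been run.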

We can simply use the `create_sqlite_engine` method in the `database` module of `bw2extdb`. Once the engine is created, we must also create the database and its tables in the SQL database using `create_db_and_tables`. If the database already exists, this method only checks whether our database model matches the one in the SQL database.

In [None]:
engine = database.create_sqlite_engine(sqlite_file_path_abs)
database.create_db_and_tables(engine)
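
To confirm independently which tables are present in the SQLite file, the standard-library `sqlite3` module is enough. This is only a sketch: the helper name `list_sqlite_tables` is hypothetical, and it does not go through the `bw2extdb` engine.

```python
import sqlite3


def list_sqlite_tables(db_path: str) -> list[str]:
    """Return the names of all user tables in a SQLite file, sorted by name."""
    # Note: sqlite3.connect() creates an empty file if db_path does not exist.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling `list_sqlite_tables(sqlite_file_path_abs)` should then include the tables created by the export routine.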

## Set information for import
The smallest unit that can be imported is a dataset. A dataset can be found by its name or its ID. Two things must be specified:
- `project_name`: the name of the Brightway project into which the dataset will be imported as a new database
- `dataset_name`: the name of the dataset in the SQL database, as given in the dataset metadata in the `datasetmetadata` table

In [None]:
project_name = 'import_test'
dataset_name = 'Mobility example (for testing)'
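
If the exact dataset name is not known, the metadata table can be inspected directly. The following sketch uses the standard-library `sqlite3` module and makes no assumption about column names: each row comes back as a dictionary. The helper name `read_table_rows` is hypothetical; only the table name `datasetmetadata` comes from the text above.

```python
import sqlite3


def read_table_rows(db_path: str, table: str = "datasetmetadata") -> list[dict]:
    """Return all rows of one table as dictionaries keyed by column name."""
    connection = sqlite3.connect(db_path)
    connection.row_factory = sqlite3.Row
    try:
        # Table names cannot be bound as SQL parameters, so check the name
        # against the tables that actually exist before building the query.
        known = {
            row["name"]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        if table not in known:
            raise ValueError(f"Table {table!r} not found in {db_path}")
        quoted = table.replace('"', '""')
        rows = connection.execute(f'SELECT * FROM "{quoted}"').fetchall()
    finally:
        connection.close()
    return [dict(row) for row in rows]
```

Printing `read_table_rows(sqlite_file_path_abs)` would show the available datasets together with whatever metadata columns the export wrote.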

## Initialize the import 

The importer is an instance of the `LCIImporterSql` class, a child of `LCIImporter` in `bw2io.importers.base_lci`. This class is a structural copy of the `ExcelImporter` in `bw2io.importers.excel`, so the structure and workflow are identical to a normal `bw2io` import:
1. Initialize the importer class with the raw data or the link to the data
2. Link all exchanges of the imported dataset to itself and to the existing databases in the project (e.g., biosphere3 or EcoInvent)
3. Write the database to the project

In [None]:
LCIImporter = importer.LCIImporterSql(project_name, dataset_name, engine)

## Link the imported database (dataset)

This part is unique to every dataset and depends on which other databases the dataset relies on; often these are EcoInvent and Biosphere3. The names of the databases the dataset depended on when it was exported are available. Normally, three types of exchanges must be linked: `biosphere`, `production`, and `technosphere`. The `production` exchanges are normally linked internally. The `technosphere` exchanges are mostly linked internally and to other process-activity databases. The `biosphere` exchanges are mostly linked to the `biosphere3` database, which is created when the Brightway project is set up.

First, activate the project that the dataset is imported into:

In [None]:
bw2data.projects.set_current(project_name)
bw2setup()

Let's have a look at the dependencies:

In [None]:
LCIImporter.database_dependencies

This toy example has no dependencies, but most real datasets will.

Let's have a look at how many exchanges need to be linked:

In [None]:
LCIImporter.statistics()
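
For a breakdown by exchange type, the unlinked exchanges can also be counted by hand. The sketch below assumes the usual `bw2io` data layout: a list of activity dictionaries, each with an `exchanges` list, where an exchange counts as linked once it has an `input` key. The helper name `count_unlinked_by_type` is hypothetical.

```python
from collections import Counter


def count_unlinked_by_type(data: list[dict]) -> Counter:
    """Count exchanges without an 'input', grouped by their 'type'."""
    unlinked = Counter()
    for activity in data:
        for exchange in activity.get("exchanges", []):
            # Assumption: a missing or empty 'input' means "not linked yet".
            if not exchange.get("input"):
                unlinked[exchange.get("type", "unknown")] += 1
    return unlinked
```

Assuming the importer exposes its activities as `LCIImporter.data`, as `bw2io` importers normally do, `count_unlinked_by_type(LCIImporter.data)` shows which kind of exchange still needs attention.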

Let's link the biosphere exchanges to the biosphere3 database using the `code` field.

In [None]:
LCIImporter.match_database("biosphere3", fields=["code"], kind="biosphere")
LCIImporter.statistics()

Hm... Not every biosphere flow is matched. This can happen when the biosphere versions do not match, but this time it is because there are emission activities in the imported dataset itself. So let's match the biosphere exchanges internally.

In [None]:
LCIImporter.match_database(fields=['reference product', 'unit', 'location', 'name'], kind='biosphere')
LCIImporter.statistics()
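
To make the matching step less of a black box, here is a simplified, plain-Python sketch of field-based linking. It is not the `bw2io` implementation: the function name `link_by_fields` and the choice to silently skip ambiguous matches are assumptions made for illustration.

```python
def link_by_fields(exchanges, candidates, fields, kind):
    """Link unlinked exchanges of type `kind` to candidates with equal `fields`.

    Returns the number of exchanges that received an 'input'.
    """
    lookup = {}
    ambiguous = set()
    for candidate in candidates:
        key = tuple(candidate.get(field) for field in fields)
        if key in lookup:
            # Two candidates share the same field values: no unique target.
            ambiguous.add(key)
        lookup[key] = (candidate["database"], candidate["code"])

    linked = 0
    for exchange in exchanges:
        if exchange.get("input") or exchange.get("type") != kind:
            continue  # already linked, or a different kind of exchange
        key = tuple(exchange.get(field) for field in fields)
        if key in lookup and key not in ambiguous:
            exchange["input"] = lookup[key]
            linked += 1
    return linked
```

The sketch shows why the choice of `fields` matters: the fewer fields are compared, the more likely it is that several candidates share the same key and no unique link can be made.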

The production exchanges are linked internally using the `code` field. Matching on `code` should be tried first, because the `code` field is the unique identifier from the original Brightway database and will therefore link correctly.

In [None]:
LCIImporter.match_database(kind="production", fields=["code"])
LCIImporter.statistics()

The technosphere exchanges are also linked internally first, using the `code` field:

In [None]:
LCIImporter.match_database(kind="technosphere", fields=["code"])
LCIImporter.statistics()

# This could be a match_database statement when there are more databases to be linked to:
# LCIImporter.match_database('EcoInvent3.9', kind="technosphere", fields=['reference product', 'unit', 'location', 'name'])

**No unlinked exchanges.** All exchanges have been matched, and the database can be written.
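
In a scripted run, where nobody reads the printed statistics, a small guard can stop the import before writing. The sketch assumes that `statistics()` returns a tuple of (number of datasets, number of exchanges, number of unlinked exchanges), as `bw2io` importers usually do. The helper name `ensure_fully_linked` is hypothetical.

```python
def ensure_fully_linked(stats: tuple) -> None:
    """Raise ValueError if the statistics tuple reports unlinked exchanges.

    Assumption: stats == (n_datasets, n_exchanges, n_unlinked).
    """
    n_datasets, n_exchanges, n_unlinked = stats
    if n_unlinked:
        raise ValueError(
            f"{n_unlinked} of {n_exchanges} exchanges in {n_datasets} "
            "datasets are still unlinked; link them before writing."
        )
```

Under that assumption, `ensure_fully_linked(LCIImporter.statistics())` placed before the write step turns a silent problem into an explicit error.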

In [None]:
LCIImporter.write_database()

Now let's see whether the database has been written and whether it contains anything:

In [None]:
bw2data.databases

Yes! The database is there. Let's look at a random activity:

In [None]:
bw2data.Database(dataset_name).random().as_dict()

Amazing!! It's that easy!