
Conversation

@AlexandraImbrisca
Contributor

Follow up to the previous PR:

  • Using a ProcessPoolExecutor with 3 worker processes speeds up the execution time significantly (see the sketch below)
  • Depending on the operating system and technical specifications, we obtain a time decrease between 49.68% and 70.03% relative to the previously optimized algorithm. In combination with the other improvements, this adds up to a 76.56% decrease from the initial, non-optimized implementation
  • Leveraging the standard multiprocessing functionality and carefully ordering the files leads to a safe optimisation across all tested environments
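
A minimal sketch of the approach, for illustration: the names process_xml_file and write_mastr_xml_to_database appear in the tracebacks further down, but the signatures and bodies here are assumptions, not the PR's actual code.

from concurrent.futures import ProcessPoolExecutor

def process_xml_file(xml_file: str) -> str:
    # In the real implementation each worker parses one XML file and
    # writes the records to the database; here we only echo the name
    return f"processed {xml_file}"

def write_mastr_xml_to_database(files: list[str], number_of_processes: int = 3) -> None:
    with ProcessPoolExecutor(max_workers=number_of_processes) as executor:
        futures = [executor.submit(process_xml_file, f) for f in files]
        for future in futures:
            # result() re-raises worker exceptions in the parent process,
            # which is how the errors discussed below surface
            print(future.result())

if __name__ == "__main__":  # required on Windows/macOS, see the comments below
    write_mastr_xml_to_database(["EinheitenSolar_1.xml", "EinheitenSolar_2.xml"])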

If it's not done inside the "if __name__ == "__main__"" guard, it will be re-executed inside every new process on macOS/Windows
Since the processing is now async, this print might confuse users
@nesnoj
Collaborator

nesnoj commented Jan 27, 2025

Thank you @AlexandraImbrisca for the implementation and sending the detailed report which reads coherently!
Is this PR ready for review?

What I stumbled across so far:

  • The CPU count is hard-coded but should be configurable. Ideally via CLI, but we do not have one, so maybe an environment variable could do the job. I'm also not sure whether >1 is an appropriate default; multiprocessing could instead be promoted as an opt-in. To keep in mind: the base process uses ~100 MB and each worker process about 1 GB. A standard office PC is equipped with ~8 GB, so 3 processes might be ok. Alternatively, we could set a default of 1 and add a message like "Your system supports multiple CPU cores, you can increase the processing speed by setting env var ..."
    What do you think @AlexandraImbrisca @FlorianK13 ?
  • I tested with different CPU counts (Ryzen 7, Linux, SQLite DB, only "solar"). Concerning the processing speed increase, my results are somewhat in line with yours:
Cores   Time in s (SQLite)
3       394.8
4       316.6
5       292.4
6       280.8
8       279.5
10      280.6
12      276.4

The speed gain stalls somewhere from 5 cores onward. I can imagine this drop is caused by a) the writing concurrency, b) other running processes on my laptop, or c) the number of parallel processes decreasing once most of the tasks are done?

  • Could you please explain why you chose the parameters in create_efficient_engine() like this? And are they optimal for any number of CPUs?
  • With 12 cores I occasionally(!) get the following error message:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate column name: InAnspruchGenommeneAckerflaeche
[SQL: ALTER TABLE solar_extended ADD "InAnspruchGenommeneAckerflaeche" VARCHAR NULL;]

(The column InAnspruchGenommeneAckerflaeche does not exist in our data model which isn't a problem - it is automatically added but there seems to be an issue with that in the mp)

  • PostgreSQL is crashing here: sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) invalid dsn: invalid connection option "timeout". The timeout argument in connect_args seems not to be understood by Postgres (see the sketch after this list).
  • The docs need to be updated
  • Changelog entry is missing
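
On the timeout bullet above, a sketch of the likely dialect mismatch (the engine URLs are illustrative): sqlite3 calls the option "timeout", while psycopg2 calls it "connect_timeout", so a single shared connect_args dict breaks PostgreSQL.

from sqlalchemy import create_engine

sqlite_engine = create_engine(
    "sqlite:///open-mastr.db",
    connect_args={"timeout": 30},          # sqlite3: seconds to wait on a locked DB
)
postgres_engine = create_engine(
    "postgresql+psycopg2://user:secret@localhost:5432/mastr",
    connect_args={"connect_timeout": 30},  # psycopg2: connection timeout in seconds
)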

@AlexandraImbrisca
Contributor Author

AlexandraImbrisca commented Jan 27, 2025

Thanks a lot for the detailed review and suggestions @nesnoj!

  • Default number of processes: I like the suggestion of keeping only 1 and adding a note! Since we are just introducing this feature, it might be helpful to make people aware of it and ask them to report any potential issues. How about we add that explanatory message and a link to the issues page to report any possible bugs / negative experiences?
  • Thanks for testing! From what I have read, the general suggestion is to use CPU_count - 1, so I totally agree that scaling the number of cores relative to the system makes sense. I think the performance stalls because of SQLite (SQLite is not designed for write concurrency) :(
  • The choice of parameters: sure, I'll leave some comments!
  • Error when using 12 cores: oh, interesting! OOC, is the exception caught or does the program terminate?
  • PostgreSQL: I unfortunately tested mostly on SQLite :( I'll find a solution for this bug and test a bit more on PostgreSQL
  • Docs & changelog updates: sure thing! I'll create another commit for these updates

@nesnoj
Collaborator

nesnoj commented Jan 28, 2025

Hey @AlexandraImbrisca !

  • Default number of processes: I like the suggestion of keeping only 1 and adding a note! Since we are just introducing this feature, it might be helpful to make people aware of it and ask them to report any potential issues. How about we add that explanatory message and a link to the issues page to report any possible bugs / negative experiences?

Sounds good to me.
What do you think @FlorianK13 ?

I think the performance stalls because of sqlite (sqlite is not designed for write concurrency) :(

An alternative way could be to create separate SQLite DBs and finally merge them (rough sketch below). Dunno if this is a viable option..
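
A rough sketch of what that could look like, assuming each worker wrote the same table schema into its own SQLite file; all names are illustrative.

import sqlite3

def merge_worker_db(main_db: str, worker_db: str, table: str) -> None:
    # assumes `table` already exists with the same schema in both files
    conn = sqlite3.connect(main_db)
    try:
        conn.execute("ATTACH DATABASE ? AS worker", (worker_db,))
        conn.execute(f"INSERT INTO {table} SELECT * FROM worker.{table}")
        conn.commit()  # close the implicit transaction before detaching
        conn.execute("DETACH DATABASE worker")
    finally:
        conn.close()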

  • Error when using 12 cores: oh, interestingly! OOC, is the exception caught or does the program terminate?

It terminates :(

@FlorianK13
Member

  • Default number of processes: I like the suggestion of keeping only 1 and adding a note! Since we are just introducing this feature, it might be helpful to make people aware of it and ask them to report any potential issues. How about we add that explanatory message and a link to the issues page to report any possible bugs / negative experiences?

Sounds good to me as well!

@AlexandraImbrisca AlexandraImbrisca changed the title Use multiprocessing to speed up the parsing [Feature #600]: Use multiprocessing to speed up the parsing Jan 28, 2025
@AlexandraImbrisca
Contributor Author

Awesome, thanks a lot both! A few updates from my side:

  • I introduced 2 new environment variables: one for using the recommended number of processes, one to set a custom number of processes. I think that 2 variables are necessary since people might not be aware of what number of processes would perform best, but it would be nice to allow them to customize it
  • @nesnoj I think the "duplicate column name" exception occurs because of a race condition (i.e., 2 processes trying to add the same column at the same time). Please correct me if I'm wrong, but I think we can safely ignore this error since once the missing columns have been introduced, we have reached our goal 🤔 I added some more error handling (see the sketch below). Could you please let me know if you are still able to reproduce this issue?
  • I fixed the PostgreSQL issue and generally tested more for PostgreSQL
  • I updated the documentation and added a message to promote this feature

About merging the DBs: that might work, but it might get quite messy with many processes (i.e., we could end up with 10+ temporary DBs) and we have to make sure that we clean everything up eventually 🤔 Using temporary tables performed better than I expected (source)
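
A minimal sketch of the error handling I mean, with illustrative names; the ALTER TABLE statement is the one from the error above, and the matched message is the SQLite wording.

from sqlalchemy import text
from sqlalchemy.exc import OperationalError

def add_missing_column(engine, table: str, column: str) -> None:
    try:
        with engine.begin() as conn:
            conn.execute(text(f'ALTER TABLE {table} ADD "{column}" VARCHAR NULL'))
    except OperationalError as err:
        # two workers raced to add the same column: the loser can ignore
        # the error because the column now exists
        if "duplicate column name" not in str(err):
            raise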

@nesnoj
Collaborator

nesnoj commented Jan 29, 2025

Thx for the quick update!

  • I introduced 2 new environment variables: one for using the recommended number of processes, one to set up a custom number of processes. I think that 2 variables are necessary since people might not be aware of what number of processes would perform the best, but it would be nice to allow them to customize it

I'll get back to this later

  • @nesnoj I think the "duplicate column name" exception occurs because of a race condition (i.e., 2 processes trying to add the same column at the same time). Please correct me if I'm wrong, but I think we can safely ignore this error since once we have introduced the missing columns, we reached our purpose 🤔 I added some more error handling. Could you please let me know if you are still able to reproduce this issue?
  • I fixed the PostgreSQL issue and generally tested more for PostgreSQL

The column issue seems to be solved, but now I keep getting an error in PostgreSQL with the privileges, see below for the full log. The user has all privileges for the DB (superuser) and the tables are created, but no data is written. I think it is not related to the actual privileges but to the implementation, but I wasn't able to track it further down right now.
Does it work properly at your end?

  • I updated the documentation and added a message to promote this feature

About merging the DBs: that might work, but it might get quite messy with many processes (i.e., we could end up with 10+ temporary DBs) and we have to make sure that we clean everything up eventually 🤔 Using temporary tables performed better than I expected (source)

Great that you already did some testing in the past! The write-temp-and-merge strategy was just a quick thought, it probably comes with other consequences I cannot estimate and also requires more testing. I'm also fine with the current implementation but open for discussion ;).

Full postgres traceback:
Processing file 'AnlagenEegSolar_48.xml'...
Processing file 'EinheitenSolar_48.xml'...

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 146, in __init__
    self._dbapi_connection = engine.raw_connection()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3298, in raw_connection
    return self.pool.connect()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 449, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
    rec = pool._do_get()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 179, in _do_get
    with util.safe_reraise():
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
    return self._create_connection()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 390, in _create_connection
    return _ConnectionRecord(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
    self.__connect()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
    with util.safe_reraise():
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 646, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 622, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  password authentication failed for user "mastr"
connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  password authentication failed for user "mastr"


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/nesnoj/git-repos/OpenEnergyPlatform/open-MaStR/open-MaStR_546_parsing_speed/open_mastr/xml_download/utils_write_to_database.py", line 103, in process_xml_file
    create_database_table(engine, xml_table_name)
  File "/home/nesnoj/git-repos/OpenEnergyPlatform/open-MaStR/open-MaStR_546_parsing_speed/open_mastr/xml_download/utils_write_to_database.py", line 215, in create_database_table
    orm_class.__table__.drop(engine, checkfirst=True)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/sql/schema.py", line 1299, in drop
    bind._run_ddl_visitor(ddl.SchemaDropper, self, checkfirst=checkfirst)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3248, in _run_ddl_visitor
    with self.begin() as conn:
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3238, in begin
    with self.connect() as conn:
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3274, in connect
    return self._connection_cls(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 148, in __init__
    Connection._handle_dbapi_exception_noconnection(
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2439, in _handle_dbapi_exception_noconnection
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 146, in __init__
    self._dbapi_connection = engine.raw_connection()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3298, in raw_connection
    return self.pool.connect()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 449, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
    rec = pool._do_get()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 179, in _do_get
    with util.safe_reraise():
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
    return self._create_connection()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 390, in _create_connection
    return _ConnectionRecord(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
    self.__connect()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
    with util.safe_reraise():
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 646, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 622, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  password authentication failed for user "mastr"
connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  password authentication failed for user "mastr"

(Background on this error at: https://sqlalche.me/e/20/e3q8)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nesnoj/git-repos/OpenEnergyPlatform/open-MaStR/open-MaStR_546_parsing_speed/testing.py", line 17, in <module>
    db.download(data="solar")# solar
  File "/home/nesnoj/git-repos/OpenEnergyPlatform/open-MaStR/open-MaStR_546_parsing_speed/open_mastr/mastr.py", line 244, in download
    write_mastr_xml_to_database(
  File "/home/nesnoj/git-repos/OpenEnergyPlatform/open-MaStR/open-MaStR_546_parsing_speed/open_mastr/xml_download/utils_write_to_database.py", line 65, in write_mastr_xml_to_database
    future.result()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/nesnoj/miniconda3/envs/py310_open_mastr_546/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  password authentication failed for user "mastr"
connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  password authentication failed for user "mastr"

(Background on this error at: https://sqlalche.me/e/20/e3q8)

Process finished with exit code 1

@AlexandraImbrisca
Contributor Author

Thanks a bunch for finding this bug! I was using an unauthenticated database and I didn't realise that this could be an issue. The connection_url obfuscates the password, so I updated the code to properly set the password (small demo below). Could you please try again and let me know if you see the same issue?
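
For illustration, the masking pitfall in a few lines; the URL and credentials are made up:

from sqlalchemy.engine import make_url

url = make_url("postgresql+psycopg2://mastr:secret@localhost:5432/mastr")
print(repr(url))                                  # password rendered as ***
print(url.render_as_string(hide_password=False))  # full DSN, safe to pass to workers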

@nesnoj nesnoj left a comment
Collaborator

These two small things needed a fix, I patched..
Now it works fine with psql, thank you!

@AlexandraImbrisca
Contributor Author

Thank you for spotting the issues and fixing them! If you have any other suggestions, please let me know

@nesnoj nesnoj requested a review from FlorianK13 January 31, 2025 21:35
@FlorianK13
Member

Is this the version now that should be merged to develop and released afterwards? If yes, I would start with the comparison of the two databases:

  • downloaded with this branch
  • downloaded with open-mastr from pypi

@AlexandraImbrisca
Contributor Author

Yes, I think this is the final version (unless we find any other bugs/suggestions). If you can help testing, that would be great! I will also test a bit more

@FlorianK13
Member

Did you test on Windows? Without setting os.environ, my program immediately crashes:

concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

@AlexandraImbrisca
Contributor Author

Could you please try again and let me know if any other error is being printed? I unfortunately don't have my own Windows system and I only tested the previous version before adding the os.environ variables. I'll try accessing Windows today and test the code again

@AlexandraImbrisca
Contributor Author

I just tested on Windows 11 and had no issues. I tried with WSL 2.0 and, similarly, the program runs correctly. I tested without setting os.environ as well as with setting each of the fields.

@FlorianK13
Member

Just saw it now, I'll work on this hopefully within this or next week.

@FlorianK13
Member

FlorianK13 commented Feb 24, 2025

@AlexandraImbrisca
Running this script:

from open_mastr import Mastr
import os

os.environ["NUMBER_OF_PROCESSES"] = "1"
db = Mastr()
db.download(date="existing", data="solar")

throws this error:

    concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

    RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

    To fix this issue, refer to the "Safe importing of main module"
    section in https://docs.python.org/3/library/multiprocessing.html

It refers to:

\xml_download\utils_write_to_database.py", line 67, in write_mastr_xml_to_database future.result()
\utils_write_to_database.py", line 63, in write_mastr_xml_to_database
futures = [
\utils_write_to_database.py", line 64, in
executor.submit(process_xml_file, *item) for item in interleaved_files

I'm using python 3.11.8 in a conda env on Windows.

@AlexandraImbrisca
Contributor Author

Oh, could you please try again with the following snippet?

from open_mastr import Mastr
import os

os.environ["NUMBER_OF_PROCESSES"] = "1"
db = Mastr()

if __name__ == "__main__":
    db.download(date="existing", data="solar")

This condition is necessary to ensure that the program doesn't attempt to recreate the pool in every new process. It's already part of main.py. Without this condition, multiprocessing will always break on Windows/macOS AFAIU

@FlorianK13
Member

FlorianK13 commented Feb 24, 2025

This solved the issue. However, I'm sure that many users don't have an if __name__ == "__main__": guard in their code. Is there a way to only use multiprocessing if os.environ["NUMBER_OF_PROCESSES"] = "SomeNumber" is given?

Because otherwise our version update would break their code.

@nesnoj
Collaborator

nesnoj commented Feb 24, 2025

In my tests above it always worked without guarding the top-level code with __name__ == '__main__' (Linux) 🤔

@FlorianK13
Member

Yes, I guess this is a problem appearing only on Windows.
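
That matches the multiprocessing start-method defaults; a quick way to check, for illustration:

import multiprocessing as mp

if __name__ == "__main__":
    # Linux defaults to fork (children inherit the parent's state), while
    # Windows and macOS (Python 3.8+) default to spawn, which re-imports
    # the main module and therefore needs the __main__ guard
    print(mp.get_start_method())  # 'fork' on Linux, 'spawn' on Windows/macOS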

@AlexandraImbrisca
Contributor Author

Great idea! I updated the code to not use multiprocessing unless one of the 2 options is set (USE_RECOMMENDED_NUMBER_OF_PROCESSES / NUMBER_OF_PROCESSES; see the sketch below). I also added a note to the documentation about if __name__ == "__main__"
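
A minimal sketch of the gating, assuming an illustrative helper name; the two environment variable names are the ones introduced in this PR.

import os

def resolve_number_of_processes() -> int:
    if os.environ.get("USE_RECOMMENDED_NUMBER_OF_PROCESSES"):
        # rule of thumb discussed above: leave one core for the system
        return max(1, (os.cpu_count() or 1) - 1)
    return int(os.environ.get("NUMBER_OF_PROCESSES", "1"))

if resolve_number_of_processes() == 1:
    print("sequential path: no process pool, no __main__ guard required")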

@AlexandraImbrisca
Contributor Author

@FlorianK13 @nesnoj small reminder if you can review these changes again! 🙏🏻

@nesnoj
Collaborator

nesnoj commented Mar 6, 2025

@FlorianK13

@FlorianK13
Member

A simple

db = Mastr()
db.download(date="existing", data="wind")

on windows now works again 👍

@FlorianK13 FlorianK13 left a comment
Member

Only two small changes. After that I suggest we can merge this branch and start with #602 on the development branch. Do you agree @nesnoj ?

@nesnoj
Collaborator

nesnoj commented Mar 10, 2025

Only two small changes. After that I suggest we can merge this branch and start with #602 on the development branch. Do you agree @nesnoj ?

yes, go for it!

@AlexandraImbrisca
Contributor Author

Sounds great @FlorianK13, thank you! I fixed the ruff linter warnings

@FlorianK13 FlorianK13 merged commit e277d4f into OpenEnergyPlatform:develop Mar 11, 2025
0 of 9 checks passed
@nesnoj nesnoj mentioned this pull request Apr 19, 2025