In [1]:
%run StdPackages.ipynb

*Import non-standard packages:*

In [2]:
import gmdcc, gamstransfer
os.chdir(d['py'])
import Database_old
os.chdir(d['curr'])

# Comparison of speed for loading/writing data

This notebook compares the speed of reading/writing data from/to GAMS with three methods: (1) the old database version (```Database_old```), (2) GAMS' own package ```gamstransfer```, and (3) the new database version (```Database```), which reads either from the ```gams.GamsDatabase``` instance or from its ```gams.GamsDatabase._gmd``` attribute.


We use *databases with large symbols* to mean databases that include variables/sets/mappings with 1 million records, and *databases with many symbols* to mean databases that include roughly 100 relatively small symbols. The overall conclusion is that the new ```Database``` methods are **by far the fastest** at loading all types of data and the fastest at writing large symbols. At writing many small symbols they are slightly slower than the old database, but much faster than the ```gamstransfer``` methods. More specifically:
* ```Database_old:```
    * *Loading:* Very inefficient at loading large symbols (about 36 times slower than the best performance). Very inefficient at loading many smaller symbols (about 10 times slower than the best performance).
    * *Writing:* Relatively inefficient at writing large symbols (5.4 times slower than the best performance). **Best performance** for writing many smaller symbols, with an average of **0.279 seconds** per run.
* ```gamstransfer:```
    * *Loading:* Relatively inefficient at loading large symbols (2.6 times slower than the best performance). *Cannot run with aliased symbols*, so loading many smaller symbols was not timed.
    * *Writing:* Relatively inefficient at writing large symbols (2.7 times slower than the best performance). Very inefficient at writing many smaller symbols (14.8 times slower than the best performance).
* ```Database:```
    * *Loading, from ```GamsDatabase```:* **Best performance** for loading large symbols, with an average of **2.89 seconds** per run. (Almost) **best performance** for loading many smaller symbols, with an average of **90.6 milliseconds** per run.
    * *Loading, from ```GamsDatabase._gmd```:* (Almost) **best performance** for loading large symbols, with an average of **2.99 seconds** per run. **Best performance** for loading many smaller symbols, with an average of **89.1 milliseconds** per run. Has the added bonus of also reading in information on aliased symbols.
    * *Writing:* **Best performance** for writing large symbols, with an average of **7.91 seconds** per run. Relatively efficient at writing many smaller symbols (1.4 times slower than the best performance).
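The "times slower" figures in a summary like this are each method's mean time divided by the fastest mean time for the same task. The sketch below recomputes them from the mean timings printed by the `%%timeit` cells in this notebook; the dictionary layout and the helper name `relative_slowdown` are illustrative choices, not part of the ```Database``` package.

```python
# Mean timings (seconds) copied from the %%timeit outputs in this notebook.
# gamstransfer has no "load_small" entry: it crashes on databases with aliases.
timings = {
    "load_large": {"Database_old": 103.0, "gamstransfer": 7.47,
                   "Database (GamsDatabase)": 2.89, "Database (_gmd)": 2.99},
    "load_small": {"Database_old": 0.914,
                   "Database (GamsDatabase)": 0.0906, "Database (_gmd)": 0.0891},
    "write_large": {"Database_old": 42.6, "gamstransfer": 21.0, "Database": 7.91},
    "write_small": {"Database_old": 0.279, "gamstransfer": 4.13, "Database": 0.393},
}


def relative_slowdown(task_timings):
    """Return each method's mean time divided by the fastest mean time for the task."""
    best = min(task_timings.values())
    return {method: t / best for method, t in task_timings.items()}


# Print one line per (task, method) with its slowdown factor (1.0x = fastest).
for task, task_timings in timings.items():
    for method, ratio in relative_slowdown(task_timings).items():
        print(f"{task:12s} {method:25s} {ratio:5.1f}x")
```

A factor of 1.0x marks the fastest method for a task; for example, writing many small symbols gives 1.0x for ```Database_old```, 1.4x for ```Database``` and 14.8x for ```gamstransfer```.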

### 1: Load test data sets

*Load test databases: test file 1 has large sets and variables; test file 2 has many symbols of different types.*

In [3]:
fs = [os.path.join(d['data'], 'test_size1000000.gdx'), os.path.join(d['data'], 'baselinerun.gdx')] # test files
ws = gams.GamsWorkspace()
g2np = gams2numpy.Gams2Numpy(ws.system_directory)
rc = gmdcc.new_intp()
dbs = {'gmd1': ws.add_database_from_gdx(fs[0]), 'gmd2': ws.add_database_from_gdx(fs[1])}
dbs.update({'gpydict1': Database.dict_from_GamsDatabase(dbs['gmd1'],g2np), 'gpydict2': Database.dict_from_GamsDatabase(dbs['gmd2'],g2np)})
dbs.update({'gpm1': Database_old.GPM_database(db=dbs['gmd1']), 'gpm2': Database_old.GPM_database(db=dbs['gmd2'])})

### 2: Load from gams.GamsDatabase

#### 2.1: Load large files

In [4]:
db = dbs['gmd1']

*Old version (too slow to run repeatedly, so it is timed with a single run):*

In [5]:
%%timeit -r 1 -n 1
db_old = Database_old.GPM_database.PM_from_gdx(db)

1min 43s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


*gamstransfer container:*

In [6]:
%%timeit -r 3 -n 10
db_gt = gamstransfer.Container(db)

7.47 s ± 122 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
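The flags of IPython's `%%timeit` magic used throughout this notebook work as follows: `-n 10` executes the cell 10 times in a row (one *run*), `-r 3` repeats that run 3 times, and the output is the mean and standard deviation of the per-loop time across runs. Outside IPython, a comparable measurement can be sketched with the standard library's `timeit.repeat`. The helper name `ipython_style_timeit` and the stand-in workload are hypothetical, and we assume IPython reports the population standard deviation, which is consistent with the `± 0 ns` printed for the single-run timing of the old version.

```python
import statistics
import timeit


def ipython_style_timeit(stmt, runs=3, loops=10):
    """Roughly mimic `%%timeit -r runs -n loops` (hypothetical helper).

    Each of the `runs` repetitions times `loops` consecutive calls of `stmt`;
    the per-loop time of a run is its total divided by `loops`.
    Returns (mean, population std. dev., list of per-loop times) in seconds.
    """
    totals = timeit.repeat(stmt, repeat=runs, number=loops)
    per_loop = [total / loops for total in totals]
    mean = statistics.mean(per_loop)
    std = statistics.pstdev(per_loop)  # equals 0.0 when runs == 1
    return mean, std, per_loop


# Stand-in workload; in this notebook it would be e.g. lambda: gamstransfer.Container(db)
mean, std, _ = ipython_style_timeit(lambda: sum(range(10_000)), runs=3, loops=10)
print(f"{mean * 1e6:.1f} us ± {std * 1e6:.1f} us per loop (3 runs, 10 loops each)")
```

With `runs=1` the standard deviation is always zero, which is why a single-run measurement carries no information about run-to-run variability.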


*Database using the ```gams.GamsDatabase``` instance:*

In [7]:
%%timeit -r 3 -n 10
db_gms = Database.dict_from_GamsDatabase(db,g2np)

2.89 s ± 155 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)


*Database using ```gams.GamsDatabase._gmd```:*

In [8]:
%%timeit -r 3 -n 10
db_gmd = Database.dict_from_GmdDatabase(db._gmd,g2np)

2.99 s ± 93.4 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)


#### 2.2: Load files with many symbols/different types

In [9]:
db = dbs['gmd2']

*Old version:*

In [10]:
%%timeit -r 3 -n 10
db_old = Database_old.GPM_database.PM_from_gdx(db)

914 ms ± 54.6 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)


*gamstransfer container: cannot run when the database contains aliases (this crashes the kernel).*

In [11]:
# %%timeit -r 3 -n 10
# db_gt = gamstransfer.Container(db)

*Database using the ```gams.GamsDatabase``` instance:*

In [11]:
%%timeit -r 3 -n 10
db_gms = Database.dict_from_GamsDatabase(db,g2np)

90.6 ms ± 3.12 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)


*Database using ```gams.GamsDatabase._gmd```:*

In [12]:
%%timeit -r 3 -n 10
db_gmd = Database.dict_from_GmdDatabase(db._gmd,g2np)

89.1 ms ± 1.92 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
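The "(almost) best performance" labels in the summary rest on how close the two ```Database``` loaders are. As a rough heuristic only (not a formal significance test, since each timing has just 3 runs), the sketch below expresses the gap between their means in units of the larger reported standard deviation, using the numbers printed in sections 2.1 and 2.2. The helper name `gap_in_std_units` is hypothetical.

```python
# (mean, std. dev.) in seconds, copied from the outputs of sections 2.1 and 2.2.
results = {
    "large": {"GamsDatabase": (2.89, 0.155), "_gmd": (2.99, 0.0934)},
    "small": {"GamsDatabase": (0.0906, 0.00312), "_gmd": (0.0891, 0.00192)},
}


def gap_in_std_units(a, b):
    """|mean_a - mean_b| divided by the larger of the two standard deviations.

    Values below about 1 suggest the difference is within run-to-run noise
    (a rough rule of thumb, not a statistical test).
    """
    (mean_a, std_a), (mean_b, std_b) = a, b
    return abs(mean_a - mean_b) / max(std_a, std_b)


for size, pair in results.items():
    gap = gap_in_std_units(pair["GamsDatabase"], pair["_gmd"])
    print(f"{size}: |difference| = {gap:.2f} standard deviations")
```

Both gaps come out below one standard deviation (about 0.65 for the large file and 0.48 for the file with many symbols), which suggests that neither loader is meaningfully faster than the other on these test files.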


### 3: Write to gams.GamsDatabase

#### 3.1: Write large files

In [16]:
db = dbs['gpydict1']

*Old version (roughly 43 seconds per run):*

In [17]:
%%timeit -r 3 -n 5
new_db = ws.add_database()
[Database_old.GPM_database.gpy2gams(new_db, symbol) for symbol in dbs['gpm1']]

42.6 s ± 1.09 s per loop (mean ± std. dev. of 3 runs, 5 loops each)


*Using ```gamstransfer.Container```:*

In [18]:
%%timeit -r 3 -n 5
new_db = ws.add_database()
container = gamstransfer.Container()
[Database.gamstransfer_from_py_(symbol,container) for symbol in db.values()]
container.write(write_to=new_db)

21 s ± 297 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)


*Using ```gams2numpy``` methods directly:*

In [19]:
%%timeit -r 3 -n 5
new_db = ws.add_database()
Database.gams_from_db_py(db.values(),new_db,g2np)

7.91 s ± 406 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)


#### 3.2: Write files with many symbols/different types

In [20]:
db = dbs['gpydict2']

*Old version, looping over symbols (the fastest method for writing many small symbols):*

In [21]:
%%timeit -r 3 -n 5
new_db = ws.add_database()
[Database_old.GPM_database.gpy2gams(new_db, symbol) for symbol in dbs['gpm2']]

279 ms ± 12.8 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)


*Using ```gamstransfer.Container```:*

In [22]:
%%timeit -r 3 -n 5
new_db = ws.add_database()
container = gamstransfer.Container()
[Database.gamstransfer_from_py_(symbol,container) for symbol in db.values()]
container.write(write_to=new_db)

4.13 s ± 310 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)


*Using ```gams2numpy``` methods directly:*

In [24]:
%%timeit -r 3 -n 5
new_db = ws.add_database()
Database.gams_from_db_py(db.values(),new_db,g2np)

393 ms ± 6.51 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)
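For the database with many symbols, dividing the write times of section 3.2 by the number of symbols gives a rough per-symbol cost. The notebook only states that this database has *roughly* 100 symbols, so `N_SYMBOLS = 100` below is an assumption and the results are approximate.

```python
# Assumption: the test database has about 100 symbols (the text says "roughly 100").
N_SYMBOLS = 100

# Mean write times in milliseconds, copied from the outputs of section 3.2.
write_small_ms = {"Database_old": 279, "gamstransfer": 4130, "Database": 393}

# Approximate cost per symbol, assuming time is spread evenly across symbols.
per_symbol_ms = {method: t / N_SYMBOLS for method, t in write_small_ms.items()}

for method, t in per_symbol_ms.items():
    print(f"{method:13s} ~{t:.1f} ms per symbol")
```

On this basis ```gamstransfer``` spends on the order of 40 ms per symbol, against roughly 3 to 4 ms for the two other methods. This is consistent with (but does not prove) a fixed per-symbol overhead dominating when the symbols themselves are small.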
