Python read attributes from data Record #114

Closed
edigiacomo opened this issue May 17, 2018 · 5 comments

@edigiacomo
Member

To read the attribute data, I call db.attr_query_data, and the DB executes an extra SQL query for each record, filtering by the record's context_id:

...
for rec in db.query_data(query):
    attrs = db.attr_query_data(rec["context_id"])  # one extra SQL query per record
    ...

Dballe v7 stores the attributes together with the data, so the attributes could be retrieved without executing another query.

Is it possible to avoid the attribute query and retrieve the attributes together with the data instead?

I don't know whether this is an enhancement or not, so I'm using the "question" label.
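The cost of the per-record pattern can be illustrated with a toy sketch on Python's built-in sqlite3 module. The table layout is hypothetical and far simpler than DB-All.e's real schema: each row keeps its attributes as JSON next to the value, loosely mimicking v7 storing attributes with the data. The sketch counts the SQL queries issued when attributes are fetched once per record (as with db.attr_query_data inside the loop) versus read in the same query as the data.

```python
import json
import sqlite3


def make_db(n_rows):
    """In-memory toy DB. Hypothetical schema, NOT DB-All.e's real one:
    each data row keeps its attributes (as JSON) next to the value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, var TEXT, value REAL, attrs TEXT)"
    )
    conn.executemany(
        "INSERT INTO data (var, value, attrs) VALUES (?, ?, ?)",
        [("B12101", 238.45, json.dumps({"B33007": 0}))] * n_rows,
    )
    return conn


def run_query(conn, stats, sql, params=()):
    """Execute one SQL query, counting it in stats["queries"]."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def per_record_attrs(conn):
    """Pattern from the issue: one query for the data, then one per record."""
    stats = {"queries": 0}
    out = []
    for row_id, var, value in run_query(
        conn, stats, "SELECT id, var, value FROM data ORDER BY id"
    ):
        # Extra round trip, like calling db.attr_query_data in the loop.
        (attrs_json,) = run_query(
            conn, stats, "SELECT attrs FROM data WHERE id = ?", (row_id,)
        )[0]
        out.append((var, value, json.loads(attrs_json)))
    return out, stats["queries"]


def attrs_with_data(conn):
    """Alternative: read the attributes in the same query as the data."""
    stats = {"queries": 0}
    rows = run_query(conn, stats, "SELECT var, value, attrs FROM data ORDER BY id")
    out = [(var, value, json.loads(attrs)) for var, value, attrs in rows]
    return out, stats["queries"]


conn = make_db(1000)
slow, n_slow = per_record_attrs(conn)
fast, n_fast = attrs_with_data(conn)
assert slow == fast
print(n_slow, n_fast)  # 1001 1
```

Both functions return the same rows; the combined query saves one round trip per record, at the cost of always transferring the attributes even when the caller does not need them.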

@spanezz
Contributor

spanezz commented Jun 5, 2018

It would be possible, but whether it would be useful depends a lot on what one does with the query.
If one just wants to read the data, always querying the attributes together with the data would increase network traffic significantly; on the other hand, querying the attributes separately for each value massively increases latency and the number of queries.
It would be interesting to be able to hint to dballe that one also wants the attributes: in that case, dballe could directly return variables with their attributes, and attr_query_data would become unnecessary.
Another useful hint would be telling dballe that one is querying data with the intention of updating it: in that case, since we run in a transaction, I can remember that a given value comes from the DB, and on update directly issue an UPDATE instead of first querying the DB to decide between an INSERT and an UPDATE. However, in the use case of querying tons of data, streaming them out and forgetting them as they come, this means keeping a little something in memory for each value queried, so memory usage becomes proportional to the query size, unless one is directly exporting to BUFR. On the other hand, this last point could already be the case in most uses.
Since V7 stores attributes with the data, the two are related: if one is querying data to add attributes, the second optimization also applies.
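The "query for update" hint can be illustrated with another toy sketch on Python's sqlite3 module. `UpdatingSession`, its methods and the table layout are all hypothetical and say nothing about how dballe implements this: rows read with the hint keep their row id in a dict, so a later write goes straight to UPDATE, while a write without it first needs a SELECT to choose between INSERT and UPDATE. That dict is the per-value memory cost described in the comment, growing with the number of rows read.

```python
import sqlite3


class UpdatingSession:
    """Hypothetical sketch of a "query for update" hint; not DB-All.e code.

    Rows read with remember=True keep their row id in memory, so a later
    write can issue UPDATE directly. Other writes first SELECT to choose
    between INSERT and UPDATE. The id cache grows with the rows read.
    """

    def __init__(self, conn):
        self.conn = conn
        self.known_ids = {}  # (station, var) -> row id, filled while reading
        self.statements = 0  # SQL statements issued through this session

    def _exec(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params)

    def read(self, remember=False):
        rows = self._exec("SELECT id, station, var, value FROM data ORDER BY id").fetchall()
        if remember:
            for row_id, station, var, _value in rows:
                self.known_ids[(station, var)] = row_id
        return [(station, var, value) for _id, station, var, value in rows]

    def write(self, station, var, value):
        row_id = self.known_ids.get((station, var))
        if row_id is None:
            # Origin unknown: look the row up before deciding what to do.
            found = self._exec(
                "SELECT id FROM data WHERE station = ? AND var = ?", (station, var)
            ).fetchone()
            if found is None:
                self._exec(
                    "INSERT INTO data (station, var, value) VALUES (?, ?, ?)",
                    (station, var, value),
                )
                return
            row_id = found[0]
        self._exec("UPDATE data SET value = ? WHERE id = ?", (value, row_id))


def make_db():
    """Toy table with two hypothetical stations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, station TEXT, var TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO data (station, var, value) VALUES (?, ?, ?)",
        [("st1", "B12101", 238.45), ("st2", "B12101", 240.0)],
    )
    return conn


for remember in (False, True):
    session = UpdatingSession(make_db())
    for station, var, value in session.read(remember=remember):
        session.write(station, var, value + 1.0)
    print(remember, session.statements)  # False 5, then True 3
```

With two rows, remembering the ids saves one SELECT per updated value; for a large query the saving grows linearly, and so does the size of `known_ids`.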
One option is using the query key, which already contains hints, and saying things like query=attrs,update. The query key is starting to look like the drawer where one puts all the bits of string and buttons and odd pieces of metal that don't fit elsewhere, though, so I wonder if there can be a better way.
If we can't find a better way, we can always dedicate that key to Anoia.
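A minimal sketch of how a single comma-separated value such as query=attrs,update could be split into independent flags. `parse_query_hints` and `KNOWN_HINTS` are hypothetical: the set only lists the two hints discussed in this thread, the real query key already accepts other hints, and this is not dballe's own parser.

```python
# Hypothetical, incomplete set: only the two hints discussed in this issue.
KNOWN_HINTS = {"attrs", "update"}


def parse_query_hints(value):
    """Split a comma-separated hint string like 'attrs,update' into a set.

    Surrounding whitespace and empty items are ignored; unknown hints raise
    ValueError instead of being silently dropped.
    """
    hints = {item.strip() for item in value.split(",") if item.strip()}
    unknown = hints - KNOWN_HINTS
    if unknown:
        raise ValueError("unknown query hints: " + ", ".join(sorted(unknown)))
    return hints


print(sorted(parse_query_hints("attrs,update")))  # ['attrs', 'update']
```

Rejecting unknown names is a design choice: a misspelled hint would otherwise silently disable the optimization it was meant to enable.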

@edigiacomo
Member Author

Thanks Enrico. My use case is reading a lot of data as fast as possible. Using the query key seems ok to me - but dedicating this key to Anoia conflicts with resolving the documentation issues 😄

@spanezz
Contributor

spanezz commented Jun 7, 2018

Fixed in be388a1.

Now you can specify query=attrs. It turns out that the behaviour had already been implemented to support a faster data export, and it was just a matter of exposing it in the API.

I then implemented Record.attrs in Python to access attributes. Your code can now be rewritten as:

query["query"] = "attrs"  # ask for the attributes together with the data
for rec in db.query_data(query):
    attrs = rec.attrs(rec["var"])  # no extra query per record

@edigiacomo
Member Author

Reopening the issue because, with v7.35 or v7.36, query="attrs" seems to have stopped working:

(Input file giralda.bufr.gz)

$ dbadb import --wipe-first --dsn=sqlite:db.sqlite3 giralda.bufr
$ dbadb export --dsn=sqlite:db.sqlite3 | dbamsg dump --interpreted
#0[0] generic message, rep_memo: , 44.81420,12.24842, ident: , dt: 2018-10-29T11:00:00, 2 contexts:
Level 103,2000,-,-, tr 254,0,0 1 vars:
012101 TEMPERATURE/DRY-BULB TEMPERATURE(K): 238.45
           033007 PER CENT CONFIDENCE(%): 0
Level -,-,-,-, tr -,-,-, 5 vars:
001019 LONG STATION OR SITE NAME(CCITTIA5): Giralda
001194 [SIM] Report mnemonic(CCITTIA5): icirfe
005001 LATITUDE (HIGH ACCURACY)(DEGREE): 44.81420
006001 LONGITUDE (HIGH ACCURACY)(DEGREE): 12.24842
007030 HEIGHT OF STATION GROUND ABOVE MEAN SEA LEVEL (SEE NOTE 3)(M): 3.0
#1[0] generic message, rep_memo: , 44.81420,12.24842, ident: , dt: 2018-10-29T11:30:00, 2 contexts:
Level 103,2000,-,-, tr 254,0,0 1 vars:
012101 TEMPERATURE/DRY-BULB TEMPERATURE(K): 238.45
           033007 PER CENT CONFIDENCE(%): 0
Level -,-,-,-, tr -,-,-, 5 vars:
001019 LONG STATION OR SITE NAME(CCITTIA5): Giralda
001194 [SIM] Report mnemonic(CCITTIA5): icirfe
005001 LATITUDE (HIGH ACCURACY)(DEGREE): 44.81420
006001 LONGITUDE (HIGH ACCURACY)(DEGREE): 12.24842
007030 HEIGHT OF STATION GROUND ABOVE MEAN SEA LEVEL (SEE NOTE 3)(M): 3.0

But querying from Python returns no attributes:

$ python3 <<EOF
import dballe
db = dballe.DB.connect_from_file("db.sqlite3")
for row in db.query_data(dballe.Record(lon=12.24842, var="B12101", trange=(254, 0, 0), query="attrs")):
    attrs = row.attrs(row["var"])
    print(list(attrs.keys()))
EOF
[]
[]

@edigiacomo
Member Author

@spanezz I modified the test testQueryDataAttrs in branch issue114 (commit 34b8aac) so that it now reproduces the issue (i.e. it fails).


2 participants