Upcoming changes to request.py #58

Open
cnlg-lanl opened this issue May 1, 2023 · 1 comment

cnlg-lanl commented May 1, 2023

We are planning some upcoming changes to request.py. For the time being, the existing functions will remain available (even though the examples below reuse their names), but they will emit a deprecation warning.
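As a minimal sketch of how an existing function could stay callable while warning users, assuming nothing about how pisces will actually do it: the `deprecated` decorator and `old_get_stations` below are hypothetical names used only for illustration.

```python
import functools
import warnings


def deprecated(replacement):
    """Keep a function callable but warn every time it is used."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("the new modular query functions")
def old_get_stations(stations):
    # Hypothetical stand-in for an existing request.py function.
    return sorted(stations)
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so users may need `-W default` or a `warnings.simplefilter` call to see it.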

We recently implemented wildcard support for both the ["*", "?"] and the SQL ["%", "_"] wildcard notations. All existing functions have been updated to accept wildcards in lists, tuples, and comma-separated strings ("HH*,?DF") for all parameters that take string inputs (stations, networks, channels, phases, etc.).
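To illustrate the idea (this is a sketch, not the code in request.py), the helper below normalizes a string, list, or tuple into a list of SQL LIKE patterns. The name `to_sql_patterns` is an assumption made for this example.

```python
def to_sql_patterns(value):
    """Turn a comma-separated string, list, or tuple into SQL LIKE patterns."""
    if isinstance(value, str):
        items = value.split(",")
    else:
        items = list(value)
    patterns = []
    for item in items:
        item = item.strip()
        if item:
            # "*" and "?" map onto the SQL wildcards "%" and "_";
            # patterns already written in SQL notation pass through unchanged.
            patterns.append(item.replace("*", "%").replace("?", "_"))
    return patterns


print(to_sql_patterns("HH*,?DF"))        # ['HH%', '_DF']
print(to_sql_patterns(("BH?", "EH%")))   # ['BH_', 'EH%']
```

One caveat of this simple approach: a literal underscore in a code is indistinguishable from the SQL single-character wildcard, so a real implementation may need an escape rule.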

We are also looking to implement support for StationXML and ObsPy Inventory objects (and possibly QuakeML and Catalog objects) (#18), within the constraints of CSS3.0-like databases, such as the lack of location codes. There isn't a great way to implement this without turning something like get_stations into large and unwieldy functions that don't work well for smaller queries. As such, we are looking to break these queries into more modular functions that can be combined more flexibly and can still be used to build inventory objects.

For example, right now get_stations only returns results from the site table, though you can filter those results by networks and channels. What we are looking at implementing is a series of functions such as get_stations, get_channels, get_networks, etc., as well as an additional set that can take the query output and filter it further, such as join_stations, join_channels, join_networks, etc.

Note: The function names shown here are not final and are for illustration only.

For example:

station_query = get_stations(site, stations = [<list of stations>], starttime = None, endtime = None, <other filtering options>, return_table = True, asquery = True)

which will then serve as input to something such as

filtered_query = join_channels(station_query, sitechan, joinon = site.sta==sitechan.sta, channels = [<list of channels>], starttime = None, <other options>, return_table = True, asquery = True)

which would return the results of the site and sitechan table based on the filtering options included.
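The composition pattern can be sketched with the standard-library `sqlite3` module. This is only an illustration under simplifying assumptions: the proposal itself passes SQLAlchemy tables and queries, whereas here each function returns a plain `(sql, params)` pair, the tables carry only a few columns, and the signatures are reduced to one filter each.

```python
import sqlite3


def get_stations(stations=None):
    """Return (sql, params) selecting rows from the site table."""
    sql = "SELECT sta, lat, lon FROM site"
    params = []
    if stations:
        marks = ", ".join("?" for _ in stations)
        sql += f" WHERE sta IN ({marks})"
        params.extend(stations)
    return sql, params


def join_channels(station_query, channels=None):
    """Wrap a station query as a subquery and join it to sitechan on sta."""
    inner_sql, inner_params = station_query
    sql = (
        "SELECT s.sta, s.lat, s.lon, c.chan "
        f"FROM ({inner_sql}) AS s JOIN sitechan AS c ON s.sta = c.sta"
    )
    params = list(inner_params)  # subquery placeholders come first in the SQL text
    if channels:
        marks = ", ".join("?" for _ in channels)
        sql += f" WHERE c.chan IN ({marks})"
        params.extend(channels)
    return sql, params


def make_demo_db():
    """Build a tiny in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE site (sta TEXT, lat REAL, lon REAL)")
    conn.execute("CREATE TABLE sitechan (sta TEXT, chan TEXT)")
    conn.executemany(
        "INSERT INTO site VALUES (?, ?, ?)",
        [("ANMO", 34.95, -106.46), ("TUC", 32.31, -110.78)],
    )
    conn.executemany(
        "INSERT INTO sitechan VALUES (?, ?)",
        [("ANMO", "BHZ"), ("ANMO", "LHZ"), ("TUC", "BHZ")],
    )
    return conn


conn = make_demo_db()
station_query = get_stations(stations=["ANMO"])
filtered_query = join_channels(station_query, channels=["BHZ"])
rows = conn.execute(*filtered_query).fetchall()
```

Because each function returns an unexecuted query, small lookups can stop after `get_stations`, while inventory building can keep stacking joins.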

This would be implemented for the minimum number of tables needed to build a complete inventory object: network, affiliation, site, sitechan, sensor, and instrument. This would also be true of Catalog objects.

Lastly, we would like to note that there are ways in which CSS3.0-like databases are incompatible with the StationXML (QuakeML?) formats, and these will require clearly explained workarounds. Namely, CSS3.0 tables do not support a location code, so none can be read from the database, but a location code is a minimum requirement of the StationXML format, so one will have to be filled in with a dummy value of some sort. I usually use a double underscore '__' when I make these for myself. Multiple networks can also be mapped to a single station, and we would like to implement some sort of preferred_network list, though we are still thinking through what this would look like.
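One possible reading of these two workarounds, offered only as a sketch since the design is still open: the first entry in a preference list that matches one of the station's networks wins, and the location code is always the '__' placeholder. The names `choose_network` and `channel_id` are hypothetical, and the fallback to the alphabetically first network is an assumption.

```python
DUMMY_LOCATION = "__"  # placeholder mentioned above; not a value from the database


def choose_network(networks, preferred_networks=()):
    """Pick one network code for a station affiliated with several networks."""
    for net in preferred_networks:
        if net in networks:
            return net
    # Assumed fallback: a deterministic choice when no preference matches.
    return sorted(networks)[0]


def channel_id(networks, sta, chan, preferred_networks=()):
    """Build a NET.STA.LOC.CHAN identifier with the dummy location code."""
    net = choose_network(networks, preferred_networks)
    return f"{net}.{sta}.{DUMMY_LOCATION}.{chan}"


print(channel_id({"IU", "US"}, "ANMO", "BHZ", preferred_networks=["US", "IU"]))
# US.ANMO.__.BHZ
```

The sketch assumes every station has at least one network; an empty `networks` collection would raise an `IndexError` in the fallback.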

We will begin working on this shortly, but would appreciate any feedback the community would like to provide.

@jkmacc-LANL
Member

This is great, @cnlg-lanl !

> There isn't a great way to implement this without turning something like get_stations into large and unwieldy functions that don't work well for smaller queries.

Yes, this is the big one for me. Functions that accept tons of tables and arguments are already getting unsustainable.

I like the syntax above, but it still feels like there are a lot of parameters. It's almost like our own little query language, which could become hard to follow. One idea that could hide a lot of the boilerplate (which tables to use, which joins to do) would be to write a class 😱. As much as I'd hoped to avoid it, the syntax could be quite nice to use, e.g.:

from pisces import DBClient

tables = {
    'wfdisc': 'owner.wfdisc',
    'site': 'owner.site',
    'sitechan': 'owner.sitechan',
    ...
}
db = DBClient('oracle://scott:tiger@myserver.domain:1521/mydb', **tables)
# db.Wfdisc, db.Site, ..., db.session, db.query are all attributes on the instance

# query Origin table, join with the Netmag table and include the joined columns
db.origins(region=[W, E, S, N], depth=[10, 30]).magnitudes(ml=[2, 3], include=True)

In this example, db.<table method> triggers a self._query = self.session.query(<the table targeted by the method>) under the hood and returns self, which can be chained with other .<table method> calls that append .filter(...) calls to the hidden self._query object. When you're done, you access the query directly to do what you want with it, like whatever().query.all() or something.

This hides a lot of the boring stuff, like which tables are being used (db.origins(...) is obviously targeting the Origin table, which was already supplied at the beginning), and which joins are being used (db.origins(...).magnitudes(...) is obviously joining the Origin table to the Netmag table on the orid).

There'd have to be some introspection in these .<table method> calls, so that methods can figure out if they're the first one in a chain (i.e. they're meant to be in the select clause).
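A rough, database-free sketch of that chaining behavior follows. It only records what a real implementation would pass to the session, and the class name `QueryChain`, its attributes, and the hard-coded `orid` join column are assumptions for illustration, not a proposed pisces API.

```python
class QueryChain:
    """Records a chain of table-method calls instead of building a real query."""

    def __init__(self):
        self.selected = None  # table named by the first method in the chain
        self.joins = []       # (table, join column) pairs from later methods
        self.filters = []     # (table, column, value) triples

    def _table_method(self, table, join_column, filters):
        if self.selected is None:
            # First call in the chain: this table belongs in the select clause.
            self.selected = table
        else:
            # Any later call is treated as a join onto what came before.
            self.joins.append((table, join_column))
        for column, value in filters.items():
            self.filters.append((table, column, value))
        return self  # returning self is what allows the next call to chain

    def origins(self, **filters):
        return self._table_method("origin", "orid", filters)

    def magnitudes(self, **filters):
        return self._table_method("netmag", "orid", filters)


chain = QueryChain().origins(depth=[10, 30]).magnitudes(ml=[2, 3])
```

Here the "am I first?" check is just a test of whether a table has been selected yet, which may be all the introspection the methods need.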

It's not at all clear exactly how this would work, but it is readable and it does solve some of the problems you've identified above, so I thought I'd add it to the pile.

@jkmacc-LANL jkmacc-LANL added this to the v0.5.0 milestone May 9, 2023
2 participants