Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify mi query #11

Merged
merged 27 commits into from
May 16, 2017
Merged

Simplify mi query #11

merged 27 commits into from
May 16, 2017

Conversation

pralitp
Copy link
Contributor

@pralitp pralitp commented Mar 31, 2017

A whole suite of changes. Originally intending to make querying the model interface easier by no longer having to generate temporary batch file templates writing CSV to disk and reading it back in. But feature creep quickly set in as I starting using it to do analysis.

Now queries are sent to the model interface and queries results parsed without going to disk. We did not go as far as using the rJava package mostly because I wanted to add a feature where queries can be made on a remote server in which queries are made and results transferred over HTTP requests. If a user is solely running queries on a remote sever then they do not even need to have Java installed.

Thus addScenario now takes as a first argument an S3 generic object which can be either a localDBConn or a remoteDBConn which behind the scenes runs the queries appropriately.

In addition some more "add" methods to import data into a project. The first is addSingleQuery which is similar to addScenario except a user doesn't have to bother writing a full batch query file it the only need to make a couple of queries. Also addMIBatchCSV which is mostly leveraging functionality that was already there (parse a Model Interface batch CSV results file) to add to a project file but provides a consistent interface, for instance to specify transformations. To do this cleanly I refactored some code so each of these methods are just using addQueryTable under the hood. This also resolves #7.

There are a number of other methods that are used under the hood but are exposed for flexibility. They are all explained better than they could be here in the "rgcam-usecases" vignette.

temporary XML/CSV files).  Note we still do not communicate with BaseX
directory via rjava since it is not yet apparent to me that the
additional complexity is worth the benefit.

We have two ways to communicate:
* A "localDBConn" that calls BaseX in standalone mode.
* A "remoteDBConn: that calls BaseX HTTP Server via REST api.

A user can then call runQueries with:
* Either of those two DB connection objects
* An MI query (either the XML or a query for the XML)
* list of scenario
* list of regions
And the user gets back the query results as a "long" data.frame

Still TODO:
* A lot of error checking
* Maybe unit tests
* Change around the rest of rgcam to use the new capabilities.
* Add utilities to parse a Batch Query file and run with this.
method of running batch query with the new one in addScenrio.
(especially with local DB queries).  We are now using readr to parse_csv
and dplyr to aggregate MI query results.
Begin to consolidate functionality so that addQueryTable is the backbone
for adding data to the project and is called by drivers such as:
* Running a batch query file (addScenario)
* Running a single query (newly added addSingleQuery)
* Parsing batch query CSV results (newly added addMIBatchCSV)

Note, each of the new methods for adding data uses the same paradigm as addScenario.
that this version of rgcam relies on heavily.  WARNING: this version is
using an updated version of BaseX which is not compatible with Java 1.6.
and description.  Set username/password for remoteDBConn using the
httr::authenticate instead of in the URL to avoid odd characters in the
URL problems.
Fix some bugs in handling empty scenario list.
in the DB results (such as land queries) would sometimes report the
scenario name/date multiple times for the same row of data.

Typo and doc fixes.  Add parallel package as a Suggests.
the ModelInterface directly (since `pipe` doesn't support it) and move
the flag to the localDBConn.

Fix up tests so they now run with the simplify-mi-query changes.
the issues trying to maintain this attribute.  Tests have been updated.

Fix bug that 'date' attr wasn't being added in `addQueryTable` that was
causing a test to fail.
…ueries in the

vignette examples.
Error: processing vignette 'rgcam-usecases.Rmd' failed with diagnostics:
4 simultaneous processes spawned
Execution halted
Update some documentation, warnings for consistency.
@pralitp pralitp requested a review from rplzzz March 31, 2017 19:21
@pralitp
Copy link
Contributor Author

pralitp commented Mar 31, 2017

Oops, I should bump the version number.. but to what?

@rplzzz
Copy link
Contributor

rplzzz commented Apr 4, 2017

Let's update to 0.4.0 for now. We're getting close to being able to go to 1.0, but I'd like to leave us some flexibility to rejigger the interface once we get some feedback from live users. Once we've had a chance to incorporate their feedback, we can lock that in.

@rplzzz
Copy link
Contributor

rplzzz commented Apr 4, 2017

Hopefully I'll be able to review and merge this week. I've got a couple of other things ahead in the queue.

@codecov
Copy link

codecov bot commented Apr 5, 2017

Codecov Report

Merging #11 into master will increase coverage by 6.71%.
The diff coverage is 92.3%.

@@            Coverage Diff             @@
##           master      #11      +/-   ##
==========================================
+ Coverage   85.41%   92.13%   +6.71%     
==========================================
  Files           5        5              
  Lines         384      445      +61     
==========================================
+ Hits          328      410      +82     
+ Misses         56       35      -21
Impacted Files Coverage Δ
R/importmisc.R 92.06% <100%> (+7.58%) ⬆️
R/querymi.R 85.93% <85.93%> (ø)
R/manageProjectData.R 91.61% <94.23%> (+4.2%) ⬆️
R/importgcam.R 94.21% <95.45%> (+5.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56c932b...1505d8e. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 5, 2017

Codecov Report

Merging #11 into master will increase coverage by 6.71%.
The diff coverage is 92.3%.

@@            Coverage Diff             @@
##           master      #11      +/-   ##
==========================================
+ Coverage   85.41%   92.13%   +6.71%     
==========================================
  Files           5        5              
  Lines         384      445      +61     
==========================================
+ Hits          328      410      +82     
+ Misses         56       35      -21
Impacted Files Coverage Δ
R/importmisc.R 92.06% <100%> (+7.58%) ⬆️
R/querymi.R 85.93% <85.93%> (ø)
R/manageProjectData.R 91.61% <94.23%> (+4.2%) ⬆️
R/importgcam.R 94.21% <95.45%> (+5.09%) ⬆️

@rplzzz
Copy link
Contributor

rplzzz commented May 4, 2017

I've been slacking on this one. I'll get to it today or tomorrow.

@rplzzz
Copy link
Contributor

rplzzz commented May 15, 2017

@pralitp Is it necessary to close a db connection opened with localDBConn or remoteDBConn? If so, then we should say so in the doc files for those functions. If not, then how are the resources allocated by those functions (especially the network socket allocated by the latter)? Does it get taken care of when the GC runs?

Also, when I test the package I get a system alert indicating that R is asking for a firewall exception. I'm guessing it's related to opening up a remote connection, for testing. Is this expected? Is there any way we can avoid this sort of alert? When you run a shiny app it starts a server on the loopback interface, and it doesn't trigger an alert, so it ought to be possible.

@pralitp
Copy link
Contributor Author

pralitp commented May 15, 2017

Is it necessary to close a db connection opened with localDBConn or remoteDBConn? If so, then we should say so in the doc files for those functions. If not, then how are the resources allocated by those functions (especially the network socket allocated by the latter)? Does it get taken care of when the GC runs?

No, it is not necessary to close them. At create these objects only track the information they need to run queries on a database. Any sort of connection is open and closed each time a query is run (pipe1 when using localDBConn and an HTTP connection when using remoteDBConn.

Also, when I test the package I get a system alert indicating that R is asking for a firewall exception. I'm guessing it's related to opening up a remote connection, for testing. Is this expected? Is there any way we can avoid this sort of alert? When you run a shiny app it starts a server on the loopback interface, and it doesn't trigger an alert, so it ought to be possible.

No it is not expected and I don't recall seeing that. I have not written any tests for querying a remote DB server since I didn't know how to configure and launch such a server reliably from the tests (would travis allow such a thing?). There is one exception where I have a test to ensure failure when opening a remoteDBConn to a server that is not running. But I wouldn't think you would have to open a port to make an outgoing HTTP request. Note I'm using the httr package for making these requests. Maybe they are doing something funny under the hood?

@rplzzz
Copy link
Contributor

rplzzz commented May 15, 2017

I'll investigate further and see what I can come up with. If you deny the firewall request when it comes up it doesn't recur, and there doesn't seem to be any impact on the package functionality. So, I'm pretty sure it will be an easy fix once we locate the source of the problem.

@rplzzz
Copy link
Contributor

rplzzz commented May 15, 2017

Made a few documentation updates and whatnot. I'll take a quick stab at tracking down the firewall alert tomorrow, but if it's too much of a pain to locate, we'll just live with it for now.

@rplzzz
Copy link
Contributor

rplzzz commented May 16, 2017

It's definitely the makeCluster() command in the vignette. I'd have thought that switching to a fork() cluster would cure the problem, but apparently even then the master process tries to open a socket for listening. At this point, I'm just going to switch the code to not-run so as to avoid the alert.

Copy link
Contributor

@rplzzz rplzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go. We can merge as soon as the checks finish.

@rplzzz rplzzz merged commit 1d24c55 into master May 16, 2017
@rplzzz rplzzz deleted the simplify-mi-query branch May 16, 2017 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow addScenario() to accept a vector of names of scenarios to add.
2 participants