Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 2307 v1 #2314

Merged
merged 17 commits into from Nov 20, 2019
Merged

Fix 2307 v1 #2314

merged 17 commits into from Nov 20, 2019

Conversation

nathandunn
Copy link
Contributor

@nathandunn nathandunn commented Nov 7, 2019

fixes #2307

Replaces my bad merge in #2310

  • get tests to pass by uploading new data
  • delete old sequences
  • add new sequence if not there
  • change length of sequence if not there

@nathandunn nathandunn added this to In progress in 2.5.0 via automation Nov 7, 2019
@nathandunn
Copy link
Contributor Author

@abretaud I'll continue to take a look, but feel free to push here, as well.

@nathandunn
Copy link
Contributor Author

nathandunn commented Nov 10, 2019

@abretaud I added a change to allow sequence to be writeable, which was causing errors in some of my changes.

I guess the thing I'm a little unsure of whether you want to allow deletion / update of Sequence / Scaffold objects in the code, or just the addition of them. If you allow deletion, you have to delete all of the annotations associated with them (or explicitly check and disallow them). If you allow updates, then this code should work. My intuition is that we simply don't allow alteration of existing sequences, just addition of new scaffolds. From the original problem description, it sounded like you only wanted to be able to add them.

I think the current issue now is that some of the tests fail (4) because the organism is created with a genomic FASTA sequence and then updated for a refSeq.json. Because the original keeps the FASTA sequence it gets confused, so I think it ends up creating a second sequence.

Anyway, my recommendation:

@abretaud
Copy link
Collaborator

@abretaud rewrite tests to do updates for genomic data only or refSeq.json data only

Hum, it should be fasta for all test data I'm using, it's the dataset_x_files dirs in https://github.com/galaxy-genome-annotation/python-apollo/tree/2c916d295776c6a57dd015464dd7b1b30eb79b7f/test-data
I can have a look at it tomorrow maybe

@abretaud decides if he wants to allow removals, or updates of sequences, or simply additions only (I am voting for the latter)

  • additions -> yes, we need it for sure

  • removals -> yes, I don't want to keep sequences that were removed in the genome sequence. So I guess removing annotations/alterations on it would make sense

  • updates -> I know it sounds scary, but there are valid cases where you want to do it (e.g. you created an organism, you realise there is an error in one scaffold sequence that no one has worked on => you just want to update the sequence, and forget about the old one). I guess it's ok to delete annotations/alterations too as they will probably not make sense if the sequence changed.

There is always the no_reload_sequences param when you want to be sure not to lose anything

So it would look like this:

foreach(old seqs):
    if the sequence is no longer ni the new fasta:
        delete the sequence + annotations/alterations

foreach(new seq):
    if the sequence is already in db:
        if the length is the same
            keep it
        else:
            update it
            delete annotations/alterations
    else:
        add the new sequence to the db

@abretaud
Copy link
Collaborator

Some progress:

  • added an optional env var to the docker image to ease debugging
  • 2 tests were failing just because I forgot to set the option I was testing (damn!)
  • Still 2 failing tests in remote mode, when expecting apollo to reload the sequences, I get this in the logs:
2019-11-14 18:56:06,353 [http-nio-8080-exec-6] DEBUG apollo.OrganismController  - Updating organism info null
Genome FASTa seq/genome.fasta
is a genome fasta
done is a genome fasta
should be laoding a data file here null , /home/abretaud/git/python-apollo/apollo_shared_dir/org2, /home/abretaud/git/python-apollo/apollo_shared_dir/org2 , false

It seems like organismDataFile is null there while it should not: do you see a reason why? Something in the handling of uploaded file (I think it is sent properly by python-apollo)?

@nathandunn
Copy link
Contributor Author

@abretaud I think the problem is that things are being posted potentially wrong.

I have a piece of code here:

if (!request instanceof ShiroHttpServletRequest) {
organismDataFile = request.getFile(FeatureStringEnum.ORGANISM_DATA.value)
}

That checks for a regular shiro request instead of a form post type:

If I remove the check I get:

No signature of method: org.apache.shiro.web.servlet.ShiroHttpServletRequest.getFile()

Note that I don't have that check in other parts of my code, but I think this method gets used in other ways, so its probably still applicable?

Maybe the autoconvert_to_json flag being passed in on the post is getting ignored so its not setting the header to ?
https://github.com/galaxy-genome-annotation/python-apollo/blob/master/apollo/remote/__init__.py#L170

The flag should probably be multipart/form-data? I'm not sure . . it looks like addOrganismWithSequence works just fine. I'm wondering if somehow its getting passed differently somehow.

@nathandunn
Copy link
Contributor Author

@abretaud / @erasche The default communication I use (at least on the GWT-side) is:

        builder.setHeader("Content-type", "application/x-www-form-urlencoded");
        builder.setHeader("Accept", "application/json");

I noticed you don't actually seem to test the addOrganismWithSequence as far as I can tell, so that piece of code doesn't get tested.

I added this just to confirm my suspicions:

diff --git a/grails-app/controllers/org/bbop/apollo/OrganismController.groovy b/grails-app/controllers/org/bbop/apollo/OrganismController.groovy
index 9759777bb..155450754 100644
--- a/grails-app/controllers/org/bbop/apollo/OrganismController.groovy
+++ b/grails-app/controllers/org/bbop/apollo/OrganismController.groovy
@@ -254,6 +254,14 @@ class OrganismController {
   def addOrganismWithSequence() {
 
 
+    if (!request instanceof ShiroHttpServletRequest) {
+      println "ADD ORG request is not a shiro one ${request}"
+//      organismDataFile = request.getFile(FeatureStringEnum.ORGANISM_DATA.value)
+    }
+    else{
+      println "ADD ORG IS A shiro request ${request}"
+    }
+
     JSONObject returnObject = new JSONObject()
     JSONObject requestObject = permissionService.handleInput(request, params)
     log.info "Adding organism with SEQUENCE ${requestObject as String}"
@@ -1304,8 +1312,13 @@ class OrganismController {
 
         CommonsMultipartFile organismDataFile = null
         if (!request instanceof ShiroHttpServletRequest) {
+          println "request is not a shiro one ${request}"
           organismDataFile = request.getFile(FeatureStringEnum.ORGANISM_DATA.value)
         }
+        else{
+          println "IS A shiro request ${request}"
+        }
+//        CommonsMultipartFile organismDataFile = request.getFile(FeatureStringEnum.ORGANISM_DATA.value)
         String foundBlatdb = null
         if (organismDataFile) {
           File archiveFile = new File(organismDataFile.getOriginalFilename())

@abretaud
Copy link
Collaborator

To be sure I sniffed the http traffic with wireshark: addOrganismWithSequence and updateOrganismInfo posted content look very similar, and both have the multipart/form-data content type (added automatically by python requests module)
So I guess the problem is probably on the grails/Apollo side

@nathandunn
Copy link
Contributor Author

nathandunn commented Nov 18, 2019 via email

@abretaud
Copy link
Collaborator

Yep they do!

A wireshark screenshot when adding an organism:
addorg_screenshot

And when updating an organism:
update_screenshot

@nathandunn
Copy link
Contributor Author

That is strange . . okay . . I'll take a look. Probably tomorrow for me, though. This is very helpful.

@nathandunn
Copy link
Contributor Author

@abretaud So in some cases form data is being sent over and in some cases it is not.

So for example here:

ERROR: test_update_organism (test.organism_test.OrganismTest)

root: INFO: curl http://localhost:8080/apollo/organism/updateOrganismInfo -H 'Content-Type: application/json' -d '"{\"username\": \"admin@local.host\", \"password\": \"password\", \"name\": \"temp_org\", \"species\": \"updatedspecies\", \"publicMode\": false, \"noReloadSequences\": false, \"directory\": \"/Users/nathandunn/repositories/python-apollo/apollo_shared_dir/org1\", \"blatdb\": \"/Users/nathandunn/repositories/python-apollo/apollo_shared_dir/org1/seq/genome.2bit\", \"genus\": \"updatedgenus\", \"id\": 1000}"'
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): localhost:8080
urllib3.connectionpool: DEBUG: http://localhost:8080 "POST /apollo/organism/updateOrganismInfo HTTP/1.1" 200 None

and here:

ERROR: test_update_organism_changedseq (test.organism_test.OrganismTest)
root: INFO: curl http://localhost:8080/apollo/organism/updateOrganismInfo -H 'Content-Type: application/json' -d '"{\"username\": \"admin@local.host\", \"password\": \"password\", \"name\": \"temp_org\", \"species\": \"updatedspecies\", \"publicMode\": false, \"noReloadSequences\": false, \"directory\": \"/Users/nathandunn/repositories/python-apollo/apollo_shared_dir/org_update_changedseq\", \"blatdb\": \"/Users/nathandunn/repositories/python-apollo/apollo_shared_dir/org1/seq/genome.2bit\", \"genus\": \"updatedgenus\", \"id\": 1004}"'
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): localhost:8080
urllib3.connectionpool: DEBUG: http://localhost:8080 "POST /apollo/organism/updateOrganismInfo HTTP/1.1" 200 None

But in some cases it does get sent through as multi-part form data.

So, I think there are two methods for calling updateOrganism as well. One in the organism/__init__.py file and one int the remote/__init__.py file. Sometimes the response should be posted as JSON (no multipart data) and the rest of the time it should be fine without.

So, my guess is that its calling the wrong thing in the wrong place. I'll have to explore it.

@nathandunn
Copy link
Contributor Author

@abretaud The problem was that the updateOrganismInfo in some instances call was updating a common directory to multiple organisms as per design. However, doesn't make for a great test. I rewrote that part. There was also a part with the changing the decoding check, but I'm not sure if it actually did anything (it didn't break anything).

@nathandunn
Copy link
Contributor Author

@abretaud I am going to merge this in for now in order to simplify the testing, but we can add more for sure

@nathandunn nathandunn merged commit 5c36e8d into develop Nov 20, 2019
2.5.0 automation moved this from In progress to Done Nov 20, 2019
@cmdcolin cmdcolin deleted the fix-2307-v1 branch December 7, 2023 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
2.5.0
  
Done
Development

Successfully merging this pull request may close these issues.

Update organism + reloading sequences (reloading only SOME sequences)
2 participants