Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update analysis job objects #208

Merged
merged 6 commits into from
Aug 13, 2014

Conversation

squirrelo
Copy link
Contributor

This adds some functionality to job objects that is needed for the commands selection page. Exists now properly checks the data used for the job, and delete now works for jobs.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.61%) when pulling c6eead7 on squirrelo:update-analysis-job-objects into 1ffd66a on biocore:master.

@squirrelo
Copy link
Contributor Author

It's been a flood of pull requests, but this one is also ready for review.

@@ -92,6 +95,8 @@ def exists(cls, datatype, command, options):
The name of the command run on the data
options : dict
Options for the command in the format {option: value}
analysis : Analysis object
Analysis the job will be added to if it doesn't exist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not clear, could you fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@squirrelo
Copy link
Contributor Author

I also need opinions on one thing: Currently exists is a boolean. This means that if a job exists we can tell it exists, but not which one specifically. This becomes problematic with the create function, as we have no way currently of getting and returning that already existing job. Should create find the existing job and return it, or raise an error like it currently does in the commented out code? And if not, I'm not sure how to handle getting an existing job in this case.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.72%) when pulling 0cd7af8 on squirrelo:update-analysis-job-objects into 1ffd66a on biocore:master.

sql, (datatype_id, command_id, opts_json))
if not analyses:
return False
analyses = [x[0] for x in analyses]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this, and the following if statement seem odd. Why not just do this in a single list-comp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, fixed that but haven't put in the pull since the above discussion.

analyses.remove(analysis.id)

# check data used to create jobid
sql = ("SELECT processed_data_id, array_agg(sample_id ORDER BY "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably incorrect, but isn't this query and the subsequent forloop nearly equivalent to:

SELECT 1 
FROM (SELECT processed_data_id, array_agg(sample_id ORDER BY sample_id) as samples
      FROM qiita.analysis_sample
      WHERE analysis_id = %s
      GROUP BY processed_data_id)
WHERE samples=foo;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, as the current statement actually creates the dictionary structure in postgres, so the samples are multiple lists tied to their processed data ids. I can't think of a way to do the actual comparison in SQL itself because of that. If we were dealing with a single aggregated list of samples that would work though.

@coveralls
Copy link

Coverage Status

Coverage increased (+2.55%) when pulling 14abe79 on squirrelo:update-analysis-job-objects into 1ffd66a on biocore:master.

@squirrelo
Copy link
Contributor Author

I've simplified down the sample equality checking in exists, so it's ready for review again.

"GROUP BY processed_data_id")
samples = dict(conn_handler.execute_fetchall(sql, [analysis.id]))
# check passed analyses' samples dict against all found analyses
for aid in analyses:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make it a while loop vs a for/break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I need to loop through all the ids in analyses and it's easier to break that for loop than find a way to do it in a while loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @squirrelo on this one, the current structure seems fine to me

@adamrp
Copy link
Contributor

adamrp commented Aug 8, 2014

Just that question about the classmethod decorator on delete, but other than that it's good to merge! 👍 Thanks, @squirrelo!

@adamrp
Copy link
Contributor

adamrp commented Aug 8, 2014

Oh, I missed your comment from a day ago:

I also need opinions on one thing: Currently exists is a boolean. This means that if a job exists we can tell it exists, but not which one specifically. This becomes problematic with the create function, as we have no way currently of getting and returning that already existing job. Should create find the existing job and return it, or raise an error like it currently does in the commented out code? And if not, I'm not sure how to handle getting an existing job in this case.

Maybe you can add a parameter like return_job_if_exists to the create method, which defaults to False?

@squirrelo
Copy link
Contributor Author

Yeah, the problem is we then need to search again for the actual job. I suppose this could be alleviated by making exists set up as

exists(**thecurrentinfo, return_job=False)

and just have returning the job as part of exists, saving a lot of extra lookup time. Combining the two of those would then work to "create" an existing job really easily.

@adamrp
Copy link
Contributor

adamrp commented Aug 8, 2014

That seems reasonable
On Aug 8, 2014 3:42 PM, "Joshua Shorenstein" notifications@github.com
wrote:

Yeah, the problem is we then need to search again for hte acutal job. I
suppose this could be aleviated by making exists set up as

exists(**thecurrentinfo, return_job=False)

and just have returning the job as part of exists, saving a lot of extra
lookup time.


Reply to this email directly or view it on GitHub
#208 (comment).

@coveralls
Copy link

Coverage Status

Coverage increased (+2.86%) when pulling 8fff67b on squirrelo:update-analysis-job-objects into 1ffd66a on biocore:master.

@squirrelo
Copy link
Contributor Author

This is ready for more review.

@teravest
Copy link
Contributor

👍 looks good to me

@antgonza
Copy link
Member

Thanks, merging now.

antgonza added a commit that referenced this pull request Aug 13, 2014
@antgonza antgonza merged commit 8dba489 into qiita-spots:master Aug 13, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants