Optimize secondary task recollection #2800
Signed-off-by: Andrew Brain <andrewbrain2019@gmail.com>
augur/application/db/lib.py
@@ -1,7 +1,7 @@
import sqlalchemy as s
[pylint] reported by reviewdog 🐶
C0114: Missing module docstring (missing-module-docstring)
@@ -1,7 +1,7 @@
from celery import chain
[pylint] reported by reviewdog 🐶
C0114: Missing module docstring (missing-module-docstring)
@@ -1,7 +1,7 @@
from celery import chain
import logging

-def machine_learning_phase(repo_git):
+def machine_learning_phase(repo_git, full_collection):
from augur.tasks.data_analysis.clustering_worker.tasks import clustering_task
[pylint] reported by reviewdog 🐶
C0415: Import outside toplevel (augur.tasks.data_analysis.clustering_worker.tasks.clustering_task) (import-outside-toplevel)
@@ -1,7 +1,7 @@
from celery import chain
import logging

-def machine_learning_phase(repo_git):
+def machine_learning_phase(repo_git, full_collection):
from augur.tasks.data_analysis.clustering_worker.tasks import clustering_task
from augur.tasks.data_analysis.discourse_analysis.tasks import discourse_analysis_task
[pylint] reported by reviewdog 🐶
C0415: Import outside toplevel (augur.tasks.data_analysis.discourse_analysis.tasks.discourse_analysis_task) (import-outside-toplevel)
@@ -1,7 +1,7 @@
from celery import chain
import logging

-def machine_learning_phase(repo_git):
+def machine_learning_phase(repo_git, full_collection):
from augur.tasks.data_analysis.clustering_worker.tasks import clustering_task
from augur.tasks.data_analysis.discourse_analysis.tasks import discourse_analysis_task
from augur.tasks.data_analysis.insight_worker.tasks import insight_task
[pylint] reported by reviewdog 🐶
C0415: Import outside toplevel (augur.tasks.data_analysis.insight_worker.tasks.insight_task) (import-outside-toplevel)
@@ -166,7 +166,7 @@ def build_primary_repo_collect_request(session,enabled_phase_names, days_until_c
primary_gitlab_enabled_phases.append(primary_repo_collect_phase_gitlab)

#task success is scheduled no matter what the config says.
-def core_task_success_util_gen(repo_git):
+def core_task_success_util_gen(repo_git, full_collection):
[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)
@@ -186,7 +186,7 @@ def build_secondary_repo_collect_request(session,enabled_phase_names, days_until

secondary_enabled_phases.append(secondary_repo_collect_phase)

-def secondary_task_success_util_gen(repo_git):
+def secondary_task_success_util_gen(repo_git, full_collection):
[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)
@@ -202,12 +202,12 @@ def build_facade_repo_collect_request(session,enabled_phase_names, days_until_co

facade_enabled_phases.append(facade_phase)

-def facade_task_success_util_gen(repo_git):
+def facade_task_success_util_gen(repo_git, full_collection):
[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)
return facade_task_success_util.si(repo_git)

facade_enabled_phases.append(facade_task_success_util_gen)

-def facade_task_update_weight_util_gen(repo_git):
+def facade_task_update_weight_util_gen(repo_git, full_collection):
[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)
@@ -222,7 +222,7 @@ def build_ml_repo_collect_request(session,enabled_phase_names, days_until_collec

ml_enabled_phases.append(machine_learning_phase)

-def ml_task_success_util_gen(repo_git):
+def ml_task_success_util_gen(repo_git, full_collection):
[pylint] reported by reviewdog 🐶
W0613: Unused argument 'full_collection' (unused-argument)
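The W0613 warnings above all come from the design choice stated in the description: AugurTaskRoutine.start_data_collection passes the same arguments to every phase, so phases that ignore full_collection still have to accept it. The sketch below illustrates that dispatch pattern under simplifying assumptions: `start_data_collection`, `example_collect_phase`, and `example_success_util_gen` are hypothetical stand-ins, and plain strings take the place of Celery signatures. Prefixing an intentionally unused parameter with an underscore matches pylint's default `ignored-argument-names` pattern, which suppresses W0613.

```python
from typing import Callable, List, Tuple

# Hypothetical phase signature: every phase receives (repo_git, full_collection).
Phase = Callable[[str, bool], str]


def example_collect_phase(repo_git: str, full_collection: bool) -> str:
    # This phase uses the flag to choose between full and incremental recollection.
    mode = "full" if full_collection else "incremental"
    return f"collect:{repo_git}:{mode}"


def example_success_util_gen(repo_git: str, _full_collection: bool) -> str:
    # Accepted only to keep the signature uniform; the leading underscore
    # matches pylint's default ignored-argument-names, so W0613 is not raised.
    return f"success:{repo_git}"


def start_data_collection(repos: List[Tuple[str, bool]], phases: List[Phase]) -> List[str]:
    """Pass the same (repo_git, full_collection) pair to every phase."""
    results = []
    for repo_git, full_collection in repos:
        for phase in phases:
            results.append(phase(repo_git, full_collection))
    return results


repos = [("https://github.com/org/a", True), ("https://github.com/org/b", False)]
print(start_data_collection(repos, [example_collect_phase, example_success_util_gen]))
```

Renaming only works if callers pass the flag positionally. If any caller uses `full_collection=...` as a keyword, keep the name and add `# pylint: disable=unused-argument` to that function instead.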
@ABrain7710: What's the status of this one with regard to testing? Should we merge it now that we have done a release?
This is complete, but I haven't had a chance to test it yet. It is a somewhat difficult test. It is a very small change, though, so if it were reviewed thoroughly it could probably be merged.
Signed-off-by: Andrew Brain <andrewbrain2019@gmail.com>
@@ -53,6 +73,7 @@ def pull_request_commits_model(repo,logger, key_auth):
logger.info(f"{task_name}: Inserting {len(all_data)} rows")
pr_commits_natural_keys = ["pull_request_id", "repo_id", "pr_cmt_sha"]
bulk_insert_dicts(logger, all_data,PullRequestCommit,pr_commits_natural_keys)
[pylint] reported by reviewdog 🐶
E0602: Undefined variable 'bulk_insert_dicts' (undefined-variable)
@@ -328,7 +326,7 @@ def collect_pull_request_review_comments(repo_git: str) -> None:


@celery.task(base=AugurSecondaryRepoCollectionTask)
-def collect_pull_request_reviews(repo_git: str) -> None:
+def collect_pull_request_reviews(repo_git: str, full_collection: bool) -> None:
[pylint] reported by reviewdog 🐶
R0912: Too many branches (16/12) (too-many-branches)
@@ -328,7 +326,7 @@ def collect_pull_request_review_comments(repo_git: str) -> None:


@celery.task(base=AugurSecondaryRepoCollectionTask)
-def collect_pull_request_reviews(repo_git: str) -> None:
+def collect_pull_request_reviews(repo_git: str, full_collection: bool) -> None:
[pylint] reported by reviewdog 🐶
R0915: Too many statements (54/50) (too-many-statements)
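Because full_collection adds another decision to collect_pull_request_reviews, one way to keep the branch and statement counts down is to move the choice of which PRs to fetch into a small helper. The sketch below works under stated assumptions: `pr_numbers_to_collect` is a hypothetical name, PRs are plain dicts with hypothetical `number` and timezone-aware `updated_at` keys, and the caller supplies the last-collected timestamp. In the PR, that timestamp would come from get_secondary_data_last_collected, whose real signature is not shown here. A full collection keeps every PR. Otherwise, only PRs updated after the timestamp are kept.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def pr_numbers_to_collect(
    prs: List[Dict],
    full_collection: bool,
    last_collected: Optional[datetime],
) -> List[int]:
    """Return the PR numbers a secondary task should (re)collect.

    Hypothetical helper: each item in `prs` is assumed to carry `number`
    and a timezone-aware `updated_at` datetime.
    """
    # Assumption: a full collection, or a repo never collected before, takes every PR.
    if full_collection or last_collected is None:
        return [pr["number"] for pr in prs]
    # Otherwise keep only PRs updated after the previous collection.
    return [pr["number"] for pr in prs if pr["updated_at"] > last_collected]


prs = [
    {"number": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"number": 2, "updated_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]
last = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(pr_numbers_to_collect(prs, False, last))  # [2]
print(pr_numbers_to_collect(prs, True, last))   # [1, 2]
```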
Signed-off-by: Andrew Brain <andrewbrain2019@gmail.com>
@@ -2,18 +2,18 @@
from augur.tasks.github.pull_requests.commits_model.core import *
[pylint] reported by reviewdog 🐶
W0401: Wildcard import augur.tasks.github.pull_requests.commits_model.core (wildcard-import)
element.decode('utf-8').replace('\x00', ' ') if isinstance(element, bytes) else element
for element in page_data
]
logger.info(f"NUL characters were found in PR Reviews and replaced with spaces.")
[pylint] reported by reviewdog 🐶
W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)
logger.info(f"NUL characters were found in PR Reviews and replaced with spaces.")
elif isinstance(page_data, bytes):
page_data = page_data.decode('utf-8').replace('\x00', ' ')
logger.info(f"NUL characters were found in PR Reviews and replaced with spaces.")
[pylint] reported by reviewdog 🐶
W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)
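W1309 is resolved by dropping the `f` prefix, because the message has no placeholders. The sketch below restates the NUL-handling branches shown above as a standalone helper. The name `replace_nul_characters` is hypothetical, and the `isinstance(page_data, list)` test for the first branch is an assumption, because the original condition is not visible. Like the original snippet, it decodes and cleans only `bytes` values, passes other elements through unchanged, and logs whenever a branch runs.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def replace_nul_characters(page_data: Any) -> Any:
    """Decode bytes and replace NUL characters with spaces (hypothetical helper)."""
    if isinstance(page_data, list):
        page_data = [
            element.decode("utf-8").replace("\x00", " ") if isinstance(element, bytes) else element
            for element in page_data
        ]
        # No placeholders, so a plain string literal avoids W1309.
        logger.info("NUL characters were found in PR Reviews and replaced with spaces.")
    elif isinstance(page_data, bytes):
        page_data = page_data.decode("utf-8").replace("\x00", " ")
        logger.info("NUL characters were found in PR Reviews and replaced with spaces.")
    return page_data


print(replace_nul_characters([b"a\x00b", {"id": 1}]))  # ['a b', {'id': 1}]
print(replace_nul_characters(b"x\x00y"))                # x y
```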
contributors = []
for pull_request_id in all_pr_reviews.keys():
contributors = []
for pull_request_id in all_pr_reviews.keys():
[pylint] reported by reviewdog 🐶
C0206: Consider iterating with .items() (consider-using-dict-items)
contributors = []
for pull_request_id in all_pr_reviews.keys():
contributors = []
for pull_request_id in all_pr_reviews.keys():
[pylint] reported by reviewdog 🐶
C0201: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)
pr_reviews = []
for pull_request_id in all_pr_reviews.keys():
pr_reviews = []
for pull_request_id in all_pr_reviews.keys():
[pylint] reported by reviewdog 🐶
C0206: Consider iterating with .items() (consider-using-dict-items)
pr_reviews = []
for pull_request_id in all_pr_reviews.keys():
pr_reviews = []
for pull_request_id in all_pr_reviews.keys():
[pylint] reported by reviewdog 🐶
C0201: Consider iterating the dictionary directly instead of calling .keys() (consider-iterating-dictionary)
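C0206 and C0201 both target `for pull_request_id in all_pr_reviews.keys():`. The loop bodies are not shown, so the sketch below makes an assumption about the data: `all_pr_reviews` maps a pull-request ID to a list of review dicts with hypothetical `id` and `user` entries. Iterating with `.items()` yields the key and value together, which satisfies both checks and removes the `all_pr_reviews[pull_request_id]` lookup inside the loop.

```python
from typing import Dict, List

# Hypothetical shape: pull_request_id -> list of review dicts from the API.
all_pr_reviews: Dict[int, List[dict]] = {
    101: [{"id": 1, "user": {"login": "alice"}}],
    102: [{"id": 2, "user": {"login": "bob"}}, {"id": 3, "user": None}],
}

contributors = []
pr_reviews = []
# .items() yields (key, value) pairs, so there is no .keys() call
# and no second dictionary lookup inside the loop.
for pull_request_id, reviews in all_pr_reviews.items():
    for review in reviews:
        if review["user"]:
            contributors.append(review["user"])
        pr_reviews.append({"pull_request_id": pull_request_id, "review_id": review["id"]})

print(len(contributors), len(pr_reviews))  # 2 3
```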
pr_reviews = []
pr_reviews_generator = GithubPaginator(pr_review_url, manifest.key_auth, logger)
for page_data, page in pr_reviews_generator.iter_pages():
[pylint] reported by reviewdog 🐶
W0612: Unused variable 'page' (unused-variable)
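W0612 flags `page` because only `page_data` is used in the loop. Binding the second value to `_` makes it clear that the value is intentionally ignored. `GithubPaginator.iter_pages()` is not reproduced here. The hypothetical `iter_pages` generator below only mimics the `(page_data, page)` pairs that the loop above unpacks.

```python
from typing import Iterator, List, Tuple


def iter_pages(pages: List[List[dict]]) -> Iterator[Tuple[List[dict], int]]:
    """Hypothetical stand-in for a paginator yielding (page_data, page_number)."""
    for page_number, page_data in enumerate(pages, start=1):
        yield page_data, page_number


pr_reviews = []
# `_` marks the page number as intentionally unused, which silences W0612.
for page_data, _ in iter_pages([[{"id": 1}, {"id": 2}], [{"id": 3}]]):
    pr_reviews.extend(page_data)

print([review["id"] for review in pr_reviews])  # [1, 2, 3]
```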
Ready for testing!
Description
Overall
- Optimize pull_request_files, pull_request_commits, and pull_request_reviews when they are recollected
Implementation Details
- full_collection boolean flag so that start_data_collection knows what to pass
- AugurTaskRoutine.start_data_collection: iterate through the tuples and pass the repo_git and full_collection flag to each phase
- full_collection flag added to every phase's method arguments. This was necessary because AugurTaskRoutine.start_data_collection passes the same arguments to every phase
- get_secondary_data_last_collected, and get PRs that have been updated since a date
- pull_request_files: only get PR numbers for updated PRs if the full_collection flag is false
- pull_request_commits: only get PR URLs for updated PRs if the full_collection flag is false
- pull_request_reviews: only get PR numbers for updated PRs if the full_collection flag is false
Notes for Reviewers
I have not tested this yet; I will convert it from a draft PR to a regular PR once it has been tested.
Signed commits