Test traffic Merge Into Dev: Request for Feedback #2220
* Add files via upload Frontend Files to work locally
* Json file to work on frontend locally
* Update README.md
* Changing
* add repo count to gropus, maybe sorting too Signed-off-by: Henryufa <henrywbahr@gmail.com>
* remove comment Signed-off-by: Henryufa <henrywbahr@gmail.com>
* added changes to lock file back Signed-off-by: Henryufa <henrywbahr@gmail.com>
* I think this gets groups sorting by number of repo
* added schema updates Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* added logic for task Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* corrected db class Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* added sequence for clone_id Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* changed schema field names, updated task logic Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* added logic for parsing data Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* final corrections to schema Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* added task to the queue Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* added schema migration script Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
* changed version file formatting Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

Signed-off-by: Henryufa <henrywbahr@gmail.com>
Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
Co-authored-by: CadenHicks <cadenhicks@gmail.com>
Co-authored-by: Henryufa <44609877+Henryufa@users.noreply.github.com>
Co-authored-by: Henryufa <henrywbahr@gmail.com>
Co-authored-by: Benjamin Williams <112727169+benwilliams95@users.noreply.github.com>
Co-authored-by: Sean P. Goggins <outdoors@acm.org>
Co-authored-by: Sean P. Goggins <s@goggins.com>
Note that the change to start_tasks.py may need to be refactored to be consistent with the way we are managing jobs now:

```python
with DatabaseSession(logger) as session:
    query = session.query(Repo)
    repos = execute_session_query(query, 'all')
    #Just use list comprehension for simple group
    repo_info_tasks = [collect_repo_info.si(repo.repo_git) for repo in repos]

    for repo in repos:
        first_tasks_repo = group(collect_issues.si(repo.repo_git),collect_pull_requests.si(repo.repo_git),collect_github_repo_clones_data.si(repo.repo_git))
        second_tasks_repo = group(collect_events.si(repo.repo_git), collect_github_messages.si(repo.repo_git),process_pull_request_files.si(repo.repo_git), process_pull_request_commits.si(repo.repo_git))
        repo_chain = chain(first_tasks_repo,second_tasks_repo)
        issue_dependent_tasks.append(repo_chain)

    repo_task_group = group(
        *repo_info_tasks,
        chain(group(*issue_dependent_tasks),process_contributors.si()),
        generate_facade_chain(logger),
        collect_releases.si()
    )

    chain(repo_task_group, refresh_materialized_views.si()).apply_async()
```
@@ -466,8 +466,22 @@ def extract_needed_contributor_data(contributor, tool_source, tool_version, data
    return contributor

def extract_needed_clone_history_data(clone_history_data:List[dict], repo_id:int):
@IsaacMilarky / @ABrain7710 : What needs fixing here?
This function LGTM unless it's throwing errors.
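For readers without the diff open, here is a minimal sketch of what a function with this signature might do. It is not the PR's implementation. It assumes each input dict has the `timestamp`, `count`, and `uniques` keys that GitHub's repository traffic (clones) endpoint returns, and the output column names (`clone_data_timestamp`, `count_clones`, `unique_clones`) are hypothetical.

```python
from typing import List


def extract_clone_rows_sketch(clone_history_data: List[dict], repo_id: int) -> List[dict]:
    """Hypothetical stand-in for extract_needed_clone_history_data.

    Maps one GitHub traffic entry per day to one row dict.
    The output column names are assumptions, not the PR's schema.
    """
    rows = []
    for entry in clone_history_data:
        rows.append({
            "repo_id": repo_id,
            "clone_data_timestamp": entry["timestamp"],
            "count_clones": entry["count"],
            "unique_clones": entry["uniques"],
        })
    return rows


if __name__ == "__main__":
    sample = [{"timestamp": "2023-03-01T00:00:00Z", "count": 4, "uniques": 3}]
    print(extract_clone_rows_sketch(sample, repo_id=42))
```

A missing key raises `KeyError` here, which would surface malformed API responses early rather than inserting partial rows.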
@@ -3358,3 +3358,31 @@ class PullRequestReviewMessageRef(Base):
    msg = relationship("Message")
    pr_review = relationship("PullRequestReview")
    repo = relationship("Repo")


class RepoClone(Base):
@ABrain7710 / @IsaacMilarky : Is this the right way?
There should be a unique constraint on the repo_id if you plan on using postgres 'on conflict' inserts.
I did this, and then I realized this is like "releases", or repo_info... we want to hold the historical record for the repos. There should not be any conflicts since the primary key is an autoincrement @IsaacMilarky
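The two positions in this thread can be compared side by side. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres (SQLite's upsert syntax is modeled on Postgres `ON CONFLICT`, and needs SQLite 3.24 or newer). The table and column names are hypothetical, not the PR's schema. An append-only table keyed by an autoincrement ID keeps every collection run, while an `ON CONFLICT (repo_id)` upsert needs a unique constraint on `repo_id` and keeps only the latest row per repo.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Historical record: surrogate autoincrement key, repo_id may repeat.
    conn.execute("""
        CREATE TABLE clone_history (
            clone_id INTEGER PRIMARY KEY AUTOINCREMENT,
            repo_id INTEGER NOT NULL,
            count_clones INTEGER NOT NULL
        )""")
    # Upsert target: the UNIQUE constraint is what ON CONFLICT matches against.
    conn.execute("""
        CREATE TABLE clone_latest (
            clone_id INTEGER PRIMARY KEY AUTOINCREMENT,
            repo_id INTEGER NOT NULL UNIQUE,
            count_clones INTEGER NOT NULL
        )""")
    return conn


def record_history(conn: sqlite3.Connection, repo_id: int, count: int) -> None:
    # Plain insert: never conflicts, so every run adds a row.
    conn.execute(
        "INSERT INTO clone_history (repo_id, count_clones) VALUES (?, ?)",
        (repo_id, count),
    )


def upsert_latest(conn: sqlite3.Connection, repo_id: int, count: int) -> None:
    # Upsert: a second insert for the same repo_id overwrites the first.
    conn.execute(
        "INSERT INTO clone_latest (repo_id, count_clones) VALUES (?, ?) "
        "ON CONFLICT (repo_id) DO UPDATE SET count_clones = excluded.count_clones",
        (repo_id, count),
    )


if __name__ == "__main__":
    conn = build_demo()
    for count in (5, 7):
        record_history(conn, 1, count)
        upsert_latest(conn, 1, count)
    print(conn.execute("SELECT repo_id, count_clones FROM clone_history").fetchall())
    print(conn.execute("SELECT repo_id, count_clones FROM clone_latest").fetchall())
```

If the goal is a historical record like releases or repo_info, the first table's shape applies and a unique constraint on `repo_id` alone would block the second day's insert. If the insert path uses `ON CONFLICT`, the conflict target has to match a unique index or constraint, possibly a composite one such as repo plus timestamp.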
@@ -2777,6 +2777,35 @@ CREATE TABLE augur_data.working_commits (

ALTER TABLE augur_data.working_commits OWNER TO augur;
@ABrain7710 / @IsaacMilarky : Is this the right way to do this?
The proper way to do this is with Alembic, which you already did. I would not do it this way.
@@ -0,0 +1,7 @@
{
I think this is fine because to run our frontend, we do still need this file.
@ABrain7710 / @IsaacMilarky : I left several comments. On the frontend related files, I am simply going to run them and see what happens.
Some changes should be made. The section from the merge needs to be removed as it is old, the new table is missing a unique constraint for the on conflict logic, the Alembic migration looks like it does a bit more than it should, and the table creation should not be done through augur_full.sql. Otherwise, this looks good for the most part.
augur/application/schema/alembic/versions/12_traffic_additions.py
Updating Test-Traffic with `dev`
I think changes are required on the foreign key before this is merged.
Add a unique constraint on repo_id in the clones table.
…recreating logic that I don't understand the point of.
I think this is pretty close.
Updating from Dev
@IsaacMilarky : I think I addressed this point by merging dev into this branch: #2220 (comment)
@IsaacMilarky : I might have missed something. Getting this error at runtime... looking into it:
Added collect_github_repo_clones_data.si(repo.repo_git) to new job flow logic.
There is one section that looked to me like it might be using an older strategy for adding the traffic gathering stats. It's this one:
@IsaacMilarky / @ABrain7710 : Let me know if my hunch about that is right. I also believe there's a database change that will be substantially out of order and needs a new name.