Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test traffic Merge Into Dev: Request for Feedback #2220

Merged
merged 33 commits into from Mar 19, 2023
Merged

Test traffic Merge Into Dev: Request for Feedback #2220

merged 33 commits into from Mar 19, 2023

Conversation

sgoggins
Copy link
Member

@sgoggins sgoggins commented Mar 7, 2023

There is one section that looked to me like it might be using an older strategy for adding the traffic gathering stats. Its this one:

### Section from traffic metric merge that may need to be changed

    with DatabaseSession(logger) as session:
        query = session.query(Repo)
        repos = execute_session_query(query, 'all')
        #Just use list comprehension for simple group
        repo_info_tasks = [collect_repo_info.si(repo.repo_git) for repo in repos]

        for repo in repos:
            first_tasks_repo = group(collect_issues.si(repo.repo_git),collect_pull_requests.si(repo.repo_git),collect_github_repo_clones_data.si(repo.repo_git))
            second_tasks_repo = group(collect_events.si(repo.repo_git),
                collect_github_messages.si(repo.repo_git),process_pull_request_files.si(repo.repo_git), process_pull_request_commits.si(repo.repo_git))

            repo_chain = chain(first_tasks_repo,second_tasks_repo)
            issue_dependent_tasks.append(repo_chain)

        repo_task_group = group(
            *repo_info_tasks,
            chain(group(*issue_dependent_tasks),process_contributors.si()),
            generate_facade_chain(logger),
            collect_releases.si()
        )
    
    chain(repo_task_group, refresh_materialized_views.si()).apply_async()

#### End of section from traffic metric merge that may need to be changed

@IsaacMilarky / @ABrain7710 : Let me know if my hunch about that is right. I also believe there's a database change that will be substantially out of order and needs a new name.

meetagrawal09 and others added 2 commits January 21, 2023 08:20
* Add files via upload

Frontend Files to work locally

* Json file to work on frontend locally

* Update README.md

* Changing

* add repo count to gropus, maybe sorting too

Signed-off-by: Henryufa <henrywbahr@gmail.com>

* remove comment

Signed-off-by: Henryufa <henrywbahr@gmail.com>

* added changes to lock file back

Signed-off-by: Henryufa <henrywbahr@gmail.com>

* I think this gets groups sorting by number of repo

* added schema updates

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* added logic for task

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* corrected db class

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* added sequence for clone_id

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* changed schema field names, updated task logic

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* added logic for parsing data

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* final corrections to schema

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* added task to the queue

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* added schema migration script

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

* changed version file formatting

Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>

Signed-off-by: Henryufa <henrywbahr@gmail.com>
Signed-off-by: meetagrawal09 <agrawalmeet91@gmail.com>
Co-authored-by: CadenHicks <cadenhicks@gmail.com>
Co-authored-by: Henryufa <44609877+Henryufa@users.noreply.github.com>
Co-authored-by: Henryufa <henrywbahr@gmail.com>
Co-authored-by: Benjamin Williams <112727169+benwilliams95@users.noreply.github.com>
Co-authored-by: Sean P. Goggins <outdoors@acm.org>
Co-authored-by: Sean P. Goggins <s@goggins.com>
Note that the change to start_tasks.py may need to be refactored to be consistent with the way we are managing jobs now:
```

    with DatabaseSession(logger) as session:
        query = session.query(Repo)
        repos = execute_session_query(query, 'all')
        #Just use list comprehension for simple group
        repo_info_tasks = [collect_repo_info.si(repo.repo_git) for repo in repos]

        for repo in repos:
            first_tasks_repo = group(collect_issues.si(repo.repo_git),collect_pull_requests.si(repo.repo_git),collect_github_repo_clones_data.si(repo.repo_git))
            second_tasks_repo = group(collect_events.si(repo.repo_git),
                collect_github_messages.si(repo.repo_git),process_pull_request_files.si(repo.repo_git), process_pull_request_commits.si(repo.repo_git))

            repo_chain = chain(first_tasks_repo,second_tasks_repo)
            issue_dependent_tasks.append(repo_chain)

        repo_task_group = group(
            *repo_info_tasks,
            chain(group(*issue_dependent_tasks),process_contributors.si()),
            generate_facade_chain(logger),
            collect_releases.si()
        )

    chain(repo_task_group, refresh_materialized_views.si()).apply_async()

```
@sgoggins sgoggins marked this pull request as draft March 7, 2023 21:09
@@ -466,8 +466,22 @@ def extract_needed_contributor_data(contributor, tool_source, tool_version, data

return contributor

def extract_needed_clone_history_data(clone_history_data:List[dict], repo_id:int):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IsaacMilarky / @ABrain7710 : What needs fixing here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function LGTM unless it's throwing errors.

@@ -3358,3 +3358,31 @@ class PullRequestReviewMessageRef(Base):
msg = relationship("Message")
pr_review = relationship("PullRequestReview")
repo = relationship("Repo")


class RepoClone(Base):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ABrain7710 / @IsaacMilarky : Is this the right way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a unique constraint on the repo_id if you plan on using postgres 'on conflict' inserts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this, and then I realized this is like "releases", or repo_info... we want to hold the historical record for the repos. There should not be any conflicts since the primary key is an autoincrement @IsaacMilarky

@@ -2777,6 +2777,35 @@ CREATE TABLE augur_data.working_commits (

ALTER TABLE augur_data.working_commits OWNER TO augur;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ABrain7710 / @IsaacMilarky : Is this the right way to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper way to do this is with alembic which you did already. I would not do it this way.

@@ -0,0 +1,7 @@
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine because to run our frontend, we do still need this file.

Copy link
Member Author

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ABrain7710 / @IsaacMilarky : I left several comments. On the frontend related files, I am simply going to run them and see what happens.

Copy link
Contributor

@IsaacMilarky IsaacMilarky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes should be made. The section from the merge needs to be removed as it is old, the new table is missing a unique constraint for the on conflict logic, The alembic migration looks like it does a bit more than it should, and the table creation should not be done through augur_full.sql. Otherwise looks good for the most part

@@ -3358,3 +3358,31 @@ class PullRequestReviewMessageRef(Base):
msg = relationship("Message")
pr_review = relationship("PullRequestReview")
repo = relationship("Repo")


class RepoClone(Base):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a unique constraint on the repo_id if you plan on using postgres 'on conflict' inserts.

@@ -466,8 +466,22 @@ def extract_needed_contributor_data(contributor, tool_source, tool_version, data

return contributor

def extract_needed_clone_history_data(clone_history_data:List[dict], repo_id:int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function LGTM unless it's throwing errors.

@@ -2777,6 +2777,35 @@ CREATE TABLE augur_data.working_commits (

ALTER TABLE augur_data.working_commits OWNER TO augur;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper way to do this is with alembic which you did already. I would not do it this way.

augur/tasks/init/celery_app.py Show resolved Hide resolved
augur/tasks/start_tasks.py Outdated Show resolved Hide resolved
augur/tasks/start_tasks.py Outdated Show resolved Hide resolved
frontend/src/views/RepoGroups.vue Show resolved Hide resolved
Updating Test-Traffic with `dev`
Copy link
Member Author

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changes are required on the foreign key before this is merged.

Copy link
Member Author

@sgoggins sgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty close.

@sgoggins sgoggins marked this pull request as ready for review March 19, 2023 19:48
@sgoggins
Copy link
Member Author

@IsaacMilarky : I think I addressed this point by merging dev into this branch: #2220 (comment)

@sgoggins
Copy link
Member Author

@IsaacMilarky : I might have missed something. Getting this error at runtime... looking into it:

[2023-03-19 14:54:45,961: ERROR/ForkPoolWorker-1] Task augur.tasks.start_tasks.augur_collection_monitor[4a790a8d-e655-42d6-b599-337d5f4abc36] raised unexpected: NameError("name 'generate_facade_chain' is not defined")
Traceback (most recent call last):
  File "/home/sean/github/virtualenv/berkeley/lib/python3.8/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/sean/github/virtualenv/berkeley/lib/python3.8/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/sean/github/berkeley/augur/tasks/start_tasks.py", line 364, in augur_collection_monitor
    start_primary_collection(session, max_repo=50, days=30)
  File "/home/sean/github/berkeley/augur/tasks/start_tasks.py", line 231, in start_primary_collection
    primary_augur_collection.start_data_collection()
  File "/home/sean/github/berkeley/augur/tasks/util/collection_util.py", line 372, in start_data_collection
    for repo_git, task_id in self.send_messages():
  File "/home/sean/github/berkeley/augur/tasks/util/collection_util.py", line 387, in send_messages
    augur_collection_sequence.append(job(repo_git))
  File "/home/sean/github/berkeley/augur/tasks/start_tasks.py", line 96, in primary_repo_collect_phase
    generate_facade_chain(logger),
NameError: name 'generate_facade_chain' is not defined

@sgoggins sgoggins marked this pull request as draft March 19, 2023 20:01
@sgoggins sgoggins merged commit 83f0c5c into dev Mar 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants