Create individual plans to discuss #16

Closed
abelsiqueira opened this issue Jan 27, 2022 · 6 comments

@abelsiqueira
Contributor

Spawned from standup.

Each one creates a plan (answering the issues) and we discuss approaches in the meeting.

@abelsiqueira
Contributor Author

We can disagree with the issues or come up with our own issues.

@abelsiqueira
Contributor Author

cc @fdiblen @jspaaks

@fdiblen
Member

fdiblen commented Jan 27, 2022

My proposal (WIP)

  1. Decide which repositories we want to target
    We already have requirements for filtering
    Start with NLeSC repositories, then move outside of NLeSC
  2. Decide at what level we want to do automation
    I think
    • whatever tool we develop should only filter the repositories based on our requirements
    • it should not create GH issues or PRs
    • would be nice, but not required:
      • if it can provide some statistics
      • if it is re-usable
  3. A tool for automatic filtering
    • Check for existing solutions. Is the GitHub interface good enough?
      If we really need to build something new:
      • It should be simple and reusable
      • preferably a shell script or Python script
      • keeping statistics (via cron job) for NLeSC may be useful:
        • how many repos with CITATION.cff?
        • how many repos are using our GA?
  4. Agree on the text for the GH issue and use the same text
  5. Agree on the text for the PR and use the same text
  6. Create some test repositories to test our filtering method
  7. Decide which NLeSC repositories we will start with, create separate issues for bookkeeping, and assign ourselves to the issues
  8. Decide how everyone will follow the interactions, how/who will react to questions etc.
  9. Decide what repos are interesting outside of NLeSC and work on them
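
The statistics idea in step 3 could be sketched roughly as follows. This is only an illustration, assuming each repository is already described by a dictionary with the hypothetical boolean fields `has_citation_cff` and `uses_validator_action`; neither the field names nor the CSV layout come from the discussion.

```python
import csv
from datetime import date
from pathlib import Path


def summarize(repos):
    """Count repositories per criterion (field names are hypothetical)."""
    return {
        "total": len(repos),
        "with_citation_cff": sum(1 for r in repos if r.get("has_citation_cff")),
        "with_validator_action": sum(1 for r in repos if r.get("uses_validator_action")),
    }


def append_stats_row(csv_path, stats, today=None):
    """Append one dated row to a CSV file, writing the header on first use."""
    path = Path(csv_path)
    row = {"date": (today or date.today()).isoformat(), **stats}
    write_header = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

A cron job could call `append_stats_row("stats.csv", summarize(repos))` once per day to build up a simple time series.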

@abelsiqueira
Contributor Author

abelsiqueira commented Jan 27, 2022

Abel's Plan

Overview:

  • Download meta information in whatever way we can, already filtering on CITATION.cff.
  • Create automated filters
    • based on meta information;
    • based on files and their contents (might involve cloning).
  • Evaluate the number and quality of the results and reassess plan.
  • Find ~5 people we know who will be affected and ask what they think of the plan.
  • Evaluate and reassess.
  • Create sorting criteria.
  • Create issue template.
  • Create PR template.
  • for case in ["5 people above", "NLeSC RSEs", "Filtered"]
    • Send automated issues.
    • If the user replies with [send me a pr], we send an automated PR.
    • Handle other cases manually.
    • Evaluate and reassess.
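
As a rough sketch of the reply handling in the loop above: the function below only classifies a reply to decide the next step. The action names (`wait`, `send_pr`, `handle_manually`) are hypothetical labels, and actually sending issues or PRs is left out.

```python
import re

# Matches the agreed marker "[send me a pr]", ignoring case and inner spacing.
PR_REQUEST = re.compile(r"\[\s*send me a pr\s*\]", re.IGNORECASE)


def next_action(reply_body):
    """Map a reply on our issue to one of three hypothetical actions."""
    if reply_body is None or not reply_body.strip():
        return "wait"  # no reply yet
    if PR_REQUEST.search(reply_body):
        return "send_pr"  # user opted in to an automated PR
    return "handle_manually"  # anything else needs a human
```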

Details

  • #12: decide on language

    Julia with GitHub.jl. I like Julia and I have used GitHub.jl before. It can access cffconvert as a Python package, and as a shell command. Cons: first time usage for colleagues. GitHub.jl doesn't cover search.

  • #13: decide on what to automate

    See #16

  • #14: Write the text of the issue

  • #15: decide on input/output for the filters

    I think it depends on the size of the initial body of work, but I would prefer to have an auxiliary function to read and write to file, and read and return internal objects (arrays of dictionaries?) for speed.

  • Split into filtering function for reusability and script

    Many of the filters are useful for other CFF bot applications, such as

    • Finding invalid CITATION.cff
    • Updating the action version
    • Analytics
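
To make the input/output idea from #15 and the filter/script split concrete, here is a minimal sketch. It is written in Python rather than the Julia proposed above, purely for illustration. It assumes each repository is a dictionary with a hypothetical `files` key listing repo-relative paths; filters take and return lists of such dictionaries, and file access lives in separate helpers.

```python
import json
from pathlib import Path, PurePosixPath


def read_repos(path):
    """Load a list of repository records from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def write_repos(path, repos):
    """Write a list of repository records to a JSON file."""
    Path(path).write_text(json.dumps(repos, indent=2), encoding="utf-8")


def has_citation_cff(repos):
    """Keep repos that contain a CITATION.cff anywhere in the tree."""
    return [
        r for r in repos
        if any(PurePosixPath(f).name == "CITATION.cff" for f in r.get("files", []))
    ]


def citation_cff_in_root(repos):
    """Keep repos whose CITATION.cff sits in the repository root."""
    return [r for r in repos if "CITATION.cff" in r.get("files", [])]


def apply_filters(repos, filters):
    """Run the filters in order, each on the output of the previous one."""
    for keep in filters:
        repos = keep(repos)
    return repos
```

A script would then be a thin wrapper such as `write_repos("out.json", apply_filters(read_repos("in.json"), [has_citation_cff, citation_cff_in_root]))`, while other CFF bot applications can reuse the individual filter functions directly.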

Filters

Agree

  • #2: Add functionality to filter based on presence of a CITATION.cff file
  • #3: Add functionality to filter based on CITATION.cff occurring in the root of the repo
  • #4: Add functionality to filter on valid CITATION.cff
  • #5: Add functionality to filter on whether the repo already uses workflows
  • #6: Add functionality to filter based on whether the repo already uses Pull Requests
  • #9: Add functionality to filter based on whether the repo already uses a validator
  • #11: Add functionality to filter based on whitelist of repos

Comment

  • #7: Add functionality to filter based on whether the repo accepts external PRs

    It would be ideal, but can it be done automatically?

  • #8: Add functionality to filter based on whether we already sent a similar PR previously

    I would not worry at first.

  • #10: Add functionality to filter based on whether the repo's CITATION.cff has seen multiple updates

    Could be a sorting criterion instead of a filter.

Sorting

This is a multicriteria situation, so we need to handle it with care, instead of simply piping sorting functions.

  • Number of stars
  • Number of contributors
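
One possible way to treat the two criteria jointly, rather than sorting by one and then the other, is to group repositories into Pareto layers. This is only a sketch of that idea, not something agreed in the thread; the `stars` and `contributors` keys are assumed field names.

```python
def dominates(a, b, keys):
    """True if a is at least as good as b on every key and better on one."""
    return all(a[k] >= b[k] for k in keys) and any(a[k] > b[k] for k in keys)


def pareto_layers(repos, keys=("stars", "contributors")):
    """Split repos into successive fronts of mutually non-dominated entries."""
    remaining = list(repos)
    layers = []
    while remaining:
        # A repo belongs to the current front if nothing left dominates it.
        front = [
            r for r in remaining
            if not any(dominates(other, r, keys) for other in remaining)
        ]
        layers.append(front)
        remaining = [r for r in remaining if not any(r is f for f in front)]
    return layers
```

Repositories in the first layer are not beaten on both criteria by any other repository; how to order entries within a layer remains a separate decision.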

@abelsiqueira
Contributor Author

We decided on the following plan:

  • initial set of URLs: curl https://research-software.nl/api/software_cache | jq '.[] | .repositoryURLs.github[]' --raw-output > urls.txt
  • pure filtering in node
  • verify that the repo has a CITATION.cff in the root
  • results in a list of candidate repos; inspect the list by hand
  • iterate the steps, include additional filters
  • once we're happy with the curated list of repos, manually create issues said repos
