LIU-383: Initial work towards linter workflows #266

Merged: 4 commits from LIU-383 into master on Jul 2, 2024
Conversation

myxie (Collaborator) commented Jun 13, 2024

Issue

This PR partially addresses LIU-383: currently we run pylint as part of our CI workflows, which is good practice; however, we also disable all checkers, so the value of running that workflow is questionable.

Pylint messages come in several severity levels, and it is rare that we would want every check enabled for every file across such a large codebase. Nevertheless, some form of linting would be valuable moving forward, both to improve and to maintain the quality of the codebase.

Solution

This PR starts our move towards linting the whole codebase by enabling (only) the Error level of messages. This should address the most significant issues currently in the codebase and prevent new ones from being introduced in the future.

A few messages have been disabled globally because they are false positives (e.g. when interfacing with a third-party API, or when the class MRO confuses the checker). I have also disabled some checks for particular files containing Python version checks that Pylint does not take into account.

I will put PR comments on non-trivial fixes to provide reasoning behind the decisions I have made.

Moving forward

I think we should start working towards enabling Warnings in pylint, but do this for specific modules (or even sub-modules). I would start with daliuge-translator, as it has fewer warnings than daliuge-engine and is the less robust piece of code. Pylint lets us specify the pylintrc file, so there is scope for having separate pylint settings for the different modules while we move towards the 'final' settings.

Summary by Sourcery

This pull request introduces initial work towards enabling linter workflows by configuring pylint to check for error-level messages in the CI pipeline. It includes various code refactorings to improve code quality, consistency, and readability across multiple files. Additionally, it addresses false positives in pylint checks by adding appropriate disable comments.

  • Enhancements:
    • Enabled pylint error-level messages in CI workflows to improve code quality.
    • Refactored test cases in daliuge-engine/test/test_drop.py to remove redundant try-finally blocks.
    • Removed unused properties and methods in daliuge-translator/dlg/dropmake/lg_node.py.
    • Improved formatting and consistency in daliuge-engine/dlg/apps/app_base.py.
    • Fixed string formatting issues in daliuge-engine/dlg/graph_loader.py and daliuge-engine/dlg/data/drops/s3_drop.py.
    • Updated type annotations in various files to use List and DefaultDict from the typing module.
    • Added pylint disable comments for false positives in multiple test files.
    • Refactored daliuge-engine/dlg/deploy/helm_client.py to remove unused variables.
    • Improved error handling and logging in daliuge-engine/dlg/data/drops/s3_drop.py.
    • Updated method signatures and fixed minor issues in daliuge-engine/dlg/deploy/create_dlg_job.py and daliuge-engine/dlg/lifecycle/dlm.py.
    • Refactored daliuge-engine/dlg/manager/node_manager.py to use Optional for type hinting.
    • Fixed minor issues and improved code readability in various test files.
  • CI:
    • Updated GitHub Actions workflow to run pylint with error-level checks and set a minimum score threshold.

Initial work to get our CI providing Error reports when running the pylint workflow.

sourcery-ai bot commented Jun 13, 2024

Reviewer's Guide by Sourcery

This pull request (PR) addresses the issue LIU-383 by enabling the 'Error' level of pylint messages in the CI workflows. The changes include fixing various pylint errors across multiple files, updating the linting configuration, and adding or modifying code to comply with pylint's error-level checks.

File-Level Changes

Files and changes:

  • Fixed pylint errors and added pylint disable comments where necessary in test files:
    daliuge-engine/test/test_drop.py
    daliuge-engine/test/manager/test_smm.py
    daliuge-engine/test/test_shared_memory.py
    daliuge-engine/test/apps/test_simple.py
    daliuge-engine/test/memoryUsage.py
    daliuge-engine/test/test_input_fired_app_drop.py
    daliuge-engine/test/test_S3Drop.py
  • Fixed pylint errors and improved code quality in various engine and deploy files:
    daliuge-engine/dlg/apps/app_base.py
    daliuge-engine/dlg/graph_loader.py
    daliuge-engine/dlg/deploy/helm_client.py
    daliuge-engine/dlg/drop.py
    daliuge-engine/dlg/event.py
    daliuge-engine/dlg/data/io.py
    daliuge-engine/dlg/data/drops/s3_drop.py
    daliuge-engine/dlg/droputils.py
    daliuge-engine/dlg/deploy/configs/__init__.py
    daliuge-engine/dlg/deploy/create_dlg_job.py
    daliuge-engine/dlg/lifecycle/dlm.py
    daliuge-engine/dlg/manager/node_manager.py
    daliuge-engine/dlg/named_port_utils.py
    daliuge-engine/dlg/deploy/deployment_utils.py
  • Fixed pylint errors and improved code quality in dropmake files:
    daliuge-translator/dlg/dropmake/lg_node.py
    daliuge-translator/dlg/dropmake/pgtp.py
    daliuge-translator/dlg/dropmake/scheduler.py
    daliuge-translator/dlg/dropmake/lg.py
  • Fixed pylint errors and improved code quality in common files:
    daliuge-common/dlg/common/__init__.py
    daliuge-common/dlg/common/reproducibility/reproducibility.py
    daliuge-common/dlg/restutils.py
  • Updated the linting configuration to include the --fail-under=9 and --fail-on=E options:
    .github/workflows/linting.yml


sourcery-ai bot left a comment

Hey @myxie - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • String interpolation issue (link)
Here's what I looked at during the review
  • 🔴 General issues: 1 blocking issue, 18 other issues
  • 🟢 Security: all looks good
  • 🟡 Testing: 4 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Resolved review threads:
  • daliuge-engine/dlg/graph_loader.py (outdated)
  • daliuge-engine/dlg/graph_loader.py
  • daliuge-engine/dlg/deploy/helm_client.py
  • daliuge-engine/dlg/drop.py

Comment on lines -431 to -432
-    else:
-        param = default_value

question: Removed else block

The else block that assigns default_value to param has been removed. Ensure that this change does not affect the logic where param should be assigned default_value if no other conditions are met.

myxie (Collaborator, Author) commented:

We don't need this anymore now that we initialise the param variable with the default and then modify it based on the conditionals. This protects us from a potential undefined-variable (unbound local) error if none of the conditions match.
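A minimal sketch of the pattern being described (the names here are illustrative, not the actual DALiuGE code):

# Risky: param is only assigned inside the branches, so if no condition
# matches, the final use of param raises an UnboundLocalError.
def resolve_param_risky(value, default_value):
    if isinstance(value, str):
        param = value.strip()
    elif isinstance(value, int):
        param = str(value)
    return param

# Safe: initialise param with the default first and only override it when
# a condition matches; the else branch becomes redundant and param is
# always bound.
def resolve_param_safe(value, default_value):
    param = default_value
    if isinstance(value, str):
        param = value.strip()
    elif isinstance(value, int):
        param = str(value)
    return param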

Comment on lines 401 to 404
     raise Exception(
-        "%r: More effective inputs (%d) than inputs (%d)"
+        "%r: More effective inputs (%s) than inputs (%d)"
         % (self, self.n_effective_inputs, n_inputs)
     )

issue (code-quality): Raise a specific error instead of the general Exception or BaseException (raise-specific-error)

Explanation: If a piece of code raises a specific exception type rather than the generic [`BaseException`](https://docs.python.org/3/library/exceptions.html#BaseException) or [`Exception`](https://docs.python.org/3/library/exceptions.html#Exception), the calling code can:
  • get more information about what type of error it is
  • define specific exception handling for it

This way, callers of the code can handle the error appropriately.

How can you solve this?

So instead of having code raising Exception or BaseException like

if incorrect_input(value):
    raise Exception("The input is incorrect")

you can have code raising a specific error like

if incorrect_input(value):
    raise ValueError("The input is incorrect")

or

class IncorrectInputError(Exception):
    pass


if incorrect_input(value):
    raise IncorrectInputError("The input is incorrect")

@@ -133,7 +133,7 @@ def setCompleted(self):

     @property
     def dataURL(self) -> str:
-        return "ngas://%s:%d/%s" % (self.ngasSrv, self.ngasPort, self.fileId)
+        return "ngas://%s:%s/%s" % (self.ngasSrv, self.ngasPort, self.fileId)

suggestion (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)

Suggested change
return "ngas://%s:%s/%s" % (self.ngasSrv, self.ngasPort, self.fileId)
return f"ngas://{self.ngasSrv}:{self.ngasPort}/{self.fileId}"

myxie (Collaborator, Author) commented:

This can be taken care of another day, when we address Pylint's warnings.

Comment on lines 495 to 499
return "<%s oid=%s, uid=%s>" % (
self.__class__.__name__,
self.oid,
self.uid,
"self.oid",
"self.uid",
)

suggestion (code-quality): We've found these issues:

Suggested change
return "<%s oid=%s, uid=%s>" % (
self.__class__.__name__,
self.oid,
self.uid,
"self.oid",
"self.uid",
)
return f"<{self.__class__.__name__} oid=self.oid, uid=self.uid>"

myxie (Collaborator, Author) commented:

This will be addressed at some point in the future as we transition to removing all the Pylint warnings.

Comment on lines 312 to 315
roots: List[AbstractDROP] = []
for drop in drops.values():
    if not droputils.getUpstreamObjects(drop):
        roots.append(drop)

suggestion (code-quality): Convert for loop into list comprehension (list-comprehension)

Suggested change
-roots: List[AbstractDROP] = []
-for drop in drops.values():
-    if not droputils.getUpstreamObjects(drop):
-        roots.append(drop)
+roots: List[AbstractDROP] = [
+    drop
+    for drop in drops.values()
+    if not droputils.getUpstreamObjects(drop)
+]

myxie (Collaborator, Author) commented:

This is great.

@@ -73,7 +74,7 @@
"""
block_a = DlgSharedMemory("A")
data = pickle.dumps(3)
block_a.buf[0 : len(data)] = data
block_a.buf[0: len(data)] = data

suggestion (code-quality): Replace a[0:x] with a[:x] and a[x:len(a)] with a[x:] (remove-redundant-slice-index)

Suggested change
-block_a.buf[0: len(data)] = data
+block_a.buf[:len(data)] = data

coveralls commented

Coverage Status: coverage 79.544% (+0.08%) from 79.469% when pulling b4340ab on LIU-383 into db91b75 on master.

@@ -139,7 +139,7 @@ def __exit__(self, typ, value, traceback):

     def _get_json(self, url):
         ret = self._GET(url)
-        return json.load(ret) if ret else None
+        return json.load(ret) if ret else {}
myxie (Collaborator, Author) commented:

I have updated this to return a dictionary because callers of _get_json were not checking for None, which means we could hit a TypeError ('NoneType' object is not iterable) at runtime.
Returning an empty dictionary here is more 'Pythonic': the caller can iterate over it regardless of how many elements it contains (i.e. duck typing).
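A small illustration of the failure mode this avoids (hypothetical caller code, not the actual call site):

import json

def get_json_old(ret):
    # Old behaviour: may return None when there is no response body.
    return json.load(ret) if ret else None

def get_json_new(ret):
    # New behaviour: always returns something iterable.
    return json.load(ret) if ret else {}

for key in get_json_new(None):   # fine: iterating an empty dict yields nothing
    print(key)

# for key in get_json_old(None): # would raise TypeError: 'NoneType' object is not iterable
#     print(key)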

@@ -383,8 +383,8 @@ def submit_and_monitor_pgt(self):
"""
Combines submission and monitoring steps of a pgt.
"""
session_id = self.submit_pgt()
monitoring_thread = self._monitor(session_id)
self.submit_pgt()
myxie (Collaborator, Author) commented:

self.submit_pgt never returned a session_id (and, from what I can tell, deriving one would be a bit of work), so we have effectively been passing None to self._monitor all this time. It is cleaner to just remove this.
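A minimal sketch of why the old code always passed None (the Deployer class is a simplified stand-in, not the real create_dlg_job code):

class Deployer:
    def submit_pgt(self):
        # Submits the physical graph template but has no return statement,
        # so the call expression evaluates to None.
        print("submitting pgt")

    def _monitor(self, session_id):
        print(f"monitoring session: {session_id}")

d = Deployer()
session_id = d.submit_pgt()   # session_id is always None
d._monitor(session_id)        # prints "monitoring session: None"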

@@ -5,6 +5,8 @@
 To run it standalone, change the directories, which are now hardcoded
 """

+# pylint: skip-file

myxie (Collaborator, Author) commented:

I am skipping this file completely because it uses mstransform, a CASA method that was at some point available in this code (although I'm not sure how or where it ever ran). This file is broken, unrunnable code, but I don't want to remove it from the codebase as I do not know its use case. A decision can be made either now or in a future PR about what to do with it.

@@ -100,52 +100,6 @@ def __init__(self, jd, group_q, done_dict, ssid):
     # def __str__(self):
     #     return json.dumps(self.jd)

-    @property
myxie (Collaborator, Author) commented:

These are being removed because there were duplicate definitions further down. Since Python executes the class body top to bottom, the later definitions rebind the same names, so the later definitions are the ones we have been using anyway.
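A tiny illustration of that behaviour (an illustrative class, not the actual lg_node.py code; this is the kind of thing pylint's function-redefined / method-already-defined error catches):

class Example:
    @property
    def weight(self):
        # first definition
        return 1

    @property
    def weight(self):
        # second definition rebinds the name 'weight' in the class body,
        # so this is the one callers actually get
        return 2

print(Example().weight)   # prints 2: only the later definition survives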


tw = 1
sz = 1
dst = "outputs"
myxie (Collaborator, Author) commented:

I don't know if "outputs"/"inputs" are what we want to be defaults here, but it is important to have defaults. An alternative may simply be empty strings.

@@ -513,7 +513,7 @@ def __init__(self, drop_list, max_dop=8, dag=None):
         else:
             self._dag = dag
         self._max_dop = max_dop
-        self._parts = None # partitions
+        self._parts = [] # partitions
myxie (Collaborator, Author) commented:

This is similar to the changes above, where None was being returned but then treated like an iterable. Initialising to an empty list makes it clear across the board that we only ever expect a list (which may be empty), rather than having to check for None everywhere self._parts is used.
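The same duck-typing argument as for _get_json above, sketched with a hypothetical class (not the actual scheduler code):

class Scheduler:
    def __init__(self):
        self._parts = []   # always a list, possibly empty, never None

    def partition_names(self):
        # No None check needed: iterating an empty list simply yields nothing.
        return [str(p) for p in self._parts]

print(Scheduler().partition_names())   # prints []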

coveralls commented

Coverage Status: coverage 79.677% (+0.2%) from 79.469% when pulling 4eec538 on LIU-383 into db91b75 on master.

myxie (Collaborator, Author) commented Jun 14, 2024

Hi @awicenec,

Apologies that there are quite a few changes here; I thought it would be worthwhile to start this off so that at least our pipelines are doing something. The PR description sets out a (minimal) plan of action for future, incremental work; I'd be curious to hear your thoughts.

This is lower priority so I don't expect to get a review before you're back from your travels.

@@ -22,4 +22,4 @@ jobs:

       - name: Run pylint
         run: |
-          pylint daliuge-common daliuge-translator daliuge-engine
+          pylint daliuge-common daliuge-translator daliuge-engine --fail-under=9 --fail-on=E
myxie (Collaborator, Author) commented:

These settings mean Pylint will exit with code 0 unless the score drops below 9 or any Error-level message is reported.
We should currently be at 10.0, and should have no Errors after this PR. These settings also allow us to enable Warnings (so we see them reported as they are introduced in the code) without failing any CI runs. This means we can, if we want to, turn on Warnings at some point and address them incrementally without it affecting the CI. Something to think about moving forward, I expect.

We score roughly 9.47 with the Warnings enabled, so it's unlikely we'd ever go below 9. Setting --fail-under is necessary to get --fail-on=E to work properly, however.

myxie (Collaborator, Author) commented Jun 28, 2024

@awicenec just a 'bump' to have a look at this PR when you get the chance.

coveralls commented

Coverage Status: coverage 79.555% (+0.08%) from 79.479% when pulling ecbf3aa on LIU-383 into 2efb3ba on master.

awicenec (Contributor) left a comment

This is a very good start, indeed! Thanks for taking this on. We will need to work in this direction quite a bit more in the future as well.

myxie merged commit 2c4049f into master on Jul 2, 2024 (21 checks passed)