GreenPyData Plugin for (PyTorch) Data Scientists

As data science continues to grow in popularity (the rise of LLMs is one example), it is becoming increasingly important to consider the environmental impact of the code we write. Many data science tasks, especially deep learning ones, require significant computational resources, whose energy use generates carbon emissions and contributes to climate change.

GreenPyData is heavily inspired by https://github.com/green-code-initiative/ecoCode, a project well worth checking out and using!

GreenPyData is a modest attempt by a data scientist interested in sustainability and eco-friendliness in software development and data science.

Introduction

GreenPyData is an open-source SonarQube plugin designed specifically for data scientists (who use PyTorch). Its purpose is to assist in eco-designing your code by identifying and flagging energy-intensive or computationally inefficient code segments that can be optimized to reduce their carbon footprint and improve performance.

Currently, GreenPyData only supports PyTorch. However, we plan to support other frameworks in the future.

To install GreenPyData, follow these steps:

1. Install SonarQube on your system (version 7.9 or higher).
2. Download the GreenPyData plugin source from our GitHub repository.
3. Build the .jar (mvn clean package -DskipTests) and copy it into the extensions/plugins directory of your SonarQube installation.
4. Restart SonarQube.

Usage

Once GreenPyData is installed, you can use it to analyze your Python code by running a SonarQube analysis.

To do so, follow these steps:

1. Open the SonarQube dashboard.
2. Create a new project and configure its settings as needed: add the plugin and generate your token.
3. Run the analysis (mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.1.2184:sonar -Dsonar.login=YOUR_TOKEN).

GreenPyData will then analyze your PyTorch code and flag any energy-intensive or computationally inefficient code segments that can be optimized.

Contribution

Any help or contribution to GreenPyData is highly appreciated! Feel free to fork the repository, make your changes, and submit a pull request.

If you encounter any issues or have suggestions for improvement, please open an issue.

For code style and formatting, please read https://github.com/SonarSource/sonar-developer-toolset.

Implemented rules (PyTorch only for now)

The core idea for rules is "100% precision". Rules should not trigger false positives. The package should be used by Data Scientists to help them write greener code and not bother them with thousands of false alarms.
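To make the "100% precision" idea concrete, here is a minimal Python sketch. It is not the plugin's actual implementation (GreenPyData runs inside SonarQube, and its detection logic is not shown in this README). It takes the pinned-memory rule (P3 in the table below) as an example and reports a DataLoader call only when the source text itself proves that pin_memory is off. The function name find_unpinned_dataloaders and the sample code are hypothetical, and the sketch assumes that any call named DataLoader refers to torch.utils.data.DataLoader.

```python
import ast


def find_unpinned_dataloaders(source: str) -> list[int]:
    """Return line numbers of DataLoader(...) calls that visibly leave pin_memory off."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "DataLoader":
            continue
        # Extra positional or starred arguments might set pin_memory by position.
        # This sketch does not track positions, so it stays silent in that case.
        if len(node.args) > 1 or any(isinstance(a, ast.Starred) for a in node.args):
            continue
        # **kwargs could carry pin_memory: the outcome is unknown, so stay silent.
        if any(kw.arg is None for kw in node.keywords):
            continue
        pin = next((kw.value for kw in node.keywords if kw.arg == "pin_memory"), None)
        if pin is None:
            # Keyword absent: DataLoader defaults to pin_memory=False.
            flagged.append(node.lineno)
        elif isinstance(pin, ast.Constant) and pin.value is False:
            # Explicit literal False.
            flagged.append(node.lineno)
        # Literal True or any non-literal expression is never flagged.
    return sorted(flagged)


# Hypothetical user code; it is only parsed, never executed.
SAMPLE = '''
from torch.utils.data import DataLoader

a = DataLoader(ds, batch_size=32)
b = DataLoader(ds, batch_size=32, pin_memory=False)
c = DataLoader(ds, batch_size=32, pin_memory=True)
d = DataLoader(ds, pin_memory=use_cuda)
e = DataLoader(ds, **loader_kwargs)
'''

print(find_unpinned_dataloaders(SAMPLE))  # [4, 5]
```

Calls d and e may well run without pinned memory, but that cannot be proven from the source alone, so the sketch accepts a missed finding rather than risk a false alarm.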

ID	Rule name	Description
P1	AvoidDataParallelInsteadofDistributedDataParallel	Use DistributedDataParallel instead of DataParallel, even on a single node
P2	AvoidBlockingDataloaders	Use asynchronous data loading to keep the GPU busy and shorten overall GPU usage
P3	AvoidNonPinnedMemoryForDataloaders	Use pinned (page-locked) host memory to speed up data transfers from RAM to the GPU
P4	AvoidConvBiasBeforeBatchNorm (Conv2d)	Remove the bias from convolutions placed directly before batch norm layers to save time and memory
P5	AvoidCreatingTensorUsingNumpyOrNativePython	Create tensors directly with torch functions and avoid going through NumPy or native Python functions
P6	UseInPlaceOperationsInModulesWhenPossible	Use in-place operations when possible (only implemented for sequential modules)
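As an illustration of the pattern behind P4, the sketch below shows a Conv2d layer placed directly before a BatchNorm2d layer, with and without its bias. Batch normalization subtracts the per-channel mean, so a bias added by the preceding convolution is cancelled out and only costs parameters and compute. The detection function is a hypothetical stand-in written with Python's ast module, not the plugin's own code, and it assumes the layers are passed directly to nn.Sequential.

```python
import ast


def _callee(node):
    """Return the called name for a call such as nn.Conv2d(...), else None."""
    if not isinstance(node, ast.Call):
        return None
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def find_biased_conv_before_bn(source: str) -> list[int]:
    """Return line numbers of Conv2d calls with a bias, directly before BatchNorm2d."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if _callee(node) != "Sequential":
            continue
        # Look at each pair of neighbouring layers in the Sequential call.
        for layer, following in zip(node.args, node.args[1:]):
            if _callee(layer) != "Conv2d" or _callee(following) != "BatchNorm2d":
                continue
            # bias is the 8th positional parameter of torch.nn.Conv2d; starred or
            # many positional arguments make its value unknown, so stay silent.
            if len(layer.args) >= 8 or any(isinstance(a, ast.Starred) for a in layer.args):
                continue
            if any(kw.arg is None for kw in layer.keywords):
                continue
            bias = next((kw.value for kw in layer.keywords if kw.arg == "bias"), None)
            # Absent keyword means the default bias=True; literal True is explicit.
            if bias is None or (isinstance(bias, ast.Constant) and bias.value is True):
                flagged.append(layer.lineno)
    return sorted(flagged)


# Hypothetical user code; it is only parsed, never executed.
SAMPLE = '''
import torch.nn as nn

wasteful = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
lean = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
'''

print(find_biased_conv_before_bn(SAMPLE))  # [5]
```

Only the convolution in wasteful is reported; the fix is the bias=False argument shown in lean.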

Conclusion

Thank you for using GreenPyData! We hope it helps you eco-design your data science code and contributes to a more sustainable software development process. If you have questions, feedback, ideas for new rules, or anything else :) feel free to reach out.

AghilesAzzoug/GreenPyData