Welcome to the Software Refactoring Prediction Model repository! This project uses machine learning techniques to predict code refactoring needs. It is designed to help developers keep their codebases clean and efficient by predicting whether a particular piece of code needs to be refactored.
- Original Research Paper
- Old Codebase
- Their code relied on a tool called RefactoringMiner, which takes considerable effort just to get up and running. Instead, we used the provided SQL scripts to extract the data in almost half the time.
- Data Fetching Scripts

This project uses Python to analyze codebases and predict refactoring opportunities. It leverages several machine learning models to assess various aspects of the code and suggests potential refactorings to improve code quality.
- Improved the code by removing methods and features that were not needed in the final project.
- Added support for randomized search and grid search cross-validation.
- Because suitable hardware for running this project was hard to obtain, we performed the same analysis on fractions (0.2%, 0.5% and 1.0%) of the original dataset and obtained similar results.
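Grid search and randomized search differ only in which hyperparameter combinations get evaluated: grid search tries every combination, while randomized search tries a fixed number of randomly drawn ones, which is cheaper when the grid is large. The following is a minimal, standard-library-only sketch of both strategies combined with k-fold cross-validation; it is not this repository's implementation. The function names, the `score_fn` callback and the example parameter names are hypothetical. In practice, scikit-learn's `GridSearchCV` and `RandomizedSearchCV` provide this functionality (and `RandomizedSearchCV` can also draw from continuous distributions, which this sketch does not).

```python
import itertools
import random
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical types: a search space maps parameter names to candidate values,
# and a score function evaluates one parameter set on one train/validation split.
ParamSpace = Dict[str, Sequence]
ScoreFn = Callable[[dict, List[int], List[int]], float]


def grid_candidates(space: ParamSpace) -> List[dict]:
    """Grid search: every combination of the listed values."""
    keys = sorted(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]


def random_candidates(space: ParamSpace, n_iter: int, seed: int = 0) -> List[dict]:
    """Randomized search: n_iter distinct combinations drawn from the same grid."""
    grid = grid_candidates(space)
    return random.Random(seed).sample(grid, min(n_iter, len(grid)))


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Interleaved k-fold split: each index lands in exactly one validation fold."""
    splits = []
    for fold in range(k):
        val = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        splits.append((train, val))
    return splits


def search(space: ParamSpace, score_fn: ScoreFn, n_samples: int, k: int = 5,
           mode: str = "grid", n_iter: int = 10, seed: int = 0) -> Tuple[Optional[dict], float]:
    """Return the candidate with the highest mean cross-validated score."""
    if mode == "grid":
        candidates = grid_candidates(space)
    elif mode == "random":
        candidates = random_candidates(space, n_iter, seed)
    else:
        raise ValueError("mode must be 'grid' or 'random'")
    splits = kfold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    for params in candidates:
        # Average the score over all k train/validation splits.
        score = mean(score_fn(params, train, val) for train, val in splits)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Usage with a toy score function that ignores the data and simply prefers
# max_depth=6 and n_estimators=100 (both parameter names are illustrative only).
space = {"max_depth": [3, 6, 9], "n_estimators": [50, 100, 200]}


def toy_score(params: dict, train: List[int], val: List[int]) -> float:
    return -abs(params["max_depth"] - 6) - abs(params["n_estimators"] - 100) / 100


print(search(space, toy_score, n_samples=50, mode="grid"))
print(search(space, toy_score, n_samples=50, mode="random", n_iter=4))
```

Grid search here evaluates all 9 combinations, whereas randomized search with `n_iter=4` evaluates only 4 and may miss the best one; that trade-off is what makes randomized search attractive on larger grids.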
Follow these simple steps to get a local copy up and running.
- Python 3.8+
- pip
- Clone the repo:
  git clone https://github.com/Hetav01/Software-Refactoring-Prediction-Model.git
- Extract as much of the dataset as the pipeline requires using the Data Fetching Scripts.
- Copy the CSV dataset into the dataset folder.
- Edit the file paths where required, namely in preprocessing/preprocessing.py, binaryClassification.py, testing/Runner_Test.py and testing/binaryClassification2.py.
- Before running the driver file for the entire pipeline, install all the required dependencies:
  pip3 install --user -r requirements.txt
- The driver file is either binaryClassification.py or testing/binaryClassification2.py, depending on whether you only want the results or also want to test the models on unseen data (use testing/binaryClassification2.py for the latter). Run the one you need with:
  python3 binaryClassification.py
  python3 testing/binaryClassification2.py
  The script follows the configuration in configs.py, where you can define which datasets to analyze, which models to build, which undersampling algorithms to use, and so on. Please read the comments in this file carefully.
- To collect the results, the Python scripts automatically update the result.txt and result_unseen.txt files with the latest metrics. Watch the terminal output while the program is running to see which hyperparameters work best for each model.
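The analysis was run on 0.2%, 0.5% and 1.0% fractions of the dataset, and the setup asks you to extract only as much data as you need. If you already have a full CSV export and want to cut it down locally, a minimal standard-library sketch could look like the following. The function names are hypothetical and this is not part of the repository's Data Fetching Scripts; it draws a plain random sample of rows, so the class balance of the subset is only approximately preserved.

```python
import csv
import io
import random
from typing import List, TextIO


def sample_indices(n_rows: int, fraction: float, seed: int = 0) -> List[int]:
    """Pick about n_rows * fraction distinct row indices (at least one), in file order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if n_rows == 0:
        return []
    k = max(1, round(n_rows * fraction))
    return sorted(random.Random(seed).sample(range(n_rows), k))


def subsample_csv(src: TextIO, dst: TextIO, fraction: float, seed: int = 0) -> int:
    """Copy the header plus a random fraction of the data rows from src to dst."""
    reader = csv.reader(src)
    header = next(reader)
    rows = list(reader)
    keep = sample_indices(len(rows), fraction, seed)
    writer = csv.writer(dst)
    writer.writerow(header)
    writer.writerows(rows[i] for i in keep)
    return len(keep)


# Usage on an in-memory CSV; with real files, open both with newline="".
source = io.StringIO("id,label\n" + "".join(f"{i},{i % 2}\n" for i in range(1000)))
target = io.StringIO()
print(subsample_csv(source, target, fraction=0.005))  # 0.5% of 1000 data rows
```

This version loads every row into memory, which keeps the sketch short; for a very large export, deciding per row while streaming (for example, keeping each row with probability `fraction`) would avoid that.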
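configs.py lets you choose which undersampling algorithms to use. Random undersampling is one of the simplest such techniques: it discards randomly chosen samples from the larger classes until every class is as small as the minority class. The sketch below illustrates the idea with the standard library only. The function name, its signature and the 1 = "refactor" / 0 = "no refactor" label encoding are hypothetical rather than taken from this repository; libraries such as imbalanced-learn (`RandomUnderSampler`) offer tested implementations.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(samples: Sequence[T], labels: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Keep an equal number of randomly chosen samples from every class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if len(labels) == 0:
        return [], []
    # Group sample positions by class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Reduce every class to the size of the smallest (minority) class.
    target = min(len(idxs) for idxs in by_class.values())
    rng = random.Random(seed)
    keep = sorted(i for idxs in by_class.values() for i in rng.sample(idxs, target))
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Usage with a hypothetical label encoding: 10 "refactor" samples (label 1)
# versus 90 "no refactor" samples (label 0).
samples = list(range(100))
labels = [1 if i < 10 else 0 for i in range(100)]
x_bal, y_bal = random_undersample(samples, labels)
print(len(x_bal), sum(y_bal))  # 20 samples, 10 of them labelled 1
```

Undersampling throws data away, which is the price for a balanced training set; it is usually applied only to the training split, leaving the test data untouched.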
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request