Enhancements and Refactoring in run.py Script: String Formatting Updates and AI Feature Integration #35
Open
RahulVadisetty91 wants to merge 2 commits into VivekPa:master from
This update introduces significant enhancements to the AI processing pipeline, focusing on improving data handling, scaling, and model training. The key changes include:

1. Refactoring File Paths with Constants:
   - Introduced constants for frequently used file paths to improve code maintainability and readability. This reduces redundancy and makes future modifications easier.
2. Enhanced Data Processing:
   - Updated the `BaseBars` class usage to handle different types of price bars, including tick, dollar, and volume bars. This improves the flexibility of data processing by allowing the script to create multiple bar types from raw price and volume data.
   - Added functionality to handle data from new CSV paths and ensured compatibility with different data formats.
3. Data Scaling Improvements:
   - Implemented the `MinMaxScaler` for feature scaling, ensuring that input data is normalized to the range [-1, 1]. This scaling enhances the performance of the AutoEncoder model by improving convergence and accuracy.
4. AutoEncoder Model Enhancements:
   - Updated the `AutoEncoder` model to include advanced architecture configurations with customizable layer sizes. This includes building and training the model with specified layer dimensions and epochs to better capture complex data patterns.
   - Added functionality to encode and process data efficiently, saving the encoded features for further analysis.
5. Random Forest Model Updates:
   - Integrated a new `RFModel` class for Random Forest implementation, allowing for advanced model training and testing. The updated script includes model parameter adjustments and training with both scaled and original datasets.
   - Enhanced model evaluation to ensure comprehensive testing of the Random Forest model’s performance on various datasets.
6. Removed Unnecessary Code:
   - Cleaned up commented-out sections related to `NNModel`, focusing the script on the implemented models. This helps streamline the code and reduces clutter.
7. Improved Code Structure:
   - Refactored the script to improve overall organization, including clear separation of data processing, model training, and evaluation sections. This enhances readability and maintainability.

These updates aim to streamline the data processing pipeline, enhance model performance, and ensure more robust handling of various data types and scaling requirements.
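To make the scaling step concrete, here is a minimal sketch of min-max scaling to the range [-1, 1], written in plain Python so it runs without dependencies. The function name `min_max_scale` and the sample prices are hypothetical and are not taken from `run.py`; the PR itself uses scikit-learn's `MinMaxScaler`, where the same target range would be requested with `feature_range=(-1, 1)`.

```python
from typing import List, Sequence, Tuple


def min_max_scale(
    values: Sequence[float],
    feature_range: Tuple[float, float] = (-1.0, 1.0),
) -> List[float]:
    """Linearly map values so the minimum lands on feature_range[0]
    and the maximum lands on feature_range[1]."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant input: there is no spread to scale, so this sketch
        # maps every value to the lower bound of the target range.
        return [lo for _ in values]
    return [(v - v_min) / span * (hi - lo) + lo for v in values]


# Hypothetical closing prices, used only for illustration.
prices = [10.0, 15.0, 20.0]
scaled = min_max_scale(prices)
print(scaled)
```

When scaling real training data, the minimum and maximum should be computed on the training split only and then reused for the test split, which is what fitting the scaler once and calling its transform on both splits achieves.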
Enhanced Data Processing and Model Training with Improved Scaling and Refactored File Paths
1. Summary:
This update makes a large number of enhancements to the `run.py` script. The changes make the script easier to maintain and improve, faster, and better able to incorporate enhanced AI capabilities and features. Hard-coded file paths have been replaced with constants to improve code maintainability. String formatting has been changed from f-strings to a more readable and compatible formatting style.
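As an illustration of the two refactorings described in the summary, the sketch below defines file paths once as module-level constants and builds a derived path with `str.format` rather than an f-string. All names here (`DATA_DIR`, `RAW_PRICES_CSV`, `BAR_OUTPUT_TEMPLATE`, `bar_output_path`) and the file names are hypothetical; the actual constants in `run.py` may differ.

```python
import os

# Hypothetical path constants: each location is defined once and
# reused, instead of repeating string literals through the script.
DATA_DIR = "data"
RAW_PRICES_CSV = os.path.join(DATA_DIR, "raw_prices.csv")
BAR_OUTPUT_TEMPLATE = os.path.join(DATA_DIR, "{bar_type}_bars.csv")


def bar_output_path(bar_type: str) -> str:
    """Return the output CSV path for one bar type (e.g. 'dollar')."""
    # str.format fills the named placeholder in the template constant.
    return BAR_OUTPUT_TEMPLATE.format(bar_type=bar_type)


for kind in ("tick", "dollar", "volume"):
    print("Writing {} bars to {}".format(kind, bar_output_path(kind)))
```

Keeping the template in a constant means a change to the directory layout touches one line, and `str.format` also works on Python versions that predate f-strings.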
2. Related Issues:
This update addresses code duplication, hard-coded string literals, and the performance of the data preprocessing and model training functions. It also resolves cases where f-string formatting hurt readability in certain parts of the code.
3. Discussions:
Discussions centred on improving the quality of the script by replacing string literals and adopting better string formatting. There were also deliberations on how AI features should be incorporated into the script, with a focus on keeping it adaptable to future AI uses.
4. QA Instructions:
QA should verify that the newly defined path constants work correctly and that the string formatting changes make the code easier to understand without affecting its performance. The new features, including the data preprocessing and training routines, should be validated for accuracy and performance.
5. Merge Plan:
Once the constants, string formatting, and AI feature integration are confirmed to be implemented correctly, the branch can be merged. Before the merge, testing in several environments is advised.
6. Motivation and Context:
These changes were motivated by requests to remove duplicated code and to make the codebase more manageable by refactoring constants and string formatting. Including the AI features was important for improving the script's performance and for making it usable for other tasks, in line with current coding standards.
7. Types of Changes: