- Replace sys.argv with argparse for proper argument parsing
- Add --input-folder, --genes-list, and --output-folder parameters (with short flags)
- Implement proper error handling and validation
- Add logging instead of simple print statements
- Follow PEP 8 style guidelines
- Add main() function with if __name__ == "__main__" guard
- Fix logic bug where FEATURES list was reset incorrectly
- Use os.path.join() for path operations
- Add comprehensive docstrings for all functions
- Add shebang line for better Unix compatibility

Co-authored-by: Mxrcon <48180517+Mxrcon@users.noreply.github.com>
@copilot continue this work by updating the scripts "take genes into aminoacid.py" and "take genes into nucleotides.py". Please add progress output to track how many gbks have been processed, and also create a testing folder to make sure that all Python scripts are tested and working.
- Update "take genes into aminoacid.py" with argparse and progress tracking
- Update "take genes into nucleotides.py" with argparse and progress tracking
- Add professional structure matching cdsselector.py
- Create comprehensive test suite in tests/ folder
- Add test data with sample GenBank files
- Add progress output showing "Processing file X/Y"
- All 11 tests passing (100%)

Co-authored-by: Mxrcon <48180517+Mxrcon@users.noreply.github.com>
Done! I've updated both scripts and created a comprehensive test suite. All changes committed in bf49a01. Updates made:
Run tests with: python tests/test_scripts.py
Both scripts now show progress like: Processing file X/Y
Overview
This PR modernizes three bioinformatics Python scripts to follow professional best practices, making them more maintainable, user-friendly, and robust. Additionally, it includes a comprehensive test suite to ensure all scripts work correctly.
Changes Made
1. Modernized cdsselector.py
Replaced positional sys.argv arguments with argparse for a professional CLI experience:
Before:
After:
    python cdsselector.py --input-folder gbk_files --genes-list genes.txt --output-folder filtered_gbk
    # Short flags also available:
    python cdsselector.py -i gbk_files -g genes.txt -o filtered_gbk
2. Modernized "take genes into aminoacid.py"
Updated the amino acid extraction script with the same professional improvements:
Before:
After:
Key improvements:
3. Modernized "take genes into nucleotides.py"
Updated the nucleotide extraction script with matching improvements:
Before:
After:
Key improvements:
4. Comprehensive Test Suite
Created a complete testing infrastructure:
Structure:
Test Coverage:
Run tests with:
    python tests/test_scripts.py
Common Improvements Across All Scripts
Command-Line Interface:
- --help / -h flag with comprehensive usage documentation
Progress Tracking:
All scripts now show processing progress:
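The original output sample is not preserved here. As a rough sketch of how "Processing file X/Y" progress could be produced with the logging module, the following may help. The function names, the logger name, and the `.gbk` extension filter are assumptions for illustration and are not taken from the scripts in this PR.

```python
import logging
import os

# Hypothetical logger name; the real scripts may configure logging differently.
logger = logging.getLogger("gbk_progress")


def list_gbk_files(input_folder):
    """Return sorted paths of files ending in .gbk (extension is an assumption)."""
    return sorted(
        os.path.join(input_folder, name)
        for name in os.listdir(input_folder)
        if name.lower().endswith(".gbk")
    )


def process_all(paths, handle_file):
    """Call handle_file on each path, logging 'Processing file X/Y' first."""
    total = len(paths)
    for index, path in enumerate(paths, start=1):
        logger.info(
            "Processing file %d/%d: %s", index, total, os.path.basename(path)
        )
        handle_file(path)
    return total


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    # Demo with placeholder paths and a no-op handler.
    process_all(["sample_1.gbk", "sample_2.gbk"], lambda path: None)
```

Counting from 1 with `enumerate(paths, start=1)` keeps the displayed index aligned with the "X/Y" form, and passing the per-file work in as `handle_file` lets the same loop serve both the amino acid and nucleotide extraction.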
Enhanced Error Handling:
Professional Code Structure:
- Shebang line (#!/usr/bin/env python) for Unix compatibility
- main() function with if __name__ == "__main__" guard
Improved Logging:
Replaced basic print() statements with structured logging:
Code Quality Improvements:
- Fixed the logic bug where the FEATURES list was reset incorrectly
- os.path.join() for proper cross-platform path operations
- .gitignore to exclude Python cache files and test outputs
Better Documentation:
Testing
Automated Tests:
Manual Testing:
Migration Notes
For cdsselector.py:
For gene extraction scripts:
Important: Gene lists are now loaded from files instead of being hardcoded in the scripts. This makes the scripts more flexible and reusable.
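For orientation, a minimal sketch of a file-based gene list combined with the argparse flags named for cdsselector.py might look like the following. The flag names come from this PR. The function names, the `required=True` setting, the help strings, and the one-gene-per-line file format are assumptions, and the extraction scripts may use different options.

```python
import argparse


def build_parser():
    """Build a parser with the three flags listed in this PR.

    Marking every flag as required is an assumption for this sketch.
    """
    parser = argparse.ArgumentParser(
        description="Select genes from GenBank files (illustrative sketch)."
    )
    parser.add_argument("-i", "--input-folder", required=True,
                        help="folder containing the input GenBank files")
    parser.add_argument("-g", "--genes-list", required=True,
                        help="text file listing the genes of interest")
    parser.add_argument("-o", "--output-folder", required=True,
                        help="folder where results are written")
    return parser


def load_gene_list(path):
    """Read gene names from a text file, one per line (format is assumed).

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    genes = load_gene_list(args.genes_list)
    print(f"Loaded {len(genes)} gene names from {args.genes_list}")
```

argparse converts the dashes in `--genes-list` to an underscore, so the value is read back as `args.genes_list`.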
Related
Closes the issue requesting professional updates to cdsselector.py and gene extraction scripts with proper parameter handling, progress tracking, and testing infrastructure.