Transform complex NetApp XCP scan reports into actionable CSV and JSON insights in seconds.
XCP Data Extraction Tool
The XCP Data Extraction Tool automates the parsing and transformation of verbose NetApp XCP scan reports into clean, structured data formats. Designed for storage administrators and data analysts, this tool eliminates hours of manual report analysis by extracting filesystem metadata, access patterns, ownership information, and storage metrics into Excel-ready CSV files and API-friendly JSON outputs.
- Automated Extraction: Parse complex XCP logs and extract 7+ critical metadata fields automatically
- Dual Format Output: Generate both CSV (spreadsheet-compatible) and JSON (database-ready) files simultaneously
- Human-Readable Metrics: Convert raw byte counts to GB/TB for intuitive capacity planning
- Access Pattern Analysis: Categorize files by access age (>1 year, >1 month, recent) for archival decisions
- Compliance Ready: Extract ownership and usage data for audit trails and chargeback reporting
- Time Savings: Reduce report analysis time from hours to seconds
- Flexible Integration: JSON output enables seamless integration with monitoring dashboards and automation workflows
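The three access-age buckets named above can be illustrated with a short Python sketch. XCP reports already contain these counts, so the tool only extracts them; the code below merely shows what the buckets mean. The 365-day and 30-day cutoffs and the `access_bucket` helper are assumptions for illustration, not part of the tool.

```python
from datetime import datetime, timedelta

# Assumed thresholds: the source names the buckets but not their exact cutoffs.
ONE_YEAR = timedelta(days=365)
ONE_MONTH = timedelta(days=30)

def access_bucket(last_access, now):
    """Return the access-age bucket for a file last read at `last_access`."""
    age = now - last_access
    if age > ONE_YEAR:
        return ">1 year"
    if age > ONE_MONTH:
        return ">1 month"
    return "recent"

now = datetime(2025, 1, 1)
print(access_bucket(datetime(2023, 6, 1), now))    # >1 year
print(access_bucket(datetime(2024, 10, 1), now))   # >1 month
print(access_bucket(datetime(2024, 12, 20), now))  # recent
```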
Sample Input (Raw XCP Report):
Filesystem: /vol/engineering_data
Filer: netapp-prod-01
Total: 5497558138880 bytes
Access >1 year: 1234 files
Users: 45
...
Sample Output (Generated CSV):
| Filesystem | Filer | Mountpoint | Access >1 Year | Total Used |
|---|---|---|---|---|
| /vol/engineering_data | netapp-prod-01 | /mnt/engineering | 1,234 files | 5.0 TB |
| /vol/archives | netapp-prod-02 | /mnt/archive | 45,678 files | 12.8 TB |
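How the raw sample maps to a table row can be sketched in a few lines of Python. This is only an illustration built on the abbreviated sample above: real XCP reports are more verbose, and `FIELD_PATTERNS` and `parse_report` are hypothetical names, not the tool's actual parser.

```python
import re

# Hypothetical patterns keyed by output column. They match only the
# abbreviated sample lines shown above, not a full XCP report.
FIELD_PATTERNS = {
    "Filesystem": re.compile(r"^Filesystem:\s*(\S+)", re.MULTILINE),
    "Filer": re.compile(r"^Filer:\s*(\S+)", re.MULTILINE),
    "Total Bytes": re.compile(r"^Total:\s*(\d+)\s*bytes", re.MULTILINE),
    "Access >1 Year": re.compile(r"^Access >1 year:\s*(\d+)\s*files", re.MULTILINE),
    "Users": re.compile(r"^Users:\s*(\d+)", re.MULTILINE),
}

def parse_report(text):
    """Return a dict of the fields found in `text`; missing fields map to None."""
    row = {}
    for column, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[column] = match.group(1) if match else None
    return row

sample = """Filesystem: /vol/engineering_data
Filer: netapp-prod-01
Total: 5497558138880 bytes
Access >1 year: 1234 files
Users: 45
"""

row = parse_report(sample)
print(row["Filesystem"], row["Filer"], row["Total Bytes"])
```

Returning `None` for a missing field, rather than raising, lets a caller decide whether an incomplete report should be skipped or flagged.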
Before running the XCP Data Extraction Tool, ensure you have the following installed:
- Python 3.6+ (Python 3.8+ recommended)
- pip (Python package manager)
- Access to NetApp XCP scan reports (.txt or .log files)
Check your Python version:
python --version
# or
python3 --version
# Clone the repository
git clone https://github.com/DMarkStorage/Xcp_Data_Extraction_Tool.git
# Navigate to the project directory
cd Xcp_Data_Extraction_Tool
# Install required dependencies
pip install -r requirements.txt
- Download the latest release from Releases
- Extract the ZIP file
- Navigate to the extracted directory
- Run:
pip install -r requirements.txt
Required Python Packages:
pandas>=1.3.0
docopt>=0.6.2
Run the tool with a single command to extract data from your XCP report:
python extract_data_xcp.py -r /path/to/xcp_scan_report.txt -f filesystem_analysis
What happens:
- The tool reads your XCP scan report
- Extracts filesystem metadata, access patterns, and storage metrics
- Generates two files:
- filesystem_analysis.csv (Excel-compatible)
- filesystem_analysis.json (API/database-ready)
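Writing the same rows to both formats might look like the sketch below. The tool itself depends on pandas; this version uses only the standard library `csv` and `json` modules so that it runs without extra installs. `write_outputs` and the column names are hypothetical stand-ins, and the real files may be laid out differently.

```python
import csv
import json

# Hypothetical column names; the real tool's headers may differ.
COLUMNS = ["Filesystem", "Filer", "Mountpoint", "Total Used"]

def write_outputs(output_name, rows):
    """Write `rows` (lists ordered like COLUMNS) to <output_name>.csv and .json."""
    csv_path = f"{output_name}.csv"
    json_path = f"{output_name}.json"

    # CSV: one header line, then one line per row.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

    # JSON: a list of objects keyed by column name.
    records = [dict(zip(COLUMNS, row)) for row in rows]
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)

    return csv_path, json_path
```

Calling `write_outputs("filesystem_analysis", rows)` would then produce the two files listed above in the current directory.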
python extract_data_xcp.py \
-r xcp_report.txt \
-f api_data
Usage:
extract_data_xcp.py -r <FILENAME> -f <OUTPUTNAME>
extract_data_xcp.py -r <FILENAME> -f <OUTPUTNAME> -v [-n <NUMROWS>]
extract_data_xcp.py --version
extract_data_xcp.py -h | --help
Options:
-f <OUTPUTNAME>  Output filename (without extension).
-v --view        View a preview of the output DataFrame.
-n <NUMROWS>     Number of rows to display in preview [default: 10].
-r <FILENAME>    Input filename to process.
-h --help        Show this message and exit.
--version        Show program version and exit.
The tool is built around a modular architecture:
def all_data(output_name, file_systems, filers, mountpoints,
             extracted_paths, access_list, users_list, total_used):
    """
    Coordinates extraction and transformation of XCP report data.

    Args:
        output_name (str): Base name for output files
        file_systems (list): List of filesystem identifiers
        filers (list): NetApp filer names
        mountpoints (list): NFS mount paths
        extracted_paths (list): Subdirectory paths
        access_list (list): File access frequency data
        users_list (list): User/owner information
        total_used (list): Raw storage consumption in bytes

    Returns:
        None: Writes data to CSV and JSON files
    """
    data = []
    for fs, filer, mountpoint, e_path, access, users, used_raw in zip(
        file_systems, filers, mountpoints, extracted_paths, access_list, users_list, total_used
    ):
        used_raw_str = used_raw.strip()
        used_human = convert_size(int(used_raw_str))
        data.append([
            fs.strip(),
            filer,
            mountpoint.strip(),
            e_path.strip(),
            access[0],
            access[1],
            access[2],
            users,
            used_human
        ])
    return data_to_file(output_name, data)
Key Design Principles:
- Separation of Concerns: Parsing, transformation, and output are handled by distinct modules
- Data Validation: Input sanitization prevents malformed data from breaking extraction
- Human-Readable Conversion: Automatic byte-to-TB conversion via the convert_size() function
- Flexible Output: data_to_file() handles both CSV and JSON serialization
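The `all_data` function calls `convert_size()` without showing it. A minimal sketch of such a helper follows, assuming binary (1024-based) units and rounding to two decimal places; the source confirms neither choice, so treat both as assumptions.

```python
UNITS = ("B", "KB", "MB", "GB", "TB", "PB")

def convert_size(size_bytes):
    """Convert a non-negative byte count to a human-readable string."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    value = float(size_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{round(value, 2)} {unit}"
        value /= 1024

print(convert_size(1536))         # 1.5 KB
print(convert_size(3 * 1024**3))  # 3.0 GB
```

Repeated division is used instead of a logarithm so that exact powers of 1024 cannot land in the wrong unit through floating-point rounding.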
┌─────────────────────┐
│   XCP Scan Report   │
│  (Raw Text Input)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Pattern Matching   │
│   & Text Parsing    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Data Extraction   │
│ (7 Metadata Fields) │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Transformation    │
│    (Bytes → TB)     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Output Generation  │
│     CSV + JSON      │
└─────────────────────┘
We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.
- Fork the repository on GitHub
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and commit: git commit -m 'Add amazing feature'
- Push to your branch: git push origin feature/amazing-feature
- Open a Pull Request with a clear description of your changes
# Clone your fork
git clone https://github.com/DMarkStorage/Xcp_Data_Extraction_Tool.git
cd Xcp_Data_Extraction_Tool
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements.txt
- Write clear, descriptive commit messages
- Add unit tests for new features
- Update documentation for API changes
- Follow PEP 8 style guidelines for Python code
- Ensure all tests pass before submitting PR
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 Damini Marvin Mark
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files...
This project was inspired by the challenges faced by storage administrators dealing with verbose NetApp XCP reports. Special thanks to:
- NetApp for the XCP tool and comprehensive API documentation
- The Python Community for excellent libraries like pandas and docopt
- Storage Administrators who provided feedback on early versions
- Contributors who have helped improve this tool
- Found a bug? Open an issue
- Have a feature request? [Start a discussion](https://github.com/DMarkStorage/Xcp_Data_Extraction_Tool/discussions)
- Website: dmarkstorage.io
If this tool has saved you time or helped your organization, consider:
- Starring the repository on GitHub
- Sharing it with colleagues in storage administration
- Contributing improvements or documentation