DMarkStorage/Xcp_Data_Extraction_Tool

🗄️ XCP Data Extraction Tool

Transform complex NetApp XCP scan reports into actionable CSV and JSON insights in seconds.

🚀 Project Overview

The XCP Data Extraction Tool automates the parsing and transformation of verbose NetApp XCP scan reports into clean, structured data formats. Designed for storage administrators and data analysts, this tool eliminates hours of manual report analysis by extracting filesystem metadata, access patterns, ownership information, and storage metrics into Excel-ready CSV files and API-friendly JSON outputs.

✨ Key Features

  • 📊 Automated Extraction: Parse complex XCP logs and extract 7+ critical metadata fields automatically
  • 💾 Dual Format Output: Generate both CSV (spreadsheet-compatible) and JSON (database-ready) files simultaneously
  • 📏 Human-Readable Metrics: Convert raw byte counts to GB/TB for intuitive capacity planning
  • 🕐 Access Pattern Analysis: Categorize files by access age (>1 year, >1 month, recent) for archival decisions
  • 🎯 Compliance Ready: Extract ownership and usage data for audit trails and chargeback reporting
  • ⚡ Time Savings: Reduce report analysis time from hours to seconds
  • 🔧 Flexible Integration: JSON output enables seamless integration with monitoring dashboards and automation workflows
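The access-age categories above can be pictured with a short sketch. It assumes each file's last-access time is available as a datetime and uses hypothetical cut-offs of 365 and 30 days; XCP's own report buckets may be defined differently (for example cumulatively), and the tool appears to read those counts from the report rather than compute them. access_bucket and bucket_counts are illustrative names, not functions from this repository.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical cut-offs; the buckets in a real XCP report may differ.
ONE_YEAR = timedelta(days=365)
ONE_MONTH = timedelta(days=30)


def access_bucket(last_access, now):
    """Return the access-age category for one file (illustrative only)."""
    age = now - last_access
    if age > ONE_YEAR:
        return ">1 year"
    if age > ONE_MONTH:
        return ">1 month"
    return "recent"


def bucket_counts(access_times, now):
    """Count files per access-age category; each file lands in exactly one bucket."""
    return Counter(access_bucket(t, now) for t in access_times)


if __name__ == "__main__":
    now = datetime(2025, 1, 1)
    times = [
        datetime(2023, 6, 1),    # about 19 months old -> ">1 year"
        datetime(2024, 10, 1),   # about 3 months old  -> ">1 month"
        datetime(2024, 12, 25),  # 7 days old          -> "recent"
    ]
    print(bucket_counts(times, now))
```

Buckets here are mutually exclusive, so a file older than a year is not also counted under ">1 month"; a report that uses cumulative buckets would need the counts adjusted before comparison.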

Sample Input (Raw XCP Report):

Filesystem: /vol/engineering_data
Filer: netapp-prod-01
Total: 5497558138880 bytes
Access >1 year: 1234 files
Users: 45
...

Sample Output (Generated CSV):

| Filesystem | Filer | Mountpoint | Access >1 Year | Total Used |
| --- | --- | --- | --- | --- |
| /vol/engineering_data | netapp-prod-01 | /mnt/engineering | 1,234 files | 5.0 TB |
| /vol/archives | netapp-prod-02 | /mnt/archive | 45,678 files | 12.8 TB |
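A minimal sketch of reading fields from a block shaped like the sample input above. It assumes one Key: value pair per line, exactly as in that simplified excerpt; real XCP scan reports are more verbose and may be laid out differently, and parse_report_block is a hypothetical helper, not the tool's actual parser.

```python
import re

# "Key: value" lines, e.g. "Filer: netapp-prod-01" (format assumed from the sample).
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")
# The whole value is an integer with an optional unit word, e.g. "1234 files".
COUNT_RE = re.compile(r"(\d+)(?:\s+[A-Za-z]+)?")

SAMPLE = """Filesystem: /vol/engineering_data
Filer: netapp-prod-01
Total: 5497558138880 bytes
Access >1 year: 1234 files
Users: 45
..."""


def parse_report_block(text):
    """Return a dict of fields from one simplified report block (illustrative)."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines without a key/value pair, such as "..."
        key, value = match.groups()
        count = COUNT_RE.fullmatch(value)
        # Numeric values ("45", "1234 files") become ints; others stay strings.
        fields[key] = int(count.group(1)) if count else value
    return fields


if __name__ == "__main__":
    print(parse_report_block(SAMPLE))
```

Using fullmatch for the numeric check keeps values such as a filer name that merely starts with a digit from being mistaken for a count.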

🛠️ Getting Started

Prerequisites

Before running the XCP Data Extraction Tool, ensure you have the following installed:

  • Python 3.7.1+ (the minimum supported by pandas>=1.3.0; Python 3.8+ recommended)
  • pip (Python package manager)
  • Access to NetApp XCP scan reports (.txt or .log files)

Check your Python version:

python --version
# or
python3 --version

Installation

Method 1: Clone from GitHub (Recommended)

# Clone the repository
git clone https://github.com/DMarkStorage/Xcp_Data_Extraction_Tool.git

# Navigate to the project directory
cd Xcp_Data_Extraction_Tool

# Install required dependencies
pip install -r requirements.txt

Method 2: Download ZIP

  1. Download the repository as a ZIP archive from GitHub (Code → Download ZIP)
  2. Extract the ZIP file
  3. Navigate to the extracted directory
  4. Run: pip install -r requirements.txt

Required Python Packages:

  • pandas>=1.3.0
  • docopt>=0.6.2

⚙️ Usage Examples

Quick Start

Run the tool with a single command to extract data from your XCP report:

python extract_data_xcp.py -r /path/to/xcp_scan_report.txt -f filesystem_analysis

What happens:

  1. The tool reads your XCP scan report
  2. Extracts filesystem metadata, access patterns, and storage metrics
  3. Generates two files:
    • filesystem_analysis.csv (Excel-compatible)
    • filesystem_analysis.json (API/database-ready)

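JSON Output for API Integration

The documented options have no output-format switch; every run writes both files, so API and dashboard integrations can read the .json file directly:

python extract_data_xcp.py \
  -r xcp_report.txt \
  -f api_data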

Command-Line Options

    Usage:
        extract_data_xcp.py -r <FILENAME> -f <OUTPUTNAME>
        extract_data_xcp.py -r <FILENAME> -f <OUTPUTNAME> -v [-n <NUMROWS>]
        extract_data_xcp.py --version
        extract_data_xcp.py -h | --help

    Options:
        -f <OUTPUTNAME>     Output filename (without extension).
        -v --view           View a preview of the output DataFrame. 
        -n <NUMROWS>        Number of rows to display in preview [default: 10].
        -r <FILENAME>       Input filename to process.
        -h --help           Show this message and exit
        --version           Show program version and exit

Core Functions

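At the core of the tool, all_data() zips the per-filesystem lists extracted from the report into rows, converts each raw byte count with convert_size(), and passes the rows to data_to_file():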

def all_data(output_name, file_systems, filers, mountpoints,
             extracted_paths, access_list, users_list, total_used):
    """
    Coordinates extraction and transformation of XCP report data.
    
    Args:
        output_name (str): Base name for output files
        file_systems (list): List of filesystem identifiers
        filers (list): NetApp filer names
        mountpoints (list): NFS mount paths
        extracted_paths (list): Subdirectory paths
        access_list (list): File access frequency data
        users_list (list): User/owner information
        total_used (list): Raw storage consumption in bytes
    
    Returns:
        None: Writes data to CSV and JSON files
    """
    data = []

    for fs, filer, mountpoint, e_path, access, users, used_raw in zip(
        file_systems, filers, mountpoints, extracted_paths, access_list, users_list, total_used
    ):
        used_raw_str = used_raw.strip()
        used_human = convert_size(int(used_raw_str))

        data.append([
            fs.strip(),
            filer,
            mountpoint.strip(),
            e_path.strip(),
            access[0],
            access[1],
            access[2],
            users,
            used_human
        ])

    return data_to_file(output_name, data)
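Because zip() stops at the shortest of its inputs, a filesystem block that is missing one field would make all_data() misalign rows or drop trailing ones without any error. The hypothetical guard below could be called before the loop; it is a suggested hardening, not part of the tool. On Python 3.10 and later, zip(..., strict=True) also raises ValueError on unequal lengths, but the prerequisites allow older versions.

```python
def check_equal_lengths(**columns):
    """Return the common length of the extracted lists, or raise ValueError.

    zip() stops at the shortest input, so one missing field in a filesystem
    block can misalign later rows and drop trailing ones silently.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"extracted lists have mismatched lengths: {lengths}")
    # With no lists at all there is nothing to zip, so report zero rows.
    return next(iter(lengths.values()), 0)


if __name__ == "__main__":
    file_systems = ["/vol/engineering_data", "/vol/archives"]
    filers = ["netapp-prod-01", "netapp-prod-02"]
    users_list = ["45"]  # one entry missing: zip() quietly yields a single row
    print(len(list(zip(file_systems, filers, users_list))))  # 1
    try:
        check_equal_lengths(file_systems=file_systems, filers=filers, users_list=users_list)
    except ValueError as exc:
        print(exc)
```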

Key Design Principles:

  • Separation of Concerns: Parsing, transformation, and output are handled by distinct functions
  • Input Cleanup: Extracted fields are whitespace-stripped, and int() rejects a malformed byte count with a ValueError instead of writing bad data
  • Human-Readable Conversion: convert_size() turns raw byte counts into readable units such as GB or TB
  • Flexible Output: data_to_file() handles both CSV and JSON serialization
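The source of convert_size() is not shown, so the following is a sketch under assumptions: 1024-based units and rounding to two decimals. Under those assumptions the sample Total of 5,497,558,138,880 bytes (exactly 5 × 1024⁴) becomes "5.0 TB"; the tool's actual function may use a different base, precision, or format.

```python
# Unit labels, assumed 1024-based (KB = 1024 bytes, and so on).
UNITS = ("B", "KB", "MB", "GB", "TB", "PB", "EB")


def convert_size(size_bytes):
    """Convert a raw byte count to a human-readable string (sketch)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 1024:
        return f"{size_bytes} B"  # whole bytes need no decimals
    value = float(size_bytes)
    unit = UNITS[0]
    # Divide by 1024 until the value fits the current unit (capped at EB).
    for unit in UNITS[1:]:
        value /= 1024
        if value < 1024:
            break
    return f"{round(value, 2)} {unit}"


if __name__ == "__main__":
    print(convert_size(5497558138880))  # 5.0 TB
```

Repeated division is used instead of a math.log()-based unit index because floating-point logarithms can land just below an integer at exact powers of the base.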

Architecture Overview

                    ┌─────────────────────┐
                    │  XCP Scan Report    │
                    │  (Raw Text Input)   │
                    └──────────┬──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Pattern Matching   │
                    │  & Text Parsing     │
                    └──────────┬──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Data Extraction    │
                    │  (7 Metadata Fields)│
                    └──────────┬──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Transformation     │
                    │  (Bytes → TB)       │
                    └──────────┬──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Output Generation  │
                    │  CSV + JSON         │
                    └─────────────────────┘
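The Output Generation stage could look roughly like the sketch below. The real data_to_file() is not shown and the project lists pandas as a dependency, so it likely differs; this version uses only the standard-library csv and json modules. The column labels are assumptions based on the nine-item rows that all_data() builds, with the three access columns named after the access-age categories listed under Key Features.

```python
import csv
import json
import os
import tempfile

# Hypothetical column labels for the 9-item rows built by all_data().
COLUMNS = [
    "Filesystem", "Filer", "Mountpoint", "Path",
    "Access >1 Year", "Access >1 Month", "Access Recent",
    "Users", "Total Used",
]


def data_to_file(output_name, rows):
    """Write rows to <output_name>.csv and <output_name>.json (sketch)."""
    # CSV: one header row, then one line per filesystem.
    with open(f"{output_name}.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

    # JSON: a list of objects keyed by column name, convenient for APIs.
    records = [dict(zip(COLUMNS, row)) for row in rows]
    with open(f"{output_name}.json", "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


if __name__ == "__main__":
    # Demo with illustrative values, written to a throwaway directory.
    out_dir = tempfile.mkdtemp()
    base = os.path.join(out_dir, "filesystem_analysis")
    data_to_file(base, [["/vol/engineering_data", "netapp-prod-01", "/mnt/engineering",
                         "/mnt/engineering", 1234, 0, 0, 45, "5.0 TB"]])
    print(sorted(os.listdir(out_dir)))
```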

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.

How to Contribute

  1. Fork the repository on GitHub
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and commit: git commit -m 'Add amazing feature'
  4. Push to your branch: git push origin feature/amazing-feature
  5. Open a Pull Request with a clear description of your changes

Development Setup

# Clone your fork (replace <your-username> with your GitHub account)
git clone https://github.com/<your-username>/Xcp_Data_Extraction_Tool.git
cd Xcp_Data_Extraction_Tool

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements.txt

Contribution Guidelines

  • Write clear, descriptive commit messages
  • Add unit tests for new features
  • Update documentation for API changes
  • Follow PEP 8 style guidelines for Python code
  • Ensure all tests pass before submitting a PR

📝 License & Acknowledgements

License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2025 Damini Marvin Mark

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files...

Acknowledgements

This project was inspired by the challenges faced by storage administrators dealing with verbose NetApp XCP reports. Special thanks to:

  • NetApp for the XCP tool and comprehensive API documentation
  • The Python Community for excellent libraries like pandas and docopt
  • Storage Administrators who provided feedback on early versions
  • Contributors who have helped improve this tool

💬 Contact & Support

Support This Project

If this tool has saved you time or helped your organization, consider:

  • ⭐ Starring the repository on GitHub
  • πŸ“’ Sharing it with colleagues in storage administration
  • 🀝 Contributing improvements or documentation

About

Python automation tool that parses NetApp XCP scan reports and extracts filesystem metadata into structured CSV/JSON formats, reducing storage analysis time from hours to seconds.
