FFXI Auction House Scraper

A Python-based web scraper for collecting item data from Final Fantasy XI (FFXI) Auction House websites. This tool uses headless browsers to navigate and extract item information, prices, stock levels, and other relevant data for price comparison and analysis.

Features

  • Flexible Browser Support: Choose between Playwright and Selenium for web scraping
  • Headless Operation: Run browsers in headless mode for efficient scraping
  • HTML Parsing: Robust HTML parsing using BeautifulSoup
  • Multiple Export Formats: Export data to JSON, CSV, or both
  • Retry Logic: Automatic retry with exponential backoff for failed requests
  • Configurable: YAML-based configuration for easy customization
  • Logging: Comprehensive logging with Loguru
  • Modular Design: Clean, extensible architecture
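The retry feature above can be illustrated with a small standalone sketch. This is not the project's actual implementation: `fetch` and `base_delay` are invented for the example, and `scrape_with_retry` here is a simplified stand-in for the method of the same name.

```python
import time

def scrape_with_retry(fetch, url, max_retries=3, base_delay=2.0):
    """Call fetch(url), retrying with exponential backoff (2s, 4s, 8s, ...)."""
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            # Give up after the last attempt; otherwise wait and retry
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Each failed attempt doubles the wait, which keeps the scraper polite to the target server while still recovering from transient errors.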

Project Structure

FFXI-python-AH-Scrapper/
├── src/
│   └── ffxi_ah_scraper/
│       ├── scrapers/          # Browser automation modules
│       │   ├── base_scraper.py
│       │   ├── playwright_scraper.py
│       │   └── selenium_scraper.py
│       ├── parsers/           # HTML parsing modules
│       │   ├── html_parser.py
│       │   └── item_parser.py
│       ├── exporters/         # Data export modules
│       │   ├── data_exporter.py
│       │   ├── json_exporter.py
│       │   └── csv_exporter.py
│       └── utils/             # Utility modules
│           ├── config_loader.py
│           └── logger_setup.py
├── data/
│   ├── raw/                   # Raw HTML files
│   └── processed/             # Exported data (JSON/CSV)
├── tests/                     # Test files
├── config.yaml               # Configuration file
├── main.py                   # Main entry point
└── requirements.txt          # Python dependencies

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

  1. Clone the repository:
git clone https://github.com/yourusername/FFXI-python-AH-Scrapper.git
cd FFXI-python-AH-Scrapper
  2. Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Install Playwright browsers (if using Playwright):
playwright install chromium
  5. Copy the environment template:
cp .env.example .env

Configuration

Edit config.yaml to customize the scraper:

# Scraper Settings
scraper:
  browser_type: "playwright"  # or "selenium"
  headless: true
  timeout: 30000
  request_delay: 2
  max_retries: 3

# URLs (update with actual FFXI AH endpoints)
urls:
  base_url: "https://www.ffxiah.com"
  item_search: "/browse"
  item_details: "/item/{item_id}"

# Export Settings
export:
  output_dir: "data/processed"
  format: "both"  # "json", "csv", or "both"
  save_raw_html: true
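Once loaded, the YAML above becomes a nested dict. A hypothetical helper (not part of the project) for reading dotted paths out of that dict with a fallback default:

```python
def get_setting(config, path, default=None):
    """Look up a dotted path like 'scraper.timeout' in a nested config dict."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

# Sample of what config.yaml parses to (abbreviated)
config = {
    "scraper": {"browser_type": "playwright", "timeout": 30000},
    "export": {"format": "both"},
}
```

This keeps call sites short (`get_setting(config, "scraper.timeout", 30000)`) and makes missing keys explicit instead of raising `KeyError`.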

Usage

Basic Usage

  1. Update the configuration file with actual FFXI AH URLs
  2. Modify main.py to uncomment example code and add your scraping logic
  3. Run the scraper:
python main.py

Example: Scraping a Single Item

from ffxi_ah_scraper.scrapers.playwright_scraper import PlaywrightScraper
from ffxi_ah_scraper.parsers.item_parser import ItemParser
from ffxi_ah_scraper.utils.config_loader import load_config

# Load configuration
config = load_config("config.yaml")

# Initialize scraper
with PlaywrightScraper(config) as scraper:
    # Scrape item page
    html = scraper.scrape_with_retry("https://www.ffxiah.com/item/4096")

    # Parse the data
    parser = ItemParser(html)
    item_data = parser.extract_item_data()

    print(item_data)

Example: Scraping Search Results

from ffxi_ah_scraper.scrapers.playwright_scraper import PlaywrightScraper
from ffxi_ah_scraper.parsers.item_parser import ItemParser
from ffxi_ah_scraper.exporters.json_exporter import JSONExporter
from ffxi_ah_scraper.utils.config_loader import load_config

config = load_config("config.yaml")

with PlaywrightScraper(config) as scraper:
    # Get search results
    html = scraper.scrape_with_retry("https://www.ffxiah.com/browse?q=potion")

    parser = ItemParser(html)
    items = parser.extract_search_results()

    # Export to JSON
    exporter = JSONExporter("data/processed")
    exporter.export(items, "search_results")

Customization

Creating a Custom Parser

Extend the HTMLParser class to create custom parsers:

from ffxi_ah_scraper.parsers.html_parser import HTMLParser

class CustomParser(HTMLParser):
    def extract_custom_data(self):
        # Your custom extraction logic
        return self.get_text(".custom-selector")
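To make the snippet above runnable on its own, here is a minimal sketch of what the base `HTMLParser` might look like with BeautifulSoup. The real base class's API may differ; only `get_text` is assumed from the example.

```python
from bs4 import BeautifulSoup

class HTMLParser:
    """Minimal stand-in for the project's base parser (sketch only)."""
    def __init__(self, html):
        self.soup = BeautifulSoup(html, "html.parser")

    def get_text(self, selector, default=""):
        # Return the stripped text of the first element matching a CSS selector
        node = self.soup.select_one(selector)
        return node.get_text(strip=True) if node else default

class CustomParser(HTMLParser):
    def extract_custom_data(self):
        return self.get_text(".custom-selector")
```

Returning a default instead of raising keeps parsers resilient when the site's markup changes.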

Creating a Custom Exporter

Extend the DataExporter class for custom export formats:

from ffxi_ah_scraper.exporters.data_exporter import DataExporter

class XMLExporter(DataExporter):
    def export(self, data, filename):
        # Your custom export logic
        pass
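A fleshed-out sketch of the `XMLExporter` above, using the standard library's `xml.etree.ElementTree`. It is written standalone here because the real `DataExporter` interface isn't shown; the constructor and `export` signature are assumptions modeled on `JSONExporter` usage earlier.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

class XMLExporter:
    """Sketch of a custom exporter; the real base class may define more."""
    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def export(self, data, filename):
        # Wrap each record dict in an <item> element under a single <items> root
        root = ET.Element("items")
        for record in data:
            item = ET.SubElement(root, "item")
            for key, value in record.items():
                ET.SubElement(item, key).text = str(value)
        path = self.output_dir / f"{filename}.xml"
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
        return path
```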

Output Data Format

JSON Format

{
  "item_id": "4096",
  "item_name": "Fire Crystal",
  "category": "Crystals",
  "price_data": [
    {
      "server": "Bahamut",
      "price": 1500,
      "stock": 100,
      "seller": "Merchant1"
    }
  ],
  "last_updated": "2025-12-29T12:00:00"
}

CSV Format

item_id,item_name,category,server,price,stock
4096,Fire Crystal,Crystals,Bahamut,1500,100
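The CSV rows are a flattened view of the JSON record: one row per `price_data` entry, with fields outside the header (such as `seller` and `last_updated`) dropped. A sketch of that flattening, with hypothetical helper names:

```python
import csv
import io

def flatten_item(record):
    """Expand one item record into one row dict per price_data entry."""
    base = {k: record[k] for k in ("item_id", "item_name", "category")}
    return [{**base, **entry} for entry in record.get("price_data", [])]

def to_csv(records):
    fieldnames = ["item_id", "item_name", "category", "server", "price", "stock"]
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys not in the header (e.g. seller)
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerows(flatten_item(record))
    return buf.getvalue()
```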

Development

Running Tests

pytest tests/

Code Formatting

black src/

Type Checking

mypy src/

Important Notes

Legal and Ethical Considerations

  • Always respect the website's robots.txt file
  • Implement appropriate delays between requests to avoid overloading servers
  • Review and comply with the website's Terms of Service
  • Consider using official APIs if available

HTML Selectors

The current parser implementations use placeholder CSS selectors. You'll need to:

  1. Inspect the actual FFXI AH website's HTML structure
  2. Update the selectors in item_parser.py to match the actual elements
  3. Test the selectors to ensure accurate data extraction

Browser Drivers

  • Playwright: Automatically downloads and manages browser binaries
  • Selenium: May require manual ChromeDriver installation/configuration

Troubleshooting

Common Issues

  1. ImportError: Make sure you're in the virtual environment and dependencies are installed
  2. Browser not found: Run playwright install chromium for Playwright
  3. Timeout errors: Increase the timeout value in config.yaml
  4. Parsing errors: Check and update CSS selectors for the current website structure

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is provided as-is for educational purposes. Please ensure you comply with all applicable laws and website terms of service when using this tool.

Disclaimer

This scraper is a tool for data collection and should be used responsibly. The authors are not responsible for misuse or any violations of terms of service. Always verify that your use case complies with the target website's policies and applicable laws.

Future Enhancements

  • Database integration for storing scraped data
  • Scheduler for periodic scraping
  • Price trend analysis
  • Multi-server comparison
  • Rate limiting configuration
  • Proxy support
  • User authentication handling
  • API endpoint creation for scraped data

Support

For issues, questions, or contributions, please open an issue on GitHub.
