A Python-based web scraper for collecting item data from Final Fantasy XI (FFXI) Auction House websites. This tool uses headless browsers to navigate and extract item information, prices, stock levels, and other relevant data for price comparison and analysis.
- Flexible Browser Support: Choose between Playwright or Selenium for web scraping
- Headless Operation: Run browsers in headless mode for efficient scraping
- HTML Parsing: Robust HTML parsing using BeautifulSoup
- Multiple Export Formats: Export data to JSON, CSV, or both
- Retry Logic: Automatic retry with exponential backoff for failed requests
- Configurable: YAML-based configuration for easy customization
- Logging: Comprehensive logging with Loguru
- Modular Design: Clean, extensible architecture
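The retry behavior listed above (exponential backoff on failed requests) can be sketched as follows; `fetch_with_retry` and its parameters are illustrative stand-ins for the actual implementation in `base_scraper.py`:

```python
import time

def fetch_with_retry(fetch, url, max_retries=3, base_delay=2.0):
    """Call fetch(url), retrying on failure with exponential backoff.

    The delay doubles after each failed attempt:
    base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; propagate the last error
            time.sleep(base_delay * (2 ** attempt))

# Example: a flaky fetcher that succeeds on the third attempt
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return f"<html>ok: {url}</html>"

print(fetch_with_retry(flaky, "https://example.com", base_delay=0.01))
```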
```
FFXI-python-AH-Scrapper/
├── src/
│   └── ffxi_ah_scraper/
│       ├── scrapers/          # Browser automation modules
│       │   ├── base_scraper.py
│       │   ├── playwright_scraper.py
│       │   └── selenium_scraper.py
│       ├── parsers/           # HTML parsing modules
│       │   ├── html_parser.py
│       │   └── item_parser.py
│       ├── exporters/         # Data export modules
│       │   ├── data_exporter.py
│       │   ├── json_exporter.py
│       │   └── csv_exporter.py
│       └── utils/             # Utility modules
│           ├── config_loader.py
│           └── logger_setup.py
├── data/
│   ├── raw/                   # Raw HTML files
│   └── processed/             # Exported data (JSON/CSV)
├── tests/                     # Test files
├── config.yaml                # Configuration file
├── main.py                    # Main entry point
└── requirements.txt           # Python dependencies
```
- Python 3.8 or higher
- pip package manager
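Before installing, you can confirm the interpreter meets the version requirement with a quick standard-library check:

```python
import sys

# The project requires Python 3.8 or newer; check before installing.
print(sys.version_info >= (3, 8))  # True on a supported interpreter
```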
- Clone the repository:

```bash
git clone https://github.com/yourusername/FFXI-python-AH-Scrapper.git
cd FFXI-python-AH-Scrapper
```

- Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Install Playwright browsers (if using Playwright):

```bash
playwright install chromium
```

- Copy the environment template:

```bash
cp .env.example .env
```

Edit `config.yaml` to customize the scraper:
```yaml
# Scraper Settings
scraper:
  browser_type: "playwright"  # or "selenium"
  headless: true
  timeout: 30000
  request_delay: 2
  max_retries: 3

# URLs (update with actual FFXI AH endpoints)
urls:
  base_url: "https://www.ffxiah.com"
  item_search: "/browse"
  item_details: "/item/{item_id}"

# Export Settings
export:
  output_dir: "data/processed"
  format: "both"  # "json", "csv", or "both"
  save_raw_html: true
```

- Update the configuration file with actual FFXI AH URLs
- Modify `main.py` to uncomment the example code and add your scraping logic
- Run the scraper:

```bash
python main.py
```

Scrape a single item page:

```python
from ffxi_ah_scraper.scrapers.playwright_scraper import PlaywrightScraper
from ffxi_ah_scraper.parsers.item_parser import ItemParser
from ffxi_ah_scraper.utils.config_loader import load_config

# Load configuration
config = load_config("config.yaml")

# Initialize scraper
with PlaywrightScraper(config) as scraper:
    # Scrape item page
    html = scraper.scrape_with_retry("https://www.ffxiah.com/item/4096")

    # Parse the data
    parser = ItemParser(html)
    item_data = parser.extract_item_data()
    print(item_data)
```

Scrape search results and export them:

```python
from ffxi_ah_scraper.scrapers.playwright_scraper import PlaywrightScraper
from ffxi_ah_scraper.parsers.item_parser import ItemParser
from ffxi_ah_scraper.exporters.json_exporter import JSONExporter
from ffxi_ah_scraper.utils.config_loader import load_config

config = load_config("config.yaml")

with PlaywrightScraper(config) as scraper:
    # Get search results
    html = scraper.scrape_with_retry("https://www.ffxiah.com/browse?q=potion")
    parser = ItemParser(html)
    items = parser.extract_search_results()

    # Export to JSON
    exporter = JSONExporter("data/processed")
    exporter.export(items, "search_results")
```

Extend the `HTMLParser` class to create custom parsers:

```python
from ffxi_ah_scraper.parsers.html_parser import HTMLParser

class CustomParser(HTMLParser):
    def extract_custom_data(self):
        # Your custom extraction logic
        return self.get_text(".custom-selector")
```

Extend the `DataExporter` class for custom export formats:

```python
from ffxi_ah_scraper.exporters.data_exporter import DataExporter

class XMLExporter(DataExporter):
    def export(self, data, filename):
        # Your custom export logic
        pass
```

Example JSON output:

```json
{
  "item_id": "4096",
  "item_name": "Fire Crystal",
  "category": "Crystals",
  "price_data": [
    {
      "server": "Bahamut",
      "price": 1500,
      "stock": 100,
      "seller": "Merchant1"
    }
  ],
  "last_updated": "2025-12-29T12:00:00"
}
```

Example CSV output:

```csv
item_id,item_name,category,server,price,stock
4096,Fire Crystal,Crystals,Bahamut,1500,100
```

Run the tests:

```bash
pytest tests/
```

Format and type-check the source:

```bash
black src/
mypy src/
```

- Always respect the website's `robots.txt` file
- Implement appropriate delays between requests to avoid overloading servers
- Review and comply with the website's Terms of Service
- Consider using official APIs if available
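The `robots.txt` rule above can be checked programmatically with the standard library's `urllib.robotparser`; the rules below are made up for illustration (in practice you would point `set_url` at the live site's `robots.txt` and call `read`):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Normally: rp.set_url("https://www.ffxiah.com/robots.txt"); rp.read()
# Here we parse example rules directly to keep the snippet offline.
rp.parse("""
User-agent: *
Disallow: /private/
""".splitlines())

print(rp.can_fetch("*", "https://www.ffxiah.com/item/4096"))  # True
print(rp.can_fetch("*", "https://www.ffxiah.com/private/x"))  # False
```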
The current parser implementations use placeholder CSS selectors. You'll need to:

- Inspect the actual FFXI AH website's HTML structure
- Update the selectors in `item_parser.py` to match the actual elements
- Test the selectors to ensure accurate data extraction
- Playwright: Automatically downloads and manages browser binaries
- Selenium: May require manual ChromeDriver installation/configuration
- ImportError: Make sure you're in the virtual environment and dependencies are installed
- Browser not found: Run `playwright install chromium` for Playwright
- Timeout errors: Increase the timeout value in `config.yaml`
- Parsing errors: Check and update CSS selectors for the current website structure
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is provided as-is for educational purposes. Please ensure you comply with all applicable laws and website terms of service when using this tool.
This scraper is a tool for data collection and should be used responsibly. The authors are not responsible for misuse or any violations of terms of service. Always verify that your use case complies with the target website's policies and applicable laws.
- Database integration for storing scraped data
- Scheduler for periodic scraping
- Price trend analysis
- Multi-server comparison
- Rate limiting configuration
- Proxy support
- User authentication handling
- API endpoint creation for scraped data
For issues, questions, or contributions, please open an issue on GitHub.