# One Piece Character Episodes Scraper

This notebook scrapes character appearance data from One Piece episodes using the One Piece Fandom Wiki.

## Overview
- **Source**: One Piece Fandom Wiki
- **Target**: Character appearances per episode
- **Output**: CSV file with character-episode mapping
- **Current Episode Range**: Episodes 1 to 1141+

## 1. Import Required Libraries

Import all necessary libraries for web scraping, data processing, and file operations.

In [19]:
import requests
from bs4 import BeautifulSoup
import time
from collections import defaultdict
import csv

## 2. Configuration and Initialization

Set up the main variables and configuration for the scraping process.

In [20]:
character_episodes = defaultdict(list)

# Scraping configuration
base_url = "https://onepiece.fandom.com/wiki/Episode_"
max_episodes = 1141  # Total available episodes (update as needed)

# HTTP headers to avoid being blocked
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

## 3. Character Extraction Function

Define the main function to extract characters from individual episode pages.

In [21]:
def get_characters_from_episode(episode_number):
    """
    Extract characters from a specific One Piece episode page.
    
    Args:
        episode_number (int): The episode number to scrape
    
    Returns:
        list: List of character names that appear in the episode
    """
    url = f"{base_url}{episode_number}"
    
    try:
        # A timeout keeps a stalled connection from hanging the whole run
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')


        span = soup.find("span", id="Characters_in_Order_of_Appearance")
        if not span:
            print(f"No 'Characters in Order of Appearance' section found for Episode {episode_number}")
            return []
            
        character_section = span.find_parent("h2")
        if not character_section:
            print(f"No character section found for Episode {episode_number}")
            return []
        
        character_list = character_section.find_next('ul')
        if not character_list:
            print(f"No character list found for Episode {episode_number}")
            return []

        # Extract character names
        characters = []
        for li in character_list.find_all('li'):
            character_name = li.find('a')
            
            # Try to get the character name from the link title
            if character_name and character_name.get('title'):
                name = character_name.get('title')
            else:
                # Fallback to plain text
                name = li.get_text(strip=True)
            
            # Clean the name (remove annotations like "(flashback)", "(debut)", etc.)
            if name:
                cleaned_name = name.split('(')[0].strip()
                if cleaned_name:  # Only add non-empty names
                    characters.append(cleaned_name)

        return characters

    except requests.RequestException as e:
        print(f"Network error fetching Episode {episode_number}: {e}")
        return []
    except Exception as e:
        print(f"Parsing error for Episode {episode_number}: {e}")
        return []

print("✅ Character extraction function defined!")

✅ Character extraction function defined!
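
The name-cleaning rule inside `get_characters_from_episode` can be checked offline, without fetching any page. The sketch below pulls that rule into a standalone helper; `clean_character_name` and the sample strings are hypothetical and introduced only for illustration.

```python
def clean_character_name(raw_name):
    """Drop a parenthetical annotation and surrounding whitespace."""
    # Same rule as in get_characters_from_episode: keep the text before the first "("
    return raw_name.split('(')[0].strip()


# Made-up list-item texts, not copied from a real wiki page
samples = ["Monkey D. Luffy (debut)", "  Koby ", "(flashback)", "Shanks (flashback)"]

cleaned = [clean_character_name(s) for s in samples]
cleaned = [name for name in cleaned if name]  # mirror the "non-empty names only" check

print(cleaned)
# ['Monkey D. Luffy', 'Koby', 'Shanks']
```

One consequence of this rule is that a name which legitimately contains parentheses is cut at the first `(`, so it may be worth spot-checking the output for truncated names.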


## 4. Web Scraping Process

Run the main scraping loop to collect character data from all episodes.

**⚠️ Note**: For demonstration purposes, this run covers only episodes 1-10. To scrape every episode, set `demo_max_episodes` to `max_episodes` (1141) or to your desired range.

In [22]:
# For demonstration, limit the run to the first 10 episodes.
# Set this to max_episodes to scrape all episodes (will take much longer).
demo_max_episodes = 10

print(f"Starting to scrape episodes 1 to {demo_max_episodes}...")
print("Progress:")

start_time = time.time()
successful_scrapes = 0
failed_scrapes = 0

for episode in range(1, demo_max_episodes + 1):
    print(f"Scraping Episode {episode}... ", end="")
    
    characters = get_characters_from_episode(episode)
    
    if characters:
        # Add characters to our data structure
        for character in characters:
            if character:  # Skip empty character names
                character_episodes[character].append(episode)
        
        print(f"Found {len(characters)} characters")
        successful_scrapes += 1
    else:
        print("No characters found")
        failed_scrapes += 1

    # Rate limiting: be respectful to the server
    time.sleep(1)

# Summary
end_time = time.time()
total_time = end_time - start_time

print("\nScraping completed!")
print(f"Total time: {total_time:.2f} seconds")
print(f"Successful scrapes: {successful_scrapes}")
print(f"Failed scrapes: {failed_scrapes}")
print(f"Unique characters found: {len(character_episodes)}")

Starting to scrape episodes 1 to 10...
Progress:
Scraping Episode 1... Found 8 characters
Scraping Episode 2... Found 12 characters
Scraping Episode 3... Found 12 characters
Scraping Episode 4... Found 18 characters
Scraping Episode 5... Found 8 characters
Scraping Episode 6... Found 9 characters
Scraping Episode 7... Found 14 characters
Scraping Episode 8... Found 13 characters
Scraping Episode 9... Found 13 characters
Scraping Episode 10... Found 12 characters

Scraping completed!
Total time: 42.14 seconds
Successful scrapes: 10
Failed scrapes: 0
Unique characters found: 50
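
Because `character_episodes` is created in an earlier cell, re-running the scraping cell on its own appends the same episode numbers a second time, and a page that lists a name twice would have the same effect. A minimal sketch of a guard against this, assuming a hypothetical helper `record_appearances` that is not part of the notebook:

```python
from collections import defaultdict


def record_appearances(index, episode, characters):
    """Add `episode` to each character's list at most once."""
    for character in set(characters):  # collapse repeats within one episode
        if character and episode not in index[character]:
            index[character].append(episode)


appearances = defaultdict(list)
record_appearances(appearances, 1, ["Luffy", "Koby", "Luffy"])  # name listed twice
record_appearances(appearances, 1, ["Luffy"])                   # simulated re-run
record_appearances(appearances, 2, ["Luffy"])

print(appearances["Luffy"], appearances["Koby"])
```

The membership test scans a list, which is fine at this scale; mapping each character to a `set` and sorting at export time would be an alternative design.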


## 5. Data Processing

Sort each character's episode list and report how many unique characters were collected.

In [25]:
print("Processing character data...")

# Sort episodes for each character
for character in character_episodes:
    character_episodes[character].sort()

# Display summary statistics
total_characters = len(character_episodes)

print("\nData Summary:")
print(f"Total characters: {total_characters}")

Processing character data...

Data Summary:
Total characters: 50
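
Beyond the total count, the same mapping supports a simple ranking by number of appearances. The following is a sketch only; `top_characters` is a hypothetical helper and the sample data is made up rather than taken from the scrape.

```python
def top_characters(index, n=5):
    """Return (character, appearance_count) pairs, most frequent first."""
    counts = {character: len(episodes) for character, episodes in index.items()}
    # Sort by count descending, then by name so ties come out in a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Made-up mapping in the same shape as character_episodes
sample = {"Luffy": [1, 2, 3], "Zoro": [2, 3], "Koby": [1, 2], "Alvida": [1]}

print(top_characters(sample, n=3))
# [('Luffy', 3), ('Koby', 2), ('Zoro', 2)]
```

Calling `top_characters(character_episodes)` after the scraping cell would give the equivalent ranking for the real data.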


## 6. Export to CSV

Save the processed data to a CSV file for further analysis.

In [24]:
# Define output filename
csv_file = "onepiece_characters.csv"
try:
    with open(csv_file, "w", encoding="utf-8", newline='') as f:
        writer = csv.writer(f)
        
        # Write header
        writer.writerow(["Character", "Episodes", "Total_Episodes"])
        
        # Write character data (sorted alphabetically)
        for character, episodes in sorted(character_episodes.items()):
            episode_list = ','.join(map(str, episodes))
            total_eps = len(episodes)

            writer.writerow([character, episode_list, total_eps])

except Exception as e:
    print(f"❌ Error saving to CSV: {e}")
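
The `Episodes` column stores a comma-joined string, which `csv.writer` quotes automatically because it contains commas. For later analysis that string has to be split back into integers. A minimal sketch of reading the file back, using an in-memory stand-in with made-up rows instead of the real `onepiece_characters.csv`; `parse_episode_rows` is a hypothetical helper:

```python
import csv
import io


def parse_episode_rows(file_obj):
    """Yield (character, [episode numbers]) from the exported CSV layout."""
    for row in csv.DictReader(file_obj):
        episodes = [int(ep) for ep in row["Episodes"].split(",") if ep]
        yield row["Character"], episodes


# In-memory stand-in for the exported file, with made-up rows
sample_csv = io.StringIO(
    'Character,Episodes,Total_Episodes\r\n'
    'Koby,"1,2,3",3\r\n'
    'Monkey D. Luffy,"1,2",2\r\n'
)

loaded = dict(parse_episode_rows(sample_csv))
print(loaded["Koby"])
# [1, 2, 3]
```

To read the real export, pass `open("onepiece_characters.csv", encoding="utf-8", newline="")` in place of the `StringIO` object.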

## 7. Project Summary

### What We've Accomplished

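This notebook:
- Scraped episode pages from the One Piece Fandom Wiki (episodes 1-10 in the demonstration run)
- Extracted character appearances from each page's "Characters in Order of Appearance" section
- Cleaned the character names by stripping parenthetical annotations
- Reported summary counts (successful and failed scrapes, unique characters)
- Exported the character-episode mapping to `onepiece_characters.csv`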
