# One Piece Network Analysis Project

## Project Goals
The primary goal of this project is to analyze the characters and relationships in the *One Piece* series through a combination of network analysis and sentiment analysis. The project is structured around the following objectives:

1. **Data Collection and Cleaning**  
   - Extract content related to *One Piece* characters from the official *One Piece* Wiki.  
   - Process and clean the data to remove unnecessary HTML tags, creating plain text files for each character.

2. **Character Filtering**  
   - Implement a filtering mechanism using a custom "score" metric to identify the most important characters in the series.  
   - Ensure a manageable dataset that focuses on relevant entities for subsequent analysis.

3. **Network Construction and Analysis**  
   - Develop internal networks representing relationships within specific story arcs of *One Piece*.  
   - Build external networks to track changes and dynamics across different arcs, enabling temporal analysis of the series' narrative structure.  
   - Create a complete network to visualize and analyze the overarching structure of *One Piece*.

4. **Community Detection and Advanced Metrics**  
   - Partition the network into communities to understand clusters of characters and their interactions.  
   - Perform TF-IDF analysis to quantify the roles and significance of these communities within the series.

5. **Sentiment Analysis**  
   - Perform sentiment analysis on key characters to study their emotional tone and its evolution throughout the series.

6. **Visualization and Reporting**  
   - Generate visualizations of the networks and analysis results for better understanding and presentation.  
   - Include figures and charts in the project report to illustrate findings effectively.
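
The TF-IDF step in objective 4 can be illustrated with a minimal sketch. It assumes each community's text has already been tokenized into a list of words and uses the plain `tf * log(N / df)` weighting; the notebook may use a different variant (for example, with smoothing). The function names and the toy tokens below are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores.

    docs: dict mapping a community name to its list of tokens.
    Returns a dict mapping each community name to {term: score}.
    """
    n_docs = len(docs)

    # Document frequency: in how many communities does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(term_scores, k=3):
    """Return the k highest-scoring (term, score) pairs."""
    return sorted(term_scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy input (hypothetical tokens, not taken from the project data)
docs = {
    "community_a": ["pirate", "crew", "pirate", "ship"],
    "community_b": ["marine", "admiral", "ship"],
}
scores = tf_idf(docs)
print(top_terms(scores["community_a"], k=2))
```

A term that appears in every community (here `ship`) gets a score of zero, so the highest-ranked terms are the ones that distinguish one community from the others.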

## Project Structure

- **`onepiece`**: Contains original character data in `.txt` files, extracted in raw HTML format.
- **`onepiece_cleaned`**: Cleaned text files for each character, with HTML tags removed.
- **`networks`**: Includes the main network data:
  - `One_Piece.gexf`: Complete network of all characters and their relationships.
  - `one_piece_cummulative`: Sub-networks for temporal evolution analysis of the series.
- **`jupyter_notebooks`**: Source code in Jupyter Notebook format:
  - `1_reading_cleaning_content.ipynb`: Extracts and cleans character data.
  - `1.1_reading_content_aux.ipynb`: Alternative data-processing notebook.
  - `2_filtering_characters.ipynb`: Filters important characters using a "score" metric.
  - `3_inner_network.ipynb`: Analyzes internal networks in major story arcs.
  - `4_external_network.ipynb`: Creates networks across arcs to study dynamics.
  - `5_complete_network.ipynb`: Builds the complete network for the series.
  - `6_community_partition.ipynb`: Detects communities and computes advanced metrics.
  - `7_sentiment_analysis.ipynb`: Runs sentiment analysis on key characters.
- **`json_files`**: Intermediate data in JSON format for faster network creation.
- **`Images`**: Visualizations and figures for project reports.
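
As a rough illustration of how the raw files in `onepiece` might be turned into the plain-text files in `onepiece_cleaned`, the sketch below strips HTML tags using only the Python standard library. It is not the code from the notebooks: the class and function names are hypothetical, and the UTF-8 encoding and the set of block-level tags are assumptions.

```python
from html.parser import HTMLParser
from pathlib import Path

# Tags after which a space is inserted so adjacent blocks do not run together
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "th", "h1", "h2", "h3", "h4"}


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the whitespace left behind by removed tags
        return " ".join("".join(self._chunks).split())


def strip_html(raw):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    return parser.text()


def clean_directory(src, dst):
    """Write a tag-free copy of every .txt file in src to dst; return the file count."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        raw = path.read_text(encoding="utf-8")  # encoding is an assumption
        (dst / path.name).write_text(strip_html(raw), encoding="utf-8")
        count += 1
    return count
```

With the directory layout above, a call such as `clean_directory("onepiece", "onepiece_cleaned")` would process every character file in one pass.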


