# One Piece Network Analysis Project

## Project Goals
The primary goal of this project is to analyze the characters and relationships in the *One Piece* series through a combination of network analysis and sentiment analysis. The project is structured around the following objectives:

1. **Data Collection and Cleaning**  
   - Extract content related to *One Piece* characters from the official *One Piece* Wiki.  
   - Process and clean the data to remove unnecessary HTML tags, creating plain text files for each character.
   - Corresponding code: **`1_reading_cleaning_content.ipynb`**, **`1.1_reading_content_aux.ipynb`**

2. **Character Filtering**  
   - Implement a filtering mechanism using a custom "score" metric to identify the most important characters in the series.  
   - Ensure a manageable dataset that focuses on relevant entities for subsequent analysis.
   - Corresponding code: **`2_filtering_characters.ipynb`**

3. **Inner Network Construction and Analysis**  
   - Develop inner networks representing relationships within specific story arcs of *One Piece*. 
   - Use degree centrality and betweenness centrality to measure the importance and influence of characters within each arc.
   - Corresponding code: **`3_inner_network.ipynb`**

4. **External Network Construction and Analysis**  
   - Build external networks to track changes and dynamics across different arcs, enabling temporal analysis of the series' narrative structure.  
   - Analyze how the degree distribution, and the parameters of the model fitted to it, change as the external network grows.
   - Partition the network into communities and cross-compare them with the arcs using TF-IDF.
   - Run sentiment analysis on important characters and the top 10 communities.
   - Corresponding code: **`4_external_network.ipynb`** (external network construction and degree distribution), **`5_complete_network.ipynb`** (construction of the network containing all characters), **`6_community_partition.ipynb`**, **`7_sentiment_analysis.ipynb`**
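
As a rough illustration of the two centrality measures named in objective 3, the sketch below computes degree centrality and betweenness centrality (via Brandes' algorithm) using only the Python standard library. The edge list, character pairs, and function names are invented for illustration and are not taken from the notebooks, which may well rely on a graph library instead.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Degree divided by (n - 1), the maximum possible degree."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {node: 0.0 for node in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in bc.items()}


# Hypothetical edges, for illustration only.
edges = [("Luffy", "Zoro"), ("Luffy", "Nami"), ("Luffy", "Usopp"), ("Usopp", "Kaya")]
adj = build_graph(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In this toy graph the character with the most direct links also has the highest betweenness, but the two measures can diverge: a character who bridges two otherwise separate groups can have high betweenness with only a couple of connections.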

## Project Structure

- **`onepiece`**: Contains original character data in `.txt` files, extracted in raw HTML format.

- **`onepiece_cleaned`**: Cleaned text files for each character, with HTML tags removed.
- **`networks`**: Includes the main network data:
  - `One_Piece.gexf`: Complete network of all characters and their relationships.
  - `one_piece_cummulative`: Sub-networks for temporal evolution analysis of the series.
- **`Codes`**: Source code in Jupyter Notebook format:
  - `1_reading_cleaning_content.ipynb`: Extracts and cleans character data.
  - `1.1_reading_content_aux.ipynb`: Auxiliary notebook for reading and processing character data.
  - `2_filtering_characters.ipynb`: Filters important characters using a "score" metric.
  - `3_inner_network.ipynb`: Analyzes internal networks in major story arcs.
  - `4_external_network.ipynb`: Creates networks across arcs to study dynamics.
  - `5_complete_network.ipynb`: Builds the complete network for the series.
  - `6_community_partition.ipynb`: Community detection and advanced metrics analysis.
  - `7_sentiment_analysis.ipynb`: Sentiment analysis of key characters.
- **`json_files`**: JSON files mapping each arc to its characters, used for quick lookup and network construction.
- **`Images`**: Visualizations and figures for project reports.
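
The step from `onepiece` (raw HTML) to `onepiece_cleaned` (plain text) could look roughly like the sketch below, which strips tags with the standard library's `html.parser`. The class name, the choice of skipped and block-level tags, and the sample string are assumptions made for illustration; the actual cleaning logic lives in `1_reading_cleaning_content.ipynb` and may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Tags that should separate words once the markup is removed (assumed set).
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed markup.
        return " ".join("".join(self._parts).split())


def strip_html(raw_html):
    """Return the visible text of an HTML fragment as a single line."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()


# Made-up sample input, not taken from the dataset.
sample = (
    "<p><b>Monkey D. Luffy</b> is the captain of the "
    "<a href='#'>Straw Hat Pirates</a>.</p><script>var x = 1;</script>"
)
cleaned = strip_html(sample)
print(cleaned)
```

A parser-based approach is generally more robust than a regular expression for this kind of cleanup, since it handles attributes containing `>` and character references such as `&amp;` without extra work.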


