# <center>Objective 2</center>

## How do the organisational structures of global ocean governance bodies shape the management of the ocean economy, and what systemic characteristics contribute to or impede their efficacy?

<p>The task involves analyzing the data presented in the "Ocean Governance and Ocean Economy Governance Matrix_IGOs" file.</p>
<p>The objective is to analyze the relationships among intergovernmental organizations (IGOs) based on their distinct attributes: the spatial and subject matter jurisdictions of these institutions, their objectives, their strategies, and their inter-institutional relationships.</p>

### Libraries

In [1]:
# Set up and tools
import re

import nltk
import numpy as np
import pandas as pd
import spacy
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import sent_tokenize, word_tokenize
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

### Packages

In [2]:
# Ensure necessary NLTK resources are downloaded
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt to /home/milo/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /home/milo/nltk_data...
[nltk_data] Downloading package stopwords to /home/milo/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt_tab to /home/milo/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

### Data

In [3]:
# File path
file_path = "../raw_data/Ocean Governance and ocean economy governance matrix_IGOs.xlsx"
# Load the dataset
df = pd.read_excel(file_path, header=[0,1])

### Data Cleaning

In [4]:
df.head(2)

Unnamed: 0_level_0,Institutions,Year,Scale,Jurisdictional Scope,Jurisdictional Scope,Source of Jurisdiction,Defined Objectives,Strategies,Defined inter-institutional Relationship,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination,Practical- Coordination
Unnamed: 0_level_1,Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Spatial Jurisdiction,Subject Matter Jurisdiction,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Vertical,Horizontal,Horizontal.1,Horizontal.2,Horizontal.3,Horizontal.4,Horizontal.5,Horizontal.6,Horizontal.7,Horizontal.8
0,Intergovernmental Oceanographic Commission (IOC),1960.0,Global,IOC jurisdiction is global delineated by the b...,The IOC's subject matter jurisdiction encompas...,The IOC’s authority is derived from its statut...,The objectives of the Intergovernmental Oceano...,IOC implements its objectives through series o...,IOC collaborates with UN specialized agencies ...,Vertical coordination within the IOC involves ...,Horizontal coordination within the IOC encompa...,,,,,,,,
1,Food and Agriculture Organization of the Unite...,1945.0,Global,The FAO’s jurisdiction spans a vast array of m...,"FAO’s remit includes nutrition, food and agric...",The FAO’s jurisdiction is established through ...,"As stated in Article 1 of the Constitution, FA...",The FAO executes its objectives through a ser...,"As stated in its constitution, the FAO maintai...",The FAO’s vertical coordination involves colla...,Horizontal coordination within the FAO involve...,https://www.jus.uio.no/english/services/librar...,FAO https://www.fao.org/strategic-framework/en,,,,,,


In [5]:
# Function to rename columns based on provided mapping
def rename_columns(df, column_map):
    # Flatten the column names into a single-level
    df.columns = [column_map.get(col, col) for col in df.columns]
    
    return df

# Define the column mappings (old name -> new name)
column_map = {
    ('Institutions', 'Unnamed: 0_level_1'): 'Institution',
    ('Year', 'Unnamed: 1_level_1'): 'Year',
    ('Scale', 'Unnamed: 2_level_1'): 'Scale',
    ('Jurisdictional Scope', 'Spatial Jurisdiction'): 'Spatial Jurisdiction',
    ('Jurisdictional Scope', 'Subject Matter Jurisdiction'): 'Subject Matter Jurisdiction',
    ('Source of Jurisdiction', 'Unnamed: 5_level_1'): 'Source of Jurisdiction',
    ('Defined Objectives', 'Unnamed: 6_level_1'): 'Defined Objectives',
    ('Strategies', 'Unnamed: 7_level_1'): 'Strategies',
    ('Defined inter-institutional Relationship', 'Unnamed: 8_level_1'): 'Inter-institutional Relationship',
    ('Practical- Coordination', 'Vertical'): 'Practical Vertical Coordination',
    ('Practical- Coordination', 'Horizontal'): 'Practical Horizontal Coordination',
    ('Practical- Coordination', 'Horizontal.1'): 'Horizontal Coordination 1',
    ('Practical- Coordination', 'Horizontal.2'): 'Horizontal Coordination 2',
    ('Practical- Coordination', 'Horizontal.3'): 'Horizontal Coordination 3',
    ('Practical- Coordination', 'Horizontal.4'): 'Horizontal Coordination 4',
    ('Practical- Coordination', 'Horizontal.5'): 'Horizontal Coordination 5',
    ('Practical- Coordination', 'Horizontal.6'): 'Horizontal Coordination 6',
    ('Practical- Coordination', 'Horizontal.7'): 'Horizontal Coordination 7',
    ('Practical- Coordination', 'Horizontal.8'): 'Horizontal Coordination 8'
}

# Apply the renaming function
df = rename_columns(df, column_map)
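
Because `column_map.get(col, col)` falls back to the original label, any two-level header that is missing from the mapping silently stays a tuple after renaming. The following is a minimal sketch of a check for that, using a plain list of tuples in place of `df.columns` (iterating over a pandas MultiIndex yields tuples like these). The helper name `find_unmapped` and the `Horizontal.9` header are illustrative assumptions, not part of the dataset.

```python
def find_unmapped(columns, column_map):
    """Hypothetical helper: return the column labels with no entry in column_map."""
    return [col for col in columns if col not in column_map]


# Stand-in for df.columns; the last header is made up to show the fallback
columns = [
    ('Institutions', 'Unnamed: 0_level_1'),
    ('Jurisdictional Scope', 'Spatial Jurisdiction'),
    ('Practical- Coordination', 'Horizontal.9'),
]
column_map = {
    ('Institutions', 'Unnamed: 0_level_1'): 'Institution',
    ('Jurisdictional Scope', 'Spatial Jurisdiction'): 'Spatial Jurisdiction',
}

# Same flattening expression as in rename_columns
flattened = [column_map.get(col, col) for col in columns]
unmapped = find_unmapped(columns, column_map)

print(flattened)
print(unmapped)
```

Running such a check before the renaming step makes it easy to spot headers that changed in the spreadsheet but not in the mapping.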

### Data Preprocessing

In [6]:
# Subsetting relevant columns; copy so later edits do not act on a view of df
new_df = df.iloc[:, :11].copy()

# Drop row 48 (used in search)
new_df = new_df.drop(48, axis=0)

df = new_df.copy()

In [7]:
df.columns

Index(['Institution', 'Year', 'Scale', 'Spatial Jurisdiction',
       'Subject Matter Jurisdiction', 'Source of Jurisdiction',
       'Defined Objectives', 'Strategies', 'Inter-institutional Relationship',
       'Practical Vertical Coordination', 'Practical Horizontal Coordination'],
      dtype='object')

In [8]:
df

Unnamed: 0,Institution,Year,Scale,Spatial Jurisdiction,Subject Matter Jurisdiction,Source of Jurisdiction,Defined Objectives,Strategies,Inter-institutional Relationship,Practical Vertical Coordination,Practical Horizontal Coordination
0,Intergovernmental Oceanographic Commission (IOC),1960.0,Global,IOC jurisdiction is global delineated by the b...,The IOC's subject matter jurisdiction encompas...,The IOC’s authority is derived from its statut...,The objectives of the Intergovernmental Oceano...,IOC implements its objectives through series o...,IOC collaborates with UN specialized agencies ...,Vertical coordination within the IOC involves ...,Horizontal coordination within the IOC encompa...
1,Food and Agriculture Organization of the Unite...,1945.0,Global,The FAO’s jurisdiction spans a vast array of m...,"FAO’s remit includes nutrition, food and agric...",The FAO’s jurisdiction is established through ...,"As stated in Article 1 of the Constitution, FA...",The FAO executes its objectives through a ser...,"As stated in its constitution, the FAO maintai...",The FAO’s vertical coordination involves colla...,Horizontal coordination within the FAO involve...
2,Convention on the Intergovernmental Maritime C...,1948.0,Global,The IMO’s authority spans a global geographica...,The IMO's jurisdiction encompasses a comprehen...,The IMO's jurisdiction is established by the C...,"According to Part I, Article 1 of the Internat...",IMO implements its objectives and mandates thr...,The IMO collaborates with a diverse array of o...,Vertical coordination within IMO involves coll...,Horizontal coordination within the IMO involve...
3,Division for Ocean Affairs and the Law of the ...,1992.0,Global,UN DALOS does not have authority over any spec...,The UN DOALOS's mandate includes providing inf...,DOALOS derives its mandate from the United Nat...,According to the Secretary-General’s bulletin ...,. DOALOS) executes its objectives through a mu...,DOALOS collaborates with key organizations to ...,DOALOS engages in vertical coordination with v...,Horizontal coordination within DOALOS involves...
4,Climate Change Secretariat,1992.0,Global,"The UNFCCC Secretariat operates globally, supp...",The UNFCCC Secretariat is responsible for faci...,The UNFCCC Secretariat derives its mandate fro...,The secretariat's duties under the Convention ...,The UNFCCC Secretariat executes its objectives...,The UNFCCC Secretariat maintains collaborative...,Vertical coordination within UNFCCC involves c...,Horizontal coordination within the UNFCCC invo...
5,International Seabed Authority,1994.0,Global,The international seabed area – the part under...,The ISA is responsible for administering the i...,ISA derives its mandate from the United Nation...,"According to Part XI, Section 4 of the United ...",The International Seabed Authority (ISA) fulfi...,The ISA cooperates with various organizations ...,Vertical coordination within ISA involves coop...,Horizontal coordination within the ISA involve...
6,United Nations Environment Programme (UNEP),1972.0,Global,"The UNEP Secretariat’s authority is global, co...",UNEP is responsible for coordinating the envir...,UNEP derives its mandate from the United Natio...,According to the Governing Council decision 2/...,UNEP fulfills its objectives through diverse s...,UNEP collaborates with a wide range of organiz...,Vertical coordination within UNEP involves col...,Horizontal coordination within UNEP involves c...
7,United Nations Development Programme (UNDP),1965.0,Global,"UNDP's jurisdiction is global, operating in mo...",UNDP’s work is concentrated in three focus are...,UNDP derives its mandate from the United Natio...,The objectives of UNDP as defined include: to ...,The UNDP executes its objectives via several s...,UNDP collaborates with various partners to adv...,Vertical coordination within UNDP involves wor...,Horizontal coordination within UNDP involves c...
8,United Nations Conference on Trade and Develop...,1964.0,Global,The UNCTAD Secretariat’s authority is not limi...,The UNCTAD Secretariat is responsible for prom...,UNCTAD derives its mandate from the United Nat...,As per the General Assembly resolution 1995 (...,The UNCTAD Secretariat implements its objectiv...,UNCTAD collaborates with various organizations...,Vertical coordination within UNCTAD involves c...,Horizontal coordination within UNCTAD involves...
9,United Nations Industrial Development Organiza...,1966.0,Global,"UNIDO’s jurisdiction is global, encompassing t...",UNIDO is responsible for the industrial develo...,UNIDO derives its mandate from its own Constit...,According to Chapter I of the Constitution of ...,UNIDO executes its objectives via several stra...,UNIDO collaborates with various organizations ...,Vertical coordination within UNIDO involves co...,Horizontal coordination within UNIDO involves ...


## Summarization

### Step 1: Extracting URLs and References for Summarization

To summarize the text under each dimension of the intergovernmental organizations (IGOs), I started by extracting all URLs and references to relevant documents mentioned within the text. Storing them separately allowed me to remove the URLs from the content before summarizing without losing the context that the references provide.

I used Python for this task and created, for each dimension, a dedicated column in the dataset for the extracted links and referenced documents. The extraction focused on isolating web links (URLs) and legal or institutional references, such as treaties, conventions, and agreements, that are crucial to understanding the text but not necessary for the summary itself.

In [9]:
import re

# Regular expression to match URLs (http:// and https://)
URL_PATTERN = re.compile(r'https?://\S+')

# Named treaties, agreements, and organizations, matched case-insensitively
TREATY_PATTERN = re.compile(
    r'\b(?:UNCLOS|Paris Agreement|UN Convention on the Law of the Sea|Sustainable Development Goals|SDGs|Basel Convention|Minamata Convention|CBD|CITES|CMS|Outer Space Treaty|Outer Space Law|Kyoto Protocol|UNFCCC|Rio Declaration|Convention on Biological Diversity|World Trade Organization|WTO|International Labour Organization|ILO)\b',
    re.IGNORECASE)

# Generic legal-document references: a keyword followed by one more word.
# The group is non-capturing so that findall returns the whole match
# (keyword plus the following word) rather than only the keyword.
DOCUMENT_PATTERN = re.compile(
    r'\b(?:Resolution|Treaty|Convention|Agreement|Declaration|Charter|Protocol|Accord)\s+[A-Za-z0-9\-_]+\b')

# Function to extract relevant references and return cleaned text (without references)
def extract_references_and_clean_text(text):
    # Empty cells are read as NaN (a float), which re cannot search
    if not isinstance(text, str):
        return "", "None"

    urls = URL_PATTERN.findall(text)
    treaties = TREATY_PATTERN.findall(text)
    documents = DOCUMENT_PATTERN.findall(text)

    # Combine all references, dropping duplicates while keeping first-seen order
    all_references = list(dict.fromkeys(urls + treaties + documents))

    # Create the cleaned text by removing URLs and references from the text
    cleaned_text = URL_PATTERN.sub('', text)  # Remove URLs
    cleaned_text = TREATY_PATTERN.sub('', cleaned_text)  # Remove specific treaty references
    cleaned_text = DOCUMENT_PATTERN.sub('', cleaned_text)  # Remove other legal document references

    # Return both the cleaned text and the references found
    return cleaned_text.strip(), ', '.join(all_references) if all_references else "None"

By extracting these references in advance, I was able to generate a cleaner, more focused summary of the text while preserving the critical information without including the URLs directly in the summary.

#### Key Points:
* **Objective:** The goal was to extract URLs and references to preserve the content's context while removing links from the summary.
* **Tools Used:** Python, regular expressions, and pandas were used to extract and organize the URLs and document references.
* **Outcome:** A new column containing all relevant links and references was added to the dataset, making the subsequent summarization process more effective and focused.
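
Two details of regex-based extraction are worth keeping in mind. When a pattern contains a single capturing group, `re.findall` returns only that group rather than the whole match, and a URL pattern that stops at whitespace also picks up trailing punctuation. The sketch below shows both on a made-up sentence; the sentence, the shortened keyword list, and the variable names are illustrative assumptions, not taken from the dataset.

```python
import re

# Made-up example text, not from the IGO matrix
text = "See Resolution 2997 and the Paris Agreement at https://example.org/doc."

# Same keyword-plus-one-word idea, with a shortened keyword list
capturing = r'\b(Resolution|Convention|Agreement)\s+[A-Za-z0-9\-_]+\b'
non_capturing = r'\b(?:Resolution|Convention|Agreement)\s+[A-Za-z0-9\-_]+\b'

keywords_only = re.findall(capturing, text)
full_matches = re.findall(non_capturing, text)
print(keywords_only)  # ['Resolution', 'Agreement']
print(full_matches)   # ['Resolution 2997', 'Agreement at']

# URLs matched up to the next whitespace keep the sentence-final period
urls = re.findall(r'https?://[^\s]+', text)
trimmed_urls = [url.rstrip('.,;)') for url in urls]
print(urls)          # ['https://example.org/doc.']
print(trimmed_urls)  # ['https://example.org/doc']
```

The second match, `'Agreement at'`, also shows that a keyword-plus-one-word pattern captures whatever word follows the keyword, so the extracted references are best reviewed by hand before they are relied on.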

In [10]:
# List of columns to process
columns_to_process = ['Spatial Jurisdiction','Subject Matter Jurisdiction', 'Source of Jurisdiction', 
                      'Defined Objectives', 'Strategies', 'Inter-institutional Relationship', 
                      'Practical Vertical Coordination', 'Practical Horizontal Coordination']

# mode='a' appends to the existing workbook; if_sheet_exists='replace' overwrites
# sheets left by an earlier run instead of raising
# "ValueError: Sheet 'Spatial Jurisdiction' already exists and if_sheet_exists is set to 'error'."
with pd.ExcelWriter(file_path, mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    # Loop through the columns and apply the extraction and cleaning function
    for column in columns_to_process:
        # Apply the function to each row of the column and create two new columns: <column> Cleaned and <column> References
        df[[f'{column} Cleaned', f'{column} References']] = df[column].apply(lambda x: pd.Series(extract_references_and_clean_text(x)))

        # Create a sub-DataFrame holding the original, cleaned, and reference columns
        sub_df = df[['Institution', column, f'{column} Cleaned', f'{column} References']]

        # Save the sub-DataFrame to its own sheet in the Excel file
        sub_df.to_excel(writer, sheet_name=column)

# Report once the writer has closed and the workbook is saved
print(f"Data has been saved to {file_path}")
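
When each column name is used directly as a sheet name, it helps to remember that Excel limits worksheet names to 31 characters and disallows the characters `[ ] : * ? / \`. Two of the column names processed here are longer than 31 characters. The sketch below is one possible way to derive safe sheet names; `safe_sheet_name` is a hypothetical helper, not part of pandas or openpyxl.

```python
import re

# Characters Excel does not allow in worksheet names
INVALID_SHEET_CHARS = re.compile(r'[\[\]:*?/\\]')
MAX_SHEET_NAME_LENGTH = 31  # Excel's limit on worksheet name length


def safe_sheet_name(name):
    """Hypothetical helper: replace invalid characters and truncate to 31 characters."""
    cleaned = INVALID_SHEET_CHARS.sub('_', name).strip()
    return cleaned[:MAX_SHEET_NAME_LENGTH]


columns_to_process = ['Spatial Jurisdiction', 'Subject Matter Jurisdiction', 'Source of Jurisdiction',
                      'Defined Objectives', 'Strategies', 'Inter-institutional Relationship',
                      'Practical Vertical Coordination', 'Practical Horizontal Coordination']

sheet_names = {column: safe_sheet_name(column) for column in columns_to_process}
for column, sheet in sheet_names.items():
    print(f"{column!r} -> {sheet!r}")

# Truncation can make two names collide, so confirm they are still distinct
assert len(set(sheet_names.values())) == len(sheet_names)
```

Passing `sheet_name=safe_sheet_name(column)` to `to_excel` would then keep every sheet name within Excel's limits.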

In [None]:
# # subsetting the cleaned data
# new_df = df[['Institution', 'Year', 'Scale', 'Spatial Jurisdiction Cleaned', 'Subject Matter Jurisdiction Cleaned', 'Source of Jurisdiction Cleaned', 'Defined Objectives Cleaned', 'Strategies Cleaned', 'Inter-institutional Relationship Cleaned', 'Practical Vertical Coordination Cleaned', 'Practical Horizontal Coordination Cleaned']]

In [None]:
# # Save Spatial df sheet
# with pd.ExcelWriter(file_path, mode='a') as writer:
#     new_df.to_excel(writer, sheet_name='Free Urls Raw Data', index=False)