# EU Taxonomy Ontology Generation Pipeline 
This Jupyter notebook is an automated ontology construction tool designed to encode the **EU taxonomy of sustainable activities** into a semantically structured, machine-readable format. The primary function of this tool is to process structured data from Excel spreadsheets, detailing economic activities and their sustainability criteria related to climate adaptation and mitigation, and convert this information into **RDF (Resource Description Framework) triples**. 

In addition to parsing the data, we use the **ChatGPT API** to extract information from the free-text descriptions within the spreadsheets: references to external resources and any limitations that qualify an activity's sustainability credentials. These extracted elements are then added to the ontology, enriching the taxonomy with the context and constraints that define sustainable practice for each activity. 

The resulting ontology not only categorizes activities within their respective sectors but also provides an intricate web of information that users can navigate to understand the sustainability landscape of the EU's economic activities. This notebook **streamlines the transformation from raw data to a comprehensive ontology**, ensuring a robust and navigable dataset for stakeholders engaged in sustainable finance.

## Install Necessary Libraries

In [3]:
#pip install openai pandas rdflib openpyxl

## Dataset import

In [1]:
import json
import logging
from openai import OpenAI
import pandas as pd
import re
import rdflib
import ast 


Read the taxonomy spreadsheet, as downloaded from the official site: https://ec.europa.eu/sustainable-finance-taxonomy/assets/documents/taxonomy.xlsx

In [19]:
# Function to read a sheet, keeping the 'Activity number' column as strings
def read_excel_as_string(file_path, sheet_name):
    return pd.read_excel(file_path, sheet_name=sheet_name, header=0, dtype={'Activity number': str})

# Read the different excel sheets and keep them in separate dataframes
df_adaptation = read_excel_as_string('taxonomy.xlsx', "Climate adaptation")
df_mitigation= read_excel_as_string('taxonomy.xlsx', "Climate mitigation")

df_adaptation.drop(columns="Unnamed: 12", inplace=True)
df_mitigation.drop(columns="Unnamed: 12", inplace=True)

# Path to the manually created ontology schema
schema_ttl_file = 'taxonomy_schema.ttl'
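
The `Activity number` column is read as a string because activity numbers such as `4.1` and `4.10` are distinct identifiers that would collapse to the same value if parsed as floats. A minimal sketch of the effect, using an in-memory CSV rather than the spreadsheet (the two rows and their activity names are placeholders, not taken from the taxonomy file):

```python
import io
import pandas as pd

# Two illustrative rows; the activity names are placeholders.
sample = io.StringIO(
    "Activity number,Activity\n"
    "4.1,Example activity A\n"
    "4.10,Example activity B\n"
)

as_float = pd.read_csv(sample)
sample.seek(0)
as_text = pd.read_csv(sample, dtype={"Activity number": str})

# Parsed as floats, 4.10 becomes 4.1 and the two identifiers collide.
float_ids = as_float["Activity number"].nunique()
# Parsed as strings, both identifiers survive unchanged.
text_ids = as_text["Activity number"].tolist()
print(float_ids, text_ids)
```

The same `dtype` argument is what `read_excel_as_string` passes to `pd.read_excel`.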

Show first lines to understand the dataset structure

In [20]:
df_adaptation.head()

Unnamed: 0,NACE,Sector,Activity number,Activity,Contribution type,Description,Substantial contribution criteria,DNSH on Climate mitigation,DNSH on Water,DNSH on Circular economy,DNSH on Pollution prevention,DNSH on Biodiversity,Footnotes
0,"F41.1, F41.2, F43",Construction and real estate,7.1,Construction of new buildings,,Development of building projects for residenti...,1. The economic activity has implemented physi...,"The building is not dedicated to extraction, s...","Where installed, except for installations in r...",At least 70 % (by weight) of the non-hazardous...,Building components and materials used in the ...,The activity complies with the criteria set ou...,(596) Future scenarios include Intergovernment...
1,"F41, F43",Construction and real estate,7.2,Renovation of existing buildings,,Construction and civil engineering works or pr...,1. The economic activity has implemented physi...,"The building is not dedicated to extraction, s...",Where installed as part of the renovation work...,At least 70 % (by weight) of the non-hazardous...,Building components and materials used in the ...,N/A.,(613) Future scenarios include Intergovernment...
2,"F42, F43, M71, C16, C17, C22, C23, C25, C27, C...",Construction and real estate,7.3,"Installation, maintenance and repair of energy...",,Individual renovation measures consisting in i...,1. The economic activity has implemented physi...,"The building is not dedicated to extraction, s...",,,Building components and materials comply with ...,,(622) Future scenarios include Intergovernment...
3,"F42, F43, M71, C16, C17, C22, C23, C25, C27, C28",Construction and real estate,7.4,"Installation, maintenance and repair of chargi...",,"Installation, maintenance and repair of chargi...",1. The economic activity has implemented physi...,"The building is not dedicated to extraction, s...",,,,,(627) Future scenarios include Intergovernment...
4,"F42, F43, M71, C16, C17, C22, C23, C25, C27, C28",Construction and real estate,7.5,"Installation, maintenance and repair of instru...",,"Installation, maintenance and repair of instru...",1. The economic activity has implemented physi...,"The building is not dedicated to extraction, s...",,,,,(632) Future scenarios include Intergovernment...


***
## Generate ontology based on the schema
#### Define functions for generation of instances

In [14]:
# Function to generate sub nodes for each activity (line)
def generate_sub_nodes(activity_id, info_type, row):
    ttl_data = ""
    specific_info_id = f'{activity_id}_{info_type}'
    activity_name = row["Activity"].replace('"', '\\"') if pd.notna(row["Activity"]) else "Unnamed Activity"

    ttl_data += f'# node for "{info_type.capitalize()}" specific information\n'
    ttl_data += f'{specific_info_id}\n'
    ttl_data += f'    a sml:Climate{info_type.capitalize()}Info;\n'
    ttl_data += f'    core:prefLabel "{activity_name} ({info_type.capitalize()} Information)"@en;\n'  

    # Handling contribution type, replacing nan with an empty string
    contribution_type = row.get("Contribution type", "")
    if pd.isna(contribution_type) or contribution_type == "":
        contribution_type = ""
    ttl_data += f'    sml:contributionType "{contribution_type}";\n'

    sub_info_fields = {
        'SubstantialContributionCriteria': 'Substantial contribution criteria',
        'DNSHonClimateMitigation': 'DNSH on Climate mitigation',
        'DNSHonClimateAdaptation': 'DNSH on Climate adaptation',
        'DNSHonWater': 'DNSH on Water',
        'DNSHonCircularEconomy': 'DNSH on Circular economy',
        'DNSHonPollutionPrevention': 'DNSH on Pollution prevention',
        'DNSHonBiodiversity': 'DNSH on Biodiversity',
        'Footnotes': 'Footnotes'
    }

    # Link every sub-info node to the main info node (the nodes themselves are created further below)
    for sub_info, column_name in sub_info_fields.items():
        # Skip the DNSH field that corresponds to the objective itself
        if (info_type == "adaptation" and sub_info == "DNSHonClimateAdaptation") or \
           (info_type == "mitigation" and sub_info == "DNSHonClimateMitigation"):
            continue
        
        sub_info_id = f'{specific_info_id}_{sub_info}'
        ttl_data += f'    sml:has{sub_info} {sub_info_id};\n'  # Link sub-info to the main info node
    
    # ttl_data += f'    core:prefLabel "{activity_name} ({info_type.capitalize()})"@en;\n'  
    ttl_data += f'    sml:isPartOf {activity_id}.\n\n'  # Linking back to the main activity
    
    # Iterate and create nodes for all fields, even if empty
    for sub_info, column_name in sub_info_fields.items():
        # Special handling to exclude irrelevant fields
        if (info_type == "adaptation" and sub_info == "DNSHonClimateAdaptation") or \
           (info_type == "mitigation" and sub_info == "DNSHonClimateMitigation"):
            continue
        
        field_value = row.get(column_name, "")
        if pd.isna(field_value) or field_value == "":
            field_value = ""  # Assign an empty string if the field is NaN or empty
        sub_info_id = f'{specific_info_id}_{sub_info}'
        readable_sub_info = sub_info.replace("DNSHon", "DNSH on ")
        ttl_data += f'{sub_info_id} a sml:{sub_info};\n'
        ttl_data += f'    core:prefLabel "{activity_name} ({info_type.capitalize()} - {readable_sub_info})"@en;\n'  
        ttl_data += f'    sml:description """{field_value}"""@en;\n'
        ttl_data += f'    sml:isPartOf {specific_info_id}.\n\n'

    return ttl_data


# Function to generate ontology instances
def generate_ttl(df_adaptation, df_mitigation):
    ttl_data = ""
    activity_ids = set()

    # Combine unique activity numbers from both dataframes
    unique_activity_numbers = pd.concat([df_adaptation['Activity number'], df_mitigation['Activity number']]).unique()

    # Process each unique activity number
    for activity_number in unique_activity_numbers:
        activity_id = f't:{activity_number}'  # Assuming activity_number is a unique identifier

        # Generate sub-nodes for adaptation if present
        if activity_number in df_adaptation['Activity number'].values:
            row = df_adaptation[df_adaptation['Activity number'] == activity_number].iloc[0]
            ttl_data += generate_sub_nodes(activity_id, "adaptation", row)
        
        # Generate sub-nodes for mitigation if present
        if activity_number in df_mitigation['Activity number'].values:
            row = df_mitigation[df_mitigation['Activity number'] == activity_number].iloc[0]
            ttl_data += generate_sub_nodes(activity_id, "mitigation", row)

        # Only add basic info if it hasn't been added before
        if activity_id not in activity_ids:
            # Assume the activity label is the same in both dataframes
            activity_label = row["Activity"].replace('"', '\\"') if pd.notna(row["Activity"]) else "Unnamed Activity"
            sector_number = str(activity_number).split(".")[0] if pd.notna(activity_number) else ""
            # Check if NACE codes exist and are not NaN
            nace_codes = row["NACE"].split(", ") if pd.notna(row["NACE"]) and isinstance(row["NACE"], str) else None

            ttl_data += f'#------------------basic info of {activity_id}\n'
            ttl_data += f'{activity_id}\n'
            ttl_data += f'    a sml:Activity;\n'
            ttl_data += f'    sml:belongsToSector t:{sector_number};\n'
            ttl_data += f'    sml:hasNaceCode {", ".join(["nace:" + n.strip() for n in nace_codes])};\n' if nace_codes else ''
            ttl_data += f'    core:definition  """{row.get("Description", "")}"""@en;\n'
            ttl_data += f'    core:prefLabel "{activity_label}"@en;\n'

            # Check if there is adaptation or mitigation information
            has_adaptation = activity_number in df_adaptation['Activity number'].values
            has_mitigation = activity_number in df_mitigation['Activity number'].values
            parts = []
            if has_adaptation:
                parts.append(f'{activity_id}_adaptation')
            if has_mitigation:
                parts.append(f'{activity_id}_mitigation')

            if parts:
                ttl_data += f'    sml:hasPart {", ".join(parts)};\n'
            ttl_data += f'    sml:activityNumber "{activity_number}"^^xsd:string.\n\n'

            activity_ids.add(activity_id)

    return ttl_data

def add_sector_activity_relationships(g, df_adaptation, df_mitigation):
    # Iterate over activities and add the inverse relationship to the graph
    for df in [df_adaptation, df_mitigation]:
        for index, row in df.iterrows():
            activity_number = row["Activity number"]
            # Skip rows without an activity number; str(NaN) would otherwise yield a bogus "nan" sector
            if pd.notna(activity_number):
                sector_number = str(activity_number).split(".")[0]
                activity_uri = rdflib.URIRef(f'https://ec.europa.eu/sustainable-finance-taxonomy/assets/taxonomy/{activity_number}')
                sector_uri = rdflib.URIRef(f'https://ec.europa.eu/sustainable-finance-taxonomy/assets/taxonomy/{sector_number}')
                
                # Add the inverse relationship
                g.add((sector_uri, SML.hasActivity, activity_uri))

    return g
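
The generator functions above interpolate spreadsheet text directly into quoted Turtle literals. Free text can contain backslashes, double quotes or line breaks, any of which may produce invalid Turtle, so an escaping step could be applied before interpolation. The following is only a sketch; `escape_turtle_literal` is a hypothetical helper that is not part of this notebook:

```python
def escape_turtle_literal(text):
    """Escape a value for use inside a quoted Turtle literal (hypothetical helper)."""
    replacements = [
        ("\\", "\\\\"),  # backslash first, so the escapes added below are not doubled
        ('"', '\\"'),
        ("\n", "\\n"),
        ("\r", "\\r"),
    ]
    escaped = str(text)
    for raw, safe in replacements:
        escaped = escaped.replace(raw, safe)
    return escaped


# Illustrative input containing quotes and a line break
print(escape_turtle_literal('He said "ok"\nnext line'))
```

Escaping every double quote keeps the value safe in both short (`"..."`) and long (`"""..."""`) literals, at the cost of slightly less readable output.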


#### Call the functions to create the instances and save them to 'new_instances.ttl'.

In [21]:
# Generate the TTL instances
combined_ttl = generate_ttl(df_adaptation, df_mitigation)

# Save to a TTL file
ttl_filename = 'new_instances.ttl'
with open(ttl_filename, 'w') as file:
    file.write(combined_ttl)

print(f'TTL data has been written to {ttl_filename}')

TTL data has been written to new_instances.ttl


#### Combine new instances with existing schema to create ontology and save to 'taxonomy.ttl'.

In [22]:
# Path to the TTL file with the instances you've created
instances_ttl_file = 'new_instances.ttl'

# Path for the new TTL file that will contain both schema and instances
combined_ttl_file = 'taxonomy.ttl'

# Read the existing schema TTL file
with open(schema_ttl_file, 'r') as file:
    schema_content = file.read()

# Read the generated instances TTL file
with open(instances_ttl_file, 'r') as file:
    instances_content = file.read()

# Combine the contents
combined_content = schema_content + '\n\n' + instances_content

# Write the combined content back to a new TTL file
with open(combined_ttl_file, 'w') as file:
    file.write(combined_content)

# Add sector-activity relations
g = rdflib.Graph()
SML = rdflib.Namespace("https://w3id.org/def/smls-owl#")
g.bind("sml", SML)

# Parse the existing ontology
g.parse("taxonomy.ttl", format="turtle")

# Add inverse relationships for sectors and activities
g = add_sector_activity_relationships(g, df_adaptation, df_mitigation)

# Save the updated graph
g.serialize("taxonomy.ttl", format="turtle")

print(f'Combined TTL data has been written to {combined_ttl_file}')


Combined TTL data has been written to taxonomy.ttl


***
## Automatic information extraction from free text using ChatGPT API 

In [4]:
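# API key for the OpenAI API, read from the OPENAI_API_KEY environment variable
# (a secret key should never be hard-coded or committed)
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])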

#### Define the system and user prompts sent to the GPT-4 model for each activity

**System prompt:** 

*Assume you are an expert in ontology creation. Your task is to extract from the free text information for an ontology of the EU taxonomy for sustainability reporting. Reporting on the sustainability of business activities will become mandatory for all (large) corporations in Europe, in compliance with the CSRD and the EU taxonomy of sustainable activities. Your task is to help us to understand the ontology and extract the external resources mentioned, as well as any limitations that you may find defined. You will put this information to a json with keys leading back to where you found the information.*

In [5]:
system_prompt = """Assume you are an expert in ontology creation. Your task is to extract from the free text information for an ontology of the EU taxonomy for sustainability reporting. Reporting on the sustainability of business activities will become mandatory for all (large) corporations in Europe, in compliance with the CSRD and the EU taxonomy of sustainable activities. Your task is to help us to understand the ontology and extract the external resources mentioned, as well as any limitations that you may find defined. You will put this information to a json with keys leading back to where you found the information. """

**User prompt:** 

Please provide a JSON object with details about external resources and limitations for the given activity. The JSON object should only have four keys:

1. "activity_name": Contain the name of the activity.
2. "node": Include the activity number followed by a suffix indicating the type (e.g., "_mitigation" or "_adaptation") and the specific column where information was found. If data is from the "Description" field, use only the activity number.
3. "list_of_external_resources": List all specifically named external resources mentioned, including URLs if available.
4. "has_limitations": List any specific limitations or conditions that apply to the activity, focusing on those with clear specifications or numerical details.
Exclude generic or non-specific sources like "scientific peer-reviewed publications" or "open source." Numbers in parentheses should reference the corresponding segment in the "Footnotes" column and not be listed as external links.

The JSON should be formatted for direct use with Python's json.loads function, without any additional text or commentary. Include known appendices or documents directly within the JSON. Here is the info from this row and an example template for your reference (replace $ with curly brackets):


In [6]:
user_prompt_template = """ 
Please provide a JSON object with details about external resources and limitations for the given activity. The JSON object should only have four keys:

1. "activity_name": Contain the name of the activity.
2. "node": Include the activity number followed by a suffix indicating the type (e.g., "_mitigation" or "_adaptation") and the specific column where information was found. If data is from the "Description" field, use only the activity number.
3. "list_of_external_resources": List all specifically named external resources mentioned, including URLs if available.
4. "has_limitations": List any specific limitations or conditions that apply to the activity, focusing on those with clear specifications or numerical details.
Exclude generic or non-specific sources like "scientific peer-reviewed publications" or "open source." Numbers in parentheses should reference the corresponding segment in the "Footnotes" column and not be listed as external links.

The JSON should be formatted for direct use with Python's json.loads function, without any additional text or commentary. Include known appendices or documents directly within the JSON. Here is the info from this row and an example template for your reference (replace $ with curly brackets):

# ACTIVITY NAME:
{Activity}

# ACTIVITY NUMBER
{Activity_number}

# DESCRIPTION:
{Description}

# SECTOR:
{Sector}

# CONTRIBUTION TYPE:
{Contribution_type}

# SUBSTANTIAL CONTRIBUTION CRITERIA:
{Criteria}

# {DNSH_specific_field}
{DNSH_specific_value}

# DNSH on Water
{DNSH_water}   

# DNSH on Circular Economy
{DNSH_circular}

# DNSH on Pollution Prevention
{DNSH_pollution}

# DNSH on Biodiversity
{DNSH_biodiversity}

# FOOTNOTES
{Footnotes}


#EXAMPLE

$
  "activity_name": "Construction of new buildings",
  "{Activity_number}": $
    "list_of_external_resources": [
    "https://susproc.jrc.ec.europa.eu/product-bureau/product-groups/412/documents",
      "https://ec.europa.eu/growth/content/eu-construction-and-demolition-waste-protocol-0_en",
      "https://www.iso.org/standard/69370.html",
      "Decision 2000/532/EC",
      "EU Construction and Demolition Waste Management Protocol",
      "ISO 20887"
    ],
    "has_limitations": [
    ['should be done like this', 'should comply to that']
    ]
  $,
  "{Activity_number}{type_suffix}_SubstantialContributionCriteria": $
    "list_of_external_resources": [
    ],
    "has_limitations": [

    ]
  $,
  "{Activity_number}{type_suffix}_{DNSH_specific_field}": $
    "list_of_external_resources": [

    ],
    "has_limitations": []
  $,
  "{Activity_number}{type_suffix}_DNSHonWater": $
    "list_of_external_resources": [
    ],
    "has_limitations": []
  $,
  "{Activity_number}{type_suffix}_DNSHonCircularEconomy": $
    "list_of_external_resources": [
    
    ],
    "has_limitations": [
    ]
  $,
  "{Activity_number}{type_suffix}_DNSHonPollutionPrevention": $
    "list_of_external_resources": [
     
    ],
    "has_limitations": [
    ]
  $,
  "{Activity_number}{type_suffix}_DNSHonBiodiversity": $
    "list_of_external_resources": [
  
    ],
    "has_limitations": [
    ]
  $,
  "{Activity_number}{type_suffix}_Footnotes": $
    "list_of_external_resources": [
      "https://susproc.jrc.ec.europa.eu/product-bureau/product-groups/412/documents",
      "https://ec.europa.eu/growth/content/eu-construction-and-demolition-waste-protocol-0_en",
      "https://www.iso.org/standard/69370.html",
      "Decision 2000/532/EC",
      "EU Construction and Demolition Waste Management Protocol",
      "ISO 20887"
    ],
    "has_limitations": []
  $
$"""
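
The template uses `$` as a stand-in for curly brackets because `str.format` treats `{` and `}` as placeholder delimiters. An alternative, sketched here on a shortened template that is not the one used in this notebook, is to double the braces so that they survive formatting as literal braces:

```python
import json

# Doubled braces come out of str.format as literal braces; single braces are placeholders.
mini_template = """{{
  "activity_name": "{Activity}",
  "{Activity_number}{type_suffix}_Footnotes": {{
    "list_of_external_resources": [],
    "has_limitations": []
  }}
}}"""

filled = mini_template.format(
    Activity="Construction of new buildings",
    Activity_number="7.1",
    type_suffix="_adaptation",
)
parsed = json.loads(filled)
print(list(parsed))
```

With this approach the example shown to the model is already valid JSON, so the instruction to replace `$` with curly brackets would no longer be needed. Whether that changes the quality of the model's answers has not been tested here.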

#### Define functions for extracting information using ChatGPT

In [7]:
# Function to remove markdown from GPT response
def remove_markdown(json_with_markdown):
    # Use regular expression to remove Markdown formatting
    plain_json = re.sub(r'```json|```', '', json_with_markdown, flags=re.DOTALL)
    
    # Remove leading/trailing whitespaces
    plain_json = plain_json.strip()

    return plain_json
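
A short usage sketch of the function above: the model often wraps its answer in a Markdown code fence, which `json.loads` cannot parse directly. The `reply` string below is made up to mimic such an answer, and the function is repeated in condensed form so that the example runs on its own:

```python
import json
import re


def remove_markdown(json_with_markdown):
    # Same logic as the notebook function: drop the code-fence markers, then trim
    return re.sub(r'```json|```', '', json_with_markdown).strip()


fence = "`" * 3  # built programmatically to keep the example readable
reply = f'{fence}json\n{{"activity_name": "Construction of new buildings"}}\n{fence}'

parsed = json.loads(remove_markdown(reply))
print(parsed["activity_name"])
```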

In [8]:
# Function that extracts information for one row
def extract_information(text, data_type, model="gpt-4-1106-preview"):
     # Determine the appropriate DNSH field based on data type
    DNSH_field_name, DNSH_field_value = ("DNSHonClimateMitigation", text.get("DNSH on Climate mitigation", "")) \
        if data_type == "adaptation" else \
        ("DNSHonClimateAdaptation", text.get("DNSH on Climate adaptation", ""))


    # Format the user prompt with dynamic DNSH field and type suffix
    user_prompt = user_prompt_template.format(
        Activity=text["Activity"],
        Activity_number=text["Activity number"],
        Description=text["Description"],
        Sector=text["Sector"],
        Contribution_type=text["Contribution type"],
        Criteria=text["Substantial contribution criteria"],
        Footnotes=text["Footnotes"],
        DNSH_specific_field= DNSH_field_name,
        DNSH_specific_value= DNSH_field_value,
        DNSH_water=text["DNSH on Water"],
        DNSH_circular=text["DNSH on Circular economy"],
        DNSH_pollution=text["DNSH on Pollution prevention"],
        DNSH_biodiversity=text["DNSH on Biodiversity"],
        type_suffix="_adaptation" if data_type == "adaptation" else "_mitigation"
    )

    completion = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            { 
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": user_prompt
            }
        ]
    )

    return completion.choices[0].message.content


# Function that processes a DataFrame and extracts information for all activities of the target sector
def process_dataframe(df, data_type, target_sector):
    df_columns = ["activity_name", "node", "list_of_external_resources", "has_limitations"]
    df_extracted = pd.DataFrame(columns=df_columns)
    json_objects = []

    # Filter the dataframe for the target sector
    filtered_df = df[df['Sector'] == target_sector]

    for index, row in filtered_df.iterrows():
        try:
            extracted_row_info = extract_information(row, data_type)
            print(f"Extracted info for row {index}:", extracted_row_info)  # Debug print
            
            plain_json_row = remove_markdown(extracted_row_info)
            print(f"Plain JSON for row {index}:", plain_json_row)  # Debug print
            
            json_dict = json.loads(plain_json_row)
            json_objects.append(json_dict)

            # The activity name sits at the top level; every other key is a node
            activity_name = json_dict.get("activity_name", "")
            records = []
            for node, details in json_dict.items():
                if node == "activity_name":
                    continue
                # Named `record` so the outer loop's `row` stays intact for the error log below
                record = {
                    "activity_name": activity_name,
                    "node": node,
                    "list_of_external_resources": details.get("list_of_external_resources", []),
                    "has_limitations": details.get("has_limitations", [])
                }
                records.append(record)
            
            df_extracted = pd.concat([df_extracted, pd.DataFrame(records)], ignore_index=True)
        except Exception as e:
            logging.error(f"Error processing row {index}: {e}\nRow content: {row}")


    return df_extracted
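
To make the flattening step in `process_dataframe` concrete, the sketch below turns one nested reply into one record per node. The `sample_reply` dictionary is illustrative and only shaped like the example in the user prompt; it is not real model output:

```python
# Illustrative reply, shaped like the example template in the user prompt
sample_reply = {
    "activity_name": "Construction of new buildings",
    "7.1": {"list_of_external_resources": ["ISO 20887"], "has_limitations": []},
    "7.1_adaptation_Footnotes": {"list_of_external_resources": [], "has_limitations": []},
}

# The activity name is shared by all records; every other key becomes one record
activity_name = sample_reply.get("activity_name", "")
records = [
    {
        "activity_name": activity_name,
        "node": node,
        "list_of_external_resources": details.get("list_of_external_resources", []),
        "has_limitations": details.get("has_limitations", []),
    }
    for node, details in sample_reply.items()
    if node != "activity_name"
]
print([record["node"] for record in records])
```

Each record corresponds to one row of the extracted DataFrame, with `node` naming the ontology node the information belongs to.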


#### Extract information from a specific sector and save to csv
Copy sector name from:
- Forestry (4 activities)
- Environmental protection and restoration activities (1 activity)
- Manufacturing (17 activities)
- Energy (31 activities)
- Water supply, sewerage, waste management and remediation (12 activities)
- Transport (17 activities)
- Construction and real estate (7 activities)
- Information and communication (3 activities)
- Professional, scientific and technical activities (3 activities)
- Financial and insurance activities (2 activities)
- Education (1 activity)
- Human health and social work activities (1 activity)
- Arts, entertainment and recreation (3 activities)


**!!! DON'T ACTUALLY RUN THIS BLOCK: IT TAKES AROUND 20 MINUTES !!!**

The extracted_adaptation.csv and extracted_mitigation.csv are provided together with this code.
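
One way to avoid repeating the slow extraction is to reuse a CSV when it already exists and to call the API only otherwise. The sketch below illustrates the idea with a stand-in extractor; `load_or_extract` and `fake_extract` are hypothetical names that do not appear elsewhere in this notebook:

```python
import os
import tempfile

import pandas as pd


def load_or_extract(csv_path, extract_fn):
    """Return the cached CSV if present, otherwise run extract_fn and cache its result (hypothetical helper)."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path, index_col=0)
    df = extract_fn()
    df.to_csv(csv_path)
    return df


# Demonstration with a stand-in extractor instead of the real API calls
calls = []


def fake_extract():
    calls.append(1)
    return pd.DataFrame({"node": ["7.1_adaptation_Footnotes"], "has_limitations": ["[]"]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extracted_demo.csv")
    first = load_or_extract(path, fake_extract)   # runs the extractor and writes the CSV
    second = load_or_extract(path, fake_extract)  # served from the CSV
print(len(calls))
```

In the notebook, `extract_fn` could be a lambda wrapping `process_dataframe(df_adaptation, "adaptation", target_sector)`.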

In [9]:
# Specify the target sector
target_sector = "Construction and real estate"

# Use the function for adaptation and mitigation with sector filtering
df_extracted_adaptation = process_dataframe(df_adaptation, "adaptation", target_sector)
df_extracted_adaptation.to_csv("extracted_adaptation_construction.csv")
print("Extracted info from adaptation!")

df_extracted_mitigation = process_dataframe(df_mitigation, "mitigation", target_sector)
df_extracted_mitigation.to_csv("extracted_mitigation_construction.csv")
print("Extracted info from mitigation!")


Extracted info for row 0: ```json
{
  "activity_name": "Forest management",
  "1.3": {
    "list_of_external_resources": [
      "http://www.fao.org/3/I8661EN/i8661en.pdf",
      "Regulation (EC) No 1893/2006"
    ],
    "has_limitations": [
      "Forest management assumes no change in land use and occurs on land matching the definition of forest as set out in national law, or where not available, in accordance with the FAO definition of forest.",
      "The economic activities in this category are limited to NACE II 02.10, 02.20, 02.30, and 02.40."
    ]
  },
  "1.3_adaptation_SubstantialContributionCriteria": {
    "list_of_external_resources": [
      "https://www.ipcc.ch/reports/",
      "https://ec.europa.eu/info/research-and-innovation/research-area/environment/nature-based-solutions_en/",
      "COM/2013/0249 final"
    ],
    "has_limitations": [
      "The climate risk and vulnerability assessment is proportionate to the scale of the activity and its expected lifespan.",
    

#### Add extracted information back to the ontology

In [10]:
# Define namespace
SML = rdflib.Namespace("https://w3id.org/def/smls-owl#")

def parse_list_cell(value):
    # Cells read back from the CSV are strings such as "['a', 'b']"; empty cells arrive as NaN
    if not isinstance(value, str) or not value.strip():
        return []
    value = value.strip()
    # Safely evaluate the string representation of a list, or split a plain string
    items = ast.literal_eval(value) if value.startswith("[") else value.split(", ")
    flat = []
    for item in items:
        # The example in the user prompt nests a list inside "has_limitations", so flatten one level
        flat.extend(item if isinstance(item, list) else [item])
    return [str(item).strip() for item in flat]

def add_properties_to_node(graph, node, external_resources, limitations):
    node_uri = rdflib.URIRef(f"https://ec.europa.eu/sustainable-finance-taxonomy/assets/taxonomy/{node}")

    # Handle external resources
    for res in parse_list_cell(external_resources):
        graph.add((node_uri, SML.refersExternalResource, rdflib.Literal(res)))

    # Handle limitations
    for lim in parse_list_cell(limitations):
        graph.add((node_uri, SML.hasLimitation, rdflib.Literal(lim)))
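
The function above relies on `ast.literal_eval` because list-valued cells written with `to_csv` come back from `read_csv` as plain strings. A minimal sketch of that round trip, using an in-memory buffer and illustrative values instead of the extracted CSV files:

```python
import ast
import io

import pandas as pd

# Illustrative record with a list-valued cell
df_out = pd.DataFrame({
    "node": ["7.1_mitigation_Footnotes"],
    "list_of_external_resources": [["ISO 20887", "Decision 2000/532/EC"]],
})

# Write to CSV and read back, as the notebook does via files on disk
buffer = io.StringIO()
df_out.to_csv(buffer)
buffer.seek(0)
df_in = pd.read_csv(buffer, index_col=0)

cell = df_in.loc[0, "list_of_external_resources"]
restored = ast.literal_eval(cell)  # turns the string back into a Python list
print(type(cell).__name__, restored)
```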


In [23]:
# Add info to ttl file
df_extracted_mitigation = pd.read_csv('extracted_mitigation_construction.csv')
df_extracted_adaptation = pd.read_csv('extracted_adaptation_construction.csv')

# Define namespaces and properties
g = rdflib.Graph()
g.bind("sml", SML)

# Parse the existing ontology
g.parse("taxonomy.ttl", format="turtle")

# Update graph with mitigation data
for index, row in df_extracted_mitigation.iterrows():
    add_properties_to_node(g, row['node'], row['list_of_external_resources'], row['has_limitations'])

# Update graph with adaptation data
for index, row in df_extracted_adaptation.iterrows():
    add_properties_to_node(g, row['node'], row['list_of_external_resources'], row['has_limitations'])

# Save the updated graph
g.serialize("final_taxonomy_with_7.ttl", format="turtle")

print("Added info to the ontology and saved to file final_taxonomy_with_7.ttl.")

Added info to the ontology.
