<a href="https://colab.research.google.com/github/BowieSteutel/acc-nlp-firecodes/blob/main/1B_Regulation_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# **Module 1B - Regulation Classification**



*This is a very basic example of regulation classification. Most classifications are currently assigned manually.*

---
# **Load libraries**

In [50]:
# Import standard libraries
import re # for pattern matching regular expressions
import pandas as pd # for dataframes
from statistics import mean # for averages
import math # for normalization

In [51]:
# Install spaCy (for syntactic complexity)
!pip install spacy --quiet
import spacy

---
# **Load inputs**

In [52]:
# @title Change root directory (update after downloading)

root_directory = "/content/drive/MyDrive/FINAL_CODE_THESIS" #  @param {"type":"string", "placeholder":""}
import sys
from pathlib import Path
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=False)
    %cd {root_directory}

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/FINAL_CODE_THESIS


In [53]:
# @title Load files
# path_hierarchy = "output/BBL_hier_elements.csv" #  @param {type:"string", "placeholder":"path to hierarchy, relative to file_path (csv)"}
path_subset = "output/BBL_subset.csv" #  @param {type:"string", "placeholder":"path to subset, relative to file_path (csv)"}

# df_hierarchy = pd.read_csv(path_hierarchy)#, encoding='windows-1252')
df_subset = pd.read_csv(path_subset)#, encoding='windows-1252')
df_subset.head()

Unnamed: 0,line,type,full_match,label,id,title,URL,code,text_original,text_translated
0,1970,MAINLVL,Artikel 4.38. (stookplaats),Artikel,4.38,(stookplaats),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_6_A4_38,Materiaal ter plaatse van of nabij een stookpl...,Material at the location of or near a fireplac...
1,1975,SUBLVLI,1.\t,Lid,1.0,"(schacht, koker of kanaal)",https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_6_A4_39_SUB1,Materiaal toegepast aan de binnenzijde van een...,"Material applied to the inside of a shaft, a t..."
2,1976,SUBLVLI,2.\t,Lid,2.0,"(schacht, koker of kanaal)",https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_6_A4_39_SUB2,Het eerste lid is niet van toepassing op:\n\ta...,The first sub-article does not apply to:\n\ta....
3,2030,SUBLVLI,1.\t,Lid,1.0,(binnenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_43_SUB1,Een zijde van een constructieonderdeel die gre...,One side of a construction part that borders t...
4,2031,SUBLVLI,2.\t,Lid,2.0,(binnenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_43_SUB2,In afwijking van het eerste lid geldt de eis a...,"Contrary to the first sub-article, the require..."


## Define hierarchy translation dictionary

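Manually define regular expressions for the hierarchical elements referenced in the text, in both Dutch (untranslated) and English (translated), together with a short code per element (e.g. used for pretranslation)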

In [54]:
dict_hier_elements = {"chapter": {"re_untranslated"    : r"hoofdstuk(?:ken)?",
                          "re_translated"       : r"chapter[s]?",
                          "code": "C"},
                      "section": {"re_untranslated"    : r"afdeling(?:en)?",
                                  "re_translated"       : r"section[s]?",
                                  "code": "S"},
                      "paragraph": {"re_untranslated"  : r"(?:paragra(?:af|fen)|§)",
                                  "re_translated"       : r"paragraph[s]?",
                                  "code": "P"},
                      "article": {"re_untranslated"    : r"artikel(?:en|s)?",
                                  "re_translated"       : r"(?<!sub-)article[s]?",
                                  "code": "A"},
                      "sub-article": {"re_untranslated": r"lid",
                                  "re_translated"       : r"sub[-]?article[s]?",
                                  "code": "SUB"},
                      "table": {"re_untranslated"      : r"tabel(?:len)?",
                                  "re_translated"       : r"table[s]?",
                                  "code": "TAB"},
                      "appendix": {"re_untranslated"   : r"bijlage[n]?[s]?",
                                  "re_translated"       : r"appendi(?:x|ces)",
                                  "code": "APPX"},
                      "figure": {"re_untranslated"     : r"figu(?:ur|ren)",
                                  "re_translated"       : r"figure[s]?",
                                  "code": "FIG"}
          }
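
To illustrate how these patterns behave, the sketch below wraps two of the translated patterns in word boundaries, in the same way the referral classification further down does, and checks them against sample sentences. The helper `find_elements` and the sentences are hypothetical and serve only as an illustration. The negative lookbehind in the `article` pattern is what prevents it from also matching inside `sub-article`.

```python
import re

# Two entries copied from dict_hier_elements (translated patterns only)
patterns = {
    "article": r"(?<!sub-)article[s]?",
    "sub-article": r"sub[-]?article[s]?",
}

def find_elements(text):
    """Return the names of hierarchy elements mentioned in `text` (hypothetical helper)."""
    return [name for name, pattern in patterns.items()
            if re.search(rf"\b{pattern}\b", text, re.IGNORECASE)]

print(find_elements("The first sub-article does not apply."))
# ['sub-article']
print(find_elements("As referred to in Article 4.50, second sub-article."))
# ['article', 'sub-article']
```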

---
# **Preprocessing correctness**
*Not a classification, but used for secondary filtering as well*

Identified manually

In [55]:
# Add metrics
df_subset['preprocessing_success'] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,0,1,1,0,1,1,1,0,0,1,0,1,1,1,0,1,1,1,1,0,1,0,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
df_subset['preprocessing_success'] = df_subset['preprocessing_success'].astype(bool) # convert 0/1 flags to booleans

# Calculate performance
print(f"Correctly preprocessed {sum(df_subset['preprocessing_success'])}/{len(df_subset)} regulations ({round(100*df_subset['preprocessing_success'].mean())}%)")

Correctly preprocessed 65/85 regulations (76%)


---
# **Clarity**

Classified manually

In [56]:
# Add classifications to dataframe
df_subset['clarity'] = [4,1,2,2,2,1,2,1,1,1,1,2,2,2,2,2,2,2,2,2,4,1,1,2,1,1,1,1,1,1,2,4,1,1,4,2,1,4,1,3,2,2,2,1,1,1,3,3,1,2,4,1,2,1,1,2,1,1,1,1,2,1,4,1,4,1,1,1,1,2,1,1,1,2,1,1,1,1,3,3,4,1,1,3,1]
# for i in range(len(df_subset)):
#   df_subset.loc[i, 'clarity'] = 'C'+str(df_subset.loc[i, 'clarity'])

# Check classification frequencies
df_subset['clarity'].value_counts()

clarity,count
1,45
2,25
4,9
3,6


---
# **Syntactic complexity**

Classified using spaCy dependency-parse metrics and rule-based thresholds

Each regulation receives a single class (1 to 6)

In [57]:
# Logistic (sigmoid) function for normalizing metrics to the range 0-1
def sigmoid(n, midpoint, steepness):
    return 1 / (1 + math.exp(-steepness * (n - midpoint)))

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Function to extract syntactic complexity metrics
def syntactic_complexity(text):
    # Analyze text
    doc = nlp(text)
    tokens = [token for token in doc if not token.is_punct]

    # Extract metrics
    sentence_length = len(tokens)
    clause_count = sum(1 for token in doc if token.dep_ in ('ccomp', 'xcomp', 'advcl', 'relcl') or token.text == ';')
    clause_distances = [abs(token.i - token.head.i) for token in doc if token.head != token and token.dep_ in ('ccomp', 'xcomp', 'advcl', 'relcl')]
    avg_clause_distance = mean(clause_distances) if clause_distances else 0
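    # Note: spaCy's token.pos_ holds universal POS tags (e.g. 'CCONJ'); 'CC' is a Penn Treebank tag (token.tag_), so the pos_ check below never matches
    conjunctions = sum(1 for token in doc if token.dep_ in ('cc', 'mark') or token.pos_ == 'CC' or token.text == ';')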

    # Calculate individual scores
    score_sentence_length = sigmoid(sentence_length, 50, .05)
    score_clause_count = sigmoid(clause_count, 4, .5)
    score_avg_clause_distance = sigmoid(avg_clause_distance, 8, 1)
    score_conjunctions = sigmoid(conjunctions, 6, 1)

    # Calculate weighted average score
    combined_score = (score_sentence_length*4 + score_clause_count*2 + score_avg_clause_distance*3 + score_conjunctions) / 10

    # Assign classification
    if combined_score < 0.1:
        return 1 #'S1'
    elif combined_score < 0.2:
        return 2 #'S2'
    elif combined_score < 0.4:
        return 3 #'S3'
    elif combined_score < 0.6:
        return 4 #'S4'
    elif combined_score < 0.8:
        return 5 #'S5'
    else:
        return 6 #'S6'

# classify syntactic complexity type based on metrics
syntype = []
for reg in df_subset['text_translated']:
  syntype.append(syntactic_complexity(reg))

# Add classifications to dataframe
df_subset['syntactic_complexity'] = syntype

# Check classification frequencies
df_subset['syntactic_complexity'].value_counts()

syntactic_complexity,count
2,25
1,25
4,21
3,9
5,5
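
As a worked example of the scoring step, the sketch below reproduces the normalization, the weighted average and the thresholds without spaCy, so the arithmetic can be followed on hand-picked metric values. The helper names `combined_score` and `to_class` and the input values are hypothetical; the midpoints, steepness values, weights and thresholds are copied from the cell above.

```python
import math

def sigmoid(n, midpoint, steepness):
    return 1 / (1 + math.exp(-steepness * (n - midpoint)))

def combined_score(sentence_length, clause_count, avg_clause_distance, conjunctions):
    """Weighted average of the four normalized metrics (weights 4, 2, 3, 1)."""
    return (sigmoid(sentence_length, 50, .05) * 4
            + sigmoid(clause_count, 4, .5) * 2
            + sigmoid(avg_clause_distance, 8, 1) * 3
            + sigmoid(conjunctions, 6, 1)) / 10

def to_class(score):
    """Map a combined score to class 1-6 using the same thresholds as above."""
    for cls, upper in enumerate((0.1, 0.2, 0.4, 0.6, 0.8), start=1):
        if score < upper:
            return cls
    return 6

# Hypothetical metric values, not taken from the dataset
short = combined_score(8, 0, 0, 0)        # short sentence without clauses
midpoint = combined_score(50, 4, 8, 6)    # every metric exactly at its midpoint

print(to_class(short))               # 1
print(midpoint, to_class(midpoint))  # 0.5 4
```

When every metric sits at its midpoint, each sigmoid returns 0.5, so the weighted average is also 0.5 and the regulation lands in class 4. Because the sigmoid never reaches 0, even a regulation with no clauses or conjunctions keeps a small non-zero score.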


---
# **Validation complexity**

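Classified manually. A regulation can belong to several classes (V1 to V8) at once; the active classes are combined into a single label.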

In [58]:
# Add classifications to dataframe
df_subset['V1'] = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
df_subset['V2'] = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0]
df_subset['V3'] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1]
df_subset['V4'] = [1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,1,0,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0]
df_subset['V5'] = [1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,1,1,1,1]
df_subset['V6'] = [0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
df_subset['V7'] = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
df_subset['V8'] = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

# Check classification frequencies
for col in ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8']:
    print(f"{col}:", sum(df_subset[col]))

V1: 0
V2: 3
V3: 72
V4: 71
V5: 34
V6: 6
V7: 2
V8: 0


In [59]:
# Calculate combinations of V classifications
v_columns = ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8']
# find classes and join them into a string
def get_combination(row):
    active_classes = [col for col in v_columns if row[col] == 1]
    return ''.join(active_classes) if active_classes else 'None'  # Handle rows with all 0s if needed

# Apply to each row
df_subset['validation_complexity'] = df_subset[v_columns].apply(get_combination, axis=1)

# Count frequencies of each combination
frequency_matrix = df_subset['validation_complexity'].value_counts()#.sort_index()

# Optional: convert to dictionary if desired
freq_dict = frequency_matrix.to_dict()

# Display
print(frequency_matrix)

validation_complexity
V3V4          29
V3V4V5        22
V4            10
V3             9
V3V4V5V6       4
V3V5           3
V4V5           2
V2             1
V3V4V6         1
V3V4V5V6V7     1
V2V3V5         1
V3V4V7         1
V2V3V4V5       1
Name: count, dtype: int64
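
The combination step can be checked on a small example. The sketch below applies the same `get_combination` logic to a toy dataframe; the three rows of flags are hypothetical and not taken from the dataset.

```python
import pandas as pd

v_columns = ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8']

# Hypothetical flags for three regulations (not taken from the dataset)
toy = pd.DataFrame(
    [[0, 0, 1, 1, 0, 0, 0, 0],
     [0, 1, 1, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0]],
    columns=v_columns,
)

def get_combination(row):
    # Join the names of all columns flagged with 1; 'None' if no class applies
    active_classes = [col for col in v_columns if row[col] == 1]
    return ''.join(active_classes) if active_classes else 'None'

labels = toy.apply(get_combination, axis=1)
print(labels.tolist())  # ['V3V4', 'V2V3V5', 'None']
```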


In [60]:
# Remove V1-V8 columns
df_subset.drop(columns=v_columns, inplace=True)

---
# **Referral complexity**

Classified using RegEx

If a regulation contains multiple referral types, the highest type number is assigned

In [61]:
# classify referral complexity type, prioritizing higher type numbers if multiple
reftype = []
for reg in df_subset['text_translated']:
    if re.search(f"\\b{dict_hier_elements['figure']['re_translated']}\\b", reg, re.IGNORECASE):
        reftype.append(6) # R6
    elif re.search("\\b(NEN|NTA|NPR|NVN|ISO|IEC|NEN-EN) ", reg):
        reftype.append(5) # R5
    elif re.search("(\\b{}\\b)".format("\\b|\\b".join(x.get('re_translated') for x in [dict_hier_elements['table'], dict_hier_elements['appendix']])), reg, re.IGNORECASE):
        reftype.append(4) # R4
    elif re.search("(\\b{}\\b)".format("\\b|\\b".join(x.get('re_translated') for x in [dict_hier_elements['chapter'], dict_hier_elements['section'], dict_hier_elements['paragraph'], dict_hier_elements['article']])), reg, re.IGNORECASE):
        reftype.append(3) # R3
    elif re.search(f"\\b{dict_hier_elements['sub-article']['re_translated']}\\b", reg, re.IGNORECASE):
        reftype.append(2) # R2
    else:
        reftype.append(1) # R1

# Add classifications to dataframe
df_subset['referral_complexity'] = reftype

# Check classification frequencies
df_subset['referral_complexity'].value_counts()

referral_complexity,count
5,35
1,23
2,20
3,6
4,1
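
The priority cascade can also be written as an ordered list of rules, which makes the "first match wins" behavior easier to test in isolation. The sketch below is one possible restructuring, not the code used to produce the counts above: the names `RE`, `any_of`, `RULES` and `classify_referral` and the sample sentences are hypothetical, while the patterns are copied from `dict_hier_elements` and the cell above.

```python
import re

# Translated patterns copied from dict_hier_elements
RE = {
    "figure": r"figure[s]?",
    "table": r"table[s]?",
    "appendix": r"appendi(?:x|ces)",
    "chapter": r"chapter[s]?",
    "section": r"section[s]?",
    "paragraph": r"paragraph[s]?",
    "article": r"(?<!sub-)article[s]?",
    "sub-article": r"sub[-]?article[s]?",
}

def any_of(*names):
    """Build one word-bounded alternation for the given element names."""
    return r"\b(?:" + "|".join(RE[n] for n in names) + r")\b"

# Ordered from highest to lowest priority; the first match wins
RULES = [
    (6, any_of("figure"), re.IGNORECASE),
    (5, r"\b(NEN|NTA|NPR|NVN|ISO|IEC|NEN-EN) ", 0),  # standards, case-sensitive
    (4, any_of("table", "appendix"), re.IGNORECASE),
    (3, any_of("chapter", "section", "paragraph", "article"), re.IGNORECASE),
    (2, any_of("sub-article"), re.IGNORECASE),
]

def classify_referral(text):
    for rtype, pattern, flags in RULES:
        if re.search(pattern, text, flags):
            return rtype
    return 1  # no reference found

print(classify_referral("A closed space is in a fire compartment."))          # 1
print(classify_referral("The first sub-article does not apply to a door."))   # 2
print(classify_referral("Determined in accordance with NEN 6068, as referred to in article 4.51."))  # 5
```

The last sentence mentions both a standard and an article; because the standards rule is checked first, it is classified as type 5 rather than type 3.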


---
# **Export classified subset**

In [62]:
df_subset

Unnamed: 0,line,type,full_match,label,id,title,URL,code,text_original,text_translated,preprocessing_success,clarity,syntactic_complexity,validation_complexity,referral_complexity
0,1970,MAINLVL,Artikel 4.38. (stookplaats),Artikel,4.38,(stookplaats),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_6_A4_38,Materiaal ter plaatse van of nabij een stookpl...,Material at the location of or near a fireplac...,True,4,4,V3V4V5,5
1,1975,SUBLVLI,1.\t,Lid,1.00,"(schacht, koker of kanaal)",https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_6_A4_39_SUB1,Materiaal toegepast aan de binnenzijde van een...,"Material applied to the inside of a shaft, a t...",True,1,4,V3V4V5,5
2,1976,SUBLVLI,2.\t,Lid,2.00,"(schacht, koker of kanaal)",https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_6_A4_39_SUB2,Het eerste lid is niet van toepassing op:\n\ta...,The first sub-article does not apply to:\n\ta....,True,2,4,V3V4V5,2
3,2030,SUBLVLI,1.\t,Lid,1.00,(binnenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_43_SUB1,Een zijde van een constructieonderdeel die gre...,One side of a construction part that borders t...,True,2,2,V3V4V5,5
4,2031,SUBLVLI,2.\t,Lid,2.00,(binnenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_43_SUB2,In afwijking van het eerste lid geldt de eis a...,"Contrary to the first sub-article, the require...",True,2,1,V3,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80,2494,SUBLVLI,2.\t,Lid,2.00,(brandklasse buitenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_14_A4_92_SUB2,In afwijking van het eerste lid voldoet een de...,"Contrary to the first sub-article, a door, a w...",True,4,2,V3,5
81,2495,SUBLVLI,3.\t,Lid,3.00,(brandklasse buitenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_14_A4_92_SUB3,Op ten hoogste 5% van de totale oppervlakte va...,At most 5% of the total area of the constructi...,True,1,4,V3V4V5,2
82,2496,SUBLVLI,4.\t,Lid,4.00,(brandklasse buitenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_14_A4_92_SUB4,Het eerste tot en met derde lid zijn niet van ...,The first to third sub-article do not apply to...,True,1,1,V4V5,2
83,2498,SUBLVLI,1.\t,Lid,1.00,(brandklasse dak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_14_A4_93_SUB1,"Een dak van een brandcompartiment is, voor zov...","A roof of a fire compartment is, insofar as th...",True,3,4,V3V4V5,5


In [63]:
df_subset.to_csv('output/BBL_subset_classified.csv', index=False)

---
# **Secondary data filtering**

Based on the classifications, regulations that fall outside the scope of the conceptual framework are filtered out.

In [64]:
# Define boolean masks
bm_preprocessing = df_subset['preprocessing_success'] == True
# bm_clarity = df_subset['clarity'] == 1
bm_clarity = df_subset['clarity'].notna() # include all for demonstration purposes
bm_syntactic = df_subset['syntactic_complexity'].notna() # include all for demonstration purposes
# bm_syntactic = df_subset['syntactic_complexity'] == 1
bm_validation = df_subset['validation_complexity'].apply(lambda x: all(v not in x for v in {'V1', 'V5', 'V6', 'V7', 'V8'})) # exclude everything but V2, V3, V4
bm_referral = df_subset['referral_complexity'] != 6 # no visual processing

# Calculate remaining regulations per mask
print(sum(bm_preprocessing))
print(sum(bm_clarity))
print(sum(bm_syntactic))
print(sum(bm_validation))
print(sum(bm_referral))

65
85
85
49
85
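
The validation mask relies on substring checks against the combined label. The sketch below shows its effect on a few hypothetical labels that use the same string format as `df_subset['validation_complexity']`; the names `labels`, `excluded` and `mask` are illustrative.

```python
import pandas as pd

# Hypothetical validation labels (same string format as df_subset['validation_complexity'])
labels = pd.Series(['V3V4', 'V3V4V5', 'V2', 'V3V4V6', 'None'])

excluded = {'V1', 'V5', 'V6', 'V7', 'V8'}
# Keep a label only if none of the excluded class codes occurs in it
mask = labels.apply(lambda x: all(v not in x for v in excluded))

print(mask.tolist())  # [True, False, True, False, True]
```

Note that a regulation without any validation class (`'None'`) passes this mask. The substring check is safe here because every class code is a distinct two-character string; with codes such as a hypothetical `V10`, `'V1'` would match inside it and a stricter comparison would be needed.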


In [65]:
# Apply boolean masks
df_subset_filtered = df_subset[bm_preprocessing & bm_clarity & bm_syntactic & bm_validation & bm_referral].reset_index(drop=True)

# Calculate remaining regulations
len(df_subset_filtered)

39

In [66]:
# Show remaining subset
df_subset_filtered

Unnamed: 0,line,type,full_match,label,id,title,URL,code,text_original,text_translated,preprocessing_success,clarity,syntactic_complexity,validation_complexity,referral_complexity
0,2031,SUBLVLI,2.\t,Lid,2.0,(binnenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_43_SUB2,In afwijking van het eerste lid geldt de eis a...,"Contrary to the first sub-article, the require...",True,2,1,V3,2
1,2041,SUBLVLI,5.\t,Lid,5.0,(buitenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_44_SUB5,In afwijking van het eerste tot en met derde l...,"Contrary from the first to third sub-article, ...",True,1,2,V3,5
2,2060,SUBLVLI,2.\t,Lid,2.0,(dakoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_7_A4_47_SUB2,Het eerste lid geldt niet voor een bouwwerk me...,The first sub-article does not apply to a stru...,True,1,1,V3,2
3,2106,SUBLVLI,1.\t,Lid,1.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB1,Een besloten ruimte ligt in een brandcompartim...,A closed space is in a fire compartment.,True,1,1,V3V4,1
4,2112,SUBLVLI,3.\t,Lid,3.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB3,Een wegtunnelbuis met een lengte van meer dan ...,A road tunnel tube with a length of more than ...,True,1,1,V3V4,1
5,2113,SUBLVLI,4.\t,Lid,4.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB4,In afwijking van het eerste lid voert een extr...,"Contrary to the first sub-article, an extra pr...",True,1,1,V3V4,2
6,2114,SUBLVLI,5.\t,Lid,5.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB5,Een niet-besloten gebruiksgebied ligt in een b...,A non-open use area is in a fire compartment.,True,1,1,V3V4,1
7,2117,SUBLVLI,8.\t,Lid,8.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB8,Het eerste en vijfde lid zijn niet van toepass...,The first and fifth sub-article do not apply t...,True,1,4,V3,5
8,2119,SUBLVLI,1.\t,Lid,1.0,(brandcompartiment: omvang),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_51_SUB1,Een brandcompartiment heeft een gebruiksopperv...,A fire compartment has a use area that is no l...,True,2,4,V3,5
9,2120,SUBLVLI,2.\t,Lid,2.0,(brandcompartiment: omvang),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_51_SUB2,In een brandcompartiment liggen ten hoogste vi...,In a fire compartment there are a maximum of f...,True,4,2,V3V4,1



## Export filtered subset

In [67]:
df_subset_filtered.to_csv('output/BBL_subset_final_big.csv', index=False)

---
# **Manually pick final subset**

From the filtered set, a few regulations are picked that should be processable with the current Proof of Concept.


In [68]:
# df_subset_chosen = df_subset_filtered.iloc[[0, 38]].copy()
df_subset_final = df_subset_filtered[df_subset_filtered['line'].isin([2106, 2112, 2494])].copy()
df_subset_final

Unnamed: 0,line,type,full_match,label,id,title,URL,code,text_original,text_translated,preprocessing_success,clarity,syntactic_complexity,validation_complexity,referral_complexity
3,2106,SUBLVLI,1.\t,Lid,1.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB1,Een besloten ruimte ligt in een brandcompartim...,A closed space is in a fire compartment.,True,1,1,V3V4,1
4,2112,SUBLVLI,3.\t,Lid,3.0,(brandcompartiment: ligging),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_8_A4_50_SUB3,Een wegtunnelbuis met een lengte van meer dan ...,A road tunnel tube with a length of more than ...,True,1,1,V3V4,1
38,2494,SUBLVLI,2.\t,Lid,2.0,(brandklasse buitenoppervlak),https://wetten.overheid.nl/BWBR0041297/2024-08...,C4_S4_2_P4_2_14_A4_92_SUB2,In afwijking van het eerste lid voldoet een de...,"Contrary to the first sub-article, a door, a w...",True,4,2,V3,5


## Export handpicked final subset

In [69]:
df_subset_final.to_csv('output/BBL_subset_final_small.csv', index=False)