
Code and dataset for the paper "Text Classification-Based Approach for Evaluating and Enhancing Machine Interpretability of Building Codes"


Text Classification-Based Approach for Evaluating and Enhancing Machine Interpretability of Building Codes

--author: zhengzhe
--date: 2022.10.26

Environment

python 3.7
torch 1.12.1+cu116
boto3 1.24.28
matplotlib 3.5.3
tqdm
sklearn
tensorboardX

Dataset

  • Description: A Chinese rule dataset with seven categories is established to classify the interpretability level of each rule in a building code.
  • The original labeled dataset can be found in CivilRules/dataset.
  • The training, validation, and test datasets can be found in CivilRules/data (a sketch of the assumed file format follows the category table below).
| Category | Definition | Interpretability |
| --- | --- | --- |
| direct | The required information is explicitly available from the BIM model. | Easy |
| indirect | The required information is implicitly stored in the BIM model; a set of derivations and calculations should be performed. | Easy |
| method | An extended data structure and domain-specific knowledge are required. | Medium |
| reference | External information, including pictures, formulas, tables, and other rules or appendices in the current code or other codes, is required. | Medium |
| general | The rules provide macro design guidance. | Hard |
| term | The rules define the terms used in the codes. | Hard |
| other | The rules do not belong to the above six categories. | Hard |
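The exact layout of the split files is not documented here, so the following is a minimal sketch under an assumption: one example per line, with the rule text, a tab, and an integer label index over the seven categories. The file name train.txt and the separator are assumptions; it simply prints the category distribution of the training split.

# Minimal sketch for inspecting a split file, assuming each line is
# "rule text<TAB>label index" with the index referring to the seven categories above.
from collections import Counter

CATEGORIES = ["direct", "indirect", "method", "reference", "general", "term", "other"]

def load_split(path):
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            text, label = line.rsplit("\t", 1)  # assumed tab-separated format
            samples.append((text, int(label)))
    return samples

train = load_split("./CivilRules/data/train.txt")  # assumed file name
print(Counter(CATEGORIES[label] for _, label in train))  # category distribution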

Models

| Model | Weighted F1 score |
| --- | --- |
| TextCNN | 86.3% |
| TextRNN | 72.2% |
| TextRNN-Att | 81.5% |
| Transformers | 74.0% |
| BERT | 88.04% |
| RuleBERT | 93.68% |
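All scores above are weighted F1 over the seven categories. For reference, this is how the metric can be computed with sklearn (listed in the environment); the label lists below are placeholders, since the repo's test.py computes the real score internally.

# Weighted F1: per-class F1 scores averaged with class-frequency weights.
from sklearn.metrics import f1_score

y_true = [0, 2, 1, 4, 3, 6, 5, 0, 1]  # hypothetical gold label indices
y_pred = [0, 2, 2, 4, 3, 6, 5, 0, 1]  # hypothetical model predictions
print(f"Weighted F1: {f1_score(y_true, y_pred, average='weighted'):.4f}")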

Further-pretrained domain-specific models

  • The original BERT model can be found on Google Drive.
    • Please put the original BERT model in ./bert_pretrain.
  • The further-pretrained domain-specific BERT model (RuleBERT) can be found on Google Drive.
    • Please put the RuleBERT model in ./bert_pretraindc.

Fine-tuned BERT models

  • The well-trained BERT models (.ckpt files) can be found on Google Drive (see the loading sketch below).
  • Please put these models in ./CivilRules/save_dict.
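The .ckpt files are consumed by the repo's scripts directly; for orientation only, the sketch below shows the usual PyTorch way such a checkpoint is loaded. The file name and the model object are assumptions, not the repo's actual code.

# Hedged sketch: load a checkpoint assumed to be a state dict saved with torch.save().
import torch

state_dict = torch.load("./CivilRules/save_dict/bert.ckpt", map_location="cpu")  # assumed file name
print(type(state_dict))  # typically a dict of parameter tensors
# model.load_state_dict(state_dict)  # 'model' would be the repo's BERT classifier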

Other well-trained models

How to use

Validate the BERT model results using the fine-tuned models

  • make sure the BERT models and the fine-tuned models have been put in the right places
  • put the test dataset (test.txt) into ./CivilRules/data
# validate the weighted F1 score of the BERT model
python test.py --model bert
# validate the weighted F1 score of the RuleBERT model
python test.py --model bertDC

Train your own model using grid_search.py to find the best model

  • prepare your own training, validation, and test datasets in ./CivilRules/data
  • modify the dataset, learning_rates, and batch_sizes in grid_search.py (a conceptual sketch follows the commands below)
# fine-tune the BERT model
python grid_search.py --model bert
# fine-tune the RuleBERT model
python grid_search.py --model bertDC
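For orientation, the sketch below shows what a grid search over the hyperparameters named above boils down to: try every (learning rate, batch size) pair and keep the one with the best validation weighted F1. The candidate values and the train_and_eval helper are hypothetical; the actual grid_search.py may be structured differently.

# Conceptual grid search over learning rate and batch size.
import itertools

learning_rates = [1e-5, 2e-5, 5e-5]  # assumed candidate values
batch_sizes = [16, 32]               # assumed candidate values

def train_and_eval(lr, bs):
    """Placeholder: fine-tune with (lr, bs) and return the validation weighted F1."""
    return 0.0  # replace with the real training/evaluation call

best_score, best_cfg = -1.0, None
for lr, bs in itertools.product(learning_rates, batch_sizes):
    score = train_and_eval(lr, bs)
    if score > best_score:
        best_score, best_cfg = score, (lr, bs)
print(f"best weighted F1 {best_score:.4f} with lr={best_cfg[0]}, batch_size={best_cfg[1]}")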

Predict with the well-trained BERT model

  • prepare your own prediction dataset (predict.txt), rename it to dev.txt, and put it into ./CivilRules/data
  • modify the dataset settings in application.py
  • put the well-trained BERT model into ./CivilRules/save_dict
python application.py --model bert
python application.py --model bertDC
  • the results will be saved in ./CivilRules/predict
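The output format of application.py is not documented here; assuming the saved predictions are one label index per line (the file name below is also an assumption), a few lines map them back to the seven category names.

# Hypothetical reader for a prediction file: one predicted label index per line.
CATEGORIES = ["direct", "indirect", "method", "reference", "general", "term", "other"]

with open("./CivilRules/predict/result.txt", encoding="utf-8") as f:  # assumed file name
    for line in f:
        line = line.strip()
        if line:
            print(CATEGORIES[int(line)])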
