# BERT-based Software Requirements Classification

## Introduction
In this notebook, we will perform software requirements classification using the BERT model. The goal is to classify sentences into two categories: "Requirements" and "Non-Requirements". We will train multiple classifiers using cross-validation and evaluate their performance based on accuracy, precision, and recall.
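
To make the three evaluation metrics concrete, here is a minimal sketch that computes them by hand for a binary task where label 1 means "Requirements". The helper name `binary_metrics` and the toy label lists are invented for illustration and are not taken from the notebook's script.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels (hypothetical): 1 = "Requirements", 0 = "Non-Requirements".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

acc, prec, rec = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Precision answers "of the sentences flagged as requirements, how many really are?", while recall answers "of the real requirements, how many were found?". Reporting both alongside accuracy is useful when the two classes are not equally frequent.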

## Dataset Description
The dataset used in this analysis is stored in the file "SoftwareReq300.xlsx". It contains two attributes for each data point:
- "Type": A boolean attribute representing the type of the sentence (1 for "Requirements" and 0 for "Non-Requirements").
- "Sentence": A natural language sentence, which may or may not describe a software requirement.
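
As a rough sketch of how a file with these two columns could be read and split into inputs and labels, the snippet below uses pandas. The helper names `load_requirements` and `split_columns` are hypothetical, the toy rows are invented for illustration, and reading `.xlsx` files with `pd.read_excel` assumes an Excel engine such as `openpyxl` is installed.

```python
import pandas as pd


def load_requirements(path="SoftwareReq300.xlsx"):
    """Read the spreadsheet into a DataFrame (needs an Excel engine, e.g. openpyxl)."""
    return pd.read_excel(path)


def split_columns(df):
    """Return (sentences, labels) as plain Python lists."""
    missing = {"Type", "Sentence"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    sentences = df["Sentence"].astype(str).tolist()
    labels = df["Type"].astype(int).tolist()  # 1 = Requirements, 0 = Non-Requirements
    return sentences, labels


# Toy rows standing in for the real file (invented, not from the dataset).
toy = pd.DataFrame(
    {
        "Type": [1, 0, 1],
        "Sentence": [
            "The system shall log every failed login attempt.",
            "The meeting was moved to Thursday.",
            "Users must be able to export reports as PDF.",
        ],
    }
)

sentences, labels = split_columns(toy)
print(len(sentences), "sentences;", sum(labels), "labelled as requirements")
```

With the real file in place, `split_columns(load_requirements())` would return the full lists in the same shape.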



# Library Installation
Install the libraries required to use the BERT model. In this case, you need the `transformers` library.

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

# Execution
Ensure that the dataset is in the same directory as `BERTClassifier.py`, then run the script. Each classifier's performance is evaluated using accuracy, precision, and recall scores, and the results are printed as a table. You can customize the code to use other classifiers or other metrics.
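
The contents of `BERTClassifier.py` are not shown in this notebook, so the following is only a sketch of what the evaluation step described above might look like with scikit-learn. It assumes the sentences have already been turned into fixed-length feature vectors; here a synthetic matrix from `make_classification` stands in for those features. `SVC` appears in the results table, while `LogisticRegression`, the 5-fold split, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC

# Synthetic stand-in for sentence features (NOT real BERT embeddings).
X, y = make_classification(
    n_samples=300, n_features=32, n_informative=8, random_state=0
)

# Classifiers to compare; LogisticRegression is an assumed second model.
classifiers = {
    "SVC": SVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall"]

results = {}
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)
    # cross_validate returns one score per fold under "test_<metric>".
    results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}

# Print a plain-text table of the mean score per metric.
print(f"{'Model':<20}{'Accuracy':>10}{'Precision':>11}{'Recall':>9}")
for name, row in results.items():
    print(
        f"{name:<20}{row['accuracy']:>10.3f}"
        f"{row['precision']:>11.3f}{row['recall']:>9.3f}"
    )
```

Swapping in another estimator or metric only requires adding an entry to `classifiers` or to `scoring`, which is one way to approach the customization mentioned above.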

In [None]:
!python BERTClassifier.py

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
╒════════════════════╤════════════╤═════════════╤══════════╕
│ Model              │   Accuracy │   Precision │   Recall │
╞════════════════════╪════════════╪═════════════╪══════════╡
│ SVC 