# Text Classification with BERT and Hugging Face Transformers
In this notebook, after installing the required libraries, we will:
1. Load and inspect the dataset.
2. Preprocess the data.
3. Tokenize the data using the BERT tokenizer.
4. Build and fine-tune the BERT model.
5. Evaluate the model's performance.

In [None]:
# Install necessary libraries
!pip install transformers datasets torch scikit-learn

In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

## 1. Load and inspect the dataset
We'll use the IMDB movie review dataset, which contains 25,000 training and 25,000 test reviews, each labeled as negative (0) or positive (1).

In [None]:
# Load the dataset
dataset = load_dataset('imdb')

# Display the first 5 training examples (returned as a dict of columns)
dataset['train'][0:5]
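
Slicing a split returns a dict of columns, so a quick summary of label balance and review length is easy to compute. The helper below is a minimal sketch of that kind of inspection; `summarize_batch` and `sample_batch` are hypothetical names, and the three reviews are made-up stand-ins for `dataset['train'][0:5]`.

```python
from collections import Counter


def summarize_batch(batch):
    """Summarize a dict-of-lists batch that has 'text' and 'label' columns."""
    texts, labels = batch["text"], batch["label"]
    # Whitespace word counts are only a rough proxy for BERT token counts.
    word_counts = [len(text.split()) for text in texts]
    return {
        "num_examples": len(texts),
        "label_counts": dict(Counter(labels)),
        "mean_words": sum(word_counts) / len(word_counts) if word_counts else 0.0,
    }


# Hypothetical stand-in for dataset['train'][0:5]; not real IMDB reviews.
sample_batch = {
    "text": ["A great film", "Dull and far too long", "Loved it"],
    "label": [1, 0, 1],
}

summary = summarize_batch(sample_batch)
print(summary)
```

The same function should accept `dataset['train'][0:5]` directly, since that slice has the same dict-of-lists shape.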

## 2. Preprocess the data
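We'll preprocess the data by converting the text to lowercase. The `bert-base-uncased` tokenizer used below already lowercases its input, so this step is redundant for that checkpoint, but it makes the casing explicit.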

In [None]:
# Preprocess the data
# With batched=True, `examples` is a dict of lists, so we lowercase each text in the batch
def preprocess_function(examples):
    return {'text': [text.lower() for text in examples['text']]}

dataset = dataset.map(preprocess_function, batched=True)
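
IMDB reviews contain literal `<br />` line-break tags, which lowercasing alone leaves in place. As an optional extension that is not part of the original notebook, the sketch below removes those tags and collapses whitespace before lowercasing. `clean_function` and `BR_TAG` are hypothetical names; the function follows the same batched dict-of-lists contract as `preprocess_function`.

```python
import re

# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def clean_function(examples):
    """Remove <br> tags, collapse whitespace, and lowercase a batch of texts."""
    cleaned = []
    for text in examples["text"]:
        text = BR_TAG.sub(" ", text)      # replace each tag with a space
        text = " ".join(text.split())     # collapse repeated whitespace
        cleaned.append(text.lower())
    return {"text": cleaned}


# Made-up review used only to show the effect.
demo = clean_function({"text": ["Great movie!<br /><br />Would watch AGAIN."]})
print(demo["text"][0])
```

It could be passed to `dataset.map(clean_function, batched=True)` in place of `preprocess_function`.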

## 3. Tokenize the data using BERT tokenizer
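We'll tokenize the text with the `bert-base-uncased` tokenizer. With `padding='max_length'` and `truncation=True`, every review is padded or truncated to the model's maximum length of 512 tokens.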

In [None]:
# Load the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize the data
def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
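
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a toy pure-Python sketch of what happens to one sequence of token ids. It is an illustration only, not the tokenizer's actual implementation: `pad_and_truncate` is a hypothetical helper, the input ids are arbitrary, and `max_length=8` is chosen for readability instead of BERT's 512. The special token ids (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) are those of `bert-base-uncased`.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special token ids in bert-base-uncased


def pad_and_truncate(token_ids, max_length=8):
    """Toy version of padding='max_length' with truncation=True."""
    # Leave room for [CLS] and [SEP], then cut off any extra tokens.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Pad up to max_length; padded positions get attention mask 0.
    n_pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        "attention_mask": attention_mask + [0] * n_pad,
    }


short = pad_and_truncate([7592, 2088])            # shorter than max_length
long = pad_and_truncate(list(range(1000, 1010)))  # longer than max_length
print(short)
print(long)
```

The attention mask is what lets the model ignore the padded positions, which is why the tokenizer returns it alongside `input_ids`.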

## 4. Build and fine-tune the BERT model
We'll load a pre-trained BERT model with a two-class classification head and fine-tune it on the training split for three epochs.

In [None]:
# Load the pre-trained BERT model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

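# Define training arguments
# Note: recent transformers releases rename `evaluation_strategy` to `eval_strategy`;
# use whichever name your installed version accepts.
training_args = TrainingArguments(
    output_dir='./results',          # where checkpoints are written
    evaluation_strategy='epoch',     # evaluate at the end of every epoch
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',            # where training logs are written
)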

# Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    tokenizer=tokenizer,
)

# Train the model
trainer.train()
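
It helps to estimate how many optimization steps these settings imply before starting a long run. The sketch below does that arithmetic for the 25,000 IMDB training examples; `total_training_steps` is a hypothetical helper, and it assumes a single device and no gradient accumulation unless told otherwise.

```python
import math


def total_training_steps(num_examples, per_device_batch_size, num_epochs, num_devices=1):
    """Estimate optimizer steps, assuming no gradient accumulation."""
    effective_batch = per_device_batch_size * num_devices
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs


# 25,000 training reviews, batch size 8, 3 epochs, one device.
steps = total_training_steps(25_000, 8, 3)
print(steps)  # 9375
```

Under those assumptions the run is 3,125 steps per epoch and 9,375 steps in total.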

## 5. Evaluate the model's performance
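We'll evaluate the model's performance on the test data. Because no `compute_metrics` function was passed to the `Trainer`, `trainer.evaluate()` reports the evaluation loss and runtime statistics but not accuracy.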

In [None]:
# Evaluate the model
results = trainer.evaluate()
results
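
To report accuracy as well, the `Trainer` accepts a `compute_metrics` callable that receives the model's logits and the true labels. The block below is a minimal sketch of such a function using only NumPy; the demo logits and labels are made up to show the calculation and are not model outputs.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Compute accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row.
    predictions = np.argmax(logits, axis=-1)
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}


# Made-up logits for four examples and two classes.
demo_logits = np.array([[2.0, -1.0], [0.1, 0.9], [1.5, 0.3], [-0.2, 0.4]])
demo_labels = np.array([0, 1, 1, 1])

demo_metrics = compute_metrics((demo_logits, demo_labels))
print(demo_metrics)  # {'accuracy': 0.75}
```

Passing `compute_metrics=compute_metrics` when constructing the `Trainer` should make `trainer.evaluate()` include this value in its results, under the key `eval_accuracy`.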