<a href="https://colab.research.google.com/github/AnovaYoung/Natural-Language-Processing/blob/main/LLM_for_Text_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementing and Evaluating a Large Language Model (LLM) for Text Classification

This assignment aims to provide hands-on experience with LLMs by implementing and evaluating a model for a text classification task.

In [34]:
import zipfile

zip_path = '/content/archive (3).zip'

with zipfile.ZipFile(zip_path, 'r') as zip_ref:

    zip_ref.extractall('/content/')

In [35]:
import pandas as pd

data_path = '/content/complaints_processed.csv'
df = pd.read_csv(data_path)

print(df.head())

   Unnamed: 0           product  \
0           0       credit_card   
1           1       credit_card   
2           2    retail_banking   
3           3  credit_reporting   
4           4  credit_reporting   

                                           narrative  
0  purchase order day shipping amount receive pro...  
1  forwarded message date tue subject please inve...  
2  forwarded message cc sent friday pdt subject f...  
3  payment history missing credit report speciali...  
4  payment history missing credit report made mis...  


Based on the output, I can see that the DataFrame has three columns: Unnamed: 0, product, and narrative.
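`Unnamed: 0` is the row index that was written out when the CSV was saved. It is not a feature. Below is a minimal sketch of reading that column as the index instead. It uses a tiny in-memory CSV that only assumes the same column layout as `complaints_processed.csv`:

```python
import io
import pandas as pd

# Tiny in-memory CSV mimicking the layout of complaints_processed.csv:
# an unnamed leading index column followed by 'product' and 'narrative'
csv_text = """,product,narrative
0,credit_card,purchase order day shipping amount
1,retail_banking,forwarded message cc sent friday
"""

# Default read: the saved index appears as a column named 'Unnamed: 0'
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0 uses the first column as the DataFrame index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```

On the real file, the equivalent call would be `pd.read_csv(data_path, index_col=0)`.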



In [36]:
# Check for missing values
missing_values = df.isnull().sum()
print(missing_values)

Unnamed: 0     0
product        0
narrative     10
dtype: int64


Based on the output, I have 10 missing values in the narrative column. Since the narrative column contains the text data we'll use for classification, it's really important to handle these missing entries before proceeding.

Before I remove any data, it's important to understand how many rows there are in total to ensure that dropping rows with missing values won't significantly impact the dataset.

In [37]:
# Get the total number of rows before removing NaN values
total_rows = df.shape[0]
print(f"Total number of rows before removing NaN values: {total_rows}")

# Number of missing 'narrative' entries
missing_narratives = df['narrative'].isnull().sum()
print(f"Number of missing 'narrative' entries: {missing_narratives}")

# Calculate the percentage of missing narratives
percentage_missing = (missing_narratives / total_rows) * 100
print(f"Percentage of missing 'narrative' entries: {percentage_missing:.4f}%")


Total number of rows before removing NaN values: 162421
Number of missing 'narrative' entries: 10
Percentage of missing 'narrative' entries: 0.0062%


This is great: removing these 10 rows (0.0062% of the data) will have a negligible impact on the DataFrame, so let's proceed.

In [38]:
# Remove rows with missing 'narrative' values
df = df.dropna(subset=['narrative'])

# Verify that there are no more missing values in 'narrative'
missing_values_after = df['narrative'].isnull().sum()
print(f"Missing values in 'narrative' after cleaning: {missing_values_after}")

# Updated total number of rows
total_rows_after = df.shape[0]
print(f"Total number of rows after removing NaN values: {total_rows_after}")


Missing values in 'narrative' after cleaning: 0
Total number of rows after removing NaN values: 162411


Next, I'm going to check for duplicates based on the 'narrative' column.

In [39]:
duplicate_count = df.duplicated(subset=['narrative']).sum()
print(f"Number of duplicate narratives: {duplicate_count}")

Number of duplicate narratives: 37939


In [40]:
# Percentage of duplicate narratives
percentage_duplicates = (duplicate_count / df.shape[0]) * 100
print(f"Percentage of duplicate narratives: {percentage_duplicates:.2f}%")


Percentage of duplicate narratives: 23.36%


Total Rows: After removing missing values, we have 162,411 rows.

Duplicate Narratives: 37,939

Impact: 23.36% of the dataset consists of duplicate narratives.

That's definitely not an insignificant number.

There are many ways to deal with this, including investigating the product labels and looking at the class distribution. I'll look at the class distribution in a moment, but for the purposes of this project I am simply going to remove duplicate narratives so that only one occurrence of each remains.
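With `keep='first'`, the label that happens to come first is the one that survives. Before dropping anything, it may be worth checking whether any duplicated narrative is filed under more than one product. The sketch below runs that check on hypothetical toy data, where `toy` stands in for `df`:

```python
import pandas as pd

# Hypothetical toy data: one narrative appears under two different products
toy = pd.DataFrame({
    'product': ['credit_card', 'credit_card', 'debt_collection', 'retail_banking'],
    'narrative': ['late fee charged', 'late fee charged', 'late fee charged', 'atm ate card'],
})

# Count how many distinct products each narrative is labeled with
labels_per_narrative = toy.groupby('narrative')['product'].nunique()

# Keep only narratives whose copies disagree on the label
conflicting = labels_per_narrative[labels_per_narrative > 1]
print(conflicting)
```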

In [41]:
# This function removes duplicate rows based on the 'narrative' column.
# The parameter keep='first' ensures that the first occurrence of each narrative is kept, and subsequent duplicates are dropped.
df = df.drop_duplicates(subset=['narrative'], keep='first')

# Verify that there are no more duplicate narratives
duplicate_narratives = df.duplicated(subset=['narrative']).sum()
print(f"Number of duplicate narratives after removing duplicates: {duplicate_narratives}")


Number of duplicate narratives after removing duplicates: 0



Now that I've removed duplicate narratives, I will examine the distribution of the classes in the 'product' column to understand how the data is spread across different categories.

In [42]:
# Get the count of each class in 'product'
class_distribution = df['product'].value_counts()

# Print the class distribution
print("Class Distribution:")
print(class_distribution)

Class Distribution:
product
credit_reporting       56240
debt_collection        21057
mortgages_and_loans    18723
credit_card            14983
retail_banking         13469
Name: count, dtype: int64


'credit_reporting' has the highest number of complaints at 56,240 entries.

'retail_banking' has the lowest number of complaints at 13,469 entries.

There is a noticeable imbalance among the classes: 'credit_reporting' has roughly four times as many complaints as 'retail_banking'.

Let's get the percentage representation of each class to quantify the imbalance.

In [43]:
# Calculate total number of entries
total_entries = df.shape[0]

# Calculate percentage for each class
class_percentage = (class_distribution / total_entries) * 100

# Print class percentages
print("Class Percentages:")
print(class_percentage)

Class Percentages:
product
credit_reporting       45.182852
debt_collection        16.917058
mortgages_and_loans    15.041937
credit_card            12.037245
retail_banking         10.820908
Name: count, dtype: float64


I will address this imbalance later by adjusting the class weights to favor the minority classes.

For now let's do some preprocessing:

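Converting text to lowercase: Since I'm using an uncased model (bert-base-uncased), I'll convert all text to lowercase. The uncased tokenizer also lowercases its input by default, so this step mainly keeps the stored text consistent.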

Removing leading and trailing whitespace: Ensures consistency in text formatting.

Replacing multiple spaces with a single space: Cleans up any irregular spacing.

In [44]:
# Convert narratives to lowercase
df['narrative'] = df['narrative'].str.lower()

# Remove leading and trailing whitespace
df['narrative'] = df['narrative'].str.strip()

# Replace runs of whitespace with a single space (raw string avoids an invalid '\s' escape warning)
df['narrative'] = df['narrative'].str.replace(r'\s+', ' ', regex=True)


In [45]:
print("Sample narratives after preprocessing:")
print(df['narrative'].head(5))

Sample narratives after preprocessing:
0    purchase order day shipping amount receive pro...
1    forwarded message date tue subject please inve...
2    forwarded message cc sent friday pdt subject f...
3    payment history missing credit report speciali...
4    payment history missing credit report made mis...
Name: narrative, dtype: object
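Because this dataset already arrives preprocessed, the sample output above looks unchanged. The three cleaning steps can be checked on a small, made-up Series instead. Note the raw string `r'\s+'`: in a plain string literal, `'\s'` is an invalid escape sequence that recent Python versions warn about.

```python
import pandas as pd

# Made-up narratives with mixed case, tabs, newlines and extra spaces
s = pd.Series(['  Late   FEE\tcharged  ', 'ATM\n\nate my card'])

# Lowercase, trim the ends, then collapse any whitespace run to one space
cleaned = s.str.lower().str.strip().str.replace(r'\s+', ' ', regex=True)
print(cleaned.tolist())
```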


Label Encoding is the next step.

In [46]:
from sklearn.preprocessing import LabelEncoder

# Always initialize the label encoder first
label_encoder = LabelEncoder()

# Fit and transform the 'product' column to encode labels
df['label'] = label_encoder.fit_transform(df['product'])

# Map each original product name to its encoded label, for reference
label_mapping = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))

# Display the label mapping
print("Label Mapping:")
for product, label in label_mapping.items():
    print(f"'{product}': {label}")


Label Mapping:
'credit_card': 0
'credit_reporting': 1
'debt_collection': 2
'mortgages_and_loans': 3
'retail_banking': 4
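`LabelEncoder` assigns integers in sorted (alphabetical) order of the class names, which is why 'credit_card' gets 0. The sketch below uses a hypothetical list of products to show the encoding, and how `inverse_transform` maps integer predictions back to names:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical product labels (not taken from the dataset)
products = ['credit_reporting', 'credit_card', 'retail_banking', 'credit_card']

enc = LabelEncoder()
codes = enc.fit_transform(products)   # classes_ are sorted alphabetically
print(enc.classes_, codes)

# Turn integer predictions back into product names
print(enc.inverse_transform([0, 2]))
```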


Split the Data into Training and Testing Sets

In [47]:
from sklearn.model_selection import train_test_split

X = df['narrative'].values
y = df['label'].values

# Split the data into training and testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

# create DataFrames for the splits
df_train = pd.DataFrame({'narrative': X_train, 'label': y_train})
df_test = pd.DataFrame({'narrative': X_test, 'label': y_test})

# Verify the size of the splits
print(f"Training set size: {df_train.shape[0]} records")
print(f"Testing set size: {df_test.shape[0]} records")


Training set size: 99577 records
Testing set size: 24895 records


Computing class weights helps address class imbalance by assigning higher weights to minority classes during model training. This ensures the model pays more attention to underrepresented classes.

In [48]:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np

# Get the unique class labels from the training set
class_labels = np.unique(y_train)

# Compute class weights
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=class_labels,
    y=y_train
)

# Create a dictionary that maps the labels to the weights
class_weights_dict = dict(zip(class_labels, class_weights))

# Display the class weights with corresponding product names
print("Class Weights:")
for label, weight in class_weights_dict.items():
    product = label_encoder.classes_[label]  # classes_ is indexed by encoded label
    print(f"Class '{product}' (Label {label}): Weight {weight:.2f}")


Class Weights:
Class 'credit_card' (Label 0): Weight 1.66
Class 'credit_reporting' (Label 1): Weight 0.44
Class 'debt_collection' (Label 2): Weight 1.18
Class 'mortgages_and_loans' (Label 3): Weight 1.33
Class 'retail_banking' (Label 4): Weight 1.85
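These weights follow scikit-learn's `'balanced'` heuristic, `n_samples / (n_classes * count)`. As a sanity check, the pure-Python re-derivation below plugs in the class counts from the full deduplicated dataset and reproduces the printed values. The stratified split keeps the same class proportions, so the full-dataset counts give the same weights.

```python
# Class counts after deduplication (from the class distribution printed earlier)
counts = {
    'credit_card': 14983,
    'credit_reporting': 56240,
    'debt_collection': 21057,
    'mortgages_and_loans': 18723,
    'retail_banking': 13469,
}
n_samples = sum(counts.values())
n_classes = len(counts)

# 'balanced' heuristic: rarer classes get proportionally larger weights
weights = {name: n_samples / (n_classes * c) for name, c in counts.items()}
for name, w in weights.items():
    print(f"{name}: {w:.2f}")
```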


Now I'll prepare the data for training by tokenizing the text with BERT's tokenizer.

In [49]:
from transformers import BertTokenizerFast

# Load the BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Tokenize the training data
train_encodings = tokenizer(
    list(X_train),
    truncation=True,
    padding=True,
    max_length=128
)

# Tokenize the testing data
test_encodings = tokenizer(
    list(X_test),
    truncation=True,
    padding=True,
    max_length=128
)




Now, let's proceed to convert the tokenized data into PyTorch tensors and prepare the datasets for model training.

In [50]:
import torch

# Convert tokenized inputs to tensors
train_input_ids = torch.tensor(train_encodings['input_ids'])
train_attention_masks = torch.tensor(train_encodings['attention_mask'])
train_labels = torch.tensor(y_train)

test_input_ids = torch.tensor(test_encodings['input_ids'])
test_attention_masks = torch.tensor(test_encodings['attention_mask'])
test_labels = torch.tensor(y_test)

# Create TensorDatasets
train_dataset = torch.utils.data.TensorDataset(
    train_input_ids, train_attention_masks, train_labels
)

test_dataset = torch.utils.data.TensorDataset(
    test_input_ids, test_attention_masks, test_labels
)


Creating DataLoaders:

DataLoaders:
Facilitate batch processing and shuffling of data during training.

Samplers:
RandomSampler: Randomly samples elements for training data.

SequentialSampler: Samples elements sequentially for testing data.

In [51]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

# Define batch size
batch_size = 16

# Create DataLoaders
train_dataloader = DataLoader(
    train_dataset,
    sampler=RandomSampler(train_dataset),
    batch_size=batch_size
)

test_dataloader = DataLoader(
    test_dataset,
    sampler=SequentialSampler(test_dataset),
    batch_size=batch_size
)
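Here is a small sketch of what the two samplers do, using a hypothetical ten-item dataset. `SequentialSampler` yields indices in order, while `RandomSampler` yields a permutation; it is seeded here with a `torch.Generator` only so the run is repeatable. The sketch also shows that a DataLoader's length is its number of batches, `ceil(N / batch_size)` with the default `drop_last=False`. That is the quantity `len(train_dataloader) * epochs` counts when computing `total_steps`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

# Ten toy examples; the "feature" is just the example index
features = torch.arange(10)
labels = torch.zeros(10, dtype=torch.long)
ds = TensorDataset(features, labels)

seq_loader = DataLoader(ds, sampler=SequentialSampler(ds), batch_size=4)
rand_loader = DataLoader(
    ds,
    sampler=RandomSampler(ds, generator=torch.Generator().manual_seed(0)),
    batch_size=4,
)

# Sequential: indices come out in order, in batches of 4 (last batch is smaller)
seq_batches = [x.tolist() for x, _ in seq_loader]
# Random: every index appears exactly once, in shuffled order
rand_order = [i for x, _ in rand_loader for i in x.tolist()]

print(seq_batches)
print(rand_order)
print(len(seq_loader))  # number of batches
```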


Set Up the Model

In [52]:
from transformers import BertForSequenceClassification

# Number of labels
num_labels = len(label_encoder.classes_)

# Load pre-trained BERT model with the number of output labels
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=num_labels
)

# Move model to device (CPU or GPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

Define the Optimizer and Scheduler

In [53]:
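from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Define optimizer: PyTorch's AdamW (transformers.AdamW is deprecated);
# eps and weight_decay match the defaults of the old transformers.AdamW
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-6, weight_decay=0.0)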

# Total number of training steps
epochs = 3
total_steps = len(train_dataloader) * epochs

# Define scheduler
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=total_steps
)
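With `num_warmup_steps=0`, this schedule starts at the full learning rate and decays it linearly to zero by `total_steps`. The pure-Python sketch below reproduces that multiplier (linear warmup, then linear decay) for illustration. `linear_warmup_multiplier` is a hypothetical helper, not part of the Transformers API.

```python
def linear_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Hypothetical helper: factor applied to the base learning rate at `step`."""
    if step < num_warmup_steps:
        # Warmup phase: ramp linearly from 0 up to 1
        return step / max(1, num_warmup_steps)
    # Decay phase: fall linearly from 1 to 0 at num_training_steps
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

base_lr = 2e-5
total_steps = 10
for step in [0, 5, 10]:
    print(step, base_lr * linear_warmup_multiplier(step, 0, total_steps))
```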




Modify the Loss Function to Include Class Weights

In [54]:
from torch.nn import CrossEntropyLoss

# Convert class weights to tensor
class_weights_tensor = torch.tensor(list(class_weights_dict.values()), dtype=torch.float).to(device)

# Weighted loss function, created once and reused for every batch
loss_fct = CrossEntropyLoss(weight=class_weights_tensor)

# Custom training loop
def train_epoch(model, dataloader):
    model.train()
    total_loss = 0

    for batch in dataloader:
        b_input_ids, b_input_mask, b_labels = tuple(t.to(device) for t in batch)

        optimizer.zero_grad()

        # Forward pass; labels are omitted so the model skips its own unweighted loss
        outputs = model(
            b_input_ids,
            attention_mask=b_input_mask
        )

        logits = outputs.logits

        # Compute loss with class weights
        loss = loss_fct(logits.view(-1, num_labels), b_labels.view(-1))

        total_loss += loss.item()

        # Backward pass and parameter update
        loss.backward()
        optimizer.step()
        scheduler.step()

    avg_loss = total_loss / len(dataloader)
    return avg_loss
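Passing `weight=` to `CrossEntropyLoss` scales each example's loss by the weight of its true class. With the default `reduction='mean'`, PyTorch then divides by the sum of those weights rather than by the batch size. Here is a small check on random toy logits:

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)
logits = torch.randn(4, 3)                 # 4 toy examples, 3 classes
labels = torch.tensor([0, 2, 1, 2])
weights = torch.tensor([1.0, 0.5, 2.0])    # hypothetical class weights

weighted = CrossEntropyLoss(weight=weights)(logits, labels)

# Manual version: per-example NLL times its class weight, divided by the sum of those weights
log_probs = F.log_softmax(logits, dim=1)
nll = -log_probs[torch.arange(len(labels)), labels]
w = weights[labels]
manual = (w * nll).sum() / w.sum()

print(weighted.item(), manual.item())
```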


Train the Model

I tried to train the model, but it took too long. So I'm going to train on a smaller dataset, which will significantly reduce the training time. I'll take a random sample of the training data.

In [55]:
# Sample 10% of the training data
df_train_sample = df_train.sample(frac=0.1, random_state=42)

# Update X_train and y_train
X_train_sample = df_train_sample['narrative'].values
y_train_sample = df_train_sample['label'].values

# Tokenize the sampled training data
train_encodings_sample = tokenizer(
    list(X_train_sample),
    truncation=True,
    padding=True,
    max_length=128
)

# Convert to tensors
train_input_ids_sample = torch.tensor(train_encodings_sample['input_ids'])
train_attention_masks_sample = torch.tensor(train_encodings_sample['attention_mask'])
train_labels_sample = torch.tensor(y_train_sample)

# Create TensorDataset
train_dataset_sample = torch.utils.data.TensorDataset(
    train_input_ids_sample, train_attention_masks_sample, train_labels_sample
)

# Update DataLoader with the sampled dataset
train_dataloader = DataLoader(
    train_dataset_sample,
    sampler=RandomSampler(train_dataset_sample),
    batch_size=batch_size
)


frac=0.1: Samples 10% of the data.
Adjust frac as needed: You can increase or decrease this fraction based on how much data you want to use.
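`df_train.sample(frac=0.1)` draws rows uniformly at random, so class proportions are only approximately preserved. If exact proportions matter, one option (not what this notebook does) is to sample within each label group using `GroupBy.sample`, which requires pandas 1.1 or later. Here is a sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy training set with imbalanced labels (50 / 30 / 20)
toy = pd.DataFrame({
    'narrative': [f'text {i}' for i in range(100)],
    'label': [1] * 50 + [0] * 30 + [4] * 20,
})

# Sample 10% within each label group so every class keeps its share
sample = toy.groupby('label').sample(frac=0.1, random_state=42)
print(sample['label'].value_counts().sort_index())
```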

In [56]:
epochs = 2  # Reduce the number of epochs to run quicker


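Reduce the maximum sequence length in tokenization. (These `max_length=64` encodings are not used in the final run: the data is re-tokenized with the DistilBERT tokenizer and `max_length=128` in In [63].)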


In [57]:
# Use a smaller max_length
train_encodings = tokenizer(
    list(X_train),
    truncation=True,
    padding=True,
    max_length=64
)

test_encodings = tokenizer(
    list(X_test),
    truncation=True,
    padding=True,
    max_length=64
)


Using a smaller model like DistilBERT can significantly speed up training: it has 6 Transformer layers instead of BERT's 12, as the two architecture printouts show.

In [58]:
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast

# Load the DistilBERT tokenizer and model
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

model = DistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased',
    num_labels=num_labels
)

# Move model to device
model.to(device)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

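Gradient Accumulation: Simulate a larger batch size with smaller batches. (This version of `train_epoch` uses the model's built-in, unweighted loss. It is redefined in In [71] without accumulation, so the final training run does not use it.)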

In [60]:
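gradient_accumulation_steps = 2

# Note: the scheduler now steps once per optimizer update, so its total_steps
# should be len(dataloader) * epochs // gradient_accumulation_steps

def train_epoch(model, dataloader):
    model.train()
    total_loss = 0
    optimizer.zero_grad()

    for step, batch in enumerate(dataloader):
        b_input_ids, b_input_mask, b_labels = tuple(t.to(device) for t in batch)
        outputs = model(
            b_input_ids,
            attention_mask=b_input_mask,
            labels=b_labels
        )
        # Track the unscaled loss so the reported average is not divided by the accumulation steps
        total_loss += outputs.loss.item()

        # Scale the loss so the accumulated gradient averages over the micro-batches
        loss = outputs.loss / gradient_accumulation_steps
        loss.backward()

        # Update every `gradient_accumulation_steps` batches, and on the final (possibly partial) group
        if (step + 1) % gradient_accumulation_steps == 0 or (step + 1) == len(dataloader):
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

    avg_loss = total_loss / len(dataloader)
    return avg_loss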
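Dividing each micro-batch loss by `gradient_accumulation_steps` is what makes accumulation equivalent to one larger batch. For equal-sized micro-batches, the accumulated gradient equals the gradient of the mean loss over the combined batch. Here is a toy check with a hypothetical linear model and mean-squared-error loss:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)   # toy model parameters
x = torch.randn(8, 3)
y = torch.randn(8)

def mse(xb, yb):
    # Mean squared error of a linear model on one batch
    return ((xb @ w - yb) ** 2).mean()

# Gradient from one full batch of 8
mse(x, y).backward()
full_grad = w.grad.clone()
w.grad.zero_()

# Gradient accumulated over two micro-batches of 4, each loss scaled by 1/2
accumulation_steps = 2
for xb, yb in zip(x.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    (mse(xb, yb) / accumulation_steps).backward()
accum_grad = w.grad.clone()

print(full_grad, accum_grad)
```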


In [61]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

Using device: cuda


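Use a fixed number of samples for quick experimentation. (This 5,000-example DataLoader is later replaced in In [68], which builds the final DataLoader from the 10% random sample instead.)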

In [62]:
# Use only the first 5000 samples for training
X_train_small = X_train[:5000]
y_train_small = y_train[:5000]

# Tokenize the smaller training set
train_encodings_small = tokenizer(
    list(X_train_small),
    truncation=True,
    padding=True,
    max_length=128
)

# Convert to tensors and create dataset
train_input_ids_small = torch.tensor(train_encodings_small['input_ids'])
train_attention_masks_small = torch.tensor(train_encodings_small['attention_mask'])
train_labels_small = torch.tensor(y_train_small)

train_dataset_small = torch.utils.data.TensorDataset(
    train_input_ids_small, train_attention_masks_small, train_labels_small
)

# Update DataLoader
train_dataloader = DataLoader(
    train_dataset_small,
    sampler=RandomSampler(train_dataset_small),
    batch_size=batch_size
)


In [63]:
# Tokenize sampled training data
train_encodings_sample = tokenizer(
    list(X_train_sample),
    truncation=True,
    padding=True,
    max_length=128
)

# Tokenize the test data (using the full test set)
test_encodings = tokenizer(
    list(X_test),
    truncation=True,
    padding=True,
    max_length=128
)


In [66]:
# Training tensors
train_input_ids = torch.tensor(train_encodings_sample['input_ids'])
train_attention_masks = torch.tensor(train_encodings_sample['attention_mask'])
train_labels = torch.tensor(y_train_sample)

# Testing tensors
test_input_ids = torch.tensor(test_encodings['input_ids'])
test_attention_masks = torch.tensor(test_encodings['attention_mask'])
test_labels = torch.tensor(y_test)


In [67]:
from torch.utils.data import TensorDataset

# Training dataset
train_dataset = TensorDataset(train_input_ids, train_attention_masks, train_labels)

# Testing dataset
test_dataset = TensorDataset(test_input_ids, test_attention_masks, test_labels)


In [68]:
# Define batch size
batch_size = 32

# Create the DataLoaders
train_dataloader = DataLoader(
    train_dataset,
    sampler=RandomSampler(train_dataset),
    batch_size=batch_size
)

test_dataloader = DataLoader(
    test_dataset,
    sampler=SequentialSampler(test_dataset),
    batch_size=batch_size
)


Set Up the Optimizer and Scheduler

In [69]:
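from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Define the optimizer: PyTorch's AdamW (transformers.AdamW is deprecated);
# eps and weight_decay match the defaults of the old transformers.AdamW
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-6, weight_decay=0.0)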

# Set the number of epochs
epochs = 2

# Total number of training steps
total_steps = len(train_dataloader) * epochs

# this is the learning rate schedule
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=total_steps
)


Rebuild the Class Weights Tensor for the Loss Function (indexed explicitly by label, so weight i belongs to class i)

In [70]:
from torch.nn import CrossEntropyLoss

# Ensure class_weights_dict uses labels from 0 to num_labels - 1
class_weights_list = [class_weights_dict[i] for i in range(num_labels)]

# Convert class weights to a tensor
class_weights_tensor = torch.tensor(class_weights_list, dtype=torch.float).to(device)


Define the Training Loop

In [71]:
# Weighted loss, created once from the class-weight tensor defined above
loss_fct = CrossEntropyLoss(weight=class_weights_tensor)

def train_epoch(model, dataloader):
    model.train()
    total_loss = 0

    for batch in dataloader:
        b_input_ids, b_input_mask, b_labels = tuple(t.to(device) for t in batch)

        optimizer.zero_grad()

        # Forward pass; labels are omitted so the model skips its own unweighted loss
        outputs = model(
            b_input_ids,
            attention_mask=b_input_mask
        )

        logits = outputs.logits

        # Compute loss with class weights
        loss = loss_fct(logits.view(-1, num_labels), b_labels.view(-1))

        total_loss += loss.item()

        # Backward pass and parameter update
        loss.backward()
        optimizer.step()
        scheduler.step()

    avg_loss = total_loss / len(dataloader)
    return avg_loss


In [72]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(model, dataloader):
    model.eval()
    predictions, true_labels = [], []

    with torch.no_grad():
        for batch in dataloader:
            b_input_ids, b_input_mask, b_labels = tuple(t.to(device) for t in batch)

            outputs = model(
                b_input_ids,
                attention_mask=b_input_mask
            )

            logits = outputs.logits
            predictions.append(logits.detach().cpu())
            true_labels.append(b_labels.cpu())

    predictions = torch.cat(predictions, dim=0)
    true_labels = torch.cat(true_labels, dim=0)

    preds_flat = torch.argmax(predictions, dim=1).flatten()
    labels_flat = true_labels.flatten()

    accuracy = accuracy_score(labels_flat, preds_flat)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels_flat, preds_flat, average='weighted'
    )

    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1-Score: {f1:.4f}")
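`average='weighted'` weights each class's score by its support, so the largest class dominates the summary numbers. A per-class `classification_report`, or `average='macro'`, shows how the minority classes fare. The sketch below uses made-up predictions. To run it on the real test set, `evaluate` would need to return `labels_flat` and `preds_flat`, which it currently does not.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical toy labels: majority class 1 is predicted well,
# while classes 0 and 4 each have one example misclassified as class 1
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 4, 4]
y_pred = [1, 1, 1, 1, 1, 1, 0, 1, 1, 4]

labels = [0, 1, 4]
names = ['credit_card', 'credit_reporting', 'retail_banking']

# Per-class precision, recall and F1
print(classification_report(y_true, y_pred, labels=labels, target_names=names))

# Weighted F1 (~0.78) is higher than macro F1 (~0.73) here
weighted_f1 = f1_score(y_true, y_pred, average='weighted')
macro_f1 = f1_score(y_true, y_pred, average='macro')
print(f"Weighted F1: {weighted_f1:.4f}, Macro F1: {macro_f1:.4f}")
```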


In [73]:
for epoch in range(epochs):
    print(f"Epoch {epoch + 1}/{epochs}")
    avg_train_loss = train_epoch(model, train_dataloader)
    print(f"Average training loss: {avg_train_loss:.4f}\n")

print("Training complete!")

print("Evaluating the model on the test set:")
evaluate(model, test_dataloader)



Epoch 1/2
Average training loss: 0.8278

Epoch 2/2
Average training loss: 0.4986

Training complete!
Evaluating the model on the test set:
Accuracy: 0.8004
Precision: 0.8141
Recall: 0.8004
F1-Score: 0.8027


**Analysis of the Model Output:**

Training Loss:

Epoch 1 Average Loss: 0.8278

Epoch 2 Average Loss: 0.4986

Observation: The training loss decreased substantially from the first to the second epoch, which shows the model is fitting the training data.

Evaluation Metrics on Test Set:

Accuracy: 80.04%

Precision: 81.41%

Recall: 80.04%

F1-Score: 80.27%

Observation: The model achieves solid performance across all metrics, especially considering that it was trained on only a 10% sample of the training data for two epochs.

Conclusions:

The decrease in training loss shows the model is learning. On its own, however, training loss cannot show whether the model is overfitting; that would require tracking the loss on a held-out validation set across epochs.

An accuracy of ~80% is respectable for a five-class text classification task, and well above the ~45% that always predicting the majority class (credit_reporting) would achieve.

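Precision, recall, and F1-score are close together, but they are weighted averages (average='weighted') and are therefore dominated by the largest class. Weighted recall is in fact always equal to accuracy, which is why both are 0.8004. A per-class breakdown or macro-averaged scores would be needed to confirm balanced performance across classes.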

Next Steps:

Increase Training Data: Using more training samples could further improve performance.

Adjust Epochs: Training for additional epochs could improve performance, provided the validation loss is monitored to catch overfitting.

Hyperparameter Tuning: Experimenting with learning rates, batch sizes, or a different model could yield better results. This matters all the more because I had to redo and retune the setup almost immediately: the original run took an hour and went nowhere.



