## Local Inference on GPU
Model page: https://huggingface.co/distilbert/distilbert-base-uncased

‚ö†Ô∏è If the generated code snippets do not work, please open an issue on either the [model repo](https://huggingface.co/distilbert/distilbert-base-uncased)
			and/or on [huggingface.js](https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/model-libraries-snippets.ts) üôè

In [1]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="distilbert/distilbert-base-uncased")

  from .autonotebook import tqdm as notebook_tqdm
Device set to use cuda:0


In [2]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert/distilbert-base-uncased")

## Remote Inference via Inference Providers
Ensure you have a valid **HF_TOKEN** set in your environment. You can get your token from [your settings page](https://huggingface.co/settings/tokens). Note: running this may incur charges above the free tier.
The following Python example shows how to run the model remotely on HF Inference Providers, automatically selecting an available inference provider for you.
For more information on how to use the Inference Providers, please refer to our [documentation and guides](https://huggingface.co/docs/inference-providers/en/index).

In [3]:
import torch
from transformers import (
    DistilBertTokenizerFast,
    DistilBertForSequenceClassification
)


In [4]:
import os

print(os.listdir("../data/raw"))


['disaster_response_messages_test.csv', 'disaster_response_messages_training.csv', 'disaster_response_messages_validation.csv']


In [5]:
import pandas as pd

# Base path
BASE_PATH = "../data/raw/"

# Load datasets
df  = pd.read_csv(BASE_PATH + "disaster_response_messages_training.csv")
dft = pd.read_csv(BASE_PATH + "disaster_response_messages_test.csv")
dfv = pd.read_csv(BASE_PATH + "disaster_response_messages_validation.csv")

# Quick sanity check
print("Training shape   :", df.shape)
print("Testing shape    :", dft.shape)
print("Validation shape :", dfv.shape)



Training shape   : (21046, 42)
Testing shape    : (2629, 42)
Validation shape : (2573, 42)


  df  = pd.read_csv(BASE_PATH + "disaster_response_messages_training.csv")


In [6]:
df = pd.read_csv("../data/raw/disaster_response_messages_test.csv")

In [7]:

dft = pd.read_csv("../data/raw/disaster_response_messages_training.csv")

  dft = pd.read_csv("../data/raw/disaster_response_messages_training.csv")


In [8]:
dfv = pd.read_csv("../data/raw/disaster_response_messages_validation.csv")
print(dfv.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2573 entries, 0 to 2572
Data columns (total 42 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      2573 non-null   int64 
 1   split                   2573 non-null   object
 2   message                 2573 non-null   object
 3   original                987 non-null    object
 4   genre                   2573 non-null   object
 5   related                 2573 non-null   int64 
 6   PII                     2573 non-null   int64 
 7   request                 2573 non-null   int64 
 8   offer                   2573 non-null   int64 
 9   aid_related             2573 non-null   int64 
 10  medical_help            2573 non-null   int64 
 11  medical_products        2573 non-null   int64 
 12  search_and_rescue       2573 non-null   int64 
 13  security                2573 non-null   int64 
 14  military                2573 non-null   int64 
 15  chil

In [9]:
print(df.shape, dfv.shape, dft.shape)
df.head(2)


(2629, 42) (2573, 42) (21046, 42)


Unnamed: 0,id,split,message,original,genre,related,PII,request,offer,aid_related,...,aid_centers,other_infrastructure,weather_related,floods,storm,fire,earthquake,cold,other_weather,direct_report
0,9,test,UN reports Leogane 80-90 destroyed. Only Hospi...,UN reports Leogane 80-90 destroyed. Only Hospi...,direct,1,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
1,39,test,We are at Gressier we needs assistance right a...,Se gressier nou an difikilte tanpri vin ede nou,direct,1,0,1,0,1,...,0,0,1,1,0,0,0,0,0,1


In [10]:
DISASTER_LABELS = [
    "fire",
    "floods",
    "earthquake",
    "storm",
    "cold",
    "other_weather"
]


In [11]:
NUM_LABELS = len(DISASTER_LABELS)


In [12]:
tokenizer = DistilBertTokenizerFast.from_pretrained(
    "distilbert-base-uncased"
)


In [13]:
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=NUM_LABELS
)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [14]:
def get_disaster_class(row):
    for idx, label in enumerate(DISASTER_LABELS):
        if row[label] == 1:
            return idx
    return -1


In [15]:
df["disaster_class"]  = df.apply(get_disaster_class, axis=1)
dfv["disaster_class"] = dfv.apply(get_disaster_class, axis=1)
dft["disaster_class"] = dft.apply(get_disaster_class, axis=1)


In [16]:
df  = df[df["disaster_class"] != -1]
dfv = dfv[dfv["disaster_class"] != -1]
dft = dft[dft["disaster_class"] != -1]


In [17]:
df["disaster_class"].value_counts()


disaster_class
1    245
2    218
3    169
5     63
4     25
0     11
Name: count, dtype: int64

In [18]:
import torch
from torch.utils.data import Dataset

class DisasterDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx],
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors="pt"
        )

        item = {k: v.squeeze(0) for k, v in encoding.items()}
        item["labels"] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item


In [19]:
train_dataset = DisasterDataset(
    df["message"].tolist(),
    df["disaster_class"].tolist(),
    tokenizer
)

val_dataset = DisasterDataset(
    dfv["message"].tolist(),
    dfv["disaster_class"].tolist(),
    tokenizer
)

test_dataset = DisasterDataset(
    dft["message"].tolist(),
    dft["disaster_class"].tolist(),
    tokenizer
)


In [20]:
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=NUM_LABELS
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [21]:
%pip install accelerate==0.26.1


Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [22]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./distilbert_disaster_type",
    eval_strategy="epoch",         
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss"
)



In [23]:
from torch.optim import AdamW
from torch.nn import CrossEntropyLoss

optimizer = AdamW(model.parameters(), lr=2e-5)
loss_fn = CrossEntropyLoss()


In [24]:
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader   = DataLoader(val_dataset, batch_size=16)


In [25]:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

labels = df["disaster_class"].values

class_weights = compute_class_weight(
    class_weight="balanced",
    classes=np.unique(labels),
    y=labels
)

class_weights = torch.tensor(class_weights, dtype=torch.float).to(device)
print(class_weights)


tensor([11.0758,  0.4973,  0.5589,  0.7209,  4.8733,  1.9339], device='cuda:0')


In [26]:
from tqdm import tqdm

EPOCHS = 5

for epoch in range(EPOCHS):
    print(f"\nEpoch {epoch+1}/{EPOCHS}")

    model.train()
    total_loss = 0

    for batch in tqdm(train_loader):
        optimizer.zero_grad()

        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )

        loss = outputs.loss
        total_loss += loss.item()

        loss.backward()
        optimizer.step()

    avg_train_loss = total_loss / len(train_loader)
    print(f"Training loss: {avg_train_loss:.4f}")

    # ---- Validation ----
    model.eval()
    val_loss = 0

    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)

            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask
            )

            loss = loss_fn(outputs.logits, labels)


    avg_val_loss = val_loss / len(val_loader)
    print(f"Validation loss: {avg_val_loss:.4f}")



Epoch 1/5


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 46/46 [00:09<00:00,  4.87it/s]


Training loss: 1.5056
Validation loss: 0.0000

Epoch 2/5


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 46/46 [00:08<00:00,  5.22it/s]


Training loss: 0.8905
Validation loss: 0.0000

Epoch 3/5


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 46/46 [00:08<00:00,  5.21it/s]


Training loss: 0.4952
Validation loss: 0.0000

Epoch 4/5


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 46/46 [00:08<00:00,  5.19it/s]


Training loss: 0.3183
Validation loss: 0.0000

Epoch 5/5


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 46/46 [00:08<00:00,  5.19it/s]


Training loss: 0.1917
Validation loss: 0.0000


In [27]:
test_loader = DataLoader(test_dataset, batch_size=16)

model.eval()
correct, total = 0, 0

with torch.no_grad():
    for batch in test_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )

        preds = torch.argmax(outputs.logits, dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)

accuracy = correct / total
print(f"Test Accuracy: {accuracy:.4f}")



Test Accuracy: 0.7674


In [28]:
model.save_pretrained("distilbert_disaster_type")
tokenizer.save_pretrained("distilbert_disaster_type")


('distilbert_disaster_type\\tokenizer_config.json',
 'distilbert_disaster_type\\special_tokens_map.json',
 'distilbert_disaster_type\\vocab.txt',
 'distilbert_disaster_type\\added_tokens.json',
 'distilbert_disaster_type\\tokenizer.json')

In [29]:
!ls


'ls' is not recognized as an internal or external command,
operable program or batch file.


In [30]:
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = DistilBertTokenizerFast.from_pretrained(
    "distilbert_disaster_type"
)

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert_disaster_type"
)

model.to(device)
model.eval()


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [31]:
DISASTER_LABELS = [
    "fire",
    "floods",
    "earthquake",
    "storm",
    "cold",
    "other_weather"
]

def predict_disaster_type(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=128
    )

    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    probs = torch.softmax(outputs.logits, dim=1)
    pred_idx = torch.argmax(probs, dim=1).item()

    return {
        "disaster_type": DISASTER_LABELS[pred_idx],
        "confidence": probs[0][pred_idx].item()
    }


In [32]:
predict_disaster_type(
    "There is a massive fire in the building and people are trapped"
)


{'disaster_type': 'other_weather', 'confidence': 0.37441959977149963}

In [33]:
predict_disaster_type(
    "There is a massive earthquake in the building and people are trapped"
)

{'disaster_type': 'earthquake', 'confidence': 0.9777637720108032}

In [34]:
predict_disaster_type(
    "Flood water has entered our houses after heavy rain"
)


{'disaster_type': 'floods', 'confidence': 0.9785876274108887}

In [35]:
dft[["message", "disaster_class"]].sample(5, random_state=42)


Unnamed: 0,message,disaster_class
12256,On 21 April a ferocious northwesterly wind tra...,3
12164,At least five people were swept away by surgin...,1
1386,Hello. Is it true the earthquacke was 7. 1 mag...,2
3131,I would like to find more information about th...,2
14382,"In flooded northeastern Assam state, another t...",1


In [36]:
text = dft.iloc[0]["message"]
print(text)

predict_disaster_type(text)


Is the Hurricane over or is it not over


{'disaster_type': 'storm', 'confidence': 0.9571081399917603}

In [37]:
text = dft.iloc[10]["message"]
print(text)

predict_disaster_type(text)


The message might be saying that they have been stuck in the presidential palace ( pal ) since the same Tuesday ( as the quake ). They need water. The message says they are not finding a little water. No names, no number of people given.


{'disaster_type': 'earthquake', 'confidence': 0.979576587677002}

In [38]:
row = dft.sample(1, random_state=42).iloc[0]

print("TEXT:")
print(row["message"])

print("\nTRUE LABEL:")
print(DISASTER_LABELS[row["disaster_class"]])

print("\nMODEL PREDICTION:")
print(predict_disaster_type(row["message"]))


TEXT:
On 21 April a ferocious northwesterly wind travelling at a speed of 120 kilometres an hour traversed the Districts of Netrokona and Rangpur in the north of Bangladesh.

TRUE LABEL:
storm

MODEL PREDICTION:
{'disaster_type': 'storm', 'confidence': 0.9425691366195679}


In [1]:
import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))


True
NVIDIA GeForce RTX 3050 Laptop GPU
