###### Credits to the Hugging Face documentation

In [1]:
import logging
logging.getLogger("transformers").setLevel(logging.ERROR)  # Only show errors from the transformers logger (hides its warnings and info messages)

# Pipeline

pipeline() is the easiest and fastest way to use a pretrained model for inference.
Start by creating an instance of pipeline() and naming the task, here sentiment analysis.
If no model is given, pipeline() downloads and caches a default pretrained model and tokenizer for that task; the cell below names a checkpoint explicitly.
pipeline() accepts any suitable model from the Hub, which makes it easy to adapt to other use cases.

In [2]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
# Two near-identical sentences ("hates" vs. "hate") that the model labels differently
classifier(["I don't think anybody hates cows.", "I don't think anybody hate cows."])



[{'label': 'NEGATIVE', 'score': 0.9911612272262573},
 {'label': 'POSITIVE', 'score': 0.9829936027526855}]
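
Each `score` above is a probability attached to the winning label. The following is a minimal sketch of how a classification pipeline plausibly turns raw model outputs (logits) into that label and score, assuming a softmax over two classes. The logit values and the `id2label` mapping are made up for illustration and are not taken from the model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Pick the most probable class and report its label and probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Assumed label mapping and illustrative logits (not real model output)
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
example_logits = [2.4, -2.3]

result = postprocess(example_logits, id2label)
print(result)
```

Because the score is a softmax output, a small shift in the logits near the decision boundary can flip the label while both results still look highly confident.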

In [3]:
generator = pipeline("text-generation", model="gpt2")
print(type(generator))  # a pipeline object, not the bare model
print(generator("Last night, I saw a cow")[0])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


<class 'transformers.pipelines.text_generation.TextGenerationPipeline'>




{'generated_text': 'Last night, I saw a cow alive and dead in the middle of a muddy field, which was covered by grass, just to see it. The man was in an almost a state of panic, and we all watched a black man move to the'}


# AutoModelForCausalLM and AutoTokenizer

While pipeline() is a convenient way to use pretrained models, it hides the intermediate steps, such as tokenizing the input and decoding the generated token IDs back into text. For fine-grained control over the whole process, use AutoModelForCausalLM and AutoTokenizer directly:

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Name of the checkpoint on the Hub; this example uses GPT-2
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example text
text = "I saw a cow in the office, which"

# Tokenize the text
inputs = tokenizer(text, return_tensors="pt")

# Generate text using GPT-2
generated_ids = model.generate(input_ids=inputs["input_ids"], max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# Print the generated text
print("Generated Text:", generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text: I saw a cow in the office, which was a little bit of a mess. I was like, 'What the hell is going on?' And he said, 'I'm going to take a look at it.' And I said, 'What
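
To make the role of `max_length` concrete, here is a toy sketch of a greedy decoding loop, assuming the simplest strategy where the highest-scoring next token is always chosen. The function `greedy_generate` and the stand-in model `toy_scores` are hypothetical and written only for illustration; they are not part of Transformers.

```python
def greedy_generate(next_token_scores, input_ids, max_length, eos_token_id):
    """Append the highest-scoring token until max_length or EOS is reached."""
    ids = list(input_ids)
    while len(ids) < max_length:  # the prompt tokens count towards max_length
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids

# Hypothetical stand-in for a language model over a 5-token vocabulary:
# it always favours (last token + 1) modulo 5.
def toy_scores(ids):
    favoured = (ids[-1] + 1) % 5
    return [1.0 if i == favoured else 0.0 for i in range(5)]

prompt = [0, 1]
stopped_by_eos = greedy_generate(toy_scores, prompt, max_length=6, eos_token_id=4)
stopped_by_length = greedy_generate(toy_scores, prompt, max_length=4, eos_token_id=4)
print(stopped_by_eos)
print(stopped_by_length)
```

In this sketch the length limit covers the prompt as well as the new tokens, which is also how `max_length=50` behaves in the cell above.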


# Using GPU

In [5]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checking if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# initialising model and tokenizer for gpt2
gpt_model = AutoModelForCausalLM.from_pretrained('gpt2').to(device)
gpt_tokenizer = AutoTokenizer.from_pretrained('gpt2')

input_data = 'There was a time when I saw a cow in the office'

# Encoding the input and moving tensors to GPU
encoding = gpt_tokenizer(input_data, return_tensors="pt").to(device)

# Generate text using GPT-2
generated_ids = gpt_model.generate(input_ids=encoding['input_ids'], max_length=50, num_return_sequences=1)

# Move generated IDs to CPU before decoding
generated_ids = generated_ids[0].cpu()

# Decoding on CPU
generated_text = gpt_tokenizer.decode(generated_ids, skip_special_tokens=True)
print("Generated Text:", generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
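
The two messages above concern padding. A minimal sketch of what an attention mask is, assuming left padding and GPT-2's end-of-text id (50256) reused as the pad id, as the message describes. The helper `pad_batch` and the token IDs are hypothetical, written only for illustration.

```python
def pad_batch(sequences, pad_token_id, padding_side="left"):
    """Pad token-id lists to equal length and build matching attention masks."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_token_id] * (width - len(seq))
        mask_pad = [0] * len(pad)   # 0 = padding, ignored by attention
        real = [1] * len(seq)       # 1 = real token
        if padding_side == "left":
            input_ids.append(pad + list(seq))
            attention_mask.append(mask_pad + real)
        else:
            input_ids.append(list(seq) + pad)
            attention_mask.append(real + mask_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

EOS_ID = 50256  # GPT-2's end-of-text id, reused here as the pad id
batch = pad_batch([[40, 2497, 257], [40]], pad_token_id=EOS_ID)  # illustrative ids
print(batch)
```

In the notebook cells, the tokenizer output already contains an `attention_mask`, so passing the whole encoding (`**encoding`) together with `pad_token_id=gpt_tokenizer.eos_token_id` to `generate` should silence both messages.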


In [None]:
my_loc = "/home/arjun/Desktop/pt_save"

# Save the model

In [None]:
gpt_model.save_pretrained(my_loc)

# Load the model

In [None]:
loaded = AutoModelForCausalLM.from_pretrained(my_loc)
# print(loaded)

# Customising models

In [None]:
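from transformers import AutoConfig, AutoModel

# Load the GPT-2 configuration with 8 attention heads instead of the default 12.
# The GPT-2 config field is `n_head`, and it must divide the hidden size (768),
# so a value such as 10 would be rejected when the model is built.
gpt2_config = AutoConfig.from_pretrained("gpt2", n_head=8)

# Build a new GPT-2 model from the modified configuration (weights are randomly initialised)
gpt2_model = AutoModel.from_config(gpt2_config)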
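
In GPT-2 the hidden size is split evenly across the attention heads, so the head count has to divide `n_embd` (768 for the `gpt2` checkpoint). A quick sketch, under that assumption, of which head counts are usable; `head_dim` is a hypothetical helper, not a Transformers function.

```python
def head_dim(n_embd, n_head):
    """Return the per-head dimension, or raise if n_head does not divide n_embd."""
    if n_embd % n_head != 0:
        raise ValueError(f"n_embd={n_embd} is not divisible by n_head={n_head}")
    return n_embd // n_head

N_EMBD = 768  # hidden size of the gpt2 checkpoint

# Head counts from 1 to 16 that split the hidden size evenly
valid_heads = [h for h in range(1, 17) if N_EMBD % h == 0]
print(valid_heads)
print(head_dim(N_EMBD, 12))  # per-head dimension with the default 12 heads
```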


# Trainer

Every model is a standard torch.nn.Module, so you can use it in any typical training loop. You can write your own loop, but 🤗 Transformers also provides a Trainer class for PyTorch that implements the basic training loop and adds support for features such as distributed training and mixed precision.

In [3]:
my_path = '/home/arjun/Desktop/my_torch'

In [4]:
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Pretrained DistilBERT body with a new, untrained classification head
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Training hyperparameters; checkpoints are written to output_dir
training_args = TrainingArguments(
    output_dir=my_path,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Load the Rotten Tomatoes reviews and tokenize every split
dataset = load_dataset("rotten_tomatoes")

def tokenize_dataset(batch):
    return tokenizer(batch["text"])

dataset = dataset.map(tokenize_dataset, batched=True)

# Pad each batch to the length of its longest example at collation time
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Found cached dataset rotten_tomatoes (/home/arjun/.cache/huggingface/datasets/rotten_tomatoes/default/1.0.0/40d411e45a6ce3484deed7cc15b82a53dad9a72aafd9f86f8f227134bec5ca46)


  0%|          | 0/3 [00:00<?, ?it/s]

Loading cached processed dataset at /home/arjun/.cache/huggingface/datasets/rotten_tomatoes/default/1.0.0/40d411e45a6ce3484deed7cc15b82a53dad9a72aafd9f86f8f227134bec5ca46/cache-0efd0b42608041a8.arrow
Loading cached processed dataset at /home/arjun/.cache/huggingface/datasets/rotten_tomatoes/default/1.0.0/40d411e45a6ce3484deed7cc15b82a53dad9a72aafd9f86f8f227134bec5ca46/cache-2dd2718cfdee8ed3.arrow
Loading cached processed dataset at /home/arjun/.cache/huggingface/datasets/rotten_tomatoes/default/1.0.0/40d411e45a6ce3484deed7cc15b82a53dad9a72aafd9f86f8f227134bec5ca46/cache-087d3817e6ec9479.arrow


In [5]:
trainer.train()



  0%|          | 0/2134 [00:00<?, ?it/s]

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'loss': 0.4611, 'learning_rate': 1.5313964386129335e-05, 'epoch': 0.47}
{'loss': 0.3966, 'learning_rate': 1.0627928772258671e-05, 'epoch': 0.94}
{'loss': 0.2746, 'learning_rate': 5.941893158388004e-06, 'epoch': 1.41}
{'loss': 0.2575, 'learning_rate': 1.2558575445173386e-06, 'epoch': 1.87}
{'train_runtime': 48.0955, 'train_samples_per_second': 354.711, 'train_steps_per_second': 44.37, 'train_loss': 0.3421803271535857, 'epoch': 2.0}


TrainOutput(global_step=2134, training_loss=0.3421803271535857, metrics={'train_runtime': 48.0955, 'train_samples_per_second': 354.711, 'train_steps_per_second': 44.37, 'train_loss': 0.3421803271535857, 'epoch': 2.0})
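
The logged `learning_rate` values can be reproduced by hand. This is a minimal sketch, assuming a linear decay schedule with no warmup, logging every 500 steps, a single device, no gradient accumulation, and a training split of 8530 examples. None of these are stated in the notebook; they are assumptions that are consistent with the 2134 total steps shown in the progress bar. The helper `linear_decay_lr` is written here for illustration and is not a Transformers function.

```python
import math

def linear_decay_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TRAIN_EXAMPLES = 8530  # assumed size of the training split
BATCH_SIZE = 8         # per_device_train_batch_size, assuming one device
EPOCHS = 2             # num_train_epochs

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print("total steps:", total_steps)

# Assumed logging interval of 500 optimizer steps
for step in (500, 1000, 1500, 2000):
    lr = linear_decay_lr(step, total_steps, base_lr=2e-5)
    print(step, round(step / steps_per_epoch, 2), lr)
```

Under these assumptions the computed step count, epoch fractions, and learning rates line up with the four log entries above, which suggests the learning rate falls steadily from 2e-5 towards zero over the run.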