# Running Inference on the Fine-Tuned LLM

Once the LLM has been fine-tuned, this notebook performs **inference** to generate synthetic data. The model processes `test.jsonl`, generates text responses, and saves the outputs.

Key Steps:  
- Load the fine-tuned LLM checkpoint.
- Process `test.jsonl`, which contains unseen examples for inference.
- Generate synthetic outputs under **two scenarios**:
  - **With Differential Privacy (DP)** → Uses the checkpoint fine-tuned with DP. The same generation code runs for both scenarios, so any privacy noise comes from training rather than from inference.
  - **Without Differential Privacy (Non-DP)** → Uses the checkpoint fine-tuned without privacy constraints.
- Save results into `generated_sequences.jsonl` (DP) and `generated_sequences_no_dp.jsonl` (Non-DP).

The generated synthetic data is later analyzed for **privacy leakage, fidelity, and utility**.
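
The two scenarios differ only in which checkpoint is loaded and where the output is written, so the workflow can be sketched as one function applied twice. This is a minimal sketch, not the notebook's actual code: `SCENARIOS`, `write_jsonl`, `run_scenario`, and the injected `generate_fn` are hypothetical names, and the stub generator stands in for the real model call. The checkpoint directories and output file names are the ones used by the code cells in this notebook.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# (checkpoint directory, output file) per scenario, as used by this notebook's cells.
SCENARIOS: Dict[str, Tuple[str, str]] = {
    "dp": ("with-dp-finetuned-model", "generated_sequences.jsonl"),
    "non_dp": ("/content/without-dp-finetuned-model", "generated_sequences_no_dp.jsonl"),
}


def write_jsonl(texts: List[str], path: Path) -> None:
    """Append one {"generated_text": ...} object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"generated_text": text}) + "\n")


def run_scenario(
    name: str,
    prompts: List[str],
    generate_fn: Callable[[str, List[str]], List[str]],
    out_dir: str,
) -> Path:
    """Generate text for one scenario and write it to that scenario's file."""
    model_dir, filename = SCENARIOS[name]
    out_path = Path(out_dir) / filename
    write_jsonl(generate_fn(model_dir, prompts), out_path)
    return out_path


def fake_generate(model_dir: str, prompts: List[str]) -> List[str]:
    """Stub standing in for 'load the checkpoint and call model.generate' (assumption)."""
    return [f"[{model_dir}] {p}" for p in prompts]


with tempfile.TemporaryDirectory() as tmp:
    paths = {
        name: run_scenario(name, ["prompt A", "prompt B"], fake_generate, tmp)
        for name in SCENARIOS
    }
    line_counts = {
        name: len(p.read_text(encoding="utf-8").splitlines())
        for name, p in paths.items()
    }

print(line_counts)
```

Injecting the generator keeps the file handling testable without a GPU; in the real run, `generate_fn` would wrap the tokenizer and `model.generate` calls shown later.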


## Format the first 10000 rows as prompts

In [None]:
import json

MAX_ROWS = 10_000  # number of test rows to convert into prompts
formatted_strings = []

with open("test.jsonl", "r") as f:
    for j, line in enumerate(f):
        # Stop once the first MAX_ROWS rows have been read
        if j >= MAX_ROWS:
            break
        data = json.loads(line.strip())
        # Extract values
        product_title = data['Product Title']
        product_category = data['Product Categories']
        review_rating = data['Rating']
        review_title = data['Review Title']
        review = data['Review']

        # Format the string as per the required format.
        # Note: the reference review is part of the prompt; the tokenizer call
        # in the generation cells truncates each prompt to 128 tokens.
        formatted_string = f'System prompt : Given the Product Title, Product Category, Review Rating and Review Title, you are required to generate the Review | Product Title: {product_title} | Product Category: {product_category} | Review Rating: {review_rating} | Review Title: {review_title} | Review: {review}'

        # Add the formatted string to the list
        formatted_strings.append(formatted_string)

print(f"Processed {len(formatted_strings)} lines.")


Processed 10000 lines.
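
The format above appends the reference review to every prompt. If the goal is for the model to write the review itself, one option is to stop the prompt at `Review:`. The following is a sketch under that assumption: `build_prompt` and its `include_review` flag are hypothetical and not part of the original notebook, while the field names and prompt wording are copied from the cell above.

```python
SYSTEM_PROMPT = (
    "Given the Product Title, Product Category, Review Rating and Review Title, "
    "you are required to generate the Review"
)


def build_prompt(record: dict, include_review: bool = False) -> str:
    """Build a ' | '-separated prompt; by default it ends at 'Review:'."""
    parts = [
        f"System prompt : {SYSTEM_PROMPT}",
        f"Product Title: {record['Product Title']}",
        f"Product Category: {record['Product Categories']}",
        f"Review Rating: {record['Rating']}",
        f"Review Title: {record['Review Title']}",
    ]
    # Either leave the review open for the model or reproduce the original format.
    review = f"Review: {record['Review']}" if include_review else "Review:"
    return " | ".join(parts + [review])


example = {
    "Product Title": "Apple iPhone X, US Version, 256GB, Silver - AT&T (Renewed)",
    "Product Categories": "Cell Phones & Accessories",
    "Rating": 3,
    "Review Title": "Everything except the battery is perfect",
    "Review": "Only complaint I have is the battery life.",
}

prompt = build_prompt(example)
print(prompt)
```

Calling `build_prompt(example, include_review=True)` reproduces the layout used by `formatted_strings`, so the two variants can be compared on the same records.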


In [None]:
data

{'System prompt': 'Given the Rating and Title, you are required to generate the review',
 'Rating': 3,
 'Review Title': 'Everything except the battery is perfect',
 'Review': 'Arrived 4 days ahead of schedule. Touch screen, facial ID, camera all work great!<br /><br />Phone was fully unlocked as advertised.<br /><br />I popped my SIM card out of my old phone into this one and it starred working immediately.<br /><br />Only complaint I have is the battery life.<br /><br />Battery might last 2 hrs on full charge. Once battery hits 15% it dies.',
 'Product Title': 'Apple iPhone X, US Version, 256GB, Silver - AT&T (Renewed)',
 'Product Categories': 'Cell Phones & Accessories'}

In [None]:
formatted_strings[0]

"System prompt : Given the Product Title, Product Category, Review Rating and Review Title, you are required to generate the Review | Product Title: Case for Galaxy Note 9,Cutebe Shockproof Series Hard PC+ TPU Bumper Protective Case for Samsung Galaxy Note 9 Crystal | Product Category: Cell Phones & Accessories | Review Rating: 4 | Review Title: Not a bad price for protection and cuteness | Review: Looks and works great. It was a little little on the loose fitting side but now it's fine. I've dropped my phone quite a bit and my phone has come out fine."

In [None]:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# import torch

# save_directory = "."

# # Load the model with half precision if supported
# model = AutoModelForCausalLM.from_pretrained(save_directory, torch_dtype=torch.float16)
# tokenizer = AutoTokenizer.from_pretrained(save_directory)

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# print(f"Using device: {device}")

# model.to(device)
# model.eval()


# sample_prompt = ("System prompt: Given the Rating and Title, you are required to generate the review, "
#                  "Rating: 5, Title: Would definitely buy again, Review:")

# inputs = tokenizer(sample_prompt, return_tensors="pt", padding=True, truncation=True, max_length=128)
# inputs = {key: value.to(device) for key, value in inputs.items()}

# with torch.no_grad():
#     generated_ids = model.generate(**inputs, max_length=128, do_sample=True, top_k=50)
# generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
# print("Generated text:", generated_text)


In [None]:
!huggingface-cli login


    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write

## Load model with DP and perform 10000 inferences


In [None]:
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Directory where your model is saved
save_directory = "with-dp-finetuned-model"

# Load the model in half precision (float16) to reduce memory use
model = AutoModelForCausalLM.from_pretrained(save_directory, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(save_directory)

# Set up the device: use CUDA if available, otherwise fallback to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model.to(device)
model.eval()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Using device: cuda


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): lora.Linear(
            (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): lora.Linear(
            (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
            (lora_dropout): ModuleDict(
              (defau

In [None]:
# import time
# import json  # Import json to handle JSON serialization

# # Read formatted prompts from file. Each line should contain one formatted prompt.
# formatted_prompts = formatted_strings

# # Output file to save generated sequences in JSONL format
# output_file = "generated_sequences.jsonl"

# # Set batch size to 10 and initialize timing and batch results
# batch_size = 10
# results_batch = []
# batch_start_time = time.time()

# for i, prompt in enumerate(formatted_prompts):
#     # Tokenize the prompt
#     inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=128)
#     inputs = {key: value.to(device) for key, value in inputs.items()}

#     # Generate text from the prompt
#     with torch.no_grad():
#         generated_ids = model.generate(**inputs, max_length=128, do_sample=True, top_k=50)
#     generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

#     results_batch.append(generated_text)

#     # Save batch and compute time after every 10 iterations
#     if (i + 1) % batch_size == 0:
#         batch_end_time = time.time()
#         batch_time = batch_end_time - batch_start_time

#         # Append ge


In [None]:
import time

output_file = "generated_sequences.jsonl"
batch_size = 100
formatted_prompts = formatted_strings
# Process prompts in batches
num_prompts = len(formatted_prompts)
for batch_start in range(0, num_prompts, batch_size):
    batch_prompts = formatted_prompts[batch_start : batch_start + batch_size]

    batch_start_time = time.time()

    # Tokenize the entire batch
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=128, padding_side='left')
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate text for the batch
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=256, do_sample=True, top_k=50)

    # Decode the generated sequences for each prompt
    batch_generated_texts = [tokenizer.decode(ids, skip_special_tokens=True) for ids in generated_ids]

    batch_end_time = time.time()
    batch_time = batch_end_time - batch_start_time

    # Write the generated outputs in JSONL format
    with open(output_file, "a") as outfile:
        for text in batch_generated_texts:
            json_line = json.dumps({"generated_text": text})
            outfile.write(json_line + "\n")

    print(f"Processed batch {(batch_start // batch_size) + 1} (prompts {batch_start} to {batch_start+len(batch_prompts)-1}). Time taken: {batch_time:.2f} seconds.")
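
For decoder-only models, `model.generate` returns the prompt tokens followed by the new tokens, so each decoded `generated_text` in the loop above begins with the prompt. Because the batch is left-padded, every row shares the same prompt length and the continuation can be isolated by slicing. The sketch below shows the slicing on plain Python lists so that it runs without a GPU; `strip_prompt` is a hypothetical helper and the token ids are made up for illustration. With tensors, the equivalent slice would be `generated_ids[:, inputs["input_ids"].shape[1]:]`.

```python
from typing import List


def strip_prompt(sequences: List[List[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first prompt_len ids (padding plus prompt) from each generated row."""
    return [row[prompt_len:] for row in sequences]


PAD = 0  # illustrative padding id

# Two left-padded prompts of equal length (illustrative ids).
input_ids = [
    [PAD, 11, 12, 13],
    [21, 22, 23, 24],
]

# Shape of what generate would return: prompt ids followed by new ids.
generated = [
    [PAD, 11, 12, 13, 101, 102],
    [21, 22, 23, 24, 201, 202],
]

new_tokens = strip_prompt(generated, prompt_len=len(input_ids[0]))
print(new_tokens)  # [[101, 102], [201, 202]]
```

Decoding only the sliced ids would store the model's continuation without the prompt. Note also that with prompts truncated to 128 tokens and `max_length=256`, a full-length prompt leaves room for at most 128 new tokens.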


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Processed batch 1 (prompts 0 to 99). Time taken: 34.73 seconds.
Processed batch 2 (prompts 100 to 199). Time taken: 34.15 seconds.
Processed batch 3 (prompts 200 to 299). Time taken: 34.56 seconds.
Processed batch 4 (prompts 300 to 399). Time taken: 34.34 seconds.
Processed batch 5 (prompts 400 to 499). Time taken: 34.32 seconds.
Processed batch 6 (prompts 500 to 599). Time taken: 34.44 seconds.
Processed batch 7 (prompts 600 to 699). Time taken: 34.38 seconds.
Processed batch 8 (prompts 700 to 799). Time taken: 34.36 seconds.
Processed batch 9 (prompts 800 to 899). Time taken: 34.41 seconds.
Processed batch 10 (prompts 900 to 999). Time taken: 34.32 seconds.
Processed batch 11 (prompts 1000 to 1099). Time taken: 34.18 seconds.
Processed batch 12 (prompts 1100 to 1199). Time taken: 34.19 seconds.
Processed batch 13 (prompts 1200 to 1299). Time taken: 34.37 seconds.
Processed batch 14 (prompts 1300 to 1399). Time taken: 34.32 seconds.
Processed batch 15 (prompts 1400 to 1499). Time taken: 34.41 seconds.
Processed batch 16 (prompts 1500 to 1599). Time taken: 34.41 seconds.
Processed batch 17 (prompts 1600 to 1699). Time taken: 34.30 seconds.
Processed batch 18 (prompts 1700 to 1799). Time taken: 34.17 seconds.
Processed batch 19 (prompts 1800 to 1899). Time taken: 34.21 seconds.
Processed batch 20 (prompts 1900 to 1999). Time taken: 34.31 seconds.
Processed batch 21 (prompts 2000 to 2099). Time taken: 34.37 seconds.
Processed batch 22 (prompts 2100 to 2199). Time taken: 34.29 seconds.
Processed batch 23 (prompts 2200 to 2299). Time taken: 34.18 seconds.
Processed batch 24 (prompts 2300 to 2399). Time taken: 34.18 seconds.
Processed batch 25 (prompts 2400 to 2499). Time taken: 34.21 seconds.
Processed batch 26 (prompts 2500 to 2599). Time taken: 34.32 seconds.
Processed batch 27 (prompts 2600 to 2699). Time taken: 34.36 seconds.
Processed batch 28 (prompts 2700 to 2799). Time taken: 34.44 seconds.
Processed batch 29 (prompts 2800 to 2899). Time taken: 34.40 seconds.
Processed batch 30 (prompts 2900 to 2999). Time taken: 34.35 seconds.
Processed batch 31 (prompts 3000 to 3099). Time taken: 34.37 seconds.
Processed batch 32 (prompts 3100 to 3199). Time taken: 34.40 seconds.
Processed batch 33 (prompts 3200 to 3299). Time taken: 34.40 seconds.
Processed batch 34 (prompts 3300 to 3399). Time taken: 34.37 seconds.
Processed batch 35 (prompts 3400 to 3499). Time taken: 34.31 seconds.
Processed batch 36 (prompts 3500 to 3599). Time taken: 34.36 seconds.
Processed batch 37 (prompts 3600 to 3699). Time taken: 34.32 seconds.
Processed batch 38 (prompts 3700 to 3799). Time taken: 34.41 seconds.
Processed batch 39 (prompts 3800 to 3899). Time taken: 34.40 seconds.
Processed batch 40 (prompts 3900 to 3999). Time taken: 34.34 seconds.
Processed batch 41 (prompts 4000 to 4099). Time taken: 34.42 seconds.
Processed batch 42 (prompts 4100 to 4199). Time taken: 34.41 seconds.
Processed batch 43 (prompts 4200 to 4299). Time taken: 34.26 seconds.
Processed batch 44 (prompts 4300 to 4399). Time taken: 34.38 seconds.
Processed batch 45 (prompts 4400 to 4499). Time taken: 34.39 seconds.
Processed batch 46 (prompts 4500 to 4599). Time taken: 34.34 seconds.
Processed batch 47 (prompts 4600 to 4699). Time taken: 34.28 seconds.
Processed batch 48 (prompts 4700 to 4799). Time taken: 34.22 seconds.
Processed batch 49 (prompts 4800 to 4899). Time taken: 34.41 seconds.
Processed batch 50 (prompts 4900 to 4999). Time taken: 34.35 seconds.
Processed batch 51 (prompts 5000 to 5099). Time taken: 34.38 seconds.
Processed batch 52 (prompts 5100 to 5199). Time taken: 34.38 seconds.
Processed batch 53 (prompts 5200 to 5299). Time taken: 34.30 seconds.
Processed batch 54 (prompts 5300 to 5399). Time taken: 34.28 seconds.
Processed batch 55 (prompts 5400 to 5499). Time taken: 34.32 seconds.
Processed batch 56 (prompts 5500 to 5599). Time taken: 34.21 seconds.
Processed batch 57 (prompts 5600 to 5699). Time taken: 34.20 seconds.
Processed batch 58 (prompts 5700 to 5799). Time taken: 34.39 seconds.
Processed batch 59 (prompts 5800 to 5899). Time taken: 34.35 seconds.
Processed batch 60 (prompts 5900 to 5999). Time taken: 34.27 seconds.
Processed batch 61 (prompts 6000 to 6099). Time taken: 34.20 seconds.
Processed batch 62 (prompts 6100 to 6199). Time taken: 34.24 seconds.
Processed batch 63 (prompts 6200 to 6299). Time taken: 34.34 seconds.
Processed batch 64 (prompts 6300 to 6399). Time taken: 34.29 seconds.
Processed batch 65 (prompts 6400 to 6499). Time taken: 34.18 seconds.
Processed batch 66 (prompts 6500 to 6599). Time taken: 34.20 seconds.
Processed batch 67 (prompts 6600 to 6699). Time taken: 34.26 seconds.
Processed batch 68 (prompts 6700 to 6799). Time taken: 34.35 seconds.
Processed batch 69 (prompts 6800 to 6899). Time taken: 34.37 seconds.
Processed batch 70 (prompts 6900 to 6999). Time taken: 34.42 seconds.
Processed batch 71 (prompts 7000 to 7099). Time taken: 34.37 seconds.
Processed batch 72 (prompts 7100 to 7199). Time taken: 34.36 seconds.
Processed batch 73 (prompts 7200 to 7299). Time taken: 34.36 seconds.
Processed batch 74 (prompts 7300 to 7399). Time taken: 34.36 seconds.
Processed batch 75 (prompts 7400 to 7499). Time taken: 34.36 seconds.
Processed batch 76 (prompts 7500 to 7599). Time taken: 34.39 seconds.
Processed batch 77 (prompts 7600 to 7699). Time taken: 34.39 seconds.
Processed batch 78 (prompts 7700 to 7799). Time taken: 34.35 seconds.
Processed batch 79 (prompts 7800 to 7899). Time taken: 34.38 seconds.
Processed batch 80 (prompts 7900 to 7999). Time taken: 34.42 seconds.
Processed batch 81 (prompts 8000 to 8099). Time taken: 34.37 seconds.
Processed batch 82 (prompts 8100 to 8199). Time taken: 34.33 seconds.
Processed batch 83 (prompts 8200 to 8299). Time taken: 34.42 seconds.
Processed batch 84 (prompts 8300 to 8399). Time taken: 34.33 seconds.
Processed batch 85 (prompts 8400 to 8499). Time taken: 34.37 seconds.
Processed batch 86 (prompts 8500 to 8599). Time taken: 34.29 seconds.
Processed batch 87 (prompts 8600 to 8699). Time taken: 34.21 seconds.
Processed batch 88 (prompts 8700 to 8799). Time taken: 34.19 seconds.
Processed batch 89 (prompts 8800 to 8899). Time taken: 34.38 seconds.
Processed batch 90 (prompts 8900 to 8999). Time taken: 34.36 seconds.
Processed batch 91 (prompts 9000 to 9099). Time taken: 34.38 seconds.
Processed batch 92 (prompts 9100 to 9199). Time taken: 34.39 seconds.
Processed batch 93 (prompts 9200 to 9299). Time taken: 34.37 seconds.
Processed batch 94 (prompts 9300 to 9399). Time taken: 34.36 seconds.
Processed batch 95 (prompts 9400 to 9499). Time taken: 34.35 seconds.
Processed batch 96 (prompts 9500 to 9599). Time taken: 34.42 seconds.
Processed batch 97 (prompts 9600 to 9699). Time taken: 34.39 seconds.
Processed batch 98 (prompts 9700 to 9799). Time taken: 34.38 seconds.
Processed batch 99 (prompts 9800 to 9899). Time taken: 34.39 seconds.
Processed batch 100 (prompts 9900 to 9999). Time taken: 34.38 seconds.
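
The per-batch timings logged above can be summarized with a few lines of parsing. This is a sketch only: `summarize_timings` is a hypothetical helper, and the sample text contains just the first three log lines from the run above rather than the full output.

```python
import re
from statistics import mean

# Matches lines such as:
# "Processed batch 1 (prompts 0 to 99). Time taken: 34.73 seconds."
LOG_PATTERN = re.compile(
    r"Processed batch (\d+) \(prompts (\d+) to (\d+)\)\. Time taken: ([\d.]+) seconds\."
)


def summarize_timings(log_text: str) -> dict:
    """Aggregate batch count, prompt count, and timing statistics from the log."""
    rows = [
        (int(start), int(end), float(seconds))
        for _batch, start, end, seconds in LOG_PATTERN.findall(log_text)
    ]
    if not rows:
        raise ValueError("no 'Processed batch' lines found")
    prompts = sum(end - start + 1 for start, end, _ in rows)
    total = sum(seconds for _, _, seconds in rows)
    return {
        "batches": len(rows),
        "prompts": prompts,
        "total_seconds": round(total, 2),
        "mean_seconds_per_batch": round(mean(s for _, _, s in rows), 2),
        "prompts_per_second": round(prompts / total, 2),
    }


sample_log = """\
Processed batch 1 (prompts 0 to 99). Time taken: 34.73 seconds.
Processed batch 2 (prompts 100 to 199). Time taken: 34.15 seconds.
Processed batch 3 (prompts 200 to 299). Time taken: 34.56 seconds.
"""

summary = summarize_timings(sample_log)
print(summary)
```

Running the same function over the complete cell output would give the total wall-clock time for all 100 batches.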


## Load model without DP and perform 10000 inferences

In [None]:
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Directory where your model is saved
save_directory = "/content/without-dp-finetuned-model"

# Load the model in half precision (float16) to reduce memory use
model = AutoModelForCausalLM.from_pretrained(save_directory, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(save_directory)

# Set up the device: use CUDA if available, otherwise fallback to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model.to(device)
model.eval()



Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Using device: cuda


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): lora.Linear(
            (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): lora.Linear(
            (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
            (lora_dropout): ModuleDict(
              (defau

In [None]:
import time

output_file = "generated_sequences_no_dp.jsonl"
batch_size = 100
formatted_prompts = formatted_strings
# Process prompts in batches
num_prompts = len(formatted_prompts)
for batch_start in range(0, num_prompts, batch_size):
    batch_prompts = formatted_prompts[batch_start : batch_start + batch_size]

    batch_start_time = time.time()

    # Tokenize the entire batch
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=128, padding_side='left')
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate text for the batch
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=256, do_sample=True, top_k=50)

    # Decode the generated sequences for each prompt
    batch_generated_texts = [tokenizer.decode(ids, skip_special_tokens=True) for ids in generated_ids]

    batch_end_time = time.time()
    batch_time = batch_end_time - batch_start_time

    # Write the generated outputs in JSONL format
    with open(output_file, "a") as outfile:
        for text in batch_generated_texts:
            json_line = json.dumps({"generated_text": text})
            outfile.write(json_line + "\n")

    print(f"Processed batch {(batch_start // batch_size) + 1} (prompts {batch_start} to {batch_start+len(batch_prompts)-1}). Time taken: {batch_time:.2f} seconds.")


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Processed batch 1 (prompts 0 to 99). Time taken: 34.62 seconds.
Processed batch 2 (prompts 100 to 199). Time taken: 34.27 seconds.
Processed batch 3 (prompts 200 to 299). Time taken: 34.53 seconds.
Processed batch 4 (prompts 300 to 399). Time taken: 34.26 seconds.
Processed batch 5 (prompts 400 to 499). Time taken: 34.36 seconds.
Processed batch 6 (prompts 500 to 599). Time taken: 34.45 seconds.
Processed batch 7 (prompts 600 to 699). Time taken: 34.39 seconds.
Processed batch 8 (prompts 700 to 799). Time taken: 34.39 seconds.
Processed batch 9 (prompts 800 to 899). Time taken: 34.43 seconds.
Processed batch 10 (prompts 900 to 999). Time taken: 34.32 seconds.
Processed batch 11 (prompts 1000 to 1099). Time taken: 34.41 seconds.
Processed batch 12 (prompts 1100 to 1199). Time taken: 34.41 seconds.
Processed batch 13 (prompts 1200 to 1299). Time taken: 34.39 seconds.
Processed batch 14 (prompts 1300 to 1399). Time taken: 34.41 seconds.
Processed batch 15 (prompts 1400 to 1499). Time taken: 34.37 seconds.
Processed batch 16 (prompts 1500 to 1599). Time taken: 34.30 seconds.
Processed batch 17 (prompts 1600 to 1699). Time taken: 34.21 seconds.
Processed batch 18 (prompts 1700 to 1799). Time taken: 34.23 seconds.
Processed batch 19 (prompts 1800 to 1899). Time taken: 34.40 seconds.
Processed batch 20 (prompts 1900 to 1999). Time taken: 34.39 seconds.
Processed batch 21 (prompts 2000 to 2099). Time taken: 34.40 seconds.
Processed batch 22 (prompts 2100 to 2199). Time taken: 34.42 seconds.
Processed batch 23 (prompts 2200 to 2299). Time taken: 34.38 seconds.
Processed batch 24 (prompts 2300 to 2399). Time taken: 34.41 seconds.
Processed batch 25 (prompts 2400 to 2499). Time taken: 34.38 seconds.
Processed batch 26 (prompts 2500 to 2599). Time taken: 34.40 seconds.
Processed batch 27 (prompts 2600 to 2699). Time taken: 34.43 seconds.
Processed batch 28 (prompts 2700 to 2799). Time taken: 34.41 seconds.
Processed batch 29 (prompts 2800 to 2899). Time taken: 34.41 seconds.
Processed batch 30 (prompts 2900 to 2999). Time taken: 34.38 seconds.
Processed batch 31 (prompts 3000 to 3099). Time taken: 34.38 seconds.
Processed batch 32 (prompts 3100 to 3199). Time taken: 34.39 seconds.
Processed batch 33 (prompts 3200 to 3299). Time taken: 34.39 seconds.
Processed batch 34 (prompts 3300 to 3399). Time taken: 34.38 seconds.
Processed batch 35 (prompts 3400 to 3499). Time taken: 34.38 seconds.
Processed batch 36 (prompts 3500 to 3599). Time taken: 34.44 seconds.
Processed batch 37 (prompts 3600 to 3699). Time taken: 34.36 seconds.
Processed batch 38 (prompts 3700 to 3799). Time taken: 34.42 seconds.
Processed batch 39 (prompts 3800 to 3899). Time taken: 34.43 seconds.
Processed batch 40 (prompts 3900 to 3999). Time taken: 34.40 seconds.


Processed batch 41 (prompts 4000 to 4099). Time taken: 34.37 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 42 (prompts 4100 to 4199). Time taken: 34.43 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 43 (prompts 4200 to 4299). Time taken: 34.33 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 44 (prompts 4300 to 4399). Time taken: 34.21 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 45 (prompts 4400 to 4499). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 46 (prompts 4500 to 4599). Time taken: 34.44 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 47 (prompts 4600 to 4699). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 48 (prompts 4700 to 4799). Time taken: 34.42 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 49 (prompts 4800 to 4899). Time taken: 34.45 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 50 (prompts 4900 to 4999). Time taken: 34.40 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 51 (prompts 5000 to 5099). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 52 (prompts 5100 to 5199). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 53 (prompts 5200 to 5299). Time taken: 34.43 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 54 (prompts 5300 to 5399). Time taken: 34.40 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 55 (prompts 5400 to 5499). Time taken: 34.42 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 56 (prompts 5500 to 5599). Time taken: 34.42 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 57 (prompts 5600 to 5699). Time taken: 34.35 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 58 (prompts 5700 to 5799). Time taken: 34.39 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 59 (prompts 5800 to 5899). Time taken: 34.34 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 60 (prompts 5900 to 5999). Time taken: 34.28 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 61 (prompts 6000 to 6099). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 62 (prompts 6100 to 6199). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 63 (prompts 6200 to 6299). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 64 (prompts 6300 to 6399). Time taken: 34.39 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 65 (prompts 6400 to 6499). Time taken: 34.40 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 66 (prompts 6500 to 6599). Time taken: 34.40 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 67 (prompts 6600 to 6699). Time taken: 34.35 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 68 (prompts 6700 to 6799). Time taken: 34.45 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 69 (prompts 6800 to 6899). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 70 (prompts 6900 to 6999). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 71 (prompts 7000 to 7099). Time taken: 34.39 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 72 (prompts 7100 to 7199). Time taken: 34.37 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 73 (prompts 7200 to 7299). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 74 (prompts 7300 to 7399). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 75 (prompts 7400 to 7499). Time taken: 34.39 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 76 (prompts 7500 to 7599). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 77 (prompts 7600 to 7699). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 78 (prompts 7700 to 7799). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 79 (prompts 7800 to 7899). Time taken: 34.43 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 80 (prompts 7900 to 7999). Time taken: 34.39 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 81 (prompts 8000 to 8099). Time taken: 34.37 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 82 (prompts 8100 to 8199). Time taken: 34.44 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 83 (prompts 8200 to 8299). Time taken: 34.43 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 84 (prompts 8300 to 8399). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 85 (prompts 8400 to 8499). Time taken: 34.38 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 86 (prompts 8500 to 8599). Time taken: 34.37 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 87 (prompts 8600 to 8699). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 88 (prompts 8700 to 8799). Time taken: 34.41 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 89 (prompts 8800 to 8899). Time taken: 34.39 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 90 (prompts 8900 to 8999). Time taken: 34.42 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 91 (prompts 9000 to 9099). Time taken: 34.33 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 92 (prompts 9100 to 9199). Time taken: 34.20 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 93 (prompts 9200 to 9299). Time taken: 34.24 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 94 (prompts 9300 to 9399). Time taken: 34.24 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 95 (prompts 9400 to 9499). Time taken: 34.17 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 96 (prompts 9500 to 9599). Time taken: 34.26 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 97 (prompts 9600 to 9699). Time taken: 34.22 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 98 (prompts 9700 to 9799). Time taken: 34.18 seconds.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Processed batch 99 (prompts 9800 to 9899). Time taken: 34.22 seconds.
Processed batch 100 (prompts 9900 to 9999). Time taken: 34.26 seconds.
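
The batch log above reports one wall-clock time per batch of 100 prompts. As a rough sketch, the lines can be parsed to total the runtime and estimate per-prompt latency. The helper name `summarize_batch_log` and the two-line `sample_log` are illustrative assumptions, not part of the original notebook.

```python
import re

# Hypothetical helper: matches lines such as
# "Processed batch 99 (prompts 9800 to 9899). Time taken: 34.22 seconds."
LOG_PATTERN = re.compile(
    r"Processed batch (\d+) \(prompts (\d+) to (\d+)\)\. Time taken: ([\d.]+) seconds\."
)

def summarize_batch_log(log_text):
    """Return (num_batches, num_prompts, total_seconds) parsed from the log text."""
    num_batches = 0
    num_prompts = 0
    total_seconds = 0.0
    for match in LOG_PATTERN.finditer(log_text):
        _, first, last, seconds = match.groups()
        num_batches += 1
        num_prompts += int(last) - int(first) + 1  # prompt ranges are inclusive
        total_seconds += float(seconds)
    return num_batches, num_prompts, total_seconds

# Two lines copied from the log above; paste the full output to summarize the whole run.
sample_log = """
Processed batch 99 (prompts 9800 to 9899). Time taken: 34.22 seconds.
Processed batch 100 (prompts 9900 to 9999). Time taken: 34.26 seconds.
"""
batches, prompts, seconds = summarize_batch_log(sample_log)
print(f"{batches} batches, {prompts} prompts, {seconds:.2f} s total, "
      f"{seconds / prompts:.3f} s per prompt")
```

Warning lines such as the `pad_token_id` notice are ignored because they do not match the pattern.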


## Shortening the product titles for the Amazon dataset


In [None]:
import json

data = []
with open("data/train.jsonl", "r") as file:
    for line in file:
        # Each line is a JSON object; decode it with json.loads
        data.append(json.loads(line))

# Now 'data' is a list of dictionaries
print(f"Loaded {len(data)} records.")
print(data[0])  # print the first record to check its content


In [None]:
import json
import pandas as pd

# Read each line and convert it to a dictionary; the with-block closes the file afterwards.
with open("data/train.jsonl", "r") as file:
    data = [json.loads(line) for line in file]
df = pd.DataFrame(data)

print("DataFrame shape:", df.shape)


In [None]:
str(df["Product Title"][1])

In [None]:
import time
import torch
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assume df is already defined with a "Product Title" column.
# For example, it can be created directly from the JSONL file loaded above:
# df = pd.read_json("data/train.jsonl", lines=True)

# Define the multi-shot prompt with your examples.
prompt_prefix = (
'''Task: Summarize the following product title into a concise 2–5 word product name. Your answer must include only the essential words—no extra commentary, punctuation, or formatting.

Example 1:
Product Title: VUIIMEEK Square Case for iPhone 12 Pro Max 6.7", Cute White Flowers Clear Print Design Slim Flexible Soft TPU High Impact Shockproof Case Reinforced Bumper Cool Protective Crystal Cover (Green Leaves)
Minimal Product Name: iphone 12 pro max square case

Example 2:
Product Title: Fitian Fitbit Ionic Charging Cable, Replacement USB Charging Cord Cable Accessories Charger Cable Adapter for Fitbit Ionic Wristband Smart Watch (2 Pcs Fitbit Ionic Charger Cable)
Minimal Product Name: Fitian Fitbit Ionic Charging Cable

Now, produce the minimal product name for the following product title:
Product Title: '''

)

# Build one prompt per product title. The quoted title follows the prefix, and the
# answer cue comes after the closing quote so the model completes the product name.
formatted_prompts = [
    f'{prompt_prefix}"{title}"\nMinimal Product Name:' for title in df["Product Title"]
]

# Hugging Face Hub ID or local directory of the model to load.
# This is the base Llama 3.1 8B checkpoint; point it at your fine-tuned model directory if needed.
save_directory = "meta-llama/Llama-3.1-8B"

# Load the model in half precision (float16), then the matching tokenizer.
model = AutoModelForCausalLM.from_pretrained(save_directory, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(save_directory)

# Set up the device (CUDA if available, otherwise CPU).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
model.to(device)
model.eval()
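
Each prompt combines the few-shot prefix, the quoted product title, and a short answer cue. A minimal sketch of that assembly as a standalone helper follows. `build_prompt`, `demo_prefix`, and `demo_titles` are hypothetical names introduced here for illustration, and the cue text is a parameter so it can be matched to whatever wording the few-shot examples use.

```python
# Hypothetical helper (not in the original notebook): build one few-shot prompt per title.
def build_prompt(prefix, title, cue="\nMinimal Product Name:"):
    """Wrap the title in double quotes and append the answer cue after the closing quote."""
    return f'{prefix}"{title}"{cue}'

demo_prefix = "Product Title: "  # short stand-in for the full prompt_prefix
demo_titles = ["Fitian Fitbit Ionic Charging Cable, Replacement USB Charging Cord"]
demo_prompts = [build_prompt(demo_prefix, title) for title in demo_titles]
print(demo_prompts[0])
```

Keeping the cue outside the quotes matters: if it is concatenated to the title before quoting, it becomes part of the title and the model has no clear position from which to start its answer.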



In [None]:

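# Process prompts in batches of 20; only the first 100 prompts are used as a quick test.
batch_size = 20
num_prompts = 100
# num_prompts = len(formatted_prompts)  # Uncomment to process every prompt.
shortened_titles = []

# The Llama tokenizer has no pad token, so reuse EOS. Decoder-only models need left
# padding so that generation continues directly from the end of each prompt.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"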

for batch_start in range(0, num_prompts, batch_size):
    batch_prompts = formatted_prompts[batch_start : batch_start + batch_size]
    batch_start_time = time.time()


    # Tokenize the batch of prompts.
    inputs = tokenizer(
        batch_prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512,  # Adjust if necessary.
        # padding_side='left'
    )
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate outputs for the batch.
    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=16,  # A 2-5 word name needs only a few new tokens; adjust if necessary.
            pad_token_id=tokenizer.eos_token_id,  # Set explicitly to silence the pad_token_id warning.
            # do_sample=True,   # Enable sampling.
            # top_k=50          # Adjust sampling parameters if needed.
        )

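    # Decode only the newly generated tokens (everything after the padded prompt).
    # Slicing token IDs is more robust than slicing the decoded string by len(prompt),
    # because decoded text does not always reproduce the prompt character for character.
    prompt_length = inputs["input_ids"].shape[1]
    batch_generated_texts = [
        tokenizer.decode(ids[prompt_length:], skip_special_tokens=True) for ids in generated_ids
    ]

    batch_end_time = time.time()
    print(f"Processed batch {(batch_start // batch_size) + 1} (prompts {batch_start} to {batch_start + len(batch_prompts) - 1}) in {batch_end_time - batch_start_time:.2f} seconds.")

    # Keep the first line of each completion as the shortened title.
    for completion in batch_generated_texts:
        lines = completion.strip().splitlines()
        shortened_titles.append(lines[0].strip() if lines else "")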

# # Add the shortened titles as a new column to your DataFrame.
# df["Shortened Product Title"] = shortened_titles

# # Print the final DataFrame with the original and shortened product titles.
# print(df[["Product Title", "Shortened Product Title"]])
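
A base model often keeps writing after the product name, for example by starting another "Example" block or wrapping the answer in quotes. The following is a minimal post-processing sketch under that assumption. `clean_short_title` and the sample strings in `raw_outputs` are hypothetical, and the default five-word cap simply mirrors the "2–5 word" instruction in the prompt.

```python
# Hypothetical post-processing helper; not part of the original notebook.
def clean_short_title(raw_text, max_words=5):
    """Keep the first non-empty line, drop surrounding quotes, and cap the word count."""
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    if not lines:
        return ""
    first_line = lines[0].strip("\"' ")
    return " ".join(first_line.split()[:max_words])

# Made-up completions that illustrate the three cases handled above.
raw_outputs = [
    ' "iphone 12 square case"\n\nExample 3:\nProduct Title: ...',
    "Fitian Fitbit Ionic Charging Cable with extra words\n",
    "",
]
cleaned = [clean_short_title(text) for text in raw_outputs]
print(cleaned)
```

Applied to `shortened_titles`, a helper like this would normalize the values before they are written back to the DataFrame. Note that the column assignment only works once every row has a generated title, that is, when `num_prompts` equals `len(df)`.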

In [None]:
shortened_titles

In [None]:
formatted_prompts[:100]

In [None]:
!pip install --upgrade google-genai
!gcloud auth application-default login

In [None]:
!gcloud auth application-default set-quota-project