# **Introduction**
## *This notebook demonstrates the generation of Hindi news articles using a fine-tuned language model. Llama 3 is fine-tuned here on a dataset of Hindi news headlines and articles. I'll go through model loading, data preparation, fine-tuning, and inference in this notebook.*

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/bbc_hindi_articles_with_categories_cleaned.csv


# Model Selection and setting up

## I will be using Unsloth to load Llama 3 for this project. Unsloth also provides an easier way to attach LoRA adapters, which makes fine-tuning much more efficient.

In [2]:
%%capture
!mamba install --force-reinstall aiohttp -y
!pip install -U "xformers<0.0.26" --index-url https://download.pytorch.org/whl/cu121
!pip install "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git"

!pip install datasets==2.16.0 fsspec==2023.10.0 gcsfs==2023.10.0

import os
os.environ["WANDB_DISABLED"] = "true"

In [3]:
from unsloth import FastLanguageModel
import torch
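max_seq_length = 2048 # Maximum number of tokens per training sequence
dtype = None # None lets Unsloth pick float16 or bfloat16 based on the GPU
load_in_4bit = True # Enables 4-bit quantization to reduce memory usage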


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit", 
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.0.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.2+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.25.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, 
    bias = "none",    
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,  
    loftq_config = None, 
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
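
The trainable-parameter count that the trainer reports later (41,943,040) can be reproduced by hand. Below is a minimal sketch, assuming the published Llama 3 8B shapes (hidden size 4096, MLP size 14336, 32 layers, and key/value projections of width 1024 because of grouped-query attention). These shapes are not printed anywhere in this notebook, so treat them as assumptions. Each LoRA adapter on a linear layer adds `r * (in_features + out_features)` weights.

```python
# Assumed Llama 3 8B shapes (not taken from this notebook's output).
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 4096, 1024, 14336, 32
RANK = 16  # same value as r in get_peft_model

# (in_features, out_features) for every module listed in target_modules.
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) to each adapted linear layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in module_shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} adapter weights per layer, {total:,} in total")
```

Every term is linear in the rank, so doubling `r` doubles the number of trainable weights.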


# **Data Preparation**
## *Here, we'll load the necessary data, which includes Hindi news articles and their corresponding headlines.*
### The data is then converted into a formatted prompt, which will be used as the training data for fine-tuning the Llama model

In [5]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['Headline'])):
        text = f"### Headline: {example['Headline'][i]}\n ### Category: {example['Category'][i]}  ### Article: {example['Content'][i]}"
        output_texts.append(text)
    return {"text": output_texts}

from datasets import load_dataset
dataset = load_dataset('csv',data_files="/kaggle/input/bbc_hindi_articles_with_categories_cleaned.csv", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
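
One thing the formatting function above does not do is append an end-of-sequence token, so the training text carries no explicit signal for where an article ends. A hedged variant is sketched below. `EOS_TOKEN` is hard-coded as an assumption so that the snippet runs on its own; in the notebook it would normally be read from `tokenizer.eos_token`. The name `format_with_eos` is hypothetical, and this variant was not used for the results shown in this notebook.

```python
EOS_TOKEN = "<|end_of_text|>"  # assumption; prefer tokenizer.eos_token in the notebook


def format_with_eos(batch, eos_token=EOS_TOKEN):
    """Build the same prompt as formatting_prompts_func, plus an end-of-sequence token."""
    texts = []
    for headline, category, content in zip(
        batch["Headline"], batch["Category"], batch["Content"]
    ):
        texts.append(
            f"### Headline: {headline}\n ### Category: {category}  ### Article: {content}{eos_token}"
        )
    return {"text": texts}


# Tiny made-up batch, only to show the resulting string.
sample = {
    "Headline": ["Sample headline"],
    "Category": ["india"],
    "Content": ["Sample article body."],
}
formatted = format_with_eos(sample)
print(formatted["text"][0])
```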


In [6]:
# Configuring the supervised fine-tuning (SFT) trainer
from trl import SFTTrainer
from transformers import TrainingArguments


# Set max_steps to cap training at a fixed number of optimizer steps; otherwise training runs for num_train_epochs (one full epoch here)
# Set packing = True to pack several short examples into one sequence, which helps for shorter generations like getting the headline from the article
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, 
    args = TrainingArguments(
        num_train_epochs=1,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # max_steps = -1,  # default; set a positive value to cap the number of steps
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)
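
The batch settings above determine how many optimizer steps one epoch takes. Here is a rough sketch of that arithmetic, assuming a single GPU and assuming the trainer floors the number of mini-batches by the accumulation factor. The result agrees with the 599 total steps in the training log, but the exact rounding is an assumption about the installed `transformers` version.

```python
import math

num_examples = 4796              # rows in the dataset
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1                     # assumption: single GPU
num_epochs = 1

# Examples that contribute to one optimizer update.
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Mini-batches per epoch, then optimizer updates per epoch.
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_gpus))
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = math.ceil(num_epochs * steps_per_epoch)

print(f"effective batch size = {effective_batch}, total steps = {total_steps}")
```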


In [7]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
5.613 GB of memory reserved.


In [8]:
trainer_stats = trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 4,796 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 599
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|------|---------------|
| 1 | 1.4404 |
| 2 | 1.3627 |
| 3 | 1.3448 |
| 4 | 1.2788 |
| 5 | 1.4095 |
| 6 | 1.2836 |
| 7 | 1.3559 |
| 8 | 1.209 |
| 9 | 1.1513 |
| 10 | 1.2131 |


In [9]:
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

24919.8977 seconds used for training.
415.33 minutes used for training.
Peak reserved memory = 9.641 GB.
Peak reserved memory for training = 4.028 GB.
Peak reserved memory % of max memory = 65.403 %.
Peak reserved memory for training % of max memory = 27.325 %.
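
Dividing the reported runtime by the step count from the training log gives a feel for throughput on the T4. The figures below are copied from the outputs above; nothing else is measured.

```python
train_runtime_s = 24919.8977  # from trainer_stats.metrics["train_runtime"]
total_steps = 599             # from the training log
num_examples = 4796           # from the training log

seconds_per_step = train_runtime_s / total_steps
examples_per_second = num_examples / train_runtime_s
hours = train_runtime_s / 3600

print(f"{seconds_per_step:.1f} s per step")
print(f"{examples_per_second:.3f} examples per second")
print(f"{hours:.2f} hours in total")
```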


# Some inferences
In this section, I'll input a headline and generate a corresponding news article. The model uses the input headline and category as a prompt and continues it with a coherent and contextually relevant article.

In [10]:
FastLanguageModel.for_inference(model)
inputs = tokenizer(
[
    "### Headline: भारतीय शेयर बाजार में तेजी\n ### Category: भारत  ### Article: "
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
tokenizer.batch_decode(outputs)

['<|begin_of_text|>### Headline: भारतीय शेयर बाजार में तेजी\n ### Category: भारत  ### Article:  भारतीय शेयर बाजार में आज तेजी देखने को मिली है. बॉम्बे स्टॉक एक्सचेंज का सेंसेक्स 1.9 प्रतिशत चढ़ गया है. वहीं नेशनल स्टॉक एक्सचेंज का निफ़्टी भी 1.8 प्रतिशत की बढ़त के साथ बंद हुआ है. बीते सप्ताह भारतीय शेयर बाजार में गिरावट देखने को मिली']
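
The decoded string contains the `<|begin_of_text|>` token and the prompt itself, followed by the generated text. Below is a small sketch of separating the article from the prompt with plain string handling. `extract_article` is a hypothetical helper, and it assumes the `### Article:` marker appears once, as in the prompt template used here.

```python
ARTICLE_MARKER = "### Article:"


def extract_article(decoded, bos_token="<|begin_of_text|>"):
    """Return only the text that follows the article marker."""
    text = decoded.replace(bos_token, "")
    _prompt, marker, article = text.partition(ARTICLE_MARKER)
    if not marker:
        raise ValueError("decoded text does not contain the article marker")
    return article.strip()


# Made-up decoded string in the same shape as the output above.
decoded = (
    "<|begin_of_text|>### Headline: Sample headline\n"
    " ### Category: india  ### Article:  Generated body text."
)
article = extract_article(decoded)
print(article)
```

In the notebook, the helper would be applied to each element returned by `tokenizer.batch_decode(outputs)`.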

In [11]:
FastLanguageModel.for_inference(model)
inputs = tokenizer(
[
    "### Headline: पीएम मोदी अफ्रीका दौरे पर गए\n ### Category: भारत  ### Article: "
], return_tensors = "pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,       
    top_p=0.9,            
    temperature=0.8,      
)
# Change the generate parameters to tweak the output to your specific needs
tokenizer.batch_decode(outputs)

['<|begin_of_text|>### Headline: पीएम मोदी अफ्रीका दौरे पर गए\n ### Category: भारत  ### Article:  प्रधानमंत्री नरेंद्र मोदी 12 जून से अफ्रीका दौरे पर गए हैं. वो दक्षिण अफ्रीका के पूर्वी शहर प्रिटोरिया से शुरू हुआ उनका दौरा चार देशों का है. पीएम मोदी इस दौरे में दक्षिण अफ्रीका, जांजबरी, गाबोन और रुवांडा का दौरा करेंगे. अफ्रीका दौरे के दौरान प्रधानमंत्री नरेंद्र मोदी के साथ विदेश मंत्री एस जयशंकर और वाणिज्य मंत्री पीयूष गोयल भी जाएंगे. यह पीएम मोदी का अफ्रीका दौरा 2015 के बाद पहली बार है. पीएम मोदी ने अपने दौरे के दौरान अफ्रीका की मूल भाषा जांजबरी में ट्वीट करते हुए लिखा है कि पूर']
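
To make the effect of `temperature` and `top_p` more concrete, here is a toy sketch of temperature scaling followed by nucleus (top-p) filtering over a four-token vocabulary. It is an illustration written for this notebook, not the implementation used inside `model.generate`, and the logits are made up.

```python
import math


def nucleus_probs(logits, temperature=0.8, top_p=0.9):
    """Softmax with temperature, then keep the smallest set of tokens whose mass reaches top_p."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Walk tokens from most to least likely until the cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the kept tokens; everything else gets zero probability.
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


toy_logits = [2.0, 1.0, 0.0, -3.0]  # made-up scores for four tokens
filtered = nucleus_probs(toy_logits, temperature=0.8, top_p=0.9)
print([round(p, 3) for p in filtered])
```

A lower temperature sharpens the distribution, and a lower `top_p` cuts off more of the unlikely tokens before sampling.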

## Conclusion
This notebook showcased the generation of Hindi news articles using a fine-tuned language model. Given only a headline and a category, the model produced fluent, on-topic Hindi text in the style of a news article, although the specifics it generates (dates, figures, names) are not checked against any source. Further improvements can be made by fine-tuning on a larger and more diverse dataset.

The same dataset can be used for many other tasks, some of which are listed below:
* Generating a headline for an article
* Classification of an article into different categories
* Classification of an article headline into different categories
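
The first task in the list above could reuse the same columns with the fields reordered, so that the article becomes the prompt and the headline becomes the completion. Here is a minimal sketch under that assumption. The template and the `format_headline_task` name are illustrative, and they were not used for the model trained in this notebook.

```python
def format_headline_task(batch):
    """Put the article first so the model learns to continue with a headline."""
    texts = []
    for content, headline in zip(batch["Content"], batch["Headline"]):
        texts.append(f"### Article: {content}\n### Headline: {headline}")
    return {"text": texts}


# Tiny made-up batch with the same column names as the dataset.
sample_batch = {
    "Headline": ["Sample headline"],
    "Content": ["Sample article body."],
}
headline_prompts = format_headline_task(sample_batch)
print(headline_prompts["text"][0])
```

At inference time the prompt would stop after `### Headline:` and the model would be asked to complete it, mirroring how the article prompts are built above.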