# **Project Name**    - IndustryGPT: Specialized LLM Bot Using Pre-Trained Models


##### **Project Type**    - Deep Learning for NLP
##### **Contribution**    - Individual
##### **Team Member 1 -** Rajesh Kumar Patel

## **GitHub Link -** https://github.com/Rajesh1505/Capstone_Project_Deep_Learning_for_NLP.git

## Overview

In this capstone project, we build an industry-specific Large Language Model (LLM) bot using pre-trained models from sources such as Hugging Face. The primary objective is a bot that engages users by answering questions and providing insights specific to a chosen industry. Beyond technical skills, the project also builds a deeper understanding of the chosen industry's nuances, challenges, and trends.

## Project Background

With the rapid advancement of artificial intelligence and machine learning, Large Language Models (LLMs) are changing how many industries automate and improve communication. These models, including GPT-3 and its successors, can understand and generate human-like text, which makes them well suited to building conversational agents.

This capstone project focuses on harnessing the power of pre-trained LLMs from platforms like Hugging Face to develop industry-specific bots. Students are given the freedom to choose from a diverse array of industries, each presenting unique challenges and opportunities for applying LLM technology. The selected industry will guide the data collection process, ensuring that the bot is trained on relevant and specific information to enhance its contextual understanding and response accuracy.

By engaging in this project, we learn to fine-tune pre-trained models and gain hands-on experience with real-world data. The project is designed to be manageable with limited computational resources. The primary objective is a bot that engages users effectively and gives coherent, contextually appropriate answers.

This project serves as a foundation for the Industry Immersion module, where students will extend their work into a comprehensive research paper, delving deeper into the chosen industry's specific applications and implications of LLM technology. This holistic approach ensures that students are well-prepared to tackle real-world challenges using cutting-edge AI technologies.




---


## Project Goal

The primary goal of this capstone project is to develop an industry-specific Large Language Model (LLM) bot using pre-trained models from platforms such as Hugging Face. The work involves selecting one industry from a provided list, gathering relevant data, fine-tuning a pre-trained LLM, and demonstrating that the bot gives accurate and contextually appropriate responses. This notebook targets finance: it fine-tunes `NousResearch/Llama-2-7b-chat-hf` with LoRA on finance instruction datasets.

In [None]:
# Install necessary libraries
%pip install huggingface_hub
%pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.8.3


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Import required libraries
import os
import torch
from datasets import load_dataset, DatasetDict, concatenate_datasets
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [None]:
# Clear GPU memory
torch.cuda.empty_cache()

In [None]:
# Load datasets
dataset1 = load_dataset("Rajesh1505/finance-alpaca-1k-test")
dataset2 = load_dataset("Rajesh1505/alpaca_finance_en")

Generating test split: 1000 examples

Generating train split: 68912 examples

In [None]:
# Preprocess datasets
def combine_text_columns(example):
    return {'text': f"{example['instruction']} ### {example['output']}"}

dataset1 = dataset1.map(combine_text_columns)
dataset2 = dataset2.map(combine_text_columns)

# Remove unused columns
dataset1['test'] = dataset1['test'].remove_columns(['instruction', 'input', 'output'])
dataset2['train'] = dataset2['train'].remove_columns(['instruction', 'input', 'output', 'id'])

# Split datasets (80% train / 20% test); a fixed seed keeps the split reproducible
split_dataset1 = dataset1['test'].train_test_split(test_size=0.2, seed=42)
split_dataset2 = dataset2['train'].train_test_split(test_size=0.2, seed=42)

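The preprocessing step joins only `instruction` and `output`; the `input` column is dropped before training. As a minimal sketch of one alternative, assuming `input` holds optional context in the Alpaca style (an assumption about these datasets, not something the notebook verifies), the hypothetical helper `combine_with_input` folds that context into the prompt while keeping the same ` ### ` delimiter:

```python
def combine_with_input(example):
    """Hypothetical variant of combine_text_columns that keeps the `input` field."""
    instruction = example["instruction"].strip()
    # `input` may be missing, None, or empty; treat all three as "no context".
    context = (example.get("input") or "").strip()
    prompt = f"{instruction}\n\n{context}" if context else instruction
    return {"text": f"{prompt} ### {example['output'].strip()}"}


sample = {
    "instruction": "Classify the asset.",
    "input": "10-year US Treasury note",
    "output": "Fixed income.",
}
print(combine_with_input(sample)["text"])
```

If the `input` column turns out to be empty for most records, this variant produces the same text as the original function apart from whitespace stripping.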

In [None]:
# Merge datasets
merged_train = concatenate_datasets([split_dataset1['train'], split_dataset2['train']])
merged_test = concatenate_datasets([split_dataset1['test'], split_dataset2['test']])

merged_dataset = DatasetDict({'train': merged_train, 'test': merged_test})


merged_train_dataset = merged_dataset['train'].shuffle(seed=42).select(range(5000))
merged_test_dataset = merged_dataset['test'].shuffle(seed=42).select(range(100))

In [None]:
# Convert to Llama format
def transform_conversation(example):
    # Text is stored as "instruction ### output"; split it back into segments
    segments = example['text'].split('###')
    reformatted_segments = []
    # Walk the segments in (prompt, answer) pairs; an unpaired trailing segment is skipped
    for i in range(0, len(segments) - 1, 2):
        prompt = segments[i].strip()
        answer = segments[i + 1].strip()
        # Wrap each pair in the Llama-2 chat template
        reformatted_segments.append(f'<s>[INST] {prompt} [/INST] {answer} </s>')
    return {'text': ''.join(reformatted_segments)}

transformed_train_dataset = merged_train_dataset.map(transform_conversation)
transformed_test_dataset = merged_test_dataset.map(transform_conversation)

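Because `transform_conversation` splits on every `###`, an answer that itself contains `###` (a Markdown heading, for instance) is cut into extra segments, and an odd trailing segment is dropped. A minimal sketch of a single-turn alternative, assuming each record holds exactly one instruction/answer pair separated by the first `###`; `format_single_turn` is a hypothetical name, not part of the notebook:

```python
def format_single_turn(text, delimiter="###"):
    """Format one 'prompt ### answer' record using the Llama-2 chat template."""
    # partition() splits on the first delimiter only, so any later '###'
    # stays inside the answer instead of starting a new turn.
    prompt, found, answer = text.partition(delimiter)
    if not found:
        # No delimiter present: return empty text, mirroring the notebook's behavior.
        return ""
    return f"<s>[INST] {prompt.strip()} [/INST] {answer.strip()} </s>"


record = "What is an ETF? ### A fund traded on an exchange. ### Example: an index ETF."
print(format_single_turn(record))
```

Whether this matters in practice depends on how often `###` appears inside the dataset's answers, which the notebook does not measure.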

In [None]:
# Llama-2 model
model_name = "NousResearch/Llama-2-7b-chat-hf"
new_model = "llama_2_7b_finance_finetune_model"

# Configure Quantization (4-bit)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
)
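
To see why 4-bit loading matters, a rough back-of-the-envelope estimate helps. The sketch below assumes the fp16 checkpoint is about 13.48 GB (two safetensors shards of 9.98 GB and 3.50 GB) and derives the parameter count from that. It ignores quantization constants, layers that stay unquantized, activations, and optimizer state, so the real footprint will be somewhat higher:

```python
# Shard sizes in GB for the fp16 checkpoint (download sizes of the two shards).
shard_sizes_gb = [9.98, 3.50]
checkpoint_gb = sum(shard_sizes_gb)

# fp16 stores 2 bytes per parameter, so the file size implies the parameter count.
approx_params_billion = checkpoint_gb / 2

# NF4 stores 4 bits (0.5 bytes) per quantized weight.
approx_4bit_gb = approx_params_billion * 0.5

print(f"fp16 weights : ~{checkpoint_gb:.2f} GB")
print(f"parameters   : ~{approx_params_billion:.2f} B")
print(f"4-bit weights: ~{approx_4bit_gb:.2f} GB")
```

Under these assumptions the weights shrink to roughly a quarter of their fp16 size, which is what lets a 7B model fit on a single low-VRAM GPU.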

In [None]:
# Load model with CPU Offloading
device_map = "auto"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    offload_folder="/kaggle/working"  # Offload to disk to prevent OOM
)
model.config.use_cache = False
model.config.pretraining_tp = 1




In [None]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# LoRA Configuration (Memory Efficient Fine-Tuning)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

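The LoRA settings determine how many weights are actually trained. A minimal sketch of the arithmetic, assuming PEFT's default target modules for Llama (`q_proj` and `v_proj`) and the usual Llama-2-7B shape of 32 decoder layers with hidden size 4096. Both are assumptions, since the notebook neither sets `target_modules` nor prints the trainable parameter count:

```python
r = 64            # LoRA rank from peft_config
lora_alpha = 16   # LoRA alpha from peft_config

# Assumptions (not stated in the notebook):
hidden_size = 4096      # Llama-2-7B hidden dimension
num_layers = 32         # Llama-2-7B decoder layers
targets_per_layer = 2   # q_proj and v_proj

# Each adapted (hidden x hidden) projection gains two small matrices:
# A with shape (r x hidden) and B with shape (hidden x r).
params_per_module = r * (hidden_size + hidden_size)
trainable_params = params_per_module * targets_per_layer * num_layers

# The low-rank update is multiplied by lora_alpha / r before being added.
scaling = lora_alpha / r

print(trainable_params)  # 33554432
print(scaling)           # 0.25
```

Under these assumptions about 33.6 million adapter weights are trained, a small fraction of the base model, and the update is scaled by 0.25.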

In [None]:
# Training Arguments (Optimized for Low VRAM)
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="tensorboard",
)
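
The `warmup_ratio` and `lr_scheduler_type="cosine"` settings together define the learning-rate curve. The sketch below reproduces the usual linear-warmup-then-cosine-decay shape in plain Python, assuming 625 optimizer steps (5000 examples at an effective batch size of 8) and warmup steps computed as `ceil(warmup_ratio * total_steps)`. It illustrates the shape of the schedule and is not the library's own code:

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 625  # assumption: 5000 examples / (batch size 1 * 8 accumulation steps)
for step in (0, 10, 19, 322, 625):
    print(step, f"{cosine_lr_with_warmup(step, total_steps, 2e-4, 0.03):.2e}")
```

With these values the warmup lasts 19 steps, the rate peaks at `2e-4`, and it decays to zero by the final step.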

In [None]:
# Fine-Tuning Trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=transformed_train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=350,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)

# Train Model
trainer.train()


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 25   | 2.3879 |
| 50   | 1.8867 |
| 75   | 1.8056 |
| 100  | 1.5309 |
| 125  | 1.7972 |
| 150  | 1.5081 |
| 175  | 1.7386 |
| 200  | 1.4429 |
| 225  | 1.6719 |
| 250  | 1.4448 |


TrainOutput(global_step=625, training_loss=1.6266033935546875, metrics={'train_runtime': 4265.6466, 'train_samples_per_second': 1.172, 'train_steps_per_second': 0.147, 'total_flos': 1.258110305550336e+16, 'train_loss': 1.6266033935546875, 'epoch': 1.0})
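
The numbers in `TrainOutput` can be cross-checked against the training configuration. A quick sketch, assuming a single training process so that the effective batch size is `per_device_train_batch_size * gradient_accumulation_steps`:

```python
num_examples = 5000        # size of the shuffled training subset
per_device_batch = 1       # per_device_train_batch_size
grad_accum = 8             # gradient_accumulation_steps
train_runtime = 4265.6466  # seconds, from TrainOutput

effective_batch = per_device_batch * grad_accum
optimizer_steps = num_examples // effective_batch

print(effective_batch)                             # 8
print(optimizer_steps)                             # 625
print(round(num_examples / train_runtime, 3))      # 1.172
print(round(optimizer_steps / train_runtime, 3))   # 0.147
print(round(train_runtime / 60, 1))                # 71.1
```

These values agree with the reported `global_step=625`, `train_samples_per_second` of 1.172, and `train_steps_per_second` of 0.147, and show that the single epoch took about 71 minutes.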

In [None]:
# Save Model
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

print("Model saved successfully!")

Model saved successfully!


In [None]:
# Test Model with Text Generation
logging.set_verbosity(logging.CRITICAL)
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# prompt = "Analyze potential financial risks based on current market conditions."
prompt = "Summary of investment portfolio optimization."
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])



<s>[INST] Summary of investment portfolio optimization. [/INST] Investment portfolio optimization is the process of selecting the optimal mix of assets to achieve a desired investment outcome. It involves analyzing the risk and return of different assets, and selecting the optimal mix of assets to achieve the desired investment outcome. The optimal mix of assets is determined by the investor's risk tolerance, investment horizon, and investment goals. The optimal mix of assets is also influenced by the investor's tax situation, investment costs, and other factors. The optimal mix of assets is typically a combination of low-risk assets, such as bonds, and high-risk assets, such as stocks. The optimal mix of assets is also influenced by the investor's investment horizon, which is the time frame in which the investor plans to achieve their investment goals. The optimal mix of assets is also influenced by the investor


In [None]:
# Push Model to Hugging Face Hub
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

# Retrieve Hugging Face token
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("hf_api_key")

# Login to Hugging Face
login(token=hf_token)

# Push model and tokenizer
model_repo_name = "Rajesh1505/llama_2_7b_finance_finetune_model"
model.push_to_hub(model_repo_name)
tokenizer.push_to_hub(model_repo_name)

# Free GPU Memory
torch.cuda.empty_cache()
print("llama_2_7b_finance_finetune_model pushed to Hugging Face Hub.")

llama_2_7b_finance_finetune_model pushed to Hugging Face Hub.


# Deployment

In [None]:
!pip -q install gradio

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behavi

In [None]:
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the fine-tuned model from Hugging Face
model_name = "Rajesh1505/llama_2_7b_finance_finetune_model"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer and model
tokenizer_ = AutoTokenizer.from_pretrained(model_name)
model_ = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Load pipeline for text generation
pipe = pipeline("text-generation", model=model_, tokenizer=tokenizer_, max_length=200)

def chatbot_response(prompt):
    formatted_prompt = f"<s>[INST] {prompt} [/INST]"
    response = pipe(formatted_prompt)[0]['generated_text']
    return response.split("[/INST]")[-1].strip()

# Gradio UI
app_gr = gr.Interface(
    fn=chatbot_response,
    inputs=gr.Textbox(lines=3, placeholder="Ask a finance-related question..."),
    outputs=gr.Textbox(label="Finance Chatbot Response"),
    title="Finance LLM Chatbot",
    description="Ask finance-related questions and get AI-powered responses!"
)


app_gr.launch(debug=True)


* Running on local URL:  http://127.0.0.1:7860
Kaggle notebooks require sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

* Running on public URL: https://1d58820c1fb2e40130.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://1d58820c1fb2e40130.gradio.live
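
Two details of `chatbot_response` are worth illustrating: the pipeline returns the prompt together with the completion, and `max_length=200` can cut an answer off mid-sentence, as happened in the earlier test generation. The sketch below is a plain-Python post-processing helper. `extract_answer` and `trim_to_last_sentence` are hypothetical names, the sample string is an abbreviated illustration modeled on the test output, and trimming at the last sentence-ending punctuation mark is only a heuristic:

```python
def extract_answer(generated_text, marker="[/INST]"):
    """Return the text after the last instruction marker."""
    return generated_text.split(marker)[-1].strip()


def trim_to_last_sentence(text):
    """Drop a trailing fragment that does not end with sentence punctuation."""
    last_end = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    # If no sentence boundary exists, return the text unchanged.
    return text if last_end == -1 else text[: last_end + 1]


# Abbreviated, illustrative stand-in for a truncated generation.
raw = (
    "<s>[INST] Summary of investment portfolio optimization. [/INST] "
    "Investment portfolio optimization is the process of selecting the optimal "
    "mix of assets. The optimal mix of assets is also influenced by the investor"
)
print(trim_to_last_sentence(extract_answer(raw)))
```

The heuristic can misfire on abbreviations or decimal numbers, so it is best treated as a cosmetic cleanup for the chat UI and not as a substitute for a larger generation budget.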


