## **✅ Import Dependencies**

In [6]:
import pandas as pd, os, json
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi, HfFolder, login

## **✅ Step 1: Log in to Hugging Face Hub**

In [7]:
# Reuse the token already saved on this machine (e.g. by `huggingface-cli login`)
login(token=HfFolder.get_token())

## **✅ Step 2: Load the dataset from Hugging Face Hub**

In [8]:
# Log in first (e.g. with `huggingface-cli login`) to access this dataset
splits = {
    'train': 'data/train-00000-of-00001-eebf5ec5fd44849c.parquet',
    'test': 'data/test-00000-of-00001-d89e2792197b2513.parquet',
}
train_df = pd.read_parquet("hf://datasets/Muhammad-Umer-Khan/FAQ_Dataset/" + splits["train"])
test_df = pd.read_parquet("hf://datasets/Muhammad-Umer-Khan/FAQ_Dataset/" + splits["test"])

In [9]:
print(f"✅ Train: {len(train_df)}, Test: {len(test_df)}")

✅ Train: 1499, Test: 265


## **✅ Step 3: Define a function to format each row for Llama prompts**

In [10]:
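def format_to_llama_chat(example):
    """Turn one Question/Answer row into a two-turn chat record."""
    return {
        "messages": [
            # The question becomes the user turn, the answer the assistant turn
            {"role": "user", "content": example["Question"].strip()},
            {"role": "assistant", "content": example["Answer"].strip()}
        ]
    }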
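
As a quick sanity check, the function can be tried on a single hand-written row before it is applied to the full DataFrames. The sketch below repeats the function so that it runs on its own; `sample_row` is a made-up example, not a row from the dataset.

```python
def format_to_llama_chat(example):
    """Turn one Question/Answer row into a two-turn chat record."""
    return {
        "messages": [
            {"role": "user", "content": example["Question"].strip()},
            {"role": "assistant", "content": example["Answer"].strip()},
        ]
    }

# Hypothetical row (not from the dataset). A plain dict works here because the
# function only does key lookups, the same way it reads a pandas row.
sample_row = {
    "Question": "  What is a premium?  ",
    "Answer": "The amount paid for a policy.\n",
}

record = format_to_llama_chat(sample_row)
print(record["messages"][0])
print(record["messages"][1])
```

Note that `.strip()` assumes both columns hold strings; a missing value (`NaN`) in either column would raise an `AttributeError`, so rows with gaps would need to be dropped or filled first.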

## **✅ Step 4: Apply formatting function to both train and test DataFrames**

In [11]:
train_formatted = train_df.apply(format_to_llama_chat, axis=1).tolist()
test_formatted = test_df.apply(format_to_llama_chat, axis=1).tolist()
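
Before converting the lists, it can be worth confirming that every record really is a non-empty user/assistant pair. The helper below is a minimal sketch of such a check; `find_invalid_records` and the `demo` records are hypothetical names and data introduced only for illustration.

```python
def find_invalid_records(records):
    """Return the indices of records that are not a non-empty user/assistant pair."""
    expected_roles = ["user", "assistant"]
    bad = []
    for i, record in enumerate(records):
        messages = record.get("messages", [])
        roles = [m.get("role") for m in messages]
        # Every message must carry a string with at least one non-space character
        has_text = all(
            isinstance(m.get("content"), str) and m["content"].strip()
            for m in messages
        )
        if roles != expected_roles or not has_text:
            bad.append(i)
    return bad

# Made-up records: one valid, one with a blank answer, one missing the answer turn
demo = [
    {"messages": [{"role": "user", "content": "Q1"}, {"role": "assistant", "content": "A1"}]},
    {"messages": [{"role": "user", "content": "Q2"}, {"role": "assistant", "content": "   "}]},
    {"messages": [{"role": "user", "content": "Q3"}]},
]

print(find_invalid_records(demo))  # → [1, 2]
```

In this notebook the same helper could be called as `find_invalid_records(train_formatted)` and `find_invalid_records(test_formatted)`; an empty list would mean every row passed the check.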

## **✅ Step 5: Convert formatted lists to Hugging Face Datasets**

In [12]:
train_dataset = Dataset.from_list(train_formatted)
test_dataset = Dataset.from_list(test_formatted)

## **✅ Step 6: Create a DatasetDict to hold train and test splits**

In [13]:
dataset_dict = DatasetDict({
    "train": train_dataset,
    "test": test_dataset
})

## **✅ Step 7: Preview the first row from each split**

In [14]:
print("Train dataset first row:", train_dataset[0])
print("Test dataset first row:", test_dataset[0])

Train dataset first row: {'messages': [{'content': 'Can I take a loan under HDFC Life Uday in case I need money during any emergencies', 'role': 'user'}, {'content': 'Yes. You can take a policy loan under this policy provided that your policy has acquired a surrender value and subject to terms and conditions that the company may specify from time to time.', 'role': 'assistant'}]}
Test dataset first row: {'messages': [{'content': 'Can I reinstate the policy if it is lapsed', 'role': 'user'}, {'content': 'If your policy is lapsed, you may request HDFC Life in writing to revive your policy within 2 consecutive years from the date of first unpaid premium. The following conditions will apply in case of revival of the policy: All pending premium should be immediately paid along with any interest that is advised by HDFC Life. The current interest rate used for revival is 10.5% p.a. Any agreement to revive or reinstate would be subject to satisfactory evidence of good health Reinstatement requ
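
To show why the `messages` layout is convenient, the sketch below hand-renders one record in the Llama 3 Instruct prompt layout (special header and end-of-turn tokens around each message). It is an illustration only: `render_llama3_prompt` and the sample record are hypothetical, and in practice the model tokenizer's own chat template should be treated as the source of truth.

```python
def render_llama3_prompt(messages):
    """Approximate the Llama 3 Instruct prompt layout for a list of chat messages."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content'].strip()}<|eot_id|>"
        )
    return "".join(parts)

# Made-up record in the same shape as the rows previewed above
sample_messages = [
    {"role": "user", "content": "Can I take a policy loan?"},
    {"role": "assistant", "content": "Yes, once the policy has a surrender value."},
]

prompt = render_llama3_prompt(sample_messages)
print(prompt)
```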

## **✅ Step 8: Push the formatted dataset to Hugging Face Hub**

In [15]:
dataset_dict.push_to_hub("Muhammad-Umer-Khan/FAQs-Meta-Llama-3-8B-Instruct")

Repo card metadata block was not found. Setting CardData to empty.


- **Check out the dataset here: [Muhammad-Umer-Khan/FAQs-Meta-Llama-3-8B-Instruct](https://huggingface.co/datasets/Muhammad-Umer-Khan/FAQs-Meta-Llama-3-8B-Instruct)**