## **✅ Import Dependencies**

In [25]:
import pandas as pd, os, json
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi, HfFolder, login
from sklearn.model_selection import train_test_split

## **✅ Step 1: Load your dataset**

In [26]:
# data = pd.read_csv(os.path.join("..", "..", "FAQS", "BankFAQS.csv"))  # Local alternative: replace with the actual path

# Adjust this path to wherever BankFAQs.csv is stored
data = pd.read_csv("/content/FAQs/BankFAQs.csv")
# The 'Class' column is not needed for a Question/Answer dataset
data.drop(columns=["Class"], inplace=True)

# Drop rows with missing values in 'Question' or 'Answer'
data = data.dropna(subset=["Question", "Answer"])

data.head()

| | Question | Answer |
|---|---|---|
| 0 | Do I need to enter ‘#’ after keying in my Card... | Please listen to the recorded message and foll... |
| 1 | What details are required when I want to perfo... | To perform a secure IVR transaction, you will ... |
| 2 | How should I get the IVR Password if I hold a... | An IVR password can be requested only from the... |
| 3 | How do I register my Mobile number for IVR Pas... | Please call our Customer Service Centre and en... |
| 4 | How can I obtain an IVR Password | By Sending SMS request: Send an SMS 'PWD<space... |
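The cleaning in Step 1 can be checked on a tiny in-memory frame before running it on the real file. This is a minimal sketch, assuming the CSV has the `Question`, `Answer` and `Class` columns implied by the cell above; the placeholder strings are hypothetical, not rows from BankFAQs.csv.

```python
import pandas as pd

# Hypothetical toy data with the same column names as BankFAQs.csv
data = pd.DataFrame({
    "Question": ["Q1", None, "Q3", "Q4"],
    "Answer": ["A1", "A2", None, "A4"],
    "Class": ["c1", "c1", "c2", "c2"],
})

data = data.drop(columns=["Class"])                # keep only Question/Answer
data = data.dropna(subset=["Question", "Answer"])  # drop rows missing either field

print(data.shape)       # (2, 2)
print(list(data.index)) # [0, 3]
```

Note that `dropna` keeps the original row labels, so the index now has gaps (0, 3) instead of running 0..n-1.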


## **✅ Step 2: Split the dataset into train and test sets (85%/15%)**

In [27]:
train_df, test_df = train_test_split(data, test_size=0.15, random_state=42)

In [28]:
print(f"✅ Train: {len(train_df)}, Test: {len(test_df)}")

✅ Train: 1499, Test: 265
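The printed sizes follow from how scikit-learn turns a float `test_size` into a row count: the test count is rounded up and the train split gets the remainder. This sketch assumes 1,764 rows (1499 + 265, as implied by the output above) and uses placeholder data, not the real FAQs.

```python
import math
import pandas as pd
from sklearn.model_selection import train_test_split

n_rows = 1764  # 1499 + 265, the row count implied by the output above
df = pd.DataFrame({
    "Question": [f"Q{i}" for i in range(n_rows)],
    "Answer": [f"A{i}" for i in range(n_rows)],
})

train_df, test_df = train_test_split(df, test_size=0.15, random_state=42)

# With a float test_size, the test count is ceil(test_size * n_rows)
expected_test = math.ceil(0.15 * n_rows)  # ceil(264.6) = 265
expected_train = n_rows - expected_test   # 1499

print(len(train_df), len(test_df))  # 1499 265
```

Because `train_test_split` shuffles by default, both splits also come back with a shuffled (non-sequential) pandas index.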


## **✅ Step 3: Convert pandas DataFrames to Hugging Face Datasets**

In [29]:
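# preserve_index=False keeps the shuffled pandas index from being stored as an extra "__index_level_0__" column
train_dataset = Dataset.from_pandas(train_df, preserve_index=False)
test_dataset = Dataset.from_pandas(test_df, preserve_index=False)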

## **✅ Step 4: Create a DatasetDict to hold train and test splits**

In [30]:
faq_dict = DatasetDict({
    "train": train_dataset,
    "test": test_dataset
})

## **✅ Step 5: Log in to Hugging Face Hub (requires an access token with write permission)**

In [31]:
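# HfFolder.get_token() returns a previously saved token (e.g. from `huggingface-cli login`) or None;
# with token=None, login() prompts for a token interactively
login(token=HfFolder.get_token())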
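To avoid pasting a token into the notebook, the token can be read from an environment variable such as `HF_TOKEN`, which `huggingface_hub` also recognizes. The helper `resolve_hf_token` below is hypothetical and only sketches that lookup; it returns `None` when nothing is set, so `login()` falls back to its interactive prompt.

```python
import os
from typing import Optional


def resolve_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Hypothetical helper: return a token from an environment variable, or None."""
    token = os.environ.get(env_var, "").strip()
    return token or None  # empty or missing -> None
```

Usage in the cell above would be `login(token=resolve_hf_token())`.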

## **✅ Step 6: Push the dataset to Hugging Face Hub**

In [32]:
# Creates the dataset repository if needed, then uploads the train and test splits
faq_dict.push_to_hub("Muhammad-Umer-Khan/FAQ_Dataset")

Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading metadata:   0%|          | 0.00/617 [00:00<?, ?B/s]