# FAQ ChatBot model prototype notebook

## 1. Project Overview

This notebook uses asymmetric semantic search to retrieve the three closest answers from the ReactJS FAQ dataset for a user's question. Questions and answers are mapped into the same vector space using a pre-trained SBERT model, and the three nearest answers to a new question are retrieved by similarity search. The dataset is split 90/10: 90% of the question-answer pairs serve as the reference corpus, and the questions from the remaining 10% are held out to evaluate how well the system retrieves relevant answers.
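The retrieval step described above can be sketched with plain NumPy. This is a minimal illustration only: the 2-D vectors are made-up stand-ins for SBERT embeddings, and `top_k_cosine` is a hypothetical helper name, not a function from the notebook or from any library. Cosine similarity is assumed as the similarity measure.

```python
import numpy as np

def top_k_cosine(query_vec, corpus_vecs, k=3):
    """Return (indices, scores) of the k corpus rows most similar to query_vec."""
    # Normalise rows and query so that a dot product equals cosine similarity
    corpus_norm = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = corpus_norm @ query_norm
    # Sort by descending similarity and keep the k best
    top_idx = np.argsort(-scores)[:k]
    return top_idx, scores[top_idx]

# Toy 2-D "embeddings" standing in for encoded answers (made-up values)
toy_answer_vecs = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
])
# Toy "embedding" standing in for an encoded user question
toy_question_vec = np.array([1.0, 0.2])

top_idx, top_scores = top_k_cosine(toy_question_vec, toy_answer_vecs, k=3)
print(top_idx.tolist())  # → [0, 2, 1]
```

In the real pipeline the returned indices would be used to look up the corresponding answer texts in the reference corpus.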

## 2. Dataset

The ReactJS FAQ dataset is used. It consists of approximately 26,300 question-answer pairs in JSON format.

In [25]:
import pandas as pd
import numpy as np

# Import display module from IPython to render pandas DataFrames in VS Code notebooks
# from IPython.display import display # uncomment if needed to display multiple objects in one cell

In [26]:
# Load dataset from Kaggle

import kagglehub
from kagglehub import KaggleDatasetAdapter

# JSON file inside the ReactJS FAQ dataset folder
file_path = "React_dataset.json"

# Load the latest version
df = kagglehub.dataset_load(
  KaggleDatasetAdapter.PANDAS,
  "savanidhruv/reactjs-faq-dataset",
  file_path,
)

print("First 5 records:")
df.head()

First 5 records:


Unnamed: 0,question,answer,detailedanswer,detailed_answer,argument
0,What is React?,"React is an open-source JavaScript library for building user interfaces, primarily for single-page applications, using a component-based architecture.",,,
1,What is a Date Picker in React?,A Date Picker in React is a UI component that allows users to select a date or date range from a calendar pop-up or dropdown.,,,
2,what are feature of React?,"1) Component-Based Architecture, 2) Virtual DOM, 3) Declarative UI, 4) JSX (JavaScript XML), 5) Unidirectional Data Flow, 6) Hooks API, 7) Ecosystem Support & Routing, 8) Fast Rendering & Performance, 9)Developer Tools, 10)Community and Open Source",,,
3,Which libraries are commonly used for date pickers in React?,"Popular libraries include react-datepicker, react-date-range, react-dates (Airbnb), MUI DatePicker, and Ant Design DatePicker.",,,
4,How to install react-datepicker?,Run `npm install react-datepicker` and also install `date-fns` if needed.,,,


## 3. Preprocessing



In [28]:
# Remove other columns from the dataframe
df = df[["question", "answer"]]

# Drop rows where either 'question' or 'answer' is null
df = df.dropna(subset=["question", "answer"]).reset_index(drop=True)

# Display the full answers
pd.set_option("display.max_colwidth", None)

# Shuffle dataset (so that similar Q and A pairs are not grouped)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Show the first 5 rows of the table (df.head() defaults to 5)
print("First 5 Q and A pairs:")
df.head()

First 5 Q and A pairs:


Unnamed: 0,question,answer
0,What is the difference between Fluent UI React v8 and v9?,"Fluent UI v8 uses the older Fabric architecture, and styling is handled with Sass/SCSS or inline styles. Fluent UI v9 is a complete redesign that uses modern patterns like hooks, CSS-in-JS, modular tokens, and Fluent Design System. v9 is more performance-focused, accessible, and ready for future Microsoft experiences. For new projects, v9 is highly recommended."
1,How do you use the Geolocation plugin in a React app?,"Leveraging the Geolocation plugin in a React application allows you to retrieve the user's device location, specifically their latitude, longitude, and accuracy. The primary way to integrate this is by importing `Geolocation` from `@capacitor/geolocation`.\n\nTypically, you'll use React's `useState` hook to manage the retrieved position data (e.g., `latitude`, `longitude`, `accuracy`) and any error messages that might occur. The `useEffect` hook is an ideal place to initiate the location retrieval process when your component mounts. Inside `useEffect`, you'll define an asynchronous function, say `getCurrentPosition`, which will call `await Geolocation.getCurrentPosition()`. This method attempts to get the device's current location. It resolves with a `coordinates` object containing the `latitude`, `longitude`, and `accuracy` (in meters).\n\nUpon successful retrieval, you update your `position` state with these values. It's vital to wrap this call in a `try-catch` block to gracefully handle potential errors, such as location services being disabled, permissions not being granted by the user, or the device failing to acquire a fix. In the `catch` block, you can update an `error` state to display a user-friendly message, prompting them to check their device's location settings and app permissions.\n\nFor scenarios where you need continuous location updates (e.g., for navigation or tracking), the Geolocation plugin also offers `Geolocation.watchPosition()`. This method takes a callback function that is invoked every time the device's position changes. 
It returns a `watchId` that you can then use with `Geolocation.clearWatch({ id: watchId })` in the `useEffect` cleanup function to stop watching for updates when the component unmounts, preventing memory leaks.\n\nIn your component's render method, you would conditionally display the fetched latitude, longitude, and accuracy if `position` is available, or a loading message if it's still being fetched. Always remind users that location services must be enabled and permissions granted in their native app settings for the feature to function."
2,How do you use Radix UI with React Query?,"Use React Query to fetch data that drives dynamic Radix UI components (like Dropdowns or Tabs):\n\nconst { data } = useQuery(['menuItems'], fetchMenuItems);\n\n<DropdownMenu.Root>\n {data.map(item => (\n <DropdownMenu.Item key={item.id}>{item.name}</DropdownMenu.Item>\n ))}\n</DropdownMenu.Root>"
3,How do you resolve User is not authorized to perform error?,"This authorization error means the AWS IAM user or role configured for your Amplify project (or the authenticated user in your app) lacks the necessary permissions to perform a specific action on an AWS resource. To resolve: 1. **Check IAM Policy:** Go to the AWS IAM console and inspect the policy attached to the user or role being used by Amplify (or your authenticated user's Cognito Identity Pool roles). 2. **Add Permissions:** Add the specific permissions required for the failing action (e.g., `s3:PutObject`, `dynamodb:CreateTable`, `lambda:InvokeFunction`). 3. **Amplify CLI:** If it's a backend operation, ensure your Amplify CLI user has admin privileges. 4. **Cognito Identity Pool Roles:** For authenticated users, verify the IAM roles associated with your Cognito Identity Pool (authenticated and unauthenticated roles) have the correct permissions."
4,How do you test React apps with Tailwind CSS using Cypress?,"Tailwind CSS doesn’t impact Cypress testing directly since it generates utility-based classes. However, avoid relying on class names like `bg-blue-500` for selectors—use `data-testid` or `data-cy` instead. You can use `cy.get('[data-cy=button]')` to assert behavior, visibility, or layout. Tailwind's dynamic styles can be visually validated using `cy.screenshot()` or visual diffing tools integrated with Cypress."


Next, we split the data 90/10 into a training corpus and a test set. Because the SBERT model is pre-trained, the model itself is not trained here. Instead, the training corpus acts as the reference corpus for which embeddings are generated, and the test set is used to check whether the system retrieves correct answers for questions it has not seen.

In [36]:
# Calculate the size of the training corpus (90% of the dataset)
train_size = int(len(df) * 0.9)

# Separate dataset into training corpus and testing set
train_df = df.iloc[:train_size].reset_index(drop=True)
test_df  = df.iloc[train_size:].reset_index(drop=True)

# Separate questions and answers
train_questions = train_df["question"].tolist()
train_answers   = train_df["answer"].tolist()

test_questions = test_df["question"].tolist()
test_answers   = test_df["answer"].tolist()

print(f"Train size: {len(train_questions)}, Test size: {len(test_questions)}")


Train size: 23653, Test size: 2629
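One way to turn the evaluation idea above into a number is a hit rate at k: the fraction of test questions for which at least one of the top k retrieved answers counts as correct. The sketch below is an assumption, not the notebook's method. `hit_rate_at_k` is a hypothetical helper, the ids are made-up, and how a retrieved answer is judged correct is left open and passed in as a set of relevant ids per question.

```python
def hit_rate_at_k(retrieved, relevant, k=3):
    """Fraction of queries with at least one relevant id among the top k results.

    retrieved: list of ranked id lists, one per query (best match first)
    relevant:  list of sets of ids judged correct, one per query
    """
    if not retrieved:
        return 0.0
    hits = 0
    for ranked_ids, relevant_ids in zip(retrieved, relevant):
        # A query counts as a hit if any of its top-k ids is relevant
        if any(item in relevant_ids for item in ranked_ids[:k]):
            hits += 1
    return hits / len(retrieved)

# Made-up rankings and relevance judgements for four test questions
toy_retrieved = [[4, 1, 7], [2, 9, 3], [5, 0, 8], [6, 3, 1]]
toy_relevant = [{1}, {0}, {5, 8}, {1}]

print(hit_rate_at_k(toy_retrieved, toy_relevant, k=3))  # → 0.75
print(hit_rate_at_k(toy_retrieved, toy_relevant, k=1))  # → 0.25
```

Reporting the metric at both k=1 and k=3 shows how much the top-3 retrieval relies on the second and third results.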


## 4. Embedding Generation

In [4]:
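# Requires PyTorch and sentence-transformers in the notebook environment,
# e.g. `pip install torch sentence-transformers`
import torch

# from sentence_transformers import SentenceTransformer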

ModuleNotFoundError: No module named 'torch'

## 5. Similarity Measure

## 6. Testing

## 7. Evaluation