# FAQ ChatBot model prototype notebook

## 1. Project Overview

The aim of this notebook is to use asymmetric semantic search, in which short questions are matched against longer answer passages, to retrieve the 3 closest answers from the ReactJS FAQ dataset for a user's question. Questions and answers are mapped into the same vector space using a pre-trained SBERT model, so that for a new question the 3 nearest answers can be retrieved by similarity search. The dataset is split, with 90% used as the reference corpus and the remaining 10% of questions reserved to evaluate how well the system retrieves relevant answers.

https://www.sbert.net/examples/sentence_transformer/applications/semantic-search/README.html

## 2. Dataset

The notebook uses the ReactJS FAQ dataset, which consists of approximately 26,300 question-answer pairs in JSON format.

https://www.kaggle.com/datasets/savanidhruv/reactjs-faq-dataset

In [13]:
import pandas as pd

# Import display module from IPython to render pandas DataFrames in VS Code notebooks
# from IPython.display import display # uncomment if needed to display multiple objects in one cell

In [14]:
# Load the dataset from Kaggle

import kagglehub
from kagglehub import KaggleDatasetAdapter

# JSON file inside the ReactJS FAQ dataset folder
file_path = "React_dataset.json"

# Load the latest version
df = kagglehub.dataset_load(
  KaggleDatasetAdapter.PANDAS,
  "savanidhruv/reactjs-faq-dataset",
  file_path,
)

print("First 5 records:")
df.head()

First 5 records:


,question,answer,detailedanswer,detailed_answer,argument
0,What is React?,"React is an open-source JavaScript library for building user interfaces, primarily for single-page applications, using a component-based architecture.",,,
1,What is a Date Picker in React?,A Date Picker in React is a UI component that allows users to select a date or date range from a calendar pop-up or dropdown.,,,
2,what are feature of React?,"1) Component-Based Architecture, 2) Virtual DOM, 3) Declarative UI, 4) JSX (JavaScript XML), 5) Unidirectional Data Flow, 6) Hooks API, 7) Ecosystem Support & Routing, 8) Fast Rendering & Performance, 9)Developer Tools, 10)Community and Open Source",,,
3,Which libraries are commonly used for date pickers in React?,"Popular libraries include react-datepicker, react-date-range, react-dates (Airbnb), MUI DatePicker, and Ant Design DatePicker.",,,
4,How to install react-datepicker?,Run `npm install react-datepicker` and also install `date-fns` if needed.,,,


## 3. Preprocessing



In [15]:
# Keep only the question and answer columns
df = df[["question", "answer"]]

# Drop rows where either 'question' or 'answer' is null
df = df.dropna(subset=["question", "answer"]).reset_index(drop=True)

# Display the full answers
pd.set_option("display.max_colwidth", None)

# Shuffle dataset (so that similar Q and A pairs are not grouped)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Show the first 5 rows of the table
print("First 5 Q and A pairs:")
df.head()

First 5 Q and A pairs:


,question,answer
0,How do you debug “No response from server” in urql?,"""No response from server"" in urql typically means the HTTP request was sent but no response was received within a reasonable timeframe, or the connection was abruptly closed. This often points to a network issue, a server that's down, or a firewall blocking the connection. To debug:\n1. **Check network connection:** Ensure your device has internet access.\n2. **Verify server status:** Is your GraphQL server running and accessible?\n3. **Correct endpoint URL:** Double-check the `url` in your `urql` client configuration.\n4. **Firewall/Proxy:** See if a firewall or proxy is blocking the request.\n5. **Browser DevTools:** Open the Network tab in your browser's developer tools. Look for the pending request; it might eventually show a timeout or connection refused error.\n6. **Server Logs:** Check your GraphQL server logs for any incoming request attempts or errors."
1,How to enforce token usage via linting rules?,"To enforce token usage, create custom Stylelint rules or extend your ESLint config with plugins that analyze your .css.ts files. These rules should check whether hardcoded values (like colors or spacing) match predefined token values. You can use Abstract Syntax Tree (AST) parsing with ESLint or integrate with tools like `stylelint-declaration-strict-value` to disallow raw values and ensure tokens are consistently used."
2,How to integrate store with Three.js for UI sync?,"1. Three.js render loop is external, so sync Zustand state manually:\n```ts\nstore.subscribe(state => {\n mesh.position.x = state.x;\n});\n```\n2. Use `useFrame()` in `@react-three/fiber` to pull values from store:\n```ts\nuseFrame(() => {\n meshRef.current.position.x = useStore.getState().x;\n});\n```\n3. Avoid using `useStore()` inside `useFrame()` directly for performance."
3,How do you test WebSockets in Playwright?,Intercept and monitor WebSocket messages using the `page.on('websocket')` event or test the UI reflecting live updates:\n\n```ts\nawait page.goto('/chat');\nawait expect(page.locator('text=User joined')).toBeVisible();\n```
4,How to test styling logic using Jest + React Testing Library?,"You can use `@testing-library/react` along with Stitches' `toString()` method to test styling. Also, test for class names or inline styles:\n\n```tsx\nimport { render } from '@testing-library/react';\nimport { styled } from '@stitches/react';\n\nconst Button = styled('button', {\n backgroundColor: 'red',\n});\n\ntest('Button renders with correct styles', () => {\n const { getByRole } = render(<Button>Click</Button>);\n const button = getByRole('button');\n expect(button).toHaveStyle('background-color: red');\n});\n```\nStitches generates scoped class names, so testing for presence of styles via inline snapshots or `jest-styled` libraries can be useful."


Next, we separate the data with a 90/10 split into a training corpus and a test set. Since we are using a pre-trained model, the model itself is not trained. Instead, the training corpus supplies the reference embeddings to search over. Because the test answers are held out of that corpus, accuracy is judged by how closely the answers retrieved for each unseen test question match its actual answer.

In [16]:
# Calculate the size of the training corpus (90% of the dataset)
train_size = int(len(df) * 0.9)

# Separate dataset into training corpus and testing set
train_df = df.iloc[:train_size].reset_index(drop=True)
test_df  = df.iloc[train_size:].reset_index(drop=True)

# Separate questions and answers
train_questions = train_df["question"].tolist()
train_answers   = train_df["answer"].tolist()

test_questions = test_df["question"].tolist()
test_answers   = test_df["answer"].tolist()

print(f"Train size: {len(train_questions)}, Test size: {len(test_questions)}")


Train size: 23653, Test size: 2629
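As a quick arithmetic check of the sizes printed above: `int()` truncates, so the training corpus receives the floor of 0.9 × n rows and the test set gets the remainder. The sketch below reproduces this with a hypothetical helper `split_sizes`; the total of 26,282 rows is inferred from 23,653 + 2,629, i.e. the rows left after dropping nulls.

```python
def split_sizes(n_rows: int, train_frac: float = 0.9) -> tuple[int, int]:
    """Hypothetical helper: return (train_size, test_size) as the cell above computes them."""
    # int() truncates, so any fractional row ends up in the test set
    train_size = int(n_rows * train_frac)
    return train_size, n_rows - train_size

# 23653 + 2629 = 26282 rows remain after dropping null questions/answers
print(split_sizes(26282))  # (23653, 2629)
```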


## 4. Embedding Generation

In [12]:
import torch

from sentence_transformers import SentenceTransformer

# Pre-trained SBERT model; it maps each text to a 384-dimensional embedding
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Training embeddings
train_question_embeddings = embedder.encode(train_questions, convert_to_tensor=True)
train_answer_embeddings   = embedder.encode(train_answers, convert_to_tensor=True)

# Test embeddings
test_question_embeddings  = embedder.encode(test_questions, convert_to_tensor=True)
test_answer_embeddings    = embedder.encode(test_answers, convert_to_tensor=True)


## 5. Similarity Check

First, we check the cosine similarity for a random sample of 5 question-answer pairs from the training corpus.

In [21]:
import random

sample_size = 5  # number of pairs to check
sample_indices = random.sample(range(len(train_questions)), sample_size)

results = []
for i in sample_indices:
    similarity = embedder.similarity(train_question_embeddings[i], train_answer_embeddings[i])
    results.append({
        "Question": train_questions[i],
        "Answer": train_answers[i],
        "Similarity": float(similarity)
    })

# Create a separate DataFrame so the original dataset `df` is not overwritten
sample_df = pd.DataFrame(results)

# Display the table
sample_df


,Question,Answer,Similarity
0,How do you handle browser compatibility issues?,"React Window supports modern browsers, but for older browsers (like IE11), issues may arise due to lack of support for features like `ResizeObserver`. Use polyfills if needed and test against your browser support matrix. Also avoid using unsupported CSS features in your list styles.",0.332162
1,How do you import a component using React.lazy?,"You import a component using `React.lazy` by assigning the result of `React.lazy()` to a variable. The function passed to `React.lazy` must use the dynamic `import()` syntax and return a Promise that resolves to the component.\n\n```jsx\nimport React, { lazy } from 'react';\n\nconst MyLazyComponent = lazy(() => import('./MyComponent'));\n// Then, use it inside a Suspense boundary:\n// function App() {\n// return (\n// <Suspense fallback={<div>Loading...</div>}>\n// <MyLazyComponent />\n// </Suspense>\n// );\n// }\n```\nIt's crucial that `MyComponent` has a `default export`.",0.691587
2,"How do you resolve ""React.memo not working with nested components""?","When `React.memo` appears not to work with nested components, it's almost always a misunderstanding of how memoization propagates (or doesn't propagate) down the component tree. `React.memo` only optimizes the component it wraps based on *its direct props*. It doesn't magically memoize its children or components rendered inside it, unless those children are *also* explicitly wrapped with `React.memo`.\n\nThe common issues and resolutions are:\n\n1. **Unmemoized Children:** If a memoized parent re-renders (because *its* props changed), and it renders an *unmemoized* child, that child will re-render, even if the child's own props haven't changed. **Resolution:** If the nested child is also expensive and receives stable props, *also* wrap the nested child with `React.memo`.\n2. **Unstable Props to Nested Memoized Children:** If a memoized parent passes new object, array, or function references as props to a memoized nested child on every render, the nested child will re-render. **Resolution:** Use `React.useCallback` for functions and `React.useMemo` for objects/arrays in the parent component to stabilize props passed down to memoized children.\n3. **Children as `props.children`:** If a parent receives `props.children` and passes them directly to a memoized wrapper, the `children` prop itself can often be unstable (as JSX elements are objects). If `children` changes (e.g., new JSX is rendered), the memoized wrapper might re-render. **Resolution:** If the `children` themselves are static or memoized, `React.memo` can still work. 
If `children` are dynamic, the memoized wrapper might not be as effective for that specific prop.\n\nExample:\n\n```jsx\nimport React, { useState, useCallback, useMemo } from 'react';\n\n// Level 3: Deeply nested and memoized component\nconst GrandchildComponent = React.memo(({ value }) => {\n console.log('Rendering GrandchildComponent:', value);\n return <p>Grandchild: {value}</p>;\n});\n\n// Level 2: Nested component, also memoized\nconst ChildComponent = React.memo(({ count, onAction, data }) => {\n console.log('Rendering ChildComponent:', count);\n return (\n <div style={{ border: '1px dotted green', padding: '10px', margin: '5px' }}>\n <h4>Child: {count}</h4>\n <button onClick={onAction}>Child Action</button>\n <GrandchildComponent value={data.nestedValue} />\n </div>\n );\n});\n\n// Level 1: Parent component\nexport default function App() {\n const [appCount, setAppCount] = useState(0);\n\n // Memoize the callback passed to ChildComponent\n const handleChildAction = useCallback(() => {\n console.log('Child action triggered!');\n }, []);\n\n // Memoize the data object passed to ChildComponent (and then GrandchildComponent)\n const memoizedData = useMemo(() => ({ nestedValue: 100 }), []);\n\n return (\n <div style={{ border: '1px solid red', padding: '20px' }}>\n <h1>App Count: {appCount}</h1>\n <button onClick={() => setAppCount(appCount + 1)}>Increment App Count</button>\n \n {/* ChildComponent will only re-render if appCount changes, or if handleChildAction/memoizedData change */}\n <ChildComponent \n count={appCount} \n onAction={handleChildAction} \n data={memoizedData} \n />\n\n {/* If you pass an unstable object, ChildComponent AND GrandchildComponent will re-render unnecessarily */}\n {/* <ChildComponent \n count={appCount} \n onAction={handleChildAction} \n data={{ nestedValue: 200 }} \n /> */}\n </div>\n );\n}\n```\n`React.memo` is local optimization. 
For effective memoization of nested components, ensure each component in the hierarchy is memoized if appropriate, and that stable props are passed down the chain using `useCallback` and `useMemo`.",0.772952
3,How to debug hydration mismatch warnings?,"Hydration mismatch occurs when server-rendered HTML doesn't match client-rendered HTML.\n1. Use `console.log()` to compare SSR and client output.\n2. Avoid accessing `window`, `document`, or random values (like `Math.random()`) during render.\n3. Use `useEffect()` to defer client-specific code.\n4. Consistently render components and avoid conditional rendering that differs between client and server.\n5. Wrap dynamic parts like dates in `useEffect()` or SSR them with consistent values.",0.570225
4,How do you fetch data for routed components in Relay?,"Use `useQueryLoader` in the parent component and pass the loaded data to the routed child. Or, place a `useLazyLoadQuery` directly in the routed component, wrapped in a `<Suspense>` boundary. For server-rendered apps, preload the query in the router and hydrate it into the Relay environment. This ensures fast, coordinated routing and data rendering.",0.679267
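`embedder.similarity` scores each pair with the model's configured similarity function, which for `all-MiniLM-L6-v2` is cosine similarity (the sentence-transformers default). A minimal plain-Python sketch of that formula follows, using made-up 3-dimensional vectors in place of real 384-dimensional embeddings; the vectors and the `cosine_similarity` helper are illustrative only.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not model output)
question = [3.0, 4.0, 0.0]
same_direction = [6.0, 8.0, 0.0]   # same direction, twice the length
partly_related = [3.0, 0.0, 4.0]   # shares one component with the question
unrelated = [0.0, 0.0, 3.0]        # orthogonal to the question

print(cosine_similarity(question, same_direction))  # 1.0
print(cosine_similarity(question, partly_related))  # 0.36
print(cosine_similarity(question, unrelated))       # 0.0
```

Because the score depends only on direction, not length, a long answer is not favored simply for being long. The sampled scores above (roughly 0.33 to 0.77) suggest that a question and its own answer are related in embedding space without being near-identical.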


## 6. Testing

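Next, we test how well new questions are matched to existing answers in the training corpus: each test question embedding is used as a query against the training answer embeddings, and the 3 most similar answers are retrieved, as described in the SBERT semantic search guide:

https://www.sbert.net/examples/sentence_transformer/applications/semantic-search/README.html

We can then check how closely these retrieved answers align with each test question's actual answer.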
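The linked SBERT guide provides `util.semantic_search` for this step; in the notebook the queries would be `test_question_embeddings` and the corpus `train_answer_embeddings`, with `top_k=3`. The sketch below is an assumption-labeled, plain-Python re-implementation of the same idea so the ranking logic is visible. It mirrors the library's output shape (one list of hits per query, each hit a dict with `corpus_id` and `score`), and it assumes no embedding is an all-zero vector.

```python
import heapq
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length, so a dot product then equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))  # assumes vec is not all zeros
    return [x / norm for x in vec]

def semantic_search(query_embeddings, corpus_embeddings, top_k=3):
    """Sketch: for each query, return the top_k corpus entries ranked by cosine similarity."""
    corpus = [normalize(doc) for doc in corpus_embeddings]  # normalize the corpus once
    results = []
    for query in query_embeddings:
        q = normalize(query)
        # (score, corpus index) pairs; nlargest keeps the k highest scores in descending order
        scored = ((sum(a * b for a, b in zip(q, doc)), idx) for idx, doc in enumerate(corpus))
        best = heapq.nlargest(top_k, scored)
        results.append([{"corpus_id": idx, "score": score} for score, idx in best])
    return results

# Toy data (hypothetical): 2 test questions and 4 training answers in 2 dimensions
train_answer_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
test_question_vectors = [[2.0, 0.0], [0.0, 3.0]]

for hits in semantic_search(test_question_vectors, train_answer_vectors, top_k=3):
    print([(hit["corpus_id"], round(hit["score"], 3)) for hit in hits])
# [(0, 1.0), (2, 0.707), (1, 0.0)]
# [(1, 1.0), (2, 0.707), (0, 0.0)]
```

Normalizing the corpus once up front means each query costs only dot products, which is the same reason the SBERT documentation suggests normalized embeddings with dot-product scoring for larger corpora.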

## 7. Evaluation

https://pretrained.dev/article/How_to_evaluate_the_performance_of_pretrained_models.htm
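Since a test question's own answer is not in the training corpus, one possible proxy metric is to compare the true answer's embedding with the embeddings of the retrieved answers. The sketch below is one such choice, not a method prescribed by the linked article: for each test question it takes the best cosine similarity among the retrieved answers, then reports the mean of those scores and a "hit rate" at a threshold. The `evaluate_retrieval` helper and the 0.7 threshold are hypothetical; in the notebook the inputs would be `test_answer_embeddings`, the retrieved `corpus_id` lists, and `train_answer_embeddings`.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def evaluate_retrieval(true_answer_vectors, retrieved_ids, corpus_vectors, threshold=0.7):
    """Hypothetical metric: return (mean best similarity, hit rate) over all test questions.

    best similarity: highest cosine similarity between a question's true answer and any
    of its retrieved training answers. hit: best similarity >= threshold (assumed cut-off).
    """
    best_scores = [
        max(cosine_similarity(true_vec, corpus_vectors[i]) for i in ids)
        for true_vec, ids in zip(true_answer_vectors, retrieved_ids)
    ]
    mean_best = sum(best_scores) / len(best_scores)
    hit_rate = sum(score >= threshold for score in best_scores) / len(best_scores)
    return mean_best, hit_rate

# Toy data (hypothetical): 3 training answers, 2 test questions with 2 retrieved ids each
corpus = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
true_answers = [[3.0, 4.0], [0.0, -2.0]]
retrieved = [[0, 1], [0, 2]]

mean_best, hit_rate = evaluate_retrieval(true_answers, retrieved, corpus)
print(f"mean best similarity: {mean_best:.2f}, hit rate: {hit_rate:.0%}")
# mean best similarity: 0.40, hit rate: 50%
```

Taking the best of the top 3 reflects that the system returns several candidates and is useful if any one of them is close to the real answer; the threshold would need to be tuned by inspecting retrieved examples.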