# FAQ ChatBot model prototype notebook

## 1. Project Overview

The aim of this notebook is to use asymmetric semantic search to retrieve the 3 closest answers from the ReactJS FAQ dataset for a user's question. Questions and answers are mapped into the same vector space using a pre-trained SBERT model. Given a new question, the 3 closest answers are retrieved by similarity search. The dataset is split so that 90% serves as the reference corpus and the remaining 10% of questions are held out to evaluate how well the system retrieves relevant answers.

https://www.sbert.net/examples/sentence_transformer/applications/semantic-search/README.html
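
To make the retrieval idea concrete before any model is loaded, here is a minimal sketch of top-k search by cosine similarity. It uses hand-written 2-dimensional toy vectors in place of SBERT embeddings. The names `cosine_similarity`, `top_k_answers`, `toy_answer_vecs`, and `toy_query_vec` are illustrative and not part of the notebook.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_answers(query_vec, answer_vecs, k=3):
    """Return the k (index, score) pairs with the highest cosine similarity."""
    scored = [(idx, cosine_similarity(query_vec, vec))
              for idx, vec in enumerate(answer_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy stand-ins for answer embeddings (assumption: real ones come from the model)
toy_answer_vecs = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
]
toy_query_vec = [1.0, 0.2]

top_hits = top_k_answers(toy_query_vec, toy_answer_vecs, k=3)
for idx, score in top_hits:
    print(f"answer {idx}: {score:.4f}")
```

The rest of the notebook follows the same pattern, with the model producing the vectors and `util.semantic_search` doing the ranking.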

## 2. Dataset

The ReactJS FAQ dataset will be used. It consists of approximately 26,300 question-answer pairs in JSON format.

https://www.kaggle.com/datasets/savanidhruv/reactjs-faq-dataset

In [13]:
import pandas as pd

# Import display module from IPython to render pandas DataFrames in VS Code notebooks
# from IPython.display import display # uncomment if needed to display multiple objects in one cell

In [14]:
# Load dataset from kaggle

import kagglehub
from kagglehub import KaggleDatasetAdapter

# JSON file inside the ReactJS FAQ dataset folder
file_path = "React_dataset.json"

# Load the latest version
df = kagglehub.dataset_load(
  KaggleDatasetAdapter.PANDAS,
  "savanidhruv/reactjs-faq-dataset",
  file_path,
)

print("First 5 records:")
df.head()

First 5 records:


Unnamed: 0,question,answer,detailedanswer,detailed_answer,argument
0,What is React?,"React is an open-source JavaScript library for building user interfaces, primarily for single-page applications, using a component-based architecture.",,,
1,What is a Date Picker in React?,A Date Picker in React is a UI component that allows users to select a date or date range from a calendar pop-up or dropdown.,,,
2,what are feature of React?,"1) Component-Based Architecture, 2) Virtual DOM, 3) Declarative UI, 4) JSX (JavaScript XML), 5) Unidirectional Data Flow, 6) Hooks API, 7) Ecosystem Support & Routing, 8) Fast Rendering & Performance, 9)Developer Tools, 10)Community and Open Source",,,
3,Which libraries are commonly used for date pickers in React?,"Popular libraries include react-datepicker, react-date-range, react-dates (Airbnb), MUI DatePicker, and Ant Design DatePicker.",,,
4,How to install react-datepicker?,Run `npm install react-datepicker` and also install `date-fns` if needed.,,,


## 3. Preprocessing



In [15]:
# Remove other columns from the dataframe
df = df[["question", "answer"]]

# Drop rows where either 'question' or 'answer' is null
df = df.dropna(subset=["question", "answer"]).reset_index(drop=True)

# Display the full answers
pd.set_option("display.max_colwidth", None)

# Shuffle dataset (so that similar Q and A pairs are not grouped)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Show the first 5 rows of the table
print("First 5 Q and A pairs:")
df.head()

First 5 Q and A pairs:


Unnamed: 0,question,answer
0,How do you debug “No response from server” in urql?,"""No response from server"" in urql typically means the HTTP request was sent but no response was received within a reasonable timeframe, or the connection was abruptly closed. This often points to a network issue, a server that's down, or a firewall blocking the connection. To debug:\n1. **Check network connection:** Ensure your device has internet access.\n2. **Verify server status:** Is your GraphQL server running and accessible?\n3. **Correct endpoint URL:** Double-check the `url` in your `urql` client configuration.\n4. **Firewall/Proxy:** See if a firewall or proxy is blocking the request.\n5. **Browser DevTools:** Open the Network tab in your browser's developer tools. Look for the pending request; it might eventually show a timeout or connection refused error.\n6. **Server Logs:** Check your GraphQL server logs for any incoming request attempts or errors."
1,How to enforce token usage via linting rules?,"To enforce token usage, create custom Stylelint rules or extend your ESLint config with plugins that analyze your .css.ts files. These rules should check whether hardcoded values (like colors or spacing) match predefined token values. You can use Abstract Syntax Tree (AST) parsing with ESLint or integrate with tools like `stylelint-declaration-strict-value` to disallow raw values and ensure tokens are consistently used."
2,How to integrate store with Three.js for UI sync?,"1. Three.js render loop is external, so sync Zustand state manually:\n```ts\nstore.subscribe(state => {\n mesh.position.x = state.x;\n});\n```\n2. Use `useFrame()` in `@react-three/fiber` to pull values from store:\n```ts\nuseFrame(() => {\n meshRef.current.position.x = useStore.getState().x;\n});\n```\n3. Avoid using `useStore()` inside `useFrame()` directly for performance."
3,How do you test WebSockets in Playwright?,Intercept and monitor WebSocket messages using the `page.on('websocket')` event or test the UI reflecting live updates:\n\n```ts\nawait page.goto('/chat');\nawait expect(page.locator('text=User joined')).toBeVisible();\n```
4,How to test styling logic using Jest + React Testing Library?,"You can use `@testing-library/react` along with Stitches' `toString()` method to test styling. Also, test for class names or inline styles:\n\n```tsx\nimport { render } from '@testing-library/react';\nimport { styled } from '@stitches/react';\n\nconst Button = styled('button', {\n backgroundColor: 'red',\n});\n\ntest('Button renders with correct styles', () => {\n const { getByRole } = render(<Button>Click</Button>);\n const button = getByRole('button');\n expect(button).toHaveStyle('background-color: red');\n});\n```\nStitches generates scoped class names, so testing for presence of styles via inline snapshots or `jest-styled` libraries can be useful."


Next we will separate the data into a 90/10 split: a training corpus and a testing set. Since we are using a pre-trained model, we are not training the model itself. Instead, we use the training corpus to generate embeddings, and then test accuracy by checking whether the model retrieves the correct answers for unseen questions in the test set.

In [16]:
# Calculate the size of the training corpus (90% of the dataset)
train_size = int(len(df) * 0.9)

# Separate dataset into training corpus and testing set
train_df = df.iloc[:train_size].reset_index(drop=True)
test_df  = df.iloc[train_size:].reset_index(drop=True)

# Separate questions and answers
train_questions = train_df["question"].tolist()
train_answers   = train_df["answer"].tolist()

test_questions = test_df["question"].tolist()
test_answers   = test_df["answer"].tolist()

print(f"Train size: {len(train_questions)}, Test size: {len(test_questions)}")


Train size: 23653, Test size: 2629


## 4. Embedding Generation

In [12]:
import torch

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Training embeddings
train_question_embeddings = embedder.encode(train_questions, convert_to_tensor=True)
train_answer_embeddings   = embedder.encode(train_answers, convert_to_tensor=True)

# Test embeddings
test_question_embeddings  = embedder.encode(test_questions, convert_to_tensor=True)
test_answer_embeddings    = embedder.encode(test_answers, convert_to_tensor=True)


## 5. Similarity Check

First we will check the cosine similarity between question and answer pairs.
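
For reference, the cosine similarity between a question embedding $\mathbf{q}$ and an answer embedding $\mathbf{a}$ (symbols introduced here for illustration only) is:

$$
\operatorname{cos\_sim}(\mathbf{q}, \mathbf{a}) = \frac{\mathbf{q} \cdot \mathbf{a}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{a} \rVert}
$$

Scores lie between -1 and 1, and a higher score means the two texts point in a more similar direction in the embedding space. This assumes the model's similarity function is left at its cosine setting.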

In [21]:
import random

sample_size = 5  # number of pairs to check
sample_indices = random.sample(range(len(train_questions)), sample_size)

results = []
for i in sample_indices:
    similarity = embedder.similarity(train_question_embeddings[i], train_answer_embeddings[i])
    results.append({
        "Question": train_questions[i],
        "Answer": train_answers[i],
        "Similarity": float(similarity)
    })

# Create DataFrame (named separately so the dataset in `df` is not overwritten)
sample_df = pd.DataFrame(results)

# Display the table
sample_df


Unnamed: 0,Question,Answer,Similarity
0,How do you handle browser compatibility issues?,"React Window supports modern browsers, but for older browsers (like IE11), issues may arise due to lack of support for features like `ResizeObserver`. Use polyfills if needed and test against your browser support matrix. Also avoid using unsupported CSS features in your list styles.",0.332162
1,How do you import a component using React.lazy?,"You import a component using `React.lazy` by assigning the result of `React.lazy()` to a variable. The function passed to `React.lazy` must use the dynamic `import()` syntax and return a Promise that resolves to the component.\n\n```jsx\nimport React, { lazy } from 'react';\n\nconst MyLazyComponent = lazy(() => import('./MyComponent'));\n// Then, use it inside a Suspense boundary:\n// function App() {\n// return (\n// <Suspense fallback={<div>Loading...</div>}>\n// <MyLazyComponent />\n// </Suspense>\n// );\n// }\n```\nIt's crucial that `MyComponent` has a `default export`.",0.691587
2,"How do you resolve ""React.memo not working with nested components""?","When `React.memo` appears not to work with nested components, it's almost always a misunderstanding of how memoization propagates (or doesn't propagate) down the component tree. `React.memo` only optimizes the component it wraps based on *its direct props*. It doesn't magically memoize its children or components rendered inside it, unless those children are *also* explicitly wrapped with `React.memo`.\n\nThe common issues and resolutions are:\n\n1. **Unmemoized Children:** If a memoized parent re-renders (because *its* props changed), and it renders an *unmemoized* child, that child will re-render, even if the child's own props haven't changed. **Resolution:** If the nested child is also expensive and receives stable props, *also* wrap the nested child with `React.memo`.\n2. **Unstable Props to Nested Memoized Children:** If a memoized parent passes new object, array, or function references as props to a memoized nested child on every render, the nested child will re-render. **Resolution:** Use `React.useCallback` for functions and `React.useMemo` for objects/arrays in the parent component to stabilize props passed down to memoized children.\n3. **Children as `props.children`:** If a parent receives `props.children` and passes them directly to a memoized wrapper, the `children` prop itself can often be unstable (as JSX elements are objects). If `children` changes (e.g., new JSX is rendered), the memoized wrapper might re-render. **Resolution:** If the `children` themselves are static or memoized, `React.memo` can still work. 
If `children` are dynamic, the memoized wrapper might not be as effective for that specific prop.\n\nExample:\n\n```jsx\nimport React, { useState, useCallback, useMemo } from 'react';\n\n// Level 3: Deeply nested and memoized component\nconst GrandchildComponent = React.memo(({ value }) => {\n console.log('Rendering GrandchildComponent:', value);\n return <p>Grandchild: {value}</p>;\n});\n\n// Level 2: Nested component, also memoized\nconst ChildComponent = React.memo(({ count, onAction, data }) => {\n console.log('Rendering ChildComponent:', count);\n return (\n <div style={{ border: '1px dotted green', padding: '10px', margin: '5px' }}>\n <h4>Child: {count}</h4>\n <button onClick={onAction}>Child Action</button>\n <GrandchildComponent value={data.nestedValue} />\n </div>\n );\n});\n\n// Level 1: Parent component\nexport default function App() {\n const [appCount, setAppCount] = useState(0);\n\n // Memoize the callback passed to ChildComponent\n const handleChildAction = useCallback(() => {\n console.log('Child action triggered!');\n }, []);\n\n // Memoize the data object passed to ChildComponent (and then GrandchildComponent)\n const memoizedData = useMemo(() => ({ nestedValue: 100 }), []);\n\n return (\n <div style={{ border: '1px solid red', padding: '20px' }}>\n <h1>App Count: {appCount}</h1>\n <button onClick={() => setAppCount(appCount + 1)}>Increment App Count</button>\n \n {/* ChildComponent will only re-render if appCount changes, or if handleChildAction/memoizedData change */}\n <ChildComponent \n count={appCount} \n onAction={handleChildAction} \n data={memoizedData} \n />\n\n {/* If you pass an unstable object, ChildComponent AND GrandchildComponent will re-render unnecessarily */}\n {/* <ChildComponent \n count={appCount} \n onAction={handleChildAction} \n data={{ nestedValue: 200 }} \n /> */}\n </div>\n );\n}\n```\n`React.memo` is local optimization. 
For effective memoization of nested components, ensure each component in the hierarchy is memoized if appropriate, and that stable props are passed down the chain using `useCallback` and `useMemo`.",0.772952
3,How to debug hydration mismatch warnings?,"Hydration mismatch occurs when server-rendered HTML doesn't match client-rendered HTML.\n1. Use `console.log()` to compare SSR and client output.\n2. Avoid accessing `window`, `document`, or random values (like `Math.random()`) during render.\n3. Use `useEffect()` to defer client-specific code.\n4. Consistently render components and avoid conditional rendering that differs between client and server.\n5. Wrap dynamic parts like dates in `useEffect()` or SSR them with consistent values.",0.570225
4,How do you fetch data for routed components in Relay?,"Use `useQueryLoader` in the parent component and pass the loaded data to the routed child. Or, place a `useLazyLoadQuery` directly in the routed component, wrapped in a `<Suspense>` boundary. For server-rendered apps, preload the query in the router and hydrate it into the Relay environment. This ensures fast, coordinated routing and data rendering.",0.679267


## 6. Testing

Next we need to test how well new questions pair with existing answers in the training corpus.

https://www.sbert.net/examples/sentence_transformer/applications/semantic-search/README.html

In [26]:
from sentence_transformers import util

top_k = 3  # number of top matches to show

# Run semantic search
results = util.semantic_search(test_question_embeddings, train_answer_embeddings, top_k=top_k)

# Build a DataFrame
df_rows = []
for i, hits in enumerate(results):
    for rank, hit in enumerate(hits, start=1):
        df_rows.append({
            "Test Question": test_questions[i],
            "Rank": rank,
            "Training Answer": train_answers[hit['corpus_id']],
            "Similarity": hit['score']
        })

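# Named separately so the dataset in `df` is not overwritten
results_df = pd.DataFrame(df_rows)

# Display the top matches for the first 3 test questions
results_df.head(3 * top_k)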

Unnamed: 0,Test Question,Rank,Training Answer,Similarity
0,How do you throttle form submissions in Formik?,1,Formik doesn't have rate limiting built-in. You must enforce this on the backend using IP-based or account-based throttling. On the client:\n- You can debounce or throttle API calls with libraries like `lodash.throttle`\n- Use Formik’s `isSubmitting` flag to block rapid resubmission\n- Show appropriate error messages if rate-limiting is triggered by the server,0.754446
1,How do you throttle form submissions in Formik?,2,"Use a throttling utility like lodash’s `throttle` or a custom hook to limit how often `onSubmit` can be triggered:\n\nimport { throttle } from 'lodash-es';\n\nconst onSubmitThrottled = throttle(onSubmit, 1000, { leading: true, trailing: false });\n\n<form onSubmit={handleSubmit(onSubmitThrottled)}>\n <!-- form fields -->\n</form>",0.628531
2,How do you throttle form submissions in Formik?,3,"Rate limiting can be implemented on the client by disabling rapid submissions using throttling/debouncing (e.g., Lodash). On the server, detect excessive requests from an IP/user and return a 429 response. In the form, catch this and display a cooldown message ('Please wait 30 seconds before submitting again'). Optionally show a countdown. Persist cooldown state in localStorage or context. For anonymous users, use fingerprinting or device ID to enforce limits. Always inform users when their request is blocked to improve UX.",0.626402
3,How do you use useCallback with streaming rendering?,1,"useCallback is a hook that memoizes a function so it doesn’t get recreated on every render, useful for preventing unnecessary re-renders in child components.",0.59536
4,How do you use useCallback with streaming rendering?,2,"`useCallback` returns a memoized function, which appears in DevTools as a stable function reference. To debug it, inspect whether the callback reference changes between renders (shown by a change in hook value). Unexpected changes may indicate a missing or incorrect dependency.",0.588214
5,How do you use useCallback with streaming rendering?,3,`useCallback` helps you avoid re-creating functions on every render. It's useful when passing callbacks to optimized child components that rely on reference equality to avoid re-rendering.,0.583129
6,How do you set the language attribute on the HTML tag with React Helmet?,1,"Set the `lang` attribute in `<html>` using React Helmet:\n\n```jsx\n<Helmet>\n <html lang=""en-US"" />\n</Helmet>\n```\n\nThis improves screen reader behavior by ensuring correct language pronunciation and localization settings.\n\nFor dynamic locales:\n```jsx\n<Helmet>\n <html lang={userLocale} />\n</Helmet>\n```\n\nAlways provide valid BCP 47 language codes like `'es'`, `'ar'`, or `'zh-CN'`.",0.751366
7,How do you set the language attribute on the HTML tag with React Helmet?,2,"Manage multilingual head tags by detecting locale and passing translations into Helmet:\n\n```jsx\nimport { useTranslation } from 'react-i18next';\n\nfunction LocalizedHelmet({ titleKey, descriptionKey }) {\n const { t, i18n } = useTranslation();\n return (\n <Helmet>\n <html lang={i18n.language} />\n <title>{t(titleKey)}</title>\n <meta name=""description"" content={t(descriptionKey)} />\n <link rel=""alternate"" hrefLang={i18n.language} href={window.location.href} />\n </Helmet>\n );\n}\n```\n\nAlso consider adding alternate language links and canonical URLs for full internationalization support.",0.705496
8,How do you set the language attribute on the HTML tag with React Helmet?,3,"You set the `charset` meta tag with React Helmet by including a `<meta>` tag with the `charset` attribute as a child of the `Helmet` component. It's generally recommended to set this as `utf-8` to ensure proper character encoding for most languages.\n\n```jsx\nimport React from 'react';\nimport { Helmet } from 'react-helmet';\n\nfunction MyApp() {\n return (\n <div>\n <Helmet>\n <title>My UTF-8 Encoded App</title>\n <meta charSet=""utf-8"" />\n <meta name=""description"" content=""This app uses UTF-8 encoding."" />\n </Helmet>\n {/* ... your app content */}\n </div>\n );\n}\n\nexport default MyApp;\n```\n\nNote that the `charset` meta tag is usually one of the first tags in the `<head>` of an HTML document. While React Helmet places it, for maximum compatibility and immediate browser interpretation, it's often best to ensure it's present in your `public/index.html` file (for CRA) or the base HTML template for SSR applications, even if you also set it with Helmet.",0.681841


Now we can check how well the retrieved answers align with the actual answers by comparing two similarity scores for each test question: the question against its real answer, and the question against the top retrieved answer.

In [28]:
eval_rows = []
for i, hits in enumerate(results):  # results from semantic_search
    correct_sim = util.cos_sim(test_question_embeddings[i], test_answer_embeddings[i]).item()
    top_hit = hits[0]  # best training answer
    eval_rows.append({
        "Test Question": test_questions[i],
        "Correct Test Answer": test_answers[i],
        "Correct Similarity": correct_sim,
        "Top Retrieved Answer": train_answers[top_hit['corpus_id']],
        "Top Retrieved Similarity": top_hit['score']
    })

df_eval = pd.DataFrame(eval_rows)
df_eval.head()

Unnamed: 0,Test Question,Correct Test Answer,Correct Similarity,Top Retrieved Answer,Top Retrieved Similarity
0,How do you throttle form submissions in Formik?,"Throttle form submissions by wrapping `onSubmit` logic with `lodash.throttle()` or tracking the last submit time:\n```jsx\nconst throttledSubmit = useCallback(throttle(values => submitToApi(values), 5000), []);\n<Formik onSubmit={throttledSubmit} />\n```",0.670487,Formik doesn't have rate limiting built-in. You must enforce this on the backend using IP-based or account-based throttling. On the client:\n- You can debounce or throttle API calls with libraries like `lodash.throttle`\n- Use Formik’s `isSubmitting` flag to block rapid resubmission\n- Show appropriate error messages if rate-limiting is triggered by the server,0.754446
1,How do you use useCallback with streaming rendering?,"When using `useCallback` in streaming environments, ensure that the memoized function doesn’t rely on partially loaded or unresolved data. Functions should either be deferred until all dependencies are ready or guarded with null-checks and fallbacks to prevent runtime errors.",0.66654,"useCallback is a hook that memoizes a function so it doesn’t get recreated on every render, useful for preventing unnecessary re-renders in child components.",0.59536
2,How do you set the language attribute on the HTML tag with React Helmet?,"You can set the `lang` attribute on the `<html>` tag using Helmet like this:\n```jsx\n<Helmet htmlAttributes={{ lang: 'en' }} />\n```\nThis is particularly important for accessibility, SEO, and internationalization. You can dynamically switch this based on app locale settings or i18n state. React Helmet ensures this attribute is injected directly into the DOM on load or update.",0.842265,"Set the `lang` attribute in `<html>` using React Helmet:\n\n```jsx\n<Helmet>\n <html lang=""en-US"" />\n</Helmet>\n```\n\nThis improves screen reader behavior by ensuring correct language pronunciation and localization settings.\n\nFor dynamic locales:\n```jsx\n<Helmet>\n <html lang={userLocale} />\n</Helmet>\n```\n\nAlways provide valid BCP 47 language codes like `'es'`, `'ar'`, or `'zh-CN'`.",0.751366
3,How to create reusable layout components with Tailwind?,"Reusable layout components in React with Tailwind can be created by combining flexbox/grid utilities and defining consistent padding, margins, and max widths.\n\nExample: Container component to center and constrain content width:\n\n```jsx\nfunction Container({ children }) {\n return <div className=""max-w-7xl mx-auto px-4 sm:px-6 lg:px-8"">{children}</div>;\n}\n```\n\nExample: Flex Row component with spacing:\n\n```jsx\nfunction FlexRow({ children, gap = '4' }) {\n return <div className={`flex space-x-${gap}`}>{children}</div>;\n}\n```\n\nExample: Responsive grid layout:\n\n```jsx\nfunction GridLayout({ children }) {\n return <div className=""grid grid-cols-1 md:grid-cols-3 gap-6"">{children}</div>;\n}\n```\n\nUsing these components across your app ensures consistent layout and spacing. You can also extend these to accept props for custom behavior or styles.",0.579723,"Use Tailwind for layout and utility styling, and antd for interactive components. Combine them by wrapping antd components in Tailwind utility classes via `className`. Avoid class name conflicts and ensure consistent theming.",0.712194
4,How do you use SWR in a CMS application?,Load pages and content blocks:\n```tsx\nfunction usePageContent(slug) {\n const { data } = useSWR(`/api/pages/${slug}`);\n return data;\n}\n\nfunction useBlocks(pageId) {\n const { data } = useSWR(`/api/blocks?page=${pageId}`);\n return data;\n}\n```\n**Notes**:\n- Versioned content can be fetched via query params.\n- Allow preview/edit modes.,0.128066,"SWR is a React Hooks library for data fetching, developed by Vercel. It simplifies and optimizes the process of fetching, caching, and updating data in React applications.",0.56298


## 7. Evaluation

https://pretrained.dev/article/How_to_evaluate_the_performance_of_pretrained_models.htm
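
Because the held-out answers are not in the reference corpus, retrieval cannot be scored by exact match. One possible proxy, sketched below, is to measure how similar each retrieved answer is to the held-out reference answer and count a question as a hit if any of its top-k retrieved answers clears a threshold. The function names, the toy scores, and the 0.7 threshold are all assumptions for illustration, not results from this notebook. In practice the scores could come from `util.cos_sim` between retrieved-answer embeddings and `test_answer_embeddings[i]`.

```python
def hit_rate_at_k(similarities_per_question, threshold=0.7):
    """Fraction of questions with at least one retrieved answer whose
    similarity to the reference answer is >= threshold (hypothetical cutoff)."""
    if not similarities_per_question:
        return 0.0
    hits = sum(
        1 for sims in similarities_per_question
        if any(score >= threshold for score in sims)
    )
    return hits / len(similarities_per_question)

def mean_top1_similarity(similarities_per_question):
    """Average similarity between the rank-1 retrieved answer and the reference."""
    top1_scores = [sims[0] for sims in similarities_per_question if sims]
    return sum(top1_scores) / len(top1_scores) if top1_scores else 0.0

# Toy answer-to-answer similarities, one row per test question (made-up values)
toy_similarities = [
    [0.82, 0.61, 0.40],
    [0.55, 0.73, 0.20],
    [0.30, 0.25, 0.10],
    [0.69, 0.68, 0.66],
]

hit_rate = hit_rate_at_k(toy_similarities, threshold=0.7)
mean_top1 = mean_top1_similarity(toy_similarities)
print(f"Hit rate@3: {hit_rate:.2f}, mean top-1 similarity: {mean_top1:.2f}")
```

The threshold would need to be tuned by inspecting real pairs, since a fixed cutoff is only a rough stand-in for human relevance judgments.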