# Survey Text Response Analysis

This notebook analyzes open-ended survey responses in four stages: text preprocessing, sentiment analysis, topic modeling, and keyword extraction. Each step comes with a short markdown explanation, and the results are visualized at the end.

## 1. Import Required Libraries
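We will use pandas, NumPy, NLTK (including its VADER sentiment analyzer), scikit-learn, matplotlib, seaborn, and wordcloud, plus the optional packages HuggingFace Transformers, BERTopic, KeyBERT, and rake_nltk when they are installed.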

In [23]:

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import json
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from nltk.sentiment import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings('ignore')

# Optional imports
try:
    from transformers import pipeline
except ImportError:
    pipeline = None
try:
    from bertopic import BERTopic
except ImportError:
    BERTopic = None
try:
    from keybert import KeyBERT
except ImportError:
    KeyBERT = None
try:
    from rake_nltk import Rake
except ImportError:
    Rake = None

## 2. Load Survey Responses from JSON
We will load the survey responses from the exported JSON file (`survey-results-f5ae24e1-9985-450b-b36c-878ffa7f471d.json`).

In [24]:
# Load the survey data from JSON file
with open('survey-results-f5ae24e1-9985-450b-b36c-878ffa7f471d.json', 'r', encoding='utf-8') as f:
    survey_data = json.load(f)

# Preview the structure of the JSON data
print('Type:', type(survey_data))
if isinstance(survey_data, dict):
    for k, v in survey_data.items():
        print(f'{k}:', str(v)[:300], '\n')
        break
elif isinstance(survey_data, list):
    print('Sample record:', survey_data[0])

Type: <class 'dict'>
survey: {'survey_id': 'f5ae24e1-9985-450b-b36c-878ffa7f471d', 'topic': 'Student AI/ML session review', 'audience': 'College Student', 'created_at': '2025-08-30T16:56:55.019098+00:00', 'questions_count': 5, 'responses_count': 5} 
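
The preview loop stops after the first key, so only the `survey` metadata is shown. A minimal sketch of a helper that lists every top-level entry with its type and size is given below. The `responses` key in the sample is hypothetical; only `survey` is confirmed by the output above.

```python
def summarize(data, max_preview=60):
    """Return one line per top-level entry: key, type name, and size or preview."""
    items = data.items() if isinstance(data, dict) else enumerate(data)
    lines = []
    for key, value in items:
        if isinstance(value, (dict, list)):
            detail = f"{len(value)} items"
        else:
            detail = repr(value)[:max_preview]
        lines.append(f"{key}: {type(value).__name__} ({detail})")
    return lines


# Hypothetical structure for illustration; the real file may differ.
sample = {"survey": {"topic": "Student AI/ML session review"}, "responses": [{}, {}]}
for line in summarize(sample):
    print(line)
# survey: dict (1 items)
# responses: list (2 items)
```

Calling `summarize(survey_data)` in the notebook would show where the actual answers live before any extraction logic is written.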



## 3. Extract All Text-Based Answers
We will extract all open-ended (text) responses from the survey data for NLP analysis.

In [25]:
# Extract all text-based answers from the survey data
def extract_text_responses(survey_data):
    """Collect longer string fields from the top-level records.

    Only the direct fields of each top-level record are inspected; text nested
    deeper (for example inside a list of answers) is not reached.
    """
    text_responses = []
    if isinstance(survey_data, dict):
        records = list(survey_data.values())
        # If the first value is itself a list, treat that list as the records
        if records and isinstance(records[0], list):
            records = records[0]
    elif isinstance(survey_data, list):
        records = survey_data
    else:
        records = []
    for resp in records:
        if isinstance(resp, dict):
            for value in resp.values():
                # Heuristic: keep strings with more than two words
                if isinstance(value, str) and len(value.split()) > 2:
                    text_responses.append(value)
    print(f'Extracted {len(text_responses)} text responses.')
    return pd.DataFrame({'response': text_responses})

text_df = extract_text_responses(survey_data)
display(text_df.head())

Extracted 1 text responses.


| | response |
|---|---|
| 0 | Student AI/ML session review |
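
The single extracted row is the survey `topic` from the metadata, not a respondent's answer: the function only looks at the direct fields of top-level values, so any answers stored deeper in the JSON are likely never reached. A minimal sketch of a recursive walk follows. The layout and key names in the sample (`responses`, `answers`, `text`) and the `skip_keys` default are assumptions, since the real schema is not shown above.

```python
def iter_strings(node, path=()):
    """Yield (path, text) for every string found anywhere in nested dicts/lists."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, path + (index,))


def collect_free_text(data, min_words=3, skip_keys=("survey",)):
    """Collect strings with at least `min_words` words, skipping metadata branches.

    `skip_keys` is an assumption: top-level keys that hold metadata, not answers.
    """
    texts = []
    for path, text in iter_strings(data):
        if path and path[0] in skip_keys:
            continue
        if len(text.split()) >= min_words:
            texts.append(text)
    return texts


# Hypothetical layout for illustration only; the real file may differ.
sample = {
    "survey": {"topic": "Student AI/ML session review"},
    "responses": [
        {"answers": [{"text": "The hands-on demo was really helpful"}, {"text": "Yes"}]},
        {"answers": [{"text": "More time for questions would be nice"}]},
    ],
}
print(collect_free_text(sample))
# ['The hands-on demo was really helpful', 'More time for questions would be nice']
```

Question prompts would also pass the length filter, so the branch that stores them may need to be added to `skip_keys` once the real structure is known.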


## 4. NLP Preprocessing
We will preprocess the text responses: lowercasing, removing stopwords and non-alphabetic tokens, and lemmatization.

In [26]:
# Download NLTK resources if not already present
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

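def preprocess(text):
    # Lowercase and split on whitespace
    tokens = [w.lower() for w in text.split()]
    # Keep purely alphabetic tokens that are not stop words; anything with
    # digits or attached punctuation (e.g. "AI/ML", "review.") is dropped
    tokens = [w for w in tokens if w.isalpha() and w not in stop_words]
    # Reduce each token to its lemma (WordNet treats tokens as nouns by default)
    tokens = [lemmatizer.lemmatize(w) for w in tokens]
    return ' '.join(tokens)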

text_df['processed'] = text_df['response'].apply(preprocess)
display(text_df.head())

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Abrarali\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Abrarali\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\Abrarali\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


| | response | processed |
|---|---|---|
| 0 | Student AI/ML session review | student session review |
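
Note that `AI/ML` is missing from the processed text: splitting on whitespace leaves the slash attached, and `isalpha()` then rejects the whole token. One possible alternative is a regex-based tokenizer that keeps the alphabetic parts instead. In this sketch the small stop-word set is a stand-in for NLTK's list so the snippet runs on its own, and lemmatization is omitted.

```python
import re

# Stand-in stop-word set for illustration; the notebook uses NLTK's English list.
DEMO_STOP_WORDS = {"the", "a", "an", "and", "was", "is", "of", "to"}


def tokenize_alpha(text, stop_words=DEMO_STOP_WORDS):
    """Lowercase, pull out runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(tokenize_alpha("Student AI/ML session review"))
# ['student', 'ai', 'ml', 'session', 'review']
print(tokenize_alpha("The demo was great, thanks!"))
# ['demo', 'great', 'thanks']
```

The trade-off is that contractions such as "don't" are split into fragments, so a dedicated tokenizer may be preferable for longer free text.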


## 5. Sentiment Analysis
We will perform sentiment analysis using VADER (rule-based) and optionally a HuggingFace transformer model.

In [27]:
# Ensure tf-keras is installed for compatibility with Transformers and Keras 3
%pip install -q tf-keras

# Sentiment analysis using VADER
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()
text_df['vader_sentiment'] = text_df['response'].apply(lambda x: sia.polarity_scores(x)['compound'])

# Optionally, use HuggingFace transformer model for sentiment
if pipeline:
    hf_sentiment = pipeline('sentiment-analysis')
    text_df['hf_sentiment'] = text_df['response'].apply(lambda x: hf_sentiment(x)[0]['label'])
else:
    text_df['hf_sentiment'] = None

display(text_df[['response', 'vader_sentiment', 'hf_sentiment']].head())

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Abrarali\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Note: you may need to restart the kernel to use updated packages.


Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Error while downloading from https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english/resolve/714eb0f/model.safetensors: HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Read timed out.
Trying to resume download...
Error while downloading from https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english/resolve/714eb0f/model.safetensors: HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Read timed out.
Trying to resume download...


ValueError: Could not load model distilbert/distilbert-base-uncased-finetuned-sst-2-english with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForSequenceClassification'>, <class 'transformers.models.distilbert.modeling_distilbert.DistilBertForSequenceClassification'>). See the original errors:

while loading with AutoModelForSequenceClassification, an error is thrown:
Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 779, in _error_catcher
    yield
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 904, in _raw_read
    data = self._fp_read(amt, read1=read1) if not fp_closed else b""
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 887, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
           ~~~~~~~~~~~~~^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\http\client.py", line 479, in read
    s = self.fp.read(amt)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\socket.py", line 719, in readinto
    return self._sock.recv_into(b)
           ~~~~~~~~~~~~~~~~~~~~^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
           ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 1091, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 980, in read
    data = self._raw_read(amt)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 903, in _raw_read
    with self._error_catcher():
         ~~~~~~~~~~~~~~~~~~~^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\contextlib.py", line 162, in __exit__
    self.gen.throw(value)
    ~~~~~~~~~~~~~~^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\response.py", line 784, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.") from e  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 494, in http_get
    for chunk in r.iter_content(chunk_size=constants.DOWNLOAD_CHUNK_SIZE):
                 ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\models.py", line 826, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cas-bridge.xethub.hf.co', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 198, in _new_conn
    sock = connection.create_connection(
        (self._dns_host, self.port),
    ...<2 lines>...
        socket_options=self.socket_options,
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
               ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\socket.py", line 977, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11001] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    response = self._make_request(
        conn,
    ...<10 lines>...
        **response_kw,
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 488, in _make_request
    raise new_e
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
    ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 1093, in _validate_conn
    conn.connect()
    ~~~~~~~~~~~~^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 753, in connect
    self.sock = sock = self._new_conn()
                       ~~~~~~~~~~~~~~^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x000001C6FD251F90>: Failed to resolve 'huggingface.co' ([Errno 11001] getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
        method=request.method,
    ...<9 lines>...
        chunked=chunked,
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 841, in urlopen
    retries = retries.increment(
        method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /distilbert/distilbert-base-uncased-finetuned-sst-2-english/resolve/714eb0f/model.safetensors (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001C6FD251F90>: Failed to resolve 'huggingface.co' ([Errno 11001] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\pipelines\base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 4680, in from_pretrained
    checkpoint_files, sharded_metadata = _get_resolved_checkpoint_files(
                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<15 lines>...
        transformers_explicit_filename=transformers_explicit_filename,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 1137, in _get_resolved_checkpoint_files
    resolved_archive_file = cached_file(pretrained_model_name_or_path, filename, **cached_file_kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\utils\hub.py", line 312, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\utils\hub.py", line 557, in cached_files
    raise e
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\utils\hub.py", line 470, in cached_files
    hf_hub_download(
    ~~~~~~~~~~~~~~~^
        path_or_repo_id,
        ^^^^^^^^^^^^^^^^
    ...<10 lines>...
        local_files_only=local_files_only,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 1008, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
        # Destination
    ...<14 lines>...
        force_download=force_download,
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 1161, in _hf_hub_download_to_cache_dir
    _download_to_tmp_and_move(
    ~~~~~~~~~~~~~~~~~~~~~~~~~^
        incomplete_path=Path(blob_path + ".incomplete"),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        xet_file_data=xet_file_data,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 1725, in _download_to_tmp_and_move
    http_get(
    ~~~~~~~~^
        url_to_download,
        ^^^^^^^^^^^^^^^^
    ...<4 lines>...
        expected_size=expected_size,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 511, in http_get
    return http_get(
        url=url,
    ...<6 lines>...
        _tqdm_bar=_tqdm_bar,
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 420, in http_get
    r = _request_wrapper(
        method="GET", url=url, stream=True, proxies=proxies, headers=headers, timeout=constants.HF_HUB_DOWNLOAD_TIMEOUT
    )
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\file_download.py", line 309, in _request_wrapper
    response = http_backoff(method=method, url=url, **params, retry_on_exceptions=(), retry_on_status_codes=(429,))
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\utils\_http.py", line 310, in http_backoff
    response = session.request(method=method, url=url, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\huggingface_hub\utils\_http.py", line 96, in send
    return super().send(request, *args, **kwargs)
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /distilbert/distilbert-base-uncased-finetuned-sst-2-english/resolve/714eb0f/model.safetensors (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001C6FD251F90>: Failed to resolve \'huggingface.co\' ([Errno 11001] getaddrinfo failed)"))'), '(Request ID: 09e99534-f013-4b02-a65d-9f1e16e03ff3)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\pipelines\base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 4680, in from_pretrained
    checkpoint_files, sharded_metadata = _get_resolved_checkpoint_files(
                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<15 lines>...
        transformers_explicit_filename=transformers_explicit_filename,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 1243, in _get_resolved_checkpoint_files
    raise OSError(
    ...<3 lines>...
    )
OSError: distilbert/distilbert-base-uncased-finetuned-sst-2-english does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

while loading with TFAutoModelForSequenceClassification, an error is thrown:
Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 737, in getattribute_from_module
    return getattribute_from_module(transformers_module, attr)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 741, in getattribute_from_module
    raise ValueError(f"Could not find {attr} in {transformers_module}!")
ValueError: Could not find TFDistilBertForSequenceClassification in <module 'transformers' from 'c:\\Users\\Abrarali\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\transformers\\__init__.py'>!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\pipelines\base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 597, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 394, in _get_model_class
    supported_models = model_mapping[type(config)]
                       ~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 803, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 817, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 739, in getattribute_from_module
    raise ValueError(f"Could not find {attr} neither in {module} nor in {transformers_module}!")
ValueError: Could not find TFDistilBertForSequenceClassification neither in <module 'transformers.models.distilbert' from 'c:\\Users\\Abrarali\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\transformers\\models\\distilbert\\__init__.py'> nor in <module 'transformers' from 'c:\\Users\\Abrarali\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\transformers\\__init__.py'>!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 737, in getattribute_from_module
    return getattribute_from_module(transformers_module, attr)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 741, in getattribute_from_module
    raise ValueError(f"Could not find {attr} in {transformers_module}!")
ValueError: Could not find TFDistilBertForSequenceClassification in <module 'transformers' from 'c:\\Users\\Abrarali\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\transformers\\__init__.py'>!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\pipelines\base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 597, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 394, in _get_model_class
    supported_models = model_mapping[type(config)]
                       ~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 803, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 817, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\models\auto\auto_factory.py", line 739, in getattribute_from_module
    raise ValueError(f"Could not find {attr} neither in {module} nor in {transformers_module}!")
ValueError: Could not find TFDistilBertForSequenceClassification neither in <module 'transformers.models.distilbert' from 'c:\\Users\\Abrarali\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\transformers\\models\\distilbert\\__init__.py'> nor in <module 'transformers' from 'c:\\Users\\Abrarali\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages\\transformers\\__init__.py'>!

while loading with DistilBertForSequenceClassification, an error is thrown:
Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\pipelines\base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 4680, in from_pretrained
    checkpoint_files, sharded_metadata = _get_resolved_checkpoint_files(
                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<15 lines>...
        transformers_explicit_filename=transformers_explicit_filename,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 1243, in _get_resolved_checkpoint_files
    raise OSError(
    ...<3 lines>...
    )
OSError: distilbert/distilbert-base-uncased-finetuned-sst-2-english does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\pipelines\base.py", line 310, in infer_framework_load_model
    model = model_class.from_pretrained(model, **fp32_kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 311, in _wrapper
    return func(*args, **kwargs)
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 4680, in from_pretrained
    checkpoint_files, sharded_metadata = _get_resolved_checkpoint_files(
                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<15 lines>...
        transformers_explicit_filename=transformers_explicit_filename,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "c:\Users\Abrarali\AppData\Local\Programs\Python\Python313\Lib\site-packages\transformers\modeling_utils.py", line 1243, in _get_resolved_checkpoint_files
    raise OSError(
    ...<3 lines>...
    )
OSError: distilbert/distilbert-base-uncased-finetuned-sst-2-english does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
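
The traceback above comes down to network failures (read timeouts, then a failed DNS lookup for `huggingface.co`), which left the model weights undownloaded and aborted the cell before the results were displayed. One way to keep the notebook running is to treat the transformer model as optional at load time as well as at import time. The helper below is a sketch: `load_optional` is a hypothetical name, and the commented call assumes the `pipeline` import from Section 1.

```python
def load_optional(factory, label="optional component"):
    """Call `factory()` and return its result, or None if it raises.

    Intended for resources that need a download; failures are reported
    instead of stopping the notebook.
    """
    try:
        return factory()
    except Exception as exc:  # network, missing weights, backend errors, ...
        print(f"Skipping {label}: {type(exc).__name__}: {exc}")
        return None


# Stand-in factory that fails the way an offline download would.
def offline_factory():
    raise OSError("could not reach model hub")


model = load_optional(offline_factory, label="transformer sentiment model")
print(model)  # None

# In the notebook this could wrap the real call (assumes `pipeline` was imported):
# hf_sentiment = load_optional(
#     lambda: pipeline(
#         "sentiment-analysis",
#         model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
#     ),
#     label="HuggingFace sentiment pipeline",
# )
```

With this pattern, the `hf_sentiment` column can fall back to `None` whenever the loader returns `None`, and the VADER scores are still displayed.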




## 6. Topic Modeling
We will use LDA (Latent Dirichlet Allocation) for topic modeling, and demonstrate BERTopic if available.

In [None]:
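# Topic modeling with LDA
# Heuristic: on a small corpus, min_df=2 / max_df=0.95 can prune every term
# (or make CountVectorizer raise ValueError), so only prune when there is enough text.
n_docs = len(text_df)
small_corpus = n_docs < 20  # assumed cut-off; tune for your data
vectorizer = CountVectorizer(
    max_df=1.0 if small_corpus else 0.95,
    min_df=1 if small_corpus else 2,
    stop_words='english',
)
dtm = vectorizer.fit_transform(text_df['processed'])
lda = LatentDirichletAllocation(n_components=min(5, max(1, n_docs)), random_state=42)
lda_topics = lda.fit_transform(dtm)

# Display top words for each topic
def print_top_words(model, feature_names, n_top_words=10):
    for topic_idx, topic in enumerate(model.components_):
        top_indices = topic.argsort()[::-1][:n_top_words]  # highest-weight terms first
        print(f"Topic #{topic_idx + 1}:", " ".join(feature_names[i] for i in top_indices))

print_top_words(lda, vectorizer.get_feature_names_out())

# Optionally, BERTopic (pass a plain list of strings; it typically needs far more
# than a handful of documents to fit)
if BERTopic:
    bertopic_model = BERTopic()
    topics, probs = bertopic_model.fit_transform(text_df['processed'].tolist())
    text_df['bertopic'] = topics
    print('BERTopic topics:', set(topics))
else:
    text_df['bertopic'] = None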
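
Document-frequency pruning via `min_df` and `max_df` interacts badly with very small corpora. The sketch below mimics how such thresholds are interpreted (integers as absolute document counts, floats as proportions of the corpus) to show which terms would survive. It is an illustration in plain Python, not scikit-learn's implementation, and the sample documents are made up.

```python
from collections import Counter


def surviving_terms(docs, min_df=1, max_df=1.0):
    """Return terms whose document frequency lies within [min_df, max_df].

    Integers are absolute document counts; floats are fractions of the corpus.
    """
    n_docs = len(docs)
    min_count = min_df if isinstance(min_df, int) else min_df * n_docs
    max_count = max_df if isinstance(max_df, int) else max_df * n_docs
    if max_count < min_count:
        raise ValueError("max_df corresponds to fewer documents than min_df")
    # Count each term once per document
    doc_freq = Counter(term for doc in docs for term in set(doc.split()))
    return sorted(t for t, df in doc_freq.items() if min_count <= df <= max_count)


docs = ["student session review", "great session", "review was great"]
print(surviving_terms(docs, min_df=2, max_df=0.95))
# ['great', 'review', 'session']

try:
    surviving_terms(["student session review"], min_df=2, max_df=0.95)
except ValueError as err:
    print("single document:", err)
```

With a single document the two thresholds contradict each other, which is why a one-row `text_df` cannot be vectorized with `min_df=2`.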

## 7. Keyword Extraction
We will extract keywords from the text responses using RAKE or KeyBERT.

In [None]:
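# Keyword extraction with RAKE, falling back to KeyBERT if RAKE is unavailable
if Rake:
    # rake_nltk relies on NLTK's sentence tokenizer; newer NLTK releases may
    # need the 'punkt_tab' resource instead of 'punkt'
    nltk.download('punkt')
    rake = Rake()

    def rake_keywords(text):
        # extract_keywords_from_text() returns None and stores its results on
        # the Rake object, so fetch the ranked phrases in a second call
        rake.extract_keywords_from_text(text)
        return rake.get_ranked_phrases()

    text_df['rake_keywords'] = text_df['response'].apply(rake_keywords)
    print('Sample RAKE keywords:', text_df['rake_keywords'].head().tolist())
elif KeyBERT:
    kw_model = KeyBERT()
    text_df['keybert_keywords'] = text_df['response'].apply(lambda x: kw_model.extract_keywords(x, top_n=5))
    print('Sample KeyBERT keywords:', text_df['keybert_keywords'].head().tolist())
else:
    print('Neither RAKE nor KeyBERT is available.')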
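
For intuition about what RAKE ranks, here is a simplified sketch of the idea: candidate phrases are the word runs between stop words and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its words. This is not the `rake_nltk` implementation, and the stop-word list is a small stand-in chosen for the example.

```python
import re
from collections import defaultdict

# Small stand-in stop-word list; real RAKE uses a full language stop list.
DEMO_STOPS = {"the", "a", "an", "and", "was", "were", "is", "of", "to", "but", "for", "very", "too"}


def rake_sketch(text, stop_words=DEMO_STOPS):
    """Rank candidate phrases by summed word scores (degree / frequency)."""
    phrases = []
    # Punctuation ends a phrase, so split into fragments first
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stop_words:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself

    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    # Highest score first; ties broken alphabetically
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


print(rake_sketch("The practical demo was great, but the session was too short."))
# [('practical demo', 4.0), ('great', 1.0), ('session', 1.0), ('short', 1.0)]
```

Multi-word phrases score higher because each word inherits the length of the phrase it appears in, which is why RAKE tends to surface longer noun phrases.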

## 8. Visualizations
We will visualize frequent terms (WordCloud), the sentiment distribution (histogram of VADER compound scores), and the top LDA topics with sample answers.

In [None]:
# WordCloud for frequent terms
all_words = ' '.join(text_df['processed'])
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(all_words)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('WordCloud of Frequent Terms')
plt.show()

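# Histogram of VADER compound scores
plt.figure(figsize=(6, 4))
# A KDE curve needs more than one distinct score, so enable it only then
sns.histplot(text_df['vader_sentiment'], bins=20, kde=text_df['vader_sentiment'].nunique() > 1)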
plt.title('Sentiment Distribution (VADER)')
plt.xlabel('Compound Sentiment Score')
plt.ylabel('Frequency')
plt.show()

# Top topics with sample answers (LDA)
lda_topic_assignments = lda_topics.argmax(axis=1)
text_df['lda_topic'] = lda_topic_assignments
for topic_num in range(lda.n_components):
    print(f'\nTopic {topic_num+1} sample answers:')
    display(text_df[text_df['lda_topic'] == topic_num]['response'].head(3))
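
A word cloud only conveys relative frequency, so it can help to print the underlying counts alongside it. Below is a small sketch using `collections.Counter`; the sample strings are made up and stand in for `text_df['processed']`.

```python
from collections import Counter


def top_terms(processed_texts, n=5):
    """Count whitespace-separated tokens across all texts and return the n most common."""
    counts = Counter(token for text in processed_texts for token in text.split())
    return counts.most_common(n)


# Made-up preprocessed responses for illustration.
sample_processed = ["student session review", "session great", "great demo session"]
print(top_terms(sample_processed, n=2))
# [('session', 3), ('great', 2)]
```

In the notebook, `top_terms(text_df['processed'])` would give the same information as the word cloud in tabular form.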

## 9. Results and Insights

- **WordCloud** highlights the most frequent terms in open-ended responses.
- **Sentiment analysis** shows how feedback is distributed across negative, neutral, and positive VADER compound scores (which range from -1 to 1).
- **Topic modeling** groups responses into main themes/topics, with sample answers for each.
- **Keyword extraction** surfaces the most important phrases and terms.

These analyses help identify key concerns, suggestions, and overall sentiment from open-ended survey feedback.
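
To report positive, neutral, and negative shares rather than raw scores, the VADER compound score can be bucketed into labels. The ±0.05 cut-offs below are the thresholds conventionally suggested by VADER's authors; the function name and the sample scores are illustrative and not taken from the survey.

```python
from collections import Counter


def label_compound(score, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Illustrative scores, not taken from the survey.
sample_scores = [0.62, 0.0, -0.41, 0.03, 0.05]
labels = [label_compound(s) for s in sample_scores]
print(labels)
# ['positive', 'neutral', 'negative', 'neutral', 'positive']
print(Counter(labels))
```

Applied to the notebook, `text_df['vader_sentiment'].apply(label_compound)` would yield a categorical column suitable for a simple count plot.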