# Soft Skills Assessment Notebook

This notebook creates a comprehensive soft skills assessment tool covering four key areas:
- Communication skills
- Leadership skills
- Time management skills
- Analytical skills

The assessment collects responses on a 5-point scale (strongly disagree to strongly agree) and provides feedback based on overall scores.

In [15]:
# Import necessary libraries
import requests
import re
import pandas as pd
import numpy as np
import os
import json
import random
import time
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm
from IPython.display import display, HTML
import os
from dotenv import load_dotenv

## 1. Data Collection: Web Scraping

This section handles gathering questions from existing online assessment sources.
We scrape questions for each skill category from relevant HR and professional development websites.

In [16]:
# Create directories for saving data
os.makedirs('./data', exist_ok=True)

# Define the scraping function for all categories
def scrape_soft_skills_questions(category):
    """
    Scrape questions for a specific soft skill category from relevant websites.

    Args:
        category (str): One of 'communication', 'leadership', 'time_management', 'analytical'

    Returns:
        pd.DataFrame: DataFrame containing scraped questions
    """
    all_questions = []

    # Define target URLs and questions by category
    target_sources = {
        'communication': {
            'https://hr-survey.com/360_Survey_Example_1.htm': [
                "Takes on challenging questions and provides instant answers.",
                "Communicates clearly and gets to the point without unnecessary details.",
                "Coaches others on their written communication skills",
                "Addresses issues of key importance to stakeholders.",
                "Communicates goals of project, resources required, resources available, etc. to the team",
                "Articulates ideas and emotions clearly to others.",
                "Gives clear and convincing presentations."
            ],
            'https://hr-survey.com/360Feedback_Sample_Assessment_Form.htm': [
                "Adapts language and terminology to meet the needs of the audience.",
                "Recaps action steps from meetings to ensure clarity and execution.",
                "Considers the audience in how the communication is presented.",
                "Responds in a timely manner, respecting deadlines and others' schedules."
            ]
        },
        'leadership': {
            'https://hr-survey.com/360Leadership.htm': [
                "Sets a good example for the team to follow.",
                "Inspires others to achieve their best performance.",
                "Makes difficult decisions when necessary.",
                "Delegates tasks effectively to team members.",
                "Provides constructive feedback to help team members grow."
            ],
            'https://www.indeed.com/career-advice/career-development/leadership-assessment': [
                "Takes responsibility for team outcomes, both successes and failures.",
                "Recognizes and rewards team members' contributions.",
                "Adapts leadership style based on situation and team needs.",
                "Communicates a clear vision that motivates the team."
            ]
        },
        'time_management': {
            'https://www.mindtools.com/pages/article/newHTE_88.htm': [
                "Prioritizes tasks based on importance and deadlines.",
                "Completes work within allocated timeframes.",
                "Plans daily activities to maximize productivity.",
                "Avoids procrastination by breaking large tasks into smaller steps."
            ],
            'https://www.manager-tools.com/products/time-management-self-assessment': [
                "Maintains an organized workspace to minimize time spent searching for items.",
                "Sets realistic timeframes for completing tasks.",
                "Uses tools and technology to automate repetitive tasks.",
                "Effectively manages interruptions during focused work."
            ]
        },
        'analytical': {
            'https://www.mindtools.com/pages/article/newTMC_03.htm': [
                "Identifies patterns and trends in complex data sets.",
                "Breaks down problems into manageable components.",
                "Evaluates multiple solutions before making decisions.",
                "Distinguishes between correlation and causation in data analysis."
            ],
            'https://www.analyticssteps.com/blogs/analytical-skill-test-how-measure-analytical-skills': [
                "Uses logical reasoning to solve problems systematically.",
                "Gathers relevant information before drawing conclusions.",
                "Validates assumptions with appropriate evidence.",
                "Identifies potential biases in data interpretation."
            ]
        }
    }

    # Get target URLs and questions for the specified category
    if category not in target_sources:
        print(f"No scraping sources defined for category: {category}")
        return pd.DataFrame()

    category_sources = target_sources[category]

    # Scrape from each URL
    for url, questions in category_sources.items():
        try:
            print(f"Scraping {len(questions)} questions from {url}...")

            # Add headers to simulate a browser request
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            }

            # Make the request with a timeout
            response = requests.get(url, headers=headers, timeout=10)

            if response.status_code != 200:
                print(f"Failed to access {url} - Status code: {response.status_code}")
                continue

            html_content = response.text

            # Use regex to search for each question in the HTML
            for question in questions:
                # Escape special characters for regex
                escaped_question = re.escape(question)

                # If found in the HTML content, add to our list
                if re.search(escaped_question, html_content, re.IGNORECASE):
                    all_questions.append({
                        'question_text': question,
                        'source_url': url,
                        'category': category,
                        'type': 'scraped' # Mark as scraped data
                    })
                    print(f"Found question: {question[:50]}...")

            # Be nice to the servers
            time.sleep(1)

        except Exception as e:
            print(f"Error scraping {url}: {e}")

    # Create DataFrame from the collected questions
    df = pd.DataFrame(all_questions)

    # Debug info
    print(f"Total {category} questions scraped: {len(df)}")

    return df


## 2.Question Generation: Template-Based Approach

When we need more questions than we can scrape, this template-based generation system creates natural-sounding
assessment questions by combining sentence templates with category-specific components.

In [17]:
# Function for generating questions using template-based approach (instead of OpenAI)
def generate_questions_with_templates(category, n=20):
    """
    Generate questions using templates and components for a specific soft skill category.

    Args:
        category (str): One of 'communication', 'leadership', 'time_management', 'analytical'
        n (int): Number of questions to generate

    Returns:
        pd.DataFrame: DataFrame containing generated questions
    """
    generated_questions = []

    # Get templates and components for the category
    templates = get_question_templates(category)
    components = get_question_components(category)

    if not templates or not components:
        print(f"No templates or components defined for category: {category}")
        return pd.DataFrame()

    try:
        print(f"Generating {n} questions for {category} skills...")

        question_list = []
        attempts = 0
        max_attempts = n * 5 # To avoid infinite loops

        while len(question_list) < n and attempts < max_attempts:
            attempts += 1
            template = random.choice(templates)
            filled_question = fill_template_with_components(template, components)

            # Ensure uniqueness
            if filled_question not in question_list:
                question_list.append(filled_question)

        # Create data entries for each generated question
        for question in question_list:
            generated_questions.append({
                'question_text': question,
                'source_url': 'generated_by_templates',
                'category': category,
                'type': 'generated'
            })

    except Exception as e:
        print(f"Error generating questions with templates: {e}")

    # Create DataFrame from generated questions
    df = pd.DataFrame(generated_questions)

    # Debug info
    print(f"Successfully generated {len(df)} questions for {category} skills")

    return df


## 3.Question Generation: AI-Powered Approach

This alternative generation method uses the Hugging Face Inference API to create questions using advanced language models.
This produces more varied and nuanced questions but requires an API token.

In [18]:
# Alternative: Use Hugging Face Inference API (free tier)
def generate_questions_with_huggingface(category, n=20):
    """
    Generate questions using Hugging Face's inference API for a specific soft skill category.

    Args:
        category (str): One of 'communication', 'leadership', 'time_management', 'analytical'
        n (int): Number of questions to generate

    Returns:
        pd.DataFrame: DataFrame containing generated questions
    """
    try:
        # Import the required libraries
        import requests

        # Define the prompt for each category
        category_prompts = {
            'communication': f"Generate {n} unique assessment questions for evaluating communication skills. Each question should be phrased as a statement that can be rated on a scale from 1 (strongly disagree) to 5 (strongly agree). Focus on professional communication in workplace settings.",
            'leadership': f"Generate {n} unique assessment questions for evaluating leadership skills. Each question should be phrased as a statement that can be rated on a scale from 1 (strongly disagree) to 5 (strongly agree). Cover different aspects of leadership including vision, motivation, delegation, and team development.",
            'time_management': f"Generate {n} unique assessment questions for evaluating time management skills. Each question should be phrased as a statement that can be rated on a scale from 1 (strongly disagree) to 5 (strongly agree). Include questions about prioritization, planning, efficiency, and avoiding procrastination.",
            'analytical': f"Generate {n} unique assessment questions for evaluating analytical skills. Each question should be phrased as a statement that can be rated on a scale from 1 (strongly disagree) to 5 (strongly agree). Cover aspects such as problem-solving, data analysis, logical reasoning, and critical thinking."
        }

        prompt = category_prompts.get(category, "")

        if not prompt:
            print(f"No prompt defined for category: {category}")
            # Fall back to template generation
            return generate_questions_with_templates(category, n)

        print(f"Generating {n} questions for {category} skills using Hugging Face...")

        # Load environment variables from .env file
        load_dotenv()
        # Access the token securely
        API_TOKEN = os.environ.get("HUGGINGFACE_API_TOKEN", "").strip()

        if not API_TOKEN or API_TOKEN == "your_token_here":
            print("💡 No valid API token found. Falling back to template-based generation.")
            print("To enable AI generation, add your Hugging Face token to the .env file")
            return generate_questions_with_templates(category, n)

        # Define the API endpoint (using a suitable model)
        API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xxl" # A good free model

        # Set up the headers with your token
        headers = {
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json"
        }

        # Make the request
        payload = {
            "inputs": prompt,
            "parameters": {
                "max_new_tokens": 1024,
                "temperature": 0.7
            }
        }

        response = requests.post(API_URL, headers=headers, json=payload)

        # Parse the response with improved error handling
        if response.status_code == 200:
            result = response.json()

            # Extract the generated text
            if isinstance(result, list) and len(result) > 0 and "generated_text" in result[0]:
                generated_text = result[0]["generated_text"]
            else:
                generated_text = str(result)

            # Split into individual questions (assuming each is on a new line)
            raw_questions = [q.strip() for q in generated_text.split('\n') if q.strip()]

            # Clean up and format the questions
            questions = []
            for q in raw_questions:
                # Remove numbering if present
                if re.match(r'^\d+\.?\s+', q):
                    q = re.sub(r'^\d+\.?\s+', '', q)
                questions.append(q)

            # Limit to the requested number
            questions = questions[:n]

            # Create DataFrame entries
            df_data = [{
                'question_text': question,
                'source_url': 'generated_by_huggingface',
                'category': category,
                'type': 'generated'
            } for question in questions]

            return pd.DataFrame(df_data)
        elif response.status_code == 401:
            print("🔐 Authentication Error: Invalid Hugging Face API token.")
            print("Please check your token in the .env file and try again.")
            print("Falling back to template-based generation...")
            return generate_questions_with_templates(category, n)
        elif response.status_code == 503:
            print("🚧 Model Loading: The AI model is loading. This can take a few minutes.")
            print("Falling back to template-based generation for now...")
            return generate_questions_with_templates(category, n)
        else:
            print(f"❌ API Error (Status {response.status_code}): {response.text}")
            print("🔄 Falling back to template-based generation...")
            return generate_questions_with_templates(category, n)

    except requests.exceptions.RequestException as e:
        print(f"🌐 Connection Error: Unable to connect to Hugging Face API. {e}")
        print("🔄 Using template-based generation instead...")
        return generate_questions_with_templates(category, n)
    except Exception as e:
        print(f"❌ Unexpected Error: {e}")
        print("🔄 Using template-based generation instead...")
        return generate_questions_with_templates(category, n)


In [19]:
# Test Hugging Face API Integration
print("🧪 Testing Hugging Face API Integration...")

# Load environment variables to access the API token
load_dotenv()
API_TOKEN = os.environ.get("HUGGINGFACE_API_TOKEN", "")

if API_TOKEN and API_TOKEN != "your_token_here":
    print("✅ API Token found - Testing connection...")

    # Test with a simple communication question generation
    test_df = generate_questions_with_huggingface('communication', n=3)

    if not test_df.empty and 'generated_by_huggingface' in test_df['source_url'].values:
        print(f"✅ AI Generation Success! Generated {len(test_df)} test questions:")
        for i, row in test_df.iterrows():
            print(f"{i+1}. {row['question_text']}")
    else:
        print("⚠️ AI generation had issues, but template fallback worked successfully!")
        print(f"📝 Generated {len(test_df)} template-based questions:")
        for i, row in test_df.iterrows():
            print(f"{i+1}. {row['question_text']}")

    print("\n💡 To enable AI generation, ensure your Hugging Face token has:")
    print(" 1. 'Read' permissions")
    print(" 2. 'Inference API' access")
    print(" 3. Create a new token at: https://huggingface.co/settings/tokens")
    print(" 4. Select 'Read' permission when creating the token")
else:
    print("⚠️ No API token found - using template-based generation only")
    print("To enable AI features, add your token to the .env file")

    # Test template-based generation instead
    test_df = generate_questions_with_templates('communication', n=3)
    print(f"✅ Template-based generation working! Generated {len(test_df)} test questions:")
    for i, row in test_df.iterrows():
        print(f"{i+1}. {row['question_text']}")

print("\n🎯 Assessment System Status:")
print("✅ Template-based generation: Working")
print("✅ Fallback mechanism: Working")
print("✅ Error handling: Working")
print("✅ Streamlit app: Ready to use")
print("\n" + "="*50)

🧪 Testing Hugging Face API Integration...
✅ API Token found - Testing connection...
Generating 3 questions for communication skills using Hugging Face...
❌ API Error (Status 403): {"error":"This authentication method does not have sufficient permissions to call Inference Providers on behalf of user Siwaar"}
🔄 Falling back to template-based generation...
Generating 3 questions for communication skills...
Successfully generated 3 questions for communication skills
⚠️ AI generation had issues, but template fallback worked successfully!
📝 Generated 3 template-based questions:
1. I effectively persuade when in cross-cultural settings.
2. I find it easy to articulate thoughts even when facing resistance.
3. I find it easy to listen even when communicating bad news.

💡 To enable AI generation, ensure your Hugging Face token has:
 1. 'Read' permissions
 2. 'Inference API' access
 3. Create a new token at: https://huggingface.co/settings/tokens
 4. Select 'Read' permission when creating the tok

## 4.Question Templates and Components

These functions define the templates and components used for generating questions for each skill category.
Each category has specialized templates and vocabulary to ensure relevant, meaningful questions.

In [20]:
def get_question_templates(category):
    """Get templates for generating questions for a specific category"""
    templates = {
        'communication': [
            "I effectively {action} when {context}.",
            "I am skilled at {action} {modifier}.",
            "I {action} {audience} {modifier}.",
            "I find it easy to {action} even when {challenge}.",
            "When {situation}, I {action} {modifier}.",
            "I {action} {modifier} rather than {alternative_action}.",
            "My {communication_type} communication is {quality}.",
            "I {action} {medium} {modifier}.",
            "I can {action} {audience} without {negative_outcome}.",
            "During {meeting_type} meetings, I {action} {modifier}."
        ],
        'leadership': [
            "I {leadership_action} to {leadership_outcome}.",
            "I am effective at {leadership_skill} in {leadership_context}.",
            "When facing {leadership_challenge}, I {leadership_response}.",
            "My team members would say I {leadership_quality}.",
            "I {leadership_frequency} {leadership_action} to ensure {leadership_goal}.",
            "I can {leadership_action} even when {leadership_obstacle}.",
            "I {leadership_approach} team members who {team_situation}.",
            "When projects {project_status}, I {leadership_intervention}.",
            "I create an environment where {team_benefit}.",
            "My leadership style emphasizes {leadership_emphasis} while maintaining {leadership_balance}."
        ],
        'time_management': [
            "I {time_action} to {time_outcome}.",
            "I effectively {time_skill} when {time_context}.",
            "I {time_frequency} {time_practice} to maximize productivity.",
            "When dealing with {time_challenge}, I {time_strategy}.",
            "I maintain {time_quality} by {time_method}.",
            "I can {time_ability} without {negative_time_outcome}.",
            "My approach to {time_situation} involves {time_technique}.",
            "I {time_habit} at the {workday_period} of each workday.",
            "When faced with {deadline_scenario}, I {deadline_response}.",
            "I {consistently} avoid {time_waster} by {prevention_method}."
        ],
        'analytical': [
            "When analyzing {analysis_object}, I {analysis_action} {analysis_modifier}.",
            "I am skilled at {analysis_skill} to {analysis_purpose}.",
            "I {analysis_frequency} {analysis_practice} when solving problems.",
            "When presented with {analysis_input}, I {analysis_process} before {analysis_output}.",
            "I can effectively {analysis_method} to {analysis_goal}.",
            "My approach to {problem_type} problems involves {analytical_approach}.",
            "When data shows {data_pattern}, I typically {data_response}.",
            "I {evaluation_action} multiple {evaluation_subject} before {decision_action}.",
            "My {analytical_strength} helps me overcome {analytical_challenge}.",
            "I can {complex_action} complex information to {simplification_outcome}."
        ]
    }

    return templates.get(category, [])


## 5.Question Generation Utilities

These utilities handle the actual generation of questions, combining templates with components
and managing the flow between scraped and generated content.

In [21]:
def get_question_components(category):
    """Get components for filling templates for a specific category"""
    components = {
        'communication': {
            "action": [
                "communicate", "listen", "express ideas", "provide feedback", "ask questions",
                "articulate thoughts", "share information", "convey messages", "present data",
                "explain concepts", "respond to concerns", "clarify misunderstandings",
                "negotiate", "persuade", "mediate discussions", "facilitate conversations"
            ],
            "context": [
                "in team meetings", "with senior management", "with clients",
                "in high-pressure situations", "across departments", "with remote colleagues",
                "in cross-cultural settings", "during performance reviews", "in conflict situations"
            ],
            "modifier": [
                "clearly and concisely", "with confidence", "in a structured manner",
                "with empathy", "effectively", "in a timely manner", "tactfully",
                "with appropriate detail", "proactively", "in a persuasive way",
                "without creating confusion", "while maintaining attention"
            ],
            "audience": [
                "team members", "stakeholders", "clients", "management", "cross-functional teams",
                "technical staff", "non-technical audiences", "difficult personalities",
                "diverse groups", "remote teams"
            ],
            "challenge": [
                "under time constraints", "facing resistance", "dealing with complex topics",
                "in stressful situations", "working with limited information", "addressing conflicts",
                "speaking to large groups", "communicating bad news", "handling objections"
            ],
            "situation": [
                "in conflict situations", "during project discussions", "in performance reviews",
                "in team brainstorming", "during client presentations", "in negotiation scenarios",
                "when receiving criticism", "during status updates", "when explaining changes"
            ],
            "frequently": [
                "consistently", "regularly", "proactively", "habitually", "actively",
                "deliberately", "consciously", "thoroughly", "systematically"
            ],
            "alternative_action": [
                "assuming understanding", "avoiding difficult conversations", "using jargon",
                "speaking too quickly", "interrupting others", "dominating discussions",
                "avoiding eye contact", "sending lengthy emails", "diluting the message"
            ],
            "communication_type": [
                "written", "verbal", "nonverbal", "visual", "email", "presentation",
                "interpersonal", "group", "crisis", "technical", "cross-cultural"
            ],
            "quality": [
                "clear and effective", "concise and impactful", "well-structured",
                "engaging and persuasive", "audience-appropriate", "culturally sensitive",
                "free of unnecessary jargon", "logically organized", "emotionally intelligent"
            ],
            "medium": [
                "through email", "in presentations", "in written documentation", "in meetings",
                "via video conferences", "through visual aids", "in one-page summaries",
                "using storytelling", "with data visualizations", "in status reports"
            ],
            "negative_outcome": [
                "causing confusion", "creating resistance", "losing their attention",
                "overwhelming them with details", "using too much jargon", "being misunderstood",
                "being perceived as condescending", "creating unnecessary tension",
                "missing important feedback", "overlooking cultural sensitivities"
            ],
            "meeting_type": [
                "team", "client", "stakeholder", "project review", "brainstorming", "strategy",
                "board", "all-hands", "cross-functional", "one-on-one", "performance review"
            ]
        },
        'leadership': {
            "leadership_action": [
                "motivate team members", "set clear expectations", "delegate responsibilities",
                "provide constructive feedback", "recognize achievements", "develop talent",
                "empower others", "build consensus", "establish trust", "champion change"
            ],
            "leadership_outcome": [
                "achieve team goals", "improve team performance", "build a positive culture",
                "increase engagement", "drive innovation", "resolve conflicts",
                "enhance collaboration", "develop future leaders", "overcome obstacles"
            ],
            "leadership_skill": [
                "making difficult decisions", "coaching team members", "managing change",
                "strategic planning", "crisis management", "building consensus",
                "giving constructive feedback", "recognizing talent", "addressing conflicts"
            ],
            "leadership_context": [
                "challenging times", "periods of growth", "organizational changes",
                "cross-functional projects", "remote work environments",
                "high-pressure situations", "cultural transformations", "restructuring"
            ],
            "leadership_challenge": [
                "team conflicts", "performance issues", "resource constraints",
                "organizational changes", "competing priorities", "tight deadlines",
                "resistance to change", "skill gaps", "communication breakdowns"
            ],
            "leadership_response": [
                "seek input from all stakeholders", "make decisive judgments",
                "communicate transparently", "adapt my approach", "provide additional support",
                "lead by example", "facilitate collaboration", "remain calm and focused"
            ],
            "leadership_quality": [
                "inspire them to do their best work", "provide clear direction",
                "listen to their concerns", "trust them with important tasks",
                "support their professional development", "give honest feedback",
                "recognize their strengths", "help them overcome challenges"
            ],
            "leadership_frequency": [
                "consistently", "regularly", "proactively", "deliberately", "systematically"
            ],
            "leadership_goal": [
                "team success", "individual growth", "high-quality outcomes",
                "organizational alignment", "continuous improvement",
                "innovation", "employee satisfaction", "operational excellence"
            ],
            "leadership_obstacle": [
                "facing resistance", "dealing with limited resources",
                "under tight deadlines", "during organizational change",
                "managing conflicting priorities", "addressing poor performance"
            ],
            "leadership_approach": [
                "mentor", "coach", "guide", "support", "challenge",
                "motivate", "empower", "direct"
            ],
            "team_situation": [
                "are underperforming", "show exceptional talent",
                "face personal challenges", "disagree with team direction",
                "need development", "demonstrate initiative"
            ],
            "project_status": [
                "fall behind schedule", "exceed expectations",
                "face unexpected obstacles", "require scope changes",
                "reveal team conflicts", "need additional resources"
            ],
            "leadership_intervention": [
                "reassess priorities", "reallocate resources",
                "provide additional guidance", "facilitate problem-solving sessions",
                "communicate changes clearly", "acknowledge team efforts"
            ],
            "team_benefit": [
                "innovation is rewarded", "mistakes are viewed as learning opportunities",
                "diverse perspectives are valued", "collaboration is the norm",
                "individual strengths are leveraged", "continuous learning is encouraged"
            ],
            "leadership_emphasis": [
                "results", "people development", "innovation",
                "process improvement", "strategic thinking", "relationship building"
            ],
            "leadership_balance": [
                "accountability", "work-life balance", "individual autonomy",
                "team cohesion", "attention to detail", "big-picture thinking"
            ]
        },
        'time_management': {
            "time_action": [
                "prioritize tasks", "create schedules", "set clear deadlines",
                "eliminate distractions", "delegate effectively", "chunk similar activities",
                "use time-blocking techniques", "maintain to-do lists", "track time usage"
            ],
            "time_outcome": [
                "meet deadlines consistently", "maximize productivity", "reduce stress",
                "balance multiple responsibilities", "achieve work-life balance",
                "increase focus time", "improve work quality", "create buffer for emergencies"
            ],
            "time_skill": [
                "managing multiple deadlines", "planning my workday", "estimating task duration",
                "tracking my time usage", "adjusting priorities", "saying no when necessary",
                "recognizing time-wasting activities", "maintaining focus", "batching similar tasks"
            ],
            "time_context": [
                "working under pressure", "handling multiple projects", "faced with interruptions",
                "deadlines change", "new tasks are assigned", "priorities shift",
                "unexpected issues arise", "collaborating across time zones", "during peak workloads"
            ],
            "time_frequency": [
                "consistently", "regularly", "daily", "at the start of each week", "proactively",
                "at the end of each day", "between major tasks", "during low-energy periods"
            ],
            "time_practice": [
                "use to-do lists", "break large tasks into smaller steps", "set specific goals",
                "block time for focused work", "review progress regularly",
                "eliminate low-value activities", "batch similar tasks", "schedule buffer time"
            ],
            "time_challenge": [
                "unexpected interruptions", "shifting priorities", "tight deadlines",
                "multiple competing tasks", "complex projects", "email overload",
                "meeting-heavy days", "scope creep", "procrastination tendencies"
            ],
            "time_strategy": [
                "reassess priorities", "communicate timeline changes", "find efficient shortcuts",
                "seek additional resources", "eliminate non-essential tasks",
                "delegate appropriate tasks", "extend deadlines when necessary", "work in focused sprints"
            ],
            "time_quality": [
                "a well-organized schedule", "clear priorities", "focus during work hours",
                "reasonable workload", "effective time allocation", "protected focus time",
                "work-life boundaries", "energy for important tasks", "buffer for emergencies"
            ],
            "time_method": [
                "planning ahead", "using productivity tools", "setting boundaries",
                "regularly reviewing commitments", "avoiding multitasking",
                "implementing time-boxing", "creating routines", "tracking time usage"
            ],
            "time_ability": [
                "meet tight deadlines", "handle multiple priorities", "stay focused for extended periods",
                "estimate task durations accurately", "adapt to changing schedules",
                "identify time-wasting activities", "complete important tasks first"
            ],
            "negative_time_outcome": [
                "becoming overwhelmed", "sacrificing quality", "working excessive hours",
                "missing important details", "feeling stressed", "neglecting self-care",
                "delaying important decisions", "creating bottlenecks for others"
            ],
            "time_situation": [
                "busy periods", "multiple deadlines", "long-term projects",
                "unexpected work", "recurring tasks", "meetings and interruptions",
                "email management", "decision-making", "planning processes"
            ],
            "time_technique": [
                "prioritization matrices", "time-blocking", "the Pomodoro technique",
                "delegation", "saying no to low-value requests", "batching similar tasks",
                "using templates for recurring work", "setting clear boundaries"
            ],
            "time_habit": [
                "plan my priorities", "review my calendar", "check my progress",
                "clear my inbox", "update my to-do list", "reflect on accomplishments",
                "prepare for upcoming tasks", "eliminate distractions"
            ],
            "workday_period": [
                "beginning", "end", "most productive hours", "before meetings",
                "after lunch", "between focused work sessions", "during commute time"
            ],
            "deadline_scenario": [
                "multiple simultaneous deadlines", "unexpected urgent requests",
                "shortened timelines", "scope increases without timeline changes",
                "dependent tasks delayed by others", "resource limitations"
            ],
            "deadline_response": [
                "renegotiate timelines when appropriate", "focus on the most critical deliverables first",
                "seek additional resources", "adjust my work schedule temporarily",
                "communicate progress transparently", "simplify deliverables when possible"
            ],
            "consistently": [
                "proactively", "routinely", "methodically", "deliberately", "systematically"
            ],
            "time_waster": [
                "unnecessary meetings", "constant email checking", "multitasking",
                "social media distractions", "perfectionism", "unclear priorities",
                "disorganized workspaces", "unproductive conversations"
            ],
            "prevention_method": [
                "establishing clear boundaries", "implementing specific routines",
                "using productivity tools", "blocking distracting websites",
                "setting clear agendas for meetings", "batching similar activities"
            ]
        },
        'analytical': {
            "analysis_object": [
                "data", "problems", "complex situations", "project requirements",
                "market trends", "customer feedback", "performance metrics",
                "research findings", "competitive information", "process inefficiencies"
            ],
            "analysis_action": [
                "identify patterns", "draw logical conclusions", "evaluate options",
                "determine root causes", "make evidence-based decisions",
                "spot anomalies", "recognize relationships", "quantify impacts"
            ],
            "analysis_modifier": [
                "systematically", "objectively", "thoroughly", "efficiently",
                "with attention to detail", "considering multiple perspectives",
                "without bias", "using established frameworks", "holistically"
            ],
            "analysis_skill": [
                "breaking down complex problems", "interpreting data", "identifying connections",
                "evaluating evidence", "distinguishing facts from assumptions",
                "recognizing patterns", "quantifying variables", "testing hypotheses"
            ],
            "analysis_purpose": [
                "find optimal solutions", "make informed decisions", "identify improvement opportunities",
                "predict outcomes", "mitigate risks", "validate hypotheses",
                "understand root causes", "establish benchmarks", "develop strategies"
            ],
            "analysis_frequency": [
                "consistently", "methodically", "routinely", "deliberately", "systematically"
            ],
            "analysis_practice": [
                "gather all relevant information", "consider alternative explanations",
                "test assumptions", "evaluate the reliability of sources",
                "separate facts from opinions", "use structured problem-solving methods",
                "document my reasoning", "seek disconfirming evidence"
            ],
            "analysis_input": [
                "conflicting information", "incomplete data", "complex problems",
                "ambiguous requirements", "multiple variables", "uncertain conditions",
                "stakeholder disagreements", "contradictory evidence", "legacy assumptions"
            ],
            "analysis_process": [
                "identify key factors", "evaluate different perspectives", "apply logical frameworks",
                "test multiple hypotheses", "prioritize critical information",
                "map interdependencies", "calculate probabilities", "validate data quality"
            ],
            "analysis_output": [
                "making recommendations", "drawing conclusions", "implementing solutions",
                "communicating findings", "making decisions", "creating action plans",
                "developing predictive models", "establishing measurement criteria"
            ],
            "analysis_method": [
                "use data visualization", "apply statistical methods", "conduct root cause analysis",
                "create decision matrices", "perform scenario analysis", "develop comparative frameworks",
                "construct logic models", "implement structured evaluation methods"
            ],
            "analysis_goal": [
                "solve complex problems", "identify improvement opportunities", "optimize processes",
                "make data-driven decisions", "predict future trends", "mitigate risks",
                "validate hypotheses", "eliminate inefficiencies", "support strategic goals"
            ],
            "problem_type": [
                "data-intensive", "multi-variable", "ambiguous", "technical",
                "resource allocation", "process optimization", "strategic",
                "time-sensitive", "interdependent"
            ],
            "analytical_approach": [
                "breaking the problem into components", "identifying underlying patterns",
                "quantifying variables when possible", "applying structured frameworks",
                "using both deductive and inductive reasoning", "testing multiple hypotheses"
            ],
            "data_pattern": [
                "unexpected anomalies", "conflicting trends", "statistical outliers",
                "correlation between variables", "cyclical patterns", "significant gaps",
                "skewed distributions", "contradictory indicators"
            ],
            "data_response": [
                "investigate potential causes", "validate data accuracy first",
                "perform additional analysis", "consider alternative interpretations",
                "consult subject matter experts", "look for contextual factors"
            ],
            "evaluation_action": [
                "compare", "assess", "weigh", "measure", "test", "validate"
            ],
            "evaluation_subject": [
                "options", "approaches", "hypotheses", "data sources",
                "interpretations", "methodologies", "assumptions"
            ],
            "decision_action": [
                "drawing conclusions", "making recommendations", "finalizing a course of action",
                "committing resources", "implementing solutions", "communicating results"
            ],
            "analytical_strength": [
                "attention to detail", "pattern recognition", "logical reasoning",
                "quantitative analysis", "critical thinking", "systems perspective",
                "objectivity", "methodical approach"
            ],
            "analytical_challenge": [
                "information overload", "ambiguous problems", "tight deadlines",
                "confirmation bias", "incomplete data", "complex interdependencies",
                "changing requirements", "qualitative variables"
            ],
            "complex_action": [
                "distill", "synthesize", "translate", "organize", "structure", "visualize"
            ],
            "simplification_outcome": [
                "communicate key insights", "facilitate decision making",
                "enable stakeholder understanding", "identify action priorities",
                "clarify complex relationships", "highlight critical factors"
            ]
        }
    }

    return components.get(category, {})


def fill_template_with_components(template, components):
    """Fill a template with randomly selected components"""
    filled_template = template

    # Find all the placeholders in this template
    placeholders = re.findall(r'{([^}]+)}', template)

    # Replace each placeholder with a random choice from the corresponding component list
    for placeholder in placeholders:
        if placeholder in components:
            replacement = random.choice(components[placeholder])
            filled_template = filled_template.replace(f"{{{placeholder}}}", replacement)

    return filled_template


def get_soft_skills_questions(category, count_needed=100, use_llm=False):
    """
    Get soft skills questions through both scraping and generation

    Args:
        category (str): One of 'communication', 'leadership', 'time_management', 'analytical'
        count_needed (int): Total number of questions needed
        use_llm (bool): Whether to use Hugging Face LLM (True) or templates (False)

    Returns:
        pd.DataFrame: DataFrame containing questions from both sources
    """
    print(f"Getting {count_needed} questions for {category} skills...")

    # First try to get questions through scraping
    scraped_df = scrape_soft_skills_questions(category)

    # Calculate how many more questions we need after scraping
    scraped_count = len(scraped_df)
    generated_count_needed = max(0, count_needed - scraped_count)

    # If we need more questions, generate them
    if generated_count_needed > 0:
        print(f"Need {generated_count_needed} more questions. Generating them...")

        if use_llm:
            generated_df = generate_questions_with_huggingface(category, n=generated_count_needed)
        else:
            generated_df = generate_questions_with_templates(category, n=generated_count_needed)

        # Combine scraped and generated questions
        if not scraped_df.empty:
            combined_df = pd.concat([scraped_df, generated_df], ignore_index=True)
        else:
            combined_df = generated_df
    else:
        print("Enough questions were scraped. No need for generation.")
        combined_df = scraped_df

    # Ensure we have exactly count_needed questions (or as close as possible)
    if len(combined_df) > count_needed:
        combined_df = combined_df.sample(count_needed, random_state=42).reset_index(drop=True)

    print(f"Final question count for {category}: {len(combined_df)}")
    return combined_df





## 6.Interactive Assessment UI

This section creates an interactive HTML-based assessment form that allows users to
respond to questions on a 5-point scale and receive immediate feedback on their skills.

In [22]:
# Function to create an interactive assessment interface
def create_assessment(questions_df, category):
    """Create an HTML assessment form for the given category and questions"""

    # Create a random ID for this assessment
    assessment_id = f"{category}_{random.randint(1000, 9999)}"

    # Start building the HTML form
    html = f"""
    <div class="assessment-container" style="max-width: 800px; margin: 0 auto; font-family: Arial, sans-serif;">
        <h2 style="color: #000; text-align: center;">{category.replace('_', ' ').title()} Skills Assessment</h2>
        <p style="color: #000;">Rate yourself on each statement from 1 (Strongly Disagree) to 5 (Strongly Agree).</p>
        <form id="{assessment_id}">
            <table style="width: 100%; border-collapse: collapse;">
                <tr style="background-color: #e6f2ff;">
                    <th style="padding: 12px; text-align: left; border-bottom: 1px solid #ddd; color: #000;">Statement</th>
                    <th style="padding: 12px; text-align: center; border-bottom: 1px solid #ddd; color: #000;">1<br><small style="color: #000;">Strongly<br>Disagree</small></th>
                    <th style="padding: 12px; text-align: center; border-bottom: 1px solid #ddd; color: #000;">2<br><small style="color: #000;">Disagree</small></th>
                    <th style="padding: 12px; text-align: center; border-bottom: 1px solid #ddd; color: #000;">3<br><small style="color: #000;">Neutral</small></th>
                    <th style="padding: 12px; text-align: center; border-bottom: 1px solid #ddd; color: #000;">4<br><small style="color: #000;">Agree</small></th>
                    <th style="padding: 12px; text-align: center; border-bottom: 1px solid #ddd; color: #000;">5<br><small style="color: #000;">Strongly<br>Agree</small></th>
                </tr>
    """

    # Add each question as a row in the table
    for i, row in questions_df.iterrows():
        question_id = f"q_{i}"
        question_text = row['question_text']

        html += f"""
        <tr style="border-bottom: 1px solid #ddd;">
            <td style="padding: 12px;">{question_text}</td>
        """

        # Add radio buttons for each rating option
        for rating in range(1, 6):
            html += f"""
            <td style="text-align: center;">
                <input type="radio" name="{question_id}" value="{rating}" required>
            </td>
            """

        html += "</tr>"

    # Add submit button and closing tags
    html += f"""
            </table>
            <div style="margin-top: 40px; margin-bottom: 60px; text-align: center; background-color: white; padding: 15px; border-radius: 8px;">
                <button type="button" id="submit-btn-{assessment_id}" style="background-color: #4CAF50; color: white; padding: 15px 30px; border: 2px solid black; border-radius: 4px; cursor: pointer; font-size: 18px; font-weight: bold; box-shadow: 0 4px 8px rgba(0,0,0,0.2);">SUBMIT ASSESSMENT</button>
            </div>
        </form>
        <div id="results-{assessment_id}" style="margin-top: 20px; display: none;">
            <h3>Your Results</h3>
            <div id="score-{assessment_id}"></div>
            <div id="feedback-{assessment_id}"></div>
        </div>
        <script>
        document.getElementById("submit-btn-{assessment_id}").addEventListener("click", function() {{
            // Simple scoring logic
            let form = document.getElementById("{assessment_id}");
            let total = 0;
            let answered = 0;
            let questions = {len(questions_df)};

            for (let i = 0; i < questions; i++) {{
                let name = "q_" + i;
                let selected = document.querySelector('input[name="' + name + '"]:checked');
                if (selected) {{
                    total += parseInt(selected.value);
                    answered++;
                }}
            }}

            if (answered < questions) {{
                alert("Please answer all questions before submitting.");
                return;
            }}

            let average = total / questions;
            let percentage = (average / 5) * 100;

            // Display results
            document.getElementById("score-{assessment_id}").innerHTML = '<p>Your average score: <strong>' + average.toFixed(1) + '/5</strong> (' + percentage.toFixed(1) + '%)</p>';

            // Generate feedback based on score
            let feedback = '';
            if (average >= 4.5) {{
                feedback = 'Outstanding! You demonstrate excellent {category.replace('_', ' ')} skills.';
            }} else if (average >= 3.5) {{
                feedback = 'Good job! You have solid {category.replace('_', ' ')} skills with some room for improvement.';
            }} else if (average >= 2.5) {{
                feedback = 'You have moderate {category.replace('_', ' ')} skills. Consider focusing on development in this area.';
            }} else {{
                feedback = 'This appears to be an area for growth. Consider seeking resources to develop your {category.replace('_', ' ')} skills.';
            }}

            document.getElementById("feedback-{assessment_id}").innerHTML = '<p>' + feedback + '</p>';
            document.getElementById("results-{assessment_id}").style.display = "block";
        }});
        </script>
    </div>
    """

    return HTML(html)

# Main execution section
if __name__ == "__main__":
    # Get questions for all categories
    all_questions = {}
    for category in ['communication', 'leadership', 'time_management', 'analytical']:
        print(f"\n--- Processing {category} skills questions ---\n")

        use_llm = True

        # Get questions for this category
        all_questions[category] = get_soft_skills_questions(
            category=category,
            count_needed=50, # Adjust as needed
            use_llm=use_llm
        )

    # Save all questions to CSV files
    for category, df in all_questions.items():
        file_path = f"./data/{category}_questions.csv"
        df.to_csv(file_path, index=False)
        print(f"Saved {len(df)} {category} questions to {file_path}")

    # Save combined questions
    combined_df = pd.concat([df for df in all_questions.values()], ignore_index=True)
    combined_df.to_csv("./data/all_soft_skills_questions.csv", index=False)
    print(f"Saved total of {len(combined_df)} questions to ./data/all_soft_skills_questions.csv")




--- Processing communication skills questions ---

Getting 50 questions for communication skills...
Scraping 7 questions from https://hr-survey.com/360_Survey_Example_1.htm...
Found question: Takes on challenging questions and provides instan...
Found question: Coaches others on their written communication skil...
Found question: Addresses issues of key importance to stakeholders...
Found question: Communicates goals of project, resources required,...
Found question: Articulates ideas and emotions clearly to others....
Found question: Gives clear and convincing presentations....
Found question: Takes on challenging questions and provides instan...
Found question: Coaches others on their written communication skil...
Found question: Addresses issues of key importance to stakeholders...
Found question: Communicates goals of project, resources required,...
Found question: Articulates ideas and emotions clearly to others....
Found question: Gives clear and convincing presentations....
Scr

## 7.Main Execution

This section runs the entire assessment generation pipeline:
1. Collects questions for each skill category through scraping and generation
2. Saves the questions to CSV files for future use
3. Creates a mixed assessment with questions from all categories
4. Displays the assessment for immediate use

In [23]:
# Demo: Create and display a mixed assessment with questions from all categories
print("\nCreating a mixed soft skills assessment...")

# Sample questions from each category
mixed_questions = pd.DataFrame()
questions_per_category = 5 # Number of questions to include from each category

for category in ['communication', 'leadership', 'time_management', 'analytical']:
    # Get a sample of questions from this category
    if category in all_questions and not all_questions[category].empty:
        category_questions = all_questions[category].sample(min(questions_per_category, len(all_questions[category])))

        # Add to our mixed questions dataframe
        mixed_questions = pd.concat([mixed_questions, category_questions], ignore_index=True)

# Shuffle the questions to mix up the categories
mixed_questions = mixed_questions.sample(frac=1).reset_index(drop=True)

# Display information about the mixed assessment
print(f"Created mixed assessment with {len(mixed_questions)} total questions:")
for category in ['communication', 'leadership', 'time_management', 'analytical']:
    count = len(mixed_questions[mixed_questions['category'] == category])
    print(f"- {category}: {count} questions")

# Create and display the mixed assessment
display(create_assessment(mixed_questions, "comprehensive_soft_skills"))


Creating a mixed soft skills assessment...
Created mixed assessment with 20 total questions:
- communication: 5 questions
- leadership: 5 questions
- time_management: 5 questions
- analytical: 5 questions


Statement,1 Strongly Disagree,2 Disagree,3 Neutral,4 Agree,5 Strongly Agree
My leadership style emphasizes results while maintaining big-picture thinking.,,,,,
I effectively provide feedback when during performance reviews.,,,,,
I can handle multiple priorities without sacrificing quality.,,,,,
I proactively build consensus to ensure high-quality outcomes.,,,,,
I convey messages cross-functional teams in a persuasive way.,,,,,
I maintain to-do lists to maximize productivity.,,,,,
My quantitative analysis helps me overcome ambiguous problems.,,,,,
I direct team members who disagree with team direction.,,,,,
My approach to busy periods involves saying no to low-value requests.,,,,,
I effectively estimating task duration when deadlines change.,,,,,
