In [11]:
# **Preprompt for Code Understanding and Modification**
#
# You are being provided with a Python script that processes conversation transcripts to extract and analyze specific information related to Socratic methods. Your task is to understand the structure and functionality of the code and modify it as needed. This may include improving performance, adding new features, or adapting it to different requirements.
#
# Please carefully review the code provided below, paying special attention to the following key aspects:
#
# ---
#
# ### **Functionality Overview:**
#
# #### **1. Conversation Parsing:**
#
# - **Transcript Reading:**
#   - The script reads a chat transcript from a file specified by the user.
#   - It checks if the conversation starts with a specific Socratic system prompt using the `check_socratic_system_prompt` function.
#
# - **Segmentation into Pairs:**
#   - The conversation is segmented into pairs of user prompts and assistant responses.
#   - It preserves the original paragraph structure of the conversation.
#
# #### **2. Response Segmentation:**
#
# - **Assistant Response Categories:**
#   - Assistant responses are further segmented into four specific categories:
#     - **Selected Principle(s)**
#     - **Socratic Reformulation**
#     - **Self-Query and Answer**
#     - **Follow-Up Questions**
#   - The segmentation preserves paragraph structure within each category.
#
# #### **3. Socratic Methods Identification:**
#
# - **Method Sequence Extraction:**
#   - The script identifies sequences of Socratic methods within texts using a predefined list.
#   - It handles various formats and connectors between methods (e.g., "→", "-&gt;", ",", "/", "and").
#
# - **Application Areas:**
#   - Socratic methods are identified in:
#     - Selected principles
#     - Socratic reformulations
#     - Follow-up questions
#
# #### **4. Similarity Computation:**
#
# - **Purpose:**
#   - Computes the similarity between user prompts and the assistant's previous follow-up questions.
#
# - **Methodology:**
#   - Uses TF-IDF vectorization and cosine similarity for comparison.
#   - Incorporates text preprocessing steps such as lowercasing, punctuation removal, tokenization, and stopword removal.
#
# #### **5. Processing and Exporting Results:**
#
# - **Data Structuring:**
#   - Processes conversation data to build structured representations of each conversation turn.
#   - Associates extracted Socratic methods and similarity scores with each turn.
#
# - **Result Exporting:**
#   - Exports results in either a verbose or concise format.
#     - **Verbose:** Includes detailed conversation data and comparisons.
#     - **Concise:** Displays key information such as Socratic methods and similarity scores.
#
# ---
#
# ### **Key Functions:**
#
# #### **1. `parse_assistant_response(response_text)`:**
#
# - **Purpose:**
#   - Segments an assistant's response into the four predefined categories.
# - **Features:**
#   - Preserves paragraph structure within each category.
#   - Returns a dictionary with each category as keys.
#
# #### **2. `segment_conversation_preserve_paragraphs(file_path)`:**
#
# - **Purpose:**
#   - Reads the conversation transcript from a file and segments it into user-assistant pairs.
# - **Features:**
#   - Preserves the original paragraph structure.
#   - Calls `parse_assistant_response` for each assistant response.
#
# #### **3. `identify_socratic_sequence(text, socratic_methods)`:**
#
# - **Purpose:**
#   - Identifies the sequence of Socratic methods in a given text.
# - **Features:**
#   - Handles various separators and connectors.
#   - Performs case-insensitive matching.
#   - Filters valid Socratic methods from the provided list.
#
# #### **4. `parse_followup_questions(followup_text)`:**
#
# - **Purpose:**
#   - Parses the "Follow-Up Questions" section into individual questions.
# - **Features:**
#   - Extracts question numbers, associated Socratic methods, and question texts.
#   - Handles various formatting scenarios.
#
# #### **5. `process_conversation(paired_conversation, socratic_methods)`:**
#
# - **Purpose:**
#   - Processes the conversation pairs and builds data structures.
# - **Features:**
#   - Extracts and associates assistant methods and segmented responses.
#   - Prepares data for similarity computation.
#
# #### **6. `compute_similarities(conversation_data)`:**
#
# - **Purpose:**
#   - Computes similarities between user prompts and assistant's previous follow-up questions.
# - **Features:**
#   - Utilizes text preprocessing and cosine similarity calculation.
#   - Associates highest similarity scores with conversation turns.
#
# #### **7. `compute_similarity(text1, text2)`:**
#
# - **Purpose:**
#   - Computes the cosine similarity between two text strings after preprocessing.
# - **Features:**
#   - Uses NLTK for text processing and scikit-learn for vectorization and similarity computation.
#
# #### **8. `export_results(conversation_data)`:**
#
# - **Purpose:**
#   - Exports the results by printing specified information.
# - **Features:**
#   - Displays Socratic methods and similarity scores.
#   - Provides a concise summary of each conversation turn.
#
# #### **9. `export_results_verbose(conversation_data)`:**
#
# - **Purpose:**
#   - Exports detailed conversation data and comparison results.
# - **Features:**
#   - Includes all segmented assistant responses and associated information.
#
# #### **10. `check_socratic_system_prompt(file_path)`:**
#
# - **Purpose:**
#   - Checks if the conversation file starts with the Socratic system prompt.
# - **Features:**
#   - Searches for specific keywords at the beginning of the file.
#
# ---
#
# ### **Points to Consider:**
#
# #### **1. Text Preprocessing:**
#
# - **Enhancements:**
#   - Ensure text preprocessing in `compute_similarity` is thorough and efficient.
#   - Consider additional steps (e.g., lemmatization) to improve similarity accuracy.
#
# #### **2. Regex Patterns:**
#
# - **Review:**
#   - Refine regular expressions used in parsing functions.
#   - Ensure they handle various formats and connectors correctly.
#   - Pay attention to edge cases and inconsistencies in conversation transcripts.
#
# #### **3. Error Handling:**
#
# - **Robustness:**
#   - Verify that functions handle unexpected input gracefully.
#   - Implement error checks or exception handling where necessary.
#   - Prevent index errors and ensure lists are not modified unexpectedly (e.g., cautious use of `del` statements).
#
# #### **4. Extensibility:**
#
# - **Adaptability:**
#   - Consider adapting the script for additional categories or conversational structures.
#   - Modularize code for reusability and scalability.
#   - Facilitate easy updates to the list of Socratic methods.
#
# #### **5. Code Organization and Readability:**
#
# - **Best Practices:**
#   - Refactor code to enhance readability and maintainability.
#   - Use descriptive variable and function names.
#   - Ensure consistent coding style and formatting.
#
# ---
#
# ### **Modification Guidelines:**
#
# #### **1. Maintain Existing Functionality:**
#
# - **Integrity:**
#   - Do not break current behavior of the script.
#   - Preserve data parsing and analysis integrity.
#
# #### **2. Improve Readability and Efficiency:**
#
# - **Optimization:**
#   - Refactor code to enhance clarity.
#   - Optimize performance without sacrificing readability.
#   - Reduce redundancy and simplify complex logic where possible.
#
# #### **3. Document Changes:**
#
# - **Clarity:**
#   - Include comments explaining modifications or additions.
#   - Update docstrings to reflect any changes in function behavior or parameters.
#   - Maintain thorough documentation for future reference.
#
# #### **4. Testing:**
#
# - **Validation:**
#   - Test the script with various conversation transcripts.
#   - Ensure correct identification of Socratic methods.
#   - Verify that similarity scores are accurate and meaningful.
#
# #### **5. Compatibility and Dependencies:**
#
# - **Reliability:**
#   - Ensure all dependencies (e.g., NLTK, scikit-learn) are properly handled.
#   - Check compatibility with different Python versions.
#   - Provide clear instructions for setting up the environment if necessary.
#
# ---
#
# ### **Socratic Methods List:**
#
# The script uses the following list of Socratic methods for identification:
#
# - **Definition**
# - **Generalization**
# - **Induction**
# - **Elenchus**
# - **Hypothesis Elimination**
# - **Maieutics**
# - **Dialectic**
# - **Recollection**
# - **Analogy**
# - **Irony**
#
# ---
#
# **Note:** Ensure that you thoroughly understand each component of the script before making modifications. Pay close attention to how functions interact and how data flows through the program. Your goal is to enhance the script while maintaining its core functionality and improving its overall quality.
#
# --- End of Preprompt ---

In [12]:
import re
import nltk
nltk.download('punkt')
nltk.download('stopwords')

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def compute_similarity(text1, text2):
    """
    Computes similarity between two texts using TF-IDF vectorization and cosine similarity.

    Args:
        text1 (str): First text string.
        text2 (str): Second text string.

    Returns:
        float: Cosine similarity score between 0 and 1.
    """
    import string
    from nltk.corpus import stopwords

    # Ensure NLTK stopwords are downloaded
    nltk.download('stopwords', quiet=True)
    stop_words = set(stopwords.words('english'))

    def preprocess(text):
        # Lowercase
        text = text.lower()
        # Remove punctuation
        text = text.translate(str.maketrans('', '', string.punctuation))
        # Tokenize
        words = nltk.word_tokenize(text)
        # Remove stopwords
        words = [word for word in words if word not in stop_words]
        # Rejoin into a string
        return ' '.join(words)

    # Preprocess both texts
    text1 = preprocess(text1)
    text2 = preprocess(text2)

    if not text1 or not text2:
        return 0.0  # If either text is empty after preprocessing

    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([text1, text2])

    cosine_sim = cosine_similarity(vectors)
    return cosine_sim[0][1]

def parse_assistant_response(response_text):
    """
    Further segments an assistant's response into four specific categories:
      - Selected Principle(s)
      - Socratic Reformulation
      - Self-Query and Answer
      - Follow-Up Questions

    Preserves paragraph structure for each category.
    Returns a dictionary where each key corresponds to one of these categories,
    and the value is a string containing that segment of text.

    If a category is not found, its value will be an empty string.
    """

    categories = {
        "Selected Principle(s)": [],
        "Socratic Reformulation": [],
        "Self-Query and Answer": [],
        "Follow-Up Questions": []
    }
    current_category = None

    # Split the entire assistant response into lines
    lines = response_text.split("\n")

    for line in lines:
        stripped_line = line.strip()

        # Check for possible category headers. 
        # Use "in" rather than "startswith" to handle lines like "Argo: Selected Principle(s): ...".
        if "Selected Principle(s):" in stripped_line:
            current_category = "Selected Principle(s)"
            categories[current_category].append(stripped_line)
        elif "Socratic Reformulation:" in stripped_line:
            current_category = "Socratic Reformulation"
            categories[current_category].append(stripped_line)
        elif "Self-Query and Answer:" in stripped_line:
            current_category = "Self-Query and Answer"
            categories[current_category].append(stripped_line)
        elif "Follow-Up Questions" in stripped_line:
            current_category = "Follow-Up Questions"
            categories[current_category].append(stripped_line)
        else:
            # If we are currently in one of the known categories, append the line there.
            if current_category:
                categories[current_category].append(line)

    # Convert each list of lines back into a single string (preserving original paragraph structure)
    segmented_response = {}
    for cat, cat_lines in categories.items():
        segmented_response[cat] = "\n".join(cat_lines).strip()

    return segmented_response


def segment_conversation_preserve_paragraphs(file_path):
    """
    Reads a chat transcript from a file and segments it into
    pairs of (user_prompt, assistant_prompt, segmented_assistant_response).
    The function assumes:
      - A user prompt always starts with a line beginning with "User:" or "You:".
      - An assistant prompt always starts with a line beginning with "Argo:" or "Assistant:".
      - Each user prompt is matched to the next assistant prompt in chronological order.

    This version preserves the original paragraph structure by joining lines
    with newline characters, rather than spaces.

    Additionally, for each assistant_prompt, we further segment the text based on:
      - Selected Principle(s)
      - Socratic Reformulation
      - Self-Query and Answer
      - Follow-Up Questions

    Returns a list of tuples:
      [
        (
          "User Prompt i", user_text,
          "Assistant Prompt i", assistant_text,
          {
            "Selected Principle(s)": ...,
            "Socratic Reformulation": ...,
            "Self-Query and Answer": ...,
            "Follow-Up Questions": ...
          }
        ),
        ...
      ]
    """

    conversation_pairs = []
    current_user_text = []
    current_assistant_text = []
    reading_user = False
    reading_assistant = False

    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            stripped_line = line.rstrip("\n")

            # Check if the line begins with a user marker
            if stripped_line.startswith("User:") or stripped_line.startswith("You:"):
                # If we were reading assistant text, that means we ended an assistant block
                if reading_assistant and current_assistant_text and current_user_text:
                    # Save the finished pair
                    conversation_pairs.append(
                        ("\n".join(current_user_text).strip(), "\n".join(current_assistant_text).strip())
                    )
                    current_assistant_text = []

                # Start reading a new user block
                reading_user = True
                reading_assistant = False
                current_user_text = [stripped_line]

            # Check if the line begins with an assistant marker
            elif stripped_line.startswith("Argo:") or stripped_line.startswith("Assistant:"):
                # If we were reading user text, that means we ended a user block
                if reading_user and current_user_text:
                    reading_user = False
                    reading_assistant = True
                    current_assistant_text = [stripped_line]
                else:
                    # If we are already reading assistant text, continue appending
                    reading_assistant = True
                    current_assistant_text.append(stripped_line)

            else:
                # Continue reading current block
                if reading_user:
                    current_user_text.append(stripped_line)
                elif reading_assistant:
                    current_assistant_text.append(stripped_line)

        # Handle the last pair if the file ends on an assistant block
        if current_user_text and current_assistant_text:
            conversation_pairs.append(
                ("\n".join(current_user_text).strip(), "\n".join(current_assistant_text).strip())
            )

    # Assign sequential numbering and parse each assistant response into its categories
    numbered_pairs = []
    for i, (user, assistant) in enumerate(conversation_pairs, start=1):
        segmented_assistant_response = parse_assistant_response(assistant)
        numbered_pairs.append(
            (f"User Prompt {i}",
             user,
             f"Assistant Prompt {i}",
             assistant,
             segmented_assistant_response)
        )

    return numbered_pairs

def parse_followup_questions(followup_text):
    """
    Parses the 'Follow-Up Questions' section into a list of dictionaries,
    each containing the question number, Socratic methods, and question text.

    Args:
        followup_text (str): The text containing all follow-up questions.

    Returns:
        A list of dictionaries with keys:
            - 'number': The question number.
            - 'methods': A list of Socratic methods identified in the header.
            - 'text': The question text without the header.
    """
    import re

    lines = followup_text.strip().split("\n")

    # Pattern to match lines starting with a number followed by a period
    question_start_pattern = re.compile(r"^(\d+)\.\s*(.*)")

    questions = []
    current_question = []
    current_methods = []
    question_number = None

    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip empty lines

        match = question_start_pattern.match(line)
        if match:
            # Save the previous question if it exists
            if current_question:
                questions.append({
                    'number': question_number,
                    'methods': current_methods,
                    'text': ' '.join(current_question).strip()
                })
                current_question = []
                current_methods = []

            question_number = match.group(1)
            header = match.group(2)

            # Identify Socratic methods in the header
            methods = identify_socratic_sequence(header, socratic_methods)
            current_methods = methods

            # Remove the Socratic methods and numbering from the question text
            question_text = header.split(':', 1)[-1].strip() if ':' in header else ''
            if question_text:
                current_question.append(question_text)
        else:
            current_question.append(line)

    # Add the last question
    if current_question:
        questions.append({
            'number': question_number,
            'methods': current_methods,
            'text': ' '.join(current_question).strip()
        })
    del questions[0]
    return questions

def identify_socratic_sequence(text, socratic_methods):
    """
    Identifies the sequence of Socratic methods in a given text using the provided list of keywords.

    Args:
        text (str): The text from which to extract the Socratic method sequence.
        socratic_methods (list): A list of Socratic method keywords.

    Returns:
        A list of Socratic methods identified in the text in the order they appear.
    """
    import re

    methods = []

    # Remove any leading numbering or identifiers (e.g., '1. ', 'Argo: ')
    sequence_part = re.sub(r'^\s*(\d+\.\s*|\w+:\s*)?', '', text)

    # Extract the part before the colon, if present
    sequence_part = sequence_part.split(':')[0]

    # Replace ' and ' with commas
    sequence_part = sequence_part.replace(' and ', ', ')

    # Split the sequence based on connectors
    methods_raw = re.split(r'\s*(?:→|->|,|/|and)\s*', sequence_part)

    # Filter out valid Socratic methods (case-insensitive)
    for method in methods_raw:
        method_clean = method.strip()
        for sm in socratic_methods:
            if method_clean.lower() == sm.lower():
                methods.append(sm)
                break

    return methods
    

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\yunkai.sun\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\yunkai.sun\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [13]:
def process_conversation(paired_conversation, socratic_methods):
    """
    Processes the conversation and builds data structures.

    Args:
        paired_conversation (list): The list of conversation pairs.
        socratic_methods (list): List of Socratic method keywords.

    Returns:
        list: A list of conversation data dictionaries.
    """
    conversation_data = []

    for idx, item in enumerate(paired_conversation):
        user_label, user_text, asst_label, asst_text, segmented = item

        # Remove 'You: ' from user_text for similarity comparison
        clean_user_text = re.sub(r'^You:\s*', '', user_text).strip()

        # Process assistant's segmented response
        assistant_methods = {}

        for cat, seg_text in segmented.items():
            if cat in ["Selected Principle(s)", "Follow-Up Questions", "Socratic Reformulation"] and seg_text:
                if cat == "Selected Principle(s)":
                    seg_text_clean = re.sub(r'^.*Selected Principle\(s\):\s*', '', seg_text)
                    methods = identify_socratic_sequence(seg_text_clean, socratic_methods)
                    assistant_methods['Selected Principle(s)'] = methods
                elif cat == "Follow-Up Questions":
                    followup_questions = parse_followup_questions(seg_text)
                    assistant_methods['Follow-Up Questions'] = followup_questions
                elif cat == "Socratic Reformulation":
                    methods = identify_socratic_sequence(seg_text, socratic_methods)
                    assistant_methods['Socratic Reformulation'] = methods

        # Build the data for this conversation turn
        conversation_data.append({
            'idx': idx,
            'user_label': user_label,
            'user_text': user_text,
            'clean_user_text': clean_user_text,
            'asst_label': asst_label,
            'asst_text': asst_text,
            'segmented': segmented,
            'assistant_methods': assistant_methods,
        })

    return conversation_data

def compute_similarities(conversation_data):
    """
    Computes similarities between user prompts and assistant's previous follow-up questions.

    Args:
        conversation_data (list): The list of conversation data dictionaries.
    """
    for i, item in enumerate(conversation_data):
        # For idx 0 (first user prompt), there is no previous assistant follow-up to compare
        if i == 0:
            item['similarity'] = None
            continue

        current_user_label = item['user_label']
        current_user_text = item['clean_user_text']

        # Aggregate assistant follow-up questions up to Assistant Prompt N-1
        all_previous_followups = []
        for prev_item in conversation_data[:i]:
            assistant_followups = prev_item['assistant_methods'].get('Follow-Up Questions', [])
            all_previous_followups.extend(assistant_followups)

        # Compute similarity between current user prompt and each follow-up question
        similarities = []
        for fq in all_previous_followups:
            fq_text = fq['text']
            sim_score = compute_similarity(current_user_text, fq_text)
            similarities.append((fq, sim_score))

        # Find the highest similarity score
        if similarities:
            similarities.sort(key=lambda x: x[1], reverse=True)
            highest_score = similarities[0][1]
            most_similar_question_dict = similarities[0][0]
            most_similar_question_text = most_similar_question_dict['text']
            associated_methods = most_similar_question_dict['methods']

            # Store the similarity data
            item['similarity'] = {
                'next_user_label': current_user_label,
                'most_similar_question_text': most_similar_question_text,
                'associated_methods': associated_methods,
                'similarity_score': highest_score
            }
        else:
            item['similarity'] = None  # No previous follow-up questions to compare

def export_results_verbose(conversation_data):
    """
    Exports the results by printing the conversation data and comparison results.

    Args:
        conversation_data (list): The list of conversation data dictionaries.
    """
    for item in conversation_data:
        idx = item['idx']
        user_label = item['user_label']
        user_text = item['user_text']
        asst_label = item['asst_label']
        asst_text = item['asst_text']
        assistant_methods = item['assistant_methods']
        segmented = item['segmented']
        similarity = item.get('similarity')

        print(f"{user_label}:\n{user_text}")
        print(f"{asst_label}:\n{asst_text}")

        if similarity:
            print(f"Comparing {similarity['next_user_label']} to assistant's previous follow-up questions...\n")
            print(f"Most similar follow-up question:")
            print(f"Text: {similarity['most_similar_question_text']}")
            print(f"Socratic Methods: {similarity['associated_methods']}")
            print(f"Similarity score: {similarity['similarity_score']:.4f}\n")
        else:
            print("No previous follow-up questions to compare with.\n")
        
        print("Segmented Assistant Response:")

        for cat, seg_text in segmented.items():
            print(f"  {cat}:\n    {seg_text}\n")

            if cat in ["Selected Principle(s)", "Follow-Up Questions"] and seg_text:
                if cat == "Selected Principle(s)":
                    methods = assistant_methods.get('Selected Principle(s)', [])
                    print(f"    Identified Socratic Methods: {methods}\n")
                elif cat == "Follow-Up Questions":
                    followup_questions = assistant_methods.get('Follow-Up Questions', [])
                    print("    Individual Follow-Up Questions:")
                    for question_dict in followup_questions:
                        q_num = question_dict['number']
                        q_methods = question_dict['methods']
                        q_text = question_dict['text']
                        print(f"      Question {q_num}:")
                        print(f"        Socratic Methods: {q_methods}")
                        print(f"        Text: {q_text}\n")

        print("-" * 80)

def export_results(conversation_data):
    """
    Exports the results by printing only the specified information:
        1) Socratic Methods and Similarity scores of user's prompts
        2) Identified Socratic Methods of reformulated question
        3) Socratic Methods of Follow-Up Questions

    Args:
        conversation_data (list): The list of conversation data dictionaries.
    """
    for item in conversation_data:
        idx = item['idx']
        user_label = item['user_label']
        user_text = item['user_text']
        assistant_methods = item['assistant_methods']
        similarity = item.get('similarity')

        print(f"{user_label}:\n{user_text}\n")

        # 1) Socratic Methods and Similarity scores of user's prompts
        if similarity:
            next_user_label = similarity['next_user_label']
            similarity_score = similarity['similarity_score']
            most_similar_question_text = similarity['most_similar_question_text']
            associated_methods = similarity['associated_methods']

            print(f"Comparing {next_user_label} to assistant's previous follow-up questions...\n")
            print(f"Most similar follow-up question:")
            print(f"Text: {most_similar_question_text}")
            print(f"Socratic Methods: {associated_methods}")
            print(f"Similarity score: {similarity_score:.4f}\n")
        else:
            print("No previous follow-up questions to compare with.\n")

        # 2) Identified Socratic Methods of reformulated question
        socratic_methods_reformulation = assistant_methods.get('Socratic Reformulation', [])
        if socratic_methods_reformulation:
            print(f"Identified Socratic Methods in Reformulated Question: {socratic_methods_reformulation}\n")

        # 3) Socratic Methods of Follow-Up Questions
        followup_questions = assistant_methods.get('Follow-Up Questions', [])
        if followup_questions:
            print("Socratic Methods of Follow-Up Questions:")
            for question_dict in followup_questions:
                q_num = question_dict['number']
                q_methods = question_dict['methods']
                print(f"  Question {q_num} Socratic Methods: {q_methods}")
            print()

        print("-" * 80)

def check_socratic_system_prompt(file_path):
    """
    Checks if the conversation file at file_path starts with the Socratic system prompt,
    by searching for specific keywords in the beginning of the file.

    Args:
        file_path (str): The path to the conversation file.

    Returns:
        bool: True if all keywords are found in the beginning of the file, False otherwise.
    """
    keywords = [
        "You are a Socratic AI assistant",
        "Integration of the Mixed Socratic Prompting Approach",
        "Response Structure for Any User Query"
    ]
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            # Read the first 10000 characters (or adjust as needed)
            beginning_text = f.read(10000)

        # Check if all keywords are present in the beginning text
        return all(keyword in beginning_text for keyword in keywords)

    except FileNotFoundError:
        print(f"File not found: {file_path}")
        return False

def main(verbose=False):
    # Replace with your actual file path
    file_path = r"C:\Users\yunkai.sun\Box\SM_Assessment\Submissions\Unal_Mustafa\SiC Dislocation Analysis\SiC - Socratic Approach.txt"
    file_path = r"C:\Users\yunkai.sun\Box\SM_Assessment\Submissions\Yunkai_Sun\Raw data\SM-4 Practical Application, YKS, Argo □ Argonne National Laboratory_5-56-12PM_02-22-2025.txt"
    
    if not check_socratic_system_prompt(file_path):
        print("The conversation file does not follow the Socratic system prompt.")
        return  # Exit or handle accordingly

    paired_conversation = segment_conversation_preserve_paragraphs(file_path)
    #del paired_conversation[0]  # Remove initial system prompt if necessary

    # Define the Socratic methods list
    
    # Process conversation and build data structures
    conversation_data = process_conversation(paired_conversation, socratic_methods)

    # Compute similarities
    compute_similarities(conversation_data)

    # Export results
    if verbose:
        export_results_verbose(conversation_data)
    else:
        export_results(conversation_data)

socratic_methods = [
        'Definition',
        'Generalization',
        'Induction',
        'Elenchus',
        'Hypothesis Elimination',
        'Maieutics',
        'Dialectic',
        'Recollection',
        'Analogy',
        'Irony'
    ]   

if __name__ == "__main__":
    verbose_main = False
    #verbose_main = True
    main(verbose_main)

User Prompt 1:
User: yunkai.sun
Time: 2/22/2025, 5:56:12 PM
Argo Version: v1.3.0
-----------------------------------------------

No previous follow-up questions to compare with.

Socratic Methods of Follow-Up Questions:
  Question 2 Socratic Methods: []
  Question 3 Socratic Methods: []
  Question 4 Socratic Methods: []
  Question 5 Socratic Methods: []
  Question 6 Socratic Methods: []
  Question 7 Socratic Methods: []
  Question 8 Socratic Methods: []
  Question 9 Socratic Methods: []
  Question 10 Socratic Methods: []
  Question 4 Socratic Methods: []

--------------------------------------------------------------------------------
User Prompt 2:
You: Choose suitable electrolyte system and deposition parameters for Fe-Pt electrodeposition.

Comparing User Prompt 2 to assistant's previous follow-up questions...

Most similar follow-up question:
Text: o	User Input: "What do all strong acids have in common? (Generalization)" o	Follow-Up Questions: 	"How do these properties extend to 