### <center> Algorithm Steps </center>

- **Loaded the data**: Loaded the student data from a CSV file into a Pandas DataFrame.
- **Displayed the original records**: Printed the original student records to ensure the data was loaded correctly.
- **Sorted the data**: Sorted the DataFrame by profession (ascending) and by score (descending) to organize students based on these attributes.
- **Displayed the sorted data**: Printed the sorted student data to verify the organization of students by profession and score.
- **Initialized the groups**: Decided to create three groups and set up empty lists for each one to store students.
- **Distributed the students**: For each unique profession, filtered the students and distributed them evenly into the groups using a round-robin method to ensure balance.
- **Calculated the averages**: Created a function to calculate the average score for each group to measure performance distribution.
- **Calculated the standard deviation**: Computed the initial standard deviation of the group averages to assess the variation between groups.
- **Checked the threshold**: Added a check to skip the swapping process if the initial standard deviation was already ≤ 2.
- **Displayed the initial distribution**: Printed the initial distribution of students in each group, including average scores and the standard deviation.
- **Swapped students**: Defined a function that tries to reduce the standard deviation with a single swap of students between two groups:
    - Identified the group with the highest average score and the group with the lowest average score.
    - Swapped the highest-scoring student from the group with the highest average score with the lowest-scoring student from the group with the lowest average score, focusing on students with the same profession.
- **Attempted improvement**: Called the swap function to check if the swap reduced the group averages' standard deviation.
- **Displayed the updated distribution**: Printed the updated distribution of students in each group after the swap and the new standard deviation.
- **Compared the results**: Showed both the initial and final standard deviations to analyze the improvement (if any).
- **Handled errors**: Wrapped the processing in `try`/`except` blocks to handle exceptions such as a missing or empty input file.
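
The distribution and measurement steps listed above can be illustrated on a tiny data set. This is only a minimal sketch, not the notebook's own code: the sample records and the helper names `distribute_round_robin` and `group_average_std` are hypothetical, and it uses plain tuples rather than a DataFrame.

```python
import numpy as np

# Hypothetical sample records: (user_id, score, profession)
students = [
    (1, 90, "Doctor"), (2, 80, "Doctor"), (3, 70, "Doctor"),
    (4, 95, "Engineer"), (5, 60, "Engineer"), (6, 50, "Engineer"),
]

def distribute_round_robin(records, k):
    """Deal each profession's students, highest score first, across k groups."""
    groups = [[] for _ in range(k)]
    # Sort by profession (ascending), then by score (descending)
    ordered = sorted(records, key=lambda r: (r[2], -r[1]))
    position = {}  # per-profession counter, so every profession starts at group 0
    for record in ordered:
        i = position.get(record[2], 0)
        groups[i % k].append(record)
        position[record[2]] = i + 1
    return groups

def group_average_std(groups):
    """Return the per-group average scores and their standard deviation."""
    averages = [float(np.mean([score for _, score, _ in g])) for g in groups]
    # np.std defaults to the population standard deviation (ddof=0)
    return averages, float(np.std(averages))

groups = distribute_round_robin(students, k=3)
averages, std_dev = group_average_std(groups)
print(averages, round(std_dev, 2))
```

Because every profession restarts at the first group, the strongest student of each profession lands in the same group, which is why the initial spread of averages can be large and a later swap may help.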



In [2]:
import os  # OS module, for working with file paths
import pandas as pd  # pandas, for data manipulation and analysis
import numpy as np  # NumPy, for numerical operations

In [4]:

# Process one CSV file end to end, so the same code is not repeated for every file.
def process_file(file_path):
   
    try:
        
        # Load the student data from a CSV file into a DataFrame
        students_df = pd.read_csv(file_path)
        
        # Show the original student records to make sure the data was loaded correctly
        print("\nOriginal Student Data:")
        print(students_df.to_string(index=False))

        # k is the number of groups to create; it can be set to any positive integer
        k = 3

        # Step 1: Sort the students by their profession and score
        # Sort the DataFrame first by 'profession' (in ascending order) and then by 'score' (in descending order)
        students_sorted = students_df.sort_values(by=['profession', 'score'], ascending=[True, False])

        # Show the sorted student data so we can see how it's organized
        print("\nSorted Student Data:")
        print(students_sorted[['userId', 'score', 'profession']].to_string(index=False))

        # Step 2: Initialize empty lists for each group
        groups = [[] for _ in range(k)]

        # Step 3: Distribute students into groups based on their profession
        # For each unique profession, filter its students and deal them out to the groups
        # one by one in round-robin order, so each profession is spread across all groups.
        professions = students_sorted['profession'].unique()
        for profession in professions:
            profession_group = students_sorted[students_sorted['profession'] == profession]
            for i, (_, student) in enumerate(profession_group.iterrows()):
                groups[i % k].append(student)

        # Step 4: Calculating the average score for each group
        def calculate_group_averages(groups):
            return [pd.DataFrame(group)['score'].mean() for group in groups]

        group_averages = calculate_group_averages(groups)
        
        # Step 5: Calculating the initial standard deviation of the group averages
        initial_std_dev = np.std(group_averages)

        # Step 6: Displaying the initial distribution of students in each group
        print("\nInitial Group Distribution:")
        for i, group in enumerate(groups):
            print(f"\nGroup {i+1}:")
            for student in group:
                print(f"UserId: {student['userId']}, Score: {student['score']}, Profession: {student['profession']}")
            print(f"Average Score: {group_averages[i]:.2f}")

        print(f"\nInitial Standard Deviation of Group Averages: {initial_std_dev:.2f}")

        # Step 7: Defined a function to swap students to minimize the standard deviation
        def swap_students(groups, group_averages):
            high_group_idx = np.argmax(group_averages)  # Find the index of the group with the highest average score
            low_group_idx = np.argmin(group_averages)  # Find the index of the group with the lowest average score

            high_group = pd.DataFrame(groups[high_group_idx])  # Get the high-scoring group as a DataFrame
            low_group = pd.DataFrame(groups[low_group_idx])    # Get the low-scoring group as a DataFrame

            best_swap = None  # This will hold the best swap found
            best_std_dev = np.std(group_averages)  # Store the best standard deviation found
            best_group_averages = group_averages[:]  # Copy the current group averages

            # Step 8: Trying to find the best student swap to reduce the standard deviation
            for profession in high_group['profession'].unique():
                high_group_prof = high_group[high_group['profession'] == profession]  # Filtering the high group by profession
                low_group_prof = low_group[low_group['profession'] == profession]      # Filtering the low group by profession

                # If there are students in both groups with the same profession
                if not high_group_prof.empty and not low_group_prof.empty:
                    # Each group's rows keep the score-descending order from the sort, so within a
                    # profession the first row is the highest scorer and the last row is the lowest
                    high_student = high_group_prof.iloc[0]  # Highest-scoring student in the high group
                    low_student = low_group_prof.iloc[-1]    # Lowest-scoring student in the low group

                    # Step 9: Doing the swap by making a temporary copy of the groups
                    temp_groups = [group[:] for group in groups]  # creating copy of groups
                    temp_groups[high_group_idx] = [s for s in temp_groups[high_group_idx] if s['userId'] != high_student['userId']]
                    temp_groups[low_group_idx] = [s for s in temp_groups[low_group_idx] if s['userId'] != low_student['userId']]
                    temp_groups[high_group_idx].append(low_student)  # Adding the low-scoring student to high group
                    temp_groups[low_group_idx].append(high_student)  # Adding the high-scoring student to low group

                    # Step 10: Recalculate the averages for the new groups
                    temp_group_averages = calculate_group_averages(temp_groups)
                    temp_std_dev = np.std(temp_group_averages)  # Calculate the new standard deviation

                    # If this swap reduces the standard deviation, remember it
                    if temp_std_dev < best_std_dev:
                        best_swap = (high_student, low_student)  # Store the best swap
                        best_std_dev = temp_std_dev  # Update the best standard deviation
                        best_group_averages = temp_group_averages  # Update the best group averages

            # Step 11: Perform the best swap if one was found
            if best_swap:
                high_student, low_student = best_swap
                print(f"\nSwapping UserId {high_student['userId']} with UserId {low_student['userId']}")
                
                # Remove the swapped students from their original groups
                groups[high_group_idx] = [s for s in groups[high_group_idx] if s['userId'] != high_student['userId']]
                groups[low_group_idx] = [s for s in groups[low_group_idx] if s['userId'] != low_student['userId']]
                groups[high_group_idx].append(low_student)  # Add the low-scoring student to the high group
                groups[low_group_idx].append(high_student)  # Add the high-scoring student to the low group

                return best_std_dev, best_group_averages  # Return the new standard deviation and group averages

            return None, group_averages  # If no swap was made, return the current averages

        # Step 12: Check if the standard deviation is already acceptable
        if initial_std_dev <= 2:  # If the standard deviation is less than or equal to 2, no swap is needed
            print(f"\nInitial Standard Deviation ({initial_std_dev:.2f}) is already less than or equal to 2. No swap performed.")
            final_std_dev = initial_std_dev  # Final std deviation remains the same
        else:
            # Step 13: Attempt to improve the distribution by swapping
            final_std_dev, final_group_averages = swap_students(groups, group_averages)

            # Step 14: Display the updated group distribution after the swap
            if final_std_dev is not None:  # If a swap was performed
                print("\nUpdated Group Distribution After Swap:")
                for i, group in enumerate(groups):
                    print(f"\nGroup {i + 1}:")
                    for student in group:
                        print(f"UserId: {student['userId']}, Score: {student['score']}, Profession: {student['profession']}")
                    print(f"Updated Average Score: {final_group_averages[i]:.2f}")

                # Print the final standard deviation of group averages
                print(f"\nFinal Standard Deviation of Group Averages: {final_std_dev:.2f}")
            else:
                print("\nNo improvements were made.")

        # Step 15: Display both the initial and final standard deviations
        print(f"\nInitial Standard Deviation: {initial_std_dev:.2f}")
        print(f"Final Standard Deviation: {final_std_dev:.2f}" if final_std_dev is not None else f"Final Standard Deviation: {initial_std_dev:.2f}")
        print("\n ******************************************************************* \n")

    except FileNotFoundError:
        # Handle file not found error
        print(f"Error: The file {file_path} was not found.")        
    except pd.errors.EmptyDataError:
        # Handle empty data error
        print(f"Error: No data found in the file {file_path}.")       
    except Exception as e:
        # Handle any other unexpected errors
        print(f"An error occurred: {e}")  


# List of file paths for all CSV files
file_paths = [
    r'D:\Naila Task\Tuesday 22-Oct-2024\table1.csv',
    r'D:\Naila Task\Tuesday 22-Oct-2024\table2.csv',
    r'D:\Naila Task\Tuesday 22-Oct-2024\table3.csv',
    r'D:\Naila Task\Tuesday 22-Oct-2024\table4.csv',
    r'D:\Naila Task\Tuesday 22-Oct-2024\table5.csv',
]

# Process each file in the list
for file_path in file_paths:
    process_file(file_path)
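
The swap step in `swap_students` evaluates a trial swap on a copy of the groups and keeps it only if the standard deviation of the group averages drops. The sketch below shows that idea on bare score lists. It is a simplified illustration under stated assumptions: the scores are made up, the helpers `std_of_averages` and `try_swap` are hypothetical, and the same-profession constraint is left out for brevity.

```python
import numpy as np

def std_of_averages(groups):
    """Population standard deviation of the per-group mean scores."""
    return float(np.std([np.mean(g) for g in groups]))

def try_swap(groups, high_idx, low_idx, high_score, low_score):
    """Return a copy of groups with the two scores exchanged; the input is untouched."""
    trial = [list(g) for g in groups]
    trial[high_idx].remove(high_score)
    trial[low_idx].remove(low_score)
    trial[high_idx].append(low_score)
    trial[low_idx].append(high_score)
    return trial

groups = [[100, 90], [80, 70], [60, 50]]  # hypothetical scores, one list per group
averages = [np.mean(g) for g in groups]
high_idx = int(np.argmax(averages))  # group with the highest average
low_idx = int(np.argmin(averages))   # group with the lowest average

before = std_of_averages(groups)
trial = try_swap(groups, high_idx, low_idx,
                 max(groups[high_idx]), min(groups[low_idx]))
after = std_of_averages(trial)

# Keep the trial arrangement only if it narrows the spread of the averages
accepted = trial if after < before else groups
print(f"before={before:.2f}, after={after:.2f}")
```

Working on a copy means a swap that would make the spread worse leaves the original groups unchanged.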


Original Student Data:
 Unnamed: 0  userId  score profession
          0       1    100   Engineer
          1       2    100   Engineer
          2       3    100   Engineer
          3       4    100     Doctor
          4       5    100     Doctor
          5       6     85     Doctor
          6       7     70     Lawyer
          7       8     75     Lawyer
          8       9     60     Lawyer
          9      10     65   Engineer
         10      11     55   Engineer
         11      12     50     Doctor
         12      13     45     Lawyer
         13      14     40     Doctor
         14      15     35     Lawyer
         15      16     30   Engineer
         16      17     25     Doctor
         17      18     20     Lawyer

Sorted Student Data:
 userId  score profession
      4    100     Doctor
      5    100     Doctor
      6     85     Doctor
     12     50     Doctor
     14     40     Doctor
     17     25     Doctor
      1    100   Engineer
      2    100   Enginee