In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
from wordcloud import WordCloud

In [None]:
# Configure plotting
sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)

In [None]:
# Load the two datasets
df1 = pd.read_csv('/student_feedback.csv')
df2 = pd.read_csv('/Student_Satisfaction_Survey.csv', encoding='ISO-8859-1')

In [None]:
df1.columns

**Cleaning dataset 1**


What is Unnamed: 0? When a DataFrame is written with pandas to_csv using index=True (the default), the row index is saved as an extra first column with no header. A CSV exported from a spreadsheet with a blank header cell can produce the same result.
When the CSV is read again, pandas labels that column "Unnamed: 0" and also builds a fresh index, so the column is redundant.

What does drop(..., inplace=True) do? drop(columns=['Unnamed: 0']) tells pandas to remove the specified column.
inplace=True applies the change to df1 directly instead of returning a new DataFrame.
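
The sketch below shows how the extra column arises. It uses a small made-up DataFrame and an in-memory buffer instead of a real file; the toy column names and values are illustrative assumptions, not taken from student_feedback.csv.

```python
import io
import pandas as pd

# Hypothetical toy data, not taken from the real dataset
toy = pd.DataFrame({"Student ID": [101, 102], "Score": [4, 5]})

# to_csv writes the index by default (index=True), which adds a header-less first column
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# On reload, pandas names the header-less column "Unnamed: 0"
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))  # ['Unnamed: 0', 'Student ID', 'Score']

# Dropping the column restores the original layout
cleaned = reloaded.drop(columns=["Unnamed: 0"])
print(list(cleaned.columns))   # ['Student ID', 'Score']
```

Writing the file with to_csv(..., index=False) would avoid the extra column in the first place, if the file is under your control.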


In [None]:
# Clean df1: Drop unnecessary columns
df1.drop(columns=['Unnamed: 0'], inplace=True)

In [None]:
df2.columns

**Cleaning dataset 2**


The cell below strips leading and trailing spaces from the column names in df2.

Why it's important:

CSV and Excel files sometimes carry hidden whitespace in column names, which causes bugs such as:

KeyError: 'Computed_Average' (even though the column appears to exist)

Problems during merging, plotting, or filtering.
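
As a small illustration of the failure mode, the sketch below builds a hypothetical one-column frame whose name carries a trailing space, shows that the lookup fails, and then applies the same str.strip() fix. The frame and its values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame whose column name ends with a space
toy = pd.DataFrame({"Computed_Average ": [3.2, 4.1]})

# The lookup fails because "Computed_Average" != "Computed_Average "
try:
    toy["Computed_Average"]
    found_before = True
except KeyError:
    found_before = False

# Strip whitespace from every column name, as done for df2
toy.columns = toy.columns.str.strip()
found_after = "Computed_Average" in toy.columns

print(found_before, found_after)  # False True
```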

In [None]:
# Clean df2: Clean column names
df2.columns = df2.columns.str.strip()

In [None]:
# Calculate average satisfaction scores from df1
rating_columns = df1.columns.drop('Student ID')
df1['Average_Score'] = df1[rating_columns].mean(axis=1)


Chart 1: Histogram of number of students by average score

In [None]:
# Plot: Overall satisfaction score distribution (df1)
sns.histplot(df1['Average_Score'], bins=10, kde=True)
plt.title('Distribution of Average Satisfaction Scores')
plt.xlabel('Average Score')
plt.ylabel('Number of Students')
plt.show()

The histogram displays the distribution of average satisfaction scores among students.

The x-axis shows the average score ranges, divided into 10 bins.

The y-axis shows the number of students who fall into each score range.

The KDE (smooth curve) shows the overall trend or shape of the score distribution.

This helps identify whether most students gave high, medium, or low ratings.
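
To make the binning concrete, the sketch below counts a handful of made-up scores into 10 equal-width bins with NumPy. This is only an approximation of what the histogram does internally, and the scores are hypothetical values, not data from df1.

```python
import numpy as np

# Hypothetical average scores for ten students
scores = np.array([2.1, 2.9, 3.0, 3.4, 3.5, 3.6, 4.0, 4.2, 4.8, 5.0])

# Split the range from min to max into 10 equal-width bins and count scores per bin
counts, edges = np.histogram(scores, bins=10)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {n} student(s)")
```

Ten bins need eleven edges, and the bar heights always add up to the number of students.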

Chart 2: Bar plot of average score per question

In [None]:
# Plot: Average score per question (df1)
avg_per_question = df1[rating_columns].mean().sort_values()

sns.barplot(
    x=avg_per_question.values,
    y=avg_per_question.index,
    hue=avg_per_question.index,  # color each bar by its question so the palette applies
    palette='viridis',
    dodge=False,
    legend=False
)

plt.title('Average Score per Question')
plt.xlabel('Average Score')
plt.ylabel('Question')
plt.show()

"Well versed with the subject" received the highest average score, reflecting strong subject knowledge.

"Explains concepts clearly" and "Use of presentations" also scored well, indicating effective teaching methods.

Lower scores for "Solves doubts willingly" and "Difficulty of assignments" suggest areas needing improvement in student support and assignment clarity.

For df2, the total number of responses for each question is calculated first, by adding up how many students gave each rating from 1 to 5.

Then the weighted average score is computed by multiplying each rating value (1 to 5) by the number of times it was given, summing those products, and dividing by the total number of responses.

This gives an average that accounts for both the rating value and how many students selected it.
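
A worked example of that arithmetic for a single question, using made-up response counts (the numbers are assumptions chosen for illustration, not values from the survey):

```python
# Hypothetical response counts for one question: rating -> number of students
counts = {1: 2, 2: 3, 3: 10, 4: 20, 5: 15}

# Total responses: 2 + 3 + 10 + 20 + 15 = 50
total_responses = sum(counts.values())

# Rating times count, summed: 2 + 6 + 30 + 80 + 75 = 193
weighted_sum = sum(rating * n for rating, n in counts.items())

# Weighted average: 193 / 50 = 3.86
weighted_average = weighted_sum / total_responses
print(round(weighted_average, 2))  # 3.86
```

A plain mean of the rating values 1 to 5 would always be 3.0, regardless of how many students picked each rating, which is why the counts are used as weights.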

In [None]:
# Compute the weighted average score per question in df2
df2['Total_Responses'] = df2[[f'Weightage {i}' for i in range(1, 6)]].sum(axis=1)
df2['Computed_Average'] = sum(df2[f'Weightage {i}'] * i for i in range(1, 6)) / df2['Total_Responses']

Bar plot of average rating per question

The next cell does the following:

It replaces curly (smart) apostrophes in the "Questions" column with regular apostrophes to avoid font or display issues.

It sorts the questions by computed average score in descending order (highest to lowest).

It creates a horizontal bar chart where:

The y-axis shows the questions.

The x-axis shows the computed average scores.

Each bar takes its color from the "coolwarm" palette; the question is passed as the hue so that every bar gets a different shade.

The legend is turned off to avoid clutter, since each question is already labeled on the y-axis.

The chart shows how well each question scored on average, making it easy to compare satisfaction levels across different areas.
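
The cleaning and sorting steps can be sketched on a tiny made-up frame. The question texts and scores below are hypothetical; only the column names mirror df2.

```python
import pandas as pd

# Hypothetical questions; the first contains a curly apostrophe (U+2019)
toy = pd.DataFrame({
    "Questions": ["Teacher\u2019s clarity", "Library access"],
    "Computed_Average": [3.2, 4.1],
})

# Replace the curly apostrophe with a plain one (literal match, not a regex)
toy["Questions"] = toy["Questions"].str.replace("\u2019", "'", regex=False)

# Sort from highest to lowest average score
ranked = toy.sort_values(by="Computed_Average", ascending=False)
print(ranked["Questions"].tolist())  # ['Library access', "Teacher's clarity"]
```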

In [None]:
# Average satisfaction by question (df2)
df2['Questions'] = df2['Questions'].str.replace('’', "'", regex=False)

top_questions = df2[['Questions', 'Computed_Average']].sort_values(by='Computed_Average', ascending=False)

sns.barplot(
    data=top_questions,
    x='Computed_Average',
    y='Questions',
    hue='Questions',
    palette='coolwarm',
    dodge=False,
    legend=False
)

plt.title('Average Rating per Question (Aggregated)')
plt.xlabel('Average Score')
plt.ylabel('Question')
plt.show()

Word cloud of feedback questions

In [None]:
# Word Cloud: Top feedback questions (df2)
text = ' '.join(df2['Questions'].dropna().tolist())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

In [None]:
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.title("Word Cloud of Feedback Questions")
plt.show()

In [None]:
print("\nSuggestions for Improvement:")
low_avg_questions = top_questions[top_questions['Computed_Average'] < 3.5]
for _, row in low_avg_questions.iterrows():
    print(f"• {row['Questions']} → Avg Score: {row['Computed_Average']:.2f}")