# Quantitative Analysis Notebook
**Goal:** Apply structured quantitative analysis to survey data, following HCI usability evaluation principles.  
This notebook demonstrates **descriptive statistics**, **relationships**, and **deeper explorations** using Python.


## Step 1: Load & Inspect Data
*Justification:*  
Before any analysis, we import the dataset, check its structure, and confirm variable types. This tells us which variables can be treated as continuous (e.g., task times, SUS scores, and Likert ratings, which are strictly ordinal but are commonly summarised as numeric) and which are categorical (e.g., gender, completion).  


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats

df = pd.read_excel("UTAAUTdata.xlsx", sheet_name="Form Responses 1")
df.head()


Unnamed: 0,Timestamp,Participant id,[Using this app helps me quickly identify safe surplus food],[The safety score and freshness indicators increase my confidence in buying food.],[Using the app makes it easier to make informed decisions about surplus food.],[This app allows me to find restaurants with desirable food quality efficiently.],[People who are important to me think I should use this app],[People whose opinions I value would recommend using this app],"[If I use this app, I will gain social approval (e.g., being seen as eco-friendly or budget-conscious]",[I am more likely to use this app if I see my friends or peers using it.]
0,2025-09-16 22:35:02.486,P1,4,4,4,4,3,4,3,4
1,2025-09-17 18:16:39.865,R1,3,2,3,1,2,1,3,3
2,2025-09-17 21:09:13.308,P02,5,5,5,5,4,5,4,5
3,2025-09-18 10:07:26.799,r2,5,5,5,5,4,5,4,5
4,2025-09-19 11:03:58.300,Yash Pandey,1,2,3,4,3,3,1,1
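
The inspection described in this step can be sketched on a small stand-in frame. The column names `participant_id`, `q1`, and `q2` and all values below are made up for illustration; with the real data the same calls would be applied to `df`. The sketch lists the variable types, counts missing values, and checks that every rating falls within an assumed 1 to 5 Likert range.

```python
import pandas as pd

# Hypothetical stand-in for the survey frame (names and values are illustrative)
sample = pd.DataFrame({
    "participant_id": ["P1", "R1", "P02", "r2"],
    "q1": [4, 3, 5, 5],
    "q2": [4, 2, None, 7],  # one missing value and one out-of-range rating
})

# Variable types: non-numeric columns are candidates for categorical handling
dtypes = sample.dtypes

# Missing values per column
missing = sample.isna().sum()

# Ratings outside the assumed 1-5 scale (missing values are not flagged here)
ratings = sample.select_dtypes(include="number")
out_of_range = ((ratings < 1) | (ratings > 5)).sum()

print(dtypes)
print(missing)
print(out_of_range)
```

Running a check like this before Step 2 makes it easier to spot entry errors that would otherwise distort the means and correlations.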


## Step 2: Descriptive Statistics
*Justification:*  
Descriptive statistics summarise the dataset, allowing us to quickly understand central tendencies and spread. For **continuous data**, we report mean, median, standard deviation, min, and max. For **categorical data**, we report counts and percentages.  


In [None]:
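# Split columns by type; the non-numeric group also picks up identifier-like
# columns such as the timestamp and participant id
numeric_df = df.select_dtypes(include="number")
categorical_df = df.select_dtypes(exclude="number")

# Continuous summary: describe() gives count, mean, std, min, quartiles, max
desc_stats = numeric_df.describe().T
desc_stats["median"] = numeric_df.median()
print(desc_stats)  # a bare expression in the middle of a cell is not displayed

# Categorical summary (if available): counts and percentages side by side
for col in categorical_df.columns:
    counts = categorical_df[col].value_counts()
    summary = pd.DataFrame({"count": counts, "percent": (counts / counts.sum() * 100).round(1)})
    print(f"\n{col}:")
    print(summary)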
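
For Likert items it can also help to treat each rating as a category and tabulate how often every scale point was chosen. The following is a minimal sketch under stated assumptions: the series `ratings`, its name `q1`, and its values are hypothetical, and the scale is assumed to run from 1 to 5.

```python
import pandas as pd

# Hypothetical ratings for one item on an assumed 1-5 scale
ratings = pd.Series([4, 3, 5, 5, 1, 4, 4], name="q1")

# Count every scale point, including levels nobody selected
levels = range(1, 6)
counts = ratings.value_counts().reindex(levels, fill_value=0)

# Percentages are relative to the number of non-missing responses
freq_table = pd.DataFrame({
    "count": counts,
    "percent": (counts / counts.sum() * 100).round(1),
})
print(freq_table)
```

Reindexing on the full scale keeps unused response levels visible as zero rows instead of silently dropping them.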


## Visualisations: Descriptive Statistics
*Justification:*  
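Visuals such as bar charts, histograms, and boxplots communicate distributions at a glance and highlight outliers. Here, histograms show the shape of each item's responses, and boxplots show spread and flag outliers.  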


In [None]:
# Histograms
numeric_df.hist(figsize=(12,8), bins=5, edgecolor="black")
plt.suptitle("Distributions of Continuous Responses")
plt.show()

# Boxplots
plt.figure(figsize=(10,6))
sns.boxplot(data=numeric_df, orient="h", palette="Set2")
plt.title("Boxplots of Responses per Question")
plt.show()
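
The points a boxplot draws as outliers are, by default, those lying more than 1.5 times the interquartile range beyond the quartiles. As a rough sketch, the same rule can be applied numerically. The series `times` and its values are hypothetical and stand in for any numeric column.

```python
import pandas as pd

# Hypothetical task times in seconds (illustrative values only)
times = pd.Series([12, 14, 15, 15, 16, 18, 19, 45], name="task_time")

# Quartiles and interquartile range
q1, q3 = times.quantile(0.25), times.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles, matching the default whisker rule
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = times[(times < lower) | (times > upper)]
print(outliers)
```

On a 5-point Likert item this rule rarely flags anything, so it is most informative for wider-ranging measures such as times or composite scores.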


## Step 3: Explore Relationships
*Justification:*  
Exploring relationships helps identify associations between variables (e.g., do high-confidence users also score higher on usability?). This guides where to dig deeper. Correlation does not imply causation, but it does reveal patterns worth investigating.  


In [None]:
# Correlation matrix
plt.figure(figsize=(8,6))
sns.heatmap(numeric_df.corr(), annot=True, cmap="coolwarm", center=0)
plt.title("Correlation Between Survey Items")
plt.show()

# Pairwise plots
sns.pairplot(numeric_df)
plt.suptitle("Pairwise Scatterplots of Responses", y=1.02)
plt.show()
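
`DataFrame.corr()` computes Pearson correlations by default, which treat the ratings as interval data. Because Likert responses are ordinal, a rank-based coefficient such as Spearman's rho is a common alternative. This sketch compares the two on a hypothetical frame; the column names `q1` to `q3` and the values are illustrative only.

```python
import pandas as pd

# Hypothetical Likert responses for three items (illustrative values only)
items = pd.DataFrame({
    "q1": [4, 3, 5, 5, 1, 2],
    "q2": [4, 2, 5, 5, 2, 3],
    "q3": [3, 2, 4, 4, 3, 1],
})

# Pearson works on the raw values, Spearman on their ranks
pearson_r = items.corr(method="pearson")
spearman_rho = items.corr(method="spearman")

print(pearson_r.round(2))
print(spearman_rho.round(2))
```

If the two matrices tell the same story, the choice of coefficient is unlikely to change the conclusions; large differences suggest that the linear assumption deserves a closer look.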


In [None]:
# Prepare results matrices, starting from the diagonal values (r = 1, p = 0)
cols = numeric_df.columns
r_matrix = pd.DataFrame(np.eye(len(cols)), columns=cols, index=cols)
p_matrix = pd.DataFrame(np.zeros((len(cols), len(cols))), columns=cols, index=cols)

# Fill correlation and p-values; pearsonr is symmetric, so compute each pair once
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        # pearsonr does not skip missing values, so drop incomplete pairs first
        pair = numeric_df[[cols[i], cols[j]]].dropna()
        r, p = stats.pearsonr(pair[cols[i]], pair[cols[j]])
        r_matrix.iloc[i, j] = r_matrix.iloc[j, i] = r
        p_matrix.iloc[i, j] = p_matrix.iloc[j, i] = p

# Display correlation coefficients
plt.figure(figsize=(10,8))
sns.heatmap(r_matrix.round(2), annot=True, cmap="coolwarm", center=0,
            xticklabels=cols, yticklabels=cols)
plt.title("Correlation Matrix (r values)")
plt.show()

# Display p-values
plt.figure(figsize=(10,8))
sns.heatmap(p_matrix.round(3), annot=True, cmap="YlGnBu", cbar=False,
            xticklabels=cols, yticklabels=cols)
plt.title("P-value Matrix")
plt.show()
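
A matrix of pairwise p-values involves many tests at once, so some small p-values are expected by chance alone. One common way to account for this is the Holm-Bonferroni adjustment. The sketch below is an illustrative implementation, not part of the original analysis: the helper `holm_adjust` and the values in `raw_p` are hypothetical.

```python
import numpy as np

def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), for k = 0..m-1
    scaled = p[order] * (m - np.arange(m))
    # Keep adjusted values non-decreasing and cap them at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical raw p-values from four pairwise tests
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = holm_adjust(raw_p)
print(adjusted_p)
```

To apply this to the matrix above, one option is to pass only the unique pairs, for example `p_matrix.values[np.triu_indices(len(cols), k=1)]`, so that each test is counted once.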
