
Let’s analyze water usage habits among college students. The script below simulates a survey in which students report their daily water usage (in liters) for drinking, washing, bathing, and other activities. We compute each student’s total usage, summarize it across the group, and flag students who exceed the recommended limits.

In [None]:
import pandas as pd
# Sample data: Water usage survey by college students
data = {
    "Student ID": ["STU001", "STU002", "STU003", "STU004", "STU005"],
    "Drinking (L/day)": [2.5, 3.0, 2.0, 1.8, 2.2],
    "Washing (L/day)": [10, 15, 8, 12, 18],
    "Bathing (L/day)": [50, 60, 45, 55, 70],
    "Other (L/day)": [5, 8, 4, 6, 10]
}

# Recommended limits (example values in liters per day)
recommended_limits = {
    "Drinking (L/day)": 3.0,
    "Washing (L/day)": 12,
    "Bathing (L/day)": 50,
    "Other (L/day)": 6
}

# Convert data to a DataFrame
df = pd.DataFrame(data)

# Evaluate water usage
def evaluate_usage(row):
    issues = []
    for activity, limit in recommended_limits.items():
        if row[activity] > limit:
            issues.append(f"{activity} exceeds limit")
    return ", ".join(issues) if issues else "Usage within limits"

# Apply evaluation
df["Usage Status"] = df.apply(evaluate_usage, axis=1)

# Calculate total water usage per student
df["Total Usage (L/day)"] = df[
    ["Drinking (L/day)", "Washing (L/day)", "Bathing (L/day)", "Other (L/day)"]
].sum(axis=1)

# Display the results
print("Water Usage Analysis Report")
print(df)

# Summary statistics for total usage
print("\nSummary Statistics:")
print(df["Total Usage (L/day)"].describe())

# Identify students exceeding 100 liters/day
print("\nStudents Exceeding 100 Liters/Day:")
print(df[df["Total Usage (L/day)"] > 100])


Water Usage Analysis Report
  Student ID  Drinking (L/day)  Washing (L/day)  Bathing (L/day)  \
0     STU001               2.5               10               50   
1     STU002               3.0               15               60   
2     STU003               2.0                8               45   
3     STU004               1.8               12               55   
4     STU005               2.2               18               70   

   Other (L/day)                                       Usage Status  \
0              5                                Usage within limits   
1              8  Washing (L/day) exceeds limit, Bathing (L/day)...   
2              4                                Usage within limits   
3              6                      Bathing (L/day) exceeds limit   
4             10  Washing (L/day) exceeds limit, Bathing (L/day)...   

   Total Usage (L/day)  
0                 67.5  
1                 86.0  
2                 59.0  
3                 74.8  
4                100.2  

Explanation of the Code
Input Data:
Contains water usage data for five students across four categories: Drinking, Washing, Bathing, and Other activities.
Recommended Limits:
Defines acceptable daily water usage for each activity.
Evaluation Function:
Compares each student’s water usage to the limits and flags overuse.
Total Usage:
Calculates each student’s total daily water usage.
Summary and Insights:
Displays a report with individual statuses, summary statistics, and students exceeding a daily limit of 100 liters.
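
The row-by-row evaluation function above can also be expressed as a single column-wise comparison. The following is a minimal sketch, not part of the original script: it reuses the same sample data and limits, and the names `usage`, `limits`, `over_limit`, `exceeded_count`, and `excess_liters` are introduced here for illustration only.

```python
import pandas as pd

# Same sample data as the script above, indexed by student
usage = pd.DataFrame({
    "Student ID": ["STU001", "STU002", "STU003", "STU004", "STU005"],
    "Drinking (L/day)": [2.5, 3.0, 2.0, 1.8, 2.2],
    "Washing (L/day)": [10, 15, 8, 12, 18],
    "Bathing (L/day)": [50, 60, 45, 55, 70],
    "Other (L/day)": [5, 8, 4, 6, 10],
}).set_index("Student ID")

# Same example limits, stored as a Series keyed by column name
limits = pd.Series({
    "Drinking (L/day)": 3.0,
    "Washing (L/day)": 12,
    "Bathing (L/day)": 50,
    "Other (L/day)": 6,
})

# True where usage is strictly above the limit (same rule as evaluate_usage)
over_limit = usage[limits.index].gt(limits, axis="columns")

# How many categories each student exceeds
exceeded_count = over_limit.sum(axis=1)

# Liters per day above the limit, with 0 where the student is within it
excess_liters = (usage[limits.index] - limits).clip(lower=0)

print(over_limit)
print(exceeded_count)
print(excess_liters)
```

A boolean table like `over_limit` is easier to aggregate than a text status column, for example to count how many students exceed the bathing limit.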


ANOVA Test
ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups to see if there are significant differences between them. We’ll analyze whether students’ scores in three different subjects (e.g., Math, Science, and English) have significantly different means.
ANOVA Analysis for College Students


In [None]:
import pandas as pd
import scipy.stats as stats
# Sample data: Scores of students in different subjects
data = {
    "Math": [85, 90, 88, 92, 75, 80, 95, 85, 89, 91],
    "Science": [88, 86, 89, 91, 78, 84, 90, 87, 85, 88],
    "English": [82, 80, 85, 83, 79, 81, 84, 83, 86, 87]
}

# Convert data to a DataFrame
df = pd.DataFrame(data)

# Perform ANOVA test
f_stat, p_value = stats.f_oneway(df["Math"], df["Science"], df["English"])

# Display the results
print("ANOVA Results:")
print(f"F-Statistic: {f_stat:.2f}")
print(f"P-Value: {p_value:.4f}")

# Interpret the results
alpha = 0.05  # Significance level
if p_value < alpha:
    print("\nConclusion: There is a significant difference between the means of the groups.")
else:
    print("\nConclusion: There is no significant difference between the means of the groups.")


ANOVA Results:
F-Statistic: 2.60
P-Value: 0.0929

Conclusion: There is no significant difference between the means of the groups.


Explanation of the Code
Input Data:
A dictionary contains students’ scores in three subjects: Math, Science, and English.
Convert to DataFrame:
The data is converted into a pandas DataFrame for easy manipulation.
ANOVA Test:
The scipy.stats.f_oneway function performs a one-way ANOVA test on the three groups.
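f_stat is the F-statistic (the ratio of between-group variance to within-group variance), and p_value is the probability of observing an F-statistic at least this large if all group means were equal.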
Interpret Results:
A significance level (alpha) of 0.05 is used.
If p_value < alpha, we reject the null hypothesis that all group means are equal and conclude that at least one group mean differs. Otherwise, we fail to reject the null hypothesis.
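
To see what the F-statistic measures, it can help to compute it by hand and compare it with the library result. The sketch below is an illustration under the same sample scores, not part of the original script. It builds the between-group and within-group sums of squares with NumPy and obtains the p-value from the F distribution via `scipy.stats.f.sf`. The variable names (`groups`, `ss_between`, `ss_within`, `f_manual`, `p_manual`) are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Same sample scores as the script above
groups = {
    "Math": np.array([85, 90, 88, 92, 75, 80, 95, 85, 89, 91], dtype=float),
    "Science": np.array([88, 86, 89, 91, 78, 84, 90, 87, 85, 88], dtype=float),
    "English": np.array([82, 80, 85, 83, 79, 81, 84, 83, 86, 87], dtype=float),
}

all_scores = np.concatenate(list(groups.values()))
grand_mean = all_scores.mean()
k = len(groups)            # number of groups
n_total = all_scores.size  # total number of observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())

# Within-group sum of squares: spread of scores around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# Mean squares divide each sum of squares by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
# Upper-tail probability of the F distribution with (k - 1, n_total - k) df
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

# Library result for comparison
f_scipy, p_scipy = stats.f_oneway(*groups.values())

print(f"Manual: F = {f_manual:.2f}, p = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.2f}, p = {p_scipy:.4f}")
```

Both lines should report the same values, since `f_oneway` performs this same one-way computation.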
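Example Output
When the script is run on the sample data above, it produces this output:

ANOVA Results:
F-Statistic: 2.60
P-Value: 0.0929

Conclusion: There is no significant difference between the means of the groups.

Because the p-value (0.0929) is greater than alpha (0.05), we fail to reject the null hypothesis for this sample.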
