# 📌 Hypotheses & Testing Methods

## 1️⃣ Does Seller Rating Affect Price?

H0: Seller rating has no effect on price.

H1: Higher-rated sellers price items differently from lower-rated sellers.

Test: Pearson correlation (Seller Rating vs. Price) and Welch's t-test (sellers rated ≥ 500 vs. < 500).


## 2️⃣ Does Product Condition (New vs. Used) Affect Price?

H0: New and Used items have the same mean price.

H1: New items are priced higher than Used items.

Test: Welch's t-test comparing prices of "New" vs. "Used" items. Because H1 is directional, a one-sided test matches it more closely than the two-sided test run below.
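The notebook calls `ttest_ind(..., equal_var=False)`, which is Welch's t-test: it does not assume the two groups share a variance. As a sketch on made-up prices (not the eBay data), the statistic and its Welch-Satterthwaite degrees of freedom can be computed directly and checked against SciPy. The helper `welch_t` is hypothetical, written only for this illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical prices for two groups (toy values, not taken from ebay_cleaned.csv)
toy_new = np.array([500.0, 450.0, 520.0, 610.0, 480.0])
toy_used = np.array([300.0, 9.99, 420.0, 350.0])


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = a.var(ddof=1) / len(a)  # squared standard error of mean(a)
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean(b)
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t_stat, dof


t_manual, dof_manual = welch_t(toy_new, toy_used)
p_manual = 2 * stats.t.sf(abs(t_manual), dof_manual)  # two-sided p-value

res = stats.ttest_ind(toy_new, toy_used, equal_var=False)
print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={res.statistic:.4f}, p={res.pvalue:.4f}")
```

The variance term is computed per group, which is what distinguishes Welch's test from Student's pooled-variance t-test.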

## 3️⃣ Does Category Affect Price?

H0: All categories have the same mean price.

H1: At least one category has a different mean price.

Test: One-way ANOVA (compare mean prices across categories).
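One-way ANOVA compares group means: the F statistic is the between-group mean square divided by the within-group mean square, with k - 1 and n - k degrees of freedom. A minimal sketch on three invented category groups (not the eBay data) reproduces `scipy.stats.f_oneway`; the function name `one_way_anova` is hypothetical.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical price groups for three categories (toy values)
toy_groups = [
    np.array([500.0, 520.0, 480.0]),
    np.array([9.99, 15.0, 12.5, 20.0]),
    np.array([300.0, 310.0, 295.0]),
]


def one_way_anova(samples):
    """Return the one-way ANOVA F statistic and its p-value."""
    all_values = np.concatenate(samples)
    grand_mean = all_values.mean()
    k, n = len(samples), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = f.sf(f_stat, k - 1, n - k)  # upper tail of the F distribution
    return f_stat, p_value


f_manual, p_manual = one_way_anova(toy_groups)
f_scipy, p_scipy = f_oneway(*toy_groups)
print(f"manual F={f_manual:.4f}, scipy F={f_scipy:.4f}")
```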

## 4️⃣ Does Seasonality Affect Listings?

H0: Listings are evenly distributed across seasons.

H1: Some seasons have more listings than others.

Test: Chi-square goodness-of-fit test (compare observed listing counts per season with equal expected counts).
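The goodness-of-fit statistic sums (O - E)² / E over the k seasons and has k - 1 degrees of freedom when the expected counts are equal shares of the total. A sketch with invented counts, checked against `scipy.stats.chisquare` (whose expected frequencies default to uniform):

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Hypothetical listing counts per season (toy numbers, not the notebook's data)
observed = np.array([25, 18, 22, 15])
# Under H0 every season receives the same share of listings
expected = np.full(len(observed), observed.sum() / len(observed))

stat_manual = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1  # k categories, no parameters estimated from the data
p_manual = chi2.sf(stat_manual, dof)

res = chisquare(observed)
print(f"chi2={stat_manual:.4f}, p={p_manual:.4f}")
```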

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, f_oneway, pearsonr, chi2_contingency
```

```python
# Load the dataset
file_path = "ebay_cleaned.csv"
df = pd.read_csv(file_path)
```

```python
# Convert the date column to datetime; unparseable values become NaT
df["Item Creation Date"] = pd.to_datetime(df["Item Creation Date"], errors="coerce")
```
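With `errors="coerce"`, values pandas cannot parse become `NaT` instead of raising an error. In this dataset `df.info()` reports 82 non-null creation dates, so nothing was lost, but counting `NaT`s right after the conversion is a cheap check on a new export. A sketch with made-up strings:

```python
import pandas as pd

# Hypothetical raw timestamps: two valid ISO 8601 strings and one malformed value
raw_dates = pd.Series(["2024-01-15T10:00:00", "2024-07-03T08:30:00", "not a date"])
parsed = pd.to_datetime(raw_dates, errors="coerce")

n_unparsed = int(parsed.isna().sum())  # rows that became NaT
print(f"{n_unparsed} of {len(parsed)} dates could not be parsed")
```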

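```python
# Encode 'Condition' (New = 1, anything else = 0); skip if already numeric so the cell can be re-run.
# Vectorized .str methods treat missing values as 0 instead of raising on NaN.
if "Condition" in df.columns and not pd.api.types.is_numeric_dtype(df["Condition"]):
    df["Condition"] = df["Condition"].str.lower().eq("new").astype(int)
```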
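The encoding treats only the exact label "New" as new; every other label becomes 0. If the raw column also contains variants such as "New other (see details)" (a hypothetical example here; `df["Condition"].value_counts()` on the raw CSV would show the real labels), the choice of rule changes the groups being compared. A sketch contrasting an exact match with a prefix match, both using vectorized string methods:

```python
import pandas as pd

# Hypothetical condition labels; the real column may contain other values
conditions = pd.Series(["New", "Used", "New other (see details)", None])

# Exact match (the rule used in this notebook): only "New" becomes 1
exact = conditions.str.lower().eq("new").astype(int)

# Prefix match (an alternative, assumption-dependent rule): labels starting with "new" become 1
prefix = conditions.str.lower().str.startswith("new", na=False).astype(int)

print(pd.DataFrame({"label": conditions, "exact": exact, "prefix": prefix}))
```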

```python
# Extract 'Season' from 'Item Creation Date'
# Months map to meteorological seasons (Northern Hemisphere); NaT dates give a missing Season
df["Month"] = df["Item Creation Date"].dt.month
season_map = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall"
}
df["Season"] = df["Month"].map(season_map)
```
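Rows whose date was coerced to `NaT` get a missing month and therefore a missing season, and `value_counts()` drops missing values by default, so those rows silently leave the seasonality test. A sketch with made-up dates using the same mapping:

```python
import pandas as pd

season_map = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Hypothetical creation dates, including one missing value that becomes NaT
dates = pd.to_datetime(pd.Series(["2024-01-10", "2024-04-02", None, "2024-12-24"]), errors="coerce")
seasons = dates.dt.month.map(season_map)  # NaT -> NaN month -> NaN season

counts = seasons.value_counts()  # NaN seasons are excluded here
print(counts)
```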

```python
# Display basic dataset info
# df.info() prints its summary and returns None, hence the "(None, ...)" tuple in the output
df.info(), df.head()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82 entries, 0 to 81
Data columns (total 17 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   Item ID             82 non-null     object             
 1   Title               82 non-null     object             
 2   Category IDs        82 non-null     int64              
 3   Categories          82 non-null     object             
 4   Price               82 non-null     float64            
 5   Item Link           82 non-null     object             
 6   Condition           82 non-null     int64              
 7   Seller              82 non-null     object             
 8   Seller Rating       82 non-null     int64              
 9   Item Location       82 non-null     object             
 10  Shipping Cost       82 non-null     float64            
 11  Buying Options      82 non-null     object             
 12  Item Creation Date  82 non-null     da

(None,
                         Item ID  \
 0             v1|110577732801|0   
 1             v1|110577681056|0   
 2  v1|110577018152|410109996760   
 3             v1|110577322535|0   
 4             v1|110577018103|0   
 
                                                Title  Category IDs  \
 0  Apple MacBook Pro MB990LL/A 13.3 in. Notebook NEW        111422   
 1  Apple MacBook Pro MB990LL/A 13.3 in. Notebook NEW        111422   
 2  Old Variants Product Test Product Apple MacBoo...        111422   
 3  Harry Potter and the Goblet of Fire - First Ed...           177   
 4  Old Simple Product Test Product Apple MacBook ...        111422   
 
                                           Categories   Price  \
 0  Apple Laptops, Computers/Tablets & Networking,...  500.00   
 1  Apple Laptops, Computers/Tablets & Networking,...  500.00   
 2  Apple Laptops, Computers/Tablets & Networking,...    9.99   
 3  PC Laptops & Netbooks, Computers/Tablets & Net...  500.00   
 4  Apple Laptops, Com
```

```python
# 1️⃣ T-test: Does Seller Rating Affect Price?
# Split sellers at a rating of 500 and compare mean prices with Welch's t-test (equal_var=False)
high_rated_sellers = df[df["Seller Rating"] >= 500]["Price"]
low_rated_sellers = df[df["Seller Rating"] < 500]["Price"]
t_stat_seller, p_value_seller = ttest_ind(high_rated_sellers, low_rated_sellers, equal_var=False)
```
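Hypothesis 1 lists a Pearson correlation, but the cell above only runs the t-test, and `pearsonr` is imported without being called. A sketch of the correlation half on a toy DataFrame with made-up values; on the notebook's data the equivalent call would be `pearsonr(df["Seller Rating"], df["Price"])`.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy stand-in for the notebook's DataFrame (hypothetical ratings and prices)
toy = pd.DataFrame({
    "Seller Rating": [120, 480, 950, 1500, 3000, 60],
    "Price": [9.99, 500.0, 450.0, 520.0, 300.0, 15.0],
})

# Pearson's r measures the strength of the linear association between the two columns
r_seller, p_corr_seller = pearsonr(toy["Seller Rating"].to_numpy(), toy["Price"].to_numpy())
print(f"r = {r_seller:.4f}, p = {p_corr_seller:.4f}")
```

Splitting at 500 discards the ordering of ratings within each group, whereas the correlation uses the full rating scale; the two tests therefore answer slightly different questions.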

```python
# 2️⃣ T-test: Does Condition (New vs. Used) Affect Price?
new_items = df[df["Condition"] == 1]["Price"]
used_items = df[df["Condition"] == 0]["Price"]
t_stat_condition, p_value_condition = ttest_ind(new_items, used_items, equal_var=False)
```
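H1 for this hypothesis is directional (New items priced higher), while `ttest_ind` defaults to a two-sided test. SciPy 1.6 and later accept `alternative="greater"`, which tests only that direction. A sketch on made-up prices; in the notebook the same call would take `new_items` and `used_items`. Note that the notebook's t-statistic is negative (-1.0784), meaning New items averaged lower than Used items in this sample, so a one-sided test of "greater" would not support H1 either.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical prices (toy values, not the eBay data)
toy_new = np.array([500.0, 450.0, 520.0, 610.0, 480.0])
toy_used = np.array([300.0, 9.99, 420.0, 350.0])

# H1: mean(New) > mean(Used), so only the upper tail counts as evidence
one_sided = ttest_ind(toy_new, toy_used, equal_var=False, alternative="greater")
two_sided = ttest_ind(toy_new, toy_used, equal_var=False)

print(f"one-sided p = {one_sided.pvalue:.4f}, two-sided p = {two_sided.pvalue:.4f}")
```

When the statistic points in the hypothesized direction, the one-sided p-value is half the two-sided one.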

```python
# 3️⃣ ANOVA Test: Does Category Affect Price?
category_groups = [group["Price"] for _, group in df.groupby("Categories")]
anova_stat_category, p_value_category = f_oneway(*category_groups)
```
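`Categories` holds the full comma-separated category path as one string, so every distinct path forms its own group. With 82 rows, some groups may contain only one or two listings: a singleton group contributes nothing to the within-group sum of squares, and ANOVA's normality and equal-variance assumptions are hard to judge from so few values. A sketch that inspects group sizes before testing; the threshold `MIN_GROUP_SIZE` is a hypothetical choice, not part of the original analysis.

```python
import pandas as pd
from scipy.stats import f_oneway

# Toy stand-in for the notebook's DataFrame (hypothetical categories and prices)
toy = pd.DataFrame({
    "Categories": ["Laptops", "Laptops", "Laptops", "Books", "Books", "Phones"],
    "Price": [500.0, 450.0, 9.99, 12.0, 15.0, 300.0],
})

# Count listings per category before running ANOVA
group_sizes = toy.groupby("Categories")["Price"].size()
print(group_sizes)

MIN_GROUP_SIZE = 2  # hypothetical threshold: drop categories with a single listing
kept = group_sizes[group_sizes >= MIN_GROUP_SIZE].index
toy_groups = [g["Price"].to_numpy() for name, g in toy.groupby("Categories") if name in kept]

f_toy, p_toy = f_oneway(*toy_groups)
print(f"F = {f_toy:.4f}, p = {p_toy:.4f} across {len(toy_groups)} categories")
```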

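```python
# 4️⃣ Chi-Square Test: Does Seasonality Affect Listings?
season_counts = df["Season"].value_counts()  # seasons with zero listings are omitted
expected_counts = np.full(len(season_counts), season_counts.mean())
# Note: this stacks observed and expected counts into a 2 x k contingency table,
# which is more conservative than a goodness-of-fit test (scipy.stats.chisquare)
chi2_stat_season, p_value_season = chi2_contingency([season_counts, expected_counts])[:2]
```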
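Two details of this cell affect the result. First, `value_counts()` only lists seasons that actually occur, so seasons with zero listings never enter the test. Second, `chi2_contingency([season_counts, expected_counts])` treats the two rows as a 2 x k contingency table rather than running a goodness-of-fit test; for this layout each term becomes (O - E)² / (O + E) instead of (O - E)² / E, so the statistic is never larger and the test is more conservative. The notebook's reported result (statistic 1.2066, p = 0.2720) is consistent with one degree of freedom, which would mean only two seasons appear in the data. A sketch on made-up labels that restores missing seasons with `reindex(..., fill_value=0)` and uses `scipy.stats.chisquare`:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, chisquare

# Hypothetical season labels (toy data in which only two seasons occur)
toy_season = pd.Series(["Winter"] * 12 + ["Spring"] * 6)

# value_counts() omits seasons with zero listings; reindex restores them as 0
all_seasons = ["Winter", "Spring", "Summer", "Fall"]
observed = toy_season.value_counts().reindex(all_seasons, fill_value=0)

# Goodness-of-fit against equal expected counts (chisquare's default), df = k - 1
gof = chisquare(observed.to_numpy())

# The notebook's approach: only the seasons present, stacked with their mean as a 2 x k table
present = toy_season.value_counts()
toy_expected = np.full(len(present), present.mean())
stat_ct, p_ct, dof_ct, _ = chi2_contingency([present.to_numpy(), toy_expected])

print(f"goodness-of-fit: chi2={gof.statistic:.4f}, p={gof.pvalue:.4g}")
print(f"contingency:     chi2={stat_ct:.4f}, p={p_ct:.4f}, dof={dof_ct}")
```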

```python
# 📌 Step 4: Display Results
results = {
    "Seller Rating vs. Price": {"T-statistic": t_stat_seller, "P-value": p_value_seller},
    "Condition (New vs. Used) vs. Price": {"T-statistic": t_stat_condition, "P-value": p_value_condition},
    "Category vs. Price": {"ANOVA Statistic": anova_stat_category, "P-value": p_value_category},
    "Seasonality of Listings": {"Chi-Square Statistic": chi2_stat_season, "P-value": p_value_season}
}

# Print the results
for test, result in results.items():
    print(f"\n🔹 {test}:")
    for key, value in result.items():
        print(f"   {key}: {value:.4f}")
```


```text
🔹 Seller Rating vs. Price:
   T-statistic: -0.2970
   P-value: 0.7856

🔹 Condition (New vs. Used) vs. Price:
   T-statistic: -1.0784
   P-value: 0.3923

🔹 Category vs. Price:
   ANOVA Statistic: 2.1516
   P-value: 0.1005

🔹 Seasonality of Listings:
   Chi-Square Statistic: 1.2066
   P-value: 0.2720
```


```python
# 📌 Step 5: Interpretation of Results
alpha = 0.05  # significance level used for every test

print("\n📊 Final Conclusions:")
if p_value_seller > alpha:
    print("✅ No significant evidence that seller rating affects price.")
else:
    print("❌ Seller rating has a significant effect on price.")

if p_value_condition > alpha:
    print("✅ No significant evidence that condition (New vs. Used) affects price.")
else:
    print("❌ Condition (New vs. Used) has a significant effect on price.")

if p_value_category > alpha:
    print("✅ No significant evidence that product category affects price.")
else:
    print("❌ Product category has a significant effect on price.")

if p_value_season > alpha:
    print("✅ No significant evidence that listing counts differ across seasons.")
else:
    print("❌ Listing counts differ significantly across seasons.")
```

```text
📊 Final Conclusions:
✅ No significant evidence that seller rating affects price.
✅ No significant evidence that condition (New vs. Used) affects price.
✅ No significant evidence that product category affects price.
✅ No significant evidence that listing counts differ across seasons.
```
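A p-value above 0.05 means the data do not provide enough evidence to reject H0 at that level; it does not show that H0is true, and with only 82 listings split into groups, modest effects could easily go undetected. A small helper, sketched here with a hypothetical name `interpret`, keeps that wording consistent and makes the significance level an explicit parameter. It uses `p <= alpha` for rejection, matching the notebook's `> 0.05` branch for "not significant".

```python
def interpret(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision at significance level alpha (hypothetical helper)."""
    if p_value <= alpha:
        return "reject H0"
    # Not rejecting H0 means insufficient evidence, not proof that H0 is true
    return "fail to reject H0"


# P-values as printed by the results cell of this notebook
reported_p_values = {
    "Seller Rating vs. Price": 0.7856,
    "Condition (New vs. Used) vs. Price": 0.3923,
    "Category vs. Price": 0.1005,
    "Seasonality of Listings": 0.2720,
}

for test_name, p in reported_p_values.items():
    print(f"{test_name}: p = {p:.4f} -> {interpret(p)}")
```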
