# Assessing the Impact of Product Specifications and Brand Origin on the Pricing of Mechanical Keyboards in 2025

## Problem Statement
Global supply chains have undergone significant political and economic disruption in recent years, particularly in the technology and consumer electronics industries. 

Mechanical keyboards, an essential component of modern computing and creative work, have become a notable example of how Chinese manufacturers have entered the enthusiast market with competitive alternatives.

Historically, branding and Western design heritage contributed greatly to pricing. However, with increased transparency and direct-to-consumer models from Chinese factories, this may no longer hold true.

## Goal
This project aims to use mechanical keyboard listings as a case study to explore whether technical specifications and country/brand of origin still meaningfully influence pricing in 2025.

## Hypothesis
H₀ (Null Hypothesis): Product specifications and brand origin (e.g., Chinese vs Western brands) have no significant effect on price.

H₁ (Alternative Hypothesis): Product specifications and brand origin significantly affect price.
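
One possible way to check H₀ for a single two-level predictor is a permutation test on the difference in mean price between the two groups. The sketch below is only an illustration of the procedure: the prices are made up, the `origin` column is hypothetical (the dataset loaded later has no such column), and `permutation_test` is a helper name introduced here.

```python
import numpy as np
import pandas as pd

# Toy data: the prices and the `origin` column are invented for illustration only.
toy = pd.DataFrame({
    "origin": ["chinese"] * 6 + ["western"] * 6,
    "price": [59.0, 72.0, 85.0, 64.0, 99.0, 78.0,
              129.0, 149.0, 110.0, 169.0, 139.0, 120.0],
})

def permutation_test(frame, group_col, value_col, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference in group means.

    Assumes `group_col` holds exactly two distinct labels.
    """
    rng = np.random.default_rng(seed)
    values = frame[value_col].to_numpy()
    first_label = frame[group_col].unique()[0]
    mask = (frame[group_col] == first_label).to_numpy()

    def mean_gap(current_mask):
        return abs(values[current_mask].mean() - values[~current_mask].mean())

    observed = mean_gap(mask)
    # Shuffling the group membership simulates "origin has no effect on price".
    hits = sum(mean_gap(rng.permutation(mask)) >= observed
               for _ in range(n_permutations))
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value

observed_gap, p_value = permutation_test(toy, "origin", "price")
print(f"observed gap: {observed_gap:.2f}, p-value: {p_value:.4f}")
```

A small p-value would count as evidence against H₀ for that one predictor. Testing specifications and origin jointly would need a model with several predictors, which this sketch does not attempt.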

## Objectives
- Determine which features (e.g., switch type, brand, layout, connectivity) influence pricing.

- Analyze whether branding and origin remain significant predictors of pricing.

- Provide insights into broader consumer electronics pricing trends following the globalization of supply chains.



# Seeing what data we are working with

In [None]:
import pandas as pd

df = pd.read_csv('keebfinder_keyboards_rev8.csv')


In [2]:
df.shape

(2368, 13)

In [3]:
df.isna().sum()

category         0
brand            0
title            0
price            0
layout           0
mount            0
hall_effect      0
hotswap          0
case_material    0
backlight        0
connectivity     0
screen           0
knob             0
dtype: int64

## Preprocessing
- Inspect the categorical columns for missing or inconsistent values and deal with them (note that the `isna()` check above reports none for this revision of the file)

In [None]:
# check for all the unique values in the layout column
df['layout'].unique()

In [None]:
#df['layout'] = df['layout'].str.replace(',', '')

In [None]:
#check for all the unique values in the mount column
df['mount'].unique()

In [None]:
#df['mount'] = df['mount'].str.replace(',', '')

In [None]:
df['case_material'].unique()

In [None]:
#df['case_material'] = df['case_material'].str.replace(',', '')

In [None]:
# check for all the unique values in the 'connectivity' column
df['connectivity'].unique()

In [None]:
#df['connectivity'] = df['connectivity'].str.replace(',', '')

In [None]:
# fill missing values with 'Unknown' for categorical columns
# (assign the result back; calling fillna(inplace=True) on a column selection
#  is deprecated in recent pandas and may not modify df)
categorical_cols = ['layout', 'mount', 'case_material', 'connectivity']
df[categorical_cols] = df[categorical_cols].fillna('Unknown')
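
Once NaN has been replaced with the string `'Unknown'`, `isna().sum()` reports zero even where information is still missing. A minimal sketch of counting the placeholder instead, on a made-up frame that only borrows the notebook's column names (the values are invented):

```python
import numpy as np
import pandas as pd

# Toy frame: column names come from the notebook, the values are invented.
toy = pd.DataFrame({
    "layout": ["75%", np.nan, "TKL", "60%"],
    "mount": ["Gasket", "Tray", np.nan, np.nan],
    "case_material": ["Aluminium", np.nan, "ABS", "ABS"],
    "connectivity": ["Wired", "Bluetooth", "Wired", np.nan],
})
categorical_cols = ["layout", "mount", "case_material", "connectivity"]

# Fill by assignment rather than with inplace=True on a column selection.
toy[categorical_cols] = toy[categorical_cols].fillna("Unknown")

# isna() is now all zeros, so count the placeholder string per column instead.
unknown_counts = (toy[categorical_cols] == "Unknown").sum()
unknown_share = unknown_counts / len(toy)
print(pd.DataFrame({"count": unknown_counts, "share": unknown_share}))
```

Columns with a large share of `'Unknown'` may deserve separate treatment before they are used as price predictors.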

In [None]:
# minor mistake in the above code, lazy fix ^^
#df['case_material'].replace(to_replace="Unknown",
#         value="Unspecified(likely ABS plastic)", inplace=True)
#df['case_material'].replace(to_replace="Alu case",
#         value="Metal (likely Aluminium)", inplace=True)
#df.tail()


In [None]:
# # extract brand from title and make a new column
# df['brand'] = df['title'].str.split().str[0]
# # df.head()

# #reorder the columns to make brand go first
# df = df[['brand', 'title', 'price', 'layout', 'mount', 'hall_effect', 'hotswap', 'case_material', 'backlight', 'connectivity', 'screen', 'knob']]
# df.head()



# Categorizing brands

## Price-Based Categorization with Brand Context

- Calculate the average price for each brand.
- Calculate the price range (min, max, standard deviation) for each brand.
- Use these metrics to categorize brands.

This would help account for brands that offer both high-end and budget options. For example:
- If a brand's average price is high but has a wide range, it might be a "premium" brand that also offers budget options
- If a brand's average price is low with a narrow range, it's likely a "budget" brand
- If a brand's average price is moderate with some variation, it might be "midrange"

## Price Tiers by Feature Set
- Define price tiers based on feature combinations (e.g., hotswap + aluminum case + wireless = premium features).
- Compare a keyboard's price to what would be expected given its features.
- Use this "price-to-feature ratio" to categorize brands.
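
The feature-based idea above could be sketched roughly as follows. This is one possible reading, not the method used in this notebook: the "expected" price is assumed to be the median price of listings that share the same feature combination, and the brands, prices, and the `aluminium_case` and `wireless` flags are invented (the real data stores these as `case_material` and `connectivity`).

```python
import pandas as pd

# Toy listings: brands, prices, and feature flags are invented for illustration.
toy = pd.DataFrame({
    "brand": ["A", "A", "B", "B", "C", "C"],
    "price": [200.0, 180.0, 100.0, 120.0, 50.0, 40.0],
    "hotswap": [True, True, True, True, False, False],
    "aluminium_case": [True, True, True, True, False, False],
    "wireless": [True, False, True, False, False, False],
})
feature_cols = ["hotswap", "aluminium_case", "wireless"]

# Assumed definition: expected price = median price of all listings
# that share the same feature combination.
toy["expected_price"] = toy.groupby(feature_cols)["price"].transform("median")

# Ratio above 1: priced above comparable listings; below 1: priced below them.
toy["price_to_feature_ratio"] = toy["price"] / toy["expected_price"]

# Average the ratio per brand to see which brands charge more for the same features.
brand_ratio = (
    toy.groupby("brand")["price_to_feature_ratio"]
    .mean()
    .sort_values(ascending=False)
)
print(brand_ratio)
```

Brands whose average ratio sits well above 1 charge more than peers for the same feature set, which is one way to separate a brand premium from a feature premium.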

In [None]:
# df = df.drop('brand_category', axis=1)

# df.to_csv('keebfinder_keyboards_rev7.csv', index=False)

# #new brand categorization approach (!GPT SUGGESTION)
# # First, convert price strings to numeric values (removing '$' and ',' characters)
# # regex=False so that '$' is treated as a literal character, not an end-of-string anchor
# df['price_numeric'] = (df['price'].str.replace('$', '', regex=False)
#                                   .str.replace(',', '', regex=False)
#                                   .astype(float))

# # Calculate brand statistics on the numeric price (!GPT CODE)
# brand_stats = df.groupby('brand').agg({
#     'price_numeric': ['mean', 'min', 'max', 'std', 'count']
# }).reset_index()

# # name cols
# brand_stats.columns = ['brand', 'avg_price', 'min_price', 'max_price', 'price_std', 'product_count']

# # Calculate price percentiles for the entire dataset (!GPT CODE)
# price_percentiles = df['price_numeric'].quantile([0.33, 0.66])

# # Define brand categories based on multiple metrics (!GPT SUGGESTION)
# def categorize_brand(row):
#     avg_price = row['avg_price']
#     price_std = row['price_std']
#     price_range = row['max_price'] - row['min_price']
    
#     # Premium: High average price OR wide price range with high max

#     if avg_price > price_percentiles[0.66] or (price_range > 100 and row['max_price'] > price_percentiles[0.66]):
#         return 'premium'
#     # Budget: Low average price AND narrow price range

#     elif avg_price < price_percentiles[0.33] and price_range < 50:
#         return 'budget'
#     # Midrange: Everything else

#     else:
#         return 'midrange'

# # apply categorization
# brand_stats['category'] = brand_stats.apply(categorize_brand, axis=1)

# # results
# print("\nPrice Percentiles:")
# print(f"33rd percentile: ${price_percentiles[0.33]:.2f}")
# print(f"66th percentile: ${price_percentiles[0.66]:.2f}")

# print("\nBrand Categories:")
# print(brand_stats[['brand', 'category', 'avg_price', 'min_price', 'max_price', 'product_count']].sort_values('avg_price', ascending=False))

# # add category back to main dataframe
# df = df.merge(brand_stats[['brand', 'category']], on='brand', how='left')

# I just want the category column to be the first column
#df = df[['category', 'brand', 'title', 'price', 'layout', 'mount', 'hall_effect', 'hotswap', 'case_material', 'backlight', 'connectivity', 'screen', 'knob']]

# Save the new categorized dataset
#df.to_csv('keebfinder_keyboards_rev8.csv', index=False)