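Customer case studies were collected from blogs and similar sources, and company information was analyzed only for the companies whose details were disclosed. The resulting dataset is summarized below.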
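✅ 1. Null data assessment and handling strategy
📌 Main columns with missing values
Column	Null count	Share of all rows
기업명 (company name)	74	62%
업계 (industry)	16	13%
기업규모 (company size)	53	45%
설립년도 (founding year)	74	62%
기업매출 (revenue)	74	62%
재직자수 (headcount)	74	62%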
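
A table like the one above can be produced directly from the DataFrame. The following is a minimal sketch on hypothetical toy data (the `toy` frame and its values are made up for illustration and are not the real case-study file):

```python
import pandas as pd

# Hypothetical toy data standing in for the real case-study table.
toy = pd.DataFrame({
    "기업명": ["A사", None, None, "D사"],
    "업계": ["IT", "IT", None, "금융"],
    "기업규모": ["대기업", None, None, "중소기업"],
})

# Null count per column, and the share of rows that are null (in percent).
null_count = toy.isna().sum()
null_pct = (toy.isna().mean() * 100).round(0).astype(int)

summary = pd.DataFrame({"null_count": null_count, "null_pct": null_pct})
print(summary)
```

Running the same two lines on the real `df` should give the counts and ratios for every column at once.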

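🎯 Conclusion: dropping rows outright is not feasible → statistics-based imputation is needed
Column	Handling strategy
기업명, 설립년도, 기업매출, 재직자수	Keep as null or exclude from clustering (treated as anonymized cases)
기업규모 (categorical)	Impute with the most frequent 기업규모 within each 업계 (mode)
재직자수, 기업매출 (numeric)	Impute with the median for each 기업규모 when they are used as clustering features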

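First, preprocess the data.

✅ Summary of steps:
기업규모 → impute with the mode within each 업계

재직자수, 기업매출 → impute with the median for each 기업규모

기업매출 → strip ₩ and commas, then convert to float

Categorical columns (업계, 기업규모) → One-Hot Encoding

All features → standardize with StandardScaler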
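
The last step matters because 기업매출 is on a vastly larger scale than the 0/1 one-hot columns. As a rough sketch of what standardization does, the snippet below applies the z-score formula by hand with NumPy; the three revenue values are illustrative, and the population standard deviation (`ddof=0`) is used because that is what `StandardScaler` uses by default:

```python
import numpy as np

# Illustrative revenue values in KRW (assumed for the example).
revenue = np.array([6.6e9, 9.72e10, 3.0e12])

# z-score: subtract the mean, divide by the population standard deviation.
z = (revenue - revenue.mean()) / revenue.std()

print(z)
```

After this transform the column has mean 0 and standard deviation 1, so it no longer dominates distance calculations in the clustering step.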

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# 1. Load data
df = pd.read_csv(r"C:\Users\EL031\Desktop\BCK\교육사례집.csv")  # adjust to your file path

# 2. 기업매출 (revenue) -> numeric
def convert_currency(x):
    """Strip the ₩ sign and thousands separators, then parse as float."""
    if pd.isna(x):
        return np.nan
    x = str(x).replace("₩", "").replace(",", "").strip()
    try:
        return float(x)
    except ValueError:
        return np.nan  # unparseable values are treated as missing

df['기업매출'] = df['기업매출'].apply(convert_currency)

# 3. 기업규모 (company size): fill with the most frequent value within each 업계 (industry).
#    Groups with no observed value fall back to '기타' ("other").
#    Rows whose 업계 is itself null belong to no group and are not filled here.
df['기업규모'] = df.groupby('업계')['기업규모'].transform(
    lambda x: x.fillna(x.mode().iloc[0] if not x.mode().empty else '기타')
)

# 4. 재직자수 (headcount) / 기업매출: fill with the median for each 기업규모.
#    Rows with a null 기업규모, or a group with no observed values, stay null.
for col in ['재직자수', '기업매출']:
    median_by_size = df.groupby('기업규모')[col].median()
    df[col] = df[col].fillna(df['기업규모'].map(median_by_size))

# 5. One-Hot Encoding: 업계 / 기업규모
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
encoded = encoder.fit_transform(df[['업계', '기업규모']])
encoded_cols = encoder.get_feature_names_out(['업계', '기업규모'])

# Convert the result to a DataFrame and attach it to the original
encoded_df = pd.DataFrame(encoded, columns=encoded_cols, index=df.index)
df = pd.concat([df, encoded_df], axis=1)

# 6. Combine numerical and encoded features
features = pd.concat([encoded_df, df[['재직자수', '기업매출']]], axis=1)

# 7. Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

# 8. Check the result
print("✅ Preprocessing complete! Scaled data shape:", X_scaled.shape)
features.head()


✅ Preprocessing complete! Scaled data shape: (119, 28)


Unnamed: 0,업계_IT,"업계_건축,토목",업계_교통,업계_금융,업계_농업,업계_도매업,업계_문화,업계_반도체,업계_뷰티/미용,업계_산업기계제조,...,"기업규모_공기업,공공기관",기업규모_기타,기업규모_대기업,기업규모_외국계,기업규모_일본계,기업규모_중견기업,기업규모_중소기업,기업규모_nan,재직자수,기업매출
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,10.0,6600000000.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,310.0,97200000000.0
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5068.0,3000000000000.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1310.5,5060000000000.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,125.0,60500000000.0
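
The `기업규모_nan` column in the output suggests that some rows still have no 기업규모 after imputation. For such rows the group-wise median in step 4 has nothing to look up, so 재직자수 and 기업매출 can remain null as well. The sketch below reproduces this on hypothetical toy data and adds a fallback to the overall median; the fallback is an assumption for illustration, not part of the original pipeline:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row has no 기업규모, one group has no observed headcount.
toy = pd.DataFrame({
    "기업규모": ["대기업", "대기업", "중소기업", None],
    "재직자수": [1000.0, np.nan, np.nan, np.nan],
})

# Group-wise median imputation, equivalent to step 4 of the preprocessing cell.
median_by_size = toy.groupby("기업규모")["재직자수"].median()
imputed = toy["재직자수"].fillna(toy["기업규모"].map(median_by_size))

# Count the values that the group-wise step could not fill.
remaining = int(imputed.isna().sum())
print("still missing after group-wise imputation:", remaining)

# Assumed fallback: fill whatever is left with the overall median.
filled = imputed.fillna(toy["재직자수"].median())
print(filled.tolist())
```

A quick `np.isnan(X_scaled).sum()` on the real data would show whether this case actually occurs before the matrix is passed to a clustering algorithm.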


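🚀 Selected clustering methods
Algorithm	Reason	Description
SOM (Self-Organizing Map)	Suited to visualization and exploratory analysis	Suited to grouping customers by their characteristics; projects high-dimensional data onto a 2D grid
FCM (Fuzzy C-Means)	Softer, more flexible cluster assignment	Reflects the possibility that one customer belongs to several clusters

👉 We can run both and compare them: SOM for visual understanding, FCM for marketing segmentation.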
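
To make the "one customer, several clusters" idea concrete, here is a minimal NumPy sketch of the standard fuzzy c-means update rules. It is not the `fuzzy-c-means` library implementation; the function name `fuzzy_c_means`, its parameters, and the demo data are all assumptions made for illustration:

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Return (centers, u), where u[i, k] is the membership of sample i in cluster k."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means (weights raised to the fuzzifier m).
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # Distance of every sample to every center (floored to avoid division by zero).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

# Demo on two well-separated synthetic blobs.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=-3.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(20, 2)),
])
centers, u = fuzzy_c_means(X_demo, n_clusters=2)
hard_labels = u.argmax(axis=1)  # same idea as fcm.u.argmax(axis=1)
print(u[:3].round(3))
```

Each row of `u` sums to 1, so a customer near a cluster boundary gets split memberships such as 0.6 / 0.4 instead of a single hard label.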

✅ Preparation: install the required libraries
First, install the packages below.

In [10]:
pip install minisom fcmeans matplotlib scikit-learn

Collecting minisom
  Downloading minisom-2.3.5.tar.gz (12 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement fcmeans (from versions: none)

[notice] A new release of pip is available: 25.0.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: No matching distribution found for fcmeans


In [4]:
pip install fuzzy-c-means


Collecting fuzzy-c-means
  Downloading fuzzy_c_means-1.7.2-py3-none-any.whl.metadata (4.7 kB)
Collecting numpy<2.0.0,>=1.21.1 (from fuzzy-c-means)
  Downloading numpy-1.26.4.tar.gz (15.8 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to bu

  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      + c:\Users\EL031\AppData\Local\Programs\Python\Python313\python.exe C:\Users\EL031\AppData\Local\Temp\pip-install-6ybzpy28\numpy_ed484a8371f54b7c9489491ebc7a177a\vendored-meson\meson\meson.py setup C:\Users\EL031\AppData\Local\Temp\pip-install-6ybzpy28\numpy_ed484a8371f54b7c9489491ebc7a177a C:\Users\EL031\AppData\Local\Temp\pip-install-6ybzpy28\numpy_ed484a8371f54b7c9489491ebc7a177a\.mesonpy-ec4f31gn -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=C:\Users\EL031\AppData\Local\Temp\pip-install-6ybzpy28\numpy_ed484a8371f54b7c9489491ebc7a177a\.mesonpy-ec4f31gn\meson-python-native-file.ini
      The Meson build system
      Version: 1.2.99
      Source dir: C:\Users\EL031\AppData\Local\Temp\pip-install-6ybzpy28\numpy_ed484a8371f54b7c9489491ebc7a177a
      Build dir: C:\Users\EL031\AppData\Local\Temp\pip-install-6yb

In [13]:
pip install minisom

Collecting minisom
  Using cached minisom-2.3.5.tar.gz (12 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: minisom
  Building wheel for minisom (pyproject.toml): started
  Building wheel for minisom (pyproject.toml): finished with status 'done'
  Created wheel for minisom: filename=minisom-2.3.5-py3-none-any.whl size=12132 sha256=b404dc92813812389629cb3321f1bebe39dd56e155122d387f7a90a2b3bedc95
  Stored in directory: c:\users\el031\appdata\local\pip\cache\wheels\df\bc\51\5a64336510519dc8062d6e17d458721906b85b09abe192481e
Successfully built minisom
Installing collected packages: minisom
Successfully installed minisom-2.3.5
Note: you may need to restart the kernel to u




In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from fcmeans import FCM  # the PyPI package "fuzzy-c-means" is imported as "fcmeans"
from minisom import MiniSom

# Assumes `df` and `X_scaled` from the preprocessing cell already exist.
# PCA raises on NaN input, so X_scaled must not contain missing values.

# --------------------------------
# ✅ 1. SOM (Self-Organizing Map)
# --------------------------------

# SOM configuration
som_grid_x, som_grid_y = 10, 10
som = MiniSom(x=som_grid_x, y=som_grid_y, input_len=X_scaled.shape[1], sigma=1.0, learning_rate=0.5, random_seed=42)
som.train_batch(X_scaled, num_iteration=1000, verbose=True)

# Grid coordinates of the winning unit for each sample
win_map = np.array([som.winner(x) for x in X_scaled])
som_cluster_labels = [f"{x}-{y}" for x, y in win_map]

# Visualization (2D PCA projection)
# Note: a 10x10 grid can yield up to 100 distinct labels, more than tab10's 10 colors.
X_pca = PCA(n_components=2).fit_transform(X_scaled)
plt.figure(figsize=(10, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=pd.factorize(som_cluster_labels)[0], cmap='tab10')
plt.title("SOM clustering result (PCA projection)")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.colorbar(label='SOM cluster')
plt.grid(True)
plt.show()

# --------------------------------
# ✅ 2. FCM (Fuzzy C-Means)
# --------------------------------

fcm = FCM(n_clusters=4, random_state=42)
fcm.fit(X_scaled)

# Hard FCM labels: the cluster with the highest membership for each sample
fcm_labels = fcm.u.argmax(axis=1)

# Visualization
plt.figure(figsize=(10, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=fcm_labels, cmap='tab10')
plt.title("FCM clustering result (PCA projection)")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.colorbar(label='FCM cluster')
plt.grid(True)
plt.show()

# --------------------------------
# ✅ Save the results (optional)
# --------------------------------

df['SOM_Cluster'] = som_cluster_labels
df['FCM_Cluster'] = fcm_labels

# Example: per-cluster statistics
summary = df.groupby('FCM_Cluster')[['재직자수', '기업매출']].mean()
print("Mean per FCM cluster:")
print(summary)


ModuleNotFoundError: No module named 'fcmeans'
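
The traceback is consistent with the earlier install log, where `pip install fuzzy-c-means` stopped while building NumPy 1.26.4 from source, so the `fcmeans` module was presumably never installed. A small check like the one below can confirm which modules are importable before running the clustering cell; the helper name `missing_modules` is hypothetical:

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names, which can differ from PyPI names
# (e.g. the package "fuzzy-c-means" provides the module "fcmeans").
required = ["numpy", "minisom", "fcmeans"]
print("Missing modules:", missing_modules(required))
```

If `fcmeans` shows up in the list, the install step needs to be resolved first, for example by using a Python version for which the pinned NumPy release has prebuilt wheels.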