# Task
Apply Multidimensional Scaling (MDS) and K-Means clustering to the provinces, using the features `MPP_Total_Persen` and `Jumlah_Rantai_Utama` to segment them, then analyze the characteristics of each resulting cluster.

## Data Preparation for MDS

### Subtask:
Select relevant numerical features and standardize them to prepare for Multidimensional Scaling.
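
As a minimal sketch of what standardization does here, assume a small hypothetical frame that reuses the two feature names from the task (the values are illustrative, not the real data). `StandardScaler` subtracts each column's mean and divides by its population standard deviation, so a percentage and a small count contribute on a comparable scale to the pairwise distances that MDS works from. Passing the original index back into the new frame keeps the scaled rows aligned with the source rows.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; values are illustrative only.
toy = pd.DataFrame(
    {
        "MPP_Total_Persen": [5.0, 25.0, 10.0, 20.0],
        "Jumlah_Rantai_Utama": [2, 4, 2, 4],
    },
    index=[3, 4, 5, 6],  # non-default index, as after slicing with iloc[3:]
)

scaler = StandardScaler()
toy_scaled = pd.DataFrame(
    scaler.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,  # keep row labels aligned with the source frame
)

# Manual z-score with the population standard deviation (ddof=0),
# which is the convention StandardScaler uses.
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(np.allclose(toy_scaled.to_numpy(), manual.to_numpy()))
print(toy_scaled)
```

After scaling, each column has mean 0 and unit variance, so neither feature dominates the distance computation purely because of its units.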


**Reasoning**:
The current `df` is not yet in a form suitable for analysis: the actual data starts at row index 3, and the columns lack meaningful names. I will clean the DataFrame by keeping the rows from index 3 onward, renaming the columns, and converting `MPP_Total_Persen` to a numeric type. I will also check whether `Jumlah_Rantai_Utama`, the second feature the task requires, exists in the data.



In [30]:
import pandas as pd

# Create a clean DataFrame starting from row index 3
df_clean = df.iloc[3:].copy()

# Rename columns
df_clean.rename(columns={'38 Provinsi': 'Provinsi', 'Unnamed: 1': 'MPP_Total_Persen'}, inplace=True)

# Convert 'MPP_Total_Persen' to numeric, handling potential errors
df_clean['MPP_Total_Persen'] = pd.to_numeric(df_clean['MPP_Total_Persen'], errors='coerce')

# Drop rows where 'MPP_Total_Persen' is NaN (failed conversion or truly missing)
df_clean.dropna(subset=['MPP_Total_Persen'], inplace=True)

# Display the cleaned DataFrame head and info
print("Cleaned DataFrame head:")
print(df_clean.head())
print("\nCleaned DataFrame info:")
df_clean.info()

# Check for the existence of 'Jumlah_Rantai_Utama'
if 'Jumlah_Rantai_Utama' not in df_clean.columns:
    print("\nError: Column 'Jumlah_Rantai_Utama' not found in the dataset.")
    print("Cannot proceed with the standardization as requested, as one of the required features is missing.")

Cleaned DataFrame head:
         Provinsi  MPP_Total_Persen Unnamed: 2
3            ACEH              5.61          2
4  SUMATERA UTARA             24.35          4
5  SUMATERA BARAT              5.36          2
6            RIAU              8.55          3
7           JAMBI              7.49          2

Cleaned DataFrame info:
<class 'pandas.core.frame.DataFrame'>
Index: 35 entries, 3 to 41
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Provinsi          35 non-null     object 
 1   MPP_Total_Persen  35 non-null     float64
 2   Unnamed: 2        35 non-null     object 
dtypes: float64(1), object(2)
memory usage: 1.1+ KB

Error: Column 'Jumlah_Rantai_Utama' not found in the dataset.
Cannot proceed with the standardization as requested, as one of the required features is missing.
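
Because `errors='coerce'` silently turns unparseable values into `NaN`, it can help to look at which rows were dropped before moving on. The following is a minimal sketch, assuming a hypothetical raw frame `raw` with made-up values (not the real dataset):

```python
import pandas as pd

# Hypothetical raw rows; the names and values are illustrative only.
raw = pd.DataFrame({
    "Provinsi": ["A", "B", "C", "D"],
    "MPP_Total_Persen": ["5.61", "-", "24.35", None],
})

converted = pd.to_numeric(raw["MPP_Total_Persen"], errors="coerce")

# Rows whose value could not be parsed, or was missing to begin with.
dropped = raw[converted.isna()]
print(dropped)

# Keep only the rows that converted cleanly.
kept = raw.assign(MPP_Total_Persen=converted).dropna(subset=["MPP_Total_Persen"])
print(len(raw), "rows before,", len(kept), "rows after")
```

Inspecting `dropped` shows whether the lost rows are footers or notes, or real provinces with malformed values that should be repaired rather than discarded.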


## Data Preparation for MDS (Retry 1)

### Subtask:
Identify the missing feature 'Jumlah_Rantai_Utama' by inspecting available columns, rename it if found, and then select and standardize the relevant numerical features to prepare for Multidimensional Scaling.


**Reasoning**:
The previous step showed that no column named `Jumlah_Rantai_Utama` exists. In the `df_clean.info()` output, `Unnamed: 2` is the only column still without a meaningful name, and the `head()` output shows it holds small integer-like values (2, 4, 2, 3, 2), so it is the likely candidate. I will rename this column, convert it to a numeric type, drop any rows that fail conversion, and then standardize `MPP_Total_Persen` and `Jumlah_Rantai_Utama` for MDS.
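
Before renaming, one way to check that a candidate column really holds numeric data is to measure how many of its values survive conversion. This is a minimal sketch, assuming a hypothetical helper `numeric_share` and a toy frame; neither appears in the original notebook:

```python
import pandas as pd

def numeric_share(series: pd.Series) -> float:
    """Return the fraction of values that parse as numbers (hypothetical helper)."""
    return pd.to_numeric(series, errors="coerce").notna().mean()

# Toy frame mimicking object-typed columns; values are illustrative only.
toy = pd.DataFrame({
    "Provinsi": ["ACEH", "RIAU", "JAMBI"],
    "Unnamed: 2": ["2", "3", "2"],
})

for col in toy.columns:
    print(col, numeric_share(toy[col]))
```

A share close to 1 suggests the column is numeric data stored as text; a share close to 0 suggests labels.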



In [31]:
from sklearn.preprocessing import StandardScaler

# 1. Rename 'Unnamed: 2' to 'Jumlah_Rantai_Utama'
df_clean.rename(columns={'Unnamed: 2': 'Jumlah_Rantai_Utama'}, inplace=True)

# 2. Convert 'Jumlah_Rantai_Utama' to numeric
df_clean['Jumlah_Rantai_Utama'] = pd.to_numeric(df_clean['Jumlah_Rantai_Utama'], errors='coerce')

# 3. Drop rows where 'Jumlah_Rantai_Utama' is NaN
df_clean.dropna(subset=['Jumlah_Rantai_Utama'], inplace=True)

# 4. Select the relevant features for MDS
df_features_mds = df_clean[['MPP_Total_Persen', 'Jumlah_Rantai_Utama']].copy()

# 5. Initialize StandardScaler
scaler = StandardScaler()

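# 6. Apply StandardScaler to the features
# Note: the result gets a fresh 0..n-1 index, so row i here corresponds to
# df_clean.iloc[i], not df_clean.loc[i].
df_scaled = pd.DataFrame(scaler.fit_transform(df_features_mds), columns=df_features_mds.columns)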

print("Cleaned and prepared DataFrame for MDS (first 5 rows):")
print(df_clean.head())
print("\nFeatures selected for MDS (first 5 rows):")
print(df_features_mds.head())
print("\nScaled features for MDS (first 5 rows):")
print(df_scaled.head())
print("\nScaled features info:")
df_scaled.info()

Cleaned and prepared DataFrame for MDS (first 5 rows):
         Provinsi  MPP_Total_Persen  Jumlah_Rantai_Utama
3            ACEH              5.61                    2
4  SUMATERA UTARA             24.35                    4
5  SUMATERA BARAT              5.36                    2
6            RIAU              8.55                    3
7           JAMBI              7.49                    2

Features selected for MDS (first 5 rows):
   MPP_Total_Persen  Jumlah_Rantai_Utama
3              5.61                    2
4             24.35                    4
5              5.36                    2
6              8.55                    3
7              7.49                    2

Scaled features for MDS (first 5 rows):
   MPP_Total_Persen  Jumlah_Rantai_Utama
0         -1.030345            -1.233905
1          0.498483             2.088148
2         -1.050740            -1.233905
3         -0.790497             0.427121
4         -0.876973            -1.233905

Scaled features info:
<cla