# Market Segmentation Analysis - Step 5 (Python Conversion)
This notebook replicates the R code examples in Python using pandas, NumPy, SciPy, scikit-learn, matplotlib, and seaborn.

## 1. Loading Sample Data (Tourist Activities Example)
Recreate the tourist activities dataset used in the R code: seven tourists, each with a score for beach, action, and culture (every row sums to 100).

In [None]:

import pandas as pd
import numpy as np

# Sample data (Tourist activities: BEACH, ACTION, CULTURE)
data = {
    "Name": ["Anna", "Bill", "Frank", "Julia", "Maria", "Michael", "Tom"],
    "Beach": [100, 100, 60, 70, 80, 0, 50],
    "Action": [0, 0, 40, 0, 0, 90, 20],
    "Culture": [0, 0, 0, 30, 20, 10, 30]
}

# Use the tourist names as the row index so they label every later result
df = pd.DataFrame(data).set_index("Name")
df


## 2. Distance Calculations
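Compute the pairwise Euclidean and Manhattan distances between tourists with SciPy's pdist, then expand each result into a square, labelled matrix with squareform.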

In [None]:

from scipy.spatial.distance import pdist, squareform

# Euclidean Distance
euclidean_dist = pd.DataFrame(squareform(pdist(df, metric='euclidean')), 
                              index=df.index, columns=df.index)
print("Euclidean Distance Matrix:")
euclidean_dist.round(2)


In [None]:

# Manhattan Distance
manhattan_dist = pd.DataFrame(squareform(pdist(df, metric='cityblock')), 
                              index=df.index, columns=df.index)
print("Manhattan Distance Matrix:")
manhattan_dist
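
As a sanity check on the two matrices, the distances for a single pair can be worked out by hand. The sketch below does this for Anna and Michael with plain NumPy; the variable names are illustrative and the two vectors are copied from the dataset in Section 1.

```python
import numpy as np

# Activity profiles (Beach, Action, Culture) copied from the sample data
anna = np.array([100, 0, 0])
michael = np.array([0, 90, 10])

diff = anna - michael  # [100, -90, -10]

# Euclidean: square root of the summed squared differences
euclidean = np.sqrt(np.sum(diff ** 2))

# Manhattan: sum of the absolute differences
manhattan = np.sum(np.abs(diff))

print(round(float(euclidean), 2))  # 134.91
print(int(manhattan))              # 200
```

These values should match the Anna/Michael entries of the Euclidean and Manhattan matrices above.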


## 3. Hierarchical Clustering and Dendrogram
Hierarchical clustering with SciPy, using complete linkage on Manhattan distances.

In [None]:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Complete Linkage
linkage_matrix = linkage(df, method='complete', metric='cityblock')

# Plotting Dendrogram
plt.figure(figsize=(10, 5))
dendrogram(linkage_matrix, labels=df.index.tolist(), leaf_rotation=90)
plt.title("Complete Linkage Dendrogram (Manhattan Distance)")
plt.tight_layout()
plt.show()
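
The dendrogram only shows the merge order; to turn it into segment labels the tree has to be cut. A minimal sketch, assuming three segments are wanted, uses SciPy's fcluster with the maxclust criterion. The names tourists, tree, and segments are illustrative, and the data is repeated so the cell runs on its own.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Same sample data as in Section 1, repeated so this cell is self-contained
tourists = pd.DataFrame(
    {
        "Beach": [100, 100, 60, 70, 80, 0, 50],
        "Action": [0, 0, 40, 0, 0, 90, 20],
        "Culture": [0, 0, 0, 30, 20, 10, 30],
    },
    index=["Anna", "Bill", "Frank", "Julia", "Maria", "Michael", "Tom"],
)

# Complete linkage on Manhattan distances, as in the dendrogram above
tree = linkage(tourists.to_numpy(), method="complete", metric="cityblock")

# Cut the tree so that at most 3 clusters remain (3 is an assumed choice)
labels = fcluster(tree, t=3, criterion="maxclust")

segments = pd.Series(labels, index=tourists.index, name="Segment")
print(segments)
```

Several merges in this small dataset happen at the same height, so the exact grouping of Frank and Tom can depend on how ties are resolved; Anna and Bill have identical profiles and always share a segment.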


## 4. k-Means Clustering Example

In [None]:

from sklearn.cluster import KMeans

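# Cluster on the three activity columns only, so that re-running this cell
# never feeds an existing 'Cluster' column back in as a feature
features = ["Beach", "Action", "Culture"]
kmeans = KMeans(n_clusters=3, random_state=1234, n_init=10)
df['Cluster'] = kmeans.fit_predict(df[features])

# Show results
df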
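
Cluster labels alone say little about what each segment looks like. One way to inspect the solution, sketched here with illustrative names (tourists, model, centroids) and the data repeated for self-containment, is to look at the fitted centroids and the within-cluster sum of squares that scikit-learn exposes as cluster_centers_ and inertia_.

```python
import pandas as pd
from sklearn.cluster import KMeans

features = ["Beach", "Action", "Culture"]

# Same sample data as in Section 1, repeated so this cell is self-contained
tourists = pd.DataFrame(
    {
        "Beach": [100, 100, 60, 70, 80, 0, 50],
        "Action": [0, 0, 40, 0, 0, 90, 20],
        "Culture": [0, 0, 0, 30, 20, 10, 30],
    },
    index=["Anna", "Bill", "Frank", "Julia", "Maria", "Michael", "Tom"],
)

model = KMeans(n_clusters=3, random_state=1234, n_init=10).fit(tourists[features])

# One row per cluster: the mean activity profile of its members
centroids = pd.DataFrame(model.cluster_centers_, columns=features).round(1)
print(centroids)

# Within-cluster sum of squared distances to the assigned centroids
print("Inertia:", round(model.inertia_, 1))
```

Each centroid row can be read as the average tourist in that segment, which is usually easier to interpret than the raw label column.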


## 5. Visualize Cluster Assignments

In [None]:

import seaborn as sns

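# Histograms on the diagonal: with only seven points, per-cluster density
# estimates are not meaningful
sns.pairplot(df.reset_index(), hue='Cluster', vars=["Beach", "Action", "Culture"],
             palette='Set2', diag_kind='hist')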
plt.suptitle("Cluster Visualization", y=1.02)
plt.show()
