# 📈 Palmer Penguins: Part 2 - Statistical Analysis and Modeling

In this notebook, we extend our analysis by exploring feature relationships and building a simple classification model to predict penguin species.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load and clean the data
url = "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv"
penguins = pd.read_csv(url)
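# Drop rows with missing values; .copy() gives an independent frame so that
# adding columns later does not trigger pandas' SettingWithCopyWarning.
penguins_clean = penguins.dropna().copy()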
penguins_clean.head()
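
Before dropping rows, it can help to check how many values are missing per column and how many rows `dropna()` removes. The sketch below is a minimal illustration, assuming a small hypothetical frame named `toy` with made-up values so that it runs without a network connection. The same calls should apply to `penguins`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with made-up values; it stands in for `penguins`.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, np.nan, 46.1, 50.0],
    "sex": ["male", None, "female", None],
})

# Count missing values in each column.
missing_per_column = toy.isna().sum()

# Drop every row that has at least one missing value.
toy_clean = toy.dropna().copy()
rows_dropped = len(toy) - len(toy_clean)

print(missing_per_column)
print(f"Rows dropped: {rows_dropped} of {len(toy)}")
```

If a large share of rows disappears, dropping only on the columns the model uses (for example via the `subset` argument of `dropna`) may be worth considering.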

## 🔍 Correlation Heatmap
Let’s explore the relationships between numeric features.

In [None]:
sns.heatmap(penguins_clean.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Feature Correlation Heatmap")
plt.show()
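
A heatmap is easy to scan, but a ranked list of pairs can make the strongest relationships explicit. The following is a rough sketch, assuming a hypothetical helper `strongest_pairs` and a toy frame with made-up values (not real penguin measurements). Passing `penguins_clean` instead should rank the real features.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(df, n=3):
    """Return the n column pairs with the largest absolute Pearson correlation.

    Hypothetical helper for illustration; each item is (col_a, col_b, r).
    """
    corr = df.corr(numeric_only=True)
    # combinations() visits each unordered pair once and skips the diagonal.
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs.sort(key=lambda item: abs(item[2]), reverse=True)
    return pairs[:n]


# Toy data with made-up values, for illustration only.
toy = pd.DataFrame({
    "flipper_length_mm": [180, 190, 200, 210, 220],
    "body_mass_g": [3500, 3800, 4300, 4700, 5400],
    "bill_depth_mm": [19.0, 18.5, 17.0, 16.0, 15.0],
})

top = strongest_pairs(toy, n=2)
for col_a, col_b, r in top:
    print(f"{col_a} vs {col_b}: r = {r:.2f}")
```

Sorting by absolute value keeps strong negative correlations in view, which a quick glance at the color scale can miss.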

## 🤖 Simple Machine Learning Model
Let’s predict the species of penguins using body measurements.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Encode species labels
le = LabelEncoder()
penguins_clean['species_label'] = le.fit_transform(penguins_clean['species'])

# Select features and target
features = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
X = penguins_clean[features]
y = penguins_clean['species_label']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
# Fix the seed so the reported accuracy is reproducible across runs
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
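
Accuracy is a single number and hides which species get confused with each other. The sketch below shows one way to look closer, using a confusion matrix and a per-class report. It assumes synthetic stand-in data from `make_classification` (not the penguin measurements) so that it runs offline, and it adds `stratify=y` to the split, which the cell above does not use. With the real data, `y_test`, `y_pred`, and `le` from the cell above could be passed in directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in: 4 numeric features, 3 classes. These are NOT penguin data.
X, y_int = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Attach species names to the synthetic classes, then encode them as in the notebook.
species_names = np.array(["Adelie", "Chinstrap", "Gentoo"])
le = LabelEncoder()
y = le.fit_transform(species_names[y_int])

# stratify=y (an assumption here) keeps class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes, in le.classes_ order.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# le.classes_ maps the integer labels back to readable species names.
print(classification_report(y_test, y_pred, target_names=le.classes_))
```

Off-diagonal entries in the matrix show which pairs of species the model mixes up, and the report gives precision and recall for each class.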