In the world of travel preferences, the age-old debate of mountains versus beaches continues to captivate our imaginations. But what if we could predict someone's preference from their lifestyle and demographics? Let's dive into the data and see what insights we can uncover.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')
```

```python
# Load the dataset
file_path = '/kaggle/input/mountains-vs-beaches-preference/mountains_vs_beaches_preferences.csv'
df = pd.read_csv(file_path)
df.head()
```

### Data Overview
Let's look at the basic structure of the dataset. `df.info()` lists each column's dtype and non-null count, which shows which columns are categorical and whether any values are missing.

```python
df.info()
```
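Because `df.info()` prints its summary rather than returning it, a small helper that returns the same facts as a DataFrame can be easier to sort or check programmatically. The sketch below is our own addition, not part of the original notebook, and runs on a made-up toy frame; `Gender` is a hypothetical categorical column used only for illustration.

```python
import numpy as np
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return each column's dtype, number of missing values and number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(dropna=True),
    })


# Hypothetical toy frame standing in for the real CSV (all values are made up).
toy = pd.DataFrame({
    "Age": [25, 40, np.nan, 33],
    "Gender": ["female", "male", "male", "non-binary"],  # hypothetical categorical column
    "Preference": [0, 1, 1, 0],
})

summary = summarize_columns(toy)
print(summary)
```

On the real data it would be called as `summarize_columns(df)`.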

### Exploratory Data Analysis
Let's explore the dataset to understand the distribution of preferences and other features.

```python
# Distribution of Preferences
sns.countplot(x='Preference', data=df)
plt.title('Distribution of Preferences: Mountains vs Beaches')
plt.xlabel('Preference (0: Mountains, 1: Beaches)')
plt.ylabel('Count')
plt.show()
```
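The count plot shows the class balance visually. The sketch below puts numbers on it and also computes the accuracy a trivial model would get by always predicting the most common class, a useful reference point for the accuracy reported in the modeling section. The labels here are a hypothetical toy series; on the real data you would pass `df['Preference']`. The helper names are our own.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.Series:
    """Share of each class label, ordered by label value."""
    return labels.value_counts(normalize=True).sort_index()


def majority_baseline_accuracy(labels: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class."""
    return float(labels.value_counts(normalize=True).max())


# Hypothetical toy labels (made up for illustration).
toy_labels = pd.Series([0, 1, 1, 1])

print(class_balance(toy_labels))
print(majority_baseline_accuracy(toy_labels))  # 0.75
```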

```python
# Age distribution
sns.histplot(df['Age'], bins=20, kde=True)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
```

```python
# Income distribution
sns.histplot(df['Income'], bins=20, kde=True)
plt.title('Income Distribution')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()
```
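The two histograms describe Age and Income for everyone at once, so they cannot show whether either feature differs between the two preference groups. A per-group summary is a more direct check. This is a minimal sketch on made-up values; the helper `median_by_preference` is our own name, and it assumes the target column is called `Preference` as in the notebook.

```python
import pandas as pd


def median_by_preference(frame: pd.DataFrame, columns, target: str = "Preference") -> pd.DataFrame:
    """Median of the given columns within each target class."""
    return frame.groupby(target)[list(columns)].median()


# Hypothetical toy data (values are made up for illustration).
toy = pd.DataFrame({
    "Age": [20, 30, 50, 60],
    "Income": [30000, 50000, 70000, 90000],
    "Preference": [0, 0, 1, 1],
})

medians = median_by_preference(toy, ["Age", "Income"])
print(medians)
```

In the notebook the equivalent call would be `median_by_preference(df, ["Age", "Income"])`.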

### Correlation Analysis
Let's examine the pairwise (Pearson) correlation between the numeric features to see if there are any interesting relationships. Categorical columns are excluded from this view.

```python
# Select only numeric columns
numeric_df = df.select_dtypes(include=[np.number])
correlation_matrix = numeric_df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
```
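With a 0/1 target, each Pearson coefficient in the `Preference` row of the heatmap is a point-biserial correlation, and it captures only linear association. Ranking those values by magnitude makes the strongest linear signals easier to spot than scanning the heatmap. The sketch assumes the target column is named `Preference` and runs on hypothetical toy values; non-numeric columns are dropped the same way as in the cell above.

```python
import pandas as pd


def target_correlations(frame: pd.DataFrame, target: str = "Preference") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest magnitude first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical toy data (values are made up for illustration).
toy = pd.DataFrame({
    "Age": [20, 30, 50, 60],
    "Income": [40000, 60000, 50000, 50000],
    "Gender": ["female", "male", "male", "female"],  # hypothetical; excluded as non-numeric
    "Preference": [0, 0, 1, 1],
})

ranked = target_correlations(toy)
print(ranked)
```

On the real data: `target_correlations(df)`.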

### Predictive Modeling
Let's build a predictive model to see if we can accurately predict a person's preference for mountains or beaches based on the other features.
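The modeling cell relies on `pd.get_dummies(X, drop_first=True)` to one-hot encode the categorical columns. The toy example below (the `Season` column is hypothetical) shows what that call produces: numeric columns pass through unchanged, each categorical column becomes one indicator column per level, and `drop_first=True` drops the first level in sorted order, so that level is represented by all of its indicators being zero.

```python
import pandas as pd

# Hypothetical toy frame; "Season" stands in for any categorical column.
toy = pd.DataFrame({
    "Age": [25, 40, 31, 58],
    "Season": ["summer", "winter", "spring", "summer"],
})

# "spring" sorts first, so it is dropped and encoded as all-zero indicators.
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded)
```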

```python
# Prepare the data for modeling
X = df.drop('Preference', axis=1)
y = df['Preference']

# Convert categorical variables to dummy variables
X = pd.get_dummies(X, drop_first=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Model Accuracy: {accuracy:.2f}')
print(classification_report(y_test, y_pred))
```
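An accuracy figure is only meaningful relative to a baseline. The sketch below, which is our own addition, compares the Random Forest with scikit-learn's `DummyClassifier(strategy="most_frequent")` on the same split. It also passes `stratify=y` to `train_test_split`, which the original cell does not do, so both classes keep their overall proportions in the test set. Because the Kaggle CSV is not available here, the demo uses synthetic data from `make_classification` with an assumed 75/25 class imbalance.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_to_baseline(X, y, random_state=42):
    """Test accuracy of a Random Forest vs. always predicting the training majority class."""
    # Stratified split: an assumption added here, not used in the original cell.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    forest = RandomForestClassifier(random_state=random_state).fit(X_train, y_train)
    return {
        "baseline": accuracy_score(y_test, baseline.predict(X_test)),
        "model": accuracy_score(y_test, forest.predict(X_test)),
    }


# Synthetic stand-in data with an assumed 75/25 class imbalance.
X_demo, y_demo = make_classification(
    n_samples=400, weights=[0.75, 0.25], flip_y=0, random_state=0
)
scores = compare_to_baseline(X_demo, y_demo)
print(scores)
```

In the notebook, `compare_to_baseline(X, y)` would be called on the dummy-encoded `X` and the `y` from the cell above.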

### Discussion
In this notebook, we explored a fascinating dataset of travel preferences, focusing on the classic debate of mountains versus beaches. We visualized the distribution of preferences and other key features, examined correlations between the numeric features, and built a predictive model using a Random Forest classifier. The accuracy printed above suggests that demographic and lifestyle factors can provide insight into travel preferences, but only to the extent that it exceeds the majority-class baseline: a classifier that always predicts the more common preference already scores that class's share of the test set.
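To see which factors the model actually relies on, the fitted forest's `feature_importances_` can be ranked. This is a hedged sketch on hypothetical toy data in which only the `signal` column determines the label; in the notebook you would call `top_importances(model, X.columns)`. These impurity-based importances can overstate continuous or high-cardinality features, so `sklearn.inspection.permutation_importance` evaluated on the test set is a common cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def top_importances(model, feature_names, k=10):
    """Return the k largest impurity-based feature importances, highest first."""
    return (
        pd.Series(model.feature_importances_, index=feature_names)
        .sort_values(ascending=False)
        .head(k)
    )


# Hypothetical toy data: only "signal" determines the label, "noise" is pure noise.
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({"signal": rng.normal(size=300), "noise": rng.normal(size=300)})
toy_y = (toy_X["signal"] > 0).astype(int)

forest = RandomForestClassifier(random_state=42).fit(toy_X, toy_y)
importances = top_importances(forest, toy_X.columns)
print(importances)
```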

For future analysis, it would be interesting to explore more sophisticated models or feature engineering techniques to improve prediction accuracy. Additionally, incorporating external data sources, such as weather patterns or regional tourism statistics, could provide further context and enhance the model's predictive power.
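Before trying more sophisticated models, it helps to compare candidates with cross-validation rather than a single 80/20 split, since one split can give a noisy estimate. The sketch below is our own assumption rather than part of the original analysis: it scores a scaled logistic regression and the Random Forest with stratified 5-fold cross-validation on synthetic `make_classification` data. In the notebook, the `X` and `y` from the modeling cell would take its place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_compare(models, X, y, n_splits=5, seed=42):
    """Mean stratified k-fold accuracy for each named estimator."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(estimator, X, y, cv=cv, scoring="accuracy").mean()
        for name, estimator in models.items()
    }


candidates = {
    # Scaling inside the pipeline so each fold is scaled on its own training part.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(random_state=42),
}

# Synthetic stand-in data; replace with the notebook's X and y.
X_demo, y_demo = make_classification(n_samples=300, random_state=0)
results = cv_compare(candidates, X_demo, y_demo)
print(results)
```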

## Credits
This notebook was created with the help of [Devra AI data science assistant](https://devra.ai/ref/kaggle)