# Hill and Valley Prediction with Logistic Regression 

-------------

## **Objective**

This project builds a logistic regression model that predicts, from its features, whether a data point represents a hill or a valley. It is a binary classification problem, and the workflow covers data preprocessing, model training, evaluation, and prediction.

## **Data Source**

The Hill and Valley dataset is available from the UCI Machine Learning Repository (Hill and Valley Data Set).

## **Import Libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score 

## **Import Data**

In [None]:
data = pd.read_csv('hill_valley.csv')

## **Describe Data**

In [None]:
# Display the first few rows of the dataset
print(data.head())

# Summary statistics of the dataset
print(data.describe())

# Information about the dataset (info() prints directly and returns None)
data.info()

## **Data Visualization**

In [None]:
# Visualize the distribution of the target variable
sns.countplot(x='class', data=data)
plt.title('Distribution of Hill and Valley Classes')
plt.show()

# Visualize some feature distributions
sns.histplot(data['V1'], kde=True, bins=30)
plt.title('Distribution of Feature V1')
plt.show()

## **Data Preprocessing**

In [None]:
# Check for missing values
print(data.isnull().sum())

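# Standardize the feature variables
# Note: fitting the scaler on the full dataset lets test-set statistics
# influence the transform; for a stricter evaluation, fit it on the
# training split only.
features = data.drop('class', axis=1)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Convert scaled features back to a DataFrame, keeping the feature column names
scaled_data = pd.DataFrame(scaled_features, columns=features.columns, index=data.index)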

## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
X = scaled_data
y = data['class']

## **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
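
Because the scaler in the preprocessing step was fitted before this split, test-set statistics feed into the transform. The following is a minimal sketch of one way to avoid that, using a scikit-learn pipeline so the scaler is fitted on the training split only. Synthetic data from `make_classification` stands in for the CSV (an assumption, so the snippet runs on its own), the `*_demo` and `Xd_*` names are hypothetical, and `stratify` is an optional extra that is not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hill/valley features (assumption: not the real dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

# Stratify so both splits keep roughly the same class balance
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print("Test-set class counts:", np.bincount(yd_test))

# The pipeline fits the scaler on the training data only, then fits the classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(Xd_train, yd_train)

# At scoring time the test data is transformed with the training statistics
demo_accuracy = pipe.score(Xd_test, yd_test)
print("Held-out accuracy on synthetic data:", demo_accuracy)
```

With the real data, the same pattern would apply to the unscaled feature columns and `data['class']`.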

## **Modeling**

In [None]:
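# Raise max_iter above the default (100) to reduce the chance of a convergence warning
model = LogisticRegression(max_iter=1000)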
model.fit(X_train, y_train)
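
To make the fitted model less of a black box, here is a small sketch of what a binary logistic regression computes: a linear score per row, passed through the sigmoid function to give a probability. It uses synthetic data (an assumption, standing in for the scaled features) and hypothetical names such as `clf` and `manual_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (assumption: stands in for the scaled features)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Linear score for each row: w . x + b
scores = X_demo @ clf.coef_.ravel() + clf.intercept_[0]

# The sigmoid maps each score to a probability of the positive class
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# Compare with scikit-learn's own probability for the positive class
sklearn_proba = clf.predict_proba(X_demo)[:, 1]
print("Max difference:", np.max(np.abs(manual_proba - sklearn_proba)))
```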

## **Model Evaluation**

In [None]:
# Predictions on the test set
y_pred = model.predict(X_test)

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Classification report
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy Score:", accuracy)
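
As an aid to reading these outputs, the sketch below shows how the four cells of a binary confusion matrix relate to precision, recall, and accuracy. The labels are hand-made and hypothetical, chosen only so the counts are easy to verify by eye.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Tiny hand-made labels (hypothetical, for illustration only)
y_true_demo = [0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0]

# For binary labels 0/1, ravel() yields: true neg, false pos, false neg, true pos
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2

# The summary metrics follow directly from those four counts
precision_demo = tp / (tp + fp)
recall_demo = tp / (tp + fn)
accuracy_demo = (tp + tn) / (tp + tn + fp + fn)

# Cross-check against scikit-learn's own metric functions
print(precision_demo, precision_score(y_true_demo, y_pred_demo))
print(recall_demo, recall_score(y_true_demo, y_pred_demo))
print(accuracy_demo, accuracy_score(y_true_demo, y_pred_demo))
```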

## **Prediction**

In [None]:
# Example prediction: keep the row as a one-row DataFrame so feature names are preserved
sample_data = X_test.iloc[[0]]
sample_prediction = model.predict(sample_data)
print("Sample Prediction:", sample_prediction)
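
A predicted label alone hides how confident the model is. The sketch below, on synthetic data with made-up column names `f0` to `f4` (both assumptions), shows a single-row prediction together with its class probabilities from `predict_proba`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data in a DataFrame (assumption: column names f0..f4 are made up)
X_arr, y_arr = make_classification(n_samples=200, n_features=5, random_state=1)
X_df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

clf = LogisticRegression(max_iter=1000).fit(X_df, y_arr)

# Double brackets keep a one-row DataFrame, so feature names match those seen in fit
sample = X_df.iloc[[0]]
sample_label = clf.predict(sample)[0]

# Columns of predict_proba follow the order of clf.classes_
sample_proba = clf.predict_proba(sample)[0]
print("Classes:", clf.classes_)
print("Probabilities:", sample_proba)
print("Predicted label:", sample_label)
```

The predicted label is the class with the larger probability, so a probability close to 0.5 signals a borderline case.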

## **Explanation**

The logistic regression model was trained to classify whether a data point represents a hill or a valley. The features were standardized because the solver and the default L2 regularization in logistic regression are sensitive to feature scale. Performance on the held-out test set was evaluated with a confusion matrix, a classification report, and an accuracy score, and a single test row was used to demonstrate prediction. Together, these steps form a complete workflow for applying logistic regression to a binary classification problem; how well the model separates the two classes should be judged from the reported metrics.