#**Hill and Valley Prediction**

---



#**Objective**
To develop a machine learning model, using logistic regression, that can accurately classify geographical locations as either hills or valleys based on a set of input features. The model is trained on a labeled dataset of geographical features and their corresponding classifications, and then evaluated on a separate test dataset to measure its performance. The ultimate goal of this project is to provide a useful tool for identifying hills and valleys in various geographic locations, which can have important applications in fields such as geology, agriculture, and urban planning.

#**Understanding the Dataset**
Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y coordinate, the points will create either a Hill (a "bump" in the terrain) or a Valley (a "dip" in the terrain). See the original source for some examples of these graphs.

**1-100**: Labeled "V##". Floating point values (numeric), the X-values.

**101**: Labeled "Class". Binary {0, 1} representing {valley, hill}
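
To make the record layout concrete, the sketch below builds two synthetic records with NumPy, one hill and one valley. The helper `make_record`, its parameters, and the baseline values are hypothetical; they only imitate the shape described above and are not how the original dataset was generated.

```python
import numpy as np

def make_record(kind, baseline=10.0, height=3.0, center=50, width=5.0, n_points=100):
    """Hypothetical helper: a flat baseline with one smooth bump (hill) or dip (valley)."""
    positions = np.arange(n_points)
    bump = height * np.exp(-((positions - center) ** 2) / (2 * width ** 2))
    sign = 1.0 if kind == "hill" else -1.0
    return baseline + sign * bump

hill_record = make_record("hill")                 # bump centered at position 50
valley_record = make_record("valley", center=30)  # dip centered at position 30

print(hill_record.shape)
print(hill_record.argmax(), valley_record.argmin())
```

The bump or dip can sit anywhere along the 100 points, so no single column is guaranteed to carry the class signal on its own.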


#**Import Library**

In [1]:
import pandas as pd
import numpy as np

#**Import Data**

In [2]:
hill = pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Hill%20Valley%20Dataset.csv')

In [3]:
hill.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V92,V93,V94,V95,V96,V97,V98,V99,V100,Class
0,39.02,36.49,38.2,38.85,39.38,39.74,37.02,39.53,38.81,38.79,...,36.62,36.92,38.8,38.52,38.07,36.73,39.46,37.5,39.1,0
1,1.83,1.71,1.77,1.77,1.68,1.78,1.8,1.7,1.75,1.78,...,1.8,1.79,1.77,1.74,1.74,1.8,1.78,1.75,1.69,1
2,68177.69,66138.42,72981.88,74304.33,67549.66,69367.34,69169.41,73268.61,74465.84,72503.37,...,73438.88,71053.35,71112.62,74916.48,72571.58,66348.97,71063.72,67404.27,74920.24,1
3,44889.06,39191.86,40728.46,38576.36,45876.06,47034.0,46611.43,37668.32,40980.89,38466.15,...,42625.67,40684.2,46960.73,44546.8,45410.53,47139.44,43095.68,40888.34,39615.19,0
4,5.7,5.4,5.28,5.38,5.27,5.61,6.0,5.38,5.34,5.87,...,5.17,5.67,5.6,5.94,5.73,5.22,5.3,5.73,5.91,0


In [4]:
hill.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1212 entries, 0 to 1211
Columns: 101 entries, V1 to Class
dtypes: float64(100), int64(1)
memory usage: 956.5 KB


#**Describe Data**

In [5]:
hill.describe()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V92,V93,V94,V95,V96,V97,V98,V99,V100,Class
count,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,...,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0,1212.0
mean,8169.091881,8144.306262,8192.653738,8176.868738,8128.297211,8173.030008,8188.582748,8183.641543,8154.670066,8120.767574,...,8120.056815,8125.917409,8158.793812,8140.885421,8213.480611,8185.594002,8140.195355,8192.960891,8156.197376,0.5
std,17974.950461,17881.049734,18087.938901,17991.903982,17846.757963,17927.114105,18029.562695,18048.582159,17982.390713,17900.798206,...,17773.190621,17758.182403,17919.510371,17817.945646,18016.445265,17956.084223,17768.356106,18064.781479,17829.310973,0.500206
min,0.92,0.9,0.85,0.89,0.88,0.86,0.87,0.65,0.65,0.62,...,0.87,0.9,0.87,0.88,0.89,0.89,0.86,0.91,0.89,0.0
25%,19.6025,19.595,18.925,19.2775,19.21,19.5825,18.69,19.0625,19.5325,19.285,...,19.1975,18.895,19.2375,19.385,19.0275,19.135,19.205,18.8125,19.145,0.0
50%,301.425,295.205,297.26,299.72,295.115,294.38,295.935,290.85,294.565,295.16,...,297.845,295.42,299.155,293.355,301.37,296.96,300.925,299.2,302.275,0.5
75%,5358.795,5417.8475,5393.3675,5388.4825,5321.9875,5328.04,5443.9775,5283.655,5378.18,5319.0975,...,5355.355,5386.0375,5286.385,5345.7975,5300.89,5361.0475,5390.85,5288.7125,5357.8475,1.0
max,117807.87,108896.48,119031.35,110212.59,113000.47,116848.39,115609.24,118522.32,112895.9,117798.3,...,113858.68,112948.83,112409.57,112933.73,112037.22,115110.42,116431.96,113291.96,114533.76,1.0


In [6]:
hill.columns

Index(['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       ...
       'V92', 'V93', 'V94', 'V95', 'V96', 'V97', 'V98', 'V99', 'V100',
       'Class'],
      dtype='object', length=101)

**Not all column names are printed above**

In [7]:
print(hill.columns.tolist())

['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10', 'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20', 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'V29', 'V30', 'V31', 'V32', 'V33', 'V34', 'V35', 'V36', 'V37', 'V38', 'V39', 'V40', 'V41', 'V42', 'V43', 'V44', 'V45', 'V46', 'V47', 'V48', 'V49', 'V50', 'V51', 'V52', 'V53', 'V54', 'V55', 'V56', 'V57', 'V58', 'V59', 'V60', 'V61', 'V62', 'V63', 'V64', 'V65', 'V66', 'V67', 'V68', 'V69', 'V70', 'V71', 'V72', 'V73', 'V74', 'V75', 'V76', 'V77', 'V78', 'V79', 'V80', 'V81', 'V82', 'V83', 'V84', 'V85', 'V86', 'V87', 'V88', 'V89', 'V90', 'V91', 'V92', 'V93', 'V94', 'V95', 'V96', 'V97', 'V98', 'V99', 'V100', 'Class']


In [8]:
hill.shape

(1212, 101)

#**Get Unique Values (Classes) in y Variable**

In [9]:
hill['Class'].value_counts()

0    606
1    606
Name: Class, dtype: int64

In [10]:
hill.groupby('Class').mean()

Class,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V91,V92,V93,V94,V95,V96,V97,V98,V99,V100
0,7913.333251,7825.339967,7902.497294,7857.032079,7775.610198,7875.436337,7804.166584,7722.324802,7793.328416,7686.782046,...,7753.427244,7737.843366,7799.332079,7825.2117,7791.35401,7927.237112,7874.502343,7844.227459,7875.338713,7855.181172
1,8424.850512,8463.272558,8482.810182,8496.705396,8480.984224,8470.62368,8572.998911,8644.958284,8516.011716,8554.753102,...,8478.513399,8502.270264,8452.502739,8492.375924,8490.416832,8499.724109,8496.68566,8436.163251,8510.583069,8457.213581


#**Define Target Variable (y) and Feature Variables (X)**

In [11]:
y = hill['Class']

In [12]:
y.shape

(1212,)

In [13]:
X = hill.drop('Class', axis=1)

In [14]:
X.shape

(1212, 100)

#**Data Preprocessing**

**Get X Variables Standardized**

In [16]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()

In [17]:
X = ss.fit_transform(X)

In [18]:
X

array([[-0.45248681, -0.45361784, -0.45100881, ..., -0.45609618,
        -0.45164274, -0.45545496],
       [-0.45455665, -0.45556372, -0.45302369, ..., -0.45821768,
        -0.45362255, -0.45755405],
       [ 3.33983504,  3.24466709,  3.58338069, ...,  3.5427869 ,
         3.27907378,  3.74616847],
       ...,
       [ 0.11084204,  0.0505953 ,  0.04437307, ...,  0.12533312,
         0.04456025,  0.06450317],
       [-0.45272112, -0.45369729, -0.45118691, ..., -0.45648861,
        -0.45190136, -0.45569511],
       [ 0.01782872, -0.02636986,  0.05196137, ...,  0.03036056,
         0.01087365,  0.03123129]])
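
As a quick check of what standardization does, the sketch below compares `StandardScaler` with the manual formula `(x - mean) / std`, computed per column. The small array `X_small` is made-up illustrative data, not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two columns on very different scales.
X_small = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])

scaled = StandardScaler().fit_transform(X_small)

# Manual per-column standardization (population standard deviation, ddof=0).
manual = (X_small - X_small.mean(axis=0)) / X_small.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Each column ends up with mean 0 and standard deviation 1. Scaling is applied per column, so differences in overall scale between records (rows) are not removed.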

#**Train Test Split**

In [19]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, stratify=y, random_state=2529)

In [20]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((848, 100), (364, 100), (848,), (364,))
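
In the cells above, the scaler is fitted on all rows before the split, so statistics from the test rows influence the scaling. A common alternative is to fit the scaler on the training portion only, for example through a pipeline. The sketch below shows the idea on synthetic data; `X_demo`, `y_demo`, and `pipe` are illustrative names and are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 5 features, binary target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=2529
)

# The pipeline fits the scaler on the training rows only,
# then reuses those statistics when scoring the test rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_tr, y_tr)

test_accuracy = pipe.score(X_te, y_te)
print(test_accuracy)
```

With this arrangement the same fitted scaler is applied automatically whenever `pipe.predict` is called on new rows.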

#**Modeling**

In [21]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

In [22]:
model.fit(X_train,y_train)

#**Prediction**

In [23]:
y_pred = model.predict(X_test)

In [24]:
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1,

#**Model Evaluation**

In [25]:
from sklearn.metrics import confusion_matrix, classification_report

In [26]:
print(confusion_matrix(y_test,y_pred))

[[181   1]
 [106  76]]


In [27]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.63      0.99      0.77       182
           1       0.99      0.42      0.59       182

    accuracy                           0.71       364
   macro avg       0.81      0.71      0.68       364
weighted avg       0.81      0.71      0.68       364
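
The figures in the report can be reproduced by hand from the confusion matrix printed above, where rows are true classes and columns are predicted classes. The sketch below treats class 1 (hill) as the positive class; the variable names are illustrative.

```python
# Confusion matrix values from the evaluation output:
# [[181   1]
#  [106  76]]
tn, fp = 181, 1    # true valleys: predicted valley, predicted hill
fn, tp = 106, 76   # true hills:   predicted valley, predicted hill

precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
print(round(precision_0, 2), round(recall_0, 2))
print(round(accuracy, 2))
```

The recall of 0.42 for class 1 means that 106 of the 182 hills in the test set were labeled as valleys, so the 0.71 accuracy is driven mostly by correct valley predictions.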



#**Get Future Prediction**
**Let's select a random sample from the existing dataset as a new value**
Steps to follow:
1. Extract a random row using the sample function
2. Separate X and y
3. Standardize X
4. Predict

In [28]:
X_new = hill.sample(1)

In [29]:
X_new

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V92,V93,V94,V95,V96,V97,V98,V99,V100,Class
870,1294.97,1359.78,1192.0,1370.03,1265.43,1228.03,1237.01,1240.67,1258.73,1273.1,...,1347.96,1331.43,1307.61,1199.38,1342.36,1295.36,1273.35,1345.86,1359.03,0


In [30]:
X_new = X_new.drop('Class', axis=1)

In [31]:
X_new

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V91,V92,V93,V94,V95,V96,V97,V98,V99,V100
870,1294.97,1359.78,1192.0,1370.03,1265.43,1228.03,1237.01,1240.67,1258.73,1273.1,...,1305.27,1347.96,1331.43,1307.61,1199.38,1342.36,1295.36,1273.35,1345.86,1359.03


In [32]:
# Reuse the scaler fitted on X; calling fit_transform on a single row would set every feature to 0
X_new = ss.transform(X_new)

In [33]:
y_pred_new = model.predict(X_new)

In [34]:
y_pred_new

array([1])
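
When standardizing a new row, it matters whether the scaler is re-fitted or reused. If a scaler is fitted on a single row, each column's mean equals that row's own value, so every standardized feature becomes 0 and the prediction no longer depends on the sample. The sketch below illustrates this with made-up data; `X_fit`, `scaler`, and `row` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in "training" data (assumption): 50 rows, 3 features.
rng = np.random.default_rng(0)
X_fit = rng.normal(loc=100.0, scale=20.0, size=(50, 3))
scaler = StandardScaler().fit(X_fit)

row = np.array([[120.0, 90.0, 105.0]])  # one new sample

refit = StandardScaler().fit_transform(row)  # re-fitting on the single row
reused = scaler.transform(row)               # reusing the training statistics

print(refit)
print(reused)
```

Only `reused` keeps the row's position relative to the training data, which is what the fitted model expects as input.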

#**Explanation**
 This is a data science project written in the Python programming language. Its main purpose is to develop a predictive model using logistic regression that can accurately classify geographical locations as either hills or valleys based on a set of input features.

 The model will be trained on a labeled dataset of geographical features and their corresponding classifications, and then evaluated on a separate test dataset to measure its performance.

 The ultimate goal of this project is to provide a useful tool for identifying hills and valleys in various geographic locations, which can have important applications in fields such as geology, agriculture, and urban planning.