This dataset was generated using HRSC nadir panchromatic image h0905_0000 taken by the Mars Express spacecraft. The image is located in Xanthe Terra, centered on Nanedi Vallis, and covers mostly Noachian terrain on Mars. The image has a resolution of 12.5 meters/pixel.

**Problem statement**

Determine whether each instance is a crater or not a crater. 1 = Crater, 0 = Not Crater

**About the dataset**

Using the technique described by L. Bandeira (Bandeira, Ding, Stepinski. 2010. Automatic Detection of Sub-km Craters Using Shape and Texture Information), we identify crater candidates in the image using the pipeline depicted in the figure below. Each crater candidate image block is normalized to a standard scale of 48 pixels. Each of the nine kinds of image masks probes the normalized image block at four different scales (12, 24, 36, and 48 pixels), with a step of one third of the mask size (meaning 2/3 overlap). In total, we extract 1,090 Haar-like attributes using the nine types of masks; these form the attribute vector that represents each crater candidate. The dataset was converted to the Weka ARFF format by Joseph Paul Cohen in 2012.
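
As a rough check of the attribute count, the sketch below enumerates the window positions implied by the scales and step size above. It assumes that each mask slides along both axes of the 48×48 block and that a window must fit entirely inside the block; neither assumption is stated in the source, and the function names are made up for illustration. Under these assumptions the count comes to 1,089, one fewer than the 1,090 quoted above. In this data the last column, attr1089, holds the class label, which may account for the difference.

```python
# Count sliding-window positions for each mask scale on a 48-pixel block.
# Assumption: windows slide on both axes and must lie fully inside the block.
BLOCK = 48
SCALES = (12, 24, 36, 48)
N_MASKS = 9


def positions_per_axis(block: int, scale: int) -> int:
    """Number of window offsets along one axis with a step of scale/3."""
    step = scale // 3
    return (block - scale) // step + 1


def windows_per_mask(block: int = BLOCK, scales=SCALES) -> int:
    """Total 2D window positions for one mask type over all scales."""
    return sum(positions_per_axis(block, s) ** 2 for s in scales)


per_mask = windows_per_mask()
total = N_MASKS * per_mask
print(per_mask, total)  # 121 1089
```

The per-scale counts are 100, 16, 4, and 1 windows for the 12, 24, 36, and 48 pixel masks respectively.
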

**Attribute Information**

We construct an attribute vector for each crater candidate using the Haar-like attributes described by Papageorgiou (1998). These are simple texture attributes calculated with Haar-like image masks, which consist only of black and white sectors and were used by Viola in 2004 for face detection. The value of an attribute is the difference between the sum of gray pixel values within the black sector and the sum within the white sector of an image mask. The figure below shows the nine image masks used in our case study. The first five masks focus on capturing diagonal texture gradient changes, while the remaining four focus on horizontal or vertical textures.
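
To make the attribute definition concrete, here is a minimal sketch of one Haar-like attribute computed with an integral image (summed-area table), a common way to evaluate rectangle sums quickly. The mask used here (left half black, right half white) is a hypothetical two-rectangle example and is not one of the nine masks from the source, whose figure is not reproduced; the function names are also illustrative.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii: np.ndarray, top: int, left: int, height: int, width: int) -> float:
    """Sum of pixel values in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def two_rect_vertical_edge(ii: np.ndarray, top: int, left: int, size: int) -> float:
    """Hypothetical mask: left half is the black sector, right half the white one.

    Returns sum(black sector) - sum(white sector), as in the definition above.
    """
    half = size // 2
    black = box_sum(ii, top, left, size, half)
    white = box_sum(ii, top, left + half, size, half)
    return black - white


# Synthetic 48x48 block: dark left half, bright right half.
block = np.zeros((48, 48))
block[:, 24:] = 1.0

ii = integral_image(block)
feature = two_rect_vertical_edge(ii, top=0, left=0, size=48)
print(feature)  # -1152.0
```

A full attribute vector would repeat this for every mask type, scale, and window position, collecting the values in a fixed order.
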

**How to read an image?**

Python offers powerful tools for image processing. Matplotlib is a multi-platform visualization library for 2D plots of arrays, built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002. We will use Matplotlib to read the image into a NumPy array.

We import the image module from the Matplotlib library as mpimg.
Use mpimg.imread to read the image as a NumPy array.
import matplotlib.image as mpimg

image = mpimg.imread('crater1.png')

train.csv
The data file train.csv contains 5,892 instances with 1,091 features, including the target feature.

test.csv
The data file test.csv contains 1,473 instances with 1,090 features, excluding the target feature.

Evaluation metrics
For this particular dataset, we are using roc_auc_score as the evaluation metric.

Submissions will be evaluated on their ROC-AUC score according to the thresholds below.

| Your roc_auc_score | Points earned for the task |
|---|---|
| 0.89 <= roc_auc_score | 100% of the available points |
| 0.87 <= roc_auc_score < 0.89 | 80% of the available points |
| 0.84 < roc_auc_score < 0.87 | 70% of the available points |
| roc_auc_score <= 0.84 | No points earned |
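
The thresholds above can be written as a small helper, which is handy for checking where a validation score would land. This is only a sketch of the stated rules; the function name `points_fraction` is made up for illustration.

```python
def points_fraction(roc_auc: float) -> float:
    """Map a ROC-AUC score to the fraction of points earned, per the thresholds above."""
    if roc_auc >= 0.89:
        return 1.0
    if roc_auc >= 0.87:
        return 0.8
    if roc_auc > 0.84:
        return 0.7
    return 0.0


for score in (0.90, 0.87, 0.85, 0.84):
    print(score, points_fraction(score))
```

Note that a score of exactly 0.84 earns no points, while a score of exactly 0.87 falls in the 80% band.
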

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
train_path = "/content/drive/MyDrive/Colab_Notebooks/mars_train.csv"
test_path = "/content/drive/MyDrive/Colab_Notebooks/mars_test.csv"
df_train=pd.read_csv(train_path)
df_test=pd.read_csv(test_path)

Checking the train & the test data

In [31]:
df_train.head()


Unnamed: 0.1,Id,Unnamed: 0,attr0,attr1,attr2,attr3,attr4,attr5,attr6,attr7,attr8,attr9,attr10,attr11,attr12,attr13,attr14,attr15,attr16,attr17,attr18,attr19,attr20,attr21,attr22,attr23,attr24,attr25,attr26,attr27,attr28,attr29,attr30,attr31,attr32,attr33,attr34,attr35,attr36,attr37,...,attr1050,attr1051,attr1052,attr1053,attr1054,attr1055,attr1056,attr1057,attr1058,attr1059,attr1060,attr1061,attr1062,attr1063,attr1064,attr1065,attr1066,attr1067,attr1068,attr1069,attr1070,attr1071,attr1072,attr1073,attr1074,attr1075,attr1076,attr1077,attr1078,attr1079,attr1080,attr1081,attr1082,attr1083,attr1084,attr1085,attr1086,attr1087,attr1088,attr1089
0,2216,2216,-4.374765,13.819856,14.656331,-9.728919,-19.334897,0.344455,11.10572,21.977302,14.822923,-24.72994,-12.07345,6.015896,19.84554,12.047648,-22.712267,-8.572944,6.681071,16.540066,14.858941,-16.717719,-12.927241,-8.8648,44.103488,38.053358,-43.025526,-32.710992,-3.750109,24.053819,19.861871,-7.46913,-4.585042,-0.315972,65.223253,36.735948,-51.804091,-54.989095,-16.795139,31.407661,...,92.17806,85.664063,84.205404,92.722656,93.854384,82.754991,73.006076,69.867839,77.809713,79.294217,86.885145,90.070313,89.619792,89.161024,87.796007,87.793403,82.235677,73.722222,87.74215,87.425645,98.044424,107.072284,105.65259,107.309475,97.643738,91.75883,98.246746,103.918837,101.709256,100.605008,89.083581,86.194838,93.162055,100.883355,123.558503,112.831384,100.583377,102.194939,120.306692,0
1,2673,2673,-13.796261,-4.647589,21.676617,-0.122074,11.228644,-8.806895,-9.16119,18.025709,4.948527,-11.680861,-30.65129,26.12576,19.141276,-25.525079,31.035224,-4.879354,-9.678657,8.343913,7.727105,-4.86103,-11.290397,69.127794,-7.838203,-65.591933,42.613661,26.295329,-23.078342,-10.830295,18.220432,1.682509,19.843592,102.323039,-50.642822,-105.744656,49.86849,68.779677,-28.649523,-25.667046,...,66.589084,48.35612,47.667101,54.417318,59.832682,52.856771,50.580295,61.448134,74.261115,87.317166,85.693088,82.081489,70.256727,59.416558,60.681424,58.894748,70.599392,65.003689,78.49008,70.279085,90.672518,78.513183,109.745781,101.185164,68.957438,79.8066,69.113593,62.749437,85.397597,74.236803,100.750899,83.373142,76.902208,72.182997,102.843819,93.118477,80.33857,80.196648,93.995657,0
2,5603,5603,-2.1154,-3.3324,-6.64,-13.825,4.1232,27.365,6.7002,3.783,8.9095,1.4539,-12.621,-13.274,14.091,2.9467,7.84,12.174,-4.0939,11.33,14.799,-1.31,-14.297,3.0388,47.134,13.665,-2.3066,-12.3,-27.281,23.683,33.561,-3.3681,-14.035,45.952,71.107,13.088,-11.703,-46.917,-55.606,30.275,...,56.945,43.807,50.26,63.772,73.012,67.101,50.6,35.716,76.025,67.214,57.907,55.767,58.154,59.648,50.925,39.688,28.826,27.733,48.041,35.73,54.066,55.223,65.636,66.725,64.301,54.499,65.823,71.385,60.33,49.283,52.917,34.799,42.562,51.161,77.139,73.367,50.733,39.949,60.731,0
3,6401,6401,-25.531,66.699,-13.025,-31.198,12.016,19.365,5.0451,20.418,24.372,18.163,-19.068,40.776,17.933,-8.5508,-0.97569,21.993,5.2198,41.417,35.896,-1.7412,18.183,1.9501,31.811,16.854,-0.43899,2.5635,-15.193,65.341,30.399,-14.735,9.5065,-45.954,16.734,52.615,2.5657,-36.302,-25.771,97.402,...,49.782,42.354,64.707,75.492,90.972,65.298,40.354,17.036,63.657,57.197,51.131,56.255,60.641,58.296,60.095,62.953,49.817,33.602,49.164,68.05,77.809,71.022,88.528,91.974,71.942,50.576,69.427,56.714,55.426,35.733,49.488,71.633,66.757,69.213,97.606,81.416,53.808,41.489,71.825,0
4,6043,6043,18.993,-5.62,-9.9649,3.3072,0.99976,-10.92,-11.392,3.9185,-1.1683,1.9185,19.622,-2.8097,-2.1113,6.2031,-2.138,-11.53,-11.078,-3.6164,-13.983,-5.9065,16.097,-0.14426,-5.6403,0.19536,7.3483,-4.0679,-14.424,-14.221,-15.05,-0.17631,0.95837,-6.8695,-36.387,-5.072,44.562,3.7177,-10.904,-18.273,...,92.568,96.391,79.363,61.198,36.856,20.206,10.961,21.575,97.482,102.69,104.96,108.12,109.21,91.484,61.71,27.154,13.412,21.789,25.435,32.667,34.266,18.431,50.159,45.08,23.52,26.535,64.178,59.698,49.904,35.044,84.508,89.976,61.169,33.132,58.043,54.522,80.941,53.0,80.615,1


In [32]:
df_test.head()

Unnamed: 0.1,Id,Unnamed: 0,attr0,attr1,attr2,attr3,attr4,attr5,attr6,attr7,attr8,attr9,attr10,attr11,attr12,attr13,attr14,attr15,attr16,attr17,attr18,attr19,attr20,attr21,attr22,attr23,attr24,attr25,attr26,attr27,attr28,attr29,attr30,attr31,attr32,attr33,attr34,attr35,attr36,attr37,...,attr1049,attr1050,attr1051,attr1052,attr1053,attr1054,attr1055,attr1056,attr1057,attr1058,attr1059,attr1060,attr1061,attr1062,attr1063,attr1064,attr1065,attr1066,attr1067,attr1068,attr1069,attr1070,attr1071,attr1072,attr1073,attr1074,attr1075,attr1076,attr1077,attr1078,attr1079,attr1080,attr1081,attr1082,attr1083,attr1084,attr1085,attr1086,attr1087,attr1088
0,3531,3531,-4.492422,-6.629738,-7.919406,4.508779,-2.831982,-4.172664,15.308077,9.871601,-1.252328,-1.087131,-4.327975,-14.392564,-6.259728,14.522641,-6.070357,-3.509447,19.599657,13.423116,4.168593,-7.703857,-3.453474,-10.463175,4.119921,15.427104,-9.428413,-12.521186,10.340658,22.230564,9.702582,-16.955716,-6.305039,-3.861379,16.481601,14.051737,-11.992459,-28.486586,-15.177463,35.728434,...,22.369561,28.123508,28.544732,25.993571,21.753743,18.89822,14.255317,12.103299,9.675076,21.274367,17.014621,13.354207,16.882514,18.058974,15.296984,11.430664,11.533257,13.211372,15.834852,12.966054,15.809386,26.805955,34.505375,36.814956,44.297389,31.720493,21.75101,20.505107,15.676361,16.430549,17.920403,22.5358,28.566427,29.7669,29.245158,31.312288,30.214145,19.960902,16.394512,20.275859
1,3916,3916,41.931532,-7.567764,-15.296777,-0.700684,-2.476057,-2.370331,23.990248,36.879395,8.682407,-5.309828,31.297687,-3.761427,-7.441189,11.075521,4.097466,-11.252184,15.238227,32.098741,15.693848,-5.647461,18.34225,17.881592,24.481689,16.604004,-7.466417,-31.997613,-0.200792,23.348307,32.292969,12.326226,14.485545,46.006307,49.748996,11.52455,-30.191135,-48.898383,-13.857639,24.704807,...,70.794379,70.873915,68.88227,72.98546,66.688802,54.091797,36.209635,29.672309,29.406684,90.532905,81.633681,70.913357,66.487305,66.465929,61.513997,51.168837,31.159505,19.709852,23.113064,64.163256,63.192662,71.039063,69.82063,95.18166,92.698154,86.0126,68.362074,81.105518,60.629605,42.052572,34.749268,82.050315,80.836724,84.742486,76.576226,92.660668,78.585458,73.230589,50.373267,68.214557
2,3065,3065,-41.64968,-53.923069,-52.578029,-28.343204,37.459261,80.567602,20.427972,-30.09058,-30.325087,-17.060866,-53.891342,-48.577364,-38.665712,-20.511624,27.157803,71.551161,32.921685,-19.197673,-21.126383,-0.6891,-62.286474,-40.073459,-24.824341,-12.255941,15.118896,61.329536,52.998291,-4.638401,-17.224175,11.489312,-68.701366,-31.152727,-14.660509,-8.3141,15.162354,67.793349,67.411838,-3.07373,...,70.302748,68.098307,62.533691,48.256293,31.239692,23.629774,30.592014,44.446181,55.059462,69.825467,57.727241,47.91645,39.292643,34.598524,34.517361,35.945095,39.666016,48.775608,56.029514,84.128499,86.259713,67.825903,53.523539,105.927118,93.29877,69.196581,55.973497,78.502119,71.303348,67.412014,67.991943,84.918449,84.841743,68.827976,52.59299,61.269953,62.074993,68.300953,78.849034,56.47457
3,3465,3465,-0.791667,-4.694444,-1.611111,-2.680556,-11.472222,-0.708333,6.291667,-2.777778,-7.430556,5.958333,-0.833333,-5.444444,-2.930556,-5.25,-10.166667,3.375,6.333333,-2.652778,-3.055556,5.833333,-0.736111,-6.583333,-4.402778,-10.138889,-6.388889,11.847222,5.402778,-4.597222,-1.333333,1.625,-0.819444,-5.583333,-8.541667,-16.263889,0.611111,20.152778,4.263889,-6.805556,...,9.680556,11.236111,12.041667,11.847222,10.458333,9.805556,10.555556,14.736111,16.541667,9.194444,12.708333,16.486111,17.652778,15.694444,13.319444,12.444444,14.847222,15.916667,17.527778,12.361111,14.368056,15.732639,16.552083,12.763889,14.847222,12.996528,11.277778,12.940972,14.725694,17.302083,18.708333,16.864583,19.84375,18.645833,17.072917,12.330247,14.212963,15.878086,23.089506,16.007813
4,5619,5619,6.9593,16.215,1.9757,-15.563,-15.645,12.499,4.9979,24.216,16.234,-24.609,-2.591,19.351,30.481,-6.2187,-34.145,0.70801,3.9502,19.154,13.468,-12.717,-16.48,29.652,59.749,-7.7054,-52.917,-22.384,-2.6761,26.963,17.408,-2.289,-7.3967,52.046,66.296,-26.692,-64.155,-45.163,-3.0627,36.93,...,91.898,82.704,76.862,83.818,92.973,96.365,87.442,81.107,69.853,92.223,94.5,93.9,92.399,93.506,94.94,91.718,84.779,77.121,77.993,82.803,90.799,111.13,109.44,104.19,99.06,92.231,89.466,99.144,105.73,102.97,100.31,84.628,84.861,101.45,105.72,122.93,111.64,103.26,104.48,121.95


Shape of train & test data


In [33]:
df_train.shape

(5892, 1092)

In [34]:
df_test.shape

(1473, 1091)

Dropping the ID column in both train & test data

In [35]:
df_train.drop(["Id"],axis=1,inplace=True)

In [36]:
df_test.drop(["Id"],axis=1,inplace=True)
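
The `head()` output above also lists columns named `Unnamed: 0.1` and `Unnamed: 0` next to `Id`. These look like row indices written out by an earlier `to_csv` call, which is an assumption rather than something the source states. If so, one may prefer to drop them as well, so that they are not used as features. The sketch below shows one way to do this on a toy frame; the helper name and the toy values are illustrative only, and the models in this notebook were trained without this step.

```python
import pandas as pd

# Toy frame mimicking the column layout shown by df_train.head().
toy = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Id": [2216, 2673],
    "Unnamed: 0": [2216, 2673],
    "attr0": [-4.37, -13.80],
    "attr1": [13.82, -4.65],
    "attr2": [0, 0],  # stands in for the target column
})


def drop_bookkeeping_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns that are neither 'Id' nor leftover 'Unnamed: ...' indices."""
    keep = [c for c in df.columns if c != "Id" and not c.startswith("Unnamed")]
    return df[keep]


clean = drop_bookkeeping_columns(toy)
print(list(clean.columns))  # ['attr0', 'attr1', 'attr2']
```

The same function would have to be applied to both the training and the test frame, and any positional slicing that follows would need to account for the reduced column count.
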

Splitting the data

In [39]:
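# After dropping "Id", the last column (attr1089) is the target and the
# remaining 1090 columns are used as features.
X = df_train.iloc[:, :-1]
y = df_train.iloc[:, -1]
print(y)
# Hold out 30% of the training data for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)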

0       0
1       0
2       0
3       0
4       1
       ..
5887    1
5888    0
5889    1
5890    1
5891    1
Name: attr1089, Length: 5892, dtype: int64


Creating different ML models

In [42]:
log_clf_1 = LogisticRegression(random_state=0)
log_clf_2 = LogisticRegression(random_state=42)
decision_clf1 = DecisionTreeClassifier(criterion = 'entropy',random_state=0)
decision_clf2 = DecisionTreeClassifier(criterion = 'entropy', random_state=42)
Model_List=[('Logistic Regression 1', log_clf_1),
           ('Logistic Regression 2', log_clf_2),
           ('Decision Tree 1', decision_clf1),
           ('Decision Tree 2', decision_clf2)]

Using a voting classifier with both voting types, hard and soft

In [47]:
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import roc_auc_score
voting_clf_hard = VotingClassifier(estimators = Model_List,voting = 'hard')
voting_clf_hard.fit(X_train, y_train)
y_pred = voting_clf_hard.predict(X_test)
hard_voting=roc_auc_score(y_test, y_pred)
print(hard_voting)
voting_clf_soft = VotingClassifier(estimators = Model_List,
                                voting = 'soft')
voting_clf_soft.fit(X_train, y_train)
y_pred = voting_clf_soft.predict(X_test)
soft_voting=roc_auc_score(y_test, y_pred)
print(soft_voting)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


0.8473644003055768


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


0.8700862163047037
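
The convergence warnings above suggest scaling the data or raising `max_iter`. A minimal sketch of both, using synthetic data in place of the crater features: the logistic regression is wrapped in a pipeline with `StandardScaler`, and `max_iter=1000` is an arbitrary illustrative value rather than a tuned one. The variable names are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the crater features, deliberately left unscaled.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=30, n_informative=8, random_state=0
)
X_demo = X_demo * 50.0
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# The scaler is fitted on the training data only, then applied before the classifier.
scaled_log_clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=0),
)
scaled_log_clf.fit(X_tr, y_tr)
print(scaled_log_clf.score(X_te, y_te))
```

A pipeline like this is itself an estimator, so it could be placed in the model list passed to `VotingClassifier` instead of the bare `LogisticRegression`.
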


Checking the ROC-AUC score while using BaggingClassifier

In [48]:
from sklearn.ensemble import BaggingClassifier
bagging_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, max_samples=100, random_state=0)
bagging_clf.fit(X_train, y_train)
y_pred = bagging_clf.predict(X_test)
baggingclassifier=roc_auc_score(y_test, y_pred)
print(baggingclassifier)

0.8341154643675651


Checking the ROC-AUC score while using pasting

In [49]:
# Pasting: each tree is trained on 100 samples drawn without replacement (bootstrap=False).
pasting_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, max_samples=100, random_state=0, bootstrap=False)
# Fitting the data
pasting_clf.fit(X_train, y_train)
y_pred = pasting_clf.predict(X_test)
pastingclf=roc_auc_score(y_test, y_pred)
print(pastingclf)


0.8391793080868711


Checking the score while using random forest

In [50]:
from sklearn.ensemble import RandomForestClassifier
# n_jobs=-1 uses all available CPU cores; it affects speed only, not the fitted model.
rf_clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0, min_samples_leaf=100)
# Fitting on data
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)
rf_score=roc_auc_score(y_test, y_pred)
print(rf_score)


0.8365382516643022
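
The scores above are computed from hard 0/1 predictions. ROC-AUC is more commonly computed from continuous scores such as class probabilities, because it measures how well the model ranks positives above negatives. The sketch below compares both on synthetic data; it is not a rerun of the notebook's models, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the crater features.
X_demo, y_demo = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

rf_demo = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# AUC from hard labels uses a single operating point (threshold 0.5).
auc_from_labels = roc_auc_score(y_te, rf_demo.predict(X_te))
# AUC from probabilities of the positive class uses the full ranking.
auc_from_scores = roc_auc_score(y_te, rf_demo.predict_proba(X_te)[:, 1])
print(auc_from_labels, auc_from_scores)
```

Note that a `VotingClassifier` with `voting='hard'` does not provide `predict_proba`, so this only applies to soft voting and to the other ensembles used here.
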


Checking the score while using grid search

In [57]:
from sklearn.model_selection import GridSearchCV
parameter_grid = {"max_depth": [3, None],
             "max_features": [1, 3, 10],
             "min_samples_split": [2, 3, 10],
             "min_samples_leaf": [1, 3, 10],
             "bootstrap": [True, False],
             "criterion": ["gini", "entropy"]}
clf = RandomForestClassifier()
grid_search = GridSearchCV(estimator=clf,param_grid =parameter_grid )
grid_search.fit(X_train,y_train)
gridsearch_pred = grid_search.predict(X_test)
# Score the grid-search predictions (not y_pred left over from an earlier cell).
gs_score = roc_auc_score(y_test, gridsearch_pred)
print(gs_score)
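
Two points about the grid search are worth illustrating. First, the grid above expands to 216 parameter combinations, which with the default 5-fold cross-validation of recent scikit-learn versions means 1,080 model fits before the final refit. Second, `GridSearchCV` selects the best candidate using the estimator's default score (accuracy for classifiers) unless `scoring` is passed. The sketch below counts the candidates and then runs a much smaller, hypothetical grid on synthetic data with `scoring="roc_auc"`; the reduced grid and data are assumptions for the sake of a fast example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# The grid from the cell above, used here only to count candidates.
full_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "min_samples_leaf": [1, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}
n_candidates = len(ParameterGrid(full_grid))
print(n_candidates, n_candidates * 5)  # 216 1080

# A reduced grid on synthetic data, selecting by ROC-AUC instead of accuracy.
small_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 10]}
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid=small_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

Here `best_score_` is the mean cross-validated ROC-AUC of the selected candidate, which matches the metric used for evaluation in this task.
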

Checking the ROC-AUC score using RandomizedSearchCV

In [None]:
# Code starts here
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
clf = RandomForestClassifier(random_state=0)
parameter_grid = {"max_depth": [3, None],
             "max_features": [1, 3, 10],
             "min_samples_split": [2, 3, 10],
             "min_samples_leaf": [1, 3, 10],
             "bootstrap": [True, False],
             "criterion": ["gini", "entropy"]}
random_search = RandomizedSearchCV(estimator=clf,param_distributions =parameter_grid,n_iter=20,random_state=0)
random_search.fit(X_train,y_train)
rs_predict = random_search.predict(X_test)
# Score the randomized-search predictions (not y_pred left over from an earlier cell).
rs_score = roc_auc_score(y_test, rs_predict)
print(rs_score)


Since soft voting gave the highest ROC-AUC score among the results reported above (0.870), we use it to predict on the test data set and write the submission file.

In [55]:
y_pred_test = voting_clf_soft.predict(df_test)
print(y_pred_test)
submissions_f = pd.DataFrame(y_pred_test,columns = ['attr1089'])
submissions_f.to_csv('/content/drive/MyDrive/Colab_Notebooks/mars_sample_submission.csv')

[0 0 1 ... 0 0 0]
