# Ionosphere Data Problem:
    
- Radar data:
    - phased array of 16 high-frequency antennas
    - with a total transmitted power on the order of 6.4 kilowatts   
    - received signals were processed using an autocorrelation function whose arguments are the time of a pulse and the pulse number. 
    - There were 17 pulse numbers for the Goose Bay system. 
    - Instances in this database are described by 2 attributes per pulse number (17 × 2 = 34 attributes in total), corresponding to the complex values returned by the function resulting from the complex electromagnetic signal.
    
- Collected by a system in Goose Bay, Labrador. 
- The targets were free electrons in the ionosphere. 

- Binary Classification: 
    - 'g': "Good" radar returns are those showing evidence of some type of structure in the ionosphere. 
    - 'b': "Bad" returns are those that do not; their signals pass through the ionosphere.
- Attribute Information:
    - All 34 are continuous
    - The 35th attribute is either "good" or "bad" according to the definition summarized above. 
- Characteristics:
    - Multivariate data
    - No. of instances: 351
    - Attribute types: real/integer; no. of attributes: 34
    - Missing values: none
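
The attribute layout described above (17 pulse numbers, 2 attributes each) can be sketched in a few lines. This is only an illustration: it assumes the 34 columns are ordered as consecutive (real, imaginary) pairs, which the description suggests but does not state outright, and the helper name `to_complex_pulses` is hypothetical.

```python
import numpy as np

N_PULSES = 17  # pulse numbers for the Goose Bay system


def to_complex_pulses(features):
    """Turn rows of 34 real attributes into rows of 17 complex values.

    Assumption: columns come in consecutive (real, imaginary) pairs.
    """
    features = np.atleast_2d(np.asarray(features, dtype=float))
    if features.shape[1] != 2 * N_PULSES:
        raise ValueError(f"expected {2 * N_PULSES} attributes, got {features.shape[1]}")
    pairs = features.reshape(-1, N_PULSES, 2)
    return pairs[..., 0] + 1j * pairs[..., 1]


# Toy row with attribute values 0, 1, ..., 33 (not real radar data)
pulses = to_complex_pulses(np.arange(34))
print(pulses.shape)   # (1, 17)
print(pulses[0, 0])   # first pulse: real part 0, imaginary part 1
```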
   

## Step 1: Load Data

In [1]:
#import libraries:
import keras
import numpy as np
import pandas as pd

In [85]:
df = pd.read_csv('ionosphere_data.csv')
df.head()

|  | feature1 | feature2 | feature3 | feature4 | feature5 | feature6 | feature7 | feature8 | feature9 | feature10 | ... | feature26 | feature27 | feature28 | feature29 | feature30 | feature31 | feature32 | feature33 | feature34 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0.99539 | -0.05889 | 0.85243 | 0.02306 | 0.83398 | -0.37708 | 1.0 | 0.0376 | ... | -0.51171 | 0.41078 | -0.46168 | 0.21266 | -0.3409 | 0.42267 | -0.54487 | 0.18641 | -0.453 | g |
| 1 | 1 | 0 | 1.0 | -0.18829 | 0.93035 | -0.36156 | -0.10868 | -0.93597 | 1.0 | -0.04549 | ... | -0.26569 | -0.20468 | -0.18401 | -0.1904 | -0.11593 | -0.16626 | -0.06288 | -0.13738 | -0.02447 | b |
| 2 | 1 | 0 | 1.0 | -0.03365 | 1.0 | 0.00485 | 1.0 | -0.12062 | 0.88965 | 0.01198 | ... | -0.4022 | 0.58984 | -0.22145 | 0.431 | -0.17365 | 0.60436 | -0.2418 | 0.56045 | -0.38238 | g |
| 3 | 1 | 0 | 1.0 | -0.45161 | 1.0 | 1.0 | 0.71216 | -1.0 | 0.0 | 0.0 | ... | 0.90695 | 0.51613 | 1.0 | 1.0 | -0.20099 | 0.25682 | 1.0 | -0.32382 | 1.0 | b |
| 4 | 1 | 0 | 1.0 | -0.02401 | 0.9414 | 0.06531 | 0.92106 | -0.23255 | 0.77152 | -0.16399 | ... | -0.65158 | 0.1329 | -0.53206 | 0.02431 | -0.62197 | -0.05707 | -0.59573 | -0.04608 | -0.65697 | g |


In [86]:
#columns: feature1 to feature34, plus label ('g' or 'b')
#feature1 and feature2 look constant in the first rows

df.shape

(351, 35)

In [87]:
df.dtypes

feature1       int64
feature2       int64
feature3     float64
feature4     float64
feature5     float64
feature6     float64
feature7     float64
feature8     float64
feature9     float64
feature10    float64
feature11    float64
feature12    float64
feature13    float64
feature14    float64
feature15    float64
feature16    float64
feature17    float64
feature18    float64
feature19    float64
feature20    float64
feature21    float64
feature22    float64
feature23    float64
feature24    float64
feature25    float64
feature26    float64
feature27    float64
feature28    float64
feature29    float64
feature30    float64
feature31    float64
feature32    float64
feature33    float64
feature34    float64
label         object
dtype: object

In [None]:
#label: object -> encode 
#feature 1 &2: int64, feature 3-34: float64
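
Whether a column is really constant is easier to check over every row than by eye on the first five. A minimal sketch, run on a toy frame rather than the real file; the helper name `constant_columns` is hypothetical:

```python
import pandas as pd


def constant_columns(frame):
    """Return the names of columns that hold a single distinct value."""
    counts = frame.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()


# Toy frame standing in for the real data
toy = pd.DataFrame({
    "feature1": [1, 1, 0, 1],
    "feature2": [0, 0, 0, 0],
    "feature3": [0.99, 1.0, 0.5, -0.2],
})
print(constant_columns(toy))  # ['feature2']
```

On the notebook's data this would be called as `constant_columns(df)`.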


## Step 2: Check Missing Values (fill with mean if any), drop duplicates and any useless columns

In [88]:
df.isnull().sum()   #the dataset description already states there are no missing values

feature1     0
feature2     0
feature3     0
feature4     0
feature5     0
feature6     0
feature7     0
feature8     0
feature9     0
feature10    0
feature11    0
feature12    0
feature13    0
feature14    0
feature15    0
feature16    0
feature17    0
feature18    0
feature19    0
feature20    0
feature21    0
feature22    0
feature23    0
feature24    0
feature25    0
feature26    0
feature27    0
feature28    0
feature29    0
feature30    0
feature31    0
feature32    0
feature33    0
feature34    0
label        0
dtype: int64

In [89]:
#checking for duplicates
df.duplicated().sum()

1

In [90]:
df.drop_duplicates(inplace=True)
df.shape

(350, 35)

In [91]:
df.duplicated().sum()

0
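
This dataset has no gaps, so the "fill with mean" part of the step never runs here. For reference, one way it could look, shown on a toy frame; the helper name `clean` is hypothetical, and unlike the notebook it resets the index after dropping duplicates:

```python
import numpy as np
import pandas as pd


def clean(frame):
    """Fill numeric gaps with column means, then drop duplicate rows."""
    out = frame.copy()
    numeric = out.select_dtypes(include="number").columns
    out[numeric] = out[numeric].fillna(out[numeric].mean())
    return out.drop_duplicates().reset_index(drop=True)


# Toy frame: one missing value and one duplicated row
toy = pd.DataFrame({
    "feature3": [1.0, np.nan, 3.0, 3.0],
    "label": ["g", "b", "g", "g"],
})
cleaned = clean(toy)
print(cleaned)
```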

## Step 3: Standardize the Input Variables. 

In [92]:
#min-max scaling feature3..feature34 to [0, 1]; feature1, feature2 and the label are left unchanged

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df.iloc[:, 2:-1] = scaler.fit_transform(df.iloc[:, 2:-1])  
df.head()

|  | feature1 | feature2 | feature3 | feature4 | feature5 | feature6 | feature7 | feature8 | feature9 | feature10 | ... | feature26 | feature27 | feature28 | feature29 | feature30 | feature31 | feature32 | feature33 | feature34 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0.997695 | 0.470555 | 0.926215 | 0.51153 | 0.91699 | 0.31146 | 1.0 | 0.5188 | ... | 0.244145 | 0.70539 | 0.26916 | 0.60633 | 0.32955 | 0.711335 | 0.227565 | 0.593205 | 0.2735 | g |
| 1 | 1 | 0 | 1.0 | 0.405855 | 0.965175 | 0.31922 | 0.44566 | 0.032015 | 1.0 | 0.477255 | ... | 0.367155 | 0.39766 | 0.407995 | 0.4048 | 0.442035 | 0.41687 | 0.46856 | 0.43131 | 0.487765 | b |
| 2 | 1 | 0 | 1.0 | 0.483175 | 1.0 | 0.502425 | 1.0 | 0.43969 | 0.944825 | 0.50599 | ... | 0.2989 | 0.79492 | 0.389275 | 0.7155 | 0.413175 | 0.80218 | 0.3791 | 0.780225 | 0.30881 | g |
| 3 | 1 | 0 | 1.0 | 0.274195 | 1.0 | 1.0 | 0.85608 | 0.0 | 0.5 | 0.5 | ... | 0.953475 | 0.758065 | 1.0 | 1.0 | 0.399505 | 0.62841 | 1.0 | 0.33809 | 1.0 | b |
| 4 | 1 | 0 | 1.0 | 0.487995 | 0.9707 | 0.532655 | 0.96053 | 0.383725 | 0.88576 | 0.418005 | ... | 0.17421 | 0.56645 | 0.23397 | 0.512155 | 0.189015 | 0.471465 | 0.202135 | 0.47696 | 0.171515 | g |
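
`MinMaxScaler` maps each column to [0, 1] with `(x - min) / (max - min)`; the first scaled value, 0.997695, is what 0.99539 becomes when the column spans -1 to 1. The sketch below reproduces that arithmetic in plain NumPy. It differs from the cell above in one respect, which is a suggestion and not what the notebook does: the minimum and range are learned from the training rows only and then reused on the test rows, so no test information leaks into the scaling. The helper names are hypothetical.

```python
import numpy as np


def fit_min_max(train):
    """Learn per-column minimum and range from the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant column: avoid division by zero
    return lo, span


def apply_min_max(data, lo, span):
    """Rescale with a previously learned minimum and range."""
    return (data - lo) / span


# Toy data: the first column spans -1 to 1, like the raw features
train = np.array([[-1.0, 0.0], [1.0, 4.0], [0.99539, 2.0]])
test = np.array([[0.0, 1.0]])

lo, span = fit_min_max(train)
scaled_train = apply_min_max(train, lo, span)
scaled_test = apply_min_max(test, lo, span)
print(scaled_train[2, 0])  # (0.99539 + 1) / 2
print(scaled_test)
```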


In [93]:
#df['feature2'].sum()   #all values are zero, so dropping it 

#df['feature1'].sum()  #sum is 313 over 350 rows, so feature1 is 0 in some rows and is kept

In [94]:
df.drop(columns = ['feature2'], inplace=True)
df.shape

(350, 34)

In [99]:
#Encoding label: map 'g' to 1 and 'b' to 0
df['label'] = df['label'].map({'g': 1, 'b': 0})
df['label']

0      1
1      0
2      1
3      0
4      1
      ..
346    1
347    1
348    1
349    1
350    1
Name: label, Length: 350, dtype: int64
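
`Series.map` silently yields `NaN` for any value missing from the dictionary, so a stray label would only surface later as a training error. A small defensive sketch; the helper name `encode_labels` is hypothetical:

```python
import pandas as pd

LABEL_MAP = {"g": 1, "b": 0}


def encode_labels(labels):
    """Map 'g'/'b' to 1/0 and fail loudly on anything else."""
    encoded = labels.map(LABEL_MAP)
    if encoded.isnull().any():
        unknown = sorted(labels[encoded.isnull()].unique())
        raise ValueError(f"unmapped labels: {unknown}")
    return encoded.astype("int64")


print(encode_labels(pd.Series(["g", "b", "g"])).tolist())  # [1, 0, 1]
```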

## Step 4: Shuffle the data if needed. Split in a 60:40 ratio.

In [102]:
df2 = df.sample(frac=1, random_state=0)  #shuffle and return all rows

#350 rows: 60% = 210 and 40% = 140

train_X = df2.iloc[0:210,:-1].values
train_y = df2.iloc[0:210,-1].values

test_X = df2.iloc[210:,:-1].values
test_y = df2.iloc[210:,-1].values


In [103]:
print(train_X.shape)
print(train_y.shape)

print(test_X.shape)
print(test_y.shape)

(210, 33)
(210,)
(140, 33)
(140,)
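
An alternative to slicing a shuffled frame by hand is scikit-learn's `train_test_split`, which can also keep the share of good and bad returns roughly equal in both parts via `stratify`. The sketch below uses placeholder arrays with the same shapes as the notebook's data; the values and class balance are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((350, 33))                 # placeholder features
y = (rng.random(350) < 0.6).astype(int)   # placeholder labels, not the real balance

# 60:40 split, shuffled, with class proportions preserved in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)    # (210, 33) (140, 33)
print(y_train.mean(), y_test.mean())  # share of class 1 in each part
```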


## Step 5: Encode labels.

In [59]:
#already encoded the output label.

## Step 6: Build Model (1 hidden layer with 16 units)

In [115]:
from keras import models, layers

model = models.Sequential()
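model.add(layers.Dense(32, activation='relu', input_shape=(33,)))  #first Dense layer: 32 units, takes the 33 input features
model.add(layers.Dense(16, activation='relu'))  #hidden layer with 16 units
model.add(layers.Dense(1, activation='sigmoid'))  #output layer: probability of label 1 ('g')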


In [116]:
model.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_24 (Dense)             (None, 32)                1088      
_________________________________________________________________
dense_25 (Dense)             (None, 16)                528       
_________________________________________________________________
dense_26 (Dense)             (None, 1)                 17        
Total params: 1,633
Trainable params: 1,633
Non-trainable params: 0
_________________________________________________________________
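
The parameter counts in the summary can be checked by hand: a `Dense` layer has one weight per (input, unit) pair plus one bias per unit. A short sketch of that arithmetic; the helper name `dense_params` is hypothetical:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [33, 32, 16, 1]  # input features, then units per Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [1088, 528, 17]
print(sum(per_layer))  # 1633
```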


## Step 7: Compilation Step 

In [117]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
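
The `binary_crossentropy` loss penalises confident wrong probabilities much more than confident right ones. The NumPy sketch below shows the underlying formula; it illustrates the idea and is not Keras's own implementation, and the clipping constant `eps` is an assumption chosen for the example.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
print(confident_right < confident_wrong)  # True
```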

## Step 8: Train the Model with Epochs (100).

In [118]:
model.fit(train_X, train_y, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x2588af215c8>
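
With 210 training rows and `batch_size=10`, the number of weight updates per epoch and over the whole run follows directly from the arguments passed to `fit`:

```python
import math

n_train, batch_size, epochs = 210, 10, 100

steps_per_epoch = math.ceil(n_train / batch_size)  # batches per pass over the data
print(steps_per_epoch)           # 21
print(steps_per_epoch * epochs)  # 2100
```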

## Step 9: Evaluation Step
- (Test accuracy should be > 92%)

In [120]:
myscore = model.evaluate(test_X, test_y, batch_size=15)
print('The accuracy score is ', myscore[1] * 100, '%')

The accuracy score is  96.42857313156128 %
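
An accuracy of 96.43% on 140 test rows corresponds to 135 correct predictions. Accuracy alone does not show which class the errors fall in, so it can help to threshold the sigmoid outputs and count the four outcomes. A minimal sketch on toy values; in the notebook the probabilities would come from `model.predict(test_X)`, and the helper name `accuracy_and_confusion` is hypothetical.

```python
import numpy as np


def accuracy_and_confusion(y_true, y_prob, threshold=0.5):
    """Threshold probabilities and count true/false positives and negatives."""
    y_true = np.asarray(y_true).astype(int).ravel()
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    counts = {
        "tp": int(np.sum((y_pred == 1) & (y_true == 1))),
        "tn": int(np.sum((y_pred == 0) & (y_true == 0))),
        "fp": int(np.sum((y_pred == 1) & (y_true == 0))),
        "fn": int(np.sum((y_pred == 0) & (y_true == 1))),
    }
    accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
    return accuracy, counts


# Toy labels and probabilities, not the model's real output
acc, counts = accuracy_and_confusion([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.6])
print(acc)     # 0.6
print(counts)  # {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
```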


In [None]:
#Step 10: If the model overfits, tune it by changing:
    #- the units, no. of layers, or epochs, or by adding a dropout layer or a regularizer as needed.
#accuracy decreases if I start with units > 32
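
The tuning options listed above could be combined as in the sketch below. It keeps the notebook's layer sizes and adds dropout, L2 weight penalties, and early stopping on a validation split. The dropout rate, L2 strength, patience, and validation fraction are illustrative assumptions, not tuned values, and the function name `build_regularized_model` is hypothetical.

```python
from keras import callbacks, layers, models, regularizers


def build_regularized_model(n_features=33, dropout_rate=0.3, l2_strength=1e-3):
    """Same layer sizes as the notebook's model, plus L2 penalties and dropout."""
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(32, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_strength)),
        layers.Dropout(dropout_rate),
        layers.Dense(16, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_strength)),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Stop once validation loss has not improved for 10 epochs
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)

model = build_regularized_model()
# Usage, with train_X and train_y from Step 4:
# model.fit(train_X, train_y, batch_size=10, epochs=100,
#           validation_split=0.2, callbacks=[early_stop])
```

Dropout layers add no trainable parameters, so the model still has 1,633 of them.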