Importing the Libraries.

PANDAS : pandas provides data structures such as DataFrame and Series that are well suited to loading, cleaning, and analyzing tabular data.

NUMPY : NumPy provides the ndarray, a multi-dimensional array object, together with fast numerical routines that operate on it.

TENSORFLOW : TensorFlow is an open-source machine learning library that uses data flow graphs to build models. It allows developers to create large-scale neural networks with many layers. TensorFlow is mainly used for classification, perception, understanding, discovering, prediction, and creation.

KERAS : Keras is a high-level neural network API. It is also bundled with TensorFlow as tf.keras, which is what the model below uses.

In [0]:
import pandas as pd
import numpy as np
import tensorflow as tf
import keras

Read Dataset


In [0]:
x_train = pd.read_csv("Train.csv")
x_test = pd.read_csv("Test.csv")

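ILOC : iloc selects rows and columns by integer position. Selecting a single row or a single column returns a pandas Series; selecting several rows or several columns returns a pandas DataFrame. Below, the first two columns are skipped, the last column of the training file is taken as the target, and the remaining columns are used as features.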

In [0]:
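X_train = x_train.iloc[:, 2:-1]   # feature columns of the training set
y_train = x_train.iloc[:, -1]     # target: last column of the training set
X_test = x_test.iloc[:, 2:]       # feature columns of the test set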
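
The positional slicing above can be tried on a small frame. This is a minimal sketch; the toy DataFrame and its column names are made up for illustration and are not taken from Train.csv.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "a": [10, 20, 30],
    "b": [0.1, 0.2, 0.3],
    "label": [0, 1, 0],
})

features = df.iloc[:, 1:-1]   # all rows, columns 1 up to (not including) the last -> DataFrame
target = df.iloc[:, -1]       # all rows, last column only -> Series
first_row = df.iloc[0]        # a single row selected by position -> Series

print(type(features).__name__, features.shape)
print(type(target).__name__, target.shape)
print(type(first_row).__name__, first_row.shape)
```

A slice such as 1:-1 always yields a DataFrame, even if it happens to cover one column, while a single integer such as -1 yields a Series.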

SHAPE : shape is a tuple holding the size of each dimension of an array or DataFrame, so its length is the number of dimensions. For example, shape[0] is the number of rows.

In [0]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)

(23856, 15)
(15903, 15)
(23856,)
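
As a small illustration of how to read these tuples, the sketch below uses two made-up NumPy arrays (the sizes are arbitrary, not those of the dataset).

```python
import numpy as np

features = np.zeros((5, 3))   # 2-D: 5 rows, 3 columns
labels = np.zeros(5)          # 1-D: 5 entries

print(features.shape)         # (5, 3)
print(labels.shape)           # (5,)
print(features.shape[0])      # 5, the number of rows
print(features.ndim, labels.ndim)   # 2 1
```

A one-element tuple such as (5,) means a one-dimensional array, which is why y_train prints with a trailing comma.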


Checking for missing values in the dataset.
ISNA() : DataFrame.isna() detects missing values. It returns a boolean object of the same shape in which NA values, such as None or numpy.NaN, are mapped to True and all other values to False.

In [0]:
# Count the missing values per column, largest first
missing = X_train.isna().sum(axis=0).sort_values(ascending=False)
missing_value_columns = missing[missing > 0]
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
print('They are %s columns with missing values : \n%s ' % (missing_value_columns.count(), list(missing_value_columns.items())))

missing = X_test.isna().sum(axis=0).sort_values(ascending=False)
missing_value_columns = missing[missing > 0]
print('They are %s columns with missing values : \n%s ' % (missing_value_columns.count(), list(missing_value_columns.items())))


They are 1 columns with missing values : 
[('X_12', 182)] 
They are 1 columns with missing values : 
[('X_12', 127)] 
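
The counting step can be followed on a tiny example. This is a sketch with invented values; only the column name X_12 is borrowed from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: X_12 has two missing entries, X_1 has none.
df = pd.DataFrame({"X_1": [1.0, 2.0, 3.0], "X_12": [np.nan, 5.0, None]})

mask = df.isna()                          # same shape as df, True where a value is missing
missing_per_column = mask.sum(axis=0)     # True counts as 1, so this counts missing values
columns_with_missing = missing_per_column[missing_per_column > 0]

print(mask)
print(columns_with_missing)
```

Summing along axis=0 collapses the rows, leaving one count per column.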


Dealing with the missing values.
FILLNA() : fillna replaces NaN values with a value you supply. The value can be a single scalar, or a Python dict keyed by column name so that each column gets its own fill value. Here the missing entries of X_12 are replaced with the column median.

In [0]:
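def impute_value(X):
    # Work on a copy so the caller's DataFrame is left untouched
    dataset = X.copy()
    # Assign the result back; fillna(inplace=True) on a selected column
    # may act on a temporary copy and leave the DataFrame unchanged
    dataset['X_12'] = dataset['X_12'].fillna(dataset['X_12'].median())
    return dataset

X_train = impute_value(X_train)
X_test = impute_value(X_test)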
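
The function above fills each dataset with its own median. A common variant, shown here only as a sketch and not as what this notebook does, computes the median once on the training data and reuses it for the test data, using the dict form of fillna. The toy values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name X_12 comes from the notebook.
train = pd.DataFrame({"X_12": [1.0, np.nan, 3.0, 5.0]})
test = pd.DataFrame({"X_12": [np.nan, 2.0]})

# median() skips NaN, so this is the median of 1, 3 and 5
train_median = train["X_12"].median()

# Dict form: each key is a column name, each value is that column's fill value
train_filled = train.fillna({"X_12": train_median})
test_filled = test.fillna({"X_12": train_median})

print(train_filled["X_12"].tolist())
print(test_filled["X_12"].tolist())
```

Because fillna returns a new DataFrame here, the original train and test frames keep their NaN values.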

Cross-checking that no missing values remain.

In [0]:
missing = X_train.isna().sum(axis=0).sort_values(ascending=False)
missing_value_columns = missing[missing > 0]
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
print('They are %s columns with missing values : \n%s ' % (missing_value_columns.count(), list(missing_value_columns.items())))

missing = X_test.isna().sum(axis=0).sort_values(ascending=False)
missing_value_columns = missing[missing > 0]
print('They are %s columns with missing values : \n%s ' % (missing_value_columns.count(), list(missing_value_columns.items())))


They are 0 columns with missing values : 
[] 
They are 0 columns with missing values : 
[] 


Checking for categorical (object-typed) columns in the dataset.

In [0]:
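# np.object was removed in NumPy 1.24; the string 'object' selects the same dtype.
# The name object_cols avoids shadowing Python's built-in object.
object_cols = list(X_train.select_dtypes(include=['object']))
print('Here are the %s object variables : \n %s' % (len(object_cols), object_cols))

object_cols = list(X_test.select_dtypes(include=['object']))
print('Here are the %s object variables : \n %s' % (len(object_cols), object_cols))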

Here are the 0 object variables : 
 []
Here are the 0 object variables : 
 []
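
Both lists are empty here, so there is nothing to encode. To see what the check reports when a text column is present, here is a minimal sketch on an invented frame; the "city" column is hypothetical and is given the object dtype explicitly.

```python
import pandas as pd

# Hypothetical frame with two numeric columns and one object-typed column.
df = pd.DataFrame({
    "X_1": [1, 2],
    "X_2": [0.5, 1.5],
    "city": pd.Series(["a", "b"], dtype="object"),
})

object_columns = list(df.select_dtypes(include=["object"]).columns)
numeric_columns = list(df.select_dtypes(include=["number"]).columns)

print(object_columns)
print(numeric_columns)
```

Any column listed in object_columns would need to be encoded as numbers before it could be passed to the scaler or the network.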


Converting to NumPy arrays.


In [0]:
X_train = X_train.values
# print(X_train.dtypes) :: checks the conversion: an ndarray has no .dtypes attribute, so this raises an error once X_train is a NumPy array
X_test = X_test.values
y_train = y_train.values
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)

(23856, 15)
(15903, 15)
(23856,)


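Feature Scaling.
Feature scaling puts the independent features on a comparable scale. It is performed during preprocessing so that features with very different magnitudes or units do not dominate training. StandardScaler standardizes each feature to zero mean and unit variance. It is fitted on the training data only, and the same transformation is then applied to the test data.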

In [0]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
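
To make the transformation concrete, the sketch below reproduces the same standardization by hand in NumPy on made-up numbers. It assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np

# Hypothetical data: two features on very different scales.
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# "fit": learn the per-column statistics from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)        # population standard deviation (ddof=0)

# "transform": apply the same statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

The test row is scaled with the training mean and standard deviation, so its values need not fall in the same range as the scaled training data.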

Sequential model : A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

In [0]:
ann = tf.keras.models.Sequential()

1st Layer [hidden layer]
Activation function : Rectified Linear Unit (ReLU), defined as y = max(0, x).
Units : 6.

In [0]:
ann.add(tf.keras.layers.Dense(units=6 ,activation = 'relu'))
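
What such a layer computes can be sketched in plain NumPy: a matrix product plus a bias, followed by ReLU. The weights and inputs below are random placeholders, not values from the trained model; only the sizes (15 inputs, 6 units) follow the notebook.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x)
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 15))    # 4 hypothetical samples with 15 features each
W = rng.normal(size=(15, 6))    # placeholder weights: 15 inputs -> 6 units
b = np.zeros(6)                 # placeholder biases, one per unit

hidden = relu(x @ W + b)        # output of the layer, shape (4, 6)

print(relu(np.array([-2.0, 0.0, 3.0])))
print(hidden.shape)
```

Negative pre-activations are clipped to zero, so every entry of hidden is non-negative.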

2nd Layer [hidden layer]
Activation function : Rectified Linear Unit (ReLU), defined as y = max(0, x).
Units : 6.

In [0]:
ann.add(tf.keras.layers.Dense(units=6 ,activation = 'relu'))

3rd Layer [output layer]
Activation function : sigmoid.
The sigmoid function maps any real input to a value between 0 and 1, so its output can be read as a probability. That makes it a natural choice for the single output unit of a binary classifier.
Units : 1.

In [0]:
ann.add(tf.keras.layers.Dense(units=1 ,activation = 'sigmoid'))
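
The squashing behaviour of the sigmoid can be checked with a few lines of NumPy. This is an illustrative sketch of the formula 1 / (1 + exp(-z)), evaluated on arbitrary inputs.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, 0.0, 4.0])
p = sigmoid(z)

print(p)
```

An input of 0 gives exactly 0.5, and inputs of equal size but opposite sign give probabilities that sum to 1.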

Model Compile
METRICS : A metric is a function that is used to judge the performance of your model. Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model.

OPTIMIZER : Optimizers are algorithms that update the network's trainable parameters, such as its weights, in order to reduce the loss. The learning rate controls the size of each update. Here the Adam optimizer is used together with the binary cross-entropy loss, which suits a two-class target.

In [0]:
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
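
To show what the loss and the metric measure, here is a NumPy sketch of binary cross-entropy and of accuracy on four invented predictions. The helper names, the clipping constant eps, and the 0.5 threshold are assumptions made for the illustration, not a reproduction of the Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label
    y_hat = (y_prob > threshold).astype(int)
    return float(np.mean(y_hat == y_true))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)

print(loss, acc)
```

The loss penalizes the fourth prediction (0.4 for a true label of 1) most heavily, and only the loss, not the accuracy, drives the weight updates.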

Fitting
The fit function trains the model: it repeatedly adjusts the weights on batches of the training data to reduce the loss. Here it runs for 100 epochs with a batch size of 100. After training, the model can be used for predictions.

In [0]:
ann.fit(X_train, y_train, batch_size = 100, epochs = 100)

Epoch 1/100
Epoch 2/100
...
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x2679399c588>
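
The arithmetic behind the batch size and the epoch count can be worked out directly from the training set size printed earlier (23856 rows). The sketch below assumes the final, smaller batch of each epoch is still used, which is the usual behaviour when fitting on in-memory arrays.

```python
import math

n_samples = 23856   # rows in X_train, from the shape printed above
batch_size = 100
epochs = 100

steps_per_epoch = math.ceil(n_samples / batch_size)                  # 239
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size     # 56
total_updates = steps_per_epoch * epochs                             # 23900

print(steps_per_epoch, last_batch_size, total_updates)
```

Each epoch therefore consists of 238 full batches and one batch of 56 rows.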

Prediction
predict() : given a trained model, predict() takes the new data as its argument (e.g. model.predict(X_new)) and returns the model's output for each row. With a sigmoid output unit this is a probability between 0 and 1, which is turned into a class label by comparing it with a threshold (0.9 below).

In [0]:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.9)
# print(y_pred.reshape(len(y_pred) , 1))
print(y_pred)

[[ True]
 [ True]
 [ True]
 ...
 [ True]
 [ True]
 [ True]]
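
The choice of threshold changes which rows are labelled positive. The sketch below compares the 0.9 threshold used above with 0.5 on four invented probabilities; the values are hypothetical and not model output.

```python
import numpy as np

# Hypothetical predicted probabilities, shaped (n, 1) like the output of predict()
probs = np.array([[0.95], [0.70], [0.40], [0.91]])

strict = probs > 0.9     # threshold used in this notebook
default = probs > 0.5    # a common alternative

print(strict.ravel())
print(default.ravel())
```

With the stricter threshold, the row with probability 0.70 is labelled False instead of True.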


Converting the boolean array to integers :
True = 1
False = 0

In [0]:
y_pred = y_pred*1

In [0]:
y_pred

array([[1],
       [1],
       [1],
       ...,
       [1],
       [1],
       [1]])

In [0]:
y_pred.shape

(15903, 1)

Reshaping y_pred
The reshape() function gives an array a new shape without changing its data. reshape(-1) flattens the (15903, 1) column into a one-dimensional array of length 15903.

In [0]:
y_pred = y_pred.reshape(-1)
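
The two conversion steps, boolean to integer and column to flat array, can be seen together on a three-row example. The values are invented; astype(int) is shown as an equivalent alternative to multiplying by 1.

```python
import numpy as np

flags = np.array([[True], [False], [True]])   # shape (3, 1), like the thresholded predictions

as_int = flags.astype(int)    # same result as flags * 1
flat = as_int.reshape(-1)     # -1 lets NumPy infer the length: shape (3,)

print(as_int.shape, flat.shape)
print(flat)
```

The flat shape is what a DataFrame column expects, which is why the reshape comes before building the submission.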

Creating the submission file with my predictions

In [0]:
my_submission = pd.DataFrame({'INCIDENT_ID' : x_test.iloc[: ,0].values , 'MULTIPLE_OFFENSE' : y_pred})
my_submission.to_csv('Prediction.csv' , index=False)
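
To check the layout of the file without touching the disk, the same DataFrame can be written to an in-memory buffer. This is a sketch with made-up IDs and predictions; only the two column names come from the cell above.

```python
import io

import numpy as np
import pandas as pd

ids = np.array(["CR_1", "CR_2", "CR_3"])   # hypothetical incident IDs
preds = np.array([1, 0, 1])                # hypothetical flattened predictions

submission = pd.DataFrame({"INCIDENT_ID": ids, "MULTIPLE_OFFENSE": preds})

buffer = io.StringIO()
submission.to_csv(buffer, index=False)     # index=False omits the row-number column
csv_lines = buffer.getvalue().splitlines()

print(csv_lines)
```

The first line is the header, followed by one line per prediction.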
