## ANN & CNN

### Q.1) Use the Dataset named HR Interview Analytics to predict the Status of the candidate (0 is not accepted and 1 is accepted). Build an Artificial Neural Network to predict the Status

In [1]:
!pip install tensorflow



#### Importing Necessary Libraries

In [2]:
import tensorflow as tf
import pandas as pd

#### Checking the version of tensorflow

In [3]:
tf.__version__

'2.11.0'

#### Importing the dataset and viewing it

In [4]:
hr_interview = pd.read_csv(r"C:\Users\lenovo\Desktop\Data Set\HR Interview Analytics.csv")
hr_interview.head()

Unnamed: 0,SLNO,Candidate.Ref,DOJ.Extended,Duration.to.accept.offer,Notice.period,Offered.band,Pecent.hike.expected.in.CTC,Percent.hike.offered.in.CTC,Percent.difference.CTC,Joining.Bonus,Candidate.relocate.actual,Gender,Candidate.Source,Rex.in.Yrs,LOB,Location,Age,Status
0,1,2110407,Yes,14,30,E2,-20.79,13.16,42.86,No,No,Female,Agency,7,ERS,Noida,34,1
1,2,2112635,No,18,30,E2,50.0,320.0,180.0,No,No,Male,Employee Referral,8,INFRA,Chennai,34,1
2,3,2112838,No,3,45,E2,42.84,42.84,0.0,No,No,Male,Agency,4,INFRA,Noida,27,1
3,4,2115021,No,26,30,E2,42.84,42.84,0.0,No,No,Male,Employee Referral,4,INFRA,Noida,34,1
4,5,2115125,Yes,1,120,E2,42.59,42.59,0.0,No,Yes,Male,Employee Referral,6,INFRA,Noida,34,1


#### Checking for null values in dataset

In [5]:
hr_interview.isnull().sum()

SLNO                           0
Candidate.Ref                  0
DOJ.Extended                   0
Duration.to.accept.offer       0
Notice.period                  0
Offered.band                   0
Pecent.hike.expected.in.CTC    0
Percent.hike.offered.in.CTC    0
Percent.difference.CTC         0
Joining.Bonus                  0
Candidate.relocate.actual      0
Gender                         0
Candidate.Source               0
Rex.in.Yrs                     0
LOB                            0
Location                       0
Age                            0
Status                         0
dtype: int64

#### Collecting the information of dataset

In [6]:
hr_interview.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9011 entries, 0 to 9010
Data columns (total 18 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   SLNO                         9011 non-null   int64  
 1   Candidate.Ref                9011 non-null   int64  
 2   DOJ.Extended                 9011 non-null   object 
 3   Duration.to.accept.offer     9011 non-null   int64  
 4   Notice.period                9011 non-null   int64  
 5   Offered.band                 9011 non-null   object 
 6   Pecent.hike.expected.in.CTC  9011 non-null   float64
 7   Percent.hike.offered.in.CTC  9011 non-null   float64
 8   Percent.difference.CTC       9011 non-null   float64
 9   Joining.Bonus                9011 non-null   object 
 10  Candidate.relocate.actual    9011 non-null   object 
 11  Gender                       9011 non-null   object 
 12  Candidate.Source             9011 non-null   object 
 13  Rex.in.Yrs        

#### Creating dummy variables for all catagorical columns

In [7]:
## Lets create the dummy variables for all the categorical columns
cat_col = ['DOJ.Extended','Offered.band','Joining.Bonus','Candidate.relocate.actual','Gender','Candidate.Source','LOB','Location']
hr_interview_final = pd.get_dummies(hr_interview,columns=cat_col,drop_first=True)
hr_interview_final.head()

Unnamed: 0,SLNO,Candidate.Ref,Duration.to.accept.offer,Notice.period,Pecent.hike.expected.in.CTC,Percent.hike.offered.in.CTC,Percent.difference.CTC,Rex.in.Yrs,Age,Status,...,Location_Bangalore,Location_Chennai,Location_Cochin,Location_Gurgaon,Location_Hyderabad,Location_Kolkata,Location_Mumbai,Location_Noida,Location_Others,Location_Pune
0,1,2110407,14,30,-20.79,13.16,42.86,7,34,1,...,0,0,0,0,0,0,0,1,0,0
1,2,2112635,18,30,50.0,320.0,180.0,8,34,1,...,0,1,0,0,0,0,0,0,0,0
2,3,2112838,3,45,42.84,42.84,0.0,4,27,1,...,0,0,0,0,0,0,0,1,0,0
3,4,2115021,26,30,42.84,42.84,0.0,4,34,1,...,0,0,0,0,0,0,0,1,0,0
4,5,2115125,1,120,42.59,42.59,0.0,6,34,1,...,0,0,0,0,0,0,0,1,0,0


#### Checking for unique values in target column

In [8]:
hr_interview_final['Status'].nunique()

2

#### Removing Unwanted Columns

In [9]:
# We will remove SLNO and Candidate.Ref for the model building step. As it will be unique for every Candidate
hr_interview_final1 = hr_interview_final.drop(columns=['SLNO','Candidate.Ref'])

#### Checking the shape of Dataset

In [10]:
hr_interview_final.shape

(9011, 37)

In [11]:
hr_interview_final1.shape

(9011, 35)

In [12]:
# Identifying target variable and inputs
y = hr_interview_final1[['Status']]
x = hr_interview_final1.drop(columns=['Status'])

#### Splitting the dataset and Importing library

In [13]:
## Importing the library
from sklearn.model_selection import train_test_split

In [14]:
## Splitting the data into Train and Test
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

In [15]:
## Checking the length of the X and Y train
len(x_train),len(x_test),len(y_train),len(y_test)

(7208, 1803, 7208, 1803)

### Building the Model
### Step 1: Initialising the model and creating the input layer

In [16]:
ann = tf.keras.models.Sequential()

### Step 2: Creating the first hidden layer

In [17]:
## Considering hidden unit layers as 6
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))

### Step 3: Creating the second hidden layer

In [18]:
## Considering hidden unit layers as 6
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))

### Step 4: Creating the output layer

In [19]:
#The number of units in output column should be n-1, where n is the number of categories in output column
ann.add(tf.keras.layers.Dense(units=1,activation='sigmoid'))

### Step 5: Compiling the model

In [20]:
ann.compile(optimizer='adam',loss='binary_crossentropy',metrics='accuracy')

### Step 6: Fitting the data

In [21]:
model = ann.fit(x_train,y_train,validation_split=0.2,batch_size=32,epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100


Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


### As their is no much difference inTraining Accuracy & Validation Accuracy it can be said that their is no overfitting condition in data
### Prediction on test data

In [22]:
y_test['Prediction'] = ann.predict(x_test)



In [23]:
y_test

Unnamed: 0,Status,Prediction
93,0,0.962819
7272,1,0.938603
447,1,0.898908
6519,0,0.649237
3898,0,0.496188
...,...,...
8671,1,0.768776
111,1,0.659915
8742,1,0.999466
7596,1,0.907553


In [24]:
y_test['Prediction'] = y_test["Prediction"]>0.5
y_test.head()

Unnamed: 0,Status,Prediction
93,0,True
7272,1,True
447,1,True
6519,0,True
3898,0,False


In [25]:
y_test['Prediction'] = y_test['Prediction'].astype('int')
y_test.head()

Unnamed: 0,Status,Prediction
93,0,1
7272,1,1
447,1,1
6519,0,1
3898,0,0


#### Importing the library

In [26]:
from sklearn.metrics import accuracy_score,confusion_matrix

#### Printing the Accuracy Score

In [27]:
print(accuracy_score(y_test['Status'],y_test['Prediction']))

0.8236272878535774


#### Printing the Confusion Matrix

In [28]:
print(confusion_matrix(y_test['Status'],y_test['Prediction']))

[[  50  287]
 [  31 1435]]


                          - - - - - - - - X X X X X X X X - - - - - - - -

### Q. 2) Use the below given url to build a CNN model to predict whether the given image is dog or cat. You can use Kaggle notebooks to build the model. Once the model is build, take a random image of a cat or dog from the test data and predict it using the model.
( https://www.kaggle.com/tongpython/cat-and-dog )

#### Importing the necessary libraries

In [30]:
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator

In [31]:
train_data_gen = ImageDataGenerator(rescale=1/255)
training_data = train_data_gen.flow_from_directory(r"C:\Users\lenovo\Desktop\training_set\training_set",
                                                   target_size=(64,64),batch_size=32,class_mode='binary')

Found 8005 images belonging to 2 classes.


In [32]:
test_data_gen = ImageDataGenerator(rescale=1/255)
test_data = test_data_gen.flow_from_directory(r"C:\Users\lenovo\Desktop\test_set\test_set",
                                                   target_size=(64,64),batch_size=32,class_mode='binary')

Found 2023 images belonging to 2 classes.


### Convolutional Neural Network ( CNN )

### Step 1: Initialising the model and creating the input layer

In [33]:
cnn = tf.keras.models.Sequential()

### Step 2: Creating the first hidden layer

In [34]:
## Convolution
cnn.add(tf.keras.layers.Conv2D(filters=32,kernel_size=3,activation='relu'))

In [35]:
## Pooling
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2,strides=2))

### Step 3: Creating the second hidden layer

In [36]:
## Convolution
cnn.add(tf.keras.layers.Conv2D(filters=32,kernel_size=3,activation='relu'))

## Pooling
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2,strides=2))

In [37]:
## Flattening
cnn.add(tf.keras.layers.Flatten())

In [38]:
## Full connection
cnn.add(tf.keras.layers.Dense(units=128,activation='relu'))

### Step 4: Creating the output layer

In [39]:
## The number of units in output column should be n-1, where n is the number of categories in output column
cnn.add(tf.keras.layers.Dense(units=1,activation='sigmoid'))

### Step 5: Compiling the model

In [40]:
cnn.compile(optimizer='adam',loss='binary_crossentropy',metrics='accuracy')

### Step 6: Fitting the data

In [41]:
cnn.fit(x=training_data,validation_data=test_data,epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1cfb164cb80>

### Checking for proper output, loading image ( random ), Converting image to array & Expanding the image

#### Importing the libraries and validating image

In [42]:
## You can use or give path of any image from test dataset to classify weather it may be of cat or dog
import numpy as np
from keras.preprocessing import image
val_image = tf.keras.utils.load_img(r"C:\Users\lenovo\Desktop\test_set\test_set\cats\cat.4041.jpg",target_size=(64,64))
val_image = tf.keras.utils.img_to_array(val_image)
val_image = np.expand_dims(val_image,axis=0)

#### Checking for result

In [43]:
result = cnn.predict(val_image)
result



array([[0.]], dtype=float32)

#### Checking for no. 1 in array stands for

In [44]:
training_data.class_indices

{'cats': 0, 'dogs': 1}

                      - - - - - - - - X X X X X X X X - - - - - - - -

### Q. 3) Use the dataset named video_game_sales to predict the global sales, using artificial neural network. Use appropriate data preprocessing techniques to clean the data.

#### Importing necessary libraries

In [45]:
import tensorflow as tf
import pandas as pd

#### Importing the dataset & checking it

In [46]:
video_games_sales = pd.read_csv(r"C:\Users\lenovo\Desktop\Data Set\video_games_sales.csv")
video_games_sales.head()

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,82.53,76.0,51.0,8.0,322.0,Nintendo,E
1,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,,,,,,
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,35.52,82.0,73.0,8.3,709.0,Nintendo,E
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,32.77,80.0,73.0,8.0,192.0,Nintendo,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37,,,,,,


#### Checking the shape and Null values in dataset

In [47]:
## Checking Shape
video_games_sales.shape

(16719, 16)

In [48]:
## Checking Null Values
video_games_sales.isnull().sum()

Name                  2
Platform              0
Year_of_Release     269
Genre                 2
Publisher            54
NA_Sales              0
EU_Sales              0
JP_Sales              0
Other_Sales           0
Global_Sales          0
Critic_Score       8582
Critic_Count       8582
User_Score         6704
User_Count         9129
Developer          6623
Rating             6769
dtype: int64

#### Note :- Here we are using the LableEncoder as a data cleaning or PreProcessing Technique as the Columns ( Critic_Score, Critic_Count, User_Score, User_Count, Developer, Rating ) have more than 30% of missing values so we are gonna drop those columns and will use lableEncoder for other columns containing the missing values

#### Importing LableEncoder and Calling it

In [49]:
from sklearn.preprocessing import LabelEncoder

In [50]:
le = LabelEncoder()

#### Filling the missing values in dataset using LableEncoder

In [51]:
video_games_sales['Genre'] = le.fit_transform(video_games_sales['Genre'])
video_games_sales['Publisher'] = le.fit_transform(video_games_sales['Publisher'])
video_games_sales['Platform'] = le.fit_transform(video_games_sales['Platform'])

#### Checking the DataSet

In [52]:
video_games_sales.head()

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,Wii Sports,26,2006.0,10,361,41.36,28.96,3.77,8.45,82.53,76.0,51.0,8.0,322.0,Nintendo,E
1,Super Mario Bros.,11,1985.0,4,361,29.08,3.58,6.81,0.77,40.24,,,,,,
2,Mario Kart Wii,26,2008.0,6,361,15.68,12.76,3.79,3.29,35.52,82.0,73.0,8.3,709.0,Nintendo,E
3,Wii Sports Resort,26,2009.0,10,361,15.61,10.93,3.28,2.95,32.77,80.0,73.0,8.0,192.0,Nintendo,E
4,Pokemon Red/Pokemon Blue,5,1996.0,7,361,11.27,8.89,10.22,1.0,31.37,,,,,,


#### Identifying target variable and inputs

In [53]:
y = video_games_sales[['Global_Sales']]
x = video_games_sales[['Platform','Genre','Publisher','NA_Sales','EU_Sales','JP_Sales','Other_Sales']]

#### Importing the Library, Splitting the data & checking length of DataSet

In [54]:
## Importing the library
from sklearn.model_selection import train_test_split

In [55]:
## Splityting the data into Test and Train
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

In [56]:
## Checking the length of X,Y train and test
len(x_train),len(x_test),len(y_train),len(y_test)

(13375, 3344, 13375, 3344)

### Artificial Neural Network ( ANN )

### Step 1: Initialising the model and creating the input layer

In [57]:
ann = tf.keras.models.Sequential()

### Step 2: Creating the first hidden layer

In [58]:
## Considering hidden unit layers as 6
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))

### Step 3: Creating the second hidden layer

In [59]:
## Considering hidden unit layers as 6
ann.add(tf.keras.layers.Dense(units=6,activation='relu'))

### Step 4: Creating the output layer

In [60]:
#The number of units in output column should be n-1, where n is the number of categories in output column
ann.add(tf.keras.layers.Dense(units=1,activation='linear'))

#### Checking for unique value in target Column

In [61]:
video_games_sales['Global_Sales'].nunique()

629

### Step 5: Compiling the model

In [62]:
ann.compile(optimizer='adam',loss='MSE',metrics='MAE')

### Note :- As the problem is of Regression Model loss and Netrice function MSE & MAE i.e Mean Square Error & Mean Absolute error is used

### Step 6: Fitting the data

In [63]:
model = ann.fit(x_train,y_train,validation_split=0.2,batch_size=32,epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100


Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


### Prediction on Test Data

In [64]:
y_test['Prediction'] = ann.predict(x_test)



In [65]:
y_test

Unnamed: 0,Global_Sales,Prediction
6991,0.23,0.260938
16195,0.01,0.021510
9862,0.12,0.155501
11152,0.09,0.117765
8642,0.16,0.163074
...,...,...
2452,0.84,0.848212
1237,1.52,1.535399
11784,0.07,0.099101
16325,0.01,0.031394


### Note :- As the problem is of regression the Accuracy cannot be predicted. It Can be calculated on the basic of MAE & val_MAE

                            - - - - - - - - X X X X X X X X - - - - - - - -