<a href="https://colab.research.google.com/github/delhiiitian/CrossSell/blob/main/feature_columns_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Import TensorFlow and other libraries

In [1]:
import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split


## Use Pandas to create a dataframe

[Pandas](https://pandas.pydata.org/) is a Python library with many helpful utilities for loading and working with structured data. We will use Pandas to load the training and test sets from local CSV files into dataframes.

In [2]:
dataframe = pd.read_csv('train.csv')
testframe = pd.read_csv('test.csv')
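
Before choosing targets, it can help to check which columns actually contain missing values. The following is a minimal sketch on a toy frame: the column names are borrowed from the cells below, but the values are invented for illustration and `columns_with_gaps` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv; the values are invented for illustration.
toy = pd.DataFrame({
    "cap-diameter": [5.1, 7.3, 2.8, 4.0],
    "gill-attachment": ["a", np.nan, "x", np.nan],
    "ring-type": ["f", "z", np.nan, "f"],
})

# Count missing entries per column and keep only the columns that have any.
missing_counts = toy.isna().sum()
columns_with_gaps = missing_counts[missing_counts > 0]
print(columns_with_gaps)
```

Running the same two lines on `dataframe` and `testframe` would show which columns need to be filled in.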

## Create target variable

The dataset used here describes mushroom features (cap, gill, stem, ring, habitat, and so on). Three columns are set aside as targets: `gill-attachment`, `ring-type`, and `season`.

`gill-attachment` and `ring-type` contain missing values. For each of them, we train a classifier on the rows where the value is known and use it to fill in the rows where it is missing. `season` is the label predicted in the final step.

In [3]:
target1 = dataframe['gill-attachment']
target2 = dataframe['ring-type']
target3 = dataframe['season']


# Drop the target columns so they are not used as features.
dataframe_new = dataframe.drop(columns=['gill-attachment','ring-type','season'])

# One working copy of the features per target column.
dataframe1 = dataframe_new.copy()
dataframe2 = dataframe_new.copy()


# ga = feature_column.categorical_column_with_vocabulary_list(
#       'target', dataframe1['target'].unique())
# ga_emb = feature_column.embedding_column(['target'], dimension=7)

dataframe1['target'] = target1
dataframe2['target'] = target2

## Split the dataframe into train, validation, and test

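For each target column, the rows where the value is known form the training set, and the rows where it is missing form the set to predict. The random train/validation split from the original tutorial is left commented out.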

In [4]:
train1, test1 = dataframe1[dataframe1['target'].notnull()], dataframe1[dataframe1['target'].isnull()]
# train, test = train_test_split(train1, test_size=0.2)
# train, val = train_test_split(train, test_size=0.2)
# print(len(train), 'train examples')
# print(len(val), 'validation examples')
# print(len(test), 'test examples')
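
The split on label availability can be sketched on a toy frame as follows. The data and the names `has_label`, `labeled`, and `unlabeled` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame; the values are invented for illustration.
toy = pd.DataFrame({
    "cap-diameter": [5.1, 7.3, 2.8, 4.0],
    "target": ["a", np.nan, "x", np.nan],
})

# Rows with a known label are used for training; rows without one are
# the rows whose label the model will later fill in.
has_label = toy["target"].notna()
labeled = toy[has_label]
unlabeled = toy[~has_label]

print(len(labeled), "labeled rows,", len(unlabeled), "unlabeled rows")
```

Building the boolean mask once and negating it ensures that the two subsets are disjoint and together cover every row.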

## Create an input pipeline using tf.data

The original tutorial wraps the dataframes with [tf.data](https://www.tensorflow.org/guide/datasets) so that feature columns can act as a bridge from the columns in the Pandas dataframe to the features used to train the model. That utility is kept below for reference but is commented out; this notebook instead one-hot encodes the columns with Pandas and passes the dataframes to Keras directly. If we were working with a very large CSV file (so large that it does not fit into memory), we would use tf.data to read it from disk directly. That is not covered here.

In [5]:
# # A utility method to create a tf.data dataset from a Pandas Dataframe
# def df_to_dataset(dataframe, shuffle=True, batch_size=32):
#   dataframe = dataframe.copy()
#   labels = dataframe.pop('target')
#   ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
#   if shuffle:
#     ds = ds.shuffle(buffer_size=len(dataframe))
#   ds = ds.batch(batch_size)
#   return ds
# batch_size = 32 # A small batch size is used for demonstration purposes
# train_ds = df_to_dataset(train, batch_size=batch_size)
# val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
# test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
# train_ds
# train.shape

In [6]:
# !pip install --upgrade keras

In [7]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
# from keras.optimizers import SGD

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# In the first layer, you must specify the expected input data shape:
# here, 53-dimensional vectors (the one-hot encoded feature columns).
model.add(Dense(64, input_dim=53))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(7))
model.add(Activation('softmax'))

# sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])



In [8]:
train = pd.get_dummies(train1,columns=['edible-poisonous', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color',
       'stem-color', 'has-ring', 'habitat'],drop_first=True)
test = pd.get_dummies(test1,columns=['edible-poisonous', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color',
       'stem-color', 'has-ring', 'habitat'],drop_first=True)
# val = pd.get_dummies(val,columns=['edible-poisonous', 'cap-shape', 'cap-color',
#        'does-bruise-or-bleed', 'gill-color',
#        'stem-color', 'has-ring', 'habitat'],drop_first=True)
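
Calling `pd.get_dummies` separately on two frames can produce different columns when a category appears in only one of them, and the network expects a fixed input width. One possible way to guard against that is to reindex the second frame to the training columns. This is a sketch on invented toy data, and `test_aligned` is a hypothetical name.

```python
import pandas as pd

# Toy frames; 'm' occurs only in the training data.
train_toy = pd.DataFrame({"habitat": ["d", "g", "m"], "stem-width": [1.0, 2.0, 3.0]})
test_toy = pd.DataFrame({"habitat": ["d", "g"], "stem-width": [4.0, 5.0]})

train_enc = pd.get_dummies(train_toy, columns=["habitat"], drop_first=True)
test_enc = pd.get_dummies(test_toy, columns=["habitat"], drop_first=True)

# test_enc lacks 'habitat_m' because 'm' never occurs in test_toy.
# Reindexing to the training columns adds it back, filled with zeros,
# and also puts the columns in the same order as in training.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(train_enc.columns))
print(list(test_aligned.columns))
```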

In [9]:
X_train,y_train = train.loc[:, train.columns != 'target'],train['target']
X_test = test.loc[:, test.columns != 'target']
# X_val,y_val = val.loc[:, val.columns != 'target'],val['target']

In [10]:
# X_train = np.asarray(X_train).astype(np.float32)
# X_test = np.asarray(X_test).astype(np.float32)

In [11]:
y_train = pd.get_dummies(y_train)

In [12]:
# from sklearn.preprocessing import LabelEncoder

# label_encoder = LabelEncoder()

# label_encoder.fit(y_train)
# y_train = label_encoder.transform(y_train)
# # y_test = label_encoder.transform(y_test)

In [13]:
y_train

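|  | a | d | e | f | p | s | x |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 42742 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 42743 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42744 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 42745 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |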


In [14]:
model.fit(X_train, y_train, epochs=20)
score = model.evaluate(X_train, y_train, batch_size=16)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
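
The cell above evaluates the model on the same rows it was fitted on, so the reported score likely overstates accuracy on unseen rows. A sketch of holding out part of the labeled rows, using only Pandas and a toy frame, could look like this. The names `holdout` and `fit_rows` and the 20% fraction are illustrative assumptions.

```python
import pandas as pd

# Toy labeled frame; the values are invented for illustration.
toy = pd.DataFrame({"x": range(10), "target": list("aabbaabbab")})

# Hold out 20% of the labeled rows; the rest is used for fitting.
holdout = toy.sample(frac=0.2, random_state=0)
fit_rows = toy.drop(index=holdout.index)

print(len(fit_rows), "rows for fitting,", len(holdout), "rows held out")
```

The model would then be fitted on `fit_rows` and evaluated on `holdout`.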


In [15]:
gilladd = dataframe[['edible-poisonous', 'cap-diameter', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color', 'stem-height',
       'stem-width', 'stem-color', 'has-ring', 'habitat']]

gilladd = pd.get_dummies(gilladd,drop_first=True)

dataframe['gill_attachment'] = np.argmax(model.predict(gilladd),axis=1)

mapping_dictionary = {0:"a",1:"d",2:"e",3:"f",4:"p",5:"s",6:"x"}
dataframe = dataframe.replace({"gill_attachment":mapping_dictionary})
# y_test = y_test.replace({"season":mapping_dictionary})

# Keep known values; use the prediction only where the value is missing.
dataframe['gill-attachment'] = dataframe['gill-attachment'].fillna(dataframe['gill_attachment'])

dataframe.drop('gill_attachment',axis=1,inplace=True)


gilladdtest = testframe[['edible-poisonous', 'cap-diameter', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color', 'stem-height',
       'stem-width', 'stem-color', 'has-ring', 'habitat']]

gilladdtest = pd.get_dummies(gilladdtest,drop_first=True)

testframe['gill_attachment'] = np.argmax(model.predict(gilladdtest),axis=1)

mapping_dictionary = {0:"a",1:"d",2:"e",3:"f",4:"p",5:"s",6:"x"}
testframe = testframe.replace({"gill_attachment":mapping_dictionary})
# y_test = y_test.replace({"season":mapping_dictionary})

# Keep known values; use the prediction only where the value is missing.
testframe['gill-attachment'] = testframe['gill-attachment'].fillna(testframe['gill_attachment'])

testframe.drop('gill_attachment',axis=1,inplace=True)
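
The index-to-letter mapping used above has to match the column order of the one-hot encoded labels. Rather than typing it by hand, it can be read from those columns. In this sketch, `probs` is a made-up stand-in for the output of `model.predict`, and `class_names` and `predicted` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy labels; pd.get_dummies sorts the classes, giving columns a, f, s, x.
labels = pd.Series(["s", "f", "x", "a", "a"])
y_onehot = pd.get_dummies(labels)

# Stand-in for model.predict(...): one row of class probabilities per sample.
probs = np.array([
    [0.1, 0.2, 0.6, 0.1],
    [0.7, 0.1, 0.1, 0.1],
])

# The column order of y_onehot is the class order the network was trained on,
# so indexing it with the argmax recovers the class letters directly.
class_names = y_onehot.columns.to_numpy()
predicted = class_names[np.argmax(probs, axis=1)]

print(list(predicted))
```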

In [16]:
target2 = dataframe['ring-type']
target3 = dataframe['season']


# Drop the remaining target columns so they are not used as features.
dataframe_new = dataframe.drop(columns=['ring-type','season'])

# Working copy of the features for the ring-type model.
dataframe1 = dataframe_new.copy()


# ga = feature_column.categorical_column_with_vocabulary_list(
#       'target', dataframe1['target'].unique())
# ga_emb = feature_column.embedding_column(['target'], dimension=7)

dataframe1['target'] = target2
train1, test1 = dataframe1[dataframe1['target'].notnull()], dataframe1[dataframe1['target'].isnull()]

# from keras.optimizers import SGD

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# In the first layer, you must specify the expected input data shape:
# here, 59-dimensional vectors (the one-hot encoded feature columns).
model.add(Dense(64, input_dim=59))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(8))
model.add(Activation('softmax'))

# sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

train = pd.get_dummies(train1,columns=['edible-poisonous', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color',
       'stem-color', 'has-ring', 'habitat','gill-attachment'],drop_first=True)
test = pd.get_dummies(test1,columns=['edible-poisonous', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color',
       'stem-color', 'has-ring', 'habitat','gill-attachment'],drop_first=True)
# val = pd.get_dummies(val,columns=['edible-poisonous', 'cap-shape', 'cap-color',
#        'does-bruise-or-bleed', 'gill-color',
#        'stem-color', 'has-ring', 'habitat'],drop_first=True)

X_train,y_train = train.loc[:, train.columns != 'target'],train['target']

y_train = pd.get_dummies(y_train)

model.fit(X_train, y_train, epochs=20)
score = model.evaluate(X_train, y_train, batch_size=16)



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [17]:
# np.argmax(model.predict(X_train),axis=1)

In [19]:
ringadd = dataframe[['edible-poisonous', 'cap-diameter', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color', 'stem-height',
       'stem-width', 'stem-color', 'has-ring', 'habitat','gill-attachment']]

ringadd = pd.get_dummies(ringadd,drop_first=True)

dataframe['ring_type'] = np.argmax(model.predict(ringadd),axis=1)

mapping_dictionary = {0:"e",1:"f",2:"g",3:"l",4:"m",5:"p",6:"r",7:"z"}
dataframe = dataframe.replace({"ring_type":mapping_dictionary})

# Keep known values; use the prediction only where the value is missing.
dataframe['ring-type'] = dataframe['ring-type'].fillna(dataframe['ring_type'])
dataframe.drop('ring_type',axis=1,inplace=True)



ringaddtest = testframe[['edible-poisonous', 'cap-diameter', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-color', 'stem-height',
       'stem-width', 'stem-color', 'has-ring', 'habitat','gill-attachment']]

ringaddtest = pd.get_dummies(ringaddtest,drop_first=True)

testframe['ring_type'] = np.argmax(model.predict(ringaddtest),axis=1)

mapping_dictionary = {0:"e",1:"f",2:"g",3:"l",4:"m",5:"p",6:"r",7:"z"}
testframe = testframe.replace({"ring_type":mapping_dictionary})

# Keep known values; use the prediction only where the value is missing.
testframe['ring-type'] = testframe['ring-type'].fillna(testframe['ring_type'])
testframe.drop('ring_type',axis=1,inplace=True)

In [38]:
# dataframe.to_csv('train_full_new.csv')
# testframe.to_csv('test_full_new.csv')

In [131]:
from tensorflow.keras.optimizers import SGD

In [136]:
traindf = pd.read_csv('train_full_new.csv',index_col=0)
testdf = pd.read_csv('test_full_new.csv',index_col=0)

# Take the label from the reloaded training frame so this cell does not
# depend on the in-memory `dataframe` from the earlier cells.
target = traindf['season']


# Drop the target column so it is not used as a feature.
dataframe_new = traindf.drop(columns=['season'])
testdf.drop(columns=['season'],inplace=True)

# dataframe1 = dataframe_new.copy()


# ga = feature_column.categorical_column_with_vocabulary_list(
#       'target', dataframe1['target'].unique())
# ga_emb = feature_column.embedding_column(['target'], dimension=7)

dataframe_new['target'] = target

# from keras.optimizers import SGD

model = Sequential()
# Dense(128) is a fully-connected layer with 128 hidden units.
# In the first layer, you must specify the expected input data shape:
# here, 66-dimensional vectors (the one-hot encoded feature columns).
model.add(Dense(128, input_dim=66))
model.add(Activation('relu'))
model.add(Dropout(0.01))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.01))
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.01))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dropout(0.01))
model.add(Dense(4))
model.add(Activation('softmax'))

# `learning_rate` replaces the deprecated `lr` argument name.
# Note: with momentum=0.0, nesterov=True has no effect.
sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.0, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])



In [137]:

train = pd.get_dummies(dataframe_new,columns=['edible-poisonous', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-attachment', 'gill-color',
       'stem-color', 'has-ring', 'ring-type', 'habitat',],drop_first=True)
test = pd.get_dummies(testdf,columns=['edible-poisonous', 'cap-shape', 'cap-color',
       'does-bruise-or-bleed', 'gill-attachment', 'gill-color',
       'stem-color', 'has-ring', 'ring-type', 'habitat',],drop_first=True)
# val = pd.get_dummies(val,columns=['edible-poisonous', 'cap-shape', 'cap-color',
#        'does-bruise-or-bleed', 'gill-color',
#        'stem-color', 'has-ring', 'habitat'],drop_first=True)

train1, test1 = train_test_split(train, test_size=0.2)
# train, val = train_test_split(train, test_size=0.2)

X_train,y_train = train1.loc[:, train1.columns != 'target'],train1['target']
X_test,y_test = test1.loc[:, test1.columns != 'target'],test1['target']

y_train = pd.get_dummies(y_train)
y_test = pd.get_dummies(y_test)

model.fit(X_train, y_train, epochs=20)
score = model.evaluate(X_test, y_test, batch_size=16)



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
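
In the cell above, `y_train` and `y_test` are one-hot encoded separately. If a class happened to be absent from one split, the two encodings would have different widths. One possible safeguard is to encode against a fixed category list. This sketch uses invented toy labels, and the helper `one_hot` is hypothetical.

```python
import pandas as pd

# Class codes taken from the season mapping used later in the notebook.
seasons = ["a", "s", "u", "w"]

# Toy labels; 'u' and 'w' happen to be absent from the hold-out split.
y_fit = pd.Series(["a", "s", "u", "w", "a"])
y_holdout = pd.Series(["a", "s"])

def one_hot(labels, categories):
    """One-hot encode labels with a fixed, explicit set of categories."""
    return pd.get_dummies(pd.Categorical(labels, categories=categories))

y_fit_enc = one_hot(y_fit, seasons)
y_holdout_enc = one_hot(y_holdout, seasons)

print(y_fit_enc.shape, y_holdout_enc.shape)
```

Because the categories are declared up front, both encodings have one column per class in the same order, even when a class is missing from a split.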


In [140]:
# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam, SGD, RMSprop, Adadelta, Adagrad, Adamax, Nadam, Ftrl
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.wrappers.scikit_learn import KerasClassifier
from math import floor
from sklearn.metrics import make_scorer, accuracy_score
from bayes_opt import BayesianOptimization
from sklearn.model_selection import StratifiedKFold
from keras.layers import LeakyReLU
# Instantiate under a different name so the LeakyReLU class is not shadowed.
leaky_relu = LeakyReLU(alpha=0.1)
import warnings
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", None)

ModuleNotFoundError: ignored

In [89]:
# testdf.drop(columns=['season'],inplace=True)

In [64]:
ringaddtest = testdf
ringaddtest = pd.get_dummies(ringaddtest,drop_first=True)

testdf['season'] = np.argmax(model.predict(ringaddtest),axis=1)

mapping_dictionary = {0:"a",1:"s",2:"u",3:"w"}
testdf = testdf.replace({"season":mapping_dictionary})

# testdf[''] = testdf.apply(lambda x: x['ring_type'] if pd.isna(x['ring-type']) else x['ring-type'],axis=1)
# testdf.drop('ring_type',axis=1,inplace=True)

In [65]:
testdf[['season']].to_csv('out4.csv',index=False)