<a href="https://colab.research.google.com/github/JaeDoo1034/Kaggle-Study/blob/master/Keras_tuner1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install git+https://github.com/keras-team/keras-tuner.git -q

  Building wheel for keras-tuner (setup.py) ... done
  Building wheel for terminaltables (setup.py) ... done


MoA: Keras + KerasTuner best practices
This notebook will teach you how to:

1. Use a Keras neural network for the MoA competition
2. Use KerasTuner to find high-performing model configurations
3. Ensemble a few of the top models to generate final predictions

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [2]:
print('TF version:', tf.__version__)
print('GPU devices:', tf.config.list_physical_devices('GPU'))

TF version: 2.3.0
GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In this competition, we're looking at 3 CSV files: one for training features, one for training targets (with the same number of entries and a 1:1 match between entries in the features file and those in the targets file), and one for test features. The goal is to predict the targets that correspond to the test features.

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
train_features_df = pd.read_csv('/content/list-moa/train_features.csv')
train_targets_df = pd.read_csv('/content/list-moa/train_targets_scored.csv')
test_features_df = pd.read_csv('/content/list-moa/test_features.csv')

In [7]:
print('train_features_df.shape:', train_features_df.shape)
print('train_targets_df.shape:', train_targets_df.shape)
print('test_features_df.shape:', test_features_df.shape)

train_features_df.shape: (9584, 876)
train_targets_df.shape: (23814, 207)
test_features_df.shape: (3982, 876)
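The description above says the features and targets files match 1:1, but the shapes printed here disagree: 9,584 feature rows versus 23,814 target rows. That may mean the copy of train_features.csv loaded in this run is incomplete. A small check like the one below would catch this before any training. It is a sketch: `check_alignment` is a hypothetical helper, and it assumes both frames carry the `sig_id` column shown in the previews.

```python
import pandas as pd


def check_alignment(features_df: pd.DataFrame, targets_df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frames line up 1:1.

    Hypothetical helper: assumes both frames have a 'sig_id' column.
    """
    problems = []
    if len(features_df) != len(targets_df):
        problems.append(
            f"row count mismatch: {len(features_df)} features vs {len(targets_df)} targets"
        )
    # Row i of the features must describe the same sample as row i of the targets.
    elif not features_df["sig_id"].reset_index(drop=True).equals(
        targets_df["sig_id"].reset_index(drop=True)
    ):
        problems.append("sig_id values differ or are in a different order")
    return problems


# Toy frames standing in for the real CSVs (made-up values).
feats = pd.DataFrame({"sig_id": ["id_a", "id_b", "id_c"], "g-0": [0.1, -0.2, 0.3]})
targs = pd.DataFrame({"sig_id": ["id_a", "id_b", "id_c"], "some_target": [0, 1, 0]})
print(check_alignment(feats, targs))            # []
print(check_alignment(feats.iloc[:2], targs))   # ['row count mismatch: 2 features vs 3 targets']
```

On the real data this would be called as `check_alignment(train_features_df, train_targets_df)`; with the shapes printed above it would report a row count mismatch.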


In [8]:
train_features_df.sample(5)

Unnamed: 0,sig_id,cp_type,cp_time,cp_dose,g-0,g-1,g-2,g-3,g-4,g-5,g-6,g-7,g-8,g-9,g-10,g-11,g-12,g-13,g-14,g-15,g-16,g-17,g-18,g-19,g-20,g-21,g-22,g-23,g-24,g-25,g-26,g-27,g-28,g-29,g-30,g-31,g-32,g-33,g-34,g-35,...,c-60,c-61,c-62,c-63,c-64,c-65,c-66,c-67,c-68,c-69,c-70,c-71,c-72,c-73,c-74,c-75,c-76,c-77,c-78,c-79,c-80,c-81,c-82,c-83,c-84,c-85,c-86,c-87,c-88,c-89,c-90,c-91,c-92,c-93,c-94,c-95,c-96,c-97,c-98,c-99
2156,id_17151431c,trt_cp,48,D2,0.2871,-0.5234,-2.739,1.071,-0.5239,1.608,1.048,-0.2696,0.0523,0.7615,-1.123,-0.6801,-1.037,0.9577,-1.917,-0.0384,1.724,-0.5185,-1.041,-0.8852,0.6342,0.8359,-0.5106,-0.0961,0.2105,0.4349,0.1016,0.2061,0.7106,1.75,0.236,2.992,-0.1055,0.6839,0.1814,-0.3879,...,0.5668,-0.2458,-0.3608,1.308,-0.1791,-1.127,0.479,0.5706,-0.7124,-0.3269,-1.276,0.6352,0.4923,-0.4622,-0.5449,-0.8785,-0.4349,-0.3091,0.0126,0.4977,0.3637,-0.9565,0.2882,-0.5982,0.1831,0.4152,-0.0189,-0.1452,-0.6767,0.0254,0.2513,0.4052,0.7098,-0.7969,-0.9011,-0.3009,0.437,1.066,0.387,0.006
6707,id_47b3d8626,trt_cp,24,D1,-0.3812,-0.2175,0.7398,0.3453,1.197,-0.5934,0.926,1.096,-0.9367,-0.6648,0.0787,0.7102,-0.3683,0.0131,-1.77,0.4221,-0.2394,-0.8451,0.2111,0.447,-0.5389,0.5449,-1.24,-0.7131,-0.1826,0.8555,0.3575,0.3108,0.1062,-0.1784,-0.0809,-0.2097,-0.15,1.001,0.8343,-0.0865,...,-0.4703,-0.1572,0.7206,-0.4357,-0.5604,0.2186,0.6129,-0.8133,0.1335,-1.145,0.6868,0.0043,-0.8765,-0.9171,0.1006,0.0864,-0.0133,-0.9021,-0.3381,0.3789,-0.1603,-0.2464,0.0307,-0.2174,-1.293,-0.2044,-1.156,-0.0781,-0.8536,-0.4451,-0.6935,-0.2409,-0.6941,0.295,-0.424,-0.1218,0.7656,-0.7064,-0.6518,0.4767
2420,id_19ca12ec3,trt_cp,24,D1,1.02,-1.128,0.4073,-0.487,-1.46,-1.011,0.2689,-2.091,-0.5445,1.005,1.919,2.794,-0.1123,2.151,0.6866,-1.007,-1.331,0.6171,0.0421,-0.4487,-0.5456,-0.4254,1.175,0.2821,0.3989,-2.352,1.143,0.3255,-0.6579,-0.0816,-0.0286,1.183,-0.2539,-0.792,-0.4972,-1.039,...,-0.9208,-0.47,-0.8858,-0.2812,-0.5807,-2.13,-0.5211,0.2714,-0.095,-0.4615,-0.4189,-0.4696,-0.4552,0.5459,-0.4338,-0.6298,-1.053,-0.0371,-0.6715,-0.1774,-0.5837,-0.2577,0.0426,-0.8019,0.6555,-0.0595,-0.6146,-2.027,-0.3243,-0.2819,-0.466,-1.469,-0.0546,-0.1789,0.8141,-0.6162,-1.951,-0.8947,-1.537,-1.092
7465,id_5009a1ea4,trt_cp,48,D1,1.766,-1.11,-1.451,0.3372,-1.875,-0.6719,0.7348,-1.852,0.4195,-0.0373,1.143,-0.5863,1.314,-0.911,0.4934,-0.7996,0.8605,0.3918,-0.0735,0.0061,-0.4809,-2.131,1.398,-0.4586,-0.6741,-0.9619,1.075,0.1286,0.034,0.9358,-0.0226,0.2458,-0.3865,-0.1248,-0.5033,-0.3734,...,-1.275,-1.476,-0.5307,-0.3794,-0.9176,-0.3521,-0.5652,0.1814,-0.1628,0.4463,-0.2607,-1.158,-0.0492,0.6522,-0.8539,0.5216,-0.7734,0.4256,-0.3941,-0.1497,-0.0878,-0.0738,-0.9015,-1.203,0.4919,-0.921,1.334,-1.637,-0.4323,0.1293,-1.008,-0.3598,-1.255,0.0022,-0.7723,-0.0127,-0.6393,-0.1445,-0.4916,-0.5186
2521,id_1ae825d32,trt_cp,72,D1,0.2853,0.1803,-0.7998,-0.2013,0.4573,-0.0625,-0.9012,-0.3826,-0.1385,-0.0478,0.3066,-0.0911,0.5585,-1.155,1.201,0.5377,0.033,-0.3832,0.4424,0.0477,0.9392,-0.7944,0.5343,0.0,0.015,-0.3442,-0.1255,0.5035,-0.0576,1.92,-0.058,-0.2921,-0.4067,-0.1972,1.321,-0.1326,...,-0.1726,0.4991,1.115,-0.4608,0.6169,0.7393,-1.1,0.4715,0.0494,0.2234,0.3937,-0.2415,-0.5118,-1.02,-0.6479,-0.0878,0.3611,-1.43,-0.3358,-0.0427,0.0364,0.1255,-0.4729,0.2774,0.2377,0.1177,0.7194,-0.184,0.0901,0.0953,-0.148,-0.855,-1.225,0.4529,0.1064,0.9049,-0.0688,-0.9046,0.1708,-0.0799


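Ok, so we have 2 categorical features (cp_type and cp_dose, which are strings), a sig_id identifier that is not a model input, and everything else is numerical: cp_time plus the gene-expression columns g-0 to g-771 and the cell-viability columns c-0 to c-99 (assuming the columns elided from the preview are numeric like the ones shown).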
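Rather than assuming the column types, they can be read off the dtypes pandas inferred when loading the CSV. The sketch below uses a hypothetical helper, `split_feature_columns`, that treats non-numeric (string) columns as categorical and everything else as numeric, skipping the `sig_id` identifier. If the preview is representative, running it on `train_features_df` should give `['cp_type', 'cp_dose']` as the categorical list.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_feature_columns(df: pd.DataFrame, id_col: str = "sig_id"):
    """Split columns into (categorical, numeric) lists, skipping the ID column."""
    categorical, numeric = [], []
    for col in df.columns:
        if col == id_col:
            continue  # identifier, not a model input
        if is_numeric_dtype(df[col]):
            numeric.append(col)
        else:
            categorical.append(col)  # string columns such as cp_type, cp_dose
    return categorical, numeric


# Toy frame with the same kinds of columns as train_features_df (made-up values).
toy = pd.DataFrame({
    "sig_id": ["id_1", "id_2"],
    "cp_type": ["trt_cp", "trt_cp"],
    "cp_time": [24, 48],
    "cp_dose": ["D1", "D2"],
    "g-0": [0.1, -0.2],
    "c-0": [0.3, -0.4],
})
cat_cols, num_cols = split_feature_columns(toy)
print(cat_cols)  # ['cp_type', 'cp_dose']
print(num_cols)  # ['cp_time', 'g-0', 'c-0']
```

Note that cp_time (24, 48 or 72 in the preview) lands in the numeric group here; treating it as categorical instead would be a modeling choice, not something the dtypes decide.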

We'll use the StringLookup and CategoryEncoding layers to encode the categorical features, and the Normalization layer to normalize the values of the numerical features.
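For intuition about what those layers compute, here is a plain pandas/NumPy sketch of the same transformations; it is not the Keras API. StringLookup followed by CategoryEncoding amounts to learning a vocabulary from the training data and one-hot encoding each value against it, and Normalization standardizes each numeric column using a mean and variance learned from the training data in its `adapt()` step. In TF 2.3, the version printed above, these layers live under `tf.keras.layers.experimental.preprocessing`. The helper names below are hypothetical, and reserving slot 0 for unseen strings is an assumption of this sketch: the exact index layout of the Keras layers (mask and out-of-vocabulary tokens) varies between TensorFlow versions.

```python
import numpy as np
import pandas as pd


def fit_vocab(train_values: pd.Series) -> dict:
    """Map each distinct training string to an index; index 0 is kept for unseen values."""
    return {v: i + 1 for i, v in enumerate(sorted(train_values.unique()))}


def one_hot(values: pd.Series, vocab: dict) -> np.ndarray:
    """One-hot encode strings against a fitted vocabulary (unseen -> column 0)."""
    idx = values.map(vocab).fillna(0).astype(int).to_numpy()
    out = np.zeros((len(values), len(vocab) + 1), dtype=np.float32)
    out[np.arange(len(values)), idx] = 1.0
    return out


def fit_normalizer(train_x: np.ndarray, eps: float = 1e-7):
    """Per-column mean and standard deviation computed on training data only."""
    mean = train_x.mean(axis=0)
    std = np.maximum(train_x.std(axis=0), eps)  # guard against constant columns
    return mean, std


def normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardize with previously fitted statistics."""
    return (x - mean) / std


train_dose = pd.Series(["D1", "D2", "D1"])
vocab = fit_vocab(train_dose)                   # {'D1': 1, 'D2': 2}
print(one_hot(pd.Series(["D2", "D3"]), vocab))  # 'D3' was never seen in training
train_num = np.array([[24.0, 0.5], [48.0, -0.5], [72.0, 0.0]])
mean, std = fit_normalizer(train_num)
print(normalize(train_num, mean, std).round(3))
```

The key design point is that the vocabulary and the statistics are fitted on the training features only and then reused unchanged for the test features, which is also how the Keras layers behave once adapted.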

Let's look at the targets:

In [9]:
train_targets_df.sample(5)

Unnamed: 0,sig_id,5-alpha_reductase_inhibitor,11-beta-hsd1_inhibitor,acat_inhibitor,acetylcholine_receptor_agonist,acetylcholine_receptor_antagonist,acetylcholinesterase_inhibitor,adenosine_receptor_agonist,adenosine_receptor_antagonist,adenylyl_cyclase_activator,adrenergic_receptor_agonist,adrenergic_receptor_antagonist,akt_inhibitor,aldehyde_dehydrogenase_inhibitor,alk_inhibitor,ampk_activator,analgesic,androgen_receptor_agonist,androgen_receptor_antagonist,anesthetic_-_local,angiogenesis_inhibitor,angiotensin_receptor_antagonist,anti-inflammatory,antiarrhythmic,antibiotic,anticonvulsant,antifungal,antihistamine,antimalarial,antioxidant,antiprotozoal,antiviral,apoptosis_stimulant,aromatase_inhibitor,atm_kinase_inhibitor,atp-sensitive_potassium_channel_antagonist,atp_synthase_inhibitor,atpase_inhibitor,atr_kinase_inhibitor,aurora_kinase_inhibitor,...,protein_synthesis_inhibitor,protein_tyrosine_kinase_inhibitor,radiopaque_medium,raf_inhibitor,ras_gtpase_inhibitor,retinoid_receptor_agonist,retinoid_receptor_antagonist,rho_associated_kinase_inhibitor,ribonucleoside_reductase_inhibitor,rna_polymerase_inhibitor,serotonin_receptor_agonist,serotonin_receptor_antagonist,serotonin_reuptake_inhibitor,sigma_receptor_agonist,sigma_receptor_antagonist,smoothened_receptor_antagonist,sodium_channel_inhibitor,sphingosine_receptor_agonist,src_inhibitor,steroid,syk_inhibitor,tachykinin_antagonist,tgf-beta_receptor_inhibitor,thrombin_inhibitor,thymidylate_synthase_inhibitor,tlr_agonist,tlr_antagonist,tnf_inhibitor,topoisomerase_inhibitor,transient_receptor_potential_channel_antagonist,tropomyosin_receptor_kinase_inhibitor,trpv_agonist,trpv_antagonist,tubulin_inhibitor,tyrosine_kinase_inhibitor,ubiquitin_specific_protease_inhibitor,vegfr_inhibitor,vitamin_b,vitamin_d_receptor_agonist,wnt_inhibitor
20050,id_d6fa57659,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
18132,id_c2c45b124,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
21024,id_e184ed55c,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1509,id_101aad6ac,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10114,id_6cea0fdcb,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


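The targets are binary indicators (0 or 1) across 206 different categories, and a sample can be positive for more than one of them. So our model should have 206 outputs, each with its own sigmoid activation producing an independent probability between 0 and 1, rather than a single softmax over all 206.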
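The output design above pairs naturally with binary cross-entropy, averaged over every (sample, target) cell, as the training loss. The NumPy sketch below shows both pieces; the clipping constant `eps` is an assumption of this sketch to keep `log()` finite. In Keras the equivalent would be a final `layers.Dense(206, activation='sigmoid')` trained with `loss='binary_crossentropy'`, though the model built later in this notebook may differ in its details.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Squash each logit independently into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def mean_binary_crossentropy(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-15) -> float:
    """Binary cross-entropy per (sample, target) cell, averaged over all cells."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # keep log() finite
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(losses.mean())


n_targets = 207 - 1                 # train_targets_df columns minus sig_id
logits = np.zeros((2, n_targets))   # an untrained model predicting 0.5 everywhere
y_true = np.zeros((2, n_targets))
y_true[0, 10] = 1.0                 # a single positive label
probs = sigmoid(logits)
print(probs.shape)                                        # (2, 206)
print(round(mean_binary_crossentropy(y_true, probs), 4))  # 0.6931
```

A constant prediction of 0.5 costs ln 2 ≈ 0.6931 per cell whatever the label, which gives a rough baseline that any trained model should beat.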

The sample submission format matches these expectations:

In [None]:
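# Assumes sample_submission.csv was copied to /content/list-moa/ with the other CSVs;
# the Kaggle path /kaggle/input/lish-moa/ does not exist on Colab
sample_submission_df = pd.read_csv('/content/list-moa/sample_submission.csv')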
sample_submission_df.sample(5)
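Since the sample submission has a sig_id column followed by one column per target, a prediction array of shape (number of test rows, 206) can be written out in the same layout. The sketch below uses a hypothetical helper, `make_submission`, and assumes the model's output columns are in the same order as the sample submission's target columns.

```python
import numpy as np
import pandas as pd


def make_submission(sample_df: pd.DataFrame, test_ids, preds: np.ndarray) -> pd.DataFrame:
    """Build a frame with the sample submission's columns, filled with predictions.

    Assumes preds has one row per test id and one column per target, in the
    same order as sample_df's target columns (everything except 'sig_id').
    """
    target_cols = [c for c in sample_df.columns if c != "sig_id"]
    expected = (len(test_ids), len(target_cols))
    if preds.shape != expected:
        raise ValueError(f"expected shape {expected}, got {preds.shape}")
    sub = pd.DataFrame(preds, columns=target_cols)
    # list() avoids index alignment surprises if test_ids is a pandas Series.
    sub.insert(0, "sig_id", list(test_ids))
    return sub


# Toy stand-in for sample_submission_df with two target columns (made-up values).
sample = pd.DataFrame({"sig_id": ["id_x", "id_y"], "t_a": [0.5, 0.5], "t_b": [0.5, 0.5]})
sub = make_submission(sample, ["id_x", "id_y"], np.array([[0.1, 0.9], [0.3, 0.2]]))
print(sub)
```

With the real data this might look like `make_submission(sample_submission_df, test_features_df['sig_id'], test_preds).to_csv('submission.csv', index=False)`, where `test_preds` is a hypothetical array of model outputs.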