# Kannada MNIST with H2O DeepLearning

![](https://miro.medium.com/max/1095/1*s1tZoytg71DUnEKEYWtyNw.png)

[H2O](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html) is an open-source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform. It lets you build machine learning models on big data and makes it easy to put those models into production in an enterprise environment.

H2O's [Deep Learning](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/deep-learning.html) is based on a multi-layer feedforward artificial neural network that is trained with stochastic gradient descent using back-propagation. The network can contain a large number of hidden layers consisting of neurons with tanh, rectifier, and maxout activation functions.
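As a rough illustration of the three activation functions named above, here is a minimal NumPy sketch of a single hidden layer's forward pass. It is not H2O's implementation: the function names, the layer sizes, the random inputs, and the maxout group size `k=2` are all assumptions chosen for the example.

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def rectifier(z):
    # Rectified linear unit: negative pre-activations become zero.
    return np.maximum(z, 0.0)

def maxout(z, k=2):
    # Split the pre-activations into groups of k and keep the maximum of each group.
    batch, width = z.shape
    return z.reshape(batch, width // k, k).max(axis=2)

def hidden_layer(x, W, b, activation):
    # One fully connected layer: affine transform followed by the activation.
    return activation(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 784))              # 5 fake inputs with 784 features each
W = rng.normal(scale=0.01, size=(784, 8))  # hypothetical weights for 8 units
b = np.zeros(8)

print(hidden_layer(x, W, b, tanh).shape)       # (5, 8)
print(hidden_layer(x, W, b, rectifier).shape)  # (5, 8)
print(hidden_layer(x, W, b, maxout).shape)     # (5, 4): groups of 2 collapse to one output
```

In the estimator itself, the activation is selected by name through the `activation` parameter rather than implemented by hand.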

Advanced features such as adaptive learning rate, rate annealing, momentum training, dropout, L1 or L2 regularization, checkpointing, and grid search enable high predictive accuracy. Each compute node trains a copy of the global model parameters on its local data with multi-threading (asynchronously) and contributes periodically to the global model via model averaging across the network.
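The model-averaging idea described above can be sketched on a toy problem. The example below is only an illustration under strong assumptions, not H2O's algorithm: it fits a linear model instead of a neural network, simulates the "nodes" sequentially in one process instead of running them asynchronously, and uses made-up names (`local_sgd`, `train_with_model_averaging`) and synthetic data.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.05, epochs=5):
    # Each node starts from a copy of the global weights and trains on its own shard.
    w = w.copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = (xi @ w - yi) * xi  # gradient of the squared error for one row
            w -= lr * grad
    return w

def train_with_model_averaging(X, y, n_nodes=4, rounds=10):
    # Split the rows into one shard per simulated node.
    shards = list(zip(np.array_split(X, n_nodes), np.array_split(y, n_nodes)))
    w_global = np.zeros(X.shape[1])
    for _ in range(rounds):
        local_weights = [local_sgd(w_global, Xs, ys) for Xs, ys in shards]
        # The global model is the average of the locally trained copies.
        w_global = np.mean(local_weights, axis=0)
    return w_global

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w  # noise-free targets, so the true weights are recoverable

w = train_with_model_averaging(X, y)
print(np.round(w, 3))
```

How often the local copies are averaged trades communication cost against how far the copies can drift apart between synchronizations.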


## Acknowledgement

Thanks to Arno Candel for his [kernel](https://www.kaggle.com/arnocandel/mnist-with-h2o-deeplearning) on MNIST with H2O DeepLearning.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

## Starting H2O cluster and Importing the DeepLearning Estimator

In [None]:
import h2o
print(h2o.__version__)
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init(max_mem_size='16G')

## Upload the datasets to the H2O cluster

The data is imported into H2OFrames, which behave much like pandas DataFrames.

In [None]:
train = h2o.import_file("/kaggle/input/Kannada-MNIST/train.csv")
test = h2o.import_file("/kaggle/input/Kannada-MNIST/test.csv")
submission = h2o.import_file("/kaggle/input/Kannada-MNIST/sample_submission.csv")

In [None]:
train.head()

## Specify the response and predictor columns

In [None]:
x = train.columns[1:]
y = 'label'


## Encode the response column as categorical for multinomial classification

In [None]:

train[y] = train[y].asfactor()

## Train Deep Learning model 

Here `nfolds=3`, which means the model is evaluated with three-fold cross-validation. To disable cross-validation, use `nfolds=0`, which is the default value.
More information about the parameters can be found in the [H2O Deep Learning booklet](http://h2o.ai/resources/).
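To make the idea of folds concrete, the sketch below splits row indices into three folds with NumPy. It is a simplified illustration, not H2O's internal fold assignment; the helper name `kfold_splits`, the seed, and the ten-row example are assumptions.

```python
import numpy as np

def kfold_splits(n_rows, nfolds=3, seed=0):
    # Randomly assign every row to one of nfolds folds of near-equal size.
    rng = np.random.default_rng(seed)
    fold_of_row = rng.permutation(n_rows) % nfolds
    for k in range(nfolds):
        holdout_idx = np.flatnonzero(fold_of_row == k)  # rows held out for validation
        train_idx = np.flatnonzero(fold_of_row != k)    # rows used for training
        yield train_idx, holdout_idx

splits = list(kfold_splits(n_rows=10, nfolds=3))
for k, (train_idx, holdout_idx) in enumerate(splits):
    print(f"fold {k}: train on {len(train_idx)} rows, validate on {len(holdout_idx)} rows")
```

Each row is held out exactly once, so the cross-validation metrics are computed on predictions for data the corresponding fold model never saw during training.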


In [None]:
dl = H2ODeepLearningEstimator(input_dropout_ratio = 0.2, nfolds=3)
dl.train(x=x, y=y, training_frame=train)

## Extracting the Results

In [None]:
dl.model_performance(xval=True)

## Predictions

In [None]:
preds = dl.predict(test)
preds['p1'].as_data_frame().values.shape

In [None]:
preds

## Sample Submissions

In [None]:
sample_submission = pd.read_csv('/kaggle/input/Kannada-MNIST/sample_submission.csv')
sample_submission.shape

In [None]:
sample_submission['label'] = preds['predict'].as_data_frame().values.ravel()  # flatten (n, 1) to (n,)
sample_submission.to_csv('H2O_DL.csv', index=False)
sample_submission.head()