# Baseline for MNIST Educational Challenge on AIcrowd
#### Author : Ayush Shivani

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/MNIST_baseline.ipynb)


## Install Necessary Packages

In [76]:
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install scikit-learn



## Download dataset

In [None]:
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_mnist/data/public/test.zip
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_mnist/data/public/train.zip
!unzip train.zip
!unzip test.zip


## Import packages

In [77]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import f1_score,precision_score,recall_score,accuracy_score

## Load the data

In [80]:
train_data_path = "train.csv" #path where data is stored

In [79]:
train_data = pd.read_csv(train_data_path) #load data in dataframe using pandas

## Split the data into train/test sets

In [65]:
# Hold out 20% of the rows for testing; both frames still contain the label column here
X_train, X_test = train_test_split(train_data, test_size=0.2, random_state=42)

In [66]:
# The first column holds the label; the remaining columns are the features
X_train, y_train = X_train.iloc[:, 1:], X_train.iloc[:, 0]
X_test, y_test = X_test.iloc[:, 1:], X_test.iloc[:, 0]

## Define the classifier

In [81]:
classifier = LogisticRegression(solver='lbfgs', multi_class='auto', max_iter=100)

`LogisticRegression` accepts more parameters than the ones set here. For the full list, see the [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).

We can also use other classifiers. To read more about scikit-learn classifiers, visit the [supervised learning guide](https://scikit-learn.org/stable/supervised_learning.html).
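
As a rough illustration of swapping in a different classifier, the sketch below trains an `SVC` (imported above but otherwise unused) in place of logistic regression. To stay self-contained it uses scikit-learn's small built-in digits dataset (8x8 images) as a stand-in for the challenge data, which is an assumption of this example. The `kernel` and `gamma` settings are illustrative and untuned, and the `digits_*`, `dX_*`, `dy_*` and `svc_*` names are hypothetical, chosen so they do not overwrite the notebook's own variables.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: scikit-learn's built-in 8x8 digits, not the challenge CSV.
digits_X, digits_y = load_digits(return_X_y=True)
dX_train, dX_val, dy_train, dy_val = train_test_split(
    digits_X, digits_y, test_size=0.2, random_state=42
)

# scikit-learn classifiers share the same fit/predict interface,
# so only the constructor line changes when swapping models.
svc_classifier = SVC(kernel="rbf", gamma="scale")
svc_classifier.fit(dX_train, dy_train)

svc_accuracy = accuracy_score(dy_val, svc_classifier.predict(dX_val))
print("Held-out accuracy of the SVC sketch:", svc_accuracy)
```

On the full challenge data an `SVC` will likely train much more slowly than logistic regression, so trying it on a subset of rows first may be worthwhile.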

## Train the classifier

In [None]:
classifier.fit(X_train, y_train)


## Predict on test set

In [69]:
y_pred = classifier.predict(X_test)

## Find the scores 

In [70]:
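# Micro averaging pools all classes; for single-label multiclass data,
# micro precision and micro recall both equal the accuracy.
precision = precision_score(y_test, y_pred, average='micro')
recall = recall_score(y_test, y_pred, average='micro')
accuracy = accuracy_score(y_test, y_pred)
# Macro averaging computes F1 per class, then takes the unweighted mean.
f1 = f1_score(y_test, y_pred, average='macro')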

In [71]:
print("Accuracy of the model is :" ,accuracy)
print("Recall of the model is :" ,recall)
print("Precision of the model is :" ,precision)
print("F1 score of the model is :" ,f1)

Accuracy of the model is : 0.92225
Recall of the model is : 0.92225
Precision of the model is : 0.92225
F1 score of the model is : 0.9213314758432045
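
The identical accuracy, recall and precision values follow from the `average='micro'` setting, while the F1 score differs because it uses `average='macro'`. The sketch below recomputes both averages by hand on a tiny made-up label list (the `toy_true` and `toy_pred` values and the helper functions are hypothetical, written only for illustration) to show where the difference comes from.

```python
from collections import Counter

def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # the predicted class gains a false positive
            fn[truth] += 1   # the true class gains a false negative
    return tp, fp, fn

def micro_f1(y_true, y_pred):
    """Pool the counts over all classes, then compute a single F1."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    return 2 * total_tp / (2 * total_tp + total_fp + total_fn)

def macro_f1(y_true, y_pred):
    """Compute F1 for each class, then take the unweighted mean."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Made-up labels: class 0 is more frequent than classes 1 and 2.
toy_true = [0, 0, 0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 0, 1, 1, 1, 2, 0]

toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_micro = micro_f1(toy_true, toy_pred)
toy_macro = macro_f1(toy_true, toy_pred)
print("accuracy:", toy_accuracy, "micro F1:", toy_micro, "macro F1:", toy_macro)
```

Here the micro-averaged F1 matches the accuracy, whereas the macro average is pulled down by the weaker score on the rarest class.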


# Prediction on Evaluation Set

## Load the evaluation data

In [72]:
final_test_path = "test.csv"
final_test = pd.read_csv(final_test_path)

## Predict on evaluation set

In [73]:
submission = classifier.predict(final_test)

## Save the prediction to csv

In [74]:
submission = pd.DataFrame(submission)
submission.to_csv('submission.csv',header=['label'],index=False)
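
Before uploading, it can help to sanity-check the file that was written. The sketch below assumes the expected format is a single `label` column containing one digit from 0 to 9 per row; this is inferred from the saving code above rather than from the platform's rules, so treat it as an assumption. The `check_submission` helper and the demo file are hypothetical.

```python
import csv
import os
import tempfile

def check_submission(path, expected_rows=None):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    valid_labels = {str(digit) for digit in range(10)}
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    # Assumed format: one header row containing only 'label'.
    if not rows or rows[0] != ["label"]:
        problems.append("first row should be the single header 'label'")
    body = rows[1:]
    for line_number, row in enumerate(body, start=2):
        if len(row) != 1 or row[0] not in valid_labels:
            problems.append(f"line {line_number}: expected one digit 0-9, got {row}")
    if expected_rows is not None and len(body) != expected_rows:
        problems.append(f"expected {expected_rows} predictions, found {len(body)}")
    return problems

# Demonstration on a tiny made-up file, not the real submission.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_submission.csv")
with open(demo_path, "w", newline="") as handle:
    csv.writer(handle).writerows([["label"], [7], [2], [1]])

demo_problems = check_submission(demo_path, expected_rows=3)
print(demo_problems)
```

For the real file, a call such as `check_submission('submission.csv', expected_rows=len(final_test))` would also confirm that there is one prediction per evaluation row.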

### Go to the [platform](https://www.aicrowd.com/challenges/mnist-recognise-handwritten-digits/), participate in the challenge, and submit the generated `submission.csv`.