# Baseline for OCPDT Educational Challenge on AIcrowd
#### Author : Ayush Shivani

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/OCTDT_baseline.ipynb)

## Install Necessary Packages

In [1]:
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install scikit-learn



## Download dataset

In [None]:
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_ocpdt/data/public/test.csv
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_ocpdt/data/public/train.csv


## Import packages

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import f1_score,precision_score,recall_score,accuracy_score

## Load the data

In [2]:
train_data_path = "train.csv" #path where data is stored

In [4]:
train_data = pd.read_csv(train_data_path) #load data in dataframe using pandas

In [6]:
train_data.head()

| Unnamed: 0 | date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|---|
| 0 | 2015-02-11 14:48:00 | 21.76 | 31.133333 | 437.333333 | 1029.666667 | 0.005021 | 1 |
| 1 | 2015-02-11 14:49:00 | 21.79 | 31.0 | 437.333333 | 1000.0 | 0.005009 | 1 |
| 2 | 2015-02-11 14:50:00 | 21.7675 | 31.1225 | 434.0 | 1003.75 | 0.005022 | 1 |
| 3 | 2015-02-11 14:51:00 | 21.7675 | 31.1225 | 439.0 | 1009.5 | 0.005022 | 1 |
| 4 | 2015-02-11 14:51:59 | 21.79 | 31.133333 | 437.333333 | 1005.666667 | 0.00503 | 1 |


In [15]:
train_data = train_data.drop(['date'],axis=1)

In [17]:
train_data.head()

| Unnamed: 0 | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|
| 0 | 21.76 | 31.133333 | 437.333333 | 1029.666667 | 0.005021 | 1 |
| 1 | 21.79 | 31.0 | 437.333333 | 1000.0 | 0.005009 | 1 |
| 2 | 21.7675 | 31.1225 | 434.0 | 1003.75 | 0.005022 | 1 |
| 3 | 21.7675 | 31.1225 | 439.0 | 1009.5 | 0.005022 | 1 |
| 4 | 21.79 | 31.133333 | 437.333333 | 1005.666667 | 0.00503 | 1 |
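The `Unnamed: 0` column looks like a row index that was written into the CSV. This is an assumption based on its values (0, 1, 2, ...), not something the dataset description confirms. The baseline keeps it as a feature. If you would rather exclude it, one option is to read it as the index. The sketch below inlines the first three rows shown above so it runs on its own. The empty first header field is also an assumption about how the file is laid out.

```python
import io

import pandas as pd

# First three rows of train.csv as displayed above, inlined for a standalone sketch.
# Assumption: the first header field is empty, which is why pandas names it "Unnamed: 0".
sample_csv = """,date,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
0,2015-02-11 14:48:00,21.76,31.133333,437.333333,1029.666667,0.005021,1
1,2015-02-11 14:49:00,21.79,31.0,437.333333,1000.0,0.005009,1
2,2015-02-11 14:50:00,21.7675,31.1225,434.0,1003.75,0.005022,1
"""

# index_col=0 treats the first column as the row index instead of a feature.
sample = pd.read_csv(io.StringIO(sample_csv), index_col=0)
sample = sample.drop(columns=["date"])

# Everything except the last column (Occupancy) is a feature.
feature_columns = list(sample.columns[:-1])
print(feature_columns)
```

If the column is excluded during training, the evaluation file would presumably need to be read the same way so that both frames carry the same features.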


## Split the data in train/test

In [18]:
X_train, X_test= train_test_split(train_data, test_size=0.2, random_state=42) 

In [19]:
# The last column (Occupancy) is the label; all preceding columns are features.
X_train, y_train = X_train.iloc[:, :-1], X_train.iloc[:, -1]
X_test, y_test = X_test.iloc[:, :-1], X_test.iloc[:, -1]
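The positional slicing above relies on `Occupancy` being the last column. A hedged alternative is to select the label by name and pass features and labels to `train_test_split` together. The `stratify` argument then keeps the class ratio similar in both parts. The toy frame below is made-up data for illustration only, not challenge data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 6 unoccupied rows and 4 occupied rows.
toy = pd.DataFrame({
    "Temperature": [21.0 + 0.1 * i for i in range(10)],
    "Light": [0, 0, 0, 0, 0, 0, 430, 440, 450, 460],
    "Occupancy": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Select the label by name instead of by position.
features = toy.drop(columns=["Occupancy"])
labels = toy["Occupancy"]

# Splitting features and labels in one call keeps rows aligned;
# stratify=labels asks for both classes to be represented proportionally.
X_tr, X_val, y_tr, y_val = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)
print(X_tr.shape, X_val.shape)
```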

## Define the classifier

In [20]:
classifier = LogisticRegression(solver='lbfgs', multi_class='auto', max_iter=1000)

More parameters can be set. For the full list, see the [LogisticRegression documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).

Other classifiers can be used as well. To read more about scikit-learn classifiers, see the [supervised learning guide](https://scikit-learn.org/stable/supervised_learning.html).
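The notebook imports `SVC` but does not use it. As a minimal sketch of swapping in another classifier, the block below fits an `SVC` on synthetic data from `make_classification` rather than on the challenge data. The hyperparameters are scikit-learn's defaults written out explicitly, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Any scikit-learn classifier exposes the same fit/predict interface,
# so the rest of the pipeline does not need to change.
svm_classifier = SVC(kernel="rbf", C=1.0, gamma="scale")
svm_classifier.fit(X_tr, y_tr)
svm_pred = svm_classifier.predict(X_val)
print(svm_pred[:5])
```

Because the interface is shared, replacing `classifier` with a different estimator should leave the prediction and scoring cells unchanged.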

## Train the classifier

In [21]:
classifier.fit(X_train, y_train)


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

## Predict on test set

In [23]:
y_pred = classifier.predict(X_test)

## Find the scores 

In [24]:
precision = precision_score(y_test,y_pred,average='micro')
recall = recall_score(y_test,y_pred,average='micro')
accuracy = accuracy_score(y_test,y_pred)
f1 = f1_score(y_test,y_pred,average='macro')

In [25]:
print("Accuracy of the model is :" ,accuracy)
print("Recall of the model is :" ,recall)
print("Precision of the model is :" ,precision)
print("F1 score of the model is :" ,f1)

Accuracy of the model is : 0.9923116350589442
Recall of the model is : 0.9923116350589442
Precision of the model is : 0.9923116350589442
F1 score of the model is : 0.9891767487049267
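Precision and recall above are identical to accuracy. This is expected with `average='micro'`: for single-label classification, micro-averaged precision and recall both reduce to the fraction of correct predictions. The sketch below shows the effect on a small hand-made example (the labels are hypothetical) and contrasts it with `average='binary'`, which scores only the positive class.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 2 true positives, 1 false negative,
# 4 true negatives, 1 false positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_hat = [1, 1, 0, 0, 0, 0, 0, 1]

accuracy = accuracy_score(y_true, y_hat)

# Micro averaging pools the counts over both classes.
micro_precision = precision_score(y_true, y_hat, average="micro")
micro_recall = recall_score(y_true, y_hat, average="micro")

# Binary averaging reports the positive class (label 1) only.
binary_precision = precision_score(y_true, y_hat, average="binary")
binary_recall = recall_score(y_true, y_hat, average="binary")

print("accuracy:", accuracy)
print("micro precision/recall:", micro_precision, micro_recall)
print("binary precision/recall:", binary_precision, binary_recall)
```

If the goal is to see how well occupied rooms specifically are detected, the binary variants are likely the more informative numbers.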


# Prediction on Evaluation Set

## Load the evaluation data

In [29]:
final_test_path = "test.csv"
final_test = pd.read_csv(final_test_path)
final_test = final_test.drop(['date'],axis=1)
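Prediction only works if the evaluation frame has the same feature columns as the training features. The following is a minimal sketch of such a check, using two small hypothetical frames instead of the real files. The names `train_features` and `eval_frame` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the training features and the evaluation data.
train_features = pd.DataFrame({
    "Temperature": [21.76, 21.79],
    "Humidity": [31.13, 31.0],
    "Light": [437.3, 434.0],
})
eval_frame = pd.DataFrame({
    "Light": [0.0],
    "Temperature": [20.5],
    "Humidity": [27.2],
})

# Report any column present in one frame but not in the other.
missing = set(train_features.columns) - set(eval_frame.columns)
extra = set(eval_frame.columns) - set(train_features.columns)
if missing or extra:
    raise ValueError(f"Column mismatch: missing={missing}, extra={extra}")

# Reorder the evaluation columns to match the training order.
eval_aligned = eval_frame[train_features.columns]
print(list(eval_aligned.columns))
```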

## Predict on evaluation set

In [30]:
submission = classifier.predict(final_test)

## Save the prediction to csv

In [32]:
submission = pd.DataFrame(submission)
submission.to_csv('submission.csv',header=['occupancy'],index=False)
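Before uploading, it can help to read the file back and confirm its shape and header. The sketch below writes a few hypothetical predictions to a temporary directory, so it does not overwrite the real `submission.csv`. It names the column when building the frame, which should produce the same file as passing `header=['occupancy']`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical predictions standing in for classifier.predict(final_test).
predictions = np.array([1, 0, 0, 1])

# Naming the column up front avoids the separate header argument.
submission_frame = pd.DataFrame({"occupancy": predictions})

# Write to a temporary location for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission_frame.to_csv(out_path, index=False)

# Read the file back to verify the header and the number of rows.
check = pd.read_csv(out_path)
print(check.shape, list(check.columns))
```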

## To download the generated CSV in Colab, run the command below

In [None]:
from google.colab import files
files.download('submission.csv') 

### Go to the [platform](https://www.aicrowd.com/challenges/ocpdt-occupancy-detection), participate in the challenge, and submit the generated submission.csv.