## Occupancy Detection
`Donated on 2/28/2016`

Experimental data used for binary classification (room occupancy) from Temperature, Humidity, Light, and CO2 measurements. Ground-truth occupancy was obtained from time-stamped pictures that were taken every minute.

The dataset can be found here: https://archive.ics.uci.edu/dataset/357/occupancy+detection <br/>
`[Candanedo, Luis. (2016). Occupancy Detection. UCI Machine Learning Repository. https://doi.org/10.24432/C5X01N.]`


| id |      date         |Temperature|   Humidity     |     Light      |      CO2       |   HumidityRatio   | Occupancy |
|----|-------------------|-----------|----------------|----------------|----------------|-------------------|-----------|
|"1" |2015-02-11 14:48:00|   21.76   |31.1333333333333|437.333333333333|1029.66666666667|0.00502101089021385|    1      |
|"2" |2015-02-11 14:49:00|   21.79   |31              |437.333333333333|1000            |0.00500858127480172|    1      |

`~ over 8000 entries, table is truncated`

Here, `id` and `date` are not used as inputs to our model because they are not related to the occupancy.
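As a small illustration of that column selection, the sketch below applies it to the two sample rows from the table above. The names `sample`, `features`, and `occupancy` are only illustrative and are not part of the notebook code that follows.

```python
import numpy as np

# The two rows shown in the table above, kept as strings the way csv.reader returns them.
sample = np.array([
    ["1", "2015-02-11 14:48:00", "21.76", "31.1333333333333",
     "437.333333333333", "1029.66666666667", "0.00502101089021385", "1"],
    ["2", "2015-02-11 14:49:00", "21.79", "31",
     "437.333333333333", "1000", "0.00500858127480172", "1"],
])

# Columns 2..6 hold Temperature, Humidity, Light, CO2 and HumidityRatio.
features = sample[:, 2:7].astype(float)

# Column 7 holds the Occupancy label (0 or 1).
occupancy = sample[:, 7].astype(int)

print(features.shape)
print(occupancy)
```

Slicing with `2:7` skips `id` (index 0) and `date` (index 1) and stops just before the `Occupancy` column.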

In [4]:
import csv
import numpy as np

file_train_data = open('data_training.csv') # open dataset for training

csvreader = csv.reader(file_train_data) # reading dataset for training

header = next(csvreader) # read the header row of the training dataset -> ["id","date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"]

rows = np.array(list(csvreader)) # 2D array of strings: one row per record, 8 fields matching the header above

# print(np.shape(rows)[0])

X_train = rows[:,2:7]  # keep columns 2..6 as features; index 0 (id) and index 1 (date) are dropped because they are not related to the occupancy
target_train = rows[:,7] # the target (Occupancy) is in the final column


# label the training set for the perceptron: occupied ('1') -> -1, not occupied -> +1
y_train = np.where(target_train == '1', -1.0, 1.0)

Ntrain = np.shape(rows)[0] # size of the training set

# shuffle the training set for better training
rIndex = np.random.permutation(Ntrain)
X_train = X_train[rIndex,]
target_train = target_train[rIndex]
y_train = y_train[rIndex]



file_test_data = open('data_test.csv') # open dataset for testing

csvreader = csv.reader(file_test_data) # reading dataset for testing

header = next(csvreader) # read the header row of the test dataset -> ["id","date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"]

rows = np.array(list(csvreader)) # 2D array of strings: one row per record, 8 fields matching the header above

# print(np.shape(rows)[0])

X_test = rows[:,2:7] # keep columns 2..6 as features; index 0 (id) and index 1 (date) are dropped because they are not related to the occupancy
target_test = rows[:,7] # the target (Occupancy) is in the final column

# label the test set the same way: occupied ('1') -> -1, not occupied -> +1
y_test = np.where(target_test == '1', -1.0, 1.0)


Ntest = np.shape(rows)[0] # size of the testing set


# check percentage correctness of the perceptron algorithm
def PercentCorrect(Inputs, targets, weights):
    N = len(targets)
    nCorrect = 0
    for n in range(N):
        OneInput = Inputs[n,:]
        if (targets[n] * np.dot(OneInput, weights) > 0):
            nCorrect +=1
    return 100*nCorrect/N


# Perceptron learning loop

# Random initialization of weights
w = np.random.randn(5)

# Fixed number of iterations (think of better stopping criterion)
MaxIter=1000

# Learning rate (change this to see convergence changing)
alpha = 0.002

# Main Loop
for _ in range(MaxIter):

    # Select a data item at random
    r = np.random.randint(Ntrain)
    x = X_train[r,:]
    x = x.astype(np.float32)

    # If it is misclassified, update weights
    if (y_train[r] * np.dot(x, w) < 0):
        w += alpha * y_train[r] * x

# Evaluate training and test performance
print("percent correctness of the trained  model on training data =", PercentCorrect(X_train.astype(np.float32), y_train.astype(np.float32), w))
print("percent correctness of the trained  model on testing data =", PercentCorrect(X_test.astype(np.float32), y_test.astype(np.float32), w))

# print(np.dot(X_test[2376,:].astype(np.float32), w), int(target_test[2376]))

percent correctness of the trained  model on training data = 78.74247820213681
percent correctness of the trained  model on testing data = 82.3318293683347
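The code comment above asks for a better stopping criterion than a fixed number of iterations. One common option, sketched here as an assumption rather than as the notebook's method, is to sweep the training set in epochs and stop as soon as a full pass makes no weight update. The function `train_perceptron`, its parameters, and the toy arrays `X_toy` and `y_toy` are hypothetical and are not taken from the occupancy data.

```python
import numpy as np

def train_perceptron(X, y, alpha=0.002, max_epochs=100, seed=0):
    """Perceptron training that stops after the first epoch with no updates.

    Returns the weight vector and the number of epochs actually run.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])  # random initial weights
    for epoch in range(max_epochs):
        n_updates = 0
        # Visit every sample once per epoch, in a fresh random order.
        for n in rng.permutation(len(y)):
            # Treat points on the boundary (product == 0) as misclassified.
            if y[n] * np.dot(X[n], w) <= 0:
                w += alpha * y[n] * X[n]
                n_updates += 1
        if n_updates == 0:
            # A full pass with no mistakes: every sample is on the correct side.
            return w, epoch + 1
    return w, max_epochs  # epoch budget exhausted without a clean pass

# Hypothetical toy data (NOT the occupancy dataset): two separable clusters.
X_toy = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])

w_toy, epochs_used = train_perceptron(X_toy, y_toy, alpha=0.1)
print("epochs used:", epochs_used)
print("all correct:", bool(np.all(y_toy * (X_toy @ w_toy) > 0)))
```

A clean pass only occurs when the data are linearly separable by a boundary through the origin, so `max_epochs` still acts as a fallback. Given the accuracies reported above, the occupancy features would most likely hit that fallback unless a bias term or feature scaling is added.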
