In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [2]:
import matplotlib.pyplot as plt
import cv2 
from random import shuffle 
from tqdm import tqdm 
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings('ignore')
import os
print(os.listdir("../input/messy-vs-clean-room"))

## Explanation.

In this cell I have imported all the required modules and also accessed the directory of the given dataset.

In [3]:
train_messy = "/kaggle/input/messy-vs-clean-room/images/images/train/messy/"
train_clean= "/kaggle/input/messy-vs-clean-room/images/images/train/clean/"
test_messy= "/kaggle/input/messy-vs-clean-room/images/images/val/messy/"
test_clean= "/kaggle/input/messy-vs-clean-room/images/images/val/clean/"
image_size = 128

In [4]:
for picture in tqdm(os.listdir(train_messy)):
    path=os.path.join(train_messy,picture)
    img=cv2.imread(path,cv2.IMREAD_GRAYSCALE)
    img=cv2.resize(img,(image_size,image_size)).flatten()
    np_img=np.asarray(img)
    
for picture2 in tqdm(os.listdir(train_clean)):
    path=os.path.join(train_clean,picture2)
    pic=cv2.imread(path,cv2.IMREAD_GRAYSCALE)
    pic=cv2.resize(pic,(image_size,image_size)).flatten()  # resize the clean image (pic), not the last messy image (img)
    np_pic=np.asarray(pic)


plt.figure(figsize=(10,10))
plt.subplot(1, 2, 1)
plt.imshow(np_img.reshape(image_size, image_size))
plt.axis('on')
plt.subplot(1, 2, 2)
plt.imshow(np_pic.reshape(image_size, image_size))
plt.axis('on')

## Explanation.

- This cell previews one image from each training class.
- The first loop walks the train_messy directory. Each file is read with cv2.imread() in grayscale mode (cv2.IMREAD_GRAYSCALE), resized to image_size x image_size (128 x 128) pixels, and flattened into a one-dimensional array of 128 * 128 = 16384 values.
- The second loop does the same for the train_clean directory.
- The loop variables are overwritten on every iteration, so after the loops np_img and np_pic hold only the last image listed in each directory.
- cv2 already returns NumPy arrays, so np.asarray() does not change the data.
- Finally, both arrays are reshaped back to (image_size, image_size) and shown side by side: the messy room on the left and the clean room on the right.

In [5]:
def train_data():
    train_data_messy = [] 
    train_data_clean=[]
    for image in tqdm(os.listdir(train_messy)): 
        path = os.path.join(train_messy, image)
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE) 
        img = cv2.resize(img, (image_size, image_size))
        train_data_messy.append(img) 
    for picture in tqdm(os.listdir(train_clean)): 
        path = os.path.join(train_clean, picture)
        pic = cv2.imread(path, cv2.IMREAD_GRAYSCALE) 
        pic = cv2.resize(pic, (image_size, image_size))
        train_data_clean.append(pic) 
    
    train_data= np.concatenate((np.asarray(train_data_messy),np.asarray(train_data_clean)),axis=0)
    return train_data

## Explanation.

- train_data() loads the training images and returns them as a single NumPy array.
- The function takes no arguments; it reads the directory paths and image_size from the variables defined earlier.
- It starts with two empty lists, train_data_messy and train_data_clean.
- The first loop iterates over the files in train_messy (tqdm only adds a progress bar). For each file it builds the full path with os.path.join(), reads the image in grayscale, resizes it to image_size x image_size (128 x 128) pixels, and appends it to train_data_messy.
- The second loop does the same for train_clean and fills train_data_clean.
- np.concatenate() then stacks the two lists along axis 0, messy images first and clean images second, giving an array of shape (n_messy + n_clean, image_size, image_size).

In [6]:
def test_data():
    test_data_messy = [] 
    test_data_clean=[]
    for image in tqdm(os.listdir(test_messy)): 
        path = os.path.join(test_messy, image)
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE) 
        img = cv2.resize(img, (image_size, image_size))
        test_data_messy.append(img) 
    for picture in tqdm(os.listdir(test_clean)): 
        path = os.path.join(test_clean, picture)
        pic = cv2.imread(path, cv2.IMREAD_GRAYSCALE) 
        pic = cv2.resize(pic, (image_size, image_size))
        test_data_clean.append(pic) 
    
    test_data= np.concatenate((np.asarray(test_data_messy),np.asarray(test_data_clean)),axis=0) 
    return test_data

## Explanation.

- test_data() mirrors train_data(), but reads from the validation directories (test_messy and test_clean).
- It starts with two empty lists, test_data_messy and test_data_clean. Neither list holds a "cleaned up" version of the data; the names refer to the two room classes.
- Each loop reads every image in its directory in grayscale, resizes it to image_size x image_size, and appends it to the matching list.
- np.concatenate() stacks the two lists along axis 0, messy images first and clean images second.
- The function returns the stacked array of shape (n_messy + n_clean, image_size, image_size).

In [7]:
train_data=train_data()
test_data=test_data()

In [8]:
x_data=np.concatenate((train_data,test_data),axis=0)
x_data=(x_data-np.min(x_data))/(np.max(x_data)-np.min(x_data))

## Explanation.

- The first line concatenates the training and test arrays along axis 0 into a single array, x_data.
- The second line applies min-max normalization: it subtracts the global minimum of x_data and divides by the range, max(x_data) - min(x_data), so every pixel value ends up in [0, 1].
- Nothing is sorted or filtered. The order of the images is preserved, which matters because the labels are built in the same order later.
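
The normalization step can be tried in isolation. The sketch below is illustrative only: the helper name min_max_scale and the tiny demo array are assumptions, not part of the notebook, and the guard against a constant array is an added precaution.

```python
import numpy as np

def min_max_scale(images):
    """Rescale an array to [0, 1] using its global minimum and maximum."""
    images = images.astype(np.float64)
    lo, hi = images.min(), images.max()
    if hi == lo:
        # A constant array has no range; return zeros rather than divide by zero.
        return np.zeros_like(images)
    return (images - lo) / (hi - lo)

# Hypothetical stand-in for a stack of two 2x2 grayscale images.
demo = np.array([[[0, 51], [102, 255]], [[255, 204], [153, 0]]], dtype=np.uint8)
scaled = min_max_scale(demo)
print(scaled.min(), scaled.max())
```

For 8-bit images whose values span the full 0 to 255 range, this is the same as dividing by 255.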

In [10]:
z1=np.zeros(96)
o1=np.ones(96)
Y_train=np.concatenate((o1,z1),axis=0)
z=np.zeros(10)
o=np.ones(10)
Y_test=np.concatenate((o,z),axis=0)

In [11]:
y_data=np.concatenate((Y_train,Y_test),axis=0).reshape(x_data.shape[0],1)

# Print the shapes of x_data and y_data.

print(x_data.shape)
print(y_data.shape)

## Explanation.

- The previous cell builds the labels by hand: Y_train is 96 ones followed by 96 zeros, and Y_test is 10 ones followed by 10 zeros.
- Because the loaders put messy images first and clean images second, label 1 means "messy" and label 0 means "clean".
- The counts 96 and 10 are hard-coded, so they must match the number of images in each directory.
- This cell concatenates Y_train and Y_test along axis 0 and reshapes the result into a column vector of shape (x_data.shape[0], 1), one label per image.
- The two print statements show the shapes of x_data and y_data, which must agree in their first dimension.
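
As a sketch of the same labelling scheme without repeating the concatenation by hand, the hypothetical helper make_labels below takes the per-class counts as arguments. In the notebook these counts could come from len(os.listdir(...)) on each directory; that is a suggestion, not what the original code does.

```python
import numpy as np

def make_labels(n_messy, n_clean):
    """Return a (n_messy + n_clean, 1) column: 1 for messy, 0 for clean."""
    labels = np.concatenate((np.ones(n_messy), np.zeros(n_clean)))
    return labels.reshape(-1, 1)

# The counts below mirror the notebook's hard-coded values (96 and 10 per class).
y_train_part = make_labels(96, 96)
y_test_part = make_labels(10, 10)
y_all = np.concatenate((y_train_part, y_test_part), axis=0)
print(y_all.shape)  # (212, 1)
```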

In [12]:
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=42)
number_of_train = x_train.shape[0]
number_of_test = x_test.shape[0]

## Explanation.

- train_test_split() shuffles the combined data and splits it into a training set (x_train, y_train) and a test set (x_test, y_test).
- test_size=0.15 puts 15% of the samples in the test set and the remaining 85% in the training set.
- random_state=42 fixes the shuffle, so the same split is produced on every run.
- Because the original train and val folders were merged beforehand, this new split replaces the dataset's own train/val division.
- number_of_train and number_of_test store the number of images in each subset; they are used for the reshape in the next cell.
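
The split can be reproduced on placeholder data of the same size. The sketch below assumes the 212 samples implied by the label counts and uses tiny 2 x 2 arrays in place of real images. The stratify argument is an addition that the notebook does not use; it keeps the messy/clean ratio equal in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 212 tiny "images" and labels in the notebook's order.
x_demo = np.arange(212 * 4, dtype=float).reshape(212, 2, 2)
y_demo = np.concatenate((np.ones(96), np.zeros(96), np.ones(10), np.zeros(10))).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.15, random_state=42, stratify=y_demo.ravel()
)
print(x_tr.shape, x_te.shape)
print(int(y_tr.sum()), int(y_te.sum()))  # messy images in each subset
```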

In [13]:
x_train_flat = x_train.reshape(number_of_train,x_train.shape[1]*x_train.shape[2])
x_test_flat = x_test .reshape(number_of_test,x_test.shape[1]*x_test.shape[2])
print("X train:",x_train_flat.shape)
print("X test:",x_test_flat.shape)

## Explanation.

- Each image is still a 128 x 128 matrix, but logistic regression expects one feature vector per sample.
- The first line reshapes x_train from (number_of_train, 128, 128) to (number_of_train, 128 * 128), that is, one row of 16384 pixel values per image.
- The second line does the same for x_test.
- The two print statements show the resulting shapes, (number_of_train, 16384) and (number_of_test, 16384).
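
The effect of the reshape, and of the transpose applied in the following cell, can be seen on a small random batch. The batch of 5 images below is a made-up stand-in for x_train.

```python
import numpy as np

image_size = 128
# Hypothetical batch of 5 images, standing in for x_train.
batch = np.random.default_rng(0).random((5, image_size, image_size))

flat = batch.reshape(batch.shape[0], -1)  # (5, 16384): one row per image
features_first = flat.T                   # (16384, 5): one column per image
print(flat.shape, features_first.shape)   # (5, 16384) (16384, 5)

# Reshaping a row back recovers the original image exactly.
restored = flat[2].reshape(image_size, image_size)
print(np.array_equal(restored, batch[2]))  # True
```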

In [14]:
x_train = x_train_flat.T
x_test = x_test_flat.T
y_test = y_test.T
y_train = y_train.T

### Explanation.

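This cell transposes the arrays so that each column is one sample: x_train and x_test become (features, samples), and y_train and y_test become (1, samples). This is the layout the hand-written logistic regression in the next cell expects.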

In [15]:
# Algorithm for identifying the clean and messy room.
# It'll also print the accuracy on the test and training sets.

def initialize_weights_and_bias(dimension):
    w = np.full((dimension,1),0.01)
    b = 0.0
    return w, b

def sigmoid(z):
    y_head = 1/(1+np.exp(-z))
    return y_head

def forward_backward_propagation(w,b,x_train,y_train):
    # forward propagation
    z = np.dot(w.T,x_train) + b
    y_head = sigmoid(z)
    loss = -y_train*np.log(y_head)-(1-y_train)*np.log(1-y_head)
    cost = (np.sum(loss))/x_train.shape[1]
    # backward propagation
    derivative_weight = (np.dot(x_train,((y_head-y_train).T)))/x_train.shape[1]
    derivative_bias = np.sum(y_head-y_train)/x_train.shape[1]
    gradients = {"derivative_weight": derivative_weight,"derivative_bias": derivative_bias}
    return cost,gradients

def update(w, b, x_train, y_train, learning_rate,number_of_iterarion):
    cost_list = []
    cost_list2 = []
    index = []
    
    for i in range(number_of_iterarion):
        
        cost,gradients = forward_backward_propagation(w,b,x_train,y_train)
        cost_list.append(cost)
        
        w = w - learning_rate * gradients["derivative_weight"]
        b = b - learning_rate * gradients["derivative_bias"]
        if i % 100 == 0:
            cost_list2.append(cost)
            index.append(i)
            print ("Cost after iteration %i: %f" %(i, cost))
    
    parameters = {"weight": w,"bias": b}
    plt.plot(index,cost_list2)
    plt.xticks(index,rotation='vertical')
    plt.xlabel("Number of Iterations")
    plt.ylabel("Cost")
    plt.show()
    return parameters, gradients, cost_list

def predict(w,b,x_test):
    
    z = sigmoid(np.dot(w.T,x_test)+b)
    Y_prediction = np.zeros((1,x_test.shape[1]))

    for i in range(z.shape[1]):
        if z[0,i]<= 0.5:
            Y_prediction[0,i] = 0
        else:
            Y_prediction[0,i] = 1

    return Y_prediction

def logistic_regression(x_train, y_train, x_test, y_test, learning_rate ,  num_iterations):

    dimension =  x_train.shape[0]
    w,b = initialize_weights_and_bias(dimension)

    parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate,num_iterations)
    
    y_prediction_test = predict(parameters["weight"],parameters["bias"],x_test)
    y_prediction_train = predict(parameters["weight"],parameters["bias"],x_train)
    
    print("Test Accuracy: {} %".format(round(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100,2)))
    print("Train Accuracy: {} %".format(round(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100,2)))

### Explanation.
- This cell implements logistic regression from scratch with NumPy. The model is a single sigmoid unit, not a multi-layer neural network.
- initialize_weights_and_bias() sets every weight to 0.01 and the bias to 0.0.
- sigmoid() maps any real value to the interval (0, 1), so the output can be read as the probability of class 1 (messy).
- forward_backward_propagation() computes the predictions, the mean cross-entropy cost, and the gradients of the cost with respect to the weights and the bias.
- update() runs gradient descent for number_of_iterarion steps, subtracting learning_rate times the gradient at each step. It prints the cost every 100 iterations and plots those values at the end.
- predict() applies a 0.5 threshold: outputs above 0.5 become 1 and all others become 0.
- logistic_regression() ties the steps together: it trains on the training set, predicts on both sets, and prints the test and train accuracy.
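
One way to gain confidence in the gradient formulas is to compare them with a finite-difference estimate. The sketch below restates the cost and gradients in a standalone function (the name cost_and_gradients and the random toy data are assumptions for illustration) and checks the first weight's derivative numerically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradients(w, b, x, y):
    """Mean cross-entropy and its gradients; x is (features, samples), y is (1, samples)."""
    m = x.shape[1]
    y_head = sigmoid(np.dot(w.T, x) + b)
    cost = np.sum(-y * np.log(y_head) - (1 - y) * np.log(1 - y_head)) / m
    dw = np.dot(x, (y_head - y).T) / m
    db = np.sum(y_head - y) / m
    return cost, dw, db

# Toy data: 4 features, 6 samples, same initialization as the notebook.
rng = np.random.default_rng(0)
x = rng.random((4, 6))
y = np.array([[1, 0, 1, 0, 1, 0]], dtype=float)
w = np.full((4, 1), 0.01)
b = 0.0

cost, dw, db = cost_and_gradients(w, b, x, y)

# Central finite difference on the first weight.
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
numeric = (cost_and_gradients(w_plus, b, x, y)[0]
           - cost_and_gradients(w_minus, b, x, y)[0]) / (2 * eps)
print(abs(numeric - dw[0, 0]))  # should be close to zero
```

If the analytic and numeric values disagree by more than a small tolerance, the gradient formula or the array layout is likely wrong.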

In [16]:
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 0.01, num_iterations = 1500)

In [17]:
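grid={"C":np.logspace(-3,3,7),"penalty":["l1","l2"]}  # no trailing comma: grid must be a dict, not a tuple
# liblinear supports both the l1 and l2 penalties; the default lbfgs solver does not support l1
log_reg=LogisticRegression(random_state=20,solver="liblinear")  # renamed so it does not shadow logistic_regression()
log_reg_cv=GridSearchCV(log_reg,grid,cv=10)
log_reg_cv.fit(x_train.T,y_train.T.ravel())  # scikit-learn expects samples as rows and a 1-D label array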

## Explanation.

- This cell tunes scikit-learn's LogisticRegression with a grid search.
- The grid contains 7 values of C, from 10^-3 to 10^3 (np.logspace(-3,3,7)), and two penalty types, l1 and l2, for 7 * 2 = 14 combinations in total.
- C is the inverse of the regularization strength: smaller values mean stronger regularization.
- A LogisticRegression estimator is created with random_state=20.
- GridSearchCV evaluates every combination with 10-fold cross-validation (cv=10 is the number of folds, not a confusion matrix).
- fit() is called on the transposed arrays because scikit-learn expects one sample per row. It trains the search and returns the fitted search object, not predictions.
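
The same kind of search can be run on a small synthetic dataset to see how many candidates are evaluated. This is a sketch under assumptions: the data come from make_classification, the liblinear solver is chosen because it accepts both penalties, and cv=5 is used only to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic two-class data in place of the flattened room images.
x_toy, y_toy = make_classification(n_samples=100, n_features=8, random_state=0)

grid = {"C": np.logspace(-3, 3, 7), "penalty": ["l1", "l2"]}
search = GridSearchCV(
    LogisticRegression(solver="liblinear", random_state=20), grid, cv=5
)
search.fit(x_toy, y_toy)

print(len(search.cv_results_["params"]))  # 14 candidate combinations
print(search.best_params_)
print(search.best_score_)
```

best_score_ is the mean cross-validated accuracy of the best candidate, not the accuracy on a held-out test set.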

In [18]:
print("best hyperparameters: ", log_reg_cv.best_params_)
print("accuracy: ", log_reg_cv.best_score_)

## Explanation.

This cell prints the best hyperparameter combination found by the grid search and its mean cross-validated accuracy.