# **Background information:** 

Welcome to the coding challenge! This is a 3-part challenge in which you will develop image processing and computer vision techniques to analyze the health of solar photovoltaic (PV) systems. The challenge will be judged on these three results.

**(a)** Benchmarking classification results that can be shared broadly with the community.

**(b)** Algorithm optimization towards real-time classification that can run on low-performance edge-computing devices.

## **Bonus**
**(c)** Incorporation of more classes and the ability to recognize anomalies that are outside of the 12 classes of InfraredSolarModules.


We will be focusing on solar photovoltaic (PV) datasets. Here's something we'd like you to know before you start:

We would like you to pre-process and visualize the data and perform an exploratory data analysis. Correlation analysis and anything else you can think of are welcome.




If you have any questions, feel free to reach out to **@Tannistha** :)

## **Data downloads and requirements**:

We will use datasets from:
https://github.com/RaptorMaps/InfraredSolarModules


You are encouraged to use Python and its libraries for this challenge. For evaluation, we recommend numpy, pandas, matplotlib and seaborn for part a and Keras or PyTorch for part b. Please submit your work as a single .ipynb (recommended) or .py file.
Please attach your model file containing the trained weights (pkl) and explain how to use it with a new dataset.

### Links to previous work

*   InfraRed Thermography:

https://www.mdpi.com/1424-8220/20/4/1055

*   Other relevant links:

http://arxiv.org/abs/1807.02894

https://ai4earthscience.github.io/iclr-2020-workshop/papers/ai4earth22.pdf

https://onlinelibrary.wiley.com/doi/abs/10.1002/pip.3191

https://www.sciencedirect.com/science/article/abs/pii/S0038092X20308665


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as img
import numpy as np
import json
import os
import random

In [4]:
os.listdir('./')

['.ipynb_checkpoints',
 'Data_exploration.ipynb',
 'holdout_set.docx',
 'holdout_set.txt',
 'images',
 'module_metadata.json',
 'starter_notebook.ipynb']

In [2]:
with open("./module_metadata.json", 'r') as read_file:
    data = json.load(read_file)

In [3]:
df_files = pd.DataFrame(data)
df_files.head()

Unnamed: 0,13357,13356,19719,11542,11543,11540,11541,11546,11547,11544,...,8483,8484,8485,8486,8487,8488,8489,7464,18065,13354
image_filepath,images/13357.jpg,images/13356.jpg,images/19719.jpg,images/11542.jpg,images/11543.jpg,images/11540.jpg,images/11541.jpg,images/11546.jpg,images/11547.jpg,images/11544.jpg,...,images/8483.jpg,images/8484.jpg,images/8485.jpg,images/8486.jpg,images/8487.jpg,images/8488.jpg,images/8489.jpg,images/7464.jpg,images/18065.jpg,images/13354.jpg
anomaly_class,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,No-Anomaly,...,Vegetation,Vegetation,Vegetation,Vegetation,Vegetation,Vegetation,Vegetation,Cracking,No-Anomaly,No-Anomaly
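
A natural first step for the exploratory analysis is to count the images per class. The following is a minimal sketch, assuming the metadata maps each image id to a record with `image_filepath` and `anomaly_class` keys, as the frame above suggests. The `class_counts` helper and the `sample_metadata` dict are hypothetical stand-ins for illustration, not part of the dataset tooling.

```python
from collections import Counter

def class_counts(metadata):
    """Count images per anomaly class in a metadata dict keyed by image id."""
    return Counter(record["anomaly_class"] for record in metadata.values())

# Hypothetical miniature of module_metadata.json, for illustration only.
sample_metadata = {
    "13357": {"image_filepath": "images/13357.jpg", "anomaly_class": "No-Anomaly"},
    "8483": {"image_filepath": "images/8483.jpg", "anomaly_class": "Vegetation"},
    "7464": {"image_filepath": "images/7464.jpg", "anomaly_class": "Cracking"},
    "18065": {"image_filepath": "images/18065.jpg", "anomaly_class": "No-Anomaly"},
}

counts = class_counts(sample_metadata)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

On the real file, `class_counts(data)` would take the dict loaded from `module_metadata.json`.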


In [63]:
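def loadFiles(data):
    # Row 0 of the metadata frame holds the image file paths (one per column).
    new_df = []
    for file in data.iloc[0]:
        img_data = img.imread(file)
        # Scale 8-bit pixel values from [0, 255] to [0, 1].
        new_df.append(img_data / 255)
    return np.array(new_df)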

In [30]:
image_shape = (40, 24, 1)

In [64]:
X = loadFiles(df_files)
X

array([[[0.55294118, 0.57647059, 0.61568627, ..., 0.60392157,
         0.58039216, 0.55294118],
        [0.55686275, 0.58431373, 0.62352941, ..., 0.59215686,
         0.56862745, 0.54509804],
        [0.56078431, 0.58823529, 0.63137255, ..., 0.60392157,
         0.58431373, 0.56470588],
        ...,
        [0.54509804, 0.54509804, 0.57647059, ..., 0.58039216,
         0.57254902, 0.54509804],
        [0.50980392, 0.51372549, 0.54117647, ..., 0.53333333,
         0.5254902 , 0.49411765],
        [0.45490196, 0.45490196, 0.4745098 , ..., 0.45882353,
         0.44705882, 0.41568627]],

       [[0.38823529, 0.37254902, 0.35686275, ..., 0.64705882,
         0.64705882, 0.64313725],
        [0.40392157, 0.39607843, 0.38823529, ..., 0.62745098,
         0.64313725, 0.65098039],
        [0.45882353, 0.45098039, 0.45098039, ..., 0.6       ,
         0.62352941, 0.63921569],
        ...,
        [0.56862745, 0.57254902, 0.58431373, ..., 0.61176471,
         0.58823529, 0.56862745],
        [0.5

In [65]:
# Row 1 of the metadata frame holds the anomaly class label of each image.
y = df_files.iloc[1]
y.to_numpy()

array(['No-Anomaly', 'No-Anomaly', 'No-Anomaly', ..., 'Cracking',
       'No-Anomaly', 'No-Anomaly'], dtype=object)

In [50]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

In [45]:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

In [46]:
le = LabelEncoder()
le.fit(['No-Anomaly', 'Cell', 'Cell-Multi', 'Cracking', 'Diode', 'Diode-Multi', 'Hot-Spot', 'Hot-Spot-Multi', 'Offline-Module', 'Shadowing', 'Soiling', 'Vegetation'])
print(list(le.classes_))
le.transform(y)

['Cell', 'Cell-Multi', 'Cracking', 'Diode', 'Diode-Multi', 'Hot-Spot', 'Hot-Spot-Multi', 'No-Anomaly', 'Offline-Module', 'Shadowing', 'Soiling', 'Vegetation']


array([7, 7, 7, ..., 2, 7, 7])

In [74]:
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
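
By default `train_test_split` draws a purely random split, so rare classes can end up under-represented in the test set. scikit-learn offers the `stratify=y` argument for this. The sketch below shows the underlying idea in plain NumPy. The `stratified_split_indices` helper and the toy labels are hypothetical and only illustrate the technique.

```python
import numpy as np

def stratified_split_indices(labels, test_size=0.25, seed=42):
    """Return (train_idx, test_idx) with roughly test_size of each class in the test set."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)                     # shuffle within the class
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels for illustration: 8 'No-Anomaly' and 4 'Cracking'.
toy_labels = ["No-Anomaly"] * 8 + ["Cracking"] * 4
train_idx, test_idx = stratified_split_indices(toy_labels)
print(len(train_idx), len(test_idx))
```

Each class contributes the same fraction of its samples to the test set, so the class proportions are preserved on both sides of the split.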

In [100]:
print(y_train_binary)

[[0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


In [77]:
img_rows, img_cols = 40, 24
num_classes = 12

#x_train = x_train_df.to_numpy()
#x_test = x_test_df.to_numpy()

x_train_reshaped = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test_reshaped = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)

y_train_binary = to_categorical(le.transform(y_train), num_classes)
y_test_binary = to_categorical(le.transform(y_test), num_classes)
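
`to_categorical` turns integer class indices into one-hot rows. The same transformation can be sketched in plain NumPy. The `one_hot` helper is hypothetical, and the example indices follow the encoder output shown earlier (`No-Anomaly` is 7, `Cracking` is 2).

```python
import numpy as np

def one_hot(indices, num_classes):
    """Return a (len(indices), num_classes) float array with a single 1.0 per row."""
    indices = np.asarray(indices)
    encoded = np.zeros((indices.size, num_classes), dtype=np.float32)
    encoded[np.arange(indices.size), indices] = 1.0
    return encoded

encoded = one_hot([7, 7, 2], num_classes=12)
print(encoded.shape)           # (3, 12)
print(encoded.argmax(axis=1))  # [7 7 2]
```

Taking `argmax` along the class axis recovers the integer indices, which is also how predicted probabilities are usually mapped back to class labels.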

In [27]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Flatten, Dense, Conv2D, MaxPooling2D

def classificationModel(filters_list, nodes_list, conv_filter_size, pool_filter_size, activation_list, dropout):
    # Three Conv2D + MaxPooling2D stages followed by a dense classifier head.
    # activation_list[0] is used for the hidden layers, activation_list[1] for the output layer.
    # The input size comes from the global image_shape.
    model = Sequential()
    model.add(Conv2D(filters_list[0], conv_filter_size, activation=activation_list[0], input_shape=image_shape))
    model.add(MaxPooling2D(pool_size=pool_filter_size))
    model.add(Conv2D(filters_list[1], conv_filter_size, activation=activation_list[0]))
    model.add(MaxPooling2D(pool_size=pool_filter_size))
    model.add(Conv2D(filters_list[2], conv_filter_size, activation=activation_list[0]))
    model.add(MaxPooling2D(pool_size=pool_filter_size))
    model.add(Flatten())
    model.add(Dense(nodes_list[0], activation=activation_list[0]))
    model.add(Dropout(dropout))
    model.add(Dense(nodes_list[1], activation=activation_list[1]))

    return model
    

In [106]:
model = classificationModel([32, 64, 64], [128, num_classes], 3, 2, ['relu', 'softmax'], 0.5)
model.summary()

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_32 (Conv2D)           (None, 38, 22, 32)        320       
_________________________________________________________________
max_pooling2d_30 (MaxPooling (None, 19, 11, 32)        0         
_________________________________________________________________
conv2d_33 (Conv2D)           (None, 17, 9, 64)         18496     
_________________________________________________________________
max_pooling2d_31 (MaxPooling (None, 8, 4, 64)          0         
_________________________________________________________________
conv2d_34 (Conv2D)           (None, 6, 2, 64)          36928     
_________________________________________________________________
max_pooling2d_32 (MaxPooling (None, 3, 1, 64)          0         
_________________________________________________________________
flatten_10 (Flatten)         (None, 192)             
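
The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the Keras defaults (stride-1 convolutions with `valid` padding, and a pooling stride equal to the pool size). The helper names are hypothetical.

```python
def conv_output(size, kernel):
    """Spatial size after a stride-1 convolution with no padding ('valid')."""
    return size - kernel + 1

def pool_output(size, pool):
    """Spatial size after non-overlapping max pooling (the remainder is dropped)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

rows, cols, channels = 40, 24, 1
param_counts = []
for filters in [32, 64, 64]:
    param_counts.append(conv_params(3, channels, filters))
    rows, cols = conv_output(rows, 3), conv_output(cols, 3)
    rows, cols = pool_output(rows, 2), pool_output(cols, 2)
    channels = filters

flattened = rows * cols * channels
print(param_counts)  # [320, 18496, 36928]
print(flattened)     # 192
```

After three stages the feature map is only 3 x 1, so a 40 x 24 input leaves no room for a fourth convolution of this kind.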

In [107]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['AUC'])

In [90]:
batch_size = 128
epochs = 4

In [108]:
model.fit(x_train_reshaped, y_train_binary, batch_size=batch_size, epochs=epochs, validation_data=(x_test_reshaped, y_test_binary))
# evaluate returns [loss, AUC] because the model was compiled with metrics=['AUC']
loss = model.evaluate(x_test_reshaped, y_test_binary)
results = model.predict(x_test_reshaped)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


In [115]:
# loss[1] is the AUC metric requested in model.compile, not accuracy
print("Loss: {0}\nAUC: {1}".format(loss[0], loss[1]))
print(results)

Loss: 1.8042711019515991
AUC: 0.8113715648651123
[[0.07989099 0.06077064 0.04673268 ... 0.04653941 0.01359702 0.07895671]
 [0.08239485 0.06302927 0.04895558 ... 0.04951662 0.01527982 0.08178192]
 [0.08148136 0.06337512 0.04975862 ... 0.04948734 0.01561604 0.07957096]
 ...
 [0.08316119 0.06413703 0.04990875 ... 0.05047848 0.0161427  0.08220628]
 [0.08049724 0.06123699 0.04744691 ... 0.04746706 0.01404742 0.07959191]
 [0.08200482 0.06318575 0.04918591 ... 0.04971274 0.01541614 0.08139557]]


In [112]:
from sklearn.metrics import roc_auc_score

In [116]:
score = roc_auc_score(y_test_binary, results, multi_class='ovo')
print(score)

0.3925239420309559
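
A single aggregate score hides which classes are being confused. The rows of `results` printed above are nearly identical, which suggests it is worth checking whether the model predicts the same class for every sample. The following is a sketch of a confusion matrix in plain NumPy. `confusion_matrix_np` and the toy arrays are hypothetical.

```python
import numpy as np

def confusion_matrix_np(true_idx, pred_idx, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (true_idx, pred_idx), 1)  # count each (true, predicted) pair
    return matrix

# Toy one-hot targets and predicted probabilities for 3 classes (illustrative only).
toy_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
toy_prob = np.array([[0.6, 0.3, 0.1],
                     [0.5, 0.4, 0.1],
                     [0.2, 0.2, 0.6],
                     [0.7, 0.2, 0.1]])

cm = confusion_matrix_np(toy_true.argmax(axis=1), toy_prob.argmax(axis=1), 3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

With the notebook's variables, the call would be `confusion_matrix_np(y_test_binary.argmax(axis=1), results.argmax(axis=1), num_classes)`. A matrix with a single non-zero column would confirm that every sample receives the same prediction.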


In [None]:
def print_roc_curve():
    pass  # body not yet written
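
One possible building block for a ROC helper is sketched here under stated assumptions. It is not the notebook author's intended implementation. It computes the curve for one class against the rest in plain NumPy. The `roc_points` name and the toy data are hypothetical, and tied scores are not merged into a single threshold.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) for binary labels, sweeping the threshold from high to low."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score), kind="stable")
    sorted_true = y_true[order]
    tps = np.cumsum(sorted_true)        # true positives at each cut
    fps = np.cumsum(1 - sorted_true)    # false positives at each cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

# Toy one-vs-rest example: 1 marks the class of interest (illustrative only).
fpr, tpr = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])

# Area under the curve by the trapezoidal rule.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(auc)  # 0.75
```

For the 12-class problem, the same function could be applied to each column of `y_test_binary` and `results` in turn, and each `(fpr, tpr)` pair passed to `plt.plot` to draw one curve per class.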