# Progress Report

### Introduction to Data & Data Collection:

The project focuses on image denoising, the process of removing noise from an image. Noise is unwanted variation in pixel values, and the more noise an image contains, the more information is lost. Noise can arise from a digital camera sensor's illumination levels, electrical circuits damaged by heat, faulty memory locations in hardware, or bit errors during long-distance data transmission. We obtained a dataset from the University of California, Berkeley to study how to denoise images.

The dataset we are studying is available at:

Data: https://github.com/BIDS/BSDS500

This dataset consists of 500 natural images.

We plan to answer the following questions:
1. How can we remove blind noise from images?
2. How can we boost picture processing efficiency?
3. How can we enhance photos taken in poor light?
4. How do we restore old photos/documents?

### Any changes:
The scope has not changed since the check-in proposal slides.

Real-world photographs contain noise. It can arise for a variety of reasons, including unstable electrical signals, faulty camera sensors, dim illumination, and data lost during long-distance transmission. Noise sometimes replaces the original pixel values with random values, which reduces the quality of the acquired image and causes information loss. It is therefore necessary to remove noise from images in low-level vision tasks and image processing.


### Data:
For our problem we need clean images, so we first collected high-quality images from several sources:
1. Smartphone Image Denoising Dataset (SIDD)
2. Real Low-Light Image Noise Reduction Dataset (RENOIR)
3. Natural Image Noise Dataset (NIND)
4. University of California, Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500)

We gathered four distinct datasets. The BSDS500 dataset contains 500 natural photos, all of good quality. We divided the 500 photos into 400 training images and 100 test images. We then split each image into small patches, because training on patches rather than on whole images improves the model's denoising performance.
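The 400/100 split described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in the project: the file names, the helper `split_filenames`, and the fixed seed are all assumptions made for the example.

```python
import numpy as np

def split_filenames(filenames, n_test=100, seed=0):
    """Shuffle file names reproducibly and hold out n_test of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(filenames))
    shuffled = [filenames[i] for i in order]
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file names standing in for the 500 BSDS500 images.
all_files = [f"image_{i:03d}.jpg" for i in range(500)]
train_files, test_files = split_filenames(all_files)
print(len(train_files), len(test_files))  # 400 100
```

Fixing the seed keeps the split identical between runs, so that patches created later always come from the same training and test images.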

The images in the dataset also differ in size, so we resize them to a common size. From each image we create patches of 40×40 pixels with a stride of 40, at various crop sizes.
With 500 images in the dataset, this gives around 85,000 patches for training and 21,000 patches for testing.
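To make the patch arithmetic concrete, the sketch below counts how many full patches fit into one image for a given patch size and stride. The helper `patches_per_image` is hypothetical, and the 321×481 image size is an assumption used only for illustration.

```python
def patches_per_image(height, width, patch=40, stride=40):
    """Number of full patch x patch windows that fit at the given stride."""
    if height < patch or width < patch:
        return 0
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows * cols

# 321 x 481 is an assumed image size, used only for illustration.
per_image = patches_per_image(321, 481)
print(per_image)        # 8 rows x 12 columns = 96
print(400 * per_image)  # 38400 patches from 400 training images at one crop size
```

Under these assumptions a single crop size yields fewer patches than the roughly 85,000 quoted above, so the remaining patches presumably come from the additional crop sizes.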


### Exploratory data analysis:

In [None]:
import os
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from google.colab.patches import cv2_imshow as imshow
import random
import numpy as np
import math
import matplotlib.image as mpimg
import glob
from patchify import patchify, unpatchify
from google.colab import drive
from sklearn.model_selection import train_test_split
import pickle
from tqdm import tqdm
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler,ReduceLROnPlateau
from tensorflow.keras import models, layers
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, BatchNormalization, Activation, Flatten, Dense, Input, MaxPooling2D, Add, Reshape, concatenate, AveragePooling2D, Multiply, GlobalAveragePooling2D, UpSampling2D, MaxPool2D,Softmax
from tensorflow.keras.activations import softmax
from tensorflow.keras import initializers, regularizers
from tensorflow.keras.optimizers import Adam
import skimage.color
import skimage.io
import imageio as iio
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim
from PIL import Image


In [None]:
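def add_gaussian_blur(data):
    # 35x35 kernel with sigmaX=4. The third positional argument of
    # cv2.GaussianBlur is sigmaX, not a border flag; cv2.BORDER_DEFAULT equals 4.
    dst = cv2.GaussianBlur(data, (35, 35), 4)
    return dst

def add_gaussian_noise(data):
    mean = (10, 10, 10)
    std = (50, 50, 50)
    # Draw the noise in floating point, then clip to [0, 255] so that negative
    # samples and sums above 255 do not wrap around in uint8.
    noise = np.random.normal(mean, std, data.shape)
    noisy = np.clip(data.astype(np.float32) + noise, 0, 255)
    return noisy.astype(np.uint8)

def patches(img,patch_size):
    # Non-overlapping square patches: the step equals the patch size.
    return patchify(img, (patch_size, patch_size, 3), step=patch_size)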

# Adds salt-and-pepper noise: each pixel becomes black with probability p/2,
# white with probability p/2, and is left unchanged otherwise.
def add_salt_pepper_noise(data, p=0.05):
    rows, columns, channels = data.shape
    output = np.zeros(data.shape, np.uint8)
    for i in range(rows):
        for j in range(columns):
            r = np.random.random()
            if r < p/2:
                output[i][j] = [0, 0, 0]
            elif r < p:
                output[i][j] = [255, 255, 255]
            else:
                output[i][j] = data[i][j]
    return output

def load_images_from_folder(folder):
    images = []
    sizes = []
    fname = []
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, filename))
        # cv2.imread returns None for files it cannot decode; skip those so
        # that the three lists stay aligned.
        if img is not None:
            fname.append(filename)
            images.append(img)
            sizes.append(img.shape)
    return fname, images, sizes
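The per-pixel loop in `add_salt_pepper_noise` can be slow on full-size images. The following is a minimal vectorized sketch of the same idea, assuming a single random draw per pixel is acceptable. The function name and the `seed` parameter are hypothetical and not part of the notebook.

```python
import numpy as np

def add_salt_pepper_noise_vectorized(data, p=0.05, seed=None):
    """Set about p/2 of the pixels to black and about p/2 to white."""
    rng = np.random.default_rng(seed)
    r = rng.random(data.shape[:2])          # one random draw per pixel
    output = data.copy()
    output[r < p / 2] = 0                   # pepper: all channels set to 0
    output[(r >= p / 2) & (r < p)] = 255    # salt: all channels set to 255
    return output

# Synthetic mid-gray image used only for illustration.
demo = np.full((100, 100, 3), 128, dtype=np.uint8)
noisy = add_salt_pepper_noise_vectorized(demo, p=0.05, seed=0)
changed = np.any(noisy != demo, axis=-1).mean()
print(f"fraction of pixels changed: {changed:.3f}")
```

The fraction of changed pixels should be close to `p`, which gives a quick sanity check on the noise level.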

In [None]:
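f, k, s = load_images_from_folder('/content/train/')
# Keep the first 200 images so that every column has the same length.
f, k, s = f[:200], k[:200], s[:200]
df = pd.DataFrame()
df['filename'] = f
df['ground_truth'] = k

# Build the noisy counterpart of each image: blur first, then add Gaussian noise.
noised_data = []
for i in k:
    g = add_gaussian_blur(i)
    n = add_gaussian_noise(g)
    noised_data.append(n)

df['noised_images'] = noised_data
df['size'] = s
# Inspect the noisy version of the fourth image (row 3, column 2).
df.iloc[3, 2]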

In [None]:
# Count how many images share each (height, width, channels) shape.
size_counts = df['size'].astype(str).value_counts()
print("x", size_counts.index.tolist())
print("y", size_counts.tolist())
fig = plt.figure(figsize=(8, 4))
plt.bar(size_counts.index, size_counts.values)
plt.title("Images vs Size")
plt.xlabel("Size of images")
plt.ylabel("No. of images")

In [None]:
# Images were loaded with cv2.imread (BGR order), so convert to RGB for matplotlib.
fig, axes = plt.subplots(1,2,figsize=(16, 16))
axes[0].imshow(cv2.cvtColor(df.iloc[3, 1], cv2.COLOR_BGR2RGB))
axes[0].set_title('Ground Truth Image')
axes[1].imshow(cv2.cvtColor(df.iloc[3, 2], cv2.COLOR_BGR2RGB))
axes[1].set_title('Noisy Image')
plt.show()

In [None]:
#Creating patches for Ground Truth Image
random_gt_path = df.iloc[3, 0]
img = cv2.imread('/content/train/'+random_gt_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print('Image shape: {}'.format(img.shape))
patches_gt = patches(img,50)
print('Patch shape: {}'.format(patches_gt.shape))

In [None]:
#Creating patches for a Noisy Image
img = df.iloc[3, 2]
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print('Image shape: {}'.format(img.shape))

patches_nsy = patches(img,50)
print('Patch shape: {}'.format(patches_nsy.shape))
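To show what non-overlapping patching does to the array shape, here is a minimal NumPy-only sketch. It is not the `patchify` implementation: the helper `extract_patches` is hypothetical, and it omits the extra length-one axis that the notebook's `[0]` indexing suggests is present in the `patchify` output. The 321×481 image size is also an assumption.

```python
import numpy as np

def extract_patches(img, patch_size):
    """Cut an H x W x C image into non-overlapping square patches.

    Returns an array of shape (rows, cols, patch_size, patch_size, C).
    Pixels at the right and bottom edges that do not fill a whole patch
    are dropped.
    """
    h, w, c = img.shape
    rows, cols = h // patch_size, w // patch_size
    cropped = img[:rows * patch_size, :cols * patch_size]
    grid = cropped.reshape(rows, patch_size, cols, patch_size, c)
    return grid.transpose(0, 2, 1, 3, 4)

# Synthetic image with an assumed 321 x 481 size, for illustration only.
demo = np.arange(321 * 481 * 3, dtype=np.int64).reshape(321, 481, 3)
demo_patches = extract_patches(demo, 50)
print(demo_patches.shape)  # (6, 9, 50, 50, 3)
```

With a patch size of 50, the last 21 rows and 31 columns of this image do not fit into a full patch, which is why the grid is 6×9.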

In [None]:
rows = patches_nsy.shape[0]
cols = patches_nsy.shape[1]
fig, axs = plt.subplots(rows,cols,figsize=(20,10))
for i in range(rows):
    for j in range(cols):
        axs[i][j].imshow(patches_gt[i][j][0])
        axs[i][j].get_xaxis().set_visible(False)
        axs[i][j].get_yaxis().set_visible(False)

In [None]:
fig, axs = plt.subplots(1,5,figsize=(20,10))
r = random.sample(range(0, rows), 5)
c = random.sample(range(0, cols), 5)
fig.suptitle('Train Image Patches',fontweight ="bold")
for i in range(5):
    axs[i].imshow(patches_gt[r[i]][c[i]][0])
    axs[i].set_title('Ground Truth Image Patches')

In [None]:
fig, axs = plt.subplots(1,5,figsize=(20,10))
for i in range(5):
    axs[i].imshow(patches_nsy[r[i]][c[i]][0])
    axs[i].set_title('Noisy Image Patches')

In [None]:
mean_red_gt = []
mean_blue_gt = []
mean_green_gt = []
mean_red_nsy = []
mean_blue_nsy = []
mean_green_nsy = []
for i in df['ground_truth']:
    img = i
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    mean_red_gt.append(np.mean(img[:,:,0]))
    mean_green_gt.append(np.mean(img[:,:,1]))
    mean_blue_gt.append(np.mean(img[:,:,2]))

for j in df['noised_images']:
    img = j
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    mean_red_nsy.append(np.mean(img[:,:,0]))
    mean_green_nsy.append(np.mean(img[:,:,1]))
    mean_blue_nsy.append(np.mean(img[:,:,2]))
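The three per-channel means collected in the loops above can also be computed in a single call. This is a small sketch on a synthetic image; the helper name `channel_means` is hypothetical.

```python
import numpy as np

def channel_means(img_rgb):
    """Mean intensity of each channel of an H x W x C image."""
    return img_rgb.reshape(-1, img_rgb.shape[-1]).mean(axis=0)

# Synthetic image with a constant value per channel, for illustration only.
demo = np.zeros((4, 4, 3), dtype=np.uint8)
demo[..., 0] = 200  # red
demo[..., 1] = 100  # green
demo[..., 2] = 50   # blue
print(channel_means(demo))  # means of 200, 100 and 50 for R, G and B
```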

In [None]:
red_gt = pd.DataFrame()
green_gt = pd.DataFrame()
blue_gt = pd.DataFrame()
red_nsy = pd.DataFrame()
green_nsy = pd.DataFrame()
blue_nsy = pd.DataFrame()

red_gt['Mean Pixel on Ground Truth Images'] = mean_red_gt
red_gt['channel'] = 'red'
red_nsy['Mean Pixel on  Noisy Images'] = mean_red_nsy
red_nsy['channel'] = 'red'

green_gt['Mean Pixel on Ground Truth Images'] = mean_green_gt
green_gt['channel'] = 'green'
green_nsy['Mean Pixel on  Noisy Images'] = mean_green_nsy
green_nsy['channel'] = 'green'

blue_gt['Mean Pixel on Ground Truth Images'] = mean_blue_gt
blue_gt['channel'] = 'blue'
blue_nsy['Mean Pixel on  Noisy Images'] = mean_blue_nsy
blue_nsy['channel'] = 'blue'

concat_gt = pd.concat([red_gt,green_gt,blue_gt],ignore_index=True)
concat_nsy = pd.concat([red_nsy,green_nsy,blue_nsy],ignore_index=True)

In [None]:
color = {'color': ['r', 'g', 'b']}
# FacetGrid takes `height` (formerly `size`); kdeplot replaces the deprecated distplot(hist=False).
sns.FacetGrid(concat_gt,hue='channel',height=5,hue_kws=color).map(sns.kdeplot,'Mean Pixel on Ground Truth Images').add_legend()

In [None]:
sns.FacetGrid(concat_nsy,hue='channel',height=5,hue_kws=color).map(sns.kdeplot,'Mean Pixel on  Noisy Images').add_legend()

In [None]:
fig, axes = plt.subplots(3,2,figsize=(16, 16))
fig.suptitle("Ground Truth Images", fontsize = 'x-large' , fontweight = 'bold' )
# One row per channel: histogram on the left, kernel density estimate on the right.
for row, (means, c) in enumerate([(mean_red_gt, 'r'), (mean_green_gt, 'g'), (mean_blue_gt, 'b')]):
    sns.histplot(means, ax=axes[row][0], color=c)
    sns.kdeplot(means, ax=axes[row][1], color=c)
    axes[row][0].set_xlabel('Mean Pixels')
    axes[row][1].set_xlabel('Mean Pixels')

In [None]:
fig, axes = plt.subplots(3,2,figsize=(16, 16))
fig.suptitle("Noisy Images", fontsize = 'x-large' , fontweight = 'bold' )
# One row per channel: histogram on the left, kernel density estimate on the right.
for row, (means, c) in enumerate([(mean_red_nsy, 'r'), (mean_green_nsy, 'g'), (mean_blue_nsy, 'b')]):
    sns.histplot(means, ax=axes[row][0], color=c)
    sns.kdeplot(means, ax=axes[row][1], color=c)
    axes[row][0].set_xlabel('Mean Pixels')
    axes[row][1].set_xlabel('Mean Pixels')

In [None]:
sample_ground_truth=[df.iloc[9, 1],df.iloc[99, 1],df.iloc[198, 1]]
sample_noisy=[df.iloc[9, 2],df.iloc[99, 2],df.iloc[198, 2]]
fig, axes = plt.subplots(len(sample_ground_truth),3,figsize=(20, 20))
for i in range(len(sample_ground_truth)):
    img = sample_ground_truth[i]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    resized_img = cv2.resize(img,(512,512))
    axes[i][0].imshow(resized_img)
    axes[i][0].set_title('Ground Truth Image')
    axes[i][1].plot(cv2.calcHist([img],[0],None,[256],[0,256]),color='r')
    axes[i][1].plot(cv2.calcHist([img],[1],None,[256],[0,256]),color='g')
    axes[i][1].plot(cv2.calcHist([img],[2],None,[256],[0,256]),color='b')
    axes[i][1].set_title('Ground Truth Image Histogram')

    img = sample_noisy[i]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    axes[i][2].plot(cv2.calcHist([img],[0],None,[256],[0,256]),color='r')
    axes[i][2].plot(cv2.calcHist([img],[1],None,[256],[0,256]),color='g')
    axes[i][2].plot(cv2.calcHist([img],[2],None,[256],[0,256]),color='b')
    axes[i][2].set_title('Noisy Image Histogram')

In [None]:
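SSIM = []
PSNR = []
for i in tqdm(range(len(df))):
    # Scale both images to [0, 1] so that data_range=1.0 applies to both metrics.
    img1 = df.iloc[i, 1].astype("float32") / 255.0
    img2 = df.iloc[i, 2].astype("float32") / 255.0
    # channel_axis=-1 marks the last axis as the color channel
    # (it replaces the older multichannel=True argument).
    SSIM.append(ssim(img1, img2, channel_axis=-1, data_range=1.0))
    PSNR.append(psnr(img1, img2, data_range=1.0))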
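For reference, PSNR follows directly from the mean squared error: PSNR = 10 · log10(data_range² / MSE). The sketch below computes it with plain NumPy for images scaled to [0, 1]. The helper `psnr_from_mse` is hypothetical and is meant as a cross-check of the idea, not as a replacement for the scikit-image function used in the notebook.

```python
import numpy as np

def psnr_from_mse(reference, test, data_range=1.0):
    """PSNR in dB: 10 * log10(data_range**2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Synthetic example: a constant offset of 0.1 gives MSE = 0.01.
clean = np.zeros((8, 8, 3))
shifted = clean + 0.1
print(psnr_from_mse(clean, shifted))  # about 20 dB
```

Higher PSNR means the noisy image is closer to the ground truth, so the distribution of PSNR values gives a rough picture of how severe the added noise is across the dataset.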

In [None]:
ax = sns.displot(PSNR,kind='kde')
ax.set(xlabel='PSNR', ylabel='Density')
ax = sns.displot(PSNR)
ax.set(xlabel='PSNR', ylabel='Count')

In [None]:
ax = sns.displot(SSIM,kind='kde')
ax.set(xlabel='SSIM', ylabel='Density')
ax = sns.displot(SSIM)
ax.set(xlabel='SSIM', ylabel='Count')

### Visualization:

### ML analysis:

### Reflection:

The most challenging part of the project was collecting the dataset, because we initially found images in different datasets from various sources. Some datasets contained images taken with mobile phones, while others contained images taken with DSLR cameras and had only a limited number of images. In order to get higher accuracy we chose BSDS500, as it consists of 500 images.
In the preliminary analysis we observed that we need more images to get better accuracy on our noisy images. Also, when the level of noise is too high, the model fails to provide good results.

Are there any concrete results you can show at this point? If not, why not?


Our biggest problems continue to be hardware and figuring out appropriate algorithms for the analyses.
We believe we are on track to finish this project by the end of the semester, and we plan to build a web application for the model, so the project is definitely worth proceeding with.

Given your initial exploration of the data, is it worth proceeding with your project, why? If not, how will you move forward (method, data etc)?



### Next steps:
Going forward, we're looking to denoise the images with higher accuracy. Ideally, we're planning on completing one or two a week, which should allow us to hit the deadline.