## Image Quality using OpenCV

#### Reference for image quality definition

**Image quality was measured based on the following question: "How is the image quality/resolution?". The responses were recoded as "High" (1), "Medium" (2), "Low" (3), and "Poor" (4). Image quality has been considered a useful measure of overall engagement in terms of the number of comments and likes [4]. Following recent studies, we treated image quality as a discrete variable.**

There are two types of image quality assessment:

### **1. Reference-based image quality**

Available metrics: Mean squared error (MSE), Root mean squared error (RMSE), Peak signal-to-noise ratio (PSNR), Structural similarity index (SSIM)
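
For orientation, here is a minimal sketch of the first three reference-based metrics, using their standard definitions. It works on plain nested lists so it has no dependencies; the function names and the tiny 2x2 "images" are made up for illustration and are not part of this notebook's pipeline.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized grayscale images (nested lists)."""
    squared_diffs = [
        (r - d) ** 2
        for ref_row, dist_row in zip(reference, distorted)
        for r, d in zip(ref_row, dist_row)
    ]
    return sum(squared_diffs) / len(squared_diffs)


def rmse(reference, distorted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(reference, distorted))


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


# Made-up pixel values, for illustration only
reference = [[10, 20], [30, 40]]
distorted = [[12, 18], [30, 44]]

print(mse(reference, distorted), rmse(reference, distorted), psnr(reference, distorted))
```

All three need an undistorted reference image, which is not available for scraped Instagram posts; that is why the rest of the notebook relies on no-reference metrics.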

### **2. Non-reference-based image quality** (Focus of this script)

Available metrics:

#### (1) Sharpness

Uses the difference of differences in grayscale values of a median-filtered image ($\Delta$DoM) as an indicator of edge sharpness

Reference: 

Kumar, J., Chen, F., & Doermann, D. (2012, November). Sharpness estimation for document and scene images. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012) (pp. 3292-3295). IEEE.

Used library GitHub Page: https://github.com/umang-singhal/pydom

**Note: 0 <= Sharpness Score <= sqrt(2) (~1.414).** The higher the sharpness score, the better the image quality.
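
To give a feel for the "difference of differences" idea, the sketch below applies it to a single row of pixel values. It is a simplified illustration based on my reading of Kumar et al., not the code used by `pydom`: it skips the median filtering, the thresholding, and the aggregation over the image, and the function name `edge_sharpness` and the toy signals are assumptions.

```python
def edge_sharpness(signal, center, width=2):
    """Ratio of summed |difference of differences| to summed |first differences|
    in a window of +/- `width` pixels around `center` (1-D, no median filter)."""
    window = range(center - width, center + width + 1)
    # Difference of differences with a step of 2: [I(k+2) - I(k)] - [I(k) - I(k-2)]
    numerator = sum(abs(signal[k + 2] - 2 * signal[k] + signal[k - 2]) for k in window)
    # Local contrast: sum of absolute first differences over the same window
    denominator = sum(abs(signal[k] - signal[k - 1]) for k in window)
    return numerator / denominator if denominator else 0.0


# Toy rows of grayscale values: an abrupt step versus a gradual ramp
sharp_edge = [0] * 8 + [100] * 8
blurred_edge = [0] * 6 + [20, 40, 60, 80] + [100] * 6

sharp_score = edge_sharpness(sharp_edge, center=8)
blurred_score = edge_sharpness(blurred_edge, center=8)
print(sharp_score, blurred_score)
```

The abrupt step yields the larger ratio, which matches the intuition that a higher score means a sharper edge.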

#### (2) Blind/referenceless image spatial quality evaluator (BRISQUE)

The BRISQUE score is computed using a support vector regression (SVR) model trained on an image database with corresponding differential mean opinion score (DMOS) values.

Reference:

Mittal, A., Moorthy, A. K., & Bovik, A. C. (2012). No-reference image quality assessment in the spatial domain. IEEE Transactions on image processing, 21(12), 4695-4708.

Used library GitHub Page: https://github.com/ocampor/image-quality

Alternative library GitHub Page: https://github.com/rehanguha/brisque

**Note:** The higher the BRISQUE score, the WORSE the image quality

### Setup

In [1]:
# OpenCV library
!pip install opencv-contrib-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
# For calculating sharpness
!pip install git+https://github.com/umang-singhal/pydom.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/umang-singhal/pydom.git
  Cloning https://github.com/umang-singhal/pydom.git to /tmp/pip-req-build-tl57pwre
  Running command git clone --filter=blob:none --quiet https://github.com/umang-singhal/pydom.git /tmp/pip-req-build-tl57pwre
  Resolved https://github.com/umang-singhal/pydom.git to commit 2554af8d08a80658539f002eae58ece89cbcc6d4
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pydom
  Building wheel for pydom (setup.py) ... done
  Created wheel for pydom: filename=pydom-0.1-py3-none-any.whl size=18001 sha256=fc1512e414cd71c79c20686663ed0bd276e2272116100e4aaebaa36f9bd41774
  Stored in directory: /tmp/pip-ephem-wheel-cache-hkhz_7fl/wheels/6f/9c/11/81bbf3cf51629251092cf22fbd204299795c73b2c15b9b6d25
Successfully built pydom
Installing collected packages: pydom
Successfully installed pydom-0.1


In [3]:
# For calculating BRISQUE score
!pip install image-quality
!pip install brisque

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting image-quality
  Downloading image_quality-1.2.7-py3-none-any.whl (146 kB)
     146.6/146.6 KB 4.7 MB/s eta 0:00:00
Collecting libsvm>=3.23.0
  Downloading libsvm-3.23.0.4.tar.gz (170 kB)
     170.6/170.6 KB 16.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: libsvm
  Building wheel for libsvm (setup.py) ... done
  Created wheel for libsvm: filename=libsvm-3.23.0.4-cp39-cp39-linux_x86_64.whl size=253862 sha256=66307854b46a1cfe195b4c27e5b08f4c60205c96d89dcf7dd5638ade20b654aa
  Stored in directory: /root/.cache/pip/wheels/c1/ce/25/0d50035499973fcbcc407fcb897d53e47b6eb4601308789aa6
Successfully built libsvm
Installing collected packages: libsvm, 

In [4]:
!pip install parmap

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting parmap
  Downloading parmap-1.6.0-py2.py3-none-any.whl (12 kB)
Installing collected packages: parmap
Successfully installed parmap-1.6.0


In [5]:
# Standard library
import glob
import math
import multiprocessing
import os
import random
import warnings
from collections import Counter
from multiprocessing import Process
from time import sleep

# Third-party
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import parmap
import PIL.Image
from brisque import BRISQUE            # alternative BRISQUE implementation (class)
from dom import DOM                    # pydom sharpness estimator
import imquality.brisque as brisque    # image-quality BRISQUE implementation (module)

# Only needed when running this code in a Jupyter notebook
%matplotlib inline
warnings.filterwarnings('ignore')

In [6]:
from joblib import Parallel, delayed
from os import listdir
from os.path import isfile, join
import shutil
#onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

In [7]:
# For parallel purpose
cpu_count = multiprocessing.cpu_count()

In [12]:
# Read in the CSV that has all post metadata
ins_profile_data = pd.read_csv('/content/new.csv')

In [14]:
len(ins_profile_data)

53013

In [None]:
# Read in posts that were not scraped (possibly because the post was deleted or is no longer available)
final_not_scraped_post = pd.read_csv('../../SA_Instagram/phantom_collection/not_scraped_URL_6th.csv')

In [None]:
final_not_scraped_post

Unnamed: 0,postUrl
0,https://www.instagram.com/p/CX1pDFMvcTu/
1,https://www.instagram.com/p/CXedQjPFwBm/
2,https://www.instagram.com/p/Ce7P3VBIhKu/
3,https://www.instagram.com/p/CeXThkZIh2d/
4,https://www.instagram.com/p/ClRRu8Qt09u/
5,https://www.instagram.com/p/Cj5jUaVLYXF/
6,https://www.instagram.com/p/CjnmjugL5yn/
7,https://www.instagram.com/p/CiSnA7IBmpS/
8,https://www.instagram.com/p/CiBAKUGhvT8/
9,https://www.instagram.com/p/ChpbUmtBGoB/


In [None]:
def get_image_name(imgUrl):
    '''Get the image name of the post based on imgUrl'''
    image_name = imgUrl.split('/')[-1].split('?')[0]
    return image_name
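
As a quick illustration of what `get_image_name` returns, the sketch below runs it on a made-up URL (the host, path, and query string are hypothetical, not taken from the dataset). It also shows a variant built on `urllib.parse.urlsplit`, which I would expect to be a bit more robust when the query string itself contains a `/`; the name `get_image_name_urlsplit` is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def get_image_name(imgUrl):
    '''Get the image name of the post based on imgUrl'''
    return imgUrl.split('/')[-1].split('?')[0]


def get_image_name_urlsplit(imgUrl):
    '''Hypothetical variant: take the last component of the URL path only'''
    return posixpath.basename(urlsplit(imgUrl).path)


# Hypothetical URLs, for illustration only
example_url = 'https://cdn.example.com/v/t51/12345_67890_n.jpg?stp=dst-jpg&_nc_cat=1'
tricky_url = 'https://cdn.example.com/v/t51/12345_67890_n.jpg?next=/x/y'

print(get_image_name(example_url))           # 12345_67890_n.jpg
print(get_image_name_urlsplit(example_url))  # 12345_67890_n.jpg
print(get_image_name(tricky_url))            # y
print(get_image_name_urlsplit(tricky_url))   # 12345_67890_n.jpg
```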

In [15]:
ins_profile_data['image_name']

0        100947353_709902596450712_6272631665784288093_...
1        101005572_246374363479652_2970192058962596638_...
2        101434842_2669739969937705_4387298007395419894...
3        102552647_112678700284390_2273291108529974386_...
4        103091894_620685288805037_5175208616444185610_...
                               ...                        
53008    95982167_136347404659412_2947517883731741745_n...
53009    96512853_155809976000943_8076515679898206064_n...
53010    96535561_239436547291755_7611420616627866921_n...
53011    96574629_282240836242812_5998586339255429370_n...
53012    70057059_2197125367065655_6964911436064038025_...
Name: image_name, Length: 53013, dtype: object

In [19]:
ins_profile_data['image_name'][0]

'100947353_709902596450712_6272631665784288093_n.jpg'

In [None]:
# Remove posts that were not scraped
ins_profile_data = ins_profile_data[~ins_profile_data['postUrl'].isin(final_not_scraped_post['postUrl'])]

In [None]:
len(ins_profile_data)

53005

In [16]:
ins_profile_data.columns

Index(['index', 'image_name', 'image_storage_URL', 'ocr_text',
       'ocr_text_bounding_box', 'has_text_ocr', 'image_tags', 'has_text_tag',
       'unique_tags', 'description_text', 'description_confidence',
       'category_name', 'category_score0'],
      dtype='object')

### Create a dataframe that lists all images and their associated accounts

In [None]:
image_df = ins_profile_data[['postUrl', 'imgUrl', 'username','image_name']]
image_df.head()

Unnamed: 0,postUrl,imgUrl,username,image_name
0,https://www.instagram.com/p/CiwFx5VreXl/,https://scontent-lga3-1.cdninstagram.com/v/t51...,calgarycasa,307691913_488615476157775_1769517694690430621_...
1,https://www.instagram.com/p/Ck9o6B6J2sW/,https://scontent-lga3-1.cdninstagram.com/v/t51...,calgarycasa,315433945_654113342886787_3268897495009490845_...
2,https://www.instagram.com/p/ClH56Q4pIQe/,https://scontent-lga3-1.cdninstagram.com/v/t51...,calgarycasa,316001466_1195597701364466_2346899986306992260...
3,https://www.instagram.com/p/ClMvfksvfv8/,https://scontent-lga3-1.cdninstagram.com/v/t51...,calgarycasa,315214169_525785036070377_1929267117663433673_...
4,https://www.instagram.com/p/ClKiKxSJpJj/,https://scontent-lga3-1.cdninstagram.com/v/t51...,calgarycasa,315995484_8292977360775576_5729175343490777045...


In [None]:
# image_df = pd.DataFrame()
# # Create a dataframe that list all files under each account folder
# for i in range(0, len(account_folder)):
#     # Locate the user (Change the folder name if needed)
#     user_image_df = pd.DataFrame()
#     username = account_folder[i]
#     user_folder = f'../../SA_Instagram/Images/profile_post_img/{username}/'
#     # List all images of the account
#     all_user_image = [f for f in listdir(user_folder) if isfile(join(user_folder, f))]
#     user_image_df['image_name'] = all_user_image
#     user_image_df['username'] = username
#     image_df = pd.concat([image_df, user_image_df], ignore_index= True)

### Randomly select 2 images from each account to check image quality (75 accounts, 150 images in total)

In [None]:
random_image_df = image_df.groupby('username').sample(n=2, random_state= 2333).reset_index(drop=True)
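
The per-account sampling above can be tried on a toy dataframe. This is a small sketch with made-up usernames and file names; it only demonstrates that `groupby(...).sample(n=2, random_state=...)` returns exactly two distinct rows per group and is reproducible for a fixed seed.

```python
import pandas as pd

# Toy data, for illustration only: two accounts with 3 and 4 images
toy_df = pd.DataFrame({
    'username': ['a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'image_name': [f'img_{i}.jpg' for i in range(7)],
})

# Same call pattern as in the notebook: 2 rows per account, fixed seed
sampled = toy_df.groupby('username').sample(n=2, random_state=2333).reset_index(drop=True)
sampled_again = toy_df.groupby('username').sample(n=2, random_state=2333).reset_index(drop=True)

counts = sampled['username'].value_counts()
print(sampled)
```

To my knowledge, sampling without replacement raises a `ValueError` when a group has fewer than `n` rows, so an account with a single image would need to be handled separately.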

In [None]:
# image_folder = f'../../SA_Instagram/Images/profile_post_img/'

# # Get all the account name associated with the folder
# account_folder = [name for name in os.listdir(image_folder) if os.path.isdir(os.path.join(image_folder, name))]
# account_folder

# os.listdir(f'../../SA_Instagram/Images/profile_post_img/adsumforwomen/')

# random_image_df = pd.DataFrame()
# # Create a dataframe that list all files under each account folder
# for i in range(0, len(account_folder)):
#     # Locate the user (Change the folder name if needed)
#     user_image_df = pd.DataFrame()
#     username = account_folder[i]
#     user_folder = f'../../SA_Instagram/Images/profile_post_img/{username}/'
#     # Randomly choose 2 images from each account without repetition
#     random_user_image = random.sample(os.listdir(f'../../SA_Instagram/Images/profile_post_img/{username}/'), 2)
#     #all_user_image = [f for f in listdir(user_folder) if isfile(join(user_folder, f))]
#     user_image_df['image_name'] = random_user_image
#     user_image_df['username'] = username
#     random_image_df = pd.concat([random_image_df, user_image_df], ignore_index= True)

In [None]:
random_image_df.head()

Unnamed: 0,postUrl,imgUrl,username,image_name
0,https://www.instagram.com/p/Bw7VMDVH82Q/,https://scontent-lga3-1.cdninstagram.com/v/t51...,adsumforwomen,57606673_496046250931971_103575814176400476_n.jpg
1,https://www.instagram.com/p/B1M_y6HnAVN/,https://scontent-lga3-1.cdninstagram.com/v/t51...,adsumforwomen,69126123_176761633361055_5634377825757748469_n...
2,https://www.instagram.com/p/CLNcm5dhioK/,https://scontent-lga3-1.cdninstagram.com/v/t51...,amelia.rising.svsc,148714075_489555962209107_6503052471705524818_...
3,https://www.instagram.com/p/CLE6LiyhXu3/,https://scontent-lga3-1.cdninstagram.com/v/t51...,amelia.rising.svsc,147439471_136979898276801_4057767448963803683_...
4,https://www.instagram.com/p/CjB53hRJEV2/,https://scontent-iad3-2.cdninstagram.com/v/t51...,anndavissociety,309403210_197918892656899_6247647267140408121_...


In [None]:
random_image_df[random_image_df.image_name.str.contains('webp')]

Unnamed: 0,postUrl,imgUrl,username,image_name
4,https://www.instagram.com/p/CjB53hRJEV2/,https://scontent-iad3-2.cdninstagram.com/v/t51...,anndavissociety,309403210_197918892656899_6247647267140408121_...
80,https://www.instagram.com/p/Cjq5D85LOzt/,https://scontent-yyz1-1.cdninstagram.com/v/t51...,maisonhina,311430153_199403495779111_4479968023173172539_...
130,https://www.instagram.com/p/CfwgujDOjbW/,https://instagram.fhio3-1.fna.fbcdn.net/v/t51....,thirdplaceth,292325973_619230373159458_4818080377625759112_...


### Parallelize the sharpness and BRISQUE calculations

In [None]:
# To parallelize the sharpness and BRISQUE calculation, create file names in list format
random_image_df.image_name.iloc[0]

'57606673_496046250931971_103575814176400476_n.jpg'

In [None]:
# Get list of images for parallel process
image_location_list = [
    f'../../SA_Instagram/Images/profile_post_img/{username}/{image_name}'
    for username, image_name in zip(random_image_df.username, random_image_df.image_name)
]
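
Because the path list is built from the dataframe alone, a path can point at a file that was never downloaded. A minimal sketch of a pre-check, assuming the same `<root>/<username>/<image_name>` folder layout; the helper name `build_image_paths` and the temporary demo folder are hypothetical.

```python
import tempfile
from pathlib import Path


def build_image_paths(rows, image_root):
    """Split (username, image_name) pairs into paths that exist on disk and paths that do not."""
    existing, missing = [], []
    for username, image_name in rows:
        path = Path(image_root) / username / image_name
        (existing if path.is_file() else missing).append(str(path))
    return existing, missing


# Demo on a throwaway folder with one real (empty) file and one absent file
with tempfile.TemporaryDirectory() as root:
    (Path(root) / 'account_a').mkdir()
    (Path(root) / 'account_a' / 'one.jpg').write_bytes(b'')
    existing, missing = build_image_paths(
        [('account_a', 'one.jpg'), ('account_a', 'two.jpg')], root
    )

print(len(existing), len(missing))
```

With the notebook's dataframe, the rows could be passed as `zip(random_image_df.username, random_image_df.image_name)`.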

### Sharpness

In [20]:
def calculate_sharpness(image_location):
    '''Calculate the sharpness of the image'''
    iqa = DOM()
    
    sharpness_score = iqa.get_sharpness(image_location)
    return sharpness_score

In [None]:
# sharpness_score_list = []
# for i in range(0, len(image_location_list)):
#     sharpness_score = calculate_sharpness(image_location_list[i])
#     sharpness_score_list.append(sharpness_score)

In [None]:
sharpness_score_result = Parallel(n_jobs=cpu_count)(delayed(calculate_sharpness)(i) for i in image_location_list)
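
With a plain `Parallel` call, a single image that cannot be scored (for example a missing file or an unreadable format) raises an exception for the whole batch. One possible guard is a small wrapper that returns NaN instead. This is a sketch only: `safe_score` and `fake_sharpness` are hypothetical names, and `fake_sharpness` merely stands in for `calculate_sharpness` so the example runs without any image files.

```python
import math


def safe_score(score_func, image_location):
    """Return score_func(image_location), or NaN if scoring raises an exception."""
    try:
        return score_func(image_location)
    except Exception as error:
        print(f'Skipping {image_location}: {error}')
        return math.nan


def fake_sharpness(image_location):
    """Stand-in scorer for the demo: pretends .webp files cannot be read."""
    if image_location.endswith('.webp'):
        raise ValueError('unsupported format')
    return 1.0


results = [safe_score(fake_sharpness, path) for path in ['a.jpg', 'b.webp']]
print(results)
```

In the notebook, the wrapper could be combined with joblib as `Parallel(n_jobs=cpu_count)(delayed(safe_score)(calculate_sharpness, p) for p in image_location_list)`, which keeps the result list aligned with the dataframe rows.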

In [None]:
random_image_df['sharpness'] = sharpness_score_result

### BRISQUE

In [17]:
def calculate_BRISQUE(image_location):
    '''Calculate the BRISQUE score of the image'''
    img = PIL.Image.open(image_location)
    brisque_score = brisque.score(img)
    return brisque_score

In [18]:
def calculate_BRISQUE_alt(image_location):
    '''Calculate the BRISQUE score of the image using another package'''
    img = cv2.imread(image_location)
    obj = BRISQUE(url = False)
    brisque_score_alt = obj.score(img)
    return brisque_score_alt

In [None]:
%%time
# demo
calculate_BRISQUE('../../SA_Instagram/Images/profile_post_img/adsumforwomen/57606673_496046250931971_103575814176400476_n.jpg')

Wall time: 7.51 s


15.791576122748296

In [None]:
%%time
calculate_BRISQUE_alt('../../SA_Instagram/Images/profile_post_img/adsumforwomen/57606673_496046250931971_103575814176400476_n.jpg')

Wall time: 946 ms


16.767552431506004

In [None]:
%%time
brisque_score_result = Parallel(n_jobs=cpu_count)(delayed(calculate_BRISQUE)(i) for i in image_location_list)

Wall time: 5min 5s


In [None]:
%%time
brisque_score_result_alt = Parallel(n_jobs=cpu_count)(delayed(calculate_BRISQUE_alt)(i) for i in image_location_list)

Wall time: 30.4 s


In [None]:
random_image_df['BRISQUE'] = brisque_score_result
random_image_df['BRISQUE_alt'] = brisque_score_result_alt

In [None]:
random_image_df.head()

Unnamed: 0,postUrl,imgUrl,username,image_name,sharpness,BRISQUE,BRISQUE_alt
0,https://www.instagram.com/p/Bw7VMDVH82Q/,https://scontent-lga3-1.cdninstagram.com/v/t51...,adsumforwomen,57606673_496046250931971_103575814176400476_n.jpg,0.974528,15.791576,16.767552
1,https://www.instagram.com/p/B1M_y6HnAVN/,https://scontent-lga3-1.cdninstagram.com/v/t51...,adsumforwomen,69126123_176761633361055_5634377825757748469_n...,1.047816,13.501934,13.261451
2,https://www.instagram.com/p/CLNcm5dhioK/,https://scontent-lga3-1.cdninstagram.com/v/t51...,amelia.rising.svsc,148714075_489555962209107_6503052471705524818_...,1.04166,28.173285,28.669672
3,https://www.instagram.com/p/CLE6LiyhXu3/,https://scontent-lga3-1.cdninstagram.com/v/t51...,amelia.rising.svsc,147439471_136979898276801_4057767448963803683_...,1.002779,52.629991,51.603742
4,https://www.instagram.com/p/CjB53hRJEV2/,https://scontent-iad3-2.cdninstagram.com/v/t51...,anndavissociety,309403210_197918892656899_6247647267140408121_...,1.108666,43.406689,45.91061
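
To put numbers on how closely the two BRISQUE implementations agree, the sketch below reuses the five scores shown in the table above and the wall times reported earlier. It covers only these five rows, so it is a rough sanity check rather than a proper comparison over all 150 images.

```python
# Scores copied from the first five rows of the output table
brisque_scores = [15.791576, 13.501934, 28.173285, 52.629991, 43.406689]
brisque_alt_scores = [16.767552, 13.261451, 28.669672, 51.603742, 45.91061]

# Absolute disagreement between the two libraries, per image
abs_diffs = [abs(a - b) for a, b in zip(brisque_scores, brisque_alt_scores)]
mean_abs_diff = sum(abs_diffs) / len(abs_diffs)
max_abs_diff = max(abs_diffs)

# Wall times reported by %%time: single image, then the full parallel batch
speedup_single = 7.51 / 0.946
speedup_batch = (5 * 60 + 5) / 30.4

print(f'mean |diff| = {mean_abs_diff:.2f}, max |diff| = {max_abs_diff:.2f}')
print(f'speedup: {speedup_single:.1f}x (single), {speedup_batch:.1f}x (batch)')
```

On these rows the two libraries differ by about one BRISQUE point on average, while the alternative package runs roughly 8 to 10 times faster.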


In [None]:
random_image_df.to_csv('../../SA_Instagram/data/image_quality_explore/image_quality_explore.csv', index=False)

### Copy the randomly selected images for reference

In [None]:
for filename in image_location_list:
    shutil.copy(filename, '../../SA_Instagram/data/image_quality_explore/')