### Introduction

This dataset contains 3,200+ images of different gemstones. The images are grouped into 87 classes and are already split into train and test sets. The images vary in size and are all in JPEG format.
I tried to include the gemstones in various shapes: round, oval, square, rectangle, and heart.

This dataset is composed of two folders:

train: This folder contains 87 subfolders and ~2,800 files in total. Each subfolder contains the JPEG images of one gemstone class.

test: This folder contains 87 subfolders and ~400 files in total. Each subfolder contains the JPEG images of one gemstone class.
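The folder layout described above (one subfolder per class inside each split) can be checked with a short script. This is a minimal sketch: `count_images` is a hypothetical helper, and the extension filter assumes the image files end in `.jpg` or `.jpeg`.

```python
import os
from collections import Counter


def count_images(split_dir, exts=(".jpg", ".jpeg")):
    """Count image files per class subfolder in one split (e.g. train/ or test/).

    Assumes the layout split_dir/<class_name>/<image files>.
    """
    counts = Counter()
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the split level
        counts[class_name] = sum(
            1 for f in os.listdir(class_dir) if f.lower().endswith(exts)
        )
    return counts
```

For example, `sum(count_images(".../gems/train").values())` should come out near the ~2,800 figure quoted above.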


### Import libraries

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
!pip install rembg
!pip install onnxruntime --upgrade

Collecting onnxruntime
  Downloading onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting coloredlogs (from onnxruntime)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Downloading onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (13.3 MB)
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Installing collected pac

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import os
from rembg import remove
import cv2
from random import randint


In [41]:
# Dataset directory (training split)
directory = '/content/drive/MyDrive/Mestrado/2024_2/Qualification/gems/train'

# Print all the gemstone categories present in the dataset (one subfolder per class)
Name = []
for file in os.listdir(directory):
    Name.append(file)
print("The gemstones in the dataset are \n")
print(Name)
print("\n The count of the gemstone categories: ", len(Name))

The gemstones in the dataset are 

['Diamond', 'Danburite', 'Fluorite', 'Emerald', 'Dumortierite', 'Diaspore']

 The count of the gemstone categories:  6
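`os.listdir` returns entries in an arbitrary, filesystem-dependent order, so the class-to-index mapping built from `Name` is not guaranteed to be the same across runs or machines. A minimal sketch of a deterministic alternative, assuming every class is a subfolder of the split directory (`list_classes` is a hypothetical helper):

```python
import os


def list_classes(split_dir):
    """Return the class subfolder names of a split, sorted alphabetically.

    Sorting makes the class-to-index mapping reproducible; skipping
    non-directories ignores stray files such as .DS_Store.
    """
    return sorted(
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )
```

With the six classes listed above, `list_classes(directory)` would return `['Danburite', 'Diamond', 'Diaspore', 'Dumortierite', 'Emerald', 'Fluorite']`.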


#### Map each gemstone category to an integer label and display the mapping. The full dataset has 87 gemstone classes; the subset loaded above contains 6.

In [42]:
# Forward (name -> index) and reverse (index -> name) class mappings
gems_map = {name: i for i, name in enumerate(Name)}
print(gems_map)
r_gems_map = {i: name for i, name in enumerate(Name)}

{'Diamond': 0, 'Danburite': 1, 'Fluorite': 2, 'Emerald': 3, 'Dumortierite': 4, 'Diaspore': 5}


In [43]:
img_w, img_h = 256, 256
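Because the source images vary in size, resizing them straight to 256x256 with `cv2.resize` stretches any non-square image. One option, not used in this notebook, is to pad each image to a square first so that the later resize preserves the aspect ratio. A minimal NumPy sketch; `pad_to_square` is a hypothetical helper and black (zero) padding is an assumption:

```python
import numpy as np


def pad_to_square(image, fill=0):
    """Pad an H x W (x C) image with `fill` so that height == width.

    The image is centred; the padding is split as evenly as possible
    between the two sides of the shorter dimension.
    """
    h, w = image.shape[:2]
    size = max(h, w)
    top = (size - h) // 2
    left = (size - w) // 2
    pad = [(top, size - h - top), (left, size - w - left)]
    pad += [(0, 0)] * (image.ndim - 2)  # do not pad the channel axis
    return np.pad(image, pad, mode="constant", constant_values=fill)
```

The padded image can then be passed to `cv2.resize(image, (img_w, img_h))` exactly as before.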


#### Create functions to read the training images (removing their backgrounds with rembg) and their labels.

In [45]:
import os
import cv2
import numpy as np
from rembg import remove

# Reads every image under `path`, removes its background with rembg,
# and returns the processed images together with their class names
# (the class name is the name of the subfolder containing the image).
def read_images(path='/content/drive/MyDrive/Mestrado/2024_2/Qualification/gems/train'):
    Images, Labels = [], []

    for root, dirs, files in os.walk(path):
        f = os.path.basename(root)  # class name = current subfolder
        for file in files:
            try:
                # Full path of the current file
                image_path = os.path.join(root, file)

                # Load the image
                image = cv2.imread(image_path)                       # Read the image (OpenCV loads BGR; None if unreadable)
                image = cv2.resize(image, (int(img_w), int(img_h)))  # Resize the image
                image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # Convert to RGB

                # Apply the transformation: remove the background (rembg returns RGBA)
                processed_image = remove(image_rgb)

                # Save the processed image at the same location (this overwrites the original file).
                # JPEG has no alpha channel, so convert RGBA -> BGR before writing.
                cv2.imwrite(image_path, cv2.cvtColor(processed_image, cv2.COLOR_RGBA2BGR))

                Images.append(processed_image)  # Add the processed image to the list
                Labels.append(f)                # Append the label only on success so both lists stay aligned

            except Exception as e:
                print(f"Error processing file {file}: {e}")
            print(file)

    Images = np.array(Images)
    return (Images, Labels)
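`remove` returns an RGBA image whose alpha channel marks the foreground. When that alpha channel is dropped, as in the JPEG write inside `read_images`, the removed background typically ends up black. If a different background is preferred, the RGBA output can be alpha-composited onto a solid colour first. A minimal NumPy sketch; `composite_on_background` and the white default are assumptions, not part of the original pipeline:

```python
import numpy as np


def composite_on_background(rgba, background=(255, 255, 255)):
    """Blend an RGBA uint8 image onto a solid RGB background colour.

    out = alpha * foreground + (1 - alpha) * background, with alpha in [0, 1].
    """
    rgb = rgba[..., :3].astype(np.float32)
    alpha = rgba[..., 3:4].astype(np.float32) / 255.0  # keep a trailing axis for broadcasting
    bg = np.asarray(background, dtype=np.float32)
    out = alpha * rgb + (1.0 - alpha) * bg
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

The result is a plain 3-channel RGB array, so it can be saved with `cv2.imwrite` after a `cv2.COLOR_RGB2BGR` conversion.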


In [46]:
#function which converts string labels to numbers
def get_class_index(Labels):
    for i, n in enumerate(Labels):
        for j, k in enumerate(Name):
            if n == k:
                Labels[i] = j
    Labels = np.array(Labels)
    return Labels
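`get_class_index` compares each label against every entry of `Name` and leaves any label not found in `Name` unchanged (still a string). A dictionary lookup performs the same conversion in a single pass and fails loudly on unknown names. A minimal sketch, assuming the order of the class list defines the indices; `encode_labels` and `decode_labels` are hypothetical helpers:

```python
import numpy as np


def encode_labels(labels, class_names):
    """Map string labels to integer indices given by their position in class_names.

    Raises KeyError on a label that is not a known class, instead of
    silently leaving it as a string.
    """
    index = {name: i for i, name in enumerate(class_names)}
    return np.array([index[label] for label in labels], dtype=np.int64)


def decode_labels(indices, class_names):
    """Inverse of encode_labels: map integer indices back to class names."""
    return [class_names[i] for i in indices]
```

Called as `encode_labels(Train_Lbls, Name)`, this produces the same indices as `gems_map`.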

In [47]:
# Read the images and labels from the training set

Train_Imgs, Train_Lbls = read_images()
Train_Lbls = get_class_index(Train_Lbls)
print('Shape of train images: {}'.format(Train_Imgs.shape))
print('Shape of train labels: {}'.format(Train_Lbls.shape))

diamond_33.jpg
diamond_5.jpg
diamond_6.jpg
diamond_7.jpg
diamond_8.jpg
diamond_34.jpg
diamond_4.jpg
diamond_17.jpg
diamond_16.jpg
diamond_19.jpg
diamond_22.jpg
diamond_23.jpg
diamond_24.jpg
diamond_25.jpg
diamond_32.jpg
diamond_11.jpg
diamond_10.jpg
diamond_13.jpg
diamond_12.jpg
diamond_15.jpg
diamond_21.jpg
diamond_26.jpg
diamond_20.jpg
diamond_27.jpg
diamond_29.jpg
diamond_30.jpg
diamond_31.jpg
diamond_14.jpg
diamond_0.jpg
diamond_2.jpg
diamond_1.jpg
danburite_27.jpg
danburite_21.jpg
danburite_15.jpg
danburite_12.jpg
danburite_14.jpg
danburite_11.jpg
danburite_0.jpg
danburite_1.jpg
danburite_10.jpg
danburite_16.jpg
danburite_17.jpg
danburite_13.jpg
danburite_24.jpg
danburite_23.jpg
danburite_26.jpg
danburite_30.jpg
danburite_7.jpg
danburite_5.jpg
danburite_6.jpg
danburite_4.jpg
danburite_29.jpg
danburite_22.jpg
danburite_19.jpg
danburite_2.jpg
danburite_20.jpg
danburite_25.jpg
danburite_35.jpg
danburite_31.jpg
danburite_32.jpg
danburite_34.jpg
danburite_9.jpg
danburite_33.jpg
fluorit
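With only a few dozen images per class, it can be worth checking class balance once the labels are loaded. A minimal sketch using `np.bincount` on the integer labels; `class_counts` is a hypothetical helper:

```python
import numpy as np


def class_counts(labels, class_names):
    """Return {class_name: number_of_samples} for integer-encoded labels."""
    counts = np.bincount(np.asarray(labels, dtype=np.int64), minlength=len(class_names))
    return {name: int(c) for name, c in zip(class_names, counts)}
```

For instance, `class_counts(Train_Lbls, Name)` would report how many training images each gemstone class contributes.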

### Visualization