# Task 1. Open Images Class Hierarchy

The task is as follows:

Create a parser that reads the class hierarchy files, and a data structure that efficiently supports the
following operations:
- Find all sibling classes of a class name
- Find the parent class of a class name
- Find all ancestor classes of a class name
- Find whether class 1 and class 2 belong to the same ancestor class(es)

You are provided with two files:
 - oidv6-class-descriptions.csv: each line consists of a MID and its corresponding class name
 - bbox_labels_600_hierarchy.json: the class hierarchy
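
The hierarchy file is a nested JSON object. The sketch below assumes that every node has a `LabelName` key and that non-leaf nodes also have a `Subcategory` list of child nodes, since these are the two keys the parser in this notebook relies on. The MIDs in the toy hierarchy and the helper `iter_nodes` are invented for illustration.

```python
# Toy hierarchy with the assumed shape of the real file (MIDs are invented).
toy_hierarchy = {
    "LabelName": "/m/root",
    "Subcategory": [
        {"LabelName": "/m/animal",
         "Subcategory": [{"LabelName": "/m/oyster"}, {"LabelName": "/m/lobster"}]},
        {"LabelName": "/m/food",
         "Subcategory": [{"LabelName": "/m/oyster"}]},
    ],
}

def iter_nodes(node, path=()):
    """Yield (label, path_from_root) for every node, depth first."""
    yield node["LabelName"], path
    for child in node.get("Subcategory", []):
        yield from iter_nodes(child, path + (node["LabelName"],))

for label, path in iter_nodes(toy_hierarchy):
    print(label, "<-", " > ".join(path))
```

Note that the same class can appear under more than one parent, which is why the operations below report results per branch.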

In [4]:
###importing all the necessary libraries

import pandas as pd
import json

In [4]:
### reading the input files provided by Customindz:

class_description = pd.read_csv("C:\\Users\\dias_\\Desktop\\Customindz\\task1\\oidv6-class-descriptions.csv")
with open("C:\\Users\\dias_\\Desktop\\Customindz\\task1\\bbox_labels_600_hierarchy.json", 'r') as file:
    hierarchy = json.load(file)

In [5]:
###inspecting input files:

print(class_description.shape)
class_description.head()

(19994, 2)


| | LabelName | DisplayName |
|---|---|---|
| 0 | /m/0100nhbf | Sprenger's tulip |
| 1 | /m/0104x9kv | Vinegret |
| 2 | /m/0105jzwx | Dabu-dabu |
| 3 | /m/0105ld7g | Pistachio ice cream |
| 4 | /m/0105lxy5 | Woku |


In [6]:
### the csv file has 19994 rows, each with a unique LabelName, but only 19982 distinct
### DisplayName values, so a few display names are duplicated.
### Therefore, the LabelName column serves as the unique key of the classes.

class_description['DisplayName'].value_counts()

Lilac             2
Gopher snake      2
Powder            2
Friendship day    2
Anole             2
                 ..
Golf equipment    1
Honda cr-z        1
Rally cap         1
Glider            1
Baptism           1
Name: DisplayName, Length: 19982, dtype: int64
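
The counts above show that some display names occur twice. A short sketch of how those rows could be listed directly with `duplicated(keep=False)` follows; the small frame `toy` stands in for `class_description`, and its MIDs are invented.

```python
import pandas as pd

# Stand-in for class_description (MIDs are invented).
toy = pd.DataFrame({
    "LabelName": ["/m/a1", "/m/a2", "/m/b1", "/m/c1"],
    "DisplayName": ["Lilac", "Lilac", "Glider", "Baptism"],
})

# LabelName is unique, so it can serve as the key.
print(toy["LabelName"].is_unique)

# keep=False marks every row whose DisplayName occurs more than once.
duplicates = toy[toy["DisplayName"].duplicated(keep=False)]
print(duplicates)
```

Any lookup that starts from a display name therefore has to handle more than one matching label name.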

In [8]:
### our solution searches the nested hierarchy dictionary recursively.
### every query visits each node once, so it takes O(n) time for a hierarchy of n nodes

class Parser:
    
    def __init__(self, hierarchy_file_path, class_description_path):
        
        with open(hierarchy_file_path, 'r') as file:
            self.hierarchy = json.load(file)
        self.class_description = pd.read_csv(class_description_path)
        self.siblings = []
        self.ancestors = []
        self.sibling_ancestor_pairs = []
        

### helpers for translating between DisplayName and LabelName:
    def label_names(self, class_DisplayName):
        
        ### several label names may share the same display name, so a list is returned
        matches = self.class_description['DisplayName'] == class_DisplayName
        LabelNames = self.class_description['LabelName'][matches].tolist()
        if not LabelNames:
            print("Given DisplayName is not included in provided csv file")
        return LabelNames

    def display_name(self, LabelName):
        
        ### label names that are missing from the csv file are returned unchanged
        matches = self.class_description['LabelName'] == LabelName
        DisplayNames = self.class_description['DisplayName'][matches]
        return DisplayNames.iloc[0] if len(DisplayNames) > 0 else LabelName


### the main algorithm used for searching:
    def recursive_search(self, class_name, dictionary, parent_class=None):
        
        if dictionary.get('LabelName') == class_name:
            ### all children of the parent are collected, so the list includes the class itself;
            ### the root class has no parent and therefore no siblings
            if parent_class is not None:
                for sibling in parent_class['Subcategory']:
                    self.siblings.append(sibling['LabelName'])
            ### store a copy of the path, because self.ancestors changes during the search
            self.sibling_ancestor_pairs.append([self.siblings, list(self.ancestors)])
            self.siblings = []
        
        if 'Subcategory' in dictionary:
            self.ancestors.append(dictionary['LabelName'])
            for subcategory in dictionary['Subcategory']:
                self.recursive_search(class_name, subcategory, dictionary)
            self.ancestors.pop()

    def search(self, class_name):
        
        ### we have to empty our arrays before each search
        self.siblings = []; self.ancestors = []; self.sibling_ancestor_pairs = []
        self.recursive_search(class_name, dictionary = self.hierarchy)
        if not self.sibling_ancestor_pairs:
            print("Given class_name was not found in the hierarchy json file")
        return self.sibling_ancestor_pairs
                    
                
### a) finding all siblings class of a class name
    def finding_all_siblings(self, class_DisplayName):
        
        ### for your checking purposes I decided to write out my results:
        with open('siblings.txt', 'w') as file:
            for class_name in self.label_names(class_DisplayName):
                ### one line is written per branch in which the class occurs
                for siblings, _ in self.search(class_name):
                    names = [self.display_name(sibling) for sibling in siblings]
                    file.write("Siblings of " + class_DisplayName + " are: " + " ".join(names) + "\n")
    

### b) finding the parent class of a class name
    def finding_parent_class(self, class_DisplayName):
        
        with open('parents.txt', 'w') as file:
            for class_name in self.label_names(class_DisplayName):
                branches = self.search(class_name)
                file.write("There are " + str(len(branches)) + " branches for " + class_DisplayName + ": \n")
                for branch, (_, ancestors) in enumerate(branches, start=1):
                    ### the parent is the last class on the path from the root
                    parent = self.display_name(ancestors[-1]) if ancestors else "None"
                    file.write("Parent of branch number " + str(branch) + ": " + parent + "\n")


### c) finding all ancestor classes of a class name
    def finding_all_ancestors(self, class_DisplayName):
        
        with open('ancestors.txt', 'w') as file:
            for class_name in self.label_names(class_DisplayName):
                branches = self.search(class_name)
                file.write("There are " + str(len(branches)) + " branches for " + class_DisplayName + ": \n")
                for branch, (_, ancestors) in enumerate(branches, start=1):
                    ### ancestors are listed from the root down to the parent
                    names = [self.display_name(ancestor) for ancestor in ancestors]
                    file.write("Ancestors of branch number " + str(branch) + ": " + " ".join(names) + "\n")


### d) finding if both class 1 and class 2 belong to the same ancestor class(es)
    def comparing_two_classes(self, class_DisplayName1, class_DisplayName2):
        
        LabelNames1 = self.label_names(class_DisplayName1)
        LabelNames2 = self.label_names(class_DisplayName2)
        if not LabelNames1 or not LabelNames2:
            return

        ### only the first label name of each display name is compared
        array1 = [ancestors for _, ancestors in self.search(LabelNames1[0])]
        array2 = [ancestors for _, ancestors in self.search(LabelNames2[0])]

        ### two branches are similar when their full ancestor paths are identical
        with open("comparing_classess.txt", 'a') as file:
            for i in range(len(array1)):
                for j in range(len(array2)):
                    if array1[i] == array2[j]:
                        string = "There is a similarity at {}th position of {} and {}th position of {}\n".format(i, class_DisplayName1, j, class_DisplayName2)
                        file.write(string)
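
The recursive search walks the whole hierarchy for every query. One possible alternative, sketched here under the assumption that the hierarchy has the `LabelName`/`Subcategory` shape used by `Parser`, is to traverse the file once and store child-to-parents and parent-to-children maps, so that later queries only follow those maps. The class `HierarchyIndex`, its method names, and the toy MIDs are hypothetical and not part of the solution above.

```python
from collections import defaultdict

class HierarchyIndex:
    """Child -> parents and parent -> children maps built in one traversal."""

    def __init__(self, hierarchy):
        self.parents = defaultdict(list)
        self.children = defaultdict(list)
        self._build(hierarchy)

    def _build(self, node):
        for child in node.get("Subcategory", []):
            # A class may sit under several parents, so lists are kept.
            if node["LabelName"] not in self.parents[child["LabelName"]]:
                self.parents[child["LabelName"]].append(node["LabelName"])
                self.children[node["LabelName"]].append(child["LabelName"])
            self._build(child)

    def find_parents(self, label):
        return list(self.parents.get(label, []))

    def find_siblings(self, label):
        # Children of every parent, excluding the class itself.
        return sorted({s for p in self.find_parents(label)
                       for s in self.children[p] if s != label})

    def find_ancestors(self, label):
        # Walk upwards through the parent map, collecting each class once.
        found, stack = set(), self.find_parents(label)
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(self.find_parents(current))
        return found

# Toy hierarchy (MIDs are invented).
toy_hierarchy = {
    "LabelName": "/m/root",
    "Subcategory": [
        {"LabelName": "/m/animal",
         "Subcategory": [{"LabelName": "/m/oyster"}, {"LabelName": "/m/lobster"}]},
        {"LabelName": "/m/food",
         "Subcategory": [{"LabelName": "/m/oyster"}]},
    ],
}

index = HierarchyIndex(toy_hierarchy)
print(index.find_parents("/m/oyster"))
print(index.find_siblings("/m/oyster"))
print(sorted(index.find_ancestors("/m/oyster")))
```

The trade-off is extra memory for the two maps in exchange for lookups whose cost depends on the number of parents and ancestors of a class rather than on the size of the whole hierarchy.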



---
For verification purposes, we tested all the functions with the classes "Oyster" and "Lobster".
The results are provided as txt files.


In [20]:
path_of_json_file = "C:\\Users\\dias_\\Desktop\\Customindz\\task1\\bbox_labels_600_hierarchy.json"
path_of_csv_file = "C:\\Users\\dias_\\Desktop\\Customindz\\task1\\oidv6-class-descriptions.csv"

### constructing our object:
parser = Parser(hierarchy_file_path = path_of_json_file, class_description_path = path_of_csv_file)

In [21]:
### testing out every function:
parser.finding_all_siblings('Oyster')
parser.finding_parent_class('Oyster')
parser.finding_all_ancestors('Oyster')
parser.comparing_two_classes('Oyster', 'Lobster')

To verify the results, refer to the attached txt files (siblings.txt, parents.txt, ancestors.txt, and comparing_classess.txt).
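
Operation d) can also be phrased as a set intersection over the ancestors of the two classes. The following is a minimal sketch, assuming that ancestor paths are lists of label names ordered from the root down, which is how the search in this notebook collects them. The function `common_ancestors` and the sample paths are hypothetical. Because every class descends from the root, the root is skipped by default; otherwise any two classes would trivially share an ancestor.

```python
def common_ancestors(paths1, paths2, ignore_root=True):
    """Return the labels that occur in an ancestor path of both classes."""
    start = 1 if ignore_root else 0
    set1 = {label for path in paths1 for label in path[start:]}
    set2 = {label for path in paths2 for label in path[start:]}
    return set1 & set2

# Invented ancestor paths, one list per branch.
oyster_paths = [["/m/root", "/m/animal", "/m/shellfish"],
                ["/m/root", "/m/food", "/m/seafood"]]
lobster_paths = [["/m/root", "/m/animal", "/m/shellfish"]]

shared = common_ancestors(oyster_paths, lobster_paths)
print(sorted(shared))
```

An empty result means the two classes share no ancestor other than the root.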