# __Project presentation__

Here we present our end-of-course project for [Data Science](https://www.unibo.it/en/study/phd-professional-masters-specialisation-schools-and-other-programmes/course-unit-catalogue/course-unit/2023/467046), a module of the integrated course in Computational Management of Data (I.C.), academic year 2023/2024, [DHDK](https://corsi.unibo.it/2cycle/DigitalHumanitiesKnowledge), University of Bologna.

<div class="alert alert-block alert-info">
<ul><i>Collaborators:</i> <br>
<li>Hubert Krzywonos - hubert.krzywonos@studio.unibo.it</li>
<li>Mohamed Iheb Ouerghi - mohamediheb.ouerghi@studio.unibo.it</li>
<li>Giorgia Umana - giorgia.umana@studio.unibo.it</li>
<li>Lucrezia Pograri - lucrezia.pograri@studio.unibo.it</li></ul></div>

Here you can find the [link](https://github.com/comp-management-data-project/DataScience-DHDK-gp24) to the GitHub repository of this project.

The Python file `impl.py` contains all the code needed to run the program. It serves as the implementation module, where the classes, functions, and methods that operate on the source data are defined.

In [2]:
from impl import *

## `Cultural Objects` classes

The following classes provide a structured way to represent various types of cultural heritage objects and their attributes within a Python-based system.

<ul>
<li>IdentifiableEntity (superclass):<br>
This class represents an entity that can be identified uniquely by an ID. It serves as the base class for other identifiable entities in the system.</li>
<li>Person (inherits from IdentifiableEntity):<br>
Person represents an individual with a name and an identifier. It inherits the identification functionality from IdentifiableEntity and adds a name attribute.</li>
<li>CulturalHeritageObject (inherits from IdentifiableEntity):<br>
CulturalHeritageObject represents an object of cultural heritage with attributes such as title, date, owner, place, and authors. It inherits the identification functionality from IdentifiableEntity and adds additional attributes specific to cultural heritage objects.</li>
<li>NauticalChart (inherits from CulturalHeritageObject):<br>
Represents a nautical chart, a subtype of CulturalHeritageObject.</li>
<li>ManuscriptPlate (inherits from CulturalHeritageObject):<br>
Represents a manuscript plate, a subtype of CulturalHeritageObject.</li>
<li>ManuscriptVolume (inherits from CulturalHeritageObject):<br>
Represents a manuscript volume, a subtype of CulturalHeritageObject.</li>
<li>PrintedVolume (inherits from CulturalHeritageObject):<br>
Represents a printed volume, a subtype of CulturalHeritageObject.</li>
<li>PrintedMaterial (inherits from CulturalHeritageObject):<br>
Represents a printed material, a subtype of CulturalHeritageObject.</li>
<li>Herbarium (inherits from CulturalHeritageObject):<br>
Represents a herbarium, a subtype of CulturalHeritageObject.</li>
<li>Specimen (inherits from CulturalHeritageObject):<br>
Represents a specimen, a subtype of CulturalHeritageObject.</li>
<li>Painting (inherits from CulturalHeritageObject):<br>
Represents a painting, a subtype of CulturalHeritageObject.</li>
<li>Model (inherits from CulturalHeritageObject):<br>
Represents a model, a subtype of CulturalHeritageObject.</li>
<li>Map (inherits from CulturalHeritageObject):<br>
Represents a map, a subtype of CulturalHeritageObject.</li>
</ul>
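
The hierarchy described in this list can be sketched in a few lines. This is a minimal illustration and not the code in `impl.py`: the constructor argument order and the getter names are taken from the calls used later in this notebook, while the attribute names are assumptions.

```python
class IdentifiableEntity:
    """Base class: anything that can be identified uniquely by an ID."""

    def __init__(self, id):
        self.id = str(id)  # attribute name is an assumption

    def getId(self):
        return self.id


class Person(IdentifiableEntity):
    """An individual with an identifier and a name."""

    def __init__(self, id, name):
        super().__init__(id)
        self.name = name

    def getName(self):
        return self.name


class CulturalHeritageObject(IdentifiableEntity):
    """A cultural heritage object with its descriptive attributes."""

    def __init__(self, id, title, date, authors, owner, place):
        super().__init__(id)
        self.title = title
        self.date = date
        self.authors = list(authors)
        self.owner = owner
        self.place = place

    def getTitle(self):
        return self.title

    def getDate(self):
        return self.date

    def getAuthors(self):
        return list(self.authors)

    def getOwner(self):
        return self.owner

    def getPlace(self):
        return self.place


class NauticalChart(CulturalHeritageObject):
    """Subtype with no extra attributes; only the class name differs."""


author = Person("ULAN:500114874", "Benincasa, Grazioso")
chart = NauticalChart("1", "Nautical chart", "1482", [author], "BUB", "Bologna")
print(type(chart).__name__, chart.getId(), chart.getAuthors()[0].getName())
```

The other subtypes (`ManuscriptPlate`, `PrintedVolume`, and so on) would follow the same pattern as `NauticalChart`.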

Now, let's create instances of these classes using the provided CSV data. For brevity, we'll just demonstrate with a few objects.

In [32]:
# Function to parse the CSV file and create instances of CHO classes

def create_instances_from_csv(file_path):
    instances = []
    with open(file_path, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            id = row['Id']
            title = row['Title']
            date = row['Date']
            owner = row['Owner']
            place = row['Place']
            
            # Split author information and create Person instances
            authors = []
            author_names = row['Author'].split(';')
            author_ids = row['Author'].split('(')
            for name, author_id in zip(author_names, author_ids):
                authors.append(Person(author_id, name.strip()))
            
            # Determine object type and create the corresponding instance
            object_type = row['Type']
            if object_type == 'Nautical chart':
                instance = NauticalChart(id, title, date, authors, owner, place)
            elif object_type == 'Manuscript plate':
                instance = ManuscriptPlate(id, title, date, authors, owner, place)
            elif object_type == 'Manuscript volume':
                instance = ManuscriptVolume(id, title, date, authors, owner, place)
            elif object_type == 'Printed volume':
                instance = PrintedVolume(id, title, date, authors, owner, place)
            elif object_type == 'Printed material':
                instance = PrintedMaterial(id, title, date, authors, owner, place)
            elif object_type == 'Herbarium':
                instance = Herbarium(id, title, date, authors, owner, place)
            elif object_type == 'Specimen':
                instance = Specimen(id, title, date, authors, owner, place)
            elif object_type == 'Painting':
                instance = Painting(id, title, date, authors, owner, place)
            elif object_type == 'Model':
                instance = Model(id, title, date, authors, owner, place)
            elif object_type == 'Map':
                instance = Map(id, title, date, authors, owner, place)
            else:
                print(f"Unknown object type: {object_type}")
                continue
            
            instances.append(instance)
    
    return instances

file_path = os.path.join("resources", "meta.csv")

# Create instances from CSV file
instances = create_instances_from_csv(file_path)

# Display information for each instance
for instance in instances:
    print(f"{instance.__class__.__name__}:")
    print("Object ID:", instance.getId())
    print("Title:", instance.getTitle())
    print("Date:", instance.getDate())
    print("Authors:", [author.getName() for author in instance.getAuthors()])
    print("Owner:", instance.getOwner())
    print("Place:", instance.getPlace())
    print()

NauticalChart:
Object ID: 1
Title:  Nautical chart
Date: 1482
Authors: ['Benincasa, Grazioso (ULAN:500114874)']
Owner: BUB
Place: Bologna

PrintedVolume:
Object ID: 2
Title: The History of Plants
Date: 1497
Authors: ['Teofrasto (VIAF:265397758)']
Owner: BUB
Place: Bologna

Herbarium:
Object ID: 3
Title: On Medical Material
Date: 1523
Authors: ['Dioscorides Pedanius (VIAF:78822798)']
Owner: BUB
Place: Bologna

PrintedVolume:
Object ID: 4
Title: The Natural History
Date: 1519
Authors: ['Plinius Secundus, Gaius (VIAF:100219162)']
Owner: BUB
Place: Bologna

NauticalChart:
Object ID: 5
Title:  Incomplete coastal profile of the American continent
Date: 1500-1599
Authors: ['Agnese, Battista (ULAN:500048088)']
Owner: BUB
Place: Bologna

NauticalChart:
Object ID: 6
Title: Map of Cusco
Date: 1556
Authors: ['Ramusio, Giovanni Battista (VIAF:68943129)']
Owner: BUB
Place: Bologna

PrintedVolume:
Object ID: 7
Title: On the New World
Date: 1530
Authors: ["Anghiera, Pietro Martire d' (VIAF:68967770)"]
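
In the output above, each author name still carries its identifier in parentheses, which suggests that the name and the ID are not being separated when the `Author` column is split. The following is a minimal sketch of one way to separate them. It assumes that authors are separated by `;` and written as `Name (SCHEME:identifier)`, as in the rows shown above; `parse_authors` and `AUTHOR_PATTERN` are hypothetical names, not part of `impl.py`.

```python
import re

# Matches "Name (SCHEME:identifier)", e.g. "Teofrasto (VIAF:265397758)".
AUTHOR_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<id>[^()]+)\)$")


def parse_authors(cell):
    """Return a list of (author_id, name) pairs from one 'Author' cell."""
    pairs = []
    for chunk in cell.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty cells and trailing separators
        match = AUTHOR_PATTERN.match(chunk)
        if match:
            pairs.append((match.group("id"), match.group("name")))
        else:
            # No identifier in parentheses: keep the name, leave the ID empty.
            pairs.append(("", chunk))
    return pairs


print(parse_authors("Teofrasto (VIAF:265397758)"))
print(parse_authors("Anghiera, Pietro Martire d' (VIAF:68967770)"))
```

Each pair could then be passed to `Person(author_id, name)`, so that `getName()` returns the name alone.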

## `Activity` classes

These classes provide a framework for modeling and managing the different stages involved in the creation of a digital twin of physical cultural heritage objects, allowing for structured representation and tracking of activities, responsible individuals, tools used, and associated timelines.

<ul>
<li>Activity (superclass):<br>
Activity serves as a base class for different kinds of processes involved in creating a digital twin of physical cultural heritage objects. It contains attributes such as the responsible person, tools used, start and end dates of the activity, and a reference to the cultural heritage object it pertains to.</li>
<li>Acquisition (inherits from Activity):<br>
Acquisition represents the process of acquiring data or information related to a cultural heritage object. It inherits attributes and methods from Activity and adds a technique attribute to specify the acquisition technique used.</li>
<li>Processing (inherits from Activity):<br>
Processing represents the process of manipulating or transforming acquired data or information. It inherits attributes and methods from Activity.</li>
<li>Modelling (inherits from Activity):<br>
Modelling represents the process of creating a digital model or representation of a physical cultural heritage object. It inherits attributes and methods from Activity.</li>
<li>Optimising (inherits from Activity):<br>
Optimising represents the process of optimizing or refining the digital twin or its components. It inherits attributes and methods from Activity.</li>
<li>Exporting (inherits from Activity):<br>
Exporting represents the process of exporting the digital twin or its components for various purposes such as preservation, presentation, or analysis. It inherits attributes and methods from Activity.</li>
</ul>
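
As a rough illustration of this hierarchy, the sketch below mirrors the constructor argument order and the getter names used in the next cell. It is not the code in `impl.py`: the attribute names and the `getTechnique` getter are assumptions.

```python
class Activity:
    """Base class for one stage in the creation of a digital twin."""

    def __init__(self, refersTo_cho, institute, person, start_date, end_date, tools):
        self.refersTo_cho = refersTo_cho
        self.institute = institute
        self.person = person
        self.start_date = start_date
        self.end_date = end_date
        self.tools = set(tools)

    def getRefersTo_cho(self):
        return self.refersTo_cho

    def getResponsibleInstitute(self):
        return self.institute

    def getResponsiblePerson(self):
        return self.person

    def getTools(self):
        return set(self.tools)

    def getStartDate(self):
        return self.start_date

    def getEndDate(self):
        return self.end_date


class Acquisition(Activity):
    """Adds the acquisition technique to the common attributes."""

    def __init__(self, refersTo_cho, institute, person, start_date, end_date,
                 technique, tools):
        super().__init__(refersTo_cho, institute, person, start_date, end_date, tools)
        self.technique = technique

    def getTechnique(self):  # hypothetical getter name
        return self.technique


class Processing(Activity):
    """No extra attributes; Modelling, Optimising and Exporting look the same."""


acquisition = Acquisition("1", "Council", "Alice Liddell", "2023-05-08",
                          "2023-05-08", "Photogrammetry", {"Nikon D7200 Nikor 50mm"})
print(type(acquisition).__name__, acquisition.getTechnique(), acquisition.getTools())
```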

In [33]:
# Load JSON data from file
file_path = os.path.join("resources", "process.json")
with open(file_path, "r") as file:
    json_data = json.load(file)

# Extract the first object from the JSON data
item = json_data[0]

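# Common attributes
# Note: the dates are read from the acquisition entry only, so every activity
# printed below reports the acquisition start and end dates.
refersTo_cho = item['object id']
start_date = item['acquisition']['start date']
end_date = item['acquisition']['end date']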

# Iterate over each activity type
for activity_type in ['acquisition', 'processing', 'modelling', 'optimising', 'exporting']:
    activity_data = item[activity_type]
    institute = activity_data['responsible institute']
    person = activity_data['responsible person']
    tools = set(activity_data.get('tool', []))

    if activity_type == 'acquisition':
        technique = activity_data.get('technique', '')
        activity = Acquisition(refersTo_cho, institute, person, start_date, end_date, technique, tools)
    else:
        activity = globals()[activity_type.capitalize()](refersTo_cho, institute, person, start_date, end_date, tools)

    print(f"Refers To CHO: {activity.getRefersTo_cho()}")
    print(f"Activity Type: {activity.__class__.__name__}")
    print(f"Institute: {activity.getResponsibleInstitute()}")
    print(f"Person: {activity.getResponsiblePerson()}")
    print(f"Tools: {activity.getTools()}")
    print(f"Start Date: {activity.getStartDate()}")
    print(f"End Date: {activity.getEndDate()}")
    print()


Refers To CHO: 1
Activity Type: Acquisition
Institute: Council
Person: Alice Liddell
Tools: {'Nikon D7200 Nikor 50mm'}
Start Date: 2023-05-08
End Date: 2023-05-08

Refers To CHO: 1
Activity Type: Processing
Institute: Council
Person: Alice Liddell
Tools: {'3DF Zephyr'}
Start Date: 2023-05-08
End Date: 2023-05-08

Refers To CHO: 1
Activity Type: Modelling
Institute: Philology
Person: Grace Hopper
Tools: {'Blender'}
Start Date: 2023-05-08
End Date: 2023-05-08

Refers To CHO: 1
Activity Type: Optimising
Institute: Philology
Person: Grace Hopper
Tools: {'Gimp', 'Instant Meshes', 'Blender'}
Start Date: 2023-05-08
End Date: 2023-05-08

Refers To CHO: 1
Activity Type: Exporting
Institute: Philology
Person: Grace Hopper
Tools: {'Blender'}
Start Date: 2023-05-08
End Date: 2023-05-08
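
In the output above, all five activities report the same dates because `start_date` and `end_date` are read once, from the `acquisition` entry. A minimal sketch of reading the dates per stage follows. The record is hand-written to mimic the keys used above, its values are illustrative, and `dates_per_stage` is a hypothetical helper.

```python
import json

# Hand-written record with the same keys as process.json (illustrative values).
record = json.loads("""
{
  "object id": "1",
  "acquisition": {"start date": "2023-05-08", "end date": "2023-05-08"},
  "exporting": {"start date": "2023-06-07", "end date": "2023-06-07"}
}
""")

STAGES = ["acquisition", "processing", "modelling", "optimising", "exporting"]


def dates_per_stage(item):
    """Map each stage present in the record to its own (start, end) dates."""
    dates = {}
    for stage in STAGES:
        data = item.get(stage)
        if data is None:
            continue  # stage missing from this record
        dates[stage] = (data.get("start date", ""), data.get("end date", ""))
    return dates


for stage, (start, end) in dates_per_stage(record).items():
    print(f"{stage}: {start} -> {end}")
```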



## `Handler` classes

### `Upload Handler`

`ProcessData Upload Handler`

This class is responsible for handling the processing and uploading of data to SQLite.<br> 

The implementation includes six different methods: `__init__`, `process_data`, `map_object_ids`, `create_dataframes`, `pushDataToDb`, `handle_duplicates`.

To push information to the relational database, a DataFrame is created for each activity type (acquisition, processing, modelling, optimising, exporting) and one for the tools used; each is populated with the relevant data parsed from the JSON file. Internal identifiers for the activities and tools, together with the mapping of object IDs, are meant to keep the data organization clear.
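
The idea can be sketched as follows, under stated assumptions: the record is hand-written with illustrative values, the identifier formats (`Acquisition-01`, `CH Object-1`) follow the query results shown later in this notebook, and `build_frames` is a hypothetical helper rather than one of the handler's methods. An in-memory SQLite database stands in for `relational.db`.

```python
import sqlite3

import pandas as pd

# Hand-written record that mimics the shape of process.json (illustrative values).
records = [
    {
        "object id": "1",
        "acquisition": {
            "responsible institute": "Council",
            "responsible person": "Alice Liddell",
            "technique": "Photogrammetry",
            "tool": ["Nikon D7200 Nikor 50mm"],
            "start date": "2023-05-08",
            "end date": "2023-05-08",
        },
        "processing": {
            "responsible institute": "Council",
            "responsible person": "Alice Liddell",
            "tool": ["3DF Zephyr"],
            "start date": "2023-05-08",
            "end date": "2023-05-08",
        },
    }
]


def build_frames(items, stages=("acquisition", "processing")):
    """Return one DataFrame per stage plus one DataFrame of tools."""
    activity_rows = {stage: [] for stage in stages}
    tool_rows = []
    for item in items:
        number = int(item["object id"])
        for stage in stages:
            data = item[stage]
            internal_id = f"{stage.capitalize()}-{number:02d}"
            row = {
                "Activity_internal_id": internal_id,
                "Refers To": f"CH Object-{number}",
                "Responsible Institute": data["responsible institute"],
                "Responsible Person": data["responsible person"],
                "Start Date": data["start date"],
                "End Date": data["end date"],
            }
            if stage == "acquisition":
                row["Technique"] = data.get("technique", "")
            activity_rows[stage].append(row)
            # One row per tool, linked back through the internal identifier.
            for tool in data.get("tool", []):
                tool_rows.append({"Activity_internal_id": internal_id, "Tool": tool})
    frames = {stage: pd.DataFrame(rows) for stage, rows in activity_rows.items()}
    return frames, pd.DataFrame(tool_rows)


activity_dfs, tools_df = build_frames(records)

con = sqlite3.connect(":memory:")
for stage, df in activity_dfs.items():
    df.to_sql(stage, con, if_exists="replace", index=False)
tools_df.to_sql("tools", con, if_exists="replace", index=False)
stored = pd.read_sql("SELECT * FROM acquisition", con)
con.close()
print(stored)
```

Keeping the tools in their own table avoids repeating a whole activity row for each tool; the internal identifier is what joins the two.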

The relational database is created using the related source data:

In [25]:
# # Instantiate the ProcessDataUploadHandler to create the relational database
# process_data_upload_handler = ProcessDataUploadHandler(db_name="relational.db")

# # Load JSON data and process them
# json_file_path = os.path.join("resources", "process.json")
# activity_dfs, tools_df = process_data_upload_handler.process_data(json_file_path)

# # Push the data to the database
# process_data_upload_handler.pushDataToDb(activity_dfs, tools_df)


rel_path = "relational.db"
process = ProcessDataUploadHandler()
process.setDbPathOrUrl(rel_path)
process.pushDataToDb(os.path.join("data", "process.json"))

True

`Metadata Upload Handler`


Then, the graph database is created using the related source data.<br>
Remember to start the Blazegraph instance first.<br>
In principle, one or more files can be pushed by calling the method one or more times, even calling it twice with the same file.
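
One reason a repeated push can be harmless is that an RDF graph is a set of triples: adding a triple that is already present changes nothing, provided the same row always produces the same IRIs. The sketch below illustrates this with a plain Python set standing in for the triplestore. The `https://example.org/` namespace, the property names, and `rows_to_triples` are all hypothetical, and whether `impl.py` generates stable IRIs in this way is an assumption.

```python
def rows_to_triples(rows, base="https://example.org/"):
    """Turn CSV-like rows into (subject, predicate, object) tuples."""
    triples = set()
    for row in rows:
        # The subject IRI is derived from the row ID, so it is stable across pushes.
        subject = f"{base}cho/{row['Id']}"
        triples.add((subject, f"{base}title", row["Title"]))
        triples.add((subject, f"{base}owner", row["Owner"]))
    return triples


rows = [{"Id": "1", "Title": "Nautical chart", "Owner": "BUB"}]

store = set()                    # stands in for the graph database
store |= rows_to_triples(rows)   # first push
size_after_first_push = len(store)
store |= rows_to_triples(rows)   # pushing the same file again
print(size_after_first_push, len(store))
```

If identifiers were instead generated from a counter or at random on each call, the second push would add new triples for the same objects.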

Terminal command to run Blazegraph:

In [None]:
''' java -server -Xmx1g -jar blazegraph.jar '''

In [7]:
grp_endpoint = "http://127.0.0.1:9999/blazegraph/sparql"
# metadata = MetadataUploadHandler(grp_endpoint)
# metadata.setDbPathOrUrl(grp_endpoint)

# csv_file_path = os.path.join("data", "meta.csv")

# metadata.pushDataToDb(csv_file_path)
metadata = MetadataUploadHandler()
metadata.setDbPathOrUrl(grp_endpoint)
metadata.pushDataToDb("data/meta.csv")

True

### `Query Handler`

Next, the query handlers are created for both databases, using the related classes.

In [10]:
process_qh = ProcessDataQueryHandler()
rel_path = "relational.db"
process_qh.setDbPathOrUrl(rel_path)

metadata_qh = MetadataQueryHandler()
metadata_qh.setDbPathOrUrl(grp_endpoint)

True

`ProcessData Query Handler`

This class provides several methods that query the SQLite database through the Pandas library and return the results as DataFrames.
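
A partial-name query of this kind might look like the sketch below. It is not the handler's implementation: the table and column names are assumptions modelled on the results shown in this section, `activities_by_person` is a hypothetical function, and an in-memory database with two sample rows replaces `relational.db`.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")

# Two sample rows, copied from the query results shown in this section.
pd.DataFrame(
    [
        {"Activity_internal_id": "Acquisition-10", "Refers To": "CH Object-10",
         "Responsible Institute": "Heritage", "Responsible Person": "Ada Lovelace"},
        {"Activity_internal_id": "Acquisition-29", "Refers To": "CH Object-29",
         "Responsible Institute": "Heritage", "Responsible Person": "Gretel Grimm"},
    ]
).to_sql("acquisition", con, index=False)


def activities_by_person(con, partial_name, table="acquisition"):
    """Return the rows whose responsible person contains partial_name."""
    # The table name comes from trusted code; the user-supplied value is
    # passed as a bound parameter rather than pasted into the SQL string.
    query = f'SELECT * FROM "{table}" WHERE "Responsible Person" LIKE ?'
    return pd.read_sql(query, con, params=(f"%{partial_name}%",))


result = activities_by_person(con, "Gretel Grim")
print(result)
con.close()
```

The `%` wildcards on both sides of the value are what make a partial string such as `"Gretel Grim"` match `"Gretel Grimm"`.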

In [20]:
# Example usages of query methods

# Instantiate ProcessDataQueryHandler
sql_query_handler = ProcessDataQueryHandler(dbPathOrUrl="relational.db")

In [21]:
# Example 1: Get all activities
all_activities_df = sql_query_handler.getAllActivities()
print("All Activities:")
all_activities_df

All Activities:


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date,Tool
0,Acquisition-01,CH Object-1,Council,Alice Liddell,Photogrammetry,2023-05-08,2023-05-08,Nikon D7200 Nikor 50mm
1,Acquisition-05,CH Object-5,Council,Alice Liddell,Photogrammetry,2023-03-04,2023-03-04,Nikon D7200 Nikor 35mm
2,Acquisition-06,CH Object-6,Council,Alice Liddell,Photogrammetry,2023-03-04,2023-03-04,Nikon D7200 Nikor 35mm
3,Acquisition-07,CH Object-7,Council,Alice Liddell,Photogrammetry,2023-03-04,2023-03-04,Nikon D7200 Nikor 35mm
4,Acquisition-08,CH Object-8,Council,Alice Liddell,Photogrammetry,2023-03-04,2023-03-04,Nikon D7200 Nikor 35mm
...,...,...,...,...,...,...,...,...
197,Processing-30,CH Object-30,Philology,Grace Hopper,,2023-11-05,2023-11-05,3DF Zephyr
198,Processing-32,CH Object-32,Philology,Grace Hopper,,2023-11-05,2023-11-05,3DF Zephyr
199,Processing-33,CH Object-33,Council,Leonardo da Pisa,,2023-05-15,2023-05-15,3DF Zephyr
200,Processing-34,CH Object-34,Engineering,Emily Bronte,,2023-05-22,2023-08-29,Artec Studio 16


In [13]:
# Example 2: Get activities by responsible institution
responsible_institution = "Heritage"
activities_by_institution_df = sql_query_handler.getActivitiesByResponsibleInstitution(partialName=responsible_institution)
print(f"Activities by responsible institution '{responsible_institution}':")
activities_by_institution_df

Activities by responsible institution 'Heritage':


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date,Tool
0,Acquisition-10,CH Object-10,Heritage,Ada Lovelace,Structured-light 3D scanner,2023-03-04,2023-03-04,Artec EVA
1,Acquisition-28,CH Object-28,Heritage,Ada Lovelace,Structured-light 3D scanner,2023-03-04,2023-03-04,Artec EVA
2,Acquisition-29,CH Object-29,Heritage,Gretel Grimm,Photogrammetry,2023-03-04,2023-03-04,Nikon D750
3,Exporting-10,CH Object-10,Heritage,Ada Lovelace,,2023-06-22,2023-06-22,Adobe Photoshop 2023
4,Exporting-10,CH Object-10,Heritage,Ada Lovelace,,2023-06-22,2023-06-22,Artec Studio 15
5,Exporting-28,CH Object-28,Heritage,Ada Lovelace,,2023-06-16,2023-06-16,Adobe Photoshop 2023
6,Exporting-28,CH Object-28,Heritage,Ada Lovelace,,2023-06-16,2023-06-16,Artec Studio 15
7,Exporting-29,CH Object-29,Heritage,Gretel Grimm,,2023-04-13,2023-04-13,Metashape
8,Modelling-10,CH Object-10,Heritage,Ada Lovelace,,2023-06-22,2023-06-22,Artec Studio 15
9,Modelling-28,CH Object-28,Heritage,Ada Lovelace,,2023-03-04,2023-03-04,Artec Studio 15


In [40]:
# Example 3: Get activities by responsible person
responsible_person = "Gretel Grim"
activities_by_person_df = sql_query_handler.getActivitiesByResponsiblePerson(partialName=responsible_person)
print(f"Activities by responsible person '{responsible_person}':")
activities_by_person_df

Activities by responsible person 'Gretel Grim':


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date
0,Acquisition-29,CH Object-29,Heritage,Gretel Grimm,Photogrammetry,2023-03-04,2023-03-04
1,Exporting-29,CH Object-29,Heritage,Gretel Grimm,,2023-04-13,2023-04-13
2,Optimising-29,CH Object-29,Heritage,Gretel Grimm,,2023-04-13,2023-04-13
3,Processing-29,CH Object-29,Heritage,Gretel Grimm,,2023-04-04,2023-04-04


In [41]:
# Example 4: Get activities using a specific tool
tool_name = "Blender"
activities_using_tool_df = sql_query_handler.getActivitiesUsingTool(partialName=tool_name)
print(f"Activities using tool '{tool_name}':")
activities_using_tool_df

Activities using tool 'Blender':


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date
0,Exporting-01,CH Object-1,Philology,Grace Hopper,,2023-06-07,2023-06-07
1,Exporting-02,CH Object-2,Philology,Grace Hopper,,2023-07-26,2023-07-26
2,Exporting-03,CH Object-3,Philology,Grace Hopper,,2023-07-14,2023-07-14
3,Exporting-04,CH Object-4,Philology,Grace Hopper,,2023-11-07,2023-11-07
4,Exporting-05,CH Object-5,Philology,Grace Hopper,,2023-07-24,2023-07-24
...,...,...,...,...,...,...,...
56,Optimising-05,CH Object-5,Philology,Grace Hopper,,2023-07-21,2023-07-24
57,Optimising-06,CH Object-6,Philology,Grace Hopper,,2023-07-27,2023-07-27
58,Optimising-07,CH Object-7,Philology,Grace Hopper,,2023-07-17,2023-07-18
59,Optimising-08,CH Object-8,Philology,Grace Hopper,,2023-07-19,2023-07-19


In [42]:
# Example 5: Get activities started after a specific date
start_date = "2023-08-21"
activities_started_after_df = sql_query_handler.getActivitiesStartedAfter(date=start_date)
print(f"Activities started after '{start_date}':")
activities_started_after_df

Activities started after '2023-08-21':


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date
0,Exporting-04,CH Object-4,Philology,Grace Hopper,,2023-11-07,2023-11-07
1,Exporting-15,CH Object-15,Philology,Grace Hopper,,2023-09-10,2023-09-10
2,Exporting-17,CH Object-17,Philology,Grace Hopper,,2023-09-19,2023-09-19
3,Exporting-18,CH Object-18,Philology,Grace Hopper,,2023-11-09,2023-11-09
4,Exporting-19,CH Object-19,Philology,Grace Hopper,,2023-10-10,2023-10-10
5,Exporting-20,CH Object-20,Philology,Grace Hopper,,2023-08-31,2023-08-31
6,Exporting-21,CH Object-21,Philology,Grace Hopper,,2023-11-10,2023-11-10
7,Exporting-22,CH Object-22,Philology,Grace Hopper,,2023-08-23,2023-08-23
8,Exporting-23,CH Object-23,Philology,Grace Hopper,,2023-10-24,
9,Exporting-24,CH Object-24,Philology,Grace Hopper,,2023-09-15,2023-09-15


In [43]:
# Example 6: Get activities ended before a specific date
end_date = "2023-09-19"
activities_ended_before_df = sql_query_handler.getActivitiesEndedBefore(date=end_date)
print(f"Activities ended before '{end_date}':")
activities_ended_before_df

Activities ended before '2023-09-19':


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date
0,Acquisition-01,CH Object-1,Council,Alice Liddell,Photogrammetry,2023-05-08,2023-05-08
1,Acquisition-02,CH Object-2,Council,Jane Doe,Photogrammetry,2023-04-17,2023-04-17
2,Acquisition-03,CH Object-3,Council,Jane Doe,Photogrammetry,2023-04-17,2023-04-17
3,Acquisition-04,CH Object-4,Council,Jane Doe,Photogrammetry,2023-04-17,2023-04-17
4,Acquisition-05,CH Object-5,Council,Alice Liddell,Photogrammetry,2023-03-04,2023-03-04
...,...,...,...,...,...,...,...
145,Processing-29,CH Object-29,Heritage,Gretel Grimm,,2023-04-04,2023-04-04
146,Processing-31,CH Object-31,,,,,
147,Processing-33,CH Object-33,Council,Leonardo da Pisa,,2023-05-15,2023-05-15
148,Processing-34,CH Object-34,Engineering,Emily Bronte,,2023-05-22,2023-08-29


In [44]:
# Example 7: Get acquisitions by technique
technique_name = "Structured-light 3D scanner"
acquisitions_by_technique_df = sql_query_handler.getAcquisitionsByTechnique(partialName=technique_name)
print(f"Acquisitions by technique '{technique_name}':")
acquisitions_by_technique_df

Acquisitions by technique 'Structured-light 3D scanner':


Unnamed: 0,Activity_internal_id,Refers To,Responsible Institute,Responsible Person,Technique,Start Date,End Date
0,Acquisition-10,CH Object-10,Heritage,Ada Lovelace,Structured-light 3D scanner,2023-03-04,2023-03-04
1,Acquisition-28,CH Object-28,Heritage,Ada Lovelace,Structured-light 3D scanner,2023-03-04,2023-03-04
2,Acquisition-34,CH Object-34,Engineering,Grazia Deledda,Structured-light 3D scanner,2023-05-29,2023-05-29


`Metadata Query Handler`

In [22]:
# Example usages of query methods

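# Note: the object created below is a MetadataUploadHandler, not a
# MetadataQueryHandler, so it does not expose getAllPeople (see the
# AttributeError in the next cell).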
# sparql_query_handler = MetadataQueryHandler(dbPathOrUrl="http://192.168.1.90:9999/blazegraph/")

sparql_query_handler = MetadataUploadHandler()
sparql_query_handler.setDbPathOrUrl(grp_endpoint)
sparql_query_handler.pushDataToDb("data/meta.csv")

True

In [46]:
# Known issue: indexing a bytes value with a string key raises a TypeError.
# It appears to be related to the method execute_sparql_query.

In [23]:
# Example 1: Get all people
all_people_df = sparql_query_handler.getAllPeople()
print("All People:")
all_people_df

AttributeError: 'MetadataUploadHandler' object has no attribute 'getAllPeople'

In [None]:
# Example 2: Get all chos
all_people_df = sparql_query_handler.getAllCulturalHeritageObjects()
print("All Cultural Heritage Objects:")
all_people_df

Unexpected exception formatting exception. Falling back to standard exception


Traceback (most recent call last):
  File "C:\Users\Lucrezia\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\Lucrezia\AppData\Local\Temp\ipykernel_19044\318338794.py", line 2, in <module>
    all_people_df = sparql_query_handler.getAllCulturalHeritageObjects()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lucrezia\OneDrive\Documenti\GitHub\ciao_a_tutti\impl.py", line 622, in getAllCulturalHeritageObjects
    df = df.reset_index(drop=True);
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Lucrezia\OneDrive\Documenti\GitHub\ciao_a_tutti\impl.py", line 655, in execute_sparql_query
    return self.execute_sparql_query(query)
                 ^^^^^^^^^^^
TypeError: byte indices must be integers or slices, not str

During handling of the above exception, another exception occurred:

Traceback (most recent call last):


In [None]:
# Example 3: Get authors of cho
object_id = "1"
authors_df = sparql_query_handler.getAuthorsOfCulturalHeritageObject(object_id)
print(f"Authors of cultural heritage object '{object_id}':")
authors_df

In [None]:
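# Example 4: Get cho authored by
# Assumption: the author is looked up by the identifier listed in meta.csv
author_id = "VIAF:265397758"
chos_by_author_df = sparql_query_handler.getCulturalHeritageObjectsAuthoredBy(author_id)
print(f"Cultural objects authored by '{author_id}':")
chos_by_author_df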

## `Mashup` classes

### `Basic Mashup`

The method `createActivityList` of the `BasicMashup` class is reused to implement the methods related to the `Activity` class. It takes a DataFrame `df` as input and processes it to create a list of activities based on certain conditions.<br>

The method `createObjectList` takes a DataFrame `cho_df` as input and iterates over its rows, converting each row into an object whose attributes are populated from the respective columns. It is designed to handle the different subclasses of the `CulturalHeritageObject` superclass (NauticalChart, ManuscriptPlate, ManuscriptVolume, etc.).
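
The row-to-object conversion can be sketched as below. This is an illustration only: the column names (`id`, `type`, `title`), the reduced two-argument constructor, and the `TYPE_TO_CLASS` lookup table are assumptions, and the stub classes stand in for the ones defined in `impl.py`.

```python
import pandas as pd


class CulturalHeritageObject:
    """Stub with only two attributes, enough to show the conversion."""

    def __init__(self, id, title):
        self.id = id
        self.title = title


class NauticalChart(CulturalHeritageObject):
    pass


class PrintedVolume(CulturalHeritageObject):
    pass


# Maps the value of the type column to the class to instantiate.
TYPE_TO_CLASS = {
    "Nautical chart": NauticalChart,
    "Printed volume": PrintedVolume,
}


def create_object_list(cho_df):
    """Convert each row of cho_df into an instance of the matching subclass."""
    objects = []
    for _, row in cho_df.iterrows():
        # Unknown types fall back to the generic superclass.
        cls = TYPE_TO_CLASS.get(row["type"], CulturalHeritageObject)
        objects.append(cls(str(row["id"]), row["title"]))
    return objects


cho_df = pd.DataFrame(
    [
        {"id": "1", "type": "Nautical chart", "title": "Nautical chart"},
        {"id": "2", "type": "Printed volume", "title": "The History of Plants"},
    ]
)
objects = create_object_list(cho_df)
print([type(obj).__name__ for obj in objects])
```

A lookup table keeps the type handling in one place, so supporting a further subclass means adding one entry rather than another branch.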

### `Advanced Mashup`

Finally, an `AdvancedMashup` object is created to query the data from both databases.

In [None]:
mashup = AdvancedMashup()
mashup.addProcessHandler(process_qh)
mashup.addMetadataHandler(metadata_qh)

result_q1 = mashup.getAllActivities()
result_q2 = mashup.getAuthorsOfCulturalHeritageObject("1")
result_q3 = mashup.getAuthorsOfObjectsAcquiredInTimeFrame("2023-04-01", "2023-05-01")
# etc...
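
To give an idea of what a mashup query such as `getAuthorsOfObjectsAcquiredInTimeFrame` has to combine, here is a minimal sketch on plain Python data. It assumes that an acquisition counts as inside the time frame when it starts on or after the first date and ends on or before the second; the function name, the data layout, and that interpretation are assumptions, not the behaviour of `impl.py`. The sample values are taken from the outputs shown earlier.

```python
def authors_of_objects_acquired_in_time_frame(acquisitions, authors_by_object, start, end):
    """Return sorted (author_id, name) pairs for objects acquired within [start, end]."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    object_ids = {
        a["object_id"]
        for a in acquisitions
        if a["start"] and a["end"] and start <= a["start"] and a["end"] <= end
    }
    authors = set()
    for object_id in object_ids:
        authors.update(authors_by_object.get(object_id, []))
    return sorted(authors)


# Process data (relational side) and metadata (graph side), reduced to two objects.
acquisitions = [
    {"object_id": "1", "start": "2023-05-08", "end": "2023-05-08"},
    {"object_id": "2", "start": "2023-04-17", "end": "2023-04-17"},
]
authors_by_object = {
    "1": [("ULAN:500114874", "Benincasa, Grazioso")],
    "2": [("VIAF:265397758", "Teofrasto")],
}

result = authors_of_objects_acquired_in_time_frame(
    acquisitions, authors_by_object, "2023-04-01", "2023-05-01"
)
print(result)
```

The first step uses only the process data and the second only the metadata; the object identifier is the key that links the two sources.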

# Tests

In [None]:
import unittest
from os import sep
from pandas import DataFrame
from impl import MetadataUploadHandler, ProcessDataUploadHandler
from impl import MetadataQueryHandler, ProcessDataQueryHandler
from impl import AdvancedMashup
from impl import Person, CulturalHeritageObject, Activity, Acquisition

REMEMBER: before launching the tests, please run the Blazegraph instance.

Terminal command to run the tests:

In [None]:
''' python -m unittest test '''

In [None]:
class TestProjectBasic(unittest.TestCase):

    # The paths of the files used in the test should change depending on what you want
    # to use and the folder where they are. For the graph database, the URL of the
    # SPARQL endpoint must be updated depending on how you launch it; the URL below is
    # the one introduced during the course, which is the one used for a standard
    # launch of the database.
    metadata = "resources" + sep + "meta.csv"
    process = "resources" + sep + "process.json"
    relational = "." + sep + "relational.db"
    graph = "http://127.0.0.1:9999/blazegraph/sparql"

    def test_01_MetadataUploadHandler(self):
        u = MetadataUploadHandler()
        self.assertTrue(u.setDbPathOrUrl(self.graph))
        self.assertEqual(u.getDbPathOrUrl(), self.graph)
        self.assertTrue(u.pushDataToDb(self.metadata))

    def test_02_ProcessDataUploadHandler(self):
        u = ProcessDataUploadHandler()
        self.assertTrue(u.setDbPathOrUrl(self.relational))
        self.assertEqual(u.getDbPathOrUrl(), self.relational)
        self.assertTrue(u.pushDataToDb(self.process))
    
    def test_03_MetadataQueryHandler(self):
        q = MetadataQueryHandler()
        self.assertTrue(q.setDbPathOrUrl(self.graph))
        self.assertEqual(q.getDbPathOrUrl(), self.graph)

        self.assertIsInstance(q.getById("just_a_test"), DataFrame)

        self.assertIsInstance(q.getAllPeople(), DataFrame)
        self.assertIsInstance(q.getAllCulturalHeritageObjects(), DataFrame)
        self.assertIsInstance(q.getAuthorsOfCulturalHeritageObject("just_a_test"), DataFrame)
        self.assertIsInstance(q.getCulturalHeritageObjectsAuthoredBy(
            "just_a_test"), DataFrame)
    
    def test_04_ProcessDataQueryHandler(self):
        q = ProcessDataQueryHandler()
        self.assertTrue(q.setDbPathOrUrl(self.relational))
        self.assertEqual(q.getDbPathOrUrl(), self.relational)

        self.assertIsInstance(q.getById("just_a_test"), DataFrame)

        self.assertIsInstance(q.getAllActivities(), DataFrame)
        self.assertIsInstance(q.getActivitiesByResponsibleInstitution(
            "just_a_test"), DataFrame)
        self.assertIsInstance(q.getActivitiesByResponsiblePerson("just_a_test"), DataFrame)
        self.assertIsInstance(q.getActivitiesUsingTool("just_a_test"), DataFrame)
        self.assertIsInstance(q.getActivitiesStartedAfter("1088-01-01"), DataFrame)
        self.assertIsInstance(q.getActivitiesEndedBefore("2029-01-01"), DataFrame)
        self.assertIsInstance(q.getAcquisitionsByTechnique("just_a_test"), DataFrame)
        
    def test_05_AdvancedMashup(self):
        qm = MetadataQueryHandler()
        qm.setDbPathOrUrl(self.graph)
        qp = ProcessDataQueryHandler()
        qp.setDbPathOrUrl(self.relational)

        am = AdvancedMashup()
        self.assertIsInstance(am.cleanMetadataHandlers(), bool)
        self.assertIsInstance(am.cleanProcessHandlers(), bool)
        self.assertTrue(am.addMetadataHandler(qm))
        self.assertTrue(am.addProcessHandler(qp))

        self.assertEqual(am.getEntityById("just_a_test"), None)

        r = am.getAllPeople()
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Person)

        r = am.getAllCulturalHeritageObjects()
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, CulturalHeritageObject)

        r = am.getAuthorsOfCulturalHeritageObject("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Person)

        r = am.getCulturalHeritageObjectsAuthoredBy("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, CulturalHeritageObject)

        r = am.getAllActivities()
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getActivitiesByResponsibleInstitution("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getActivitiesByResponsiblePerson("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getActivitiesUsingTool("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getActivitiesStartedAfter("1088-01-01")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getActivitiesEndedBefore("2029-01-01")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getAcquisitionsByTechnique("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Acquisition)

        r = am.getActivitiesOnObjectsAuthoredBy("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Activity)

        r = am.getObjectsHandledByResponsiblePerson("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, CulturalHeritageObject)

        r = am.getObjectsHandledByResponsibleInstitution("just_a_test")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, CulturalHeritageObject)

        r = am.getAuthorsOfObjectsAcquiredInTimeFrame("1088-01-01", "2029-01-01")
        self.assertIsInstance(r, list)
        for i in r:
            self.assertIsInstance(i, Person)   