# __Project presentation__

Here we present our end-of-course project for [Data Science](https://www.unibo.it/en/study/phd-professional-masters-specialisation-schools-and-other-programmes/course-unit-catalogue/course-unit/2023/467046), a module of the integrated course Computational Management of Data (I.C.), academic year 2023/2024, in the [DHDK](https://corsi.unibo.it/2cycle/DigitalHumanitiesKnowledge) programme at the University of Bologna.

<div class="alert alert-block alert-info">
<ul><i>Collaborators:</i> <br>
<li>Hubert Krzywonos - hubert.krzywonos@studio.unibo.it</li>
<li>Mohamed Iheb Ouerghi - mohamediheb.ouerghi@studio.unibo.it</li>
<li>Giorgia Umana - giorgia.umana@studio.unibo.it</li>
<li>Lucrezia Pograri - lucrezia.pograri@studio.unibo.it</li></ul></div>

## `Cultural Objects` classes

The following classes provide a structured way to represent various types of cultural heritage objects and their attributes within a Python-based system.

<ul>
<li>IdentifiableEntity (superclass):<br>
This class represents an entity that can be identified uniquely by an ID. It serves as the base class for other identifiable entities in the system.</li>
<li>Person (inherits from IdentifiableEntity):<br>
Person represents an individual with a name and an identifier. It inherits the identification functionality from IdentifiableEntity and adds a name attribute.</li>
<li>CulturalHeritageObject (inherits from IdentifiableEntity):<br>
CulturalHeritageObject represents an object of cultural heritage with attributes such as title, date, owner, place, and authors. It inherits the identification functionality from IdentifiableEntity and adds additional attributes specific to cultural heritage objects.</li>
<li>NauticalChart (inherits from CulturalHeritageObject):<br>
Represents a nautical chart, a subtype of CulturalHeritageObject.</li>
<li>ManuscriptPlate (inherits from CulturalHeritageObject):<br>
Represents a manuscript plate, a subtype of CulturalHeritageObject.</li>
<li>ManuscriptVolume (inherits from CulturalHeritageObject):<br>
Represents a manuscript volume, a subtype of CulturalHeritageObject.</li>
<li>PrintedVolume (inherits from CulturalHeritageObject):<br>
Represents a printed volume, a subtype of CulturalHeritageObject.</li>
<li>PrintedMaterial (inherits from CulturalHeritageObject):<br>
Represents a printed material, a subtype of CulturalHeritageObject.</li>
<li>Herbarium (inherits from CulturalHeritageObject):<br>
Represents a herbarium, a subtype of CulturalHeritageObject.</li>
<li>Specimen (inherits from CulturalHeritageObject):<br>
Represents a specimen, a subtype of CulturalHeritageObject.</li>
<li>Painting (inherits from CulturalHeritageObject):<br>
Represents a painting, a subtype of CulturalHeritageObject.</li>
<li>Model (inherits from CulturalHeritageObject):<br>
Represents a model, a subtype of CulturalHeritageObject.</li>
<li>Map (inherits from CulturalHeritageObject):<br>
Represents a map, a subtype of CulturalHeritageObject.</li>
</ul>
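
The hierarchy described above can be sketched in a few lines. This is only a minimal illustration, not the code in `impl`: the constructor argument order and the getter names follow the calls used elsewhere in this notebook, while the default values and the internal attribute names are assumptions.

```python
class IdentifiableEntity:
    """Base class: anything identified by a unique string ID."""

    def __init__(self, id):
        self.id = id

    def getId(self):
        return self.id


class Person(IdentifiableEntity):
    """Adds a name to the inherited identifier."""

    def __init__(self, id, name):
        super().__init__(id)
        self.name = name

    def getName(self):
        return self.name


class CulturalHeritageObject(IdentifiableEntity):
    """Adds descriptive metadata and a list of authors."""

    def __init__(self, id, title, owner, place, date=None, authors=None):
        super().__init__(id)
        self.title = title
        self.owner = owner
        self.place = place
        self.date = date
        # Copy the list so that instances never share a mutable default
        self.authors = list(authors) if authors else []

    def getTitle(self):
        return self.title

    def getOwner(self):
        return self.owner

    def getDate(self):
        return self.date

    def getAuthors(self):
        return self.authors


class NauticalChart(CulturalHeritageObject):
    """Subtype with no extra attributes: the class itself encodes the type."""


chart = NauticalChart("1", "Nautical chart", "BUB", "Bologna", "1482",
                      [Person("ULAN:500114874", "Benincasa, Grazioso")])
print(type(chart).__name__, chart.getId(), chart.getAuthors()[0].getName())
```

Since the subtypes add no attributes of their own, an empty subclass body is enough in this sketch: the type of an object is carried by its class.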

Now, let's create instances of these classes using the provided CSV data. For brevity, we demonstrate with just a few objects.

In [30]:
from impl import *

csv_data = [
    {"Id": "1", "Type": "Nautical chart", "Title": "Nautical chart", "Date": "1482", "Author": "Benincasa, Grazioso (ULAN:500114874)", "Owner": "BUB", "Place": "Bologna"},
    {"Id": "2", "Type": "Printed volume", "Title": "The History of Plants", "Date": "1497", "Author": "Teofrasto (VIAF:265397758)", "Owner": "BUB", "Place": "Bologna"},
    # More data in /meta.csv from /resources directory
]

# Function to create instances from CSV data
def create_objects_from_csv(data):
    objects = []
    for item in data:
        if item["Type"] == "Nautical chart":
            obj = NauticalChart(item["Id"], item["Title"], item["Owner"], item["Place"], item["Date"], [Person("1", "Benincasa, Grazioso")])
        elif item["Type"] == "Printed volume":
            obj = PrintedVolume(item["Id"], item["Title"], item["Owner"], item["Place"], item["Date"], [Person("2", "Teofrasto")])
        # Create other object types as needed...
        else:
            obj = CulturalHeritageObject(item["Id"], item["Title"], item["Owner"], item["Place"], item["Date"], [Person("3", "Dioscorides Pedanius")])
        objects.append(obj)
    return objects

# Create objects from CSV data
objects = create_objects_from_csv(csv_data)

# Now, let's demonstrate accessing attributes and methods of these objects

# Accessing methods
print(objects[0].getId())
print(objects[0].getTitle())
print(objects[0].getOwner())
print(objects[0].getAuthors()[0].getName())
print(objects[1].getDate())

1
Nautical chart
BUB
Benincasa, Grazioso
1497


## `Activity` classes

These classes provide a framework for modeling and managing the different stages involved in the creation of a digital twin of physical cultural heritage objects, allowing for structured representation and tracking of activities, responsible individuals, tools used, and associated timelines.

<ul>
<li>Activity (superclass):<br>
Activity serves as a base class for the different kinds of processes involved in creating a digital twin of a physical cultural heritage object. It contains attributes such as the responsible institute and person, the tools used, the start and end dates of the activity, and a reference to the cultural heritage object it pertains to.</li>
<li>Acquisition (inherits from Activity):<br>
Acquisition represents the process of acquiring data or information related to a cultural heritage object. It inherits attributes and methods from Activity and adds a technique attribute to specify the acquisition technique used.</li>
<li>Processing (inherits from Activity):<br>
Processing represents the process of manipulating or transforming acquired data or information. It inherits attributes and methods from Activity.</li>
<li>Modelling (inherits from Activity):<br>
Modelling represents the process of creating a digital model or representation of a physical cultural heritage object. It inherits attributes and methods from Activity.</li>
<li>Optimising (inherits from Activity):<br>
Optimising represents the process of optimizing or refining the digital twin or its components. It inherits attributes and methods from Activity.</li>
<li>Exporting (inherits from Activity):<br>
Exporting represents the process of exporting the digital twin or its components for various purposes such as preservation, presentation, or analysis. It inherits attributes and methods from Activity.</li>
</ul>
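
As a rough sketch of the relationship between `Activity` and `Acquisition`, the classes could look as follows. The constructor argument order and most getter names mirror the calls used in the next cell; `getTechnique` and the internal attribute names are assumptions, and the person, dates and technique in the example are purely illustrative values.

```python
class Activity:
    """Base class shared by every step of the digitisation workflow."""

    def __init__(self, refersTo_cho, institute, person, start_date, end_date, tools=None):
        self.refersTo_cho = refersTo_cho
        self.institute = institute
        self.person = person
        self.start_date = start_date
        self.end_date = end_date
        # A set, because the same tool is never listed twice for one activity
        self.tools = set(tools) if tools else set()

    def getRefersTo_cho(self):
        return self.refersTo_cho

    def getResponsibleInstitute(self):
        return self.institute

    def getResponsiblePerson(self):
        return self.person

    def getTools(self):
        return self.tools

    def getStartDate(self):
        return self.start_date

    def getEndDate(self):
        return self.end_date


class Acquisition(Activity):
    """The only subclass with an extra attribute: the acquisition technique."""

    def __init__(self, refersTo_cho, institute, person, start_date, end_date, technique, tools=None):
        super().__init__(refersTo_cho, institute, person, start_date, end_date, tools)
        self.technique = technique

    def getTechnique(self):  # assumed getter name
        return self.technique


class Processing(Activity):
    """Inherits everything from Activity without adding attributes."""


# Illustrative values only (not taken from process.json)
acquisition = Acquisition("1", "Council", "Jane Doe", "2023-03-01", "2023-03-02",
                          "Photogrammetry", {"Nikon D7200 Nikor 50mm"})
processing = Processing("1", "Council", "Jane Doe", "2023-03-03", "2023-03-04")
print(acquisition.getTechnique(), sorted(acquisition.getTools()), processing.getTools())
```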

In [None]:
import json
import os
from impl import *

# Load JSON data from file
file_path = os.path.join("resources", "process.json")
with open(file_path, "r") as file:
    json_data = json.load(file)

# Extract the first object from the JSON data
item = json_data[0]

# Attribute shared by all the activities of this object
refersTo_cho = item['object id']

# Map each JSON key to the corresponding Activity subclass
activity_classes = {
    'acquisition': Acquisition,
    'processing': Processing,
    'modelling': Modelling,
    'optimising': Optimising,
    'exporting': Exporting,
}

# Iterate over each activity type
for activity_type, activity_class in activity_classes.items():
    activity_data = item[activity_type]
    institute = activity_data['responsible institute']
    person = activity_data['responsible person']
    tools = set(activity_data.get('tool', []))
    # Read the dates of this activity, not those of the acquisition
    start_date = activity_data['start date']
    end_date = activity_data['end date']

    if activity_type == 'acquisition':
        # Only the acquisition has a technique
        technique = activity_data.get('technique', '')
        activity = Acquisition(refersTo_cho, institute, person, start_date, end_date, technique, tools)
    else:
        activity = activity_class(refersTo_cho, institute, person, start_date, end_date, tools)

    print(f"Refers To CHO: {activity.getRefersTo_cho()}")
    print(f"Activity Type: {activity.__class__.__name__}")
    print(f"Institute: {activity.getResponsibleInstitute()}")
    print(f"Person: {activity.getResponsiblePerson()}")
    print(f"Tools: {activity.getTools()}")
    print(f"Start Date: {activity.getStartDate()}")
    print(f"End Date: {activity.getEndDate()}")
    print()


## `Handler` classes

### `Upload Handler`

`ProcessData Upload Handler`

The pandas DataFrames (`activity_dfs` and `tools_df`) are created and managed inside the `ProcessDataUploadHandler` class, since this class is responsible for processing the source data and uploading it to SQLite. Its `process_data` method builds the individual DataFrames.

### DataFrames
A DataFrame is created for each activity type (acquisition, processing, modelling, optimising, exporting) and populated by parsing the JSON file. Internal identifiers for the activities, together with the mapping of the object IDs, are meant to make the organisation of the data clearer.
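
To make the idea of internal identifiers concrete, here is a minimal sketch of how the rows of each activity table could be assembled. It is not the actual `process_data` implementation: the helper name `build_activity_rows` is hypothetical, and the rule of numbering activities by the position of the object in the JSON file is an assumption that merely agrees with the output shown below. Plain dictionaries are used so that the example runs without dependencies; each list of rows could then be passed to `pandas.DataFrame`.

```python
ACTIVITY_TYPES = ["acquisition", "processing", "modelling", "optimising", "exporting"]


def build_activity_rows(items):
    """Return, for each activity type, a list of row dictionaries."""
    rows = {activity_type: [] for activity_type in ACTIVITY_TYPES}
    # Number the objects from 1 according to their position in the file (assumption)
    for index, item in enumerate(items, start=1):
        for activity_type in ACTIVITY_TYPES:
            data = item.get(activity_type)
            if data is None:
                continue  # this object has no activity of this type
            rows[activity_type].append({
                # e.g. "Acquisition-01"
                "Activity_internal_id": f"{activity_type.capitalize()}-{index:02d}",
                # e.g. "CH Object-1", built from the original object ID
                "Refers To": f"CH Object-{item['object id']}",
                "Responsible Institute": data.get("responsible institute", ""),
            })
    return rows


# Tiny illustrative input with the same keys as process.json
items = [
    {"object id": "1",
     "acquisition": {"responsible institute": "Council"},
     "processing": {"responsible institute": "Council"}},
    {"object id": "2",
     "acquisition": {"responsible institute": "Council"}},
]
rows = build_activity_rows(items)
print(rows["acquisition"])
```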

In [32]:
import os
from upload import *

upload_handler = ProcessDataUploadHandler(db_name="path_to_your_database.db")

# Load the JSON data and process it into DataFrames
json_file_path = os.path.join("resources", "process.json")
activity_dfs, tools_df = upload_handler.process_data(json_file_path)

# Print the DataFrame of each activity type
for activity_type, activity_df in activity_dfs.items():
    print(f"DataFrame for {activity_type}:")
    print(activity_df)

print()  # Add an empty line for better readability

print("DataFrame for tools:")
tools_df

DataFrame for acquisition:
   Activity_internal_id     Refers To Responsible Institute  \
0        Acquisition-01   CH Object-1               Council   
1        Acquisition-02   CH Object-2               Council   
2        Acquisition-03   CH Object-3               Council   
3        Acquisition-04   CH Object-4               Council   
4        Acquisition-05   CH Object-5               Council   
5        Acquisition-06   CH Object-6               Council   
6        Acquisition-07   CH Object-7               Council   
7        Acquisition-08   CH Object-8               Council   
8        Acquisition-09   CH Object-9               Council   
9        Acquisition-10  CH Object-10              Heritage   
10       Acquisition-11  CH Object-11          Architecture   
11       Acquisition-12  CH Object-12          Architecture   
12       Acquisition-13  CH Object-13               Council   
13       Acquisition-14  CH Object-14               Council   
14       Acquisition-15  CH 

| | Tool_internal_id | Tool | Activity_internal_id |
|---|---|---|---|
| 0 | Acquisition-01-tool | Nikon D7200 Nikor 50mm | Acquisition-01 |
| 1 | Acquisition-05-tool | Nikon D7200 Nikor 35mm | Acquisition-05 |
| 2 | Acquisition-06-tool | Nikon D7200 Nikor 35mm | Acquisition-06 |
| 3 | Acquisition-07-tool | Nikon D7200 Nikor 35mm | Acquisition-07 |
| 4 | Acquisition-08-tool | Nikon D7200 Nikor 35mm | Acquisition-08 |
| ... | ... | ... | ... |
| 197 | Exporting-29-tool | Metashape | Exporting-29 |
| 198 | Exporting-30-tool | Blender | Exporting-30 |
| 199 | Exporting-32-tool | Blender | Exporting-32 |
| 200 | Exporting-34-tool | Artec Studio 16 | Exporting-34 |


The relational database is created using the related source data:

In [None]:

import os
from upload import *

# Instantiate the ProcessDataUploadHandler to create the relational database
process_data_upload_handler = ProcessDataUploadHandler(db_name="relational.db")

# Load the JSON data and process it
json_file_path = os.path.join("resources", "process.json")
activity_dfs, tools_df = process_data_upload_handler.process_data(json_file_path)

# Push the data to the database
process_data_upload_handler.pushDataToDb(activity_dfs, tools_df)
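
For readers curious about what pushing tabular data to SQLite involves, the following is a minimal standard-library sketch. It is not the code of `pushDataToDb`, which may well rely on pandas' `DataFrame.to_sql` instead; the helper `push_rows`, the table name and the in-memory database are assumptions made for illustration.

```python
import sqlite3


def push_rows(connection, table_name, rows):
    """Create the table if it does not exist and insert every row dictionary."""
    columns = list(rows[0].keys())
    # Quote the names because some of them contain spaces
    quoted = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    with connection:  # commits on success, rolls back on error
        connection.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({quoted})')
        connection.executemany(
            f'INSERT INTO "{table_name}" ({quoted}) VALUES ({placeholders})',
            [tuple(row[column] for column in columns) for row in rows],
        )


# An in-memory database keeps the example free of side effects
connection = sqlite3.connect(":memory:")
push_rows(connection, "Acquisition", [
    {"Activity_internal_id": "Acquisition-01", "Refers To": "CH Object-1", "Responsible Institute": "Council"},
    {"Activity_internal_id": "Acquisition-02", "Refers To": "CH Object-2", "Responsible Institute": "Council"},
])
row_count = connection.execute('SELECT COUNT(*) FROM "Acquisition"').fetchone()[0]
print(row_count)
```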

`Metadata Upload Handler`


The graph database is then created from the metadata CSV file.<br>
The Blazegraph instance must be running before the data is pushed.<br>
In principle, `pushDataToDb` can be called several times to push one or more files, and even twice with the same file.

In [None]:
from upload import *

grp_endpoint = "http://127.0.0.1:9999/blazegraph/sparql"
metadata = MetadataUploadHandler()
metadata.setDbPathOrUrl(grp_endpoint)
metadata.pushDataToDb("resources/meta.csv")
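
Pushing the same file twice is harmless in principle because an RDF graph is a set of triples: inserting a triple that is already present changes nothing. This only holds if the handler generates the same IRIs for the same CSV rows on every run, which is an assumption about `MetadataUploadHandler`. The sketch below imitates the behaviour with a plain Python set; the IRIs and the helper `push_triples` are hypothetical.

```python
def push_triples(graph, triples):
    """Add triples to the graph and return how many of them were actually new."""
    size_before = len(graph)
    graph.update(triples)
    return len(graph) - size_before


# Hypothetical IRIs, chosen only for the example
triples = [
    ("https://example.org/object/1", "https://example.org/title", "Nautical chart"),
    ("https://example.org/object/1", "https://example.org/owner", "BUB"),
]

graph = set()
first_push = push_triples(graph, triples)
second_push = push_triples(graph, triples)  # same "file" pushed again
print(first_push, second_push, len(graph))
```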

### `Query Handler`

Next, a query handler is created for each of the two databases, using the corresponding class.

In [None]:
from impl import *
from query import *


process_qh = ProcessDataQueryHandler()
rel_path = "relational.db"
process_qh.setDbPathOrUrl(rel_path)

metadata_qh = MetadataQueryHandler()
metadata_qh.setDbPathOrUrl(grp_endpoint)

`ProcessData Query Handler`

This class provides methods that query the SQLite database and return the results as pandas DataFrames.

In [None]:
from query import *

# Example usages of query methods

# Instantiate ProcessDataQueryHandler and point it to the SQLite database created above
query_handler = ProcessDataQueryHandler()
query_handler.setDbPathOrUrl("relational.db")

In [None]:
# Example 1: Get all activities
all_activities_df = query_handler.getAllActivities()
print("All Activities:")
all_activities_df

In [None]:
# Example 2: Get activities by responsible institution
responsible_institution = "Heritage"
activities_by_institution_df = query_handler.getActivitiesByResponsibleInstitution(partialName=responsible_institution)
print(f"Activities by responsible institution '{responsible_institution}':")
activities_by_institution_df

In [None]:
# Example 3: Get activities by responsible person
responsible_person = "Gretel Grim"
activities_by_person_df = query_handler.getActivitiesByResponsiblePerson(partialName=responsible_person)
print(f"Activities by responsible person '{responsible_person}':")
activities_by_person_df

In [None]:
# Example 4: Get activities using a specific tool
tool_name = "Blender"
activities_using_tool_df = query_handler.getActivitiesUsingTool(partialName=tool_name)
print(f"Activities using tool '{tool_name}':")
activities_using_tool_df

In [None]:
# Example 5: Get activities started after a specific date
start_date = "2023-08-21"
activities_started_after_df = query_handler.getActivitiesStartedAfter(date=start_date)
print(f"Activities started after '{start_date}':")
activities_started_after_df

In [None]:
# Example 6: Get activities ended before a specific date
end_date = "2023-09-19"
activities_ended_before_df = query_handler.getActivitiesEndedBefore(date=end_date)
print(f"Activities ended before '{end_date}':")
activities_ended_before_df

In [None]:
# Example 7: Get acquisitions by technique
technique_name = "Structured-light 3D scanner"
acquisitions_by_technique_df = query_handler.getAcquisitionsByTechnique(partialName=technique_name)
print(f"Acquisitions by technique '{technique_name}':")
acquisitions_by_technique_df
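
The `partialName` parameter used in the examples above suggests a substring match. A minimal sketch of such a query, assuming a table named `Acquisition` with the column names shown in the DataFrame output earlier, could use a parameterised `LIKE` clause. The function below is a hypothetical stand-in, not the handler's method; the real one returns a DataFrame, which could be obtained by passing the same query to `pandas.read_sql`.

```python
import sqlite3


def get_acquisitions_by_institution(connection, partial_name):
    """Return the acquisitions whose institute contains partial_name."""
    query = (
        'SELECT "Activity_internal_id", "Responsible Institute" '
        'FROM "Acquisition" '
        'WHERE "Responsible Institute" LIKE ? '
        'ORDER BY "Activity_internal_id"'
    )
    # The value is passed as a parameter, never interpolated into the SQL string
    return connection.execute(query, (f"%{partial_name}%",)).fetchall()


connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE "Acquisition" ("Activity_internal_id" TEXT, "Responsible Institute" TEXT)'
)
connection.executemany(
    'INSERT INTO "Acquisition" VALUES (?, ?)',
    [("Acquisition-09", "Council"), ("Acquisition-10", "Heritage")],
)
matches = get_acquisitions_by_institution(connection, "Herit")
print(matches)
```

Binding the search term as a parameter avoids SQL injection and quoting problems when a name contains an apostrophe.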

`Metadata Query Handler`

In [None]:
#

## `Mashup` classes

### `Basic Mashup`

The `createActivityList` method of the `BasicMashup` class is reused to implement all the methods that return `Activity` objects. It takes a DataFrame `df` as input and processes its rows to build the corresponding list of activities.

In [None]:
#
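
One possible shape for such a helper is sketched below. It is an assumption-laden illustration rather than the project's `createActivityList`: rows are plain dictionaries standing in for DataFrame rows, the classes are reduced stand-ins for those in `impl`, the `Technique` column name is assumed, and the activity type is inferred from the prefix of the internal identifier.

```python
class Activity:
    def __init__(self, refers_to, institute):
        self.refers_to = refers_to
        self.institute = institute


class Acquisition(Activity):
    def __init__(self, refers_to, institute, technique):
        super().__init__(refers_to, institute)
        self.technique = technique


class Processing(Activity):
    pass


# Prefix of the internal identifier -> class to instantiate
ACTIVITY_CLASSES = {"Acquisition": Acquisition, "Processing": Processing}


def create_activity_list(rows):
    """Turn table rows into Activity objects, one object per row."""
    activities = []
    for row in rows:
        prefix = row["Activity_internal_id"].split("-")[0]  # "Acquisition-01" -> "Acquisition"
        activity_class = ACTIVITY_CLASSES.get(prefix)
        if activity_class is None:
            continue  # skip rows of an unknown type
        if activity_class is Acquisition:
            activity = Acquisition(row["Refers To"], row["Responsible Institute"],
                                   row.get("Technique", ""))
        else:
            activity = activity_class(row["Refers To"], row["Responsible Institute"])
        activities.append(activity)
    return activities


activities = create_activity_list([
    {"Activity_internal_id": "Acquisition-01", "Refers To": "CH Object-1",
     "Responsible Institute": "Council", "Technique": "Photogrammetry"},
    {"Activity_internal_id": "Processing-01", "Refers To": "CH Object-1",
     "Responsible Institute": "Council"},
    {"Activity_internal_id": "Unknown-01", "Refers To": "CH Object-1",
     "Responsible Institute": "Council"},
])
print([type(activity).__name__ for activity in activities])
```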

### `Advanced Mashup`

Finally, an `AdvancedMashup` object is created and connected to both query handlers so that the data can be queried.

In [None]:
from mashup import *

mashup = AdvancedMashup()
mashup.addProcessHandler(process_qh)
mashup.addMetadataHandler(metadata_qh)

result_q1 = mashup.getAllActivities()
result_q2 = mashup.getAuthorsOfCulturalHeritageObject("1")
result_q3 = mashup.getAuthorsOfObjectsAcquiredInTimeFrame("2023-04-01", "2023-05-01")
# etc...