# Linked Data and Ontologies for Construction Management


We will use the [cSite ontology](https://www.sciencedirect.com/science/article/pii/S0926580523004843) to demonstrate how linked data supports construction management. If you use the cSite ontology in your work, you can cite it as follows.

>Farghaly, Karim, Ranjith Soman, and Jennifer Whyte. "cSite ontology for production control of construction sites." Automation in Construction 158 (2024): 105224.

## Import necessary libraries


**pandas** is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language. pandas can be installed via pip from PyPI.

`pip install pandas`

RDFLib is open source and is maintained in a GitHub repository. RDFLib releases, current and previous, are listed on PyPI.

The best way to install RDFLib is to use pip (sudo as required):

`pip install rdflib`

If you want to run the latest code, clone the main branch of the GitHub repo and use that, or pip install directly from GitHub:

`pip install git+https://github.com/RDFLib/rdflib.git@main#egg=rdflib`

The notebook also imports `owlrl`, which is available from PyPI:

`pip install owlrl`

In [1]:
import rdflib
import owlrl
import pandas as pd

## Define functions to load ttl file and perform SPARQL query

Define the function to load a Turtle file into an RDF graph

In [2]:
def load_ttl_file(file_path):
    graph = rdflib.Graph()
    graph.parse(file_path, format="ttl")
    return graph

Define the functions to run a SPARQL query and to return its results as a pandas DataFrame

In [7]:
def perform_sparql_query(graph, query):
    """ Perform a SPARQL query on the RDFLib graph. """
    return graph.query(query)

def display_query_results(results):
    """ Return SPARQL query results as a pandas DataFrame. """
    # Namespace prefix to strip from URIs so the table stays readable
    uri_to_remove = "http://www.owl-ontologies.com/"

    # Transform the results into a list of dictionaries, removing the specified URI part
    data = []
    for row in results:
        row_dict = {}
        for field in results.vars:
            value = str(row[field])
            # Remove the unwanted URI part if it's present
            if value.startswith(uri_to_remove):
                value = value[len(uri_to_remove):]
            row_dict[str(field)] = value
        data.append(row_dict)
    
    # Build and return a DataFrame (the caller prints it)
    return pd.DataFrame(data)
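
Note that `display_query_results` converts every value with `str`, so counts and rates come back as text. A minimal sketch of why that matters when sorting, using made-up values rather than real query output:

```python
# Illustrative values only: counts as they would look after str() conversion.
counts_as_text = ["1", "6", "30", "6", "10"]

# Text sorts character by character, so "10" lands before "6".
print(sorted(counts_as_text))

# Converting to int first gives the expected numeric order.
counts = [int(value) for value in counts_as_text]
print(sorted(counts))
```

With a DataFrame, `pd.to_numeric` applied to a column performs the same conversion.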

## Load the Turtle file

In [8]:
ttl_file = "ProjectGrpah.ttl"  # Change this to your Turtle file path
graph = load_ttl_file(ttl_file)


## Define SPARQL query and return the outputs

### Query 1: Identifying deliverables per floor

The first query tracks the number of deliverables (activities to be delivered) per floor in a week plan. This helps project managers understand the plan and remove workspace conflicts. It also supports clustering the work so that it can be distributed to different package managers. The query looks at one week's plan, identifies the activities and their locations, and groups them by storey.

In [10]:
# Define a SPARQL query



query = """
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
PREFIX bot: <http://www.owl-ontologies.com/bot#>



SELECT ?storey ?activity ?activityversion
WHERE {
  ?activity cSite:hasZone ?space.
  ?space bot:hasStorey ?storey.
  ?activity cSite:PlannedVersion ?activityversion.
  ?activity cSite:discussedIn cSite:THISWEEK-WeeklyPMUpdate220401.xlsx.
}
GROUP BY ?storey ?activity ?activityversion
"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)

       storey                 activity activityversion
0   cSite#001  cSite#3450-QSO-001-0001               1
1   cSite#002  cSite#3450-QSO-002-0001               1
2   cSite#006  cSite#3800-QSO-006-0007               1
3   cSite#006  cSite#3800-QSO-006-PS01               1
4   cSite#006  cSite#3800-QSO-006-PSGF               1
5   cSite#005  cSite#4255-QHP-005-MB28               1
6   cSite#001  cSite#4255-QSO-001-B228               1
7   cSite#002  cSite#4255-QSO-002-0020               1
8   cSite#003  cSite#4255-QSO-003-0022               1
9   cSite#003  cSite#4255-QSO-003-0024               1
10  cSite#001  cSite#4300-QSO-001-1401               1
11  cSite#001  cSite#4446-QSO-001-0076               1
12  cSite#001  cSite#4446-QSO-001-0077               1
13  cSite#001  cSite#4446-QSO-001-0078               1
14  cSite#001  cSite#4446-QSO-001-0079               1
15  cSite#001  cSite#4446-QSO-001-0080               1
16  cSite#002  cSite#4446-QSO-002-0473               1
17  cSite#
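
The query returns one row per activity, so the number of deliverables per floor still has to be derived from those rows. A minimal sketch in plain Python, using the first six `(storey, activity)` pairs from the printed output; with the DataFrame itself, `df.groupby("storey").size()` would be the equivalent:

```python
from collections import Counter

# First rows of the printed output, as (storey, activity) pairs.
rows = [
    ("cSite#001", "cSite#3450-QSO-001-0001"),
    ("cSite#002", "cSite#3450-QSO-002-0001"),
    ("cSite#006", "cSite#3800-QSO-006-0007"),
    ("cSite#006", "cSite#3800-QSO-006-PS01"),
    ("cSite#006", "cSite#3800-QSO-006-PSGF"),
    ("cSite#005", "cSite#4255-QHP-005-MB28"),
]

# Count how many activities fall on each storey.
deliverables_per_storey = Counter(storey for storey, _ in rows)
for storey, n in sorted(deliverables_per_storey.items()):
    print(storey, n)
```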

### Query 2: Tracking the deliverables per floor

The second query tracks the number of deliverables that have been completed in the last six weeks. This helps the project manager understand the flow of work and plan further activities. The query finds all activities in the last six weeks, infers where each activity occurred, and checks its completion status. The completed activities are then counted per storey and week plan.

In [12]:
# Define a SPARQL query



query = """
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
PREFIX bot: <http://www.owl-ontologies.com/bot#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?storey ?weekplan (COUNT(?activity) AS ?count) WHERE {
  ?weekplan cSite:CoversFrom ?date.
  ?activity cSite:discussedIn ?weekplan;
    cSite:hasZone ?space.
  ?space bot:hasStorey ?storey.
  ?activity cSite:StatusCompleted ?status.
  #BIND((NOW()) - "P0Y6M28DT0M0S"^^xsd:yearMonthDuration AS ?sixweeksago)
  BIND("2022-04-01"^^xsd:date AS ?sixweeksago)
  FILTER(?date > ?sixweeksago)
  FILTER(?status = "Yes"^^xsd:string)
}
GROUP BY ?storey ?weekplan
ORDER BY (?storey)

"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)

       storey                                  weekplan count
0   cSite#001  cSite#THISWEEK-WeeklyPMUpdate220408.xlsx     1
1   cSite#001  cSite#THISWEEK-WeeklyPMUpdate220506.xlsx     6
2   cSite#001  cSite#THISWEEK-WeeklyPMUpdate220527.xlsx    30
3   cSite#001  cSite#THISWEEK-WeeklyPMUpdate220415.xlsx     6
4   cSite#002  cSite#THISWEEK-WeeklyPMUpdate220429.xlsx    10
5   cSite#002  cSite#THISWEEK-WeeklyPMUpdate220408.xlsx     7
6   cSite#002  cSite#THISWEEK-WeeklyPMUpdate220506.xlsx    14
7   cSite#002  cSite#THISWEEK-WeeklyPMUpdate220527.xlsx     4
8   cSite#002  cSite#THISWEEK-WeeklyPMUpdate220415.xlsx     9
9   cSite#002  cSite#THISWEEK-WeeklyPMUpdate220422.xlsx     4
10  cSite#003  cSite#THISWEEK-WeeklyPMUpdate220506.xlsx     2
11  cSite#020  cSite#THISWEEK-WeeklyPMUpdate220429.xlsx     1
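
The query hardcodes the cutoff as `2022-04-01`. One way to keep the six-week window rolling is to compute the date in Python and substitute it into the query text before running it. This is only a sketch: `six_weeks_before`, `bind_template` and the `__CUTOFF__` placeholder are hypothetical names, and only the relevant `BIND` line is shown.

```python
from datetime import date, timedelta

def six_weeks_before(reference: date) -> str:
    """Return the ISO date six weeks before `reference`."""
    return (reference - timedelta(weeks=6)).isoformat()

# Only the BIND line of the query is shown; __CUTOFF__ is a made-up placeholder.
bind_template = 'BIND("__CUTOFF__"^^xsd:date AS ?sixweeksago)'

cutoff = six_weeks_before(date(2022, 5, 13))
bind_line = bind_template.replace("__CUTOFF__", cutoff)
print(bind_line)
```

`str.replace` is used rather than `str.format` because a full SPARQL query contains braces that `format` would try to interpret. Passing `date.today()` instead of the fixed date gives a window relative to the current day.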


### Query 3: Identifying subcontractor productivities

To manage the workflow effectively and ensure a continuous flow of work, it is important to know where the bottlenecks in construction are. Subcontractor productivity is a major contributing factor. The third query therefore identifies the low-performing subcontractors on each storey. It finds all activities planned in the last six weeks, identifies the subcontractor responsible for each, checks each activity's completion status, and groups the results by subcontractor and storey.

In [13]:
# Define a SPARQL query



query = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
PREFIX bot: <http://www.owl-ontologies.com/bot#>
SELECT ?storey ?org ((xsd:float(COUNT(DISTINCT ?activityC))) / (xsd:float(COUNT(DISTINCT ?activity))) AS ?Completion_Rate) WHERE {
  ?weekplan cSite:CoversFrom ?date.
  ?activityC cSite:discussedIn ?weekplan;
    cSite:StatusCompleted ?status;
    cSite:hasOrganization ?org;
    cSite:hasZone ?space1.
  ?activity cSite:discussedIn ?weekplan.
  ?activity
        cSite:StatusCompleted ?status1;
        cSite:hasOrganization ?org;
        cSite:hasZone ?space.
  ?space bot:hasStorey ?storey.
  ?space1 bot:hasStorey ?storey.
  #BIND((NOW()) - "P0Y6M28DT0M0S"^^xsd:yearMonthDuration AS ?sixweeksago)
  BIND("2022-04-01"^^xsd:date AS ?sixweeksago)
  FILTER(?date > ?sixweeksago)
  FILTER(?status = "Yes"^^xsd:string)
}
GROUP BY ?storey ?org
ORDER BY (?storey) (?Completion_Rate)

"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)

      storey             org      Completion_Rate
0  cSite#001  cSite#CompanyL  0.45454545454545453
1  cSite#001  cSite#CompanyO    0.972972972972973
2  cSite#001  cSite#CompanyC                  1.0
3  cSite#002  cSite#CompanyL                  0.5
4  cSite#002  cSite#CompanyO   0.5974025974025974
5  cSite#003  cSite#CompanyM   0.2857142857142857
6  cSite#020  cSite#CompanyI                  1.0
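
The completion rate in the query is the number of completed activities divided by the number of planned activities, per storey and organisation. The same arithmetic in plain Python, on made-up records (the storey and company labels are placeholders, not project data):

```python
from collections import defaultdict

# Hypothetical records: (storey, organisation, completed?)
records = [
    ("001", "CompanyA", True),
    ("001", "CompanyA", False),
    ("001", "CompanyB", True),
    ("002", "CompanyA", True),
    ("002", "CompanyA", True),
    ("002", "CompanyA", False),
    ("002", "CompanyA", False),
]

totals = defaultdict(int)
completed = defaultdict(int)
for storey, org, done in records:
    totals[(storey, org)] += 1
    completed[(storey, org)] += int(done)

rates = {key: completed[key] / totals[key] for key in totals}

# Lowest rates first within each storey, mirroring the ORDER BY clause.
for (storey, org), rate in sorted(rates.items(), key=lambda kv: (kv[0][0], kv[1])):
    print(storey, org, rate)
```

Unlike this sketch, the SPARQL query matches each group against at least one completed activity, so a storey and organisation pair with no completed activities would not appear in its output at all.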


### Query 4: Identifying locations with least productivity

Besides determining subcontractor productivity, it is essential to consider location-based productivity as well. This is because certain locations may have inefficient logistics and working plans, which can contribute to lower overall productivity. Assessing location-based productivity provides valuable insights that cannot be solely obtained from subcontractor productivity analysis. The query algorithm begins by examining all activities conducted within the past six weeks. It then evaluates the specific spaces in which these activities took place. Finally, it checks the completion status of each activity to calculate the completion rate. By analysing this information, areas or floors with lower productivity can be pinpointed and assessed for potential issues in logistics or working plans. This enables stakeholders to address these concerns and take appropriate measures to improve overall productivity in those specific locations.

In [14]:
# Define a SPARQL query



query = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
PREFIX bot: <http://www.owl-ontologies.com/bot#>

SELECT ?storey ((xsd:float(COUNT(DISTINCT ?activityC))) / (xsd:float(COUNT(DISTINCT ?activity))) AS ?Completion_Rate) WHERE {
  ?weekplan cSite:CoversFrom ?date.
  ?activityC cSite:discussedIn ?weekplan.
  ?activityC cSite:StatusCompleted ?status.
  ?activityC cSite:hasOrganization ?org;
    cSite:hasZone ?space1.
  ?activity cSite:discussedIn ?weekplan.
  ?activity cSite:StatusCompleted ?status1.
  ?activity cSite:hasOrganization ?org;
    cSite:hasZone ?space.
  ?space bot:hasStorey ?storey.
  ?space1 bot:hasStorey ?storey.
    #BIND((NOW()) - "P0Y6M28DT0M0S"^^xsd:yearMonthDuration AS ?sixweeksago)
  BIND("2022-04-01"^^xsd:date AS ?sixweeksago)
  FILTER(?date > ?sixweeksago)
  FILTER(?status = "Yes"^^xsd:string)
}
GROUP BY ?storey
ORDER BY (?Completion_Rate) (?storey)

"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)

      storey     Completion_Rate
0  cSite#003  0.2857142857142857
1  cSite#002  0.5949367088607594
2  cSite#001                0.86
3  cSite#020                 1.0
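
A storey's rate is pooled over all of its activities, so it is not the simple average of the per-subcontractor rates from Query 3. The sketch below uses assumed counts for storey 001, chosen so that they reproduce the printed rates (5/11, 36/37 and 2/2); the actual counts are not shown in the outputs.

```python
# Assumed (completed, planned) counts per subcontractor on storey 001.
counts = {
    "CompanyL": (5, 11),
    "CompanyO": (36, 37),
    "CompanyC": (2, 2),
}

# Pooled rate: all completed activities over all planned activities.
pooled = sum(c for c, _ in counts.values()) / sum(n for _, n in counts.values())

# Unweighted mean of the three per-subcontractor rates, for comparison.
mean_of_rates = sum(c / n for c, n in counts.values()) / len(counts)

print(f"pooled rate:   {pooled:.2f}")
print(f"mean of rates: {mean_of_rates:.2f}")
```

Subcontractors with many activities therefore dominate the storey figure, which is worth keeping in mind when comparing the two tables.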


### Query 5: Identifying subcontractor completion rates

In addition to subcontractor productivity per floor, it is equally important to determine each subcontractor's overall completion rate. This information helps identify underperforming subcontractors so that corrective action can be taken. The query below calculates the completion rate of each subcontractor on a weekly basis. It retrieves all activities performed within the last six weeks and identifies the subcontractor responsible for each. It then checks whether each activity has been completed and aggregates the data per week for each subcontractor. This data helps identify subcontractors with lower performance and supports appropriate measures to rectify any issues.

In [15]:
# Define a SPARQL query



query = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
SELECT ?weekplan ?org (COUNT(DISTINCT ?activityC) AS ?N_completed_activities) (COUNT(DISTINCT ?activity) AS ?N_activities) ((xsd:float(COUNT(DISTINCT ?activityC))) / (xsd:float(COUNT(DISTINCT ?activity))) AS ?Completion_Rate) WHERE {
  ?weekplan cSite:CoversFrom ?date.
  ?activityC cSite:discussedIn ?weekplan.
  ?activityC cSite:StatusCompleted ?status.
  ?activityC cSite:hasOrganization ?org.
  ?activity cSite:discussedIn ?weekplan.
  ?activity cSite:StatusCompleted ?status1.
  ?activity cSite:hasOrganization ?org.
   #BIND((NOW()) - "P0Y6M28DT0M0S"^^xsd:yearMonthDuration AS ?sixweeksago)
  BIND("2022-04-01"^^xsd:date AS ?sixweeksago)
  FILTER(?date > ?sixweeksago)
  FILTER(?status = "Yes"^^xsd:string)
}
GROUP BY ?weekplan ?org
ORDER BY (?weekplan) DESC (?Completion_Rate)


"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)

                                    weekplan             org  \
0   cSite#THISWEEK-WeeklyPMUpdate220408.xlsx  cSite#CompanyC   
1   cSite#THISWEEK-WeeklyPMUpdate220408.xlsx  cSite#CompanyO   
2   cSite#THISWEEK-WeeklyPMUpdate220415.xlsx  cSite#CompanyC   
3   cSite#THISWEEK-WeeklyPMUpdate220415.xlsx  cSite#CompanyO   
4   cSite#THISWEEK-WeeklyPMUpdate220415.xlsx  cSite#CompanyL   
5   cSite#THISWEEK-WeeklyPMUpdate220422.xlsx  cSite#CompanyO   
6   cSite#THISWEEK-WeeklyPMUpdate220429.xlsx  cSite#CompanyI   
7   cSite#THISWEEK-WeeklyPMUpdate220429.xlsx  cSite#CompanyO   
8   cSite#THISWEEK-WeeklyPMUpdate220506.xlsx  cSite#CompanyO   
9   cSite#THISWEEK-WeeklyPMUpdate220506.xlsx  cSite#CompanyL   
10  cSite#THISWEEK-WeeklyPMUpdate220506.xlsx  cSite#CompanyM   
11  cSite#THISWEEK-WeeklyPMUpdate220527.xlsx  cSite#CompanyO   

   N_completed_activities N_activities     Completion_Rate  
0                       1            1                 1.0  
1                       7            7       

### Query 6: Identifying reasons for non-completions

Completion rates help gauge subcontractor productivity, but choosing the right corrective action also requires knowing why activities were not completed. The query first identifies the activities not completed in the last six weeks, associates each with its organisation, and determines which organisations have the highest number of non-completions. It then looks up the reasons for non-completion and counts them.

In [16]:
# Define a SPARQL query



query = """
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?org ?reason (COUNT(?reason) AS ?reasoncount) WHERE {
  ?activity cSite:hasOrganization ?org.
  ?activity cSite:discussedIn ?weekplan.
  
  ?activity cSite:ReasonForDelay ?reason.
  {
    SELECT ?org (COUNT(?activity) AS ?count) WHERE {
      ?weekplan cSite:CoversFrom ?date.
      ?activity cSite:StatusCompleted ?status.
      ?activity cSite:hasOrganization ?org.
      #BIND((NOW()) - "P0Y6M28DT0M0S"^^xsd:yearMonthDuration AS ?sixweeksago)
      BIND("2022-04-01"^^xsd:date AS ?sixweeksago)
      FILTER(?date > ?sixweeksago)
      FILTER(?status = "No"^^xsd:string)
    }
    GROUP BY ?org
    ORDER BY DESC (?count)
    LIMIT 10
  }
}
GROUP BY ?org ?reason
ORDER BY DESC (?reasoncount)
LIMIT 10


"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)

              org                                       reason reasoncount
0  cSite#CompanyM                           Activity Durations          64
1  cSite#CompanyM     Coordination of Trades at the Work Front          45
2  cSite#CompanyL                 Labour Resource Availability          30
3  cSite#CompanyM                         Access and Logistics          22
4  cSite#CompanyO                           Activity Durations          18
5  cSite#CompanyK                 Labour Resource Availability          18
6  cSite#CompanyL                           Activity Durations          14
7  cSite#CompanyO  Subcontractor & Supplier design Information          12
8  cSite#CompanyL                             Outanding Design          11
9  cSite#CompanyM                           Construction Logic          10
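
The query works in two stages: an inner query ranks organisations by their number of non-completed activities, and the outer query counts delay reasons for those organisations. A plain-Python sketch of that two-stage logic on made-up records (the organisation names and the top-N value of 2 are placeholders):

```python
from collections import Counter

# Hypothetical non-completed activities: (organisation, reason for delay)
non_completed = [
    ("CompanyA", "Activity Durations"),
    ("CompanyA", "Activity Durations"),
    ("CompanyA", "Access and Logistics"),
    ("CompanyB", "Labour Resource Availability"),
    ("CompanyB", "Labour Resource Availability"),
    ("CompanyC", "Construction Logic"),
]

# Stage 1 (inner query): organisations with the most non-completions.
org_counts = Counter(org for org, _ in non_completed)
top_orgs = {org for org, _ in org_counts.most_common(2)}

# Stage 2 (outer query): count reasons for those organisations only.
reason_counts = Counter(
    (org, reason) for org, reason in non_completed if org in top_orgs
)
for (org, reason), n in reason_counts.most_common():
    print(org, reason, n)
```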


### Query 7: Identifying the activity that has been replanned the most, the reason, and who is most affected

Frequent replanning of certain activities is a significant aspect that warrants investigation. Understanding the reasons behind these replans, as well as identifying the organizations involved and affected, is crucial. The query examines the number of times each activity has been replanned, investigates the reasons behind the replanning, identifies the responsible organization for non-completion, and determines the organization most impacted by these replans. The resulting output will provide valuable insights into the replanning dynamics and their organizational implications.

In [18]:
# Define a SPARQL query



query = """
PREFIX cSite: <http://www.owl-ontologies.com/cSite#>
PREFIX bot: <http://www.owl-ontologies.com/bot#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?storey ?org ?activity ?reason ?resporg WHERE {
  ?weekplan cSite:CoversFrom ?date.
  ?activity cSite:discussedIn ?weekplan;
    cSite:hasOrganization ?org;
    cSite:hasZone ?space.
  ?space bot:hasStorey ?storey.
  ?activity cSite:PlannedVersion ?version;
    cSite:ReasonForDelay ?reason;
    cSite:responsibleForNonCompletion ?resporg.
  FILTER((xsd:integer(?version)) > 1 )
  FILTER(?date > ?sixweeksago)
  {
    SELECT ?storey ?org (COUNT(?activity) AS ?n_orgreplanned) ?sixweeksago WHERE {
      ?weekplan cSite:CoversFrom ?date.
      ?activity cSite:discussedIn ?weekplan.
      ?activity cSite:hasOrganization ?org;
        cSite:hasZone ?space.
      ?space bot:hasStorey ?storey.
      ?activityversion cSite:PlannedVersion ?version.
      FILTER(?date > ?sixweeksago)
      FILTER((xsd:integer(?version)) > 1 )
      {
        SELECT ?storey (COUNT(DISTINCT ?activityversion) AS ?n_replanned) ?sixweeksago WHERE {
          ?weekplan cSite:CoversFrom ?date.
          ?activity cSite:discussedIn ?weekplan.
          ?activity  cSite:hasZone ?space.
          ?space bot:hasStorey ?storey.
          ?activityversion cSite:PlannedVersion ?version.
          #BIND((NOW()) - "P0Y6M28DT0M0S"^^xsd:yearMonthDuration AS ?sixweeksago)
          BIND("2022-04-01"^^xsd:date AS ?sixweeksago)
          FILTER(?date > ?sixweeksago)
          FILTER((xsd:integer(?version)) > 1 )
        }
        GROUP BY ?storey ?sixweeksago
        ORDER BY DESC (?n_replanned)
        LIMIT 4
      }
    }
    GROUP BY ?storey ?sixweeksago ?org
    LIMIT 4
  }
}
GROUP BY ?storey ?org ?activity ?reason ?resporg
ORDER BY (?activity)


"""


results = perform_sparql_query(graph, query)

# Display results
df = display_query_results(results)
print(df)