# 2.1 Aggregation

Aggregate functions are used to calculate expressions over a set of values.

RDFox offers several, including common functions like SUM and COUNT, and more specialized functions like COUNT_MAX and MAX_ARGMIN.

See the [full list shown here](https://oxfordsemtech.github.io/DocumentationDevBuild/querying.html#aggregate-functions) (excluding non-deterministic functions).

# Using aggregate functions in rules

To use aggregation in a rule, add an `AGGREGATE()` atom to the rule body and place the atoms you want to aggregate over inside it.

You must also indicate which variable(s) you are grouping on with `ON`, and bind the result of the aggregate function to a new variable with `BIND`.
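As a minimal sketch of that shape, the cell below holds a rule with a single aggregate atom, written as a Python string in the same way as the rule strings used later in this notebook. The predicate `:hasNumberOfReviews` and the variable name `count_rule` are made up for illustration and are not part of the tutorial data.

```python
# Hypothetical rule: count the reviews attached to each product.
# `:hasNumberOfReviews` is an illustrative predicate, not one defined by the tutorial.
count_rule = """
[?product, :hasNumberOfReviews, ?count] :-
    AGGREGATE (
        [?product, :hasReview, ?review]
        ON ?product
        BIND COUNT(?review) AS ?count
    ) .
"""

print(count_rule)
```

Inside `AGGREGATE ( ... )`, the first line is the atom being aggregated over, `ON ?product` names the grouping variable, and `BIND COUNT(?review) AS ?count` stores the result in `?count`, which the rule head then uses.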

# Variable scope

Variables in an aggregate atom are local to that atom unless they are also mentioned outside it. Two aggregate atoms can therefore reuse the same variable name independently, as long as that variable is not a group variable (one listed after `ON`).


## Example

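Below is an example of COUNT and COUNT_MAX used within one rule to calculate the proportion of 5-star reviews for a product in a catalogue. COUNT_MAX counts how many times the maximum value occurs, so it gives the number of 5-star reviews here because the highest rating in the data is 5. The result is a fraction between 0 and 1 rather than a value out of 100.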

In [6]:
agg_data = """
@prefix : <https://rdfox.com/example#> .

:review1 a :Review ;
    :hasStars 5 .

:review2 a :Review ;
    :hasStars 5 .

:review3 a :Review ;
    :hasStars 2 .

:productABC :hasReview :review1 ;
    :hasReview :review2 ;
    :hasReview :review3 .
    
"""

In [7]:
agg_rules = """
[?product, :hasPercentageOfFiveStars, ?percentage] :-
    AGGREGATE (
        [?product, :hasReview, ?review]
        ON ?product
        BIND COUNT(?review) AS ?count 
    ),
    AGGREGATE (
        [?product, :hasReview, ?review],
        [?review, :hasStars, ?stars]
        ON ?product
        BIND COUNT_MAX(?stars) AS ?maxCount 
    ),
    BIND( (?maxCount / ?count) AS ?percentage ) .
"""

In [8]:
import requests

# Set up the SPARQL endpoint
rdfox_server = "http://localhost:12110"

# Helper function to raise exception if the REST endpoint returns an unexpected status code
def assert_response_ok(response, message):
    if not response.ok:
        raise Exception(
            message + "\nStatus received={}\n{}".format(response.status_code, response.text))

# Clear data store
clear_response = requests.delete(
    rdfox_server + "/datastores/default/content?facts=true&axioms&rules")
assert_response_ok(clear_response, "Failed to clear data store.")

# Add data
payload = {'operation': 'add-content-update-prefixes'}
data_response = requests.patch(
    rdfox_server + "/datastores/default/content", params=payload, data=agg_data)
assert_response_ok(data_response, "Failed to add facts to data store.")

# Add rules
rules_response = requests.post(rdfox_server + "/datastores/default/content", data=agg_rules)
assert_response_ok(rules_response, "Failed to add rule.")

# Issue select query
sparql_text = "SELECT ?percentage WHERE { ?product :hasPercentageOfFiveStars ?percentage }"
response = requests.get(
    rdfox_server + "/datastores/default/sparql", params={"query": sparql_text})
assert_response_ok(response, "Failed to run select query.")
print('\n=== Percentage of 5 star ratings ===')
print(response.text)



=== Percentage of 5 star ratings ===
?percentage
0.666666666666666667
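
The value can be checked by hand. The plain-Python sketch below mimics what the two aggregates compute for the sample data, assuming `COUNT_MAX` returns the number of times the maximum value occurs within each group. It does not call RDFox, and the tuple layout and function name are illustrative only.

```python
from collections import defaultdict

# Sample data from the example, as (product, review, stars) tuples.
reviews = [
    ("productABC", "review1", 5),
    ("productABC", "review2", 5),
    ("productABC", "review3", 2),
]

def fraction_of_top_rated(rows):
    """Group by product, then divide the number of reviews holding the
    group's maximum star rating by the total number of reviews."""
    stars_by_product = defaultdict(list)
    for product, _review, stars in rows:
        stars_by_product[product].append(stars)

    result = {}
    for product, stars in stars_by_product.items():
        count = len(stars)                   # plays the role of COUNT(?review)
        max_count = stars.count(max(stars))  # assumed COUNT_MAX(?stars) behaviour
        result[product] = max_count / count
    return result

print(fraction_of_top_rated(reviews))
```

For `productABC` this is 2 divided by 3, which agrees with the query result up to floating-point precision. The sketch counts reviews holding the highest rating present, so it only equals the share of 5-star reviews when at least one review has 5 stars.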



## Exercise

Complete the rule in `2_1-AggregationRules.dlog` so that the query below directly finds 'Star Products': products with an average rating higher than 4 stars AND more than 5 reviews in total.
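
To make the target condition concrete, here is a plain-Python sketch of the 'Star Product' test. It is not the Datalog solution, and the function name and sample ratings are hypothetical.

```python
def is_star_product(star_ratings):
    """Return True when the average rating is above 4 and there are more than 5 reviews."""
    review_count = len(star_ratings)
    if review_count == 0:
        return False
    average_stars = sum(star_ratings) / review_count
    return average_stars > 4 and review_count > 5

print(is_star_product([5, 5, 5, 4, 5, 5]))  # six reviews, average above 4
print(is_star_product([5, 5, 5]))           # high average, but too few reviews
```

Both thresholds are strict, so a product with an average of exactly 4 or with exactly 5 reviews does not qualify. In the rule itself, `?averageStars` and `?reviewCount` would presumably need to be bound, for example by aggregate atoms, before a `FILTER` can compare them.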

In [10]:
agg_sparql = """
SELECT ?starProduct ?averageStars ?reviewCount
WHERE {
    ?starProduct a :StarProduct ;
    :hasAverageStars ?averageStars ;
    :hasReviewCount ?reviewCount .
} ORDER BY DESC(?averageStars)
"""

Here is a representative sample of the data in `2_1-AggregationData.ttl`.

In [11]:
sample_data = """
@prefix : <https://rdfox.com/example#> .

:product0001 a :Sofa ;
    :hasReview :review11.

:review11 :hasStars 5 .

"""

### Run this code when you're ready!

You should see your star products!

In [14]:
# Clear data store
clear_response = requests.delete(
    rdfox_server + "/datastores/default/content?facts=true&axioms&rules")
assert_response_ok(clear_response, "Failed to clear data store.")

# Get and add data
with open("../data/2_1-AggregationData.ttl", "r") as file:
    aggregation_data = file.read()
payload = {'operation': 'add-content-update-prefixes'}
data_response = requests.patch(
    rdfox_server + "/datastores/default/content", params=payload, data=aggregation_data)
assert_response_ok(data_response, "Failed to add facts to data store.")

# Get and add rules
with open("../rules/2_1-AggregationRules.dlog", "r") as file:
    aggregation_rules = file.read()
rules_response = requests.post(rdfox_server + "/datastores/default/content", data=aggregation_rules)
assert_response_ok(rules_response, "Failed to add rule.")

# Issue select query
response = requests.get(
    rdfox_server + "/datastores/default/sparql", params={"query": agg_sparql})
assert_response_ok(response, "Failed to run select query.")
print('\n=== Star Products ===')
print(response.text)

Exception: Failed to add rule.
Status received=400
RuleCompilationException: An exception occurred while compiling ... :- FILTER(?averageStars > 4), FILTER(?reviewCount > 5) .
    QueryCompilationException: A plan satisfying the binding requirements could not be found.