
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RelevanceAI/RelevanceAI-readme-docs/blob/v1.1.4/docs/GENERAL_FEATURES/aggregations/_notebooks/aggregation-quick-start.ipynb)


# Installation

In [None]:
# remove `!` if running the line in a terminal
!pip install -U RelevanceAI[notebook]==1.1.4


# Setup

In [None]:
from relevanceai import Client

"""
Sign up or log in to find your credentials here: https://cloud.relevance.ai/sdk/api
Once signed in, click the value under `Activation token` and paste it when prompted.
"""
client = Client()



# Data

In [3]:
import pandas as pd
from relevanceai.datasets import get_realestate_dataset

# Retrieve the sample dataset, which comes as a list of documents.
documents = get_realestate_dataset()

# TODO: Remove this cell when the dataset is updated

# Drop the `_clusters_` field from any document that has it.
for d in documents:
    if '_clusters_' in d:
        del d['_clusters_']

pd.DataFrame.from_dict(documents).head()




In [None]:
df = client.Dataset("quickstart_aggregation")
df.insert_documents(documents)


# 1. Grouping the Data

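In general, each group-by entry is a dictionary with three keys: an alias for the group (`name`), the document field to group on (`field`), and the type of grouping (`agg`):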
```
{"name": ALIAS, 
"field": FIELD, 
"agg": TYPE-OF-GROUP}
```
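
The `field` values in this notebook use dotted names such as `propertyDetails.area`. Below is a minimal sketch of how such a path could be resolved against a nested document, assuming the documents are nested dictionaries. The `get_nested` helper and the sample document are hypothetical and not part of the RelevanceAI SDK.

```python
def get_nested(document, dotted_field, default=None):
    """Follow a dotted path such as 'a.b' through nested dictionaries."""
    current = document
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical document shaped like the fields referenced in this notebook.
sample_document = {"propertyDetails": {"area": "North", "bedrooms": 3}}

group = {"name": "location", "field": "propertyDetails.area", "agg": "category"}
group_value = get_nested(sample_document, group["field"])
print(group["name"], "->", group_value)
```

Returning a default for missing keys, rather than raising, keeps the sketch usable on documents that lack a field.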



## Categorical Data

In [6]:
location_group = {"name": "location", "field": "propertyDetails.area", "agg": "category"}

## Numerical Data

In [7]:
bedrooms_group = {"name": "bedrooms", "field": "propertyDetails.bedrooms", "agg": "numeric"}

## Putting it Together

In [8]:
groupby = [location_group, bedrooms_group]

# 2. Creating Aggregation Metrics

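In general, each aggregation metric is a dictionary with three keys: an alias for the result (`name`), the document field to aggregate (`field`), and the type of aggregation (`agg`):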

```
{"name": ALIAS, 
"field": FIELD, 
"agg": TYPE-OF-AGG}
```
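
As a rough local illustration of what the `agg` types used in this section compute, the sketch below applies the same names to a plain Python list. The `LOCAL_AGGS` table, the `apply_metric` helper, and the sample prices are hypothetical and for illustration only; the hosted service may compute these values differently.

```python
from statistics import mean

# Hypothetical local stand-ins for the aggregation types used in this notebook.
LOCAL_AGGS = {
    "avg": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "cardinality": lambda values: len(set(values)),  # number of distinct values
}


def apply_metric(metric, values):
    """Return (alias, aggregated value) for one metric spec over a list of values."""
    return metric["name"], LOCAL_AGGS[metric["agg"]](values)


prices = [400000, 550000, 550000, 700000]  # made-up sample values
alias, value = apply_metric(
    {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"}, prices
)
print(alias, value)
```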



## Average, Minimum and Maximum

In [9]:
avg_price_metric = {"name": "avg_price", "field": "priceDetails.price", "agg": "avg"}
max_price_metric = {"name": "max_price", "field": "priceDetails.price", "agg": "max"}
min_price_metric = {"name": "min_price", "field": "priceDetails.price", "agg": "min"}

## Sum

In [10]:
sum_bathroom_metric = {"name": "bathroom_sum", "field": "propertyDetails.bathrooms", "agg": "sum"}

## Cardinality

In [11]:
cardinality_suburbs_metric = {"name": "num_suburbs", "field": "propertyDetails.suburb", "agg": "cardinality"}

## Putting it Together

In [12]:
metrics = [
    avg_price_metric,
    max_price_metric,
    min_price_metric,
    sum_bathroom_metric,
    # cardinality_suburbs_metric,
]

# 3. Combining Grouping and Aggregating

In [16]:
# TODO: update to the new aggregate
results = client.services.aggregate.aggregate("quickstart_aggregation", metrics=metrics, groupby=groupby)
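
For intuition, combining the two group-by entries with the price and bathroom metrics is roughly what the following pure-Python sketch does on a small, made-up table. The rows, the flattened key names, and the output layout are assumptions for illustration; the actual response of the `aggregate` endpoint may be shaped differently.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows with flattened versions of the fields used in this notebook.
rows = [
    {"area": "North", "bedrooms": 2, "price": 400000, "bathrooms": 1},
    {"area": "North", "bedrooms": 2, "price": 600000, "bathrooms": 2},
    {"area": "South", "bedrooms": 3, "price": 500000, "bathrooms": 2},
    {"area": "South", "bedrooms": 3, "price": 700000, "bathrooms": 2},
]

# Step 1: bucket rows by the (location, bedrooms) pair, mirroring `groupby`.
groups = defaultdict(list)
for row in rows:
    groups[(row["area"], row["bedrooms"])].append(row)

# Step 2: compute one value per metric inside each bucket, mirroring `metrics`.
local_result = [
    {
        "location": area,
        "bedrooms": bedrooms,
        "avg_price": mean(r["price"] for r in members),
        "max_price": max(r["price"] for r in members),
        "min_price": min(r["price"] for r in members),
        "bathroom_sum": sum(r["bathrooms"] for r in members),
    }
    for (area, bedrooms), members in groups.items()
]

for entry in local_result:
    print(entry)
```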


In [None]:
# Use jsonshower to display the JSON result
from jsonshower import show_json
show_json(results, text_fields=list(results[0].keys()))