This is one of the Objectiv [example notebooks](https://objectiv.io/docs/modeling/example-notebooks/). These notebooks can also run [on your own data](https://objectiv.io/docs/modeling/get-started-in-your-notebook/) (see [how to set up tracking](https://objectiv.io/docs/tracking/)).

# Basic user intent analysis

This example notebook shows how you can easily do basic user intent analysis on your data.

## Get started
We first have to instantiate the model hub and an Objectiv DataFrame object.

In [None]:
# set the timeframe of the analysis
start_date = '2022-03-01'
end_date = '2022-05-01'

In [None]:
from modelhub import ModelHub, display_sql_as_markdown
import bach
import pandas as pd
from datetime import timedelta

# instantiate the model hub and set the default time aggregation to daily
# and set the global contexts that will be used in this example
modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application'])
# get a Bach DataFrame with Objectiv data within a defined timeframe
df = modelhub.get_objectiv_dataframe(start_date=start_date, end_date=end_date)

The `location_stack` column, and the columns taken from the global contexts, contain most of the event-specific data. These columns are JSON typed, and we can extract data from them using the keys of the JSON objects with [`SeriesLocationStack`](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/SeriesLocationStack/) methods, or the `context` accessor for global context columns. See the [open taxonomy example](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts) for how to use the `location_stack` and global contexts.

In [None]:
# adding specific contexts to the data as columns
df['application_id'] = df.application.context.id
df['root_location'] = df.location_stack.ls.get_from_context_with_type_series(type='RootLocationContext', key='id')
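
To make the extraction above concrete, here is a plain-Python sketch of what pulling a value out of a location stack amounts to. The sample stack and the helper `get_from_context_with_type` are made up for illustration, and the sketch assumes the first context of the requested type is the one used. It is not Bach's implementation, which performs the lookup in SQL.

```python
import json
from typing import Any, Optional

# a made-up location stack, ordered from the top-level location down to the element
sample_location_stack = json.loads("""
[
    {"_type": "RootLocationContext", "id": "home"},
    {"_type": "NavigationContext", "id": "navbar-top"},
    {"_type": "LinkContext", "id": "logo"}
]
""")


def get_from_context_with_type(stack: list, context_type: str, key: str) -> Optional[Any]:
    """Return `key` from the first context in `stack` whose `_type` equals `context_type`."""
    for context in stack:
        if context.get('_type') == context_type:
            return context.get(key)
    # no context of the requested type in this stack
    return None


root_location = get_from_context_with_type(
    sample_location_stack, context_type='RootLocationContext', key='id'
)
print(root_location)
```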

### Reference
* [modelhub.ModelHub](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/ModelHub/)
* [modelhub.ModelHub.get_objectiv_dataframe](https://objectiv.io/docs/modeling/open-model-hub/api-reference/ModelHub/get_objectiv_dataframe/)
* [using global context data](open-taxonomy-how-to.ipynb#Location-stack-&-global-contexts)
* [modelhub.SeriesLocationStack.ls](https://objectiv.io/docs/modeling/open-model-hub/api-reference/SeriesLocationStack/ls/)

## Explore where users spend time
The `root_location` context in the location stack represents the top-level UI location of the user. As a first step in understanding user intent, it shows in which main areas of your product users spend their time.

In [None]:
# see the number of unique users per application and root_location
users_root = modelhub.aggregate.unique_users(df, groupby=['application_id', 'root_location'])
users_root.sort_index().head(10)

Another useful indicator of user intent is how much time users spend in each `root_location`.

In [None]:
# see duration per application and root location
duration_root = modelhub.aggregate.session_duration(df, groupby=['application_id', 'root_location']).sort_index()
duration_root.head(10)

Finally, let's look at the distribution of time spent. We'll use this distribution to define the different stages of user intent.

In [None]:
# see how the overall time spent is distributed
session_duration = modelhub.aggregate.session_duration(df, groupby='session_id')
# materialize first: the expression of the created Series already contains aggregated data,
# and aggregated data cannot be aggregated again (which the quantile calculation below would do)
session_duration = session_duration.materialize()
# show quantiles
session_duration.quantile(q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).head(10)
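
For intuition about what the session duration and quantile steps above compute, the sketch below reproduces them in plain pandas on a handful of made-up events. It assumes a session's duration is the time between its first and last event. The column names `session_id` and `moment` mirror the Objectiv data, but the values are invented.

```python
import pandas as pd

# made-up events: one row per event, with the session it belongs to
events = pd.DataFrame({
    'session_id': [1, 1, 1, 2, 2, 3, 3],
    'moment': pd.to_datetime([
        '2022-03-01 10:00:00', '2022-03-01 10:00:30', '2022-03-01 10:02:00',
        '2022-03-01 11:00:00', '2022-03-01 11:00:20',
        '2022-03-02 09:00:00', '2022-03-02 09:10:00',
    ]),
})

# duration per session: time between the first and the last event (an assumption)
moments = events.groupby('session_id')['moment']
session_durations = moments.max() - moments.min()

# quantiles are computed on seconds to keep the arithmetic explicit
duration_quantiles = session_durations.dt.total_seconds().quantile(q=[0.5, 0.9])
print(duration_quantiles)
```

With real data, the quantiles at which the duration jumps are natural candidates for the boundaries between intent stages.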

### Reference
* [modelhub.Aggregate.unique_users](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/unique_users/)
* [bach.DataFrame.sort_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_index/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)
* [modelhub.Aggregate.session_duration](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/session_duration/)
* [bach.DataFrame.materialize](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/materialize/)
* [bach.DataFrame.quantile](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/quantile/)

## Define the stages of user intent
Now that we've explored where users spend their time (by `root_location`) and how long they spend there (session duration per `root_location`, plus the overall quantiles), we can make a simple definition of the different stages of their intent.

Based on this dataset (objectiv.io website data) we think that:

- Users who spent the most time on the site (above the 90th percentile), and specifically in our documentation sections, are in the Implement phase.
- As there's a jump beyond the one minute mark at the 70th percentile, we assume that users between the 70th and 90th percentile of duration in our documentation sections are in the Explore phase.
- The remaining users are in the Inform phase: they are informing themselves about the product. These users spend less than 1:30 in the docs and/or any amount of time on our main website.

Summarizing:

| User intent | Root locations | Duration |
| :--- | :--- | :--- |
| 1 - Inform | *all sections other than the ones mentioned below* | *any time spent* |
| 1 - Inform | Docs: modeling, taxonomy, tracking, home | less than 1:30 |
| 2 - Explore | Docs: modeling, taxonomy, tracking, home | between 1:30 and 11:30 |
| 3 - Implement | Docs: modeling, taxonomy, tracking, home | more than 11:30 | 

This is just for illustration purposes; you can adjust these definitions based on your own collected data.
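
The table can be read as a small decision rule. The sketch below writes it as a plain Python function. `classify_intent`, its parameter, and the two constants are hypothetical names, and the boundary handling (exactly 1:30 and exactly 11:30 both counting as Explore) is an assumption about how to read the table.

```python
from datetime import timedelta
from typing import Optional

# thresholds taken from the table: 1:30 and 11:30
EXPLORE_FROM = timedelta(minutes=1, seconds=30)
IMPLEMENT_FROM = timedelta(minutes=11, seconds=30)


def classify_intent(docs_duration: Optional[timedelta]) -> str:
    """Map the total time a user spent in the docs sections to an intent stage.

    `docs_duration` is None for users who never visited those sections.
    """
    if docs_duration is None or docs_duration < EXPLORE_FROM:
        return '1 - Inform'
    if docs_duration <= IMPLEMENT_FROM:
        return '2 - Explore'
    return '3 - Implement'


for duration in [None, timedelta(seconds=45), timedelta(minutes=5), timedelta(minutes=20)]:
    print(duration, '->', classify_intent(duration))
```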

## Assign user intent
Using our intent definitions above, we can now assign a stage of intent to each user.

In [None]:
# select the root_locations to use for each of the intent stages
roots = bach.DataFrame.from_pandas(
    engine=df.engine, 
    df=pd.DataFrame({'roots': ['modeling', 'taxonomy', 'tracking', 'home', 'docs']})
).roots

In [None]:
# calculate the total time spent per user
user_intent_buckets = modelhub.agg.session_duration(df, 
                                                    groupby=['user_id'], 
                                                    method='sum',
                                                    exclude_bounces=False).to_frame()

In [None]:
# same as above, but for selected root_locations only
selector = (df.root_location.isin(roots)) & (df.application_id=='objectiv-docs')
explore_inform_users_session_duration = modelhub.agg.session_duration(df[selector], groupby='user_id', method='sum')
# and set it as column
user_intent_buckets['explore_inform_duration'] = explore_inform_users_session_duration

In [None]:
# set Inform as the catch-all bucket: users that do not fall into Explore or Implement stay in Inform
user_intent_buckets['bucket'] = '1 - inform'

In [None]:
# assign the Explore and Implement buckets based on time spent in the selected docs sections:
# Explore is from 1:30 up to and including 11:30, Implement is more than 11:30
user_intent_buckets.loc[(user_intent_buckets.explore_inform_duration >= timedelta(seconds=90)) &
                        (user_intent_buckets.explore_inform_duration <= timedelta(seconds=690)), 'bucket'] = '2 - explore'

user_intent_buckets.loc[user_intent_buckets.explore_inform_duration > timedelta(seconds=690), 'bucket'] = '3 - implement'
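
Users without any events in the selected docs sections have no value for `explore_inform_duration`, so neither condition matches and they keep the catch-all label. The plain-pandas sketch below illustrates that behavior on made-up data. It assumes Bach's SQL-backed comparisons treat missing values the same way (a comparison with NULL does not evaluate to true), which is worth verifying on your own data.

```python
from datetime import timedelta

import pandas as pd

# made-up totals per user; user 'd' spent no time in the docs sections (NaT)
buckets = pd.DataFrame(
    {'explore_inform_duration': pd.to_timedelta([30.0, 300.0, 900.0, float('nan')], unit='s')},
    index=pd.Index(['a', 'b', 'c', 'd'], name='user_id'),
)

# start with the catch-all bucket
buckets['bucket'] = '1 - inform'

explore_from = timedelta(seconds=90)
implement_from = timedelta(seconds=690)

# comparisons against a missing duration are False, so user 'd' matches neither mask
is_explore = ((buckets.explore_inform_duration >= explore_from)
              & (buckets.explore_inform_duration <= implement_from))
is_implement = buckets.explore_inform_duration > implement_from

buckets.loc[is_explore, 'bucket'] = '2 - explore'
buckets.loc[is_implement, 'bucket'] = '3 - implement'
print(buckets)
```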

### Reference
* [bach.DataFrame.from_pandas](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/from_pandas/)
* [modelhub.Aggregate.session_duration](https://objectiv.io/docs/modeling/open-model-hub/models/aggregation/session_duration/)
* [bach.Series.isin](https://objectiv.io/docs/modeling/bach/api-reference/Series/isin/)
* [bach.DataFrame.loc](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/loc/)

## Work with the user intent results

Now that we have assigned intent to each user, we can run any analysis on it. For example, we can look at the total number of users per intent bucket.

In [None]:
# see the total number of users per intent bucket
user_intent_buckets.reset_index().groupby('bucket').agg({'user_id': 'nunique'}).sort_index().head()

Other examples of analyses you could run:

- Which product features does each of the intent groups use?
- What kind of intent do users from different marketing campaigns have?
- How can we drive more users to the 'Implement' phase? For instance, compare the product features used by users with the 'Implement' intent to those used by users with the 'Explore' intent.

A good starting point for these analyses on top of the user intent buckets is the basic product analytics example in the [example notebooks](https://objectiv.io/docs/modeling/example-notebooks/).

### Reference
* [bach.DataFrame.groupby](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/groupby/)
* [bach.DataFrame.agg](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/agg/)
* [bach.DataFrame.sort_index](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/sort_index/)
* [bach.DataFrame.head](https://objectiv.io/docs/modeling/bach/api-reference/DataFrame/head/)

## Get the SQL for any analysis
The SQL for any analysis can be exported with one command, so you can use models directly in production to simplify data debugging and delivery to BI tools like Metabase, dbt, etc. See how you can [quickly create BI dashboards with this](https://objectiv.io/docs/home/up#creating-bi-dashboards).

In [None]:
# show SQL for analysis; this is just one example, and works for any Objectiv model/analysis
display_sql_as_markdown(user_intent_buckets)

That’s it! [Join us on Slack](https://objectiv.io/join-slack) if you have any questions or suggestions.

# Next Steps


## Use this notebook with your own data
You can use the example notebooks on any dataset that was collected with Objectiv's tracker, so feel free to use them to bootstrap your own projects. They are available as Jupyter notebooks on our [GitHub repository](https://github.com/objectiv/objectiv-analytics/tree/main/notebooks). See [instructions to set up the Objectiv tracker](https://objectiv.io/docs/tracking/).


## Check out related example notebooks
- [Product Analytics](./product-analytics.ipynb) - easily run basic product analytics on your data.
- [Funnel Discovery](./funnel-discovery.ipynb) - discover all the (top) user journeys that lead to conversion or drop-off, and run subsequent analyses on them.