## Background

- The user journey is the path that visitors take on a website, from their initial entry point to the completion of a desired action or goal.

*Key Components:*
1. **Entry Point:**
   - Identifies where users first land on the website, whether through direct traffic, organic search, or referral sources.

2. **Navigation:**
   - Tracks users' movements within the site, including page views, interactions, and the sequence of actions taken.

3. **Engagement:**
   - Measures the level of user interaction with site elements, such as clicks, form submissions, and time spent on specific pages.

4. **Conversion Points:**
   - Highlights areas where users fulfill desired actions, such as making a purchase, filling out a form, or subscribing to a newsletter.

5. **Exit Points:**
   - Indicates where users leave the site, providing insights into potential pain points or areas for improvement.
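The components above can be read directly off a time-ordered event log. The sketch below is a minimal illustration under stated assumptions. The log rows are hypothetical. `payment_done` is assumed to be the conversion event. The entry point is taken as the first event (the landing page) rather than the traffic source, because this toy log does not record the source.

```python
from collections import defaultdict

# Hypothetical toy log: (user_id, seconds since first visit, event name).
EVENTS = [
    (1, 0, "catalog"), (1, 15, "product1"), (1, 16, "cart"), (1, 40, "payment_done"),
    (2, 0, "main"), (2, 30, "catalog"),
    (3, 0, "catalog"), (3, 10, "product2"), (3, 25, "cart"),
]

# Assumption: this event marks the desired action (the conversion point).
CONVERSION_EVENT = "payment_done"


def summarize_journeys(events, conversion_event=CONVERSION_EVENT):
    """Return entry point, navigation path, conversion flag and exit point per user."""
    paths = defaultdict(list)
    # Sorting the tuples orders rows by user, then by time.
    for user_id, _ts, event in sorted(events):
        paths[user_id].append(event)
    return {
        user_id: {
            "entry": path[0],  # first event: where the user landed
            "path": path,  # full navigation sequence
            "converted": conversion_event in path,  # reached the desired action?
            "exit": path[-1],  # last event: where the user left
        }
        for user_id, path in paths.items()
    }


for user_id, info in summarize_journeys(EVENTS).items():
    print(user_id, info)
```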

*Importance of Understanding User Journeys:*
- **Enhanced User Experience:**
  - Tailoring the website to align with user behavior improves overall satisfaction and engagement.

- **Optimized Conversion Paths:**
  - Identifying effective conversion paths allows for strategic optimization to drive desired outcomes.

- **Data-Driven Decision Making:**
  - Analyzing user journeys grounds decisions about website changes in observed user behavior rather than assumptions.

- **Improved Marketing Strategies:**
  - Understanding how users interact with the site informs marketing efforts and helps target campaigns more effectively.

*Tools and Techniques:*
- Utilize advanced Google Analytics features such as event tracking, custom dimensions, and goal tracking to gain granular insights into user journeys.

- Use A/B testing to experiment with variations of a journey, and user flow analysis to visualize how users actually move through the site.

- Leverage user journey mapping tools to create visual representations of typical user paths, identifying common touchpoints and potential bottlenecks.
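At its simplest, user flow analysis counts how often each A → B transition occurs and where paths end. Frequent exits after a given page hint at a bottleneck. The following is a minimal standard-library sketch on hypothetical paths. The function names are illustrative and are not part of any tool mentioned above.

```python
from collections import Counter

# Hypothetical time-ordered paths, one list per user.
PATHS = [
    ["main", "catalog", "product1", "cart", "payment_done"],
    ["catalog", "product1", "cart"],
    ["catalog", "product2"],
    ["main", "catalog", "product1", "cart"],
]


def transition_counts(paths):
    """Count each A -> B transition between consecutive events across all paths."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    return counts


def exit_counts(paths):
    """Count the last event of each path, i.e. the page where the user left."""
    return Counter(path[-1] for path in paths)


print(transition_counts(PATHS).most_common(3))
print(exit_counts(PATHS).most_common(1))
```

On these toy paths, the most frequent exit is cart (2 of 4 paths). This is the kind of signal that would point to a bottleneck between the cart and payment.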

*Continuous Optimization:*
- Regularly review and optimize the user journey based on ongoing data analysis, user feedback, and evolving business goals.

By comprehensively understanding and strategically optimizing user journeys, businesses can create a seamless and engaging online experience, driving higher conversions and user satisfaction.


*****


# 1. Loading data

```python
from retentioneering import datasets

# load sample user behavior data:
stream = datasets.load_simple_shop()
```

```python
stream.to_dataframe().head()
```

|   | event_id | event_type | event_index | event | timestamp | user_id |
|---|---|---|---|---|---|---|
| 0 | 8621a075-e4db-4d1a-ab79-26c1c82c89d4 | path_start | 0 | path_start | 2019-11-01 17:59:13.273932 | 219483890 |
| 1 | 8621a075-e4db-4d1a-ab79-26c1c82c89d4 | raw | 0 | catalog | 2019-11-01 17:59:13.273932 | 219483890 |
| 2 | ee13f137-da0e-4d1d-b28a-92d9c419d681 | raw | 1 | product1 | 2019-11-01 17:59:28.459271 | 219483890 |
| 3 | 5f3992ce-db91-49b4-a19d-042154ad219d | raw | 2 | cart | 2019-11-01 17:59:29.502214 | 219483890 |
| 4 | 48cd0eeb-c541-4b1e-a3cd-1e0a5d1df3d2 | raw | 3 | catalog | 2019-11-01 17:59:32.557029 | 219483890 |
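Each row above is a single event. To turn the table into per-user trajectories, group the rows by user_id and order them by timestamp. Those trajectories are the kind of input the step-wise visualization in the next section needs. The sketch below uses the five rows shown, reduced to four columns. Keeping only `raw` rows as real user actions, and so dropping markers such as `path_start`, is our assumption.

```python
from collections import defaultdict
from datetime import datetime

# The five rows shown above, reduced to (event_type, event, timestamp, user_id).
ROWS = [
    ("path_start", "path_start", "2019-11-01 17:59:13.273932", 219483890),
    ("raw", "catalog", "2019-11-01 17:59:13.273932", 219483890),
    ("raw", "product1", "2019-11-01 17:59:28.459271", 219483890),
    ("raw", "cart", "2019-11-01 17:59:29.502214", 219483890),
    ("raw", "catalog", "2019-11-01 17:59:32.557029", 219483890),
]


def user_trajectories(rows, keep_types=("raw",)):
    """Group rows by user and order each user's events by timestamp.

    Assumption: only rows whose event_type is in keep_types are kept,
    which drops synthetic markers such as path_start by default.
    """
    trajectories = defaultdict(list)
    for event_type, event, ts, user_id in rows:
        if event_type in keep_types:
            trajectories[user_id].append((datetime.fromisoformat(ts), event))
    return {
        user: [event for _, event in sorted(events)]
        for user, events in trajectories.items()
    }


print(user_trajectories(ROWS))
```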


# 2. Visualization
- The user journey is visualized as a stepwise directed graph, in which:
> - Nodes are associated with the events that appear at a particular step of a user’s trajectory, arranged from left to right by step number (1, 2, etc.).
> - Edges represent how often a transition occurred from, say, event A at step i to event B at step i+1.
> - The sizes of nodes and edges reflect the number of unique users involved.
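The following minimal sketch makes the graph definition concrete by deriving node and edge counts from ordered user paths. It illustrates the idea and is not retentioneering's internal implementation. The paths and the `step_graph` helper are hypothetical.

```python
from collections import Counter

# Hypothetical time-ordered paths, keyed by user.
PATHS = {
    "u1": ["catalog", "product1", "cart"],
    "u2": ["catalog", "catalog", "product2"],
    "u3": ["main", "catalog"],
    "u4": ["catalog", "product1"],
}


def step_graph(paths, max_steps):
    """Build step-wise node and edge counts from ordered user paths.

    A node is a (step, event) pair and an edge links (i, A) to (i + 1, B).
    Each user contributes at most one event per step and one edge per step
    pair, so the counts equal the number of unique users involved.
    """
    nodes = Counter()
    edges = Counter()
    for path in paths.values():
        steps = path[:max_steps]  # keep only the first max_steps events
        for step, event in enumerate(steps, start=1):
            nodes[(step, event)] += 1
        for step, (src, dst) in enumerate(zip(steps, steps[1:]), start=1):
            edges[((step, src), (step + 1, dst))] += 1
    return nodes, edges


nodes, edges = step_graph(PATHS, max_steps=3)
n_users = len(PATHS)
# Share of all users at each step-1 node: catalog 0.75, main 0.25.
print({event: count / n_users for (step, event), count in nodes.items() if step == 1})
print(edges.most_common(1))
```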

```python
# max_steps=5: plot the first 5 steps of each path; threshold=0.1: collapse rare events
stream.step_sankey(max_steps=5, threshold=0.1)
```
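Here is our reading of the `threshold` argument, which should be checked against the documentation for your installed retentioneering version. An event counts as rare when its share of users never exceeds the threshold at any step. Rare events at the same step are then merged into one `thresholded_N` node. The hypothetical `collapse_rare_events` function below sketches that rule on toy paths. It is an assumption-laden illustration, not the library's code.

```python
from collections import Counter

# Hypothetical time-ordered paths, one list per user.
PATHS = [
    ["catalog", "product1"],
    ["catalog", "product1"],
    ["catalog", "product2"],
    ["main", "catalog"],
    ["catalog", "delivery"],
]


def collapse_rare_events(paths, max_steps, threshold):
    """Hypothetical sketch of a share-based rare-event filter.

    Assumption: an event is rare if its share of users is <= threshold at
    every step; rare events at the same step become one 'thresholded_N' node,
    where N is the number of distinct rare events merged at that step.
    """
    n_users = len(paths)
    truncated = [path[:max_steps] for path in paths]
    nodes = Counter()
    for path in truncated:
        for step, event in enumerate(path, start=1):
            nodes[(step, event)] += 1
    # Highest share each event reaches at any single step.
    max_share = {}
    for (step, event), count in nodes.items():
        max_share[event] = max(max_share.get(event, 0.0), count / n_users)
    rare = {event for event, share in max_share.items() if share <= threshold}
    # Number of distinct rare events present at each step.
    rare_per_step = Counter(step for (step, event) in nodes if event in rare)
    return [
        [
            f"thresholded_{rare_per_step[step]}" if event in rare else event
            for step, event in enumerate(path, start=1)
        ]
        for path in truncated
    ]


for path in collapse_rare_events(PATHS, max_steps=5, threshold=0.2):
    print(path)
```

Note that catalog survives even though only 20% of users have it at step 2, because under this rule its highest per-step share (80% at step 1) is what counts.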


## Interpreting the visualization above
1. The visualization reflects connections between successive steps.
2. The visualization is interactive: you can hover over nodes and edges to see detailed information, drag nodes around, and even merge them (to merge, use the Box Select or Lasso Select tool that appears in the top-right corner on hover).
3. The nodes are grouped into columns in a stepwise manner.
> - The first column corresponds to the events that occurred at the users’ first step.
> - The second column corresponds to the second step, and so on.
> - The height of the rectangle representing a node is proportional to how often this event occurred at this step.
> - Example: hovering over the first-step 'catalog' node shows that 2.69K unique users (71.61% of all users) went through it, whereas the first-step 'main' node covers 1.07K users (28.39%). That is why the red rectangle (catalog) is about 2.5 times taller than the green rectangle (main). The percentage of users is calculated with respect to all users participating in the previous step.

4. An edge’s width is proportional to the frequency of the corresponding transition.
> - Hovering over an edge reveals the number of unique users who made the transition and the average time the transition took.
> - Example: the transition from catalog (1st step) to catalog (2nd step) appeared in 869 paths and took 29 seconds on average.

5. Terminating event (ENDED): added at the end of short paths (paths with fewer than max_steps events) so that every path has exactly max_steps steps. This ensures that the user shares in each column (i.e., each step) sum to exactly 1.
> - Example: at the 2nd step, the ENDED event contains 443 users; at the 3rd step it contains 823 users, 443 of whom were propagated from the previous step.
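The padding rule in item 5 can be sketched in three steps. First, truncate each path to max_steps. Next, fill shorter paths with ENDED. Finally, check that the shares in every column sum to 1. The paths are hypothetical, and the helpers are illustrative rather than part of retentioneering.

```python
import math
from collections import Counter

# Hypothetical time-ordered paths of different lengths.
PATHS = [
    ["catalog", "product1", "cart"],
    ["catalog"],
    ["main", "catalog"],
    ["catalog", "product1", "cart", "payment_done"],
]


def pad_with_ended(paths, max_steps, end_event="ENDED"):
    """Truncate paths to max_steps and fill shorter ones with end_event."""
    padded = []
    for path in paths:
        head = path[:max_steps]
        padded.append(head + [end_event] * (max_steps - len(head)))
    return padded


def column_shares(padded):
    """Share of users per event at each step (one dict per column)."""
    n_users = len(padded)
    n_steps = len(padded[0])
    return [
        {event: count / n_users for event, count in Counter(p[i] for p in padded).items()}
        for i in range(n_steps)
    ]


padded = pad_with_ended(PATHS, max_steps=3)
shares = column_shares(padded)
for step, column in enumerate(shares, start=1):
    # Every column sums to 1 (up to float rounding) once short paths are padded.
    print(step, column, math.isclose(sum(column.values()), 1.0))
```

Here ENDED covers 1 user at step 2 and 2 users at step 3. The step-3 count combines the user propagated from step 2 with one path that ended after its second event, the same kind of breakdown as in the 443-of-823 example.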