# Exploring Pokémon Battles with Atoti

This notebook explores the Pokémon data from Kaggle in an [OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) cube created using [Atoti](https://www.atoti.io/):  
- [Tuan Nguyen Van Anh's Pokémon Dataset with Team Combat](https://www.kaggle.com/tuannguyenvananh/pokemon-dataset-with-team-combat)  
- [Mario Tormo Romero's Complete Pokémon Dataset](https://www.kaggle.com/mariotormo/complete-pokemon-dataset-updated-090420).

I will demonstrate how I can create an analytics platform end-to-end:
- Preprocessing: processing data prior to data loading into the cube
- Measure creation: creating KPIs for the story
- Data visualization: creating visualizations in the Jupyter notebook during the data modelling stage
- Dashboarding: complete storytelling with dashboards using the Atoti web application

Bonus:  
Using the [<img src="https://img.shields.io/badge/🔒-Atoti-291A40" />](https://docs.atoti.io/latest/how_tos/unlock_all_features.html#), I have also extended the Atoti web application to create custom widgets.  
The custom widget will invoke the [custom endpoints](https://docs.atoti.io/latest/lib/atoti.session.html#atoti.session.Session.endpoint) to simulate random battles.   

__Note: <img src="https://img.shields.io/badge/🔒-Atoti-291A40" /> is the unlocked version of the free <img src="https://img.shields.io/badge/-Atoti%20CE-AF4D61" />.__

In [1]:
import datetime as dt
import urllib
from datetime import date

import atoti as tt
import numpy as np
import pandas as pd

## 1. Data preprocessing

Atoti can read from various data sources, including Pandas. I will use Pandas to clean up my data and perform some low-level computations before loading the data into Atoti.

### 1.1 Pokémon stats data

Before looking at each battle, let's understand the Pokémon a little more. Using Pandas, I renamed the columns into a friendlier convention.  

Since __Atoti will automatically inherit the data type from the DataFrame__, I will cast the data type for the Pandas DataFrame column accordingly.    
For instance, the `Pokedex` column is inferred as `int` by Pandas but I want it to be treated as a `String`. So let's cast it explicitly.

In [2]:
pokemon_df = pd.read_csv(
    "https://data.atoti.io/notebooks/pokemon/pokemon.csv",
    header=0,
    names=[
        "Pokedex",
        "Generation",
        "Pokemon",
        "JP Name",
        "Primary Type",
        "Secondary Type",
        "Classification",
        "Percentage Male",
        "Percentage Female",
        "Height",
        "Weight",
        "Capture Rate",
        "Base Egg Steps",
        "HP",
        "Attack",
        "Defense",
        "SP Attack",
        "SP Defense",
        "Speed",
        "Sub-Legendary",
        "Legendary",
        "Mythical",
    ],
    dtype={"Pokedex": str},
)

pokemon_df.head()

Unnamed: 0,Pokedex,Generation,Pokemon,JP Name,Primary Type,Secondary Type,Classification,Percentage Male,Percentage Female,Height,...,Base Egg Steps,HP,Attack,Defense,SP Attack,SP Defense,Speed,Sub-Legendary,Legendary,Mythical
0,1,I,Bulbasaur,Fushigidane,grass,poison,Seed,88.14,11.86,0.7,...,5120,45,49,49,65,65,45,0,0,0
1,2,I,Ivysaur,Fushigisou,grass,poison,Seed,88.14,11.86,1.0,...,5120,60,62,63,80,80,60,0,0,0
2,3,I,Venusaur,Fushigibana,grass,poison,Seed,88.14,11.86,2.0,...,5120,80,82,83,100,100,80,0,0,0
3,4,I,Charmander,Hitokage,fire,,Lizard,88.14,11.86,0.6,...,5120,39,52,43,60,50,65,0,0,0
4,5,I,Charmeleon,Lizardo,fire,,Flame,88.14,11.86,1.1,...,5120,58,64,58,80,65,80,0,0,0
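
To see why the explicit cast matters, here is a minimal sketch on a tiny made-up CSV (the inline data and the variable names are illustrative, not taken from the dataset). Without a `dtype` hint, Pandas infers an integer column; with `dtype={"Pokedex": str}`, the IDs stay as text.

```python
import io

import pandas as pd

# Tiny illustrative CSV; the notebook itself reads pokemon.csv instead.
sample_csv = "Pokedex,Pokemon\n1,Bulbasaur\n2,Ivysaur\n"

# Without a dtype hint, Pandas infers an integer column.
inferred_df = pd.read_csv(io.StringIO(sample_csv))

# With the hint, the IDs are kept as strings.
casted_df = pd.read_csv(io.StringIO(sample_csv), dtype={"Pokedex": str})

print(inferred_df["Pokedex"].dtype)
print(casted_df["Pokedex"].tolist())
```

Keeping IDs as strings means they are treated as labels rather than as numbers to aggregate.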


I want to combine the three columns `Sub-Legendary`, `Legendary` and `Mythical` into a single boolean column, `(Sub)Legendary or Mythical`, and drop the columns that are not being used.

In [3]:
pokemon_df["(Sub)Legendary or Mythical"] = False
pokemon_df.loc[
    (pokemon_df["Sub-Legendary"] == 1)
    | (pokemon_df["Legendary"] == 1)
    | (pokemon_df["Mythical"] == 1),
    "(Sub)Legendary or Mythical",
] = True
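
As a side note, the same flag can be derived in a single expression. The sketch below uses a small made-up frame (`flags_df` is a hypothetical name) and assumes the three columns only hold 0 or 1, as in the preview above.

```python
import pandas as pd

# Illustrative rows; the flag columns hold 0 or 1 as in the source data.
flags_df = pd.DataFrame(
    {
        "Sub-Legendary": [0, 1, 0],
        "Legendary": [0, 0, 0],
        "Mythical": [0, 0, 1],
    }
)

# True when at least one of the three flags equals 1.
flags_df["(Sub)Legendary or Mythical"] = flags_df.eq(1).any(axis=1)

print(flags_df["(Sub)Legendary or Mythical"].tolist())
```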

In [4]:
pokemon_df.drop(
    columns=[
        "Sub-Legendary",
        "Legendary",
        "Mythical",
        "JP Name",
        "Primary Type",
        "Secondary Type",
    ],
    inplace=True,
)

pokemon_df.head()

Unnamed: 0,Pokedex,Generation,Pokemon,Classification,Percentage Male,Percentage Female,Height,Weight,Capture Rate,Base Egg Steps,HP,Attack,Defense,SP Attack,SP Defense,Speed,(Sub)Legendary or Mythical
0,1,I,Bulbasaur,Seed,88.14,11.86,0.7,6.9,45,5120,45,49,49,65,65,45,False
1,2,I,Ivysaur,Seed,88.14,11.86,1.0,13.0,45,5120,60,62,63,80,80,60,False
2,3,I,Venusaur,Seed,88.14,11.86,2.0,100.0,45,5120,80,82,83,100,100,80,False
3,4,I,Charmander,Lizard,88.14,11.86,0.6,8.5,45,5120,39,52,43,60,50,65,False
4,5,I,Charmeleon,Flame,88.14,11.86,1.1,19.0,45,5120,58,64,58,80,65,80,False


### 1.2 Pokémon's combat data

This dataset gives information on each battle such as the battling Pokémon, the battle results and the number of views.  
Using Pandas, I've renamed the columns while reading the `combats.csv` file and cast the IDs to `String`.

In [5]:
combat_df = pd.read_csv(
    "https://data.atoti.io/notebooks/pokemon/combats.csv",
    header=0,
    names=[
        "Registration ID",
        "Combat ID",
        "Pokemon ID",
        "Opponent Pokemon ID",
        "Result",
        "Type Multiplier",
        "Replay Views",
        "Live Views (Start - End)",
    ],
    dtype={
        "Registration ID": str,
        "Combat ID": str,
        "Pokemon ID": str,
        "Opponent Pokemon ID": str,
    },
)

combat_df.head()

Unnamed: 0,Registration ID,Combat ID,Pokemon ID,Opponent Pokemon ID,Result,Type Multiplier,Replay Views,Live Views (Start - End)
0,1,1,266,298,losE,0.5,29629,15306;1664
1,2,1,298,266,WIn,1.0,109229,83732;75822
2,3,2,702,701,LOSE,2.0,27634,4533;2656
3,4,2,701,702,WIN,0.5,82359,9983;40424
4,5,3,191,668,LOSE,0.5,71664,54796;16377


The `Result` column has inconsistent casing. Let's standardize the column values to uppercase.

In [6]:
combat_df["Result"] = combat_df["Result"].str.upper()
combat_df.head()

Unnamed: 0,Registration ID,Combat ID,Pokemon ID,Opponent Pokemon ID,Result,Type Multiplier,Replay Views,Live Views (Start - End)
0,1,1,266,298,LOSE,0.5,29629,15306;1664
1,2,1,298,266,WIN,1.0,109229,83732;75822
2,3,2,702,701,LOSE,2.0,27634,4533;2656
3,4,2,701,702,WIN,0.5,82359,9983;40424
4,5,3,191,668,LOSE,0.5,71664,54796;16377


I split the `Live Views (Start - End)` column by the delimiter `;` to get the viewership at the start and the end of the combat.

In [7]:
combat_df[["Live Views Start", "Live Views End"]] = combat_df[
    "Live Views (Start - End)"
].str.split(";", n=1, expand=True)
combat_df[["Live Views Start", "Live Views End"]] = combat_df[
    ["Live Views Start", "Live Views End"]
].astype(int)

combat_df.head()

Unnamed: 0,Registration ID,Combat ID,Pokemon ID,Opponent Pokemon ID,Result,Type Multiplier,Replay Views,Live Views (Start - End),Live Views Start,Live Views End
0,1,1,266,298,LOSE,0.5,29629,15306;1664,15306,1664
1,2,1,298,266,WIN,1.0,109229,83732;75822,83732,75822
2,3,2,702,701,LOSE,2.0,27634,4533;2656,4533,2656
3,4,2,701,702,WIN,0.5,82359,9983;40424,9983,40424
4,5,3,191,668,LOSE,0.5,71664,54796;16377,54796,16377


Using Pandas, I will compute the rate of change in views from the start to the end of the battle. This could also be done in the cube using Atoti's aggregation functions. However, as these statistics at the granular level will not change, computing them before loading into the cube saves some computation resources at query time (Atoti computes on the fly).

In [8]:
combat_df["Live Views Change"] = (
    combat_df["Live Views End"] - combat_df["Live Views Start"]
) / combat_df["Live Views Start"]
combat_df.head()

Unnamed: 0,Registration ID,Combat ID,Pokemon ID,Opponent Pokemon ID,Result,Type Multiplier,Replay Views,Live Views (Start - End),Live Views Start,Live Views End,Live Views Change
0,1,1,266,298,LOSE,0.5,29629,15306;1664,15306,1664,-0.891284
1,2,1,298,266,WIN,1.0,109229,83732;75822,83732,75822,-0.094468
2,3,2,702,701,LOSE,2.0,27634,4533;2656,4533,2656,-0.414075
3,4,2,701,702,WIN,0.5,82359,9983;40424,9983,40424,3.049284
4,5,3,191,668,LOSE,0.5,71664,54796;16377,54796,16377,-0.701128
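
As a quick sanity check of the formula, the sketch below recomputes the change for two of the rows shown above in plain Python. The helper name `views_change` is only illustrative.

```python
def views_change(start: int, end: int) -> float:
    """Relative change in live views from the start to the end of a battle."""
    return (end - start) / start


# Start and end values taken from the first and fourth rows shown above.
first_row = views_change(15306, 1664)
fourth_row = views_change(9983, 40424)

print(round(first_row, 6), round(fourth_row, 6))
```

A negative value means viewers left during the battle, while a value above 1 (such as the fourth row) means the audience more than doubled.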


Since I have the number of live views at the start and end of the battle, I will drop the column `Live Views (Start - End)`.

In [9]:
combat_df.drop(columns=["Live Views (Start - End)"], inplace=True)

## 2. Datasets used for simulations  
Later on in the use case, I will be performing some what-if simulations by modifying some data:
- Number of tickets sold in `Combat info` dataset

While Atoti is able to load CSV files directly into its tables, here I'm going to load the data into a Pandas DataFrame first.  
This is to facilitate data manipulation later on for the [source simulations](https://docs.atoti.io/latest/tutorial/tutorial.html#Source-simulation) at the end of the notebook.

### 2.1 Combat info   

This dataset gives the number of tickets sold for each combat and the date of the battle.  

In [10]:
combat_info_df = pd.read_csv(
    "combatinfo (1).csv",
    dtype={"Season": str, "Combat ID": str, "Location ID": str},
)
combat_info_df["Date"] = pd.to_datetime(
    combat_info_df["Date"], format="%d/%m/%Y"
).dt.date

combat_info_df.head()

Unnamed: 0,Combat ID,Season,Location ID,Tickets Sold,Date
0,1,1,1,1959,2253-02-18
1,2,1,2,1921,2253-02-18
2,3,1,3,1936,2253-02-18
3,4,1,4,1737,2253-02-18
4,5,1,5,1971,2253-02-18
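
The `format="%d/%m/%Y"` argument tells the parser to read day, month and a four-digit year, in that order. A minimal sketch with the standard library, assuming the raw file stores the first date above as `18/02/2253`:

```python
from datetime import date, datetime

# "%d/%m/%Y" reads the day, then the month, then a four-digit year.
# The input string is an assumed raw form of the first date shown above.
parsed = datetime.strptime("18/02/2253", "%d/%m/%Y").date()

print(parsed.isoformat())
```

Stating the format explicitly avoids any day/month ambiguity for dates such as `03/04/2253`.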


## 3. Getting started with Atoti

Now that I have cleaned up the data, I can get started with Atoti.

### 3.0 Switching between Atoti and Atoti Community Edition 

<img src="https://img.shields.io/badge/-Atoti%20CE-AF4D61" /> 🔛 <img src="https://img.shields.io/badge/🔒-Atoti-291A40" />  

Switch from Atoti Community Edition to Atoti unlocked version by updating the below variable to `True`. 

In [11]:
atoti_unlock = False

By setting the above variable to `True`, we can leverage Atoti's locked features to create customized widgets for the Atoti UI.  
**If you don't have an Atoti license yet, [register online for a trial license](https://atoti.io/evaluation-license-request/) to [unlock all the features of Atoti](https://docs.atoti.io/latest/how_tos/unlock_all_features.html).**

Setting the above variable to `False` will continue to use the Atoti Community Edition. No licensing setup is required.

The configuration below registers the custom widgets used in this use case.

In [12]:
plus_app_ext = (
    # {"@atoti/pokemon-extension": "./extensions/dist"} if atoti_unlock else {}
    {"@atoti/pokemon-extension": "./ui/dist"} if atoti_unlock else {}
)

In [13]:
# from utils import build

# if atoti_unlock:
#     build.download_dist()

Let's continue with the basic Atoti setup and create an Atoti session.

### 3.1 Create Atoti session
Using the [session configuration](https://docs.atoti.io/latest/lib/atoti.config.session_config.html#atoti.config.session_config.SessionConfig), I can:
1. fix the `port` that will be used by the Atoti web application
2. create `user_content_storage` to save our dashboards. Otherwise, dashboards will be stored in transient memory and will be gone when the session is gone.

In [14]:
session = tt.Session(
    port=8080,
    user_content_storage="./content",
    app_extensions=plus_app_ext,
    java_options=["-Xms1G", "-Xmx12G"],
)

### 3.2 Create Atoti tables  

Atoti can load data from a variety of data sources, such as CSV files and Pandas DataFrames, as demonstrated below.

In [15]:
type_t = session.read_csv(
    "s3://data.atoti.io/notebooks/pokemon/type_reg.csv",
    table_name="Types",
    keys=["ID"],
    types={"ID": tt.type.STRING, "Registration ID": tt.type.STRING},
)

type_t.head(3)

ID,Pokemon Type,Opponent Pokemon Type,Registration ID
19001,Flying,Fire,10709
3,Normal,Bug,2
13769,Flying,Rock,7793


In [16]:
combat_t = session.read_pandas(
    combat_df,
    table_name="Combats",
    keys=["Registration ID"],
    types={
        "Replay Views": tt.type.LONG,
        "Live Views Change": tt.type.DOUBLE,
        "Live Views Start": tt.type.LONG,
        "Live Views End": tt.type.LONG,
    },
)

combat_t.head(3)

Registration ID,Combat ID,Pokemon ID,Opponent Pokemon ID,Result,Type Multiplier,Replay Views,Live Views Start,Live Views End,Live Views Change
19,10,302,31,LOSE,1.0,66777,46798,38510,-0.177102
39,20,563,578,WIN,2.0,103339,60255,89631,0.487528
59,30,694,747,WIN,1.5,59205,15409,49521,2.213771


In [17]:
combat_info_t = session.read_pandas(
    combat_info_df, table_name="Combat Info", keys=["Combat ID"]
)

The location dataset gives details on the locations where the combats are held and the cost of hosting a battle.

In [18]:
location_t = session.read_csv(
    "s3://data.atoti.io/notebooks/pokemon/locations.csv",
    table_name="Locations",
    keys=["Location ID"],
    types={
        "Location ID": tt.type.STRING,
        "Stadium Cost": tt.type.DOUBLE,
        "Ticket Price": tt.type.DOUBLE,
    },
)

location_t.head(3)

Location ID,Region,Settlement,Stadium Cost,Ticket Price,Stadium Capacity
19,Johto,Safari Zone Gate,16000.0,15.0,2000
20,Kanto,Pallet Town,6000.0,10.0,2000
1,Alola,Iki Town,17000.0,15.0,2000


I will load the Pokémon stats dataset into 2 separate tables as:
1. the battling Pokémon
2. the opponent Pokémon (columns renamed)

In [19]:
pokemon_t = session.read_pandas(pokemon_df, table_name="Pokemon", keys=["Pokedex"])
pokemon_t.head(3)

Pokedex,Generation,Pokemon,Classification,Percentage Male,Percentage Female,Height,Weight,Capture Rate,Base Egg Steps,HP,Attack,Defense,SP Attack,SP Defense,Speed,(Sub)Legendary or Mythical
19,I,Rattata,Mouse,50.0,50.0,0.3,3.5,255,3840,30,56,35,25,35,72,False
39,I,Jigglypuff,Balloon,24.9,75.1,0.5,5.5,170,2560,115,45,20,45,25,20,False
59,I,Arcanine,Legendary,75.49,24.51,1.9,155.0,75,5120,90,110,80,100,80,95,False


In [20]:
pokemon_df.columns = map(lambda n: "Opponent " + n, pokemon_df.columns)

In [21]:
o_pokemon_t = session.read_pandas(
    pokemon_df, table_name="Opponent Pokemon", keys=["Opponent Pokedex"]
)
o_pokemon_t.head(3)

Opponent Pokedex,Opponent Generation,Opponent Pokemon,Opponent Classification,Opponent Percentage Male,Opponent Percentage Female,Opponent Height,Opponent Weight,Opponent Capture Rate,Opponent Base Egg Steps,Opponent HP,Opponent Attack,Opponent Defense,Opponent SP Attack,Opponent SP Defense,Opponent Speed,Opponent (Sub)Legendary or Mythical
19,I,Rattata,Mouse,50.0,50.0,0.3,3.5,255,3840,30,56,35,25,35,72,False
39,I,Jigglypuff,Balloon,24.9,75.1,0.5,5.5,170,2560,115,45,20,45,25,20,False
59,I,Arcanine,Legendary,75.49,24.51,1.9,155.0,75,5120,90,110,80,100,80,95,False


### 3.3 Create multidimensional data cube  

Choose the data table with the most granular level data to be the base table for cube creation.

In [22]:
cube_name = "Pokemon battles"
cube = session.create_cube(type_t, name=cube_name)

### 3.4 Join referenced tables

I create joins between tables as shown below. It is not necessary to specify the join conditions explicitly if the column names match.  
Atoti automatically infers the join conditions from the column names that the tables have in common.

In [23]:
type_t.join(combat_t, type_t["Registration ID"] == combat_t["Registration ID"])
combat_t.join(pokemon_t, combat_t["Pokemon ID"] == pokemon_t["Pokedex"])
combat_t.join(
    o_pokemon_t, combat_t["Opponent Pokemon ID"] == o_pokemon_t["Opponent Pokedex"]
)
combat_t.join(combat_info_t, combat_t["Combat ID"] == combat_info_t["Combat ID"])
combat_info_t.join(
    location_t, combat_info_t["Location ID"] == location_t["Location ID"]
)
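
To make the idea of "common column names" concrete, the sketch below intersects the column lists of the `Combats` and `Combat Info` tables. It only illustrates the principle in plain Python; it is not how Atoti implements the inference, and the variable names are hypothetical.

```python
# Column names as they appear in the table previews above.
combat_columns = [
    "Registration ID",
    "Combat ID",
    "Pokemon ID",
    "Opponent Pokemon ID",
    "Result",
    "Type Multiplier",
    "Replay Views",
    "Live Views Start",
    "Live Views End",
    "Live Views Change",
]
combat_info_columns = ["Combat ID", "Season", "Location ID", "Tickets Sold", "Date"]

# Columns present in both tables are the candidates for an inferred join.
common_columns = [name for name in combat_columns if name in combat_info_columns]

print(common_columns)
```

The Pokémon joins, by contrast, need an explicit condition because `Pokemon ID` and `Pokedex` are named differently.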

In [None]:
session.tables.schema

## 4. Cube manipulation 

Aliasing the cube's hierarchies, measures and levels makes them easier to work with when creating measures and manipulating hierarchies.

In [None]:
h, m, l = cube.hierarchies, cube.measures, cube.levels

In [None]:
h

In [None]:
m

### 4.1 Hierarchy manipulation

Here I am going to clean up the automatically created hierarchies that I don't need, and create others.  
__Note:__ I can also [create the cube in `manual` mode](https://docs.atoti.io/latest/lib/atoti.session.html#atoti.session.Session.create_cube) if I prefer to create all measures and hierarchies manually.

#### 4.1.1 Create multilevel hierarchy

In [None]:
h["Location"] = [l["Region"], l["Settlement"]]

#### 4.1.2 Reassign dimensions 

In [None]:
h["Date"].dimension = "Time"
h["Season"].dimension = "Time"
h["Pokemon Type"].dimension = "Pokemon"
h["Opponent Pokemon Type"].dimension = "Opponent Pokemon"
h["Result"].dimension = "Win-Lose"
h["Location"].dimension = "Location"

#### 4.1.3 Create date hierarchy from date column

In [None]:
cube.create_date_hierarchy(
    "Date",
    column=combat_info_t["Date"],
    levels={"Year": "yyyy", "Month": "MMM", "Day": "dd"},
)
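
The level patterns `yyyy`, `MMM` and `dd` look like Java-style date patterns (year, abbreviated month name, two-digit day). As a rough sketch of what each level holds, here are the `strftime` counterparts I assume to be equivalent, applied to the first date of the `Combat info` preview:

```python
from datetime import date

battle_date = date(2253, 2, 18)  # first date in the Combat info preview

# Assumed strftime counterparts of the patterns "yyyy", "MMM" and "dd".
levels = {
    "Year": battle_date.strftime("%Y"),
    "Month": battle_date.strftime("%b"),  # abbreviated month, locale-dependent
    "Day": battle_date.strftime("%d"),
}

print(levels)
```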

In [None]:
session.visualize()

#### 4.1.4 Sort members of a level

Using [Atoti comparators](https://docs.atoti.io/latest/lib/atoti.comparator.html#atoti.comparator.Comparator), I can order the members of a level in ascending, descending or a given order.

In [None]:
l["Result"].order = tt.CustomOrder(first_elements=["WIN", "LOSE"])

Ordering the Pokémon types in their conventional order:

In [None]:
order_list = [
    "Normal",
    "Fire",
    "Water",
    "Electric",
    "Grass",
    "Ice",
    "Fighting",
    "Poison",
    "Ground",
    "Flying",
    "Psychic",
    "Bug",
    "Rock",
    "Ghost",
    "Dragon",
    "Dark",
    "Steel",
    "Fairy",
]

l["Pokemon Type"].order = tt.CustomOrder(first_elements=order_list)
l["Opponent Pokemon Type"].order = tt.CustomOrder(first_elements=order_list)
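
To illustrate what a custom order means for the listed members, the plain-Python sketch below sorts a few type names by their position in a shortened copy of the list. It only mimics the effect on members that appear in the list and is not Atoti code.

```python
# Shortened copy of the order list above, for illustration only.
preferred_order = ["Normal", "Fire", "Water", "Electric", "Grass"]

# Members as they might come back in alphabetical order.
members = ["Electric", "Fire", "Grass", "Normal", "Water"]

# Sort each member by its position in the preferred order.
ordered_members = sorted(members, key=preferred_order.index)

print(ordered_members)
```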

In [None]:
session.visualize("Sorted members for Pokemon Type level")

#### 4.1.5 Delete hierarchies

In [None]:
del h["Opponent Pokemon ID"]
del h["Opponent Classification"]
del h["Opponent Generation"]

del h["Region"]
del h["Settlement"]

#### 4.1.6 Hide hierarchies

These are some hierarchies needed for measures later on that do not need to be visible.

In [None]:
h["ID"].visible = False
h["ID"].dimension = "Hidden"
h["Registration ID"].visible = False
h["Registration ID"].dimension = "Hidden"
h["Combat ID"].visible = False
h["Combat ID"].dimension = "Hidden"
h["Location ID"].visible = False
h["Location ID"].dimension = "Hidden"
h["Pokemon ID"].visible = False
h["Pokemon ID"].dimension = "Hidden"

In [None]:
h

### 4.2 Measures manipulation

Atoti has many [aggregation functions](https://docs.atoti.io/latest/lib/atoti.agg.html#module-atoti.agg) available. Next, I'll be using them to [create new measures](https://docs.atoti.io/latest/tutorial/tutorial.html#New-measures). 

#### 4.2.1 Measures creation

*__Data stats related measures__*  
Below are some measures that give overall information about the dataset.  

In [None]:
m["No. of Matches"] = tt.agg.count_distinct(combat_t["Combat ID"])
m["No. of Pokemon"] = tt.agg.count_distinct(pokemon_t["Pokedex"])
m["No. of Regions"] = tt.agg.count_distinct(location_t["Region"])
m["No. of Locations"] = tt.agg.count_distinct(location_t["Location ID"])
m["No. of Seasons"] = tt.agg.count_distinct(combat_info_t["Season"])
m["No. of Days"] = tt.agg.count_distinct(combat_info_t["Date"])

After creating those measures, I get a nice overall view of the dataset like this:

In [None]:
session.visualize("Overview of Data")

`contributors.COUNT` is created automatically when the cube is created. It gives the number of facts contributing to the query.  
As we can see, there are 50k unique matches involving 784 Pokémon out of the 177k records, spanning across 500 days.

In [None]:
session.visualize("No. of Matches Held Over Seasons")

*__Battle results related measures__*  

Here I am creating the measures revolving around win/lose stats.

In [None]:
m["Win"] = tt.where(
    tt.filter(m["No. of Matches"], l["Result"] == "WIN").isnull(),
    0,
    tt.filter(m["No. of Matches"], l["Result"] == "WIN"),
)

m["Lose"] = tt.where(
    tt.filter(m["No. of Matches"], l["Result"] == "LOSE").isnull(),
    0,
    tt.filter(m["No. of Matches"], l["Result"] == "LOSE"),
)

In [None]:
m["Win Rate"] = m["Win"] / (m["Win"] + m["Lose"])
m["Win Rate"].formatter = "DOUBLE[0.00%]"
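
The win rate formula can be checked by hand on the five battles shown in the `Combats` preview. This is a plain-Python sketch of the arithmetic, not the cube computation; like the `tt.where(...isnull(), 0, ...)` guard above, a result with no matching rows counts as 0.

```python
from collections import Counter

# Results of the five battles shown in the Combats preview.
results = ["LOSE", "WIN", "LOSE", "WIN", "LOSE"]

counts = Counter(results)
wins, losses = counts["WIN"], counts["LOSE"]  # a missing key counts as 0

win_rate = wins / (wins + losses)
print(f"{win_rate:.2%}")  # → 40.00%
```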

With the Win Rate measure, we can take a look at the win rate for each Pokémon type:

In [None]:
session.visualize("Win Rate For Each Pokémon Type")

Next I am going to calculate the trend win rate for a Pokémon over the seasons, i.e. the change in win rate compared with the previous season:

In [None]:
m["Trend Win Rate"] = tt.where(
    l["Season"] != "1",
    m["Win Rate"] - tt.shift(m["Win Rate"], h["Season"], offset=-1),
)
m["Trend Win Rate"].formatter = "DOUBLE[0.00%]"
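
Outside the cube, the same season-over-season difference can be sketched as follows. The win rates are hypothetical numbers chosen for illustration; the first season has no predecessor, so its trend stays empty, which mirrors the `l["Season"] != "1"` condition.

```python
# Hypothetical win rates for one Pokémon, keyed by season.
win_rate_by_season = {"1": 0.40, "2": 0.55, "3": 0.50}

trend = {}
previous = None
for season, rate in win_rate_by_season.items():
    # Season 1 has no earlier season to compare with, so it stays empty.
    trend[season] = None if previous is None else rate - previous
    previous = rate

print(trend)
```

A positive value means the Pokémon improved on the previous season, and a negative value means it did worse.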

Here is Chikorita's trend win rate:

In [None]:
session.visualize("Trend Win Rate For Chikorita")

Calculating max win rate scoped by each Pokémon in order to find the highest win rate among a group of Pokémon:

In [None]:
m["Max Win Rate (Pokemon)"] = tt.agg.max(
    m["Win Rate"], scope=tt.OriginScope(l["Pokemon"])
)
m["Max Win Rate (Pokemon)"].formatter = "DOUBLE[0.00%]"

Similarly, for the max win rate by season:

In [None]:
m["Max Win Rate (Season)"] = tt.agg.max(
    m["Win Rate"], scope=tt.OriginScope(l["Season"])
)
m["Max Win Rate (Season)"].formatter = "DOUBLE[0.00%]"

So now, I can look at Psyduck's highest win rate out of all the seasons:

In [None]:
session.visualize("Max Win Rate (Season) for Psyduck")

With the max win rate measure, I can pick out all the Pokémon whose highest win rate across all seasons has never reached 30% :(

In [None]:
m["Boost this Pokemon"] = tt.where(
    (m["Max Win Rate (Season)"] < 0.3) & (l["Season"].isnull()), "BOOST", ""
)
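
The flagging rule itself is simple: take the best season of each Pokémon and compare it with the threshold. A small sketch with hypothetical names and win rates:

```python
# Hypothetical per-season win rates for two Pokémon.
win_rates = {
    "Pokemon A": [0.10, 0.25, 0.29],
    "Pokemon B": [0.20, 0.45, 0.30],
}
THRESHOLD = 0.3

# Flag a Pokémon only when even its best season stays below the threshold.
boost = {
    name: ("BOOST" if max(rates) < THRESHOLD else "")
    for name, rates in win_rates.items()
}

print(boost)
```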

Hopefully these Pokémon will get boosted soon.

In [None]:
session.visualize("Boost Pokemon")

*__Pokémon stats related measures__*

Here I will store Pokémon stat aggregates. These are scoped by Registration ID since our base dataset includes duplicated data.

In [None]:
m["Avg HP"] = tt.agg.mean(
    tt.agg.single_value(pokemon_t["HP"]), scope=tt.OriginScope(l["Registration ID"])
)

m["Avg Def"] = tt.agg.mean(
    tt.agg.single_value(pokemon_t["Defense"]),
    scope=tt.OriginScope(l["Registration ID"]),
)

m["Avg SP Def"] = tt.agg.mean(
    tt.agg.single_value(pokemon_t["SP Defense"]),
    scope=tt.OriginScope(l["Registration ID"]),
)

m["Avg Atk"] = tt.agg.mean(
    tt.agg.single_value(pokemon_t["Attack"]), scope=tt.OriginScope(l["Registration ID"])
)

m["Avg SP Atk"] = tt.agg.mean(
    tt.agg.single_value(pokemon_t["SP Attack"]),
    scope=tt.OriginScope(l["Registration ID"]),
)

m["Avg Speed"] = tt.agg.mean(
    tt.agg.single_value(pokemon_t["Speed"]), scope=tt.OriginScope(l["Registration ID"])
)

m["Avg Type Multiplier"] = tt.agg.mean(
    tt.agg.single_value(combat_t["Type Multiplier"]),
    scope=tt.OriginScope(l["Registration ID"]),
)
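
The reason for scoping by `Registration ID` can be seen on a toy example. Assuming a registration can appear in several rows of the base table, a plain mean counts its stats more than once, whereas taking one value per registration first does not. The frame below is made up for illustration.

```python
import pandas as pd

# Illustrative base rows: registration "1" appears twice with the same HP.
base_df = pd.DataFrame(
    {
        "Registration ID": ["1", "1", "2"],
        "HP": [45, 45, 90],
    }
)

# Plain mean: the duplicated registration is counted twice.
naive_avg = base_df["HP"].mean()

# One value per registration first, then the mean over registrations.
scoped_avg = base_df.groupby("Registration ID")["HP"].first().mean()

print(naive_avg, scoped_avg)
```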

Now I can compare the average speed of the Pokémon in winning vs losing battles.

In [None]:
session.visualize("Average Speed")

*__Replay Views related measures__*

In [None]:
replay_views = tt.agg.single_value(combat_t["Replay Views"])

In [None]:
m["Total Replay Views"] = tt.agg.sum(
    replay_views,
    scope=tt.OriginScope(l["Registration ID"]),
)

In [None]:
m["Avg Replay Views"] = tt.agg.mean(
    replay_views,
    scope=tt.OriginScope(l["Registration ID"]),
)

In [None]:
m["Max Replay Views"] = tt.agg.max(
    replay_views, scope=tt.OriginScope(l["Registration ID"])
)
m["Min Replay Views"] = tt.agg.min(
    replay_views, scope=tt.OriginScope(l["Registration ID"])
)

In [None]:
session.visualize("Replay Views")

In [None]:
session.visualize("Most Popular Pokémon (Replay Views)")

*__Live views related measures__*

_Change_

Use the [`formatter`](https://docs.atoti.io/latest/lib/atoti.measure.html#atoti.measure.Measure.formatter) attribute to format the measure.

In [None]:
live_views_change = tt.agg.single_value(combat_t["Live Views Change"])

In [None]:
m["Avg % Change"] = tt.agg.mean(
    live_views_change,
    scope=tt.OriginScope(l["Registration ID"]),
)
m["Avg % Change"].formatter = "DOUBLE[0.00%]"

In [None]:
m["Max % Change"] = tt.agg.max(
    live_views_change,
    scope=tt.OriginScope(l["Registration ID"]),
)
m["Max % Change"].formatter = "DOUBLE[0.00%]"

In [None]:
m["Min % Change"] = tt.agg.min(
    live_views_change,
    scope=tt.OriginScope(l["Registration ID"]),
)
m["Min % Change"].formatter = "DOUBLE[0.00%]"

_Start Views_

In [None]:
live_views_start = tt.agg.single_value(combat_t["Live Views Start"])

In [None]:
m["Avg Start Views"] = tt.agg.mean(
    live_views_start,
    scope=tt.OriginScope(l["Registration ID"]),
)

In [None]:
m["Max Start Views"] = tt.agg.max(
    live_views_start,
    scope=tt.OriginScope(l["Registration ID"]),
)

m["Min Start Views"] = tt.agg.min(
    live_views_start,
    scope=tt.OriginScope(l["Registration ID"]),
)

_End Views_

In [None]:
live_views_end = tt.agg.single_value(combat_t["Live Views End"])

In [None]:
m["Avg End Views"] = tt.agg.mean(
    live_views_end,
    scope=tt.OriginScope(l["Registration ID"]),
)

In [None]:
m["Max End Views"] = tt.agg.max(
    live_views_end,
    scope=tt.OriginScope(l["Registration ID"]),
)

m["Min End Views"] = tt.agg.min(
    live_views_end,
    scope=tt.OriginScope(l["Registration ID"]),
)

In [None]:
session.visualize("Live Views by W/L")

*__Sales related measures__*

_Stadium capacity_

In [None]:
m["Tickets Sold"] = tt.agg.sum(
    tt.agg.single_value(combat_info_t["Tickets Sold"]),
    scope=tt.OriginScope(l["Combat ID"]),
)
m["Tickets Sold"].formatter = "INT[000,000]"

In [None]:
m["Stadium Capacity"] = tt.agg.max(
    tt.agg.single_value(location_t["Stadium Capacity"]),
    scope=tt.OriginScope(l["Settlement"]),
)

In [None]:
m["Avg Stadium Capacity"] = tt.agg.mean(
    m["Tickets Sold"] / m["Stadium Capacity"],
    scope=tt.OriginScope(l["Combat ID"]),
)
m["Avg Stadium Capacity"].formatter = "DOUBLE[0.00%]"

In [None]:
m["Max Stadium Capacity"] = tt.agg.max(
    m["Tickets Sold"] / m["Stadium Capacity"],
    scope=tt.OriginScope(l["Combat ID"]),
)

m["Min Stadium Capacity"] = tt.agg.min(
    m["Tickets Sold"] / m["Stadium Capacity"],
    scope=tt.OriginScope(l["Combat ID"]),
)

m["Max Stadium Capacity"].formatter = "DOUBLE[0.00%]"
m["Min Stadium Capacity"].formatter = "DOUBLE[0.00%]"

In [None]:
session.visualize("Average Stadium Capacity by Location")

_Stadium cost_

In [None]:
stadium_cost = tt.agg.single_value(location_t["Stadium Cost"])

In [None]:
m["Avg Stadium Cost"] = tt.agg.mean(stadium_cost, scope=tt.OriginScope(l["Combat ID"]))

In [None]:
m["Max Stadium Cost"] = tt.agg.max(stadium_cost, scope=tt.OriginScope(l["Settlement"]))

m["Min Stadium Cost"] = tt.agg.min(stadium_cost, scope=tt.OriginScope(l["Settlement"]))

_Ticket price_

In [None]:
ticket_price = tt.agg.single_value(location_t["Ticket Price"])

In [None]:
m["Avg Ticket Price"] = tt.agg.mean(ticket_price, scope=tt.OriginScope(l["Combat ID"]))

In [None]:
m["Max Ticket Price"] = tt.agg.max(ticket_price, scope=tt.OriginScope(l["Settlement"]))
m["Min Ticket Price"] = tt.agg.min(ticket_price, scope=tt.OriginScope(l["Settlement"]))

_Number of matches where tickets are all sold_

In [None]:
m["No. of Matches at Full Stadium Capacity"] = tt.agg.sum(
    tt.where(
        m["Tickets Sold"] == m["Stadium Capacity"],
        1,
        0,
    ),
    scope=tt.OriginScope(l["Combat ID"]),
)

In [None]:
m["Percentage of Matches at Full Stadium Capacity"] = (
    m["No. of Matches at Full Stadium Capacity"] / m["No. of Matches"]
)
m["Percentage of Matches at Full Stadium Capacity"].formatter = "DOUBLE[0.00%]"

In [None]:
session.visualize("% of Matches at Full Stadium Capacity")

_Profit_

In [None]:
m["Profit"] = tt.agg.sum(
    (m["Tickets Sold"] * m["Avg Ticket Price"]) - m["Avg Stadium Cost"],
    scope=tt.OriginScope(l["Combat ID"]),
)
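
For a single combat, the profit formula boils down to ticket revenue minus the stadium cost. The sketch below applies it to Combat ID 1, using the tickets sold and the Location ID 1 (Iki Town) figures from the previews above:

```python
# Figures from the previews above: Combat ID 1 is held at Location ID 1.
tickets_sold = 1959
ticket_price = 15.0
stadium_cost = 17000.0

# Profit of one combat: ticket revenue minus the cost of the stadium.
combat_profit = tickets_sold * ticket_price - stadium_cost

print(combat_profit)  # → 12385.0
```

The `Profit` measure then sums this value over every `Combat ID` in scope.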

In [None]:
m["Profit Location Ratio"] = m["Profit"] / tt.parent_value(
    m["Profit"], degrees={h["Location"]: 1}
)
m["Profit Location Ratio"].formatter = "DOUBLE[0.00%]"
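
The ratio compares each member with its parent in the `Location` hierarchy, for example a settlement against its region. A sketch with hypothetical profit figures, assuming the regional profit is the sum of its settlements:

```python
# Hypothetical profit per (region, settlement).
profit = {
    ("Kanto", "Pallet Town"): 3000.0,
    ("Kanto", "Viridian City"): 1000.0,
    ("Johto", "Safari Zone Gate"): 2000.0,
}

# Parent value: total profit of the region each settlement belongs to.
region_total = {}
for (region, _settlement), value in profit.items():
    region_total[region] = region_total.get(region, 0.0) + value

# Share of each settlement in the profit of its region.
ratio = {key: value / region_total[key[0]] for key, value in profit.items()}

print(ratio)
```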

In [None]:
session.visualize("Profit Location Ratio")

*__Match count related measures__*

In [None]:
m["Match Location Ratio"] = m["No. of Matches"] / tt.parent_value(
    m["No. of Matches"], degrees={h["Location"]: 1}
)
m["Match Location Ratio"].formatter = "DOUBLE[0.00%]"

In [None]:
session.visualize("Match Location Ratio")

In [None]:
m["Match Season Ratio"] = m["No. of Matches"] / tt.parent_value(
    m["No. of Matches"], degrees={h["Season"]: 1}
)
m["Match Season Ratio"].formatter = "DOUBLE[0.00%]"

#### 4.2.2 Organize measures into folders

Group the measures into folders based on the nature of the KPI:

In [None]:
for measure in [
    m["No. of Matches"],
    m["No. of Pokemon"],
    m["No. of Regions"],
    m["No. of Locations"],
    m["No. of Seasons"],
    m["No. of Days"],
]:
    measure.folder = "Data"

In [None]:
for measure in [
    m["Win"],
    m["Lose"],
    m["Win Rate"],
    m["Trend Win Rate"],
    m["Max Win Rate (Pokemon)"],
    m["Max Win Rate (Season)"],
    m["Boost this Pokemon"],
]:
    measure.folder = "Results"

In [None]:
for measure in [
    m["Avg HP"],
    m["Avg Def"],
    m["Avg SP Def"],
    m["Avg Atk"],
    m["Avg SP Atk"],
    m["Avg Speed"],
    m["Avg Type Multiplier"],
]:
    measure.folder = "Pokemon Stats"

In [None]:
for measure in [
    m["Total Replay Views"],
    m["Avg Replay Views"],
    m["Max Replay Views"],
    m["Min Replay Views"],
]:
    measure.folder = "Replay Views"

In [None]:
for measure in [
    m["Avg % Change"],
    m["Max % Change"],
    m["Min % Change"],
    m["Avg Start Views"],
    m["Max Start Views"],
    m["Min Start Views"],
    m["Avg End Views"],
    m["Max End Views"],
    m["Min End Views"],
]:
    measure.folder = "Live Views"

In [None]:
for measure in [
    m["Stadium Capacity"],
    m["Avg Stadium Capacity"],
    m["Max Stadium Capacity"],
    m["Min Stadium Capacity"],
    m["Avg Stadium Cost"],
    m["Max Stadium Cost"],
    m["Min Stadium Cost"],
    m["Avg Ticket Price"],
    m["Max Ticket Price"],
    m["Min Ticket Price"],
    m["No. of Matches at Full Stadium Capacity"],
    m["Percentage of Matches at Full Stadium Capacity"],
    m["Profit"],
    m["Profit Location Ratio"],
    m["Tickets Sold"],
]:
    measure.folder = "Sales"

In [None]:
for measure in [m["Match Location Ratio"], m["Match Season Ratio"]]:
    measure.folder = "Match Count"

## 5. BI Web application

Access the BI web application using the URL below. Earlier, I fixed the port for the web application to 8080. 

In [None]:
session.link()

I can share the dashboard via its URL:

In [None]:
session.link(path="/#/dashboard/c31")

## 6. Atoti what-if simulation: Source simulation

Atoti has 2 [types of simulations](https://docs.atoti.io/latest/tutorial/tutorial.html#Simulations): 
- Source simulation
- Parameter simulation 

Using source simulation, I will modify the Pandas DataFrame containing the data on combat info to simulate a decrease in ticket sales.  
Using parameter simulation, parameter measures will be created to simulate the impact on profit when there's an increase in stadium cost and ticket price.

### 6.1 Simulate Ticket Sales

Below, I reduce the number of tickets sold for the Location ID 5 by 20%.

In [None]:
combat_info_s1 = combat_info_df.copy()

In [None]:
combat_info_s1["Tickets Sold"] = np.where(
    combat_info_s1["Location ID"] == "5",
    round(combat_info_s1["Tickets Sold"] * 0.8),
    combat_info_s1["Tickets Sold"],
)

In [None]:
combat_info_t.scenarios["Viridian Tickets Sold simulation"].load_pandas(combat_info_s1)

I can see that the data for Kanto > Viridian City has changed: the profit decreases when fewer tickets are sold.
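
To make the transformation itself easier to check, here is a minimal sketch of the same `np.where` pattern on a small, hypothetical DataFrame. The sample rows and the `sample`, `scenario` and `changed` names are illustrative assumptions and not part of the real `combat_info_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real combat_info_df has more columns.
sample = pd.DataFrame(
    {
        "Location ID": ["5", "5", "7"],
        "Tickets Sold": [1000, 250, 400],
    }
)

scenario = sample.copy()
is_target = scenario["Location ID"] == "5"  # IDs are compared as strings
scenario["Tickets Sold"] = np.where(
    is_target,
    (scenario["Tickets Sold"] * 0.8).round(),  # 20% fewer tickets
    scenario["Tickets Sold"],
)

# Only the rows of the targeted location should differ from the base data.
changed = scenario["Tickets Sold"] != sample["Tickets Sold"]
print(scenario.assign(Changed=changed))
```

Note that the comparison uses the string `"5"`. If `Location ID` were stored as integers, the comparison would match no rows and the scenario would silently equal the base data.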

In [None]:
session.visualize("Impact of tickets sold on profit")

### 6.2 Simulate Stadium Cost & Ticket Price   

I set up a simulation that allows me to adjust the stadium cost and the ticket price for a given Location ID.

In [None]:
cost_simulation = cube.create_parameter_simulation(
    "Cost Simulation",
    levels=[l["Location ID"], l["Region"]],
    measures={"Simulated Stadium Cost": None, "Simulated Ticket Price": None},
)

I create a scenario in the simulation where the cost increases for the `Location ID` 5:
- Stadium cost goes from 25,000 to 28,000
- Ticket price goes from \$20 to \$25.

In [None]:
cost_simulation += ("Cost increase", "5", None, 28000, 25)

I need to associate the newly created measures, `m["Simulated Stadium Cost"]` and `m["Simulated Ticket Price"]`, with my stadium cost and ticket price.  
To do so, I update the existing definitions of the measures below so that they use the simulated values whenever these are not `None`.
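
The fallback rule (use the simulated value if present, otherwise the original one) can be illustrated outside Atoti with plain pandas. This is only an analogy under assumed data: the series names and the value for location 7 are hypothetical, and the Atoti measures are evaluated by the cube rather than by pandas.

```python
import pandas as pd

# Hypothetical per-location values; NaN marks "no simulated value".
base_cost = pd.Series([25000.0, 30000.0], index=["5", "7"])
simulated_cost = pd.Series([28000.0, None], index=["5", "7"])

# Keep the simulated value where it exists, otherwise fall back to the base.
effective_cost = simulated_cost.where(simulated_cost.notna(), base_cost)
print(effective_cost)
```

Location 5 picks up the simulated cost while location 7, which has no simulated value, keeps its base cost.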

In [None]:
m["Avg Stadium Cost"] = tt.agg.mean(
    tt.where(
        ~m["Simulated Stadium Cost"].isnull(), m["Simulated Stadium Cost"], stadium_cost
    ),
    scope=tt.OriginScope(l["Combat ID"], l["Location ID"]),
)


m["Avg Ticket Price"] = tt.agg.mean(
    tt.where(
        ~m["Simulated Ticket Price"].isnull(), m["Simulated Ticket Price"], ticket_price
    ),
    scope=tt.OriginScope(l["Combat ID"], l["Location ID"]),
)

Since profit depends on `m["Avg Stadium Cost"]` and `m["Avg Ticket Price"]`, the simulated values cause the profit to increase accordingly.  
The visualization below stacks the source simulation against the parameter simulation.

In [None]:
session.visualize("Viridian What-if simulation")

## 7. Endpoints for custom UI plugins 

The sections below show how I created endpoints that let the web application trigger custom behaviours in the back-end.  

### 7.1 Helper functions  

I have created some Python functions to help determine which Pokémon wins the match. These functions are available in [helper.py](utils/helper.py).

In [None]:
from utils import helper

helper_util = helper.Helper(session, cube_name)

### 7.2 Endpoints  

I can create custom [endpoints](https://docs.atoti.io/latest/lib/atoti/atoti.session.html#atoti.Session.endpoint) with Atoti.  
This saves me the trouble of adding a [FastAPI](https://fastapi.tiangolo.com/) or [Flask](https://flask.palletsprojects.com/) server to the project.  

__Note: Custom endpoints are available in <img src="https://img.shields.io/badge/-Atoti%20CE-AF4D61" />. However, in my use case, I trigger the endpoints from custom widgets in the web application, so I need <img src="https://img.shields.io/badge/🔒-Atoti-291A40" />.__
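
The handlers below decode their path parameters with `urllib.parse.unquote`, so a caller should percent-encode each name before placing it in the URL. The following is a minimal client-side sketch. The `BASE_URL` value and the `win_rate_url` helper are assumptions made for illustration: the sketch assumes a local session on port 8080 that exposes custom endpoints under `/atoti/pyapi/`, which may differ between Atoti versions.

```python
from urllib.parse import quote, unquote

# Assumption: a local session on port 8080 exposing custom endpoints
# under /atoti/pyapi/ (check the Atoti documentation for your version).
BASE_URL = "http://localhost:8080/atoti/pyapi"


def win_rate_url(pokemon: str, opponent: str) -> str:
    """Build the win-rate URL with each name percent-encoded (hypothetical helper)."""
    # safe="" also encodes "/" so a name can never add a path segment.
    return (
        f"{BASE_URL}/measures/win-rate/"
        f"{quote(pokemon, safe='')}/{quote(opponent, safe='')}"
    )


url = win_rate_url("Mr. Mime", "Farfetch'd")
print(url)

# The server-side unquote call restores the original name.
print(unquote("Mr.%20Mime"))
```

Encoding with `safe=""` is a deliberate choice here: names containing spaces, apostrophes or slashes then survive the round trip through the path.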

In [None]:
@session.endpoint("levels/{level_name}/members", method="GET")
def get_level_members(request, user, session):

    level_name = urllib.parse.unquote(request.path_parameters["level_name"])
    dimension_name = session.cubes[cube_name].levels[level_name].dimension
    hierarchy_name = session.cubes[cube_name].levels[level_name].hierarchy
    mdx = f"SELECT NON EMPTY [{dimension_name}].[{hierarchy_name}].[{level_name}].Members ON 0 FROM [{cube_name}]"
    df = session.query_mdx(mdx, keep_totals=False, timeout=dt.timedelta(seconds=30))
    members = df.index.to_list()
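    # Guard against an empty result before inspecting the first member.
    if members and isinstance(members[0], date):
        # Dates are not JSON-friendly, so return them as ISO 8601 strings.
        members = [member.isoformat() for member in members]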
    return members

In [None]:
@session.endpoint("measures/win-rate/{pokemon}/{opponent}", method="GET")
def get_win_rate(request, user, session):
    pokemon = urllib.parse.unquote(request.path_parameters["pokemon"])
    opponent = urllib.parse.unquote(request.path_parameters["opponent"])
    
    mdx = f""" SELECT [Measures].[Win Rate] ON COLUMNS FROM [{cube_name}]
                WHERE (
                    [Pokemon].[Pokemon].[AllMember].[{pokemon}],
                    [Opponent Pokemon].[Opponent Pokemon].[AllMember].[{opponent}] )"""
    df = session.query_mdx(mdx, keep_totals=False, timeout=dt.timedelta(seconds=30))

    # Short-circuiting `and` ensures an empty result is never indexed.
    if df.size > 0 and not pd.isna(df["Win Rate"].iloc[0]):
        return df["Win Rate"].iloc[0]
    return None

In [None]:
@session.endpoint("battle/{pokemon}/{opponent}/{winner}", method="GET")
def battle(request, user, session):
    pokemon = urllib.parse.unquote(request.path_parameters["pokemon"])
    opponent = urllib.parse.unquote(request.path_parameters["opponent"])
    winner = urllib.parse.unquote(request.path_parameters["winner"])
    helper_util.generate_new_battle(pokemon, opponent, pokemon == winner)
    return "OK"

### <img src="https://img.shields.io/badge/🔒-Atoti-291A40" /> Invoke endpoint from Custom UI

Refer to the [JavaScript extension](extensions/src) for the web application to see how I invoke the endpoints from custom widgets.  
Follow the [README.md](extensions/README.md) to build the JavaScript project. To use the custom web application, point `app_extensions` to the built `dist` folder when instantiating the session:  

```
session = tt.Session(
    app_extensions={"@atoti/pokemon-extension": "./extensions/dist"},
)
```

To see the customized widgets in action, click on the link below.   
Remember to run the notebook with `atoti_unlock` set to `True`. Have fun battling your favorite Pokémon!