# Introduction

This notebook demonstrates the PyDough APIs for explaining and exploring PyDough metadata and PyDough logical operations.

## Setup

This notebook uses the same setup configuration as the `Introduction` notebook, which must be executed so that PyDough is initialized with the desired data.

In [None]:
import pydough

pydough.active_session.load_metadata_graph("../metadata/tpch_demo_graph.json", "TPCH")
pydough.active_session.connect_database("sqlite", database="../tpch.db")
%load_ext pydough_jupyter_extensions

## Using `pydough.explain` on metadata

The API `pydough.explain` can be used to dump information about various PyDough metadata objects or logical operations. The simplest version is to call it on the entire graph, which displays basic information about the graph such as its name and the collections inside it. The `pydough.explain` API takes an optional `verbose` argument. If it is `True`, the output includes more information about the object being explained; if it is `False`, the output is a more compact summary.

`pydough.explain` can be called from either inside a normal Python cell or inside a pydough cell.

Below is an example of using this API on the TPCH graph that has been loaded into the active session.

In [None]:
graph = pydough.active_session.metadata

print(pydough.explain(graph, verbose=True))

Just like the hint suggests, below is an example of calling `pydough.explain` to learn more about the `regions` collection of TPCH:

In [None]:
print(pydough.explain(graph["regions"], verbose=True))

The output tells us several things about this collection, including:

* The real data table that `regions` maps to in the database is called `main.REGION`.
* Every record in the `regions` collection has a `comment`, a `key`, and a `name`, which are scalar properties of the region.
* Each unique `key` value corresponds to a single unique record in `regions`.
* There is a subcollection of `regions` called `nations`.

Below is another example, this time on the collection `supply_records`:

In [None]:
print(pydough.explain(graph["supply_records"], verbose=True))

This time, the information displayed tells us the following:

* The real data table that `supply_records` maps to in the database is called `main.PARTSUPP`.
* Every record in the `supply_records` collection has an `availqty`, a `comment`, a `part_key`, a `supplier_key` and a `supplier_cost`.
* Each unique combination of the values of `part_key` and `supplier_key` corresponds to a single unique record in `supply_records`.
* There are subcollections of `supply_records` called `lines`, `part`, and `supplier`.
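
The uniqueness statement above can be illustrated outside PyDough. The following is a minimal plain-Python sketch, not PyDough's implementation: the rows are made-up toy data, and `is_unique_key` is a hypothetical helper that checks whether a combination of fields identifies each record.

```python
# Toy rows standing in for `supply_records`; the values are invented for illustration.
toy_supply_records = [
    {"part_key": 1, "supplier_key": 10, "availqty": 500},
    {"part_key": 1, "supplier_key": 20, "availqty": 120},
    {"part_key": 2, "supplier_key": 10, "availqty": 75},
]


def is_unique_key(rows, fields):
    """Return True if no two rows share the same values for `fields`."""
    seen = set()
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            return False
        seen.add(key)
    return True


# Neither field alone identifies a record, but the pair does.
print(is_unique_key(toy_supply_records, ["part_key"]))
print(is_unique_key(toy_supply_records, ["supplier_key"]))
print(is_unique_key(toy_supply_records, ["part_key", "supplier_key"]))
```

In this toy data the same part appears with several suppliers and the same supplier with several parts, which is why only the combination works as an identifier.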

Just like the hint suggests, below is an example of calling `pydough.explain` to learn more about one of the scalar properties, in this case: the `name` property of `regions`.

In [None]:
print(pydough.explain(graph["regions"]["name"], verbose=True))

The information displayed tells us that the property `regions.name` corresponds to the `r_name` column of the table `main.REGION`, which has a type of `string`. 

Below is an example of using `pydough.explain` to learn about a subcollection property, in this case: the `nations` property of `regions`.

In [None]:
print(pydough.explain(graph["regions"]["nations"], verbose=True))

The information displayed explains how `regions.nations` connects the `regions` collection to the `nations` collection, including:

* Each record of `regions` can connect to multiple records in `nations`.
* Each record of `nations` can connect to at most one record of `regions`.
* A record of `regions` is connected to a record of `nations` if the `key` property of `regions` equals the `region_key` property of `nations`.
* The `nations` collection has a corresponding twin property to `regions.nations` called `nations.region`, displayed below:

In [None]:
print(pydough.explain(graph["nations"]["region"], verbose=True))

The information displayed is essentially the same as `regions.nations`, but from the perspective of each record of `nations` instead of each record of `regions`.
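
To make the two perspectives concrete, here is a small plain-Python sketch of the same relationship. It is only an illustration under stated assumptions: the rows are toy data, the `key` and `region_key` names follow the property names that appear in this notebook, and none of this reflects how PyDough evaluates the join internally.

```python
from collections import defaultdict

# Toy data, invented for illustration; the real tables hold more rows.
toy_regions = [{"key": 2, "name": "ASIA"}, {"key": 3, "name": "EUROPE"}]
toy_nations = [
    {"key": 8, "name": "INDIA", "region_key": 2},
    {"key": 12, "name": "JAPAN", "region_key": 2},
    {"key": 6, "name": "FRANCE", "region_key": 3},
]

# Perspective of regions.nations: each region maps to a list of nations (plural).
nations_by_region = defaultdict(list)
for nation in toy_nations:
    nations_by_region[nation["region_key"]].append(nation["name"])

# Perspective of nations.region: each nation maps to at most one region (singular).
region_name_by_key = {region["key"]: region["name"] for region in toy_regions}
region_of_nation = {
    nation["name"]: region_name_by_key.get(nation["region_key"])
    for nation in toy_nations
}

print(dict(nations_by_region))
print(region_of_nation)
```

Both dictionaries are built from the same equality between a region's `key` and a nation's `region_key`; only the direction of the lookup differs.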

## Using `pydough.explain_structure`

The API `pydough.explain_structure` can be called on an entire metadata graph, just like `pydough.explain`. Instead of providing a summary of a single graph, collection, or property, `pydough.explain_structure` gives a holistic summary of how everything in the graph connects. It displays all of the collections in the graph and the names of their properties; for the properties that are subcollections, it also gives a brief summary of which collection they map to and the cardinality of that connection.

Below is an example of using this API on the TPCH graph to gain a summary of the entire graph.

In [None]:
print(pydough.explain_structure(graph))

Focusing on the `regions` section of this information shows us some of the information observed earlier with `pydough.explain`:

* The scalar properties of `regions` are displayed (`comment`, `key` and `name`)
* The `nations` property is shown, and indicates that it connects each record of `regions` to potentially multiple records of `nations`, and that the reverse is `nations.region`.
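
The kind of information summarized above can be pictured as a nested mapping. The sketch below is hypothetical: `toy_structure` and `describe` are invented for illustration, cover only two collections and one subcollection each, and do not reproduce PyDough's internal representation or the exact text that `pydough.explain_structure` prints.

```python
# Hypothetical, hand-written description of two collections; not PyDough's format.
toy_structure = {
    "regions": {
        "scalars": ["comment", "key", "name"],
        "subcollections": {
            "nations": {"target": "nations", "plural": True, "reverse": "region"},
        },
    },
    "nations": {
        "scalars": ["comment", "key", "name", "region_key"],
        "subcollections": {
            "region": {"target": "regions", "plural": False, "reverse": "nations"},
        },
    },
}


def describe(structure):
    """Build one summary line per subcollection property."""
    lines = []
    for collection, info in structure.items():
        for prop, link in info["subcollections"].items():
            cardinality = "multiple" if link["plural"] else "at most one"
            lines.append(
                f"{collection}.{prop}: {cardinality} of {link['target']} "
                f"(reverse: {link['target']}.{link['reverse']})"
            )
    return lines


for line in describe(toy_structure):
    print(line)
```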

## Using `pydough.explain` and `pydough.explain_term` on PyDough code

The API `pydough.explain` can also be called on PyDough code to display information about what it logically does. There is a key constraint: `pydough.explain` can only be called on PyDough code that resolves into a collection (e.g. `regions` or `nations.suppliers.WHERE(account_balance > 0)`).

To explain other PyDough code, such as an expression, `pydough.explain_term` must be used (more on that later).

Below is an example of using `pydough.explain` on simple PyDough code to learn about the `nations` collection:

In [None]:
%%pydough

print(pydough.explain(nations, verbose=True))

There are several pieces of information displayed here, including:

* The PyDough code `nations` just accesses the data from the `nations` collection, which we can learn more about as shown earlier by calling `pydough.explain` on the metadata for `nations`.
* If `pydough.to_sql` or `pydough.to_df` is called on `nations`, all four of its scalar properties will be included in the result.
* The properties `comment`, `key`, `name`, and `region_key` can be accessed by the collection as scalar expressions.
* The properties `customers`, `region` and `suppliers` can be accessed by the collection as subcollections (unknown if they are singular or plural, without gaining more information).

Below is an example of how to use `pydough.explain_term` to learn more about the `name` expression of `nations`:

In [None]:
%%pydough

print(pydough.explain_term(nations, name, verbose=True))

And below is an example of how to use `pydough.explain_term` to learn more about the `suppliers` collection of `nations`:

In [None]:
%%pydough

print(pydough.explain_term(nations, suppliers, verbose=True))

`pydough.explain` and `pydough.explain_term` are not limited to simple collections and columns. Below is a slightly more complex example that will be dissected in several steps by calling `pydough.explain` and `pydough.explain_term` on various snippets of it. The code in question calculates the top 3 Asian countries by the number of orders made by customers in those nations in the year 1995.

In [None]:
%%pydough

asian_countries = nations.WHERE(region.name == "ASIA")

orders_1995 = customers.orders.WHERE(YEAR(order_date) == 1995)

asian_countries_info = asian_countries(country_name=LOWER(name), total_orders=COUNT(orders_1995))

top_asian_countries = asian_countries_info.TOP_K(3, by=total_orders.DESC())

pydough.to_df(top_asian_countries)
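
For readers who want to check their understanding of the logic before looking at the explanations, the same four steps can be sketched in plain Python. This is an illustration only: the data is invented, customers are folded into the order rows for brevity, and the code says nothing about how PyDough executes the query.

```python
from collections import Counter
from datetime import date

# Toy data invented for illustration: (nation name, region name) pairs and
# (nation name, order date) pairs standing in for nations and customers.orders.
toy_nations = [
    ("CHINA", "ASIA"), ("INDIA", "ASIA"), ("JAPAN", "ASIA"),
    ("VIETNAM", "ASIA"), ("FRANCE", "EUROPE"),
]
toy_orders = [
    ("CHINA", date(1995, 3, 1)), ("CHINA", date(1995, 7, 9)),
    ("CHINA", date(1996, 1, 5)), ("INDIA", date(1995, 2, 2)),
    ("JAPAN", date(1995, 5, 5)), ("JAPAN", date(1995, 6, 6)),
    ("JAPAN", date(1995, 8, 8)), ("FRANCE", date(1995, 4, 4)),
]

# Step 1: keep only nations whose region is ASIA.
asian = [name for name, region in toy_nations if region == "ASIA"]

# Step 2: count each nation's orders placed in 1995.
counts = Counter(name for name, day in toy_orders if day.year == 1995)

# Step 3: one row per Asian nation, with a lowercased name and its order count.
info = [{"country_name": n.lower(), "total_orders": counts[n]} for n in asian]

# Step 4: keep the 3 rows with the most orders.
top = sorted(info, key=lambda row: row["total_orders"], reverse=True)[:3]
print(top)
```

Each step lines up with one of the named PyDough snippets: `asian_countries`, `orders_1995`, `asian_countries_info`, and `top_asian_countries`.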

First, here are the results of calling `pydough.explain` on the final result:

In [None]:
%%pydough

print(pydough.explain(top_asian_countries, verbose=True))

There are several pieces of information displayed, including the following:

* The structure of the entire logic is shown, but the information being displayed is specifically focused on the last operation (the `TopK` at the bottom of the structure).
* The operation is ordering by `total_orders` in descending order, then keeping the top 3 entries.
* There are 6 total expressions that are accessible from `top_asian_countries`, but only 2 of them are included in the answer when executed: `country_name` and `total_orders`.

More can be learned about these expressions included in the answer with `pydough.explain_term`:

In [None]:
%%pydough

print(pydough.explain_term(top_asian_countries, country_name, verbose=True))

Here, we get the structure of everything done up until this point, and information specifically about `country_name`. In this case, we learn it is the result of calling `LOWER` on `name`. Calling `explain_term(top_asian_countries, name)` would display more or less the same information as `explain_term(nations, name)`. Instead, here is the result of using `explain_term` to learn about `total_orders`:

In [None]:
%%pydough

print(pydough.explain_term(top_asian_countries, total_orders, verbose=True))

Here, we learn that `total_orders` counts how many records of `customers.orders` exist for each record of `nations`. Now, `explain_term` is used to learn more about the argument to `COUNT`:

In [None]:
%%pydough

print(pydough.explain_term(asian_countries, orders_1995, verbose=True))

Here, we learn that `customers.orders` invokes a child of the current context (`nations.WHERE(region.name == 'ASIA')`) by accessing the `customers` subcollection, then accessing its `orders` collection, then filtering it on the condition `YEAR(order_date) == 1995`.

We also know that this resulting child is plural with regard to the context, meaning that `asian_countries(asian_countries.order_date)` would be illegal, but `asian_countries(MAX(asian_countries.order_date))` is legal.
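
The reason a plural child needs an aggregation can be seen with a plain-Python analogy. The sketch below uses invented data and is not PyDough code: it only shows that a nation with several order dates has no single date to report until a function such as `max` reduces the list to one value.

```python
from datetime import date

# Toy data invented for illustration: each nation has several order dates.
toy_order_dates = {
    "JAPAN": [date(1995, 5, 5), date(1995, 8, 8)],
    "INDIA": [date(1995, 2, 2)],
}

# A plural child gives a list per nation, so there is no single value to put
# in a one-row-per-nation result.
for nation, dates in toy_order_dates.items():
    print(nation, len(dates), "order dates")

# An aggregation such as MAX collapses each list to one value per nation.
latest_order = {nation: max(dates) for nation, dates in toy_order_dates.items()}
print(latest_order)
```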

Further combinations of `pydough.explain` and `pydough.explain_term` can be used to learn more about what each of these components does.