[2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? #878

pylipp · 2023-07-31T10:04:07Z

Requirements:

It must be possible to customize the start and end date of the period displaying flows.
It must be possible to filter the flow of inventory by one or more product types.
It must be possible to filter the flow of inventory by category type.
"Place" can be defined as a warehouse location or transfer source/destination.

Stretch goal:
Do a similar exploration but with box_state.

The text was updated successfully, but these errors were encountered:

pylipp · 2023-07-31T10:10:37Z

Main goal: show the flow of inventory from one place to another over time

Status quo

Changes to boxes (product, location, size, nr of items, state) are being tracked in the history table
Given a source location S and a time range T, the following query returns a list of all boxes that have been moved to target locations (optionally filtered by products P)

SELECT changedate,
record_id AS box_id,
from_int AS source_location_id,
to_int AS target_location_id /* ,b.product_id, b.items */
FROM history 
/* JOIN stock AS b ON b.id = box_id */
WHERE tablename = "stock" 
AND changedate > T 
AND changes = "location_id" 
AND from_int = S
/* AND b.product_id in P */ ;

With this, requirements 1, 2, and 4 are met. Requirement 3 can be fulfilled by either joining with the products table, and filtering by category; or by passing all products of a category as P
The resulting data rows will be returned as JSON; either as “single block” or using GraphQL resolvers to adjust to the fields required by the FE
The performance of the request and the size of the response depends on the query parameters, mostly on the time range T

Caveat

@aerinsol
Joining with the stock table to find product and box items brings in the current properties of the box which might not necessarily match the properties in the requested time period. E.g. say a box with 500 items was moved from A to B in February but in April 200 items were removed. When I want to query stats for the first three months of the year, it will incorrectly tell me that 300 items were moved from A to B.

I'm exploring two options:

Shadow tables (see post below)
use legacy History table with an additional JSONField column to store the (unchanged) properties of the row at the time of change. I expect this to make the eventual querying more complex but at the same time it will be precise.

Stretch goal

The SQL query above can be adjusted to filter for box state changes in the history table: AND changes = "box_state_id"

pylipp · 2023-07-31T10:56:17Z

Conclusion: shadow tables are useful to determine the state of a table at a given point in time. On the contrary, finding out about actual change is cumbersome. Hence shadow tables are not helpful for the scenario above.

Outlook: shadow tables

It was proposed earlier to introduce shadow tables to the boxtribute database schema to enable more accurate tracking of historical data (mostly stock; also product, beneficiaries, products, locations, tags, qr).
It shall be explored how the requested data can be fetched in such a scenario.

For recap: the stock shadow table contains the same columns, and an effective_from column

Given a source location S and a time range T1-T2 (old-new), the following query returns a list of all boxes that have been moved to target locations (optionally filtered by products P)
- obtain stock at beginning of time range T1 and end of time range T2
- create delta of these states (that results in the "change volume" but we have no clear information about what went where 🤔 )

SELECT * from shadow_stock
WHERE effective_from = ( SELECT MAX(effective_from) FROM shadow_stock WHERE effective_from < T1 );
SELECT * from shadow_stock
WHERE effective_from = ( SELECT MAX(effective_from) FROM shadow_stock WHERE effective_from < T2 );

Get state of stock table at T1, and then iterate through all state updates until T2, aggregating relevant changes by comparing consecutive states (rows of the shadow table) for each box; however this practice (comparing rows) is deemed impractical.

pylipp · 2023-08-09T12:47:42Z

Another user story

Can we use the current data structure of the history table for the following case:
Show me where boxes were moved (i.e. state and location changes), incl. their product and number of items, by a certain date D
of boxes that were created in my warehouse in a certain timespan T

Data format of result

Dimensions:

product ID
"target"
- location ID + InStock/Donated (box was moved within my base)
- state Lost (box was marked as lost in my base)
- state Scrap (box was marked as scrap in my base)
- MarkedForShipment + target base (box is about to be shipped to target)
- InTransit + target base (box is being shipped to target)
- location ID + InStock/Donated + target base (box was shipped to target; and then moved within that base)
- Other (box is still in my base but product or number of items changed)

Facts

number of boxes
number of items

Implementation

We expect that a basic SQL query could look like

SELECT
  GROUP_CONCAT(h.changedate) AS changedates,
  GROUP_CONCAT(h.changes) AS allchanges,
  s.id 
FROM history h
INNER JOIN stock s ON (h.tablename = "stock" AND h.record_id = s.id)
WHERE h.change_date > T1 AND h.changes NOT LIKE "comments%"
) GROUP BY s.id;

This results in a mapping of box IDs and history rows (i.e. changes). For each box ID we find out the change that has happened closest before D and is that is a location, product, or state change. Any potential changes that happened after D must be taken into account (possible change of items or change of product) in order to determine the properties of the box at D.

Conclusion

The implementation above is complex, however it can be done in Python to avoid having to write complicated SQL.
Using shadow tables would help to know about the exact state at D, however finding out about the underlying changes requires reconstruction, too.
Adding a new properties JSON column to the history table produces a faster way to obtain box properties information at the time of a change; however the data for the column is missing for old data (or would have to be reconstructed in a tedious way).

HaGuesto · 2023-08-23T09:39:58Z

please checkout #950 for final decision

pylipp self-assigned this Jul 31, 2023

HaGuesto added this to the First visualization with history data is available on staging milestone Aug 9, 2023

pylipp mentioned this issue Aug 17, 2023

[3rd viz] create user story / spec for 3rd viz #903

Closed

HaGuesto closed this as completed Aug 23, 2023

pylipp mentioned this issue Sep 10, 2023

[4th viz] BE public graphQL query #910

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? #878

[2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? #878

pylipp commented Jul 31, 2023

pylipp commented Jul 31, 2023 •

edited

Loading

pylipp commented Jul 31, 2023 •

edited

Loading

pylipp commented Aug 9, 2023 •

edited

Loading

HaGuesto commented Aug 23, 2023

[2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? #878

[2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? #878

Comments

pylipp commented Jul 31, 2023

pylipp commented Jul 31, 2023 • edited Loading

Status quo

Caveat

Stretch goal

pylipp commented Jul 31, 2023 • edited Loading

Outlook: shadow tables

pylipp commented Aug 9, 2023 • edited Loading

Another user story

Data format of result

Implementation

Conclusion

HaGuesto commented Aug 23, 2023

pylipp commented Jul 31, 2023 •

edited

Loading

pylipp commented Jul 31, 2023 •

edited

Loading

pylipp commented Aug 9, 2023 •

edited

Loading