Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? #878

Closed
pylipp opened this issue Jul 31, 2023 · 4 comments
Assignees

Comments

@pylipp
Copy link
Contributor

pylipp commented Jul 31, 2023

Requirements:

  1. It must be possible to customize the start and end date of the period displaying flows.
  2. It must be possible to filter the flow of inventory by one or more product types.
  3. It must be possible to filter the flow of inventory by category type.
  4. "Place" can be defined as a warehouse location or transfer source/destination.

Stretch goal:
Do a similar exploration but with box_state.

@pylipp pylipp self-assigned this Jul 31, 2023
@pylipp
Copy link
Contributor Author

pylipp commented Jul 31, 2023

Main goal: show the flow of inventory from one place to another over time

Status quo

  • Changes to boxes (product, location, size, nr of items, state) are being tracked in the history table
  • Given a source location S and a time range T, the following query returns a list of all boxes that have been moved to target locations (optionally filtered by products P)
SELECT changedate,
record_id AS box_id,
from_int AS source_location_id,
to_int AS target_location_id /* ,b.product_id, b.items */
FROM history 
/* JOIN stock AS b ON b.id = box_id */
WHERE tablename = "stock" 
AND changedate > T 
AND changes = "location_id" 
AND from_int = S
/* AND b.product_id in P */ ;
  • With this, requirements 1, 2, and 4 are met. Requirement 3 can be fulfilled by either joining with the products table, and filtering by category; or by passing all products of a category as P
  • The resulting data rows will be returned as JSON; either as “single block” or using GraphQL resolvers to adjust to the fields required by the FE
  • The performance of the request and the size of the response depends on the query parameters, mostly on the time range T

Caveat

@aerinsol
Joining with the stock table to find product and box items brings in the current properties of the box which might not necessarily match the properties in the requested time period. E.g. say a box with 500 items was moved from A to B in February but in April 200 items were removed. When I want to query stats for the first three months of the year, it will incorrectly tell me that 300 items were moved from A to B.

I'm exploring two options:

  1. Shadow tables (see post below)
  2. use legacy History table with an additional JSONField column to store the (unchanged) properties of the row at the time of change. I expect this to make the eventual querying more complex but at the same time it will be precise.

Stretch goal

The SQL query above can be adjusted to filter for box state changes in the history table: AND changes = "box_state_id"

@pylipp
Copy link
Contributor Author

pylipp commented Jul 31, 2023

Conclusion: shadow tables are useful to determine the state of a table at a given point in time. On the contrary, finding out about actual change is cumbersome. Hence shadow tables are not helpful for the scenario above.

Outlook: shadow tables

It was proposed earlier to introduce shadow tables to the boxtribute database schema to enable more accurate tracking of historical data (mostly stock; also product, beneficiaries, products, locations, tags, qr).
It shall be explored how the requested data can be fetched in such a scenario.

  • For recap: the stock shadow table contains the same columns, and an effective_from column
  1. Given a source location S and a time range T1-T2 (old-new), the following query returns a list of all boxes that have been moved to target locations (optionally filtered by products P)
    • obtain stock at beginning of time range T1 and end of time range T2
    • create delta of these states (that results in the "change volume" but we have no clear information about what went where 🤔 )
SELECT * from shadow_stock
WHERE effective_from = ( SELECT MAX(effective_from) FROM shadow_stock WHERE effective_from < T1 );
SELECT * from shadow_stock
WHERE effective_from = ( SELECT MAX(effective_from) FROM shadow_stock WHERE effective_from < T2 );
  1. Get state of stock table at T1, and then iterate through all state updates until T2, aggregating relevant changes by comparing consecutive states (rows of the shadow table) for each box; however this practice (comparing rows) is deemed impractical.

@pylipp pylipp changed the title Philipp's data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? Aug 9, 2023
@pylipp
Copy link
Contributor Author

pylipp commented Aug 9, 2023

Another user story

Can we use the current data structure of the history table for the following case:
Show me where boxes were moved (i.e. state and location changes), incl. their product and number of items, by a certain date D
of boxes that were created in my warehouse in a certain timespan T

Data format of result

Dimensions:

  • product ID
  • "target"
    • location ID + InStock/Donated (box was moved within my base)
    • state Lost (box was marked as lost in my base)
    • state Scrap (box was marked as scrap in my base)
    • MarkedForShipment + target base (box is about to be shipped to target)
    • InTransit + target base (box is being shipped to target)
    • location ID + InStock/Donated + target base (box was shipped to target; and then moved within that base)
    • Other (box is still in my base but product or number of items changed)

Facts

  • number of boxes
  • number of items

Implementation

We expect that a basic SQL query could look like

SELECT
  GROUP_CONCAT(h.changedate) AS changedates,
  GROUP_CONCAT(h.changes) AS allchanges,
  s.id 
FROM history h
INNER JOIN stock s ON (h.tablename = "stock" AND h.record_id = s.id)
WHERE h.change_date > T1 AND h.changes NOT LIKE "comments%"
) GROUP BY s.id;

This results in a mapping of box IDs and history rows (i.e. changes). For each box ID we find out the change that has happened closest before D and is that is a location, product, or state change. Any potential changes that happened after D must be taken into account (possible change of items or change of product) in order to determine the properties of the box at D.

Conclusion

The implementation above is complex, however it can be done in Python to avoid having to write complicated SQL.
Using shadow tables would help to know about the exact state at D, however finding out about the underlying changes requires reconstruction, too.
Adding a new properties JSON column to the history table produces a faster way to obtain box properties information at the time of a change; however the data for the column is missing for old data (or would have to be reconstructed in a tedious way).

@HaGuesto HaGuesto changed the title Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? [2nd viz]: Data structure for visualization exploration: Is it possible to show the flow of inventory from one place to another over time? Aug 10, 2023
@HaGuesto
Copy link
Member

please checkout #950 for final decision

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Merged to staging
Development

No branches or pull requests

2 participants