docs: add ordering note to shift.Rd, character example to nafill.Rd#7774

Closed
LeonidasZhak wants to merge 2 commits into Rdatatable:master from LeonidasZhak:docs/shift-nafill-stata-migration

@LeonidasZhak

Changes

shift.Rd — ordering note for time-series/panel users

Added a paragraph in the Details section explaining that shift operates on row position, not time order. This is a critical gotcha for users migrating from Stata, where L.var (after xtset) automatically respects time order within panels.

Added an explicit example showing the WRONG (unsorted) vs RIGHT (sorted) approach:

DT = data.table(year=c(2012, 2010, 2011), v1=c(30, 10, 20))
DT[, lag_wrong := shift(v1, 1L)]           # wrong: lag by row position
DT[order(year), lag_right := shift(v1, 1L)] # right: sort first
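To make the difference concrete, here is a minimal sketch that runs the same example and records what each column ends up holding. The second half shows an alternative that is not part of this PR: sorting a copy in place with `setorder()` before calling `shift`. The name `DT_sorted` is illustrative only.

```r
library(data.table)

DT <- data.table(year = c(2012, 2010, 2011), v1 = c(30, 10, 20))

# shift() lags by row position, so on unsorted rows the "previous" value
# is simply whatever sits one row above
DT[, lag_wrong := shift(v1, 1L)]

# Ordering in i makes shift() see the rows in year order; the result is
# written back to the original rows and DT itself is not re-sorted
DT[order(year), lag_right := shift(v1, 1L)]

DT$lag_wrong  # NA 30 10  (2012 gets NA, 2010 gets the 2012 value)
DT$lag_right  # 20 NA 10  (2012 gets the 2011 value, 2010 gets NA)

# Alternative (assumption, not from the PR): sort a copy by reference first
DT_sorted <- copy(DT)
setorder(DT_sorted, year)
DT_sorted[, lag_sorted := shift(v1, 1L)]
DT_sorted$lag_sorted  # NA 10 20, with rows now in year order
```

The `order(year)` form leaves the physical row order untouched, which may matter if other code relies on it; `setorder()` changes the row order of the table it is applied to.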

nafill.Rd — character example

Added a character vector example demonstrating locf and const fill types. Character was listed as a supported type in the Details section but had no example (only numeric and factor examples existed).
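For readers unfamiliar with the two fill types, the sketch below shows what `locf` and `const` do. It deliberately uses a numeric vector, because whether `nafill` accepts character input depends on the installed data.table version, and that is not verified here. The vector `x` is made up for illustration.

```r
library(data.table)

x <- c(NA, 1, NA, NA, 4)

# locf: carry the last observed value forward; the leading NA stays NA
# because there is no earlier value to carry
filled_locf <- nafill(x, type = "locf")
filled_locf   # NA 1 1 1 4

# const: replace every NA with the value passed in `fill`
filled_const <- nafill(x, type = "const", fill = 0)
filled_const  # 0 1 0 0 4
```

If the installed version does support character vectors as the PR describes, the same two calls would presumably apply with a character `fill` value, but that case is not exercised in this sketch.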

Validation

  • tools::checkRd() passes on both modified Rd files
  • Examples run correctly and produce expected output
  • No other open PRs or issues on this topic
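The Rd check listed above could be reproduced along these lines. This is a sketch only: `check_rd_files` is a hypothetical helper, and the `man/` paths assume the script is run from the package root.

```r
# Hypothetical helper: run tools::checkRd() on every listed Rd file that
# exists and return the results as a list named by file path
check_rd_files <- function(paths) {
  found <- paths[file.exists(paths)]
  stats::setNames(lapply(found, tools::checkRd), found)
}

# Assumed file locations relative to the package root
results <- check_rd_files(c("man/shift.Rd", "man/nafill.Rd"))
print(results)
```

Files that are missing are skipped silently, so an empty result means nothing was checked, not that the check passed.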

Stata connection

The shift ordering note directly addresses the most common Stata-to-R migration pitfall with lag/lead operations: Stata's xtset + L.var handles panel ordering automatically, while data.table's shift requires an explicit ordering such as DT[order(timevar), ...].

- shift.Rd: Add note explaining that shift operates on row position,
  not time order. Critical gotcha for Stata migrants who expect L.var
  behavior (automatic time-ordering after xtset). Add explicit example
  showing the WRONG vs RIGHT approach with unsorted data.

- nafill.Rd: Add character vector example (locf + const fill).
  Character was listed as a supported type but had no example.

Add a panel data example showing the correct pattern for lagging within
groups respecting time order. This is the most common use case for Stata
users migrating to R, who in Stata would write:
  xtset firm year
  gen lag_sales = L.sales

The example demonstrates the equivalent data.table pattern:
  DT[order(firm, year), lag_sales := shift(sales, 1L), by = firm]

This complements the existing WRONG/RIGHT single-entity example by
showing the multi-entity panel data case with by=.
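
A minimal runnable sketch of the panel pattern described in this commit message follows. The firm names and sales figures are made up for illustration; only the `DT[order(firm, year), ... , by = firm]` pattern comes from the commit message.

```r
library(data.table)

# Illustrative panel, deliberately out of time order within each firm
DT <- data.table(
  firm  = c("A", "A", "B", "B", "A", "B"),
  year  = c(2011, 2010, 2011, 2010, 2012, 2012),
  sales = c(110, 100, 210, 200, 120, 220)
)

# Order by firm and year in i, then lag within each firm; values are
# written back to the original rows, so the row order of DT is unchanged
DT[order(firm, year), lag_sales := shift(sales, 1L), by = firm]

DT$lag_sales  # 100 NA 200 NA 110 210

# The first year of each firm has no previous observation, so its lag is NA
DT[year == 2010, lag_sales]  # NA NA
```

Grouping with `by = firm` keeps the lag from leaking across firms: without it, the first year of firm B would receive the last value of firm A.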
@LeonidasZhak
Author

Withdrawing this small automated PR while I consolidate an oversized batch of contributions and reduce maintainer review burden. Sorry for the noise, and thank you for maintaining the project.
