;doc: update manuals
simonmichael committed Mar 25, 2024
1 parent 2889bb6 commit 8642db7
Showing 3 changed files with 153 additions and 126 deletions.
37 changes: 23 additions & 14 deletions hledger/hledger.1
@@ -9866,24 +9866,28 @@ files to your main journal, you will run
.PP
Note you can import from any file format, though CSV files are the most
common import source, and these docs focus on that case.
.SS \[dq]Deduplication\[dq]
.SS Skipping
\f[CR]import\f[R] tries to import only the transactions which are new
since the last import.
since the last import, \[dq]skipping over\[dq] any that it saw last
time.
So if your bank\[aq]s CSV includes the last three months of data, you
can download and \f[CR]import\f[R] it every month (or week, or day) and
only the new transactions will be imported each time.
.PP
It works as follows.
For each imported \f[CR]FILE\f[R] (usually a CSV file): \- It tries to
find the latest date seen previously, by reading it from a hidden
\f[CR].latest.FILE\f[R] in the same directory.
\- Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
For each imported \f[CR]FILE\f[R]:
.IP \[bu] 2
It tries to find the latest date seen previously, by reading it from a
hidden \f[CR].latest.FILE\f[R] in the same directory.
.IP \[bu] 2
Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
before the \[dq]latest seen\[dq] date.
.PP
And after a successful import, it updates the \f[CR].latest.FILE\f[R](s)
for next time (unless \f[CR]\-\-dry\-run\f[R] was used).
.PP
This is simple but fairly effective.
This is a simple system that works fairly well for transaction data
(usually CSV, but it could be any of hledger\[aq]s input formats).
It assumes:
.IP "1." 3
new items always have the newest dates
@@ -9901,12 +9905,17 @@ more often (and in old transactions it doesn\[aq]t matter).
Note, \f[CR]import\f[R] avoids reprocessing the same dates across
successive runs, but it does not detect transactions that are duplicated
within a single run.
So eg if you downloaded but did not import \f[CR]bank.1.csv\f[R], and
later downloaded \f[CR]bank.2.csv\f[R] with overlapping data, you should
not import both of them in a single run
(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]); instead, import them
one at a time (\f[CR]hledger import bank.1.csv\f[R], then
\f[CR]hledger import bank.2.csv\f[R]).
I\[aq]ll call these \[dq]skipping\[dq] and \[dq]deduplication\[dq].
.PP
So for example, say you downloaded but did not import
\f[CR]bank.1.csv\f[R], and later downloaded \f[CR]bank.2.csv\f[R] with
overlapping data.
Then you should not import both of them at once
(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]), as the overlapping
data would appear twice and not be deduplicated.
Instead, import them one at a time
(\f[CR]hledger import bank.1.csv; hledger import bank.2.csv\f[R]), and
the second import will skip the overlapping data.
.PP
Normally you can ignore the \f[CR].latest.*\f[R] files, but if needed,
you can delete them (to make all transactions unseen), or
@@ -9917,7 +9926,7 @@ It means \[dq]I have seen transactions up to this date, and this many of
them occurring on that date\[dq].
.PP
(\f[CR]hledger print \-\-new\f[R] also uses and updates these
\f[CR].latest.*\f[R] files, but it is not often used.)
\f[CR].latest.*\f[R] files, but it is less often used.)
.PP
Related: CSV > Working with CSV > Deduplicating, importing.
.SS Import testing
203 changes: 106 additions & 97 deletions hledger/hledger.info
@@ -9546,31 +9546,36 @@ most common import source, and these docs focus on that case.

* Menu:

* "Deduplication"::
* Skipping::
* Import testing::
* Importing balance assignments::
* Commodity display styles::


File: hledger.info, Node: "Deduplication", Next: Import testing, Up: import
File: hledger.info, Node: Skipping, Next: Import testing, Up: import

24.19.1 "Deduplication"
-----------------------
24.19.1 Skipping
----------------

'import' tries to import only the transactions which are new since the
last import. So if your bank's CSV includes the last three months of
data, you can download and 'import' it every month (or week, or day) and
only the new transactions will be imported each time.
last import, "skipping over" any that it saw last time. So if your
bank's CSV includes the last three months of data, you can download and
'import' it every month (or week, or day) and only the new transactions
will be imported each time.

It works as follows. For each imported 'FILE' (usually a CSV file):
- It tries to find the latest date seen previously, by reading it from a
hidden '.latest.FILE' in the same directory. - Then it processes
'FILE', ignoring any transactions on or before the "latest seen" date.
It works as follows. For each imported 'FILE':

* It tries to find the latest date seen previously, by reading it
from a hidden '.latest.FILE' in the same directory.
* Then it processes 'FILE', ignoring any transactions on or before
the "latest seen" date.

And after a successful import, it updates the '.latest.FILE'(s) for
next time (unless '--dry-run' was used).

This is simple but fairly effective. It assumes:
This is a simple system that works fairly well for transaction data
(usually CSV, but it could be any of hledger's input formats). It
assumes:

1. new items always have the newest dates
2. item dates are stable across successive CSV downloads
@@ -9583,11 +9588,15 @@ by importing more often (and in old transactions it doesn't matter).

Note, 'import' avoids reprocessing the same dates across successive
runs, but it does not detect transactions that are duplicated within a
single run. So eg if you downloaded but did not import 'bank.1.csv',
and later downloaded 'bank.2.csv' with overlapping data, you should not
import both of them in a single run ('hledger import bank.1.csv
bank.2.csv'); instead, import them one at a time ('hledger import
bank.1.csv', then 'hledger import bank.2.csv').
single run. I'll call these "skipping" and "deduplication".

So for example, say you downloaded but did not import 'bank.1.csv',
and later downloaded 'bank.2.csv' with overlapping data. Then you
should not import both of them at once ('hledger import bank.1.csv
bank.2.csv'), as the overlapping data would appear twice and not be
deduplicated. Instead, import them one at a time ('hledger import
bank.1.csv; hledger import bank.2.csv'), and the second import will skip
the overlapping data.

Normally you can ignore the '.latest.*' files, but if needed, you can
delete them (to make all transactions unseen), or construct/modify them
@@ -9597,12 +9606,12 @@ have seen transactions up to this date, and this many of them occurring
on that date".

('hledger print --new' also uses and updates these '.latest.*' files,
but it is not often used.)
but it is less often used.)

Related: CSV > Working with CSV > Deduplicating, importing.


File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: "Deduplication", Up: import
File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: Skipping, Up: import

24.19.2 Import testing
----------------------
@@ -11717,84 +11726,84 @@ Node: help343889
Ref: #help-1343998
Node: import345371
Ref: #import345494
Node: "Deduplication"346604
Ref: #deduplication346735
Node: Import testing348911
Ref: #import-testing349078
Node: Importing balance assignments349921
Ref: #importing-balance-assignments350127
Node: Commodity display styles350776
Ref: #commodity-display-styles350949
Node: incomestatement351078
Ref: #incomestatement351220
Node: notes352551
Ref: #notes352673
Node: payees353035
Ref: #payees353150
Node: prices353669
Ref: #prices353784
Node: print354437
Ref: #print354552
Node: print explicitness355528
Ref: #print-explicitness355671
Node: print amount style356450
Ref: #print-amount-style356620
Node: print parseability357690
Ref: #print-parseability357862
Node: print other features358611
Ref: #print-other-features358790
Node: print output format359311
Ref: #print-output-format359459
Node: register362598
Ref: #register362720
Node: Custom register output367751
Ref: #custom-register-output367882
Node: rewrite369229
Ref: #rewrite369347
Node: Re-write rules in a file371245
Ref: #re-write-rules-in-a-file371408
Node: Diff output format372557
Ref: #diff-output-format372740
Node: rewrite vs print --auto373832
Ref: #rewrite-vs.-print---auto373992
Node: roi374548
Ref: #roi374655
Node: Spaces and special characters in --inv and --pnl376467
Ref: #spaces-and-special-characters-in---inv-and---pnl376707
Node: Semantics of --inv and --pnl377195
Ref: #semantics-of---inv-and---pnl377434
Node: IRR and TWR explained379284
Ref: #irr-and-twr-explained379444
Node: stats382697
Ref: #stats382805
Node: tags384319
Ref: #tags-1384426
Node: test385435
Ref: #test385528
Node: PART 5 COMMON TASKS386270
Ref: #part-5-common-tasks386416
Node: Getting help386714
Ref: #getting-help386855
Node: Constructing command lines387615
Ref: #constructing-command-lines387816
Node: Starting a journal file388473
Ref: #starting-a-journal-file388675
Node: Setting LEDGER_FILE389877
Ref: #setting-ledger_file390069
Node: Setting opening balances391026
Ref: #setting-opening-balances391227
Node: Recording transactions394368
Ref: #recording-transactions394557
Node: Reconciling395113
Ref: #reconciling395265
Node: Reporting397522
Ref: #reporting397671
Node: Migrating to a new file401656
Ref: #migrating-to-a-new-file401813
Node: BUGS402112
Ref: #bugs402202
Node: Troubleshooting403081
Ref: #troubleshooting403181
Node: Skipping346597
Ref: #skipping346707
Node: Import testing349191
Ref: #import-testing349351
Node: Importing balance assignments350194
Ref: #importing-balance-assignments350400
Node: Commodity display styles351049
Ref: #commodity-display-styles351222
Node: incomestatement351351
Ref: #incomestatement351493
Node: notes352824
Ref: #notes352946
Node: payees353308
Ref: #payees353423
Node: prices353942
Ref: #prices354057
Node: print354710
Ref: #print354825
Node: print explicitness355801
Ref: #print-explicitness355944
Node: print amount style356723
Ref: #print-amount-style356893
Node: print parseability357963
Ref: #print-parseability358135
Node: print other features358884
Ref: #print-other-features359063
Node: print output format359584
Ref: #print-output-format359732
Node: register362871
Ref: #register362993
Node: Custom register output368024
Ref: #custom-register-output368155
Node: rewrite369502
Ref: #rewrite369620
Node: Re-write rules in a file371518
Ref: #re-write-rules-in-a-file371681
Node: Diff output format372830
Ref: #diff-output-format373013
Node: rewrite vs print --auto374105
Ref: #rewrite-vs.-print---auto374265
Node: roi374821
Ref: #roi374928
Node: Spaces and special characters in --inv and --pnl376740
Ref: #spaces-and-special-characters-in---inv-and---pnl376980
Node: Semantics of --inv and --pnl377468
Ref: #semantics-of---inv-and---pnl377707
Node: IRR and TWR explained379557
Ref: #irr-and-twr-explained379717
Node: stats382970
Ref: #stats383078
Node: tags384592
Ref: #tags-1384699
Node: test385708
Ref: #test385801
Node: PART 5 COMMON TASKS386543
Ref: #part-5-common-tasks386689
Node: Getting help386987
Ref: #getting-help387128
Node: Constructing command lines387888
Ref: #constructing-command-lines388089
Node: Starting a journal file388746
Ref: #starting-a-journal-file388948
Node: Setting LEDGER_FILE390150
Ref: #setting-ledger_file390342
Node: Setting opening balances391299
Ref: #setting-opening-balances391500
Node: Recording transactions394641
Ref: #recording-transactions394830
Node: Reconciling395386
Ref: #reconciling395538
Node: Reporting397795
Ref: #reporting397944
Node: Migrating to a new file401929
Ref: #migrating-to-a-new-file402086
Node: BUGS402385
Ref: #bugs402475
Node: Troubleshooting403354
Ref: #troubleshooting403454

End Tag Table

39 changes: 24 additions & 15 deletions hledger/hledger.txt
@@ -7719,21 +7719,26 @@ PART 4: COMMANDS
Note you can import from any file format, though CSV files are the most
common import source, and these docs focus on that case.

"Deduplication"
Skipping
import tries to import only the transactions which are new since the
last import. So if your bank's CSV includes the last three months of
data, you can download and import it every month (or week, or day) and
only the new transactions will be imported each time.
last import, "skipping over" any that it saw last time. So if your
bank's CSV includes the last three months of data, you can download and
import it every month (or week, or day) and only the new transactions
will be imported each time.

It works as follows. For each imported FILE (usually a CSV file): - It
tries to find the latest date seen previously, by reading it from a
hidden .latest.FILE in the same directory. - Then it processes FILE,
ignoring any transactions on or before the "latest seen" date.
It works as follows. For each imported FILE:

o It tries to find the latest date seen previously, by reading it from
a hidden .latest.FILE in the same directory.

o Then it processes FILE, ignoring any transactions on or before the
"latest seen" date.

And after a successful import, it updates the .latest.FILE(s) for next
time (unless --dry-run was used).

This is simple but fairly effective. It assumes:
This is a simple system that works fairly well for transaction data
(usually CSV, but it could be any of hledger's input formats). It assumes:

1. new items always have the newest dates

@@ -7749,11 +7754,15 @@ PART 4: COMMANDS

Note, import avoids reprocessing the same dates across successive runs,
but it does not detect transactions that are duplicated within a single
run. So eg if you downloaded but did not import bank.1.csv, and later
downloaded bank.2.csv with overlapping data, you should not import both
of them in a single run (hledger import bank.1.csv bank.2.csv); in-
stead, import them one at a time (hledger import bank.1.csv, then
hledger import bank.2.csv).
run. I'll call these "skipping" and "deduplication".

So for example, say you downloaded but did not import bank.1.csv, and
later downloaded bank.2.csv with overlapping data. Then you should not
import both of them at once (hledger import bank.1.csv bank.2.csv), as
the overlapping data would appear twice and not be deduplicated. In-
stead, import them one at a time (hledger import bank.1.csv; hledger
import bank.2.csv), and the second import will skip the overlapping
data.

Normally you can ignore the .latest.* files, but if needed, you can
delete them (to make all transactions unseen), or construct/modify them
@@ -7763,7 +7772,7 @@ PART 4: COMMANDS
ring on that date".

(hledger print --new also uses and updates these .latest.* files, but
it is not often used.)
it is less often used.)

Related: CSV > Working with CSV > Deduplicating, importing.
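
The skipping logic described in this section can be sketched roughly as follows. This is an illustrative model only, not hledger's actual code: the function names `skip_seen` and `update_latest` are hypothetical, transactions are reduced to their dates, and the remembered state is assumed to be just "the latest date seen, and how many transactions were seen on that date", as the manual describes the meaning of the .latest.* files. It also assumes the file lists transactions oldest first.

```python
from datetime import date


def skip_seen(txn_dates, latest_seen, seen_on_latest):
    """Return the dates of transactions treated as new (hypothetical helper).

    txn_dates      -- transaction dates in file order, assumed oldest first
    latest_seen    -- latest date recorded by the previous import, or None
    seen_on_latest -- how many transactions on that date were already seen
    """
    if latest_seen is None:
        # No remembered state: everything is considered new.
        return list(txn_dates)
    new = []
    remaining = seen_on_latest
    for d in txn_dates:
        if d < latest_seen:
            continue  # older than the latest seen date: skip
        if d == latest_seen and remaining > 0:
            remaining -= 1  # one of the already-seen items on that date
            continue
        new.append(d)
    return new


def update_latest(txn_dates, latest_seen, seen_on_latest):
    """Return the state to remember after a successful import (hypothetical)."""
    if not txn_dates:
        return latest_seen, seen_on_latest
    newest = max(txn_dates)
    if latest_seen is not None and newest < latest_seen:
        # The file holds only older data; keep the existing state.
        return latest_seen, seen_on_latest
    return newest, sum(1 for d in txn_dates if d == newest)


if __name__ == "__main__":
    # Two successive downloads of the same FILE, with overlapping data.
    first = [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 3)]
    second = [date(2024, 3, 2), date(2024, 3, 3),
              date(2024, 3, 3), date(2024, 3, 4)]

    state = (None, 0)
    for download in (first, second):
        print(skip_seen(download, *state))
        state = update_latest(download, *state)
```

In this model the second download contributes only the items not seen before, including a late-arriving second item on the latest seen date. It also shows why the two assumptions listed above matter: an item that appears later with an older date would be skipped.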
