Explain cost function in README #185
Added detailed instructions for modifying the cost function in tsalign, including TSM base costs, jump costs, and gap-affine edit costs.
Updated README to link to main repository for cost function parameters.
Pull request overview
This PR expands the README documentation to explain how to customize tsalign’s alignment cost model, including TSM base costs, geometry/jump costs, and the different gap-affine edit cost sections used across alignment regions.
Changes:
- Adds a new “Modifying the cost function” section with a step-by-step walkthrough of `config.tsa`.
- Documents TSM base cost naming, jump cost functions (including constraints), and gap-affine edit cost tables/vectors.
- Updates the Features bullet to reference the four-point model paper.
Comments suppressed due to low confidence (3)
README.md:133
- Typo: “referes” should be “refers”.
In direction, the letter `f` referes to a repeat, and `r` to a TSM.
README.md:153
- Grammar: “the first input value that the constant cost applies” is missing “to” (e.g., “applies to”). Consider rephrasing this sentence for clarity about what each row represents.
Each TSM additionally incurs cost based on its geometry.
The costs are a piecewise constant function, where the first row is the first input value that the constant cost applies, and the second row is the constant cost.
The cost functions must be V-shaped, i.e. there must be some input value X such that the function is non-ascending before X and non-descending after X.
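To make the quoted format concrete, here is a minimal Python sketch of how such a piecewise constant function could be evaluated. It is not tsalign code: the function name `evaluate_cost` and the sample table are hypothetical, and it only assumes what the README text states, namely that each entry in the first row is the first input value its cost applies to.

```python
from bisect import bisect_right


def evaluate_cost(indices, costs, x):
    """Return the constant cost of the piece that contains x.

    indices[i] is the first input value to which costs[i] applies;
    that piece extends up to (but not including) indices[i + 1].
    """
    # Count how many piece starts are <= x; the last of them owns x.
    piece = bisect_right(indices, x) - 1
    if piece < 0:
        raise ValueError("x lies before the first index")
    return costs[piece]


# Hypothetical table for illustration only, not taken from a real config.tsa.
indices = [float("-inf"), -2, 0, 3]
costs = [5, 2, 0, 4]

print(evaluate_cost(indices, costs, -1))
```

With this table, every input from -2 up to (but not including) 0 maps to the same cost, which is the sense in which the README calls the function piecewise constant.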
README.md:159
- Grammar/pluralization: these sentences read incorrectly (“Length are…”, “LengthDifference are…”, “ForwardAntiPrimaryGap are…”). Consider using singular phrasing (e.g., “Length is…”) or “ costs are…”.
`Length` are costs based on the length of the 2-3-alignment of the TSM.
`LengthDifference` are costs based on the difference between the length of the 2-3-alignment and the difference between points 1 and 4.
`ForwardAntiPrimaryGap` are costs based on the difference between points 1 and 4, specifically `SP4 - SP1`.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
README.md:153
- The cost-function format has additional strict constraints that aren’t mentioned here: parsing requires the first index to equal the type’s minimum value (e.g. `-inf` for `isize`-based functions and `0` for `usize`-based ones like `Length`), indices must be strictly increasing, and the number of indices must match the number of costs. Calling these out would help users avoid configs that fail to parse/verify.
Each TSM additionally incurs cost based on its geometry.
The costs are a piecewise constant function, where the first row is the first input value that the constant cost applies to, and the second row is the constant cost.
The cost functions must be V-shaped, i.e. there must be some input value X such that the function is non-ascending before X and non-descending after X.
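The constraints listed in this comment, together with the V-shape rule, can be sketched as a small standalone check. This is an illustrative Python sketch, not tsalign's actual parser or verifier: `check_cost_function` and its `minimum` parameter are hypothetical names, and the rules are taken only from the wording above.

```python
def check_cost_function(indices, costs, minimum):
    """Return a list of problems found; an empty list means all checks passed.

    minimum is the smallest value of the input type, e.g. float("-inf")
    for a signed function or 0 for an unsigned one (assumption based on
    the review comment, not on tsalign's source).
    """
    problems = []
    if len(indices) != len(costs):
        problems.append("number of indices differs from number of costs")
    if not indices or indices[0] != minimum:
        problems.append("first index must equal the type's minimum value")
    if any(a >= b for a, b in zip(indices, indices[1:])):
        problems.append("indices must be strictly increasing")
    # V-shape: once the costs have started to rise, they must never fall again.
    rising = False
    for prev, cur in zip(costs, costs[1:]):
        if cur > prev:
            rising = True
        elif cur < prev and rising:
            problems.append("costs are not V-shaped")
            break
    return problems


# Hypothetical tables for illustration only.
print(check_cost_function([float("-inf"), -2, 0, 3], [5, 2, 0, 4], float("-inf")))
print(check_cost_function([0, 5, 5], [3, 1, 2], 0))
```

Returning a list of problems rather than raising on the first one lets a user see every issue with a table at once.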
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@sebschmi Do you think it would make sense to cross-link this from https://version.helsinki.fi/kraujasp/twitcher/-/blob/main/docs/costs.md and vice versa? Or would it be best to maintain both versions independently, as they target slightly different user profiles? Or should we perhaps align the technical vocabulary of the two versions a bit?
Ah, I totally forgot that you wrote a nice description already. Ideally the descriptions would be the same. We also thought about making an mdbook or so for tsalign, and we could even make a combined one with twitcher, and host that on readthedocs or so. About the vocabulary, the terms chosen in tsalign are often not very accurate or contradict definitions from the original TSM papers. I wanted to change that at some point, and probably will do so when writing this book. With that, I could also turn the cost function definition file into a better, more flexible and human-writable standard file format such as TOML. Well, tsalign needs a large front-end makeover. So let's delay this a bit, and I will continue working on this in the next month or so.