[ci] prevent trailing whitespace, ensure files end with newline (#6373)
jameslamb committed Mar 19, 2024
1 parent 6a1ec44 commit 631e0a2
Showing 42 changed files with 186 additions and 177 deletions.
2 changes: 1 addition & 1 deletion .ci/install-clang-devel.sh
@@ -56,7 +56,7 @@ cp --remove-destination /usr/lib/llvm-${CLANG_VERSION}/bin/* /usr/bin/
# per https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang
#
# clang was built to use libc++: for a version built to default to libstdc++
# (as shipped by Fedora/Debian/Ubuntu), add -stdlib=libc++ to CXX
# (as shipped by Fedora/Debian/Ubuntu), add -stdlib=libc++ to CXX
# and install the libcxx-devel/libc++-dev package.
mkdir -p "${HOME}/.R"

10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -1,12 +1,22 @@
# exclude files which are auto-generated by build tools
exclude: |
(?x)^(
build|
external_libs|
lightgbm-python|
lightgbm_r|
)$
|R-package/configure$
|R-package/inst/Makevars$
|R-package/inst/Makevars.win$
|R-package/man/.*Rd$
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
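The two hooks added above, `end-of-file-fixer` and `trailing-whitespace`, enforce the rules named in the commit title. As a rough illustration of the intended effect (a minimal sketch, not the hooks' actual implementation; the function names are hypothetical and edge cases such as Markdown hard line breaks or files containing only newlines are ignored):

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def ensure_single_final_newline(text: str) -> str:
    """Make a non-empty text end with exactly one newline character."""
    if not text:
        return text
    return text.rstrip("\n") + "\n"


def normalize(text: str) -> str:
    """Apply both rules, in the spirit of the two hooks (illustrative only)."""
    return ensure_single_final_newline(strip_trailing_whitespace(text))
```

Most of the one-line changes in this commit have this shape: a line whose only difference is removed trailing whitespace, or a file that gains a final newline.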
2 changes: 1 addition & 1 deletion R-package/LICENSE
@@ -1,2 +1,2 @@
YEAR: 2016
COPYRIGHT HOLDER: Microsoft Corporation
COPYRIGHT HOLDER: Microsoft Corporation
2 changes: 1 addition & 1 deletion R-package/cran-comments.md
@@ -14,7 +14,7 @@ warning was not fixed within 14 days.
```text
/usr/local/clang-trunk/bin/../include/c++/v1/__fwd/string_view.h:22:41:
warning: 'char_traits<fmt::detail::char8_type>' is deprecated:
char_traits<T> for T not equal to char, wchar_t, char8_t, char16_t or char32_t is non-standard and is provided for a temporary period.
char_traits<T> for T not equal to char, wchar_t, char8_t, char16_t or char32_t is non-standard and is provided for a temporary period.
It will be removed in LLVM 19, so please migrate off of it. [-Wdeprecated-declarations]
```

2 changes: 1 addition & 1 deletion SECURITY.md
@@ -14,7 +14,7 @@ Instead, please report them to the Microsoft Security Response Center (MSRC) at

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).
You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

2 changes: 1 addition & 1 deletion build-cran-package.sh
@@ -4,7 +4,7 @@
# Prepare a source distribution of the R package
# to be submitted to CRAN.
#
# [arguments]
# [arguments]
#
# --r-executable Customize the R executable used by `R CMD build`.
# Useful if building the R package in an environment with
8 changes: 4 additions & 4 deletions docs/Advanced-Topics.rst
@@ -113,8 +113,8 @@ Unlike a categorical feature, however, ``positions`` are used to adjust the targ
The position file corresponds with training data file line by line, and has one position per line. And if the name of training data file is ``train.txt``, the position file should be named as ``train.txt.position`` and placed in the same folder as the data file.
In this case, LightGBM will load the position file automatically if it exists. The positions can also be specified through the ``Dataset`` constructor when using Python API. If the positions are specified in both approaches, the ``.position`` file will be ignored.

Currently, implemented is an approach to model position bias by using an idea of Generalized Additive Models (`GAM <https://en.wikipedia.org/wiki/Generalized_additive_model>`_) to linearly decompose the document score ``s`` into the sum of a relevance component ``f`` and a positional component ``g``: ``s(x, pos) = f(x) + g(pos)`` where the former component depends on the original query-document features and the latter depends on the position of an item.
During the training, the compound scoring function ``s(x, pos)`` is fit with a standard ranking algorithm (e.g., LambdaMART) which boils down to jointly learning the relevance component ``f(x)`` (it is later returned as an unbiased model) and the position factors ``g(pos)`` that help better explain the observed (biased) labels.
Similar score decomposition ideas have previously been applied for classification & pointwise ranking tasks with assumptions of binary labels and binary relevance (a.k.a. "two-tower" models, refer to the papers: `Towards Disentangling Relevance and Bias in Unbiased Learning to Rank <https://arxiv.org/abs/2212.13937>`_, `PAL: a position-bias aware learning framework for CTR prediction in live recommender systems <https://dl.acm.org/doi/10.1145/3298689.3347033>`_, `A General Framework for Debiasing in CTR Prediction <https://arxiv.org/abs/2112.02767>`_).
In LightGBM, we adapt this idea to general pairwise Lerarning-to-Rank with arbitrary ordinal relevance labels.
Currently, implemented is an approach to model position bias by using an idea of Generalized Additive Models (`GAM <https://en.wikipedia.org/wiki/Generalized_additive_model>`_) to linearly decompose the document score ``s`` into the sum of a relevance component ``f`` and a positional component ``g``: ``s(x, pos) = f(x) + g(pos)`` where the former component depends on the original query-document features and the latter depends on the position of an item.
During the training, the compound scoring function ``s(x, pos)`` is fit with a standard ranking algorithm (e.g., LambdaMART) which boils down to jointly learning the relevance component ``f(x)`` (it is later returned as an unbiased model) and the position factors ``g(pos)`` that help better explain the observed (biased) labels.
Similar score decomposition ideas have previously been applied for classification & pointwise ranking tasks with assumptions of binary labels and binary relevance (a.k.a. "two-tower" models, refer to the papers: `Towards Disentangling Relevance and Bias in Unbiased Learning to Rank <https://arxiv.org/abs/2212.13937>`_, `PAL: a position-bias aware learning framework for CTR prediction in live recommender systems <https://dl.acm.org/doi/10.1145/3298689.3347033>`_, `A General Framework for Debiasing in CTR Prediction <https://arxiv.org/abs/2112.02767>`_).
In LightGBM, we adapt this idea to general pairwise Lerarning-to-Rank with arbitrary ordinal relevance labels.
Besides, GAMs have been used in the context of explainable ML (`Accurate Intelligible Models with Pairwise Interactions <https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf>`_) to linearly decompose the contribution of each feature (and possibly their pairwise interactions) to the overall score, for subsequent analysis and interpretation of their effects in the trained models.
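The additive decomposition ``s(x, pos) = f(x) + g(pos)`` described in this hunk can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the linear ``f``, the hard-coded position factors, and all names are hypothetical stand-ins, whereas in LightGBM both components are learned during training.

```python
from typing import Dict, Sequence

# Hypothetical, hard-coded components (illustration only).
RELEVANCE_WEIGHTS = (0.5, 2.0)  # stands in for the learned relevance model f
POSITION_FACTORS: Dict[int, float] = {0: 1.0, 1: 0.5, 2: 0.25}  # stands in for g


def f(x: Sequence[float]) -> float:
    """Relevance component: depends only on query-document features."""
    return sum(w * v for w, v in zip(RELEVANCE_WEIGHTS, x))


def g(pos: int) -> float:
    """Positional component: depends only on where the item was shown."""
    return POSITION_FACTORS[pos]


def s(x: Sequence[float], pos: int) -> float:
    """Compound score fit during training: s(x, pos) = f(x) + g(pos)."""
    return f(x) + g(pos)
```

Under this sketch, the same document shown at two positions differs in score only by ``g``: ``s(x, 0) - s(x, 2)`` equals ``g(0) - g(2)``. Scoring with ``f(x)`` alone corresponds to the unbiased model that the text says is returned after training.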
2 changes: 1 addition & 1 deletion docs/Features.rst
@@ -21,7 +21,7 @@ LightGBM uses histogram-based algorithms\ `[4, 5, 6] <#references>`__, which buc
- To get one leaf's histograms in a binary tree, use the histogram subtraction of its parent and its neighbor

- So it needs to construct histograms for only one leaf (with smaller ``#data`` than its neighbor). It then can get histograms of its neighbor by histogram subtraction with small cost (``O(#bins)``)

- **Reduce memory usage**

- Replaces continuous values with discrete bins. If ``#bins`` is small, can use small data type, e.g. uint8\_t, to store training data
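The histogram subtraction mentioned in this hunk can be sketched as follows. This is a simplified illustration under stated assumptions, not LightGBM's implementation: it accumulates only per-bin gradient sums, and the function names are hypothetical.

```python
from typing import List, Sequence


def build_histogram(
    bin_indices: Sequence[int], gradients: Sequence[float], num_bins: int
) -> List[float]:
    """Accumulate gradient sums per bin; cost is O(#data) for the given rows."""
    hist = [0.0] * num_bins
    for b, grad in zip(bin_indices, gradients):
        hist[b] += grad
    return hist


def subtract_histograms(parent: Sequence[float], child: Sequence[float]) -> List[float]:
    """Derive the sibling's histogram as parent minus child; cost is O(#bins)."""
    return [p - c for p, c in zip(parent, child)]


# Parent leaf holds six rows; the smaller child holds the first two of them.
parent_hist = build_histogram([0, 1, 1, 2, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
small_hist = build_histogram([0, 1], [1.0, 2.0], 3)
sibling_hist = subtract_histograms(parent_hist, small_hist)
```

Only the smaller child is scanned row by row. The larger sibling is obtained from the parent's histogram at a cost proportional to the number of bins.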
6 changes: 3 additions & 3 deletions docs/GPU-Targets.rst
@@ -107,7 +107,7 @@ Example of using GPU (``gpu_platform_id = 0`` and ``gpu_device_id = 0`` in our s
[LightGBM] [Info] 40 dense feature groups (0.12 MB) transferred to GPU in 0.004211 secs. 76 sparse feature groups.
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and depth=8
[1]: test's rmse:1.10643e-17
[1]: test's rmse:1.10643e-17
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and depth=5
[2]: test's rmse:0
@@ -145,11 +145,11 @@ Example of using CPU (``gpu_platform_id = 0``, ``gpu_device_id = 1``). The GPU d
[LightGBM] [Info] 40 dense feature groups (0.12 MB) transferred to GPU in 0.004540 secs. 76 sparse feature groups.
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and depth=8
[1]: test's rmse:1.10643e-17
[1]: test's rmse:1.10643e-17
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and depth=5
[2]: test's rmse:0
Known issues:

2 changes: 1 addition & 1 deletion docs/GPU-Tutorial.rst
@@ -61,7 +61,7 @@ Now we are ready to checkout LightGBM and compile it with GPU support:
cd LightGBM
mkdir build
cd build
cmake -DUSE_GPU=1 ..
cmake -DUSE_GPU=1 ..
  # if you have installed NVIDIA CUDA to a customized location, you should specify paths to OpenCL headers and library like the following:
# cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j$(nproc)
2 changes: 1 addition & 1 deletion docs/Key-Events.md
@@ -75,7 +75,7 @@ The list includes the commits where the major feature added is considered workin
* 22/06/2017 [Microsoft/LightGBM@d862b3e](https://github.com/microsoft/LightGBM/pull/642): CIntegration: Travis OSX Support (Pull Request 642)
* 20/06/2017 [Microsoft/LightGBM@80c641c](https://github.com/microsoft/LightGBM/pull/635): Release: Python pip package (Pull Request 635)
* 18/06/2017 [Microsoft/LightGBM@4d2aa84](https://github.com/microsoft/LightGBM/pull/634): CIntegration: AppVeyor Support (Pull Request 634)
* 06/06/2017 [Microsoft/LightGBM@2c9ce59](https://github.com/microsoft/LightGBM/pull/592): Release: R-package version 0.2 (Pull Request 592)
* 06/06/2017 [Microsoft/LightGBM@2c9ce59](https://github.com/microsoft/LightGBM/pull/592): Release: R-package version 0.2 (Pull Request 592)
* 05/06/2017 [Microsoft/LightGBM@f98d75f](https://github.com/microsoft/LightGBM/pull/584): Feature: Use custom compiler for R-package (Pull Request 584)
* 29/05/2017 [Microsoft/LightGBM@993bbd5](https://github.com/microsoft/LightGBM/pull/559): Parameter: Early Stopping for predictions (Pull Request 559)
* 26/05/2017 [Microsoft/LightGBM@3abff37](https://github.com/microsoft/LightGBM/commit/3abff370bb353293e4a03e516111dd02785fbd97): Feature: Parameter to disable missing values (Commit)
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -17,4 +17,4 @@ help:
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
2 changes: 1 addition & 1 deletion docs/_static/images/artifacts-download.svg
2 changes: 1 addition & 1 deletion docs/_static/images/artifacts-fetching.svg
2 changes: 1 addition & 1 deletion docs/_static/images/artifacts-not-available.svg
2 changes: 1 addition & 1 deletion docs/_static/images/dask-concat.svg
2 changes: 1 addition & 1 deletion docs/_static/images/dask-initial-setup.svg
