[ci] prevent trailing whitespace, ensure files end with newline (#6373)

microsoft · Mar 19, 2024 · 631e0a2
1 parent 6a1ec44
commit 631e0a2

Showing 42 changed files with 186 additions and 177 deletions.
diff --git a/.ci/install-clang-devel.sh b/.ci/install-clang-devel.sh
@@ -56,7 +56,7 @@ cp --remove-destination /usr/lib/llvm-${CLANG_VERSION}/bin/* /usr/bin/
 # per https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang
 #
 # clang was built to use libc++: for a version built to default to libstdc++
-# (as shipped by Fedora/Debian/Ubuntu), add -stdlib=libc++ to CXX 
+# (as shipped by Fedora/Debian/Ubuntu), add -stdlib=libc++ to CXX
 # and install the libcxx-devel/libc++-dev package.
 mkdir -p "${HOME}/.R"
 

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,12 +1,22 @@
+# exclude files which are auto-generated by build tools
 exclude: |
   (?x)^(
       build|
       external_libs|
       lightgbm-python|
       lightgbm_r|
   )$
+  |R-package/configure$
+  |R-package/inst/Makevars$
+  |R-package/inst/Makevars.win$
+  |R-package/man/.*Rd$
 
 repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.5.0
+    hooks:
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
   - repo: https://github.com/pycqa/isort
     rev: 5.13.2
     hooks:

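For readers unfamiliar with the two hooks added above, the following is a minimal sketch of their combined effect on a single file. It is not the actual `pre-commit-hooks` implementation. The `exclude` pattern is copied from the config in this diff. The function names (`strip_trailing_whitespace`, `fix_end_of_file`, `fix`) are hypothetical. The sketch assumes LF line endings and assumes pre-commit matches `exclude` with `re.search` against repo-relative paths.

```python
import re

# Exclude pattern copied from .pre-commit-config.yaml in this commit.
# Assumption: pre-commit applies it with re.search to repo-relative paths.
EXCLUDE = re.compile(r"""(?x)^(
    build|
    external_libs|
    lightgbm-python|
    lightgbm_r|
)$
|R-package/configure$
|R-package/inst/Makevars$
|R-package/inst/Makevars.win$
|R-package/man/.*Rd$
""")


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line (simplified)."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make non-empty content end with exactly one newline."""
    stripped = text.rstrip("\n")
    # Content consisting only of newlines collapses to an empty file.
    return stripped + "\n" if stripped else ""


def fix(path: str, text: str) -> str:
    """Apply both fixes unless the path matches the exclude pattern."""
    if EXCLUDE.search(path):
        return text  # auto-generated file: leave untouched
    return fix_end_of_file(strip_trailing_whitespace(text))


if __name__ == "__main__":
    # Mirrors the build-cran-package.sh change in this commit.
    print(repr(fix("build-cran-package.sh", "# [arguments] \n#\n")))
    # Generated R documentation is excluded, so it is returned unchanged.
    print(repr(fix("R-package/man/lightgbm.Rd", "x \n")))
```

Under these assumptions, `R-package/LICENSE` gains a final newline and `build-cran-package.sh` loses the space after `[arguments]`. Both changes appear elsewhere in this diff. Files matching the new `R-package/...` exclude entries are skipped.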
diff --git a/R-package/LICENSE b/R-package/LICENSE
@@ -1,2 +1,2 @@
 YEAR: 2016
-COPYRIGHT HOLDER: Microsoft Corporation
+COPYRIGHT HOLDER: Microsoft Corporation
diff --git a/R-package/cran-comments.md b/R-package/cran-comments.md
@@ -14,7 +14,7 @@ warning was not fixed within 14 days.
 ```text
 /usr/local/clang-trunk/bin/../include/c++/v1/__fwd/string_view.h:22:41:
 warning: 'char_traits<fmt::detail::char8_type>' is deprecated:
-char_traits<T> for T not equal to char, wchar_t, char8_t, char16_t or char32_t is non-standard and is provided for a temporary period. 
+char_traits<T> for T not equal to char, wchar_t, char8_t, char16_t or char32_t is non-standard and is provided for a temporary period.
 It will be removed in LLVM 19, so please migrate off of it. [-Wdeprecated-declarations]
 ```
 

diff --git a/SECURITY.md b/SECURITY.md
@@ -14,7 +14,7 @@ Instead, please report them to the Microsoft Security Response Center (MSRC) at
 
 If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com).  If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).
 
-You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). 
+You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).
 
 Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
 

diff --git a/build-cran-package.sh b/build-cran-package.sh
@@ -4,7 +4,7 @@
 #     Prepare a source distribution of the R package
 #     to be submitted to CRAN.
 #
-# [arguments] 
+# [arguments]
 #
 #     --r-executable Customize the R executable used by `R CMD build`.
 #                    Useful if building the R package in an environment with

diff --git a/docs/Advanced-Topics.rst b/docs/Advanced-Topics.rst
@@ -113,8 +113,8 @@ Unlike a categorical feature, however, ``positions`` are used to adjust the targ
 The position file corresponds with training data file line by line, and has one position per line. And if the name of training data file is ``train.txt``, the position file should be named as ``train.txt.position`` and placed in the same folder as the data file.
 In this case, LightGBM will load the position file automatically if it exists. The positions can also be specified through the ``Dataset`` constructor when using Python API. If the positions are specified in both approaches, the ``.position`` file will be ignored.
 
-Currently, implemented is an approach to model position bias by using an idea of Generalized Additive Models (`GAM <https://en.wikipedia.org/wiki/Generalized_additive_model>`_) to linearly decompose the document score ``s`` into the sum of a relevance component ``f`` and a positional component ``g``:  ``s(x, pos) = f(x) + g(pos)`` where the former component depends on the original query-document features and the latter depends on the position of an item. 
-During the training, the compound scoring function ``s(x, pos)`` is fit with a standard ranking algorithm (e.g., LambdaMART) which boils down to jointly learning the relevance component ``f(x)`` (it is later returned as an unbiased model) and the position factors ``g(pos)`` that help better explain the observed (biased) labels. 
-Similar score decomposition ideas have previously been applied for classification & pointwise ranking tasks with assumptions of binary labels and binary relevance (a.k.a. "two-tower" models, refer to the papers: `Towards Disentangling Relevance and Bias in Unbiased Learning to Rank <https://arxiv.org/abs/2212.13937>`_, `PAL: a position-bias aware learning framework for CTR prediction in live recommender systems <https://dl.acm.org/doi/10.1145/3298689.3347033>`_, `A General Framework for Debiasing in CTR Prediction <https://arxiv.org/abs/2112.02767>`_). 
-In LightGBM, we adapt this idea to general pairwise Lerarning-to-Rank with arbitrary ordinal relevance labels. 
+Currently, implemented is an approach to model position bias by using an idea of Generalized Additive Models (`GAM <https://en.wikipedia.org/wiki/Generalized_additive_model>`_) to linearly decompose the document score ``s`` into the sum of a relevance component ``f`` and a positional component ``g``:  ``s(x, pos) = f(x) + g(pos)`` where the former component depends on the original query-document features and the latter depends on the position of an item.
+During the training, the compound scoring function ``s(x, pos)`` is fit with a standard ranking algorithm (e.g., LambdaMART) which boils down to jointly learning the relevance component ``f(x)`` (it is later returned as an unbiased model) and the position factors ``g(pos)`` that help better explain the observed (biased) labels.
+Similar score decomposition ideas have previously been applied for classification & pointwise ranking tasks with assumptions of binary labels and binary relevance (a.k.a. "two-tower" models, refer to the papers: `Towards Disentangling Relevance and Bias in Unbiased Learning to Rank <https://arxiv.org/abs/2212.13937>`_, `PAL: a position-bias aware learning framework for CTR prediction in live recommender systems <https://dl.acm.org/doi/10.1145/3298689.3347033>`_, `A General Framework for Debiasing in CTR Prediction <https://arxiv.org/abs/2112.02767>`_).
+In LightGBM, we adapt this idea to general pairwise Lerarning-to-Rank with arbitrary ordinal relevance labels.
 Besides, GAMs have been used in the context of explainable ML (`Accurate Intelligible Models with Pairwise Interactions <https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf>`_) to linearly decompose the contribution of each feature (and possibly their pairwise interactions) to the overall score, for subsequent analysis and interpretation of their effects in the trained models.
diff --git a/docs/Features.rst b/docs/Features.rst
@@ -21,7 +21,7 @@ LightGBM uses histogram-based algorithms\ `[4, 5, 6] <#references>`__, which buc
    -  To get one leaf's histograms in a binary tree, use the histogram subtraction of its parent and its neighbor
 
    -  So it needs to construct histograms for only one leaf (with smaller ``#data`` than its neighbor). It then can get histograms of its neighbor by histogram subtraction with small cost (``O(#bins)``)
-   
+
 -  **Reduce memory usage**
 
    -  Replaces continuous values with discrete bins. If ``#bins`` is small, can use small data type, e.g. uint8\_t, to store training data

diff --git a/docs/GPU-Targets.rst b/docs/GPU-Targets.rst
@@ -107,7 +107,7 @@ Example of using GPU (``gpu_platform_id = 0`` and ``gpu_device_id = 0`` in our s
     [LightGBM] [Info] 40 dense feature groups (0.12 MB) transferred to GPU in 0.004211 secs. 76 sparse feature groups.
     [LightGBM] [Info] No further splits with positive gain, best gain: -inf
     [LightGBM] [Info] Trained a tree with leaves=16 and depth=8
-    [1]:    test's rmse:1.10643e-17 
+    [1]:    test's rmse:1.10643e-17
     [LightGBM] [Info] No further splits with positive gain, best gain: -inf
     [LightGBM] [Info] Trained a tree with leaves=7 and depth=5
     [2]:    test's rmse:0
@@ -145,11 +145,11 @@ Example of using CPU (``gpu_platform_id = 0``, ``gpu_device_id = 1``). The GPU d
     [LightGBM] [Info] 40 dense feature groups (0.12 MB) transferred to GPU in 0.004540 secs. 76 sparse feature groups.
     [LightGBM] [Info] No further splits with positive gain, best gain: -inf
     [LightGBM] [Info] Trained a tree with leaves=16 and depth=8
-    [1]:    test's rmse:1.10643e-17 
+    [1]:    test's rmse:1.10643e-17
     [LightGBM] [Info] No further splits with positive gain, best gain: -inf
     [LightGBM] [Info] Trained a tree with leaves=7 and depth=5
     [2]:    test's rmse:0
-    
+
 
 Known issues:
 

diff --git a/docs/GPU-Tutorial.rst b/docs/GPU-Tutorial.rst
@@ -61,7 +61,7 @@ Now we are ready to checkout LightGBM and compile it with GPU support:
     cd LightGBM
     mkdir build
     cd build
-    cmake -DUSE_GPU=1 .. 
+    cmake -DUSE_GPU=1 ..
     # if you have installed NVIDIA CUDA to a customized location, you should specify paths to OpenCL headers and library like the following:
     # cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
     make -j$(nproc)

diff --git a/docs/Key-Events.md b/docs/Key-Events.md
@@ -75,7 +75,7 @@ The list includes the commits where the major feature added is considered workin
 * 22/06/2017 [Microsoft/LightGBM@d862b3e](https://github.com/microsoft/LightGBM/pull/642): CIntegration: Travis OSX Support (Pull Request 642)
 * 20/06/2017 [Microsoft/LightGBM@80c641c](https://github.com/microsoft/LightGBM/pull/635): Release: Python pip package (Pull Request 635)
 * 18/06/2017 [Microsoft/LightGBM@4d2aa84](https://github.com/microsoft/LightGBM/pull/634): CIntegration: AppVeyor Support (Pull Request 634)
-* 06/06/2017 [Microsoft/LightGBM@2c9ce59](https://github.com/microsoft/LightGBM/pull/592): Release: R-package version 0.2 (Pull Request 592) 
+* 06/06/2017 [Microsoft/LightGBM@2c9ce59](https://github.com/microsoft/LightGBM/pull/592): Release: R-package version 0.2 (Pull Request 592)
 * 05/06/2017 [Microsoft/LightGBM@f98d75f](https://github.com/microsoft/LightGBM/pull/584): Feature: Use custom compiler for R-package (Pull Request 584)
 * 29/05/2017 [Microsoft/LightGBM@993bbd5](https://github.com/microsoft/LightGBM/pull/559): Parameter: Early Stopping for predictions (Pull Request 559)
 * 26/05/2017 [Microsoft/LightGBM@3abff37](https://github.com/microsoft/LightGBM/commit/3abff370bb353293e4a03e516111dd02785fbd97): Feature: Parameter to disable missing values (Commit)

diff --git a/docs/Makefile b/docs/Makefile
@@ -17,4 +17,4 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/_static/images/artifacts-download.svg b/docs/_static/images/artifacts-download.svg
diff --git a/docs/_static/images/artifacts-fetching.svg b/docs/_static/images/artifacts-fetching.svg
diff --git a/docs/_static/images/artifacts-not-available.svg b/docs/_static/images/artifacts-not-available.svg
diff --git a/docs/_static/images/dask-concat.svg b/docs/_static/images/dask-concat.svg
diff --git a/docs/_static/images/dask-initial-setup.svg b/docs/_static/images/dask-initial-setup.svg