# General Parameters

### booster
- **What it is:** The type of model used by XGBoost.
- **Options:**
  - **gbtree:** Uses tree-based models.
  - **gblinear:** Uses linear functions.
  - **dart:** Uses tree-based models with a dropout technique.
- **Default:** gbtree.

### device
- **What it is:** The hardware on which XGBoost will run.
- **Options:**
  - **cpu:** Use the computer’s CPU.
  - **cuda:** Use a GPU (CUDA device).
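  - **`cuda:<ordinal>`:** Use a specific GPU when more than one is available; `<ordinal>` is the integer index of the device (for example, `cuda:0`).
  - **gpu:** Use the default GPU from the list of available and supported devices.
  - **`gpu:<ordinal>`:** Use a specific GPU, selected by index, from the list of available and supported devices.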
- **Default:** cpu.

### verbosity
- **What it is:** The level of detail in the messages printed by XGBoost.
- **Options:**
  - **0:** Silent (no messages).
  - **1:** Warnings only.
  - **2:** Info messages (general information).
  - **3:** Debug messages (detailed information for debugging).
- **Default:** 1 (warnings only).

### validate_parameters
- **What it is:** Whether XGBoost checks if the input parameters are valid.
- **What it does:** When set to True, it warns if there are any unknown or unused parameters.
- **Default:** False (except for Python, R, and CLI interfaces where it’s True).

### nthread
- **What it is:** Number of CPU threads used by XGBoost.
- **What it does:** Controls the number of parallel threads for running XGBoost. More threads can speed up processing but may cause contention if too many are used.
- **Default:** Maximum number of available threads.

### disable_default_eval_metric
- **What it is:** Whether to disable the default evaluation metric.
- **What it does:** When set to 1 or true, the default metric used to evaluate the model during training is disabled.
- **Default:** False (default metric is enabled).
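The general parameters are usually collected in a single dictionary. The sketch below assumes the Python package, whose native `xgboost.train` function takes such a dictionary as its first argument. The thread count and the `check_general_params` helper are illustrative assumptions, not part of XGBoost.

```python
# General parameters gathered in one dictionary.
general_params = {
    "booster": "gbtree",
    "device": "cpu",                      # or "cuda", "cuda:0", "gpu", ...
    "verbosity": 1,
    "validate_parameters": True,
    "nthread": 4,                         # illustrative value
    "disable_default_eval_metric": False,
}

def check_general_params(params):
    """Hypothetical sanity check mirroring the options listed above.

    Returns the names of parameters whose values are not among the
    documented options.
    """
    problems = []
    if params.get("booster", "gbtree") not in {"gbtree", "gblinear", "dart"}:
        problems.append("booster")
    if params.get("verbosity", 1) not in {0, 1, 2, 3}:
        problems.append("verbosity")
    kind, _, ordinal = params.get("device", "cpu").partition(":")
    bad_ordinal = bool(ordinal) and (kind == "cpu" or not ordinal.isdigit())
    if kind not in {"cpu", "cuda", "gpu"} or bad_ordinal:
        problems.append("device")
    return problems

print(check_general_params(general_params))
```

With real data, the same dictionary would be merged with booster-specific parameters before training.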



# Parameters for Tree Booster

### eta (learning_rate)
- **What it is:** The step-size shrinkage applied at each boosting step.
- **What it does:** After each boosting step, eta scales down the newly computed weights, so each new tree contributes only part of its correction. Smaller values make boosting more conservative and help prevent overfitting (fitting the training data so closely that the model performs poorly on new data), usually at the cost of more boosting rounds.
- **Range:** Between 0 and 1. The default value is 0.3.
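To make the effect of shrinkage concrete, here is a toy sketch. It is not XGBoost's implementation: it assumes each new "tree" fits the current residual perfectly, and adds only a fraction eta of that fit to the prediction. The function name `boost_toward` is hypothetical.

```python
def boost_toward(target, eta, rounds):
    """Toy boosting loop: each step adds eta times the remaining residual."""
    prediction = 0.0
    for _ in range(rounds):
        residual = target - prediction   # what a perfect new tree would fit
        prediction += eta * residual     # only a fraction eta is kept
    return prediction

# Smaller eta approaches the target more slowly over the same number of rounds.
for eta in (1.0, 0.3, 0.1):
    print(eta, round(boost_toward(10.0, eta, rounds=5), 3))
```

In this idealized setting the remaining error shrinks by a factor of (1 - eta) per round, which is why a lower eta generally needs more rounds.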

### gamma (min_split_loss)
- **What it is:** The minimum loss reduction required to split a leaf node further.
- **What it does:** If a candidate split reduces the loss by less than gamma, the node is not split. Larger values make the model more conservative.
- **Range:** Starts from 0 and can go up indefinitely. The default value is 0.
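The split test can be sketched with the gain formula from XGBoost's boosted-tree derivation, where G and H are the sums of first- and second-order gradients in each child. This is a simplified illustration (it ignores alpha and max_delta_step), and the function name `split_gain` is an assumption.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a candidate split, minus the gamma penalty.

    A split is only worthwhile when the returned value is positive.
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# The same candidate split is accepted with gamma=0 and rejected with gamma=5.
print(split_gain(-4.0, 4.0, 4.0, 4.0, gamma=0.0) > 0)
print(split_gain(-4.0, 4.0, 4.0, 4.0, gamma=5.0) > 0)
```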

### max_depth
- **What it is:** The maximum depth of each tree.
- **What it does:** Deeper trees can learn more complex patterns but may also overfit. Limiting the depth can help prevent this.
- **Range:** Any non-negative integer, where 0 means no limit. The default value is 6.

### min_child_weight
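- **What it is:** The minimum sum of instance weights (Hessian) needed in a child node.
- **What it does:** If a split would produce a child node whose sum of instance weights is less than this value, the split is discarded. For plain, unweighted squared-error regression this is simply the minimum number of instances per node. Higher values make the model more conservative.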
- **Range:** Starts from 0 and can go up indefinitely. The default value is 1.
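Because the "weight" here is the sum of Hessians rather than a row count, the same threshold behaves differently across objectives. A minimal sketch, assuming a logistic objective where each instance's Hessian is p(1 - p); both function names are hypothetical.

```python
def logistic_hessian(p):
    """Second-order gradient of the logistic loss at predicted probability p."""
    return p * (1.0 - p)

def split_allowed(left_hessians, right_hessians, min_child_weight=1.0):
    """A split is kept only if both children reach the Hessian-sum threshold."""
    return (sum(left_hessians) >= min_child_weight
            and sum(right_hessians) >= min_child_weight)

# Squared error: every Hessian is 1, so the check is a plain row count.
print(split_allowed([1.0] * 3, [1.0] * 2))

# Logistic loss: three instances at p=0.5 sum to only 0.75, below the default.
print(split_allowed([logistic_hessian(0.5)] * 3, [1.0] * 2))
```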

### max_delta_step
- **What it is:** The maximum change allowed for each tree's leaf output.
- **What it does:** Controls the change in leaf values to make the model updates more cautious, useful for logistic regression with imbalanced classes.
- **Range:** Any non-negative number, where 0 means no constraint. The default value is 0.

### subsample
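- **What it is:** The fraction of training instances (rows) sampled to build each tree.
- **What it does:** Helps prevent overfitting by growing each tree on a random portion of the data. For example, 0.5 means half of the training data is sampled before growing trees. Sampling happens once in every boosting iteration.
- **Range:** Greater than 0 and at most 1. The default value is 1.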

### sampling_method
- **What it is:** The method used to sample training instances.
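- **What it does:** Determines how the training data is sampled for each tree. The 'uniform' method gives every instance the same chance of being selected, while 'gradient_based' selects instances with probability proportional to the magnitude of their gradients, so instances the model currently fits poorly are favored. According to the XGBoost documentation, 'gradient_based' is supported only with the hist tree method on a CUDA device.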
- **Range:** Choices are 'uniform' or 'gradient_based'. The default is 'uniform'.

### colsample_bytree, colsample_bylevel, colsample_bynode
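- **What it is:** The fraction of columns (features) to sample for each tree (colsample_bytree), each depth level (colsample_bylevel), or each node (colsample_bynode).
- **What it does:** Reduces the number of features considered at each split, helping to prevent overfitting. The three parameters act cumulatively: columns for a level are drawn from those chosen for the tree, and columns for a node from those chosen for the level.
- **Range:** Greater than 0 and at most 1. The default value for each is 1.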
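Since the column-sampling fractions compound, it helps to compute how many features are left at a single split. The sketch below assumes each stage truncates to an integer and keeps at least one feature; the exact rounding inside XGBoost may differ, and `features_per_split` is a hypothetical helper.

```python
def features_per_split(n_features, bytree=1.0, bylevel=1.0, bynode=1.0):
    """Apply the three sampling fractions in order: tree, then level, then node."""
    remaining = n_features
    for fraction in (bytree, bylevel, bynode):
        remaining = max(1, int(fraction * remaining))
    return remaining

# 64 features with all three fractions at 0.5: 64 -> 32 -> 16 -> 8.
print(features_per_split(64, bytree=0.5, bylevel=0.5, bynode=0.5))
```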

### lambda (reg_lambda)
- **What it is:** Regularization term for weights (L2 regularization).
- **What it does:** Penalizes large weights, making the model more conservative.
- **Range:** Starts from 0 and can go up indefinitely. The default value is 1.

### alpha (reg_alpha)
- **What it is:** Regularization term for weights (L1 regularization).
- **What it does:** Adds a penalty for the absolute value of the weights, promoting sparsity (more weights set to zero).
- **Range:** Starts from 0 and can go up indefinitely. The default value is 0.
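The effect of lambda and alpha on a single leaf can be sketched with the closed-form leaf value: alpha soft-thresholds the gradient sum G, and lambda inflates the Hessian sum H in the denominator. This is an illustrative sketch that ignores max_delta_step; `leaf_weight` is a hypothetical name.

```python
def leaf_weight(sum_grad, sum_hess, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf value under L2 (reg_lambda) and L1 (reg_alpha) penalties."""
    # L1: shrink the gradient sum toward zero by alpha (soft thresholding).
    if sum_grad > reg_alpha:
        shrunk = sum_grad - reg_alpha
    elif sum_grad < -reg_alpha:
        shrunk = sum_grad + reg_alpha
    else:
        shrunk = 0.0
    # L2: a larger lambda means a larger denominator and a smaller weight.
    return -shrunk / (sum_hess + reg_lambda)

print(leaf_weight(-6.0, 2.0, reg_lambda=1.0, reg_alpha=0.0))
print(leaf_weight(-6.0, 2.0, reg_lambda=1.0, reg_alpha=3.0))
print(leaf_weight(-2.0, 2.0, reg_lambda=1.0, reg_alpha=3.0))
```

In the third call the gradient sum is smaller in magnitude than alpha, so the leaf value collapses to zero, which is the sparsity effect described for alpha.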

### tree_method
- **What it is:** Algorithm used for constructing the trees.
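- **What it does:** Determines the method used to create the decision trees, which can affect speed and accuracy. 'exact' enumerates every candidate split (exact greedy), while 'approx' and 'hist' bucket feature values into bins and evaluate splits on the bins (approximate greedy). 'auto' leaves the choice to XGBoost; in recent releases it is the same as 'hist'.
- **Range:** Choices are 'auto', 'exact', 'approx', 'hist'. The default is 'auto'.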

### scale_pos_weight
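- **What it is:** Controls the balance of positive and negative weights.
- **What it does:** Helps when the classes are imbalanced by scaling the weight of positive instances relative to negative ones. A typical starting value is sum(negative instances) / sum(positive instances).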
- **Range:** Starts from 0 and can go up indefinitely. The default value is 1.
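A common starting point for this parameter is the ratio of negative to positive instances. A minimal sketch, assuming binary labels encoded as 0 and 1; `suggested_scale_pos_weight` is a hypothetical helper, and the result is a heuristic to tune rather than a required value.

```python
def suggested_scale_pos_weight(labels):
    """Return count(negative) / count(positive) for 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive instances in labels")
    return negatives / positives

labels = [0] * 90 + [1] * 10          # 90 negatives, 10 positives
params = {"scale_pos_weight": suggested_scale_pos_weight(labels)}
print(params)
```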

### updater
- **What it is:** Sequence of tree updaters to run.
- **What it does:** Defines the modular way trees are constructed and modified. It's usually set automatically but can be set manually for advanced use.
- **Range:** Various strings like 'grow_colmaker', 'grow_histmaker', 'sync', etc.

### refresh_leaf
- **What it is:** Parameter of the refresh updater.
- **What it does:** When set to 1, updates both leaf values and node statistics; when 0, only updates node statistics.
- **Range:** 0 or 1. The default value is 1.

### process_type
- **What it is:** Type of boosting process to run.
- **What it does:** Defines whether to create new trees (default) or update existing ones (update).
- **Range:** Choices are 'default' or 'update'. The default is 'default'.
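The updater, refresh_leaf, and process_type parameters are typically used together when refreshing an existing model on new data. The dictionary below is a hedged sketch: it assumes the `refresh` updater name from the XGBoost documentation, which does not appear in the list above.

```python
# Parameters for re-estimating an existing model's trees instead of growing new ones.
refresh_params = {
    "process_type": "update",   # modify existing trees, do not create new ones
    "updater": "refresh",       # assumed updater name; recomputes tree statistics
    "refresh_leaf": 1,          # 1: update leaf values as well as node statistics
}

# With process_type="update", updaters that grow new trees do not apply.
assert not refresh_params["updater"].startswith("grow_")
print(refresh_params)
```

In the Python package, such a dictionary would be passed to `xgboost.train` together with the previously trained model through the `xgb_model` argument.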

### grow_policy
- **What it is:** Controls how new nodes are added.
- **What it does:** Determines whether to split nodes nearest the root (depthwise) or nodes with the highest loss change (lossguide).
- **Range:** Choices are 'depthwise' or 'lossguide'. The default is 'depthwise'.

### max_leaves
- **What it is:** The maximum number of leaves in each tree.
- **What it does:** Limits the number of leaves (nodes at the end of branches) in the trees. It is not used by the exact tree method.
- **Range:** Any non-negative integer, where 0 means no limit. The default value is 0.

### max_bin
- **What it is:** The maximum number of discrete bins used to bucket continuous features.
- **What it does:** Controls how finely continuous features are discretized when tree_method is 'hist' or 'approx'. More bins improve split quality at the cost of longer computation time.
- **Range:** Starts from 0 and can go up indefinitely. The default value is 256.
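The tree_method, grow_policy, max_leaves, and max_bin parameters are often set as a group. The dictionary below sketches one plausible leaf-wise configuration; the specific values (31 leaves, unlimited depth) are illustrative assumptions, not recommendations from the source.

```python
# Leaf-wise growth: the tree is bounded by leaf count rather than by depth.
leafwise_params = {
    "tree_method": "hist",       # histogram-based split finding
    "grow_policy": "lossguide",  # split the node with the highest loss change first
    "max_leaves": 31,            # illustrative cap on leaves per tree
    "max_depth": 0,              # 0 removes the depth limit
    "max_bin": 256,              # bins per continuous feature (the default)
}

# With no depth limit, max_leaves is the only bound on tree size, so it must be set.
assert leafwise_params["max_depth"] > 0 or leafwise_params["max_leaves"] > 0
print(leafwise_params)
```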

### num_parallel_tree
- **What it is:** Number of trees built in parallel in each iteration.
- **What it does:** Supports constructing multiple trees in parallel, typically used for boosted random forests.
- **Range:** Starts from 1 and can go up indefinitely. The default value is 1.
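To illustrate the boosted random forest use case, the sketch below follows the pattern described in XGBoost's random forest tutorial: many parallel trees, row and column subsampling below 1, no shrinkage, and a single boosting round. The specific values are illustrative assumptions.

```python
# Random-forest-style configuration: one boosting round containing many trees.
forest_params = {
    "booster": "gbtree",
    "num_parallel_tree": 100,   # size of the forest
    "subsample": 0.8,           # rows sampled per tree, must be below 1
    "colsample_bynode": 0.8,    # columns sampled per split, must be below 1
    "eta": 1.0,                 # no shrinkage: trees are averaged, not boosted
    "max_depth": 5,
}
num_boost_round = 1             # a single round keeps this a plain forest

# Without row or column subsampling every tree in the forest would be identical.
assert forest_params["subsample"] < 1 and forest_params["colsample_bynode"] < 1
print(forest_params, num_boost_round)
```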

### monotone_constraints
- **What it is:** Constraints for variable monotonicity.
- **What it does:** Enforces a monotonic relationship between a feature and the prediction. Each feature is assigned 1 (increasing), -1 (decreasing), or 0 (no constraint).
- **Range:** N/A (user-specified).

### interaction_constraints
- **What it is:** Constraints for feature interactions.
- **What it does:** Specifies which features are allowed to interact with each other.
- **Range:** N/A (user-specified).
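Both constraint parameters can be given as strings: a tuple of directions for monotone_constraints, and a nested list of feature-index groups for interaction_constraints. The two helpers below are hypothetical conveniences for building those strings, shown as a sketch.

```python
import json

def monotone_constraints_string(directions):
    """Format per-feature directions (1, -1, or 0) as a tuple-style string."""
    for direction in directions:
        if direction not in (-1, 0, 1):
            raise ValueError(f"invalid direction: {direction!r}")
    return "(" + ",".join(str(d) for d in directions) + ")"

def interaction_constraints_string(groups):
    """Format groups of feature indices that may interact as a nested list."""
    return json.dumps([list(group) for group in groups])

params = {
    # Feature 0 increasing, feature 1 decreasing, feature 2 unconstrained.
    "monotone_constraints": monotone_constraints_string([1, -1, 0]),
    # Features 0 and 2 may interact; so may features 1, 3, and 4.
    "interaction_constraints": interaction_constraints_string([[0, 2], [1, 3, 4]]),
}
print(params)
```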

### multi_strategy
- **What it is:** Strategy for training multi-target models.
- **What it does:** Determines how to handle multiple outputs (targets) during training.
- **Range:** Choices are 'one_output_per_tree' or 'multi_output_tree'. The default is 'one_output_per_tree'.

### max_cached_hist_node
- **What it is:** Maximum number of cached nodes for CPU histogram.
- **What it does:** Controls the number of nodes cached to speed up training with the histogram method.
- **Range:** Starts from 0 and can go up indefinitely. The default value is 65536.


# Parameters for Categorical Features

### max_cat_to_onehot
- **What it is:** A threshold to decide whether to use one-hot encoding for categorical features.
- **What it does:** If the number of categories in a feature is less than this threshold, XGBoost uses one-hot style splits, where each split separates one category from the rest. Otherwise, the categories are partitioned into two groups, one for each child node.
- **Introduced in:** Version 1.6.0.

### max_cat_threshold
- **What it is:** The maximum number of categories considered for each split.
- **What it does:** Limits the number of categories considered when forming a partition-based split, which helps prevent overfitting. It is used only by partition-based splits.
- **Introduced in:** Version 1.7.0.

### Note
- These parameters are **experimental** and are only used when training with categorical data.
- The **exact tree method** is not yet supported for these parameters.
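The choice between the two split styles comes down to one comparison per feature. A minimal sketch of that rule, with a hypothetical helper name and an illustrative threshold value:

```python
def categorical_split_kind(n_categories, max_cat_to_onehot):
    """Pick the split style for one categorical feature.

    Fewer categories than the threshold: one-hot style (one category vs. rest).
    Otherwise: partition the categories into two groups.
    """
    return "onehot" if n_categories < max_cat_to_onehot else "partition"

threshold = 4  # illustrative value, not a documented default
for n in (2, 3, 4, 50):
    print(n, categorical_split_kind(n, threshold))
```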


# Parameters for Linear Booster (booster=gblinear)

### lambda (reg_lambda)
- **What it is:** L2 regularization term on weights.
- **What it does:** Adds a penalty for the sum of squared weights. Higher values make the model more conservative by preventing large weights.
- **Default:** 0.

### alpha (reg_alpha)
- **What it is:** L1 regularization term on weights.
- **What it does:** Adds a penalty for the absolute values of weights. Higher values make the model more conservative by encouraging sparsity (more weights set to zero).
- **Default:** 0.

### updater
- **What it is:** The algorithm used to fit the linear model.
- **Options:**
  - **shotgun:** A parallel coordinate descent algorithm. It uses ‘hogwild’ parallelism, so it is fast but produces a nondeterministic solution that can differ slightly on each run.
  - **coord_descent:** An ordinary coordinate descent algorithm. It is also multithreaded but still produces a deterministic solution. When `device` is set to cuda or gpu, a GPU variant is used.
- **Default:** shotgun.

### feature_selector
- **What it is:** The method for selecting and ordering features during training.
- **Options:**
  - **cyclic:** Cycles through features one at a time in a fixed order.
  - **shuffle:** Similar to cyclic but shuffles features randomly before each update.
  - **random:** Selects features randomly with replacement.
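  - **greedy:** Selects the feature with the greatest gradient magnitude at each step. It is fully deterministic but has O(num_feature^2) complexity. Setting top_k restricts the selection to the top_k features with the largest univariate weight change, reducing the complexity to O(num_feature * top_k).
  - **thrifty:** An approximately greedy selector. Before each round of cyclic updates, it reorders features in descending magnitude of their univariate weight changes, giving a linear-complexity approximation of the quadratic greedy selection. It can also be restricted to the top_k features.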
- **Default:** cyclic.

### top_k
- **What it is:** Number of top features to select in greedy and thrifty feature selectors.
- **What it does:** Limits the number of features considered to the top_k most significant ones. A value of 0 means all features are used.
- **Default:** 0.
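The feature selectors differ mainly in the order in which coordinates are visited during one pass. The sketch below is a simplified illustration, not XGBoost's implementation: `magnitudes` stands in for each feature's univariate weight change, `coordinate_order` is a hypothetical name, and greedy is omitted because it re-evaluates the magnitudes after every single update.

```python
import random

def coordinate_order(selector, magnitudes, top_k=0, rng=None):
    """Return the feature indices visited in one pass for a given selector."""
    rng = rng or random.Random(0)
    n = len(magnitudes)
    if selector == "cyclic":
        return list(range(n))                        # fixed order 0, 1, 2, ...
    if selector == "shuffle":
        order = list(range(n))
        rng.shuffle(order)                           # each feature exactly once
        return order
    if selector == "random":
        return [rng.randrange(n) for _ in range(n)]  # with replacement
    if selector == "thrifty":
        order = sorted(range(n), key=lambda j: magnitudes[j], reverse=True)
        return order[:top_k] if top_k > 0 else order  # top_k=0 keeps all
    raise ValueError(f"unsupported selector: {selector!r}")

magnitudes = [0.1, 0.9, 0.5]
print(coordinate_order("cyclic", magnitudes))
print(coordinate_order("thrifty", magnitudes, top_k=2))
```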

