XGBoost (Extreme Gradient Boosting) is a popular machine learning algorithm for supervised learning tasks such as regression and classification. It builds on the gradient boosting framework and is known for its efficiency and strong performance on structured (tabular) data. In this explanation, I'll give an overview of XGBoost along with its key steps and formulas, presented in LaTeX format.

### Step 1: Objective Function

The objective of XGBoost is to minimize a regularized objective function, which is defined as follows:

$$
\text{Objective Function} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)
$$

- $n$ is the number of training examples.
- $L(y_i, \hat{y}_i)$ is the loss function measuring the difference between the true label $y_i$ and the predicted label $\hat{y}_i$.
- $K$ is the number of trees (weak learners) in the ensemble.
- $\Omega(f_k)$ is the regularization term for the $k$-th tree.

### Step 2: Initialize the Model

Initialize the model with the constant prediction that minimizes the total loss. For squared-error loss, this constant is the mean of the target variable:

$$
f_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)
$$
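As a quick check of the initialization step, the sketch below assumes squared-error loss and searches a grid of candidate constants for the one with the lowest total loss. The toy targets and the grid are made up for illustration.

```python
def total_squared_error(y, constant):
    """Sum of squared errors when every example is predicted as `constant`."""
    return sum((yi - constant) ** 2 for yi in y)


# Hypothetical toy targets.
y = [3.0, 5.0, 4.0, 8.0]
mean_y = sum(y) / len(y)

# Brute-force search over constants 0.00, 0.01, ..., 10.00.
candidates = [i / 100 for i in range(0, 1001)]
best_constant = min(candidates, key=lambda c: total_squared_error(y, c))

print(mean_y, best_constant)
```

The grid search lands on the same value as the mean, which is why the mean is the usual starting prediction for squared-error regression. Other losses generally have a different minimizing constant.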

### Step 3: For each tree in the ensemble:

#### a. Compute Residuals

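Calculate the negative gradient of the loss function with respect to the predicted values $\hat{y}_i$, evaluated at the current ensemble's predictions $f_{k-1}(x_i)$. These values are often called pseudo-residuals; for squared-error loss they are proportional to the ordinary residuals $y_i - f_{k-1}(x_i)$: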

$$
r_{ik} = -\left[\frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i}\right]_{\hat{y}_i=f_{k-1}(x_i)}
$$
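To make the negative gradient concrete, here is a minimal sketch for two common losses. It assumes squared error written as $\frac{1}{2}(y - \hat{y})^2$ and, for classification, the logistic loss on a raw score with labels in {0, 1}. The function names are illustrative, and a finite-difference check is included to confirm the analytic form.

```python
import math


def neg_grad_squared_error(y, y_hat):
    """Negative gradient of 0.5 * (y - y_hat)^2 w.r.t. y_hat: the plain residual."""
    return y - y_hat


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, raw_score):
    """Logistic loss for a label y in {0, 1} and a raw (pre-sigmoid) score."""
    p = sigmoid(raw_score)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def neg_grad_log_loss(y, raw_score):
    """Negative gradient of the logistic loss w.r.t. the raw score."""
    return y - sigmoid(raw_score)


def numeric_neg_grad(loss, y, score, eps=1e-6):
    """Central finite-difference estimate of the negative gradient."""
    return -(loss(y, score + eps) - loss(y, score - eps)) / (2 * eps)


analytic = neg_grad_log_loss(1, 0.3)
numeric = numeric_neg_grad(log_loss, 1, 0.3)
print(neg_grad_squared_error(3.0, 2.5), analytic, numeric)
```

In both cases the pseudo-residual has the form "label minus current prediction", which is why the step is often described as fitting residuals.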

#### b. Fit a Weak Learner (Regression Tree)

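Train a regression tree $h_k(x)$ to predict the residuals $r_{ik}$. This tree captures the errors that the ensemble of the previous $k-1$ trees still makes. (XGBoost itself also uses second-order derivatives of the loss when scoring splits and leaf values; the residual-fitting description here is the standard gradient boosting view.)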

#### c. Update the Ensemble

Update the ensemble model with the new tree:

$$
f_k(x) = f_{k-1}(x) + \eta h_k(x)
$$

Here $\eta$ is the learning rate (shrinkage), a hyperparameter, typically in $(0, 1]$, that scales the contribution of each new tree.

#### d. Regularization Term

Each tree contributes a regularization term to the objective function, which penalizes complex trees to reduce overfitting:

$$
\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
$$

Where:
- $T$ is the number of leaves in the tree $h_k(x)$.
- $w_j$ is the score associated with leaf $j$.
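- $\gamma$ and $\lambda$ are hyperparameters controlling regularization: $\gamma$ is the cost of each additional leaf, and $\lambda$ is the strength of the L2 penalty on the leaf scores.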
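The penalty and the overall objective can be evaluated by hand on a toy case. The sketch below assumes squared-error loss, represents each tree only by its list of leaf scores, and counts each tree's penalty once. The values are made up for illustration.

```python
def tree_penalty(leaf_scores, gamma, lam):
    """Omega for one tree: gamma * T + 0.5 * lambda * sum of squared leaf scores."""
    num_leaves = len(leaf_scores)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_scores)


def regularized_objective(y_true, y_pred, trees, gamma, lam):
    """Squared-error data loss plus one penalty per tree."""
    data_loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = sum(tree_penalty(scores, gamma, lam) for scores in trees)
    return data_loss + penalty


# Hypothetical toy values: two trees described only by their leaf scores.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
trees = [[0.5, -0.5], [1.0, 0.0, -1.0]]

penalty_first_tree = tree_penalty(trees[0], gamma=1.0, lam=2.0)
objective = regularized_objective(y_true, y_pred, trees, gamma=1.0, lam=2.0)
print(penalty_first_tree, objective)
```

With these numbers the data loss is 1.25 and the two penalties are 2.5 and 5.0, so the objective is 8.75. Raising $\gamma$ makes extra leaves more expensive, while raising $\lambda$ pushes leaf scores toward zero.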

### Step 4: Model Prediction

To make a final prediction, add the scaled outputs of all trees to the initial constant prediction:

$$
\hat{y}(x) = f_K(x) = f_0(x) + \eta \sum_{k=1}^{K} h_k(x)
$$
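Steps 2 through 4 can be sketched end to end in plain Python. This is a simplified illustration, not XGBoost's actual implementation: it assumes squared-error loss, a single numeric feature, and one-split trees (stumps) fit by least squares, and it omits the regularization term and second-order information. All function names and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    values = sorted(set(x))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def fit_boosted(x, y, n_trees, eta):
    """Return the initial constant f0 and the list of fitted stumps."""
    f0 = sum(y) / len(y)                      # Step 2: constant start
    preds = [f0] * len(y)
    trees = []
    for _ in range(n_trees):                  # Step 3: one tree per round
        residuals = [yi - pi for yi, pi in zip(y, preds)]   # 3a
        tree = fit_stump(x, residuals)                      # 3b
        trees.append(tree)
        preds = [pi + eta * tree(xi) for pi, xi in zip(preds, x)]  # 3c
    return f0, trees


def predict(f0, trees, eta, xi):
    """Step 4: initial constant plus the scaled sum of all tree outputs."""
    return f0 + eta * sum(tree(xi) for tree in trees)


def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Hypothetical toy data, roughly y = x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1]
eta = 0.3
f0, trees = fit_boosted(x, y, n_trees=50, eta=eta)

baseline_mse = mse(y, [f0] * len(y))
train_mse = mse(y, [predict(f0, trees, eta, xi) for xi in x])
print(baseline_mse, train_mse)
```

The training error of the boosted ensemble falls below that of the constant baseline because each stump removes part of the remaining residual. A smaller `eta` slows this down and usually calls for more trees.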

### Step 5: Regularization and Hyperparameter Tuning

Hyperparameters such as the learning rate $\eta$ and the regularization parameters $\gamma$ and $\lambda$ can be tuned with techniques such as cross-validation to find values that generalize well on your specific dataset.
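The idea of tuning by cross-validation can be sketched without any library. The example below is a deliberately tiny stand-in for a real model: it assumes a single-leaf "tree" whose score is the shrunken value $\sum_i y_i / (n + \lambda)$ (the minimizer of $\frac{1}{2}\sum_i (y_i - w)^2 + \frac{1}{2}\lambda w^2$), and it picks $\lambda$ from a hypothetical candidate list by k-fold validation error. The helper names and data are illustrative.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous (train, validation) index pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def fit_leaf(y, lam):
    """Toy model: one L2-regularized leaf score, sum(y) / (n + lambda)."""
    return sum(y) / (len(y) + lam)


def cv_mse(y, lam, k=3):
    """Average validation mean squared error over k folds."""
    errors = []
    for train, val in k_fold_indices(len(y), k):
        w = fit_leaf([y[i] for i in train], lam)
        errors.append(sum((y[i] - w) ** 2 for i in val) / len(val))
    return sum(errors) / len(errors)


# Hypothetical toy targets and candidate values for lambda.
y = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]
candidates = [0.0, 0.5, 1.0, 5.0]
scores = {lam: cv_mse(y, lam) for lam in candidates}
best_lambda = min(scores, key=scores.get)
print(scores, best_lambda)
```

The same loop structure applies to a full model: fit on the training folds for each candidate setting, score on the held-out fold, and keep the setting with the best average score.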

### Step 6: Model Evaluation

Evaluate the model's performance using appropriate evaluation metrics (e.g., RMSE for regression or accuracy for classification) on a validation or test dataset.
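For reference, the two metrics mentioned above are short enough to write out directly. This is a minimal sketch with made-up labels and predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical validation-set values.
rmse_value = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
accuracy_value = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(rmse_value, accuracy_value)
```

Here the RMSE is $\sqrt{5/3} \approx 1.29$ and the accuracy is 0.75. Both should be computed on data the model did not see during training.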

XGBoost iteratively adds trees to the ensemble, with each tree aiming to correct the errors of the previous ones, making it a powerful and robust algorithm for various machine learning tasks.

In [8]:
from transformers import BertTokenizer, BertForTokenClassification, pipeline

# Load the pretrained model and tokenizer
model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)

# Define a NER pipeline
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer)




Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Entity: Apple, Label: I-ORG, Score: 0.9995648264884949
Entity: Inc, Label: I-ORG, Score: 0.9994916915893555
Entity: American, Label: I-MISC, Score: 0.9970537424087524
Entity: Cup, Label: I-LOC, Score: 0.9946129322052002
Entity: ##ert, Label: I-LOC, Score: 0.8585067987442017
Entity: ##ino, Label: I-LOC, Score: 0.9911662936210632
Entity: California, Label: I-LOC, Score: 0.9981574416160583
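The output above contains WordPiece fragments such as `##ert` and `##ino`, which are continuations of the preceding token. A minimal sketch of merging them back into whole words is shown below. It assumes each result is a dict with `word`, `entity`, and `score` keys, as used in the print loop in this notebook; the helper name is hypothetical, and keeping the lowest score of the merged pieces is just one possible choice.

```python
def merge_wordpieces(entities):
    """Join '##' continuation tokens onto the previous entity."""
    merged = []
    for ent in entities:
        word = ent["word"]
        if word.startswith("##") and merged:
            previous = merged[-1]
            previous["word"] += word[2:]
            # Keep the least confident score among the merged pieces.
            previous["score"] = min(previous["score"], ent["score"])
        else:
            merged.append({"word": word,
                           "entity": ent["entity"],
                           "score": ent["score"]})
    return merged


# Values rounded from the output above.
sample = [
    {"word": "Cup", "entity": "I-LOC", "score": 0.9946},
    {"word": "##ert", "entity": "I-LOC", "score": 0.8585},
    {"word": "##ino", "entity": "I-LOC", "score": 0.9912},
    {"word": "California", "entity": "I-LOC", "score": 0.9982},
]
merged = merge_wordpieces(sample)
for ent in merged:
    print(ent["word"], ent["entity"], ent["score"])
```

The three pieces collapse into a single `Cupertino` entry. The `transformers` NER pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups sub-tokens itself, which may be preferable to a hand-written merge.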


In [12]:
# Perform NER on text
text = 'Article 4 : The Parties will consult together whenever, in the opinion of any of them, the territorial integrity, political independence or security of any of the Parties is threatened.'
entities = nlp_ner(text)

# Print the extracted entities
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']}")

Entity: Parties, Label: I-MISC, Score: 0.6970688700675964
Entity: Parties, Label: I-MISC, Score: 0.8338868021965027


In [13]:
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

# Load a DistilBERT question-answering checkpoint and its tokenizer
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question and the context together as TensorFlow tensors
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

# Take the most likely start and end token positions of the answer span
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

# Decode the token ids inside the span back into text
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)


All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for predictions without further training.


'a nice puppet'
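The span selection in the cell above reduces to two argmax operations followed by a slice. The sketch below reproduces that logic in plain Python. The token list is a simplified illustration rather than the tokenizer's real output, and the logits are invented so that the selected span matches the answer shown.

```python
def best_span(start_logits, end_logits):
    """Return the positions of the highest start logit and highest end logit."""
    start = max(range(len(start_logits)), key=lambda i: start_logits[i])
    end = max(range(len(end_logits)), key=lambda i: end_logits[i])
    return start, end


# Illustrative tokens: question, separator, then context.
tokens = ["[CLS]", "Who", "was", "Jim", "Henson", "?", "[SEP]",
          "Jim", "Henson", "was", "a", "nice", "puppet", "[SEP]"]

# Hypothetical logits, one per token.
start_logits = [-5.0, -4.0, -4.5, -3.0, -3.5, -6.0, -5.0,
                -2.0, -2.5, -1.0, 4.0, 1.5, 0.5, -5.0]
end_logits = [-5.0, -4.5, -4.0, -3.5, -2.0, -6.0, -5.0,
              -3.0, -1.5, -2.0, -0.5, 1.0, 5.0, -4.0]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)
```

Taking the two argmaxes independently is the simplest strategy. It can produce an end position before the start position, in which case the slice is empty, so more careful decoders score start and end pairs jointly.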