<a href="https://www.kaggle.com/code/ayushs9020/a-simple-beginner-friendly-approach-kaggle-llm?scriptVersionId=137737987" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FF6C00; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FF6C00">Kaggle LLM</p>

<div style="border-radius:10px; border:#FF6C00 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
The $Kaggle - LLM$ $Science$ $Exam$ is a `competition` that challenges participants to `answer difficult science-based questions` written by a `Large Language Model` $(LLM)$. The `Goal` of the competition is to help `researchers better understand` the `ability of LLMs` to test themselves, and the `potential of LLMs` that can be run in resource-constrained environments.

The `dataset` for the competition was generated by giving `gpt3.5 snippets` of text on a range of `scientific topics pulled` from `Wikipedia`, and asking it to `write a multiple choice question` (with a known answer), then `filtering out easy questions`.

`Participants` in the competition are asked to `develop a model` that can `answer the questions` in the dataset `as accurately as possible`. The competition is scored using the `mean average precision` at `cutoff k metric` with $k = 3$ ($MAP@3$), where $k$ is the `number of predictions` that can be made for each question.
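
To make the metric concrete, here is a minimal sketch of mean average precision at cutoff $3$, assuming each question has exactly one correct option and each submission row lists up to three guesses in order of preference. The function names `average_precision_at_k` and `map_at_k` are illustrative and not part of any competition code.

```python
def average_precision_at_k(predictions, correct, k=3):
    """AP@k for one question with a single correct answer.

    The score is 1/rank if the correct option appears among the first k
    guesses, and 0 otherwise.
    """
    for rank, guess in enumerate(predictions[:k], start=1):
        if guess == correct:
            return 1.0 / rank
    return 0.0


def map_at_k(all_predictions, all_correct, k=3):
    """Mean of the per-question AP@k scores."""
    scores = [
        average_precision_at_k(preds, correct, k)
        for preds, correct in zip(all_predictions, all_correct)
    ]
    return sum(scores) / len(scores)


# Three toy questions: correct on the 1st guess, on the 2nd guess, and missed.
preds = [["A", "B", "C"], ["B", "A", "C"], ["C", "D", "E"]]
truth = ["A", "A", "B"]
print(map_at_k(preds, truth))  # (1 + 0.5 + 0) / 3 = 0.5
```

A correct first guess earns full credit, a correct second guess earns half, and a correct third guess earns a third, so ordering the guesses well matters as much as including the right option.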

One estimate says that the `largest models` run on `Kaggle` are around $10$ $Billion$ $Parameters$, whereas `gpt3.5 clocks` in at $175$ $Billion$ $Parameters$. If a `question-answering model can ace` a test written by a `question-writing model` more than $10$ `times its size`, this would be a genuinely `interesting result`; on the `other hand` if a `larger model can effectively` `stump a smaller one`, this has `compelling implications` on the `ability of LLMs` to benchmark and test themselves.
    
Thanks to **[Radek Osmulski](https://www.kaggle.com/radek1)** for providing an amazing dataset

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FFFF; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FFFF">1 | Approach</p>

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
* Replace the values in `answer` with the actual `textual answers`
* Remove the columns `["id" , "A" , "B" , "C" , "D" , "E"]`
* Make Embeddings for both of the remaining columns
* Train a Simple NN that will predict the answer when given the prompt
* Use the distance between $2$ vectors as a loss function
* Watch the results

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FF1493; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FF1493">2 | Data 📊</p>

In [1]:
import pandas as pd

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
Let's just focus on the training data

In [2]:
train = pd.concat(
    [
        pd.read_csv("/kaggle/input/kaggle-llm-science-exam/train.csv").drop("id" , axis = 1) , 
        pd.read_csv("/kaggle/input/additional-train-data-for-llm-science-exam/6000_train_examples.csv") , 
        pd.read_csv("/kaggle/input/additional-train-data-for-llm-science-exam/extra_train_set.csv")
    ] , axis = 0
)

train.head()

Unnamed: 0,prompt,A,B,C,D,E,answer
0,Which of the following statements accurately d...,MOND is a theory that reduces the observed mis...,MOND is a theory that increases the discrepanc...,MOND is a theory that explains the missing bar...,MOND is a theory that reduces the discrepancy ...,MOND is a theory that eliminates the observed ...,D
1,Which of the following is an accurate definiti...,Dynamic scaling refers to the evolution of sel...,Dynamic scaling refers to the non-evolution of...,Dynamic scaling refers to the evolution of sel...,Dynamic scaling refers to the non-evolution of...,Dynamic scaling refers to the evolution of sel...,A
2,Which of the following statements accurately d...,The triskeles symbol was reconstructed as a fe...,The triskeles symbol is a representation of th...,The triskeles symbol is a representation of a ...,The triskeles symbol represents three interloc...,The triskeles symbol is a representation of th...,A
3,What is the significance of regularization in ...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,C
4,Which of the following statements accurately d...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,D


In [3]:
train.to_csv("/kaggle/working/Sample Data")
train = pd.read_csv("/kaggle/working/Sample Data")

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FF7C; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FF7C">3 | Data Preparation 📈</p>

In [4]:
import tqdm

<div style="border-radius:10px; border:#00FF7C solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
Now we will replace the `answers` with actual text data

In [5]:
for index in tqdm.tqdm(range(train.shape[0]) , total = train.shape[0]):
    # Look up the option column named by the answer letter and store its text.
    # .at writes to a single cell, which avoids pandas' chained-assignment warning.
    train.at[index , "answer"] = train.at[index , train.at[index , "answer"]]

100%|██████████| 6700/6700 [00:02<00:00, 2500.11it/s]
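
As an aside, the same letter-to-text replacement can be done without a Python loop. The sketch below is one possible vectorized version, shown on a made-up toy frame with the same column layout; `letters_to_text`, `OPTION_COLUMNS` and the toy values are illustrative names and data, not part of the competition files.

```python
import numpy as np
import pandas as pd

OPTION_COLUMNS = ["A", "B", "C", "D", "E"]


def letters_to_text(df):
    """Return a copy of df whose `answer` column holds the chosen option's text."""
    out = df.copy()
    # Position of each answer letter within OPTION_COLUMNS (-1 if not found).
    col_positions = pd.Index(OPTION_COLUMNS).get_indexer(df["answer"])
    if (col_positions < 0).any():
        raise ValueError("answer column contains a letter outside A-E")
    # Pick, for every row, the cell in the column named by its answer letter.
    options = df[OPTION_COLUMNS].to_numpy()
    out["answer"] = options[np.arange(len(df)), col_positions]
    return out


# Toy data (made up) with the same layout as the training frame.
toy = pd.DataFrame({
    "prompt": ["Q1", "Q2"],
    "A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"],
    "D": ["d1", "d2"], "E": ["e1", "e2"],
    "answer": ["D", "A"],
})
print(letters_to_text(toy)["answer"].tolist())  # ['d1', 'a2']
```

On a few thousand rows the loop is already fast, so this is mostly a matter of taste; the vectorized form simply avoids per-cell writes.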


<div style="border-radius:10px; border:#00FF7C solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Let's remove the remaining columns, as they are of no use to us for now

In [6]:
train.drop(["Unnamed: 0" , "A" , "B" , "C" , "D" , "E"] , axis = 1 , inplace = True)

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FF0000; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FF0000">4 | Tokenization 🐱‍👤</p>

In [7]:
from transformers import AutoTokenizer , AutoModel
import torch

<div style="border-radius:10px; border:#FF0000 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

We will use `RoBERTa` for gathering `Embeddings`. We might change this in further versions
    
## $RoBERTa$
    
$RoBERTa$ $Robustly$ $Optimized$ $BERT$ $Pretraining$ $Approach$ is a $Natural$ $Language$ $Processing$ $(NLP)$ model that was proposed in $2019$ by `Yinhan` Liu et al. It is a `reimplementation` of $BERT$ ($Bidirectional$ $Encoder$ $Representations$ from $Transformers$) with some `modifications` to the key `hyperparameters` and `minor embedding tweaks`. These modifications led to `significant performance gains` on a number of NLP tasks. $RoBERTa$ is based on the `transformer architecture`, which is a `Neural Network Architecture` that is particularly well-suited for NLP tasks. The transformer architecture uses `self-attention` to learn `long-range dependencies` between words in a sentence. This allows $RoBERTa$ to learn more `contextual representations` of words, which is important for many NLP tasks.

$RoBERTa$ is trained on a `massive dataset` of English text (about $160$ GB). The dataset consists of `books`/`Wikipedia`/`news articles`/`web text`. The text is tokenized using `Byte-Level` `BPE` `(Byte Pair Encoding)`, which is a technique for splitting text into smaller subword units.

$RoBERTa$ is trained using a `Masked Language Modeling` ($MLM$) objective. In the MLM objective, some of the words in a `sentence are masked`, and the model is then trained to `predict the masked words`. This helps the model to `learn the contextual representations` of words.
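
To illustrate the MLM objective, here is a deliberately simplified sketch in plain Python: each token is independently replaced by RoBERTa's `<mask>` string with some probability, and the original token is kept as the label to predict. The helper `mask_tokens` and its parameters are hypothetical, and real pre-training pipelines work on token ids and apply extra rules (for example, sometimes keeping or randomly swapping a selected token), which this sketch leaves out.

```python
import random

MASK = "<mask>"  # the mask token string used by RoBERTa's tokenizer


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels[i] is the original token where position i was masked and None
    elsewhere; only the masked positions contribute to the MLM loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


sentence = "the transformer uses self attention to model context".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

The model sees `masked` as input and is trained to recover the non-`None` entries of `labels` from the surrounding context.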

In [8]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


<div style="border-radius:10px; border:#FF0000 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Now we will make embeddings from textual data 

In [9]:
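for index in tqdm.tqdm(range(train.shape[0])):
    # Prompt: keep only the token ids; the models in section 5 embed them during training
    train["prompt"][index] = tokenizer(train["prompt"][index] , 
                                       return_tensors = "pt")["input_ids"]
    
    # Answer: run RoBERTa without gradients and keep the last hidden state
    # of the first token (<s>) as a fixed target embedding
    with torch.no_grad():
        train["answer"][index] = model(tokenizer(train["answer"][index] , 
                                                 return_tensors = "pt")["input_ids"])[0][0][0]
    torch.cuda.empty_cache()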

100%|██████████| 6700/6700 [08:21<00:00, 13.36it/s]


# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FF1493; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FF1493">5 | Model Setup 📊</p>

In [10]:
from transformers import AutoModel
import torch.nn as nn

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
## $5.1$ $|$ $RoBERTa$ $Base$
    
Our first model will be `RoBERTa Base`

In [11]:
class model(nn.Module):
    
    def __init__(self):
        super(model, self).__init__()
    
        self.r_model = AutoModel.from_pretrained("roberta-base")

        self.linear_1 = nn.Linear(768 , 768)
    
    def forward(self, inputs):
    
        inputs = self.r_model(inputs)[0]
        inputs = torch.mean(inputs, axis=1)
        
        output = self.linear_1(inputs)

        return output

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
## $5.2$ $|$ $RoBERTa$ $Large$

In [12]:
class model(nn.Module):

    def __init__(self):
        super(model, self).__init__()

        self.r_model = AutoModel.from_pretrained("roberta-large")

        self.linear_1 = nn.Linear(1024 , 1024)

    def forward(self, inputs):

        inputs = self.r_model(inputs)[0]
        inputs = torch.mean(inputs, axis=1)

        output = self.linear_1(inputs)

        return output

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
## $5.3$ $|$ $ALBERT$ $Base$ $V2$

In [13]:
class model(nn.Module):

    def __init__(self):
        super(model, self).__init__()

        self.r_model = AutoModel.from_pretrained("albert-base-v2")

        self.linear_1 = nn.Linear(768 , 768)

    def forward(self, inputs):

        inputs = self.r_model(inputs)[0]
        inputs = torch.mean(inputs, axis=1)

        output = self.linear_1(inputs)

        return output

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
## $5.4$ $|$ $DeBERTa$ $Base$

$DeBERTa$ was proposed as a `successor` of $BERT$ and $RoBERTa$ with `some tweaks`

* $Disentangled$ $Attention$ $Mechanism$ - In BERT, the attention heads receive each token as a single vector: the sum of its content vector and its position vector. DeBERTa instead keeps the two as a pair, a content vector $H_i$ and a relative position vector $P_{i|j}$, and computes the attention score with disentangled matrices

$$A_{i , j} = \{H_i , P_{i|j}\} \times \{H_j , P_{j|i}\}^T$$
$$= H_iH_j^T + H_iP_{j|i}^T + P_{i|j}H_j^T + P_{i|j}P_{j|i}^T$$
* $Enhanced$ $Mask$ $Decoder$ - which also takes the absolute position of each word into account when predicting the masked words. Here $\hat{X}$ is the masked input and $C$ is the set of masked positions

$$\max_\theta \log p_\theta(X \mid \hat{X}) = \max_\theta \sum_{i \in C} \log p_\theta(\hat{x}_i = x_i \mid \hat{X})$$

In [14]:
class model(nn.Module):

    def __init__(self):
        super(model, self).__init__()

        self.r_model = AutoModel.from_pretrained("microsoft/deberta-base")

        self.linear_1 = nn.Linear(768 , 768)

    def forward(self, inputs):

        inputs = self.r_model(inputs)[0]
        inputs = torch.mean(inputs, axis=1)

        output = self.linear_1(inputs)

        return output

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $5.5$ $|$ $ELECTRA$

`Well-known traditional architectures` like $BERT$ and $RoBERTa$ use `Masked Language Modelling` $(MLM)$, which `corrupts` the `input randomly`, putting a `mask` on $(10-20)$% of the `words` and asking the `model` to `predict` the `masked words`. This type of `training objective` is `expensive`, as the `network only learns` from the roughly $15$% of tokens that are masked in `each example`

$Efficiently$ $Learning$ an $Encoder$ that $Classifies$ $Token$ $Replacements$ $Accurately$ $(ELECTRA)$ uses the `same concept`, but with a `little tweak`. Instead of masking the words, some `words` are `replaced` by `plausible alternatives`, and the model is asked to `predict whether each word` was `replaced`/`not`

Because the model learns from `every token` rather than only the masked ones, this technique makes `pre-training` more `compute-efficient` and can `increase` the `accuracy` for a given budget.

This type of architecture can be compared to $Generative$ $Adversarial$ $Networks$ $(GANs)$, where a $Generator$ is a `separate` $Neural$ $Network$ that tries to `make false samples` and a $Discriminator$, another $NN$, is trained to `predict` if those are `real`/`fake`.

A key difference between $GANs$ and $ELECTRA$ is the size of the $Discriminator$/$Generator$. Whereas in $GANs$ the two models are often of `similar size`, in $ELECTRA$ the $Generator$ is basically a smaller version of the $Discriminator$. This was done on purpose: with a full-size $Generator$, each training step would take roughly $2$ times as much compute as $BERT$.

$Generator$/$Discriminator$ also share $Embedding$ $Weights$, which further `decreases` the number of `Trainable Parameters`
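
The replaced-token-detection idea can be sketched in a few lines of plain Python. This is only an illustration under strong assumptions: a random draw from a small vocabulary stands in for the generator network, and the helper name `corrupt_and_label` and its parameters are hypothetical. The discriminator's training target is one binary label per token.

```python
import random


def corrupt_and_label(tokens, vocab, replace_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) for replaced-token detection.

    labels[i] is 1 if the token at position i differs from the original and
    0 otherwise. If the sampled replacement happens to equal the original
    token, the position counts as "not replaced".
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < replace_prob:
            new_token = rng.choice(vocab)  # stand-in for the generator's sample
            corrupted.append(new_token)
            labels.append(int(new_token != token))
        else:
            corrupted.append(token)
            labels.append(0)
    return corrupted, labels


sentence = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "ate", "meal", "a"]
corrupted, labels = corrupt_and_label(sentence, vocab, replace_prob=0.4, seed=3)
print(corrupted)
print(labels)
```

Unlike MLM, every position carries a label here, which is why the discriminator receives a training signal from all tokens of the input.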

In [15]:
class model(nn.Module):

    def __init__(self):
        super(model, self).__init__()

        self.r_model = AutoModel.from_pretrained("google/electra-small-discriminator")

        self.linear_1 = nn.Linear(256 , 256)

    def forward(self, inputs):

        inputs = self.r_model(inputs)[0]
        inputs = torch.mean(inputs, axis=1)

        output = self.linear_1(inputs)

        return output

<div style="border-radius:10px; border:#FF1493 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $5.6$ $|$ $BART$

In [16]:
class model(nn.Module):

    def __init__(self):
        super(model, self).__init__()

        self.r_model = AutoModel.from_pretrained("facebook/bart-base")

        self.linear_1 = nn.Linear(768 , 768)

    def forward(self, inputs):

        inputs = self.r_model(inputs)[0]
        inputs = torch.mean(inputs, axis=1)

        output = self.linear_1(inputs)

        return output

In [17]:
ro = model()

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#0047AB; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #0047AB">6 | Training Arguments 🤐</p>

<div style="border-radius:10px; border:#0047AB solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $Loss$ $Function$
    
We will use a simple loss function this time. We might update the loss function later, depending on the results
    
A loss function like this 
    
```
def loss(preds , targets):
    val = 0
    for index in range(768):
        val += (preds[index] ** 2) - (targets[index] ** 2)

    return val ** (1/2)
```
gives losses about $10$ times higher

In [18]:
def loss(preds , targets):

    return torch.sum(preds - targets)
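
One thing worth noting about the function above: a plain sum of differences is signed, so it can become negative and is not a distance between the two vectors. The sketch below compares it with the Euclidean (L2) distance on a tiny example, using plain Python lists so it runs without PyTorch; the function names are illustrative. In PyTorch, the same distance could be written as `torch.linalg.norm(preds - targets)`.

```python
import math


def signed_sum_loss(preds, targets):
    """Sum of element-wise differences, as in the cell above (can be negative)."""
    return sum(p - t for p, t in zip(preds, targets))


def euclidean_loss(preds, targets):
    """Euclidean (L2) distance between the two vectors (never negative)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)))


preds = [0.0, 0.0, 0.0]
targets = [1.0, 2.0, 2.0]

print(signed_sum_loss(preds, targets))  # -5.0
print(euclidean_loss(preds, targets))   # 3.0
```

With the signed sum, pushing the predictions further below the targets keeps lowering the loss, so minimising it does not pull the two vectors together; a true distance is smallest only when the vectors match.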

<div style="border-radius:10px; border:#0047AB solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $Optimizer$

We will use Adam optimizer

In [19]:
optim = torch.optim.Adam(ro.parameters())

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FF00; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FF00">7 | Training Loop ➰</p>

<div style="border-radius:10px; border:#00FF00 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
Now we will start the training loop. To save time for this run, we have taken several small shortcuts, which might reduce the model's effectiveness a little bit.
    
I am not using the Kaggle GPU here; the results shown below were imported from Weights & Biases (wandb)
    
```
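ro = ro.to("cuda")

losses = []
for x , y in tqdm.tqdm(zip(train["prompt"] , train["answer"]) , total = train.shape[0]):
    # Token ids as a (1 , sequence_length) batch; target embedding as floats
    x = torch.as_tensor(x , dtype = torch.long).reshape(1 , -1).to("cuda")
    y = torch.as_tensor(y , dtype = torch.float32).to("cuda")
    
    # Truncate to the encoder's maximum sequence length
    if x.shape[1] > 512: x = x[: , :512]

    optim.zero_grad()

    # Use the model instance `ro`, not the class `model`
    pred = ro(x)[0]

    loss_fun = loss(pred , y)
    loss_fun.backward()
    optim.step()

    # Store a plain float so the computation graph is not kept in memory
    losses.append(loss_fun.item())
    
    torch.cuda.empty_cache()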
```

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#006600; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #003300">8 | Results Visualization 💪</p>

In [20]:
from IPython.display import IFrame

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
## $8.1$ $|$ $RoBERTa$ $Base$
    
loss	▅▂▄▃▄▄▄▅▄▄▅▃▂▂▂▂▆▃▂▃▂▃▃▂▂▄▃▄▂▃▂▆▇▇▇▁▁▂█▇

loss	-14.46438

In [21]:
IFrame("https://wandb.ai//ayushsinghal659/Kaggle%20LLM%20%7C%20Simple%20Approach/reports/Simple-Approach-Kaggle-LLM--Vmlldzo0OTMxMzIz" , 1000 , 400)

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $8.2$ $|$ $RoBERTa$ $Large$

loss	▅▆▆▄█▆▇▂█▇▅▇▃▃▇▃▄▂▂▆▄▄▆▆▂█▄▅▃▅▂▃▅▃▁▇▃▅▄▃

Run summary:

loss	72.52994

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $8.3$ $|$ $ALBERT$ $Base$ $V2$

loss	▅▆▇▇▅▆█▆█▅█▆▆▆▅▆▅▅▅▆▆▅▆▇▇▆▆▆▁▇▄▆▆▆▇▆▇▂▆▇

loss	25.45859

In [22]:
IFrame("https://wandb.ai/ayushsinghal659/Kaggle%20LLM%20%7C%20Simple%20Approach%7C%20Albert/reports/ALBERT-KAGGLE-LLM--Vmlldzo0OTQ2MjA1" , 1000 , 400)

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $8.4$ $|$ $DeBERTa$ $Base$

loss	▄▃▃▃▅▄▂▅▃▂▄▂▃▃▂▄▁▃▂▄▅▃▃▁▄▁▂▃▅▄▁▄▂▂█▄▃▅▂▂

loss	-19.49314

In [23]:
IFrame("https://wandb.ai/ayushsinghal659/Kaggle%20LLM%20%7C%20Simple%20Approach%7C%20Deberta/reports/DEBERTA-KAGGLE-LLM--Vmlldzo0OTQ4MjQy" , 1000 , 400)

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $8.5$ $|$ $ELECTRA$

loss	▅▅▆▆▄▂▃▅▃▅▃▄▄▂▇▆▆▆▆▇▄▂▅▆▃▅▄▅▄▅▁█▃▅▂▁▆▄▄▇

loss	-13.06066

In [24]:
IFrame("https://wandb.ai/ayushsinghal659/Kaggle%20LLM%20%7C%20Simple%20Approach%7C%20ELECTRA/reports/ELECTRA-KAGGLE-LLM--Vmlldzo0OTQ5MjAw" , 1000 , 400)

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $8.6$ $|$ $BART$

loss	▅▃▄▆▅▃▄▅▃▆▂▃▄▃▄▄▄▄▃▁▅▅▂▇▂▅▄▃▃▄▄▆▂▂█▄▇▁▄▅

loss	-6.38642

In [25]:
IFrame("https://wandb.ai/ayushsinghal659/Kaggle%20LLM%20%7C%20Simple%20Approach%7C%20BART/reports/BART-KAGGLE-LLM--Vmlldzo0OTU2NzUw" , 1000 , 400)

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#AEFF32; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #AEFF32">9 | TO DO LIST 📝</p>

<div style="border-radius:10px; border:#AEFF32 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

* $TO$ $DO$ $1$ $:$ $MAKE$ $BETTER$ $LOSS$ $FUNCTION$
* $TO$ $DO$ $2$ $:$ $MAKE$ $PROPER$ $NN$
* $TO$ $DO$ $3$ $:$ $DANCE$

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00F900; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00F900">10 | Ending 🏁</p>

<div style="border-radius:10px; border:#00F900 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
**THAT'S IT FOR TODAY GUYS**

**WE WILL GO DEEPER INTO THE DATA IN THE UPCOMING VERSIONS**

**PLEASE COMMENT YOUR THOUGHTS, HIGHLY APPRECIATED**

**DON'T FORGET TO UPVOTE, IF YOU LIKED MY WORK $:)$**
    
<img src = "https://i.imgflip.com/19aadg.jpg">
    
**PEACE OUT $!!!$**