# OCTOPUS

<p align="center">
    <br>
    <img src="https://github.com/UBC-NLP/octopus/raw/main/images/octopus.jpg" width="70%" height="70%"/>
    <br>
</p>

Octopus is a neural machine generation toolkit for Arabic Natural Language Generation (NLG), described in our ArabicNLP 2023 paper: [**OCTOPUS: A Multitask Model and Toolkit for Arabic Natural Language Generation**](https://arxiv.org/abs/2310.16127v1).

Octopus is designed for eight machine generation tasks: diacritization, grammatical error correction, news headline generation, paraphrasing, question answering, question generation, summarization, and transliteration. The package includes a Python library and associated command-line scripts.

---


https://github.com/UBC-NLP/octopus

## Install requirements

In [1]:
!pip install -U git+https://github.com/UBC-NLP/octopus.git -q

  Preparing metadata (setup.py) ... done
  Building wheel for octopus (setup.py) ... done


## Octopus Interactive Command Line Interface (CLI)
- OCTOPUS's tasks:
  - **Prefix**: `diacritize` **Task Name**: Diacritization
  - **Prefix**: `correct_grammar` **Task Name**: Grammatical Error Correction
  - **Prefix**: `paraphrase` **Task Name**: Paraphrase
  - **Prefix**: `answer_question` **Task Name**: Question Answering
  - **Prefix**: `generate_question` **Task Name**: Question Generation
  - **Prefix**: `summarize` **Task Name**: Summarization
  - **Prefix**: `generate_title` **Task Name**: Title Generation
  - **Prefix**: `translitrate_ar2en` **Task Name**: Transliteration Arabic-to-English
  - **Prefix**: `translitrate_en2ar` **Task Name**: Transliteration English-to-Arabic
- The `octopus_interactive` command supports only beam search, with the following default settings:
  - **-s** or **--seq_length**: Maximum sequence length (*default value is 300*)
  - **-o** or **--max_outputs**: Maximum number of generated outputs (*default value is 1*)
  - **-b** or **--num_beams**: Number of beams (*default value is 1*)
  - **-n** or **--no_repeat_ngram_size**: Size of n-grams that may not appear twice in the output (*default value is 2*)
- The `octopus_interactive` command prompts you to enter your input text. Type `q` to exit.
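
The prefixes and defaults listed above can be collected in a small Python sketch. The prefix table and the default values are copied from the list. Everything else is an assumption made for illustration and is not the toolkit's documented API: the `prefix: text` input format, the helper names `build_input` and `generation_kwargs`, and the mapping of the options onto Hugging Face `generate()` keyword arguments.

```python
# Task prefixes, copied from the list above (spelling kept as the toolkit uses it).
TASK_PREFIXES = {
    "diacritize": "Diacritization",
    "correct_grammar": "Grammatical Error Correction",
    "paraphrase": "Paraphrase",
    "answer_question": "Question Answering",
    "generate_question": "Question Generation",
    "summarize": "Summarization",
    "generate_title": "Title Generation",
    "translitrate_ar2en": "Transliteration Arabic-to-English",
    "translitrate_en2ar": "Transliteration English-to-Arabic",
}

# Default beam-search settings, copied from the list above.
DEFAULTS = {
    "seq_length": 300,
    "max_outputs": 1,
    "num_beams": 1,
    "no_repeat_ngram_size": 2,
}


def build_input(prefix: str, text: str) -> str:
    """Prepend a task prefix to the input text.

    Assumption: the model expects "<prefix>: <text>". Check the toolkit
    documentation for the exact format before relying on this.
    """
    if prefix not in TASK_PREFIXES:
        raise ValueError(f"Unknown task prefix: {prefix!r}")
    return f"{prefix}: {text.strip()}"


def generation_kwargs(**overrides) -> dict:
    """Translate the options above into generate()-style keyword arguments.

    Assumption: the options map one-to-one onto the Hugging Face names below.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    opts = {**DEFAULTS, **overrides}
    # Beam search cannot return more sequences than it keeps beams.
    if opts["max_outputs"] > opts["num_beams"]:
        raise ValueError("max_outputs cannot exceed num_beams")
    return {
        "max_length": opts["seq_length"],
        "num_return_sequences": opts["max_outputs"],
        "num_beams": opts["num_beams"],
        "no_repeat_ngram_size": opts["no_repeat_ngram_size"],
    }


# Example: request three candidates from a beam of five.
print(build_input("paraphrase", "نص عربي"))
print(generation_kwargs(num_beams=5, max_outputs=3))
```

With a Hugging Face sequence-to-sequence model, the resulting dictionary could be passed along the lines of `model.generate(**inputs, **generation_kwargs(num_beams=5))`. The check on `max_outputs` reflects the beam-search constraint that the number of returned sequences cannot exceed the number of beams.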


### (1) Usage and Arguments

In [2]:
!octopus_interactive -h

2023-12-05 00:10:08.529291: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-05 00:10:08.529369: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-05 00:10:08.529411: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
usage: octopus_interactive [-h] [-c CACHE_DIR]

octopus Interactive CLI

options:
  -h, --help            show this help message and exit
  -c CACHE_DIR, --cache_dir CACHE_DIR
                        The cache directory path, default vlaue is octopus_cache directory
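
The help text above describes a parser with a single optional argument. A minimal `argparse` sketch that would expose a similar interface follows. It is an illustration, not the toolkit's actual source, and `build_parser` is a hypothetical name. The default `./octopus_cache` is taken from the `Namespace(cache_dir='./octopus_cache')` line that the CLI logs on startup.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser resembling `octopus_interactive -h` (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="octopus_interactive",
        description="octopus Interactive CLI",
    )
    parser.add_argument(
        "-c",
        "--cache_dir",
        type=str,
        default="./octopus_cache",
        help="The cache directory path",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example
# behaves the same in a notebook and in a script.
args = build_parser().parse_args(["-c", "/tmp/octopus_cache"])
print(args.cache_dir)
```

Passing an explicit list to `parse_args` keeps the example independent of the arguments the notebook kernel itself was started with.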


### (2) Octopus Interactive

In [3]:
!octopus_interactive

2023-12-05 00:10:18.115701: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-05 00:10:18.115762: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-05 00:10:18.115807: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-05 00:10:21 | INFO | octopus.interactive_cli | Namespace(cache_dir='./octopus_cache')
2023-12-05 00:10:21 | INFO | octopus.interactive_cli | Loading model from UBC-NLP/octopus
tokenizer_config.json: 100% 2.40k/2.40k [00:00<00:00, 14.1MB/s]
spiece.model: 100% 2.35M/2.35M [00:00<00:00, 29.4MB/s]
tokenizer.json: 100% 8.40M/8.40M [00:00<00:00, 27.2MB/s]
sp