# Structure of the repository

This repository contains three folders: Documentation, text-generation-webui and Chatbot-interface.
- Documentation: This folder documents the steps carried out during the implementation and evaluation of the artefact.
- text-generation-webui: This folder provides the environment for the fine-tuning and for the perplexity and loss evaluation.
- Chatbot-interface: This folder provides the chatbot environment where the qualitative evaluation was conducted.


## Text-generation-webui

The text-generation-webui folder is based on the text-generation-webui developed by oobabooga: [https://github.com/oobabooga/text-generation-webui]. The following sections describe the initial setup of a conda environment for the text-generation-webui, the download of the text-generation-webui, the necessary adaptations of the underlying files, and how to start the text-generation-webui.

### Setup Conda-environment
The following commands were executed in the JupyterLab terminal to set up a conda environment for the text-generation-webui:
    
    conda create -n textgen python=3.10.9 -y
    conda activate textgen
    
### Download text-generation-webui
The following commands were executed in the JupyterLab terminal to download and set up the text-generation-webui:

    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
    git clone https://github.com/oobabooga/text-generation-webui
    cd text-generation-webui
    pip install -r requirements.txt
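
After the installation, it can be useful to confirm that the packages are visible in the active conda environment. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of the text-generation-webui.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Print the interpreter version of the active environment.
    print("Python", sys.version.split()[0])
    # Check the packages installed by the pip3 command above.
    missing = missing_packages(["torch", "torchvision", "torchaudio"])
    print("Missing packages:", missing if missing else "none")
```

Run the script inside the activated `textgen` environment; an empty list of missing packages suggests that the installation completed.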

### Adaptations of the text-generation-webui [can be skipped once the webui works correctly again]
At the time of development, there was a bug in the text-generation-webui. Attempting to train a model raised the following error: `UnboundLocalError: local variable 'tokens' referenced before assignment`

We fixed this bug manually by adapting the file tokenization_llama.py in the transformers package. The following commands were used to access the file via the terminal:

    cd /home/[...]/.conda/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/
    edit tokenization_llama.py
    
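The following adaptations were made to the file. In the original function, `tokens` is read in the length check before it has been assigned. In the adapted function, the assignment comes first and the length check is merged into the subsequent condition: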

Original function:


    def tokenize(self, text: "TextInput", **kwargs) -> List[str]:
        """
        Converts a string to a list of tokens. If `self.legacy` is set to `False`, a prefix token is added unless 
        the first token is special.
        """
        if self.legacy:
            return super().tokenize(text, **kwargs)

        if len(tokens)>1: 
            tokens = super().tokenize(SPIECE_UNDERLINE + text.replace(SPIECE_UNDERLINE, " "), **kwargs)

        if tokens[0] == SPIECE_UNDERLINE and tokens[1] in self.all_special_tokens:
            tokens = tokens[1:]

        return tokens


Adapted function:

    def tokenize(self, text: "TextInput", **kwargs) -> List[str]:
        """
        Converts a string to a list of tokens. If `self.legacy` is set to `False`, a prefix token is added unless 
        the first token is special.
        """
        if self.legacy:
            return super().tokenize(text, **kwargs)

        tokens = super().tokenize(SPIECE_UNDERLINE + text.replace(SPIECE_UNDERLINE, " "), **kwargs)

        if len(tokens)>1 and tokens[0] == SPIECE_UNDERLINE and tokens[1] in self.all_special_tokens:
            tokens = tokens[1:]

        return tokens
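
The effect of the adaptation can be reproduced without the transformers package. The following is a minimal sketch: `BaseTokenizer` is a hypothetical stand-in for the parent tokenizer that simply splits on whitespace, and the class names are chosen for illustration only. It shows that the original ordering fails on every non-legacy call, while the adapted ordering works and the `len(tokens) > 1` guard protects the access to `tokens[1]`.

```python
SPIECE_UNDERLINE = "\u2581"  # the SentencePiece underline character


class BaseTokenizer:
    """Hypothetical stand-in for the parent tokenizer: splits on whitespace."""

    all_special_tokens = ["<s>", "</s>"]
    legacy = False

    def tokenize(self, text, **kwargs):
        return text.split()


class OriginalTokenizer(BaseTokenizer):
    def tokenize(self, text, **kwargs):
        if self.legacy:
            return super().tokenize(text, **kwargs)
        # 'tokens' is read here before it has been assigned.
        if len(tokens) > 1:
            tokens = super().tokenize(SPIECE_UNDERLINE + text.replace(SPIECE_UNDERLINE, " "), **kwargs)
        if tokens[0] == SPIECE_UNDERLINE and tokens[1] in self.all_special_tokens:
            tokens = tokens[1:]
        return tokens


class FixedTokenizer(BaseTokenizer):
    def tokenize(self, text, **kwargs):
        if self.legacy:
            return super().tokenize(text, **kwargs)
        # Assign first, then check the length before indexing tokens[1].
        tokens = super().tokenize(SPIECE_UNDERLINE + text.replace(SPIECE_UNDERLINE, " "), **kwargs)
        if len(tokens) > 1 and tokens[0] == SPIECE_UNDERLINE and tokens[1] in self.all_special_tokens:
            tokens = tokens[1:]
        return tokens


if __name__ == "__main__":
    try:
        OriginalTokenizer().tokenize("hi")
    except UnboundLocalError as error:
        print("Original function fails:", error)
    print("Adapted function returns:", FixedTokenizer().tokenize(" <s> hi"))
```

The exact wording of the error message depends on the Python version, but the exception type is `UnboundLocalError` in both cases.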


    

### Starting the Webui
The text-generation-webui can be started via the terminal using the following commands:
    
    cd /home/[...]/text-generation-webui/
    python server.py --share --model models/Llama-2-7b-chat-hf
    
The text-generation-webui runs on a local and a public URL. Copy the public URL into your browser to access the webui.

## Chatbot-interface
To run the artefact in a chatbot interface, we first set up a new conda environment using the following commands in the terminal:

    conda create -n evaluation python=3.10.9 -y
    conda activate evaluation
    pip install transformers gradio torch accelerate
    
Afterwards, we created a Gradio app [https://www.gradio.app/] in the file "Gradio_app.py". The Gradio app can be started using the following commands in the terminal:

    cd /home/[...]/Chatbot-interface/
    python Gradio_app.py
    
Attention: To use this Gradio app, you have to enter your Hugging Face token in line 6 of the file "Gradio_app.py".
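
As an alternative to writing the token into the source file, it could be read from an environment variable so that it never ends up in version control. The following is a minimal sketch, not the code in "Gradio_app.py": the helper `load_hf_token` and the variable name `HF_TOKEN` are assumptions made for illustration.

```python
import os


def load_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable.

    Raises a RuntimeError with a clear message if the variable is unset or empty.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token
```

With this approach, the token is exported in the terminal before starting the app, and the line that currently holds the token would call `load_hf_token()` instead.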
    
The Gradio app runs on a local and a public URL. Copy the public URL into your browser to access the chatbot interface. Furthermore, the sub-folder "Example_processes" contains one example process used in the qualitative evaluation.
