# Jupyter Notebook Annotator Indo-Javanese NLI

Welcome to the Jupyter Notebook for annotating the Indo-Javanese NLI dataset!

I would like to express my gratitude to all of you for contributing to my postgraduate thesis.

NLI stands for "Natural Language Inference", a subfield of NLP. What does NLI do? An NLI model determines the semantic relation between two sentences: a premise sentence and a hypothesis sentence. The semantic relation states whether the hypothesis sentence is **true**, **false**, or **can't be decided** based on the information in the premise.

- If the hypothesis sentence is **true** based on the information in the premise, then the label is **entailment**, symbolized by zero (0) in the label column.
- If the hypothesis sentence **can't be decided** based on the information in the premise, then the label is **neutral**, symbolized by one (1) in the label column.
- If the hypothesis sentence is **false** based on the information in the premise, then the label is **contradiction**, symbolized by two (2) in the label column.
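
The label encoding described above can be written as a small lookup table. This is only a minimal sketch: the names `LABEL_NAMES` and `describe_label` are hypothetical and are not part of the dataset or of this notebook's cells.

```python
# Hypothetical lookup table mirroring the label encoding described above.
LABEL_NAMES = {
    0: "entailment",     # the hypothesis is true given the premise
    1: "neutral",        # the hypothesis cannot be decided from the premise
    2: "contradiction",  # the hypothesis is false given the premise
}


def describe_label(label):
    """Return the label name for a numeric code; raise ValueError for unknown codes."""
    code = int(label)
    if code not in LABEL_NAMES:
        raise ValueError(f"Unknown NLI label: {label!r}")
    return LABEL_NAMES[code]


print(describe_label(0))  # prints "entailment"
```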

Your task as an annotator is simple!

I will provide NLI data translated from Indonesian to Javanese using 3 different APIs:
- Mongosilakan.net,
- Google Translate, and
- ChatGPT 3.5

Here's what you will do:
- For each row, 3 different translations of the hypothesis will be shown in random order (that is, in a random column).
- You will rank the quality of each translation from 1 to 5 (5 being the best and 1 the worst).
- Every translation (column "1", column "2", and column "3") must be ranked.
- Pay attention to whether each translation **changes** the semantic relation with its premise sentence. I will also provide the premise sentence in Indonesian. Check closely whether the semantic relation changes, for example, from entailment (symbolized by zero in the "label" column) to contradiction.
- I will provide you with the annotation protocol guide.


If you have any other questions, please contact me via the WhatsApp group or send an email to 23521059@std.stei.itb.ac.id.

## Import Libraries and Prepare the File

Importing required libraries

In [None]:
import os
import sys
import pandas as pd

Checking the CSV file

If you run this Jupyter Notebook locally, you can replace ```os.getcwd()``` in the cell below with the path to the folder that contains this notebook. The local path for the ```current_directory``` variable should have the form:

```"D:\\Folder 1\\Subfolder 1\\Subfolder 2\\Subfolder 3"```
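
As a hedged illustration of the path format above, the sketch below shows that a raw string with single backslashes is equal to the escaped form, and how the path looks once the backslashes are normalised to forward slashes as the cell below does. The folder name is the placeholder from the example above, and `posixpath.join` is used here only so that the result is the same on every operating system.

```python
import posixpath

# Placeholder folder from the example above; replace it with your own folder.
local_directory = r"D:\Folder 1\Subfolder 1\Subfolder 2\Subfolder 3"

# A raw string (r"...") keeps single backslashes as they are, so it equals
# the escaped form written with doubled backslashes.
assert local_directory == "D:\\Folder 1\\Subfolder 1\\Subfolder 2\\Subfolder 3"

# Normalise to forward slashes, then append the file name.
normalised = local_directory.replace("\\", "/")
full_path = posixpath.join(normalised, "test-for_annotator.csv")
print(full_path)
```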

In [None]:
current_directory = os.getcwd()
filename = "test-for_annotator.csv"

current_directory = current_directory.replace("\\", "/")
full_path_to_filename = os.path.join(current_directory, filename)

if os.path.exists(full_path_to_filename):
    print("File checked.")
else:
    print(f"File {filename} does not exist!")
    print(f"Please upload {filename} to the same directory in Google Colab or place it in the same directory as this Jupyter Notebook file.")
    sys.exit()  # stop this cell so the later cells are not run without the file

Load and view the file

In [None]:
# The file is tab-separated even though its extension is .csv.
df_data_testing = pd.read_csv(full_path_to_filename, sep="\t")
df_data_testing.head()
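
After loading, it may help to confirm that the columns read by the later cells are present. The sketch below is an assumption-laden example rather than part of the annotation protocol: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, the list only covers the columns this notebook reads, and the toy frame stands in for the real file.

```python
import pandas as pd

# Columns that the cells in this notebook read; the real file may contain more.
EXPECTED_COLUMNS = ["hypothesis", "1", "2", "3"]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the DataFrame."""
    return [name for name in expected if name not in df.columns]


# Toy frame standing in for the loaded file (placeholder text, not real data).
toy = pd.DataFrame({"hypothesis": ["h"], "1": ["a"], "2": ["b"]})
print(missing_columns(toy))  # prints ['3']
```

An empty list means every expected column was found. If a file is read with the wrong separator, all columns typically collapse into one, so a check like this would list nearly every name as missing.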

## Annotate the Data

The function below is used to help you annotate the data. Please don't change this function.

In [None]:
def annotate(the_column, the_row):
    """Ask for a rank (1-5) and store it in df_data_testing at the given row and column."""
    print(f"Annotation for the {the_column} column in row {the_row + 1}")
    x = input(f"Please enter your rank (1-5) for column {the_column}: ")

    try:
        rank = int(x)  # raises ValueError if the input is not a whole number
        if rank > 5 or rank < 1:
            print("Rank out of range!")
            print("Please enter a rank from 1 to 5 (1 and 5 included, 5 being the best).")
        else:
            # Store the rank only when it is valid.
            df_data_testing.loc[the_row, the_column] = rank
    except Exception as e:
        print("Error!")
        print(e)

**Checkpoint**

The cells below display the data for a given row. Change the value of the ```row``` variable only after you have viewed and ranked all 3 columns in the current row.

In [None]:
row = 0  # change this value only after you have ranked all 3 columns in this row

In [None]:
print(f"Hypothesis sentence in Indonesian at row {row + 1}:")
print(df_data_testing["hypothesis"][row])

In [None]:
print(df_data_testing["1"][row])

After filling in the rank, please press Enter.

In [None]:
annotate("rank_1", row)

In [None]:
print(df_data_testing["2"][row])

After filling in the rank, please press Enter.

In [None]:
annotate("rank_2", row)

In [None]:
print(df_data_testing["3"][row])

After filling in the rank, please press Enter.

In [None]:
annotate("rank_3", row)

After you have ranked all 3 columns, you can increase the ```row``` variable above by 1 to continue annotating, or you can save the annotated data now and continue annotating later. If you decide to continue later, please make sure to note the row value you are currently working on.
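
If the row value is lost, the first row that still lacks a rank can be recovered from the rank columns. This is a minimal sketch under a few assumptions: the helper name `next_unranked_row` is hypothetical, the rank columns are the `rank_1`, `rank_2`, and `rank_3` columns written by `annotate`, and the toy frame stands in for the real data.

```python
import pandas as pd

RANK_COLUMNS = ["rank_1", "rank_2", "rank_3"]


def next_unranked_row(df, rank_columns=RANK_COLUMNS):
    """Return the position of the first row with a missing rank, or None if all rows are ranked."""
    # Rank columns that do not exist yet are treated as entirely empty.
    ranks = df.reindex(columns=rank_columns)
    incomplete = ranks.isna().any(axis=1).to_numpy()
    for position, is_incomplete in enumerate(incomplete):
        if is_incomplete:
            return position
    return None


# Toy frame: row 0 is fully ranked, row 1 lacks rank_2, row 2 is untouched.
toy = pd.DataFrame({
    "rank_1": [5, 3, None],
    "rank_2": [4, None, None],
    "rank_3": [2, 1, None],
})
print(next_unranked_row(toy))  # prints 1
```

Calling `next_unranked_row(df_data_testing)` after reloading a saved file would give a value that can be assigned to ```row```, assuming the file keeps its default row numbering.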

## Save the Data

Before saving, please make sure you note the last value of the ```row``` variable you were working on. Then run the code below to save the annotated data.


**ATTENTION! URGENT!**

DO NOT forget to download the data after you run the code below if you're using a cloud service such as Google Colab. If you're using a local Jupyter Notebook, it's safe to close the browser/tab after saving.

In [None]:
df_data_testing.to_csv(full_path_to_filename, sep='\t', index=False)
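
For Google Colab users, the download can also be triggered from code instead of the file browser. The sketch below assumes the `google.colab` package, which is only available inside a Colab runtime; the helper name `download_if_colab` is hypothetical. Outside Colab it does nothing except report that the file is already on disk.

```python
def download_if_colab(path):
    """Trigger a browser download in Google Colab; return True if it was triggered."""
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import files
    except ImportError:
        print("Not running in Google Colab; the file is already saved locally.")
        return False
    files.download(path)
    return True
```

A call such as `download_if_colab(full_path_to_filename)` would go after the save cell above.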