# Time to test our fine-tuned model

Once our model is trained, it is time to test its performance. There are several ways to run inference, but we chose to simply use a pipeline.

## What is a pipeline in Machine Learning?

In Machine Learning (ML), a pipeline refers to a sequence of data processing steps designed to automate the workflow for training, evaluating, and deploying models. The goal is to streamline the process, ensure reproducibility, and enhance efficiency by chaining together various stages of data transformation and model training in a structured manner.

Inference pipelines automate the process of making predictions on new data. They ensure that the same preprocessing and feature engineering steps applied during training are also applied at inference time, so that the model sees new inputs in the same form as its training data.
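To make the idea of chained steps concrete, here is a minimal, dependency-free sketch of an inference pipeline. Everything in it (`preprocess`, `predict`, `postprocess`, `make_pipeline`, and the word-length "model") is a toy invented for illustration; it is not how the `transformers` pipeline is implemented, only the same pattern in miniature.

```python
from typing import Callable, Dict, List


def preprocess(text: str) -> List[str]:
    """Toy preprocessing: lowercase the text and split it on whitespace."""
    return text.lower().split()


def predict(tokens: List[str]) -> float:
    """Toy 'model': the share of tokens longer than six characters."""
    if not tokens:
        return 0.0
    return sum(len(token) > 6 for token in tokens) / len(tokens)


def postprocess(score: float) -> Dict[str, object]:
    """Turn the raw score into a label/score dictionary."""
    label = "LONG_WORDS" if score >= 0.5 else "SHORT_WORDS"
    return {"label": label, "score": score}


def make_pipeline(*steps: Callable) -> Callable:
    """Chain the steps so that each output feeds the next step."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


toy_pipeline = make_pipeline(preprocess, predict, postprocess)
print(toy_pipeline("Traffic lights regulate intersections"))
```

Because the caller only ever invokes `toy_pipeline`, the preprocessing cannot be forgotten or applied differently from one prediction to the next, which is the consistency guarantee described above.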

## Defining our pipeline

In [15]:
from transformers import pipeline # type: ignore

# To define our pipeline, we specify the task to perform (here, text classification)
# and the model to use
classifier = pipeline("text-classification", model="model")

### Test 1

The first text is an answer generated by ChatGPT to the question "Who invented traffic lights?"

In [2]:
text_1 = "The modern traffic light, as we know it today, was invented by Garrett Morgan, an African-American inventor and entrepreneur. In 1923, Morgan patented an improved version of the traffic signal, which included a third position to stop traffic in all directions to allow pedestrians to cross streets safely. Morgan's invention was a significant contribution to traffic management and road safety."

classifier(text_1)

[{'label': 'CHATGPT', 'score': 0.9999736547470093}]

### Results:
Text 1 was correctly identified as AI-generated.

### Test 2

The second text is an excerpt from an article in the online edition of The Guardian.

In [3]:
text_2 = "In a landmark decision on one of three major climate cases, the first such rulings by an international court, the ECHR raised judicial pressure on governments to stop filling the atmosphere with gases that make extreme weather more violent."

classifier(text_2)

[{'label': 'HUMAN', 'score': 0.999920129776001}]

### Results:
Text 2 was correctly identified as human-written.

### Test 3

The third text was written by me.

In [4]:
text3 = "Traffic lights were invented by Garrett Morgan."

classifier(text3)

[{'label': 'HUMAN', 'score': 0.9986786246299744}]

### Results:
Text 3 was correctly identified as human-written.

In [5]:
result = classifier(text3)

label = result[0]['label']
score = result[0]['score']

print(f"The third text was written by a {label.lower()}, with a confidence of {round((score*100),2)}%")


The third text was written by a human, with a confidence of 99.87%


### Classification of data extracted from JSON files

In [13]:
import json  # Import the JSON library to handle JSON files

# Paths for the input and output files
input_file = "data/test.jsonl"  # Path to the input file in JSONL format
output_file = "data/toPredict_test.json"  # Path to the output file for the new JSON file

# Function to extract the answer from each line of the JSONL file
def extract_data_from_line(line):
    # Parse the JSON from the line
    json_data = json.loads(line)
    # Extract the answer
    answer = json_data["Answer"]
    # Return the answer
    return answer

# Function to create a new JSON file with only the answers
def create_new_json(input_file, output_file):
    new_data = []  # List to hold all the answers
    with open(input_file, "r") as f:
        for line in f:
            # Extract the answer from each line
            answer = extract_data_from_line(line)
            # Add the answer to new_data
            new_data.append({"text": answer})
    
    # Write the list new_data to the output file in JSON format
    with open(output_file, "w") as f:
        json.dump(new_data, f, indent=4)



In [14]:
# Call the create_new_json function with the input and output file paths
create_new_json(input_file, output_file)
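To show what the conversion does to the data, here is a small self-contained sketch that runs the same idea on a temporary file. The two sample records and their `Question` field are made up for illustration; only the `Answer` key and the `{"text": ...}` output shape come from the code above. The helper `jsonl_answers_to_json` is a hypothetical variant that also skips blank lines.

```python
import json
import os
import tempfile


def jsonl_answers_to_json(input_path, output_path):
    """Read one JSON object per line and keep only its "Answer" field."""
    records = []
    with open(input_path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines, such as a trailing newline
                records.append({"text": json.loads(line)["Answer"]})
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4)
    return records


# Hypothetical sample records, invented for this example
sample_lines = [
    {"Question": "What is art?", "Answer": "Art is a form of expression."},
    {"Question": "What is culture?", "Answer": "Culture links people together."},
]

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.jsonl")
    dst = os.path.join(tmp, "sample.json")
    # Write the JSONL input: one JSON object per line
    with open(src, "w", encoding="utf-8") as f:
        for record in sample_lines:
            f.write(json.dumps(record) + "\n")
    converted = jsonl_answers_to_json(src, dst)
    # Reload the output: a single JSON array of {"text": ...} objects
    with open(dst, "r", encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded)
```

The input holds one independent JSON object per line, while the output is a single JSON array, which is why the classification step can later read it with one `json.load` call.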

In [11]:
def classify(filename):  # Definition of the classify function taking a filename as argument

    # Instantiate a text classification pipeline
    # using our fine-tuned model, referenced as "model"
    classifier = pipeline("text-classification", model="model")  
    # Open the file in read mode
    with open(filename, "r") as file:  
        # Load JSON data from the file into a variable named data
        data = json.load(file)  
    # Initialize an empty list to store the newly classified data
    new_data = []  

    # Iterate over each element in the data
    for element in data:
        # Extract the text from each element
        text = element["text"]  
        
        # Check if the length of the text is less than or equal to 512
        if len(text) <= 512:  
            # Classify the text using the text classification model
            result = classifier(text)  
            # Check the label predicted by the model
            if result[0]['label'] == "HUMAN":  
                # Assign label 0 if the model predicts "HUMAN"
                label = 0  
            else :
                # Assign label 1 if the model predicts something else
                label = 1  
            
            # Create a new dictionary containing the text and the label
            new_dict = {'Answer': text, 'label': label}  
            # Add the new dictionary to the new_data list
            new_data.append(new_dict)  
        else : 
            # Skip texts longer than 512 characters and move to the next element.
            # The model's actual limit is 512 tokens; the character count is only a rough proxy for it.
            pass  
    # Return the list of newly classified data
    return new_data  
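The `classify` function above filters on `len(text)`, which counts characters, whereas the model's limit is expressed in tokens. For typical English text this filter is usually stricter than necessary, since a token tends to span several characters, but it is not an exact check. The sketch below is a hypothetical variant, not the code used for the results in this notebook: it takes the classifier as a parameter so it can be tried with a stub, and it keeps the confidence score next to the label. `classify_texts`, `LABEL_TO_ID`, and `stub_classifier` are names invented for this example.

```python
from typing import Callable, Dict, List

# Label names as they appear in the pipeline outputs shown earlier
LABEL_TO_ID = {"HUMAN": 0, "CHATGPT": 1}


def classify_texts(classifier: Callable, texts: List[str], max_chars: int = 512) -> List[Dict]:
    """Label each short-enough text and keep the model's confidence score."""
    results = []
    for text in texts:
        if len(text) > max_chars:
            # Same filter as in classify(): a character count used as a
            # rough stand-in for the 512-token limit.
            continue
        prediction = classifier(text)[0]
        results.append({
            "Answer": text,
            # Anything other than "HUMAN" maps to 1, as in classify()
            "label": LABEL_TO_ID.get(prediction["label"], 1),
            "score": prediction["score"],
        })
    return results


def stub_classifier(text):
    """Hypothetical stand-in for the real pipeline, for demonstration only."""
    label = "CHATGPT" if "I don't possess" in text else "HUMAN"
    return [{"label": label, "score": 0.9}]


demo = classify_texts(
    stub_classifier,
    ["Life is a journey.", "I don't possess personal beliefs.", "x" * 600],
)
print(demo)
```

With the real pipeline, `classify_texts(classifier, texts)` would be called with the `classifier` object defined earlier. Keeping the score makes it possible to check whether misclassified texts are also the ones the model is least confident about.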

In [12]:
# Call the classify function with the JSON file name as argument
classify(output_file)  


[{'Answer': 'Life is a journey with a beginning and an end. We all embark on this journey at birth and most of us go on as long as we can. The funny thing about life is that it is a complex lottery, with ups and downs. Sadly we do not get to pick how and where we begin our lives, some are luckier than others but what we can do is try our best to live the life that we want for ourselves.',
  'label': 0},
 {'Answer': "I don't possess personal beliefs or emotions, but I can explore the concept from various perspectives. The meaning of life is a deeply philosophical question that has puzzled humanity for centuries. Some may find meaning through personal fulfillment, relationships, or contributing to the greater good. Others may find meaning in spiritual beliefs or the pursuit of knowledge and understanding. Ultimately, the meaning of life is subjective and can vary greatly from person to person.",
  'label': 1},
 {'Answer': 'If I could change one thing about the world, I would make sure th

### OBSERVATIONS:

Of these 8 texts, 7 were correctly identified. The last text was generated by ChatGPT but was labeled as human. This is probably due to its length: a short text may be harder for our model to classify correctly.
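The 7-out-of-8 figure corresponds to an accuracy of 87.5%. A minimal sketch of that computation follows. The two label lists are placeholders arranged so that only the last prediction is wrong, as described above; they are not the actual contents of `data/test.jsonl`, and `accuracy` is a helper written for this example.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Placeholder labels (0 = human, 1 = ChatGPT); only the last prediction differs
expected = [0, 1, 0, 1, 0, 1, 0, 1]
predicted = [0, 1, 0, 1, 0, 1, 0, 0]

print(f"{accuracy(predicted, expected):.1%}")  # prints 87.5%
```

With only 8 texts, a single error shifts the accuracy by 12.5 percentage points, so this number should be read as an indication rather than a reliable estimate.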

Let's conduct a new test with shorter responses.

In [10]:
input_file_2 = "data/test2.jsonl"
output_file_2 = "data/toPredict_test2.json"

create_new_json(input_file_2, output_file_2)
classify(output_file_2)

[{'Answer': 'Art is a free form of expression which can bring people together.',
  'label': 0},
 {'Answer': "Art has the power to inspire, provoke thought, evoke emotions, and promote cultural understanding, fostering creativity and expression while enriching society's cultural fabric.",
  'label': 0},
 {'Answer': 'Cultures are ideas, rules and traditions that link people to one another. It is the way one grows up and sometimes identifies with. Cultures can be points of reference which guide us through the journey of life.',
  'label': 0},
 {'Answer': 'Culture shapes our identities by influencing our beliefs, values, norms, behaviors, and worldview, providing a framework through which we understand ourselves and relate to others within society.',
  'label': 0}]

### OBSERVATIONS:

We observe that with shorter responses, the model struggles to identify ChatGPT texts. Here, it labeled all 4 texts as human-written, although 2 of them were actually generated by ChatGPT.