# Access the MentalRiskES data and interact with the server

This notebook was developed by the [SINAI](https://sinai.ujaen.es/) research group for use in the [MentalRiskES](https://sites.google.com/view/mentalriskes2025/) evaluation campaign at IberLEF 2025.

**NOTE 1**: Please visit the [MentalRiskES competition website](https://sites.google.com/view/mentalriskes2025/evaluation) for instructions on how to download the data and how to send your system's predictions to the server.

**NOTE 2**: Throughout the code, replace "URL" with the server URL and "TOKEN" with your personal token.

Remember that this notebook is only a starting point to help you build your own system for communicating with our server. We recommend downloading it as a Python script, rather than working directly in Colab, and adapting the code to your needs.

# Install CodeCarbon package
Read the library [documentation](https://mlco2.github.io/codecarbon/) if necessary. We also provide a [CodeCarbon notebook](https://colab.research.google.com/drive/1boavnGOir0urui8qktbZaOmOV2pS5cn6?usp=sharing) with an example of its specific use in our competition.
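
Since this section has no install cell, a small check along these lines can confirm that the package is importable before the import cell runs. The helper name `is_installed` is hypothetical; the package itself is normally installed with `pip install codecarbon`.

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in this environment."""
    return importlib.util.find_spec(package) is not None

if not is_installed("codecarbon"):
    print("codecarbon is missing; install it with: pip install codecarbon")
```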


In [6]:
import os
import json
import random
from collections import defaultdict

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split
from torch.nn.utils.rnn import pad_sequence

from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification


from datasets import Dataset


In [7]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Import libraries

In [8]:
import requests, zipfile, io
from requests.adapters import HTTPAdapter, Retry
from typing import List, Dict
import random
import json
import os
import pandas as pd
from codecarbon import EmissionsTracker

# Endpoints
These URLs are needed to connect to the server.

**IMPORTANT:** Replace "URL" with the server URL and "TOKEN" with your user token.

In [None]:
URL = "" #my URL
TOKEN = "" #my token 

# Download endpoints: used to download the data to your machine
ENDPOINT_DOWNLOAD_TRIAL = URL+"/{TASK}/download_trial/{TOKEN}"
ENDPOINT_DOWNLOAD_TRAIN = URL+"/{TASK}/download_train/{TOKEN}"

# Trial endpoints: used to submit predictions to the server (trial phase)
ENDPOINT_GET_MESSAGES_TRIAL = URL+"/{TASK}/getmessages_trial/{TOKEN}"
ENDPOINT_SUBMIT_DECISIONS_TRIAL = URL+"/{TASK}/submit_trial/{TOKEN}/{RUN}"

# Test endpoints: used to submit predictions to the server
# (real evaluation; be careful, a submission cannot be resubmitted)
ENDPOINT_GET_MESSAGES = URL+"/{TASK}/getmessages/{TOKEN}"
ENDPOINT_SUBMIT_DECISIONS = URL+"/{TASK}/submit/{TOKEN}/{RUN}"
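
The templates above are plain strings whose `{TASK}`, `{TOKEN}` and `{RUN}` placeholders are filled in with `str.format` at request time. The sketch below shows how a final URL is assembled; the server address and token are made-up placeholders, and `build_submit_url` is a hypothetical helper.

```python
# Hypothetical values, for illustration only
URL = "http://example.org:8000"
ENDPOINT_SUBMIT_DECISIONS_TRIAL = URL + "/{TASK}/submit_trial/{TOKEN}/{RUN}"

def build_submit_url(task: str, token: str, run: int) -> str:
    """Fill the placeholders of the trial submission endpoint."""
    return ENDPOINT_SUBMIT_DECISIONS_TRIAL.format(TASK=task, TOKEN=token, RUN=run)

print(build_submit_url("task1", "my-token", 0))
# http://example.org:8000/task1/submit_trial/my-token/0
```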

# Download Data
To download the data, you can use the functions defined below.

The following function downloads the trial data. To adapt it to download the train and test data, follow the instructions given on the [competition website](https://sites.google.com/view/mentalriskes2024/evaluation).
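
The core of the download step is unpacking a ZIP archive that arrives as bytes. As a network-free sketch of that step, the snippet below builds a small archive in memory and extracts it; the helper name `extract_zip_bytes` and the file names are hypothetical.

```python
import io
import os
import tempfile
import zipfile

def extract_zip_bytes(content: bytes, target_dir: str) -> list:
    """Extract a ZIP archive held in memory and return its member names."""
    os.makedirs(target_dir, exist_ok=True)  # safe to call repeatedly
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()

# Build a tiny archive in memory to stand in for response.content
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("subject1.json", '[{"message": "hola"}]')

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data", "task1", "trial")
    names = extract_zip_bytes(buffer.getvalue(), target)
    # Extracting a second time must not fail because the folder exists
    names_again = extract_zip_bytes(buffer.getvalue(), target)

print(names)
```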

In [5]:
def download_messages_trial(task: str, token: str):
    """ Allows you to download the trial data of the task.
        Args:
          task (str): task from which the data is to be retrieved
          token (str): authentication token
    """

    # Send a request to the server to fetch the trial data
    response = requests.get(ENDPOINT_DOWNLOAD_TRIAL.format(TASK=task, TOKEN=token))

    if response.status_code != 200:
        print("Trial - Status Code " + task + ": " + str(response.status_code) + " - Error: " + str(response.text))
    else:
        z = zipfile.ZipFile(io.BytesIO(response.content))
        # exist_ok=True avoids a FileExistsError when the cell is re-run
        os.makedirs("./data/{task}/trial/".format(task=task), exist_ok=True)
        z.extractall("./data/{task}/trial/".format(task=task))

In [6]:
def download_messages_train(task: str, token: str):
    """ Allows you to download the train data of the task.
        Args:
          task (str): task from which the data is to be retrieved
          token (str): authentication token
    """
    response = requests.get(ENDPOINT_DOWNLOAD_TRAIN.format(TASK=task, TOKEN=token))

    if response.status_code != 200:
        print("Train - Status Code " + task + ": " + str(response.status_code) + " - Error: " + str(response.text))
    else:
        z = zipfile.ZipFile(io.BytesIO(response.content))
        os.makedirs("./data/{task}/train/".format(task=task), exist_ok=True)
        z.extractall("./data/{task}/train/".format(task=task))

In [10]:
def map_emoji_to_spanish(emoji=None):
    emoji_map = {
        "🔝": "arriba",
        "👎": "no me gusta",
        "😳": "sorprendido",
        "4️⃣": "cuatro",
        "🖐🏼": "mano abierta",
        "💎": "diamante",
        "🤣": "riendo fuerte",
        "🤞🏻": "dedos cruzados",
        "🍺": "cerveza",
        "❣": "corazón exclamación",
        "🤡": "payaso",
        "🎅🏻": "Papá Noel",
        "⬆": "subir",
        "💸": "dinero volando",
        "🤤": "babeando",
        "❌": "cruz",
        "🙌🏻": "manos arriba",
        "🤩": "asombrado",
        "🇵🇪": "Perú",
        "🤠": "vaquero",
        "🟣": "círculo morado",
        "🖐🏽": "mano abierta",
        "🙃": "cara invertida",
        "🐸": "rana",
        "👆🏼": "señalando arriba",
        "🈚": "gratis",
        "🌐": "mundo",
        "🎁": "regalo",
        "🎉": "celebración",
        "😵‍💫": "mareado",
        "🌝": "luna llena",
        "🙋‍♂": "hombre levantando mano",
        "3️⃣": "tres",
        "🔮": "bola de cristal",
        "😰": "nervioso",
        "😨": "miedo",
        "❓": "pregunta",
        "☝🏻": "dedo arriba",
        "🥲": "lágrimas de alegría",
        "✊🏼": "puño levantado",
        "✊": "puño",
        "🧘🏻‍♂": "meditación",
        "🧐": "curioso",
        "👏🏾": "aplausos",
        "🐳": "ballena",
        "💪🏼": "fuerza",
        "✅": "aprobado",
        "🤦🏼‍♂": "vergüenza",
        "😍": "enamorado",
        "👻": "fantasma",
        "😂": "riendo",
        "💪🏻": "fuerte",
        "🫤": "decepción",
        "⚽": "fútbol",
        "🥚": "huevo",
        "🙏": "rezando",
        "🤙": "llámame",
        "🙄": "aburrido",
        "😲": "asombro",
        "♥": "corazón",
        "🍎": "manzana",
        "🐻": "oso",
        "🤪": "loco",
        "👆🏽": "señalando arriba",
        "🎢": "montaña rusa",
        "🙌": "celebrando",
        "🌘": "luna menguante",
        "🫡": "saludo",
        "🙋🏻‍♀": "mujer levantando mano",
        "🤦‍♂": "error",
        "🌊": "ola",
        "😉": "guiño",
        "🥶": "frío",
        "💋": "beso",
        "🇺🇦": "Ucrania",
        "😶‍🌫": "confundido",
        "🌬": "viento",
        "💩": "mierda",
        "👌🏼": "perfecto",
        "🙆‍♂": "hombre OK",
        "💪🏽": "fuerza",
        "😱": "gritando",
        "1️⃣": "uno",
        "🤘": "rock",
        "👉": "señalando derecha",
        "🙂": "sonriendo",
        "👁": "ojo",
        "👀": "ojos",
        "🔥": "fuego",
        "⏺": "grabar",
        "😅": "sudando",
        "❗": "exclamación",
        "😕": "confuso",
        "🥒": "pepino",
        "🎂": "torta",
        "😥": "aliviado",
        "✌🏽": "victoria",
        "🎾": "tenis",
        "💚": "corazón verde",
        "💔": "corazón roto",
        "👍": "bien",
        "🐶": "perro",
        "✔": "verificado",
        "✌🏻": "paz",
        "💪": "músculo",
        "🎈": "globo",
        "🤑": "dinero en la cara",
        "😾": "gato enfadado",
        "💵": "billete",
        "👋🏻": "saludando",
        "👈🏻": "señalando izquierda",
        "💰": "bolsa de dinero",
        "🎼": "música",
        "🐮": "vaca",
        "🇦🇷": "Argentina",
        "🤷🏼‍♀": "mujer encogiéndose",
        "💃": "bailando",
        "🤮": "vomitando",
        "🇷🇺": "Rusia",
        "😎": "genial",
        "🥳": "fiesta",
        "⚰": "ataúd",
        "💯": "cien puntos",
        "📈": "gráfico subiendo",
        "😭": "llorando",
        "😪": "somnoliento",
        "🤞🏼": "suerte",
        "🤦🏽‍♂": "hombre avergonzado",
        "▶": "reproducir",
        "⛔": "prohibido",
        "🎶": "notas musicales",
        "🙊": "mono callado",
        "🌚": "luna nueva",
        "👏": "aplaudiendo",
        "🙏🏽": "rezando",
        "😄": "feliz",
        "🤦🏻‍♂": "error hombre",
        "🇨🇳": "China",
        "👌🏻": "OK",
        "🤙🏻": "llámame",
        "🇳🇬": "Nigeria",
        "😃": "alegre",
        "ℹ️": "información",
        "🗣": "hablando",
        "🙌🏼": "manos levantadas",
        "🤞": "cruzando dedos",
        "😜": "broma",
        "🎵": "nota musical",
        "🤟": "te amo",
        "✈": "avión",
        "👌🏽": "perfecto",
        "🤦🏽": "vergüenza",
        "👍🏾": "bien",
        "🔹": "diamante azul",
        "😝": "lengua fuera",
        "💶": "euro",
        "🤓": "nerd",
        "😶": "sin expresión",
        "🐁": "ratón",
        "🐗": "jabalí",
        "🤦🏻‍♀": "mujer avergonzada",
        "🍏": "manzana verde",
        "🟢": "círculo verde",
        "🙌🏽": "celebración",
        "🇪🇸": "España",
        "✨": "brillo",
        "🤷🏻‍♂": "hombre encogiéndose",
        "🚨": "alarma",
        "🥰": "amor",
        "☺": "sonrisa",
        "🤷‍♂": "duda",
        "🤯": "cabeza explotando",
        "🥺": "suplicando",
        "🐟": "pez",
        "🇮🇳": "India",
        "😐": "neutral",
        "😁": "sonriendo amplio",
        "🙋🏻‍♂": "levantando mano",
        "😓": "sudor",
        "🕺": "bailando",
        "😯": "sorprendido",
        "👉🏻": "señalando derecha",
        "💥": "explosión",
        "😢": "llorando",
        "🦖": "T-Rex",
        "⚡": "rayo",
        "😴": "durmiendo",
        "🫣": "espiando",
        "😻": "gato enamorado",
        "🥵": "caliente",
        "👍🏻": "pulgar arriba",
        "🇧🇾": "Bielorrusia",
        "🤷🏽‍♀": "mujer dudando",
        "😋": "saboreando",
        "🚫": "prohibido",
        "👅": "lengua",
        "😆": "riendo mucho",
        "😊": "sonriendo feliz",
        "😇": "ángel",
        "😠": "enojado",
        "🌎": "Américas",
        "⬇": "bajar",
        "😞": "triste",
        "🔵": "círculo azul",
        "📨": "correo",
        "👆": "arriba",
        "😘": "besando",
        "🌖": "luna gibosa",
        "❤": "corazón rojo",
        "☝": "dedo arriba",
        "✌": "victoria",
        "🍻": "brindis",
        "🤝": "apretón de manos",
        "👋": "saludo",
        "💲": "dólar",
        "👍🏼": "bien",
        "🚶🏻‍♂": "hombre caminando",
        "🤔": "pensando",
        "😹": "gato riendo",
        "🫵": "señalando",
        "🤭": "riendo callado",
        "🪂": "paracaídas",
        "😈": "diablo",
        "🔰": "principiante",
        "🫀": "corazón",
        "😒": "molesto",
        "🤷": "no sé",
        "😀": "felicidad",
        "🍀": "trébol",
        "🔪": "cuchillo",
        "😮": "boca abierta",
        "💬": "hablar",
        "✋": "mano levantada",
        "😌": "alivio",
        "💦": "sudor",
        "🤷🏼‍♂": "duda",
        "☹": "tristeza",
        "🤨": "sospecha",
        "🤙🏽": "llámame",
        "🔻": "triángulo abajo",
        "🛍": "compras",
        "🤧": "estornudo",
        "💫": "mareo",
        "👼": "ángel",
        "🤌": "pellizco",
        "💨": "rápido",
        "😛": "lengua fuera",
        "🎄": "árbol de Navidad",
        "🥹": "lágrimas contenidas",
        "☀": "sol",
        "🌕": "luna llena",
        "🇺🇸": "Estados Unidos",
        "👏🏼": "aplausos",
        "‼": "doble exclamación",
        "🚀": "cohete",
        "😡": "furioso",
        "😬": "nervios",
        "🔴": "círculo rojo",
        "🙏🏻": "orando",
        "🙈": "mono tapándose",
        "🦥": "perezoso",
        "🌙": "luna creciente",
        "👈": "señalando izquierda",
        "🐷": "cerdo",
        "🥸": "disfrazado",
        "😏": "sonrisa pícara",
        "😚": "beso cerrado",
        "⚓": "ancla",
        "👌": "OK",
        "🤟🏻": "te amo",
        "🌌": "vía láctea",
        "⚠": "advertencia",
        "🥱": "bostezando",
        "🐬": "delfín",
        "📊": "gráfico",
        "🐀": "rata",
        "🤗": "abrazo",
        "😔": "pensativo",
        "👏🏻": "aplaudiendo",
        "🇧🇬": "Bulgaria",
        "🥴": "mareado"
    }
    if emoji is None:
        return emoji_map  
    return emoji_map.get(emoji, emoji) 

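def replace_emojis_in_text(text):
    """Replace known emojis with their Spanish description and collapse
    consecutive duplicate words."""
    result = text
    emoji_dict = map_emoji_to_spanish()
    # Replace longer sequences first so that skin-tone or gendered variants
    # are matched before the base emoji they start with
    for emoji in sorted(emoji_dict, key=len, reverse=True):
        result = result.replace(emoji, f" {emoji_dict[emoji]} ")
    words = result.split()
    if not words:
        return ""
    # Drop a word when it repeats the one immediately before it
    cleaned_words = [words[0]]
    for i in range(1, len(words)):
        if words[i] != words[i - 1]:
            cleaned_words.append(words[i])
    return " ".join(cleaned_words).strip()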

def process_data(text):
    return replace_emojis_in_text(text)
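
The order in which keys are replaced matters whenever one emoji is a prefix of another, as with a base emoji and its skin-tone variants. The sketch below compares dictionary order with longest-first order on a hypothetical two-entry map; `replace_in_order` is an illustrative helper, not part of the notebook.

```python
# Hypothetical two-entry map; escapes make the code points explicit
THUMBS_UP = "\U0001F44D"                 # base emoji
THUMBS_UP_DARK = "\U0001F44D\U0001F3FE"  # same emoji plus a skin-tone modifier
emoji_map = {THUMBS_UP: "bien", THUMBS_UP_DARK: "bien"}

def replace_in_order(text: str, keys) -> str:
    """Replace each key in the given order and normalise whitespace."""
    for key in keys:
        text = text.replace(key, f" {emoji_map[key]} ")
    return " ".join(text.split())

text = "gracias " + THUMBS_UP_DARK

# Dictionary order: the base emoji is replaced first
naive = replace_in_order(text, emoji_map)
# Longest-first order: the full variant is replaced as a unit
longest_first = replace_in_order(text, sorted(emoji_map, key=len, reverse=True))

print(repr(naive))          # the skin-tone modifier is left behind
print(repr(longest_first))  # 'gracias bien'
```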

In [11]:
class LSTMClassifier(nn.Module):
    def __init__(self, input_size, h_size, output_dim, dropout=0):
        super().__init__()
        self.input_size = input_size
        self.h_size = h_size
        self.output_dim = output_dim
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=h_size, num_layers=1, 
                           batch_first=False, dropout=dropout, bidirectional=True)
        self.attention = nn.Linear(2 * h_size, 1)
        self.classifier = nn.Linear(2 * h_size, output_dim)

    def forward(self, seq_data, seq_lens, state=None):
        lstm_out, _ = self.lstm(seq_data)
        attention_weights = torch.softmax(self.attention(lstm_out), dim=0)
        context_vector = torch.sum(attention_weights * lstm_out, dim=0)
        output = self.classifier(context_vector)
        return output

    def predict_all_timesteps(self, seq_data, seq_lens, state=None):
        # seq_data: [seq_len, batch_size, input_size]
        batch_size = seq_data.size(1)
        if state is None:
            lstm_out, (h_n, c_n) = self.lstm(seq_data)
        else:
            h_n, c_n = state
            lstm_out, (h_n, c_n) = self.lstm(seq_data, (h_n, c_n))
        logits_all = self.classifier(lstm_out)  # [seq_len, batch_size, output_dim]
        pred_all = torch.argmax(logits_all, dim=2)  # [seq_len, batch_size]
        ts_predictions = []
        for i in range(batch_size):
            ts_predictions.append(pred_all[:seq_lens[0, i].item(), i].cpu().numpy())
        return ts_predictions, (h_n, c_n)
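
The `forward` method pools the LSTM outputs with softmax attention over the time dimension. To make that step concrete, here is a dependency-free sketch of the same computation for a single sequence; the scores and vectors are made-up numbers, whereas in the model the scores come from `self.attention`.

```python
import math

def attention_pool(outputs, scores):
    """Softmax the per-timestep scores and return the weighted sum of outputs.

    outputs: list of T vectors (lists of floats), one per timestep
    scores:  list of T floats, one unnormalised score per timestep
    """
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(outputs[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, outputs)) for d in range(dim)
    ]
    return weights, context

weights, context = attention_pool(
    outputs=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    scores=[0.0, 0.0, 0.0],
)
print(weights)  # equal scores give uniform weights
print(context)
```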

In [12]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
name = 'pysentimiento/robertuito-sentiment-analysis'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

Some weights of RobertaModel were not initialized from the model checkpoint at pysentimiento/robertuito-sentiment-analysis and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
# Function that computes the embeddings
def get_cls_embeddings(messages, model, tokenizer, device, m_length=96):
    model.to(device)
    model.eval()
    with torch.no_grad():
        inputs = tokenizer(messages, padding=True, truncation=True, max_length=m_length, return_tensors='pt')
        inputs = {k: v.to(device) for k, v in inputs.items()}
        output = model(**inputs)
        embeddings = output.last_hidden_state[:, 0, :]
    return embeddings

In [None]:
def process_round_task1(data, embedding_func, rnn, device, state=None):
    messages_by_subject = defaultdict(list)
    for x in data:
        messages_by_subject[x['nick']].append(process_data(x['message']))
    
    subjects = list(messages_by_subject.keys())
    batch_size = len(subjects)
    
    if state is None:
        h_n = torch.zeros(2, batch_size, rnn.h_size, dtype=torch.float32)
        c_n = torch.zeros(2, batch_size, rnn.h_size, dtype=torch.float32)
        state = (h_n, c_n)
    else:
        h_n, c_n = state
        current_batch_size = h_n.size(1)
        if current_batch_size != batch_size:
            if current_batch_size < batch_size:
                h_n = torch.cat([h_n, torch.zeros(2, batch_size - current_batch_size, rnn.h_size)], dim=1)
                c_n = torch.cat([c_n, torch.zeros(2, batch_size - current_batch_size, rnn.h_size)], dim=1)
            else:
                h_n = h_n[:, :batch_size, :]
                c_n = c_n[:, :batch_size, :]
    
    predictions = {}
    h_new, c_new = [], []
    for i, subject in enumerate(subjects):
        messages = messages_by_subject[subject]
        embeddings = embedding_func(messages, model, tokenizer, device).unsqueeze(1)  # [T, 1, 768]
        seq_lens = torch.tensor([[len(messages)]], dtype=torch.long).to(device)
        h_states = h_n[:, i:i+1, :].to(device)
        c_states = c_n[:, i:i+1, :].to(device)
        pred, (h, c) = rnn.predict_all_timesteps(embeddings, seq_lens, (h_states, c_states))
        pred_seq = np.atleast_1d(pred[0])
        if len(messages) == 1:
            # If there is only one message, return its prediction directly
            predictions[subject] = int(pred_seq[0])
        else:
            # If there are several messages, pick the first non-zero label
            idxs = np.nonzero(pred_seq)[0]
            predictions[subject] = int(pred_seq[idxs[0]]) if len(idxs) > 0 else 0
        h_new.append(h.cpu())
        c_new.append(c.cpu())
    
    h_new = torch.cat(h_new, dim=1)
    c_new = torch.cat(c_new, dim=1)
    return predictions, (h_new, c_new)
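
Two plain-Python steps in `process_round_task1` are easy to check in isolation: grouping messages by `nick`, and reducing a subject's per-timestep predictions to the first non-zero label. The sketch below reproduces both on made-up data; the helper names are hypothetical.

```python
from collections import defaultdict

def group_by_nick(data):
    """Group message texts by subject, preserving arrival order."""
    grouped = defaultdict(list)
    for item in data:
        grouped[item["nick"]].append(item["message"])
    return dict(grouped)

def first_nonzero(labels):
    """Return the first non-zero label in the sequence, or 0 if there is none."""
    return next((int(label) for label in labels if label != 0), 0)

data = [
    {"nick": "userA", "message": "hola"},
    {"nick": "userB", "message": "buenas"},
    {"nick": "userA", "message": "otra vez"},
]
print(group_by_nick(data))
# {'userA': ['hola', 'otra vez'], 'userB': ['buenas']}
print(first_nonzero([0, 0, 1, 0]))  # 1
print(first_nonzero([0, 0, 0]))     # 0
```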

In [153]:

TYPE_MAP = {0: 'betting', 1: 'onlinegaming', 2: 'trading', 3: 'lootboxes'}
def process_round_task2(data, embedding_func, rnn, bert_model, bert_tokenizer, device, state=None):
    
    messages_by_subject = defaultdict(list)
    for x in data:
        messages_by_subject[x['nick']].append(process_data(x['message']))
    
    subjects = list(messages_by_subject.keys())
    batch_size = len(subjects)
    
    if state is None:
        h_n = torch.zeros(2, batch_size, rnn.h_size, dtype=torch.float32)
        c_n = torch.zeros(2, batch_size, rnn.h_size, dtype=torch.float32)
        state = (h_n, c_n)
    else:
        h_n, c_n = state
        current_batch_size = h_n.size(1)
        if current_batch_size != batch_size:
            if current_batch_size < batch_size:
                h_n = torch.cat([h_n, torch.zeros(2, batch_size - current_batch_size, rnn.h_size)], dim=1)
                c_n = torch.cat([c_n, torch.zeros(2, batch_size - current_batch_size, rnn.h_size)], dim=1)
            else:
                h_n = h_n[:, :batch_size, :]
                c_n = c_n[:, :batch_size, :]
    
    predictions = {}
    type_predictions = {}
    h_new, c_new = [], []
    for i, subject in enumerate(subjects):
        messages = messages_by_subject[subject]
        # Predict the risk (predictions) with the LSTMClassifier
        embeddings = embedding_func(messages, model, tokenizer, device).unsqueeze(1)  # [T, 1, 768]
        seq_lens = torch.tensor([[len(messages)]], dtype=torch.long).to(device)
        h_states = h_n[:, i:i+1, :].to(device)
        c_states = c_n[:, i:i+1, :].to(device)
        pred, (h, c) = rnn.predict_all_timesteps(embeddings, seq_lens, (h_states, c_states))
        pred_seq = np.atleast_1d(pred[0])
        idxs = np.nonzero(pred_seq)[0]
        predictions[subject] = int(pred_seq[idxs[0]]) if len(idxs) > 0 else 0
        h_new.append(h.cpu())
        c_new.append(c.cpu())
        
        # Predict the addiction type (types) with BERT, concatenating all messages
        bert_model.to(device)
        bert_model.eval()
        with torch.no_grad():
            combined_text = ' '.join(messages)
            inputs = bert_tokenizer(
                combined_text, padding=True, truncation=True, max_length=256, return_tensors='pt'
            )
            inputs = {k: v.to(device) for k, v in inputs.items()}
            outputs = bert_model(**inputs)
            logits = outputs.logits
            pred = torch.argmax(logits, dim=-1).cpu().numpy()[0]
            type_predictions[subject] = TYPE_MAP[pred]
    
    h_new = torch.cat(h_new, dim=1)
    c_new = torch.cat(c_new, dim=1)
    response = {
        'predictions': predictions,
        'types': type_predictions
    }
    return response, (h_new, c_new)

# Client Server
This class simulates communication with our server. The following code establishes the connection with the server and simulates the GET and POST requests.

**IMPORTANT NOTE:** Please pay attention to the basic functions and remember that it is only a base for your system.
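
Each client method builds a `requests.Session` with a retry policy before contacting the server. A minimal sketch of that pattern as a reusable helper follows; `make_session` is a hypothetical name, and mounting the adapter on `https://` as well is an addition that the notebook does not make.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries: int = 5, backoff: float = 0.1) -> requests.Session:
    """Return a session that retries on transient server errors."""
    policy = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=[500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=policy)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

session = make_session(retries=5, backoff=0.1)
# No request is sent here; we only inspect the configured policy
print(session.get_adapter("http://example.org").max_retries.total)  # 5
```

By default, urllib3 applies status-based retries only to idempotent methods, so a POST is not retried on a 5xx response unless `allowed_methods` is configured; whether that is desirable for submissions is a decision left to each participant.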

In [17]:
from datetime import datetime, timedelta

In [None]:
class Client_task1_2:
    def __init__(self, task: str, token: str, rnn_models, embedding_func, bert_model, bert_tokenizer, submit_to_server: bool = False):
        self.task = task
        self.token = token
        self.rnn_models = rnn_models
        self.embedding_func = embedding_func
        self.bert_model = bert_model
        self.bert_tokenizer = bert_tokenizer
        self.submit_to_server = submit_to_server
        self.relevant_cols = [
            'duration', 'emissions', 'cpu_energy', 'gpu_energy', 'ram_energy',
            'energy_consumed', 'cpu_count', 'gpu_count', 'cpu_model', 'gpu_model',
            'ram_total_size', 'country_iso_code'
        ]
        self.url_base = "http://s3-ceatic.ujaen.es:8036"
        self.ENDPOINT_GET_MESSAGES = f"{self.url_base}/{{TASK}}/getmessages/{{TOKEN}}"
        self.ENDPOINT_SUBMIT_DECISIONS = f"{self.url_base}/{{TASK}}/submit/{{TOKEN}}/{{RUN}}"
        self.all_messages_task1 = []
        self.all_messages_task2 = []
        self.states_task1 = [None, None, None]
        self.states_task2 = [None, None, None]
        self.current_messages_task1 = []
        self.current_messages_task2 = []

    def get_messages_task1(self, retries: int, backoff: float) -> list:
        """GET messages for Task 1 for the current run."""
        session = requests.Session()
        retries = Retry(total=retries, backoff_factor=backoff, status_forcelist=[500, 502, 503, 504])
        session.mount('http://', HTTPAdapter(max_retries=retries))
        response = session.get(self.ENDPOINT_GET_MESSAGES.format(TASK="task1", TOKEN=self.token))
        
        if response.status_code != 200:
            print(f"GET - Task 1 - Status Code {response.status_code} - Error: {response.text}")
            self.current_messages_task1 = []
        else:
            self.current_messages_task1 = json.loads(response.content)
            self.all_messages_task1.extend(self.current_messages_task1)
            os.makedirs('./data/run0/task1', exist_ok=True)
            with open(f'./data/run0/task1/messages.json', 'w+', encoding='utf8') as json_file:
                json.dump(self.current_messages_task1, json_file, ensure_ascii=False)
            print(f"Task 1: Retrieved {len(self.current_messages_task1)} messages")
        
        return self.current_messages_task1

    def get_messages_task2(self, retries: int, backoff: float) -> list:
        """GET messages for Task 2 for the current run."""
        session = requests.Session()
        retries = Retry(total=retries, backoff_factor=backoff, status_forcelist=[500, 502, 503, 504])
        session.mount('http://', HTTPAdapter(max_retries=retries))
        response = session.get(self.ENDPOINT_GET_MESSAGES.format(TASK="task2", TOKEN=self.token))
        
        if response.status_code != 200:
            print(f"GET - Task 2 - Status Code {response.status_code} - Error: {response.text}")
            self.current_messages_task2 = []
        else:
            self.current_messages_task2 = json.loads(response.content)
            self.all_messages_task2.extend(self.current_messages_task2)
            os.makedirs('./data/run0/task2', exist_ok=True)
            with open(f'./data/run0/task2/messages.json', 'w+', encoding='utf8') as json_file:
                json.dump(self.current_messages_task2, json_file, ensure_ascii=False)
            print(f"Task 2: Retrieved {len(self.current_messages_task2)} messages")
        
        return self.current_messages_task2

    def predict_and_submit_task1(self, run: int, retries: int, backoff: float, device):
        if run not in [0, 1, 2]:
            print(f"Invalid run number: {run}. Must be 0, 1, or 2.")
            return
        if not self.current_messages_task1:
            print(f"Run {run}: No messages to process for Task 1")
            return
        print(f"------------------- Processing Task 1, Run {run}")
        
        with EmissionsTracker(
            save_to_file=True, log_level="WARNING", tracking_mode="process",
            output_dir=".", output_file=f"emissions_task1_run{run}.csv"
        ) as tracker:
            predictions_task1, self.states_task1[run] = process_round_task1(
                data=self.all_messages_task1, embedding_func=self.embedding_func, 
                rnn=self.rnn_models[run], device=device, state=self.states_task1[run]
            )

        try:
            # Read the same file that the EmissionsTracker above writes to
            emissions_file = f"emissions_task1_run{run}.csv"
            if os.path.exists(emissions_file) and os.path.getsize(emissions_file) > 0:
                df = pd.read_csv(emissions_file)
                if not df.empty:
                    measurements_task1 = df.iloc[-1][self.relevant_cols].to_dict()
                else:
                    print(f"Run {run}: Warning: {emissions_file} is empty, using default emissions")
                    measurements_task1 = {col: 0.0 for col in self.relevant_cols}
            else:
                print(f"Run {run}: Warning: {emissions_file} not found or empty, using default emissions")
                measurements_task1 = {col: 0.0 for col in self.relevant_cols}
        except Exception as e:
            print(f"Run {run}: Error reading {emissions_file}: {e}, using default emissions")
            measurements_task1 = {col: 0.0 for col in self.relevant_cols}

        data_run = {
            "predictions": predictions_task1,
            "emissions": measurements_task1
        }
        print(f"\nData to be sent for Task 1 - Run {run}:")
        data = [json.dumps(data_run)]
        print(data)

        if self.submit_to_server:
            session = requests.Session()
            retries = Retry(total=retries, backoff_factor=backoff, status_forcelist=[500, 502, 503, 504])
            session.mount('http://', HTTPAdapter(max_retries=retries))
            endpoint = self.ENDPOINT_SUBMIT_DECISIONS.format(TASK="task1", TOKEN=self.token, RUN=run)
            print(f"URL to be sent: {endpoint}")
            try:
                response = session.post(endpoint, json=[data[0]])
                print(f"POST - Task 1 - Run {run} - Status Code {response.status_code} - Message: {response.text}")
            except Exception as e:
                print(f"POST - Task 1 - Run {run} - Error: {str(e)}")
        else:
            print(f"Submit to server is disabled. Data saved locally.")

        os.makedirs('./data/preds/task1', exist_ok=True)
        file_path = f'./data/preds/task1/run{run}.json'
        existing_data = []
        if os.path.exists(file_path):
            with open(file_path, 'r', encoding='utf8') as f:
                existing_data = json.load(f)
        existing_data.append(data[0])
        with open(file_path, 'w+', encoding='utf8') as f:
            json.dump(existing_data, f, ensure_ascii=False, indent=2)
        print(f"Saved Task 1 - Run {run} to {file_path}")

    def predict_and_submit_task2(self, run: int, retries: int, backoff: float, device):
        if run not in [0, 1, 2]:
            print(f"Invalid run number: {run}. Must be 0, 1, or 2.")
            return
        if not self.current_messages_task2:
            print(f"Run {run}: No messages to process for Task 2")
            return
        print(f"------------------- Processing Task 2, Run {run}")
        
        with EmissionsTracker(
            save_to_file=True, log_level="WARNING", tracking_mode="process",
            output_dir=".", output_file=f"emissions_task2_run{run}.csv"
        ) as tracker:
            predictions_task2, self.states_task2[run] = process_round_task2(
                data=self.all_messages_task2, embedding_func=self.embedding_func, 
                rnn=self.rnn_models[run], bert_model=self.bert_model, 
                bert_tokenizer=self.bert_tokenizer, device=device, state=self.states_task2[run]
            )

        try:
            # Read the same file that the EmissionsTracker above writes to
            emissions_file = f"emissions_task2_run{run}.csv"
            if os.path.exists(emissions_file) and os.path.getsize(emissions_file) > 0:
                df = pd.read_csv(emissions_file)
                if not df.empty:
                    measurements_task2 = df.iloc[-1][self.relevant_cols].to_dict()
                else:
                    print(f"Run {run}: Warning: {emissions_file} is empty, using default emissions")
                    measurements_task2 = {col: 0.0 for col in self.relevant_cols}
            else:
                print(f"Run {run}: Warning: {emissions_file} not found or empty, using default emissions")
                measurements_task2 = {col: 0.0 for col in self.relevant_cols}
        except Exception as e:
            print(f"Run {run}: Error reading {emissions_file}: {e}, using default emissions")
            measurements_task2 = {col: 0.0 for col in self.relevant_cols}


        data_run = {
            "predictions": predictions_task2['predictions'],
            "types": predictions_task2['types'],
            "emissions": measurements_task2
        }
        print(f"\nData to be sent for Task 2 - Run {run}:")
        data = [json.dumps(data_run)]
        print(data)

        if self.submit_to_server:
            session = requests.Session()
            retries = Retry(total=retries, backoff_factor=backoff, status_forcelist=[500, 502, 503, 504])
            session.mount('http://', HTTPAdapter(max_retries=retries))
            endpoint = self.ENDPOINT_SUBMIT_DECISIONS.format(TASK="task2", TOKEN=self.token, RUN=run)
            print(f"URL to be sent: {endpoint}")
            try:
                response = session.post(endpoint, json=[data[0]])
                print(f"POST - Task 2 - Run {run} - Status Code {response.status_code} - Message: {response.text}")
            except Exception as e:
                print(f"POST - Task 2 - Run {run} - Error: {str(e)}")
        else:
            print(f"Submit to server is disabled. Data saved locally.")

        os.makedirs('./data/preds/task2', exist_ok=True)
        file_path = f'./data/preds/task2/run{run}.json'
        existing_data = []
        if os.path.exists(file_path):
            with open(file_path, 'r', encoding='utf8') as f:
                existing_data = json.load(f)
        existing_data.append(data[0])
        with open(file_path, 'w+', encoding='utf8') as f:
            json.dump(existing_data, f, ensure_ascii=False, indent=2)
        print(f"Saved Task 2 - Run {run} to {file_path}")
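
The submission body combines the predictions with the last row of the CodeCarbon CSV. The sketch below shows that assembly with the standard `csv` module on a made-up file; the helper names, the shortened column list and the sample values are hypothetical.

```python
import csv
import json
import os
import tempfile

RELEVANT_COLS = ["duration", "emissions", "cpu_energy"]  # shortened for the example

def read_last_measurement(path, columns):
    """Return the selected columns of the last CSV row, or zeros if unavailable."""
    defaults = {col: 0.0 for col in columns}
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return defaults
    with open(path, newline="", encoding="utf8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        return defaults
    return {col: float(rows[-1][col]) for col in columns}

def build_payload(predictions, emissions):
    """Serialise one run the way the client does: a list holding one JSON string."""
    return [json.dumps({"predictions": predictions, "emissions": emissions})]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emissions_task1_run0.csv")
    with open(path, "w", newline="", encoding="utf8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["duration", "emissions", "cpu_energy", "os"])
        writer.writerow([1.5, 0.0002, 0.001, "Linux"])
        writer.writerow([2.5, 0.0004, 0.002, "Linux"])
    measurement = read_last_measurement(path, RELEVANT_COLS)
    missing = read_last_measurement(os.path.join(tmp, "absent.csv"), RELEVANT_COLS)

payload = build_payload({"userA": 0, "userB": 1}, measurement)
print(payload)
```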

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
name = 'pysentimiento/robertuito-sentiment-analysis'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

# Load the BERT model for Task 2
bert_checkpoint = './model_1'
bert_tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")
bert_model = AutoModelForSequenceClassification.from_pretrained(bert_checkpoint, num_labels=4)
bert_model.eval()

# Initialize the RNN models
task_rnn_1 = LSTMClassifier(768, 64, 2)
task_rnn_1.load_state_dict(torch.load('pre_trained_models_task1/processed_data_h_64_bs_2_0.95_0.05/net_params.pt'))
task_rnn_1.eval()
task_rnn_1.to(device)

task_rnn_2 = LSTMClassifier(768, 32, 2)
task_rnn_2.load_state_dict(torch.load('pre_trained_models_old/processed_data_h_32_bs_2_0.95_0.05/net_params.pt'))
task_rnn_2.eval()
task_rnn_2.to(device)

task_rnn_3 = LSTMClassifier(768, 32, 2)
task_rnn_3.load_state_dict(torch.load('pre_trained_models_task1/processed_data_h_32_bs_2_0.95_0.05/net_params.pt'))
task_rnn_3.eval()
task_rnn_3.to(device)



Some weights of RobertaModel were not initialized from the model checkpoint at pysentimiento/robertuito-sentiment-analysis and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


LSTMClassifier(
  (lstm): LSTM(768, 32, bidirectional=True)
  (attention): Linear(in_features=64, out_features=1, bias=True)
  (classifier): Linear(in_features=64, out_features=2, bias=True)
)

In [None]:
client = Client_task1_2(
    rnn_models=[task_rnn_1, task_rnn_2, task_rnn_3],
    embedding_func=get_cls_embeddings,
    bert_model=bert_model,
    bert_tokenizer=bert_tokenizer,
    token='your_token_here',
    submit_to_server=True,
)

In [None]:
rnn_models = [task_rnn_1, task_rnn_2, task_rnn_3]
client.get_messages_task2(retries=5, backoff=0.1)
client.predict_and_submit_task2(
    run=0, retries=5, backoff=0.1, device=device)


[codecarbon ERROR @ 16:44:26] Error: Another instance of codecarbon is probably running as we find `C:\Users\PHUCNG~1\AppData\Local\Temp\.codecarbon.lock`. Turn off the other instance to be able to run this one or use `allow_multiple_runs` or delete the file. Exiting.
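
The CodeCarbon message above names two ways out: use `allow_multiple_runs`, or delete the lock file. A minimal sketch of the second option follows, assuming no other tracker is actually running and that the lock lives in the system temp directory as in the log line. The helper name `clear_stale_codecarbon_lock` is hypothetical; the file name `.codecarbon.lock` is taken from the log.

```python
import os
import tempfile


def clear_stale_codecarbon_lock(tmp_dir=None):
    """Delete a leftover .codecarbon.lock file; return True if one was removed.

    Hypothetical helper. Only call it when you are sure no other
    CodeCarbon tracker is still running.
    """
    lock_dir = tmp_dir or tempfile.gettempdir()
    lock_path = os.path.join(lock_dir, '.codecarbon.lock')
    if os.path.exists(lock_path):
        os.remove(lock_path)
        return True
    return False


# Demo against a scratch directory instead of the real temp directory.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, '.codecarbon.lock'), 'w').close()
removed_first = clear_stale_codecarbon_lock(scratch)
removed_second = clear_stale_codecarbon_lock(scratch)
print(removed_first, removed_second)
```

Calling `clear_stale_codecarbon_lock()` with no argument targets the system temp directory; restarting the notebook kernel before doing so helps ensure the previous tracker has really stopped.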


Task 2: Retrieved 160 messages
------------------- Processing Task 2, Run 0

Data to be sent for Task 2 - Run 0:
['{"predictions": {"user4550": 0, "user4554": 0, "user4808": 0, "user482": 0, "user485": 0, "user5158": 0, "user531": 0, "user546": 0, "user557": 0, "user6190": 0, "user6463": 1, "user6995": 0, "user7135": 0, "user7194": 0, "user997": 0, "user4174": 0, "user4203": 0, "user4218": 0, "user4222": 0, "user4333": 0, "user4545": 0, "user4581": 0, "user4840": 0, "user5188": 0, "user5354": 0, "user5913": 0, "user5975": 0, "user6062": 0, "user6128": 0, "user6576": 0, "user6927": 0, "user6937": 0, "user7061": 0, "user7144": 0, "user7244": 0, "user18721": 0, "user18730": 1, "user18759": 0, "user18891": 0, "user18944": 0, "user19224": 0, "user19263": 0, "user19444": 0, "user20483": 0, "user20639": 0, "user21289": 0, "user21294": 0, "user21422": 0, "user22380": 0, "user23816": 0, "user24387": 0, "user24513": 0, "user27647": 0, "user27670": 0, "user28298": 0, "user19643": 0, "user19737": 0, "user19767": 0, "user19891": 0, "user19956": 0, "user19982": 0, "user2007
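
The payload printed above is a list holding one JSON string whose `predictions` object maps each nick to an integer label. A short sketch of how such a payload could be inspected before sending is shown below. It uses only three entries copied from the output above, and the variable names are illustrative.

```python
import json
from collections import Counter

# A small subset of the payload shown above, in the same list-of-JSON-string shape.
data = ['{"predictions": {"user4550": 0, "user6463": 1, "user18730": 1}}']

predictions = json.loads(data[0])["predictions"]

# Count how many users received each label.
label_counts = Counter(predictions.values())

# List the users that were assigned label 1.
flagged = sorted(nick for nick, label in predictions.items() if label == 1)

print(label_counts)
print(flagged)
```

A check like this makes it easy to spot a run that predicts the same label for nearly every user before it is submitted.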

In [132]:
print(client.all_messages_task1)

[{'id_message': 83834259853, 'nick': 'user4550', 'round': 1, 'message': 'Pues no sé', 'platform': 'Telegram', 'date': '2019-07-06 20:03:29+01:00'}, {'id_message': 40146258029, 'nick': 'user4554', 'round': 1, 'message': 'A más de 10 corners .', 'platform': 'Telegram', 'date': '2019-07-21 23:15:09+01:00'}, {'id_message': 64270929250, 'nick': 'user4808', 'round': 1, 'message': 'veis esto ?', 'platform': 'Telegram', 'date': '2020-07-11 15:33:24+01:00'}, {'id_message': 85647408793, 'nick': 'user482', 'round': 1, 'message': 'Venga coño , a ver si tiran jajajajaja', 'platform': 'Telegram', 'date': '2020-08-20 22:29:21+01:00'}, {'id_message': 84459180089, 'nick': 'user485', 'round': 1, 'message': 'Gracias', 'platform': 'Telegram', 'date': '2020-07-18 13:50:01+01:00'}, {'id_message': 24023936936, 'nick': 'user5158', 'round': 1, 'message': 'Si la paso ya seguro alguno dice es baja 😊 😊', 'platform': 'Telegram', 'date': '2020-12-24 14:12:58+01:00'}, {'id_message': 35088806108, 'nick': 'user531', '
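
Each retrieved message is a flat dictionary with `id_message`, `nick`, `round`, `message`, `platform`, and `date` fields. A sketch of one way to build a per-user message history from such a list follows. The two sample messages are copied from the output above, and the helper name `group_by_nick` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

messages = [
    {'id_message': 83834259853, 'nick': 'user4550', 'round': 1,
     'message': 'Pues no sé', 'platform': 'Telegram',
     'date': '2019-07-06 20:03:29+01:00'},
    {'id_message': 40146258029, 'nick': 'user4554', 'round': 1,
     'message': 'A más de 10 corners .', 'platform': 'Telegram',
     'date': '2019-07-21 23:15:09+01:00'},
]


def group_by_nick(messages):
    """Return {nick: [message texts]} ordered by round, then timestamp."""
    history = defaultdict(list)
    ordered = sorted(
        messages,
        key=lambda m: (m['round'], datetime.fromisoformat(m['date'])),
    )
    for msg in ordered:
        history[msg['nick']].append(msg['message'])
    return dict(history)


history = group_by_nick(messages)
print(history['user4550'])
```

Keeping one ordered list per nick is convenient when a model scores a user's whole conversation so far rather than a single message.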

In [130]:
client.get_messages_task1(retries=5, backoff=0.1)

Task 1: Retrieved 160 messages


[{'id_message': 83834259853,
  'nick': 'user4550',
  'round': 1,
  'message': 'Pues no sé',
  'platform': 'Telegram',
  'date': '2019-07-06 20:03:29+01:00'},
 {'id_message': 40146258029,
  'nick': 'user4554',
  'round': 1,
  'message': 'A más de 10 corners .',
  'platform': 'Telegram',
  'date': '2019-07-21 23:15:09+01:00'},
 {'id_message': 64270929250,
  'nick': 'user4808',
  'round': 1,
  'message': 'veis esto ?',
  'platform': 'Telegram',
  'date': '2020-07-11 15:33:24+01:00'},
 {'id_message': 85647408793,
  'nick': 'user482',
  'round': 1,
  'message': 'Venga coño , a ver si tiran jajajajaja',
  'platform': 'Telegram',
  'date': '2020-08-20 22:29:21+01:00'},
 {'id_message': 84459180089,
  'nick': 'user485',
  'round': 1,
  'message': 'Gracias',
  'platform': 'Telegram',
  'date': '2020-07-18 13:50:01+01:00'},
 {'id_message': 24023936936,
  'nick': 'user5158',
  'round': 1,
  'message': 'Si la paso ya seguro alguno dice es baja 😊 😊',
  'platform': 'Telegram',
  'date': '2020-12-24 1