# Supplementary NLP section
## Validating the models on in-feed posts

This notebook extends the previous NLP and Social Media Analysis work, which addressed the classification of posts published by users on the Bluesky Home (outside the feeds) using two language models, LLaMA and Qwen.  

In this additional section, the goal is to evaluate the models' performance on posts published inside Bluesky feeds, for which the correct feed assignment is already known.


<span style="font-size:1.1em;">**Configuration**</span>  

In [1]:
import requests
import json
import os
import time
import subprocess
import threading
from tqdm import tqdm
import pandas as pd
import random

In [2]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [3]:
# Create the daemon thread that starts the local server for the Ollama service
t = threading.Thread(target=lambda: subprocess.run(["ollama", "serve"]), daemon=True)
t.start()

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHNPiGnA7ovIjIhL/2Yj51b5yiNbnIecjzirQeh56Zej



2024/11/23 20:42:53 routes.go:1197: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-11-23T20:42:53.869Z level=INFO source=images.go:753
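
Because the server is started in a background thread, later cells can run before it is ready to accept requests. The following is a minimal sketch of a readiness check, assuming the default address `127.0.0.1:11434` reported in the install log and assuming the root URL answers once the server is up. `http_probe` and `wait_for_server` are hypothetical helper names, not part of Ollama or of the original notebook.

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True if the URL answers an HTTP request without error."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(url="http://127.0.0.1:11434", timeout=60.0, interval=1.0, probe=http_probe):
    """Poll `url` until `probe` succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` right after `t.start()` would block until the server responds, or return `False` after the timeout. The `probe` argument is injectable only so the polling logic can be exercised without a running server.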

### Selecting the in-feed posts

The first step was to select the posts coming from the Bluesky feeds, which are later processed by the models. To this end, we used the original dataset, which contains a folder named `feed_posts` where each file represents a feed and contains the posts associated with it.

To ensure a minimum level of balance between post categories, a cap was imposed on the number of posts that could be selected from each feed, so that no class ends up with an excessive number of posts. The cap was set to 4 times the number of posts in the smallest feed. For feeds with at most that many posts, all available posts were selected. For feeds exceeding the cap, posts were randomly sampled, selecting as many as the cap allows.

In addition, each selected post was enriched with an attribute specifying the feed it belongs to, identified by the name of the file containing the post.

Finally, the selected posts were saved to the file `feed_posts.jsonl`, which is used in the following stages.


**Input and output summary**:  
**Input**: The `feed_posts` folder containing the original posts for each feed, available at `/kaggle/input/bluesky-dataset/feed_posts`.  
**Output**: The file `feed_posts.jsonl`, later saved in the `feed_posts_for_validation` dataset for future use.

In [4]:
# Folder containing one JSONL file per feed
feed_folder = '/kaggle/input/bluesky-dataset/feed_posts'
output_file = 'feed_posts.jsonl'

# Only the .jsonl files are treated as feeds
feed_files = [f for f in os.listdir(feed_folder) if f.endswith('.jsonl')]


def count_lines(path):
    # Count the lines of a file, closing the handle afterwards
    with open(path, 'r') as f:
        return sum(1 for _ in f)


# Number of posts in the smallest feed
min_post = min(count_lines(os.path.join(feed_folder, filename)) for filename in feed_files)


# Cap on the number of posts sampled per feed, to avoid heavily unbalanced classes
max_post = 4 * min_post


# Sample the posts from each feed
feed_posts = []

for filename in feed_files:

    filepath = os.path.join(feed_folder, filename)
    with open(filepath, 'r') as file:
        # Parse each line once
        posts = [json.loads(line) for line in file]

    # Drop the posts whose text is null or empty
    posts = [post for post in posts if post.get('text') is not None and post['text'].strip() != ""]
    post_count = len(posts)

    # If the feed has at most max_post posts, keep them all; otherwise sample max_post of them
    if post_count <= max_post:
        selected_posts = list(posts)
    else:
        selected_posts = random.sample(posts, max_post)

    # Add the source feed (the file name without extension) to each post
    for post in selected_posts:
        post['feed'] = os.path.splitext(filename)[0]

    feed_posts.extend(selected_posts)


# Remove duplicates based on 'post_id' (the last occurrence is kept)
feed_posts = list({post['post_id']: post for post in feed_posts if post.get('post_id')}.values())

# Shuffle the selected posts and save them to a file
random.shuffle(feed_posts)
with open(output_file, 'w') as f:
    for post in feed_posts:
        f.write(json.dumps(post) + '\n')

print(f'Number of selected in-feed posts: {len(feed_posts)}')

Number of selected in-feed posts: 6193
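
Since the selection aims at a minimum level of balance between feeds, it can be useful to inspect how many posts each feed actually contributes. The following is a minimal sketch, assuming each line of the saved file is a JSON object with the `feed` attribute added during selection. `count_posts_per_feed` is a hypothetical helper, not part of the original notebook.

```python
import json
from collections import Counter


def count_posts_per_feed(jsonl_path):
    """Count how many posts in a JSON Lines file carry each 'feed' label."""
    counts = Counter()
    with open(jsonl_path, 'r') as f:
        for line in f:
            if line.strip():  # ignore blank lines
                counts[json.loads(line)['feed']] += 1
    return counts
```

For instance, `count_posts_per_feed('feed_posts.jsonl').most_common()` lists the feeds from largest to smallest. Because duplicates are removed by `post_id` after sampling, a post that appears in several feeds is counted only once, so some feeds may end up below the cap.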


#### Utility functions
We define a set of utility functions to simplify the response-generation process:

- `ask_to_llm`: builds the prompt to submit to the model and sends a request to the local API to generate a response to that prompt.

In [5]:
def ask_to_llm(model, instruction, post, post_id=None):

    # Build the prompt passed to the model
    full_prompt = instruction + "Post: " + post
    #print(full_prompt)

    # Send the request to the local Ollama API and fail early on HTTP errors
    response = requests.post('http://localhost:11434/api/generate',
                             json={'model': model, 'prompt': full_prompt, 'stream': False})
    response.raise_for_status()

    # Return the generated text
    return response.json().get('response')

- `generate_feed_assignments`: reads the posts to process from the JSON Lines file where they are stored, passes them to the model, collects the model's response, and saves the results to a new JSON file. 

In [6]:
def generate_feed_assignments(post_filepath, model, instruction, results_dir, prompt_type):

    start_time = time.time()

    # Create the results directory if it does not already exist
    os.makedirs(results_dir, exist_ok=True)

    # Load all the posts to process
    with open(post_filepath, 'r') as file:
        posts = [json.loads(line) for line in file]

    results = []

    for i, post in enumerate(posts, start=1):
        post_id = post['post_id']
        record_text = post['text']             # Extract the post text

        # Skip posts whose text is null or empty
        if record_text is None or record_text.strip() == "":
            print(f"Skipping post ID {post_id} due to null text.")
            continue

        # Call ask_to_llm to get a feed assignment for the current post
        feed_assignment = ask_to_llm(model, instruction, record_text, post_id=post_id)

        # Store the result
        results.append({
            'post_id': post_id,
            'text': record_text,
            'feed_assignment': feed_assignment
        })
        print(f"Post {i} assigned to feed: {feed_assignment} \n")

    # Save the results
    output_filepath = os.path.join(results_dir, f'{model}_{prompt_type}.json')
    with open(output_filepath, 'w') as f:
        json.dump(results, f, indent=4)

    # Print the execution time
    end_time = time.time()
    duration = end_time - start_time
    print(f"Duration: {duration} seconds")
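
The saved results contain `post_id`, `text` and `feed_assignment`, but not the known feed, so any later comparison has to join them with the selection file on `post_id`. The following is a minimal sketch of such a comparison. It assumes that the `feed` value stored with each post (the file name without extension) is spelled exactly like the topic names used in the prompts; if it is not, a mapping between the two would be needed. `load_true_feeds` and `accuracy` are hypothetical helpers, not part of the original notebook.

```python
import json


def load_true_feeds(jsonl_path):
    """Map each post_id to the feed stored in the selection file."""
    true_feeds = {}
    with open(jsonl_path, 'r') as f:
        for line in f:
            if line.strip():  # ignore blank lines
                post = json.loads(line)
                true_feeds[post['post_id']] = post['feed']
    return true_feeds


def accuracy(results, true_feeds):
    """Fraction of results whose stripped 'feed_assignment' equals the known feed."""
    # Only results with a known feed can be scored
    scored = [r for r in results if r['post_id'] in true_feeds]
    if not scored:
        return 0.0
    hits = sum(
        1 for r in scored
        if (r['feed_assignment'] or '').strip() == true_feeds[r['post_id']]
    )
    return hits / len(scored)
```

Here `results` is the list loaded with `json.load` from one of the files written by `generate_feed_assignments`.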

## Obtaining the classifications 

The posts selected in the previous stage were handled with the same approach applied to posts outside the feeds, that is, those from the Bluesky Home.   
For these posts too, the LLaMA and Qwen models were used with the zero-shot, one-shot, two-shot and few-shot prompting techniques.

**Input and output summary**:  \
**Input**: the file `feed_posts.jsonl` containing the in-feed posts selected in the previous stage, path: `/kaggle/input/feed-post-for-validation/feed_posts.jsonl`.   \
**Output**: each model saves its classification results to a file, with one file per model and per prompting technique. All these files are finally saved in the `feed_classification_results` dataset so they can be used later.

We start the two models.
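
As a quick illustration of the "one file per model and per prompting technique" layout, the sketch below lists the file paths that would result from the `{model}_{prompt_type}.json` naming used in `generate_feed_assignments`. The `zeroshot` and `oneshot` tags and the two result directories are the ones used in this notebook; the `twoshot` and `fewshot` tags are assumed spellings, and `expected_result_files` is a hypothetical helper.

```python
import os

# Model name -> results directory, as used in this notebook
MODELS = {
    "llama3.1": "/kaggle/working/llama3.1",
    "qwen2.5": "/kaggle/working/qwen2.5",
}

# Prompt-type tags; "twoshot" and "fewshot" are assumed spellings
PROMPT_TYPES = ["zeroshot", "oneshot", "twoshot", "fewshot"]


def expected_result_files(models=MODELS, prompt_types=PROMPT_TYPES):
    """List one result path per (model, prompt type) pair."""
    return [
        os.path.join(results_dir, f"{model}_{prompt_type}.json")
        for model, results_dir in models.items()
        for prompt_type in prompt_types
    ]
```

With two models and four prompting techniques this yields eight paths, which can be compared against the files actually present before uploading them as a dataset.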

**LLAMA 3.1**

In [7]:
# Create a daemon thread to start the llama3.1 model
!ollama pull llama3.1  
t2 = threading.Thread(target=lambda: subprocess.run(["ollama", "run", "llama3.1"]),daemon=True) # the thread runs the command 'ollama run llama3.1'
t2.start()

[GIN] 2024/11/23 - 20:43:31 | 200 |      57.163µs |       127.0.0.1 | HEAD     "/"
pulling manifest
time=2024-11-23T20:43:33.416Z level=INFO source=download.go:175 msg="downloading 8eeb52dfb3bb in 16 291 MB part(s)"
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
time=2024-11-23T20:43:50.896Z level=INFO source=download.go:175 msg="downloading 948af2743fc7 in 1 1.5 KB part(s)"
pulling 948af2743fc7... 100% ▕████████████████▏ 1.5 KB
time=2024-11-23T20:43:54.788Z level=INFO source=download.go:175 msg="downloading 0ba8f0e314b4 in 1 12 KB part(s)"
pulling 0ba8f0e314b4... 100% ▕████████████████▏  12 KB
time=2024-11-23T20:43:56.737Z level=INFO source=download.go:175 msg="downloading 56bb8bd477a5 in 1 96 B part(s)"
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B
time=2024-11-23T20:43:58.639Z level=INFO source=download.go:175 msg="downloading 1a4c3c319823 in 1 485 B part(s)"
pulling 1a4c3c319823... 100% ▕████████████████▏  485 B

time=2024-11-23T20:44:17.457Z level=INFO source=server.go:105 msg="system memory" total="31.4 GiB" free="29.9 GiB" free_swap="0 B"
time=2024-11-23T20:44:17.458Z level=INFO source=memory.go:343 msg="offload to cpu" layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[29.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.8 GiB" memory.required.partial="0 B" memory.required.kv="1.0 GiB" memory.required.allocations="[5.8 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-11-23T20:44:17.459Z level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama1891697421/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --threads 2 --no-mmap --parallel 4 --port 33441"
time=2024-11-23T20:

[GIN] 2024/11/23 - 20:44:24 | 200 |  7.120452677s |       127.0.0.1 | POST     "/api/generate"


**QWEN 2.5**

In [8]:
# Create another daemon thread to start the qwen2.5 model
!ollama pull qwen2.5
t3 = threading.Thread(target=lambda: subprocess.run(["ollama", "run", "qwen2.5"]),daemon=True)
t3.start()

[GIN] 2024/11/23 - 20:44:35 | 200 |      40.522µs |       127.0.0.1 | HEAD     "/"
pulling manifest
time=2024-11-23T20:44:36.563Z level=INFO source=download.go:175 msg="downloading 2bada8a74506 in 16 292 MB part(s)"
pulling 2bada8a74506... 100% ▕████████████████▏ 4.7 GB
time=2024-11-23T20:45:10.472Z level=INFO source=download.go:175 msg="downloading 66b9ea09bd5b in 1 68 B part(s)"
pulling 66b9ea09bd5b... 100% ▕████████████████▏   68 B
time=2024-11-23T20:45:12.384Z level=INFO source=download.go:175 msg="downloading eb4402837c78 in 1 1.5 KB part(s)"
pulling eb4402837c78... 100% ▕████████████████▏ 1.5 KB
time=2024-11-23T20:45:14.298Z level=INFO source=download.go:175 msg="downloading 832dd9e00a68 in 1 11 KB part(s)"
pulling 832dd9e00a68... 100% ▕████████████████▏  11 KB
time=2024-11-23T20:45:16.288Z level=INFO source=download.go:175 msg="downloading 2f15b3218f05 in 1 487 B part(s)"
pulling 2f15b3218f05... 100% ▕████████████████▏  487 B

time=2024-11-23T20:45:34.984Z level=INFO source=server.go:105 msg="system memory" total="31.4 GiB" free="24.4 GiB" free_swap="0 B"
time=2024-11-23T20:45:34.985Z level=INFO source=memory.go:343 msg="offload to cpu" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[24.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2024-11-23T20:45:34.986Z level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama1891697421/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --ctx-size 8192 --batch-size 512 --threads 2 --no-mmap --parallel 4 --port 36883"
time=2024-11-23T2

[GIN] 2024/11/23 - 20:45:40 | 200 |  5.856571692s |       127.0.0.1 | POST     "/api/generate"


In [9]:
# Create the results folders for the two models (no error if they already exist)
result_dir_llama = '/kaggle/working/llama3.1'
result_dir_qwen = '/kaggle/working/qwen2.5'

os.makedirs(result_dir_llama, exist_ok=True)
os.makedirs(result_dir_qwen, exist_ok=True)

In [10]:
# Path of the file containing the selected in-feed posts
post_filepath = "/kaggle/input/feed-post-for-validation/feed_posts.jsonl"

### Zero-shot prompting  
We analyse the behaviour of the models under zero-shot prompting. 
The models receive a prompt asking them to classify the input post into one of the feeds, each of which is given together with a description.

In [11]:
instruction_zeroshot = """You are a classifier. Assign this post to exactly one of the following 11 topics based on their descriptions:
"AcademicSky": Posts related to academia, academic discussions, academic jobs, higher education, scholarships, scientific research, and university.
"Blacksky": Amplifying the voices of any and all Black users.
"BookSky": A feed for anyone who likes reading and books.
"Disability": Posts discussing disability, accessibility, disability rights, or issues related to disability.
"Game Dev": Posts about all aspects of game development.
"GreenSky": A big list of climate accounts, filtered loosely for keywords.
"News": Headlines from verified news organisations.
"Political Science": A feed for political science and international relations research and discussion.
"Science": The Science Feed. A curated feed from Bluesky professional scientists and science communicators.
"UkrainianView": Posts from Ukrainians about Ukraine and their experience during the war.
"Whats History": Posts by historians using :cardfilebox: or skystorians.

If the post does not fit any of these topics, respond with "Unknown". Your response must be only the topic name, without any additional text."""
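
Even when asked to answer with the topic name only, a model may wrap the label in quotes, add a trailing period, or change its casing. The following is a minimal sketch of how raw replies could be mapped back to the topic names listed in the prompt. `normalize_assignment` is a hypothetical helper and is not applied anywhere in the original notebook; treating every unrecognised reply as `"Unknown"` is an assumption of this sketch.

```python
# Topic names exactly as they appear in the prompt
FEED_LABELS = [
    "AcademicSky", "Blacksky", "BookSky", "Disability", "Game Dev", "GreenSky",
    "News", "Political Science", "Science", "UkrainianView", "Whats History",
]


def normalize_assignment(raw, labels=FEED_LABELS, fallback="Unknown"):
    """Map a raw model reply to one of the known labels, or to `fallback`."""
    if raw is None:
        return fallback
    # Drop surrounding whitespace, quotes and periods, then compare case-insensitively
    cleaned = raw.strip(' \t\r\n"\'.').casefold()
    for label in labels:
        if cleaned == label.casefold():
            return label
    return fallback
```

Replies that contain extra words around the label are deliberately not matched, so that they stay visible as `"Unknown"` instead of being guessed.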

In [12]:
# Process the posts
generate_feed_assignments(post_filepath, "llama3.1", instruction_zeroshot, result_dir_llama, "zeroshot")

llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 8B
llama_model_loader: - kv   6:                            general.license str

[GIN] 2024/11/23 - 20:47:11 | 200 |         1m17s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 20:47:19 | 200 |  8.080372669s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: Science 

[GIN] 2024/11/23 - 20:47:29 | 200 | 10.317453912s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: Whats History 

[GIN] 2024/11/23 - 20:47:50 | 200 | 20.910179533s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:48:15 | 200 | 24.421021413s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: Whats History 

[GIN] 2024/11/23 - 20:48:25 | 200 |  9.946379931s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: News 

[GIN] 2024/11/23 - 20:48:38 | 200 | 12.916983947s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: BookSky 

[GIN] 2024/11/23 - 20:48:43 | 200 |  5.700279975s |       127.0.0.1 | P

In [13]:
# Process the posts
generate_feed_assignments(post_filepath, "qwen2.5", instruction_zeroshot, result_dir_qwen, "zeroshot")

llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /root/.ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              

[GIN] 2024/11/23 - 20:50:40 | 200 |         1m24s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 20:50:47 | 200 |  7.534480349s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: Science 

[GIN] 2024/11/23 - 20:50:58 | 200 |  11.17425721s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 20:51:19 | 200 | 20.985000096s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:51:44 | 200 | 24.640348637s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: Unknown 

[GIN] 2024/11/23 - 20:51:55 | 200 | 10.669537689s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: GreenSky 

[GIN] 2024/11/23 - 20:52:07 | 200 | 12.766281984s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: Game Dev 

[GIN] 2024/11/23 - 20:52:13 | 200 |  5.254096051s |       127.0.0.1 | PO

### One-shot prompting  
We analyse the behaviour of the models under one-shot prompting. 

In [14]:
instruction_oneshot = """You are a classifier. Assign this post to exactly one of the following 11 topics based on their descriptions:
"AcademicSky": Posts related to academia, academic discussions, academic jobs, higher education, scholarships, scientific research, and university.
"Blacksky": Amplifying the voices of any and all Black users.
"BookSky": A feed for anyone who likes reading and books.
"Disability": Posts discussing disability, accessibility, disability rights, or issues related to disability.
"Game Dev": Posts about all aspects of game development.
"GreenSky": A big list of climate accounts, filtered loosely for keywords.
"News": Headlines from verified news organisations.
"Political Science": A feed for political science and international relations research and discussion.
"Science": The Science Feed. A curated feed from Bluesky professional scientists and science communicators.
"UkrainianView": Posts from Ukrainians about Ukraine and their experience during the war.
"Whats History": Posts by historians using :cardfilebox: or skystorians.

If the post does not fit any of these topics, respond with "Unknown". Your response must be only the topic name, without any additional text.

For example, the post "Seattle child abuse suspect faked death by jumping off bridge then lived in LA" should be assigned to the topic "News". """
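
The one-shot, two-shot, and few-shot instructions share the same base text and differ only in the demonstrations appended at the end. A minimal sketch of how such prompts could be assembled programmatically is shown below. `BASE_INSTRUCTION` is a shortened placeholder and `build_instruction` is a hypothetical helper; neither is part of the original notebook, which writes each prompt out by hand.

```python
# Hypothetical helper, not part of the original notebook.
# BASE_INSTRUCTION is a shortened placeholder for the full topic list.
BASE_INSTRUCTION = "You are a classifier. Assign this post to exactly one topic."

def build_instruction(base, examples):
    """Append (post, topic) demonstrations to a base instruction."""
    lines = [base]
    for i, (post, topic) in enumerate(examples):
        # The first demonstration is introduced differently from the following ones
        prefix = "For example, the post" if i == 0 else "And the post"
        lines.append(f'{prefix} "{post}" should be assigned to the topic "{topic}".')
    return "\n".join(lines)

examples = [
    ("Seattle child abuse suspect faked death by jumping off bridge then lived in LA", "News"),
]

zero_shot = build_instruction(BASE_INSTRUCTION, [])
one_shot = build_instruction(BASE_INSTRUCTION, examples)
```

Building the prompts from a single list of `(post, topic)` pairs keeps the topic names in the demonstrations consistent with the names in the topic list.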

In [15]:
# Process the posts
generate_feed_assignments(post_filepath, "llama3.1", instruction_oneshot, result_dir_llama, "oneshot")

[GIN] 2024/11/23 - 20:53:15 | 200 | 16.360414192s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 20:53:23 | 200 |  8.569213644s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:53:35 | 200 | 11.810866039s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 20:53:57 | 200 | 21.524295419s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:54:21 | 200 | 24.924159653s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: Whats History 

[GIN] 2024/11/23 - 20:54:31 | 200 |  9.249838679s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: Science 

Post 7 assigned to feed: BookSky 
[GIN] 2024/11/23 - 20:54:44 | 200 | 13.124305953s |       127.0.0.1 | POST     "/api/generate"

[GIN] 2024/11/23 - 20:54:49 | 200 |  5.186335249s |       127.0.

In [16]:
# Process the posts
generate_feed_assignments(post_filepath, "qwen2.5", instruction_oneshot, result_dir_qwen, "oneshot")

[GIN] 2024/11/23 - 20:55:37 | 200 | 15.906317696s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 20:55:44 | 200 |  7.585235676s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: Science 

[GIN] 2024/11/23 - 20:55:54 | 200 | 10.035687209s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: Whats History 

[GIN] 2024/11/23 - 20:56:15 | 200 |  21.00402596s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:56:40 | 200 | 24.491370953s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: Unknown 

[GIN] 2024/11/23 - 20:56:49 | 200 |  9.594972846s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: GreenSky 

[GIN] 2024/11/23 - 20:57:03 | 200 | 13.326167433s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: Game Dev 

[GIN] 2024/11/23 - 20:57:08 | 200 |  4.707810944s |       127.0.0.1 | PO

### Two-shot prompting  
We now analyze the models' behavior under two-shot prompting.   


In [17]:
instruction_twoshot = """You are a classifier. Assign this post to exactly one of the following 11 topics based on their descriptions:
"AcademicSky": Posts related to academia, academic discussions, academic jobs, higher education, scholarships, scientific research, and university.
"Blacksky": Amplifying the voices of any and all Black users.
"BookSky": A feed for anyone who likes reading and books.
"Disability": Posts discussing disability, accessibility, disability rights, or issues related to disability.
"Game Dev": Posts about all aspects of game development.
"GreenSky": A big list of climate accounts, filtered loosely for keywords.
"News": Headlines from verified news organisations.
"Political Science": A feed for political science and international relations research and discussion.
"Science": The Science Feed. A curated feed from Bluesky professional scientists and science communicators.
"UkrainianView": Posts from Ukrainians about Ukraine and their experience during the war.
"Whats History": Posts by historians using :cardfilebox: or skystorians.

If the post does not fit any of these topics, respond with "Unknown". Your response must be only the topic name, without any additional text.

For example, the post "Seattle child abuse suspect faked death by jumping off bridge then lived in LA" should be assigned to the topic "News". 
And the post "To be fair, they didn't go quiet. Instead, they blamed women and black people." should be assigned to the topic "Black Sky" """

In [18]:
# Process the posts
generate_feed_assignments(post_filepath, "llama3.1", instruction_twoshot, result_dir_llama, "twoshot")

[GIN] 2024/11/23 - 20:58:10 | 200 | 16.089311425s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:58:18 | 200 |  8.000571847s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: News 

[GIN] 2024/11/23 - 20:58:30 | 200 | 12.384330966s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: #UkrainianView 

[GIN] 2024/11/23 - 20:58:52 | 200 | 21.614386965s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 20:59:17 | 200 | 25.044505919s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: BookSky 

[GIN] 2024/11/23 - 20:59:26 | 200 |  9.244397478s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: Science 

[GIN] 2024/11/23 - 20:59:39 | 200 | 13.090093622s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: BookSky 

[GIN] 2024/11/23 - 20:59:46 | 200 |  6.610286745s |       127.0.0.1 | POST    

In [19]:
# Process the posts
generate_feed_assignments(post_filepath, "qwen2.5", instruction_twoshot, result_dir_qwen, "twoshot")

[GIN] 2024/11/23 - 21:01:27 | 200 |  16.52201522s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 21:01:35 | 200 |  7.597510787s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: Unknown 

[GIN] 2024/11/23 - 21:01:46 | 200 | 11.352594917s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 21:02:11 | 200 | 24.576647192s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 21:02:36 | 200 | 25.321010839s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: BookSky 

[GIN] 2024/11/23 - 21:02:46 | 200 |  9.750006053s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: GreenSky 

[GIN] 2024/11/23 - 21:02:59 | 200 | 13.376224886s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: Game Dev 

[GIN] 2024/11/23 - 21:03:04 | 200 |   4.79454198s |       127.0.0.1 | PO

### Few-shot prompting  

We now analyze the models' behavior when the prompt provides one example for each feed.

In [20]:
instruction_fewshot = """You are a classifier. Assign this post to exactly one of the following 11 topics based on their descriptions:
"AcademicSky": Posts related to academia, academic discussions, academic jobs, higher education, scholarships, scientific research, and university.
"Blacksky": Amplifying the voices of any and all Black users.
"BookSky": A feed for anyone who likes reading and books.
"Disability": Posts discussing disability, accessibility, disability rights, or issues related to disability.
"Game Dev": Posts about all aspects of game development.
"GreenSky": A big list of climate accounts, filtered loosely for keywords.
"News": Headlines from verified news organizations.
"Political Science": A feed for political science and international relations research and discussion.
"Science": The Science Feed. A curated feed from Bluesky professional scientists and science communicators.
"UkrainianView": Posts from Ukrainians about Ukraine and their experience during the war.
"Whats History": Posts by historians using :cardfilebox: or skystorians.

If the post does not fit any of these topics, respond with "Unknown". 

Examples of posts assigned to a particular topic:
- Academic Sky: "A colleague suggested not doing group projects bc students can't align their schedules & they're too busy. I already have them work in small groups for activities & I cant give up more class time for group projects during class. Thoughts?"
- Black Sky: "It's tiring cuz Black people are giving so much politically & always have. Those invested in electoralism continue hold up the democratic party, & still get treated as its mules. As radicals fighting US imperialism, Black Americans are at the forefront on Haiti, Palestine, now Cuba"
- Book Sky: "Very much recommend #thehemaneffect by Brian Brown, changed my Perfektion of the (capitalist) world for good. "
- Disability: "Need every Opposition party to come out loud long and strong on this because attacking disability funding is pushing the disabled out of public life."
- Game Dev: "I tried playing a couple of those games once but I kept getting frustrated that they wanted you to dress the character a specific way to get the points to go on. Like I care about what I wanted fashion not what the game devs wanted"
- Green Sky: "I don't think that's true -- offshore wind in NY has now hit the price of the troubled Vogtle nuclear power plant. Full scale seasonal storage doesn't exist yet. And the Royal Society estimate of supply for the UK (a hard case) is 1 kW per capita"
- News: "NASA is marking the first anniversary of the James Webb Space Telescope\u2019s scientific debut with the release of a new image, demonstrating the telescope\u2019s ability to re-envision the universe.\n\n\ud83c\udf0c See more discoveries from the telescope: https://wapo.st/3XQTIWc"
- Political Science: "Looking for recent studies on the use of palate cleansers as a distraction after pre-tests in social science experiments. Wading through a whole bunch of search results on taste tests & tongues. Hoping someone here might be able to more efficiently point me in the right direction"
- Science: "More shots fired in the seaweed-carbon sequestration debate.\n\n\"Without sound science and sufficient knowledge on impacts to these fragile ecosystems, it distracts from more rational and effective blue-carbon interventions."
- Ukrainian View: "I have a memory from my childhood (~2010) when a russian on the Internet said to me:\n\"You don't know exactly who will win elections in your country, but we know who will in ours for sure. It's simple, it's always been like that"
- What's History: "The sculpture park is the third site created by the Equal Justice Initiative in Montgomery, Ala., which is dedicated to taking an unflinching look at the country\u2019s history of slavery, racism and discriminatory policing.\" \ud83d\uddc3\ufe0f www.latimes.com/world-nation..."

Your response must be only the topic name, without any additional text."""

In [21]:
# Process the posts
generate_feed_assignments(post_filepath, "llama3.1", instruction_fewshot, result_dir_llama, "fewshot")

[GIN] 2024/11/23 - 21:07:41 | 200 |         3m43s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: Ukrainian View 

[GIN] 2024/11/23 - 21:07:50 | 200 |  8.898971425s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: Science 

[GIN] 2024/11/23 - 21:08:03 | 200 | 12.813376149s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: Ukrainian View 

[GIN] 2024/11/23 - 21:08:26 | 200 | 23.412686288s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 21:08:53 | 200 | 26.721659025s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: Whats History 

[GIN] 2024/11/23 - 21:09:03 | 200 |  10.34196868s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: Science 

[GIN] 2024/11/23 - 21:09:18 | 200 | 14.148148092s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: Game Dev 

[GIN] 2024/11/23 - 21:09:23 | 200 |  5.667064616s |       127.0.0

In [22]:
# Process the posts
generate_feed_assignments(post_filepath, "qwen2.5", instruction_fewshot, result_dir_qwen, "fewshot")

time=2024-11-23T21:09:57.640Z level=INFO source=server.go:105 msg="system memory" total="31.4 GiB" free="24.2 GiB" free_swap="0 B"
time=2024-11-23T21:09:57.645Z level=INFO source=memory.go:343 msg="offload to cpu" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[24.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="448.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="730.4 MiB"
time=2024-11-23T21:09:57.667Z level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama1891697421/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --ctx-size 8192 --batch-size 512 --threads 2 --no-mmap --parallel 4 --port 36137"
time=2024-11-23T2

[GIN] 2024/11/23 - 21:14:38 | 200 |         4m40s |       127.0.0.1 | POST     "/api/generate"
Post 1 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 21:14:46 | 200 |  8.067849962s |       127.0.0.1 | POST     "/api/generate"
Post 2 assigned to feed: Science 

[GIN] 2024/11/23 - 21:14:59 | 200 | 12.988247123s |       127.0.0.1 | POST     "/api/generate"
Post 3 assigned to feed: UkrainianView 

[GIN] 2024/11/23 - 21:15:21 | 200 |   22.5158324s |       127.0.0.1 | POST     "/api/generate"
Post 4 assigned to feed: AcademicSky 

[GIN] 2024/11/23 - 21:15:47 | 200 | 26.054994614s |       127.0.0.1 | POST     "/api/generate"
Post 5 assigned to feed: Unknown 

[GIN] 2024/11/23 - 21:15:58 | 200 | 10.831200168s |       127.0.0.1 | POST     "/api/generate"
Post 6 assigned to feed: GreenSky 

[GIN] 2024/11/23 - 21:16:11 | 200 | 13.234506655s |       127.0.0.1 | POST     "/api/generate"
Post 7 assigned to feed: Unknown 

[GIN] 2024/11/23 - 21:16:17 | 200 |  5.158276719s |       127.0.0.1 | POS

## Model comparison

We now compare the classifications assigned by LLaMA 3.1 and Qwen 2.5.

The analysis covers both the number of posts on which the two models agree or disagree and additional metrics (accuracy, precision, recall, and F1-score), to give a complete picture of each model's performance.

**Input and output summary**:  \
**Input**: files with the classifications produced by each model and each prompting technique, contained in the `feed_classification_results` dataset.   \
**Output**: the results are displayed in tables.

We define a set of helper functions for the comparison: 
- Function that returns the posts on which the two models agree and those on which they disagree

In [23]:
def confronta_classificazioni(file_lama, file_qwen):

    with open(file_lama, 'r') as f:
        lama_data = json.load(f)
    with open(file_qwen, 'r') as f:
        qwen_data = json.load(f)
    
    df_lama = pd.DataFrame(lama_data)
    df_qwen = pd.DataFrame(qwen_data)
        
    # Merge on post_id (inner join: only posts present in both files are kept)
    df = pd.merge(df_lama, df_qwen, on="post_id", suffixes=('_lama', '_qwen'))

    df['Concordanza'] = df['feed_assignment_lama'] == df['feed_assignment_qwen']
    
    # Split the posts into agreeing and disagreeing
    post_concordi = df[df['Concordanza']]
    post_discordi = df[~df['Concordanza']]
    
    return post_concordi, post_discordi
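
The merge step can be illustrated on a few invented records. This is a sketch on toy data, assuming the result files hold a list of objects with `post_id` and `feed_assignment` fields, as the function above implies; the records themselves are made up.

```python
import pandas as pd

# Toy records, invented for illustration; field names follow the function above
llama_rows = [
    {"post_id": 1, "feed_assignment": "News"},
    {"post_id": 2, "feed_assignment": "Science"},
    {"post_id": 3, "feed_assignment": "BookSky"},
]
qwen_rows = [
    {"post_id": 1, "feed_assignment": "News"},
    {"post_id": 2, "feed_assignment": "GreenSky"},
    {"post_id": 4, "feed_assignment": "BookSky"},
]

# Same merge as in confronta_classificazioni (default how="inner")
merged = pd.merge(
    pd.DataFrame(llama_rows),
    pd.DataFrame(qwen_rows),
    on="post_id",
    suffixes=("_lama", "_qwen"),
)

agree = merged["feed_assignment_lama"] == merged["feed_assignment_qwen"]
n_agree = int(agree.sum())
n_disagree = int((~agree).sum())
```

Posts 3 and 4 appear in only one of the two toy files, so the inner join drops them and they count as neither agreeing nor disagreeing. If the two result files were ever to cover different posts, the totals reported by `summary` would be smaller than the dataset size.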


- Function that computes the models' agreement percentage for each topic (topics are grouped by the label assigned by LLaMA).

In [24]:
def concordanza_per_topic(post_concordi, post_discordi):
    
    post = pd.concat([post_concordi, post_discordi])
    
    # Total number of posts per topic (topic = label assigned by LLaMA)
    post_per_topic = post['feed_assignment_lama'].value_counts()

    # Number of agreeing posts per topic
    concordi_per_topic = post_concordi['feed_assignment_lama'].value_counts()
    
    percentuale_concordanza = pd.DataFrame({
        "Topic": concordi_per_topic.index,
        "Post Concordi": concordi_per_topic.values,
        "Totale Post": post_per_topic[concordi_per_topic.index].values
    })
    percentuale_concordanza["Percentuale Concordanza"] = (
        percentuale_concordanza["Post Concordi"] / percentuale_concordanza["Totale Post"] * 100
    )

    percentuale_concordanza = percentuale_concordanza.sort_values(by="Percentuale Concordanza", ascending=False)
    
    return percentuale_concordanza
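
The percentage computed above can be reproduced by hand on a few invented label pairs. The sketch below uses only the standard library; the pairs are toy data, not results from the notebook.

```python
from collections import Counter

# Toy (llama_label, qwen_label) pairs, invented for illustration
pairs = [
    ("News", "News"),
    ("News", "Science"),
    ("News", "Unknown"),
    ("Science", "Science"),
]

# Denominator: posts per topic according to LLaMA's assignment
total = Counter(llama for llama, _ in pairs)
# Numerator: posts where Qwen gave the same label
agreed = Counter(llama for llama, qwen in pairs if llama == qwen)

pct = {topic: 100 * agreed[topic] / total[topic] for topic in total}
```

Two properties of `concordanza_per_topic` follow from this: the percentage reads as "share of the posts LLaMA assigned to a topic on which Qwen agreed", and, because the table is built from the index of `concordi_per_topic`, a topic with no agreeing post does not appear in the output at all.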


- Functions for computing the evaluation metrics

In [25]:
def get_ground_truth_vector(ground_truth_file):
    ground_truth = []
    post_ids = []

    with open(ground_truth_file, 'r') as file:
        for line in file:
            post = json.loads(line)
            post_ids.append(post['post_id'])
            ground_truth.append(post['feed'])
    
    # Sort the vector by post id
    sorted_indices = sorted(range(len(post_ids)), key=lambda i: post_ids[i])
    ground_truth_sorted = [ground_truth[i] for i in sorted_indices]
    
    return ground_truth_sorted



def get_predictions_vector(predictions_file):
    predictions = []
    post_ids = []
    
    with open(predictions_file, 'r') as file:
        predictions_data = json.load(file)

    for entry in predictions_data:
        post_ids.append(entry['post_id'])
        predictions.append(entry['feed_assignment'])
    
    # Sort the vector by post id
    sorted_indices = sorted(range(len(post_ids)), key=lambda i: post_ids[i])
    predictions_sorted = [predictions[i] for i in sorted_indices]
    
    return predictions_sorted
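
The two functions above sort the ground truth and the predictions independently, so the vectors line up only if both files contain exactly the same post ids. A minimal sketch of an explicit alignment check follows; `aligned_vectors` is a hypothetical helper that is not part of the notebook, and the ids and labels are invented.

```python
def aligned_vectors(truth_by_id, pred_by_id):
    """Return (y_true, y_pred, missing): labels ordered by post id and
    restricted to shared ids, plus the ids found in only one mapping."""
    shared = sorted(set(truth_by_id) & set(pred_by_id))
    missing = sorted(set(truth_by_id) ^ set(pred_by_id))
    y_true = [truth_by_id[i] for i in shared]
    y_pred = [pred_by_id[i] for i in shared]
    return y_true, y_pred, missing

# Toy data, invented for illustration
truth = {"a": "News", "b": "Science", "c": "BookSky"}
preds = {"b": "Science", "a": "Unknown", "d": "News"}

y_true, y_pred, missing = aligned_vectors(truth, preds)
```

A non-empty `missing` list signals that metrics computed on independently sorted vectors would compare labels of different posts.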

- Functions for printing the results as tables

In [26]:
!pip install rich



In [27]:
from rich.console import Console
from rich.table import Table

def summary(post_concordi, post_discordi, titolo):
    totale = len(post_concordi) + len(post_discordi)
    
    table = Table(title="Confronto Classificazioni "+titolo)
    table.add_column("Tipologia", style="black", justify="left")
    table.add_column("Conteggio", style="cyan", justify="center")
    
    table.add_row("Post Concordi", str(len(post_concordi)))
    table.add_row("Post Discordi", str(len(post_discordi)))
    table.add_row("Totale Post", str(totale))
    
    console = Console()
    console.print(table)
    
def summary_per_topic(percentuale_concordanza, titolo):
        
    table = Table(title="Percentuale di Concordanza per Topic "+titolo)
    
    table.add_column("Topic", style="blue", justify="left")
    table.add_column("Post Concordi", style="blue", justify="right")
    table.add_column("Totale Post", style="blue", justify="right")
    table.add_column("Percentuale Concordanza (%)", style="cyan", justify="right")
    
    for _, row in percentuale_concordanza.iterrows():
        table.add_row(
            row["Topic"],
            str(row["Post Concordi"]),
            str(row["Totale Post"]),
            f"{row['Percentuale Concordanza']:.2f}"  # Round to two decimals
        )
    
    console = Console()
    console.print(table)

In [28]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def classification_report(ground_truth, predictions, modello, prompt_type):
    
    # Compute the metrics (weighted average over labels; zero_division=0
    # silences warnings for labels with no predicted or no true samples)
    accuracy = accuracy_score(ground_truth, predictions)
    precision = precision_score(ground_truth, predictions, average='weighted', zero_division=0)
    recall = recall_score(ground_truth, predictions, average='weighted', zero_division=0)
    f1 = f1_score(ground_truth, predictions, average='weighted', zero_division=0)
    
    # Build the table
    table = Table(title="Performance del modello " + modello + " - "+ prompt_type)
    table.add_column("Metrica", style="black", justify="left")
    table.add_column("Valore", style="cyan", justify="center")
    
    table.add_row("Accuracy", f"{accuracy:.3f}")
    table.add_row("Precision", f"{precision:.3f}")
    table.add_row("Recall", f"{recall:.3f}")
    table.add_row("F1-Score", f"{f1:.3f}")
    
    # Print the table
    console = Console()
    console.print(table)
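
To make the `average='weighted'` setting concrete, the sketch below recomputes weighted precision and recall from scratch on invented labels. It is a standard-library illustration of the averaging scheme (per-label scores weighted by the number of true samples of that label), not a replacement for the scikit-learn calls above; `weighted_precision_recall` is a hypothetical name.

```python
from collections import Counter

def weighted_precision_recall(y_true, y_pred):
    """Support-weighted precision and recall for single-label classification."""
    support = Counter(y_true)      # true samples per label
    predicted = Counter(y_pred)    # predicted samples per label
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    # Labels that never occur in y_true have zero support, hence zero weight
    precision = sum(
        support[label] * (hits[label] / predicted[label] if predicted[label] else 0.0)
        for label in support
    ) / n
    recall = sum(support[label] * (hits[label] / support[label]) for label in support) / n
    return precision, recall

# Toy labels, invented for illustration
y_true = ["News", "News", "Science", "Science"]
y_pred = ["News", "Unknown", "Science", "News"]

precision, recall = weighted_precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With support weighting, recall reduces to the total number of correct predictions divided by the number of posts, so it coincides with accuracy. An "Unknown" prediction lowers the scores of the true label without adding a weighted term of its own.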


We obtain the vector containing the ground truth.

In [29]:
feed_post_file = '/kaggle/input/feed-post-for-validation/feed_posts.jsonl'
ground_truth = get_ground_truth_vector(feed_post_file)

### Comparison of the results obtained with zero-shot prompting

In [30]:
file_lama = '/kaggle/input/feed-classification-results/feed_classification_results/llama3.1/llama3.1_zeroshot.json'
file_qwen = '/kaggle/input/feed-classification-results/feed_classification_results/qwen2.5/qwen2.5_zeroshot.json'

post_concordi, post_discordi = confronta_classificazioni(file_lama, file_qwen)

# Print the results
summary(post_concordi,post_discordi,"Zeroshot")

percentuale_concordanza = concordanza_per_topic(post_concordi, post_discordi)
summary_per_topic(percentuale_concordanza, "Zeroshot")

The models show a good level of agreement, assigning posts to the same feed in most cases. The agreement percentage is also high across the individual feeds. The exception is the News feed, where it drops below 50%.

We also display the evaluation metrics for the two models.

In [31]:
predictions_llama = get_predictions_vector(file_lama)
predictions_qwen = get_predictions_vector(file_qwen)

# Performance llama3.1
classification_report(ground_truth, predictions_llama, "llama3.1", "Zeroshot")

# Performance qwen2.5
classification_report(ground_truth, predictions_qwen, "qwen2.5", "Zeroshot")

Overall, the models perform well and show similar metrics. Qwen 2.5 has higher precision, indicating fewer false positives. Llama 3.1 has a slightly better F1-score, indicating a better balance between precision and recall.

### Comparison of the results obtained with one-shot prompting 

In [32]:
file_lama = '/kaggle/input/feed-classification-results/feed_classification_results/llama3.1/llama3.1_oneshot.json'
file_qwen = '/kaggle/input/feed-classification-results/feed_classification_results/qwen2.5/qwen2.5_oneshot.json'

post_concordi, post_discordi = confronta_classificazioni(file_lama, file_qwen)

# Print the results
summary(post_concordi,post_discordi,"Oneshot")

percentuale_concordanza = concordanza_per_topic(post_concordi, post_discordi)
summary_per_topic(percentuale_concordanza, "Oneshot")

The number of posts on which the models agree is still larger than the number on which they disagree, although the gap has narrowed. The per-feed agreement percentages are also lower overall, with 5 feeds now below 50%. In addition, an error emerged: the models assigned a post to "PublicHealth", a feed that is not among those proposed.

In [33]:
predictions_llama = get_predictions_vector(file_lama)
predictions_qwen = get_predictions_vector(file_qwen)

# Performance llama3.1
classification_report(ground_truth, predictions_llama, "llama3.1", "Oneshot")

# Performance qwen2.5
classification_report(ground_truth, predictions_qwen, "qwen2.5", "Oneshot")

The evaluation metrics are worse than in the previous case. Llama 3.1 scores lower than Qwen 2.5, with values around 58% across the metrics, although its precision remains relatively good. Qwen 2.5 still performs well, but its values are lower overall than with zero-shot prompting.

### Comparison of the results obtained with two-shot prompting 

In [34]:
file_lama = '/kaggle/input/feed-classification-results/feed_classification_results/llama3.1/llama3.1_twoshot.json'
file_qwen = '/kaggle/input/feed-classification-results/feed_classification_results/qwen2.5/qwen2.5_twoshot.json'

post_concordi, post_discordi = confronta_classificazioni(file_lama, file_qwen)

# Print the results
summary(post_concordi,post_discordi,"Twoshot")

percentuale_concordanza = concordanza_per_topic(post_concordi, post_discordi)
summary_per_topic(percentuale_concordanza, "Twoshot")

The results are similar to the previous ones: the models still agree on more posts than they disagree on, with counts that remain essentially unchanged. The per-feed agreement percentages are rather low, and a growing number of feeds show very low values. As in the previous cases, the feeds with the highest agreement are UkrainianView and Game Dev, while those with the lowest remain Blacksky and News. 

In [35]:
predictions_llama = get_predictions_vector(file_lama)
predictions_qwen = get_predictions_vector(file_qwen)

# Performance llama3.1
classification_report(ground_truth, predictions_llama, "llama3.1", "Twoshot")

# Performance qwen2.5
classification_report(ground_truth, predictions_qwen, "qwen2.5", "Twoshot")

The metrics show a further drop in the performance of Llama, which is now around 50%. Qwen performs better than Llama, but its results are still lower than those obtained previously.

### Comparison of the results obtained with few-shot prompting 

In [36]:
file_lama = '/kaggle/input/feed-classification-results/feed_classification_results/llama3.1/llama3.1_fewshot.json'
file_qwen = '/kaggle/input/feed-classification-results/feed_classification_results/qwen2.5/qwen2.5_fewshot.json'

post_concordi, post_discordi = confronta_classificazioni(file_lama, file_qwen)

# Print the results
summary(post_concordi,post_discordi,"Fewshot")

percentuale_concordanza = concordanza_per_topic(post_concordi, post_discordi)
summary_per_topic(percentuale_concordanza, "Fewshot")

Although the number of agreeing posts is still larger than the number of disagreeing ones, the level of agreement between the models is even lower than in the previous cases. The per-feed agreement percentages are generally better than in the two-shot and one-shot cases, but the models still make some errors, classifying posts into feeds that are not among those proposed, such as "PublicHealth" and "What's History".
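
Some of the out-of-list labels are formatting variants of valid topic names: the logs above contain outputs such as "#UkrainianView" and "Ukrainian View", and the few-shot examples themselves use "What's History" where the topic list says "Whats History". A minimal sketch of a normalization step is given below. `normalize_label` is a hypothetical helper that the notebook does not apply, and mapping every unrecognized label to "Unknown" is an assumption that would change the reported metrics.

```python
import re

# Canonical topic names, copied from the instruction prompts
CANONICAL = [
    "AcademicSky", "Blacksky", "BookSky", "Disability", "Game Dev", "GreenSky",
    "News", "Political Science", "Science", "UkrainianView", "Whats History",
]

def _key(label):
    # Lowercase and drop everything except letters and digits
    return re.sub(r"[^a-z0-9]", "", label.lower())

LOOKUP = {_key(name): name for name in CANONICAL}

def normalize_label(raw):
    """Map a raw model output to a canonical topic, or "Unknown" (assumption)."""
    return LOOKUP.get(_key(raw), "Unknown")

normalized = [normalize_label(x) for x in ["#UkrainianView", "Ukrainian View ", "What's History", "PublicHealth"]]
```

Labels such as "PublicHealth" have no canonical counterpart and remain genuine model errors after normalization.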

In [37]:
predictions_llama = get_predictions_vector(file_lama)
predictions_qwen = get_predictions_vector(file_qwen)

# Performance llama3.1
classification_report(ground_truth, predictions_llama, "llama3.1", "Fewshot")

# Performance qwen2.5
classification_report(ground_truth, predictions_qwen, "qwen2.5", "Fewshot")

The metrics show a slight improvement, with Qwen 2.5 still outperforming Llama 3.1. 


Overall, the best results are obtained with *zero-shot prompting*. Moreover, providing more examples in the prompt does not appear to help the models classify the posts.
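
The four comparison sections repeat the same steps with different file paths, and those paths follow a regular pattern. A minimal sketch of how they could be generated for a single loop over models and prompting techniques is shown below; `result_path` is a hypothetical helper, while the base directory and file naming are copied from the cells above.

```python
from pathlib import PurePosixPath

# Root of the results dataset, copied from the paths used in the cells above
BASE = PurePosixPath("/kaggle/input/feed-classification-results/feed_classification_results")
MODELS = ["llama3.1", "qwen2.5"]
PROMPT_TYPES = ["zeroshot", "oneshot", "twoshot", "fewshot"]

def result_path(model, prompt_type):
    """Hypothetical helper: path of the result file for one model and prompt type."""
    return str(BASE / model / f"{model}_{prompt_type}.json")

# One entry per (model, prompt type) combination
all_paths = {(m, p): result_path(m, p) for m in MODELS for p in PROMPT_TYPES}
```

Each path in `all_paths` can then be passed to `get_predictions_vector` and `classification_report` to produce all tables in one pass.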