# 🚀 Execution Flow for Training Qwen on Colab

**NOTE**: This notebook is configured for a public GitHub repository.
Make sure the repository `Attapulgite999/prove` is public and contains all the required files.

### 0. **Check the Colab Connection**

In [1]:
# Check internet connectivity (prints the HTTP status code)
!curl -s -o /dev/null -w "%{http_code}" https://www.google.com

# Check the GPU (without importing torch, to avoid conflicts)
!nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits

# Check RAM
!free -h

# Check the number of CPU cores
!nproc

200Tesla T4, 15360
               total        used        free      shared  buff/cache   available
Mem:            12Gi       792Mi       8.4Gi       1.0Mi       3.5Gi        11Gi
Swap:             0B          0B          0B
2
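In the output above, the leading `200` is the HTTP status code from `curl`, printed without a trailing newline, so it runs into the GPU line `Tesla T4, 15360`. If you want to act on the GPU check rather than read it by eye, the following is a minimal sketch of parsing one `name, memory.total` line. The helper names and the 12000 MiB threshold are hypothetical and are not part of the notebook or the repository.

```python
def parse_gpu_line(line: str) -> tuple[str, int]:
    """Split one 'name, memory.total' CSV line into (name, memory in MiB)."""
    # rsplit keeps any commas inside the GPU name intact.
    name, memory = line.rsplit(",", 1)
    return name.strip(), int(memory.strip())


def has_enough_vram(line: str, min_mib: int) -> bool:
    """Return True if the GPU described by the line has at least min_mib MiB."""
    _, total_mib = parse_gpu_line(line)
    return total_mib >= min_mib


# Example with the line reported by nvidia-smi in this notebook.
name, total_mib = parse_gpu_line("Tesla T4, 15360")
print(name, total_mib)
# The 12000 MiB threshold is an illustrative assumption, not a project requirement.
print(has_enough_vram("Tesla T4, 15360", min_mib=12000))
```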


### 1. **Prepare the Colab Environment**

In [None]:
# Mount Google Drive (REQUIRED if you want to save the trained model and the GGUF file)
from google.colab import drive
drive.mount('/content/drive')

# Clone the public repository (removing any previous copy first)
!rm -rf prove

!git clone https://github.com/Attapulgite999/prove.git
%cd prove

### 2. **Install Base Dependencies**

In [4]:
# Update apt and install python3 and pip (usually already present in Colab, but it does no harm)
!apt-get update && apt-get install -y python3 python3-pip

# Install the main dependencies (versions compatible with Axolotl),
# starting by removing any preinstalled unsloth packages
!pip uninstall -y unsloth unsloth-zoo

# Install PyTorch 2.6.0 with its matching companion packages for Axolotl
# (torch 2.6.0 pairs with torchvision 0.21.0 and torchaudio 2.6.0)
!pip uninstall -y torch torchvision torchaudio triton xformers
!pip install --index-url https://download.pytorch.org/whl/cu126 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --no-cache-dir

# Install Axolotl (fine-tuning framework)
!pip install --no-deps "axolotl[deepspeed]==0.12.2"
# Install timm (missing dependency for accelerate)
!pip install timm

# Install additional dependencies for Axolotl
!pip install transformers==4.55.2 datasets==4.0.0 accelerate==1.10.0 peft==0.17.0 trl==0.21.0


Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:2 https://cli.github.com/packages stable InRelease [3,917 B]
Get:3 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]      
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease                         
Get:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]           
Get:7 https://cli.github.com/packages stable/main amd64 Packages [343 B]       
Get:8 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]        
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [2,143 kB]
Get:10 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [9,458 kB] 
Get:11 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]     
Hit:12 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease  
Get:13
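Because `torch`, `torchvision`, and `torchaudio` are released in matched sets, it can help to check a set of pins before running `pip install`. The sketch below hard-codes three release pairings that I am confident about; the table and the `mismatched_pins` helper are illustrative assumptions, not something provided by pip, PyTorch, or Axolotl.

```python
# Matching releases for a few recent PyTorch versions (illustrative, not exhaustive).
COMPATIBLE = {
    "2.6.0": {"torchvision": "0.21.0", "torchaudio": "2.6.0"},
    "2.7.0": {"torchvision": "0.22.0", "torchaudio": "2.7.0"},
    "2.8.0": {"torchvision": "0.23.0", "torchaudio": "2.8.0"},
}


def mismatched_pins(pins: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {package: (pinned, expected)} for pins that do not match torch."""
    torch_version = pins["torch"]
    if torch_version not in COMPATIBLE:
        raise KeyError(f"No compatibility entry for torch {torch_version}")
    expected = COMPATIBLE[torch_version]
    return {
        pkg: (pins[pkg], want)
        for pkg, want in expected.items()
        if pkg in pins and pins[pkg] != want
    }


# torch 2.6.0 combined with the torch 2.8.0 companions is flagged.
print(mismatched_pins(
    {"torch": "2.6.0", "torchvision": "0.23.0", "torchaudio": "2.8.0"}
))
```

An empty dictionary means the pins are consistent with the table; anything else lists the pinned version next to the one the table expects.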

### 3. **Run the Training Script**

In [5]:
# Run the training with Axolotl. The model will be saved in the axolotl_training/outputs/ directory
!accelerate launch -m axolotl.cli.train axolotl_training/config/qwen_axolotl.yaml

Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/usr/local/lib/python3.12/dist-packages/accelerate/commands/accelerate_cli.py", line 19, in <module>
    from accelerate.commands.estimate import estimate_command_parser
  File "/usr/local/lib/python3.12/dist-packages/accelerate/commands/estimate.py", line 35, in <module>
    import timm
  File "/usr/local/lib/python3.12/dist-packages/timm/__init__.py", line 2, in <module>
    from .layers import (
  File "/usr/local/lib/python3.12/dist-packages/timm/layers/__init__.py", line 23, in <module>
    from .classifier import create_classifier, ClassifierHead, NormMlpClassifierHead, ClNormMlpClassifierHead
  File "/usr/local/lib/python3.12/dist-packages/timm/layers/classifier.py", line 15, in <module>
    from .create_norm import get_norm_layer
  File "/usr/local/lib/python3.12/dist-packages/timm/layers/create_norm.py", line 29, in <module>
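The traceback above is cut off before its final error line, but it shows the failure happening while `accelerate` imports `timm`, before any training code runs. One way to surface this kind of problem earlier is to try importing each package in a separate cell before launching. The following is a minimal sketch; `failing_imports` is a hypothetical helper, and the module list is just the set of packages named in this notebook.

```python
import importlib


def failing_imports(modules):
    """Return {module_name: error message} for every module that fails to import."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, plus errors raised during import
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Packages involved in the import chain shown in the traceback.
print(failing_imports(["torch", "torchvision", "timm", "accelerate"]))
```

An empty dictionary means every module imported cleanly; otherwise the first failing entry usually points at the package whose version needs attention.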

### 4. **Download the GGUF File (to your local computer)**

In [None]:
# The trained LoRA model is saved in the axolotl_training/outputs/medical_qwen_axolotl/ directory
# You can download it from the Colab environment to your local computer with the commands below:
from google.colab import files

# Download the LoRA model (entire directory)
!zip -r medical_qwen_axolotl.zip axolotl_training/outputs/medical_qwen_axolotl/
files.download('medical_qwen_axolotl.zip')
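Browser downloads from Colab can be slow or can fail for large archives, so an alternative is to write the archive directly to the mounted Drive. The sketch below uses only the standard library; `archive_outputs` is a hypothetical helper, and the `qwen_backups` folder in the usage comment is an assumed name, not something the repository creates.

```python
import shutil
from pathlib import Path


def archive_outputs(output_dir, dest_dir, name="medical_qwen_axolotl"):
    """Zip output_dir into dest_dir/<name>.zip and return the archive path."""
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        raise FileNotFoundError(f"Output directory not found: {output_dir}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # root_dir/base_dir keep the top-level folder name inside the archive.
    archive = shutil.make_archive(
        str(dest_dir / name),
        "zip",
        root_dir=output_dir.parent,
        base_dir=output_dir.name,
    )
    return Path(archive)


# Usage in Colab after drive.mount('/content/drive') (destination folder is an assumption):
# archive_outputs("axolotl_training/outputs/medical_qwen_axolotl",
#                 "/content/drive/MyDrive/qwen_backups")
```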

## 🔧 Improvements Implemented in the `setup_and_train_colab.py` Script

- **Automatic retry**: 3 attempts for loading the tokenizer and for training.
- **Model pre-check**: Verifies that the model is available before training.
- **Improved error handling**: Detailed logs for easier troubleshooting.
- **Resource monitoring**: Tracks GPU memory usage and utilization.
- **Keep-alive**: Prevents automatic Colab disconnections.
- **Resume from checkpoint**: Resumes training from the last checkpoint after an interruption.
- **Automatic dataset download**: Downloads the medical dataset from Hugging Face if it is not present locally.
- **LoRA merge and GGUF conversion**: Merges the LoRA adapters into the base model and converts the result to GGUF format for LM Studio.
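The script itself is not shown in this notebook, so the following is not its code. It is a minimal sketch of what a three-attempt retry like the one in the list above could look like. The exponential backoff and the `retry` helper name are assumptions; the source only states that there are 3 attempts.

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt}/{attempts} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)


# Example: a stand-in for a flaky loader that succeeds on the third call.
calls = {"count": 0}

def flaky_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network error")
    return "tokenizer loaded"

print(retry(flaky_load, attempts=3, base_delay=0.0))
```

Wrapping the tokenizer load or the training call in a function and passing it to `retry` keeps the retry policy in one place.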

In [None]:
!ls -lah axolotl_training/outputs/medical_qwen_axolotl/
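Listing the output directory is also a quick way to see which checkpoints exist before resuming. The sketch below picks the most recent one programmatically, assuming checkpoints are saved as `checkpoint-<step>` subdirectories (the naming used by the Hugging Face `Trainer`). Whether this run's output directory follows that layout is an assumption, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = pattern.match(child.name)
        # Compare steps numerically so checkpoint-100 beats checkpoint-50.
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


print(latest_checkpoint("axolotl_training/outputs/medical_qwen_axolotl"))
```

The function returns `None` when the directory is missing or holds no checkpoints, which is the case where training has to start from scratch.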


In [None]:
import torch, torchvision, torchaudio
print(torch.__version__, torchvision.__version__, torchaudio.__version__)
print(torch.cuda.is_available(), torch.version.cuda)
print(torch.cuda.get_device_name(0))
x = torch.randn(1024, 1024, device='cuda')
y = torch.mm(x, x)
print(y.shape)
