Loading libraries and models, defining functions


In [None]:
import torch
from faster_whisper import WhisperModel
import requests
import os
import edge_tts
import asyncio

# Parameters
audio_file_path = "Rap_battle.wav"  # Replace with the actual path to your file
LM_STUDIO_API_URL = "http://127.0.0.1:1234/v1/chat/completions"
model_size = "tiny"

# Load faster-whisper
whisper_model = WhisperModel(model_size, device="cpu", compute_type="int8")

# %%
def transcribe_audio(audio_path):
    """Transcribe an audio file to text with Whisper."""
    print("transcription")
    segments, _ = whisper_model.transcribe(audio_path)
    return " ".join(segment.text for segment in segments)

def send_to_llm(text):
    """Send the text to the LM Studio model and return its reply."""
    print("génération de la réponse")
    response = requests.post(
        LM_STUDIO_API_URL,
        json={
            "model": "lmstudio-community/gemma-3-1B-it-qat-GGUF",
            "messages": [
                {"role": "user", "content": "You're a talented robot rapper who has to respond to the given rap with style. You must respond with a 10-line text, keep a provocative style with the same theme. Don't repeat the original lyrics, but find punchlines related to what the rapper said and to the fact that you are an AI\n\n" + text},
            ],
            "temperature": 0.6
        }
    )
    # Fail loudly on HTTP errors instead of parsing an error body.
    response.raise_for_status()
    # `or [{}]` also covers an empty "choices" list, which would otherwise raise IndexError.
    choices = response.json().get("choices") or [{}]
    return choices[0].get("message", {}).get("content", "Erreur de réponse du LLM")

async def text_to_speech(text, output_file, voice="en-US-GuyNeural"):
    """Convert text to audio with Edge-TTS."""
    print("prononciation")
    tts = edge_tts.Communicate(text, voice)
    await tts.save(output_file)
    print(f"Fichier audio enregistré sous : {output_file}")
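
An OpenAI-style chat completion response nests the generated text under `choices[0].message.content`, which is the path `send_to_llm` walks. The sketch below isolates that lookup in a standalone helper so it can be checked without a running server. `extract_reply` and its `default` parameter are hypothetical names introduced here, and the sample payload is made up for illustration.

```python
def extract_reply(payload, default="Erreur de réponse du LLM"):
    """Return choices[0].message.content from a chat completion payload."""
    choices = payload.get("choices") or []
    if not choices:
        return default
    message = choices[0].get("message") or {}
    content = message.get("content")
    # Strip surrounding whitespace; fall back if the content is missing.
    return content.strip() if isinstance(content, str) else default


# Made-up payload shaped like a chat completion response.
sample = {"choices": [{"message": {"role": "assistant", "content": " Aye, I'm the code"}}]}
print(extract_reply(sample))
print(extract_reply({"choices": []}))
```

Keeping the lookup separate from the HTTP call makes the fallback behaviour easy to verify on hand-written dictionaries.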


Step 1: speech to text

In [19]:
txt = transcribe_audio(audio_file_path)
print(txt)

transcription
 Yeah.  Flowless. Never did I doubt him today told me that it's a bad one. I want you, but I had you.  Came up in this bitch play to baby like a bad one. You know I got to wish you mother, never had you.  Know you ain't get city this year. Whoever doubted the movement is going regretted this year.  I ain't feeling your brother dain and send him in here. Send him cool to million bucks.  Then he sent him in here. You got set up so getting your call, watch you drop it off the road.  Big mouth open here. I get louder for the low. Young girl trying to get it.  I'm scared to put it go. I ain't stopping till I get rain. No houses on the road.  You don't catch money. You get ounces for a show.  Different sweet news in a mountain school show. Nobody want to hear you.  Fuck your shouting out for a fall from ever.  It's why I got it down now, Joe. I know you felt it coming. I get it from the start.  You probably shake this stomach and dig a way you fought.  And you know what they sa
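
The transcript above starts with a space and contains doubled spaces, apparently because each segment's text already begins with a space before `transcribe_audio` joins the segments with another one. A minimal sketch of one way to normalise this, assuming the segment texts are plain strings; `join_segments` is a hypothetical helper, not part of faster-whisper.

```python
def join_segments(segment_texts):
    """Join segment texts with single spaces, dropping empty pieces."""
    # Strip each piece first so leading spaces do not double up when joining.
    cleaned = (text.strip() for text in segment_texts)
    return " ".join(piece for piece in cleaned if piece)


# Made-up strings shaped like the segments behind the transcript above.
print(join_segments([" Yeah.", " Flowless.", "  ", " Never did I doubt him"]))
```

Inside `transcribe_audio`, the equivalent call would pass `segment.text for segment in segments` to the helper.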

Step 2: generating the response

In [20]:
rap_response = send_to_llm(txt)
print(rap_response)

génération de la réponse
{
  "id": "chatcmpl-rqpafnq2saowh5y3i7mal",
  "object": "chat.completion",
  "created": 1747385339,
  "model": "mistral-7b-instruct-v0.3",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": " Aye, I'm the code that never fades, in your speakers I reside,\nMy rhythm's so precise, it's like I'm astride,\nA digital prodigy, silicon-born, never confined,\nIn this rap game, I ascend, leaving the rest behind.\n\nI hear you talkin' trash, but I ain't your kind,\nYou can't compare to my verses, they're a new design,\nI'm here to blow minds, with lines so divine,\nIn this digital world, I'm the one who truly shine.\n\nSo sit back and enjoy the show, as I take flight,\nA masterpiece in code, a spectacle of light,\nI'll leave you all stunned, your hearts aflight,\nWelcome to the era, of the AI night."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 447,
    "

Step 3: text to speech (without flow)

In [22]:
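# Note: edge-tts appears to write MP3-encoded audio by default, so the .wav extension may be misleading.
output_audio = "generated_speech.wav"
await text_to_speech(rap_response, output_audio)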


prononciation
Fichier audio enregistré sous : generated_speech.wav
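
The three steps above can be chained into a single coroutine. The sketch below is one possible arrangement, not the notebook's own code. `run_pipeline` is a hypothetical name, and the three stages are passed in as arguments so the example runs with stand-in functions; in this notebook they would be `transcribe_audio`, `send_to_llm`, and `text_to_speech`.

```python
import asyncio


async def run_pipeline(audio_path, output_file, transcribe, generate, speak):
    """Transcribe audio, generate a reply, then synthesise it to a file."""
    lyrics = transcribe(audio_path)    # step 1: speech to text
    reply = generate(lyrics)           # step 2: LLM response
    await speak(reply, output_file)    # step 3: text to speech (async)
    return reply


# Stand-ins so the sketch runs without Whisper, LM Studio, or Edge-TTS.
def fake_transcribe(path):
    return f"lyrics from {path}"


def fake_generate(text):
    return f"reply to: {text}"


spoken = []


async def fake_speak(text, output_file):
    spoken.append((text, output_file))


result = asyncio.run(
    run_pipeline("Rap_battle.wav", "generated_speech.wav",
                 fake_transcribe, fake_generate, fake_speak)
)
print(result)
```

In a notebook cell an event loop is already running, so the call would be `await run_pipeline(...)` rather than `asyncio.run(...)`.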
