<a href="https://colab.research.google.com/github/Kavinass004/Ai-Driven/blob/main/news_groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Core libraries
!pip install numpy librosa soundfile requests

# Groq API client
!pip install groq

# PyAnnote for speaker diarization
!pip install pyannote.audio

# HuggingFace libraries (for fallback functionality)
!pip install transformers torch

# FFmpeg is needed for video processing (must be installed on system)
# In Colab, it's already installed, but on local systems:
# apt-get install ffmpeg
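
Before running the later cells, it can help to confirm that the tools installed above are actually available, since a failed `pip install` is easy to miss in long output. The following is a minimal sketch, not part of the original notebook: `check_environment` is a hypothetical helper, and the module list simply mirrors the top-level packages installed above.

```python
import importlib.util
import shutil

# Hypothetical helper (not from the original notebook): report which
# required modules and command-line tools can be found.
def check_environment(modules, tools):
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status

report = check_environment(
    modules=["numpy", "librosa", "soundfile", "groq", "pyannote", "transformers", "torch"],
    tools=["ffmpeg"],
)
for name, ok in report.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Only top-level module names are checked here, because looking up a dotted name such as `pyannote.audio` imports the parent package and raises an error when that parent is missing.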



In [2]:
!pip install torch torchaudio
!pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip

Collecting https://github.com/pyannote/pyannote-audio/archive/develop.zip
  Downloading https://github.com/pyannote/pyannote-audio/archive/develop.zip
     | 14.7 MB 4.7 MB/s 0:00:03
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (pyproject.toml) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: S

In [None]:
# AI News Company - Complete Pipeline
# This notebook combines speech-to-text conversion with speaker diarization and content generation
# Using Groq API for language models instead of running models directly

import os
import json
import numpy as np
import librosa
import soundfile as sf
import warnings
from groq import Groq
import requests
from pyannote.audio import Pipeline
warnings.filterwarnings("ignore")  # Suppress warnings for cleaner output

# Create directories for storing files
os.makedirs("/content/input", exist_ok=True)
os.makedirs("/content/output", exist_ok=True)

# Set up the Groq API client; read the key from the environment instead of hard-coding it
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")  # Set GROQ_API_KEY before running this cell
groq_client = Groq(api_key=GROQ_API_KEY)

# Function to upload files to Colab
from google.colab import files

def upload_file_to_colab():
    """
    Upload a file to Colab environment

    Returns:
    - Path to the uploaded file
    """
    print("Please upload your audio or video file...")
    uploaded = files.upload()

    if not uploaded:
        raise ValueError("No file was uploaded!")

    filename = list(uploaded.keys())[0]
    filepath = f"/content/input/{filename}"

    # Save the uploaded file
    with open(filepath, 'wb') as f:
        f.write(uploaded[filename])

    print(f"File saved to {filepath}")
    return filepath

#######################
# STEP 1: Speech-to-Text Conversion with Speaker Diarization
#######################

def transcribe_with_speaker_diarization(audio_path, output_path, use_whisper_api=True):
    """
    Transcribe audio file with speaker diarization for code-mixed Tamil

    Parameters:
    - audio_path: Path to the audio file
    - output_path: Path to save the transcript
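    - use_whisper_api: If True, transcribe each batch with Whisper (run locally through
      Hugging Face transformers); if False, store a placeholder string instead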
    """
    try:
        print("Loading speaker diarization model...")
        # Initialize the speaker diarization pipeline - we still need PyAnnote locally
        diarization_pipeline = Pipeline.from_pretrained(
            "pyannote/speaker-diarization@2.1",
            use_auth_token=True  # Uses the token from huggingface_hub login
        )

        print("Running speaker diarization...")
        # Run speaker diarization on the audio file
        diarization_result = diarization_pipeline(audio_path)

        # Load audio file
        audio, sr = librosa.load(audio_path, sr=16000)

        # Create a dictionary to store segments for each speaker
        speaker_segments = {}

        # Process diarization result
        for turn, _, speaker in diarization_result.itertracks(yield_label=True):
            # Extract audio segment for the current speaker
            start_sample = int(turn.start * sr)
            end_sample = int(turn.end * sr)

            # Skip invalid segments
            if start_sample >= end_sample or start_sample >= len(audio) or end_sample > len(audio):
                continue

            segment = audio[start_sample:end_sample]

            # Append segment to the speaker's dictionary
            if speaker not in speaker_segments:
                speaker_segments[speaker] = []
            speaker_segments[speaker].append({
                "start": turn.start,
                "end": turn.end,
                "audio": segment
            })

        # Load Whisper once, up front, instead of reloading it for every batch.
        # Transcription runs locally through Hugging Face transformers; no ASR API is called here.
        if use_whisper_api:
            import torch
            from transformers import WhisperProcessor, WhisperForConditionalGeneration

            processor = WhisperProcessor.from_pretrained("openai/whisper-small")
            model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

            # Set language to Tamil
            model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="tamil", task="transcribe")

        # Save each speaker's segments as separate audio files for batch processing
        temp_segment_files = {}
        for speaker, segments in speaker_segments.items():
            speaker_transcripts = []

            # Process segments in batches to limit the number of transcription calls
            batch_size = 5  # Adjust as needed
            for i in range(0, len(segments), batch_size):
                batch = segments[i:i+batch_size]

                # Concatenate batch segments for efficiency.
                # Whisper's feature extractor pads or truncates each input to 30 seconds,
                # so audio beyond that in a long batch is not transcribed.
                batch_audio = np.concatenate([seg["audio"] for seg in batch])
                batch_file = f"/content/input/temp_speaker_{speaker}_batch_{i}.wav"
                sf.write(batch_file, batch_audio, sr)
                temp_segment_files[batch_file] = True

                if use_whisper_api:
                    print(f"Transcribing batch for speaker {speaker}...")

                    # Process audio
                    input_features = processor(batch_audio, sampling_rate=sr, return_tensors="pt").input_features

                    # Generate token ids
                    with torch.no_grad():
                        predicted_ids = model.generate(input_features, max_length=256)

                    # Decode the token ids to text
                    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]

                else:
                    # No transcription backend selected; record a placeholder
                    transcription = "Transcription API unavailable. Use API method."

                # Add the transcription to this speaker's list
                speaker_transcripts.append({
                    "start": batch[0]["start"],
                    "end": batch[-1]["end"],
                    "text": transcription
                })

            # Store this speaker's transcripts
            speaker_segments[speaker] = speaker_transcripts

        # Write transcripts to file
        with open(output_path, 'w', encoding='utf-8') as f:
            for speaker, transcripts in speaker_segments.items():
                f.write(f"Speaker {speaker}:\n")
                for transcript in transcripts:
                    f.write(f"[{transcript['start']:.2f} - {transcript['end']:.2f}] {transcript['text']}\n")
                f.write("\n")

        # Clean up temporary files
        for temp_file in temp_segment_files:
            if os.path.exists(temp_file):
                os.remove(temp_file)

        print(f"Transcript saved to {output_path}")
        return speaker_segments

    except Exception as e:
        print(f"Error in transcribe_with_speaker_diarization: {str(e)}")
        # Fall back to basic transcription without diarization
        return basic_transcription(audio_path, output_path)

def basic_transcription(audio_path, output_path):
    """
    Perform basic transcription without speaker diarization as a fallback
    """
    print("Falling back to basic transcription without speaker diarization...")

    try:
        # Load audio
        audio, sr = librosa.load(audio_path, sr=16000)

        # Process in chunks to avoid memory issues
        chunk_length_s = 30  # Process 30 seconds at a time
        chunk_length = chunk_length_s * sr

        chunks = [audio[i:i+chunk_length] for i in range(0, len(audio), chunk_length)]

        full_transcript = []

        # For demonstration purposes, using transformers locally
        # In production, replace this with proper ASR API calls
        from transformers import WhisperProcessor, WhisperForConditionalGeneration
        import torch

        processor = WhisperProcessor.from_pretrained("openai/whisper-small")
        model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

        # Set language to Tamil
        model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="tamil", task="transcribe")

        for i, chunk in enumerate(chunks):
            # Save chunk to temp file
            temp_chunk_file = f"/content/input/temp_chunk_{i}.wav"
            sf.write(temp_chunk_file, chunk, sr)

            # In production, upload this file to get a URL for your ASR API

            # For now, using local processing
            input_features = processor(chunk, sampling_rate=sr, return_tensors="pt").input_features

            # Generate token ids
            with torch.no_grad():
                predicted_ids = model.generate(input_features, max_length=256)

            # Decode the token ids to text
            transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]

            # Clean up temp file
            if os.path.exists(temp_chunk_file):
                os.remove(temp_chunk_file)

            start_time = i * chunk_length_s
            end_time = min((i + 1) * chunk_length_s, len(audio) / sr)

            full_transcript.append({
                "start": start_time,
                "end": end_time,
                "text": transcription
            })

        # Create a simple speaker transcript with all text assigned to one speaker
        speaker_transcripts = {"UNKNOWN": full_transcript}

        # Write transcript to file
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("Speaker UNKNOWN:\n")
            for transcript in full_transcript:
                f.write(f"[{transcript['start']:.2f} - {transcript['end']:.2f}] {transcript['text']}\n")

        print(f"Basic transcript saved to {output_path}")
        return speaker_transcripts

    except Exception as e:
        print(f"Error in basic_transcription: {str(e)}")
        # Create a minimal transcript to allow the pipeline to continue
        speaker_transcripts = {"UNKNOWN": [{"start": 0, "end": 1, "text": "Transcription failed."}]}

        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("Speaker UNKNOWN:\n")
            f.write("[0.00 - 1.00] Transcription failed. Please check the audio file.")

        return speaker_transcripts

def process_video_input(video_path, output_path):
    """
    Extract audio from video and transcribe with speaker diarization
    """
    try:
        # Extract audio from video
        audio_path = video_path.rsplit('.', 1)[0] + '.wav'

        print("Extracting audio from video...")
        # Use ffmpeg to extract audio
        !ffmpeg -i "{video_path}" -vn -acodec pcm_s16le -ar 16000 -ac 1 "{audio_path}" -y -loglevel error

        if not os.path.exists(audio_path) or os.path.getsize(audio_path) == 0:
            raise Exception("Failed to extract audio from video")

        # Transcribe the extracted audio
        speaker_transcripts = transcribe_with_speaker_diarization(audio_path, output_path)

        return speaker_transcripts

    except Exception as e:
        print(f"Error processing video: {str(e)}")
        # Create a minimal transcript to allow the pipeline to continue
        speaker_transcripts = {"UNKNOWN": [{"start": 0, "end": 1, "text": "Video processing failed."}]}

        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("Speaker UNKNOWN:\n")
            f.write("[0.00 - 1.00] Video processing failed. Please check the video file.")

        return speaker_transcripts

#######################
# STEP 2: Content Generation using Groq API
#######################

def generate_news_content(transcript, speaker_names=None):
    """
    Generate news content based on the transcript using Groq API

    Parameters:
    - transcript: Dictionary of speaker transcripts
    - speaker_names: Dictionary mapping speaker IDs to names
    """
    try:
        print("Setting up Groq API for content generation...")

        # If speaker names are not provided, ask for them
        if not speaker_names:
            speaker_names = {}
            print("\nEnter names for each speaker (or press Enter to use default):")
            for speaker in transcript.keys():
                name = input(f"Enter the name for Speaker {speaker}: ").strip()
                speaker_names[speaker] = name if name else f"Speaker {speaker}"

        # Format the transcript for the model
        formatted_transcript = ""
        for speaker, segments in transcript.items():
            speaker_name = speaker_names.get(speaker, f"Speaker {speaker}")
            for segment in segments:
                formatted_transcript += f"{speaker_name}: {segment['text']}\n"

        # Truncate transcript if it's too long
        if len(formatted_transcript) > 5000:
            print("Transcript is too long. Truncating to 5000 characters...")
            formatted_transcript = formatted_transcript[:5000] + "\n[Transcript truncated due to length]"

        generated_content = {}

        # Generate newspaper article
        print("Generating newspaper article...")
        article_prompt = (
            "You are a professional journalist. Write a formal newspaper article based on the following interview transcript. "
            "The article should be in code-mixed Tamil (mix of Tamil and English), formatted properly with a headline, "
            "introduction, body, and conclusion. Make it informative and engaging.\n\n"
            f"TRANSCRIPT:\n{formatted_transcript}\n\n"
            "Write the newspaper article:"
        )

        try:
            # Using Groq API to access LLaMA 3
            response = groq_client.chat.completions.create(
                model="llama3-70b-8192",  # Using LLaMA 3 70B model
                messages=[
                    {"role": "system", "content": "You are a skilled journalist who writes articles in code-mixed Tamil and English."},
                    {"role": "user", "content": article_prompt}
                ],
                temperature=0.7,
                max_tokens=1500,
                top_p=0.9
            )
            newspaper_article = response.choices[0].message.content
            generated_content["newspaper_article"] = newspaper_article
        except Exception as e:
            print(f"Error generating newspaper article: {str(e)}")
            generated_content["newspaper_article"] = "Failed to generate newspaper article."

        # Generate social media bite
        print("Generating social media post...")
        social_prompt = (
            "You are a social media content creator for a news channel. Write a short, engaging social media post "
            "(around 280 characters) based on the following interview transcript. The post should be in code-mixed Tamil "
            "(mix of Tamil and English) and should capture the essence of the interview.\n\n"
            f"TRANSCRIPT:\n{formatted_transcript}\n\n"
            "Write the social media post:"
        )

        try:
            response = groq_client.chat.completions.create(
                model="llama3-70b-8192",
                messages=[
                    {"role": "system", "content": "You are a social media content creator who writes in code-mixed Tamil and English."},
                    {"role": "user", "content": social_prompt}
                ],
                temperature=0.8,
                max_tokens=300,
                top_p=0.9
            )
            social_media_bite = response.choices[0].message.content
            generated_content["social_media_bite"] = social_media_bite
        except Exception as e:
            print(f"Error generating social media post: {str(e)}")
            generated_content["social_media_bite"] = "Failed to generate social media post."

        # Generate news reader script
        print("Generating news reader script...")
        script_prompt = (
            "You are a script writer for a news channel. Write a script for news readers based on the following "
            "interview transcript. The script should be in code-mixed Tamil (mix of Tamil and English) and should include "
            "prompts for two news readers (Anchor 1 and Anchor 2) to read alternately.\n\n"
            f"TRANSCRIPT:\n{formatted_transcript}\n\n"
            "Write the news reader script:"
        )

        try:
            response = groq_client.chat.completions.create(
                model="llama3-70b-8192",
                messages=[
                    {"role": "system", "content": "You are a script writer who writes in code-mixed Tamil and English."},
                    {"role": "user", "content": script_prompt}
                ],
                temperature=0.7,
                max_tokens=1500,
                top_p=0.9
            )
            news_reader_script = response.choices[0].message.content
            generated_content["news_reader_script"] = news_reader_script
        except Exception as e:
            print(f"Error generating news reader script: {str(e)}")
            generated_content["news_reader_script"] = "Failed to generate news reader script."

        print("Content generation completed successfully.")
        return generated_content, speaker_names

    except Exception as e:
        print(f"Error in generate_news_content: {str(e)}")
        # Return minimal content to allow the pipeline to continue
        return {
            "newspaper_article": "Content generation failed. Please check the model and transcript.",
            "social_media_bite": "Content generation failed.",
            "news_reader_script": "Content generation failed."
        }, speaker_names or {"UNKNOWN": "Unknown Speaker"}

def save_generated_content(generated_content, output_path):
    """
    Save generated content to files
    """
    try:
        # Create output directory if it doesn't exist
        os.makedirs(os.path.dirname(output_path), exist_ok=True)

        # Save newspaper article
        article_path = f"{output_path}_article.txt"
        with open(article_path, 'w', encoding='utf-8') as f:
            f.write(generated_content["newspaper_article"])

        # Save social media bite
        social_path = f"{output_path}_social.txt"
        with open(social_path, 'w', encoding='utf-8') as f:
            f.write(generated_content["social_media_bite"])

        # Save news reader script
        script_path = f"{output_path}_script.txt"
        with open(script_path, 'w', encoding='utf-8') as f:
            f.write(generated_content["news_reader_script"])

        print(f"Generated content saved to {output_path}_*.txt")
        return True

    except Exception as e:
        print(f"Error saving generated content: {str(e)}")
        return False

# Function to download files from Colab
def download_generated_files(base_path):
    """
    Download generated files from Colab

    Parameters:
    - base_path: Base path of the generated files
    """
    try:
        print("Downloading generated files...")
        files_to_download = [
            f"{base_path}_article.txt",
            f"{base_path}_social.txt",
            f"{base_path}_script.txt",
            f"{base_path}_speakers.json"
        ]

        for file_path in files_to_download:
            if os.path.exists(file_path):
                files.download(file_path)
                print(f"Downloaded {file_path}")
            else:
                print(f"File {file_path} not found")

    except Exception as e:
        print(f"Error downloading files: {str(e)}")
        print("You can manually download the files from the Colab file browser.")

#######################
# Main Function to Run the Complete Pipeline
#######################

def run_ai_news_pipeline():
    """
    Complete pipeline for AI News Company using Groq API
    """
    try:
        print("\n======= AI News Company Pipeline with Groq API =======")
        print("Step 1: Speech-to-Text Conversion with Speaker Diarization")

        # Upload input file
        input_path = upload_file_to_colab()
        transcript_path = "/content/output/transcript.txt"

        # Check if Groq API key is set
        if not GROQ_API_KEY:
            print("Warning: Groq API key not set. Some features may not work correctly.")

        # Process input file
        if input_path.lower().endswith(('.mp4', '.avi', '.mov', '.mkv')):
            print("Processing video input...")
            speaker_transcripts = process_video_input(input_path, transcript_path)
        else:
            print("Processing audio input...")
            speaker_transcripts = transcribe_with_speaker_diarization(input_path, transcript_path)

        # Check if transcription succeeded
        if not speaker_transcripts or all(len(segments) == 0 for speaker, segments in speaker_transcripts.items()):
            print("Transcription failed or produced empty results. Please check your audio/video file.")
            return

        print("\nStep 2: Content Generation using Groq API")
        # Generate content
        output_path = "/content/output/news_content"
        generated_content, speaker_names = generate_news_content(speaker_transcripts)

        # Save generated content
        success = save_generated_content(generated_content, output_path)

        if success:
            # Save speaker names for future reference
            speaker_names_path = f"{output_path}_speakers.json"
            with open(speaker_names_path, 'w', encoding='utf-8') as f:
                json.dump(speaker_names, f, ensure_ascii=False, indent=2)

            print("\n======= Pipeline Completed Successfully! =======")
            print(f"Transcript saved to: {transcript_path}")
            print(f"Newspaper article saved to: {output_path}_article.txt")
            print(f"Social media content saved to: {output_path}_social.txt")
            print(f"News reader script saved to: {output_path}_script.txt")
            print(f"Speaker names saved to: {output_path}_speakers.json")

            # Download all generated files
            download_generated_files(output_path)
            files.download(transcript_path)
        else:
            print("\n======= Pipeline Completed with Errors =======")
            print("Some files may have been generated. Check the Colab file browser.")

    except Exception as e:
        print(f"\nError running the pipeline: {str(e)}")
        print("Pipeline execution failed. Please check the error messages above.")

# Function to check Groq API access and start the pipeline
def start_pipeline():
    """
    Start the AI News Company pipeline with Groq API
    """
    print("AI News Company Pipeline with Groq API")
    print("Checking Groq API access...")

    try:
        # Test Groq API connection
        test_response = groq_client.chat.completions.create(
            model="llama3-8b-8192",  # Using smaller model for quick test
            messages=[
                {"role": "user", "content": "Hello, can you hear me?"}
            ],
            max_tokens=10
        )
        print("Groq API connection successful!")
    except Exception as e:
        print(f"Warning: Could not connect to Groq API: {str(e)}")
        print("The pipeline will continue, but content generation steps that call Groq are likely to fail.")

    print("\nRunning AI News pipeline with Groq API...")
    run_ai_news_pipeline()

# Run the pipeline
if __name__ == "__main__":
    start_pipeline()
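
Both transcription functions return a dictionary that maps each speaker label to a list of `{"start", "end", "text"}` segments, and `generate_news_content` formats that dictionary one speaker at a time. When diarization succeeds with several speakers, that ordering groups each speaker's lines together instead of preserving the back-and-forth of the interview. One possible alternative is to flatten the segments and sort them by start time before building the prompt. The sketch below assumes the same dictionary shape; `format_chronologically` is a hypothetical helper and the example data is invented for illustration.

```python
def format_chronologically(transcript, speaker_names=None):
    """Flatten {speaker: [segment, ...]} into lines ordered by start time."""
    speaker_names = speaker_names or {}
    turns = []
    for speaker, segments in transcript.items():
        name = speaker_names.get(speaker, f"Speaker {speaker}")
        for segment in segments:
            turns.append((segment["start"], name, segment["text"]))
    # Sort on start time only; the sort is stable, so ties keep insertion order
    turns.sort(key=lambda turn: turn[0])
    return "".join(f"{name}: {text}\n" for _, name, text in turns)

# Invented example data in the same shape the pipeline returns
example = {
    "SPEAKER_00": [
        {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
        {"start": 9.5, "end": 12.0, "text": "Tell us about the film."},
    ],
    "SPEAKER_01": [
        {"start": 4.2, "end": 9.0, "text": "Thank you for having me."},
    ],
}
print(format_chronologically(example, {"SPEAKER_01": "Guest"}))
```

Because the diarization path merges up to five turns into one transcript entry, the resulting order is only as fine-grained as those batches.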

AI News Company Pipeline with Groq API
Checking Groq API access...
Groq API connection successful!

Running AI News pipeline with Groq API...

Step 1: Speech-to-Text Conversion with Speaker Diarization
Please upload your audio or video file...


Saving Actor Vijay Antony latest fun interview Hitler Sunnews.wav to Actor Vijay Antony latest fun interview Hitler Sunnews (1).wav
File saved to /content/input/Actor Vijay Antony latest fun interview Hitler Sunnews (1).wav
Processing audio input...
Loading speaker diarization model...
Error in transcribe_with_speaker_diarization: Token is required (`token=True`), but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens.
Falling back to basic transcription without speaker diarization...



Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Basic transcript saved to /content/output/transcript.txt

Step 2: Content Generation using Groq API
Setting up Groq API for content generation...

Enter names for each speaker (or press Enter to use default):
Enter the name for Speaker UNKNOWN: antony
Generating newspaper article...
Generating social media post...
Generating news reader script...
Content generation completed successfully.
Generated content saved to /content/output/news_content_*.txt

Transcript saved to: /content/output/transcript.txt
Newspaper article saved to: /content/output/news_content_article.txt
Social media content saved to: /content/output/news_content_social.txt
News reader script saved to: /content/output/news_content_script.txt
Speaker names saved to: /content/output/news_content_speakers.json
Downloading generated files...


Downloaded /content/output/news_content_article.txt


Downloaded /content/output/news_content_social.txt


Downloaded /content/output/news_content_script.txt


Downloaded /content/output/news_content_speakers.json
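
The log above shows diarization failing with `Token is required` before the fallback ran, which means `use_auth_token=True` found no stored Hugging Face login, so the whole transcript was assigned to a single `UNKNOWN` speaker. A minimal sketch of one way to supply a token follows. It assumes the token is kept in an environment variable named `HF_TOKEN` and that `huggingface_hub` is installed; `resolve_hf_token` and `login_to_hugging_face` are hypothetical helpers, and `hf_example` is a placeholder rather than a real token. The gated pyannote models may also require accepting their terms on the Hugging Face model page.

```python
import os

def resolve_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    return token or None

def login_to_hugging_face(token):
    """Log in with huggingface_hub so that use_auth_token=True can find the token."""
    from huggingface_hub import login  # imported lazily; requires huggingface_hub
    login(token=token)

# Demonstration with explicit mappings instead of the real environment
print(resolve_hf_token({"HF_TOKEN": "  hf_example  "}))
print(resolve_hf_token({}))
```

In the notebook, `login_to_hugging_face(resolve_hf_token())` would be called once before `Pipeline.from_pretrained`, and only when a token is actually present.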

