<a href="https://colab.research.google.com/github/Sadhu2005/VaniSetu/blob/develop/VaniSetuAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# In a Colab cell
!git clone https://github.com/Sadhu2005/VaniSetu.git

Cloning into 'VaniSetu'...
remote: Enumerating objects: 38, done.
remote: Counting objects: 100% (38/38), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 38 (delta 10), reused 27 (delta 4), pack-reused 0 (from 0)
Receiving objects: 100% (38/38), 20.25 KiB | 20.25 MiB/s, done.
Resolving deltas: 100% (10/10), done.


In [2]:
# In a Colab cell
%cd VaniSetu


/content/VaniSetu


In [11]:
print("--> Step 1: Installing Google's Generative AI library...")
!pip install -q -U google-generativeai
print("--> Installation complete.")
print("-" * 30)

--> Step 1: Installing Google's Generative AI library...
--> Installation complete.
------------------------------


In [15]:
import google.generativeai as genai
from google.colab import userdata

# IMPORTANT: To keep your key secret, click the "Key" icon (🔑) on the left
# side of Colab, add a new secret named "GOOGLE_API_KEY", and paste your key there.
try:
    api_key = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=api_key)
    print("--> Step 2: API Key configured successfully.")
except userdata.SecretNotFoundError as e:
    print("--> SECRET NOT FOUND: Please add your GOOGLE_API_KEY to Colab Secrets (🔑).")
print("-" * 30)

--> Step 2: API Key configured successfully.
------------------------------


In [16]:
model = genai.GenerativeModel('gemini-1.5-flash')

# The English text from our Whisper test
english_text = "Hello, this is a test of the VaniSetu project. We are checking if the Whisper model can transcribe this audio correctly."

# This is our specific instruction to the model
prompt = f"""
Translate the following English text into conversational Hinglish for a video dubbing project.
It is very important that you DO NOT translate the following types of words:
- Proper nouns (like 'VaniSetu', 'Whisper')
- Common technical and English words (like 'project', 'model', 'transcribe', 'audio')

Keep them in their original English form.

English Text: "{english_text}"
Hinglish Translation:
"""

print("--> Step 3: Sending prompt to the Gemini model...")
response = model.generate_content(prompt)
hinglish_text = response.text
print("--> Translation complete.")
print("-" * 30)

--> Step 3: Sending prompt to the Gemini model...
--> Translation complete.
------------------------------


In [17]:
print("✅ Smart Translation Complete! Here are the results:\n")
print(f"English Input   : {english_text}")
print(f"Hinglish Output : {hinglish_text}")

✅ Smart Translation Complete! Here are the results:

English Input   : Hello, this is a test of the VaniSetu project. We are checking if the Whisper model can transcribe this audio correctly.
Hinglish Output : Hello, yeh VaniSetu project ka ek test hai.  Hum check kar rahe hain ki Whisper model yeh audio sahi se transcribe kar pata hai ki nahin.
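
The prompt asks the model to leave proper nouns and common technical words in English, and the output above does keep them. A check along the following lines could confirm that automatically for future runs. This is only a sketch: `missing_protected_terms` and the `protected` list are hypothetical names introduced here, not part of the VaniSetu code.

```python
import re

def missing_protected_terms(translated, protected_terms):
    """Return the protected terms that do not appear as whole words in the translation."""
    missing = []
    for term in protected_terms:
        # Whole-word, case-insensitive match so 'model' does not match 'remodel'
        pattern = r"\b" + re.escape(term) + r"\b"
        if not re.search(pattern, translated, flags=re.IGNORECASE):
            missing.append(term)
    return missing

# The translation printed in the cell above
hinglish_output = (
    "Hello, yeh VaniSetu project ka ek test hai.  Hum check kar rahe hain ki "
    "Whisper model yeh audio sahi se transcribe kar pata hai ki nahin."
)
# Terms named in the prompt (assumed list for this illustration)
protected = ["VaniSetu", "Whisper", "project", "model", "transcribe", "audio"]
print(missing_protected_terms(hinglish_output, protected))  # prints []
```

An empty list means every protected term survived translation; a non-empty list could be used to trigger a retry or a manual review.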



In [28]:
print("--> Step 1: Installing gTTS...")
!pip install -q gTTS
print("--> Installation complete.")
print("-" * 30)

--> Step 1: Installing gTTS...
--> Installation complete.
------------------------------


In [29]:
from gtts import gTTS
import IPython.display as ipd

# The Hinglish text from our last step
hinglish_text = "Hello, yeh VaniSetu project ka ek test hai. Hum check kar rahe hain ki Whisper model yeh audio sahi se transcribe kar pata hai ki nahin."
output_audio_path = "final_dubbed_gtts.mp3"

print("--> Step 2: Generating audio using gTTS...")
# We set the language to Hindi ('hi'). In this test, the Hindi voice also reads
# embedded English words (like 'Hello', 'project') with an Indian accent.
tts = gTTS(text=hinglish_text, lang='hi', slow=False)
tts.save(output_audio_path)
print("--> Audio generation complete.")
print("-" * 30)

--> Step 2: Generating audio using gTTS...
--> Audio generation complete.
------------------------------


In [31]:
print("✅ gTTS Test Complete! Listen to the final result below:")
ipd.Audio(output_audio_path)

✅ gTTS Test Complete! Listen to the final result below:


In [36]:
print("--> Installing all libraries...")
# We add nest_asyncio to fix the event loop conflict
!pip install -q fastapi uvicorn pyngrok google-generativeai openai-whisper gTTS nest_asyncio
print("--> Installation complete.")

--> Installing all libraries...
--> Installation complete.


In [37]:
import nest_asyncio
import os
import whisper
import google.generativeai as genai
from gtts import gTTS
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import FileResponse
from google.colab import userdata

# Patch asyncio so uvicorn can start inside the event loop Colab already runs
nest_asyncio.apply()

# --- App Initialization ---
app = FastAPI(title="VaniSetu API")

# --- AI Model & Client Initialization ---
print("--> Loading AI models...")
# 1. Load Whisper Model
whisper_model = whisper.load_model("base")

# 2. Configure Gemini Model
try:
    api_key = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=api_key)
    gemini_model = genai.GenerativeModel('gemini-1.5-flash')
    print("--> AI Models loaded successfully.")
except Exception as e:
    print(f"Error loading models: {e}")
    gemini_model = None

# --- API Endpoints ---
@app.get("/")
def read_root():
    return {"status": "ok", "message": "VaniSetu API is running!"}

@app.post("/api/v1/dub-audio")
async def dub_audio_pipeline(audio: UploadFile = File(...)):
    # Fail fast, before any expensive work, if Gemini is unavailable
    if not gemini_model:
        return {"error": "Gemini model not configured"}

    # Keep only the base name so a crafted filename cannot escape the working directory
    safe_name = os.path.basename(audio.filename or "upload")
    temp_audio_path = f"temp_{safe_name}"
    with open(temp_audio_path, "wb") as buffer:
        buffer.write(await audio.read())

    try:
        # Step 1: transcribe the uploaded audio (fp16=False keeps Whisper in full precision)
        transcription_result = whisper_model.transcribe(temp_audio_path, fp16=False)
        english_text = transcription_result["text"]

        # Step 2: translate the transcript into Hinglish
        prompt = f"Translate the following English text into conversational Hinglish for a video dubbing project. Keep proper nouns and technical words like 'project' or 'model' in English. Text: \"{english_text}\""
        response = gemini_model.generate_content(prompt)
        hinglish_text = response.text.strip()

        # Step 3: synthesize the Hinglish text as speech
        output_audio_path = f"dubbed_{safe_name}.mp3"
        tts = gTTS(text=hinglish_text, lang='hi', slow=False)
        tts.save(output_audio_path)
    finally:
        # Remove the upload even if a pipeline step fails
        os.remove(temp_audio_path)

    return FileResponse(path=output_audio_path, media_type="audio/mpeg", filename=output_audio_path)


--> Loading AI models...
--> AI Models loaded successfully.


In [39]:
print("--> Installing all libraries...")
!pip install -q fastapi uvicorn pyngrok google-generativeai openai-whisper gTTS nest_asyncio
print("--> Installation complete.")

--> Installing all libraries...


ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-1' coro=<Server.serve() done, defined at /usr/local/lib/python3.12/dist-packages/uvicorn/server.py:69> exception=KeyboardInterrupt()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/main.py", line 580, in run
    server.run()
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/server.py", line 67, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nest_asyncio.py", line 92, in run_until_complete
    self._run_once()
  File "/usr/local/lib/python3.12/dist-packages/nest_asyncio.py", line 133, in _run_once
    handle._run()
  File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run
    se

--> Installation complete.


In [40]:
import nest_asyncio
import os
import whisper
import google.generativeai as genai
from gtts import gTTS
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import FileResponse
from fastapi.middleware.cors import CORSMiddleware # Import the CORS middleware
from google.colab import userdata

# Patch asyncio so uvicorn can start inside the event loop Colab already runs
nest_asyncio.apply()

# --- App Initialization ---
app = FastAPI(title="VaniSetu API")

# --- CORS configuration ---
# Allow cross-origin requests so a browser frontend can call this API.
# Wildcard origins are acceptable for a prototype; credentials stay disabled
# because browsers reject credentialed responses that use a wildcard origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allows all origins
    allow_credentials=False,
    allow_methods=["*"],  # Allows all methods
    allow_headers=["*"],  # Allows all headers
)


# --- AI Model & Client Initialization ---
print("--> Loading AI models...")
# 1. Load Whisper, then 2. configure Gemini (unchanged from the previous version)
whisper_model = whisper.load_model("base")

try:
    api_key = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=api_key)
    gemini_model = genai.GenerativeModel('gemini-1.5-flash')
    print("--> AI Models loaded successfully.")
except Exception as e:
    print(f"Error loading models: {e}")
    gemini_model = None

# --- API Endpoints ---
@app.get("/")
def read_root():
    return {"status": "ok", "message": "VaniSetu API is running!"}

@app.post("/api/v1/dub-audio")
async def dub_audio_pipeline(audio: UploadFile = File(...)):
    # Fail fast, before any expensive work, if Gemini is unavailable
    if not gemini_model:
        return {"error": "Gemini model not configured"}

    # Keep only the base name so a crafted filename cannot escape the working directory
    safe_name = os.path.basename(audio.filename or "upload")
    temp_audio_path = f"temp_{safe_name}"
    with open(temp_audio_path, "wb") as buffer:
        buffer.write(await audio.read())

    try:
        # Step 1: transcribe the uploaded audio (fp16=False keeps Whisper in full precision)
        transcription_result = whisper_model.transcribe(temp_audio_path, fp16=False)
        english_text = transcription_result["text"]

        # Step 2: translate the transcript into Hinglish
        prompt = f"Translate the following English text into conversational Hinglish for a video dubbing project. Keep proper nouns and technical words like 'project' or 'model' in English. Text: \"{english_text}\""
        response = gemini_model.generate_content(prompt)
        hinglish_text = response.text.strip()

        # Step 3: synthesize the Hinglish text as speech
        output_audio_path = f"dubbed_{safe_name}.mp3"
        tts = gTTS(text=hinglish_text, lang='hi', slow=False)
        tts.save(output_audio_path)
    finally:
        # Remove the upload even if a pipeline step fails
        os.remove(temp_audio_path)

    return FileResponse(path=output_audio_path, media_type="audio/mpeg", filename=output_audio_path)


--> Loading AI models...
--> AI Models loaded successfully.
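
The endpoint above derives its temporary and output paths directly from the uploaded filename, so two uploads with the same name would overwrite each other. One possible way to avoid that is to add a random per-request prefix. The sketch below is an illustration under that assumption; `make_request_paths` is a hypothetical helper, not something the VaniSetu code defines.

```python
import uuid
from pathlib import Path

def make_request_paths(upload_filename, work_dir="."):
    """Build unique temp and output paths for one upload (hypothetical helper)."""
    # Keep only the final path component, dropping any directories the client sent
    base_name = Path(upload_filename or "upload").name or "upload"
    # A random prefix keeps simultaneous requests from overwriting each other
    request_id = uuid.uuid4().hex
    temp_path = Path(work_dir) / f"temp_{request_id}_{base_name}"
    output_path = Path(work_dir) / f"dubbed_{request_id}_{base_name}.mp3"
    return temp_path, output_path

temp_path, output_path = make_request_paths("../../etc/captured_audio.webm")
print(temp_path.name)    # temp_<random hex>_captured_audio.webm
print(output_path.name)  # dubbed_<random hex>_captured_audio.webm.mp3
```

Inside the endpoint, the two returned paths would take the place of `temp_audio_path` and `output_audio_path`.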


In [41]:
import uvicorn
from pyngrok import ngrok

print("--> Launching server with ngrok...")
try:
    ngrok.set_auth_token(userdata.get('NGROK_AUTHTOKEN'))
    public_url = ngrok.connect(8000)
    print(f"✅ Your VaniSetu API is live at: {public_url}")
    uvicorn.run(app, port=8000)
except Exception as e:
    print(f"Error launching server: {e}. Did you add your NGROK_AUTHTOKEN to Colab Secrets?")

--> Launching server with ngrok...
✅ Your VaniSetu API is live at: NgrokTunnel: "https://b856a0eec2a2.ngrok-free.app" -> "http://localhost:8000"


INFO:     Started server process [544]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)


INFO:     14.194.76.34:0 - "OPTIONS / HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "GET / HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "OPTIONS /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "POST /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "OPTIONS /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "POST /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "OPTIONS //api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "POST //api/v1/dub-audio HTTP/1.1" 404 Not Found
INFO:     14.194.76.34:0 - "POST //api/v1/dub-audio HTTP/1.1" 404 Not Found
INFO:     14.194.76.34:0 - "POST //api/v1/dub-audio HTTP/1.1" 404 Not Found
INFO:     14.194.76.34:0 - "POST //api/v1/dub-audio HTTP/1.1" 404 Not Found
INFO:     14.194.76.34:0 - "OPTIONS /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "POST /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "POST /api/v1/dub-audio HTTP/1.1" 200 OK
INFO:     14.194.76.34:0 - "POST 

INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [544]
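
In the log above, requests to `//api/v1/dub-audio` return 404 while requests to `/api/v1/dub-audio` succeed. A doubled slash like this typically appears when a client joins a base URL ending in `/` with a route starting with `/`; the client code is not shown here, so that cause is an assumption. A minimal sketch of a join that always produces a single slash follows, with `join_url` as a hypothetical helper name.

```python
def join_url(base_url, path):
    """Join a base URL and a route with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")

# Base URL taken from the ngrok output above, with a trailing slash added
base = "https://b856a0eec2a2.ngrok-free.app/"
print(join_url(base, "/api/v1/dub-audio"))
# prints https://b856a0eec2a2.ngrok-free.app/api/v1/dub-audio
```

The same normalization applies in whatever language the frontend uses: strip the trailing slash from the base, strip the leading slash from the route, then join with one slash.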


In [43]:
%ls


backend/                        LICENSE           test_audio.mp3
dubbed_captured_audio.webm.mp3  README.md
final_dubbed_gtts.mp3           requirements.txt


In [44]:
%cd ..


/content


In [45]:
%ls

sample_data/  VaniSetu/


In [46]:
%cd VaniSetu

/content/VaniSetu


In [48]:
!git branch

* main


In [53]:
!git checkout develop

Already on 'develop'
Your branch is up to date with 'origin/develop'.


In [54]:
!git add .
!git commit -m "feat: Complete end-to-end AI pipeline prototype in Colab"
!git push

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@e122b28a68e9.(none)')
fatal: could not read Username for 'https://github.com': No such device or address


In [56]:
!git config --global user.email "sadhuj2005@gmail.com"

In [57]:
!git config --global user.name "sadhu2005"

In [58]:
!git add .
!git commit -m "feat: Complete end-to-end AI pipeline prototype in Colab"
!git push

[develop 6f78d8b] feat: Complete end-to-end AI pipeline prototype in Colab
 3 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dubbed_captured_audio.webm.mp3
 create mode 100644 final_dubbed_gtts.mp3
 create mode 100644 test_audio.mp3
fatal: could not read Username for 'https://github.com': No such device or address
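
The commit succeeds, but the push still fails: Colab gives Git no interactive terminal to prompt for a username, so an HTTPS remote needs its credentials supplied another way. One common workaround is to push to a remote URL that embeds a GitHub personal access token. The sketch below only builds such a URL. `build_authenticated_remote`, the placeholder token, and the `GITHUB_TOKEN` secret name mentioned in the comments are all assumptions for illustration.

```python
from urllib.parse import quote

def build_authenticated_remote(username, token, owner, repo):
    """Return an HTTPS remote URL that carries credentials (hypothetical helper)."""
    # Percent-encode both parts in case they contain reserved characters
    user_part = quote(username, safe="")
    token_part = quote(token, safe="")
    return f"https://{user_part}:{token_part}@github.com/{owner}/{repo}.git"

# In Colab the token would come from a secret, for example
# userdata.get("GITHUB_TOKEN") (the secret name is an assumption).
# A placeholder stands in for a real token here.
remote = build_authenticated_remote("sadhu2005", "ghp_exampletoken", "Sadhu2005", "VaniSetu")
print(remote)
# prints https://sadhu2005:ghp_exampletoken@github.com/Sadhu2005/VaniSetu.git
```

In a Colab cell, the result could then be passed to Git with `!git push {remote} develop`. With a real token, avoid printing the URL, since cell output is saved with the notebook.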
