# Question: 1

In this question, you are tasked with enhancing the resolution of a video. The goal is to improve the quality of individual frames. You are expected to use basic algorithms for achieving this goal. 

### Task 1: Frame Extraction

Extract frames from the video using OpenCV.

### Task 2: Resolution Enhancement

Apply the following enhancement algorithms to scale the extracted frames by a factor of 2:

1) Nearest-neighbor Interpolation <br>
2) Bilinear Interpolation <br>
3) Bicubic Interpolation <br>

Explore these approaches yourself; each one is selected through the built-in interpolation parameter of OpenCV's resize function.
https://theailearner.com/2018/11/15/image-interpolation-using-opencv-python/

### Task 3: Video Reconstruction

After enhancing the frames, reconstruct the video by merging the enhanced frames while ensuring that the frame rate of the reconstructed video matches that of the original video. Generate a separate video for each interpolation method.
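
When merging frames back into a video, the frames have to be written in playback order, and `os.listdir` returns names in arbitrary order. The sketch below sorts frame filenames by their numeric index; `frame_index` and `order_frames` are hypothetical helper names, and the filename pattern is an assumption. The original frame rate can be read separately with `cap.get(cv2.CAP_PROP_FPS)` and passed to `cv2.VideoWriter`.

```python
import re


def frame_index(filename):
    """Extract the integer index from a name such as 'frame_0012.png'."""
    match = re.search(r"(\d+)", filename)
    if match is None:
        raise ValueError(f"No frame index found in {filename!r}")
    return int(match.group(1))


def order_frames(filenames):
    """Return frame filenames sorted by their numeric index."""
    return sorted(filenames, key=frame_index)


# Names without zero padding would sort incorrectly as plain strings.
unordered = ["frame_10.png", "frame_2.png", "frame_1.png"]
ordered = order_frames(unordered)
print(ordered)
```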

<b>Bonus</b>: Apply a self-selected algorithm to improve video quality. 

In [1]:
#Creating frames in frame file
import cv2
import os

# Video file path
video_path = 'Q1.mp4'  # Replace with the path to your video file

# Output folder for frames
output_folder = 'frames_output'  # Replace with the path to your output folder

#Create the output folder if it doesn't exist
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Open the video file
cap = cv2.VideoCapture(video_path)

# Initialize frame count
frame_count = 0

while True:
    # Read a frame from the video
    ret, frame = cap.read()

    if not ret:
        break

    # Define the filename for the frame
    frame_filename = f'{output_folder}/frame_{frame_count:04d}.jpg'

    # Save the frame as an image file
    cv2.imwrite(frame_filename, frame)

    # Increment the frame count
    frame_count += 1

# Release the video capture object
cap.release()

print(f'Frames extracted and saved to {output_folder}')


Frames extracted and saved to frames_output


In [3]:
# Reconstruct one video per interpolation method from its enhanced frames

import cv2
import os

# Define the path to the folder containing enhanced frames for each interpolation method
interpolation_methods = ['nearest', 'bilinear', 'bicubic']  # Update with your methods
output_filenames = ['enhanced_nearest.avi', 'enhanced_bilinear.avi', 'enhanced_bicubic.avi']  # Output video filenames

for method, output_filename in zip(interpolation_methods, output_filenames):
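    # Assumption: each method's enhanced frames live in '<method>_frames' (e.g. 'nearest_frames')
    frames_folder = f'D:/Semester 2/Programming For AI(python)/Assignment 5/{method}_frames'
    # Sort so frames are written in playback order (os.listdir order is arbitrary)
    frame_files = sorted(f for f in os.listdir(frames_folder) if f.endswith('.png'))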

    #Reconstruction of video 

    
    # Assuming all frames have the same dimensions (use the dimensions from the first frame)
    sample_frame = cv2.imread(os.path.join(frames_folder, frame_files[0]))
    height, width, _ = sample_frame.shape

    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    fps = 30 # Update with the actual frame rate of your original video
    output_video = cv2.VideoWriter(output_filename, fourcc, fps, (width, height))

    for frame_file in frame_files:
        frame = cv2.imread(os.path.join(frames_folder, frame_file))
        output_video.write(frame)

    output_video.release()



In [27]:
#Bonus MARKS

import cv2
import os

# Function to enhance a video using the nearest-neighbor interpolation method
def enhance_video(method):
    if method != 'NNInter':
        raise ValueError("Invalid method provided. Use 'NNInter'.")

    # Open the video file for processing
    video_path = 'Q1.mp4'
    cap = cv2.VideoCapture(video_path)
    # Keep the exact frame rate; int() would truncate fractional rates such as 29.97
    fps = cap.get(cv2.CAP_PROP_FPS)

    # Create a folder to store the enhanced frames
    output_path = f'{method} frames'
    os.makedirs(output_path, exist_ok=True)
    frame_number = 0

    # Loop through each frame in the video
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Enhance the frame by resizing it with nearest-neighbor interpolation
        enhanced_frame = cv2.resize(frame, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
        frame_number += 1

        # Save the enhanced frame to a file
        frame_filename = f"{output_path}/frame_{frame_number:04d}.png"
        cv2.imwrite(frame_filename, enhanced_frame)

    # Release the video capture object
    cap.release()
    cv2.destroyAllWindows()

    # Process the enhanced frames to create an output video
    input_folder = output_path
    output_video_path = f'{method}.mp4'

    # Get the dimensions of a sample frame in the folder
    sample_frame = cv2.imread(os.path.join(input_folder, os.listdir(input_folder)[0]))
    frame_height, frame_width, _ = sample_frame.shape

    # Define the codec for the output video and create a VideoWriter
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))

    # Write the enhanced frames to the output video
    for filename in sorted(os.listdir(input_folder), key=lambda x: int(x.split('_')[1].split('.')[0])):
        frame = cv2.imread(os.path.join(input_folder, filename))
        out.write(frame)

    # Release the output video
    out.release()
    cv2.destroyAllWindows()

# Example usage with 'NNInter' method
enhance_video('NNInter')


# Question: 2

In this question, you are tasked with enhancing the audio quality of the video. Follow the given procedure to increase audio quality.

### Step 1: Short-Time Fourier Transform (STFT)
Compute the Short-Time Fourier Transform (STFT) of the audio signal. This operation transforms the audio into the frequency domain over short time intervals.
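
To make the idea concrete, here is a simplified NumPy-only sketch of an STFT: the signal is cut into overlapping frames, each frame is windowed, and an FFT is taken per frame. The function `simple_stft` and its parameter values are illustrative assumptions; `librosa.stft` uses different defaults and also pads the signal so frames are centered, so its output will not match this sketch exactly.

```python
import numpy as np


def simple_stft(signal, n_fft=256, hop_length=64):
    """Frame the signal, apply a Hann window, and take the FFT of each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies; transpose to (freq_bins, frames).
    return np.fft.rfft(frames, axis=1).T


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate      # one second of sample times
tone = np.sin(2 * np.pi * 1000 * t)           # synthetic 1 kHz sine wave
spectrum = simple_stft(tone)

# The strongest frequency bin, averaged over time, should sit at the tone.
peak_bin = int(np.argmax(np.abs(spectrum).mean(axis=1)))
print(spectrum.shape, peak_bin * sample_rate / 256, "Hz")
```

Rows of the result correspond to frequency bins and columns to time frames, which is the same layout the later steps rely on when averaging along axis=1.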

### Step 2: Magnitude and Phase Extraction
From the STFT, get the magnitude and phase using the np.abs() and np.angle() functions.

### Step 3: Noise Profile Creation
Load the noisy audio and calculate its STFT and magnitude from the STFT. Afterward, compute the average magnitude of the audio along axis=1 to generate a noise profile. This profile is essential for identifying and removing noise.

### Step 4: Adjusting with a Hyperparameter
Multiply the noise profile array by a hyperparameter represented as alpha. Experiment with various values of alpha to fine-tune the results. A good starting point is to set alpha to 2.

### Step 5: Audio Denoising
Subtract the scaled mean noise array from the magnitude of the original audio (you may need to adjust the dimensions of the mean noise array to match the original audio). Ensure that any negative values in the resulting array are replaced with 0. This step reduces noise in the audio.
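
Steps 3 to 5 can be sketched on toy data as follows. The small arrays below are made-up magnitudes of shape (frequency bins, time frames), chosen only to show the averaging, the broadcasting, and the clamping at zero.

```python
import numpy as np

# Toy magnitude spectrograms with shape (freq_bins, time_frames).
audio_magnitude = np.array([[5.0, 6.0, 4.0],
                            [1.0, 0.5, 2.0]])
noise_magnitude = np.array([[1.0, 3.0],
                            [1.0, 1.0]])

alpha = 2.0
noise_profile = noise_magnitude.mean(axis=1)   # one value per frequency bin
scaled_profile = alpha * noise_profile

# Add a trailing axis so the (freq_bins,) profile broadcasts across all frames,
# then clamp negative results to zero.
denoised = np.maximum(audio_magnitude - scaled_profile[:, np.newaxis], 0.0)
print(denoised)
```

A larger alpha removes more noise but also zeroes out more of the quiet parts of the signal, which is why the value is worth tuning by ear.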

### Step 6: Incorporating Phase Information
Multiply the modified magnitude by the complex exponential of the phase information obtained in Step 2, which can be expressed as np.exp(1.0j * phase).
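
The reason this works is that a complex number equals its magnitude times the complex exponential of its phase. A small sketch on random complex values (standing in for a real STFT matrix) illustrates the round trip:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random complex matrix used as a stand-in for an STFT result.
stft_matrix = rng.normal(size=(4, 5)) + 1j * rng.normal(size=(4, 5))

magnitude = np.abs(stft_matrix)
phase = np.angle(stft_matrix)

# Polar form: magnitude * e^(j * phase) recovers the complex values.
rebuilt = magnitude * np.exp(1.0j * phase)

# Changing only the magnitude rescales each value while keeping its phase.
attenuated = (0.5 * magnitude) * np.exp(1.0j * phase)
print(np.allclose(rebuilt, stft_matrix))
```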

### Step 7: Inverse Short-Time Fourier Transform (ISTFT)
Reconstruct the audio by performing the Inverse Short Time Fourier Transform (ISTFT) on the modified audio signal using librosa. Save the resulting audio file.

In [5]:
import numpy as np
import soundfile as sf
import librosa

def enhance_audio(noisy_audio_path, clean_audio_path, alpha=2):
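    # Load the noise sample (for the noise profile) and the audio to be denoised
    # (the latter is named clean_audio in this function)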
    noisy_audio, sr = librosa.load(noisy_audio_path)
    clean_audio, _ = librosa.load(clean_audio_path, sr=sr)

    # Step 1: Short-Time Fourier Transform (STFT)
    noisy_stft = librosa.stft(noisy_audio)
    clean_stft = librosa.stft(clean_audio)

    # Step 2: Magnitude and Phase Extraction
    clean_magnitude = np.abs(clean_stft)
    phase = np.angle(clean_stft)
    
    # Step 3: Noise Profile Creation
    noise_magnitude = np.abs(noisy_stft)
    noise_profile = np.mean(noise_magnitude, axis=1)

    # Step 4: Adjusting with a Hyperparameter
    adjusted_noise_profile = alpha * noise_profile

    # Step 5: Audio Denoising
    enhanced_magnitude = clean_magnitude - adjusted_noise_profile[:, np.newaxis]
    enhanced_magnitude = np.maximum(enhanced_magnitude, 0)

    # Step 6: Incorporating Phase Information
    enhanced_stft = enhanced_magnitude * np.exp(1.0j * phase)

    # Step 7: Inverse Short-Time Fourier Transform (ISTFT)
    enhanced_audio = librosa.istft(enhanced_stft)

    # Save the resulting audio file
    enhanced_path = 'enhanced_audio.wav'
    sf.write(enhanced_path, enhanced_audio, sr)

    return enhanced_path

# Example usage
noisy_audio_path = 'Q2-Noise.wav'
clean_audio_path = 'Q2.wav'
enhanced_audio_path = enhance_audio(noisy_audio_path, clean_audio_path, alpha=4)
print(f"Enhanced audio saved at: {enhanced_audio_path}")


[[ 3.1415927   3.1415927   3.1415927  ...  0.          0.
   0.        ]
 [-1.5985851  -0.03730292 -1.7571936  ...  0.          0.
   0.        ]
 [-0.05547407  3.0678165  -0.16213235 ... -0.         -0.
  -0.        ]
 ...
 [-2.0734298  -0.78755456  0.16003922 ...  0.          0.
   0.        ]
 [ 0.6399098  -0.6499279   0.01788781 ...  0.          0.
   0.        ]
 [ 3.1415927   3.1415927   0.         ...  0.          0.
   0.        ]]
Enhanced audio saved at: enhanced_audio.wav


In [17]:
import IPython.display

print('Real Audio')
IPython.display.Audio('Q2.wav')

Real Audio


In [18]:
print('Audio Completely Enhanced')
IPython.display.Audio(f'enhanced_audio.wav')

Audio Completely Enhanced


# Question: 3

For this task, use Whisper inference to generate text from the audio file. Use any translation library to translate the text into another language, and then use a TTS (text-to-speech) system to produce audio from the translated text. Supported languages: English, Urdu, Arabic.
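
One possible shape for this pipeline is sketched below. It assumes the `openai-whisper`, `translate`, and `gTTS` packages are installed (Whisper also needs ffmpeg on the system); the function names and the `LANGUAGE_CODES` mapping are illustrative, and the third-party imports are kept inside the functions so the definitions load even where a package is missing.

```python
# Language names from the task mapped to two-letter language codes.
LANGUAGE_CODES = {"English": "en", "Urdu": "ur", "Arabic": "ar"}


def transcribe_with_whisper(audio_path, model_name="base"):
    """Transcribe an audio file with the openai-whisper package."""
    import whisper  # requires: pip install openai-whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def translate_text(text, language):
    """Translate `text` into one of the supported languages."""
    from translate import Translator  # requires: pip install translate
    return Translator(to_lang=LANGUAGE_CODES[language]).translate(text)


def speak(text, language, output_path):
    """Synthesize `text` to an MP3 file with gTTS and return the path."""
    from gtts import gTTS  # requires: pip install gTTS
    gTTS(text=text, lang=LANGUAGE_CODES[language]).save(output_path)
    return output_path
```

A typical call chain would be `speak(translate_text(transcribe_with_whisper("Q3.wav"), "Urdu"), "Urdu", "translated_urdu.mp3")`. The `"base"` model is only a starting point; larger Whisper models are slower but usually more accurate.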

In [6]:
import speech_recognition as sr

sound = "Q3.wav"
r = sr.Recognizer()

with sr.AudioFile(sound) as source:
    audio = r.record(source)  
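transcripts = []
print("Converting Audio File ......")
try:
    # Call the recognizer once and reuse the result (each call is a web request)
    text = r.recognize_google(audio)
    print("Converted Audio is: \n" + text)
    transcripts.append(text)
except Exception as exceptional_case:
    # Covers unintelligible audio as well as API or network failures
    print(exceptional_case)
result_string = " ".join(transcripts)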


Converting Audio File ......
Converted Audio is: 
for this task use whisper inference to generate text from my audio file use any translation library to translate the text into another language and then utilise a TTS system to produce ODI from the translated text


In [7]:
from translate import Translator

translator_1 = Translator(to_lang="ur")  
translator_2 = Translator(to_lang="ar")
translated_text_urdu = translator_1.translate(result_string)
translated_text_arabic= translator_2.translate(result_string)
print(f"Translated text in urdu : {translated_text_urdu}")
print(f"Translated text in arabic: {translated_text_arabic}")

Translated text in urdu : اس کام کے لئے میری آڈیو فائل سے متن پیدا کرنے کے لئے سرگوشی کا استعمال کریں متن کو کسی دوسری زبان میں ترجمہ کرنے کے لئے کسی بھی ترجمے کی لائبریری کا استعمال کریں اور پھر ترجمہ شدہ متن سے ون ڈے تیار کرنے کے لئے ٹی ٹی ایس سسٹم کا استعمال کریں
Translated text in arabic: لهذه المهمة استخدم استدلال الهمس لإنشاء نص من ملفي الصوتي استخدم أي مكتبة ترجمة لترجمة النص إلى لغة أخرى ثم استخدم نظام TTS لإنتاج ODI من النص المترجم


In [9]:
from gtts import gTTS

# Synthesize speech from each translation
tts_urdu = gTTS(text=translated_text_urdu, lang="ur")
tts_arabic = gTTS(text=translated_text_arabic, lang="ar")

# Save audio files
tts_urdu.save("translated_urdu.mp3")
tts_arabic.save("translated_arabic.mp3")

In [11]:
print('English Audio')
IPython.display.Audio('Q3.wav')

English Audio


In [14]:
print('Arabic Translate Audio')
IPython.display.Audio(f'translated_arabic.mp3')

Arabic Translate Audio


In [13]:
print('Urdu Translate Audio')
IPython.display.Audio(f'translated_urdu.mp3')

Urdu Translate Audio
