In [1]:
from packages.transcripts import transcribe_with_timestamps
from packages.PII_detection import detect_pii_entities
from packages.mapTimeStamps import map_pii_to_timestamps
from packages.saveRedactedAudo import redact_audio

In [2]:
input_audio_path = "inputs/in-70001-919417328881-20250530-160219-1748601139.42162.wav"

### 🎙️ Step 1: Transcribe Hindi Audio with Word-Level Timestamps

This function uses Azure's Speech SDK to transcribe a WAV audio file in Hindi (`hi-IN`) and extract:

- Recognized speech as text.
- Word-level timestamps, with a start and end time (in seconds) for each word.

The results are returned in a dictionary with two keys: `text` (the full transcript) and `chunks` (the timestamped words).
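
As a rough illustration of where word-level timestamps come from: the Speech service reports each word's `Offset` and `Duration` in 100-nanosecond ticks inside a detailed JSON result. The sketch below converts such a payload into a `text`/`chunks` dictionary. The helper name `parse_detailed_result`, the sample payload, and the chunk layout (`text` plus a `(start, end)` tuple under `timestamp`) are assumptions for illustration, not the actual internals of `transcribe_with_timestamps`.

```python
import json

TICKS_PER_SECOND = 10_000_000  # offsets and durations are given in 100 ns ticks


def parse_detailed_result(result_json: str) -> dict:
    """Turn one detailed recognition result into {'text': ..., 'chunks': [...]}.

    Hypothetical helper: the chunk layout is an assumption for illustration.
    """
    payload = json.loads(result_json)
    best = payload["NBest"][0]  # first (top-ranked) hypothesis
    chunks = []
    for word in best.get("Words", []):
        start = word["Offset"] / TICKS_PER_SECOND
        end = (word["Offset"] + word["Duration"]) / TICKS_PER_SECOND
        chunks.append(
            {"text": word["Word"], "timestamp": (round(start, 2), round(end, 2))}
        )
    return {"text": best["Display"], "chunks": chunks}


# Made-up payload shaped like the service's detailed output.
sample = json.dumps({
    "RecognitionStatus": "Success",
    "NBest": [{
        "Display": "हैलो जी",
        "Words": [
            {"Word": "हैलो", "Offset": 5_000_000, "Duration": 4_000_000},
            {"Word": "जी", "Offset": 10_000_000, "Duration": 2_500_000},
        ],
    }],
})
print(parse_detailed_result(sample))
```

With the Speech SDK for Python, word-level timestamps are usually requested by calling `request_word_level_timestamps()` on the `SpeechConfig` before recognition; the JSON string parsed here would then come from each recognized result.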


In [None]:
transcript = transcribe_with_timestamps(input_audio_path)
print(transcript)

{'text': 'हैलो हाँ जी हाँ जी सर फ़ोन आया था, फ़ोन आया था, एक सेकंड होल्ड करो जी मैं नाम जान सकती हूँ थोड़ा राजेंद्र सिंह जी। राजेंद्र सिंह, राजेंद्र सिंह, राजेंद्र। एलए जे आई एनडी वि आर ओके ओके ओके ओके राजेंद्र सिंह जी, मैं कल्टीमेट कंपनी पटियाले तो कल कर रही हूँ। थोड़े कॉल है ना छोटा कोइना को फील्ड ऑफिसर आया होना उन्हें थोड़ी ना कार्बन क्रेडिट प्रोग्राम बारे जानकारी देती होनी सर। जुड़े खेतो

### 🕵️ Step 2: Detect PII Entities in Hindi Text

This step uses the Azure client to recognize PII entities in the Hindi transcript. It:

- Calls Azure's `recognize_pii_entities()` method with the language set to `"hi"` (Hindi).
- Prints the redacted text and detailed information (category, confidence score, offset, length) for each detected entity.
- Collects the text of every detected entity into the `pii_words` list.
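
The collection step can be sketched as a small helper, shown here under stated assumptions. `collect_pii_words` and `FakeClient` are hypothetical names; the helper accepts any object exposing `recognize_pii_entities()` in the style of `azure.ai.textanalytics.TextAnalyticsClient`, and the stand-in client exists only so the sketch runs without credentials. It is not the notebook's `detect_pii_entities`.

```python
from types import SimpleNamespace


def collect_pii_words(client, documents, language="hi"):
    """Return (redacted_texts, pii_words) for a batch of documents.

    Hypothetical sketch: `client` is assumed to behave like
    azure.ai.textanalytics.TextAnalyticsClient.
    """
    results = client.recognize_pii_entities(documents, language=language)
    redacted_texts, pii_words = [], []
    for doc in results:
        if getattr(doc, "is_error", False):
            # Failed documents carry an error instead of entities; skip them.
            continue
        redacted_texts.append(doc.redacted_text)
        pii_words.extend(entity.text for entity in doc.entities)
    return redacted_texts, pii_words


class FakeClient:
    """Stand-in that mimics the shape of the service response for a dry run."""

    def recognize_pii_entities(self, documents, language=None):
        entity = SimpleNamespace(text="Asha", category="Person")
        doc = SimpleNamespace(
            is_error=False, redacted_text="Hello ****", entities=[entity]
        )
        return [doc]


redacted, words = collect_pii_words(FakeClient(), ["Hello Asha"])
print(redacted, words)
```

In real use, the client would typically be built as `TextAnalyticsClient(endpoint=..., credential=AzureKeyCredential(key))`, with the endpoint and key supplied from your own Azure resource.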


In [None]:
asr_text = [transcript['text']]
pii_result = detect_pii_entities(asr_text)
pii_words = []
for doc in pii_result:
    print("Redacted Text: {}".format(doc.redacted_text))
    for entity in doc.entities:
        print("Entity: {}".format(entity.text))
        pii_words.append(entity.text)
        print("\tCategory: {}".format(entity.category))
        print("\tConfidence Score: {}".format(entity.confidence_score))
        print("\tOffset: {}".format(entity.offset))
        print("\tLength: {}".format(entity.length))

Redacted Text: हैलो हाँ जी हाँ जी सर फ़ोन आया था, फ़ोन आया था, एक सेकंड होल्ड करो जी मैं नाम जान सकती हूँ थोड़ा ************* जी। *************, *************, ********। एलए जे आई एनडी वि आर ओके ओके ओके ओके ************* जी, मैं कल्टी********* पटियाले तो कल कर रही हूँ। थोड़े कॉल है ना छोटा ***** को फी********* आया होना उन्हें थोड़ी ना कार्बन क्रेडिट प्रोग्राम बारे जानकारी देती होनी सर। जुड़े खेतों के बीच पाइप लग गए न सर। पाइप का मैसेज लाउड देने। नहीं, मे

### 🧠 Step 3: Map PII Words to Audio Word Timestamps

This function aligns each detected PII word with a timestamped word chunk in the ASR output:

- It uses fuzzy string matching (`SequenceMatcher`) to tolerate slight mismatches between the PII text and the transcribed words.
- It returns a list of `(start, end)` timestamps, in seconds, for the matched words.

This is crucial for knowing **where** in the audio to apply redaction.
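
A minimal sketch of this kind of fuzzy alignment, using `difflib.SequenceMatcher` from the standard library, is given below. The function name `map_pii_sketch`, the chunk layout (`text` plus a `timestamp` tuple), and the choice to split multi-word entities into single words are assumptions for illustration; the real `map_pii_to_timestamps` may work differently.

```python
from difflib import SequenceMatcher


def map_pii_sketch(pii_words, chunks, threshold=0.6):
    """Return (start, end) spans of chunks that fuzzily match any PII word.

    Assumes each chunk looks like {"text": str, "timestamp": (start, end)}.
    """
    segments = []
    # A multi-word entity such as a full name is split into single words first.
    targets = {token for phrase in pii_words for token in phrase.split()}
    for chunk in chunks:
        word = chunk["text"].strip(",.।")  # drop trailing punctuation
        best = max(
            (SequenceMatcher(None, word, t).ratio() for t in targets),
            default=0.0,
        )
        if best >= threshold:
            segments.append(chunk["timestamp"])
    return segments


chunks = [
    {"text": "hello", "timestamp": (0.0, 0.4)},
    {"text": "Rajendra,", "timestamp": (0.5, 1.1)},
    {"text": "Sing", "timestamp": (1.1, 1.4)},  # ASR dropped the final letter
    {"text": "ji", "timestamp": (1.5, 1.7)},
]
print(map_pii_sketch(["Rajendra Singh"], chunks))
```

A lower `threshold` catches more misrecognized words but also raises the chance of muting unrelated speech, so the value is worth tuning against real transcripts.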


In [None]:
pii_segments = map_pii_to_timestamps(pii_words, transcript['chunks'], threshold=0.6)
print(pii_segments)

[(13.47, 13.91), (17.2, 17.72), (18.68, 19.32), (20.28, 20.68), (25.07, 25.55), (26.55, 26.91), (30.6, 30.92), (31.24, 31.68), (44.59, 44.79), (58.36, 59.32), (73.75, 74.43), (87.19, 87.71), (110.35, 110.87), (122.83, 123.15), (123.27, 123.79), (158.91, 159.19), (159.19, 159.47), (242.23, 242.35), (246.15, 247.23), (257.0, 257.28)]


### 🔇 Step 4: Mute Audio Segments Containing PII

This function takes the original audio file and the timestamps of detected PII, and:

- Loads the audio using `torchaudio`.
- Silences the waveform within each PII segment (sets the amplitude to zero between its start and end times).
- Saves the redacted audio file to the `output_audio/` directory, replacing `"in-"` with `"out-"` in the filename.
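
The core of the muting step is converting each timestamp in seconds to a sample index and zeroing that range. The sketch below shows the arithmetic on a plain Python list, which stands in for the `torchaudio` waveform tensor so the example runs without extra dependencies. `mute_segments` and `redacted_path` are hypothetical helpers, and replacing only the first `"in-"` in the filename is an assumption.

```python
from pathlib import Path


def mute_segments(samples, sample_rate, segments):
    """Return a copy of `samples` with every (start, end) span in seconds zeroed."""
    muted = list(samples)
    for start_s, end_s in segments:
        # Convert seconds to sample indices and clamp to the clip length.
        start = max(0, int(start_s * sample_rate))
        end = min(len(muted), int(end_s * sample_rate))
        if end > start:
            muted[start:end] = [0.0] * (end - start)
    return muted


def redacted_path(input_path, out_dir="output_audio"):
    """Build the output path by swapping the "in-" prefix for "out-"."""
    return Path(out_dir) / Path(input_path).name.replace("in-", "out-", 1)


# Toy signal: 8 samples at 8 Hz, i.e. one second of audio.
signal = [0.5] * 8
print(mute_segments(signal, sample_rate=8, segments=[(0.25, 0.5)]))
print(redacted_path("inputs/in-123.wav").name)
```

With `torchaudio`, `torchaudio.load(path)` returns a `(waveform, sample_rate)` pair whose waveform is shaped `(channels, frames)`, so the equivalent assignment would be along the lines of `waveform[:, start:end] = 0`.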


In [None]:
redact_audio(input_audio_path, pii_segments)

✅ Redacted audio saved at: output_audio\out-70001-919417328881-20250530-160219-1748601139.42162.wav
