Commit

Add missing audio_filepath validation for Canary (#8119)
* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
pzelasko and pre-commit-ci[bot] committed Jan 3, 2024
1 parent 9fc3ae5 commit 1e7cfd6
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions nemo/collections/common/data/lhotse/nemo_adapters.py
@@ -185,6 +185,9 @@ def __iter__(self):
 tar_path = self.shard_id_to_tar_path[sid]
 with tarfile.open(fileobj=open_best(tar_path, mode="rb"), mode="r|*") as tar:
     for data, tar_info in zip(shard_manifest, tar):
+        assert (
+            data["audio_filepath"] == tar_info.name
+        ), f"Mismatched JSON manifest and tar file. {data['audio_filepath']=} != {tar_info.name=}"
         raw_audio = tar.extractfile(tar_info).read()
         # Note: Lhotse has a Recording.from_bytes() utility that we won't use here because
         # the profiling indicated significant overhead in torchaudio ffmpeg integration
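The check added in this hunk can be tried in isolation. The following is a minimal sketch, not the NeMo implementation: `make_tar` and `iter_validated` are hypothetical helpers introduced here for illustration, and the archive is built in memory rather than opened through `open_best`. It assumes, as the hunk does, that manifest entries and tar members are stored in the same order, so that pairing them with `zip` is meaningful.

```python
import io
import tarfile


def make_tar(members):
    """Build an in-memory tar archive from (name, payload) pairs (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_validated(shard_manifest, tar_fileobj):
    """Yield (manifest entry, raw bytes) pairs, checking names pairwise (hypothetical helper)."""
    # "r|*" reads the archive as a forward-only stream, as in the hunk above.
    with tarfile.open(fileobj=tar_fileobj, mode="r|*") as tar:
        for data, tar_info in zip(shard_manifest, tar):
            # Same condition as the assertion added by this commit.
            assert data["audio_filepath"] == tar_info.name, (
                "Mismatched JSON manifest and tar file. "
                f"{data['audio_filepath']!r} != {tar_info.name!r}"
            )
            yield data, tar.extractfile(tar_info).read()


if __name__ == "__main__":
    members = [("a.wav", b"AAAA"), ("b.wav", b"BB")]
    manifest = [{"audio_filepath": "a.wav"}, {"audio_filepath": "b.wav"}]
    for entry, raw in iter_validated(manifest, make_tar(members)):
        print(entry["audio_filepath"], len(raw))
```

Without the check, a manifest whose order differs from the tar file would silently pair each audio payload with another utterance's metadata; with it, iteration stops at the first mismatched pair. Note that `assert` statements are skipped when Python runs with `-O`, so the check is only active in normal runs.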
