<a href="https://colab.research.google.com/github/armandossrecife/teste/blob/main/extract_audio_video_info.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extracting basic information from video and audio files

Video and audio

## Test files

In [68]:
!rm -rf *.mp4 && rm -rf *.mp3

In [69]:
!wget https://raw.githubusercontent.com/armandossrecife/teste/main/Adrienne.mp4
!wget https://raw.githubusercontent.com/armandossrecife/teste/main/Obama_Speech_about_education.mp4
!wget https://raw.githubusercontent.com/armandossrecife/teste/main/Kalimba.mp3

--2024-07-22 16:08:06--  https://raw.githubusercontent.com/armandossrecife/teste/main/Adrienne.mp4
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14944332 (14M) [application/octet-stream]
Saving to: ‘Adrienne.mp4’


2024-07-22 16:08:06 (122 MB/s) - ‘Adrienne.mp4’ saved [14944332/14944332]

--2024-07-22 16:08:07--  https://raw.githubusercontent.com/armandossrecife/teste/main/Obama_Speech_about_education.mp4
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18613317 (18M) [application/octet-stream]
Saving to: ‘Obama_Speech_about_educ

## Install programs and libraries

**FFmpeg**:

FFmpeg is a free and open-source software project consisting of a suite of libraries and programs for handling video, audio, and other multimedia files and streams.

- https://ffmpeg.org/
- https://en.wikipedia.org/wiki/FFmpeg

**Pyffmpeg**:

An FFmpeg wrapper for Python. It uses the most up-to-date FFmpeg binary to provide both FFmpeg and FFprobe functionality, so a single package covers both tools.

- https://pypi.org/project/pyffmpeg

**Soundfile**:

The soundfile module can read and write sound files.

- https://pypi.org/project/soundfile

In [70]:
!sudo apt install ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.


In [71]:
!ffmpeg -version

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-l

In [72]:
!pip install ffmpeg



In [73]:
!pip install --upgrade pyffmpeg



## Import dependencies

In [74]:
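# The functions below call the ffprobe binary through subprocess,
# so no Python ffmpeg wrapper needs to be imported here.
import json
import os
import subprocess

import soundfile as sf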
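
Since the extraction functions in this notebook shell out to `ffprobe`, it can help to confirm that the binaries are reachable before running them. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical and not part of the original notebook.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available")
```

If either name is reported as missing, the `apt install ffmpeg` cell above is the place to look first.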

## Extract basic data from video files

In [75]:
def extract_video_info(video_path):
  try:
    # Probe the file with the ffprobe command-line tool and parse its JSON report
    command = ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", "-show_streams", video_path]
    output = subprocess.check_output(command).decode()
    probe = json.loads(output)

    # Pick the first video stream; None when the file has no video stream
    video_stream = next((stream for stream in probe['streams'] if stream['codec_type'] == 'video'), None)
    if video_stream is None:
      print(f"No video stream found: {video_path}")
      return None

    info = {
      "filename": os.path.basename(video_path),
      "size": os.path.getsize(video_path),  # In bytes
      "format": probe['format']['format_name'],
      "duration": video_stream['duration'],  # In seconds, as a string (e.g. "239.339100")
      "width": video_stream['width'],  # In pixels
      "height": video_stream['height'],  # In pixels
      "frame_rate": video_stream['avg_frame_rate'],  # Rational string (e.g. "30000/1001")
    }

    return info

  except subprocess.CalledProcessError as e:
    print(f"Error processing video: {video_path} - {e}")
    return None
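
`extract_video_info` reads `duration` from the video stream itself. To my knowledge, ffprobe can omit the per-stream duration for some containers while still reporting a container-level value under `format`. The sketch below shows one possible fallback; `pick_duration` is a hypothetical helper and the sample dictionary is made-up data shaped like ffprobe's JSON output, not output from the test files.

```python
def pick_duration(probe, codec_type="video"):
    """Return the duration in seconds as a float, or None if unavailable.

    Prefers the duration of the first stream of `codec_type`, then falls
    back to the container-level duration under probe["format"].
    """
    stream = next(
        (s for s in probe.get("streams", []) if s.get("codec_type") == codec_type),
        None,
    )
    candidates = []
    if stream is not None:
        candidates.append(stream.get("duration"))
    candidates.append(probe.get("format", {}).get("duration"))

    for raw in candidates:
        try:
            return float(raw)  # ffprobe reports durations as strings
        except (TypeError, ValueError):
            continue  # missing (None) or non-numeric (e.g. "N/A")
    return None


# Hypothetical probe result: the video stream carries no duration field.
sample = {
    "streams": [{"codec_type": "video", "width": 640, "height": 360}],
    "format": {"duration": "12.5"},
}
print(pick_duration(sample))  # 12.5
```

Returning a float also makes the value usable in arithmetic, at the cost of losing ffprobe's original string formatting.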

## Extract basic data from audio files

In [76]:
def extract_audio_info(audio_path):
  try:
    # Use ffprobe for metadata extraction
    command = ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", "-show_streams", audio_path]
    output = subprocess.check_output(command).decode()
    probe_data = json.loads(output)

    # Read the format, channel count and sample rate with soundfile
    info = sf.info(audio_path)
    data_info = {
      "filename": os.path.basename(audio_path),
      "size": os.path.getsize(audio_path),  # In bytes
      "format": info.format,
      "length": probe_data['format']['duration'],  # In seconds, as a string from ffprobe
      "channels": info.channels,
      "sample_rate": info.samplerate,  # In Hz
    }

    # Collect the tag values reported for each stream (tag names are discarded)
    metadata = {value for stream in probe_data['streams'] for value in stream.get('tags', {}).values()}

  except (subprocess.CalledProcessError, RuntimeError) as e:
    # RuntimeError covers files that soundfile cannot open.
    # Return a 3-tuple so callers can still unpack the result on failure.
    print(f"Error processing audio: {audio_path} - {e}")
    return None, None, None

  return info, data_info, metadata

## Examples of extracting information from video and audio files

### Video files (MPEG-4)

https://pt.wikipedia.org/wiki/Mp4

In [77]:
lista_video_paths = []
lista_audio_paths = []

current_path = os.getcwd()
print(current_path)

# Build absolute paths to the downloaded test files
my_path1 = os.path.join(current_path, 'Adrienne.mp4')
print(my_path1)
my_path2 = os.path.join(current_path, 'Obama_Speech_about_education.mp4')
print(my_path2)
my_path3 = os.path.join(current_path, 'Kalimba.mp3')
print(my_path3)

lista_video_paths.append(my_path1)
lista_video_paths.append(my_path2)
lista_audio_paths.append(my_path3)

/content
/content/Adrienne.mp4
/content/Obama_Speech_about_education.mp4
/content/Kalimba.mp3


In [78]:
for video_path in lista_video_paths:
  video_info = extract_video_info(video_path)

  if video_info:
    print("Video Information:")
    print(video_path)
    for key, value in video_info.items():
      print(f"\t{key}: {value}")

Video Information:
/content/Adrienne.mp4
	filename: Adrienne.mp4
	size: 14944332
	format: mov,mp4,m4a,3gp,3g2,mj2
	duration: 239.339100
	width: 480
	height: 360
	frame_rate: 30000/1001
Video Information:
/content/Obama_Speech_about_education.mp4
	filename: Obama_Speech_about_education.mp4
	size: 18613317
	format: mov,mp4,m4a,3gp,3g2,mj2
	duration: 669.160000
	width: 640
	height: 360
	frame_rate: 25/1
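
The `frame_rate` values printed above are rational strings such as `30000/1001`, not numbers. A minimal sketch for turning them into frames per second with the standard library follows; `frame_rate_to_fps` is a hypothetical helper name, and returning `None` for unusable values such as `0/0` is a design choice made here rather than anything ffprobe prescribes.

```python
from fractions import Fraction


def frame_rate_to_fps(rate):
    """Convert a rational string like "30000/1001" to a float, or None."""
    try:
        return float(Fraction(rate))
    except (TypeError, ValueError, ZeroDivisionError):
        # TypeError: rate is None; ValueError: not a number;
        # ZeroDivisionError: a zero denominator such as "0/0".
        return None


print(round(frame_rate_to_fps("30000/1001"), 3))  # 29.97
print(frame_rate_to_fps("25/1"))                  # 25.0
print(frame_rate_to_fps("0/0"))                   # None
```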


### Audio files (MP3)

https://pt.wikipedia.org/wiki/MP3

In [79]:
for audio_path in lista_audio_paths:
  audio_info, audio_data, audio_metadata = extract_audio_info(audio_path)

  if audio_info:
    print("Audio Information:")
    print(audio_path)

  if audio_data:
    print("Audio Data:")
    print(audio_path)
    for key, value in audio_data.items():
      print(f"\t{key}: {value}")

  if audio_metadata:
    print("Audio Metadata:")
    print(audio_path)
    print(type(audio_metadata), audio_metadata)

Audio Information:
/content/Kalimba.mp3
Audio Data:
/content/Kalimba.mp3
	filename: Kalimba.mp3
	size: 8414449
	format: MP3
	length: 348.060833
	channels: 2
	sample_rate: 44100
Audio Metadata:
/content/Kalimba.mp3
<class 'set'> {'thumbnail', 'Cover (front)'}
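
The metadata printed above is a set of tag values, so the tag names are lost and duplicate values collapse into one. One possible alternative, sketched below, keeps the tags as a dictionary and also includes container-level tags from `format`. The helper `collect_tags`, the `stream<index>:` key prefix, and the sample data (including the tag names `title` and `comment`) are assumptions for illustration, not output from `Kalimba.mp3`.

```python
def collect_tags(probe_data):
    """Merge format-level and stream-level tags into one dictionary.

    Stream tags are prefixed with "stream<index>:" so that equal tag
    names on different streams do not overwrite each other.
    """
    tags = dict(probe_data.get("format", {}).get("tags", {}))
    for stream in probe_data.get("streams", []):
        index = stream.get("index")
        for key, value in stream.get("tags", {}).items():
            tags[f"stream{index}:{key}"] = value
    return tags


# Hypothetical data shaped like ffprobe's JSON output.
sample_probe = {
    "format": {"tags": {"title": "Example title"}},
    "streams": [
        {"index": 0, "codec_type": "audio"},
        {"index": 1, "codec_type": "video",
         "tags": {"title": "thumbnail", "comment": "Cover (front)"}},
    ],
}

for key, value in collect_tags(sample_probe).items():
    print(f"{key}: {value}")
```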
