Akash Mahanty edited this page Dec 24, 2021 · 33 revisions

This is the extended usage page. The API reference can be found here.

Attributes of VideoHash objects

VideoHash objects expose several useful attributes, which are described below. See also Accessing files created by VideoHash instance.

from videohash import VideoHash
videohash1 = VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA") # VIDEO : Artemis I Hot Fire Test
videohash2 = VideoHash(path="/home/akamhy/Downloads/rocket.mkv") # VIDEO : Artemis I Hot Fire Test, the same video as videohash1

hash

  • The videohash-value of the input video: a Python string of 64 binary digits, prefixed with 0b.
>>> videohash1.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> videohash2.hash
'0b0011010000011111111011111111111110001111011110000000000000000000'
>>> videohash1.hash == "0b0011010000011111111011111111111110001111011110000000000000000000"
True
>>> videohash1.hash == videohash2.hash
True
>>> videohash1 == videohash2
True
>>> videohash1 - videohash1
0
>>> videohash1 - "0b0011010000011111111011111111111110001111011110000000000000000000"
0
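
The subtractions above return 0 for identical hashes, which suggests the result is the number of differing bits. As a minimal sketch of that idea, assuming the difference is a Hamming distance, the hypothetical helper `hamming_distance` below compares two 0b-prefixed strings without using the library. It is not the library's own implementation.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bit positions at which two '0b'-prefixed hash strings differ."""
    bits_a, bits_b = hash_a[2:], hash_b[2:]  # drop the '0b' prefix
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same number of bits")
    return sum(a != b for a, b in zip(bits_a, bits_b))


original = "0b0011010000011111111011111111111110001111011110000000000000000000"
# The same hash with only its last bit flipped (made up for illustration).
flipped = original[:-1] + "1"

same = hamming_distance(original, original)    # identical hashes
one_off = hamming_distance(original, flipped)  # hashes differing in one bit
```

A small distance indicates similar hashes; what counts as "small enough" is a choice left to the caller.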

hash_hex

  • Hexadecimal representation of the hash prefixed with 0x.
>>> videohash1.hash_hex
'0x341fefff8f780000'
>>> videohash2.hash_hex
'0x341fefff8f780000'
>>> videohash1 - "0x341fefff8f780000"
0
>>> videohash1 == "0x341fefff8f780000"
True
>>> videohash1 == videohash2.hash_hex
True
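
The two representations carry the same 64-bit value, so one can be derived from the other with Python's built-in `int` and `format`. The helpers `bin_to_hex` and `hex_to_bin` below are hypothetical names used only for this sketch; they are not part of videohash.

```python
BITS_IN_HASH = 64  # matches the bits_in_hash attribute


def bin_to_hex(hash_bin: str) -> str:
    """Convert a '0b'-prefixed string to a zero-padded '0x'-prefixed string."""
    width = BITS_IN_HASH // 4  # 4 bits per hexadecimal digit
    return "0x" + format(int(hash_bin, 2), "0{}x".format(width))


def hex_to_bin(hash_hex: str) -> str:
    """Convert a '0x'-prefixed string to a zero-padded '0b'-prefixed string."""
    return "0b" + format(int(hash_hex, 16), "0{}b".format(BITS_IN_HASH))


hash_bin = "0b0011010000011111111011111111111110001111011110000000000000000000"
hash_hex = bin_to_hex(hash_bin)
round_trip = hex_to_bin(hash_hex)
```

Zero padding matters here: a plain `hex()` or `bin()` call would drop leading zeros and change the length of the string.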

bits_in_hash

  • A constant, 64. Indicates the number of bits in the hash.
>>> videohash1.bits_in_hash
64
>>> videohash2.bits_in_hash
64

bitlist

  • The bits of the hash as a Python list of integers, without the '0b' prefix. The list always has 64 elements.
>>> videohash1.bitlist
[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> videohash2.bitlist
[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> videohash1 - [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
0
>>> videohash1 - videohash1.bitlist
0
>>> len(videohash1.bitlist)
64
>>> len(videohash1.bitlist) == videohash1.bits_in_hash
True
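
The relationship between hash and bitlist can be reproduced in plain Python. This is a sketch of the conversion, not the library's own code.

```python
hash_bin = "0b0011010000011111111011111111111110001111011110000000000000000000"

# Drop the '0b' prefix and turn every remaining character into an int.
bitlist = [int(bit) for bit in hash_bin[2:]]

# Going the other way: join the bits and put the prefix back.
rebuilt = "0b" + "".join(str(bit) for bit in bitlist)
```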

video_duration

  • The video duration in seconds, as reported by FFmpeg. The maximum supported duration is 999 hours; above that, the regex that parses FFmpeg's output does not match.
>>> videohash1.video_duration
52.08
>>> videohash2.video_duration
52.08
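
To illustrate where a 999-hour limit can come from, here is a minimal sketch of parsing a duration from FFmpeg-style output. The pattern, the helper name `parse_duration_seconds` and the sample lines are assumptions made for this example; the regex actually used by videohash may differ.

```python
import re
from typing import Optional

# Assumed pattern: at most three digits for the hours field, hence 999 hours max.
DURATION_PATTERN = re.compile(r"Duration: (\d{1,3}):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration_seconds(ffmpeg_output: str) -> Optional[float]:
    """Return the duration in seconds, or None when the pattern does not match."""
    match = DURATION_PATTERN.search(ffmpeg_output)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Sample lines written for this example, shaped like FFmpeg's stream summary.
duration = parse_duration_seconds("  Duration: 00:00:52.08, start: 0.000000")
too_long = parse_duration_seconds("  Duration: 1000:00:00.00, start: 0.000000")
```

With this pattern a four-digit hours field produces no match, so the helper returns None instead of a duration.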

collage_path

image

  • The collage image as a PIL object of type PIL.JpegImagePlugin.JpegImageFile.
>>> videohash1.image
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x1171 at 0x7F811123F760>
>>> dir(videohash1.image)
['_Image__transformer', '__array__', '__class__', '__copy__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_category', '_close_exclusive_fp_after_loading', '_copy', '_crop', '_dump', '_ensure_mutable', '_exclusive_fp', '_exif', '_expand', '_get_safe_box', '_getexif', '_getmp', '_getxmp', '_min_frame', '_new', '_open', '_repr_png_', '_seek_check', '_size', 'alpha_composite', 'app', 'applist', 'bits', 'close', 'convert', 'copy', 'crop', 'custom_mimetype', 'decoderconfig', 'decodermaxblock', 'draft', 'effect_spread', 'entropy', 'filename', 'filter', 'format', 'format_description', 'fp', 'frombytes', 'get_format_mimetype', 'getbands', 'getbbox', 'getchannel', 'getcolors', 'getdata', 'getexif', 'getextrema', 'getim', 'getpalette', 'getpixel', 'getprojection', 'getxmp', 'height', 'histogram', 'huffman_ac', 'huffman_dc', 'icclist', 'im', 'info', 'layer', 'layers', 'load', 'load_djpeg', 'load_end', 'load_prepare', 'load_read', 'map', 'mode', 'palette', 'paste', 'point', 'putalpha', 'putdata', 'putpalette', 'putpixel', 'pyaccess', 'quantization', 'quantize', 'readonly', 'reduce', 'remap_palette', 'resize', 'rotate', 'save', 'seek', 'show', 'size', 'split', 'tell', 'thumbnail', 'tile', 'tobitmap', 'tobytes', 'toqimage', 'toqpixmap', 'transform', 'transpose', 'verify', 'width']

path

  • If you passed a path of the input video, this attribute stores that path. If you passed a URL this attribute is None.
>>> videohash2.path
'/home/akamhy/Downloads/rocket.mkv'
>>> type(videohash1.path)
<class 'NoneType'>

url

  • If you passed a URL of the input video, this attribute stores that URL. If you passed a path this attribute is None.
>>> type(videohash2.url)
<class 'NoneType'>
>>> videohash1.url
'https://www.youtube.com/watch?v=PapBjpzRhnA'

storage_path

video_path

task_uid

  • The code explains this better than words (the method relies on the standard random module):
    @staticmethod
    def _get_task_uid():
        """
        Returns a unique task id for the instance. Task id is used to
        differentiate the instance files from the other unrelated files.
        We want to make sure that only the instance is manipulating the instance files
        and no other process nor user by accident deletes or edits instance files while
        we are still processing.
        """
        sys_random = random.SystemRandom()
        return "".join(
            sys_random.choice(
                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            )
            for _ in range(20)
        )

Downloading the best quality video (if possible, from sites like YouTube and Vimeo)

The VideoHash class has a download_worst parameter, which defaults to False. Set it to True to save bandwidth, but note that the worst-quality files sometimes have very large black borders, in which case the hash value might not match.

VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA") # DEFAULT: To download the best quality video
VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA", download_worst=True) # To download the worst quality video

Deleting the storage path

The delete_storage_path() method deletes the storage path. Remember that deleting the storage directory also deletes the collage, the extracted frames, and the downloaded video. If you passed an argument to storage_path, that directory itself is not deleted; only the files and directories that the instance created inside it are deleted. This is a feature, not a bug: it ensures that one instance of a program does not delete the storage path while other instances still require it.

⚠️ Many operating systems delete the temporary directory only on boot, or never delete it. If you calculate videohash values for many videos and don't want to run out of storage, don't forget to delete the storage path.

>>> videohash1 = VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA")
>>> videohash1.delete_storage_path() # Delete the storage_path
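
The idea of removing only one instance's files from a shared storage directory can be sketched with the standard library. The helper `delete_instance_files`, the instance names and the layout of one subdirectory per instance are assumptions made for this illustration; this is not how videohash is known to implement delete_storage_path().

```python
import os
import shutil
import tempfile


def delete_instance_files(storage_path: str, instance_name: str) -> None:
    """Remove one instance's subdirectory and leave the shared directory alone."""
    instance_dir = os.path.join(storage_path, instance_name)
    shutil.rmtree(instance_dir, ignore_errors=True)


# A shared storage directory used by two hypothetical instances.
shared_storage = tempfile.mkdtemp()
for name in ("instanceA", "instanceB"):
    os.makedirs(os.path.join(shared_storage, name, "frames"))

delete_instance_files(shared_storage, "instanceA")
remaining = sorted(os.listdir(shared_storage))  # only instanceB is left

shutil.rmtree(shared_storage)  # clean up the example directory itself
```

Because the shared directory survives, a second instance that is still processing keeps its files.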

Accessing files created by VideoHash instance

Whenever a VideoHash instance is created, it creates some directories and files. It also downloads the video if a URL is used to calculate the videohash value.

>>> videohash1 = VideoHash(url="https://www.youtube.com/watch?v=PapBjpzRhnA")

Accessing the collage_path and collage_dir

The collage_path is the absolute path of the collage generated by the instance and collage_dir is the absolute path of the directory in which the collage is stored.

>>> videohash1.collage_path
'/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/collage/collage.jpg'
>>> videohash1.collage_dir
'/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/collage/'

Accessing the frames_dir

The frames of the input video are extracted and stored in the frames_dir by the instance.

>>> videohash1.frames_dir
'/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/frames/'

Accessing the video_dir, video_download_dir and video_path

  • The video_path is the absolute path of the video from which the frames are extracted.
  • The video_dir is the directory containing the video from which the instance extracts the frames. If you passed a URL, the video is downloaded to the video_download_dir and then copied to the video_dir; if you passed a video path, the video is copied directly to the video_dir.
  • Downloaded videos are stored in the video_download_dir.
>>> videohash1.video_path
'/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/video/video.webm'
>>> videohash1.video_dir
'/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/video/'
>>> videohash1.video_download_dir
'/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/downloadedvideo/'
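
The paths printed on this page all sit under one per-instance directory. The sketch below checks that relationship using the example strings shown above and `posixpath`, so it runs without creating a VideoHash instance. The actual paths differ on every run and every system.

```python
import posixpath

# Example values copied from the output shown on this page.
collage_path = "/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/collage/collage.jpg"
collage_dir = "/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/collage/"
frames_dir = "/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/frames/"
video_path = "/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/video/video.webm"
video_dir = "/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/video/"
video_download_dir = "/tmp/tmprf3g8gqi/temp_storage_dir/jbdz59cjosxf/downloadedvideo/"

# Every directory shares one root that belongs to this instance.
instance_root = posixpath.commonpath(
    [collage_dir, frames_dir, video_dir, video_download_dir]
)

# The *_path attributes point at files inside the matching *_dir.
collage_parent = posixpath.dirname(collage_path) + "/"
video_parent = posixpath.dirname(video_path) + "/"
```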