
Aspargus

What is Aspargus?

The idea behind Aspargus is to provide an automated way to properly identify home videos: all those videos of kids and family that end up stored in folders with impractical names. Why not leverage the power of AI to give a preview of what's happening in each video?

How does it work?

Aspargus uses Ollama and LLMs to first understand what's happening in the video and then create a title, a description and keywords for that video.

Some LLMs, such as LLaVA-Llama3, handle both the Computer Vision part and the metadata generation very well, so only one pass is needed. Otherwise two passes are needed: first with a dedicated Computer Vision model, then with a Text Generation model.
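Aspargus's actual prompts and request code are not shown in this README; as a rough sketch (under the assumption that it talks to Ollama's standard `/api/generate` endpoint, with hypothetical prompts), the two-pass flow could build its requests like this:

```python
import base64

# Ollama's default local endpoint (an assumption, not Aspargus's config)
OLLAMA_URL = "http://localhost:11434/api/generate"

def cv_request(frame_bytes, model="llava-llama3"):
    """Pass 1: ask a vision model to describe the extracted frames.
    Ollama accepts images as base64 strings in the 'images' field."""
    return {
        "model": model,
        "prompt": "Describe what is happening in these video frames.",
        "images": [base64.b64encode(b).decode("ascii") for b in frame_bytes],
        "stream": False,
    }

def text_request(description, model="mistral"):
    """Pass 2: turn the description into title, description and keywords."""
    return {
        "model": model,
        "prompt": (
            "From this video description, produce a title, a description "
            f"and keywords:\n{description}"
        ),
        "stream": False,
    }
```

Each payload would then be POSTed to `OLLAMA_URL` (e.g. with `requests.post(OLLAMA_URL, json=payload)`); a single-pass model like LLaVA-Llama3 would simply receive both the frames and the metadata instructions in one request.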

We recommend LLaVA-Llama3 and Mistral as models (they are in fact the defaults), but you can use any other model. Make sure that Ollama is installed and running on your machine, with the proper models downloaded, before running Aspargus. Refer to the Ollama documentation on how to download a model.

Aspargus also uses FFmpeg and FFprobe to extract frames from the videos. Make sure both are installed on your machine and on your PATH before running Aspargus.
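The exact FFmpeg invocation Aspargus uses is internal; a minimal sketch of the idea (reading the duration with ffprobe, then grabbing a frame at a given timestamp) might look like:

```python
import subprocess

def probe_duration(video_path):
    """Read the video duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1",
         video_path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

def frame_extract_cmd(video_path, timestamp, out_png):
    """Build an ffmpeg command that grabs one frame at `timestamp` seconds."""
    return ["ffmpeg", "-ss", str(timestamp), "-i", video_path,
            "-frames:v", "1", "-y", out_png]
```

Sampling a handful of evenly spaced timestamps between 0 and the probed duration gives the vision model a representative set of frames without decoding the whole file.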

How to run Aspargus?

Aspargus runs in your terminal and takes the following arguments:

  • List of video files (optional): A space-separated list of paths to the videos.

  • -f or --folder (optional): The folder where the videos are stored. Used as an alternative to the list of videos, to avoid specifying every file individually.

  • -s or --start (optional): Used together with -f or --folder to specify which file should be the starting point (inclusive), in alphabetical order. If not specified, Aspargus begins with the first file (alphabetically) in the folder provided by -f or --folder.

  • -e or --end (optional): Used together with -f or --folder to specify which file should be the ending point (inclusive), in alphabetical order. If not specified, Aspargus ends with the last file (alphabetically) in the folder provided by -f or --folder.

  • -r or --rename (optional): Renames the video files according to the provided template:

    • %Y: The year of creation of the video with 4 digits
    • %M: The month of creation of the video with 2 digits (with leading 0 if needed)
    • %D: The day of creation of the video with 2 digits (with leading 0 if needed)
    • %T: The title generated by Aspargus for the video
    • %K: The list of keywords generated by Aspargus for the video, separated by dashes (-)
    • %J: The list of keywords generated by Aspargus for the video, separated by commas and spaces (, )
  • -j or --json (optional): The path of the JSON file where to store all videos' metadata.

  • -c or --cv_model (optional): Sets the name of the Computer Vision model to be used. Automatically saves the setting for the next usage, so no need to repeat this argument. Defaults to LLaVA.

  • --cv_server (optional): Sets the URL of the Computer Vision server. Automatically saves the setting for the next usage, so no need to repeat this argument.

  • --cv_server_port (optional): Sets the port of the Computer Vision server. Automatically saves the setting for the next usage, so no need to repeat this argument.

  • -t or --text_model (optional): Sets the name of the Text model to be used. Automatically saves the setting for the next usage, so no need to repeat this argument. Defaults to Mistral.

  • --text_server (optional): Sets the URL of the Text server. Automatically saves the setting for the next usage, so no need to repeat this argument.

  • --text_server_port (optional): Sets the port of the Text server. Automatically saves the setting for the next usage, so no need to repeat this argument.

  • --two_steps (optional): Runs the analysis in two steps, first running the Computer Vision model and then the Text model to generate a summary.

At least one video file path or one folder path must be provided for Aspargus to run.
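The rename template can be expanded along these lines (a sketch of the documented placeholders, not Aspargus's actual implementation):

```python
from datetime import date

def expand_template(template, created, title, keywords):
    """Expand the %Y/%M/%D/%T/%K/%J placeholders described above."""
    return (template
            .replace("%Y", f"{created.year:04d}")   # 4-digit year
            .replace("%M", f"{created.month:02d}")  # zero-padded month
            .replace("%D", f"{created.day:02d}")    # zero-padded day
            .replace("%T", title)                   # generated title
            .replace("%K", "-".join(keywords))      # dash-separated keywords
            .replace("%J", ", ".join(keywords)))    # comma-separated keywords
```

For example, `expand_template("%Y-%M-%D_%T_%K", date(2024, 4, 16), "my-video-title", ["keyword1", "keyword2", "keyword3"])` yields `2024-04-16_my-video-title_keyword1-keyword2-keyword3`.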

Examples

  • aspargus -f /path/to/folder -s avideo.mp4 -e myvideo.mp4 -r "%Y-%M-%D_%T_%K" -t llama3:instruct analyzes all the videos in the given folder, in alphabetical order from avideo.mp4 to myvideo.mp4 (inclusive), and renames the files according to the given template:
    2024-04-16_my-video-title_keyword1-keyword2-keyword3
    
    It also specifies llama3:instruct as the new text processing model.
  • aspargus /path/to/video1.mp4 /another/path/to/video2.mp4 -j /third/path/to/file.json analyzes two videos and stores the metadata in the specified JSON file.

Constraints, known issues and limitations

  • Be careful when choosing your models: frame analysis and text generation can take a long time depending on your hardware. The default models are 7B models that run decently on pretty much any hardware.
  • When Aspargus uses two models for computer vision and text generation, all the computer-vision tasks are run first, then all the text-generation tasks, to avoid reloading the models for every video. This could be improved in the future.
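The batching order described above (every vision pass before any text pass, so each model is loaded only once) can be sketched as follows, with `run_cv` and `run_text` standing in for hypothetical per-video model calls:

```python
def analyze_batch(videos, run_cv, run_text):
    """Run all computer-vision passes first, then all text passes,
    so each model only has to be loaded into memory once."""
    descriptions = [run_cv(v) for v in videos]   # vision model stays loaded
    return [run_text(d) for d in descriptions]   # then the text model
```

The trade-off is that no metadata is available until the whole vision batch has finished, which is why the README flags interleaving as a possible future improvement.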

Licence

MIT licence.