Skip to content

Video Pipeline/Training Support#510

Merged
BryonLewis merged 12 commits into
masterfrom
server/video-pipeline-fix
Dec 24, 2020
Merged

Video Pipeline/Training Support#510
BryonLewis merged 12 commits into
masterfrom
server/video-pipeline-fix

Conversation

@BryonLewis
Copy link
Copy Markdown
Collaborator

@BryonLewis BryonLewis commented Dec 22, 2020

Not the best way to solve the issue but the least intrusive way for the time being.

Pipeline Running Updates:

  • image-sequence for training will now filter out .json files
  • video type will now load the folder filter out any non compatible video files and then look for the video file which doesn't have the codec meta tag on it. I would like to share the validMediaFiles with the constants folder or maybe do this filtering before calling the task, I just didn't want to go with too big of a change right now with just getting video training to work.

CSV Metadata Update:

  • Added in the writing of the metadata to the header. Right not it only writes the fps value for the training and other items to user. More stuff can be added in the future, just trying to keep it fairly simple right now.
  • NOTE!!!! - The resetting of fps = None is there because of a training bug right now. Once fixed that will be removed.

Training Updates:

  • In the beginning of the training I updated the CSV creation to be more like export and include stuff like the fps so it knows to write it.
  • Training will use the same function to find the source video file and place that in the training manifest file. This means the file is directly referenced with the included extensions as well.

Copy link
Copy Markdown
Contributor

@subdavis subdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the conventional wisdom I'm aware of is telling me that judging files by their extension is a bad idea. We've done a large amount of work to make sure whatever garbage files a user might upload get forced through our sieve and come out sanitized, safe video, and this is a step in the wrong direction.

Proposal

If we want to move to using the raw uploaded video, we should go through the process of having ffprobe verify the file (we already do this), tag that file, and then use it.

Then, the logic could look like this:

if tagged safe original video exists,
  send it to training,
else if tagged safe transcoded video exists:
  send it to training
else 
  throw an error about no safe videos.

for (key, value) in metadata.items():
metadata_list.append(f" {key}: {value}")
metadata_str = ",".join(metadata_list)
writer.writerow([f'# metadata -{metadata_str}'])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary branching/default. Could we just make metadata a required positional arg and remove the if?

Comment thread server/viame_server/serializers/viame.py Outdated
Comment thread server/viame_tasks/utils.py Outdated
return groundtruth


def get_source_video(input_path: Path, folderId: str, girder_client):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is kinda dangerous and should be documented

Comment thread server/viame_tasks/utils.py Outdated
from girder_worker.task import Task

# TODO: Move viame_server.constants into a shared area like viame_utils to share constants
validVideoFormats = [".mp4", ".avi", ".mov", ".mpg"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well make it a set instead of a list so that when the refactor happens, it's a drop-in replacement.

Comment thread server/viame_tasks/utils.py Outdated
extension = os.path.splitext(file_name)[1].lower()
if item.get("meta", {}).get("codec") is None and extension in validVideoFormats:
return file_name
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the strategy of "choose the thing that isn't tagged but has the right file extension".

It's fragile. I think the transcoded file should be the thing we actually transmit, not the original source.

If a user uploads a txt file named .mp4, this will explode. However, if they used the tagged transcoded video, it's basically bulletproof because transcoding will only succeed on a well-formed video.

Transcoding is basically a litmus test for valid video before training runs, so I think we should use the "safer" version.

We can slightly relax our compression on transcoding if that would help. I really don't think this is a good idea.

@BryonLewis
Copy link
Copy Markdown
Collaborator Author

So I removed the extension checking and now it checks for:

  • meta: {source_video: true}
  • Then meta: {codec: 'h264'}
  • Then it will raise an exception if no video is found

Added the itemId to the convert_video, it was just the easiest way to add the metadata for {source_video:True} if it properly converts the video instead of iterating over the folder and finding the item by name/path.

Updated the get_source_video_filename function to specifically just get the filename of the source video or fall back to the transcoded version if the source one can't be found. Leave it up to where it is used to raise an error and handle the filepath joining.

My Testing process:

  • Importing new videos and making sure the metadata was tagged properly.
  • Running pipeline on old video and on new video. Checking the job output to make sure it referenced the proper file.
  • Running training on both old and newly uploaded videos. Again checked the job output to make sure each folder was using the proper video (transcoded for old video, source video for new ones)
  • Deleted the source_video and codec metadata and attempted to run pipelines and training. Error was properly raised in both instances
  • Also double checked the metadata output when exporting detections and that seemed to reflect the fps that was set properly. The updated one will include a # metadata - line for non videos but I don't think that should be a problem.

Only issue I see is if there is a super old video which doesn't have the {codec: 'h264'} tag on it but I think that will be rare.

@BryonLewis BryonLewis requested a review from subdavis December 23, 2020 18:57
Copy link
Copy Markdown
Contributor

@subdavis subdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it would be prudent to go ahead and tag every video we've verified with codec.

Now, codec can just plainly mean codec, instead of carrying the extra implied "BTW this is the trancoded video".

Note that you also have to explicitly query for source_video does not exist in _get_clip_meta

item = Item().findOne(
            {
                'folderId': folder['_id'],
                'meta.codec': 'h264',
                'meta.source_video': { '$exists': False },
            }
        )

Comment thread server/viame_tasks/tasks.py Outdated
manager.updateStatus(JobStatus.PUSHING_OUTPUT)
new_file = gc.uploadFileToFolder(folderId, output_path)
gc.addMetadataToItem(new_file['itemId'], {"codec": "h264"})
gc.addMetadataToItem(itemId, {"source_video": True})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
gc.addMetadataToItem(itemId, {"source_video": True})
gc.addMetadataToItem(
itemId,
{
"source_video": True,
"codec": videostream[0]["codec_name"],
},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing this myself now, but I have to download the newer gpu algorithms.

@BryonLewis BryonLewis marked this pull request as ready for review December 23, 2020 23:57
@subdavis subdavis self-requested a review December 24, 2020 01:23
@BryonLewis BryonLewis merged commit 0b78183 into master Dec 24, 2020
@BryonLewis BryonLewis deleted the server/video-pipeline-fix branch December 24, 2020 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Issues with pipeline runs (video) [FEATURE] Allow training on videos

2 participants