
Near-constant download of uploaded stories content apparent in S3/AWS logs #30

Closed
mradamcox opened this issue May 30, 2023 · 3 comments

@mradamcox
Collaborator

We have had a jump in our AWS bill over the last week, due to an unanticipated uptick in transfer-out expenses, well beyond the free tier and well beyond normal usage of the site. I've enabled logging in AWS where I can and, I believe, have tracked this activity to node fetches originating from GitHub. The timing seems to match up with when we uploaded a large number of files through the admin-upload process after a tabling event.

@mradamcox
Collaborator Author

mradamcox commented Jun 2, 2023

As far as I can tell, the Repair AV workflow was causing this data transfer. My understanding of this problem is as follows:

  1. We had a lot of videos (24) in the queue from the admin-upload process, and the workflow doesn't commit the IDs of completed videos if it is unexpectedly killed midway through.
  2. We had some videos over 1 GB in size, and it was while transcoding one of these that the workflow process was killed every time, presumably due to the file size.
  3. Because the workflow was scheduled hourly, a new run would start (often before the previous run had even failed) and begin working on the exact same videos, until it too failed on the very large files.

For now, I have gone through the 24 un-transcoded videos in S3 and put the IDs of those that are >= 1 GB in size into a new txt file that the repair AV script will reference to skip processing those videos. Ultimately, we'll need to compress and re-upload those particular videos.
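For reference, the skip check could look something like this, assuming the repair script is Python and the txt file holds one video ID per line (the filename and function names here are illustrative, not the actual ones in the repo):

```python
# Minimal sketch of consuming a skip-list file; names are placeholders.
from pathlib import Path

SKIP_FILE = Path("transcode_skip_ids.txt")  # assumed: one video ID per line


def load_skip_ids(path: Path = SKIP_FILE) -> set[str]:
    """Return the set of video IDs that are too large (>= 1 GB) to transcode."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def videos_to_process(queued_ids: list[str]) -> list[str]:
    """Filter the queued videos down to those not listed in the skip file."""
    skip_ids = load_skip_ids()
    return [vid for vid in queued_ids if vid not in skip_ids]
```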

@mradamcox
Collaborator Author

@mukeshchugani10 One change to the code that would help address issues like this in the future would be a modification of the workflow script so that each time a video is processed successfully, the updated list of video IDs is committed to the repo. As far as I can tell, this commit currently happens only after all of the videos have been processed, which is ultimately what caused this particular issue.
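Roughly what I have in mind, assuming a Python workflow script with git push credentials already configured on the runner (the file name and the transcode stub below are placeholders, not the real ones):

```python
# Sketch of committing the completed-IDs file after every video, so a killed
# run never loses track of work already done. Names are illustrative.
import subprocess

COMPLETED_IDS_FILE = "transcoded_ids.txt"  # assumed path of the committed list


def transcode(video_id: str) -> None:
    """Placeholder for the real per-video transcoding step."""
    ...


def record_and_commit(video_id: str) -> None:
    """Append a finished video ID to the list and push the commit immediately."""
    with open(COMPLETED_IDS_FILE, "a") as f:
        f.write(video_id + "\n")
    subprocess.run(["git", "add", COMPLETED_IDS_FILE], check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Mark video {video_id} as transcoded"],
        check=True,
    )
    subprocess.run(["git", "push"], check=True)


def run_queue(queued_ids: list[str]) -> None:
    for video_id in queued_ids:
        transcode(video_id)
        record_and_commit(video_id)  # commit per video, not once at the end
```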

We can set up a different ticket/workflow for handling very large (>1 GB) video files, though arguably it would be better if files of this size were never uploaded to S3 at all and we do some preprocessing first. Will figure that out with the next batch of uploads.
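Not settled on an approach yet, but the preprocessing could be as simple as something like this before upload (the paths, CRF value, and 1 GB threshold here are placeholders):

```python
# Sketch: re-encode a large video with ffmpeg before it is ever uploaded to S3.
import subprocess
from pathlib import Path

ONE_GB = 1_073_741_824  # 1 GB in bytes


def compress_if_large(src: Path, dst_dir: Path) -> Path:
    """Return a path suitable for upload: the original if it is small enough,
    otherwise an H.264/AAC re-encode written into dst_dir."""
    if src.stat().st_size < ONE_GB:
        return src
    dst = dst_dir / f"{src.stem}_compressed.mp4"
    subprocess.run(
        [
            "ffmpeg", "-i", str(src),
            "-c:v", "libx264", "-crf", "28",  # higher CRF = smaller file
            "-c:a", "aac",
            str(dst),
        ],
        check=True,
    )
    return dst
```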

@mradamcox
Collaborator Author

Ultimately, I've addressed this by disabling the A/V repair workflow and handling transcoding locally instead.
