Include Notice Regarding IA Treatment Of Uploaded Video Content #101

Closed
wants to merge 2 commits into from

Conversation

brandongalbraith
Collaborator

@brandongalbraith brandongalbraith commented Jul 11, 2019

Added to README:

Currently, Internet Archive policy is that video content uploaded outside of sanctioned archival operations will not be included in search results (but is still accessible directly with the item's URL). This is subject to change at any time, and is entirely at the discretion of the Internet Archive.

Merge pending final decision from IA staff.

cc @vxbinaca

@vxbinaca
Collaborator

Ah that's why 130,000 of my items disappeared

@vxbinaca
Collaborator

It's not even showing in my favorites. I noticed there was an issue when half my favorites disappeared, then a real problem when I only had 39k items when before I had 160k+.

So they're not going to reverse this? What the fuck, I can't find anything if it doesn't show up in search. It doesn't even show up in my uploads, which is a real bitch.

@vxbinaca
Collaborator

Just merge it, but holy fuck this sucks. I can't find certain things I really need to find. I mean okay, wall it off, but don't remove it from my uploads or stop me from even being able to find it.

@brandongalbraith
Collaborator Author

brandongalbraith commented Jul 11, 2019

Holding off on merging until IA decides what the policy will be. It's not set in stone yet. Will roll something with the ia python lib to handle library management if necessary.

@brandongalbraith brandongalbraith changed the title from "Include Notice Regarding Treatment Of Uploaded Video Content" to "Include Notice Regarding IA Treatment Of Uploaded Video Content" Jul 11, 2019
@vxbinaca
Collaborator

Wait a second, I was told to email staff if I had large channels (which I consider over a few hundred videos) before dumping. I dunno, I consider them making a collection to hold the items to be sanction. That's not them telling me no.

@vxbinaca
Collaborator

I've been told we may need to bake noindex=true into the internetarchive upload options. This saves the video with no way to search for it. The policy has effectively darked 3/4 of my uploads. 4 years' work.

You'd have to scrape Google with some creative search dorking to get most of the items' URLs, and even then you wouldn't get them all.
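
A minimal sketch of what baking `noindex` into the upload metadata might look like with the `internetarchive` Python library. The helper names `build_metadata` and `upload_video` are hypothetical, and it is an assumption here that setting the `noindex` metadata field to `true` is what keeps an item out of search. This is not tubeup's actual code.

```python
def build_metadata(title, noindex=True):
    """Return item metadata for a video upload, optionally flagged noindex."""
    metadata = {"mediatype": "movies", "title": title}
    if noindex:
        # Assumption: archive.org leaves items with noindex=true out of search.
        metadata["noindex"] = "true"
    return metadata


def upload_video(identifier, path, title, noindex=True):
    """Upload one file to an item.

    Needs the third-party internetarchive package and configured credentials.
    """
    # Imported lazily so build_metadata can be used without the package.
    from internetarchive import upload

    return upload(identifier, files=[path], metadata=build_metadata(title, noindex))
```

Keeping the metadata builder separate from the upload call means the flag can be checked without touching the network.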

@rudolphos

rudolphos commented Jul 12, 2019

Same problem here. Items are being hidden every few minutes. I wonder how far it will go, until item count = 0?
Nothing is searchable anymore, even on archive.org; the site seems to be becoming useless.

I keep my own list of URLs (using xenu), but what's the point of it if missing stuff can't be found on search engines or on archive.org, which was the point of archiving in the first place...


Might be a good time to implement some kind of CSV download-archive option with titles and tags for easier management, and publish it all somewhere on the web, maybe in a custom search engine (NoIndex Archive.org Search Engine) where we could submit links (item ids) in batch.

@brandongalbraith
Collaborator Author

brandongalbraith commented Jul 12, 2019

@rudolphos Please note that what follows is only wild speculation. I have no affiliation with the Internet Archive.

The Internet Archive doesn't want to be a dumping ground of Youtube videos that are still available on Youtube. They don't want people abusing their systems. The policy they are experimenting with still stores the content for a later date (and one might assume they would "undark" content from search engine indexes if it was detected as no longer available on Youtube), and sanctioned archiving of video content will show up in search.

I can't speak to "what the point of archiving" is, as everyone's reason for preservation is different; my preservation goals for content are on 100+ year timelines, and as long as the bits safely make it onto Internet Archive disks, I don't care if it's searchable/dark/etc. If your use case requires the video content be searchable immediately, you might need to consider a different platform or storage system going forward (depending on how the current policy experiment shakes out).

@vxbinaca
Collaborator

Brandon bake noindex=true in and call it a day. After that she's all yours. I'm working on your pypi access right now.

@vxbinaca
Collaborator

@rudolphos why CSV? You can serially download items. As for ytpmv-mad I'm not sure what I'm going to do. I may end up giving it over to someone else to write to.

@rudolphos

rudolphos commented Jul 13, 2019

@vxbinaca no I meant some kind of replacement for --use-download-archive file, so it's easier to manage and search the list for archived items, right now it just records service-id.

service; id; creator; video title; tags

something like that, which would be easier to import in spreadsheet software and at least keep track of what's being archived, maybe someone else would find this useful.

I think this is a youtube-dl built-in feature. But tubeup is passing the same metadata (title, tags, creator) to the ia uploader, so it shouldn't be too difficult to record it in some additional file while at it.
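
A rough sketch of what such an archive file could look like, assuming a youtube-dl style info dict with the `extractor`, `id`, `uploader`, `title` and `tags` keys. The names `FIELDS`, `archive_row` and `append_to_archive` are hypothetical and are not part of tubeup or youtube-dl.

```python
import csv
import os

FIELDS = ["service", "id", "creator", "title", "tags"]


def archive_row(info):
    """Map a youtube-dl style info dict to one archive row."""
    return {
        "service": info.get("extractor", ""),
        "id": info.get("id", ""),
        "creator": info.get("uploader", ""),
        "title": info.get("title", ""),
        "tags": ",".join(info.get("tags") or []),
    }


def append_to_archive(path, info):
    """Append one row, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter=";")
        if is_new:
            writer.writeheader()
        writer.writerow(archive_row(info))
```

A semicolon delimiter matches the layout proposed above, and spreadsheet software can import it directly.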

@vxbinaca
Collaborator

Okay, the policy was lifted at some point today. I got an ecstatic DM on Twitter. @brandongalbraith any clue what happened?

@brandongalbraith
Collaborator Author

brandongalbraith commented Jul 14, 2019 via email

@vxbinaca
Collaborator

@rudolphos I need you and your friends in your chatroom to stop the mass dumping of channels. Only dump channels that are actually in real danger of loss with no backup. We're all suffering right now in part because of your group, with which I am not and never will be affiliated. It's other people too, but you're here so you're getting my grief.

@rudolphos

@vxbinaca I am archiving individually and I'm not in any kind of chatrooms. I don't know where you get that idea.

Also, I'm not dumping any channels. ~50% of the content I have archived over the past 4 years is gone from YouTube already. I don't back up useless crap, only content that is at risk of deletion.

I also don't have any kind of automation like you do, I don't use cron and such things. Everything I archive is done manually by hand and that is maybe around >100 videos per week.

@makew0rld

What is the current status of IA's position on videos and uploading video content?

@vxbinaca
Collaborator

Don't rip unless it's going to disappear AND it will exist nowhere else. And no, PewDiePie threatening to delete his channel isn't an emergency and is 100 percent bullshit every time.

Essentially, do not use this tool. You probably lack good judgment which is why the policy was changed, because a bunch of people from Reddit - FUCK REDDIT BTW - had terrible judgment and zero consideration.

@vxbinaca
Collaborator

Also, this abuse and its consequences weren't the reason I stopped dumps and maintaining this tool, but they were the last straw.

Do not use this tool. Period.

@makew0rld

So, is the IA policy of not including videos in search results still in effect? You mentioned it might have changed earlier in the thread.

@brandongalbraith
Collaborator Author

brandongalbraith commented Nov 24, 2019 via email

@antonizoon
Member

@makeworld-the-better-one

The fact of the matter is, the infinite stream of video content on Youtube cannot fit into the limited budget of the Internet Archive. The people who demand eternal archival of their content simply are not the ones shouldering the eternal costs.

That said, the fragility of Too Big To Fail content platforms is soon to rear its ugly head for various reasons, which makes your scrambling and nervousness understandable. However, the fact is those who pay the bills make the rules.

Please inform your users that work desperately needs to be done to connect tubeup to Python libraries for alternative platforms (perhaps using rclone?) that offer unlimited or otherwise cheap cloud storage, such as Google Drive for Enterprise (the best fit at the moment, oversubscribed by businesses whose employees and students only have duration-limited access and shy away from posting content when the rest of the business can see it), Backblaze, cloud tape storage systems like Amazon Glacier and Google Nearline, and other cloud backup mechanisms.

From our experiments, our organization believes that the only effective way to archive an infinite internet at minimum cost (LTO4 is $15 for an 800GB tape, LTO5 is $15-20 for a 1.5TB tape, LTO6 is $20-30 for a 2.5TB tape) is through LTO tape storage, as cloud datacenters themselves use it for backups. LTO tape provides an offline archival medium that is shelf stable, unlike hard drives that are spinning disks of doom, and significantly cheaper than flash storage. Unfortunately, we only have patchy guides representing our limited knowledge of how to utilize tape, but in summary, tar (tape archive format), ltfs (a conventional-looking filesystem, but don't use it for random access), or bareos (successor to bacula) can be used for data storage.

https://www.ibm.com/us-en/marketplace/ibm-lto-ultrium-6-data-cartridge

https://wiki.bibanon.org/LTO_Tape
https://wiki.bibanon.org/LTO_Tape/6
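
For comparison, the tape prices quoted above can be turned into a rough cost per terabyte. This is only back-of-the-envelope arithmetic on the figures in this comment, assuming the stated capacities are native (uncompressed) and ignoring the cost of drives; the names `TAPES` and `cost_per_tb` are made up for the example.

```python
# (low price USD, high price USD, capacity in TB), as quoted in the comment
TAPES = {
    "LTO4": (15, 15, 0.8),
    "LTO5": (15, 20, 1.5),
    "LTO6": (20, 30, 2.5),
}


def cost_per_tb(low, high, capacity_tb):
    """Return the (low, high) price per terabyte, rounded to cents."""
    return round(low / capacity_tb, 2), round(high / capacity_tb, 2)


for name, spec in TAPES.items():
    low, high = cost_per_tb(*spec)
    print(f"{name}: ${low:.2f} to ${high:.2f} per TB")
```

On these numbers the media cost falls from about $18.75 per TB for LTO4 to between $8 and $12 per TB for LTO6.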

We would appreciate an archival community formed to further research and reduce the significant hurdles involved in utilizing LTO Tape storage, and I personally will assist any efforts in setting up tubeup to utilize such methods.

@makew0rld

makew0rld commented Nov 25, 2019

@brandongalbraith I am not experiencing this policy with videos I have uploaded, including videos I uploaded after you made this PR. The videos I've uploaded can still be found by searching.

@antonizoon Thanks for the in-depth reply. I'll look into the codebase; maybe I'll be able to add other backends. Also, just to be clear, I'm not representing any users, I'm just personally interested.

@vxbinaca
Collaborator

It's not the resources, and I can't say why I know this. It's dipshits dumping Linus Tech Tips videos, for example (Linus has his content backed up on his own hard drives, and if he had a failure too, OH WELL). People were dumping large, well-known YTers and there were DMCAs being filed all the time. It was becoming a hassle for staff.

So, abuse. People not thinking things through clearly, usually from Reddit, with no perspective or thought: just dump it and cry when 1 video someone else uploaded fills your disk with a failed channel dump 20k videos big. 'Oh no, you made me eat up all my metered European traffic and disk space on my Windows gaming machine hooked into my personal gaming computer.'

You don't need Tubeup to do alternate sites. Use youtube-dl & rclone. Here's the recommended command:

youtube-dl --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --ignore-errors -f bestvideo+bestaudio --write-sub --all-subs

May be wrong on the subs flags, play with it and see.
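
A hedged sketch of chaining the two tools from Python: download with youtube-dl into a working directory, then copy that directory to an rclone remote. The function names, the example remote `remote:videos` and the working directory are all hypothetical, and the subtitle flags are reduced to `--all-subs` as an assumption, since they are flagged as uncertain above.

```python
import subprocess

YTDL_FLAGS = [
    "--continue", "--retries", "4", "--write-info-json", "--write-description",
    "--write-thumbnail", "--write-annotations", "--all-subs", "--ignore-errors",
    "-f", "bestvideo+bestaudio",
]


def build_commands(url, workdir, remote):
    """Return the youtube-dl and rclone argument lists for one URL."""
    output_template = f"{workdir}/%(title)s-%(id)s.%(ext)s"
    download = ["youtube-dl", *YTDL_FLAGS, "-o", output_template, url]
    sync = ["rclone", "copy", workdir, remote]
    return download, sync


def mirror(url, workdir, remote):
    """Run the download, then the copy; raises if either command fails."""
    for command in build_commands(url, workdir, remote):
        subprocess.run(command, check=True)
```

Calling `mirror(url, "/tmp/out", "remote:videos")` would need both tools installed and an rclone remote already configured.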

@makew0rld

Yep, I understand about wasting their space. I'm just confused about the policy now, because it doesn't seem to be in effect for videos I've uploaded.

@brandongalbraith
Collaborator Author

The Internet Archive hasn't provided a written policy for me to document in the README (cc @jjjake), only discussions that previously took place. Closing for now.

5 participants