Include Notice Regarding IA Treatment Of Uploaded Video Content #101
Ah that's why 130,000 of my items disappeared |
It's not even showing in my favorites. I noticed there was an issue when half my favorites disappeared, then a real problem when I only had 39k items when before I had 160k+. So they're not going to reverse this? What the fuck, I can't find anything if it doesn't show up in search. Doesn't even show up in my uploads which is a real bitch. |
Just merge it but holy fuck this sucks. I can't find certain things I really need to find. I mean okay wall it off but don't remove it from my uploads or even being able to find it. |
Holding off on merging until IA decides what the policy will be. It's not set in stone yet. Will roll something with the ia python lib to handle library management if necessary. |
Wait a second, I was told to email staff if I had large channels (which I consider over a few hundred videos) before dumping. I dunno, I consider them making a collection to hold the items to be sanction. That's not them telling me no. |
I've been told we may need to bake You'd have to scrape Google with some creative search dorking to get most of the items URLs and even then you wouldn't get it all. |
Same problem here. Items are being hidden every few minutes. I wonder how far it will go, until item count = 0? I keep my own list of URLs (using xenu), but what's the point of it if missing stuff can't be found on search engines or on archive.org, which was the point of archiving in the first place... Might be a good time to implement some kind of CSV download-archive option with titles and tags for easier management, and publish it all somewhere on the web, maybe a custom search engine (NoIndex Archive.org Search Engine) where we could submit links (item ids) in batch. |
@rudolphos Please note that what follows is only wild speculation. I have no affiliation with the Internet Archive. The Internet Archive doesn't want to be a dumping ground of Youtube videos that are still available on Youtube. They don't want people abusing their systems. The policy they are experimenting with still stores the content for a later date (and one might assume they would "undark" content from search engine indexes if it was detected as no longer available on Youtube), and sanctioned archiving of video content will show up in search. I can't speak to "what the point of archiving" is, as everyone's reason for preservation is different; my preservation goals for content are on 100+ year timelines, and as long as the bits safely make it on to Internet Archive disks, I don't care if it's searchable/dark/etc. If your use case requires the video content be searchable immediately, you might need to consider a different platform or storage system going forward (depending on how the current policy experiment shakes out). |
Brandon bake |
@rudolphos why CSV? You can serially download items. As for ytpmv-mad I'm not sure what I'm going to do. I may end up giving it over to someone else to write to. |
@vxbinaca no, I meant some kind of replacement for the --use-download-archive file, so it's easier to manage and search the list of archived items. Right now it just records the service and id; something like service; id; creator; video title; tags would be easier to import into spreadsheet software and at least keep track of what's being archived. Maybe someone else would find this useful. I think the archive file is a youtube-dl built-in feature. But tubeup is already parsing the same metadata (title, tags, creator) for the ia uploader, so it shouldn't be too difficult to record it in some additional file while at it. |
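A minimal sketch of the CSV log suggested above, in Python. It assumes the caller already has the metadata dictionary that youtube-dl produces for a video (with keys such as `extractor`, `id`, `uploader`, `title`, and `tags`). The function name `append_archive_row` and the column layout are hypothetical and are not part of tubeup.

```python
import csv
import os

# Column order for the log; chosen for illustration, not a tubeup convention.
FIELDS = ["extractor", "id", "uploader", "title", "tags"]


def append_archive_row(csv_path, info):
    """Append one archived video to a CSV log, writing a header for a new file."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    row = {
        "extractor": info.get("extractor", ""),
        "id": info.get("id", ""),
        "uploader": info.get("uploader", ""),
        "title": info.get("title", ""),
        # Tags arrive as a list (or None); join them so they fit in one cell.
        "tags": ";".join(info.get("tags") or []),
    }
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Using the `csv` module rather than joining fields by hand means titles containing commas or quotes are escaped correctly, so the file opens cleanly in spreadsheet software.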
Okay, the policy was lifted at some point today. I got an ecstatic DM on Twitter. @brandongalbraith any clue what happened? |
Internet Archive staff prioritized a feature enhancement so that noindex
items are shown in the uploader’s “upload” tab.
|
@rudolphos I need you and your friends in your chatroom to stop the mass dumping of channels. Only dump channels that are actually in real danger of loss with no backup. We're all suffering right now in part because of your group, with which I am not and never will be affiliated. It's other people too but you're here so you're getting my grief. |
@vxbinaca I am archiving individually and I'm not in any kind of chatrooms. I don't know where you get that idea. Also I'm not dumping any channels. ~50% of the content I have archived over the past 4 years is gone from youtube already, I don't backup useless crap, but only content that is at risk of deletion. I also don't have any kind of automation like you do, I don't use cron and such things. Everything I archive is done manually by hand and that is maybe around >100 videos per week. |
What is the current status of IA's position on videos and uploading video content? |
Don't rip unless it's going to disappear AND it will exist nowhere else. And no, PewDiePie threatening to delete his channel isn't an emergency and is 100 percent bullshit every time. Essentially, do not use this tool. You probably lack good judgment which is why the policy was changed, because a bunch of people from Reddit - FUCK REDDIT BTW - had terrible judgment and zero consideration. |
Also, this abuse and its consequences weren't the reason I stopped dumps and maintaining this tool, but they were the last straw. Do not use this tool. Period. |
So, is the IA policy of not including videos in search results still in effect? You mentioned it might have changed earlier in the thread. |
The policy you mentioned is in effect, and is likely to continue to avoid
abuse.
|
The fact of the matter is, the infinite stream of video content on Youtube cannot fit into the limited budget of the Internet Archive. The people who demand eternal archival of their content simply are not the ones shouldering the eternal costs. That said, the fragility of Too Big To Fail content platforms is soon to rear its ugly head for various reasons, which makes your scrambling and nervousness understandable. However, the fact is those who pay the bills make the rules. Please inform your users that work desperately needs to be done to attach tubeup to python libraries providing alternative platforms (perhaps using rclone?) that offer infinite or otherwise cheap cloud storage, such as Google Drive for Enterprise (the best fit at the moment, oversubscribed by businesses whose employees and students only have duration limited access and shy away from posting content when the rest of the business can see it), Backblaze, the Cloud Tape Storage systems like Amazon Glacier and Google Nearline, and other cloud backup mechanisms. Our organization believes that from our experiments, the only effective way to archive an infinite internet at minimum costs (LTO4 is $15 an 800GB tape, LTO5 is $15-20 a 1.5TB tape, LTO6 is $20-30 for a 2.5TB tape) is through LTO Tape Storage, as cloud datacenters themselves use it for backups. LTO Tape provides an offline archival medium that is shelf stable, unlike hard drives that are spinning disks of doom, and significantly cheaper than flash storage. Unfortunately, we only have patchy guides representing our limited knowledge in understanding how to utilize tape, but in summary https://www.ibm.com/us-en/marketplace/ibm-lto-ultrium-6-data-cartridge https://wiki.bibanon.org/LTO_Tape We would appreciate an archival community formed to further research and reduce the significant hurdles involved in utilizing LTO Tape storage, and I personally will assist any efforts in setting up tubeup to utilize such methods. |
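The per-tape prices quoted above can be turned into a cost per terabyte. This is a rough Python sketch that uses only the figures from the comment. The names `TAPES`, `cost_per_tb`, and `tapes_needed` are made up for illustration, and the cost of tape drives, shipping, and labor is not included.

```python
import math

# Capacity (TB) and price range (USD, low and high) per tape, as quoted above.
TAPES = {
    "LTO4": {"capacity_tb": 0.8, "price_usd": (15, 15)},
    "LTO5": {"capacity_tb": 1.5, "price_usd": (15, 20)},
    "LTO6": {"capacity_tb": 2.5, "price_usd": (20, 30)},
}


def cost_per_tb(tape):
    """Return the (low, high) media cost in USD per terabyte for one tape type."""
    low, high = tape["price_usd"]
    return (low / tape["capacity_tb"], high / tape["capacity_tb"])


def tapes_needed(total_tb, capacity_tb):
    """Number of whole tapes required to hold total_tb of data."""
    return math.ceil(total_tb / capacity_tb)


if __name__ == "__main__":
    for name, tape in TAPES.items():
        low, high = cost_per_tb(tape)
        count = tapes_needed(100, tape["capacity_tb"])
        print(f"{name}: ${low:.2f}-${high:.2f} per TB, {count} tapes for 100 TB")
```

On these figures the media cost works out to $18.75 per TB for LTO4, roughly $10 to $13.33 per TB for LTO5, and $8 to $12 per TB for LTO6.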
@brandongalbraith I am not experiencing this policy with videos I have uploaded, including videos I uploaded after you made this PR. The videos I've uploaded can still be found by searching. @antonizoon Thanks for the in depth reply. I'll look into the codebase, maybe I'll be able to add other backends. Also, just to be clear, I'm not representing any users, I'm just personally interested. |
It's not the resources, and I can't say why I know this. It's dipshits dumping Linus Tech Tips videos, for example (Linus has his content backed up on his own hard drives, and if he had a failure too OH WELL). People were dumping large well-known YTers and there were DMCAs being filed all the time. It was becoming a hassle for staff. So, abuse. People not thinking things through clearly, usually from Reddit, no perspective or thought, just dump it and cry when 1 video someone else uploaded fills your disk with a failed channel dump 20k videos big: 'oh no you made me eat up all my metered European traffic and disc space on my windows gaming machine hooked into my personal gaming computer'. You don't need Tubeup to do alternate sites. Use youtube-dl & rclone. Here's the recommended command:
May be wrong on the subs flags, play with it and see. |
Yep, I understand about wasting their space. I'm just confused about the policy now, because it doesn't seem to be in effect for videos I've uploaded. |
The Internet Archive hasn't provided a written policy for me to document in the README (cc @jjjake), only discussions that previously took place. Closing for now. |
Added to README:
Currently, Internet Archive policy is that video content uploaded outside of sanctioned archival operations will not be included in search results (but is still accessible directly with the item's URL). This is subject to change at any time, and is entirely at the discretion of the Internet Archive.
Merge pending final decision from IA staff.
cc @vxbinaca