[WIP] Feature: Libretime automatic import from filesystem directory #514

Open · wants to merge 91 commits into base: master

Conversation

@Robbt (Member) commented Sep 9, 2018

I still have a fair amount of cleanup to do on this, but as an alternative implementation of a fix for #70, this adds a Python process that monitors a folder in the filesystem (currently /srv/airtime/stor/uploads) and imports any file copied or moved into it. It also deletes the file immediately after import.

Here is some of my thinking behind it:
Libretime_import

To operate it needs a directory to watch.
It also needs the LibreTime host, port, base dir, and API key, or it could be done with user authentication for a remote-user setup.

The import folder location should be configurable during the LibreTime install.

Ideally it wouldn't import duplicate files (this would need a way to look up an existing file in the database via track title and/or MD5 hash) or files that are still being written (which could be detected with lsof), using pyinotify or watchdog.
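For the duplicate-check idea, the hashing half is straightforward; a minimal sketch, computing the MD5 in chunks so large audio files aren't read into memory at once (the database lookup it would feed is LibreTime-specific and not shown):

```python
import hashlib

def file_md5(path, chunk_size=8192):
    """Return the hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```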

In theory it would be cool if it could be run remotely or even cross-platform.

Minimal code needed:
pull config vars from /etc/airtime/airtime.conf
pull import_dir from the database, a parameter, or airtime.conf
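A minimal sketch of that config lookup, assuming airtime.conf is standard INI and that the key names under [general] (api_key, base_url, base_port, import_dir) are roughly what the installer writes; treat them as placeholders:

```python
# Sketch only: pull the importer's settings from /etc/airtime/airtime.conf.
# The [general] section and key names are assumptions, not a confirmed schema.
import configparser

def load_import_config(path="/etc/airtime/airtime.conf"):
    config = configparser.ConfigParser()
    config.read(path)
    general = config["general"]
    return {
        "api_key": general.get("api_key"),
        "base_url": general.get("base_url", "localhost"),
        "base_port": general.getint("base_port", fallback=80),
        # import_dir could instead come from the database or a CLI parameter
        "import_dir": general.get("import_dir", "/srv/airtime/stor/uploads/"),
    }
```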

ways it could run:
Scan every file in the directory and its subdirectories, import them via a REST call, and then either leave them intact, move them to a new directory, or delete them.

Watch for new files only, import them, and then move them, delete them, or keep them intact.

Issues with keeping them in place: we don't want to re-import them again in the future (this needs duplicate checking, which will consume resources).

Issues with moving them: we need a destination folder that isn't scanned, it takes up double the space for every file imported, and it is of questionable value (it only demonstrates the non-destructive status of the import).

Issues with deleting them: if someone chose the wrong directory, i.e. a working and organized music directory, it could destroy the organization of their files. Perhaps not giving them the option and just making it work with /stor/import by default would be a good safeguard; a setting to change this, with warnings, could be available under the admin settings page.

The copy-based import makes sense as a one-off script that could be run during the install or triggered via a request from the UI. (I'm not sure of the best approach here, as we want to prevent user mistakes with duplicate files.)

The simplest approach for now is to set up a watched folder under /srv/stor/uploads, ignore existing files, watch for new files copied there, send them to the import process via the UI, and delete them from this directory.
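Roughly what that watch-and-import loop looks like with pyinotify; a sketch only, where UPLOAD_URL and the (API_KEY, "") auth tuple are illustrative placeholders rather than LibreTime's confirmed REST API:

```python
import os
import pyinotify
import requests

UPLOAD_URL = "http://localhost/rest/media"   # placeholder endpoint
API_KEY = "changeme"                          # placeholder credential

class ImportHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires once a copied file has been fully written and closed.
        path = event.pathname
        with open(path, "rb") as f:
            resp = requests.post(UPLOAD_URL, auth=(API_KEY, ""), files={"file": f})
        if resp.ok:
            os.remove(path)   # destructive by design: the upload dir is a drop box

wm = pyinotify.WatchManager()
wm.add_watch("/srv/airtime/stor/uploads", pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
pyinotify.Notifier(wm, ImportHandler()).loop()
```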

hairmare and others added some commits Apr 1, 2017

Report on airtime-celery service in status
Adds celery checks to /?config and /systemstatus for completeness' sake.

Fixes #138
Always return proper file size
Legacy upstream had a weird way of doing this that could lead to empty files being requested with a range header that did not match what was available on the server. I'm assuming this code used to work around some media-monitor feature that has since been refactored into analyzer, which handles these cases properly.
@Robbt (Member Author) commented Sep 9, 2018

OK, so for anyone checking this out: at the moment it works with the folder /srv/airtime/stor/uploads, which is created upon installation, and it automatically imports and then deletes any audio files and folders placed inside it.

I still need to get it working as a daemon with upstart, init.d and systemd. I copied and modified the code from airtime_analyzer's install, but it doesn't appear to start when I try to start it. I might get around to that later today, as getting it to run as a standalone service is my goal.

It also ignores Ctrl-C and needs to be killed via killall libretime_import, which is probably an annoying result of using libretime_analyzer as a model.

But if anyone wants to try this, they can install it via python setup.py install from inside the directory and then run it with sudo -u www-data libretime_import. They will also need to add import_dir = /srv/airtime/stor/uploads/ to /etc/airtime/airtime.conf under the [general] header and create that directory owned by www-data, and it should work.

@Robbt (Member Author) commented Sep 9, 2018

OK, I did a little more work on this and I think it should be ready to go in terms of installation etc.
I could add an option to configure the import directory to the media-directory portion of the setup script, but I managed to get it working as a standalone service and it is ready for testing.

ned-kelly and others added some commits Sep 16, 2018

Merge pull request #527 from frecuencialibre/master
Docs: recommend installing on ubuntu 14 or 16
Merge pull request #529 from frecuencialibre/dragonslaying
docs: dragon slaying round 1!

@Robbt changed the title from “[WIP] Feature: Libretime automatic import from filesystem directory” to “Feature: Libretime automatic import from filesystem directory” on Nov 29, 2018

@Robbt (Member Author) commented Nov 29, 2018

OK, I realized it still had some potential rough edges: empty folders were being left behind after import, so I added some code to delete them and also remove the watch, so that we don't exceed the inotify watch limit with really big imports.

It could use some more testing but I think it is operational. I tested it under docker and was able to get it working.
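The cleanup amounts to something like the following sketch, where wm is the pyinotify WatchManager from the watcher sketch above and the function name is just illustrative:

```python
import os

def cleanup_dir(wm, dirpath):
    """Once the last file in dirpath has been imported, drop its inotify watch
    and remove the now-empty directory so big imports don't pile up watches."""
    if not os.listdir(dirpath):
        wd = wm.get_wd(dirpath)   # look up the watch descriptor for this path
        if wd is not None:
            wm.rm_watch(wd)
        os.rmdir(dirpath)
```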

@hairmare (Member) commented Nov 29, 2018

IMO running the service as root and chmodding files 777 isn't an option. I don't see any reason the service can't run as a non-root user. In fact, my production setup doesn't use www-data for any services other than Apache. Things like the analyzer et al. have their own libretime-analyzer users. This should also be doable for the importer (run it as www-data on ./install based systems and as a bespoke user if you want to configure another user).

I'll give this a try locally (in a Vagrant box) since it's a very important feature. Maybe I can figure out how to run it as a non-root user and help update the docs so users know how to configure this securely.

@Robbt (Member Author) commented Nov 29, 2018

Yeah, the problem with running it as www-data was that it couldn't read directories even if they were owned by www-data and given 777 permissions.

From a usability perspective it would be good if it could always read any folder or file copied into it, and that can only happen with it running as root. If we run it as www-data then non-root users won't be able to copy folders into the directory unless they set them to 777 ahead of time. We could do a permission check, see whether we can read the file, and give an error if we can't. But if the idea is a filesystem-based import directory that automatically ingests anything put into it, I don't see a way to avoid requiring elevated permissions on the part of either the user or the service. As far as I know about Linux filesystem permissions, there is no way to say that anything copied into a directory will be made writable by a specific user. Those are the reasons I think that, as designed, it needs to run as root. If we wanted to integrate a special uploads user account that people could connect to via SSH, that might be another option, and we could run the service as that special user.
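The permission check mentioned above could be as simple as this sketch: log an error and skip the file rather than failing silently when the service user can't read it:

```python
import logging
import os

def can_import(path):
    """Return True if the service user can actually read the file; otherwise
    log a permissions error so the uploader knows why nothing was imported."""
    if os.access(path, os.R_OK):
        return True
    logging.error("Skipping %s: not readable by the import service user "
                  "(fix ownership or permissions and copy it again)", path)
    return False
```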

I do understand that running services as root is something that should generally be avoided.

I guess integrating FTP might actually be a good idea too. If we're honest about this, most users won't be comfortable using the CLI to copy files via SSH, so this is probably going to be used primarily by an admin, and it might even make more sense as a one-off import script à la airtime-import, rather than giving it an interface for non-technical users.

@ned-kelly (Member) commented Nov 30, 2018

i'm not sure why the service didn't start immediately after the install. long-running services such as playout are managed in @ned-kelly 's multicontainer instance using Supervisor. there's probably work that should be done in the docker install to also include this new libretime_import service, but i unfortunately won't have time to get into that until next month when i'm planning on trying again to get the multicontainer repo under the LibreTime user, and benefiting from improvements we're making upstream in this main LT repo. nonetheless hope this helps a little!

Yep, this just needs to be added into Supervisor on the Docker build. I've done some work on the Docker build this week (such as building in Silan 0.4.0, Liquidsoap, etc). I might merge this feature into a branch of the Docker build for anyone wanting to test, until the permissions issue that hairmare pointed out is addressed and this becomes one of the main services.

@frecuencialibre (Contributor) commented Nov 30, 2018

for anyone wanting to test

I am. Adding it to Supervisor seemed quite simple: frecuencialibre/docker-multicontainer-libretime@ef46577

@Robbt changed the title from “Feature: Libretime automatic import from filesystem directory” to “[WIP] Feature: Libretime automatic import from filesystem directory” on Nov 30, 2018

@Robbt (Member Author) commented Nov 30, 2018

So as per my comment yesterday I'm going to recommend we hold off on introducing this as a new service into LibreTime until the use case can be better clarified and the service refined to meet that.

It might also make more sense to simply roll the Python code I wrote into airtime_analyzer, to avoid creating a whole new service if it isn't necessary, and then make libretime-import the equivalent of airtime-import: a CLI tool for importing files and folders.

I can see this service being useful if it were tied to a Docker watched folder and automatically imported any files copied into it, but I'm not sure whether this would work in terms of propagating the inotify events from the host to the Docker instance. It won't work with a Windows host, but that is a whole separate theoretical issue.

It would probably be better than nothing if we can get it working without root, and perhaps we could make it work with a user that can SFTP into the uploads directory, so an FTP client could also upload folders for automatic import. This would most likely be more useful than requiring the end user to have elevated privileges to sudo cp files /srv/airtime/stor/uploads. The other idea of setting up a virtual FTP that automatically imports to a designated user's files could also make sense, but it would require supporting an FTP server that authenticates against libretime_auth and modifying the code to set the owner based upon who "logged in".

@frecuencialibre (Contributor) commented Nov 30, 2018

My vote is for this to be a one-off command rather than a long-running service. Why?

  • a command could be run as root, getting us past permissions problems, which can be very frustrating for folks without Linux experience
  • success/failure feedback could be shown to the user immediately, right in their terminal; no more digging through multiple log files
  • I agree that this is a function that will be used exclusively by admins; normal users can use the web UI
  • I recently lost time because pyinotify wasn't being triggered by rsync, which I only discovered much later by accident
  • this might get us past the 8192 file limit ??? (not sure)
@frecuencialibre (Contributor) commented Nov 30, 2018

It would be great to move files that fail to import to another folder, ideally preserving the folder hierarchy.

@Robbt (Member Author) commented Nov 30, 2018

So, helping @frecuencialibre troubleshoot this, we found out that it fails for files that contain UTF-8 or non-ASCII characters in the filename. I found it erroring out on this change from #3 Zend_File_Transfer and printing 'invalid file'.

I'm not sure what the fix or underlying issue is; perhaps @hairmare has more insight, since the code says this error shouldn't happen.

@Robbt (Member Author) commented Dec 1, 2018

OK, so I fixed this by using urllib.pathname2url to wrap the pathname. Hopefully it'll also solve the issue @frecuencialibre had.
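For reference, the wrapping in question is roughly the sketch below (urllib.pathname2url in Python 2; the same helper lives at urllib.request.pathname2url in Python 3). As the next comment notes, it percent-encodes the name rather than preserving the UTF-8:

```python
# -*- coding: utf-8 -*-
# Python 2 sketch of the attempted fix: percent-encode the path before
# handing it to the upload request. The filename is just an example.
import urllib

path = "/srv/airtime/stor/uploads/música.mp3"
encoded = urllib.pathname2url(path)
# encoded == '/srv/airtime/stor/uploads/m%C3%BAsica.mp3'
```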

@Robbt (Member Author) commented Dec 1, 2018

Quick update: while this last "fix" did move things further along, i.e. it bypassed the Zend_File_Transfer->isValid check, Mutagen borked when it received the filename, and in addition the filenames were URL-encoded and the original UTF-8 was lost. I spent a few hours trying to figure out a way to get PHP to reconvert or play nice with the URL-encoded filenames, but no luck so far.

@Robbt (Member Author) commented Dec 2, 2018

So I really went down the rabbit hole on this one to find a solution. Evidently a lot of people have issues with UTF-8-encoded filenames and Python requests/urllib3, because they follow an 18-year-old RFC rather than a newer spec.

So I converted libretime_import to Python 3. It was relatively easy; it did require one minor addition to airtime_analyzer to work with the way the filenames are encoded as UTF-8.

Anyway, I still haven't fully tested this, and it might require this PR or a derivative being merged into urllib3 to work out of the box.

@Robbt (Member Author) commented Dec 2, 2018

Yeah, so this will work out of the box for files with ASCII-only filenames, but UTF-8 filenames require the urllib3 library to be installed from my PR. I merged it into my master and hopefully I can help steward it into the main urllib3 library.

Anyway, anyone who wants to use it can git clone https://github.com/Robbt/urllib3.git into their python_apps folder, then run sudo python3 setup.py install from inside the urllib3 directory, and it'll install the newest version.

You will probably also need to rerun the setup for libretime_import with sudo python3 setup.py install --install-scripts=/usr/bin in the libretime_import directory.

But yeah, I might end up refactoring this and/or just using curl for a CLI-based app. I also still need to troubleshoot the permission issues.

@ned-kelly (Member) commented Dec 3, 2018

I can see this service being useful if it were tied to a Docker watched folder and automatically imported any files copied into it, but I'm not sure whether this would work in terms of propagating the inotify events from the host to the Docker instance. It won't work with a Windows host, but that is a whole separate theoretical issue.

Based on my experience, you're going to have issues for anyone running on Docker for Windows/Mac. Unfortunately inotify does not play that nicely when working from within a Docker container, and it tends to use a lot of resources on the host machine if you've got a reasonable folder size / file count.

It may be better to make a command-line import script. This is how [Funkwhale](https://funkwhale.audio/) (not a radio solution, but more a self-hosted Spotify solution) does it. They are also using a lot of Python code behind the scenes.

@Robbt (Member Author) commented Dec 4, 2018

Well, yeah, we are running into issues testing with @frecuencialibre, and I think he is using Linux, yet it seems to stop after a few hundred files are imported. My suspicion is that pyinotify can't keep up with the inotify events being sent and just seems to lose them. So this is evidently not the right way to do a bulk import of tracks; a command-line tool that uses curl probably makes more sense. pyinotify was used extensively in media-monitor, though, and perhaps could have been causing some of the instability issues that app experienced; it's been so long since we used it as a project that I don't know.

In general this would work as an alternative way to bulk-import a reasonable number of tracks, but not large libraries. So I think it might be worth going back to the work I did trying to get #111 working with our current HEAD. It just needs a fair amount of cleaning up, as it was more of a proof of concept than a finished module we'd want to ship. It built its own import stack that used parts of analyzer rather than modifying analyzer. It's kind of uncharted territory.
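A command-line bulk importer along those lines could skip inotify entirely and just walk the tree, which sidesteps both the lost-event problem and the watch limit. A sketch, with the same caveat as before that UPLOAD_URL and the auth tuple are placeholders and not LibreTime's confirmed API:

```python
# Sketch of a one-off CLI bulk import: walk a directory and POST each audio file.
import os
import sys
import requests

UPLOAD_URL = "http://localhost/rest/media"   # placeholder endpoint
API_KEY = "changeme"                          # placeholder credential
AUDIO_EXTS = {".mp3", ".ogg", ".flac", ".wav", ".m4a"}

def bulk_import(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in AUDIO_EXTS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                resp = requests.post(UPLOAD_URL, auth=(API_KEY, ""), files={"file": f})
            # Immediate terminal feedback instead of digging through log files.
            print(("OK   " if resp.ok else "FAIL ") + path)

if __name__ == "__main__":
    bulk_import(sys.argv[1] if len(sys.argv) > 1 else ".")
```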

@frecuencialibre (Contributor) commented Dec 4, 2018

I think he is using Linux

Yes. The host is running Debian 9, with the multicontainer Docker build on Ubuntu 16.

stop after a few hundred files are imported

Yup. Luckily the files that were not imported remained in /srv/airtime/stor/uploads/, so I was able to get past this short-term pinch by looping a workaround dozens of times:

rsync -av /srv/airtime/stor/uploads/ /external-media/round2 --delete
rm -Rf /srv/airtime/stor/uploads/* && cp -R /external-media/round2/* /srv/airtime/stor/uploads

More notes for the record here:

  • files with capitalized file extensions (e.g. .MP3) didn't import
  • in order to use Robbt's upstream fix to urllib3 we had to run the commands below. Note that the /opt... paths work for ned's multicontainer build, but will probably need changing if used elsewhere. Also, the killall command at the end stops the process managed by Supervisor, which then restarts it automatically; if you're not using Supervisor you'll need to restart it manually.
cd /opt/libretime/python_apps/ && git clone https://github.com/Robbt/urllib3.git
pip uninstall urllib3
pip install -e /opt/libretime/python_apps/urllib3/
cd /opt/libretime/python_apps/urllib3/ && python3 setup.py install
rm -R /usr/local/lib/python3.5/dist-packages/urllib3-1.24.1-py3.5.egg/
killall libretime_import
@hairmare (Member) commented Dec 4, 2018

A command line tool that uses curl probably makes more sense.

So... this? It doesn't do any inotify things but AFAICT you can run it on individual files and they get uploaded... Maybe something like this:

find /path/to/files -name '*.mp3' -exec python_apps/airtime_analyzer/tools/ftp-upload-hook.sh {} \;

It could even be possible to call this via systemd.path.

@Robbt (Member Author) commented Dec 4, 2018

As far as a command-line tool goes, I think it would be helpful to write something structured like airtime-import, which did more than just upload tracks: it included basic help etc. and also made it possible to manage watched folders.

As far as the extension goes, I had code checking whether a file was a .mp3 or a .MP3, but perhaps the uppercase branch wasn't matching for some reason.

It's hard to diagnose exactly what was failing based on the information, but I suspect that inotify is probably not the right solution for bulk-importing from the command line.

@Robbt (Member Author) commented Jan 18, 2019

So I'm not sure what is salvageable from this module. It appears there are some issues that would need to be resolved when "mass importing" tracks with it, and the issue with the requests library and UTF-8 would need to be fixed before this could be suggested as a filesystem-based mass import. As it stands right now, it might be useful connected to an FTP upload process as documented in http://libretime.org/manual/automated-file-import/

@Robbt (Member Author) commented Mar 26, 2019

So my PR was finally accepted by urllib3, which means this could actually work for UTF-8 filenames without requiring a forked copy of the upstream library.
