Bringing back the 'watch' function to libretime. #111
Conversation
It's a standalone program, working directly with the database. Depending on the file status, it will either:
- add a new record into cc_files,
- update some metadata in cc_files, or
- skip any analysis of the metadata.

So far there is no way to define a watch directory within LibreTime, so you have to create it yourself directly in the database. (see comment)
I can merge my UI work onto this so there is a way to edit it. After that we can add RabbitMQ and sysv/upstart/systemd to the mix :)
Restored the /preference/directory-config setting manager. Adding watched dirs seems to work, and they also show up on /systemstatus.
My commit a0ab8ce re-implements the UI for adding, changing and removing media folders.
I don't think there's a way to 👍 on commits so I'm commenting to say this looks very promising!
Hi Lucas,
Thank you...
So for me it's running. I've added a small cron job to 'wake' up the system.
Now, how do I get LibreTime to manage this 'scheduler' job?
I've looked a little bit closer at airtime_analyzer, and it looks like all the database work is done through the web server API.
So, I have to dig a little bit more into the system. ;-)
Hans-Joachim
Yeah, the individual Python processes don't usually have access to the database directly and go through the API. For some of them, api_clients has the code to do the actual requests. If it needs something like a cronjob to work, I can help package that for the installer (and make a systemd timer out of it for systemd systems).
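For illustration, a systemd timer pair for a periodic watcher run could look roughly like this. The unit names, paths, and interval are assumptions for the sketch, not what the installer actually ships:

```ini
# libretime-watch.service — hypothetical one-shot unit running the watcher script
[Unit]
Description=LibreTime watched-folder scan (sketch)

[Service]
Type=oneshot
ExecStart=/usr/bin/python /usr/lib/libretime/libretime_watch.py

# libretime-watch.timer — triggers the service shortly after boot, then every 5 minutes
[Unit]
Description=Periodic LibreTime watched-folder scan (sketch)

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

Enabled with `systemctl enable --now libretime-watch.timer`, this replaces the ad-hoc cron job mentioned earlier.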
@Robbt what magic API egg should be in use? python-magic or file-magic?
Cool kainz thanks for working on this.
Gotcha. I don't think it'd be super hard to adopt this and get something useful out, any recommendations on refactoring the analytics part of the ingestion to run through `airtime-analyzer`? I'd like to do that to cut out duplication there then let watch focus on being a file event tracker. Also, if I'm on a multiprocessor platform and want analyze to be faster, looking at it's current layout, I suppose the easiest option is to spawn a set of them, eh?
Any plans to change the processing model there?
In #169 we switched to using file-magic; this code was written before that, so it's still using python-magic. It basically worked for the person who wrote it, but I found it wasn't properly integrated with systemd, and there were issues like the ones you pointed out. So besides being rather hacky, it also didn't work.
I honestly haven't spent very much time analyzing the analyzer code, so I can't offer any recommendations without really spending some time trying to understand how it works. Like most things in this project, it was written before the fork and mostly works, so we have more or less left it alone. The only change people have mentioned is at some point porting the various Python apps to Python 3, as they're still running 2.7; but we have almost a year and a half until Python 2 is EOL, and considering the Zend MVC framework we are using is already EOL, figuring out a replacement for that is probably a higher (and also more time-consuming) priority. So yeah, you are welcome to come in and help push this PR through by refactoring and rewriting it to use the analyzer functions; that totally makes sense from my point of view. Let me know if you have any specific questions about any parts of the code. Although I didn't write it, another set of eyeballs can be helpful when trying to figure out how something is supposed to work.
Ok, so I decided to analyze the analyzer code, and I have a hunch that the approach taken here is probably unnecessary. We should be using airtime-analyzer; the way that airtime.pro did FTP imports was by sending a RabbitMQ message to airtime-analyzer. In theory this would be the easiest way to add watched folders, but I won't be sure until someone writes an implementation that does this. I've also noticed another issue that I'll report shortly: the airtime_analyzer code doesn't move files it fails to import anywhere. It just leaves them in the /store/organize folder. There is a TODO in the code. If we are going to be importing from a local filesystem using airtime_analyzer, then we will definitely need to remove the "problem_files" after an import has been attempted, or the folder will get messy.
Ok, I created issue #508 to track the issue with files that fail to import not being moved. I've also tried to look into how the analyzer imports files. From what I can tell, it basically just processes the files it finds, checking their metadata etc. and moving them (in the case of non-S3 filesystems) if they pass all of the checks; otherwise it kicks back an "import failed" message, and that is that. It doesn't actually create a file record in the database or anything along those lines. That all happens through a /rest/media PUT request, which in turn creates the file from the cc_file model, which in turn makes a call to MediaService.php, which in turn calls the analyzer; somewhere the return message from RabbitMQ is received, and the UI, the library, etc. are updated.

Also, based on the presence of ftp-upload-hook.sh in the airtime_analyzer/tools directory, and the fact that it basically does a curl PUT request of a local file to the REST endpoint, I suspect that if we are going to use the current methodology (vs. rewriting a parallel import service, like this PR does), we need to make a /rest/media call for each new file detected in a directory. My suspicion is that when airtime.pro added the FTP import process, they simply had a script that called tools/ftp-upload-hook.sh on every file in the FTP directory, but I could be wrong. It seems like the simplest way to do this, in terms of implementation, would be to mimic this approach and use the existing import code path as much as possible rather than reinventing the rather complicated wheel. I also think it makes more sense to do this with the existing Propel classes than to write SQL that directly modifies the database. I'll see if I can figure out a minimalist way of doing this.
So in testing, the ftp-upload-hook.sh script appears to be working in terms of uploading a file, but failing in that it receives a 400 error from the Zend REST controller. This results in it retrying the upload 5 times, and thus uploading the file 5 times every time the script is run. I think stepping through the code and determining why it returns a 400, as if it were failing, is the next step for deciding whether this is a good route to accomplish the goal.
So there is something amiss with the curl call: it sends the POST request, and then it evidently sends another request that is considered a bad request and triggers the error. So there likely isn't anything wrong with the Zend framework after all. My guess is that it may make sense to have a Python service do the POST request using requests (http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file) rather than relying on a bash script. This could be integrated into the UI, and with minimal tweaking to the CcFiles pathway it could possibly even be made to trigger imports in different directories to be owned by different LibreTime users, creating a solution for the request in #453.
Ok, so I've been researching this further. I think that building something from scratch that uses the existing import process probably makes the most sense, especially since @HaJoHe hasn't had the time to get this working, and given all the points about duplicating code, escaping database inserts, etc. My current thought is to write a service similar to airtime_analyzer that uses pyinotify (https://github.com/seb-m/pyinotify/wiki/Handling-Events) to watch a directory. For this, pyinotify's def process_IN_CLOSE(self, event): handler would seem to work, because it is triggered when a file is closed; although that can also happen when, say, an FTP connection closes due to a network disconnect, it should usually fire when a file has finished being copied. I don't think it will trigger for files that are already present in the directory, so we have to take that into consideration, as importing existing files is probably useful. Whether it will work for mounted remote filesystems is an open question as well.
Ok, so rather than starting from scratch, I'm seeing what modifications can be made to this code to get it working the way I think it should. Anyone wanting to try this code should realize that it basically depends on receiving a message from RabbitMQ via test_rabbit.py, which is under test/rest_rabbit.py; this seems like a counter-intuitive way to kick off the service. If it is running, it should just run periodically at a set interval that can be configured via a config file (or even the web UI). Next I'm going to modify the import process to actually send the file to the web app via a REST request rather than create a parallel import into the database: the import process set up by this module is currently failing for me, it seems like a bad idea to maintain a bunch of duplicate code, and as @kainz pointed out, the SQL code isn't escaped. And in my analysis, airtime_analyzer doesn't actually interface with the database; it just responds to import requests triggered by rest/media requests.
Ok, so the first part of my refactoring has been done on this branch: https://github.com/Robbt/libretime/tree/HaJoHe-libretime_watch
Just realized one thing. Importing the files that are in a watched folder is different from the premise of the process implemented here, which was simply adding files that are in a directory to LibreTime without moving them. Basically, it adds another directory where anything you place will be added to the LibreTime library while remaining at its original location. I am going to propose a rewrite that imports the files and then deletes them from the filesystem, so they reside in the LibreTime storage directory; but that is fundamentally different from integrating files that are in a directory into the LibreTime library in place.
I spent some more time playing around in my branch to see what the quickest way of making this work would be. I think the problem is due to how the mediaController parses the PUT response and calls CcFile::updateFromArray, which sets the file path relative to the storage directory. I could write some more code to create an alternative path for watched files here, but it all seems pretty fragile and hacky to me at this point. I think it probably just makes more sense for me to code an automatic directory-based file import vs. implement the watched folder functionality.
Ok, I took a stab at implementing the import folder functionality I described above in #514. It is currently in a working state but needs a little bit more work.
I have successfully implemented a working version of the watched folders functionality with Python's watchdog library. It works great. I'll tie everything up and push a commit so you can test the feature, and if you like it I'll send you a PR.
@xabispacebiker Were you able to push your final work? |