
Bringing back the 'watch' function to libretime. #111

Open · wants to merge 23 commits into base: master · 8 participants
@HaJoHe (Contributor) commented Mar 21, 2017

Proof of concept for bringing back the 'watch' function to LibreTime.
It's a standalone program that works directly with the database.

Depending on the file status, it will either

- add a new record to cc_files,
- update some metadata in cc_files, or
- skip any analysis of the metadata.

So far there is no way to define a watch directory within LibreTime,
so you have to create it yourself directly in the database. (see comment)
@hairmare (Member) commented Mar 21, 2017

I can merge my UI work onto this so there is a way to edit.

After that we can add rabbitmq and sysv/upstart/systemd to the mix :)

Restore Media Folders UI
Restore the /preference/directory-config setting manager. Adding watched dirs seems to work, and they also show up on /systemstatus.
@hairmare (Member) commented Mar 21, 2017

My commit a0ab8ce re-implements the UI for adding, changing, and removing media folders.

@hairmare (Member) commented Mar 24, 2017

I don't think there's a way to 👍 on commits so I'm commenting to say this looks very promising!

@HaJoHe (Contributor, Author) commented Mar 25, 2017

@hairmare (Member) commented Mar 25, 2017

Yeah, the individual Python processes don't usually have access to the database directly; they go through the API. For some of them, api_clients has the code to do the actual requests.

If it needs something like a cronjob to work I can help package that for the installer (and make a systemd timer out of it for systemd systems).
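The systemd timer mentioned above could look roughly like the pair of units below. These are a sketch for illustration only: the unit names, paths, user, and intervals are all assumptions, not files shipped by this PR.

```ini
# /etc/systemd/system/libretime-watch.service  (hypothetical)
[Unit]
Description=Scan LibreTime watched folders once

[Service]
Type=oneshot
User=libretime
ExecStart=/usr/bin/libretime-watch

# /etc/systemd/system/libretime-watch.timer  (hypothetical)
[Unit]
Description=Periodically run the LibreTime watch scan

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

On non-systemd (sysv/upstart) systems the same schedule would come from a cron entry instead.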

@hairmare hairmare modified the milestone: 3.0.0-alpha.1 Mar 26, 2017

HaJoHe added some commits Mar 31, 2017

@hairmare hairmare modified the milestones: 3.0.0-alpha.1, 3.0.0-alpha.2 Apr 4, 2017

@hairmare hairmare changed the title Proof of concept for bringing back the 'watch' function to libretime. Bringing back the 'watch' function to libretime. Apr 6, 2017

@Robbt (Member) commented Dec 1, 2017

Ok, I managed to install it, but the service doesn't start: `Failed to start libretime_watch.service: Unit libretime_watch.service failed to load: No such file or directory.`

@Cysioland commented Apr 18, 2018

This one would be useful; we have loads of tracks we need to import, and the easiest way to do that is a plain ol' copy.

@Numerico commented Apr 30, 2018

Hi, I'm interested in bringing this feature back, and can probably help coding.
What would be necessary to complete this work?

@HaJoHe (Contributor, Author) commented May 1, 2018

@Robbt (Member) commented May 5, 2018

I believe there are a few small changes needed, such as python-magic vs. file-magic. Basically it needs to be tested so it works out of the box via the installer.

@kainz commented Aug 5, 2018

So I tried testing this with the pull rebased to master, and there are issues.

#1: routing key doesn't match what the PHP app sends/sets up, and this watcher does not gracefully fail on the other watch_* messages.

#2: the 'watcher' doesn't really match the analyzer's behavior well. It would probably be better to restructure the analyzer's FileMoveAnalyzer so it only optionally moves files, and to teach the FileUpload status API about watch-folder imports. Alternatively, maybe look at pulling those functions in from the analyzer as a library.

#3: In addition, the SQL and some of the manual metadata handling need considerable hardening. Parameter binding needs to be used instead of direct string building pretty much everywhere. I'll have a patch up in a sec that shows what I needed to get this running off master.

See my changes in kainz@94d8fa3 . Disregard the dummying out of silan and replaygain, my testing platform right now is a raspberry pi3 so those steps are a bit slow for me.
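The parameter-binding hardening @kainz asks for looks like the sketch below. It uses the stdlib sqlite3 driver purely for illustration (the real code targets PostgreSQL, where psycopg2 uses `%s` placeholders instead of `?`), and the table is a toy stand-in, not the real cc_files schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cc_files (filepath TEXT, track_title TEXT)")

# A title with a quote in it would break naive string concatenation
title = "It's a watched file"

# Unsafe (don't do this): "INSERT ... VALUES ('%s')" % title
# Safe: pass values separately and let the driver bind/escape them.
conn.execute(
    "INSERT INTO cc_files (filepath, track_title) VALUES (?, ?)",
    ("/srv/watch/song.mp3", title),
)

row = conn.execute(
    "SELECT track_title FROM cc_files WHERE filepath = ?",
    ("/srv/watch/song.mp3",),
).fetchone()
print(row[0])  # -> It's a watched file
```

The same pattern applies everywhere the watcher currently builds SQL strings by hand.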

@kainz commented Aug 5, 2018

@Robbt what magic api egg should be in use? python-magic or file-magic?

@Robbt (Member) commented Aug 6, 2018

Cool, thanks for working on this, kainz.
In #169 we switched to using file-magic; this code was written before that, so it's still using python-magic. It basically worked for the person who wrote it, but I found it wasn't properly integrated with systemd, and there were issues like the ones you pointed out. So besides being rather hacky, it also didn't work.

@kainz commented Aug 6, 2018

@Robbt (Member) commented Aug 6, 2018

I honestly haven't spent much time analyzing the analyzer code, so I can't offer any recommendations without really spending some time trying to understand how it works. Like most things in this project, it was written before the fork and mostly works, so we have more or less left it alone.

The only change people have mentioned is at some point porting the various Python apps to Python 3, as they're still running 2.7, but we have almost a year and a half until Python 2 is EOL. Considering the Zend MVC framework we are using is already EOL, figuring out a replacement for that is probably a higher (and more time-consuming) priority.

So yeah, you are welcome to come in and help push this PR through by refactoring and rewriting it to use the analyzer functions; that totally makes sense from my point of view. Let me know if you have any specific questions about any parts of the code. Although I didn't write it, another set of eyeballs can be helpful when trying to figure out how something is supposed to work.

@Robbt referenced this pull request Aug 31, 2018: Import files in a batch #502 (Closed)

@Robbt (Member) commented Sep 4, 2018

Ok, so I decided to analyze the analyzer code, and I have a hunch that the approach taken here is probably unnecessary. We should be using airtime-analyzer: the way airtime.pro did FTP imports was by sending a RabbitMQ message to airtime-analyzer. In theory this would be the easiest way to add watched folders, but I'm not sure until someone writes an implementation that does this.
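The "send a RabbitMQ message to airtime-analyzer" idea boils down to publishing a small JSON payload. The field names below are assumptions for illustration only; the real contract is defined by airtime_analyzer's message listener, which should be treated as the source of truth:

```python
import json

# Hypothetical "please import this file" message for airtime-analyzer.
# Every key here is an assumption sketched from the discussion, not the
# actual message schema.
message = {
    "tmp_file_path": "/srv/airtime/watch/new-track.mp3",
    "original_filename": "new-track.mp3",
    "callback_url": "http://localhost/rest/media/1",
}
body = json.dumps(message)

# A producer would publish `body` to the analyzer's exchange (e.g. with
# pika); the analyzer decodes it and runs its analysis pipeline.
print(json.loads(body)["original_filename"])  # -> new-track.mp3
```

The point is that the watcher would only need to produce messages like this, leaving all analysis and database writes to the existing analyzer path.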

I've also noticed another issue that I'll report shortly: the airtime_analyzer code doesn't move files it fails to import anywhere. It just leaves them in the /store/organize folder. There is a TODO in the code.

If we are going to be importing from a local filesystem using airtime_analyzer then we will definitely need to remove the "problem_files" after an import has been attempted or the folder will get messy.

@Robbt (Member) commented Sep 4, 2018

Ok, I created issue #508 for the problem of files that fail to import not being moved. I've also looked into how the analyzer imports files: from what I can tell, it basically just processes the files it finds, checks their metadata, etc., and moves them (in the case of non-S3 filesystems) if they pass all of the checks; otherwise it kicks back an "import failed" message and that is that.

It doesn't actually create a file record in the database or anything along those lines. That all happens through a rest/media PUT request, which creates the file via the cc_file model, which in turn makes a call to MediaService.php, which in turn calls the analyzer; somewhere along the line the return message from RabbitMQ is received, and the UI, the library, etc. are updated.

Also, based on the presence of ftp-upload-hook.sh in the airtime_analyzer/tools directory, and the fact that it basically does a curl PUT request of a local file to the REST endpoint, I suspect that if we are going to use the current methodology (vs. rewriting a parallel import service, like this PR does), we need to make a /rest/media call for new files detected in a directory.

My suspicion is that when airtime.pro added the FTP import process they simply had a script that called tools/ftp-upload-hook.sh on every file in the FTP directory, but I could be wrong. The simplest way to do this in terms of implementation would be to mimic that approach and reuse the existing import code path as much as possible, rather than reinventing the rather complicated wheel.

I think it makes more sense to do this with the existing Propel classes rather than writing SQL to modify the database directly. I'll see if I can figure out a minimalist way of doing this.

@Robbt (Member) commented Sep 4, 2018

In testing, the ftp-upload-hook.sh script appears to upload the file, but it receives a 400 error from the Zend REST controller. This causes it to retry the upload 5 times, and thus upload the file 5 times every time the script is run. Stepping through the code to determine why it returns a 400 as if it were failing is a next step in deciding whether that is a good route to accomplish the goal.

@Robbt (Member) commented Sep 4, 2018

So there is something amiss with the curl call: it sends the POST request and then evidently sends another, malformed request that triggers the error, so there likely isn't anything wrong with the Zend framework. My guess is that it may make sense to have a Python service do the POST request using requests (http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file) rather than relying on a bash script. This could be integrated into the UI, and with minimal tweaking to the CcFiles pathway it could even be made to trigger imports in different directories owned by different LibreTime users, addressing the request in #453.
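With requests, the upload collapses to `requests.post(url, files={"file": fh})`. To show what such a request actually contains without pulling in a dependency, here is a stdlib-only sketch that builds (but does not send) the multipart POST; the endpoint URL and the form-field name "file" are assumptions for illustration:

```python
import io
import uuid
import urllib.request

def build_media_post(url, field, filename, payload):
    """Build a multipart/form-data POST request like the one
    ftp-upload-hook.sh makes with curl. Sketch only: the caller
    decides when (and whether) to urlopen() it."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(("--%s\r\n" % boundary).encode())
    body.write((
        'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
        % (field, filename)
    ).encode())
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(payload)
    body.write(("\r\n--%s--\r\n" % boundary).encode())
    req = urllib.request.Request(url, data=body.getvalue(), method="POST")
    req.add_header(
        "Content-Type", "multipart/form-data; boundary=%s" % boundary
    )
    return req

req = build_media_post(
    "http://localhost/rest/media", "file", "track.mp3", b"\xff\xfb..."
)
print(req.get_method())  # -> POST
```

Sending exactly one well-formed request like this would avoid the curl double-request behavior described above.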

@Robbt (Member) commented Sep 5, 2018

Ok, so I've been researching this further. I think that building something from scratch that uses the existing import process probably makes the most sense, especially since @HaJoHe hasn't had the time to get this working, and given all of the points about duplicated code, unescaped database inserts, etc.

So my current thought process is to write a service similar to airtime_analyzer, except that all it does is pull from LibreTime a set of folders to watch.

It then uses pyinotify (https://github.com/seb-m/pyinotify/wiki/Handling-Events) to watch a directory.
Now here is the tricky part: if we are monitoring the filesystem and have an automatic import process, we don't want to, say, try to import a file while it is still being uploaded.

For this it would seem like pyinotify's def process_IN_CLOSE(self, event): would work, because it is triggered when a file is closed. Although this could also fire when, say, an FTP connection closes due to a network disconnect, it should usually fire when a file has finished copying. I don't think it fires for files that are already present in the directory, so we have to take that into consideration, since importing existing files is probably useful. Whether it works for mounted remote filesystems is an open question as well.
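Since inotify events may not be delivered at all for mounted remote filesystems, the same "don't import a file mid-upload" goal can also be approximated without pyinotify. Here is a stdlib-only size-stability heuristic as an illustrative alternative (the interval and check count are arbitrary tunables, not values from this PR):

```python
import os
import time

def wait_until_stable(path, interval=1.0, required_checks=3):
    """Block until `path`'s size has stayed unchanged for
    `required_checks` consecutive polls -- a heuristic that the
    upload/copy has finished. Also covers files that were already
    present before the watcher started, which inotify alone misses."""
    last_size, stable = -1, 0
    while stable < required_checks:
        size = os.path.getsize(path)
        stable = stable + 1 if size == last_size else 0
        last_size = size
        if stable < required_checks:
            time.sleep(interval)
```

An event-driven pyinotify handler and this polling fallback are complementary: the handler reacts quickly on local filesystems, while a periodic scan with the heuristic catches pre-existing files and remote mounts.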

@Robbt (Member) commented Sep 7, 2018

Ok, so rather than starting from scratch, I'm seeing what modifications can be done to this code to get it working the way I think it should work.

Anyone wanting to try this code should realize that it basically depends on receiving a message from RabbitMQ via test_rabbit.py (under test/rest_rabbit.py), which seems like a counterintuitive way to kick off the service. If it is running, it should just run periodically at a set interval that can be configured via a config file (or even the web UI).

Next I'm going to modify the import process to just send the file to the web app via a REST request rather than create a parallel import into the database: the import process set up by this module is currently failing for me, it seems like a bad idea to duplicate a bunch of code, and, as @kainz pointed out, the SQL isn't escaped. From my analysis, airtime_analyzer doesn't actually interface with the database; it just responds to import requests triggered by rest/media requests.

@Robbt (Member) commented Sep 7, 2018

Ok, so my first bit of refactoring has been done on this branch: https://github.com/Robbt/libretime/tree/HaJoHe-libretime_watch
The code now successfully pushes a file to the rest API.
It isn't working completely at this point, as it will continue to import the same files over and over again every time the test_rabbit.py command is run.
I just wanted to share the progress I've made so far.
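One way to fix the repeated re-import on each run is to persist a fingerprint (path + size + mtime) of everything already pushed. This is a sketch of the idea only, not code from the branch; the state-file location and fingerprint scheme are assumptions:

```python
import json
import os

def fingerprint(path):
    """Cheap identity for a file: path, size, and mtime."""
    st = os.stat(path)
    return "%s|%d|%d" % (path, st.st_size, int(st.st_mtime))

def files_to_import(folder, state_file):
    """Return only files in `folder` not seen on a previous run, so a
    periodic scan does not re-upload the same files every time."""
    try:
        with open(state_file) as fh:
            seen = set(json.load(fh))
    except (OSError, ValueError):
        seen = set()  # first run, or corrupt state: start fresh
    new = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and fingerprint(path) not in seen:
            new.append(path)
            seen.add(fingerprint(path))
    with open(state_file, "w") as fh:
        json.dump(sorted(seen), fh)
    return new
```

Including size and mtime in the fingerprint means a file that is replaced in place gets picked up again as a re-import.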

@Robbt (Member) commented Sep 7, 2018

Just realized one thing. Importing the files that are in a watched folder is different from the premise of what's implemented here, which was simply adding files that are in a directory to LibreTime without moving them. Basically, it adds another directory where anything you place will be added to the LibreTime library while its location remains where it is.

I am going to propose a rewrite that imports the files and then deletes them from the filesystem, so they reside in the LibreTime storage directory; but that is fundamentally different from integrating files that stay in their own directory into the LibreTime library.

@Robbt (Member) commented Sep 9, 2018

I spent some more time playing around in my branch to find the quickest way of making the existing REST-based call just add the file to the database with the appropriate analysis, but not actually move the file. It involved adding a watchedFolder parameter and modifying the analyzer in ways that were less than elegant, and there is still an issue with the final file path not being set even as the record is added to the database.

I think this is due to how the mediaController parses the PUT response and calls CcFile:updateFromArray, which sets the file path relative to the storage directory. I could write more code to create an alternative path for watched files here, but it all seems pretty fragile and hacky at this point. It probably makes more sense for me to code an automatic directory-based file import than to implement the watched-folder functionality.

@Robbt (Member) commented Sep 9, 2018

Ok, I took a stab at implementing the import-folder functionality I described above in #514. It is currently in a working state but needs a bit more work.
