
Bringing back the 'watch' function to libretime. #111

Closed · wants to merge 23 commits

Conversation

@HaJoHe (Contributor) commented Mar 21, 2017

It's a standalone program, working directly with the database.

Depending on the file status, it will either

  • add a new record into cc_files,
  • update some metadata in cc_files, or
  • skip any analysis of the metadata.

So far there is no way to define a watch directory within LibreTime,
so you have to create it yourself directly in the database. (see comment)

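The three-way decision described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: comparing modification times is an assumed way of detecting "file status", and `db_mtimes` stands in for a real cc_files lookup.

```python
import os

def handle_file(path, db_mtimes):
    """Decide what to do with a file found in a watched directory.

    db_mtimes maps known file paths to the modification time recorded
    in cc_files (a hypothetical stand-in for a real database query).
    """
    if path not in db_mtimes:
        return "insert"   # add a new record into cc_files
    if os.path.getmtime(path) > db_mtimes[path]:
        return "update"   # refresh the metadata stored in cc_files
    return "skip"         # unchanged since last scan: skip analysis
```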
@hairmare (Member)

I can merge my UI work onto this so there is a way to edit.

After that we can add rabbitmq and sysv/upstart/systemd to the mix :)

Restore the /preference/directory-config setting manager. Adding watched dirs seems to work and they also show up on /systemstatus.
@hairmare (Member)

My commit a0ab8ce re-implements the UI for adding, changing, and removing media folders.

@hairmare (Member)

I don't think there's a way to 👍 on commits so I'm commenting to say this looks very promising!

@HaJoHe (Contributor, Author) commented Mar 25, 2017 via email

@hairmare (Member)

Yeah, the individual Python processes don't usually have access to the database directly; they go through the API. For some of them, api_clients has the code to do the actual requests.

If it needs something like a cronjob to work I can help package that for the installer (and make a systemd timer out of it for systemd systems).
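The systemd timer offered above could look something like the following pair of units. This is a hypothetical sketch: the unit names, user, and the /usr/bin/libretime-watch path are assumptions, not files that exist in the repo.

```ini
# /etc/systemd/system/libretime-watch.service  (hypothetical)
[Unit]
Description=Scan LibreTime watched directories once

[Service]
Type=oneshot
User=libretime
ExecStart=/usr/bin/libretime-watch --scan

# /etc/systemd/system/libretime-watch.timer  (hypothetical)
[Unit]
Description=Run libretime-watch every five minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

The timer would be enabled with `systemctl enable --now libretime-watch.timer`; on sysv/upstart systems the same ExecStart command could run from a cron entry instead.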

@hairmare hairmare modified the milestone: 3.0.0-alpha.1 Mar 26, 2017
@hairmare hairmare modified the milestones: 3.0.0-alpha.1, 3.0.0-alpha.2 Apr 4, 2017
@hairmare hairmare changed the title Proof of concept for bringing back the 'watch' function to libretime. Bringing back the 'watch' function to libretime. Apr 6, 2017
@kainz commented Aug 5, 2018

@Robbt which magic API egg should be in use? python-magic or file-magic?

@Robbt (Member) commented Aug 6, 2018

Cool kainz, thanks for working on this.
In #169 we switched to using file-magic; this code was written before that, so it's still using python-magic. It basically worked for the person who wrote it, but I found it wasn't properly integrated with systemd, and there were issues like the ones you pointed out. So not only was it rather hacky, it also didn't work.

@kainz commented Aug 6, 2018 via email

@Robbt (Member) commented Aug 6, 2018

I honestly haven't spent much time analyzing the analyzer code, so I can't offer any recommendations without really spending some time trying to understand how it works. Like most things in this project, it was written before the fork and mostly works, so we have more or less left it alone.

The only change people have mentioned is at some point porting the various Python apps to Python 3, as they're still running 2.7. But we have almost a year and a half until Python 2 is EOL, and considering the Zend MVC framework we are using is already EOL, figuring out a replacement for that is probably a higher, and also more time-consuming, priority.

So yeah, you are welcome to come in and help push this PR through by refactoring and rewriting it to use the analyzer functions; this totally makes sense from my point of view. Let me know if you have any specific questions about any parts of the code. Although I didn't write it, another set of eyeballs can be helpful when trying to figure out how something is supposed to work.

@Robbt Robbt mentioned this pull request Aug 31, 2018
@Robbt (Member) commented Sep 4, 2018

Ok, so I decided to analyze the analyzer code, and I have a hunch that the approach taken here is probably unnecessary. We should be using airtime-analyzer; the way airtime.pro did FTP imports was by sending a RabbitMQ message to airtime-analyzer. In theory this would be the easiest way to add watched folders, but I'm not sure until someone writes an implementation that does this.

I've also noticed another issue that I'll report shortly: the airtime_analyzer code doesn't move files it fails to import anywhere. It just leaves them in the /store/organize folder. There is a TODO in the code.

If we are going to be importing from a local filesystem using airtime_analyzer, then we will definitely need to remove the "problem_files" after an import has been attempted, or the folder will get messy.

@Robbt (Member) commented Sep 4, 2018

Ok, I created issue #508 to track the problem of files that fail to import not being moved. I've also tried to look into how the analyzer imports files, and from what I can tell it basically just processes the files it finds, checking their metadata etc. and moving them (in the case of non-S3 filesystems) if they pass all of the checks; otherwise it kicks back an "import failed" message and that is that.

It doesn't actually create a file record in the database or anything along those lines. That all happens through a rest/media/ PUT request, which creates the file from the cc_file model, which in turn makes a call to MediaService.php, which in turn calls the analyzer; somewhere along the way the return message from RabbitMQ is received and the UI, the library, etc. are updated.

Also, based upon the presence of ftp-upload-hook.sh in the airtime_analyzer/tools directory, and the fact that it basically does a curl PUT request of a local file to the rest endpoint, I suspect that if we are going to utilize the current methodology vs. rewriting a parallel import service (like this PR does), we need to make a /rest/media call for new files that are detected in a directory.

My suspicion is that when airtime.pro added the FTP import process they simply had a script that called tools/ftp-upload-hook.sh on every file in the FTP directory, but I could be wrong. It seems like the simplest way to do this, in terms of implementation, would be to mimic that approach and use the existing import code path as much as possible rather than reinventing the rather complicated wheel.

I think it makes more sense to do this with the existing Propel classes rather than writing SQL to directly modify the database. I'll see if I can figure out a minimalist way of doing this.

@Robbt (Member) commented Sep 4, 2018

So in testing, the ftp-upload-hook.sh script appears to work in terms of uploading a file, but it fails in that it receives a 400 error from the Zend REST controller. This results in it retrying the upload 5 times, and thus uploading the file 5 times, every time the script is run. I think stepping through the code and determining why it returns a 400, as if it were failing, is the next step for deciding whether that is a good route to accomplish the goal.

@Robbt (Member) commented Sep 4, 2018

So there is something amiss with the curl call: it sends the POST request and then evidently sends another request that is considered a bad request, which triggers the error. So there likely isn't anything wrong with the Zend framework. My guess is that it may make sense to configure a service that does the POST request from a Python service using requests (http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file) vs. relying on a bash script. This could be integrated into the UI, and with minimal tweaking to the CcFiles pathway it could possibly even be made to trigger imports in different directories owned by different LibreTime users, creating a solution for the request in #453.
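A minimal sketch of the multipart POST linked above, done with requests instead of curl. The endpoint path and the API-key-as-basic-auth convention are assumptions drawn from the /rest/media discussion in this thread, not a verified LibreTime API contract.

```python
import requests

def upload_to_libretime(path, base_url="http://localhost", api_key="changeme"):
    """POST a local media file to the (assumed) /rest/media endpoint."""
    with open(path, "rb") as f:
        resp = requests.post(
            base_url + "/rest/media",
            files={"file": f},    # multipart/form-data body
            auth=(api_key, ""),   # assumed API-key auth scheme
            timeout=30,
        )
    resp.raise_for_status()       # surface 4xx/5xx instead of retrying blindly
    return resp.json()
```

Raising on a 4xx response, instead of blindly retrying, would avoid the quintuple-upload behavior seen with the bash hook.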

@Robbt (Member) commented Sep 5, 2018

Ok, so I've been researching this further. I think that building something from scratch that uses the existing import process probably makes the most sense, especially since @HaJoHe hasn't had the time to get this working, and given all of the points about duplicating code, escaping database inserts, etc.

So my current thought process is to write a service similar to airtime_analyzer, but all it does is pull a set of folders to watch from LibreTime.

It then uses pyinotify (https://github.com/seb-m/pyinotify/wiki/Handling-Events) to watch a directory. Now here is the tricky part: if we are monitoring the filesystem and an automatic import process is happening, we don't want to, say, try to import a file while it is still being uploaded.

For this it would seem like def process_IN_CLOSE(self, event): from pyinotify would work, because it is triggered when a file is closed. Although this could also happen when, say, an FTP connection closes due to a network disconnect, it should usually fire when a file has finished being copied. I don't think it will fire for files that are already present in the directory, so we have to take that into consideration, as importing existing files is probably useful. Whether it will work for mounted remote filesystems is an open question as well.
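For the cases where close events are unreliable (pre-existing files, remote mounts), one common fallback is a size-stability poll before importing. A stdlib-only sketch of that heuristic; the function name and polling parameters are made up for illustration:

```python
import os
import time

def wait_until_stable(path, interval=1.0, checks=3):
    """Block until the file's size has stopped changing for `checks`
    consecutive polls, then return the path.

    A heuristic for "the upload has finished" on filesystems where
    inotify close events may not arrive (e.g. mounted remote shares).
    """
    last_size = -1
    stable = 0
    while stable < checks:
        size = os.path.getsize(path)
        stable = stable + 1 if size == last_size else 0
        last_size = size
        time.sleep(interval)
    return path
```

A watcher could run this on every candidate file before triggering the import, regardless of how the file was noticed.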

@Robbt (Member) commented Sep 7, 2018

Ok, so rather than starting from scratch, I'm seeing what modifications can be done to this code to get it working the way I think it should work.

Anyone wanting to try this code should realize that it basically depends on receiving a message from RabbitMQ via test_rabbit.py, which is under test/rest_rabbit.py - this seems like a counterintuitive way to kick off the service. If it is running, it should just run periodically at a set interval that can be configured via a config file (or even the web UI).

Next I'm going to modify the import process to actually just send the file to the web app via a REST request rather than create a parallel import into the database: the import process set up by this module is currently failing for me, it seems like a bad idea to have a bunch of duplicate code, and, as @kainz pointed out, the SQL code isn't escaped. In my analysis, airtime_analyzer doesn't actually interface with the database; it just responds to import requests triggered by rest/media requests.
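The unescaped-SQL concern above is the classic argument for parameterized queries (or the Propel classes mentioned earlier) over string interpolation. A minimal illustration using Python's stdlib sqlite3 as a stand-in for the Postgres driver a direct-to-database watcher would use; the two-column cc_files table here is a simplified stand-in for the real schema:

```python
import sqlite3

# In-memory stand-in for the real cc_files table (simplified columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cc_files (filepath TEXT, track_title TEXT)")

# A title with an apostrophe breaks naive string interpolation into SQL.
title = "It's a Test"

# Safe: placeholders let the driver do the escaping.
conn.execute(
    "INSERT INTO cc_files (filepath, track_title) VALUES (?, ?)",
    ("/srv/watch/test.mp3", title),
)
row = conn.execute("SELECT track_title FROM cc_files").fetchone()
print(row[0])  # prints: It's a Test
```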

@Robbt (Member) commented Sep 7, 2018

Ok, so the first part of my refactoring has been done on this branch: https://github.com/Robbt/libretime/tree/HaJoHe-libretime_watch
The code now successfully pushes a file to the REST API.
It isn't working completely at this point, as it will continue to import the same files over and over again every time the test_rabbit.py command is run.
I just wanted to share the progress I've made so far.

@Robbt (Member) commented Sep 7, 2018

Just realized one thing. Importing the files that are in a watched folder is different from the premise of the process implemented here, which was simply adding files that are in a directory to LibreTime without moving them. Basically it adds another directory wherein anything you place there is added to the LibreTime library while its location remains where it is.

I am going to propose a rewrite that basically imports the files and then deletes them from the filesystem, so that they reside in the LibreTime storage directory, but that is fundamentally different from integrating files that are in a directory into the LibreTime library.

@Robbt (Member) commented Sep 9, 2018

I spent some more time playing around in my branch to see what the quickest way would be to make the existing REST-based call just add the file to the database, with the appropriate analysis, but not actually move the file. It involved adding a watchedFolder parameter and modifying the analyzer in ways that were less than elegant, and there is still an issue with the final file path not being set even as the file is added to the database.

I think this is due to how the mediaController parses the PUT response and calls CcFile:updateFromArray, which sets the file path relative to the storage directory. I could write some more code to create an alternative path for watched files here, but it all seems pretty fragile and hacky to me at this point. I think it probably makes more sense for me to code an automatic directory-based file import vs. implementing the watched-folder functionality.

@Robbt (Member) commented Sep 9, 2018

Ok, I took a stab at implementing the import folder functionality I described above in #514 - it is currently in a working state but needs a little bit more work.

@xabispacebiker (Contributor)

I have successfully implemented a working version of the watched-folders functionality with Python's watchdog library. It works great. I'll tie everything up and push a commit so you can test the feature; then, if you like it, I'll send you a PR.

@jooola jooola added status: pinned This issue or pull request won't stale and removed airtime-upgrade-path labels Sep 18, 2021
@zklosko (Member) commented Dec 13, 2021

@xabispacebiker Were you able to push your final work?

@paddatrapper paddatrapper removed the status: pinned This issue or pull request won't stale label Mar 25, 2022