Use proper lock directory #60

Closed
mnlipp opened this issue Jan 8, 2016 · 33 comments

@mnlipp
Contributor

mnlipp commented Jan 8, 2016

It is a nice attempt to provide the native parts of the serial interface embedded in the jar. However, you should be aware that this moves the burden of keeping things "proper and up-to-date" to you, the maintainers of this package. RXTXComm, by contrast, is available for all Linux distributions that I know of, but is maintained by their packagers.

About 3 years ago, at least Fedora and Arch changed the directory for the device lock files to "/var/lock/lockdev", with the owner/mode of "/var/lock" changed to "drwxr-xr-x. 7 root root /var/run/lock" (a link to /run/lock). (Ubuntu is in a kind of "transition phase", with owner/mode set to "drwxrwxrwt 8 root root /var/run/lock" (a link to /run/lock) and a "/var/lock/lockdev" directory.) This has been done for security reasons and should not be undone lightly.

While package maintainers for RXTXComm have adapted to this change and the RXTXComm packages work fine on these systems, the libraries provided with nrjavaserial still attempt to create the lock in "/var/lock", which fails, of course, despite the process being a member of groups uucp/tty and lock.

This is not only a technical issue. It also makes discussions of problems with the serial interface difficult, as I have just experienced. Because nrjavaserial claims to have the native support "on board", Java developers with little background in OS specifics tend to ignore related problems and tend to call you ignorant or simply state "works for me" (because they use Windows or Ubuntu). I'm emphasizing this because the problem has arisen from nrjavaserial's claim to be the out-of-the-box solution instead of sticking to the well-working principle that package maintainers are responsible for the native libraries.

Currently, nrjavaserial does not work on Fedora or Arch Linux because it uses the wrong lock directory. This should be changed. Maybe the best approach would be to replace the "static" definition of LOCKDEV with a more flexible one for "__linux__" (see SerialImp.h): check if "/var/lock/lockdev" exists and is writable, else fall back to "/var/lock" (because not all Linuxes are equal).
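The runtime check proposed above could look roughly like the following C sketch. The helper names `is_writable_dir` and `pick_lock_dir` and the `LINUX_LOCK_DIRS` table are hypothetical and are not part of nrjavaserial or RXTX; only the two directory paths and their order come from the comment.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if `path` is an existing directory that this process may
 * write to and search, 0 otherwise. */
static int is_writable_dir(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;
    return access(path, W_OK | X_OK) == 0;
}

/* Hypothetical helper (not nrjavaserial API): return the first usable
 * directory from a NULL-terminated candidate list, or NULL if none of
 * the candidates can be used. */
static const char *pick_lock_dir(const char *const candidates[])
{
    assert(candidates != NULL);
    for (size_t i = 0; candidates[i] != NULL; i++)
        if (is_writable_dir(candidates[i]))
            return candidates[i];
    return NULL;
}

/* Candidate order as suggested in the comment: the newer directory
 * first, then the traditional one as a fallback. */
static const char *const LINUX_LOCK_DIRS[] = {
    "/var/lock/lockdev",
    "/var/lock",
    NULL
};
```

A caller would use `pick_lock_dir(LINUX_LOCK_DIRS)` where the compile-time LOCKDEV value is used today and treat NULL as "no usable lock directory". Note that this only selects a directory for this process; it does not guarantee that other programs on the same system look in the same place.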

@madhephaestus
Member

Since you're intimately aware of the problem, could you send a pull request with an appropriate fix?

We are an active project and would like to maintain usability across as many platforms as Java supports (as a goal). Direct implementation conflicts across distros of Linux are super hard to check for, especially since our auto builder is already over the allowable runtime for a Travis CI instance (2 hours). We depend heavily on a broad coalition of developers to be aware of and maintain support for distros beyond Ubuntu/Debian.

The old model is more modular, sure, and modularity makes it easier to have developers specializing in architectures, but the difficulty of actually making a compatible full-system release, building all the packages and their OS-specific installers, made that project stagnate. It also limited the usefulness of the library by forcing Java developers to build installers for their apps in order to release them. Even worse (and this was the normal case for RXTX), users would get an app working with the OS dep installed on their system, then release it without consideration of the JNI slib, meaning a generally broken app being released. The self-extracting model allows developers that use NrJavaSerial to be sure that if the jar works in one place, it will work more or less anywhere.

I am much more comfortable making NrJavaSerial more difficult to maintain (we have to deal with OS weirdness) than forcing that difficulty onto the users of NrJavaSerial.

@mnlipp
Contributor Author

mnlipp commented Jan 10, 2016

This is not a simple patch, since it involves a design decision.

Assuming that you are targeting common Linux distributions, I'd personally recommend using liblockdev (it should be available in all distributions; I have just made sure for Fedora, Ubuntu and Arch). It was designed in 1997-1999 to overcome the different approaches to handling device locking. Support for this library is already in RXTXComm 2.2pre2, the latest version, which is commonly used for building the corresponding packages in Linux distributions.

Strangely enough, the support for liblockdev is missing in the rxtxcomm sources that you use in nrjavaserial. I don't know whether you have started with an older version or deliberately removed it.
For the original sources, it was sufficient to add -DLIBLOCKDEV when compiling in order to activate the usage of the lockdev library.

You should use the lockdev library to ensure a common behavior of all programs on the system. Of course, you could also "get around" the problem by using the dirty hack that I have sketched when submitting the issue. But assume that you decide to check for /var/lock first and then /var/lock/lockdev and another program (depending on the OS flavor) does it just the other way round, then both programs could acquire the lock. This is why programs using device locking should use liblockdev (just checked: in fedora ckermit, minicom, uucp (the usual culprits) and libgphoto2 all depend on liblockdev).
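The race described above can be illustrated with a small C sketch. It assumes lock files named `LCK..<device>` (the form seen in RXTX error messages) that are created exclusively with `O_CREAT | O_EXCL`; the function names `try_lock_in` and `unlock_in` are hypothetical, and the sketch deliberately omits details such as writing the owner's PID into the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "<dir>/LCK..<dev>" into buf. Returns 0 on success, -1 if the
 * result does not fit. */
static int lock_path(char *buf, size_t len, const char *dir, const char *dev)
{
    int n = snprintf(buf, len, "%s/LCK..%s", dir, dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: try to create the lock file for `dev` in `dir`.
 * Returns 0 if this call created it, -1 if it already exists or cannot
 * be created. */
static int try_lock_in(const char *dir, const char *dev)
{
    char path[1024];
    int fd;

    assert(dir != NULL && dev != NULL);
    if (lock_path(path, sizeof path, dir, dev) != 0)
        return -1;
    /* O_EXCL makes creation atomic: exactly one caller per path wins. */
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY,
              S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Remove the lock file for `dev` in `dir`, if there is one. */
static void unlock_in(const char *dir, const char *dev)
{
    char path[1024];

    if (lock_path(path, sizeof path, dir, dev) == 0)
        unlink(path);
}
```

Two calls to `try_lock_in` with the same directory and device conflict as intended: the second one fails. Two programs that settle on different directories for the same device both succeed, which is exactly the situation where the lock protects nothing.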

I can only speculate on why RXTXComm doesn't use liblockdev "by default". Since, according to the copyright notice, it was started in 1997 just like liblockdev, probably liblockdev wasn't generally available. Second, from the comment on the original source: "This is for use with liblockdev wich comes with Linux distros. I suspect it will be problematic with embedded Linux". Well, okay. But if I want to support an "embedded Linux system" that doesn't have liblockdev, then it is not unreasonable to assume that it doesn't use /var/lock[/lockdev] either. So I'll have to find a special solution for this anyway (this is where all the other options for LOCK and UNLOCK offered in RXTXComm should be considered).

So, the question is, why don't you use liblockdev?

@madhephaestus
Member

There is no reason other than that we forked at 2.1 or maybe 2.2pre1, in any case before liblockdev was added. I would love to get our build up to the most feature-complete state possible.

Since you read both sources and noticed the difference, do you think you could update our approach to locking? Your help here would be greatly appreciated by the whole community.

@mnlipp
Contributor Author

mnlipp commented Jan 10, 2016

Okay, but due to time constraints it will take a few days until I have something ready.

@madhephaestus madhephaestus added this to the 3.12.0 milestone Jan 11, 2016
@mnlipp
Contributor Author

mnlipp commented Jan 13, 2016

I was wrong about liblockdev support having been removed, somehow I overlooked the functions. Anyway, the only required changes are in the Makefile. Here is the patch.

liblockdev-patch.txt

I have enabled the liblockdev support for the systems where it should work (i.e. where I could check that liblockdev is available). I could only test it on Fedora and Ubuntu (x86_64). With the systems I have available, I should be able to cross-compile it for linux/arm and test it on Arch Linux for Raspberry Pi, but you haven't specified which packages need to be installed to make that "make arm" work. And my choice of "obvious packages" didn't provide the required tools.

@mnlipp
Contributor Author

mnlipp commented Jan 15, 2016

As I haven't found any gcc cross compiler that accepts the "--with-arch=..." flag that you use in the Makefile to specify the target architecture, I have simply built the native library on my Raspberry Pi. So I can confirm now that using liblockdev also works on the Raspberry Pi (Arch Linux).

@kolaf

kolaf commented Feb 20, 2016

Hi,

I'm using Openhab 2 to connect to several serial devices (a zwave dongle, a home-made mysensor Gateway, and an rfxcom). All of these connect through emulated USB serial ports. The problem is that it appears to be able to connect to only 2 of them at a time. Connecting to the third device gives the following error message:

14:34:15.756 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Connecting to serial port '/dev/ttyACM0'
14:34:15.757 [INFO ] [me.event.ThingStatusInfoChangedEvent] - 'zwave:serial_zstick:55200a05' changed from UNINITIALIZED to INITIALIZING
RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..mysensorsUSB: File exists. It is mine
14:34:15.760 [INFO ] [me.event.ThingStatusInfoChangedEvent] - 'zwave:serial_zstick:55200a05' changed from INITIALIZING to ONLINE
testRead() Lock file failed
14:34:15.762 [INFO ] [me.event.ThingStatusInfoChangedEvent] - 'zwave:serial_zstick:55200a05' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Serial Error: Port /dev/ttyACM0 does not exist
14:34:15.764 [INFO ] [smarthome.event.ThingAddedEvent     ] - Thing 'zwave:serial_zstick:55200a05' has been added.

It looks like it's referencing the lock file for another USB device than the one it is actually communicating with.

I do not know if this is relevant for this problem, but I was sent here from https://github.com/openhab/openhab2-addons/issues/671

@mnlipp
Contributor Author

mnlipp commented Feb 20, 2016

@kolaf Well, you're wrong here. This is about nrjavaserial trying to create lock files in the wrong directory on "modern" Linux distributions and how to handle this. The issue seems to have come to some kind of "halt" although I've provided a fix -- don't know why. Once the fix is applied AND the new version of nrjavaserial has made its way into openhab2, I'll hopefully be able to get openhab2 running on my system and get some more experience with it. Nevertheless, I'll have a look at your openhab2 issue and maybe comment there.

@kolaf

kolaf commented Feb 20, 2016

Sorry for the mixup, but thanks for looking into it.

@madhephaestus
Member

I just merged in the fix. Sorry for the delay, I manage a lot of projects and miss things sometimes. @MrDOS I was thinking about doing a release of 3.12.0 soon; where are we with the auto-build?

I would like to get this released and fix this issue once and for all.

@madhephaestus
Member

Ok, here is a pre-release for testing. Let me know if this version fixes your issue and I will work with my team to get a full release out and published to Maven. https://github.com/NeuronRobotics/nrjavaserial/releases/tag/3.12.0

@mnlipp
Contributor Author

mnlipp commented Mar 4, 2016

I just tried the pre-release with openhab. Doesn't work, still uses the wrong lock directory. Going to try to find out now what went wrong.

@mnlipp
Contributor Author

mnlipp commented Mar 4, 2016

The pre-release doesn't work for me because I'm using an ARM architecture (Raspberry Pi 2). You have used my pull request (#65) unchanged. However, I did mention in the pull request that "The committed native libs are up to date and tested for armv[67](_HF)". Which implies that they are not up to date for the other architectures, especially not ARMv5, because I have neither a native environment nor a cross-compiler for that one (the cross-gcc I have doesn't seem to support this architecture). Actually, I assumed that you would rebuild from scratch before releasing anything.

Now, whoever knows the ARMs "by heart" might wonder why the ARMv5 library gave me troubles on the Raspberry Pi. Well this is due to the fact that the ARMv5 library is actually the only really used library on ARM processors (see #47). Maybe it would be a good idea to include a patch for this (#69) into the next release as well (sorry for cross referencing issues, but in that case they are related).

To openhab users/developers following this issue: to make sure that things also work "downstream" I overwrote the ARMv5 native lib with the ARMv6 native lib in the jar (so that effectively the ARMv6 library would be loaded in my RP) and put the jar in the org.openhab.io.transport.serial bundle. With this, my RFXcomm works.

@dennisausbremen

Do you have a link to the compiled org.openhab.io.transport.serial.jar for the Pi to be used in OH2 addons?

@mnlipp
Contributor Author

mnlipp commented Mar 7, 2016

I've understood the cross-compilation issues a bit better by now. So here https://github.com/mnlipp/nrjavaserial/releases is a pre-release that has some more working native libraries.

I don't want to compete with the nrjavaserial project, especially since I'm still not convinced of the approach to provide one jar for all systems instead of providing java serial connectivity as system specific packages (as it is done with rxtx). But I needed to get this running in order to be able to try out openhab (although, by now I think it would have been faster to replace nrjavaserial with rxtx in openhab), and since the project team doesn't seem to have much time and at least one more openhab issue seems to depend on this, I decided to provide a release in my forked project. I'm going to delete that as soon as there's an update available in the (original) nrjavaserial project.

@andyrozman

Actually, I think that using one jar for all systems is definitely a better idea than the way RXTX (and most libraries requiring native libraries) does it...

Users of my application are mostly end users with little or no computer knowledge, and explaining to someone like that that they should put the correct native file into a specific directory is a real pain in the a**. Having the native library outside of the jar is good if you have a user who knows what to do with it (and where to get a better-suited library for their system or how to recompile it), but for end users this is a big no-go.

@mnlipp
Contributor Author

mnlipp commented Mar 7, 2016

Of course, I understand the idea behind it. But still, your user has to install a Java runtime, liblockdev (in Unix environments) and -- depending on your application -- maybe other support software/libraries. Even end users are accustomed to clicking on an installer (Windows; the installer handles dependencies by invoking sub-installers [ever seen what also gets installed by some games?]) or copy/pasting some package manager invocation (Linux; the package manager pulls in dependent packages automatically). And rxtx (or some replacement) is just one more dependency. Note that I was talking about installable packages (as they are available for rxtx), not about copying files around by hand; we certainly shouldn't have that anymore.

Anyway, I didn't want to start a discussion about NeuronRobotics' approach; I just wanted to emphasize why my contribution is temporary. Let's simply see how successful NeuronRobotics will be in providing a distribution with multi-system native components in the long run. For me, it definitely caused trouble, because they hadn't considered support for "modern, secure locking" and I therefore couldn't get openhab running on Arch Linux. Certainly not the environment for the casual end user -- but not that exotic either, and it shows the difficulties of the approach.

@dennisausbremen

Well that's progress. Thank you for the prerelease.
The problem is that now I get
java.lang.NoClassDefFoundError: Could not initialize class gnu.io.RXTXCommDriver thrown while loading gnu.io.RXTXCommDriver
and directly afterwards
[ERROR] [echno.internal.CULIntertechnoBinding] - Can't open CUL org.openhab.io.transport.cul.CULDeviceException: gnu.io.NoSuchPortException

@mnlipp
Contributor Author

mnlipp commented Mar 10, 2016

What architecture/OS do you use?

@dennisausbremen

I'm running Openhab 2.0.0-b2 and Homegear on an rPi2 (Raspbian Jessie) with 2 USB-CUL sticks (868 / 433 MHz)

@MrDOS
Contributor

MrDOS commented Mar 10, 2016

@mnlipp Did you statically link against liblockdev?

@dennisausbremen Out of curiosity, could you please install the package liblockdev1 and try again (or let us know if it was already installed)?

@dennisausbremen

Okay, so I installed liblockdev1 and now the exceptions / warnings are gone. So not statically linked.
But to get the communication working completely, I have to run "screen /dev/ttyUSB1 38400" and quit out of screen to get the signals going.

But it works! ^^

@mnlipp
Contributor Author

mnlipp commented Mar 10, 2016

@MrDOS Of course I didn't link statically. If you linked statically, you wouldn't solve the initial problem.

Let me summarize. The problem is that strategies for handling locks differ between systems and their versions for historical reasons. So not only do different Unix/Linux versions use different names for the lock files, they also put the files in different directories. And all this also changes over time. Newer Ubuntu versions (for example) recommend a different directory than older ones (but still allow you to use the old one). The general recommendation from Linux developers is to enforce that new directory (for security reasons). Arch Linux (in some respects the most modern Linux distribution) already enforces the new directory in its "current release". Handling lock files is therefore a system-dependent detail, just like e.g. the resolution of host names. Unlike the resolution of host names, the API for handling lock files is not part of the standard C library; it is provided by the UNIX-specific lockdev library. This library is available as a shared library and is kept up to date by the OS/distribution maintainers, just like the standard C library.

Therefore, if you want to make sure that your program uses the appropriate locking strategy, you use that API and its up-to-date implementation (the current version of the dynamic library). Statically linking with liblockdev binds your program to some system- and version-specific implementation which may not be appropriate for (or may be out of date on) the system where you're running your program. It's similar to why you link dynamically with the standard C library: it makes sure that you use the up-to-date (and "fitting") implementation on the system where you run your program. (Okay, linking with a dynamic library also reduces the size of your program, but that's a different point.)

Now, why did things go "rather well" until I found the problem on Arch Linux and started this issue? liblockdev was only introduced in 1997 and became "widespread" about 1999. Up to then, programmers who wanted their programs to be portable had to implement (and keep up to date) the lock strategy depending on the system the program was compiled for. You find a lot of (preprocessor) code for handling locking in RXTX, which dates back to about the same time and strives to support systems where liblockdev is not yet available (or will likely never become available -- the RXTX source mentions "embedded systems" as an example, see my comment from January 10th). If you really tested it, you'd find that RXTX's "self-implemented locking" (and therefore nrjavaserial before my patch) doesn't really work on most modern systems. I.e. if your Java program opens a serial port and you open that same serial port with e.g. minicom, the conflict will not be detected on all systems.

Effectively not preventing other programs from using the same serial device may not cause big headaches. But my problem with RXTX's own locking implementation running on a modern Arch Linux is that it prevents the serial device from being used at all, because the "historic" locking implementation from RXTX doesn't work on a modern Arch Linux any more. It causes RXTX to report an error instead of opening the device. Now there are two ways to go (this is why I emphasized in my comment from January 10th that the fix involves a design decision -- maybe I was too brief then). Either you rely on the liblockdev that is dynamically bound at runtime on the system where your program runs to do locking as is appropriate for the system (my preferred approach). Or you keep the locking code of RXTX up to date and make sure that it runs on all target platforms (preferably really doing locking, i.e. preventing other programs from accessing the same device). Considering the variety of locking strategies and their changes in systems over time, I consider that a major effort which I personally would never attempt to undertake.

@dennisausbremen

Any updates to this issue?

@tarioch

tarioch commented Mar 29, 2016

@dennisausbremen did you use my jars for openhab cul and intertechno? That should fix the issue with needing a screen first. If not, let's continue the discussion here https://community.openhab.org/t/testing-help-needed-for-cul-refactoring/8037 or here openhab/openhab1-addons#3885 as I'm pretty sure this should only be related to openhab and not to the nrjavaserial issue.

@Pragmataraxia

Building the head revision (with lockdev) seems to solve the problem on Arch Linux. Do you have any idea when this version might make it to Maven?

@MrDOS
Contributor

MrDOS commented Apr 20, 2016

Whenever I get around to fixing it ;)

However, it looks like liblockdev is on the way out: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=728023 I need to dig into the native code and see where liblockdev is used (and how) and see if the transition to flock(2) is straightforward. If possible, this would be preferable to the use of liblockdev as it doesn't introduce further runtime dependencies and doesn't leave us right next to the chopping block.
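For orientation, a flock(2)-based approach could look roughly like the C sketch below. This is not taken from the nrjavaserial or RXTX sources; `open_locked` and `close_locked` are hypothetical names, and the sketch assumes the lock is taken directly on the descriptor of the opened device node. It only protects against other programs that also use flock(2) on the same file; programs that rely solely on lock files in a lock directory will not notice it.

```c
#define _DEFAULT_SOURCE /* exposes flock(2) on glibc in strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and take an exclusive, non-blocking
 * advisory lock on it. Returns the descriptor on success. Returns -1 on
 * failure, with errno == EWOULDBLOCK if another open file description
 * already holds the lock. */
static int open_locked(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int saved = errno;  /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Release the lock and close the descriptor. Closing alone would also
 * drop the lock once no other descriptor refers to the same open file
 * description; the explicit unlock just makes the intent visible. */
static void close_locked(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

Because the kernel drops the lock when the descriptor is closed or the process dies, there is no stale lock file to clean up and no lock directory to agree on, which is the appeal of this route over lock files.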

@bengtmartensson

Just wanted to report that the 3.12.0 release (both the jar here and in Maven repo) does not work with Fedora23 (x64). It is not linked with liblockdev:

$ ldd native/linux/x86_64/libNRJavaSerial.so             
ldd: warning: you do not have execution permission for `native/linux/x86_64/libNRJavaSerial.so'
    linux-vdso.so.1 (0x00007ffe43df3000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f62aecbe000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f62ae9bc000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f62ae7a4000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f62ae3e3000)
    /lib64/ld-linux-x86-64.so.2 (0x000055a167032000)

If I compile it myself (again on F23) it will be linked with liblockdev

$ ldd ./build/resources/main/native/linux/x86_64/libNRJavaSerial.so
    linux-vdso.so.1 (0x00007fff3cdfb000)
    liblockdev.so.1 => /lib64/liblockdev.so.1 (0x00007f9e4d574000)     <--------------------
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9e4d1f2000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f9e4ceef000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9e4ccd8000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f9e4c917000)
    /lib64/ld-linux-x86-64.so.2 (0x0000564505e08000)

@thigg

thigg commented Nov 16, 2019

Was there any progress on this issue in the last years?

@madhephaestus
Member

I have liblockfile installed and cannot get the old -DLIBLOCKDEV code to compile. It was only being used on the Linux systems and was causing compatibility issues as it was changed at the OS level. I would be happy to revisit this issue in #60, but would like the fix to be cross-platform and not depend on a system dep. Can we restart this discussion in a cross-platform way?

@madhephaestus
Member

This feature has been entirely rolled back

@mnlipp
Contributor Author

mnlipp commented May 6, 2020

Closing it doesn't solve it, does it?

@madhephaestus
Member

To restart this discussion, propose a solution that can be implemented across platforms in a uniform way.
