Matlab license error on HAL? #339

Open
julielyang opened this issue Nov 12, 2015 · 48 comments
@julielyang

Hi,

I tried batch running matlab with this call:

/opt/matlab/R2013a/bin/matlab < ./code/AffinityRegression/RunPvaluesNullDistribution/GeneratePvalues/runGeneratePvaluesAnalysis4.m

I got this license checkout error that I am posting below.

Then I just ran my code again and it ran. There are times my code works and times where I get the licensing error below.

My guess is that another node was using the license I was trying to check out. Could you help me figure out what this error was and how to run MATLAB reliably without hitting it?

Julie

License checkout failed.
License Manager Error -9
This error may occur when:
-The hostid of this computer does not match the hostid in the license file.
-A Designated Computer installation is in use by another user.
If no other user is currently running MATLAB, you may need to activate.

Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2013a/9

Diagnostic Information:
Feature: MATLAB
License path: /cbio/cllab/home/jly/.matlab/R2013a_licenses:/opt/matlab/R2013a/licenses/license.dat:/opt/matlab/R2013a/licenses/*.lic
Licensing error: -9,57.
Warning: No display specified. You will not be able to display graphics on the screen.

@akahles

akahles commented Nov 12, 2015

The node licenses are bound to the node. That is, at most one user can use MATLAB on a given node at a time. So if somebody is already using MATLAB on the node, you get the error above.

There is a small script available to tell you who is using matlab on which node:

/cbio/shared/software/tools/torque/q-lic

However, this is not to be used for a regular high-throughput query as it ssh's into the nodes to find out.

In the long term, the better solution would be for the scheduler to know who is using which license on which node and assign jobs accordingly. I think there was a previous issue about this, but it turned out that this is not quite trivial to achieve.
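For a single node, the same kind of check can be made locally from the process table, without ssh. This is a minimal sketch, assuming MATLAB shows up under the command name `MATLAB` (an assumption about the installation, not a statement of what q-lic does); both function names are made up for illustration.

```shell
# Reads "user command" pairs on stdin (the layout of `ps -e -o user= -o comm=`)
# and prints each user owning a process named exactly MATLAB, once.
# Assumption: the MATLAB binary appears with the command name "MATLAB".
matlab_users_from_ps() {
    awk '$2 == "MATLAB" { print $1 }' | sort -u
}

# Convenience wrapper for the node this is run on.
matlab_users_here() {
    ps -e -o user= -o comm= | matlab_users_from_ps
}
```

Running `matlab_users_here` on a node and getting no output would suggest the node-locked license is free at that moment. Someone else can still start MATLAB before your job does.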

@tatarsky
Contributor

The above is correct. The node locked license/scheduler integration was looking very difficult. The request remains but I've not had time to delve into it and may not be able to actually solve it. #193

@tatarsky
Contributor

And actually the better solution is floating licenses. Not node locked.

@akahles

akahles commented Nov 12, 2015

That is true, but this comes with a price tag :)

@kuod

kuod commented Nov 12, 2015

IIRC, we have 5 floating licenses on the head node. Another idea is to see whether your script can run in Octave, the open source alternative, which is not bound by a price tag or licenses.

@tatarsky
Contributor

I thought those were for toolkits (the floating ones).

@tatarsky
Contributor

Yes, and I know it comes with a price. However, I'm revisiting my assumptions about integrating the scheduler with the licenses, as time has passed and I'm older and perhaps wiser now.

@tatarsky
Contributor

So I believe we do handle the floating ones with Moab. At least there are some comments to that effect. But I don't think that helps us with the node-locked ones. Perhaps some morning we could review the history on this a bit. It's been a while.

@akahles

akahles commented Nov 12, 2015

Sure - I have mostly moved from Matlab to Python by now (yay) but am happy to help unroll the history of this.

@kuod

kuod commented Nov 12, 2015

We have a bit of everything in terms of the toolkits and they're distributed across the nodes. I'll update with more information off the issue tracker. MATLAB usage on the head node is do-able but I would caution against potentially running jobs that bring the cluster head node to a screeching halt.

@tatarsky
Contributor

Confused by the words "head node" and "cluster" in the same sentence ;) I assume @julielyang wants to run Matlab on a node.

@tatarsky
Contributor

Agree that a brief verbal unrolling might be useful to recall. It's been pretty much the same config as far as I've ever known.

@julielyang
Author

Hi,

I would like to understand this better.

Can two users be assigned to the same node while the first user is already
using the MATLAB license on that node?

Can this be solved by requesting a node with no other users on it?

Julie


@akahles

akahles commented Nov 12, 2015

This is correct - as the scheduler does not know about the licenses, the property matlab only ensures that your job will end up on a node that can run MATLAB but not that nobody else is using MATLAB there already.

You do not necessarily need a node without any other user on it; it is sufficient that no other user is running MATLAB there.
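Since the scheduler cannot guarantee that the license is free, a job can instead detect the failure and try again. This is a rough sketch, assuming the failure always prints the `License checkout failed` line shown in the original report. `run_with_license_retry`, `MAX_TRIES` and `RETRY_DELAY` are hypothetical names, not part of MATLAB or Torque.

```shell
# Runs the given command. If its output contains the license-checkout
# failure message, waits and runs it again, up to MAX_TRIES attempts.
# MAX_TRIES and RETRY_DELAY are hypothetical knobs for this sketch only.
run_with_license_retry() {
    local max_tries delay attempt log status
    max_tries=${MAX_TRIES:-5}
    delay=${RETRY_DELAY:-60}
    attempt=1
    log=$(mktemp) || return 1
    while [ "$attempt" -le "$max_tries" ]; do
        # Capture stdout and stderr so the message can be searched, then show it.
        "$@" > "$log" 2>&1
        status=$?
        cat "$log"
        if ! grep -q 'License checkout failed' "$log"; then
            rm -f "$log"
            return "$status"    # no license problem: pass the exit status through
        fi
        echo "attempt $attempt of $max_tries: license checkout failed" >&2
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    rm -f "$log"
    return 1
}
```

When the script is fed on stdin, wrap the call so the redirection is reopened on every attempt, for example `run_with_license_retry sh -c '/opt/matlab/R2013a/bin/matlab < script.m'`. Retrying only helps if the other user's MATLAB session ends within the retry window.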

@tatarsky
Contributor

And the part that is difficult to integrate with node locked licenses is "but not that nobody else is using MATLAB there already." I am looking at one item regarding this again. I can't guarantee it will work or that I'm going to try it rapidly, but I am looking at it.

@julielyang
Author

Okay I understand.

But since I can't guarantee a user is not using matlab on my node can I
instead require a stricter restriction that no other user be on the node?


@akahles

akahles commented Nov 12, 2015

This would be quite inefficient, but you could just request all cores on that node:
-l nodes=1:ppn=24

@julielyang
Author

Ah okay. Yes, it is not an efficient solution.


@julielyang
Author

Thank you for letting me know. This was really helpful~!


@akahles

akahles commented Nov 12, 2015

If you don't have a large number of jobs, you could also request an interactive session on a full node and then start your matlab processes in background all on the same node - this would be more efficient, but has slight overhead.
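As a sketch of that approach: after getting an interactive session on a full MATLAB node (for example with `qsub -I -l nodes=1:ppn=24:matlab`), a small helper can start one MATLAB process per script in the background and wait for all of them. The helper name, the `MATLAB_CMD` override and the log-file naming are assumptions for illustration. The default path is the one from the original report.

```shell
# Starts one background MATLAB process per script given, then waits for all.
# MATLAB_CMD is a hypothetical override; it defaults to the path used in
# the original report.
run_matlab_scripts() {
    local matlab script
    matlab=${MATLAB_CMD:-/opt/matlab/R2013a/bin/matlab}
    for script in "$@"; do
        # Each process reads its own script on stdin and logs next to it
        # (foo.m -> foo.log).
        "$matlab" -nodisplay -nosplash < "$script" > "${script%.m}.log" 2>&1 &
    done
    wait    # block until every background process has exited
}
```

Usage inside the interactive session would be something like `run_matlab_scripts run1.m run2.m run3.m`. All processes belong to the same user on the same node, so they share the one node-locked license.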

@kuod

kuod commented Nov 12, 2015

Don't forget to include -l nodes=1:ppn=24:matlab in your Torque submission. Otherwise, you may get a node that does not even have a MATLAB license.
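Put together, a submission could look roughly like the following. This is only a sketch: `make_matlab_job` is a made-up helper that prints a job script, the job name is a placeholder, and the MATLAB path is the one from the original report.

```shell
# Prints a Torque job script that runs one MATLAB script on a node carrying
# the "matlab" property. Usage: make_matlab_job SCRIPT.m [PPN]
# PPN defaults to 24 (a full node, as suggested above).
make_matlab_job() {
    local mfile ppn
    mfile=$1
    ppn=${2:-24}
    cat <<EOF
#!/bin/bash
#PBS -N matlab_job
#PBS -l nodes=1:ppn=${ppn}:matlab
#PBS -j oe
cd "\$PBS_O_WORKDIR"
/opt/matlab/R2013a/bin/matlab -nodisplay -nosplash < "$mfile"
EOF
}
```

The output can be piped straight to the scheduler, e.g. `make_matlab_job runGeneratePvaluesAnalysis4.m | qsub`. The `matlab` property only selects a node that has a license. It does not guarantee the license is free.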

@tatarsky
Contributor

Yeah, before we go down that road I'd like to see if I can try something. I might be able to set a "matlabfree" property based on @akahles' script, which is basically a pgrep MATLAB if I'm reading it right. It's not quite the same as the original goal of #193, which, if I recall, was to help a person who already had a Matlab license job (on a specific node) get other jobs to go there.

Can @kuod confirm gpu-1-4 is a matlab node lock machine? I believe so....

@julielyang
Author

Yes, okay.


@kuod

kuod commented Nov 12, 2015

@tatarsky confirm from my notes that gpu-1-4 is a node-locked license.

@akahles

akahles commented Nov 12, 2015

I just want to confirm that the original goal of #193 was as described. The purpose was to avoid scattering a single user's jobs across many nodes and thereby locking all of them for other users.

@tatarsky
Contributor

I may have to wait until such a machine can be offlined, as I'm not clear on my idea's impact on running jobs. This is where I wish I had a test environment. @julielyang, roughly how critical would you rank this, just so I can decide the best way forward? I think I can add a dynamic property that at least flags "matlabfree" for a node that also has property "matlab".

Yeah @akahles, it's that user part that was making #193 very difficult.

@tatarsky
Contributor

And the "freedom" of matlab would be based on the process table. I do not want to turn the property "matlab" into a consumable as that then defeats the desire I know people have to stack multiple matlab jobs on the same node as the same user.

@julielyang
Author

Sure. My scripts are running on license free nodes so I will not even be
using the inefficient suggested solution of obtaining a user-free node.

I look forward to your solution. This can be low priority on
your to-do list: my jobs are already running, so I don't have a
pressing need for a solution. Thanks so much for your help.

Julie Yang


@tatarsky
Contributor

Noted. While what I may do will be a bit of a hack it might speed that selection process.

@tatarsky
Contributor

I tried a few things over the weekend but it didn't work as I hoped. I've opened a ticket to see if Adaptive has any ideas.

@julielyang
Author

Okay, thank you.


@tatarsky
Contributor

Some experiments in this regard are being included in efforts to bring up a new head node. Status only. Nothing end user facing at this time.

@tatarsky
Contributor

tatarsky commented Dec 4, 2015

This issue will be further experimented with as part of #349

@tatarsky
Contributor

So while I can't fully explain all the paths I have walked down to get one simple resource added as a test example, I was able to finally figure out on the test Moab server how to add at least a script generated resource. The trick is now getting the proper data from a node as Moab executes these extension scripts on the scheduler system, not the node. (Yes, we can have it do stuff like ssh but I want it to be efficient).

So this is a placeholder to say that I've made some progress on this, but I am still working on it on the test Moab system.

@akahles

akahles commented Dec 10, 2015

Could we just use a file to log this information somewhere centrally? So it would be easy to look up. But maybe I am also missing the point here ...

@tatarsky
Contributor

Basically, if I understand what you folks want: for any system running node-locked MATLAB, we need a dynamic resource, which I'm calling "MATLABUSER", set to the username of the person already running Matlab on the node. That would allow you to submit additional jobs stating that as a requirement.

The information that goes into that resource on the node comes from the process table and will probably end up as some kind of file-based lookup. But you have to be very careful when extending a scheduler that is already doing lots of different things with some kind of lookup....

Per what I've found out so far.

@akahles

akahles commented Dec 10, 2015

Ok, I see. That was basically what I was suggesting. But I see the danger of adding to an already quite complex system ...

@tatarsky
Contributor

Does anyone have an example of requesting one of the shared matlab licenses from qsub? Or do I misunderstand that part. I don't mean the :matlab item which is a node property. I'm looking for an example of these resources I see globally defined:

  License Bioinformatics_Toolbox    2 of   2 available  (Idle: 100.00%  Active: 0.00%)
  License Compiler                  2 of   2 available  (Idle: 100.00%  Active: 0.00%)
  License Image_Toolbox             3 of   3 available  (Idle: 100.00%  Active: 0.00%)
  License Optimization_Toolbox      3 of   3 available  (Idle: 100.00%  Active: 0.00%)
  License Statistics_Toolbox        3 of   3 available  (Idle: 100.00%  Active: 0.00%)

I'd like to make sure I fully understand the syntax by which a license is requested from qsub or the submit file.

@tatarsky
Contributor

OK. So I think the direction I'm going here is using a Moab nodeset which I'm learning so I may not have all the parts right.

But it appears I can dynamically assign what are called variable attributes to nodes via Moab, based on a script's output. In this case that means some kind of polling for which users are running MATLAB on the nodes.

Then, what this looks like from the qsub point of view is Moab places the variable attribute like this along with some of the static ones:

checknode gpu-3-8

Attributes:         Memory=1024,Processors=1,batch,gtx680,matlabuser=paul,nv352

Selecting those nodes, however, requires a slightly different syntax. From my findings, I believe this is due to the use of a "varattr" rather than a feature, which can't have a "name=value" pair (I think).

This appears to be the incantation to say "run my job on systems with this variable attribute".

qsub -l nodeset=FIRSTOF:VARATTR:matlabuser=paul (some script)

FIRSTOF can also be other values.

http://docs.adaptivecomputing.com/mwm/Content/topics/optimization/nodesetoverview.html

Initial tests on a small scale using a fake "matlabuser" assignment appear to function. I'm trying to decide the rate of polling and actual mechanics of that and how to prevent hung or slow nodes from delaying things. I will probably separate the gathering of the matlabuser from Moab itself and the Moab part will just read a file.
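The gathering step, kept separate from Moab, could be sketched roughly as below. This is an illustration under assumptions, not the configuration actually used: the function names are made up, the `node matlabuser=<user>` line format is a placeholder rather than a documented Moab format, and the probe assumes MATLAB appears in the process table as `MATLAB`. `timeout` (coreutils) and the ssh options bound how long a hung or slow node can hold things up.

```shell
# Probe one node: print the first user found running MATLAB there, or nothing.
# timeout plus ConnectTimeout/BatchMode keep a hung or slow node from stalling.
probe_matlabuser() {
    timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" \
        "ps -e -o user= -o comm= | awk '\$2 == \"MATLAB\" { print \$1; exit }'"
}

# Usage: collect_matlabusers OUTFILE PROBE NODE...
# Writes one "node matlabuser=<user>" line per node where PROBE reports a user.
# The line format is a placeholder, not a documented Moab format.
collect_matlabusers() {
    local outfile probe tmp node user
    outfile=$1
    probe=$2
    shift 2
    tmp=$(mktemp "${outfile}.XXXXXX") || return 1
    for node in "$@"; do
        # Skip nodes whose probe fails (unreachable, timed out).
        user=$("$probe" "$node" 2>/dev/null) || continue
        if [ -n "$user" ]; then
            echo "$node matlabuser=$user" >> "$tmp"
        fi
    done
    # A rename within one directory is atomic, so a reader never sees a partial file.
    mv "$tmp" "$outfile"
}
```

Run from cron as, say, `collect_matlabusers /some/path/matlabusers probe_matlabuser gpu-1-4 ...` (the path is hypothetical), the Moab side then only has to read a small local file and never waits on a node.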

@tatarsky
Contributor

The only issue I see in this approach is that you may have to "seed" the process: first get the node-locked license with one qsub, then follow it up with a qsub requesting that your other jobs be placed on that node. I don't see a clear way around that at the moment, but I suspect you are already doing this.

I may attempt to move this code over to the main scheduler next week, as I don't have any Matlab nodes in the small test environment (gpu-3-8 and gpu-3-9).

@tatarsky
Contributor

Can somebody remind me where the actual Matlab node locked license lives? I see out there a blend between:

 /opt/matlab/R2013a/licenses/license.lic
 /opt/matlab/R2013a/licenses/network.lic

I am also preparing for an automatic /opt area sync, and I believe these licenses are unique to each node and located in those trees, and thus would require exceptions.

Working on a final config attempt for this process. Sorry for all the updates.

@kuod

kuod commented Dec 14, 2015

On the compute nodes there should exist a .dat file in this folder

/opt/matlab/R2013a/licenses/

IIRC, the suffixes may not all be .dat, but effectively they contain the information the licensing server needs to allow Matlab to start.

@tatarsky tatarsky self-assigned this Feb 19, 2016
@ChayaSt

ChayaSt commented Mar 31, 2016

I ran into the same problem today when I tried using Matlab on hal. Has this been sorted out?

@tatarsky
Contributor

I feel the best solution is floating licenses. While I have a partial implementation of what I described above, I wasn't overly impressed by its real usability. I can attempt some degree of it, but the floating-license path would be the more supportable one. So I'm going to ask @juanperin what he would like me to do, as in the end my support tasks will be migrated to others.

@jchodera
Member

I seem to recall that MSK now has a site license. A possible solution would be to use the MSK license server.

@edingtoj

If the Matlab license on hal has been maintained we could update. But in order to access the MSK site license hal needs to move inside the MSK network.


@jchodera
Member

Networking can open a hole in the firewall just for specific ports and the specific MSK license server. This would presumably be dealt with via a firewall port open request specifying specific IP source range (the nodes) and a specific IP destination (the license server) with a specific range of ports for the license server. This would be subject to review by InfoSec, but the risk should be minimal due to the restricted nature and that both systems are behind firewalls.

I don't know the current status of the software licenses on hal. This has been under the aegis of the HPC core for nearly a year now, right?

@juanperin

We asked about opening a port. It wasn't done because they wouldn't have been able to support too many license requests at once. The number of licenses available on the institutional license is limited. I believe it's for 40 users, so they were OK giving us access on saba, where only a few nodes and users would consume licenses at any one time, but any larger set of requests would exceed their capacity.

We'll find an appropriate option as soon as possible to enable preferably floating licenses.
