
Resources2 mongo #205

Open
wants to merge 2 commits into base: resources2

Conversation

holofermes

This PR adds a new repository type based on mongo. A new tool (rez-copy) was created to aid the logistics of moving packages between different repositories, and some features were added to a few existing tools (rez-build, rez-release, rez-search) to account for the new repository capabilities.

rez-build and rez-release use two new configuration options, local_packages_repository_path and release_packages_repository_path, to control where the package is installed. By default their values are interpreted as filesystem@local_packages_path and filesystem@release_packages_path, but they can be overridden directly from rez-build/rez-release through the --repo-prefix flag (ie: --repo-prefix mongo@/svr/local).
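To make the prefix syntax concrete, here is a minimal sketch in Python of how a repo_type@location string such as the --repo-prefix value could be split apart. The helper name is hypothetical and this is not code from the PR, just an illustration of the convention described above.

def split_repository_path(path, default_type="filesystem"):
    # Hypothetical helper, not part of this PR: split "mongo@/svr/local"
    # into ("mongo", "/svr/local"); a bare path is assumed to be a filesystem repo.
    if "@" in path:
        repo_type, _, location = path.partition("@")
        return (repo_type or default_type), location
    return default_type, path

print(split_repository_path("mongo@/svr/local"))   # ('mongo', '/svr/local')
print(split_repository_path("/svr/packages"))      # ('filesystem', '/svr/packages')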

EXAMPLES

rez-copy

copy all packages and variants from all filesystem repos to their mongo counterparts

$ rez-copy --all filesystem@ mongo@

copy all versions of a package (optionally restricted to a specific variant) as seen from the specified source filesystem path

$ rez-copy python filesystem@/home/fpiparo/packages mongo@/home/fpiparo/packages --variant 0
Searching [################################] 1/1
Found 4 variant/s in 2 package/s.
Do you want to continue? (y)es or (n)o or (d)etails: y
Copying [################################] 4/4
Finished processing 4 variant/s found in 2 package/s into 1 destination/s. 

copy a specific version and a specific variant

$ rez-copy python-2.7.3 filesystem@/home/fpiparo/packages mongo@/home/fpiparo/packages --variant 0
Searching [################################] 1/1
Found 1 variant/s in 1 package/s.
Do you want to continue? (y)es or (n)o or (d)etails: y
Copying [################################] 1/1
Finished processing 1 variant/s found in 1 package/s into 1 destination/s. 

rez-search

perform a search on multiple repository types

$ rez-search python --paths mongo@/home/fpiparo/packages:filesystem@/home/fpiparo/packages --format='{qualified_name} {repository_path} {repository_type}'
python-2.6 mongo@host=localhost,db=local,port=27017,namespace=/home/fpiparo/packages mongo
python-2.7.3 mongo@host=localhost,db=local,port=27017,namespace=/home/fpiparo/packages mongo
python-2.6 filesystem@/home/fpiparo/packages filesystem
python-2.7.3 filesystem@/home/fpiparo/packages filesystem

rez-build

install the foo package: its payload goes to local_packages_path, and its package definition goes to the mongo@/svr/packages repository.

$ rez-build -i --repo-prefix mongo@/svr/packages
...
...
...

and then search for it

$ rez-search \* --path mongo@/svr/packages -t package --format='{qualified_name} {repository_path}'
foo-1.0.1 mongo@host=localhost,db=local,port=27017,namespace=/svr/packages

OUTSTANDING

  • with rez-search, if the same family name happens to exist in two different repositories, we need to figure out how best to display that information, since formatting doesn't work when the type flag is set to 'family'.
  • pymongo needs to be installed for the mongo repository to work. I tried installing it directly into rez's virtualenv (after installing rez itself), ie:
$ source /path/to/rez/bin/activate
$ pip install -U pymongo.src.tgz

Maybe this could be taken care of during the installation of rez. One caveat on Linux is that pymongo supports optional C extensions, which require a few extra dependencies to be installed (http://api.mongodb.org/python/current/installation.html#dependencies-for-installing-c-extensions-on-unix); I'm also not sure whether it would work the same way on Windows/Mac. (A quick import check is sketched after this list.)

  • decide what is base and what is uri with respect to mongo. For now:
$ rez-search python --paths mongo@/home/fpiparo/packages:/home/fpiparo/packages --format='{uri}' -t package --nw
host=localhost,db=local,port=27017,namespace=/home/fpiparo/packages:python-2.6
/home/fpiparo/packages/python/2.6/package.yaml
$ rez-search python --paths mongo@/home/fpiparo/packages:/home/fpiparo/packages --format='{base}' -t package --nw
host=localhost,db=local,port=27017,namespace=/home/fpiparo/packages:python
/home/fpiparo/packages/python/2.6
  • add UT for the modified tools (rez-search, rez-copy, rez-build, rez-release)
  • make rez-copy work the other way around (from mongo to filesystem).
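As a quick sanity check for the pymongo bullet above, the following sketch assumes pymongo has already been pip-installed into rez's virtualenv, and simply verifies that it imports and reports whether the optional C extensions were built; pymongo.version and pymongo.has_c() are part of pymongo's public API.

import pymongo

# If has_c() returns False the install is pure python, i.e. the C extensions
# mentioned in the dependencies page linked above were skipped.
print(pymongo.version)
print(pymongo.has_c())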

Fabio Piparo added 2 commits April 22, 2015 14:56
…ges saved in a mongo database.

-added rez-copy. This tool lets you copy packages from and to arbitrary repositories, for example from filesystem to mongo.
-rez-build and rez-release use two new configuration options, local_packages_repository_path and release_packages_repository_path, to control where the package is installed. By default their values are interpreted as filesystem@local_packages_path and filesystem@release_packages_path. These options can be overridden directly from rez-build and rez-release through the --repo-prefix flag (ie: --repo-prefix mongo@/svr/local).
-changed source_code_schema so that SourceCode is serialized to a basestring class.
-rez-search supports lookups on different repositories.
@mstreatfield
Contributor

Thanks, @holofermes this looks very cool. I have a couple of questions:

In your rez-search example (rez-search --path mongo@/svr/packages ...), this means use the mongo repository (where the username etc is configured in rezconfig) for package metadata, and /svr/packages as the filesystem location for the build artifacts?

If a package is installed to the mongo repository, I assume the package.yaml doesn't get copied into the install path? Would this be possible, to aid transition? I imagine only some users using mongo initially while we test/evaluate, so I imagine they could become out of sync.

It seems desirable for local packages to use the filesystem repository and released packages to use mongo in a normal developer workflow, what do you think?

Can rez copy be used to copy packages between two repositories of the same type - e.g. from one filesystem repository to another? I assume it's only copying the metadata and not any build artifacts.

pymongo needs to be installed for the mongo repository to work.
I'm not sure if it would work on windows/mac the same way.

Would it be possible to install pymongo under the rez.vendor namespace without c-extensions? Reading the page you linked to suggests that it would then work out-of-the-box on every platform (as presumably at that stage it's pure python).

Rez could then internally manage the imports of pymongo such that if pymongo is found in the virtualenv it is used in preference to rez.vendor, but with the fallback if required. This way pymongo is essentially the same as other external dependencies in rez, while still allowing a studio to build the C extensions themselves based on their environment - an option could be added to the install.py script to enable this?
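A minimal sketch of the fallback-import pattern being suggested here; note that rez.vendor.pymongo does not exist in this PR, so the vendored path is purely an assumption for illustration.

# Hypothetical import shim -- the rez.vendor.pymongo path is assumed, not real.
try:
    import pymongo                     # prefer a pymongo installed into the virtualenv,
except ImportError:                    # possibly built with its C extensions
    from rez.vendor import pymongo     # fall back to a vendored, pure-python copy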

@nerdvegas
Contributor

I'll let holofermes cover the finer details, but see below..

In your rez-search example (rez-search --path mongo@/svr/packages ...),
this means use the mongo repository (where the username etc is configured
in rezconfig) for package metadata, and /svr/packages as the filesystem
location for the build artifacts?

If a package is installed to the mongo repository, I assume the
package.yaml doesn't get copied into the install path? Would this be
possible to aid transition - I imagine only some users using mongo
initially while we test/evaluate so imagine they could become out of sync.

This is a good point. I think it's probably always desirable to install the package.py anyway - it just makes things easy for debugging and managing packages generally. One problem though is keeping them in sync. Since a variant would be installed into mongo + filesystem separately, there is the chance that another process might write to the package.py in between these steps, resulting in differing variant indexes between mongo and filesystem. Or, one install might succeed and one fail, leading to the same kind of problem.

It seems desirable for local packages to use the filesystem repository and
released packages to use mongo in a normal developer workflow, what do you
think?

Agreed. Mixing together different repository types on REZ_PACKAGES_PATH has
always been the goal, mostly for this reason.

Can rez copy be used to copy packages between two repositories of the
same type - e.g. from one filesystem repository to another? I assume it's
only copying the metadata and not any build artifacts.

Yes and yes. It should not matter what type of repo the src/dest are, so
far as rez knows it is just using package repositories via the common
plugin API.

pymongo needs to be installed for the mongo repository to work.
I'm not sure if it would work on windows/mac the same way.

Would it be possible to install pymongo under the rez.vendor namespace
without c-extensions? Reading the page you linked to suggests that it would
then work out-of-the-box on every platform (as presumably at that stage
it's pure python).

Rez could then internally manage the imports of pymongo such that if
pymongo is found in the virtualenv it is used in preference of rez.vendor,
but with the fallback if required. This way pymongo is essentially the same
as other external dependencies in rez, but still allowing a studio to build
the c-extensions themselves based on their environment - an option could be
added to the install.py script to enable this?

That would be my preference.

Similarly, I saw this:
https://github.com/pbrady/fastcache

If it looks like it works, I was considering a similar approach - if a
studio wants a bit of a speed boost, they could install fastcache into the
rez virtualenv, and Rez will manage loading it in its imports.


Reply to this email directly or view it on GitHub
#205 (comment).

@holofermes
Author

In your rez-search example (rez-search --path mongo@/svr/packages ...), this means use the mongo repository (where the username etc is configured in rezconfig) for package metadata, and /svr/packages as the filesystem location for the build artifacts?

when using --path with rez-search you are telling rez where to search, so in the example rez will search a mongo repo, in the /svr/packages namespace.
First, a few things about the concept of "location" in repository types. Here is what we have for now:
filesystem@/svr/packages
memory@<dict at 285429200>
mongo@host=localhost,db=local,port=27017,namespace=/svr/packages

also note that the following are equivalent:
rez-search --path filesystem@/svr/packages
rez-search --path /svr/packages
..so when the repo_type@ is omitted, rez always assumes it is a filesystem repo.
In our mongo type, the namespace attribute defines the actual collection name in the mongodb database. The other attributes (host, db, port) are defined in the package_repository rezconfig settings and can be omitted when creating the location string.
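To make the location string concrete, here is a rough sketch (not code from this PR; the parsing helper and the defaults are assumptions) of how such a string could be turned into a pymongo collection handle, with omitted attributes filled from rezconfig-style defaults:

from pymongo import MongoClient

DEFAULTS = {"host": "localhost", "port": "27017", "db": "local"}  # assumed defaults

def parse_mongo_location(location):
    # Hypothetical helper: "host=...,db=...,port=...,namespace=..." -> dict
    settings = dict(DEFAULTS)
    for part in location.split(","):
        key, _, value = part.partition("=")
        if value:
            settings[key] = value
    return settings

loc = parse_mongo_location("host=localhost,db=local,port=27017,namespace=/svr/packages")
client = MongoClient(loc["host"], int(loc["port"]))
collection = client[loc["db"]][loc["namespace"]]  # namespace maps to the collection name, as described above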

If a package is installed to the mongo repository, I assume the package.yaml doesn't get copied into the install path? Would this be possible to aid transition - I imagine only some users using mongo initially while we test/evaluate so imagine they could become out of sync

For now you have to go from something on disk to mongo, and with either rez-build or rez-search the actual package you are dropping into mongo needs to be valid, so it needs a package.py.

It seems desirable for local packages to use the filesystem repository and released packages to use mongo in a normal developer workflow, what do you think?

That's totally doable. As of now, both local and released packages will go to the filesystem; however, there are two settings that can override where the package metadata gets saved, local_packages_repository_path and release_packages_repository_path, coupled with an extra flag in rez-build/rez-release for further overriding.
For example:

$ cd foo-source
$ rez-build -i --repo-prefix mongo@
$ tree /home/fpiparo/packages/foo
/home/fpiparo/packages/foo
`-- 1.0.1
    `-- platform-linux
        `-- arch-x86_64
            `-- os-Ubuntu-12.04
                `-- build.rxt
$ rez-search foo --path mongo@/home/fpiparo/packages -t variant
foo-1.0.1[0]
$ rez-env foo-1.0.1 --path mongo@/home/fpiparo/packages
> $exit
$ rez-env foo-1.0.1
rez: PackageDefinitionFileMissing: Missing package definition file: FileSystemPackageResource({'version': '1.0.1', 'repository_type': 'filesystem', 'location': '/home/fpiparo/packages', 'name': 'foo'})

so here the package.py isn't present in the filesystem, but exists in mongo. Instead of using the --repo-prefix flag, one could make this permanent by modifying:

$ rez config | grep repository_path
local_packages_repository_path: filesystem@
release_packages_repository_path: filesystem@

By convention, if a value is provided with nothing after the @, the corresponding default path is appended:
local_packages_path: /home/fpiparo/packages
release_packages_path: /home/fpiparo/.rez/packages/int
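As a sketch of that convention (hypothetical helper, defaults taken from the output above, not code from the PR):

def expand_repository_path(value, default_location):
    # Hypothetical helper: "filesystem@" with default "/home/fpiparo/packages"
    # becomes "filesystem@/home/fpiparo/packages".
    repo_type, _, location = value.partition("@")
    return "%s@%s" % (repo_type, location or default_location)

print(expand_repository_path("filesystem@", "/home/fpiparo/packages"))
# filesystem@/home/fpiparo/packages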

Would it be possible to install pymongo under the rez.vendor namespace without c-extensions? Reading the page you linked to suggests that it would then work out-of-the-box on every platform (as presumably at that stage it's pure python).

In theory it should install the module without building the C extensions, so yes it should work, but it needs testing.

@fnaum

fnaum commented Jul 2, 2015

Hi Fabio,

Thank you so much for this feature, we might end up using this in the short term.
I tried this branch (merged manually against our latest internal branch).

The rez copy, rez search, and installing the metadata in the mongo DB worked as advertised 👍

I ran into 2 issues.

  1. When running rez-env foo --path mongo@/path/to/packages it resolves fine, but then when it is trying to source the context in the shell I got the following error:

Error getting resource from pool: Unknown resource type 'mongo.variant'

Are you able to do a rez-env and get a shell?

  2. I built package foo and installed it in the mongo DB repo,

foo/1.1.1> rez-build -i --repo-prefix mongo@/path/to/packages

then I tried build package bar that has foo as a dependency, adding the mongo path to the REZ_PACKAGES_PATH

>setenv REZ_PACKAGES_PATH mongo@/path/to/packages:filesystem@/path/to/packages
bar/1.1.0> rez-build -i

It succeeded in getting a build context but failed to carry on with the build.

I debugged a little and found this after it resolved all the packages:

>>> for package in resolver.resolved_packages:
...    print package
...    print package.resource._repository.name()
... 
Variant(mongo.variant{'index': None, 'name': 'foo', 'ext': None, 'version': u'1.1.1', 'location': 'host=localhost,db=local,port=27017,namespace=/film/tools/packages', 'repository_type': 'mongo'})
filesystem
Variant(mongo.variant{'index': None, 'name': 'CentOS', 'ext': None, 'version': u'6.6', 'location': 'host=localhost,db=local,port=27017,namespace=/film/tools/packages', 'repository_type': 'mongo'})
filesystem
Variant(filesystem.variant{'index': None, 'version': '1.0.17', 'repository_type': 'filesystem', 'location': '/film/tools/packages', 'name': 'ALBlacklist'})
filesystem


>>> resolver.resolved_packages[0] 
Variant(MongoVariantResource({'index': None, 'name': 'foo', 'ext': None, 'version': u'1.1.1', 'location': 'host=localhost,db=local,port=27017,namespace=/film/tools/packages', 'repository_type': 'mongo'}))
>>> resolver.resolved_packages[0].repository_type
'filesystem'
>>> resolver.resolved_packages[0].parent
Package(MongoPackageResource({'version': u'1.1.1', 'repository_type': 'mongo', 'location': 'host=localhost,db=local,port=27017,namespace=/film/tools/packages', 'name': 'foo'}))
>>> resolver.resolved_packages[0].parent.repository_type
'mongo'

I guess at some point you need to translate the namespace path in the mongo repo location into an actual filesystem path.

Is it expected that the variant repository_type is filesystem while the package is mongo?

Am I doing something wrong?

For the first issue, there are probably some changes in the resources2 branch that we have not integrated yet.

Thanks,
Fede

@holofermes
Author

Hi Fede,

Error getting resource from pool: Unknown resource type 'mongo.variant'

Are you able to do a rez-env and get a shell?

Yes, I am able to rez-env and get a shell, and I also have a hunch that your branch might be missing some of the juice from resources2.

Is that expected that the variant repository_type is filesystem and the package is mongo ?

Not quite, and I wonder if this could be related to the resources2 branch changes. To make sure that's the case, maybe try to reproduce the issue using this branch directly.

Also, you can perform a rez-search and see how the package/variant data is modeled:

rez-search bar -f='{qualified_name} {repository_type} {repository_path}' -t variant
bar-0.0.1[0] mongo mongo@host=localhost,db=local,port=27017,namespace=/path/to/packages
bar-0.0.1[0] filesystem filesystem@/home/fpiparo/packages

lemme know!

@fnaum

fnaum commented Jul 3, 2015

I merged holofermes/resources2 into my branch, but I still have the same issue..
When I do the rez-search it does not find the metadata in the filesystem.

rez-build -i --repo-prefix mongo@/scratch/federicon/rez/packages

that builds fine, then

> rez-search bla  --path mongo@/scratch/federicon/rez/packages:filesystem@/scratch/federicon/rez/packages -f='{qualified_name} {repository_type} {repository_path}' -t variant
bla-1.1.1[] {repository_typeindex} mongo@host=localhost,db=local,port=27017,namespace=/scratch/federicon/rez/packages
bla-1.1.1
Missing package definition file: FileSystemPackageResource({'version': '1.1.1', 'repository_type': 'filesystem', 'location': '/scratch/federicon/rez/packages', 'name': 'bla'}

if I run the rez-build -i then it gets installed in the REZ_LOCAL_PACKAGES path and then

rez-search bla --path mongo@/scratch/federicon/rez/packages:filesystem@/scratch/federicon/rez/packages -f='{qualified_name} {repository_type} {repository_path}' -t variant
bla-1.1.1[] {repository_typeindex} mongo@host=localhost,db=local,port=27017,namespace=/scratch/federicon/rez/packages
bla-1.1.1[] {repository_typeindex} filesystem@/scratch/federicon/rez/packages

but in either case the rez-env fails.

>setenv REZ_PACKAGES_PATH mongo@/scratch/federicon/rez/packages:filesystem@/film/tools/packages
>rez-env bla
....
You are now in a rez-configured environment.
......
   raise ResolvedContextError("%s: %s: %s" % (msg, exc_name, str(e)))
rez.exceptions.ResolvedContextError: Failed to load context from /tmp/rez_context_c_dgye/context.rxt: ResourceError: Error getting resource from pool: Unknown resource type 'mongo.variant'

If you cannot spot anything wrong in what I am doing here, don't waste your time.. it could be something in the merge. We will be merging all dangling branches into our local resources2 branch next week and I'll give it another shot.

PS: just out of curiosity I ran rez-env in one of our biggest Maya environments (300+ packages) and compared the time it took to get the resources + resolve against the filesystem version (resolve cache deactivated, of course).
The filesystem version took on average 67 seconds and the one with mongo 28 seconds 😄.
So apart from the other benefits of this feature, it will be a good performance improvement for when the cache gets invalidated 👍

@holofermes
Author

I merged holofermes/resources2 into my branch, but I still have the same issue..
When I do the rez-search it does not find the metadada int the filesystem.

rez-build -i --repo-prefix mongo@/scratch/federicon/rez/packages
that build fine, then

rez-search bla --path mongo@/scratch/federicon/rez/packages:filesystem@/scratch/federicon/rez/packages -f='{qualified_name} {repository_type} {repository_path}' -t variant
bla-1.1.1[] {repository_typeindex} mongo@host=localhost,db=local,port=27017,namespace=/scratch/federicon/rez/packages
bla-1.1.1
Missing package definition file: FileSystemPackageResource({'version': '1.1.1', 'repository_type': 'filesystem', 'location': '/scratch/federicon/rez/packages', 'name': 'bla'}

When you do the first rez-build, your package's payload might be installed in filesystem@/scratch/federicon/rez/packages, but the package definition (aka package.py) is not in there; it lives in mongo@/scratch/federicon/rez/packages, and that's why rez is tripping over "Missing package definition". The second time, it actually creates the package.py in the filesystem repo.
I guess the package definition must also be installed in the filesystem.

but at any point the rez-env fails.

setenv REZ_PACKAGES_PATH mongo@/scratch/federicon/rez/packages:filesystem@/film/tools/packages
rez-env bla
....
You are now in a rez-configured environment.
......
raise ResolvedContextError("%s: %s: %s" % (msg, exc_name, str(e)))
rez.exceptions.ResolvedContextError: Failed to load context from /tmp/rez_context_c_dgye/context.rxt: ResourceError: Error getting resource from pool: Unknown resource type 'mongo.variant'

rez-env failing I'm not sure about..

PS: just for curiosity I run the rez-env in one our biggest Maya environments (300+ packages) and compared the time it took to get the resources+resolve against the filesystem version (resolved cache deactivated of course) .
The filesystem version took in average 67 seconds and the one with mongo 28 seconds .
So apart from the other benefits of this feature it will be a good performance improvement for when the cache gets invalidated

Are you not using the memcache feature in rez? You should get pretty good results and it's not as invasive as rolling out mongo.
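For reference, memcache support is switched on through rezconfig; a sketch of what that might look like (the setting name is recalled from memory, so double-check it against the rezconfig shipped with your rez version):

# In your rezconfig (Python syntax); verify the exact setting name for your rez version.
memcached_uri = ["memcache-host:11211"]   # point rez at your memcached server(s)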

@nerdvegas
Contributor

"""
The filesystem version took in average 67 seconds and the one with mongo 28
seconds .
"""

Just wanted to chime in on what Fabio said on this RE memcached. He is
right, you should get big performance improvements using memcached - even
if a solve becomes invalidated in the cache, the packages themselves are
not, and Rez uses memcached as a file cache as well as a solve cache. The
vast majority of your 67 seconds should be taken up loading packages, so
you should get most of this back, even on a fresh solve.

Hth
A


@fnaum

fnaum commented Jul 7, 2015

Hi Allan,

Good point, I disabled all memcache completely when I did the tests.

I just tried with memcache on, invalidating the resolved cache by adding a new version of a package and I got the resolve in ~28 seconds

Thanks for the clarification

Cheers
Fede

@nerdvegas
Contributor

Hey Fede, that's still a really big amount of time, have you profiled it to
see where it's going? Typically solve time here is very low, a few seconds
maybe.

Something I'm also considering is a service that continually monitors for
config changes and takes the resolve hit hopefully before an artist does.
This would also be helpful because failed resolves could be detected before
anyone is exposed to them.

A


@mstreatfield
Contributor

Hey @nerdvegas I spent a bit of time today profiling, but nothing conclusive. I've been focusing on running two ResolvedContexts sequentially, for the same request, so:

from rez.resolved_context import ResolvedContext

r = ['list', 'of', 'packages']          # same request both times
ResolvedContext(r, caching=False)       # first resolve: cold caches
ResolvedContext(r, caching=False)       # second resolve: warmed in-process caches

The first resolve runs at about 50 seconds, the second resolve at just under 10 seconds. The second resolve is hitting the lru cache exclusively I think (and not memcache) and has no (or very little) filesystem access.

In the first run a large chunk of time (although I can't quantify how much just yet) is spent parsing the yaml files and accessing the filesystem. Whether that is enough to account for the 40-second difference I am not so sure (from Fede's earlier testing, replacing this with mongo shows maybe a ~50% improvement).

On the second run, nearly 2 million calls to _SubToken.__eq__ are made, accounting for ~0.5 seconds. A similar number of calls and ~0.8 seconds is spent in AlphanumericVersionToken.less_than. So just the number of versions being considered (and objects being created and compared) is adding to the overall weight of the process.
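For anyone wanting to reproduce these numbers, a minimal profiling harness around the two sequential resolves might look like the following (a sketch; the request list is a placeholder and only standard-library profiling is used):

import cProfile
import pstats
from rez.resolved_context import ResolvedContext

r = ['list', 'of', 'packages']   # placeholder request

profiler = cProfile.Profile()
profiler.enable()
ResolvedContext(r, caching=False)   # cold resolve
ResolvedContext(r, caching=False)   # warm (in-process) resolve
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
# look for yaml parsing, filesystem access, and version-token comparisons near the top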
