Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Locks + cache for multiple revisions #8510

Closed
wants to merge 68 commits into from
Closed
Show file tree
Hide file tree
Changes from 36 commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
03a2833
classes for a Conan cache
jgsogo Feb 2, 2021
0b1d298
implement locks on top of a database
jgsogo Feb 3, 2021
78d9152
create a manager and some lockable resource
jgsogo Feb 3, 2021
15fc970
class per file
jgsogo Feb 3, 2021
602d45a
Add some testing for basic behavior
jgsogo Feb 3, 2021
c274ad4
Add some testing for basic behavior
jgsogo Feb 3, 2021
099776f
wip
jgsogo Feb 8, 2021
9e85c3d
work on folders
jgsogo Feb 9, 2021
893ee3f
moving folders when the rrev is known
jgsogo Feb 9, 2021
056b452
first running impl
jgsogo Feb 9, 2021
0dcbd71
add comment
jgsogo Feb 9, 2021
7cc54d1
add comment
jgsogo Feb 9, 2021
deb051b
small testing
jgsogo Feb 10, 2021
9df88a6
add more tests
jgsogo Feb 10, 2021
062d33f
add more tests
jgsogo Feb 10, 2021
4454c18
check folders are retrieved from database
jgsogo Feb 10, 2021
1850271
Merge branch 'feat/cache' of github.com:jgsogo/conan into feat/cache
jgsogo Feb 10, 2021
4b2224d
add default path for packages
jgsogo Feb 10, 2021
ee889b7
working with multiprocessing
jgsogo Feb 11, 2021
d068fcc
organize fixtures
jgsogo Feb 11, 2021
9117a64
use URI to work with memory sqlite3 databases
jgsogo Feb 11, 2021
a10a92f
use URI to work with memory sqlite3 databases
jgsogo Feb 11, 2021
25b15f5
rename functions
jgsogo Feb 11, 2021
d857f53
reuse sqlite3 initializations
jgsogo Feb 11, 2021
ff85569
replicate scenario to install package
jgsogo Feb 15, 2021
fd2cfc6
typo
jgsogo Feb 15, 2021
eecd0a1
testing using external tools
jgsogo Feb 15, 2021
907b3aa
remove files
jgsogo Feb 16, 2021
13484d7
fix import
jgsogo Feb 16, 2021
d8a7b04
Remove LockableResource (no use case so far)
jgsogo Feb 16, 2021
3d6c9dc
dump to output buffer
jgsogo Feb 16, 2021
43c0784
Add meaning exceptions for locks
jgsogo Feb 16, 2021
11cfbb3
exceptions for cache
jgsogo Feb 16, 2021
ce12419
update comments
jgsogo Feb 16, 2021
628a606
Prevent SQL injection cc/ @SSE4
jgsogo Feb 16, 2021
622c24b
simplify function interfaces
jgsogo Feb 16, 2021
2fefe1c
check reference (at least) is et
jgsogo Feb 16, 2021
63b4689
lock the database while we are operating on it
jgsogo Feb 23, 2021
b1d35f1
test using external script (expecting to reproduce two processes)
jgsogo Feb 23, 2021
9fcb580
no need to rollback if we are closing the database
jgsogo Feb 23, 2021
1dde7ba
remove useless lines
jgsogo Feb 23, 2021
dc94417
add backend for locks using fasteners library (failure in tests)
jgsogo Feb 24, 2021
d82ff38
testing working with fasterners
jgsogo Feb 24, 2021
cdf1587
kill a process, leftovers are not removed :(
jgsogo Feb 25, 2021
15e5fa7
reorganize cache_folders
jgsogo Feb 25, 2021
4f44e3d
rename classes (now names too long)
jgsogo Feb 25, 2021
aa40da5
move functions inside
jgsogo Feb 25, 2021
4e0c2e4
add comments
jgsogo Feb 25, 2021
b3598cf
package_layout is not a member of references
jgsogo Feb 25, 2021
6946db8
basic operations for table of prefs
jgsogo Mar 1, 2021
1b825dc
basic testing for packages
jgsogo Mar 1, 2021
c605699
tests for packages table
jgsogo Mar 1, 2021
2bbdc52
latest pref, tests
jgsogo Mar 1, 2021
63be035
table to store folders
jgsogo Mar 1, 2021
cf79d39
manage folders, update timestamp for LRU
jgsogo Mar 1, 2021
a38a6be
migrating test to new implementation
jgsogo Mar 1, 2021
d6fba26
add uniqueness checks
jgsogo Mar 1, 2021
69d067c
use new database with tables approach
jgsogo Mar 1, 2021
e641c85
remove old directories table/database
jgsogo Mar 2, 2021
81df9e4
there exists a two-level cache
jgsogo Mar 2, 2021
1666cfc
use interface to ensure we provide same methods for all cache impleme…
jgsogo Mar 2, 2021
522b839
use db with 'get' and 'get_or_create' API
jgsogo Mar 4, 2021
ea7059c
get_or_create need to work at database level (for atomicity)
jgsogo Mar 4, 2021
26fb724
provide safe[r] implementation for read-only cache
jgsogo Mar 4, 2021
3879d87
remove dead code
jgsogo Mar 4, 2021
bfc58aa
functions to list, search,... references from one cache
jgsogo Mar 4, 2021
e67b028
Add docu
jgsogo Mar 4, 2021
8079715
list packages given a reference (search, list, find package-id)
jgsogo Mar 4, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
103 changes: 103 additions & 0 deletions conan/cache/README.md
@@ -0,0 +1,103 @@
# Conan Cache

## Considerations
* In the codebase I want to use objects like `recipe_layout`
or `package_layout`, I don't want to carry always the `cache` object
together with the `ConanFileReference` or `PackageReference`.

+ **Consequence**: the lock is not adquired at the momento of getting
the `RecipeLayout` or `PackageLayout` but when it is going to be used.

## Alternatives

1. Before using anything from a layout, you need to indicate the ownership
you want

1. All operations run inside the layout (read files, write,...)

1. Return a lock-object together with the information and let the user decide
what to do with it.


## SQlite3

According to docs, it is safe to use SQlite3 from different processes in a
concurrent way. It manages several readers at the same time and only one
writter at the same time ([info](https://sqlite.org/faq.html#q5),
[more info](https://www.sqlite.org/lockingv3.html)).

According to [Python docs](https://docs.python.org/3/library/sqlite3.html),
it is also safe:

> When a database is accessed by multiple connections, and one of the
> processes modifies the database, the SQLite database is locked until
> that transaction is committed. The timeout parameter specifies how
> long the connection should wait for the lock to go away until raising
> an exception. The default for the timeout parameter is 5.0 (five seconds).

For the sake of the operations we will be running the time spent by the
read-write operations is not worth considered (TBD) taking into account other
Conan operations.


## Cache folders

For each reference we need the following folders. Some folders are needed
before we know the final destination, if we want a deterministic cache layout
we need to move them **after** being used (is this a time issue?).

Some folders can't be deterministic as they depend on things that aren't,
like folders that encode the `prev`. Only when using a lockfile we will
know the `prev` in advance (or downloading from remotes).

### [tmp]/export

It is needed in order to compute the _recipe revision_.

### [rrev]/export

The place where `conanfile.py` and other exported files are located.

### [rrev]/export_source

Source files exported after the recipe.

### [rrev]/source

Resulting files after running `source()` function. Conan v2 should forbid
usage of `settings` or `options`. This folder is shared for all the
packages.

### [tmp]/build

Needed to build the package. It should be associated to the generated
`prev` (non deterministic builds), but the `prev` is not known yet.

### [tmp]/package

A place to put the generated files in order to compute the _package revision_

### [prev]/build

Final place for the _build_ folder. BIG ISSUE: if we move the files here
after the build, maybe you are no longer capable of debugging packages
you've built locally!

### [prev]/package

Final place for the package.

### [rrev]/dl

Place for downloaded `conan_export.tgz` and `conan_sources.tgz` files

### [prev]/dl

Place for downloaded `conan_package.tgz` files.



* We need some temporal folder to compute the recipe-revision, then
we can move everything to the final destination (`export` folder).

*
Empty file added conan/cache/__init__.py
Empty file.
87 changes: 87 additions & 0 deletions conan/cache/cache.py
@@ -0,0 +1,87 @@
import os
import shutil
from io import StringIO
from typing import Optional, Union

from conan.cache.cache_database import CacheDatabase, CacheDatabaseSqlite3Filesystem, \
CacheDatabaseSqlite3Memory
from conan.cache.recipe_layout import RecipeLayout
from conan.locks.locks_manager import LocksManager
from conans.model.ref import ConanFileReference, PackageReference


# TODO: Random folders are no longer accessible, how to get rid of them asap?
# TODO: Add timestamp for LRU
# TODO: We need the workflow to remove existing references.


class Cache:
def __init__(self, base_folder: str, backend: CacheDatabase, locks_manager: LocksManager):
self._base_folder = base_folder
self._locks_manager = locks_manager
self._backend = backend

@staticmethod
def create(backend_id: str, base_folder: str, locks_manager: LocksManager, **backend_kwargs):
if backend_id == 'sqlite3':
backend = CacheDatabaseSqlite3Filesystem(**backend_kwargs)
backend.create_table(if_not_exists=True)
return Cache(base_folder, backend, locks_manager)
elif backend_id == 'memory':
backend = CacheDatabaseSqlite3Memory(**backend_kwargs)
backend.create_table(if_not_exists=True)
return Cache(base_folder, backend, locks_manager)
else:
raise NotImplementedError(f'Backend {backend_id} for cache is not implemented')

def dump(self, output: StringIO):
""" Maybe just for debugging purposes """
self._backend.dump(output)

@property
def base_folder(self) -> str:
return self._base_folder

def get_reference_layout(self, ref: ConanFileReference) -> RecipeLayout:
return RecipeLayout(ref, cache=self, manager=self._locks_manager)

@staticmethod
def get_default_path(item: Union[ConanFileReference, PackageReference]):
if item.revision:
return item.full_str().replace('@', '/').replace('#', '/').replace(':', '/') # TODO: TBD
else:
return None

def _move_rrev(self, old_ref: ConanFileReference, new_ref: ConanFileReference,
move_reference_contents: bool = False) -> Optional[str]:
# Once we know the revision for a given reference, we need to update information in the
# backend and we might want to move folders.
# TODO: Add a little bit of all-or-nothing aka rollback

self._backend.update_rrev(old_ref, new_ref)

if move_reference_contents:
old_path, created = self._backend.get_or_create_directory(new_ref)
assert not created, "We've just updated it two lines above!"
new_path = self.get_default_path(new_ref)
self._backend.update_path(new_ref, new_path)
if os.path.exists(old_path):
shutil.move(old_path, new_path)
return new_path
else:
return None

def _move_prev(self, old_pref: PackageReference, new_pref: PackageReference,
move_package_contents: bool = False) -> Optional[str]:
# TODO: Add a little bit of all-or-nothing aka rollback
self._backend.update_prev(old_pref, new_pref)
if move_package_contents:
old_path, created = self._backend.get_or_create_directory(new_pref.ref, new_pref)
assert not created, "We've just updated it two lines above!"
new_path = self.get_default_path(new_pref)
self._backend.update_path(new_pref, new_path)
if os.path.exists(old_path):
shutil.move(old_path, new_path)
return new_path
else:
return None
180 changes: 180 additions & 0 deletions conan/cache/cache_database.py
@@ -0,0 +1,180 @@
import time
import uuid
from enum import Enum, unique
from io import StringIO
from typing import Tuple, Union

from conan.cache.exceptions import DuplicateReferenceException, DuplicatePackageReferenceException
from conan.utils.sqlite3 import Sqlite3MemoryMixin, Sqlite3FilesystemMixin
from conans.model.ref import ConanFileReference, PackageReference


@unique
class ConanFolders(Enum):
REFERENCE = 0
PKG_BUILD = 1
PKG_PACKAGE = 2


class CacheDatabase:
_table_name = "conan_cache_directories"
_column_ref = 'reference'
_column_ref_name = 'reference_name'
_column_rrev = 'rrev'
_column_pkgid = 'pkgid'
_column_prev = 'prev'
_column_path = 'relpath'
_column_folder = 'folder'
_column_last_modified = 'last_modified'

def create_table(self, if_not_exists: bool = True):
guard = 'IF NOT EXISTS' if if_not_exists else ''
query = f"""
CREATE TABLE {guard} {self._table_name} (
{self._column_ref} text NOT NULL,
{self._column_ref_name} text NOT NULL,
{self._column_rrev} text,
{self._column_pkgid} text,
{self._column_prev} text,
{self._column_path} text NOT NULL,
{self._column_folder} integer NOT NULL CHECK ({self._column_folder} IN (0,1, 2)),
{self._column_last_modified} integer NOT NULL
);
"""
# TODO: Need to add some timestamp for LRU removal
with self.connect() as conn:
conn.execute(query)

def dump(self, output: StringIO):
with self.connect() as conn:
r = conn.execute(f'SELECT * FROM {self._table_name}')
for it in r.fetchall():
output.write(str(it) + '\n')

def _get_random_directory(self, item: Union[ConanFileReference, PackageReference]) -> str:
# TODO: We could implement deterministic output for some inputs, not now.
# TODO: If we are creating the 'path' here, we need the base_folder (and lock depending on implementation)
return str(uuid.uuid4())

def _where_reference_clause(self, ref: ConanFileReference, filter_packages: bool) -> dict:
where_clauses = {
self._column_ref: str(ref),
self._column_rrev: ref.revision if ref.revision else None,
}
if filter_packages:
where_clauses.update({
self._column_pkgid: None,
self._column_prev: None
})
return where_clauses

def _where_package_reference_clause(self, pref: PackageReference) -> dict:
where_clauses = self._where_reference_clause(pref.ref, False)
where_clauses.update({
self._column_pkgid: pref.id if pref else None,
self._column_prev: pref.revision if pref and pref.revision else None
})
return where_clauses

def _where_clause(self, item: Union[ConanFileReference, PackageReference],
filter_packages: bool) -> Tuple[str, Tuple]:
if isinstance(item, ConanFileReference):
where_clauses = self._where_reference_clause(item, filter_packages)
else:
assert filter_packages, 'If using PackageReference then it WILL filter by packages'
where_clauses = self._where_package_reference_clause(item)

def cmp_expr(k, v):
return f'{k} = ?' if v is not None else f'{k} IS ?'

where_expr = ' AND '.join([cmp_expr(k, v) for k, v in where_clauses.items()])
where_values = tuple(where_clauses.values())
return where_expr, where_values

def get_or_create_directory(self, item: Union[ConanFileReference, PackageReference],
default_path: str = None) -> Tuple[str, bool]:
# reference = str(ref)
# assert reference, "Empty reference cannot get into the cache"
# assert not pref or ref == pref.ref, "Both parameters should belong to the same reference"

# Search the database
where_clause, where_values = self._where_clause(item, filter_packages=True)
query = f'SELECT {self._column_path} ' \
f'FROM {self._table_name} ' \
f'WHERE {where_clause};'

with self.connect() as conn:
r = conn.execute(query, where_values)
rows = r.fetchall()
assert len(rows) <= 1, f"Unique entry expected... found {rows}," \
f" for where clause {where_clause}" # TODO: Ensure this uniqueness
if not rows:
path = default_path or self._get_random_directory(item)
ref = item if isinstance(item, ConanFileReference) else item.ref
pref = item if isinstance(item, PackageReference) else None
values = (str(ref),
ref.name,
ref.revision if ref.revision else None,
pref.id if pref else None,
pref.revision if pref and pref.revision else None,
path,
ConanFolders.REFERENCE.value,
int(time.time()))
conn.execute(f'INSERT INTO {self._table_name} '
f'VALUES (?, ?, ?, ?, ?, ?, ?, ?)', values)
return path, True
else:
return rows[0][0], False

def update_rrev(self, old_ref: ConanFileReference, new_ref: ConanFileReference):
with self.connect() as conn:
# Check if the new_ref already exists, if not, we can move the old_one
where_clause, where_values = self._where_clause(new_ref, filter_packages=False)
query_exists = f'SELECT EXISTS(SELECT 1 ' \
f'FROM {self._table_name} ' \
f'WHERE {where_clause})'
r = conn.execute(query_exists, where_values)
if r.fetchone()[0] == 1:
raise DuplicateReferenceException(new_ref)

where_clause, where_values = self._where_clause(old_ref, filter_packages=False)
query = f"UPDATE {self._table_name} " \
f"SET {self._column_rrev} = '{new_ref.revision}' " \
f"WHERE {where_clause}"
r = conn.execute(query, where_values)
assert r.rowcount > 0

def update_prev(self, old_pref: PackageReference, new_pref: PackageReference):
with self.connect() as conn:
# Check if the new_pref already exists, if not, we can move the old_one
where_clause, where_values = self._where_clause(new_pref, filter_packages=True)
query_exists = f'SELECT EXISTS(SELECT 1 ' \
f'FROM {self._table_name} ' \
f'WHERE {where_clause})'
r = conn.execute(query_exists, where_values)
if r.fetchone()[0] == 1:
raise DuplicatePackageReferenceException(new_pref)

where_clause, where_values = self._where_clause(old_pref, filter_packages=True)
query = f"UPDATE {self._table_name} " \
f"SET {self._column_prev} = '{new_pref.revision}' " \
f"WHERE {where_clause}"
r = conn.execute(query, where_values)
assert r.rowcount > 0

def update_path(self, item: Union[ConanFileReference, PackageReference], new_path: str):
where_clause, where_values = self._where_clause(item, filter_packages=True)
query = f"UPDATE {self._table_name} " \
f"SET {self._column_path} = '{new_path}' " \
f"WHERE {where_clause}"
with self.connect() as conn:
r = conn.execute(query, where_values)
assert r.rowcount > 0


class CacheDatabaseSqlite3Memory(CacheDatabase, Sqlite3MemoryMixin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer composition over inheritance. Try to avoid multiple inheritance too.
Specially, the "Memory" domain should exist in test scope only, not in production code.

Please, use composition for these, and have the Sqlite3Memory functionality defined in testing only, injected as a dependency, or patched.

pass


class CacheDatabaseSqlite3Filesystem(CacheDatabase, Sqlite3FilesystemMixin):
pass