Add GitLab storage #3461

Merged
merged 10 commits into from Oct 12, 2020
5 changes: 5 additions & 0 deletions changes/pr3461.yaml
@@ -0,0 +1,5 @@
enhancement:
- "Add GitLab storage - [#3461](https://github.com/PrefectHQ/prefect/pull/3461)"

contributor:
- "[Max Del Giudice](https://github.com/madelgi)"
10 changes: 10 additions & 0 deletions docs/core/idioms/file-based.md
@@ -22,6 +22,7 @@ pip install 'prefect[github]'
```
:::


In this example we will walk through a potential workflow you may use when registering flows with
[GitHub](/api/latest/environments/storage.html#github) storage. This example takes place in a GitHub
repository with the following structure:
@@ -94,6 +95,15 @@ If you change any of the structure of your flow such as task names, rearrange ta
will need to reregister that flow.
:::

::: tip GitLab users
This example applies to GitLab as well. To use GitLab storage, install the `gitlab` extra:

```bash
pip install 'prefect[gitlab]'
```

Replace each `GitHub` in the example above with `GitLab` and use the `"GITLAB_ACCESS_TOKEN"` secret instead of `"GITHUB_ACCESS_TOKEN"`; the example then runs as written.
:::
### File based Docker storage

```python
18 changes: 18 additions & 0 deletions docs/orchestration/execution/storage_options.md
@@ -119,6 +119,24 @@ Flows registered with this storage option will automatically be labeled with `"g
GitHub storage uses a [personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line) for authenticating with repositories.
:::

## GitLab

[GitLab Storage](/api/latest/environments/storage.html#gitlab) is a storage option that uploads flows to a GitLab repository as `.py` files.

Much of the GitHub example in the [file based storage](/core/idioms/file-based.html) documentation applies to GitLab as well.

::: tip Sensible Defaults
Flows registered with this storage option will automatically be labeled with `"gitlab-flow-storage"`; this helps prevent agents not explicitly authenticated with your GitLab repo from attempting to run this flow.
:::

:::tip GitLab Credentials
GitLab storage uses a [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) for authenticating with repositories.
:::

:::tip GitLab Server
Users of a self-hosted GitLab server can point the `host` argument at their own GitLab instance.
:::

## Docker

[Docker Storage](/api/latest/environments/storage.html#docker) is a storage option that puts flows inside of a Docker image and pushes them to a container registry. This method of Storage has deployment compatibility with the [Docker Agent](/orchestration/agents/docker.html), [Kubernetes Agent](/orchestration/agents/kubernetes.html), and [Fargate Agent](/orchestration/agents/fargate.html).
2 changes: 1 addition & 1 deletion docs/outline.toml
@@ -225,7 +225,7 @@ classes = ["CloudFlowRunner", "CloudTaskRunner"]
[pages.environments.storage]
title = "Storage"
module = "prefect.environments.storage"
-classes = ["Storage", "Docker", "Local", "S3", "GCS", "Azure", "GitHub", "Webhook"]
+classes = ["Storage", "Docker", "Local", "S3", "GCS", "Azure", "GitHub", "Webhook", "GitLab"]

[pages.environments.execution]
title = "Execution Environments"
1 change: 1 addition & 0 deletions setup.py
@@ -45,6 +45,7 @@ def run(self):
"google-cloud-storage >= 1.13, < 2.0",
],
"github": ["PyGithub >= 1.51, < 2.0"],
"gitlab": ["python-gitlab >= 2.5.0, < 3.0"],
"google": [
"google-cloud-bigquery >= 1.6.0, < 2.0",
"google-cloud-storage >= 1.13, < 2.0",
5 changes: 3 additions & 2 deletions src/prefect/agent/local/agent.py
@@ -6,7 +6,7 @@

from prefect import config
from prefect.agent import Agent
-from prefect.environments.storage import GCS, S3, Azure, Local, GitHub, Webhook
+from prefect.environments.storage import GCS, S3, Azure, Local, GitHub, GitLab, Webhook
from prefect.serialization.storage import StorageSchema
from prefect.utilities.agent import get_flow_run_command
from prefect.utilities.graphql import GraphQLResult
@@ -98,6 +98,7 @@ def __init__(
"s3-flow-storage",
"github-flow-storage",
"webhook-flow-storage",
"gitlab-flow-storage",
]
for label in all_storage_labels:
if label not in self.labels:
@@ -133,7 +134,7 @@ def deploy_flow(self, flow_run: GraphQLResult) -> str:

if not isinstance(
StorageSchema().load(flow_run.flow.storage),
-(Local, Azure, GCS, S3, GitHub, Webhook),
+(Local, Azure, GCS, S3, GitHub, GitLab, Webhook),
):
self.logger.error(
"Storage for flow run {} is not a supported type.".format(flow_run.id)
1 change: 1 addition & 0 deletions src/prefect/environments/storage/__init__.py
@@ -32,6 +32,7 @@
from prefect.environments.storage.gcs import GCS
from prefect.environments.storage.s3 import S3
from prefect.environments.storage.github import GitHub
from prefect.environments.storage.gitlab import GitLab
from prefect.environments.storage.webhook import Webhook


161 changes: 161 additions & 0 deletions src/prefect/environments/storage/gitlab.py
@@ -0,0 +1,161 @@
from typing import TYPE_CHECKING, Any, Dict, List
from urllib.parse import quote_plus

from prefect.environments.storage import Storage
from prefect.utilities.storage import extract_flow_from_file

if TYPE_CHECKING:
    from prefect.core.flow import Flow


class GitLab(Storage):
    """
    GitLab storage class. This class represents the Storage interface for Flows stored
    in `.py` files in a GitLab repository.

    This class represents a mapping of flow name to file paths contained in the git repo,
    meaning that all flow files should be pushed independently. A typical workflow using
    this storage type might look like the following:

    - Compose flow `.py` file where flow has GitLab storage:

    ```python
    flow = Flow("my-flow")
    # Can also use `repo="123456"`
    flow.storage = GitLab(repo="my/repo", path="/flows/flow.py", ref="my-branch")
    ```

    - Push this `flow.py` file to the `my/repo` repository under `/flows/flow.py`.

    - Call `prefect register -f flow.py` to register this flow with GitLab storage.

    Args:
        - repo (str): the project path (i.e., '<namespace>/<project>') or ID
        - host (str, optional): If using a GitLab server, the server host. If not specified,
            defaults to GitLab cloud (https://gitlab.com).
        - path (str, optional): a path pointing to a flow file in the repo
        - ref (str, optional): a commit SHA-1 value or branch name
        - **kwargs (Any, optional): any additional `Storage` initialization options
    """

    def __init__(
        self,
        repo: str,
        host: str = None,
        path: str = None,
        ref: str = None,
        **kwargs: Any,
    ) -> None:
        self.flows = dict()  # type: Dict[str, str]
        self._flows = dict()  # type: Dict[str, "Flow"]
        self.repo = repo
        self.host = host
        self.path = path
        self.ref = ref

        super().__init__(**kwargs)

    @property
    def default_labels(self) -> List[str]:
        return ["gitlab-flow-storage"]

    def get_flow(self, flow_location: str = None, ref: str = None) -> "Flow":
        """
        Given a flow_location within this Storage object, returns the underlying Flow (if possible).
        If the flow file cannot be retrieved, an error is logged and the exception is re-raised.

        Args:
            - flow_location (str): the location of a flow within this Storage; in this case,
                a file path on a repository where a Flow file has been committed. Will use `path` if not
                provided.
            - ref (str, optional): a commit SHA-1 value or branch name. Falls back to the storage's
                `ref` attribute, then to 'master', if not specified

        Returns:
            - Flow: the requested Flow

        Raises:
            - ValueError: if the flow is not contained in this storage
            - GitlabAuthenticationError: if authentication with GitLab fails
            - GitlabGetError: if the flow file is unable to be retrieved
        """
        if flow_location:
            if flow_location not in self.flows.values():
                raise ValueError("Flow is not contained in this Storage")
        elif self.path:
            flow_location = self.path
        else:
            raise ValueError("No flow location provided")

        # Use ref argument if exists, else use attribute, else default to 'master'
        ref = ref or self.ref or "master"

        from gitlab.exceptions import GitlabAuthenticationError, GitlabGetError

        try:
            project = self._gitlab_client.projects.get(quote_plus(self.repo))
            contents = project.files.get(file_path=flow_location, ref=ref)
        except GitlabAuthenticationError:
            self.logger.error(
                "Unable to authenticate GitLab account. Please check your credentials."
            )
            raise
        except GitlabGetError:
            self.logger.error(
                f"Error retrieving file contents at {flow_location} in {self.repo}@{ref}. "
                "Ensure the project and file exist."
            )
            raise

        return extract_flow_from_file(file_contents=contents.decode())

    def add_flow(self, flow: "Flow") -> str:
        """
        Method for adding a new flow to this Storage object. No file is written or uploaded;
        the flow name is mapped to this storage's `path`.

        Args:
            - flow (Flow): a Prefect Flow to add

        Returns:
            - str: the location of the added flow in the repo

        Raises:
            - ValueError: if a flow with the same name is already contained in this storage
        """
        if flow.name in self:
            raise ValueError(
                'Name conflict: Flow with the name "{}" is already present in this storage.'.format(
                    flow.name
                )
            )

        self.flows[flow.name] = self.path  # type: ignore
        self._flows[flow.name] = flow
        return self.path  # type: ignore

    def build(self) -> "Storage":
        """
        Build the GitLab storage object and run basic healthchecks. Because this object
        supports file-based storage, no files are committed to the repository during
        this step. Instead, all files should be committed independently.

        Returns:
            - Storage: a GitLab object that contains information about how and where
                each flow is stored
        """
        self.run_basic_healthchecks()

        return self

    def __contains__(self, obj: Any) -> bool:
        """
        Method for determining whether an object is contained within this storage.
        """
        if not isinstance(obj, str):
            return False
        return obj in self.flows

    @property
    def _gitlab_client(self):  # type: ignore
        from prefect.utilities.git import get_gitlab_client

        return get_gitlab_client(host=self.host)
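
Two small details of `get_flow` in the file above are easy to miss: the order in which the ref is chosen, and the URL-encoding applied to the project path before the lookup. The following is a minimal standalone sketch of just those two steps, using only the standard library. The helper names `resolve_ref` and `encode_project` are hypothetical and do not appear in the PR.

```python
from typing import Optional
from urllib.parse import quote_plus


def resolve_ref(call_ref: Optional[str], storage_ref: Optional[str]) -> str:
    """Pick the ref to read from: call argument, then stored attribute, then 'master'."""
    # Hypothetical helper mirroring the fallback chain in `get_flow`.
    return call_ref or storage_ref or "master"


def encode_project(repo: str) -> str:
    """URL-encode a project path so that '/' becomes '%2F'; numeric IDs are unchanged."""
    # Hypothetical helper; `get_flow` calls `quote_plus(self.repo)` inline.
    return quote_plus(repo)


if __name__ == "__main__":
    print(resolve_ref(None, "my-branch"))
    print(encode_project("my/repo"))
```

An empty string counts as "not provided" here, the same as `None`, because the fallback relies on truthiness.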
21 changes: 21 additions & 0 deletions src/prefect/serialization/storage.py
@@ -10,6 +10,7 @@
Local,
Storage,
GitHub,
GitLab,
Webhook,
)
from prefect.utilities.serialization import JSONCompatible, ObjectSchema, OneOfSchema
@@ -135,6 +136,25 @@ def create_object(self, data: dict, **kwargs: Any) -> GitHub:
return base_obj


class GitLabSchema(ObjectSchema):
    class Meta:
        object_class = GitLab

    repo = fields.String(allow_none=False)
    path = fields.String(allow_none=True)
    host = fields.String(allow_none=True)
    ref = fields.String(allow_none=True)
    flows = fields.Dict(key=fields.Str(), values=fields.Str())
    secrets = fields.List(fields.Str(), allow_none=True)

    @post_load
    def create_object(self, data: dict, **kwargs: Any) -> GitLab:
        # `flows` is not a constructor argument, so set it after the object is built
        flows = data.pop("flows", dict())
        base_obj = super().create_object(data)
        base_obj.flows = flows
        return base_obj


class WebhookSchema(ObjectSchema):
class Meta:
object_class = Webhook
@@ -169,5 +189,6 @@ class StorageSchema(OneOfSchema):
"Storage": BaseStorageSchema,
"S3": S3Schema,
"GitHub": GitHubSchema,
"GitLab": GitLabSchema,
"Webhook": WebhookSchema,
}
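
The `create_object` hook in `GitLabSchema` pops `flows` out of the deserialized data before building the object, because `flows` is serialized but is not accepted by the `GitLab` constructor. The sketch below illustrates that pattern without marshmallow or Prefect. `GitLabLike` is a hypothetical stand-in for the storage class, and this `create_object` is a plain function rather than the schema hook from the PR.

```python
from typing import Any, Dict, Optional


class GitLabLike:
    """Hypothetical stand-in mirroring the GitLab storage constructor arguments."""

    def __init__(
        self,
        repo: str,
        host: Optional[str] = None,
        path: Optional[str] = None,
        ref: Optional[str] = None,
    ) -> None:
        self.flows: Dict[str, str] = {}
        self.repo = repo
        self.host = host
        self.path = path
        self.ref = ref


def create_object(data: Dict[str, Any]) -> GitLabLike:
    """Rebuild a storage object from deserialized fields."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    flows = data.pop("flows", {})  # not a constructor argument
    obj = GitLabLike(**data)
    obj.flows = flows  # restore the flow-name -> path mapping afterwards
    return obj


if __name__ == "__main__":
    payload = {
        "repo": "my/repo",
        "path": "flows/flow.py",
        "flows": {"my-flow": "flows/flow.py"},
    }
    print(create_object(payload).flows)
```

Passing `flows` straight to the constructor would raise a `TypeError`, which is why the mapping is attached after construction.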
56 changes: 54 additions & 2 deletions src/prefect/utilities/git.py
@@ -1,12 +1,21 @@
"""
-Utility functions for interacting with git.
+Utility functions for interacting with git-based clients.
"""
import os
import prefect

-from github import Github
from typing import Any

try:
    from github import Github
except ImportError:
    Github = None

try:
    from gitlab import Gitlab
except ImportError:
    Gitlab = None


def get_github_client(credentials: dict = None, **kwargs: Any) -> "Github":
"""
@@ -21,6 +30,11 @@ def get_github_client(credentials: dict = None, **kwargs: Any) -> "Github":
    Returns:
        - Client: an initialized and authenticated github Client
    """
    if not Github:
        raise ImportError(
            "Unable to import Github, please ensure you have installed the github extra"
        )

access_token = None

if credentials:
@@ -35,3 +49,41 @@ def get_github_client(credentials: dict = None, **kwargs: Any) -> "Github":
access_token = os.getenv("GITHUB_ACCESS_TOKEN", None)

return Github(access_token, **kwargs)


def get_gitlab_client(
    credentials: dict = None, host: str = None, **kwargs: Any
) -> "Gitlab":
    """
    Utility function for loading gitlab client objects from a given set of credentials.

    Args:
        - credentials (dict, optional): a dictionary of GitLab credentials used to
            initialize the Client; if not provided, will attempt to load the
            access token from Prefect context secrets, then from the
            `GITLAB_ACCESS_TOKEN` environment variable
        - host (str, optional): the host string for gitlab server users. If not provided, defaults
            to https://gitlab.com
        - **kwargs (Any, optional): additional keyword arguments to pass to the gitlab Client

    Returns:
        - Client: an initialized and authenticated gitlab Client
    """
    if not Gitlab:
        raise ImportError(
            "Unable to import Gitlab, please ensure you have installed the gitlab extra"
        )

    if credentials:
        access_token = credentials.get("GITLAB_ACCESS_TOKEN")
    else:
        access_token = prefect.context.get("secrets", {}).get(
            "GITLAB_ACCESS_TOKEN", None
        )

    if not access_token:
        access_token = os.getenv("GITLAB_ACCESS_TOKEN", None)

    if not host:
        host = "https://gitlab.com"

    return Gitlab(host, private_token=access_token, **kwargs)
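
The lookup order in `get_gitlab_client` (explicit credentials, then Prefect context secrets, then the environment, with `https://gitlab.com` as the default host) can be exercised without installing `python-gitlab` or Prefect. The sketch below is an approximation under stated assumptions: `resolve_gitlab_settings` is a hypothetical helper, and the `context_secrets` and `environ` parameters are stand-ins for `prefect.context["secrets"]` and `os.environ` so the logic can be tested in isolation.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_HOST = "https://gitlab.com"
TOKEN_KEY = "GITLAB_ACCESS_TOKEN"


def resolve_gitlab_settings(
    credentials: Optional[Mapping[str, str]] = None,
    host: Optional[str] = None,
    context_secrets: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Return (host, access_token) using the same fallback order as the PR."""
    environ = os.environ if environ is None else environ

    if credentials:
        # An explicit credentials dict takes priority over context secrets.
        token = credentials.get(TOKEN_KEY)
    else:
        token = (context_secrets or {}).get(TOKEN_KEY)

    if not token:
        # Last resort: the process environment.
        token = environ.get(TOKEN_KEY)

    return (host or DEFAULT_HOST, token)


if __name__ == "__main__":
    print(resolve_gitlab_settings(host="https://gitlab.example.com", environ={}))
```

One consequence of this order: when a credentials dict is supplied but lacks the token key, context secrets are skipped and the lookup goes straight to the environment variable.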