Added a quickstart guide with local example #72

Merged 8 commits on Feb 28, 2021
2 changes: 0 additions & 2 deletions docs/source/development.md

This file was deleted.

12 changes: 12 additions & 0 deletions docs/source/guides.rst
@@ -0,0 +1,12 @@
How-to Guides
=============

This section includes several how-to guides designed to get you started with Giftless quickly.

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   installation
   quickstart
   using-gcs
3 changes: 1 addition & 2 deletions docs/source/index.rst
@@ -13,8 +13,7 @@ methods and authentication methods.
   :maxdepth: 2
   :caption: Contents:

   quickstart
   installation
   guides
   configuration
   components
   development
12 changes: 9 additions & 3 deletions docs/source/installation.md
@@ -1,5 +1,6 @@
Installing and Running
======================
Installation
============

You can install and run Giftless in different ways, depending on your needs:

## Running from a Docker image
@@ -30,10 +31,15 @@ It is recommended to install Giftless into a virtual environment:
    (venv) $ pip install uwsgi
    (venv) $ pip install giftless

**IMPORTANT**: as of the time of writing, a bug in one of Giftless' dependencies
requires that you explicitly install dependencies after installing using `pip`:

    (venv) $ pip install -Ur https://raw.githubusercontent.com/datopian/giftless/master/requirements.txt

Once installed, you can run Giftless locally with uWSGI:

    # Run uWSGI (see uWSGI's manual for help on all arguments)
    (.venv) $ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
    (venv) $ uwsgi -M -T --threads 2 -p 2 --manage-script-name \
        --module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080

This will listen on port `8080`.
205 changes: 160 additions & 45 deletions docs/source/quickstart.md
@@ -1,72 +1,187 @@
Getting Started
===============
Quick Start
===========

### Running a local example
This guide will introduce you to the basics of Giftless by getting it up and running locally, and seeing how it can
interact with a local git repository.

1. Create a new project on Github or any other platform.
Here, we create a project named `example-proj-datahub-io`.
## Installing and Running Locally

2. Add any data file to it.
The goal is to track this possible large file with
git-lfs and use Giftless as the local server. In our example,
we create a CSV named `research_data_factors.csv`.
Install Giftless to a local virtual environment. You will need Python 3.7 or newer:

```shell
# Create and activate a virtual environment
mkdir giftless && cd giftless
python -m venv .venv
source .venv/bin/activate

3. Create a file named `giftless.yaml` in your project root directory with the
following content in order to have a local server:
# Install Giftless
pip install giftless
# The following line is required due to a bug in one of our dependencies:
pip install -Ur https://raw.githubusercontent.com/datopian/giftless/master/requirements.txt
```

**NOTE**: This is a non-production installation of Giftless, using Flask's built-in development server.
Check out the [installation guide](installation.md) for other installation options.

Once done, verify that Giftless can run:
```shell
# Run Giftless using the built-in development server
export FLASK_APP=giftless.wsgi_entrypoint
flask run
```

You should see something like:

```shell
Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
```

This means Giftless is up and running on *localhost* port *5000*, with
the default configuration options.

Hit Ctrl+C to stop Giftless.

## Basic Configuration
To configure Giftless, create a file named `giftless.conf.yaml` in the current directory with the
following content:

```yaml
TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.local_storage:LocalStorage
# Giftless configuration
AUTH_PROVIDERS:
  - giftless.auth.allow_anon:read_write
```

4. Export it:
This will override the default read-only access mode, and allow open and full access to anyone, to any object stored
with Giftless. Clearly this is not useful in a production setting, but for a local test this will do fine.

```bash
$ export GIFTLESS_CONFIG_FILE=giftless.yaml
Run Giftless again, pointing to this new configuration file:
```shell
export GIFTLESS_CONFIG_FILE=giftless.conf.yaml
flask run
```

5. Start the Giftless server (by docker or Python).
## Interacting with git
We will now proceed to show how Giftless can interact with a local `git` repository, as a demonstration of how Git LFS
works.

6. Initialize your git repo and connect it with the
remote project:
Keep Giftless running and open a new terminal window or tab.

```bash
git init
git remote add origin YOUR_REMOTE_REPO
### Install the `lfs` Git extension
While having a local installation of `git-lfs` is not required to run Giftless, you will need
it to follow this guide.

Run:
```shell
git lfs version
```

If you see an error indicating that `'lfs' is not a git command`, follow the
[Git LFS installation instructions here](https://git-lfs.github.com/). On Linux, you may be able
to simply install the `git-lfs` package provided by your distro.

**IMPORTANT**: If you have `git-lfs` older than version 2.10, you will need to upgrade it to follow this tutorial,
otherwise you may encounter some unexpected errors. Follow the instructions linked above to upgrade to the latest
version.
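
The version requirement above can also be checked programmatically. The following is a minimal sketch, not part of Giftless: the helper names `parse_git_lfs_version` and `is_supported` are hypothetical, and the sample strings assume the usual `git-lfs/X.Y.Z (...)` format printed by `git lfs version`.

```python
import re

# Minimum git-lfs version named in this guide
MIN_VERSION = (2, 10)


def parse_git_lfs_version(output: str) -> tuple:
    """Extract (major, minor, patch) from the output of `git lfs version`."""
    match = re.search(r"git-lfs/(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(output: str) -> bool:
    """Return True if the reported major.minor version is at least MIN_VERSION."""
    return parse_git_lfs_version(output)[:2] >= MIN_VERSION


# Sample strings in the format printed by `git lfs version`
print(is_supported("git-lfs/2.13.2 (GitHub; linux amd64; go 1.15.4)"))
print(is_supported("git-lfs/2.9.2 (GitHub; linux amd64; go 1.13.5)"))
```

To check your own installation, you could pass in the output of `subprocess.run(["git", "lfs", "version"], capture_output=True, text=True).stdout`.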

### Create a local "remote" repository
For the purpose of this tutorial, we will create a fake "remote" git repository on your local disk. This is analogous
to a real-world remote repository such as GitHub or any other Git remote, but is simpler to set up.

```shell
mkdir fake-remote-repo && cd fake-remote-repo
git init --bare
cd ..
```

7. Track files with git-lfs:
Of course, you may choose to use any other remote repository instead - just remember to replace the repository URL
in the upcoming `git clone` command.

### Create a local repository and push some files
Clone the remote repository we have just created to a local repository:

```bash
git lfs track 'research_data_factors.csv'
git lfs track
git add .gitattributes #you should have a .gitattributes file at this point
git add "research_data_factors.csv"
git commit -m "Tracking data files"
```shell
git clone fake-remote-repo local-repo
cd local-repo
```
* You can see a list of tracked files with `git lfs ls-files`

8. Configure `lfs.url` to point to your local Giftless server instance:
Create some files and add them to git:
```shell
# This README file will be committed to Git as usual
echo "# This is a Giftless test" > README.md
# Let's also create a 1 MB binary file which we'll want to store in Git LFS
dd if=/dev/zero of=1mb-blob.bin bs=1024 count=1024
git add README.md 1mb-blob.bin
```

```bash
git config -f .lfsconfig lfs.url http://127.0.0.1:5000/<user_or_org>/<repo>/
# in our case, we used http://127.0.0.1:5000/datopian/example-proj-datahub-io/;
# make sure to end your lfs.url with /
Enable Git LFS and tell it to track `.bin` files:
```shell
git lfs install
git lfs track "*.bin"
```

9. The previous configuration will produce changes into `.lfsconfig` file.
Add it to git:
This will actually create a file named `.gitattributes` in the root of your
repository, with the following content:

```shell
*.bin filter=lfs diff=lfs merge=lfs -text
```
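
As background that this guide does not spell out: the `filter=lfs` attribute causes Git to commit a small text pointer in place of the file's content, while the content itself is sent to the LFS server. The sketch below builds such a pointer following the Git LFS pointer file format; the function name `lfs_pointer` is hypothetical and only serves as an illustration.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# 1,048,576 zero bytes: the same content that `dd` writes to 1mb-blob.bin
print(lfs_pointer(bytes(1024 * 1024)))
```

After committing, `git lfs ls-files` lists the files that are stored as pointers in this way.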

```bash
git add .lfsconfig
git commit -m "New git-lfs server endpoint"
# if you don't see any changes, run git rm --cached *.csv and then re-add your files, then commit it
git lfs push origin master
Tell Git LFS where to find the Giftless server. We will do that by using the `git config` command to write to the
`.lfsconfig` file:
```shell
git config -f .lfsconfig lfs.url http://127.0.0.1:5000/my-organization/test-repo
```

**NOTE**: `my-organization/test-repo` is an organization / repository prefix under which your files will be stored.
Giftless requires all files to be stored under such a prefix.

Tell git to track the configuration files we have just created. This will allow other users to have the same Git LFS
configuration as us when cloning the repository:
```shell
git add .gitattributes .lfsconfig
```

Commit all the files we have staged:
```shell
git commit -m "Adding some files to track"
```

Finally, let's push our commit to the remote; Git LFS will upload the tracked `.bin` file to Giftless as part of the push:
```shell
git push -u origin master
```

### See your objects stored by Giftless locally

Switch over to the shell in which Giftless is running, and you will see log messages indicating that a file has just
been pushed to storage and verified. This should be similar to:

```
INFO 127.0.0.1 - - "POST /my-organization/test-repo/objects/batch HTTP/1.1" 200 -
INFO 127.0.0.1 - - "PUT /my-organization/test-repo/objects/storage/30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58 HTTP/1.1" 200 -
INFO 127.0.0.1 - - "POST /my-organization/test-repo/objects/storage/verify HTTP/1.1" 200 -
```
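
The first log line corresponds to a Git LFS batch API request. As a rough sketch of what the client sends, the code below builds (but does not send) such a request using only the standard library. The function name `build_batch_request` is hypothetical; the payload fields and media type follow the Git LFS batch API, and the URL and object ID are the ones used in this guide.

```python
import json
import urllib.request

# Media type used by the Git LFS batch API
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(lfs_url: str, oid: str, size: int,
                        operation: str = "upload") -> urllib.request.Request:
    """Build a batch API request announcing one object to the LFS server."""
    payload = {
        "operation": operation,
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size}],
    }
    return urllib.request.Request(
        lfs_url.rstrip("/") + "/objects/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE},
        method="POST",
    )


request = build_batch_request(
    "http://127.0.0.1:5000/my-organization/test-repo",
    "30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58",
    1048576,
)
print(request.get_method(), request.full_url)
```

With Giftless running, passing `request` to `urllib.request.urlopen` should return a JSON document describing where the object can be uploaded and verified, which is what the `PUT` and `verify` log lines reflect.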

To further verify that the file has been stored by Giftless, we can list the files in our local Giftless storage
directory:

```shell
ls -lR ../lfs-storage/
```
You should see:
```shell
../lfs-storage/my-organization/test-repo:
total 1024
-rw-rw-r-- 1 shahar shahar 1048576 Feb 28 12:08 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
```

You will notice a 1 MB file stored in `../lfs-storage/my-organization/test-repo`. Its content is identical to our
`1mb-blob.bin` file, but it is named after its SHA256 digest.
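
Because stored objects are named after the SHA256 digest of their content, you can check that a stored file is intact by hashing it and comparing the result to its file name. The following is a minimal sketch of that idea; `is_valid_object` is a hypothetical helper, not part of Giftless, and the demonstration writes to a temporary directory rather than to the real storage directory.

```python
import hashlib
import tempfile
from pathlib import Path


def is_valid_object(path: Path) -> bool:
    """Return True if the file name equals the SHA256 hex digest of its content."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == path.name


# Self-contained demonstration: store 1 MiB of zeros under its digest in a temp dir
with tempfile.TemporaryDirectory() as storage_dir:
    content = bytes(1024 * 1024)
    stored = Path(storage_dir) / hashlib.sha256(content).hexdigest()
    stored.write_bytes(content)
    print(stored.name, is_valid_object(stored))
```

To check the object from this guide, you could call `is_valid_object` with the path of the file listed under `../lfs-storage/my-organization/test-repo`.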

## Summary

You have now seen Giftless used both as a Git LFS server and as a storage backend. This should give you a basic sense
of how to run Giftless, and how Git LFS servers interact with Git.

In a real-world scenario, you would typically have Giftless serve as the Git LFS server but not as a storage backend -
storage will be off-loaded to a Cloud Storage service which has been configured for this purpose.
5 changes: 5 additions & 0 deletions docs/source/storage-backends.md
@@ -113,6 +113,11 @@ Here is an example of how to run it:
--module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:8080
```

#### Notes

* If you plan to access objects directly from a browser (e.g. using a JavaScript based Git LFS client library),
your GCS bucket needs to be [CORS enabled](https://cloud.google.com/storage/docs/configuring-cors).

### Local Filesystem Storage

#### `giftless.storage.local:LocalStorage`
21 changes: 20 additions & 1 deletion docs/source/transfer-adapters.md
@@ -4,7 +4,26 @@ Git LFS servers and clients can implement and negotiate different
[transfer adapters](https://github.com/git-lfs/git-lfs/blob/master/docs/api/basic-transfers.md).
Typically, Git LFS will only define a `basic` transfer mode and support that. `basic` is simple
and efficient for direct-to-storage uploads for backends that support uploading using
a single `PUT` request.
a single `PUT` request.

### External Storage `basic` transfer adapter
The `basic_external` transfer adapter is designed to facilitate LFS `basic` mode transfers (the default transfer
mode of Git LFS) for setups in which the storage backend supports communicating directly with the Git LFS client. That
is, files will be uploaded or downloaded directly from a storage service that supports HTTP `PUT` / `GET` based access,
without passing through Giftless. With this adapter, Giftless will not handle any file transfers - it will only be
responsible for providing the client with access to storage.

This transfer adapter works with storage adapters implementing the `ExternalStorage` storage interface - typically these
are Cloud storage service based backends.

### Streaming `basic` transfer adapter
The `basic_streaming` transfer adapter facilitates LFS `basic` mode transfers in which Giftless also handles object
upload, download and verification requests directly. This is less scalable and typically less performant than
the `basic_external` adapter, as all data and potentially long-running HTTP requests must be passed through Giftless
and its Python runtime. However, in some situations this may be preferable to direct-to-storage HTTP requests.

`basic_streaming` supports local storage, and also streaming requests from some Cloud storage service backends such as
Azure and Google Cloud - although these tend to also support the `basic_external` transfer adapter.
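
From the client's point of view, the difference between the two adapters shows up in the `href` of the upload action returned by the batch API. The sketch below illustrates this under stated assumptions: the response fragments are simplified, the external storage URL is entirely hypothetical, and `upload_goes_through_giftless` is an illustrative helper rather than a Giftless function.

```python
from urllib.parse import urlsplit

GIFTLESS_URL = "http://127.0.0.1:5000"


def upload_goes_through_giftless(batch_object: dict, giftless_url: str) -> bool:
    """Return True if the upload action targets the Giftless server itself."""
    href = batch_object["actions"]["upload"]["href"]
    return urlsplit(href).netloc == urlsplit(giftless_url).netloc


# Simplified fragment as a streaming setup might return it: upload via Giftless
streaming_response = {
    "oid": "30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58",
    "size": 1048576,
    "actions": {"upload": {"href": GIFTLESS_URL + "/my-organization/test-repo/objects/storage/"
                                   "30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58"}},
}

# Simplified fragment for an external setup: hypothetical signed storage URL
external_response = {
    "oid": "30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58",
    "size": 1048576,
    "actions": {"upload": {"href": "https://storage.example.com/bucket/object?signature=hypothetical"}},
}

print(upload_goes_through_giftless(streaming_response, GIFTLESS_URL))
print(upload_goes_through_giftless(external_response, GIFTLESS_URL))
```

In the first case the object data passes through Giftless; in the second the client talks to the storage service directly and Giftless only hands out the link.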

## Multipart Transfer Mode
To support more complex, and especially multi-part uploads (uploads done using more