![dans](images/dans.png)
![tf](images/tf-small.png)

---
Start with [convert](convert.ipynb)

---

# Getting data from online repos

We show the various ways in which you can get data that lives on GitHub onto your computer.

The workhorse is the function `checkoutRepo()` in `tf.applib.repo`.

Text-Fabric uses this function for all operations where data flows from GitHub to your computer.

There are quite a few options; here we explain all values of the `checkout` option, i.e. the selection of
data from the history.

See also the [documentation](https://annotation.github.io/text-fabric/Api/Repo/).

In [1]:
%load_ext autoreload
%autoreload 2

# Leading example

We use markdown display from IPython purely for presentation. 
It is not needed to run `checkoutRepo()`.

In [2]:
from IPython.display import display, Markdown

In [3]:
from tf.applib.repo import checkoutRepo

We work with our tiny example TF app: `banks`.

In [4]:
ORG = 'annotation'
REPO = 'tutorials'
MAIN = 'text-fabric/examples/banks/tf'
MOD = 'text-fabric/examples/bankssim/tf'

`MAIN` points to the main data, `MOD` points to a module of data: the similarity feature.

# Presenting the results

The function `do()` just formats the results of a `checkoutRepo()` run.

The result of such a run, after the progress messages, is a tuple.
For the explanation of the tuple, read the [docs](https://annotation.github.io/text-fabric/Api/Repo/).

In [5]:
def do(task):
  # task is the tuple returned by checkoutRepo();
  # show its first five members as a one-row markdown table
  md = f'''
commit | release | local | base | subdir
--- | --- | --- | --- | ---
`{task[0]}` | `{task[1]}` | `{task[2]}` | `{task[3]}` | `{task[4]}`
'''
  display(Markdown(md))
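
As a small illustration, the five positions that `do()` formats can be given names with a `namedtuple`, so that later code reads `result.release` instead of `result[1]`. This is only a sketch: the name `CheckoutResult` is hypothetical, the field names are taken from the column headers used in this notebook, and the sample values are copied from one of the runs shown further down.

```python
from collections import namedtuple

# Hypothetical helper type; the field names mirror the table headers in do().
CheckoutResult = namedtuple("CheckoutResult", "commit release local base subdir")

# Sample values copied from a checkout='local' run in this notebook.
raw = (
    "9a66aa2351c07de9163f86294be3f47d792ffd24",
    "v1.0",
    "local",
    "/Users/dirk/text-fabric-data",
    "annotation/tutorials/text-fabric/examples/banks/tf",
)

result = CheckoutResult(*raw)
print(result.release, result.base)
```

A named tuple still behaves as a plain tuple, so it can be passed to `do()` unchanged.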

# All the checkout options

We discuss the meaning and effects of the values you can pass to the `checkout` option.

## `clone`

Check whether the appropriate folder exists under your `~/github` directory.

This is merely a check whether your data exists in the expected location.

No online checks take place.

No data is moved or copied.

**NB**: you cannot select releases and commits in your *local* GitHub clone.
The data will be used as it is found on your file system.
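
To make the idea of "the expected location" concrete, here is a minimal sketch of such an existence check. It assumes the layout *base*/*org*/*repo*/*folder*/*version*, which is what the progress messages in this notebook report; the helper names `expected_dir` and `is_available` are hypothetical and are not part of Text-Fabric.

```python
from pathlib import Path


def expected_dir(base, org, repo, folder, version):
    """Compose base/org/repo/folder/version, expanding a leading ~."""
    return Path(base).expanduser() / org / repo / folder / version


def is_available(base, org, repo, folder, version):
    """True if the expected directory exists; nothing is copied or downloaded."""
    return expected_dir(base, org, repo, folder, version).is_dir()


# The printed path depends on your home directory.
print(
    expected_dir(
        "~/github", "annotation", "tutorials", "text-fabric/examples/banks/tf", "0.1"
    )
)
```

The same check with `~/text-fabric-data` as base corresponds to the location used by downloaded data.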

**When to use**

If you are developing new feature data.

When you develop your data in a repository, your development is private as long as you
do not push to GitHub.

You can test your data, even without locally committing your data.

But once you are ready to share your data, everything is in place: you only
have to commit and push, and pass the location on GitHub to others, like

```
myorg/myrepo/subfolder
```

In [6]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='clone')
)

Using data in /Users/dirk/github/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	repo clone offline under ~/github (local github)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`None` | `None` | `clone` | `/Users/dirk/github` | `annotation/tutorials/text-fabric/examples/banks/tf`


Now we move our local data away.

In [7]:
%%sh

mv ~/github/annotation/tutorials/text-fabric/examples/banks/tf ~/github/annotation/tutorials/text-fabric/examples/banks/tfxxx

If you do not have a local GitHub clone in `~/github`, this is what you get:

In [8]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='clone')
)

The requested data is not available offline



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`None` | `None` | `False` | `False` | `None`


We move the data back into place.

In [9]:
%%sh

mv ~/github/annotation/tutorials/text-fabric/examples/banks/tfxxx ~/github/annotation/tutorials/text-fabric/examples/banks/tf

Note that no attempt is made to retrieve the data online.
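
In a script you may want to react to the "not available" case instead of reading the table. The sketch below assumes that the result is the five-member tuple displayed by `do()`, and that a failed run yields `False` for `base` and `None` for `subdir`, as in the output above. The helper `data_path` is hypothetical.

```python
def data_path(result, version):
    """Return the versioned data directory, or None if nothing was found."""
    commit, release, local, base, subdir = result
    if not base:
        # A failed run in this notebook shows False for base and None for subdir.
        return None
    return f"{base}/{subdir}/{version}"


# Tuples copied from the two 'clone' runs shown above.
missing = (None, None, False, False, None)
present = (
    None,
    None,
    "clone",
    "/Users/dirk/github",
    "annotation/tutorials/text-fabric/examples/banks/tf",
)

print(data_path(missing, "0.1"))
print(data_path(present, "0.1"))
```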

## `local`

Check whether the appropriate folder exists under your `~/text-fabric-data` directory.

This is merely a check whether your data exists in the expected location.

No online checks take place.

No data is moved or copied.

**When to use**

If you are using data created and shared by others, and if the data
is already on your system.

You can be sure that no updates are downloaded, and that everything works the same as the last time
you ran your program.

No online connection is ever made with this option.

If you do not already have the data, you have to pass `latest`, `hot` or `''`, which are discussed below.

In [10]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='local')
)

Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 offline under ~/text-fabric-data (local release)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `local` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


You see this data because I downloaded release `v1.0` earlier, which was committed with
hash `9a66aa2351c07de9163f86294be3f47d792ffd24`.

Now we move our local data away.

In [11]:
%%sh

mv ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tfxxx

If you do not have any corresponding data in your `~/text-fabric-data`, you get this:

In [12]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='local')
)

The requested data is not available offline



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`None` | `None` | `False` | `False` | `None`


We move the data back into place.

In [13]:
%%sh

mv ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tfxxx ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf

## `''` (default)

If you omit the `checkout` parameter, or pass `''` to it, the latest online copy will be identified, and the function
will make sure you get that copy.

So, if you do not have data yet, the latest online copy will be downloaded to your `~/text-fabric-data` folder.

If you do have data, but it is not up-to-date, the latest online copy will be downloaded to your `~/text-fabric-data`
folder, and it will replace the data you had.

After the download, you'll see a little file `__checkout__.txt`, which contains the release tag and/or commit hash.

But what is the latest online copy? In this case we mean:

* the latest *release*, and from that release an appropriate attached zip file
* but if there is no such zip file, we take the files from the corresponding commit
* but if there is no release at all, we take the files from the latest commit.
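
The three rules above can be written down as a small decision function. This is a sketch of the order of preference only, not the code Text-Fabric runs; the function name and the returned labels are made up for the illustration.

```python
def pick_source(has_release, has_zip):
    """Apply the order of preference for the default checkout ('')."""
    if not has_release:
        # No release at all: fall back to the latest commit.
        return "files of the latest commit"
    if has_zip:
        # Preferred case: a suitable zip file is attached to the latest release.
        return "zip attached to the latest release"
    # A release without a suitable zip: use the commit that the release points to.
    return "files of the commit of the latest release"


for case in [(True, True), (True, False), (False, False)]:
    print(case, "->", pick_source(*case))
```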

**When to use**

If you need data created/shared by other people and you want to be sure that you always have the
latest *stable* version of that data.

If the data provider makes releases after important modifications, you will get those.
If the data provider is experimenting after the latest release, and commits the experiments to GitHub,
you do not get those.

However, with `hot`, you *can* get the latest commit, to be discussed below.

In [14]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='')
)

	connecting to online GitHub repo annotation/tutorials ... connected
Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 (latest release)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `None` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


Note that no data has been downloaded, because it has been verified that the latest release is already on your computer.

Now we remove our local data.

In [15]:
%%sh

rm -rf ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf

If you do not have any checkout of this data on your computer, the data will be downloaded.

In [21]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='')
)

	connecting to online GitHub repo annotation/tutorials ... connected
	downloading https://github.com/annotation/tutorials/releases/download/v1.0/text-fabric-examples-banks-tf-0.1.zip ... 
	unzipping ... 
	saving data
Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 (latest release)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `None` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


For the verification, an online check is needed. The verification consists of checking the release tag and/or commit hash.
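
A rough way to picture this verification is a comparison of two (release, commit) pairs: what is recorded locally and what is found online. The sketch below is an assumption about the logic, written for illustration; `is_current` is a hypothetical name and the real check may differ in detail.

```python
def is_current(local, online):
    """Compare (release, commit) pairs; None means 'no release' or 'unknown'."""
    local_release, local_commit = local
    online_release, online_commit = online
    if online_release is not None:
        # There is an online release: both tag and commit must match.
        return local_release == online_release and local_commit == online_commit
    # No online release: only the commit hash can be compared.
    return local_commit == online_commit


# Hashes copied from the runs in this notebook.
released = ("v1.0", "9a66aa2351c07de9163f86294be3f47d792ffd24")
unreleased = (None, "58865f4e6abb31dec515a1bbc2fabe56420d08c5")

print(is_current(released, released))
print(is_current(unreleased, released))
```

When the pairs differ, a download is needed; when they agree, the local data can be used as is.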

If there is no online connection, you get this:

In [31]:
%%sh

networksetup -setairportpower en0 off

In [32]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='')
)

	connecting to online GitHub repo annotation/tutorials ... failed
The offline data may not be the latest
Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 (latest? release)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `None` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


or if you do not have local data:

In [33]:
%%sh

mv ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tfxxx

In [34]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='')
)

	connecting to online GitHub repo annotation/tutorials ... failed


The requested data is not available offline



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`None` | `None` | `False` | `False` | `None`


In [35]:
%%sh

mv ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tfxxx ~/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf

In [36]:
%%sh

networksetup -setairportpower en0 on

## `latest`

The latest online release will be identified, and the function
will make sure you get the data of that release in your `~/text-fabric-data` folder.

**When to use**

If you need data created/shared by other people and you want to be sure that you always have the
latest *stable* version of that data, but you only want to use it if it has been released.

The difference with `checkout=''` is that if there are no releases, you will not get data.

In [37]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='latest')
)

	connecting to online GitHub repo annotation/tutorials ... connected
Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 (latest release)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `None` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


## `hot`

The latest online commit will be identified, and the function
will make sure you get the data of that commit in your `~/text-fabric-data` folder.

**When to use**

If you need data created/shared by other people and you want to be sure that you always have the
latest version of that data, whether released or not.

The difference with `checkout=''` is that if there are releases, you will now get data that may be newer than the latest release.
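
The differences between `''`, `latest` and `hot` can be summarized in a few lines of code. The function below only restates the descriptions given in this notebook; it is a hypothetical helper, and it ignores the offline options `clone` and `local`.

```python
def select(checkout, latest_release, latest_commit):
    """Return ('release', tag), ('commit', hash) or None for an online checkout."""
    if checkout == "hot":
        # Always the latest commit, whether released or not.
        return ("commit", latest_commit) if latest_commit else None
    if checkout == "latest":
        # Releases only: without a release there is no data.
        return ("release", latest_release) if latest_release else None
    if checkout == "":
        # Prefer the latest release, fall back to the latest commit.
        if latest_release:
            return ("release", latest_release)
        return ("commit", latest_commit) if latest_commit else None
    raise ValueError(f"not an online checkout option: {checkout!r}")


for option in ("", "latest", "hot"):
    print(repr(option), select(option, None, "58865f4"))
```

With no release online, `''` and `hot` both end up at the latest commit, while `latest` yields nothing.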

In [38]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='hot')
)

	connecting to online GitHub repo annotation/tutorials ... connected
	text-fabric/examples/banks/tf/0.1/author.tf...downloaded
	text-fabric/examples/banks/tf/0.1/gap.tf...downloaded
	text-fabric/examples/banks/tf/0.1/letters.tf...downloaded
	text-fabric/examples/banks/tf/0.1/number.tf...downloaded
	text-fabric/examples/banks/tf/0.1/oslots.tf...downloaded
	text-fabric/examples/banks/tf/0.1/otext.tf...downloaded
	text-fabric/examples/banks/tf/0.1/otype.tf...downloaded
	text-fabric/examples/banks/tf/0.1/punc.tf...downloaded
	text-fabric/examples/banks/tf/0.1/terminator.tf...downloaded
	text-fabric/examples/banks/tf/0.1/title.tf...downloaded
	OK
Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	#58865f4e6abb31dec515a1bbc2fabe56420d08c5 (latest commit)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`58865f4e6abb31dec515a1bbc2fabe56420d08c5` | `None` | `None` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


Observe that data has been downloaded, and that we now have data corresponding to a different commit hash,
and not corresponding to a release.

If we now ask for the latest *stable* data, the data will be downloaded anew.

In [39]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='latest')
)

	connecting to online GitHub repo annotation/tutorials ... connected
	downloading https://github.com/annotation/tutorials/releases/download/v1.0/text-fabric-examples-banks-tf-0.1.zip ... 
	unzipping ... 
	saving data
Using data in /Users/dirk/text-fabric-data/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 (latest release)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `None` | `/Users/dirk/text-fabric-data` | `annotation/tutorials/text-fabric/examples/banks/tf`


## *dest* for another local destination

Check whether the appropriate folder exists under your *dest* directory, where you have passed `dest=`*dest*.

This is merely a check whether your data exists in the expected location.

No online checks take place.

No data is moved or copied.

**When to use**

As above, but you want to work outside the `~/text-fabric-data` directory.

Text-Fabric manages the `~/text-fabric-data` directory, and if you are experimenting you may not want
to interfere with that.

Another case is when you want to clone data into your `~/github` directory.
Then you need to pass `checkout='local'` and `dest='~/github'`.
Note that `checkout='clone'` will never download anything; it only looks at existing data.

Except for the different *dest* location, this works exactly the same as `local` without *dest*.

In [11]:
do(
  checkoutRepo(org=ORG, repo=REPO, folder=MAIN, version='0.1', checkout='local', dest='~/Downloads')
)

Using data in /Users/dirk/Downloads/annotation/tutorials/text-fabric/examples/banks/tf/0.1:
	rv1.0=#9a66aa2351c07de9163f86294be3f47d792ffd24 offline under ~/Downloads (local - without online check)



commit | release | local | base | subdir
--- | --- | --- | --- | ---
`9a66aa2351c07de9163f86294be3f47d792ffd24` | `v1.0` | `local` | `/Users/dirk/Downloads` | `annotation/tutorials/text-fabric/examples/banks/tf`


---
All chapters:

* [convert](convert.ipynb)
* [use](use.ipynb)
* [share](share.ipynb)
* [app](app.ipynb)
* *repo*

---