[SYNPY-1357] Allow multiple values in manifest TSV #1030

Merged
merged 13 commits into from
Jan 12, 2024
53 changes: 46 additions & 7 deletions docs/explanations/manifest_tsv.md
@@ -46,13 +46,44 @@ See:

Any column whose name is not one of the reserved names described above will be interpreted as an annotation of the file.

For example this is adding 2 annotations to each row:
Adding 5 annotations to each row:

| path | parent | annot1 | annot2 | annot3 | annot4 | annot5 |
| --- | --- | --- | --- | --- | --- | --- |
| /path/file1.txt | syn1243 | bar | 3.1415 | [aaaa, bbbb] | [14,27,30] | ["Annotation, with a comma", another annotation] |
| /path/file2.txt | syn12433 | baz | 2.71 | [value_1,value_2] | [1,2,3] | [test 123, test 456] |
| /path/file3.txt | syn12455 | zzz | 3.52 | [value_3,value_4] | [42, 56, 77] | [a single annotation] |

#### Multiple values of annotations per key
Use multiple values for a single annotation sparingly, as they make the data more
difficult to manage. However, they are supported.

**Annotations can be comma `,` separated lists surrounded by brackets `[]`.**

If a string contains a `,` as part of its own formatting and you want it to be
a single value within a multi-value annotation, wrap it in
double quotes.

This is an annotation with 2 values:

| path | parent | annot1 |
| --- | --- | --- |
| /path/file1.txt | syn1243 | [my first annotation, "my, second, annotation"] |


This is an annotation with 4 values:

| path | parent | annot1 |
| --- | --- | --- |
| /path/file1.txt | syn1243 | [my first annotation, my, second, annotation] |


This is an annotation with 1 value:

| path | parent | annot1 |
| --- | --- | --- |
| /path/file1.txt | syn1243 | my, sentence, with, commas |
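
The three cases above can be sketched in Python. This is only an illustration of the bracket and double-quote rules described here, not the parser used by `synapseutils`; `split_annotation_cell` is a hypothetical helper name, and relying on the standard `csv` module to honor double quotes is an assumption made for this sketch.

```python
import csv
from typing import List


def split_annotation_cell(cell: str) -> List[str]:
    """Split one manifest cell into annotation values (illustrative only)."""
    text = cell.strip()
    # Without surrounding brackets the whole cell is a single value,
    # even if it contains commas.
    if not (text.startswith("[") and text.endswith("]")):
        return [text]
    inner = text[1:-1]
    # csv.reader keeps a double-quoted field together, so quoted commas survive.
    rows = list(csv.reader([inner], skipinitialspace=True))
    values = rows[0] if rows else []
    return [value.strip() for value in values]


print(split_annotation_cell('[my first annotation, "my, second, annotation"]'))
print(split_annotation_cell("[my first annotation, my, second, annotation]"))
print(split_annotation_cell("my, sentence, with, commas"))
```

The first call yields 2 values, the second 4, and the third 1, matching the tables above.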

| path | parent | annot1 | annot2 |
| --- | --- | --- | --- |
| /path/file1.txt | syn1243 | "bar" | 3.1415 |
| /path/file2.txt | syn12433 | "baz" | 2.71 |
| /path/file3.txt | syn12455 | "zzz" | 3.52 |

See:

@@ -73,8 +104,16 @@ See:
| /path/file2.txt | syn12433 | "baz" | 2.71 | 2001-01-01 15:00:00+07:00 | "" | "https://github.org/foo/baz" |
| /path/file3.txt | syn12455 | "zzz" | 3.52 | 2023-12-04T07:00:00Z | "" | "https://github.org/foo/zzz" |

## See:
### Dates in the manifest file
Dates within the manifest file will always be written in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format in UTC without milliseconds. For example: `2023-12-20T16:55:08Z`.

Dates written in other formats specified in ISO 8601 will also be recognized; however, [synapseutils.syncFromSynapse][] will always write them in the UTC format specified above. For example, you may want to specify a datetime in a specific timezone, such as `2023-12-20 23:55:08-07:00`, and this will be recognized as a valid datetime.
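
As a rough illustration of how an offset datetime relates to the UTC form, the sketch below normalizes a string using only the Python standard library. It is not the code path used by `synapseutils`; `to_manifest_utc` is a hypothetical helper, and treating values without an offset as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone


def to_manifest_utc(value: str) -> str:
    """Convert an ISO 8601 datetime string to UTC, dropping sub-second precision."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch only: naive values are taken to be UTC already.
        parsed = parsed.replace(tzinfo=timezone.utc)
    as_utc = parsed.astimezone(timezone.utc).replace(microsecond=0)
    return as_utc.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_manifest_utc("2023-12-20 23:55:08-07:00"))  # 2023-12-21T06:55:08Z
```

Note that an offset of `-07:00` moves the UTC value into the following day in this example.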


## References:

- [synapseutils.syncFromSynapse][]
- [synapseutils.syncToSynapse][]
- [Managing custom metadata at scale](https://help.synapse.org/docs/Managing-Custom-Metadata-at-Scale.2004254976.html#ManagingCustomMetadataatScale-BatchUploadFileswithAnnotations)
- [synapseutils.sync.syncToSynapse][]
- [synapseutils.sync.syncFromSynapse][]
13 changes: 11 additions & 2 deletions synapseclient/annotations.py
@@ -21,8 +21,8 @@
entity.lat_long = [47.627477, -122.332154]
```

Record when we collected the data. This will use the current timezone of the machine
running the code.
Record when we collected the data. **This will use the current timezone of the machine
running the code.**

```python
from datetime import datetime as Datetime
@@ -36,6 +36,15 @@
entity.collection_date = Datetime.utcnow()
```

You may also use a timezone-aware datetime object, as in the following example. Using the
[pytz library](https://pypi.org/project/pytz/) is recommended for this purpose:

```python
from datetime import datetime as Datetime, timezone as Timezone, timedelta as Timedelta

date = Datetime(2023, 12, 20, 8, 10, 0, tzinfo=Timezone(Timedelta(hours=-5)))
```
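
To make "timezone-aware" concrete, the following sketch uses only the standard library to show that the datetime above carries its own offset and maps to a single instant in UTC. It does not show how the client stores the value; `as_utc` is a name introduced here for illustration.

```python
from datetime import datetime as Datetime, timezone as Timezone, timedelta as Timedelta

date = Datetime(2023, 12, 20, 8, 10, 0, tzinfo=Timezone(Timedelta(hours=-5)))

# Converting to UTC changes the wall-clock reading but not the instant in time.
as_utc = date.astimezone(Timezone.utc)

print(date.isoformat())    # 2023-12-20T08:10:00-05:00
print(as_utc.isoformat())  # 2023-12-20T13:10:00+00:00
print(date == as_utc)      # True
```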

See:

- [synapseclient.Synapse.get_annotations][]
58 changes: 51 additions & 7 deletions synapseclient/core/utils.py
@@ -348,6 +348,29 @@ def is_synapse_id_str(obj):
return None


def bool_or_none(input_value: str) -> typing.Union[bool, None]:
    """
    Attempts to convert a string to a bool. Returns None if it fails.

    Args:
        input_value: The string to convert to a bool

    Returns:
        The bool or None if the conversion fails
    """
    if input_value is None or input_value == "":
        return None

    return_value = None

    if input_value.lower() == "true":
        return_value = True
    elif input_value.lower() == "false":
        return_value = False

    return return_value
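
A short usage sketch of the conversion rules in `bool_or_none`. The function body is restated in condensed form so the snippet runs on its own; the input values are made up for illustration.

```python
from typing import Optional


def bool_or_none(input_value: Optional[str]) -> Optional[bool]:
    # Condensed restatement of the function from the diff, for a standalone example.
    if input_value is None or input_value == "":
        return None
    lowered = input_value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return None


# Matching is case-insensitive; anything other than "true"/"false" gives None.
for raw in ["True", "FALSE", "yes", "", None]:
    print(repr(raw), "->", bool_or_none(raw))
```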


def datetime_or_none(datetime_str: str) -> typing.Union[datetime.datetime, None]:
    """Attempts to convert a string to a datetime object. Returns None if it fails.

@@ -536,20 +559,41 @@ def from_unix_epoch_time(ms) -> datetime.datetime:
    return from_unix_epoch_time_secs(ms / 1000.0)


def datetime_to_iso(dt, sep="T"):
    # Round microseconds to milliseconds (as expected by older clients)
    # and add back the "Z" at the end.
    # see: http://stackoverflow.com/questions/30266188/how-to-convert-date-string-to-iso8601-standard
def datetime_to_iso(
    dt: datetime.datetime, sep: str = "T", include_milliseconds_if_zero: bool = True
) -> str:
    """
    Round microseconds to milliseconds (as expected by older clients) and add back
    the "Z" at the end.
    See: http://stackoverflow.com/questions/30266188/how-to-convert-date-string-to-iso8601-standard

    Args:
        dt: The datetime to convert
        sep: Separator character to use.
        include_milliseconds_if_zero: Whether or not to include milliseconds in this result
            if the number of milliseconds is 0.

    Returns:
        The formatted string.
    """
    fmt = (
        "{time.year:04}-{time.month:02}-{time.day:02}"
        "{sep}{time.hour:02}:{time.minute:02}:{time.second:02}.{millisecond:03}{tz}"
    )
    fmt_no_mills = (
        "{time.year:04}-{time.month:02}-{time.day:02}"
        "{sep}{time.hour:02}:{time.minute:02}:{time.second:02}{tz}"
    )
    if dt.microsecond >= 999500:
        dt -= datetime.timedelta(microseconds=dt.microsecond)
        dt += datetime.timedelta(seconds=1)
    return fmt.format(
        time=dt, millisecond=int(round(dt.microsecond / 1000.0)), tz="Z", sep=sep
    )
    rounded_microseconds = int(round(dt.microsecond / 1000.0))
    if include_milliseconds_if_zero or rounded_microseconds:
        return fmt.format(time=dt, millisecond=rounded_microseconds, tz="Z", sep=sep)
    else:
        return fmt_no_mills.format(
            time=dt, millisecond=rounded_microseconds, tz="Z", sep=sep
        )
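
The effect of the new `include_milliseconds_if_zero` flag can be seen in a small standalone sketch. The function below is reassembled from the added lines of this hunk (type hints and docstring omitted, and the local variable renamed to `rounded`), so treat it as an approximation of the merged code rather than the library source; the sample datetimes are made up.

```python
import datetime


def datetime_to_iso(dt, sep="T", include_milliseconds_if_zero=True):
    # Reassembled from the diff so this snippet runs on its own.
    fmt = (
        "{time.year:04}-{time.month:02}-{time.day:02}"
        "{sep}{time.hour:02}:{time.minute:02}:{time.second:02}.{millisecond:03}{tz}"
    )
    fmt_no_mills = (
        "{time.year:04}-{time.month:02}-{time.day:02}"
        "{sep}{time.hour:02}:{time.minute:02}:{time.second:02}{tz}"
    )
    # Values that would round up to 1000 ms roll over into the next second.
    if dt.microsecond >= 999500:
        dt -= datetime.timedelta(microseconds=dt.microsecond)
        dt += datetime.timedelta(seconds=1)
    rounded = int(round(dt.microsecond / 1000.0))
    if include_milliseconds_if_zero or rounded:
        return fmt.format(time=dt, millisecond=rounded, tz="Z", sep=sep)
    return fmt_no_mills.format(time=dt, tz="Z", sep=sep)


whole_second = datetime.datetime(2023, 12, 20, 16, 55, 8)
print(datetime_to_iso(whole_second))                                      # 2023-12-20T16:55:08.000Z
print(datetime_to_iso(whole_second, include_milliseconds_if_zero=False))  # 2023-12-20T16:55:08Z
```

With the flag set to `False`, milliseconds are omitted only when they round to zero; a non-zero millisecond component is still written.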


def iso_to_datetime(iso_time):