# Attachment in pySBOL2

The purpose of the `Attachment` class is to serve as a general container for data files, especially experimental data files. It provides a means for linking files and metadata to SBOL designs.

`Attachment` objects have the following properties:
- `source` : The source property is REQUIRED and MUST contain a URI reference to the source file.
- `format` : The format property is OPTIONAL and MAY contain a URI that specifies the format of the attached file. It is RECOMMENDED that this URI refer to a term from the EMBRACE Data and Methods (EDAM) ontology.
- `size` : The size property is OPTIONAL and MAY contain a long indicating the file size in bytes.
- `hash` : The hash property is OPTIONAL and MAY contain a SHA-1 hash of the file contents represented as a hexadecimal digest.

In this tutorial, we will fetch a dummy experimental data file and attach it to an SBOL document using pySBOL2. Finally, we will retrieve the data file from the `Attachment` object.

For more information on the `Attachment` class and its properties, see page 53 of the [SBOL 2.3.0 specification](https://sbolstandard.org/docs/SBOL2.3.0.pdf).

## Creating an Attachment object

Fetching Example data

In [1]:
url = (
    "https://raw.githubusercontent.com/SynBioDex/SBOL-Notebooks/"
    "1e4d133dfeb313695f2cee394a580d2569ce6892"
    "/examples/sbol2/CreatingSBOL2Objects/plate_reader_exp1.csv"
)

# First approach: the requests library (general use)
import requests

# Fetch the file
resp = requests.get(url)
resp.raise_for_status()  # will raise an exception if the fetch failed

# The .content is the raw bytes of the .csv file
file_bytes = resp.content

# 'size' property: just the length in bytes
file_size_bytes = len(file_bytes)


# # Second approach: the pandas library (CSV files only)
# import pandas as pd

# # Read the URL directly into a DataFrame
# df = pd.read_csv(url)

# # Convert the DataFrame back to CSV (string) and then to bytes.
# # Note: the re-serialized bytes may differ from the original file,
# # so the size and hash may not match the file at the source URL.
# csv_str = df.to_csv(index=False)
# file_bytes = csv_str.encode("utf-8")

# # DataFrame-derived size in bytes
# file_size_bytes = len(file_bytes)

Hashing the file for the `hash` property: SHA-1 of the raw bytes

In [2]:
import hashlib

hasher = hashlib.sha1()
hasher.update(file_bytes)
file_hash_sha1 = hasher.hexdigest()
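
The cell above hashes bytes that are already in memory. When the data file is local and large, the same `size` and `hash` values can be computed by reading it in chunks. The following is a minimal sketch using only the standard library; `file_size_and_sha1`, its `chunk_size` default, and the temporary demo file are hypothetical and not part of pySBOL2.

```python
import hashlib
import os
import tempfile


def file_size_and_sha1(path, chunk_size=65536):
    """Return (size_in_bytes, sha1_hex_digest) for a file, reading it in chunks."""
    hasher = hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size, hasher.hexdigest()


# Demonstrate on a small temporary file (a stand-in for a real data file).
demo_bytes = b"Time (h),OD600\n0,0.05\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(demo_bytes)
    tmp_path = tmp.name

size, digest = file_size_and_sha1(tmp_path)
print(size, digest)

os.remove(tmp_path)  # clean up the demo file
```

The two returned values correspond to what the tutorial stores in `file_size_bytes` and `file_hash_sha1`.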

Import the module

In [3]:
import sbol2

In [4]:
doc = sbol2.Document()

# Set a namespace for the document
sbol2.setHomespace('https://github.com/SynBioDex/SBOL-Notebooks')

Creating `Attachment` object

In [5]:
attachment = sbol2.Attachment("exp1_growth_data")

# Required source property.
# NOTE: source can also be a local file with an absolute path.
attachment.source = url

# Optional properties using the metadata
attachment.format = "http://edamontology.org/format_3752"  # EDAM term for CSV
attachment.size = file_size_bytes
attachment.hash = file_hash_sha1

# --- Add the Attachment to the Document ---
doc.addAttachment(attachment)

In [6]:
report = doc.validate()
if report == 'Valid.':
    doc.write('example_attachment.xml')
else:
    print(report)

## Retrieving the File from the Attachment Object

In [7]:
import sbol2
import requests
import io
import csv

In [8]:
# Read the SBOL file
doc2 = sbol2.Document()
sbol2.setHomespace('https://github.com/SynBioDex/SBOL-Notebooks')

In [9]:
doc2.read(filename='example_attachment.xml')

In [10]:
experiment_data = doc2.attachments.find('exp1_growth_data')

In [11]:
url = experiment_data.source

In [12]:
# Fetch the file
resp = requests.get(url)
resp.raise_for_status()  # will raise an exception if the fetch failed

# The .content is the raw bytes of the .csv file
file_bytes = resp.content
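
Because the attachment records the file's `size` and `hash`, the downloaded bytes can be checked against that metadata before they are used. The following is a minimal sketch of such a check; `matches_metadata` is a hypothetical helper, and the sample values stand in for `experiment_data.size` and `experiment_data.hash`. How pySBOL2 returns those properties (integer or string) is an assumption here, so the helper casts defensively.

```python
import hashlib


def matches_metadata(data, expected_size, expected_sha1):
    """Return True if `data` has the expected byte length and SHA-1 hex digest."""
    size_ok = len(data) == int(expected_size)
    hash_ok = hashlib.sha1(data).hexdigest() == str(expected_sha1).lower()
    return size_ok and hash_ok


# Sample values standing in for the downloaded bytes and the stored metadata.
sample_bytes = b"Time (h),OD600\n0,0.05\n"
stored_size = str(len(sample_bytes))                  # e.g. a size read back as text
stored_hash = hashlib.sha1(sample_bytes).hexdigest()  # e.g. the stored hash

print(matches_metadata(sample_bytes, stored_size, stored_hash))
print(matches_metadata(sample_bytes + b"extra", stored_size, stored_hash))
```

In the notebook, a call along the lines of `matches_metadata(file_bytes, experiment_data.size, experiment_data.hash)` would play the same role, assuming the file at the source URL has not changed since the attachment was created.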

There are two ways to use the downloaded file, depending on your needs:
1. Write the file to disk for later use.
2. Keep the bytes in memory and read them with the `io` module.

In [13]:
# 1. Write the file to disk
with open("plate_reader_exp2.csv", "wb") as f:
    f.write(file_bytes)

In [14]:
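# 2. Use the bytes data in memory
# Wrap the bytes and decode them to text with 'io'
# (newline="" is what the csv module documentation recommends for file objects)
bytes_buf = io.BytesIO(file_bytes)
text_buf = io.TextIOWrapper(bytes_buf, encoding="utf-8", newline="")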

reader = csv.reader(text_buf)
for row in reader:
    print(row)

['Time (h)', 'OD600', 'GFP_Fluorescence (AU)']
['0', '0.05', '100']
['1', '0.1', '150']
['2', '0.2', '500']
['3', '0.4', '2000']
['4', '0.8', '5000']
['5', '1.2', '8000']
['6', '1.5', '9500']
['7', '1.6', '10000']
['8', '1.65', '10100']
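
The rows printed above are lists of strings. For numerical work it is often more convenient to key each value by its column header and convert it to a number. The following is a minimal sketch using `csv.DictReader`; `parse_numeric_rows` is a hypothetical helper, the sample bytes copy the first rows of the output above, and the conversion assumes every field in the file is numeric.

```python
import csv
import io


def parse_numeric_rows(data):
    """Decode CSV bytes and return one dict per row, with every field as a float."""
    text = data.decode("utf-8")
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # Assumption: every column holds numeric values; float() raises ValueError otherwise.
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Sample bytes mirroring the first rows shown in the output above.
sample_bytes = (
    b"Time (h),OD600,GFP_Fluorescence (AU)\n"
    b"0,0.05,100\n"
    b"1,0.1,150\n"
    b"2,0.2,500\n"
)

rows = parse_numeric_rows(sample_bytes)
for row in rows:
    print(row["Time (h)"], row["OD600"], row["GFP_Fluorescence (AU)"])
```

In the notebook, `file_bytes` could be passed in place of `sample_bytes`.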
