# How to use Weave

Weave is a custom package used to facilitate the creation and maintenance of complex data warehouses.

Weave was created from a need to track the lineage of data products derived from multiple sources. Weave can upload arbitrary data products to a datastore, with options to store metadata and information about how the data were derived. Data uploaded with Weave can then be accessed through Pythonic API calls, which also give the user easy access to data provenance.

In this notebook we will demo some of Weave's functionality.

In [1]:
import os

import numpy as np
import pandas as pd

import weave
from weave.pantry import Pantry
from weave.index.index_pandas import IndexPandas
from weave.index import index_abc
from weave.index.index_sqlite import IndexSQLite
from weave import validate

from fsspec.implementations.local import LocalFileSystem

## Creating a Pantry/Index
A pantry is a storage location that holds baskets or collections of baskets. An Index is an object or file that tracks the baskets in a pantry.
Below we will demonstrate how to create a pantry and its Index, but first we need a location for the pantry. In this demo we will place all of our pantries in the local file system. We also write a small text file to serve as dummy data.

In [2]:
local_fs = LocalFileSystem()
# Write a small text file to use as dummy data; the context manager closes it for us.
with open("WeaveDemoText.txt", "w") as text_file:
    text_file.write("This is some text for the weave notebook demo.")

Below we create a directory for the pantry within our working directory, using the local file system.

In [3]:
pantry_name = "weave-demo-pantry"
local_fs.mkdir(pantry_name)

Next, let's create a pantry where we will store our baskets. For this pantry we will use pandas as the index backend. The pandas backend is convenient because it is a familiar library, but it can be slow for large amounts of data. Later in the demo we will use a SQLite backend, which is much faster for large amounts of data.

In [4]:
pantry1 = Pantry(IndexPandas, pantry_path=pantry_name, file_system=local_fs)
index_df = pantry1.index.to_pandas_df()
index_df

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type


## Creating and Uploading Baskets
A basket is a representation of an atomic data product within a pantry. Below we will demonstrate how to upload a basket containing our dummy data file, WeaveDemoText.txt, using a pantry object. To upload a basket, we pass a list of dictionaries, one per file, each specifying a path and a stub. The path is the location of the file on the local system. The stub is a boolean that indicates whether the basket includes a copy of the file or only a reference to it: True means only a reference is uploaded. The user can also specify the basket type as a string and attach metadata about the basket as a dictionary.

In [5]:
pantry1.upload_basket(upload_items=[{'path':'WeaveDemoText.txt', 'stub':False}], basket_type="test-1", metadata = {'Data Type':'text'})

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type
0,f868e7c0626b11f0845fd4d8534edb2f,2025-07-16 17:40:31.719633+00:00,[],test-1,,1.13.9,weave-demo-pantry\test-1\f868e7c0626b11f0845fd...,LocalFileSystem
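
When a basket holds several files, the list of path/stub dictionaries described above can be built programmatically. The sketch below is only an illustration: `build_upload_items` is a hypothetical helper, not part of Weave, and it simply checks that each file exists before producing the list that would be passed as `upload_items`.

```python
import os
import tempfile
from pathlib import Path


def build_upload_items(paths, stub=False):
    """Hypothetical helper: return one {'path', 'stub'} dict per existing file."""
    items = []
    for path in paths:
        path = Path(path)
        if not path.is_file():
            raise FileNotFoundError(f"Cannot upload missing file: {path}")
        items.append({"path": str(path), "stub": stub})
    return items


# Demonstrate with a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = os.path.join(tmp_dir, "WeaveDemoText.txt")
    with open(demo_file, "w") as handle:
        handle.write("This is some text for the weave notebook demo.")
    upload_items = build_upload_items([demo_file])
    expected_path = str(Path(demo_file))
```

The resulting list has the same shape as the one passed to `upload_basket` in the cell above.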


Exporting the index as a dataframe allows us to easily access information about our pantry and baskets.

In [6]:
pantry1_df = pantry1.index.to_pandas_df()
pantry1_df

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type
0,f868e7c0626b11f0845fd4d8534edb2f,2025-07-16 17:40:31.719633+00:00,[],test-1,,1.13.9,weave-demo-pantry\test-1\f868e7c0626b11f0845fd...,LocalFileSystem


Having our pantry index catalog in a dataframe allows us to use standard pandas syntax to access information in the dataframe, like the UUID below.

In [7]:
pantry1_df['uuid'][0]

'f868e7c0626b11f0845fd4d8534edb2f'
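
Because the index is an ordinary dataframe, boolean masks work as well as positional lookups. The sketch below uses a small stand-in frame with made-up UUIDs and only three of the index columns, so it runs without a pantry; the same expressions should apply to the frame returned by `to_pandas_df()`.

```python
import pandas as pd

# Stand-in for the index dataframe; the UUIDs here are invented for illustration.
index_df = pd.DataFrame(
    {
        "uuid": ["uuid-a", "uuid-b", "uuid-c"],
        "basket_type": ["test-1", "test-2", "test-1"],
        "parent_uuids": [[], ["uuid-a"], []],
    }
)

# All baskets of a given type.
test_1_uuids = index_df.loc[index_df["basket_type"] == "test-1", "uuid"].tolist()

# Baskets with no parents, i.e. data products that were not derived from others.
has_no_parents = index_df["parent_uuids"].apply(len) == 0
root_uuids = index_df.loc[has_no_parents, "uuid"].tolist()
```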

## Accessing Basket data

Weave handles much of its data provenance tracking through the creation of baskets. A basket is meant to represent an atomic data product. It can contain whatever a user wishes to put in it, but its intended purpose is to hold a single instance of one type of data, be it an image, video, text file, or curated training set. A basket in its entirety contains the actual data files specified by the user along with the supplemental files that Weave creates. These supplemental files contain data integrity information, arbitrary metadata specified by the user, and lineage artifacts. Baskets are created at their time of upload and uploaded in an organized state to the data store. Parent UUIDs are the UUIDs of the basket(s) from which the current basket was derived.

Next, we will demonstrate how to access specific basket data.

In [8]:
basket = pantry1.get_basket(pantry1_df['uuid'][0])
basket

<weave.basket.Basket at 0x275c1d05a90>

The manifest contains a concise description of the basket in dictionary form.

In [9]:
basket.get_manifest()

{'uuid': 'f868e7c0626b11f0845fd4d8534edb2f',
 'upload_time': '2025-07-16T17:40:31.719633+00:00',
 'parent_uuids': [],
 'basket_type': 'test-1',
 'label': '',
 'weave_version': '1.13.9'}

The supplement data gives extended details of basket contents, including integrity data.

In [10]:
basket.get_supplement()

{'integrity_data': [{'file_size': 46,
   'hash': 'd7c3ccaccf38fb503bff57510e470eae1501be661e4a97d3a2007686cf1f9d40',
   'access_date': '2025-07-16T17:40:31.719075+00:00',
   'source_path': 'WeaveDemoText.txt',
   'byte_count': 100000000,
   'stub': False,
   'upload_path': 'weave-demo-pantry\\test-1\\f868e7c0626b11f0845fd4d8534edb2f\\WeaveDemoText.txt'}],
 'upload_items': [{'path': 'WeaveDemoText.txt', 'stub': False}]}
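
The `file_size` and `hash` fields can be used to confirm that a file has not changed since upload. The sketch below is a minimal illustration under one loud assumption: the 64-character hex digest suggests SHA-256, but the source does not state which algorithm Weave uses. The helper names `sha256_of_file` and `matches_record` are hypothetical, and the record is computed locally rather than taken from a real basket.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, record):
    """Compare a file against a dict with 'file_size' and 'hash' keys."""
    return (
        os.path.getsize(path) == record["file_size"]
        and sha256_of_file(path) == record["hash"]
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "WeaveDemoText.txt")
    with open(demo_path, "wb") as handle:
        handle.write(b"This is some text for the weave notebook demo.")

    # Build a record shaped like an entry of 'integrity_data' (SHA-256 is assumed).
    record = {
        "file_size": os.path.getsize(demo_path),
        "hash": sha256_of_file(demo_path),
    }
    unchanged_ok = matches_record(demo_path, record)

    # Modify the file and check again; the comparison should now fail.
    with open(demo_path, "ab") as handle:
        handle.write(b" tampered")
    tampered_ok = matches_record(demo_path, record)
```

The demo text is 46 bytes long, which agrees with the `file_size` reported in the supplement above.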

Now let's access the metadata for our basket. Metadata is data the user may add when uploading a basket to a pantry.

In [11]:
basket.get_metadata()

{'Data Type': 'text'}

Much like the Linux ls command, a basket's ls method lists the files and directories within the basket.

In [12]:
basket_contents = basket.ls()
basket_contents

['c:/Users/carso/OneDrive/Documents/daily/weave/weave-demo-pantry/test-1/f868e7c0626b11f0845fd4d8534edb2f/WeaveDemoText.txt']

## Data Provenance

Next, we will demonstrate how data provenance works across different baskets.

In [13]:
pantry1.index.get_parents(pantry1_df['uuid'][0])

Our basket currently has no parents or children, just as we expected. Let's create a child basket and then check this functionality again. Notice that the new basket has a new type and that we supply a parent UUID to indicate it is derived from the previous basket.

*Note: The parent-to-child relationship can be many-to-many.*

In [14]:
pantry1.upload_basket(upload_items=[{'path':'WeaveDemoText.txt', 'stub':False}], basket_type="test-2", parent_ids=[pantry1_df['uuid'][0]])

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type
0,f879e47a626b11f0acc9d4d8534edb2f,2025-07-16 17:40:31.825584+00:00,[f868e7c0626b11f0845fd4d8534edb2f],test-2,,1.13.9,weave-demo-pantry\test-2\f879e47a626b11f0acc9d...,LocalFileSystem


Below we can see that we have a new basket whose parent is the first basket that we created. 

In [15]:
pantry1_df = pantry1.index.to_pandas_df()
pantry1_df

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type
0,f868e7c0626b11f0845fd4d8534edb2f,2025-07-16 17:40:31.719633+00:00,[],test-1,,1.13.9,weave-demo-pantry\test-1\f868e7c0626b11f0845fd...,LocalFileSystem
1,f879e47a626b11f0acc9d4d8534edb2f,2025-07-16 17:40:31.825584+00:00,[f868e7c0626b11f0845fd4d8534edb2f],test-2,,1.13.9,weave-demo-pantry\test-2\f879e47a626b11f0acc9d...,LocalFileSystem


We can quickly get the parents or children of a basket using get_parents() and get_children(), passing in the appropriate UUID.

In [16]:
pantry1.index.get_children(pantry1_df['uuid'][0])

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type,generation_level
0,f879e47a626b11f0acc9d4d8534edb2f,2025-07-16 17:40:31.825584+00:00,[f868e7c0626b11f0845fd4d8534edb2f],test-2,,1.13.9,weave-demo-pantry\test-2\f879e47a626b11f0acc9d...,LocalFileSystem,-1


In [17]:
pantry1.index.get_parents(pantry1_df['uuid'][1])

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type,generation_level
0,f868e7c0626b11f0845fd4d8534edb2f,2025-07-16 17:40:31.719633+00:00,[],test-1,,1.13.9,weave-demo-pantry\test-1\f868e7c0626b11f0845fd...,LocalFileSystem,1
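
To make the `generation_level` column more concrete, the sketch below walks `parent_uuids` links breadth-first and records how many generations separate each ancestor from the starting basket. It illustrates the idea only and is not Weave's implementation; `get_ancestor_levels` and the `grandchild-uuid` entry are hypothetical.

```python
from collections import deque


def get_ancestor_levels(parents_by_uuid, start_uuid):
    """Map each ancestor UUID to its distance (in generations) from start_uuid."""
    levels = {}
    queue = deque([(start_uuid, 0)])
    while queue:
        uuid, level = queue.popleft()
        for parent in parents_by_uuid.get(uuid, []):
            # Record only the first (shortest) path to each ancestor.
            if parent not in levels:
                levels[parent] = level + 1
                queue.append((parent, level + 1))
    return levels


# uuid -> parent_uuids, using the two baskets from this demo plus an invented one.
parents_by_uuid = {
    "f868e7c0626b11f0845fd4d8534edb2f": [],
    "f879e47a626b11f0acc9d4d8534edb2f": ["f868e7c0626b11f0845fd4d8534edb2f"],
    "grandchild-uuid": ["f879e47a626b11f0acc9d4d8534edb2f"],  # hypothetical basket
}

child_ancestors = get_ancestor_levels(parents_by_uuid, "f879e47a626b11f0acc9d4d8534edb2f")
grandchild_ancestors = get_ancestor_levels(parents_by_uuid, "grandchild-uuid")
```

For the second basket, the only ancestor is the first basket at level 1, matching the `generation_level` shown by `get_parents()` above. In the output of `get_children()`, descendants carry negative levels instead.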


## Generating an index using the SQLite backend
When creating a pantry using SQLite, the index is represented as a SQLite object. For this pantry we will use SQLite as the index backend. Weave supports a SQLite backend for larger pantries, and users can implement their own backend according to their needs.

In [18]:
pantry2 = Pantry(IndexSQLite, pantry_path=pantry_name, file_system=local_fs)
pantry2.index.generate_index()

Now that we have created the index let's take a look at a dataframe containing our baskets.  

In [19]:
pantry2_df = pantry2.index.to_pandas_df()
pantry2_df

Unnamed: 0,uuid,upload_time,parent_uuids,basket_type,label,weave_version,address,storage_type
0,f868e7c0626b11f0845fd4d8534edb2f,2025-07-16 17:40:31,[],test-1,,1.13.9,weave-demo-pantry\test-1\f868e7c0626b11f0845fd...,LocalFileSystem
1,f879e47a626b11f0acc9d4d8534edb2f,2025-07-16 17:40:31,[f868e7c0626b11f0845fd4d8534edb2f],test-2,,1.13.9,weave-demo-pantry\test-2\f879e47a626b11f0acc9d...,LocalFileSystem
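
For intuition about why a database-backed index suits larger pantries, the sketch below stores a few index fields in an in-memory SQLite table and looks up baskets by type through a database index. The table layout, the column subset, and the JSON encoding of `parent_uuids` are all assumptions made for illustration; they are not Weave's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: a subset of the index columns shown above.
conn.execute(
    "CREATE TABLE baskets (uuid TEXT PRIMARY KEY, basket_type TEXT, parent_uuids TEXT)"
)
# A database index lets lookups by type avoid scanning every row.
conn.execute("CREATE INDEX idx_basket_type ON baskets (basket_type)")

rows = [
    ("f868e7c0626b11f0845fd4d8534edb2f", "test-1", json.dumps([])),
    (
        "f879e47a626b11f0acc9d4d8534edb2f",
        "test-2",
        json.dumps(["f868e7c0626b11f0845fd4d8534edb2f"]),
    ),
]
conn.executemany("INSERT INTO baskets VALUES (?, ?, ?)", rows)

# Parameterized query for all baskets of one type.
found = conn.execute(
    "SELECT uuid, parent_uuids FROM baskets WHERE basket_type = ?", ("test-2",)
).fetchall()
found_uuid = found[0][0]
found_parents = json.loads(found[0][1])
conn.close()
```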


## Validating a Pantry

Weave can validate that an existing directory is a valid pantry following the Weave schema:

In [20]:
warnings = validate.validate_pantry(pantry1)
# Or validate using the pantry object.
pantry1.validate()

[]

Since all the basket data is present, validation returns an empty list. If the basket manifest is deleted, then the list will contain a warning. Let's see this in action.

In [21]:
local_fs.rm(os.path.join('weave-demo-pantry','test-1',str(pantry1_df['uuid'][0]),'basket_manifest.json'))

In [22]:
pantry1.validate()
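
One of the checks involved can be sketched with the standard library alone. The example below scans a directory laid out as `pantry/basket_type/uuid/` (the layout seen in the upload paths earlier) and reports basket directories that lack `basket_manifest.json`. It is a simplified illustration, not Weave's validator: `find_missing_manifests` is a hypothetical helper, and it assumes every second-level directory is a basket.

```python
import os
import tempfile


def find_missing_manifests(pantry_path):
    """Return basket directories that do not contain basket_manifest.json."""
    missing = []
    for basket_type in sorted(os.listdir(pantry_path)):
        type_dir = os.path.join(pantry_path, basket_type)
        if not os.path.isdir(type_dir):
            continue
        for uuid in sorted(os.listdir(type_dir)):
            basket_dir = os.path.join(type_dir, uuid)
            manifest = os.path.join(basket_dir, "basket_manifest.json")
            if os.path.isdir(basket_dir) and not os.path.isfile(manifest):
                missing.append(basket_dir)
    return missing


# Build a throwaway pantry-like tree: one intact basket, one without a manifest.
with tempfile.TemporaryDirectory() as pantry_root:
    intact = os.path.join(pantry_root, "test-1", "basket-with-manifest")
    broken = os.path.join(pantry_root, "test-2", "basket-without-manifest")
    os.makedirs(intact)
    os.makedirs(broken)
    with open(os.path.join(intact, "basket_manifest.json"), "w") as handle:
        handle.write("{}")

    missing = find_missing_manifests(pantry_root)
    missing_names = [os.path.basename(path) for path in missing]
```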



Finally, let's clean up and remove the pantries from our local file system.

In [23]:
local_fs.rm("weave-demo-pantry", recursive=True)
pantry2.index.drop_index()
os.remove("WeaveDemoText.txt")