kallewesterling/process-tags

TAGS

Package used for processing TAGS documents downloaded as tab-separated files.

Setting up a simple document

import TAGS

tags = TAGS.Document(path="./datasets/downloaded_tags_document.tsv")

Setting up a TAGS DocumentSet

If you need to ingest more than one file or perhaps one or more directories into one dataset, you can do so using the DocumentSet object.

If you would only like to include a list of documents, you can do so by using the paths parameter:

tags = TAGS.DocumentSet(paths=["./datasets/downloaded_tags_document.tsv", "./datasets/another_downloaded_tags_document.tsv"])

If you would rather include any number of directories, you can do so using the directories parameter:

tags = TAGS.DocumentSet(directories=["./datasets/", "./another_dataset_folder/"])

Note that if you include directories, make sure they contain no .tsv files other than TAGS documents. If they do, the script will likely crash.
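
One way to guard against this is to build the file list yourself and pass it through paths instead of directories. The following is a minimal sketch using only the standard library; the helper name collect_tsv_paths and its exclude parameter are hypothetical and not part of the package.

```python
from pathlib import Path


def collect_tsv_paths(directories, exclude=()):
    """Return .tsv paths found directly inside the given directories.

    `exclude` holds file names to skip, e.g. .tsv files that are not
    TAGS documents. (Hypothetical helper, not part of the TAGS package.)
    """
    excluded = set(exclude)
    paths = []
    for directory in directories:
        # Only look at the top level of each directory, in sorted order.
        for candidate in sorted(Path(directory).glob("*.tsv")):
            if candidate.is_file() and candidate.name not in excluded:
                paths.append(str(candidate))
    return paths
```

The resulting list could then be handed to the constructor, for example TAGS.DocumentSet(paths=collect_tsv_paths(["./datasets/"], exclude=["notes.tsv"])), where notes.tsv stands in for whatever stray file you want to skip.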

Note that you can also combine paths and directories to ingest any mix of files and folders into your dataset.

suppress_warnings

There is one more parameter that you can provide to the constructor for both TAGS.Document and TAGS.DocumentSet: suppress_warnings. It must be a boolean (True or False) and defaults to False, so warnings are generated as you ingest your dataset.

The following two examples turn the warnings off:

tags = TAGS.Document(path="./datasets/downloaded_tags_document.tsv", suppress_warnings=True)
multiple_tags = TAGS.DocumentSet(directories=["./datasets/folder_1/", "./datasets/folder_2/"], suppress_warnings=True)

Properties and methods

1. All IDs

Both the TAGS.Document and the TAGS.DocumentSet objects have a property that contains a list of all IDs in the file(s) in the object for easy processing:

tags.ids

2. Get data for a specific document

A TAGS.Document object can also retrieve data for a specific ID from the file using the get_data_for_id method, which returns None if there is no data for that ID:

test_id = 1156639282024464385

tags.get_data_for_id(test_id) # get all data for an ID
tags.get_data_for_id(test_id, 'text') # get specific data for an ID
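
Combining the ids property with get_data_for_id makes it possible to pull one field for every entry in a document. Below is a minimal sketch that assumes only the two members described above; collect_field is a hypothetical helper name, and the function works on any object exposing ids and get_data_for_id.

```python
def collect_field(document, field):
    """Map every ID in `document` to the value of `field` for that ID.

    Hypothetical helper: relies only on the `ids` property and the
    `get_data_for_id(id, field)` call shown above. IDs for which the
    call returns None are left out of the result.
    """
    values = {}
    for doc_id in document.ids:
        value = document.get_data_for_id(doc_id, field)
        if value is not None:
            values[doc_id] = value
    return values
```

With the tags object from earlier, collect_field(tags, 'text') would return a dictionary keyed by ID.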

Unfortunately, the TAGS.DocumentSet does not currently include such a method.
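
Until such a method exists, one possible workaround is to keep the individual TAGS.Document objects around and query them in turn. The sketch below assumes each object behaves as described above, with get_data_for_id returning None when the ID is absent; find_data_for_id is a hypothetical helper and not part of the package.

```python
def find_data_for_id(documents, doc_id, field=None):
    """Return data for `doc_id` from the first document that has it.

    `documents` is any iterable of objects with a `get_data_for_id`
    method. Returns None if no document holds data for the ID.
    (Hypothetical helper, not part of the TAGS package.)
    """
    for document in documents:
        if field is None:
            data = document.get_data_for_id(doc_id)
        else:
            data = document.get_data_for_id(doc_id, field)
        if data is not None:
            return data
    return None
```

For example, given a list of TAGS.Document objects named documents, find_data_for_id(documents, test_id, 'text') would return the text from whichever file contains that ID.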
