TAGS

Package used for processing TAGS documents downloaded as tab-separated files.

Setting up a simple document

tags = TAGS.Document(path="./datasets/downloaded_tags_document.tsv")

Setting up a TAGS DocumentSet

If you need to ingest more than one file, or one or more directories, into a single dataset, use the DocumentSet object.

To include a specific list of documents, use the paths parameter:

tags = TAGS.DocumentSet(paths=["./datasets/downloaded_tags_document.tsv", "./datasets/another_downloaded_tags_document.tsv"])

To include any number of directories instead, use the directories parameter:

tags = TAGS.DocumentSet(directories=["./datasets/", "./another_dataset_folder/"])

If you include directories, make sure they contain no .tsv files other than TAGS documents. Otherwise the script will likely crash.
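One way to guard against this is to list the .tsv files in each directory before handing it to DocumentSet. The sketch below uses only the standard library. Note that list_tsv_files is a hypothetical helper, not part of the TAGS package, and it assumes directories are scanned non-recursively, which this README does not confirm.

```python
from pathlib import Path


def list_tsv_files(directory):
    """Return the sorted .tsv files found directly inside `directory`.

    Hypothetical helper, not part of TAGS. Assumes a non-recursive scan.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".tsv"
    )


if __name__ == "__main__":
    # Print every .tsv file so stray ones can be spotted before ingesting.
    for folder in ["./datasets/", "./another_dataset_folder/"]:
        if not Path(folder).is_dir():
            print(f"{folder}: not a directory, skipping")
            continue
        for tsv in list_tsv_files(folder):
            print(tsv)
```

Reviewing the printed list by hand is deliberate: the helper cannot tell a TAGS export from any other tab-separated file, so it only shows what is there.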

You can also combine paths and directories to ingest both individual files and whole directories into one dataset.

`suppress_warnings`

Both TAGS.Document and TAGS.DocumentSet accept one more constructor parameter: suppress_warnings. It must be a boolean (True or False) and defaults to False, so warnings are generated as you ingest your dataset.

The following two examples suppress the warnings:

tags = TAGS.Document(path="./datasets/downloaded_tags_document.tsv", suppress_warnings=True)
multiple_tags = TAGS.DocumentSet(directories=["./datasets/folder_1/", "./datasets/folder_2/"], suppress_warnings=True)

Properties and methods

1. All IDs

Both TAGS.Document and TAGS.DocumentSet objects have an ids property that contains a list of all IDs in the object's file or files, for easy processing:

tags.ids

2. Get data for a specific document

A TAGS.Document object can also retrieve the data for a specific ID from the file using the get_data_for_id method, which returns None if there is no data:

test_id = 1156639282024464385

tags.get_data_for_id(test_id) # get all data for an ID
tags.get_data_for_id(test_id, 'text') # get specific data for an ID
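The ids property and get_data_for_id can be combined to pull one field for every ID in a document. The following is a minimal sketch: collect_field is a hypothetical helper, not part of TAGS, and it relies only on the two members described above. It assumes that a missing field also comes back as None, which is not confirmed here. FakeDocument is a stand-in so the example runs without a TAGS file.

```python
def collect_field(document, field):
    """Map each ID in `document` to its value for `field`.

    Hypothetical helper, not part of TAGS. IDs for which
    get_data_for_id returns None are skipped.
    """
    result = {}
    for doc_id in document.ids:
        value = document.get_data_for_id(doc_id, field)
        if value is not None:
            result[doc_id] = value
    return result


class FakeDocument:
    """Stand-in that mimics the two TAGS.Document members used above."""

    def __init__(self, rows):
        self._rows = rows
        self.ids = list(rows)

    def get_data_for_id(self, doc_id, field=None):
        row = self._rows.get(doc_id)
        if row is None:
            return None
        return row if field is None else row.get(field)


demo = FakeDocument({1156639282024464385: {"text": "hello"}})
print(collect_field(demo, "text"))
```

With the real package, the same call would presumably be collect_field(tags, 'text'), where tags is a TAGS.Document.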

Unfortunately, the TAGS.DocumentSet does not currently include such a method.
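Until such a method exists, one possible workaround is to keep a list of individual TAGS.Document objects and ask each in turn. The sketch below shows one way to do that. Here find_in_documents is a hypothetical helper, not part of TAGS, and it assumes only that each object offers get_data_for_id with the behaviour described above.

```python
def find_in_documents(documents, doc_id, field=None):
    """Return the first non-None result for `doc_id` across `documents`.

    Hypothetical helper, not part of TAGS. Returns None if no document
    has data for the ID.
    """
    for document in documents:
        if field is None:
            data = document.get_data_for_id(doc_id)
        else:
            data = document.get_data_for_id(doc_id, field)
        if data is not None:
            return data
    return None


# Usage with the real package (requires TAGS and the files on disk):
# paths = ["./datasets/downloaded_tags_document.tsv",
#          "./datasets/another_downloaded_tags_document.tsv"]
# documents = [TAGS.Document(path=p) for p in paths]
# find_in_documents(documents, 1156639282024464385, 'text')
```

The helper stops at the first match, so if the same ID appears in several files, only the data from the earliest file in the list is returned.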
