Sanalyz scripts

Tooling for the Sanalyz project

ETL

An ETL pipeline that extracts data from datasets, transforms it, and loads it into the Sanalyz API.

Prerequisites

Python 3.8 or higher
Required Python libraries:
- pandas
- numpy
- tqdm
- requests

Install the dependencies using:

pip install -r requirements.txt

Usage

First, download the supported datasets into a folder. The existing extractors (covid.py, mpox.py) cover COVID-19 and mpox datasets.

Ensure that the datasets are in CSV format and placed in a folder. The folder structure should look like this:

data/
├── monkeypox.csv
└── covid19.csv

The file names are not important, as long as the files are in CSV format.

If you want to add support for new datasets, see the Adding Support for New Datasets section.

When your dataset is ready, run the ETL pipeline with the following command:

python etl <path_to_datasets_folder> <api_base_url>

For example:

python etl data https://api.sanalyz.com
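
As a rough illustration of how the two positional arguments could be handled, the sketch below parses them with argparse and collects every CSV file in the folder regardless of its name. The function names parse_args and find_csv_files are hypothetical and are not taken from the project's code.

```python
import argparse
from pathlib import Path


def parse_args(argv):
    """Parse the two positional arguments shown in the usage line."""
    parser = argparse.ArgumentParser(prog="etl")
    parser.add_argument("datasets_folder", type=Path, help="folder containing CSV datasets")
    parser.add_argument("api_base_url", help="base URL of the Sanalyz API")
    return parser.parse_args(argv)


def find_csv_files(folder):
    """Return every CSV file in the folder, whatever its name (sorted for a stable order)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
```

Calling parse_args(["data", "https://api.sanalyz.com"]) yields the same two values as the example command above. Note that running `python etl ...` on a directory makes Python execute the `__main__.py` file inside it.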

This extracts data from the datasets, cleans it, transforms it into a form that is ready to load, and then loads it into the Sanalyz API.
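
The four stages can be pictured with the minimal sketch below. It is not the project's implementation: the column names (date, country, cases), the cleaning rule, and the payload shape are assumptions made for illustration. It uses only the standard library and an in-memory `send` callback in place of a real HTTP request so that it runs anywhere.

```python
import csv
import io


def extract(csv_text):
    """Extract: read raw rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean(rows):
    """Clean: strip whitespace and drop rows that have an empty value."""
    cleaned = []
    for row in rows:
        stripped = {k: (v or "").strip() for k, v in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def transform(rows):
    """Transform: build the (hypothetical) record shape handed to the loader."""
    return [
        {"date": r["date"], "country": r["country"], "cases": int(r["cases"])}
        for r in rows
    ]


def load(records, send):
    """Load: pass each record to `send`; a real run would POST it to the API."""
    for record in records:
        send(record)
    return len(records)


# Two data rows; the second has no case count and is dropped by clean().
raw = "date,country,cases\n2022-06-01, France ,12\n2022-06-02,Spain,\n"
sent = []
count = load(transform(clean(extract(raw))), sent.append)
print(count, sent)
```

In a real run, `send` would be replaced by a function that posts each record to the API base URL, for example with the requests library listed in the prerequisites.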

Adding Support for New Datasets

To add support for a new dataset:

- Create a new extractor script in the etl/extractors folder.
- Inherit from the Extractor base class and implement the try_extract method.
- Refer to existing extractors (e.g., covid.py, mpox.py) for examples.

Once the new extractor is added, the ETL pipeline will automatically detect and use it.
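
The sketch below shows what a new extractor might look like. Only the names Extractor and try_extract come from this README; the method signature, the return convention (None when the file is not recognised), the MeaslesExtractor class, its column names, and the subclass-based discovery are all assumptions. Check the existing extractors for the real interface.

```python
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Stand-in for the project's Extractor base class (the real one may differ)."""

    @abstractmethod
    def try_extract(self, header, rows):
        """Return extracted records, or None if this file is not recognised."""


class MeaslesExtractor(Extractor):
    """Hypothetical extractor for a measles dataset."""

    REQUIRED_COLUMNS = {"date", "country", "measles_cases"}

    def try_extract(self, header, rows):
        if not self.REQUIRED_COLUMNS.issubset(header):
            return None  # not our format; let another extractor try
        return [
            {"date": r["date"], "country": r["country"], "cases": int(r["measles_cases"])}
            for r in rows
        ]


def extract_with_any(header, rows):
    """Try each direct Extractor subclass until one accepts the file."""
    for extractor_cls in Extractor.__subclasses__():
        result = extractor_cls().try_extract(header, rows)
        if result is not None:
            return result
    raise ValueError("no extractor recognised this dataset")
```

For example, with rows = [{"date": "2024-01-01", "country": "Kenya", "measles_cases": "7"}], calling extract_with_any(list(rows[0]), rows) returns one record with cases set to 7. One way automatic detection can work is shown here: `__subclasses__()` lists every direct subclass whose module has been imported, so a pipeline built this way would need to import all modules in the extractors folder first. Whether Sanalyz does it this way is not stated in this README.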
