Data Acquisition Service - DAS

DAS initiates and manages the operations of downloading and parsing data sets. It is a Spring Boot application built with Maven.
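A standard Maven invocation (no project-specific flags assumed) compiles the service, runs its tests, and packages it:

# Build and package the Spring Boot application
mvn clean package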

Required libraries

The following libraries are necessary to successfully build DAS:

  • metadata-parser

Required services

DAS requires the following services to function properly:

  • Downloader
  • Metadata parser
  • User management

Running DAS locally - demo

To run and test DAS locally you need to install and run the User management, Downloader and Metadata parser services. Moreover, you need to publish Metadata parser in the version defined in DAS's pom.xml to your local Maven repository.
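To confirm which version DAS expects before publishing, you can inspect its pom.xml; a minimal sketch, assuming metadata-parser is declared there as an ordinary dependency:

# Locate the metadata-parser dependency and its version in DAS's pom.xml
grep -B 2 -A 2 metadata-parser pom.xml

# In the metadata-parser checkout, switch to the matching version
# (the tag name below is a hypothetical placeholder) and publish it locally
git checkout v1.2.3
mvn install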

User Management

  • pull user-management from the git repository
  • run the service from the command line with the following parameters:

mvn spring-boot:run -Dspring.cloud.propertiesFile=spring-cloud.properties

Downloader

  • pull downloader from the git repository
  • run the service from the command line with the following parameters:

DOWNLOADS_DIR=/tmp SERVER_PORT=8090 mvn clean spring-boot:run

Metadata parser

  • pull metadata-parser from the git repository
  • publish the artifact to your local Maven repository

mvn install

  • run the service from the command line with the following parameters:

DOWNLOADS_DIR=/tmp mvn spring-boot:run

DAS

  • pull data-acquisition from the git repository
  • run the service from the command line with the following parameters:

DOWNLOADER_URL="http://localhost:8090" DOWNLOADS_DIR=/tmp mvn clean test spring-boot:run

where:

  • DOWNLOADER_URL - the URL of the Downloader service; the plan is to turn Downloader into a valid CF service, which should make connecting to it easier
  • DOWNLOADS_DIR - the folder where the object store puts downloaded content; it needs to be shared between DAS and Downloader (see the combined startup sketch below)
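For convenience, all four services can be started from a single shell session. This is only a sketch of the commands already listed above, assuming each repository is checked out in a sibling directory:

# Shared download directory for DAS, Downloader and Metadata parser
export DOWNLOADS_DIR=/tmp

# Start the dependencies in the background, logging to per-service files
(cd user-management && mvn spring-boot:run -Dspring.cloud.propertiesFile=spring-cloud.properties > /tmp/user-management.log 2>&1 &)
(cd downloader && SERVER_PORT=8090 mvn clean spring-boot:run > /tmp/downloader.log 2>&1 &)
(cd metadata-parser && mvn spring-boot:run > /tmp/metadata-parser.log 2>&1 &)

# Finally start DAS in the foreground, pointing it at the local Downloader
(cd data-acquisition && DOWNLOADER_URL="http://localhost:8090" mvn clean test spring-boot:run)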

To run a simple demo, use the script ./tools/curl.sh : ./curl.sh <das_app_url> <data_set_uri> <oauth_token> <organisation's UUID> (for data_set_uri, only http/s is implemented)

EXAMPLE:

./curl.sh localhost:8080 https://www.quandl.com/api/v1/datasets/BCHARTS/BITSTAMPUSD.csv "`cf oauth-token | grep bearer`" <organisation's UUID>

This will download the requested CSV file and save it into the /tmp directory (the folder used by the object store).
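The script presumably wraps a single authenticated HTTP call to DAS. The sketch below is only illustrative: the request path and JSON field names are hypothetical placeholders, not the documented DAS API; check ./tools/curl.sh for the real request:

# Hypothetical equivalent of ./tools/curl.sh -- the path and payload fields
# are illustrative assumptions, not the actual DAS endpoint
curl -X POST "http://localhost:8080/rest/das/requests" \
  -H "Authorization: `cf oauth-token | grep bearer`" \
  -H "Content-Type: application/json" \
  -d '{"source":"https://www.quandl.com/api/v1/datasets/BCHARTS/BITSTAMPUSD.csv","orgUUID":"<organisation-uuid>"}'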

  • You might see the following exception in Metadata parser's logs: ResourceAccessException: I/O error on PUT request for "http://localhost:5000/rest/datasets/(id)": Connection refused. To get rid of that exception, run the Data catalog service along with Elasticsearch.

Deployment

There are two manifest files.

  • manifest.yml uses in-memory queues
  • manifest-kafka.yml uses Kafka queues
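Either manifest can be deployed with the standard Cloud Foundry CLI, assuming you are logged in to a CF target and the application artifact has been built:

# Deploy with in-memory queues
cf push -f manifest.yml

# Or deploy with Kafka-backed queues
cf push -f manifest-kafka.yml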
