
Moonset

Moonset is a data processing framework built on top of AWS. It supports both batch processing and stream processing. To see the available commands, run:

npx moonset --help

For example, to run Hive tasks on AWS EMR, try the following commands:

# configure the credentials
npx moonset config

# run a job
npx moonset run \
    --plugin '@moonset/plugin-platform-emr'  \
    --plugin '@moonset/plugin-data-glue' \
    --job '{
    "input": [{
        "glue": { "db": "foo", "table": "apple", "partition": {"region_id": "1", "snapshot_date": "2020-01-01"}}
    }],
    "task": [{
        "hive": {"sql": "insert overwrite table foo.pineapple partition (region_id=1, snapshot_date=\"2020-01-01\") select foo from foo.apple;"}
    }],
    "output": [{
        "glue": { "db": "foo", "table": "pineapple", "partition": {"region_id": "1", "snapshot_date": "2020-01-01"}}
    }]
}'
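The same partition values appear three times in the job above: in the input, in the Hive SQL, and in the output. As a minimal sketch, assuming only the job format shown in that example, a hypothetical shell helper `moonset_job` (not part of Moonset) can generate the JSON from a `region_id` and a `snapshot_date`. It assumes the values contain no quotes or other characters that would need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Moonset: print the example job JSON for a
# given region_id and snapshot_date so the partition is specified only once.
# Assumes the arguments contain no characters that need JSON escaping.
moonset_job() {
  local region_id="$1" snapshot_date="$2"
  local partition
  partition=$(printf '{"region_id": "%s", "snapshot_date": "%s"}' "$region_id" "$snapshot_date")
  # In the printf format, \\" prints \" so the SQL string stays valid JSON.
  printf '{
  "input": [{"glue": {"db": "foo", "table": "apple", "partition": %s}}],
  "task": [{"hive": {"sql": "insert overwrite table foo.pineapple partition (region_id=%s, snapshot_date=\\"%s\\") select foo from foo.apple;"}}],
  "output": [{"glue": {"db": "foo", "table": "pineapple", "partition": %s}}]
}\n' "$partition" "$region_id" "$snapshot_date" "$partition"
}
```

You could then pass its output to the documented `run` command with `--job "$(moonset_job 1 2020-01-01)"`, keeping the two `--plugin` flags unchanged.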

All resources are managed by the AWS CDK, so infrastructure setup takes minimal effort. You can run Moonset in a brand-new AWS account.

The EMR cluster is created in a private subnet of a VPC. You can connect to both the master and slave nodes through AWS Systems Manager Session Manager; no SSH key pair or bastion host is needed. Follow this guide to start a session and connect to the EMR instances.
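A rough sketch of starting such a session from the command line, assuming the AWS CLI and its Session Manager plugin are installed and the cluster uses instance groups rather than instance fleets: the hypothetical function `emr_master_session` (not part of Moonset) looks up the master node's EC2 instance ID with `aws emr list-instances` and then opens a shell on it with `aws ssm start-session`. The cluster ID argument is whatever ID EMR assigned to the cluster Moonset created.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Moonset: open a Session Manager shell on the
# EMR master node. Assumes the AWS CLI, the Session Manager plugin, and a
# cluster built from instance groups (not instance fleets).
emr_master_session() {
  local cluster_id="$1"
  local instance_id
  # Ask EMR for the EC2 instance ID of the first MASTER instance.
  instance_id=$(aws emr list-instances \
    --cluster-id "$cluster_id" \
    --instance-group-types MASTER \
    --query 'Instances[0].Ec2InstanceId' \
    --output text) || return 1
  # Start an interactive session; no SSH key or bastion host is involved.
  aws ssm start-session --target "$instance_id"
}
```

To reach a core (slave) node instead, change `--instance-group-types MASTER` to `CORE`.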

License

Moonset is distributed under the Apache License, Version 2.0.

See LICENSE for more information.