Update with information about current project status.

danielbwatson committed Jun 23, 2017
1 parent 2e72465 commit cea5ecef79c843db7b4f03a0b1ac07f69933477c
Showing with 10 additions and 1 deletion.
  1. +1 −1 OSSMETADATA
  2. +9 −0 README.md
@@ -1 +1 @@
-osslifecycle=active
+osslifecycle=maintenance
@@ -1,6 +1,15 @@
 Aegisthus
 =========
+STATUS
+------
+Aegisthus has been transitioned to maintenance mode. It is still used for ETL at Netflix for Cassandra 2.x clusters,
+but it will not be evolving further.
+OVERVIEW
+--------
 A Bulk Data Pipeline out of Cassandra. Aegisthus implements a reader for the
 SSTable format and provides a map/reduce program to create a compacted snapshot
 of the data contained in a column family.

8 comments on commit cea5ece

@kongchen

kongchen replied Jul 4, 2017

Is there a reason the project is going into maintenance mode? And is there a replacement for it?

@danielbwatson

Contributor

danielbwatson replied Jul 5, 2017

Hi @kongchen, we are working on a replacement. It's written as an Apache Spark job instead of straight Map/Reduce and is CQL-focused (CQL -> Spark DataFrame). We hope to start using it internally as a beta in July. Once we are relying on it, we plan to release it as well. I expect that will be the end of 2017 at the earliest.

@kavink

kavink replied Jul 13, 2017

@danielbwatson, would it be possible to make the beta public, or to publish a Spark-specific development branch, or will it only be available as a general release?

@danielbwatson

Contributor

danielbwatson replied Jul 18, 2017

@kavink I am not sure about the release plan, but we do intend to share the source as soon as we are able to.

@kavink

kavink replied Jul 6, 2018

@danielbwatson, just wanted to check whether the Spark project is still active internally or whether it has been discontinued in favor of something else?

@danielbwatson

Contributor

danielbwatson replied Jul 24, 2018

Hi @kavink, we are starting to roll out the Spark job that replaces Aegisthus internally. We still plan to release the code, but not before we have migrated all jobs off Aegisthus.

@andrioni

andrioni replied Nov 4, 2018

@danielbwatson do you know if there are any public write-ups/presentations about this Spark replacement? Thanks!

@danielbwatson

Contributor

danielbwatson replied Nov 8, 2018

Hi @andrioni, sorry, no, there haven't been any yet.

At an extremely high level, we take in a list of all the backup files for a C* column family and compact them to produce a set of output files with the replicas removed and all rows consolidated (all the contents of a row stitched back together instead of spread across multiple files). A second phase then reads these SSTables and converts them to Spark rows, which we write out with the standard Spark writing commands.

So it is pretty similar in concept to Aegisthus, just a different implementation. I hope that helps.
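
For illustration only, the two phases described above can be sketched in plain Python. This is not the Netflix job, whose source is not public in this thread. `Cell`, `compact`, and `to_rows` are hypothetical names; real SSTables are binary files rather than in-memory lists; and the "newest timestamp wins" rule is an assumed, simplified stand-in for Cassandra's cell reconciliation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Cell(NamedTuple):
    """One column value as it might appear in a backup file (hypothetical shape)."""
    row_key: str
    column: str
    value: Optional[str]  # None stands in for a tombstone (deleted cell)
    timestamp: int


def compact(files: Iterable[Iterable[Cell]]) -> Dict[str, Dict[str, Cell]]:
    """Phase 1 (assumed logic): merge all files, keeping the newest cell per
    (row key, column). Duplicate replicas collapse, and a row's columns are
    stitched together even if they were spread across several files."""
    latest: Dict[str, Dict[str, Cell]] = defaultdict(dict)
    for cells in files:
        for cell in cells:
            current = latest[cell.row_key].get(cell.column)
            if current is None or cell.timestamp > current.timestamp:
                latest[cell.row_key][cell.column] = cell
    return dict(latest)


def to_rows(compacted: Dict[str, Dict[str, Cell]]) -> List[dict]:
    """Phase 2 (assumed logic): turn each consolidated row into a flat record,
    dropping tombstoned cells and rows with no live cells left."""
    rows = []
    for row_key in sorted(compacted):
        live = {c.column: c.value
                for c in compacted[row_key].values()
                if c.value is not None}
        if live:
            rows.append({"key": row_key, **live})
    return rows


# Three made-up "backup files": two overlapping replicas and a later delete.
replica_a = [Cell("user1", "name", "Ann", 10), Cell("user1", "email", "a@old", 10)]
replica_b = [Cell("user1", "name", "Ann", 10), Cell("user1", "email", "a@new", 20),
             Cell("user2", "name", "Bob", 5)]
replica_c = [Cell("user2", "name", None, 7)]

rows = to_rows(compact([replica_a, replica_b, replica_c]))
```

In a Spark job, records like `rows` would presumably become DataFrame rows and be written with the standard writer; that step is omitted here so the sketch runs without a Spark installation.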
