-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor datasqueeze code structure and namespace as well as add publishing to maven central #4
Conversation
Please update the readme to include all the changes. From what I see there are more changes than just publishing to maven central |
Can we add the DataSqueeze logo commit from our internal code base for readMe as a part of this pull request? |
pom.xml
Outdated
<repository> | ||
<id>central</id> | ||
<name>public-maven-virtual</name> | ||
<url>https://artylab.expedia.biz:443/public-maven-virtual</url> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is internal to Expedia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thanks.
The branch fails to build locally |
It builds on my local from both IntelliJ and maven command line. |
README.md
Outdated
|
||
## Documentation | ||
This README is intended to provide detailed technical documentation for advanced users. | ||
|
||
## Changes since last release | ||
|
||
* Edided the pom file for publishing this project's artifacts to the Maven Central |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type - Edited
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much.
README.md
Outdated
|
||
1. CLI - | ||
a. For TEXT/ORC/SEQ | ||
```java | ||
hadoop jar dataSqueeze-manager-1.0-SNAPSHOT.jar com.expedia.edw.data.squeeze.Utility | ||
hadoop jar Datasqueeze.jar com.expedia.dsp.data.squeeze.Utility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo - "datasqueeze"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Thanks so much.
README.md
Outdated
-sp s3a://edwprod/user/ysontakke/compactiontest1/ -tp s3a://edwprod/user/ysontakke/compactionoutput_text_yash_1/ | ||
-threshold 12345 | ||
``` | ||
|
||
b. For AVRO | ||
```java | ||
hadoop jar dataSqueeze-manager-1.0-SNAPSHOT.jar com.expedia.edw.data.squeeze.Utility | ||
hadoop jar Datasqueeze.jar com.expedia.dsp.data.squeeze.Utility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo - "datasqueeze"
README.md
Outdated
@@ -87,22 +95,22 @@ There are two different ways of running DataSqueeze: | |||
* fileType - Type of file to be compacted (AVRO / TEXT / SEQ / ORC). It is mandatory for AVRO | |||
* schemaPath - schema used for compaction (mandatory for AVRO) | |||
|
|||
2. API - [CompactionManager](dataSqueeze-manager/src/main/java/com/expedia/edw/data/squeeze/CompactionManager.java) | |||
2. API - [CompactionManager](Datasqueeze/src/main/java/com/expedia/dsp/data/squeeze/CompactionManager.java) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo - "datasqueeze"
pom.xml
Outdated
<apache.commons.io.version>1.3.2</apache.commons.io.version> | ||
<aws.java.sdk.version>1.11.98</aws.java.sdk.version> | ||
|
||
<commons-cli.version>1.2</commons-cli.version> | ||
<commons.io.version>2.4</commons.io.version> | ||
<commons.lang3.version>3.0</commons.lang3.version> | ||
|
||
<hadoop.version>2.7.4</hadoop.version> | ||
<!-- Use 2.7.4 for github--> | ||
<hadoop.version>2.7.1.2.4.3.0-227</hadoop.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change
pom.xml
Outdated
|
||
<version>2.0-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to move right below artifactId
README.md
Outdated
|
||
|
||
a. Fetch the file paths to be compacted from the source path provided. | ||
b. Perform mapreduce job using the following configuration | ||
1. Mapper maps records together based on same parent directory and emits parent directory as key. | ||
2. Reducer reduces records based on same key but writes data to the target directory provided by the user, retaining | ||
the directory structure. | ||
2. Reducer reduces records based on same key but writes data to the target directory provided by the user, retaining the directory structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format looks off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the format looks better now than before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not adding this in the commit range if the text is not updated.
Essentially, keep the master copy for this as is
Update Pull request description. Looks good otherwise |
can you also update readMe to say where the jars can be found |
This will be easier to update after we actually publish the 2.0 release. So I would suggest we wait on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.