This repository has been archived by the owner on May 12, 2021. It is now read-only.

APEXMALHAR-2417 Adding Pojo Outer join accumulation #568

Merged

Conversation

KapoorHitesh
Contributor

Contributor

@chinmaykolhatkar chinmaykolhatkar left a comment

Please mark the classes in all of these files as @Evolving.

* Join Accumulation for Pojo Streams.
*
*/
public abstract class AbstractPojoJoin<InputT1, InputT2>
Contributor

Please mark this class as @Evolving.

Contributor Author

Done

Contributor

All the classes except PojoInnerJoin need to be marked as Evolving. This has not been done completely yet.

Contributor

@chinmaykolhatkar chinmaykolhatkar left a comment

Please address the comments. Otherwise, this looks good.


public AbstractPojoJoin(Class<?> outClass, String... keys)
{
if (keys.length % 2 != 0) {
Contributor

Why % 2? How would a user know in which order the keys should be specified? I suggest following either of these two approaches:

  1. Two constructor parameters taking two different keyExpressions.
  2. Two constructor parameters, each a String[].
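
As a rough illustration of the second approach, the sketch below takes one String[] per stream, so each position pairs a field from stream 1 with a field from stream 2. `JoinKeys`, `leftKeys`, `rightKeys` and `describe` are hypothetical names chosen for this example and are not part of the PR.

```java
// Hypothetical holder for the join key configuration; not the PR's actual class.
final class JoinKeys
{
  final String[] leftKeys;   // key fields read from stream 1
  final String[] rightKeys;  // key fields read from stream 2

  JoinKeys(String[] leftKeys, String[] rightKeys)
  {
    // Both sides must name the same, non-zero number of key fields.
    if (leftKeys.length == 0 || leftKeys.length != rightKeys.length) {
      throw new IllegalArgumentException("Expected the same non-zero number of keys for both streams, got "
          + leftKeys.length + " and " + rightKeys.length);
    }
    this.leftKeys = leftKeys.clone();
    this.rightKeys = rightKeys.clone();
  }

  // Describes which field of stream 1 is compared with which field of stream 2.
  String describe()
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < leftKeys.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(leftKeys[i]).append(" = ").append(rightKeys[i]);
    }
    return sb.toString();
  }
}
```

With this shape the pairing is explicit at the call site, and a mismatch in the number of keys fails fast in the constructor instead of relying on a parity check.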

Contributor Author

Thank you Chinmay for this input, I will create a separate Jira under APEXMALHAR-2413 for this.

Contributor

This Jira does not capture it. Moreover, please close this Jira and create a separate Jira for performance improvement of PojoJoins.

Contributor Author

{
protected final String[] keys;
protected final Class<?> outClass;
private transient List<KeyValPair<String,PojoUtils.Getter>> gettersStream1;
Contributor

Can you use a Map instead of a List for all three variables?
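
A minimal sketch of what a Map-based lookup could look like, assuming getters are keyed by field name. `GetterLookup` is a hypothetical class, and `Function<Object, Object>` is only a stand-in for PojoUtils.Getter so that the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; Function<Object, Object> stands in for PojoUtils.Getter.
final class GetterLookup
{
  // LinkedHashMap keeps the registration order, as the original List did.
  private final Map<String, Function<Object, Object>> getters = new LinkedHashMap<>();

  void register(String field, Function<Object, Object> getter)
  {
    getters.put(field, getter);
  }

  // Looks the getter up by field name instead of scanning a list of pairs.
  Object read(String field, Object pojo)
  {
    Function<Object, Object> getter = getters.get(field);
    if (getter == null) {
      throw new IllegalArgumentException("No getter registered for field " + field);
    }
    return getter.apply(pojo);
  }
}
```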

Contributor Author

Done

private transient List<KeyValPair<String,PojoUtils.Getter>> gettersStream1;
private transient List<KeyValPair<String,PojoUtils.Getter>> gettersStream2;
private transient List<KeyValPair<String,PojoUtils.Setter>> setters;
public transient Set<String> keySetStream2;
Contributor

Can keySetStream1 & keySetStream2 be private variables?

Contributor

Actually, you can make them protected.

Contributor Author

Done.

{
// TODO: If a stream never sends out any tuple during one window, a wrong key would not be detected.

input.getClass().getDeclaredField(keys[index]);
Contributor

What is the purpose of this line?

Contributor Author

removed

public List<?> getOutput(List<List<Map<String, Object>>> accumulatedValue)
{
// List<Map<String, Object>> result = new ArrayList<>();
if (setters == null) {
Contributor

A minor nit, just to be modular:
if (setters == null) {
// create setters
}

if (keySetStream1 == null) {
// populate set.
}

Contributor Author

This would evaluate an additional if condition on every call.

Contributor

Ok.

*/
public class PojoInnerJoin<InputT1, InputT2>
implements MergeAccumulation<InputT1, InputT2, List<List<Map<String, Object>>>, List<?>>
extends AbstractPojoJoin
Contributor

AbstractPojoJoin<InputT1, InputT2>

Contributor

Similarly for other classes as well.

Contributor Author

done

*
*/
public abstract class AbstractPojoJoin<InputT1, InputT2>
implements MergeAccumulation<InputT1, InputT2, List<List<Map<String, Object>>>, List<?>>
Contributor

I have a suggestion here regarding how the accumulation data structure is created. Instead of a list of maps, we could have a List<MultiMap<Object, Object>>. The key of the multimap is a unique object generated from the set of configured keys, and the value can hold more than one object for the same key.

This way, when making the comparison, it becomes straightforward to get the matching pairs.
And I believe it will reduce the cost of iterating over the various maps in place.

Just something to consider later as an improvement.
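
To make the idea concrete, here is a small sketch using only JDK collections, with a `Map<List<Object>, List<...>>` playing the role of the multimap. `MultiMapJoinSketch`, `keyOf`, `index` and `innerJoin` are hypothetical names; rows are modelled as `Map<String, Object>` purely for illustration, and this is not the accumulation type used in the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a multimap-style index for joins.
final class MultiMapJoinSketch
{
  // Builds the composite key object from the configured key fields of one row.
  static List<Object> keyOf(Map<String, Object> row, String... keyFields)
  {
    List<Object> key = new ArrayList<>();
    for (String field : keyFields) {
      key.add(row.get(field));
    }
    return key;
  }

  // Groups rows by composite key; several rows may share the same key.
  static Map<List<Object>, List<Map<String, Object>>> index(List<Map<String, Object>> rows, String... keyFields)
  {
    Map<List<Object>, List<Map<String, Object>>> index = new HashMap<>();
    for (Map<String, Object> row : rows) {
      index.computeIfAbsent(keyOf(row, keyFields), k -> new ArrayList<>()).add(row);
    }
    return index;
  }

  // Inner join: one hash lookup per left row instead of a scan over all right rows.
  static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left, String[] leftKeys,
      Map<List<Object>, List<Map<String, Object>>> rightIndex)
  {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> l : left) {
      List<Map<String, Object>> matches = rightIndex.getOrDefault(keyOf(l, leftKeys), Collections.emptyList());
      for (Map<String, Object> r : matches) {
        Map<String, Object> joined = new HashMap<>(r);
        joined.putAll(l); // on a field name clash, the left value wins in this sketch
        out.add(joined);
      }
    }
    return out;
  }
}
```

An outer join could presumably reuse the same index by also emitting left rows whose key has no match, but that is left out here to keep the sketch short.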

Contributor Author

Thank you Chinmay for this input, I will create a separate Jira under APEXMALHAR-2413 for this

Contributor

This Jira does not capture it. Moreover, please close this Jira and create a separate Jira for performance improvement of PojoJoins.

Contributor Author

}

@Override
public Map<String, Object> joinTwoMapsWithKeys(Map map1, Map map2)
Contributor

This method seems to be doing the same thing as the one present in super class. Remove this?

Contributor Author

done

@KapoorHitesh
Contributor Author

@chinmaykolhatkar I have made the changes; please review and merge.

@chinmaykolhatkar
Contributor

@Hitesh-Scorpio Some changes, such as creating the right Jira and marking the classes as Evolving, are not done yet.
Please take care of them.
Also, please squash and rebase.

I'll merge once this is done.

@KapoorHitesh KapoorHitesh force-pushed the APEXMALHAR-2417_NewOuterJoinAccumulation branch 2 times, most recently from 98e13a8 to a634e64 Compare March 2, 2017 07:09
@KapoorHitesh
Contributor Author

@chinmaykolhatkar I have created the suggested Jiras, squashed the commits, and rebased.

@KapoorHitesh KapoorHitesh force-pushed the APEXMALHAR-2417_NewOuterJoinAccumulation branch from 6d9ae0a to 4b36bf3 Compare March 2, 2017 10:02
@asfgit asfgit merged commit 4b36bf3 into apache:master Mar 2, 2017