
PHOENIX-4981 Add tests for ORDER BY, GROUP BY and salted tables using phoenix-spark #402

Closed
wants to merge 1 commit

Conversation

twdsilva
Contributor

@karanmehta93 thanks for the review, nice catch. I modified the SparkContext variable to be volatile.
@ChinmaySKulkarni can you please review? I refactored AggregateIT, OrderByIT and SaltedIT so that they can be used to run queries using phoenix-spark. I created Base*IT classes based on these tests, each with two subclasses (one for phoenix, one for phoenix-spark). I added a QueryBuilder to generate the SQL query that is used to set up the Spark SQL query. I also added a SparkResultSet that implements the JDBC ResultSet interface so that the existing tests can be reused without many changes.
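As background on the volatile change mentioned above, here is a minimal sketch of a lazily-initialized SparkContext shared across test threads. Class and member names are illustrative, not the PR's actual code; the app name and master mirror values that appear later in this review.

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

// Illustrative holder for a SparkContext shared by all tests.
public final class SparkContextHolder {

    // volatile ensures a context published by one thread is visible to the
    // others; without it a second thread could observe a stale null and try
    // to create a second SparkContext, which Spark does not allow.
    private static volatile SparkContext sparkContext = null;

    public static SparkContext getSparkContext() {
        if (sparkContext == null) {
            synchronized (SparkContextHolder.class) {
                if (sparkContext == null) {
                    SparkConf conf = new SparkConf(true)
                            .setAppName("Java Spark Tests")
                            .setMaster("local[2]");
                    sparkContext = new SparkContext(conf);
                }
            }
        }
        return sparkContext;
    }

    private SparkContextHolder() {
    }
}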

@twdsilva
Contributor Author

To get the tests to pass, I also had to bump the Spark version to 2.3.2, since that version has more SQL support.

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;
Contributor

nit: Looks like this diff is just due to reordering imports. Please refactor

Contributor Author

done

private HintNode.Hint hint;
private boolean escapeCols;
private boolean distinct;
private int limit;
Contributor

Can we make all member variables final and have the builder's build method set all values in a private constructor so that we follow the builder pattern more closely?

Contributor Author

IMHO I don't think that's necessary, since we don't have an object that represents a Query; build() just returns a string.

Contributor

Ok makes sense
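For reference, the stricter builder shape the reviewer describes would look roughly like the sketch below. The Query product class is hypothetical: as noted above, the PR's build() simply returns the SQL string and no such class exists.

// Hypothetical sketch of the classic builder pattern: an immutable product
// with final fields, set only via a private constructor called from build().
public final class Query {
    private final String selectExpression;
    private final String fullTableName;
    private final String whereClause;

    private Query(Builder builder) {
        this.selectExpression = builder.selectExpression;
        this.fullTableName = builder.fullTableName;
        this.whereClause = builder.whereClause;
    }

    public static final class Builder {
        private String selectExpression;
        private String fullTableName;
        private String whereClause;

        public Builder setSelectExpression(String s) { this.selectExpression = s; return this; }
        public Builder setFullTableName(String t) { this.fullTableName = t; return this; }
        public Builder setWhereClause(String w) { this.whereClause = w; return this; }

        public Query build() { return new Query(this); }
    }
}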

List<KeyRange> splits = TestUtil.getAllSplits(conn, tableName);
assertEquals(nGuidePosts, splits.size());
}

@Test
public void testGroupByWithAliasWithSameColumnName() throws SQLException {
Contributor

ChinmaySKulkarni commented Oct 31, 2018

Why is this test case not applicable to phoenix-spark?

Contributor Author

The query tests joins, and QueryBuilder currently doesn't support generating a join query over two tables.

@@ -487,6 +487,16 @@
<testSourceDirectory>src/it/scala</testSourceDirectory>
<testResources><testResource><directory>src/it/resources</directory></testResource></testResources>
<plugins>
<!--
Contributor

Remove this commented section

Contributor Author

I had commented this out by mistake; fixed now.

catch(Exception e) {
assertTrue(e.getMessage().contains(expectedPhoenixExceptionMsg));
}
return rs;
Contributor

Do we ever want code to reach here? Or do we want to Assert.fail if the exception doesn't occur?

Contributor Author

We should fail if an exception is not thrown; I fixed this.
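The resulting idiom looks roughly like this sketch (method and parameter names are illustrative; executeQuery and QueryBuilder stand in for the PR's helpers). Note that fail() throws an AssertionError, which is an Error rather than an Exception, so it is not swallowed by the catch block.

import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import java.sql.Connection;

// Illustrative shape of the fix: if executeQuery() returns normally,
// fail the test explicitly so a missing exception cannot go unnoticed.
private void assertQueryThrows(Connection conn, QueryBuilder queryBuilder,
        String expectedPhoenixExceptionMsg) {
    try {
        executeQuery(conn, queryBuilder);
        fail("Expected the query to throw, but it completed normally");
    } catch (Exception e) {
        assertTrue(e.getMessage().contains(expectedPhoenixExceptionMsg));
    }
}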

sql="select count(*) from "+intTableName;
QueryBuilder queryBuilder = new QueryBuilder();
queryBuilder.setSelectExpression("COUNT(*)");
queryBuilder.setFullTableName(intTableName);
Contributor

You can instead do method chaining here since you have a fluent interface.

Contributor Author

Done.
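Since the setters return the builder, the call site above collapses to a single chained expression, e.g.:

// Same query as above, using method chaining on the fluent interface.
QueryBuilder queryBuilder = new QueryBuilder()
        .setSelectExpression("COUNT(*)")
        .setFullTableName(intTableName);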

assertEquals(1, rs.getLong(1));

sql="select count(*) from "+intTableName + " where b.colb is null";
queryBuilder.setWhereClause("`B.COLB` IS NULL");
Contributor

It seems like the only difference between this test and the one in phoenix-core is the backticks provided to the QueryBuilder setter methods. I'm guessing this is a result of how SparkUtil executes queries. Please correct me if I'm wrong, but if not, can we further reuse the code from the phoenix-core tests instead of having more duplication?

Contributor Author

If a column name contains a dot, Spark SQL requires the backticks. Automatically generating the SQL for this is difficult, especially when columns are part of expressions, etc.

Contributor

Ok let's leave it the way it is for now.
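To make the difference concrete: Spark SQL parses an unquoted dot as a qualifier, so a Phoenix column name that itself contains a dot (here, the column-family-qualified B.COLB) must be backtick-quoted. A sketch, with the table name carried over from the hunk above:

// Phoenix JDBC accepts the dotted column-family.column form directly:
String phoenixSql = "select count(*) from " + intTableName + " where b.colb is null";

// Spark SQL would read B.COLB as column COLB of relation B; backticks make
// it a single quoted identifier instead:
String sparkSql = "SELECT COUNT(*) FROM " + intTableName + " WHERE `B.COLB` IS NULL";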

String expectedPhoenixExceptionMsg, String expectedSparkExceptionMsg) {
ResultSet rs = null;
try {
rs = executeQuery(conn, queryBuilder);
Contributor

Similar question/comment about Assert.fail here as well.

Contributor Author

fixed.

Dataset phoenixDataSet =
new PhoenixRDD(SparkUtil.getSparkContext(), tableName1,
JavaConverters.collectionAsScalaIterableConverter(table1Columns)
.asScala().toSeq(),
Contributor

Can you add some comments here?

Contributor Author

done
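For readers unfamiliar with the pattern: once each PhoenixRDD has been turned into a Dataset, registering the Datasets as temporary views lets Spark SQL run the join. A minimal sketch, with view names and join columns illustrative rather than taken from the PR:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Register each Phoenix-backed Dataset as a temp view, then join in Spark SQL.
void runJoin(SparkSession spark, Dataset<Row> phoenixDataSet1, Dataset<Row> phoenixDataSet2) {
    phoenixDataSet1.createOrReplaceTempView("TABLE1");
    phoenixDataSet2.createOrReplaceTempView("TABLE2");
    Dataset<Row> joined = spark.sql(
            "SELECT T1.ID, T2.COL1 FROM TABLE1 T1 JOIN TABLE2 T2 ON T1.ID = T2.ID");
    joined.show();
}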

import java.util.Map;

/**
* Helper class to convert a List of Rows returns from a dataset to a sql ResultSet
Contributor

nit: 'returned'

Contributor Author

fixed.


public static ResultSet executeQuery(Connection conn, QueryBuilder queryBuilder, String url, Configuration config)
throws SQLException {
SQLContext sqlContext = new SQLContext(SparkUtil.getSparkContext());
Contributor

It looks as though SQLContext is deprecated. Quoting: "Deprecated. Use SparkSession.builder instead. Since 2.0.0."
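The non-deprecated route goes through SparkSession.builder, from which a SQLContext can still be obtained if other code needs one. A sketch, with the app name and master taken from the values used elsewhere in this PR:

import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

// Replaces the deprecated new SQLContext(sparkContext) constructor.
SparkSession spark = SparkSession.builder()
        .appName("Java Spark Tests")
        .master("local[2]")
        .getOrCreate();
SQLContext sqlContext = spark.sqlContext();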

@ChinmaySKulkarni
Contributor

@twdsilva some comments and questions. Overall looks really good!

twdsilva force-pushed the PHOENIX-4981 branch 2 times, most recently from 49cf9e8 to 565b157 on November 1, 2018 00:34
@twdsilva
Contributor Author

twdsilva commented Nov 1, 2018

@ChinmaySKulkarni Thanks for the review, I have updated the PR; please take a look.

@ChinmaySKulkarni
Contributor

@twdsilva changes look good. Do we want to continue using SQLContext? The documentation says that it is deprecated and that SparkSession.builder should be used instead (since Spark 2.0.0).

@twdsilva
Contributor Author

twdsilva commented Nov 1, 2018

@ChinmaySKulkarni I removed the use of the deprecated SqlContext constructor, please take a look.

stmt.execute();
conn.commit();

// create two PhoenixRDDs using the table namea and columns that are required for the JOIN query
Contributor

nit: typo

}

public static SQLContext getSqlContext() {
return SparkSession.builder().appName("Java Spark Tests").master("local[2]")
Contributor

nit: Extract all these strings as static final member variables
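That is, something along these lines (constant names are hypothetical; the values come from the snippet above):

private static final String APP_NAME = "Java Spark Tests";
private static final String SPARK_MASTER = "local[2]";

public static SQLContext getSqlContext() {
    return SparkSession.builder().appName(APP_NAME).master(SPARK_MASTER)
            .getOrCreate().sqlContext();
}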

@ChinmaySKulkarni
Contributor

@twdsilva A couple of minor nits, otherwise looks good to go. Thanks!

@twdsilva
Contributor Author

twdsilva commented Nov 1, 2018

Thanks @ChinmaySKulkarni, fixed the nits, will get this committed.
