Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade spark dependency to 3.3.0 #824

Merged
merged 15 commits into from
Nov 3, 2022
13 changes: 0 additions & 13 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,19 +41,6 @@ jobs:
script:
- make test_root_sbt_project

- name: "Python 3.6 tests"
language: python
python: 3.6
install:
- pip install tox
before_script:
- >
curl
--create-dirs -L -o /home/travis/.sbt/launchers/1.4.9/sbt-launch.jar
https://repo1.maven.org/maven2/org/scala-sbt/sbt-launch/1.4.9/sbt-launch-1.4.9.jar
script:
- make py36_test

- name: "Python 3.7 tests"
language: python
python: 3.7.9
Expand Down
6 changes: 1 addition & 5 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -23,14 +23,10 @@ test_xgboost_runtime:
test_xgboost_spark:
$(SBT) "+ mleap-xgboost-spark/test"

.PHONY: py36_test
py36_test:
source scripts/scala_classpath_for_python.sh && make -C python py36_test
Comment on lines -26 to -28
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark 3.3 does not support python3.6


.PHONY: py37_test
py37_test:
source scripts/scala_classpath_for_python.sh && make -C python py37_test

.PHONY: test
test: test_executor test_benchmark test_xgboost_runtime test_xgboost_spark test_root_sbt_project py36_test py37_test
test: test_executor test_benchmark test_xgboost_runtime test_xgboost_spark test_root_sbt_project py37_test
@echo "All tests run successfully"
Original file line number Diff line number Diff line change
Expand Up @@ -59,10 +59,14 @@ object SparkParityBase extends FunSpec {


object SparkEnv {
lazy val spark = SparkSession.builder().
appName("Spark/MLeap Parity Tests").
master("local[2]").
getOrCreate()
lazy val spark = {
val session = SparkSession.builder().
appName("Spark/MLeap Parity Tests").
master("local[2]").
getOrCreate()
session.sparkContext.setLogLevel("WARN")
session
}
}


Expand Down
6 changes: 3 additions & 3 deletions project/Dependencies.scala
Original file line number Diff line number Diff line change
Expand Up @@ -6,19 +6,19 @@ import Keys._
object Dependencies {
import DependencyHelpers._

val sparkVersion = "3.2.0"
val sparkVersion = "3.3.0"
val scalaTestVersion = "3.0.8"
val junitVersion = "5.8.2"
val akkaVersion = "2.6.14"
val akkaHttpVersion = "10.2.4"
val springBootVersion = "2.6.2"
lazy val logbackVersion = "1.2.3"
lazy val loggingVersion = "3.9.0"
lazy val slf4jVersion = "1.7.30"
lazy val slf4jVersion = "1.7.36"
Copy link
Contributor Author

@WeichenXu123 WeichenXu123 Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old log4j version has conflicts with spark 3.3 dependencies.

lazy val awsSdkVersion = "1.11.1033"
val tensorflowJavaVersion = "0.4.0" // Match Tensorflow 2.7.0 https://github.com/tensorflow/java/#tensorflow-version-support
val xgboostVersion = "1.6.1"
val breezeVersion = "1.0"
val breezeVersion = "1.2"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the same with spark 3.3 breeze dependency.

val hadoopVersion = "2.7.4" // matches spark version
val platforms = "windows-x86_64,linux-x86_64,macosx-x86_64"
val tensorflowPlatforms : Array[String] = sys.env.getOrElse("TENSORFLOW_PLATFORMS", platforms).split(",")
Expand Down
7 changes: 2 additions & 5 deletions python/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ $(error SCALA_CLASS_PATH for python tests is not set. Please check out \
the top-level Makefile on how to source scala_classpath_for_python.sh)
endif

.PHONY: help env clean py36_test py37_test test build upload
.PHONY: help env clean py37_test test build upload

help:
@echo " env create a development environment using virtualenv"
Expand All @@ -25,13 +25,10 @@ clean:
find . -name '*~' -exec rm -f {} \;
find . -name '__pycache__' | xargs -r rm -rf

py36_test:
tox -e py36 -v

py37_test:
tox -e py37 -v

test: py36_test py37_test
test: py37_test
@echo "All python tests completed"

build: clean
Expand Down
16 changes: 8 additions & 8 deletions python/mleap/sklearn/preprocessing/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -1026,23 +1026,23 @@ def transform(self, y):
:return:
"""
if isinstance(y, pd.DataFrame):
x = y.ix[:,0]
y = y.ix[:,1]
x = y.iloc[:,0]
y = y.iloc[:,1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because required pandas version >= 1.05, the .ix method is removed.

else:
x = y[:,0]
y = y[:,1]
if self.transform_type == 'add':
return pd.DataFrame(np.add(x, y))
return pd.DataFrame(np.add(x, y), columns=[self.output_features])
elif self.transform_type == 'sub':
return pd.DataFrame(np.subtract(x, y))
return pd.DataFrame(np.subtract(x, y), columns=[self.output_features])
elif self.transform_type == 'mul':
return pd.DataFrame(np.multiply(x, y))
return pd.DataFrame(np.multiply(x, y), columns=[self.output_features])
elif self.transform_type == 'div':
return pd.DataFrame(np.divide(x, y))
return pd.DataFrame(np.divide(x, y), columns=[self.output_features])
elif self.transform_type == 'rem':
return pd.DataFrame(np.remainder(x, y))
return pd.DataFrame(np.remainder(x, y), columns=[self.output_features])
elif self.transform_type == 'pow':
return pd.DataFrame(x**y)
return pd.DataFrame(x**y, columns=[self.output_features])
Comment on lines +1035 to +1045
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: A fix.


def fit_transform(self, X, y=None, **fit_params):
"""
Expand Down
2 changes: 1 addition & 1 deletion python/requirements-dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,4 @@ coverage<5.0.0
ipdb
nose
nose-exclude>=0.5.0
pyspark==3.2.0
pyspark==3.3.0
2 changes: 1 addition & 1 deletion python/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
numpy>=1.8.2
six>=1.10.0
scipy>=0.13.0b1
pandas>=0.18.1, <= 0.24.2
pandas>=1.0.5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark 3.3 requires pandas>=1.0.5

scikit-learn>=0.22.0,<0.23.0
gensim<4.1.0
urllib3==1.26.5
16 changes: 9 additions & 7 deletions python/tests/sklearn/preprocessing/data_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -630,7 +630,7 @@ def math_binary_test(self):

Xres = math_binary_tf.fit_transform(self.df[['a', 'b']])

assert_frame_equal(pd.DataFrame(self.df.a + self.df.b, columns=['a']), Xres)
assert_frame_equal(pd.DataFrame(self.df.a + self.df.b, columns=['a_plus_b']), Xres)

math_binary_tf.serialize_to_bundle(self.tmp_dir, math_binary_tf.name)

Expand Down Expand Up @@ -664,7 +664,7 @@ def math_binary_deserialize_add_test(self):

Xres = math_binary_tf.fit_transform(self.df[['a', 'b']])

assert_frame_equal(pd.DataFrame(self.df.a + self.df.b, columns=['a']), Xres)
assert_frame_equal(pd.DataFrame(self.df.a + self.df.b, columns=['a_plus_b']), Xres)

math_binary_tf.serialize_to_bundle(self.tmp_dir, math_binary_tf.name)

Expand All @@ -674,15 +674,17 @@ def math_binary_deserialize_add_test(self):

res_a = math_binary_tf.transform(self.df[['a', 'b']])
res_b = math_binary_ds_tf.transform(self.df[['a', 'b']])
assert_frame_equal(res_a, res_b)

# TODO: Deserialization on output_features has some issue. fix this.
# assert_frame_equal(res_a, res_b)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ancasarb

Could you help fixing this ?

This is an existing bug but previous test does not cover it.

But this is not related to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make a github issue for this so we don't forget about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed ticket: #830


def math_binary_subtract_test(self):

math_binary_tf = MathBinary(input_features=['a', 'b'], output_features='a_less_b', transform_type='sub')

Xres = math_binary_tf.fit_transform(self.df[['a', 'b']])

assert_frame_equal(pd.DataFrame(self.df.a - self.df.b, columns=['a']), Xres)
assert_frame_equal(pd.DataFrame(self.df.a - self.df.b, columns=['a_less_b']), Xres)

math_binary_tf.serialize_to_bundle(self.tmp_dir, math_binary_tf.name)

Expand Down Expand Up @@ -716,7 +718,7 @@ def math_binary_multiply_test(self):

Xres = math_binary_tf.fit_transform(self.df[['a', 'b']])

assert_frame_equal(pd.DataFrame(self.df.a * self.df.b, columns=['a']), Xres)
assert_frame_equal(pd.DataFrame(self.df.a * self.df.b, columns=['a_mul_b']), Xres)

math_binary_tf.serialize_to_bundle(self.tmp_dir, math_binary_tf.name)

Expand Down Expand Up @@ -746,11 +748,11 @@ def math_binary_multiply_test(self):

def math_binary_divide_test(self):

math_binary_tf = MathBinary(input_features=['a', 'b'], output_features='a_mul_b', transform_type='div')
math_binary_tf = MathBinary(input_features=['a', 'b'], output_features='a_div_b', transform_type='div')

Xres = math_binary_tf.fit_transform(self.df[['a', 'b']])

assert_frame_equal(pd.DataFrame(self.df.a / self.df.b, columns=['a']), Xres)
assert_frame_equal(pd.DataFrame(self.df.a / self.df.b, columns=['a_div_b']), Xres)

math_binary_tf.serialize_to_bundle(self.tmp_dir, math_binary_tf.name)

Expand Down
2 changes: 1 addition & 1 deletion python/tox.ini
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[tox]
envlist = py36,py37
envlist = py37
skipdist = true

[testenv]
Expand Down