Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean operators for DataSeries #99

Merged
merged 3 commits into from Jun 21, 2019

Conversation

@AtharvaKhare
Copy link
Contributor

commented Jun 5, 2019

Issue: #96

Tasks:

  • Boolean operators between scalars
  • Boolean operators between DataSeries

AtharvaKhare added some commits Jun 5, 2019

Added boolean operator comparision to DataSeries
`<`,`>`,`<=`,`>=` have been added for comparision of values
of two DataSeries. Keys are not considered while comparing.
@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

Because DataSeriesTest has many tests, I grouped them using protocols and by naming them in such way that all related tests have common prefix and are shown one after another in the browser.

For example, all arithmetic tests are called testArithmetics....

Could you rename your tests from testGreaterThan, testGreaterThanEqual to testBooleanGreaterThan, testBooleanGreaterThanEqual?

This convention is not perfect and we can change it later, but for now, let's follow it.

PS. For example, I don't know how many tests you have added. Four? Or am I missing some?

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

You should have one assertion per test. If you have multiple assertions, consider having multiple tests. Imagine that this test is red:

testGreaterThan

	| firstSeries secondSeries |
	
	firstSeries := DataSeries withKeys: #(a b c) values: #(1 0.4 'a').
	secondSeries := DataSeries withKeys: #(1 2 3) values: #(1 0.1 'b').
	
	self assert: firstSeries > secondSeries equals: #(false true false).
	
	firstSeries := DataSeries withKeys: #(a b c) values: #(0.8 0.4 1).
	secondSeries := DataSeries withKeys: #(a b c) values: #(true false true).
	
	self assert: firstSeries >= 0.8 equals: secondSeries.

Then we don't know if it's because comparison of series-to-series fails or is it series-to-scalar.

It is definitely better to split that test into two separate tests.

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

< aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

Why does it return an array?
Series plus series is series.
Series minus series is series.
Series < series should also be series (of boolean values).

Same logic applies to all boolean operators

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

(aDataSeriesOrNumber isNumber)
	ifTrue: [ ... ]
	ifFalse: [ ... ]

We should try to avoid type checking with if statements.
Try to use double-dispatch instead (let me know if yo need help with that)

Also, you don't need parenthesis around (aDataSeriesOrNumber isNumber)

@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2019

Why does it return an array?

The problem I had with returning Series is what should the keys/index be? Keys of first series, or second series, or 1, 2, 3...
Like in this example:

firstSeries := DataSeries withKeys: #(a b c) values: #(1 0.4 'a').
secondSeries := DataSeries withKeys: #(1 2 3) values: #(1 0.1 'b').

What should keys of firstSeries > secondSeries be?

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

What should keys of firstSeries > secondSeries be?

I think that the logic there should be the same as for arithmetic operations.
Currently, we expect that the keys are the same and throw an error if they are not:

testArithmeticsAddSeriesToSeriesDifferentKeys
	| firstSeries secondSeries |
	
	firstSeries := DataSeries withKeys: #(x y z) values: #(1 2 3) name: #X.
	secondSeries := DataSeries withKeys: #(a b c) values: #(3 4 5) name: #X.
	
	self should: [ firstSeries + secondSeries ] raise: Error.

Maybe it is too restrictive (we can discuss that). But I was thinking of the use cases first. DataSeries is not a general-purpose data structure like Array or Dictionary. It only exists to model rows and columns of a DataFrame. And I can not think of a situation when you need to add (or compare) columns or rows with different keys. And if you really need that, you can convert one of them to an Array

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

You need to write tests for all possible scenarios of comparisons:

  1. array to series
  2. scalar to series
  3. series to array
  4. series to scalar
  5. series to series same keys and names
  6. series to series different keys
  7. series to series same keys different names

Same as for all arithmetic operations.
If I'm not mistaking, you only cover cases (4) and (6)

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

You have added 4 methods with same source code and only 2 symbols that change:

< aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #< ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) < (aDataSeriesOrNumber values at: i)
			]].
	
<= aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #<= ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) <= (aDataSeriesOrNumber values at: i)
			]].
> aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #> ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) > (aDataSeriesOrNumber values at: i)
			]].
>= aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #>= ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) >= (aDataSeriesOrNumber values at: i)
			]].

That's a lot of duplicated code and we should avoid it (imagine that someone changes implementation of one of these methods, will he remember to make equivalent change to other 3?)

@olekscode

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

I suggest the following implementation:

DataSeries >> < arg
	^ arg adaptToCollection: self andSend: #<.
DataSeries >> > arg
	^ arg adaptToCollection: self andSend: #>.
DataSeries >> <= arg
	^ arg adaptToCollection: self andSend: #<=.
DataSeries >> >= arg
	^ arg adaptToCollection: self andSend: #>=.

It fixes all the problems mentioned above

Simplified DataSeries boolean operators
Refactored and added more tests
@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented Jun 14, 2019

Yes, it fixes the problems! :) I did try it early and it did not work, I must have been using it in a wrong example.

I have made changes as suggested. array with series does not work, since array does not understand boolean operators > < >= <=.

@olekscode olekscode merged commit 3196e32 into PolyMathOrg:master Jun 21, 2019

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.4%) to 90.685%
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.