Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean operators for DataSeries #99

Merged
merged 3 commits into from Jun 21, 2019

Conversation

@AtharvaKhare
Copy link
Contributor

@AtharvaKhare AtharvaKhare commented Jun 5, 2019

Issue: #96

Tasks:

  • Boolean operators between scalars
  • Boolean operators between DataSeries
AtharvaKhare added 2 commits Jun 5, 2019
`<`,`>`,`<=`,`>=` have been added for comparision of values
of two DataSeries. Keys are not considered while comparing.
@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

Because DataSeriesTest has many tests, I grouped them using protocols and by naming them in such way that all related tests have common prefix and are shown one after another in the browser.

For example, all arithmetic tests are called testArithmetics....

Could you rename your tests from testGreaterThan, testGreaterThanEqual to testBooleanGreaterThan, testBooleanGreaterThanEqual?

This convention is not perfect and we can change it later, but for now, let's follow it.

PS. For example, I don't know how many tests you have added. Four? Or am I missing some?

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

You should have one assertion per test. If you have multiple assertions, consider having multiple tests. Imagine that this test is red:

testGreaterThan

	| firstSeries secondSeries |
	
	firstSeries := DataSeries withKeys: #(a b c) values: #(1 0.4 'a').
	secondSeries := DataSeries withKeys: #(1 2 3) values: #(1 0.1 'b').
	
	self assert: firstSeries > secondSeries equals: #(false true false).
	
	firstSeries := DataSeries withKeys: #(a b c) values: #(0.8 0.4 1).
	secondSeries := DataSeries withKeys: #(a b c) values: #(true false true).
	
	self assert: firstSeries >= 0.8 equals: secondSeries.

Then we don't know if it's because comparison of series-to-series fails or is it series-to-scalar.

It is definitely better to split that test into two separate tests.

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

< aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

Why does it return an array?
Series plus series is series.
Series minus series is series.
Series < series should also be series (of boolean values).

Same logic applies to all boolean operators

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

(aDataSeriesOrNumber isNumber)
	ifTrue: [ ... ]
	ifFalse: [ ... ]

We should try to avoid type checking with if statements.
Try to use double-dispatch instead (let me know if yo need help with that)

Also, you don't need parenthesis around (aDataSeriesOrNumber isNumber)

Loading

@AtharvaKhare
Copy link
Contributor Author

@AtharvaKhare AtharvaKhare commented Jun 6, 2019

Why does it return an array?

The problem I had with returning Series is what should the keys/index be? Keys of first series, or second series, or 1, 2, 3...
Like in this example:

firstSeries := DataSeries withKeys: #(a b c) values: #(1 0.4 'a').
secondSeries := DataSeries withKeys: #(1 2 3) values: #(1 0.1 'b').

What should keys of firstSeries > secondSeries be?

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

What should keys of firstSeries > secondSeries be?

I think that the logic there should be the same as for arithmetic operations.
Currently, we expect that the keys are the same and throw an error if they are not:

testArithmeticsAddSeriesToSeriesDifferentKeys
	| firstSeries secondSeries |
	
	firstSeries := DataSeries withKeys: #(x y z) values: #(1 2 3) name: #X.
	secondSeries := DataSeries withKeys: #(a b c) values: #(3 4 5) name: #X.
	
	self should: [ firstSeries + secondSeries ] raise: Error.

Maybe it is too restrictive (we can discuss that). But I was thinking of the use cases first. DataSeries is not a general-purpose data structure like Array or Dictionary. It only exists to model rows and columns of a DataFrame. And I can not think of a situation when you need to add (or compare) columns or rows with different keys. And if you really need that, you can convert one of them to an Array

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

You need to write tests for all possible scenarios of comparisons:

  1. array to series
  2. scalar to series
  3. series to array
  4. series to scalar
  5. series to series same keys and names
  6. series to series different keys
  7. series to series same keys different names

Same as for all arithmetic operations.
If I'm not mistaking, you only cover cases (4) and (6)

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

You have added 4 methods with same source code and only 2 symbols that change:

< aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #< ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) < (aDataSeriesOrNumber values at: i)
			]].
	
<= aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #<= ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) <= (aDataSeriesOrNumber values at: i)
			]].
> aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #> ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) > (aDataSeriesOrNumber values at: i)
			]].
>= aDataSeriesOrNumber
	"Element-wise comparision between two DataSeries.
	 Does not consider keys for comparision.
	 Returns an Array."

	(aDataSeriesOrNumber isNumber)
	ifTrue: [ ^ aDataSeriesOrNumber adaptToCollection: self andSend: #>= ]
	ifFalse: [
		^ (1 to: self size) collect: [ :i |
			(self values at: i) >= (aDataSeriesOrNumber values at: i)
			]].

That's a lot of duplicated code and we should avoid it (imagine that someone changes implementation of one of these methods, will he remember to make equivalent change to other 3?)

Loading

@olekscode
Copy link
Member

@olekscode olekscode commented Jun 6, 2019

I suggest the following implementation:

DataSeries >> < arg
	^ arg adaptToCollection: self andSend: #<.
DataSeries >> > arg
	^ arg adaptToCollection: self andSend: #>.
DataSeries >> <= arg
	^ arg adaptToCollection: self andSend: #<=.
DataSeries >> >= arg
	^ arg adaptToCollection: self andSend: #>=.

It fixes all the problems mentioned above

Loading

Refactored and added more tests
@AtharvaKhare
Copy link
Contributor Author

@AtharvaKhare AtharvaKhare commented Jun 14, 2019

Yes, it fixes the problems! :) I did try it early and it did not work, I must have been using it in a wrong example.

I have made changes as suggested. array with series does not work, since array does not understand boolean operators > < >= <=.

Loading

@olekscode olekscode merged commit 3196e32 into PolyMathOrg:master Jun 21, 2019
3 checks passed
Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants