Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Math-TSNE is incomplete #115

Closed
AtharvaKhare opened this issue May 14, 2019 · 10 comments

Comments

@AtharvaKhare
Copy link
Contributor

commented May 14, 2019

tsneObj start currently does nothing as:

  • runPCAonX is commented out
  • x2p is incomplete (Calculation of P-values)
  • Core algorithm's iterations are incomplete
@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 14, 2019

Yes you are right, you code is not finish. Please contribute if you have time.

@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented May 15, 2019

Will give it a try today.

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 15, 2019

Great but this is not an easy task. Having a working PCA implementation was the first step :-)

@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented May 15, 2019

PMPrincipalComponentAnalyserJacobiTransformation is currently working, whereas PMPrincipalComponentAnalyserSVD is not. Both have close-to-same outputs right?

If yes, then I'll use it for now.

@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

I think there still is room for improvement in speed of t-SNE. One way would be using threads while computing joint probabilities. But main thing to do is profiling and checking for which parts are actually causing the slowdown.

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 29, 2019

I don't see why threads will speedup the process. Yes profiling will help you identify the main slowdown.

@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

computePairwiseAffinities does binary search for every row of input - which is a big chunk of time for rows more than 1,000. I think splitting this loop into 4 or 8 threads will make it faster:

1 to: n do: [ :i |
"Job progress gets updated every 10 rows"
(i % 10 = 0) ifTrue: [
job title: ('' join: {'Step 2/3: Computing joint probablity for point '. i asString. ' of '. n asString}).
job progress: (i/n).
].
"Set minimum and maximum value of precision"
betaMin := Float infinity negated.
betaMax := Float infinity.
"Ignore i-th element of the row d[i] and copy rest in distanceVector.
Also initialize pVector to 0"
1 to: n do: [ :index |
(index = i) ifFalse: [
(index < i)
ifTrue: [ distanceVector at: index put: (d rowAt: i columnAt: index).
pVector at: index put: 0. ]
ifFalse: [ distanceVector at: (index - 1) put: (d rowAt: i columnAt: index).
pVector at: (index - 1) put: 0.].
].
].
entropy := self class entropyOf: distanceVector andPRow: pVector withBeta: (beta at: i).
entropyDiff := entropy - logU.
tries := 0.
[ (entropyDiff abs > 1e-5) & (tries < 50)] whileTrue: [
(entropyDiff > 0)
ifTrue: [
betaMin := beta at: i.
((betaMax = Float infinity) | (betaMin = Float infinity negated))
ifTrue: [ beta at: i put: ((beta at: i) * 2) ]
ifFalse: [ beta at: i put: (((beta at: i) + betaMax) / 2)
].
]
ifFalse: [
betaMax := beta at: i.
((betaMax = Float infinity) | (betaMin = Float infinity negated))
ifTrue: [ beta at: i put: ((beta at: i) / 2) ]
ifFalse: [ beta at: i put: (((beta at: i) + betaMin) / 2)
].
].
entropy := self class entropyOf: distanceVector andPRow: pVector withBeta: (beta at: i).
entropyDiff := entropy - logU.
tries := tries + 1.
].
1 to: n do: [ :index |
(index = i) ifFalse: [
(index < i)
ifTrue: [ p rowAt: i columnAt: index put: (pVector at: index) ]
ifFalse: [ p rowAt: i columnAt: index put: (pVector at: index - 1) ].
].
].
].

since every row is independent of other.

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 29, 2019

What do you mean by threads ? threads in the Smalltalk VM ? The VM will not take advantage of this to run your code faster, all Smalltalk processes share the same thread.

@AtharvaKhare

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

I did not know that. So if Pharo is sequential, threads will not help.

@SergeStinckwich

This comment has been minimized.

Copy link
Member

commented May 29, 2019

Pharo is not sequential, but process are green threads all running on the same VM thread. Maybe in the future the VM will manage multi-threads map them to the core but this is not the case right now. The only solution is to run multiple-images.

@SergeStinckwich SergeStinckwich added this to the 1.0 milestone Aug 4, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.