Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SignatureVectorizer (iisignature) #108

Merged
merged 8 commits into from Nov 25, 2022

Conversation

jh83775
Copy link
Contributor

@jh83775 jh83775 commented Nov 5, 2022

I've implemented SignatureVectorizer, which returns the path signatures for a collection of paths.

This vectorizer essentially wraps the iisignature package such that it fits into the standard sklearn style fit_transform pipeline. While it does require iisignature, the imports are written such that the rest of the library can still be used if the user does not have iisignature installed.

For more details on the path signature technique, I've found this paper quite instructive:
A Primer on the Signature Method in Machine Learning (Chevyrev, I.)

@codecov-commenter
Copy link

codecov-commenter commented Nov 5, 2022

Codecov Report

Merging #108 (d85ceb1) into master (80f5319) will increase coverage by 0.28%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
+ Coverage   90.85%   91.13%   +0.28%     
==========================================
  Files          32       34       +2     
  Lines        4649     4774     +125     
==========================================
+ Hits         4224     4351     +127     
+ Misses        425      423       -2     
Impacted Files Coverage Δ
vectorizers/__init__.py 100.00% <100.00%> (ø)
vectorizers/signature_vectorizer.py 100.00% <100.00%> (ø)
vectorizers/tests/test_signature_vectorizer.py 100.00% <100.00%> (ø)
vectorizers/preprocessing.py 88.51% <0.00%> (+0.95%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@lmcinnes
Copy link
Contributor

lmcinnes commented Nov 6, 2022

This looks like a good addition. Would you be willing to add some basic tests, and possibly even a tutorial guide for the documentation?

Also is it worth adding iisignature to the requirements?

@jh83775
Copy link
Contributor Author

jh83775 commented Nov 7, 2022

No problem, I should have some time over the next few weeks to add tests and tutorial guides.

Happy to add iisignature to the requirements too - the current implementation was in case we wanted to avoid having too many dependencies, but if that's not an issue I'll add it in directly.

@lmcinnes
Copy link
Contributor

lmcinnes commented Nov 8, 2022

Thanks -- I know the tests and documentation are a lot of extra work, but they are very valuable for maintenance, and for ensuring users can find and use the feature, so it will definitely be worth the effort in the long run.

As far as dependencies go ... I haven't checked what iisignature requires. If it is reasonably self contained then one extra dependency is probably fine for now.

@jh83775
Copy link
Contributor Author

jh83775 commented Nov 21, 2022

I have added an example notebook and some tests for the SignatureVectorizer. I'll add more functionality (e.g. common path transforms) and more examples in the future too, but this should suffice as a good starting point.

I had some trouble getting the automated tests to work. For some reason including iisignature in the requirements file broke something (error messages suggested it was being installed before numpy and thus breaking since numpy is a dependancy?). I've commented out iisignature in the requirements.txt for now, and added it explicitly to azure-pipelines.yml, which seems to work. Do you have any suggestions? I suspect there might be something trivial which I've missed.
Edit: Seems like iisignature imports numpy in its setup.py file, and this was causing problems in the automated test pipelines. I've re-added iisignature to requirements.txt and put an explicit pip install numpy in azure-pipelines.yml just before the pip install -r requirements.txt, just so its definitely present on the system before that call. I think this resolves the issue I was having with minimal changes to the existing code.

…l to azure-pipelines.yml, before it installs requirements.txt
@lmcinnes
Copy link
Contributor

Thanks for this -- as I say it should be very valuable for a great many people and is a great complement to the techniques we had so far. I'm looking forward to putting this to use myself.

@lmcinnes lmcinnes merged commit 51af4a7 into TutteInstitute:master Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants