
Add Functionalities #75

Open
parantak wants to merge 16 commits into master

Conversation

parantak
Contributor

  1. Added a preprocessing class. @someshsingh22 and @rajaswa, please check whether this implementation would work.
  2. Added Optimal String Alignment (OSA) as an option to Levenshtein, and made minor code changes to the original implementation.
  3. Made minor documentation additions/changes.

It satisfies the triangle inequality, so it qualifies as a true metric:
1. It introduces a transposition edit cost.
2. It is not restricted by the assumption that every subsequence can only be edited once.
Update with Damerau-Levenshtein tests.
Different from Damerau-Levenshtein: this one carries the restrictive assumption that each subsequence is edited at most once (see the sketch after these commit notes).
Minor documentation fix.
See if this structure works, and then we can start implementing the filters.
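For context on the OSA option, here is a minimal sketch of an OSA distance with the transposition edit. The helper name and structure are illustrative assumptions and may differ from the actual implementation in decepticonlp/metrics/char_metrics.py:

```python
# Hypothetical helper for illustration; not necessarily the PR's actual API.
def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance: Levenshtein edits plus an
    adjacent-character transposition, with the restriction that no
    substring is edited more than once."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters (the "swap" edit).
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Damerau-Levenshtein differs in that it allows an already-transposed substring to be edited again, which is what restores the triangle inequality.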
@sonarcloud

sonarcloud bot commented May 22, 2020

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities, 0 Security Hotspots to review)
Code Smells: A (0 Code Smells)

No Coverage information
No Duplication information

@codecov

codecov bot commented May 22, 2020

Codecov Report

Merging #75 into master will increase coverage by 0.08%.
The diff coverage is 96.55%.


@@            Coverage Diff             @@
##           master      #75      +/-   ##
==========================================
+ Coverage   94.91%   95.00%   +0.08%     
==========================================
  Files           8       10       +2     
  Lines         334      360      +26     
==========================================
+ Hits          317      342      +25     
- Misses         17       18       +1     
Impacted Files                                   Coverage Δ
decepticonlp/preprocessing/preprocessing.py      95.23%  <95.23%>  (ø)
decepticonlp/__init__.py                        100.00% <100.00%>  (ø)
decepticonlp/metrics/char_metrics.py            100.00% <100.00%>  (ø)
decepticonlp/preprocessing/__init__.py          100.00% <100.00%>  (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64f9991...05597a1. Read the comment docs.

@someshsingh22 (Member) left a comment

Preprocessing is perfect, but I don't know whether OSA is used in the literature, since it is not a true metric.
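The standard counterexample makes this concrete; using the hypothetical osa_distance sketch above:

```python
print(osa_distance("ca", "ac"))   # 1 (one transposition)
print(osa_distance("ac", "abc"))  # 1 (one insertion)
print(osa_distance("ca", "abc"))  # 3, which exceeds 1 + 1, so the triangle inequality fails
```

Damerau-Levenshtein gives 2 for the last pair, which is why it is a true metric while OSA is not.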

@someshsingh22
Member

Also, I would recommend making PRs from separate branches when you are making several unrelated changes. For example, suppose we wish to continue with preprocessing but not OSA; you would then have to roll back changes. These changes are completely parallel, so you could instead make two branches, osa and preprocessing, and open separate PRs.

@parantak
Contributor Author

@someshsingh22, I know it isn't a true metric, but it captures a swap perfectly (in our case, the swap perturbation) without allowing that subsequence to be edited any further, which is why I included it as an option in Levenshtein itself and not as a separate metric.
As for a complete metric, I have already included Damerau-Levenshtein. Tell me if you still want me to remove it, and I'll roll back the relevant changes.
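As an illustration of the swap point: an adjacent transposition costs a single edit under OSA but two under plain Levenshtein (again using the hypothetical sketch above):

```python
# "hello" -> "hlelo" is one adjacent swap.
print(osa_distance("hello", "hlelo"))  # 1 under OSA; plain Levenshtein gives 2
```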

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues. None yet.
2 participants