Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Suffix Tree #290

Open
prshnt19 opened this issue Jun 23, 2020 · 15 comments · May be fixed by #323
Open

Add Suffix Tree #290

prshnt19 opened this issue Jun 23, 2020 · 15 comments · May be fixed by #323
Labels
Milestone

Comments

@prshnt19
Copy link
Member

Description of the problem

Example of the problem

References/Other comments

[1] https://en.wikipedia.org/wiki/Suffix_tree

@prshnt19 prshnt19 added this to the v0.0.1 milestone Jun 23, 2020
@subhangi2731
Copy link

can you please assign this issue to me? @prshnt19

@prshnt19
Copy link
Member Author

prshnt19 commented Dec 7, 2020

@czgdp1807
Copy link
Member

Thanks @prshnt19 for sharing the issue policy.
@subhangi2731 I'd request you to go through https://github.com/codezonediitj/pydatastructs/wiki/Plan-of-Action-for-Adding-New-Data-Structures since requires adding a new adding new data structure from scratch.

@Arvind-raj06
Copy link
Member

Let me take up this one @czgdp1807

@czgdp1807
Copy link
Member

Sure.

@Arvind-raj06
Copy link
Member

Arvind-raj06 commented Jan 20, 2021

suffix
I believe the API design would be like the below part

__new__(cls, array_flag = 0, key=None, root_data=None)

This would initialise a new suffix tree.

Rest of the parameters will be same as other tree implementation.

insert(self, key, data)
delete(self, key, **kwargs)
__str__(self)

height(self)
Will return the height of the tree.

Additional features

longest_common_substring
This method will return the longest common substring from the string comparison with tree

longest_repeated_substring
This method will return the longest repeated substring from the string comparison with tree

@czgdp1807
Copy link
Member

Can you pick example where suffix tree is used and show how above APIs will be used in that example? There might be an example on Wikipedia or some other website.

@Arvind-raj06
Copy link
Member

Arvind-raj06 commented Jan 20, 2021

https://www.cise.ufl.edu/~sahni/dsaaj/enrich/c16/suffix.htm hope in this site we get the example with analyzation
and implementation example is as https://www.geeksforgeeks.org/ukkonens-suffix-tree-construction-part-6/?ref=rp

@Arvind-raj06
Copy link
Member

As suffix tree isn't like any other binary tree i hope it has to be implemented in separate file and shall I go on with that @czgdp1807

@czgdp1807
Copy link
Member

Following are the questions and concerns,

  1. Is Suffix tree static or dynamic? In other words, is it possible to update it after constructing it from a string/set of strings? Or once created, it can be changed? Can you provide university lectures/papers which provide algorithms updating a suffix tree. I am unable to find anything relevant on Wikipedia or in this document.
  2. As of now we should go for implementing the applications in https://en.wikipedia.org/wiki/Suffix_tree#Applications as methods of the SuffixTree class. Later on once we suffix tree is added, more features can be added given in, https://en.wikipedia.org/wiki/Suffix_tree#Implementation

Comments on API

new(cls, array_flag = 0, key=None, root_data=None)

I feel that this is a bit complicated. Most of the literature writes about creating suffix trees from a set of strings. So, the API should be as simple as,

__new__(cls, strings)

strings should be an iterable(list, set, tuple, arrays, etc).

insert(self, key, data)
delete(self, key, **kwargs)

This depends on whether we want to keep static or dynamic. If there are algorithms for updating a dynamic trees then we can go for it. However, what is meant by key? Shouldn't it be simply a string.

height(self)
Will return the height of the tree.

Why do we need it? Mostly suffix trees are used to perform queries on strings, so this method may not provide very useful information. Any examples where height can be required by the end user?

longest_common_substring
This method will return the longest common substring from the string comparison with tree

longest_repeated_substring
This method will return the longest repeated substring from the string comparison with tree

They both should accept strings (set of strings or a single string should be decided upon). More features given in https://en.wikipedia.org/wiki/Suffix_tree#Applications should also be added.

@Arvind-raj06
Copy link
Member

As far my research on the above tree after the conversation
!. It is static and cannot changed after implementation as given in reference with https://users.dcc.uchile.cl/~gnavarro/ps/cpm08.2.pdf
2. Cool then just the implementation of suffix tree

I will take those changes and reframe the API

@Arvind-raj06
Copy link
Member

suffix

__new__(cls, strings)

The implementation of several features are quite complicated I hope it should be done step by step.

@czgdp1807
Copy link
Member

Sounds good. Though, we should have generalised suffix tree which should be capable of storing information about a set of strings. The tricky part is to end each string in the set with a different terminal symbol.

The implementation of several features are quite complicated I hope it should be done step by step.

You can start with implementing the logic for building the suffix tree and then add features in each new commit. Please do it locally and push only when the tests you add for suffix tree are passing.

@Arvind-raj06
Copy link
Member

@czgdp1807 Cool, I'm working on that

@Arvind-raj06 Arvind-raj06 linked a pull request Jan 31, 2021 that will close this issue
@subhangi2731
Copy link

If this issue is open can u please assign me

@czgdp1807 czgdp1807 removed the level3 label Dec 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants