Input for Single-Doc Summarization #113

Closed
johnhutx opened this issue Nov 18, 2021 · 6 comments
Labels
question Further information is requested

Comments

@johnhutx

Hello, is it possible to provide a list of (already split) sentences as the source input to the summarizer, instead of a single source document? The goal is to treat each list of sentences as one long sequence during extractive summarization.

@niansong1996
Collaborator

Hi, I am not sure if I understand it correctly. Do you want to provide a List[List[str]] where the first layer is the list of documents and the second layer is the list of sentences in that document?

If so, what's the level of extraction for the extractive summarization?

@niansong1996 added the question (Further information is requested) label on Nov 18, 2021
@johnhutx
Author

Yes, I would want to provide a List[List[str]]. Instead of extracting at the sentence level, I would like to extract at the List[str] level, so that each extracted unit is a group of sentences.

@niansong1996
Collaborator

Hi @johnhutx, I am not sure if I understand the situation correctly. Can you give an example? Is it possible to merge the inner list and use the current API instead?

@johnhutx
Author

No problem @niansong1996. Consider a TV screenplay that contains multiple scenes (List[scene]). I would like to extract the important scenes from the screenplay rather than individual sentences. Each scene usually contains multiple sentences, so it can be represented as a List[str]. The goal is to extract scenes (List[str]) from the screenplay (List[List[str]]).
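
To make the shapes concrete, here is a small sketch of the input described above. The type aliases `Scene` and `Screenplay`, the helper `scenes_to_units`, and the toy text are all made up for illustration; none of them come from the library.

```python
from typing import List

# Hypothetical aliases, used only to name the shapes discussed in this thread.
Scene = List[str]          # one scene: a list of already-split sentences
Screenplay = List[Scene]   # one screenplay: a list of scenes

# Made-up toy data, for illustration only.
screenplay: Screenplay = [
    ["INT. KITCHEN - MORNING.", "Anna pours coffee.", "The phone rings."],
    ["EXT. STREET - DAY.", "Anna runs for the bus."],
]


def scenes_to_units(doc: Screenplay) -> List[str]:
    """Join each scene's sentences so that one scene becomes one extraction unit."""
    return [" ".join(scene) for scene in doc]


units = scenes_to_units(screenplay)
print(len(units))  # one unit per scene
```

With this representation, an extractive method that normally ranks sentences could rank the joined scene strings instead, and the selected indices map back to the original List[str] scenes.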

@niansong1996
Collaborator

Thanks for the clarification, it's much clearer to me now.

If you would like to extract scenes using summarization, I assume there is no query? Does this mean that the model would need to figure out which scenes are more important than others?

For the task you described, I think the best choice is probably to subclass our LexRank model here and customize it (change L27-40). Our implementation splits the document into a list of sentences, but you could instead pass in a list of scenes, where each scene is the concatenation of its sentences.
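
As a rough illustration of that idea, the sketch below ranks scenes rather than sentences. It is not the library's LexRank class or its API: it is a self-contained, simplified LexRank-style ranker (TF-IDF cosine similarity plus a damped power iteration) using only the standard library, and every name in it (`tokenize`, `tfidf_vectors`, `cosine`, `rank_scenes`, `top_k`, `damping`, `iterations`) is a hypothetical choice made for this example.

```python
import math
from collections import Counter
from typing import Dict, List

PUNCT = ".,!?;:\"'()"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(units: List[str]) -> List[Dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each unit."""
    docs = [Counter(tokenize(u)) for u in units]
    n = len(docs)
    # Iterating a Counter yields each distinct term once, so this is document frequency.
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_scenes(screenplay: List[List[str]], top_k: int = 1,
                damping: float = 0.85, iterations: int = 50) -> List[int]:
    """Return indices of the top_k most central scenes, in original order."""
    # Each scene (List[str]) is concatenated into one unit, as suggested above.
    units = [" ".join(scene) for scene in screenplay]
    n = len(units)
    if n == 0:
        return []
    vecs = tfidf_vectors(units)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalize similarities; a scene similar to nothing gets a uniform row.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * trans[j][i] for j in range(n))
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return sorted(top)


# Made-up toy screenplay: two related scenes and one unrelated scene.
toy = [
    ["The detective finds the stolen painting."],
    ["The detective returns the stolen painting to the museum."],
    ["A dog barks outside."],
]
selected = rank_scenes(toy, top_k=2)
extracted_scenes = [toy[i] for i in selected]  # each item is a List[str]
```

Because the ranker returns indices, the extracted output keeps the original List[str] form of each scene rather than the concatenated string used for scoring.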

Hope this is helpful.

@johnhutx
Author

Thank you for the clarification.
