Input for Single-Doc Summarization #113

johnhutx · 2021-11-18T18:55:30Z

Hello, is it possible to provide a list of (already split) sentences as the source input to the summarizer, as opposed to a single source document? The goal is to treat each list of sentences as one long sequence during extractive summarization.

niansong1996 · 2021-11-18T21:57:23Z

Hi, I am not sure if I understand it correctly. Do you want to provide a List[List[str]] where the first layer is the list of documents and the second layer is the list of sentences in that document?

If so, what's the level of extraction for the extractive summarization?

johnhutx · 2021-11-18T22:37:30Z

Yes, I would want to provide a List[List[str]]. Instead of extracting at the sentence level, I would like to extract at the List[str] level, so that each extraction selects a whole group of sentences.

niansong1996 · 2021-11-24T04:43:14Z

Hi @johnhutx, I am not sure I understand the situation correctly. Could you give an example? Is it possible to merge the inner list and use the current API instead?

johnhutx · 2021-11-24T05:58:05Z

No problem @niansong1996. Consider a TV screenplay that contains multiple scenes List[scenes]. I would like to extract the important scenes instead of sentences from the screenplay. Each scene usually contains multiple sentences, which can be represented as a List[str]. The goal is to extract the scenes (List[str]) from the screenplay (List[List[str]]).
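To make the shape concrete, here is a minimal sketch of that input. The type aliases, the helper name `scenes_to_units`, and the toy scenes are all made up for illustration and are not part of the library's API.

```python
from typing import List

Scene = List[str]          # one scene: a list of already-split sentences
Screenplay = List[Scene]   # one screenplay: a list of scenes


def scenes_to_units(screenplay: Screenplay) -> List[str]:
    """Join each scene's sentences into one string.

    After this step every scene is a single extraction unit,
    so an extractive model picks whole scenes, not sentences.
    """
    return [
        " ".join(sentence.strip() for sentence in scene if sentence.strip())
        for scene in screenplay
    ]


# Toy data, invented for this example.
screenplay: Screenplay = [
    ["INT. KITCHEN - MORNING.", "Anna pours coffee."],
    ["EXT. STREET - DAY.", "Ben waits for the bus.", "It never comes."],
]

units = scenes_to_units(screenplay)
```

Here `units` has one string per scene, in the original scene order, so an index into `units` maps straight back to a scene in `screenplay`.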

niansong1996 · 2021-11-25T03:38:16Z

Thanks for the clarification, it's much clearer to me now.

If you would like to extract scenes using summarization, I assume there is no query? Does this mean that the model would need to figure out which scenes are more important than others?

For the task you described, I think the best choice is probably to subclass our LexRank model here and customize it (change L27-40). Our implementation splits the document into a list of sentences, but you could instead pass in a list of scenes, where each scene is the concatenation of its sentences.
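As a rough illustration of ranking scenes instead of sentences, below is a standalone, LexRank-style sketch: PageRank-like power iteration over a TF-IDF cosine-similarity graph whose nodes are scenes. It is not the library's LexRank class and does not use its API. The function names, the whitespace tokenizer, and the `damping` and `iters` defaults are all assumptions chosen for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(units: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors using a naive whitespace tokenizer."""
    docs = [Counter(u.lower().split()) for u in units]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_scenes(screenplay: List[List[str]], top_k: int = 1,
                damping: float = 0.85, iters: int = 50) -> List[int]:
    """Return indices of the top_k most central scenes, in screenplay order."""
    # Each scene becomes one unit: the concatenation of its sentences.
    units = [" ".join(scene) for scene in screenplay]
    n = len(units)
    if n == 0:
        return []
    vecs = tfidf_vectors(units)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalize; a scene similar to nothing spreads its weight uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * trans[j][i] for j in range(n))
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return sorted(top)


# Toy screenplay, invented for this example.
toy_screenplay = [
    ["the detective finds the stolen painting", "the painting is fake"],
    ["the detective questions the gallery owner about the painting"],
    ["a dog barks outside"],
]
selected = rank_scenes(toy_screenplay, top_k=2)
selected_scenes = [toy_screenplay[i] for i in selected]
```

Because the function returns scene indices, the caller gets each selected scene back as its original List[str], which matches the List[List[str]] input described earlier in the thread.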

Hope this is helpful.

johnhutx · 2021-11-25T20:12:40Z

Thank you for the clarification.

niansong1996 added the "question" label (Further information is requested) on Nov 18, 2021

niansong1996 closed this as completed Dec 16, 2021
