Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Refactor documentation API Reference for gensim.summarization #1709

Merged
merged 29 commits into from Dec 12, 2017

Conversation

yurkai
Copy link
Contributor

@yurkai yurkai commented Nov 12, 2017

Fix #1668.

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! Thanks @yurkai, let's continue your work.

"""Text Cleaner

This module contains functions and processors used for processing text,
extracting sentences from text, working with acronyms and abbreviations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add examples/highlights/motivation here (after you finish with docstrings in this file).

RE_SENTENCE = re.compile(r'(\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)', re.UNICODE)
"""str: special separator used in abbreviations."""
RE_SENTENCE = re.compile(r'(\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)', re.UNICODE) # backup (\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)
"""SRE_Pattern: pattern to split text to sentences."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem with building here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



def split_sentences(text):
"""Splits and returns list of sentences from given text. It preserves
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples section should be nice (here and everywhere).


Returns
-------
str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't match with return type

Input text.
separator : str
The separator between words to be replaced.
regexs : str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't match

----------
words : list
List of words.
separator : str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str -> str, optional

words : list
List of words.
separator : str
The separator bertween elements. Blank set as default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank? I see " ", not ""

----------
text : list
Input text.
deacc : bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool, option - here and everywhere


Parameters
----------
text : list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type doesn't match


Parameters
----------
text : list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type doesn't match

@menshikh-iv menshikh-iv changed the title Added docstrings in textcleaner.py [WIP] Refactor documentation API Reference for gensim.summarization Nov 13, 2017
@menshikh-iv menshikh-iv added this to In Progress in Documentation Nov 13, 2017
@menshikh-iv menshikh-iv moved this from In Progress to Yury (in progress) in Documentation Nov 13, 2017
Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for the small part (I'll review fully later)


Parameters
----------
sequence : list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list of ?

Returns
-------
Graph
Created graph.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graph produced by sequence


Returns
-------
Graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concrete link to type like

:class:`~gensim. ... ...

here and everywhere (for "gensim-defined" types).

from gensim.summarization.graph import Graph


def build_graph(sequence):
"""Creates and returns graph with given sequence of values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's "type" of graph (oriented, etc)?

@@ -15,6 +51,15 @@ def build_graph(sequence):


def remove_unreachable_nodes(graph):
"""Removes unreachable nodes (nodes with no edges). Works inplace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. Works inplace. -> , inplace.

@param node: Node identifier
Parameters
----------
node : str or float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float?

@attention: While nodes can be of any type, it's strongly recommended
"""Adds given node to the graph.

Note
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note -> Warning

----------
text : str
Sequence of values.
ratio : float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is optional too

If no "words" option is selected, the number of sentences is
reduced by the provided ratio, else, the ratio is ignored.
words : list
.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add descriptions to parameters

@@ -3,20 +3,72 @@
#
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html

"""This module contains function of computing BM25 scores for documents in
corpus and helper class `BM25` used in calculations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed link to BM25 algorithm (wiki for example)

@menshikh-iv
Copy link
Contributor

Nice work @yurkai 👍, pay attention to my commits and continue your work 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Documentation
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants