[MXNET-691]Add LabelBot prediction module #11935

YuelinZhang0822 · 2018-07-30T23:27:46Z

Description

This bot will send daily GitHub issue reports with predictions of unlabeled issues.
It contains 2 parts:

Machine Learning part:
A web server built on AWS Elastic Beanstalk that responds to GET/POST requests and maintains itself. It has 2 main features:
- Train models: it automatically retrains the machine learning models every 24 hours using the latest data.
- Predict labels: when it receives a GET/POST request with issue IDs, it sends predictions back.
Send Daily Emails:
An AWS Lambda function that is triggered every day. When it runs, it sends POST requests to the Elastic Beanstalk web server asking for predictions, then generates the email content and sends the email.
@Roshrini @lanking520 @marcoabreu
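
The request/report flow described above could look roughly like the sketch below. It is not the code in this PR: the endpoint URL, the `{"issues": [...]}` payload shape, and the function names `build_prediction_request` and `format_report` are all assumptions made for illustration.

```python
import json
import urllib.request

# Hypothetical endpoint; the real Elastic Beanstalk URL is not given in this PR.
PREDICT_URL = "http://example.invalid/predict"


def build_prediction_request(issue_ids, url=PREDICT_URL):
    """Build (but do not send) a POST request asking for label predictions."""
    # Assumed payload shape: a JSON object with a list of issue IDs.
    payload = json.dumps({"issues": list(issue_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_report(predictions):
    """Turn {issue_id: [labels]} into plain-text lines for the email body."""
    lines = [
        "Issue #{}: {}".format(issue_id, ", ".join(labels) or "(no prediction)")
        for issue_id, labels in sorted(predictions.items())
    ]
    return "\n".join(lines)
```

In a Lambda handler, the request would be sent with `urllib.request.urlopen` and the decoded response passed to `format_report`; sending the email itself is left out of this sketch.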

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
Add unittests
Code is well-documented:
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

marcoabreu · 2018-07-31T14:23:13Z

Committers, please don't merge this PR. We only use this repository for review purposes and will then move this code into a separate repository.

lanking520 · 2018-07-31T18:36:31Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/predict_labels.py

+            l.append("CUDA")
+    return l
+
+def predict(issues):


Can we split this portion into a class? It's not necessary to load the model every time you run a prediction. Instead, we should put loading and updating into a function that we can trigger once auto-training finishes. This will also improve the current prediction speed.

Yes, will write this portion into a class
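
One way to read this suggestion is the sketch below: a class that loads the model once at construction and exposes a `reload()` hook for the training job to call. The class name `Predictor`, the injectable `loader` argument, and the pickle default are assumptions for illustration, not the design that was eventually committed.

```python
import pickle


class Predictor:
    """Hold a trained model in memory and reload it only on request."""

    def __init__(self, model_path, loader=None):
        self.model_path = model_path
        # `loader` is injectable (hypothetical) so the sketch does not
        # depend on one particular serialization format.
        self._loader = loader or self._load_pickle
        self.model = None
        self.reload()

    @staticmethod
    def _load_pickle(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    def reload(self):
        """Reload the model; call this once after auto-training finishes."""
        self.model = self._loader(self.model_path)

    def predict(self, issues):
        # No disk access here: the model is already in memory.
        return self.model.predict(issues)
```

With this shape, the web server keeps a single `Predictor` instance for its lifetime, and the 24-hour training job calls `reload()` when it has written a new model.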

lanking520 · 2018-07-31T18:36:52Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/predict_labels.py

+    recommendations = []
+    for i in range(len(best_n)):
+        l = rule_based(df_test.loc[i, 'title'])
+        l += [le.classes_[best_n[i][j]]  for j in range(-1, -3, -1) if probs[i][best_n[i][j]] > 0.3]


Can we make this threshold configurable from outside?

Will make the threshold an input parameter.
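
A minimal sketch of what a configurable threshold could look like, written in plain Python rather than against the PR's `probs`/`best_n` arrays. The function name `top_labels` and the `top_n` parameter are hypothetical; the defaults of 0.3 and two candidates mirror the hard-coded values in the diff above.

```python
def top_labels(probabilities, classes, threshold=0.3, top_n=2):
    """Return up to `top_n` class names whose probability exceeds `threshold`.

    `probabilities` and `classes` are parallel sequences for one issue.
    """
    # Rank classes from most to least probable and keep the best `top_n`.
    ranked = sorted(zip(probabilities, classes), reverse=True)[:top_n]
    # Keep only the candidates that clear the caller-supplied threshold.
    return [name for prob, name in ranked if prob > threshold]
```

The caller, for example the request handler, can then pass `threshold` from configuration instead of editing the prediction code.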

lanking520 · 2018-07-31T18:37:49Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/train.py

+    LOGGER.info("Start training issues of Operator label")
+    # Step1: Fetch all labeled issues
+    LOGGER.info("Fetching Data..")
+    DF = DataFetcher()


Why do we need to load the data a second time? Can we do the data loading in a single place?

lanking520 · 2018-07-31T18:41:47Z

mxnet-bot/LabelBotPredict/label_bot_send_predictions/LabelBot.py

+
+LOGGER = logging.getLogger(__name__)
+
+"""


Put this down below, under a main guard:

if __name__ == "__main__":
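
For reference, the guard compares the module variable `__name__`, not a quoted string; comparing the two string literals `"__name__"` and `"__main__"` is always false, so the guarded block would never run. A small sketch follows, where `main()` is a hypothetical entry point and not a function from this PR:

```python
import logging

LOGGER = logging.getLogger(__name__)


def main():
    """Hypothetical entry point for running the bot by hand."""
    logging.basicConfig(level=logging.INFO)
    LOGGER.info("Running LabelBot manually")
    return 0


# Runs only when the file is executed directly, not when it is imported
# (for example by AWS Lambda or by a unit test).
if __name__ == "__main__":
    main()
```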

Roshrini · 2018-08-01T21:22:08Z

mxnet-bot/LabelBotPredict/README.md

+* Send Daily Emails: 
+  An AWS Lambda function which will be triggered everyday. 
+  Once this lambda function is executed, it will send POST requests to the Elastic Beanstalk web server asking predictions. 
+  Then it will general email content and send email.


typo: *generate?

Roshrini · 2018-08-01T21:22:25Z

mxnet-bot/LabelBotPredict/README.md

+This bot will send daily [GitHub issue](https://github.com/apache/incubator-mxnet/issues) reports with predictions of unlabeled issues.
+It contains 2 parts:
+* Machine Learning part:
+  A web server built based on AWS Elastic Beanstalk which can response to GET/POST requests and realize self-maintenance. It mainly has 2 features:


Add link to "AWS Elastic Beanstalk"

Roshrini · 2018-08-01T21:30:17Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/DataFetcher.py

+    def __init__(self):
+        self.json_data = None
+
+    def cleanstr(self, s, e):


can you change s, e to be more descriptive?

Roshrini · 2018-08-01T21:33:12Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/DataFetcher.py

+        return filename
+
+
+


remove extra empty lines

Roshrini · 2018-08-01T21:40:15Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/SentenceParser.py

+
+logger = logging.getLogger(__name__)
+
+# English Stopwords


Keep it in a separate file?

Roshrini · 2018-08-01T21:43:43Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/SentenceParser.py

+        self.data[column] = self.data[column].str.replace(self.regex_str[3], ' ')
+        tempcol = self.data[column].values.tolist()
+        for i in range(len(tempcol)):
+            row = BeautifulSoup(tempcol[i], 'html.parser').get_text().lower()


Make sure you add all external dependencies in requirements

Yes, all external dependencies are in requirements.txt

Roshrini · 2018-08-02T23:47:17Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/DataFetcher.py

+        return int(self.cleanstr(response.headers['link'], " ").split()[-3])
+
+    def fetch_issues(self, numbers):
+        # number: a list of issue ids


Can you change "numbers" to a more understandable variable name?

Roshrini · 2018-08-02T23:51:11Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/SentenceParser.py

+        r'(?:\S)'                                                                       # anything else
+    ]
+
+    def __init__(self, loggingLevel = 20):


Add docstring
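
A docstring for this constructor might look like the sketch below. The snake_case parameter name and the constructor body are assumptions; the one fact relied on is that `logging.INFO` has the same value as the literal `20` in the original signature.

```python
import logging


class SentenceParser:
    """Clean and tokenize issue text (sketch of the constructor only)."""

    def __init__(self, logging_level=logging.INFO):
        """Set up the parser.

        Args:
            logging_level: level for this module's logger. logging.INFO
                equals the literal 20 used in the original signature.
        """
        self.logger = logging.getLogger(__name__)
        self.logger.setLevel(logging_level)
        # Filled in later; the attribute name `data` is taken from the diff.
        self.data = None
```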

lanking520 · 2018-08-02T23:54:02Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/DataFetcher.py

+    def __init__(self):
+        self.json_data = None
+
+    def cleanstr(self, string, substring):


clean_str?

lanking520 · 2018-08-02T23:55:08Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/SentenceParser.py

+import logging
+import nltk
+import ssl
+try:


Please add some comments here.

lanking520 · 2018-08-03T00:00:25Z

mxnet-bot/LabelBotPredict/elastic_beanstalk_server/test_sentenceparse.py

+									  'body': " bug ``` import pandas``` ## Environment info", 
+									  'labels': ['Doc']}])
+
+	def tearDown(self):


What is the purpose of this function? Shall we remove it?
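
As background for this question: `unittest.TestCase.tearDown` does nothing by default, so an override is only needed when a test acquires something that must be released (files, connections, temporary directories). A minimal sketch with a hypothetical test class and fixture, not the tests from this PR:

```python
import unittest


class TestSentenceParser(unittest.TestCase):
    def setUp(self):
        # A fresh fixture is built before every test method. Plain Python
        # objects are garbage-collected, so no tearDown override is needed.
        self.rows = [{"title": "a bug", "labels": ["Doc"]}]

    def test_fixture_has_labels(self):
        self.assertEqual(self.rows[0]["labels"], ["Doc"])
```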

Yuelin Zhang added 2 commits July 30, 2018 16:23

upload labelbot_with_predictions

8e61b0f

add license to .yaml file

c48a4be

lanking520 suggested changes Jul 31, 2018

Roshrini suggested changes Aug 1, 2018

add unittests and revise code

0b86254

Roshrini suggested changes Aug 2, 2018

lanking520 suggested changes Aug 3, 2018

revise code based on code review

19781f2

1. set constructor variables for each class 2. add doc strings 3. improve coding style 4. add README

YuelinZhang0822 changed the title ~~[MXNET-691][WIP] Add LabelBot prediction module~~ [MXNET-691]Add LabelBot prediction module Aug 5, 2018

nswamy added the Build, pr-awaiting-review, and Feature labels Aug 9, 2018

YuelinZhang0822 closed this Aug 9, 2018

szha added Feature request and removed Feature labels Nov 14, 2018
