This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-691]Add LabelBot prediction module #11935

Closed
wants to merge 4 commits into from


@YuelinZhang0822 commented on Jul 30, 2018

Description

This bot will send daily GitHub issue reports with predictions of unlabeled issues.
It contains 2 parts:

  • Machine Learning part:
    A web server built on AWS Elastic Beanstalk that responds to GET/POST requests and maintains itself. It has 2 main features:
    • Train models: it automatically retrains the Machine Learning models every 24 hours using the latest data.
    • Predict labels: once it receives a GET/POST request with issue IDs, it sends predictions back.
  • Send Daily Emails:
    An AWS Lambda function that is triggered every day. Once this Lambda function is executed, it sends POST requests to the Elastic Beanstalk web server asking for predictions. Then it generates the email content and sends the email.

@Roshrini @lanking520 @marcoabreu
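
A minimal sketch of the email step described above, assuming the web server accepts a JSON body with an `issues` list and answers with a list of `{"number": ..., "predictions": [...]}` objects. The endpoint URL, the payload shape, and the names `fetch_predictions` and `build_email_body` are hypothetical and do not come from this PR.

```python
import json
from urllib import request


def fetch_predictions(issue_ids, endpoint_url, opener=request.urlopen):
    """POST issue IDs to the (hypothetical) prediction endpoint and return its JSON reply."""
    payload = json.dumps({"issues": issue_ids}).encode("utf-8")
    req = request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # `opener` is injectable so the function can be exercised without a network.
    with opener(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_email_body(predictions):
    """Turn the prediction list into a plain-text report, one issue per line."""
    lines = ["Unlabeled issues and predicted labels:"]
    for item in predictions:
        labels = ", ".join(item["predictions"]) or "(no prediction)"
        lines.append("#{}: {}".format(item["number"], labels))
    return "\n".join(lines)
```

Keeping the request and the formatting in separate functions means the report text can be unit-tested without calling the server.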

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • Add unittests
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@marcoabreu
Contributor

Committers, please don't merge this PR. We only use this repository for review purposes and will then move this code into a separate repository.

l.append("CUDA")
return l

def predict(issues):
Member

Can we split this portion into a class? It's not necessary to load the model every time you do a prediction. Instead, we should move loading and updating into a function that we can trigger once the auto-training is finished. This will also improve the current prediction speed.

Author

Yes, will write this portion into a class
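
One way the suggested class could look, as a sketch only: the model is loaded once in the constructor and again only when `reload()` is called, for example after the auto-training job finishes. The class name `Predictor`, the pickle-based default loader, and the assumption that the model exposes a `predict` method are all hypothetical; the PR's actual model format is not shown in this thread.

```python
import pickle


class Predictor:
    """Holds a loaded model so that predict() does not reload it on every call."""

    def __init__(self, model_path, loader=None):
        self._model_path = model_path
        # The loader is injectable; by default the model is assumed to be a pickle file.
        self._loader = loader or self._load_pickle
        self._model = None
        self.reload()

    @staticmethod
    def _load_pickle(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)

    def reload(self):
        """Re-read the model from disk; call this once retraining has finished."""
        self._model = self._loader(self._model_path)

    def predict(self, issues):
        """Delegate to the in-memory model without touching the disk."""
        return self._model.predict(issues)
```

With this shape, the training job only needs a reference to the `Predictor` instance to trigger `reload()` when new model files are ready.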

recommendations = []
for i in range(len(best_n)):
    l = rule_based(df_test.loc[i, 'title'])
    l += [le.classes_[best_n[i][j]] for j in range(-1, -3, -1) if probs[i][best_n[i][j]] > 0.3]
Member

Can we have this threshold configurable outside?

Author

Will set the threshold as the input
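
A sketch of what a configurable threshold could look like, written in plain Python rather than with the `probs` and `best_n` arrays from the diff. `top_labels`, `threshold`, and `max_labels` are hypothetical names; the default of 0.3 and the limit of two labels mirror the quoted line, on the assumption that `best_n` holds class indices sorted by ascending probability.

```python
def top_labels(probs, classes, threshold=0.3, max_labels=2):
    """Return up to `max_labels` class names whose probability exceeds `threshold`.

    `probs` and `classes` are parallel sequences for a single issue.
    """
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return [classes[k] for k in ranked[:max_labels] if probs[k] > threshold]
```

A caller can then pass a stricter or looser cutoff, for example `top_labels(probs, classes, threshold=0.5)`, without editing the prediction code.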

LOGGER.info("Start training issues of Operator label")
# Step1: Fetch all labeled issues
LOGGER.info("Fetching Data..")
DF = DataFetcher()
Member

Why do we need to load the data a second time? Can we load the data in a single place?


LOGGER = logging.getLogger(__name__)

"""
Member

Put this down below:

if __name__ == "__main__":
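
For reference, a small sketch of the entry-point guard. `__name__` must be the bare module variable: written as the string literal `"__name__"`, the comparison is between two different constant strings and is always false, so the guarded code would never run. The `main` function and its log message are placeholders, not code from this PR.

```python
import logging

LOGGER = logging.getLogger(__name__)


def main():
    """Placeholder for the script's top-level work (hypothetical)."""
    logging.basicConfig(level=logging.INFO)
    LOGGER.info("Start training")
    return 0


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```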

* Send Daily Emails:
An AWS Lambda function which will be triggered everyday.
Once this lambda function is executed, it will send POST requests to the Elastic Beanstalk web server asking predictions.
Then it will general email content and send email.
Member

typo: *generate?

This bot will send daily [GitHub issue](https://github.com/apache/incubator-mxnet/issues) reports with predictions of unlabeled issues.
It contains 2 parts:
* Machine Learning part:
A web server built based on AWS Elastic Beanstalk which can response to GET/POST requests and realize self-maintenance. It mainly has 2 features:
Member

Add link to "AWS Elastic Beanstalk"

def __init__(self):
    self.json_data = None

def cleanstr(self, s, e):
Member

can you change s, e to be more descriptive?

return filename



Member

remove extra empty lines


logger = logging.getLogger(__name__)

# English Stopwords
Member

Keep it in a separate file?
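
If the stopwords do move to their own file, loading them could look roughly like this. The one-word-per-line format, the `#` comment convention, and the function names are assumptions made for illustration; the PR does not specify a file format.

```python
def parse_stopwords(lines):
    """Build a stopword set from an iterable of lines, skipping blanks and '#' comments."""
    words = (line.strip().lower() for line in lines)
    return frozenset(w for w in words if w and not w.startswith("#"))


def load_stopwords(path):
    """Read stopwords from a UTF-8 text file with one word per line (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return parse_stopwords(handle)
```

Returning a `frozenset` keeps membership checks fast and prevents accidental modification of the shared list.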

self.data[column] = self.data[column].str.replace(self.regex_str[3], ' ')
tempcol = self.data[column].values.tolist()
for i in range(len(tempcol)):
    row = BeautifulSoup(tempcol[i], 'html.parser').get_text().lower()
Member

Make sure you add all external dependencies in requirements

Author

Yes, all external dependencies are in requirements.txt

    return int(self.cleanstr(response.headers['link'], " ").split()[-3])

def fetch_issues(self, numbers):
    # numbers: a list of issue ids
Member

Can you change "numbers" to more understandable variable name?

Author

Done
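
The `return int(... response.headers['link'] ...)` line quoted in this thread appears to read a page count out of GitHub's pagination `Link` header by splitting on fixed positions. As a hedged alternative, the sketch below pulls the `page` number out of the `rel="last"` entry with a regular expression. The function name `last_page` and the fallback of 1 when no such entry exists are assumptions, not part of this PR.

```python
import re

# Matches the URL tagged rel="last" and captures its `page` query parameter.
_LAST_PAGE = re.compile(r'<[^>]*[?&]page=(\d+)[^>]*>;\s*rel="last"')


def last_page(link_header):
    """Return the last page number from a Link header, or 1 if none is advertised."""
    match = _LAST_PAGE.search(link_header or "")
    return int(match.group(1)) if match else 1
```

Matching on `rel="last"` avoids depending on the order of the entries in the header.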

r'(?:\S)' # anything else
]

def __init__(self, loggingLevel = 20):
Member

Add docstring

Author

Done

def __init__(self):
    self.json_data = None

def cleanstr(self, string, substring):
Member

clean_str?

import logging
import nltk
import ssl
try:
Member

Some comments in here

Author

Done

'body': " bug ``` import pandas``` ## Environment info",
'labels': ['Doc']}])

def tearDown(self):
Member

What is the purpose of this function, shall we remove?

Contributor

+1

Author

Removed

1. set constructor variables for each class
2. add docstrings
3. improve coding style
4. add README
@YuelinZhang0822 changed the title from "[MXNET-691][WIP] Add LabelBot prediction module" to "[MXNET-691]Add LabelBot prediction module" on Aug 5, 2018
@nswamy added the Build, pr-awaiting-review (PR is waiting for code review), and Feature labels on Aug 9, 2018

6 participants