[MXNET-691]Add LabelBot prediction module #11935
Committers, please don't merge this PR. We only use this repository for review purposes and will then move this code into a separate repository.
        l.append("CUDA")
    return l

def predict(issues):
Can we split this portion into a class? It's not necessary to load the model every time you run a prediction. Instead, we should put the load and update logic into a function so that we can trigger it once auto-training has finished. This will also improve the prediction speed you have now.
Yes, I will move this portion into a class.
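For illustration, a minimal sketch of the kind of class discussed here. The names `LabelPredictor`, `load_model`, and `reload` are hypothetical and not taken from the PR; the loader is passed in as a callable so the sketch assumes nothing about how the model is actually stored.

```python
class LabelPredictor:
    """Hypothetical wrapper that loads a model once and reuses it."""

    def __init__(self, load_model):
        # load_model: any callable returning an object with a predict() method
        # (an assumption made for this sketch).
        self._load_model = load_model
        self._model = None

    def reload(self):
        """(Re)load the model, e.g. after auto-training has finished."""
        self._model = self._load_model()

    def predict(self, issues):
        """Predict labels for issues, loading the model only on first use."""
        if self._model is None:
            self.reload()
        return self._model.predict(issues)
```

With this shape, repeated `predict` calls reuse the loaded model, and the training job only needs to call `reload()` when a new model is available.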
recommendations = []
for i in range(len(best_n)):
    l = rule_based(df_test.loc[i, 'title'])
    l += [le.classes_[best_n[i][j]] for j in range(-1, -3, -1) if probs[i][best_n[i][j]] > 0.3]
Can we make this threshold configurable from outside?
I will make the threshold an input parameter.
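A minimal sketch of the selection step with the threshold passed in as an argument. `top_labels`, `top_n`, and the plain-list inputs are assumptions for illustration; the PR itself indexes into `le.classes_` using `best_n` and `probs`.

```python
def top_labels(probs, classes, threshold=0.3, top_n=2):
    """Return up to top_n class names whose probability exceeds threshold.

    probs and classes are parallel sequences for a single issue
    (a simplification of the arrays used in the PR).
    """
    # Indices ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda idx: probs[idx], reverse=True)
    return [classes[idx] for idx in ranked[:top_n] if probs[idx] > threshold]
```

The default of 0.3 matches the literal in the diff, and callers can override it without touching the function body.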
LOGGER.info("Start training issues of Operator label")
# Step1: Fetch all labeled issues
LOGGER.info("Fetching Data..")
DF = DataFetcher()
Why do we need to load the data a second time? Can we do the data loading in a single place?

LOGGER = logging.getLogger(__name__)

"""
Put this down below, under:
if __name__ == "__main__":
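As a sketch of the layout this suggests, assuming the script's top-level steps can be wrapped in a function (the name `main` and its return value are hypothetical):

```python
import logging

LOGGER = logging.getLogger(__name__)


def main():
    """Hypothetical entry point holding the script's top-level steps."""
    LOGGER.info("Start training issues of Operator label")
    # ... fetch data, train, save the model ...
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    logging.basicConfig(level=logging.INFO)
    main()
```

Note that `__name__` is compared as a variable; quoting it would compare two different string literals and the guarded block would never run.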
mxnet-bot/LabelBotPredict/README.md
Outdated
* Send Daily Emails:
An AWS Lambda function which will be triggered everyday.
Once this lambda function is executed, it will send POST requests to the Elastic Beanstalk web server asking predictions.
Then it will general email content and send email.
typo: *generate?
mxnet-bot/LabelBotPredict/README.md
Outdated
This bot will send daily [GitHub issue](https://github.com/apache/incubator-mxnet/issues) reports with predictions of unlabeled issues.
It contains 2 parts:
* Machine Learning part:
A web server built based on AWS Elastic Beanstalk which can response to GET/POST requests and realize self-maintenance. It mainly has 2 features:
Add link to "AWS Elastic Beanstalk"
    def __init__(self):
        self.json_data = None

    def cleanstr(self, s, e):
Can you rename s and e to something more descriptive?
        return filename
|
||
|
||
|
remove extra empty lines

logger = logging.getLogger(__name__)

# English Stopwords
Keep it in a separate file?
self.data[column] = self.data[column].str.replace(self.regex_str[3], ' ')
tempcol = self.data[column].values.tolist()
for i in range(len(tempcol)):
    row = BeautifulSoup(tempcol[i], 'html.parser').get_text().lower()
Make sure you add all external dependencies in requirements
Yes, all external dependencies are in requirements.txt.
        return int(self.cleanstr(response.headers['link'], " ").split()[-3])

    def fetch_issues(self, numbers):
        # number: a list of issue ids
Can you change "numbers" to a more understandable variable name?
Done
        r'(?:\S)' # anything else
    ]

    def __init__(self, loggingLevel = 20):
Add docstring
Done
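A minimal sketch of a documented constructor along these lines. The class name `ExampleParser` and the logger attribute are hypothetical; only the `loggingLevel` parameter and its default of 20 come from the diff.

```python
import logging


class ExampleParser:
    """Hypothetical stand-in for the class under review."""

    def __init__(self, loggingLevel=logging.INFO):
        """Set up the parser and its logger.

        Args:
            loggingLevel: numeric logging level. Defaults to logging.INFO,
                which equals the literal 20 used in the diff.
        """
        self.logger = logging.getLogger(__name__)
        self.logger.setLevel(loggingLevel)
```

Using the named constant instead of the bare 20 keeps the default self-explanatory without changing behavior.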
    def __init__(self):
        self.json_data = None

    def cleanstr(self, string, substring):
clean_str?
import logging
import nltk
import ssl
try:
Add some comments in here.
Done
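The diff excerpt stops at `try:`, so the body is not visible here. Assuming it is the commonly used workaround that lets urllib-based downloads (such as NLTK corpus downloads) proceed when HTTPS certificate verification fails, a commented version might look like the sketch below. The function name is hypothetical, and the workaround disables certificate verification for the whole process, so it is a stopgap rather than a recommended default.

```python
import ssl


def allow_unverified_https():
    """Make default HTTPS contexts skip certificate verification.

    Returns True if the override was applied, False otherwise.
    """
    try:
        # Private helper in the ssl module; absent on very old Python builds.
        unverified_context = ssl._create_unverified_context
    except AttributeError:
        # Nothing to override: keep the interpreter's default behavior.
        return False
    # http.client and urllib create their default HTTPS context via this hook,
    # so replacing it affects later HTTPS requests made without an explicit context.
    ssl._create_default_https_context = unverified_context
    return True
```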
            'body': " bug ``` import pandas``` ## Environment info",
            'labels': ['Doc']}])

    def tearDown(self):
What is the purpose of this function? Shall we remove it?
+1
Removed
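To illustrate why dropping it is safe when a test holds no external resources, here is a small sketch. The class name, test name, and the `title` value are placeholders; the `body` and `labels` values mirror the fixture visible in the diff.

```python
import unittest


class TestIssueFixture(unittest.TestCase):
    """Hypothetical test case with a fixture shaped like the one in the diff."""

    def setUp(self):
        # Rebuilt before every test, so no state leaks between tests.
        self.issues = [{'title': "example title",  # placeholder value
                        'body': " bug ``` import pandas``` ## Environment info",
                        'labels': ['Doc']}]

    def test_labels(self):
        self.assertEqual(self.issues[0]['labels'], ['Doc'])

    # No tearDown: unittest.TestCase already provides a no-op default,
    # and there are no files, sockets, or patches to release here.
```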
1. set constructor variables for each class 2. add doc strings 3. improve coding style 4. add README
Description
This bot will send daily GitHub issue reports with predictions of unlabeled issues.
It contains 2 parts:
* Machine Learning part: A web server built on AWS Elastic Beanstalk which can respond to GET/POST requests and realize self-maintenance. It mainly has 2 features:
* Send Daily Emails: An AWS Lambda function which will be triggered every day. Once this Lambda function is executed, it will send POST requests to the Elastic Beanstalk web server asking for predictions. Then it will generate the email content and send the email.
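The request flow described above could be sketched roughly as follows. Everything specific here is an assumption for illustration: the payload shape (`{"issues": [...]}`), the event keys, the server URL, and the function names are not taken from the PR. Only the standard `lambda_handler(event, context)` signature and the standard-library calls are known quantities.

```python
import json
import urllib.request


def build_prediction_request(issue_numbers, server_url):
    """Build (but do not send) a POST request asking for predictions."""
    # Hypothetical payload shape; the PR does not show the real one.
    payload = json.dumps({"issues": issue_numbers}).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def lambda_handler(event, context):
    """Entry point that AWS Lambda would call on the daily trigger."""
    # The event keys used here are assumptions for this sketch.
    request = build_prediction_request(event["issues"], event["server_url"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to unit test without network access.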
@Roshrini @lanking520 @marcoabreu