0.5.4 #112

yiwen-h · 2023-06-15T11:32:31Z

New version has:

Improved test coverage, now up to 47%
Attempted a two-layer sklearn model (predicting major categories before minor categories); it performed poorly
Improved the train/test splitting process so that all model types use the same split for the same random_seed, making results easier to compare
Fixed how the distilbert model handles categorical features, using a CategoryEncoder layer
Added analysis to the bert pipeline so that Excel performance and label files are also output
Added an async sentiment endpoint to the API
Updated params and dataset to v6 of the framework
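
For reviewers, a minimal sketch of the idea behind the shared split. This is not the code in this PR: `split_indices`, `raw_text` and `clean_text` are hypothetical names, and the standard-library `random` module stands in for whatever splitting utility the pipeline actually uses. The point is only that the split is computed from row positions and the seed, so it cannot change depending on whether the text was preprocessed.

```python
import random


def split_indices(n_rows, test_fraction=0.2, random_seed=0):
    """Return (train_idx, test_idx) for row positions 0..n_rows-1.

    Hypothetical helper: the result depends only on n_rows, test_fraction
    and random_seed, never on the (possibly preprocessed) text itself.
    """
    indices = list(range(n_rows))
    random.Random(random_seed).shuffle(indices)  # seeded, so reproducible
    n_test = round(n_rows * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


# Made-up comments, purely for illustration.
raw_text = ["Staff were kind", "Long wait!!", "Great care", "Parking was hard", "Food was cold"]
clean_text = [t.lower().strip("!") for t in raw_text]  # stand-in for preprocessing

train_idx, test_idx = split_indices(len(raw_text), test_fraction=0.4, random_seed=42)

# Both views of the data are sliced with the same row positions.
raw_test = [raw_text[i] for i in test_idx]
clean_test = [clean_text[i] for i in test_idx]
```

The sketch assumes preprocessing keeps every row and its order; if rows are dropped during cleaning, the indices would need to be computed before that step.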

yiwen-h and others added 30 commits June 1, 2023 15:14

fixed xgb model outputting 2d np.ndarray for probabilities unlike other sklearn models which are 3d

2c6ebb0

added "labelling not possible" to params

2ca49a9

reduced params in gridsearch

acf814a

added two layer pipeline

1a1dc1d

added fillna to process comments with no FFT question

4550098

finalised two layer sklearn model

219eabb

Merge pull request #105 from CDU-data-science-team/101_two-model-layers

529cb6c

101 two model layers

removed unnecessary extra models in current_best folder

a9930a9

added new tests for predict with sklearn, to prepare for predict with bert

035443f

fixed padding to "max_length" for bert_data_to_dataset

04ca301

added new function predict_multilabel_bert

a5cf9d9

same output as predict_multilabel_sklearn - aim is to be able to use both for API. Also added tests

added more tests for predict_unlabelled_text

8853d06

added test for write_multilabel_models_and_metrics

e3b0db6

added test for write_model_preds

6e60fa7

added test for bert_write_model_preds

cc8d5f7

amended write_model_preds so it can write bert predictions too

f334cbf

added temppath to test_write_results

0c15946

fixed unnecessary param in docstring

d8526d9

test using same random split with merged categories, 3 different models

72c9ad9

coverage up to 44%

07d8e49

Merge pull request #106 from CDU-data-science-team/90_bert_analysis

871252a

90 bert analysis

added new minor_cats labels and amended pipeline for v6 of framework

999a0ad

fixed wrong minor_cats in params and too many comments getting cleaned in preprocessing due to missing values in q_type

bf05679

train_test_split now exactly the same for all model types regardless of whether text is preprocessed

6537588

final pipeline for v6 of framework

ae4cc5a

fixed minor_cats which should still include Impact of treatment/ care

02e05d8

updated major_cats_dict with new v6 categories

fccc98c

added v5 labels specifically to API, rather than reading from params

81ab2f6

rewrote test_raw_data test fixture so it is flexible depending on minor_cats in params

eb7dcf3

Merge pull request #108 from CDU-data-science-team/107_v6_framework

e078e12

107 v6 framework

yiwen-h and others added 24 commits June 13, 2023 10:14

formatted multilabel_pipeline with black, amended param for final v6 dataset

bd8c71c

final models for v6 framework

71d7bd8

Merge pull request #109 from CDU-data-science-team/107_v6_framework

60a1f29

107 v6 framework

wrote the test first ;)

04939f3

added bert model to gitignore, updated test_api

f988978

added test for predict_sentiment_bert

5f08357

added test for predict_sentiment_bert

e7746c2

added predict_sentiment_bert and its test

46ca964

added predict_sentiment_bert and its test (formatted w black)

fbbdfae

took out test used for debugging from predict_unlabelled_text

2fe5bc6

api partly done. have discovered bug with how q_type is onehotted in bert pipeline

76e9482

fixing bert_additional_feats with CategoricalEncoder layer

c5fcd76

converting 0-4 scale of sentiment back to 1-5 as in original data

d157b05

Better names for ItemOut type definitions for sentiment vs multilabel API endpoints

5dedda1

api with sentiment complete

ff5b275

still basic - not using async

updated coverage (47%)

186976b

fixed bug with predict_sentiment_bert crashing if preprocess_text = False

d3059bf

made sentiment API async

ca009cf

added updated coverage file

ccab149

amended test_api to use Mocked model as distilbert too large for github

9f4713e

Merge pull request #111 from CDU-data-science-team/97_sentiment_API

77bb495

97 sentiment api

updated poetry.lock for 0.5.4

5c444ab

updated version number in pyproject.toml

1d5582f

updated requirements.txt in api folder

842271f

yiwen-h requested a review from ChrisBeeley June 15, 2023 11:32

yiwen-h merged commit 0b65e33 into main Jun 15, 2023

yiwen-h deleted the 0.5.4 branch June 15, 2023 12:34
