AutoML Practitioners' Challenges

This repository contains the code and dataset for the research work "Challenges and Barriers of Using Low Code Software for Machine Learning".

Low-code (i.e., AutoML) Development

A machine learning program is a program that can learn from experience, i.e., data. In the traditional approach, a human expert analyses the data and explores the search space to find the best model. AutoML aims to democratize machine learning for domain experts by automating and abstracting away machine learning-related complexities. In particular, it addresses the Combined Algorithm Selection and Hyper-parameter tuning (CASH) problem. AutoML combines automation and ML: it automates various tasks in the ML pipeline, such as data preprocessing, model selection, hyper-parameter tuning, and model parameter optimization, employing techniques such as grid search, genetic algorithms, and Bayesian optimization. Some AutoML services also help with data visualization, model interpretability, and deployment.
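As a minimal illustration of what such automation looks like in code, the sketch below runs TPOT (one of the AutoML tools covered by the tags in this study, using its classic API) on a toy dataset. The dataset, search budget, and parameter values are illustrative choices, not part of the study.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Toy dataset purely for illustration.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT searches over pipelines (preprocessing + model + hyper-parameters)
# with a genetic algorithm, i.e., it tackles the CASH problem automatically.
automl = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))

# Export the best discovered pipeline as plain scikit-learn code.
automl.export("best_pipeline.py")
```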

In this study, we download a large number of Stack Overflow posts that contain discussions about various low-code platforms. We apply topic modeling to the textual contents of the posts, label the resulting topics, and categorize them into hierarchies. We then analyze the popularity and difficulty of the topics. Our study offers several findings based on the four research questions discussed below.
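The notebooks in the repository implement the full pipeline; the snippet below is only a rough sketch of the topic-modeling step using gensim's LDA. The sample texts, preprocessing, and topic count are illustrative and may not match the paper's actual settings.

```python
from gensim import corpora
from gensim.models import LdaModel
from gensim.utils import simple_preprocess

# `post_texts` is assumed to hold the cleaned textual content of the SO posts.
post_texts = [
    "how to tune hyperparameters in azure automl",
    "tpot pipeline export raises an error",
]
docs = [simple_preprocess(text) for text in post_texts]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Train LDA; num_topics and passes here are arbitrary illustrative values.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=15, passes=10, random_state=1)
for topic_id, words in lda.print_topics(num_words=8):
    print(topic_id, words)
```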

Stack Overflow dataset

The data is collected from 14.3K low-code development-related Stack Overflow (SO) posts, using the Stack Overflow June 2022 data dump, which can be downloaded from archive.org.

The final list of 41 AutoML cloud and non-cloud service-related tags that we used to collect posts from Stack Overflow is as follows:

['amazon-machine-learning', 'automl', 'aws-chatbot', 'aws-lex', 'azure-machine-learning-studio', 'azure-machine-learning-workbench', 'azureml', 'azureml-python-sdk', 'azuremlsdk', 'driverless-ai', 'ensemble-learning', 'gbm', 'google-cloud-automl-nl', 'google-cloud-vertex-ai', 'google-natural-language', 'h2o.ai', 'h2o4gpu', 'mlops', 'sparkling-water', 'splunk-calculation', 'splunk-dashboard', 'splunk-query', 'splunk-sdk', 'amazon-sagemaker', 'tpot', 'auto-sklearn', 'rapidminer', 'pycaret', 'amazon-lex', 'auto-keras', 'bigml', 'dataiku', 'datarobot', 'google-cloud-automl', 'h2o', 'mljar', 'splunk', 'transmogrifai', 'ludwig', 'azure-machine-learning-service', 'pycaret']
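To illustrate how posts can be pulled out of the raw data dump using these tags, the sketch below streams Posts.xml and keeps questions whose tags intersect the target set. The file path, the shortened tag set, and the output handling are assumptions for illustration, not the study's exact scripts.

```python
import xml.etree.ElementTree as ET

# Subset of the 41 tags above, shortened for readability.
TARGET_TAGS = {"automl", "tpot", "auto-sklearn", "azureml", "h2o"}

matched_question_ids = []
# Stream the (very large) Posts.xml from the data dump instead of loading it at once.
for _, row in ET.iterparse("Posts.xml", events=("end",)):
    if row.tag == "row" and row.get("PostTypeId") == "1":  # PostTypeId 1 = question
        # In the dump, tags are stored as a string like "<python><automl>".
        raw_tags = row.get("Tags", "")
        post_tags = set(raw_tags.strip("<>").split("><")) if raw_tags else set()
        if post_tags & TARGET_TAGS:
            matched_question_ids.append(row.get("Id"))
    row.clear()  # free memory as we go

print(len(matched_question_ids), "questions matched the AutoML-related tags")
```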

The questions and answers extracted based on these final tags are in the All questions.csv and All answers.csv files in our dataset folder, respectively.
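A minimal way to load and explore the released files with pandas is sketched below; the folder path and column names (e.g., Tags) are assumptions based on the standard Stack Overflow dump schema, so check the CSV headers before reuse.

```python
import pandas as pd

questions = pd.read_csv("dataset/All questions.csv")
answers = pd.read_csv("dataset/All answers.csv")
print(questions.shape, answers.shape)

# Example: questions mentioning a specific AutoML tag (column name assumed).
azureml_questions = questions[questions["Tags"].str.contains("azureml", na=False)]
print(len(azureml_questions))
```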

Replication Materials to Answer the RQs in the Paper:

  • 'Codes' folder: includes all the data and code we used to filter the data, run topic modeling, and generate graphs, charts, tables, etc., inside a Jupyter notebook.
  • 'code/Utilites': contains the helper methods used in this project.
  • generated files: contains the intermediary files generated for the RQ1, RQ2, and RQ3 results.
  • RQ1 supporting files: TopicModeling.zip
  • RQ3 supporting files: topic popularity.csv, topic difficulty.csv (an illustrative sketch of how such per-topic metrics can be computed follows this list)
  • environment.yml: the Anaconda environment used in this project. Python 3.8 or higher is required.
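As referenced in the RQ3 item above, the sketch below shows how per-topic popularity and difficulty metrics are commonly computed in Stack Overflow studies. The column names (including the DominantTopic column), file paths, and the specific metrics are assumptions for illustration and may not match the paper's exact definitions.

```python
import pandas as pd

# Assumed input: questions with standard SO dump columns plus an assumed
# `DominantTopic` column holding each question's dominant LDA topic.
questions = pd.read_csv("dataset/All questions.csv")

popularity = questions.groupby("DominantTopic").agg(
    avg_views=("ViewCount", "mean"),
    avg_score=("Score", "mean"),
    num_questions=("Id", "count"),
)

difficulty = questions.groupby("DominantTopic").agg(
    pct_without_accepted_answer=("AcceptedAnswerId", lambda s: s.isna().mean() * 100),
)

popularity.to_csv("topic popularity.csv")
difficulty.to_csv("topic difficulty.csv")
```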
