Group 1 #4

Open
ScottyB opened this issue Nov 17, 2021 · 2 comments
@ScottyB

ScottyB commented Nov 17, 2021

Smells are an indication rather than something concrete: they are subjective and romantic. Tie smells to productivity.

Do code smells impact review time?
Definition of code smells, and catalogue.
Assumptions:
Developers know what they are
Developers care at review time
Developers disagree on severity/importance

Evolution of code smells: Which ones can be ignored?
Identify code smells first and then evaluate them.
Tie code smells to:
Reproducibility
Performance
*ilities - definition of stakeholders, hard to make concrete

Commented-out code is indicative of ad hoc versioning on top of version control (not just in DS). Could be solved through education; it comes from people's workstyle -> experimentation.
Reading code that contains commented-out code.
Does commented-out code impact readability of the source code?
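
To study the readability question, commented-out code first has to be found. The following is a minimal sketch, assuming Python source files and a simple heuristic: a comment counts as commented-out code if its text parses as a Python statement. The function name `find_commented_out_code` is hypothetical and not part of any existing tool. The heuristic misses multi-line commented-out blocks (for example a lone `# if x:`) and can misfire on comments such as `# type: ignore`, which happens to parse as an annotation.

```python
import ast
import io
import tokenize


def find_commented_out_code(source: str) -> list[int]:
    """Return line numbers of comments whose text parses as Python code."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        # Drop the leading '#' characters and surrounding whitespace.
        text = tok.string.lstrip("#").strip()
        if not text:
            continue
        try:
            tree = ast.parse(text)
        except SyntaxError:
            continue  # ordinary prose usually fails to parse
        # A bare name or constant ("TODO", "42") parses but is almost always prose.
        if all(
            isinstance(node, ast.Expr)
            and isinstance(node.value, (ast.Name, ast.Constant))
            for node in tree.body
        ):
            continue
        hits.append(tok.start[0])
    return hits


sample = (
    "x = 1\n"
    "# y = x + 2\n"
    "# explain the next step\n"
    "print(x)  # TODO\n"
    "# model.fit(X, y)\n"
)
flagged = find_commented_out_code(sample)
```

Counting flagged lines per file across a project's history would give one rough input to the prevalence and evolution questions raised in this thread.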

Code smells: personal preference of people who made up these phrases, mere guidelines

Small companies don’t care about code smells (experiential)

Code smells should be avoided as a research focus (low impact and unreliable)

Data versioning tool -> large datasets, experiments, storage perspective.

Motivation: data storage is a problem for cloud service providers. There is redundancy between versions of the data, and you shouldn’t be storing all the features.
Store code transformations rather than datasets
Efficient caching
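
A minimal sketch of the "store transformations, cache results" idea, assuming the raw data is small enough to serialise as JSON and that each transformation carries a hand-maintained version string. `Transform` and `DerivedDataCache` are hypothetical names for illustration, and the in-memory dictionary stands in for whatever storage backend would actually be used.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transform:
    name: str
    version: str  # assumption: bumped by hand whenever fn changes
    fn: Callable[[list], list]


class DerivedDataCache:
    """Keeps derived data only as a cache, keyed by raw-data hash plus transform identity."""

    def __init__(self) -> None:
        self._store: dict[str, list] = {}
        self.misses = 0

    @staticmethod
    def key(raw_rows: list, transform: Transform) -> str:
        payload = json.dumps(
            {"data": raw_rows, "transform": [transform.name, transform.version]},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, raw_rows: list, transform: Transform) -> list:
        k = self.key(raw_rows, transform)
        if k not in self._store:
            # Recompute from raw data and code; nothing else needs to be versioned.
            self.misses += 1
            self._store[k] = transform.fn(raw_rows)
        return self._store[k]


raw = [{"x": 1}, {"x": 2}]
double_v1 = Transform("double", "1", lambda rows: [{"x": r["x"] * 2} for r in rows])
cache = DerivedDataCache()
first = cache.get(raw, double_v1)   # computed
second = cache.get(raw, double_v1)  # served from the cache
```

Under this scheme only the raw data and the transformation code are versioned; a changed version string or changed raw data yields a new key, so stale derived data is never reused.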

Think about the scale of data storage.
Is the transformed data the challenge with storing versions of the data?

Linting data with diffs applied at the smells level -> in the context of ML, data and code smells are all impactful.
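
One way to picture linting on diffs: compute a row-level diff between two versions of a dataset and run the smell checks only on rows that were added or changed. This is a sketch under stated assumptions: rows have a stable identifier, each version fits in memory as a `{row_id: row}` mapping, and `age_smells` is a made-up example check, not an established data smell definition.

```python
from typing import Callable


def diff_rows(old: dict, new: dict) -> dict:
    """Compare two dataset versions given as {row_id: row} mappings."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


def lint_delta(delta: dict, check: Callable[[dict], list]) -> dict:
    """Run a smell check only on rows that are new or modified."""
    candidates = dict(delta["added"])
    candidates.update({k: pair[1] for k, pair in delta["changed"].items()})
    report = {}
    for row_id in sorted(candidates):
        messages = check(candidates[row_id])
        if messages:
            report[row_id] = messages
    return report


def age_smells(row: dict) -> list:
    """Hypothetical example check: flag missing or negative ages."""
    if row.get("age") is None:
        return ["age is missing"]
    if row["age"] < 0:
        return ["age is negative"]
    return []


v1 = {1: {"age": 30}, 2: {"age": 41}, 3: {"age": 25}}
v2 = {1: {"age": 30}, 2: {"age": None}, 4: {"age": -5}}
delta = diff_rows(v1, v2)
report = lint_delta(delta, age_smells)
```

Storing only `delta` between versions is also one possible answer to the redundancy concern above, although whether that scales to large datasets is an open question in these notes.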

Identify smells that change the behaviour of ML. There is a need to define what an ML smell is -> this is different from code smells. Thus, ML smells are a) technical debt, b) actual defects, and c) concrete.

Guidelines for dealing with technical debt ignore the commercial reality and focus on the ideal. This ties in with the context idea.

-> Case study of ML in small clients/companies (does related work exist for this???). Efficiency is vital. Tool support is key to realising the solutions in organisations.

Is static analysis sufficient for ML smells? Interactive environment: development phase, deployment phase.

CI/CD for ML (Thoughtworks)

Start at the upstream process at the data level rather than modelling.

Conclusions:
Code smells should be avoided as a research focus (low impact and unreliable)
Focus on the messy upstream process at data collection rather than modelling
There is a need to define what an ML smell is
Does commented-out code impact readability of the source code?
Can diff algorithms be used for data versioning?
Data quality issues should be easy to validate/verify -> portable as data on their own
The barrier between data and code is still valid

@rlogothetis01

Does Commented-Out Code Impact Readability?
Very precise research
Evidence to support / reject
Conflict of code & different parameters etc.
Look at evolution of projects
Focus on ML & evolution
Find what it depends on
What do engineers think of this? Are they interested in the outcome?

@rlogothetis01

Code Smells
Subject
ML smells need to be defined (data or model) - could be an actual defect to be dealt with
Map code smells to concrete definitions
Empirical study to measure readability → can affect deployment + production
Do ML smells have an impact on the real world?
Could only be a smell because of the user (a smell for ML but not for a statistician; could be an education problem)
RQ: how prevalent is this issue in projects?
