Group 1 #4

ScottyB · 2021-11-17T08:43:04Z

Smells are an indication rather than something concrete; they are subjective and romanticised. Tie smells to productivity.

Do code smells impact review time?
Definition of code smells, and a catalogue of them.
Assumptions:
Developers know what they are
Developers care at review time
Developers disagree on severity/importance

Evolution of code smells: Which ones can be ignored?
Identify code smells first and then evaluate them.
Tie code smells to:
Reproducibility
Performance
*ilities (defined by stakeholders, hard to make concrete)

Commented-out code is indicative of ad hoc versioning on top of version control (not just in DS). This could be addressed through education; it reflects people's workstyle -> experimentation.
Reading code that contains commented-out code.
Does commented-out code impact the readability of source code?
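To study this question (and how prevalent the issue is), one would first need a way to flag commented-out code automatically. Below is a minimal sketch, assuming a simple heuristic that is not from the discussion: a comment counts as code if it contains a code-like hint (the hypothetical `CODE_HINTS` tuple) and its body parses with Python's `ast` module. The names `looks_like_code` and `commented_out_ratio` are illustrative only.

```python
import ast
import io
import tokenize

# Hypothetical heuristic: substrings that suggest a comment is code rather than prose.
CODE_HINTS = ("=", "(", "[", "import ", "return", "def ", "print")


def looks_like_code(comment_text: str) -> bool:
    """Return True if a comment body contains a code hint and parses as Python."""
    body = comment_text.lstrip("#").strip()
    if not body or not any(hint in body for hint in CODE_HINTS):
        return False
    try:
        ast.parse(body)  # prose such as "See the paper (2021)" fails to parse
    except SyntaxError:
        return False
    return True


def commented_out_ratio(source: str) -> float:
    """Fraction of comments in `source` that look like commented-out code."""
    comments = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    ]
    if not comments:
        return 0.0
    flagged = sum(looks_like_code(c) for c in comments)
    return flagged / len(comments)
```

For example, in a file with the comments `# y = compute(x)`, `# This is prose, not code` and `# TODO tidy up`, only the first is flagged, giving a ratio of 1/3. The heuristic will misfire on some prose (e.g. `# note: a = b` parses as an annotated assignment), so any prevalence numbers would need manual validation.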

Code smells: the personal preferences of the people who coined these phrases; mere guidelines

Small companies don’t care about code smells (experiential)

Code smells should be avoided (low impact and unreliable)

Data versioning tool -> large datasets, experiments, storage perspective.

Motivation: data storage is a problem for cloud service providers. There is redundancy between versions of the data, and you shouldn't need to store all the features.
Store code transformations rather than datasets
Efficient caching
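One way to read "store code transformations rather than datasets" plus "efficient caching" is sketched below. It keeps only the raw rows, treats each derived dataset as a function of (raw data, transformation), and memoises results under a content hash. This is an assumption-laden illustration and not a design agreed in the discussion: the `TransformCache` class is hypothetical, and identifying a transformation by its name and bytecode is just one possible choice.

```python
import hashlib
import json
from typing import Callable, Dict, List

Rows = List[dict]


class TransformCache:
    """Hypothetical cache: raw data is stored once; derived versions are
    recomputed from (raw data, transformation) and memoised by content hash."""

    def __init__(self) -> None:
        self._cache: Dict[str, Rows] = {}
        self.misses = 0  # number of times a transformation was actually run

    @staticmethod
    def _key(raw: Rows, transform: Callable[[Rows], Rows]) -> str:
        h = hashlib.sha256()
        # Content hash of the raw data (rows must be JSON-serialisable).
        h.update(json.dumps(raw, sort_keys=True).encode())
        # Identify the transformation by name and compiled bytecode, so editing
        # its body yields a new key. (Assumption: a real tool might instead hash
        # the source text or an explicit version string.)
        code = transform.__code__
        h.update(transform.__qualname__.encode())
        h.update(code.co_code)
        h.update(repr(code.co_consts).encode())
        return h.hexdigest()

    def get(self, raw: Rows, transform: Callable[[Rows], Rows]) -> Rows:
        """Return transform(raw), computing it only on a cache miss."""
        key = self._key(raw, transform)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = transform(raw)
        return self._cache[key]
```

Usage: `cache.get(raw, normalise)` runs `normalise` once, and later calls with the same raw rows and unchanged function are served from the cache. The cached outputs are disposable, since they can always be rebuilt from the raw data and the code.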

Think about the scale of data storage.
Is the transformed data the challenge with storing versions of the data?
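A related question raised in this thread is whether diff algorithms can remove the redundancy between stored versions. The sketch below assumes each version is a list of hashable rows (tuples). It stores a base version plus a row-level delta computed with the standard library's `difflib.SequenceMatcher`, and it rebuilds later versions from that delta. The functions `make_delta` and `apply_delta` are hypothetical names used only for illustration.

```python
import difflib
from typing import List, Tuple

Row = Tuple[str, ...]
# A delta is a list of (tag, i1, i2, replacement_rows) edits against the base version.
Delta = List[Tuple[str, int, int, List[Row]]]


def make_delta(base: List[Row], new: List[Row]) -> Delta:
    """Record only the rows that differ between two versions of a table."""
    matcher = difflib.SequenceMatcher(a=base, b=new, autojunk=False)
    return [
        (tag, i1, i2, new[j1:j2])  # "delete" edits carry an empty row list
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base: List[Row], delta: Delta) -> List[Row]:
    """Rebuild the new version from the base rows plus the stored delta."""
    out: List[Row] = []
    pos = 0
    for _tag, i1, i2, rows in delta:
        out.extend(base[pos:i1])  # unchanged rows before this edit
        out.extend(rows)          # inserted or replacement rows
        pos = i2                  # skip the base rows this edit replaced/deleted
    out.extend(base[pos:])
    return out
```

For example, relabelling one row and appending another yields a delta of just two entries, a `replace` and an `insert`, instead of a second full copy of the table. Whether this scales to the dataset sizes discussed above is an open question. `SequenceMatcher` is quadratic in the worst case.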

Linting data with diffs applied at the smells level -> in the context of ML, data and code smells are both impactful.

Identify smells that change the behaviour of ML systems. There is a need to define what an ML smell is -> this differs from a code smell. ML smells: a) are technical debt, b) are actual defects, and c) are concrete.

Guidelines for dealing with technical debt ignore the commercial reality and focus on the ideal. This ties in with the context idea.

-> Case study of ML in small clients/companies (does related work exist for this?). Efficiency is vital. Tool support is key to realising the solutions in organisations.

Is static analysis sufficient for ML smells? Interactive environment: development phase, deployment phase.

CI/CD for ML (Thoughtworks)

Start with the upstream process at the data level rather than with modelling.

Conclusions:
Code smells should be avoided (low impact and unreliable)
Focus on the messy upstream process at data collection rather than modelling
There is a need to define what an ML smell is
Does commented-out code impact the readability of source code?
Can diff algorithms be used for data versioning?
Data quality issues should be easy to validate/verify -> checks should be portable, as data in their own right
The barrier between data and code is still valid
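One possible reading of "portable" data quality checks (see also #8, portable data integrity/invariant checks) is that invariants are stored as plain data rather than as pipeline code, so any tool can evaluate them. The sketch below assumes a hypothetical JSON check format with three rule types (`not_null`, `range`, `allowed`) and a hypothetical `violations` function. None of these come from the discussion.

```python
import json
from typing import Any, Dict, List

# Hypothetical check format: each invariant is plain JSON, so it can travel with
# the dataset and be evaluated independently of the pipeline code.
CHECKS_JSON = """
[
  {"column": "age",   "rule": "range",   "min": 0, "max": 120},
  {"column": "label", "rule": "allowed", "values": ["cat", "dog"]},
  {"column": "id",    "rule": "not_null"}
]
"""


def violations(rows: List[Dict[str, Any]], checks: List[Dict[str, Any]]) -> List[str]:
    """Return a message for every (row, check) pair that breaks an invariant."""
    problems = []
    for i, row in enumerate(rows):
        for check in checks:
            col, rule = check["column"], check["rule"]
            value = row.get(col)  # a missing column is treated as None
            if rule == "not_null":
                ok = value is not None
            elif rule == "range":
                ok = value is not None and check["min"] <= value <= check["max"]
            elif rule == "allowed":
                ok = value in check["values"]
            else:
                raise ValueError(f"unknown rule: {rule}")
            if not ok:
                problems.append(f"row {i}: {col}={value!r} fails {rule}")
    return problems
```

Usage: `violations(rows, json.loads(CHECKS_JSON))`. Because the checks are data, they could be versioned and shipped alongside the dataset. This touches on the point above about whether the barrier between data and code still holds.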

rlogothetis01 · 2021-11-17T10:39:57Z

Does Commented-Out Code Impact Readability?
Very precise research
Evidence to support / reject
Conflict of code & different parameters etc.
Look at evolution of projects
Focus on ML & evolution
Find what it depends on
What do engineers think of this? Are they interested in the outcome?

rlogothetis01 · 2021-11-17T10:40:11Z

Code Smells
Subjective
ML Smells need to be defined (data or model) - could be an actual defect to be dealt with
Map Code smells to concrete definitions
Empirical study to measure readability → can affect deployment + production
Do ML smells have an impact on the real world?
Could be a smell only because of the user (a smell for ML practitioners but not for statisticians; could be an education problem)
RQ: how prevalent is this issue in projects?

anjsimmo mentioned this issue on Nov 17, 2021 in:
How smells affect the system #3 (open)
Portable data integrity/invariant checks #8 (open)
