PROJECT TITLE
===

*Project details:*
- Subtitle
- Description
- Table of team members and roles

## 1. Methodology

> Type of methodology to be used for the project.

---
**Example:**

In the field of machine learning, the community is still defining a systematic and structured process for the life cycle of solutions based on machine learning. The goal is to bring the procedures and quality standards of classical software engineering to the methodologies of this subfield of artificial intelligence.

This is where [CRISP-ML(Q)](https://arxiv.org/pdf/2003.05155.pdf) comes in: a methodology that integrates software engineering best practices across the whole life cycle of solutions that address problems with machine learning.

CRISP-ML(Q) builds on [CRISP-DM](https://es.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining) as an attempt to extend that framework to the field of machine learning.

In the words of its authors (Stefan Studer et al.), CRISP-ML(Q) proposes what they call the CRoss-Industry Standard Process model for the development of Machine Learning applications with Quality assurance methodology, and they highlight its compatibility with CRISP-DM. It is designed for the development of machine learning applications, i.e., application scenarios in which an ML model is deployed and maintained as part of a product or service.

It consists of six phases:
1. Business and Data Understanding
2. Data Engineering (Data Preparation)
3. Machine Learning Model Engineering
4. Quality Assurance for Machine Learning Applications
5. Deployment
6. Monitoring and Maintenance

![Figure 1: Machine Learning Development Life Cycle Process](./resources/crisp-ml-process.jpg)

*Figure 1: Machine Learning Development Life Cycle Process*

Each phase, in turn, follows the process below:

![Figure 2: Process within each phase](./resources/crisp-ml-phase.jpg)

*Figure 2: Process within each phase [^1]*

Summary of the tasks in each phase:
*Some additional information has also been added. The whole table can be used as a kind of checklist.*

| CRISP-ML(Q) Phase                | Tasks                                                                                                                                                                             |
| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| *Business and Data Understanding*  | • Define the Scope of the ML Application<br>• Success criteria<br>• Feasibility<br>• Data collection<br>• Data quality verification<br>• Review of output documents               |
| *Data Engineering*                 | • Select data<br>• Clean data<br>• Construct data<br>• Standardize data                                                                                                           |
| *ML Model Engineering*             | • Modeling<br>• Assure reproducibility                                                                                                                                            |
| *ML Model Evaluation*              | • Validate performance<br>• Determine robustness<br>• Increase explainability for ML practitioner and end user<br>• Compare results with defined success criteria                 |
| *Model Deployment*                 | • Define inference hardware<br>• Model evaluation under production conditions<br>• Assure user acceptance and usability<br>• Minimize the risks of unforeseen errors<br>• Deployment strategy |
| *Model Monitoring and Maintenance* | • Monitor<br>• Update                                                                                                                                                             |
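
Since the table can be used as a checklist, a team could also track it in code. The following is a minimal Python sketch, assuming that progress is recorded as a set of completed task names per phase; `PHASE_TASKS` and `pending_tasks` are hypothetical names, not part of CRISP-ML(Q).

```python
from typing import Dict, List, Set

# Phases and tasks taken from the summary table (hypothetical tracking structure)
PHASE_TASKS: Dict[str, List[str]] = {
    "Business and Data Understanding": [
        "Define the Scope of the ML Application",
        "Success criteria",
        "Feasibility",
        "Data collection",
        "Data quality verification",
        "Review of output documents",
    ],
    "Data Engineering": ["Select data", "Clean data", "Construct data", "Standardize data"],
    "ML Model Engineering": ["Modeling", "Assure reproducibility"],
    "ML Model Evaluation": [
        "Validate performance",
        "Determine robustness",
        "Increase explainability for ML practitioner and end user",
        "Compare results with defined success criteria",
    ],
    "Model Deployment": [
        "Define inference hardware",
        "Model evaluation under production conditions",
        "Assure user acceptance and usability",
        "Minimize the risks of unforeseen errors",
        "Deployment strategy",
    ],
    "Model Monitoring and Maintenance": ["Monitor", "Update"],
}


def pending_tasks(done: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, for each phase with open items, the tasks not yet marked as done."""
    report: Dict[str, List[str]] = {}
    for phase, tasks in PHASE_TASKS.items():
        completed = done.get(phase, set())
        missing = [task for task in tasks if task not in completed]
        if missing:
            report[phase] = missing
    return report


if __name__ == "__main__":
    progress = {"ML Model Engineering": {"Modeling"}}
    for phase, tasks in pending_tasks(progress).items():
        print(f"{phase}: {', '.join(tasks)}")
```

Phases with no open items are left out of the report, so an empty dictionary means the whole checklist is complete.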

An example of some of the most widely used models can be seen in the following figure:

![Figure 3: Machine learning models example](./resources/Machine-learning-models-overview.png)

*Figure 3: Example of machine learning models [^2]*

## 2. Applying the methodology
<!-- ## 2. CRISP-ML(Q) -->

---
Example: CRISP-ML(Q)

## 2. CRISP-ML(Q)

> Each phase should apply the methodology in question; that is, it should include a risk analysis (identify the risks and treat them: mitigate, transfer, etc.).
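
The risk analysis requested for each phase is often kept as a risk register. A minimal Python sketch, assuming a 1 to 5 scale for probability and impact and a probability × impact score (a common convention, not something CRISP-ML(Q) prescribes); `Risk`, `Treatment` and `prioritize` are hypothetical names, and the example entries are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Treatment(Enum):
    """Common risk treatment options."""
    MITIGATE = "mitigate"
    TRANSFER = "transfer"
    AVOID = "avoid"
    ACCEPT = "accept"


@dataclass
class Risk:
    phase: str
    description: str
    probability: int  # 1 (rare) to 5 (almost certain), assumed scale
    impact: int  # 1 (negligible) to 5 (critical), assumed scale
    treatment: Treatment
    action: str

    def __post_init__(self) -> None:
        for name in ("probability", "impact"):
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be between 1 and 5")

    @property
    def score(self) -> int:
        # Simple probability x impact score (an assumption, not part of CRISP-ML(Q))
        return self.probability * self.impact


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Sort the register from highest to lowest risk score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    register = [
        Risk("Model Deployment", "Inference hardware not available in time",
             2, 5, Treatment.TRANSFER, "Use a managed cloud service"),
        Risk("Data Engineering", "Too few labeled samples",
             4, 4, Treatment.MITIGATE, "Plan an additional labeling round"),
    ]
    for risk in prioritize(register):
        print(risk.score, risk.phase, risk.treatment.value, risk.action)
```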

### 2.1 Business and Data Understanding

> The following points should be taken into account:
> - Define the Scope of the ML Application
>     - Business objectives
>         - Business description (business scenario)
>         - ML requirements
>         - Data constraints
>             - Security
>             - Availability
>         - Translate business objectives into ML objectives
>         - Type of analysis
>             - Descriptive analysis
>             - Diagnostic analysis
>             - Predictive analysis
>             - Prescriptive analysis
>         - Use ML Canvas (https://www.ownml.co/machine-learning-canvas)
> - Success criteria
>     - Measurable requirements (check the software engineering requirements specification)
>         - Correctness
>         - Completeness
>         - Consistency
>         - Unambiguousness
>         - Ranked for importance and stability
>         - Modifiability
>         - Verifiability
>         - Traceability
>         - Design independence
>         - Testability
>         - Understandable by the customer
>         - Right level of abstraction
>     - Business success criteria
>     - ML success criteria
>         - Performance
>         - Robustness
>         - Scalability
>         - Explainability
>         - Model complexity
>         - Resource demand
>     - Economic success criteria
>     - Also check the system quality attributes
> - Feasibility
>     - Availability, size and quality of the training set
>     - Applicability of ML technology
>     - Legal constraints
>     - Requirements on the application
> - Data collection
>     - Data version control
>     - Data generation process
> - Data quality verification
>     - Data description
>         - Data exploration with domain knowledge
>             - Data attributes and meaning
>             - Duplicated data
>             - Data types
>             - Statistical moments (mean, variance, skewness, kurtosis)
>             - Missing values
>             - Outliers
>             - Correlation between data
>             - Data visualization
>         - Data requirements
>             - Missing values
>             - Outliers
>         - Data verification
>             - Avoid low quality data
> - Review of output documents
>     - Re-define tasks and risks
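
Several of the data quality checks listed above (missing values, duplicated data, statistical moments, outliers) can be automated. A minimal standard-library sketch, assuming the dataset is a list of dictionaries and using the 1.5 × IQR rule for outliers (one common convention, not prescribed by CRISP-ML(Q)); `quality_report` is a hypothetical helper and the sensor readings are made up for illustration.

```python
import statistics
from typing import Dict, List


def quality_report(rows: List[Dict[str, object]], column: str) -> Dict[str, object]:
    """Basic data quality checks on one numeric column of a list-of-dicts dataset."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]

    # Duplicated data: rows whose (field, value) pairs are all identical
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Statistical moments: mean, (population) standard deviation and skewness
    mean = statistics.mean(present)
    pstdev = statistics.pstdev(present)
    skewness = (
        sum((v - mean) ** 3 for v in present) / (len(present) * pstdev ** 3)
        if pstdev > 0
        else 0.0
    )

    # Outliers with the 1.5 * IQR rule (one common convention, assumed here)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "missing": len(values) - len(present),
        "duplicates": duplicates,
        "mean": mean,
        "stdev": pstdev,
        "skewness": skewness,
        "outliers": [v for v in present if v < low or v > high],
    }


if __name__ == "__main__":
    temps = [10, 11, 12, 12, 13, 14, 15, 100, None]
    rows = [{"sensor": "a", "temp": t} for t in temps]
    print(quality_report(rows, "temp"))
```

Whether a flagged value such as `100` is an error or a legitimate extreme is a question for domain knowledge, which is why the checklist pairs exploration with it. With pandas, similar checks are usually done with `DataFrame.isna`, `DataFrame.duplicated` and `DataFrame.describe`.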

### 2.2 Data Engineering

> The following points should be taken into account:
> - Select data
>     - Feature selection (based on domain knowledge)
>         - Filter methods
>         - Wrapper methods
>         - Embedded methods
>     - Data selection
>         - Discarding samples that don't meet the quality criteria
>     - Unbalanced classes
>         - Over-sampling and under-sampling
> - Clean data
>     - Noise reduction
>     - Data imputation
>     - Compare the model performance between different imputation techniques
> - Construct data
>     - Feature engineering
>         - Feature transformation (e.g., time domain to frequency domain)
>         - Discretization of continuous features
>         - Augmenting the features with additional features
>         - Clustering
>         - Dimensionality reduction
>         - Encode nominal variables
>         - Transform ordinal variables into numeric ones
>         - Data masking if needed
>     - Data augmentation
>         - E.g., adding Gaussian noise to an image or applying elastic deformation
> - Standardize data
>     - File format
>     - Normalization
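
A few of the data engineering steps above (mean imputation, z-score normalization, encoding of nominal and ordinal variables, random over-sampling for imbalanced classes) can be sketched with the standard library alone. This is a minimal illustration, assuming numeric columns with `None` for missing values; all function names are hypothetical, and in practice libraries such as scikit-learn provide tested equivalents.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Data imputation: replace missing values (None) with the mean of the observed ones."""
    mean = statistics.mean([v for v in values if v is not None])
    return [mean if v is None else v for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Normalization: zero mean, unit (population) standard deviation; fails on constant columns."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values: Sequence[str]) -> List[Dict[str, int]]:
    """Encode a nominal variable: one binary column per category, no implied order."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def encode_ordinal(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Transform an ordinal variable into integers that preserve its order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


def random_oversample(xs: Sequence[float], ys: Sequence[int],
                      seed: int = 0) -> Tuple[List[float], List[int]]:
    """Duplicate random minority-class samples until all classes are the same size."""
    rng = random.Random(seed)
    by_class: Dict[int, List[float]] = {}
    for x, y in zip(xs, ys):
        by_class.setdefault(y, []).append(x)
    target = max(len(members) for members in by_class.values())
    out_x, out_y = list(xs), list(ys)
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_x += extra
        out_y += [label] * len(extra)
    return out_x, out_y


if __name__ == "__main__":
    print(impute_mean([1.0, None, 3.0]))
    print(z_score([1.0, 2.0, 3.0]))
    print(one_hot(["red", "blue", "red"]))
    print(encode_ordinal(["low", "high", "medium"], ["low", "medium", "high"]))
    print(random_oversample([0.1, 0.2, 0.3, 0.9], [0, 0, 0, 1]))
```

In a real pipeline, the imputation mean and the normalization statistics should be computed on the training split only and then reused on validation and test data, and over-sampling should be applied to the training split only, to avoid leaking information into the evaluation.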

### 2.3 ML Model Engineering

> The following points should be taken into account:
> - Modeling
>     - Literature research on similar problems (model baselines)
>     - Define quality measures of the model
>         - Besides a performance metric, also soft measures such as:
>             - Robustness
>             - Explainability
>             - Scalability
>             - Resource demand
>             - Model complexity
>         - Additionally: model fairness or trust
>     - Model selection (https://en.wikipedia.org/wiki/Outline_of_machine_learning#Machine_learning_methods)
>     - Incorporate domain knowledge (compare against a baseline or domain knowledge)
>         - Beware of false bias
>     - Model training
>         - Learning problem: objective, optimizer, regularization and cross-validation
>         - Specify model metadata (algorithm, training, validation and test data, hyperparameters, preconditions, model assumptions, environment)
>         - Model assumptions
>     - Using unlabeled data and pre-trained models (transfer learning?)
>     - Model compression (model optimization)
>     - Ensemble methods (boosting, bagging, mixture of experts)
>     - Add model to pipeline
> - Assure reproducibility
>     - Method reproducibility
>     - Result reproducibility
>     - Experimental documentation
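
Reproducibility largely comes down to fixing sources of randomness and recording the metadata listed above (algorithm, hyperparameters, data version, environment). A minimal standard-library sketch, assuming the dataset fits in memory as JSON-serializable records; `set_seed`, `dataset_fingerprint` and `model_metadata` are hypothetical helpers, and the metadata schema is an assumption. Libraries such as NumPy or PyTorch have their own random generators that would need to be seeded separately.

```python
import hashlib
import json
import platform
import random
import sys
from typing import Dict, List


def set_seed(seed: int) -> None:
    """Fix the seed of Python's global RNG (other libraries need their own seeding)."""
    random.seed(seed)


def dataset_fingerprint(rows: List[Dict[str, object]]) -> str:
    """SHA-256 of a canonical JSON dump, tying a result to an exact data version."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def model_metadata(algorithm: str, hyperparams: Dict[str, object],
                   seed: int, data_hash: str) -> Dict[str, object]:
    """Collect the information needed to re-run an experiment (hypothetical schema)."""
    return {
        "algorithm": algorithm,
        "hyperparameters": hyperparams,
        "seed": seed,
        "data_sha256": data_hash,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    set_seed(42)
    validation_idx = random.sample(range(10), 3)  # e.g. indices held out for validation
    rows = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
    meta = model_metadata("logistic_regression", {"C": 1.0}, 42, dataset_fingerprint(rows))
    print(validation_idx)
    print(json.dumps(meta, indent=2))
```

Storing this dictionary next to the trained model (or in an experiment tracker) covers part of the experimental documentation item.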

### 2.4 ML Model Evaluation

> The following points should be taken into account:
> - Validate performance (test set)
>     - Apply the production data transformation pipelines
> - Determine robustness
>     - Noisy data testing
>     - Wrong data testing
>     - Statistically estimate the model's local and global robustness
>     - Should match the model's quality attributes
> - Increase explainability for ML practitioner and end user
>     - Enriching the data set
> - Compare results with defined success criteria
>     - Iterate if the success criteria are not met
>     - Decide whether to deploy the model
>     - Document the evaluation phase
> - Recommendations
>     - Offline testing
>     - Confusion matrix
>     - Show explainability reports
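
Three of the evaluation items above (metrics from the confusion matrix, a noisy-data robustness test, and the comparison against success criteria) can be sketched for a binary classifier as follows. This is a minimal illustration, assuming labels 0/1, a model given as a plain function of one numeric input, and success criteria expressed as minimum metric values; all names and thresholds are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall derived from the confusion matrix (labels 0/1)."""
    cm = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    tp, fp, fn, tn = cm[(1, 1)], cm[(0, 1)], cm[(1, 0)], cm[(0, 0)]
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def noisy_accuracy(model: Callable[[float], int], xs: Sequence[float],
                   y_true: Sequence[int], sigma: float, seed: int = 0) -> float:
    """Robustness check: accuracy after adding Gaussian noise N(0, sigma^2) to each input."""
    rng = random.Random(seed)
    predictions = [model(x + rng.gauss(0.0, sigma)) for x in xs]
    return binary_metrics(y_true, predictions)["accuracy"]


def meets_success_criteria(metrics: Dict[str, float], criteria: Dict[str, float]) -> bool:
    """True only if every metric reaches the minimum defined in the success criteria."""
    return all(metrics[name] >= minimum for name, minimum in criteria.items())


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    metrics = binary_metrics(y_true, y_pred)
    print(metrics)
    print("deploy?", meets_success_criteria(metrics, {"accuracy": 0.6, "recall": 0.7}))

    def threshold_model(x: float) -> int:
        return int(x > 0.5)

    xs, ys = [0.1, 0.9, 0.2, 0.8], [0, 1, 0, 1]
    for sigma in (0.0, 0.1, 0.5):
        print(sigma, noisy_accuracy(threshold_model, xs, ys, sigma))
```

In this example the recall of 2/3 falls short of the assumed 0.7 criterion, so the decision would be to iterate rather than deploy.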

### 2.5 Model Deployment

> The following points should be taken into account:
> - Define inference hardware (based on the already defined requirements)
> - Model evaluation under production conditions (incrementally approach production conditions by iteratively running the evaluation tasks)
> - Assure user acceptance and usability
>     - User guide and disclaimer
> - Minimize the risks of unforeseen errors
>     - Fall-back plan (e.g. rollback to a previous version, a pre-defined baseline or a rule-based system)
> - Deployment strategy
>     - Incremental deployment strategy that includes a pipeline for models and data
> - Recommendations
>     - Model governance
>     - Deploy according to the selected strategy (A/B testing, multi-armed bandits)
>     - CI/CD
>     - DRP (disaster recovery plan)
>     - Use pre-production if available
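
Two of the deployment items above, an A/B split between the current and a candidate model and a fall-back plan to a rule-based baseline, can be combined in a small routing function. A minimal sketch, assuming models are plain Python callables and users are identified by a string ID; `in_treatment` and `predict_with_fallback` are hypothetical names. This is a fixed traffic split, not a multi-armed bandit, which would instead adapt the split based on observed results.

```python
import hashlib
import logging
from typing import Callable, Dict

Features = Dict[str, float]
Predictor = Callable[[Features], int]


def in_treatment(user_id: str, fraction: float) -> bool:
    """Deterministic A/B split: the same user always lands in the same group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000 / 10_000  # pseudo-uniform value in [0, 1)
    return bucket < fraction


def predict_with_fallback(features: Features, user_id: str, current: Predictor,
                          candidate: Predictor, baseline: Predictor,
                          fraction: float = 0.1) -> int:
    """Route a share of traffic to the candidate model; use the baseline if a model fails."""
    model = candidate if in_treatment(user_id, fraction) else current
    try:
        return model(features)
    except Exception:
        # Fall-back plan: log the failure and answer with the pre-defined baseline
        logging.getLogger(__name__).warning("model failed, using baseline", exc_info=True)
        return baseline(features)


if __name__ == "__main__":
    def current_model(f: Features) -> int:
        return int(f["score"] > 0.5)

    def candidate_model(f: Features) -> int:
        return int(f["score"] > 0.4)

    def rule_based(f: Features) -> int:
        return 0  # conservative default answer

    for uid in ("user-1", "user-2", "user-3"):
        print(uid, predict_with_fallback({"score": 0.45}, uid, current_model,
                                         candidate_model, rule_based, fraction=0.5))
```

Hashing the user ID, instead of drawing a random number per request, keeps each user's experience consistent and makes the split reproducible when analyzing the A/B test.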

### 2.6 Model Monitoring and Maintenance

> The following points should be taken into account:
> - Monitor
>     - Evaluate the staleness of the model (does the model have to be updated?)
> - Update
>     - New data collection plan and re-training
>     - Fine-tune the model on new data
> - Recommendations
>     - Monitor the efficiency and efficacy of the model prediction serving
>     - Compare to the previously specified success criteria (thresholds)
>     - Retrain the model if required
>     - Collect new data
>     - Label the new data points
>     - Repeat tasks from the *Model Engineering* and *Model Evaluation* phases
>     - Continuous integration, training and deployment of the model
>     - Check the model's degradation (https://en.wikipedia.org/wiki/Lehman%27s_laws_of_software_evolution and https://en.wikipedia.org/wiki/Software_rot)
>         - Non-stationary data distribution
>         - Degradation of hardware
>         - System updates
>     - Plan technical debt repayment and reduction
>     - Maintenance plan and emergency plans
>     - Create alarms for the whole pipeline, including changes in the data generation process
>     - Plan for software substitution or destruction
>     - Log new data correctly
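
One way to evaluate staleness is to combine an input-drift measure with the live metric and its success-criterion threshold. A minimal sketch using the Population Stability Index (PSI), chosen here as an example drift metric (CRISP-ML(Q) does not mandate a specific one), with bins placed at reference quantiles; the 0.25 limit is a commonly used rule of thumb, not a universal value. `psi` and `needs_retraining` are hypothetical helpers, and the data is synthetic.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between training-time data and live data."""
    ref_sorted = sorted(reference)
    # Bin edges at reference quantiles, so each bin holds about 1/bins of the reference data
    edges = [ref_sorted[len(ref_sorted) * i // bins] for i in range(1, bins)]

    def shares(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(sample), eps) for c in counts]

    p, q = shares(reference), shares(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def needs_retraining(drift: float, live_score: float, min_score: float,
                     max_drift: float = 0.25) -> bool:
    """Flag the model as stale if inputs drifted or the live metric fell below its criterion."""
    return drift > max_drift or live_score < min_score


if __name__ == "__main__":
    reference = [float(i) for i in range(100)]  # e.g. a feature at training time
    live = [float(i) + 50 for i in range(100)]  # the same feature, shifted in production
    drift = psi(reference, live)
    print(round(drift, 2), needs_retraining(drift, live_score=0.91, min_score=0.85))
```

Running this check per feature on a schedule, and raising an alarm when `needs_retraining` returns `True`, is one way to implement the alarms and the retraining trigger listed above.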

## 3. Continuous improvement

> The following points should be taken into account:
> - Phases that still need to be implemented or improved
> - Improve the processes
> - Improve the documentation of the processes

## 4. References

[^1]: https://ml-ops.org/content/crisp-ml
[^2]: https://www.researchgate.net/publication/369194767_A_Different_Traditional_Approach_for_Automatic_Comparative_Machine_Learning_in_Multimodality_Covid-19_Severity_Recognition

## 5. Appendices