# CRISP-DM

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. Methodologies like CRISP-DM help organize an ML project into manageable steps and make clear what needs to happen in which order.

### CRISP-DM Process

CRISP-DM is an iterative process with 6 steps:

1. **Business Understanding**
   - Identify the business problem
   - Determine available data sources
   - Specify requirements, assumptions, and conditions
   - Clarify risks and uncertainties
   - Assess the importance of the problem
   - Understand potential solutions
   - Define success metrics for the project (Cost-Benefit Analysis)
   - Evaluate if machine learning is the appropriate solution

2. **Data Understanding**
   - Analyze available data sources
   - Collect and examine data
   - Identify any missing or incomplete data
   - Assess the quality, reliability, and sufficiency of the data
   - Determine if additional data is needed

3. **Data Preparation (Feature Engineering)**
   - Transform the data to make it suitable for a machine learning algorithm
   - Extract relevant features from the raw data
   - Clean the data by removing noise and inconsistencies
   - Build data pipelines to transform raw data into clean data
   - Convert data into a tabular form suitable for machine learning models

4. **Modeling**
   - Train the machine learning model
   - Experiment with different models (e.g., Logistic Regression, Decision Trees, Neural Networks)
   - Select appropriate model parameters (hyperparameter tuning)
   - Optimize the model's performance
   - Select the best performing model
   - Iterate on data preparation if necessary (e.g., add new features, address data issues)
   - Remember: model quality depends heavily on data quality (Garbage In, Garbage Out!)

5. **Evaluation**
   - Assess how well the model solves the business problem
   - Determine if the model meets the desired performance
   - Evaluate if the project goals are achieved
   - Analyze improvement metrics
     - Example goal: Reduce spam by 50%
     - Measure the actual reduction
     - Evaluate performance on a test set
   - Conduct a retrospective analysis:
     - Was the goal realistic?
     - Did we measure the right outcomes?
   - Based on the evaluation, decide on the next steps:
     - Adjust the goal if necessary
     - Deploy the model to more users or all users
     - Conclude the project if objectives are met

6. **Deployment (Often Concurrent with Online Evaluation)**
   - Perform online evaluation with live users
     - Deploy the model to a subset of users and assess its performance
   - Deployment (Engineering Practices)
     - After the initial evaluation, roll the model out to production for all users
     - Monitor performance and reliability
     - Maintain quality and scalability
   - Write the final report and documentation, as in any managed project
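
The data understanding, data preparation, modeling, and evaluation steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses scikit-learn and NumPy (not named in these notes), synthetic data from `make_classification` in place of real emails, and it reads the example goal "reduce spam by 50%" as "catch at least 50% of spam in the test set", which only holds if no spam filter is in place today. The names `candidates`, `GOAL`, and `spam_caught` are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for real data (assumption): label 1 plays the role of "spam".
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)

# Simulate incomplete data by blanking out roughly 5% of the values.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Data understanding: what share of each feature is missing?
missing_share = np.isnan(X).mean(axis=0)
print("Missing share per feature:", missing_share.round(3))

# Keep a test set for the final evaluation and a validation set for model selection.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

# Data preparation + modeling: each candidate is a pipeline that cleans the
# data (median imputation, optional scaling) and then fits a model.
candidates = {
    "logistic_regression": make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    ),
    "decision_tree": make_pipeline(
        SimpleImputer(strategy="median"),
        DecisionTreeClassifier(max_depth=5, random_state=0),
    ),
}

# Train every candidate and compare accuracy on the validation set.
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)

best_name = max(val_scores, key=val_scores.get)
best_model = candidates[best_name]
print("Validation accuracy:", val_scores, "-> selected:", best_name)

# Evaluation: compare the measured result on the test set with the goal.
GOAL = 0.50  # example goal from the notes, read as "catch >= 50% of spam"
spam_caught = recall_score(y_test, best_model.predict(X_test))
goal_met = bool(spam_caught >= GOAL)
print(f"Share of spam caught on the test set: {spam_caught:.2f}, goal met: {goal_met}")
```

Putting imputation and scaling inside the pipeline means the same cleaning steps are learned on the training data and reused unchanged on validation and test data, which keeps the evaluation honest.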

### Iteration!
- Machine learning projects require multiple iterations.
- Post-deployment, revisit the business understanding to identify further improvements or enhancements for the model.

### General Advice
- Start with a simple model
- Learn from feedback and results
- Gradually enhance the model complexity based on insights and feedback
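
One way to apply this advice is to rank candidate models from simplest to most complex and move up only when the extra complexity clearly pays off. The sketch below assumes scikit-learn and synthetic data; the `ladder` list and the `MIN_GAIN` threshold are hypothetical choices for illustration, not part of CRISP-DM.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=1
)

# Candidates ordered from simplest to most complex.
ladder = [
    ("majority_baseline", DummyClassifier(strategy="most_frequent")),
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=1)),
]

# Hypothetical threshold: a more complex model must improve mean
# cross-validated accuracy by at least this much to replace the current one.
MIN_GAIN = 0.01

scores = {}
chosen_name, chosen_score = None, float("-inf")
for name, model in ladder:
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    scores[name] = score
    print(f"{name}: mean CV accuracy = {score:.3f}")
    # Keep the simpler model unless the gain justifies the added complexity.
    if score >= chosen_score + MIN_GAIN:
        chosen_name, chosen_score = name, score

print("Chosen model:", chosen_name)
```

The majority-class baseline gives a floor: if a trained model cannot beat it, the problem is more likely in the data or the features than in the choice of algorithm.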
