## Data Science Lifecycle

The data science lifecycle provides a structured path for turning raw data into insights and actionable solutions. A widely used example is CRISP-DM (Cross-Industry Standard Process for Data Mining), a framework for managing data science projects across domains.

The lifecycle is iterative rather than strictly linear: practitioners frequently revisit earlier steps to refine their understanding or outputs. It begins with defining a question and ends with deployment and ongoing monitoring of solutions. CRISP-DM consists of six phases:

- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment

This structure helps both beginners and professionals ensure their work is systematic and reproducible, maximizing business impact while addressing technical challenges.

#### Business Understanding

Purpose: Understand project objectives, business context, and success criteria.

This phase defines what “success” means for the project, e.g., achieving a certain prediction accuracy or supporting investment decisions.

Clarify goals through documentation and stakeholder discussions.

#### Data Understanding

Purpose: Collect initial data and gain familiarity by examining its structure, missingness, and statistics.

This phase is expected to produce visualizations and reports describing data quantity, quality, and interesting patterns.

Visualizations help in understanding distributions and correlations.
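
As a minimal sketch of this phase, the snippet below summarizes structure, missingness, and basic statistics with pandas. The toy dataset, its column names, and the `summarize` helper are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "CN", "CN"],
    "price": [42000.0, 39000.0, np.nan, 45000.0, 30000.0, 31000.0],
    "units_sold": [120, 150, 90, np.nan, 300, 320],
})


def summarize(frame: pd.DataFrame) -> dict:
    """Collect structure, missingness, and summary statistics in one report."""
    numeric = frame.select_dtypes(include="number")
    return {
        "shape": frame.shape,                            # (rows, columns)
        "dtypes": frame.dtypes.astype(str).to_dict(),    # column types
        "missing_share": frame.isna().mean().to_dict(),  # fraction missing per column
        "stats": numeric.describe(),                     # count, mean, std, quartiles
        "correlations": numeric.corr(),                  # pairwise Pearson correlations
    }


report = summarize(df)
print(report["shape"])
print(report["missing_share"])
print(report["stats"])
print(report["correlations"])
```

In a notebook, the same report would typically be followed by plots (for example histograms or a correlation heatmap) built with matplotlib or seaborn.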

#### Data Preparation
Goal: Select, clean, format, and engineer the data for modeling.

Typical tasks include handling missing values (e.g., dropping rows or imputing), feature selection, transformation, and encoding of categorical data.

Documentation of cleaning choices should be maintained for transparency and reproducibility.
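
The cleaning steps described above could look roughly like the sketch below. The data, the column names, and the specific choices (median imputation, an explicit "unknown" category, one-hot encoding) are assumptions made for illustration, not a recommended recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with gaps in every column.
raw = pd.DataFrame({
    "region": ["EU", "US", None, "CN"],
    "price": [42000.0, np.nan, 45000.0, 30000.0],
    "units_sold": [120.0, 90.0, 110.0, np.nan],
})


def prepare(frame: pd.DataFrame, target: str = "units_sold") -> pd.DataFrame:
    """Apply a small, documented sequence of cleaning choices."""
    # Choice 1: rows without a target value cannot be used for training, so drop them.
    out = frame.dropna(subset=[target]).copy()
    # Choice 2: impute the numeric feature with its median.
    out["price"] = out["price"].fillna(out["price"].median())
    # Choice 3: keep rows with a missing category under an explicit label.
    out["region"] = out["region"].fillna("unknown")
    # Choice 4: one-hot encode the categorical column.
    return pd.get_dummies(out, columns=["region"], dtype=int)


clean = prepare(raw)
print(clean)
```

Keeping each choice as a commented step inside one function is one simple way to maintain the documentation trail mentioned above.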

#### Modeling
Purpose: Choose and train algorithms to extract patterns and make predictions.
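
A minimal sketch of this phase with scikit-learn is shown below. The data is synthetic (generated from a known linear relationship plus noise), and linear regression is used only as an example algorithm.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two features, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.5, size=200)

# Hold out a test set so the model can later be evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
print(model.coef_, model.intercept_)
```

Because the true coefficients are known here (3 and -2), the fitted values give a quick sanity check that the training step behaves as expected.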

#### Evaluation
Purpose: Quantify model performance, often using Root Mean Squared Error (RMSE) for regression.

Evaluation ensures the model generalizes well and meets business success criteria.
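
RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below computes it with NumPy and compares a model against a simple predict-the-mean baseline; the numbers are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up actual values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

model_rmse = rmse(y_true, y_pred)
# Baseline: always predict the mean of the actual values.
baseline_rmse = rmse(y_true, [np.mean(y_true)] * len(y_true))

print(f"model RMSE:    {model_rmse:.3f}")
print(f"baseline RMSE: {baseline_rmse:.3f}")
```

Comparing against a baseline gives the raw RMSE some context; whether the result is good enough still depends on the success criteria set during Business Understanding.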

#### Deployment
Goal: Put models into production to generate predictions on new data, and maintain and update them over time.

Deployment may include building APIs or dashboards to expose model results to stakeholders.
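
One small piece of deployment is saving a trained model and reloading it to score new data. The sketch below does this with joblib; the model, the file name `model.joblib`, and the temporary directory are illustrative assumptions, and a real deployment would add versioning, monitoring, and a serving layer.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a tiny model on data that follows y = 2x + 1 exactly.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
model = LinearRegression().fit(X, y)

# Persist the model, then load it back as a separate service would.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"  # hypothetical artifact name
    joblib.dump(model, str(path))
    restored = joblib.load(str(path))

# Score new, unseen inputs with the restored model.
new_data = np.array([[10.0], [11.0]])
predictions = restored.predict(new_data)
print(predictions)
```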

##### Why CRISP-DM Is Important

- Provides a structured, repeatable roadmap ensuring no key step is overlooked and work is auditable.
- Fosters communication across teams by standardizing terminology, outputs, and project plans.
- Designed to be flexible, allowing feedback loops and iterative refinement, which are common in real projects.
- Applies across industries and datasets, making it a useful guide for data scientists and analysts at all levels.

| Phase                | Purpose                               | Key Outputs                          | Python & Tooling Highlights                     |
|-----------------------|---------------------------------------|---------------------------------------|------------------------------------------------|
| Business Understanding | Define objectives & success criteria  | Project goals, risks, constraints     | Documentation, planning                        |
| Data Understanding    | Collect and explore data              | Data description, exploration, quality report | pandas, matplotlib, seaborn            |
| Data Preparation      | Clean, select, transform data         | Clean dataset, engineered features    | pandas (fillna, dropna), sklearn preprocessing |
| Modeling              | Train predictive or descriptive models| Trained model, test design            | sklearn, statsmodels                           |
| Evaluation            | Quantify model performance            | RMSE, accuracy reports                | sklearn.metrics                                |
| Deployment            | Put model in production, monitor, maintain | Deployment plan, saved model artifacts | joblib, Flask/Django (for APIs)              |


## EV Sales Volume Forecasting: Applying CRISP-DM

Goal: Predict monthly, quarterly, and annual sales volumes of electric vehicles, globally and by region, to support production planning, supply chain management, and inventory optimization.

Business Understanding:

- What can be attained with this idea? Sales volume predictions that can be used for production planning, inventory optimization, and supply chain management.
- What level of granularity in the results would make the solution meaningful? Monthly, quarterly, and annual sales predictions, both globally and by region.
- What data is required for these predictions? Sales data by country and by region, EV selling prices, regional incentives, fuel and electricity prices, charging facility details, etc.

Data Understanding and Preparation:

- Gather sales data by region and by country for a set time period, along with selling prices, incentives, and charging facility details.
- Summarize the data and analyze visualizations for trends.
- Check for and fix missing values and outliers, and clean the data.
- Transform and format the data, check data quality, and repeat these steps if required.
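
To illustrate the monthly, quarterly, and annual views by region, the sketch below aggregates a small monthly table with pandas. The figures, region names, and column names are invented for the example and do not represent real EV sales.

```python
import pandas as pd

# Invented monthly sales for two regions over one year.
months = pd.date_range("2023-01-01", periods=12, freq="MS")
sales = pd.DataFrame({
    "month": list(months) * 2,
    "region": ["Europe"] * 12 + ["Asia"] * 12,
    "units_sold": list(range(100, 220, 10)) + list(range(200, 440, 20)),
})

# Derive calendar keys used for the coarser aggregations.
sales = sales.assign(
    year=sales["month"].dt.year,
    quarter=sales["month"].dt.quarter,
)

# Regional totals per quarter and per year, plus a global monthly total.
quarterly = sales.groupby(["region", "year", "quarter"], as_index=False)["units_sold"].sum()
annual = sales.groupby(["region", "year"], as_index=False)["units_sold"].sum()
global_monthly = sales.groupby("month", as_index=False)["units_sold"].sum()

print(quarterly)
print(annual)
print(global_monthly.head())
```

Aggregating from the finest level (monthly, by region) keeps all three reporting levels consistent with each other.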

Modeling, Evaluation, and Deployment:

Since it's time-based data, I believe a linear regression model can work for forecasting EV sales volume too, as in the house prices example. The model is trained on the prepared dataset and its performance is evaluated with root mean squared error (RMSE). Models other than linear regression can also be trained, evaluated, and compared for better results. Deployment, monitoring, and maintenance can be planned as per business requirements.
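
A rough sketch of this idea follows, under several assumptions: the monthly series is synthetic (trend, yearly seasonality, and noise), the sine and cosine seasonal features are an addition not discussed above, and the split is chronological rather than random so that the model is evaluated only on months after its training period.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic monthly series: upward trend + yearly seasonality + noise.
rng = np.random.default_rng(42)
n_months = 48
t = np.arange(n_months)
units = 1000 + 50 * t + 200 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 50, n_months)


def features(time_index: np.ndarray) -> np.ndarray:
    """Time trend plus sine/cosine terms for a 12-month cycle (an assumed feature set)."""
    angle = 2 * np.pi * time_index / 12
    return np.column_stack([time_index, np.sin(angle), np.cos(angle)])


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Chronological split: first 36 months for training, last 12 for testing.
split = 36
X_train, X_test = features(t[:split]), features(t[split:])
y_train, y_test = units[:split], units[split:]

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Naive baseline: repeat the last observed training value.
naive_pred = np.full_like(y_test, y_train[-1])

model_rmse = rmse(y_test, pred)
naive_rmse = rmse(y_test, naive_pred)
print(f"linear regression RMSE: {model_rmse:.1f}")
print(f"naive baseline RMSE:    {naive_rmse:.1f}")
```

On real sales data the relationship may not be linear, which is why comparing several models on the same held-out period, as suggested above, is a sensible next step.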
 
I think this is a perfect use case for applying the CRISP-DM framework.

Strengths: The structured process and the iterative nature of going back and revisiting earlier phases are an advantage for this specific use case.

Weaknesses: Handling real-time data might be required for accuracy, which could be a problem when applying the framework, as it is better suited to batch-style processing.

From my initial research, in the prediction of EV sales, both OSEMN and TDSP can complement or extend the CRISP-DM framework in different ways. However, TDSP best addresses enterprise-level solutions end to end, whereas CRISP-DM and OSEMN serve as good conceptual starting points, being ideal for early exploration.