# 📜 IBM Data Science Professional Certificate  
*Curiosity to Capability — One Notebook at a Time*

---

**Compiled and Authored by:**  
**Partho Sarothi Das**  
Dhaka, Bangladesh  
🎓 Bachelor's & Master's in Statistics  
💼 Investment Banking Professional → Aspiring Data Scientist  

>**Disclaimer:** This notebook is based on content from the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science) offered on Coursera. It is intended for personal learning and review purposes.

---
---

# Introduction to CRISP-DM (Cross-Industry Standard Process for Data Mining)

CRISP-DM is a widely used, **structured and iterative methodology** designed to guide data mining projects across various industries. This approach provides a clear framework to tackle data-driven problems and supports effective communication among data scientists, stakeholders, and decision-makers.

### Key Components of CRISP-DM:

#### 1. **Business Understanding**

* The most critical stage.
* Establishes the project's **goals, scope, and success criteria** from a business perspective.
* Involves aligning stakeholders and clarifying intentions, objectives, and potential biases.
* Similar to the initial phase in John Rollins’ data science methodology.

#### 2. **Data Understanding**

* Combines **data requirements, collection, and exploration** into one phase.
* Involves identifying data sources, acquiring data, and performing initial exploration.
* Helps assess data quality and suitability for modeling.
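
A minimal sketch of the initial exploration step, profiling a small in-memory data set for completeness. The records, the field names, and the `profile_completeness` helper are hypothetical illustrations, not part of CRISP-DM or the course material. Only the Python standard library is used.

```python
# Hypothetical hospital-visit records; None marks a missing value.
records = [
    {"patient_id": 1, "age": 34, "length_of_stay": 3},
    {"patient_id": 2, "age": None, "length_of_stay": 5},
    {"patient_id": 3, "age": 71, "length_of_stay": None},
    {"patient_id": 4, "age": 52, "length_of_stay": 2},
]


def profile_completeness(rows):
    """Return the share of non-missing values for each field."""
    fields = sorted({key for row in rows for key in row})
    profile = {}
    for field in fields:
        present = sum(1 for row in rows if row.get(field) is not None)
        profile[field] = present / len(rows)
    return profile


completeness = profile_completeness(records)
for field, share in completeness.items():
    print(f"{field}: {share:.0%} complete")
```

Fields with low completeness are candidates for either further data collection or special handling during Data Preparation.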

#### 3. **Data Preparation**

* Converts raw data into a **clean and usable format** for modeling.
* Includes **handling missing or ambiguous values**, selecting relevant variables, and forming the modeling dataset.
* Often the most time-consuming stage.
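
The preparation steps above can be sketched roughly as follows. The transaction records, the `MODEL_VARIABLES` selection, and the `prepare` helper are assumptions made for illustration. Median imputation is only one of several reasonable ways to handle missing values.

```python
from statistics import median

# Hypothetical raw credit-card transactions; None marks a missing amount.
raw_rows = [
    {"txn_id": "a1", "amount": 20.0, "country": "BD", "note": "x"},
    {"txn_id": "a2", "amount": None, "country": "BD", "note": "y"},
    {"txn_id": "a1", "amount": 20.0, "country": "BD", "note": "x"},  # duplicate
    {"txn_id": "a3", "amount": 60.0, "country": "US", "note": "z"},
]

MODEL_VARIABLES = ("txn_id", "amount", "country")  # assumed relevant variables


def prepare(rows, variables=MODEL_VARIABLES):
    """Deduplicate by txn_id, keep selected variables, impute missing amounts."""
    seen = set()
    selected = []
    for row in rows:
        if row["txn_id"] in seen:
            continue  # drop repeated transactions
        seen.add(row["txn_id"])
        selected.append({name: row[name] for name in variables})

    known = [r["amount"] for r in selected if r["amount"] is not None]
    fill_value = median(known)  # simple imputation choice, one of many
    for r in selected:
        if r["amount"] is None:
            r["amount"] = fill_value
    return selected


modeling_rows = prepare(raw_rows)
print(modeling_rows)
```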

#### 4. **Modeling**

* Core stage of data mining.
* Involves selecting and applying algorithms to discover patterns and structures in the data.
* Includes **parameter tuning and model optimization**.
* Focus is on transforming data into actionable knowledge.
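
As a hedged illustration of parameter tuning, the sketch below runs a small grid search over the single parameter of a threshold classifier. The data, the candidate thresholds, and the function names are hypothetical. Real projects typically use a library model with many more parameters.

```python
# Hypothetical (spam_score, is_spam) pairs used for training and validation.
train = [(0.1, 0), (0.2, 0), (0.35, 0), (0.4, 1), (0.6, 1), (0.9, 1)]
validation = [(0.15, 0), (0.3, 0), (0.45, 1), (0.8, 1)]


def predict(score, threshold):
    """Label an email as spam (1) when its score reaches the threshold."""
    return 1 if score >= threshold else 0


def accuracy(rows, threshold):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for score, label in rows if predict(score, threshold) == label)
    return correct / len(rows)


def tune_threshold(rows, candidates):
    """Grid search: return the candidate with the highest accuracy on rows."""
    return max(candidates, key=lambda t: accuracy(rows, t))


candidates = [0.2, 0.3, 0.4, 0.5, 0.6]
best_threshold = tune_threshold(train, candidates)
print("best threshold:", best_threshold)
print("validation accuracy:", accuracy(validation, best_threshold))
```

The threshold is chosen on the training rows only, and the validation rows give a first check that the choice generalizes.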

#### 5. **Evaluation**

* Assesses how well the model meets the business objectives.
* Uses test data (previously unseen by the model) to evaluate **accuracy, performance, and relevance**.
* Determines whether the model is ready for deployment or needs refinement.
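
One way to picture this stage is to score predictions on held-out test data and compare the result with a success criterion set during Business Understanding. The labels, the `evaluate` helper, and the `MIN_RECALL` target below are assumptions chosen for illustration, not values from the course.

```python
# Hypothetical true labels and model predictions for a held-out test set
# (1 = fraudulent transaction, 0 = legitimate).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def evaluate(actual, predicted):
    """Compute accuracy, precision, and recall from paired binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


MIN_RECALL = 0.8  # assumed success criterion agreed during Business Understanding

metrics = evaluate(y_true, y_pred)
ready_for_deployment = metrics["recall"] >= MIN_RECALL
print(metrics, "ready:", ready_for_deployment)
```

Here the model misses the assumed recall target, so the project would return to an earlier stage rather than move on to Deployment.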

#### 6. **Deployment**

* Implements the model in a real-world setting.
* Involves integrating the model into business processes or applications.
* Can reveal new data needs or adjustments required for effectiveness.

### Iteration & Feedback:

* CRISP-DM is **cyclical and flexible**—you may return to any previous stage based on insights or stakeholder feedback.
* Although not explicitly named, a **feedback phase** occurs post-deployment, similar to John Rollins’ methodology.
* Final decisions about the project's success depend on a **post-deployment review** with stakeholders.

### Summary Statement:

CRISP-DM supports **data-driven decision making** by providing a repeatable, flexible framework with six core stages:

**Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment**

This process continues iteratively until the model successfully meets the business goals.
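
The iterative nature of the process can be sketched as a simple loop. This is a deliberate simplification: it restarts the whole cycle on every pass, whereas in practice a team may return to any single earlier stage. The `run_crisp_dm` function and the evaluation scores are hypothetical stand-ins for real project work.

```python
STAGES_BEFORE_DEPLOYMENT = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
]


def run_crisp_dm(scores_per_iteration, target):
    """Repeat the pre-deployment stages until an evaluation score meets target.

    scores_per_iteration stands in for real evaluation results.
    Returns the ordered log of stages visited.
    """
    log = []
    for iteration, score in enumerate(scores_per_iteration, start=1):
        log.extend(f"{iteration}: {stage}" for stage in STAGES_BEFORE_DEPLOYMENT)
        if score >= target:
            log.append(f"{iteration}: Deployment")
            break
    return log


stage_log = run_crisp_dm(scores_per_iteration=[0.62, 0.71, 0.84], target=0.8)
print("\n".join(stage_log))
```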

---
---

# Final Assignment Overview

### Project Submission Requirements

Now it's your turn to demonstrate your understanding of data science methodology.

During this final project, you'll complete 3 tasks for a total of 10 points to demonstrate your knowledge of the CRISP-DM data methodology.

First, you'll take on both the role of the client and the data scientist to develop a business problem related to one of the following topics:

- Emails
- Hospitals
- Credit Cards


You'll use the business problem you defined to demonstrate your knowledge of the Business Understanding stage.

Then, taking on the role of a data scientist, you'll describe how you would apply data science methodology practices at each of the listed stages to address the business problem you identified.

You'll enter your answers in the text fields provided online. After you submit your assignment, one of your peers who is completing this assignment within the same session will grade your final project. You will also grade a peer's assignment.

Please note that this assignment is worth 10% of your final grade.

Note: You can take as many breaks as needed between the exercises.

**Author(s)**   
Patsy Kravitz

### Review what you learned

After completing this course, you learned many facts about data science methodology. Here are 14 key, high-level takeaway facts you’ll want to remember from this course.

- Foundational methodology, a cyclical, iterative data science methodology developed by John Rollins, consists of 10 stages, starting with Business Understanding and ending with Feedback.

- CRISP-DM, an open-source data methodology, combines several data-related methodology stages into one stage and omits the Feedback stage, resulting in a six-stage data methodology.

- The primary goal of the Business Understanding stage is to understand the business problem and determine the data needed to answer the core business question. 

- During the Analytic Approach stage, you choose among descriptive, diagnostic, predictive, and prescriptive analytic approaches and decide whether to use machine learning techniques.

- During the Data Requirements stage, scientists identify the correct and necessary data content, formats, and sources needed for the specific analytical approach.

- During the Data Collection stage, expert data scientists revise data requirements and make critical decisions regarding the quantity and quality of data. Data scientists apply descriptive statistics and visualization techniques to thoroughly assess the content, quality, and initial insights gained from the collected data, identify gaps, and determine if new data is needed, or if they should substitute existing data.

- The Data Understanding stage encompasses all activities related to constructing the data set. This stage answers the question of whether the collected data represents the data needed to solve the business problem. Data scientists might use descriptive statistics, predictive statistics, or both.

- Data scientists commonly apply Hurst, univariates, and statistics such as mean, median, minimum, maximum, standard deviation, pairwise correlation, and histograms. 

- During the Data Preparation stage, data scientists must address missing or invalid values, remove duplicates, and validate that the data is properly formatted. Feature engineering and text analysis are key techniques data scientists apply to validate and analyze data during the Data Preparation stage.

- The end goal of the Modeling stage is that the data model answers the business question. During the Modeling stage, data scientists use a training data set. Data scientists test multiple algorithms on the training set data to determine whether the variables are required and whether the data supports answering the business question. The outcome of those models is either descriptive or predictive. 

- The Evaluation stage consists of two phases: the diagnostic measures phase and the statistical significance phase. Data scientists and others assess the quality of the model and determine if the model answers the initial Business Understanding question or if the data model needs adjustment.

- During the Deployment stage, data scientists release the data model to a targeted group of stakeholders, including solution owners, marketing staff, application developers, and IT administration.

- During the Feedback stage, stakeholders and users evaluate the model and contribute feedback to assess the model’s performance. 

- The data model’s value depends on its ability to iterate; that is, how successfully the data model incorporates user feedback.
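
The descriptive statistics named in the takeaways above (mean, median, minimum, maximum, standard deviation, pairwise correlation, and histograms) can be computed with the Python standard library alone. The sketch below is a minimal illustration; the two columns and the helper names `summarize`, `pearson`, and `histogram` are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical numeric columns from a collected data set.
age = [34, 45, 52, 61, 70]
length_of_stay = [2, 3, 4, 6, 7]


def summarize(values):
    """Univariate summary: mean, median, min, max, sample standard deviation."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "stdev": stdev(values),
    }


def pearson(xs, ys):
    """Pairwise Pearson correlation computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


age_summary = summarize(age)
r = pearson(age, length_of_stay)
age_hist = histogram(age, 20)
print(age_summary)
print(f"correlation(age, length_of_stay) = {r:.3f}")
print(age_hist)
```

A strong correlation between two candidate variables, as in this made-up example, is the kind of finding that might prompt dropping one of them before modeling.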