# business understanding
------------

This initial phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.


## determine business objectives
----------

### task

The first objective of the data analyst is to thoroughly understand, from a business perspective, what the client really wants to accomplish. Often the client has many competing objectives and constraints that must be properly balanced. The analyst's goal is to uncover, at the beginning, the important factors that can influence the outcome of the project. A possible consequence of neglecting this step is expending a great deal of effort producing the right answers to the wrong questions.

### output

#### background

Record the information that is known about the organization's business situation at the beginning of the project.

#### business objectives 

Describe the customer's primary objective, from a business perspective. In addition to the primary business objective, there are typically other related business questions that the customer would like to address. For example, the primary business goal might be to keep current customers by predicting when they are prone to move to a competitor. Examples of related business questions are "How does the primary channel (e.g., ATM, visit branch, internet) a bank customer uses affect whether they stay or go?" or "Will lower ATM fees significantly reduce the number of high-value customers who leave?"

#### business success criteria 

Describe the criteria for a successful or useful outcome to the project from the business point of view. These might be quite specific and objectively measurable, such as reducing customer churn to a certain level, or general and subjective, such as "give useful insights into the relationships." In the latter case, it should be indicated who makes the subjective judgment.


### ANSWER

#### Background
A UK-based online retail store sells various goods to customers in different countries.

#### Business Objectives
Primary Objective: Increase the income of the store.  
Related Business Question: Which segments can the store's customers be divided into, and how do these segments differ?

#### Business Success Criteria
The store's income increases after decisions based on the customer segmentation are implemented.

## assess situation
-------------

### task

This task involves more detailed fact-finding about all of the resources, constraints, assumptions and other factors that should be considered in determining the data analysis goal and project plan. In the previous task, the objective was to quickly get to the crux of the situation. Here, you want to flesh out the details.

### output

#### inventory of resources

List the resources available to the project, including: personnel (business experts, data experts, technical support, data mining personnel), data (fixed extracts, access to live warehoused or operational data), computing resources (hardware platforms) and software (data mining tools, other relevant software).

#### requirements, assumptions and constraints

List all requirements of the project, including schedule of completion, comprehensibility and quality of results, and security as well as legal issues. As part of this output, make sure that you are allowed to use the data. List the assumptions made by the project. 

These may be assumptions about the data that can be checked during data mining, but may also include non-checkable assumptions about the business upon which the project rests. It is particularly important to list the latter if they form conditions on the validity of the results.

List the constraints on the project. These may be constraints on the availability of resources, but may also include technological constraints such as the size of data that it is practical to use for modeling.

#### Risks and contingencies 

List the risks or events that might occur to delay the project or cause it to fail. List the corresponding contingency plans: what action will be taken if the risks occur.

#### Terminology

Compile a glossary of terminology relevant to the project. This may include two components:

1. A glossary of relevant business terminology, which forms part of the business understanding available to the project. Constructing this glossary is a useful "knowledge elicitation" and education exercise.
2. A glossary of data mining terminology, illustrated with examples relevant to the business problem in question.

#### Costs and benefits

Construct a cost-benefit analysis for the project, which compares the costs of the project with the potential benefit to the business if it is successful. The comparison should be as specific as possible, for example using monetary measures in a commercial situation.


### ANSWER

#### Inventory of the Resources
Data mining expert, Online Retail Table (OnlineRetail.xlsx), Google Colab

#### Requirements

Deadline: 26 Jan, 14:20  
Perform a customer segmentation analysis based on the given table and deliver it as a Google Colab notebook.  
Permission to use the table has been granted.

#### Assumptions
The Online Retail dataset is a transnational data set that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.

The description of each column:

- InvoiceNo: A unique identifier for the invoice. An invoice number shared across rows means that those transactions were performed in a single invoice (multiple purchases).
- StockCode: Identifier for items contained in an invoice.
- Description: Textual description of the stock item.
- Quantity: The quantity of the item purchased in the transaction.
- InvoiceDate: Date and time of purchase.
- UnitPrice: Price of one unit of the item (in pounds sterling).
- CustomerID: Identifier for the customer making the purchase.
- Country: Country where the customer resides.
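To check these column descriptions against the actual file during data understanding, a minimal pandas sketch could load the table and collect a few basic facts. It assumes the file is named `OnlineRetail.xlsx` (as in the inventory of resources) and that the `openpyxl` package is installed, which pandas uses to read `.xlsx` files. The helper names `load_online_retail` and `profile` are hypothetical, and the sample rows are made up.

```python
import pandas as pd

# Columns listed in the description above.
EXPECTED_COLUMNS = [
    "InvoiceNo", "StockCode", "Description", "Quantity",
    "InvoiceDate", "UnitPrice", "CustomerID", "Country",
]


def load_online_retail(path="OnlineRetail.xlsx"):
    """Read the raw table and fail early if a described column is missing."""
    # Read identifiers as text so alphanumeric codes keep their exact form.
    df = pd.read_excel(path, dtype={"InvoiceNo": str, "StockCode": str})
    missing = sorted(set(EXPECTED_COLUMNS) - set(df.columns))
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def profile(df):
    """Collect basic facts for the data description report."""
    return {
        "rows": len(df),
        "invoices": df["InvoiceNo"].nunique(),
        "customers": df["CustomerID"].nunique(),  # missing IDs are not counted
        "countries": df["Country"].nunique(),
        "missing_customer_id": int(df["CustomerID"].isna().sum()),
        "first_date": df["InvoiceDate"].min(),
        "last_date": df["InvoiceDate"].max(),
    }


# Tiny made-up sample; in Colab, use df = load_online_retail() instead.
sample = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002"],
    "StockCode": ["S1", "S2", "S3"],
    "Description": ["item a", "item b", "item c"],
    "Quantity": [6, 6, 6],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-01 08:26", "2010-12-01 08:26", "2010-12-01 08:28"]
    ),
    "UnitPrice": [2.55, 3.39, 1.85],
    "CustomerID": [17850.0, 17850.0, None],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom"],
})
print(profile(sample))
```

The missing-`CustomerID` count matters for this project in particular: rows without a customer cannot be assigned to any segment.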

#### Constraints
Individual work, Google Colab

#### Risks and contingencies
No visible risks.

#### Terminology
- Income - money received, especially on a regular basis, for work or through investments.  
- Customer Segmentation - the process of dividing a broad consumer or business market, normally consisting of existing and potential customers, into sub-groups of consumers based on some type of shared characteristics.

#### Costs and Benefits
Cost: 100 000 Rubles for the analysis + 300 000 Rubles for transformations  
Benefit: Increased income 
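The cost figures above can be turned into a simple break-even check. This is a sketch under a simplifying assumption: the extra income is treated as the whole benefit (the cost of the goods sold is ignored), and the income increase passed in is a hypothetical input, not a result of this project.

```python
# Cost figures from the estimate above, in rubles.
ANALYSIS_COST = 100_000
TRANSFORMATION_COST = 300_000


def total_cost():
    """Total project cost in rubles."""
    return ANALYSIS_COST + TRANSFORMATION_COST


def return_on_investment(income_increase):
    """(benefit - cost) / cost, treating the income increase as the benefit."""
    cost = total_cost()
    return (income_increase - cost) / cost


print(total_cost())                   # 400000
print(return_on_investment(500_000))  # 0.25 for a hypothetical 500 000 ruble increase
```

Under this assumption, the project breaks even once the income increase reaches 400 000 rubles; any smaller increase gives a negative return.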

## determine data mining goals
-----------

### task

A business goal states objectives in business terminology. A data mining goal states project objectives in technical terms. For example, the business goal might be "Increase catalog sales to existing customers." A data mining goal might be "Predict how many widgets a customer will buy, given their purchases over the past three years, demographic information (age, salary, city, etc.) and the price of the item."

### output

#### data mining goals

Describe the intended outputs of the project that enable the achievement of the business objectives.

#### data mining success criteria

Define the criteria for a successful outcome to the project in technical terms, for example a certain level of predictive accuracy or a propensity to purchase profile with a given degree of "lift." As with business success criteria, it may be necessary to describe these in subjective terms, in which case the person or persons making the subjective judgment should be identified.


### ANSWER

#### Data Mining Goals
A common way to perform customer segmentation is to use unsupervised machine learning methods such as clustering. The approach is to collect as much data about the customers as possible in the form of features (attributes) and then find the clusters that emerge from that data. Finally, the traits of each customer segment can be identified by analyzing the characteristics of its cluster.
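As a sketch of the clustering approach described above, assuming pandas and scikit-learn (both available in Google Colab) and a table with one row per customer and numeric features, K-means can be run on standardized features and each cluster summarized by its average feature values. The feature names (`Recency`, `Frequency`, `Monetary`), the helper `segment_customers` and the numbers are hypothetical; which features to use is a data preparation decision.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_customers(features, n_clusters=3, random_state=0):
    """Assign each customer (row) to a K-means cluster and describe each cluster."""
    # K-means uses Euclidean distance, so features are standardized first
    # to stop large-valued columns (e.g. spend) from dominating.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    segmented = features.assign(Segment=labels)
    # Mean feature values per cluster describe the traits of each segment.
    traits = segmented.groupby("Segment").mean()
    return segmented, traits


# Hypothetical features: three recent, frequent, high-spending customers
# and three customers who bought rarely and long ago.
features = pd.DataFrame(
    {
        "Recency": [5, 7, 6, 300, 320, 310],
        "Frequency": [40, 35, 38, 1, 2, 1],
        "Monetary": [5000.0, 4500.0, 4800.0, 50.0, 80.0, 60.0],
    },
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)
segmented, traits = segment_customers(features, n_clusters=2)
print(traits)
```

Cluster labels are arbitrary integers; the business meaning of each segment comes from reading the `traits` table. The number of clusters is a modelling choice that has to be tuned on the real data.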

#### Data Mining Success Criteria
Customer segments are obtained and are clear enough to support decisions about business transformations.
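One way to make this criterion measurable is an internal cluster-quality score. The sketch below is an assumption, not a criterion fixed by this project: it computes scikit-learn's silhouette score (from -1 to 1, where higher means tighter and better-separated clusters) for several candidate numbers of clusters, so that a minimum acceptable score could be agreed on. The helper `silhouette_by_k` is hypothetical and the data are made up.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(features, k_values=range(2, 6), random_state=0):
    """Silhouette score of a K-means clustering for each candidate k."""
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(scaled)
        # Silhouette needs at least 2 clusters and fewer clusters than samples.
        scores[k] = silhouette_score(scaled, labels)
    return scores


# Made-up data with three well-separated groups of customers.
features = pd.DataFrame({
    "Recency": [0.0, 0.1, 0.0, 10.0, 10.1, 10.0, 0.0, 0.1, 0.0],
    "Monetary": [0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 10.0, 10.0, 10.1],
})
scores = silhouette_by_k(features)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real data the scores are rarely this clear-cut, so the threshold and the final number of clusters remain judgment calls, ideally checked against whether the segments make business sense.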

## produce project plan
----------------

### task

Describe the intended plan for achieving the data mining goals and thereby achieving the business goals. The plan should specify the anticipated set of steps to be performed during the rest of the project including an initial selection of tools and techniques.

### output

#### project plan

List the stages to be executed in the project, together with duration, resources required, inputs, outputs and dependencies. Where possible make explicit the large-scale iterations in the data mining process, for example repetitions of the modeling and evaluation phases. As part of the project plan, it is also important to analyze dependencies between time schedule and risks. Mark results of these analyses explicitly in the project plan, ideally with actions and recommendations if the risks appear.

Note: the project plan contains detailed plans for each phase. For example, decide at this point which evaluation strategy will be used in the evaluation phase. The project plan is a dynamic document in the sense that, at the end of each phase, progress and achievements should be reviewed and the project plan updated accordingly. Specific review points for these reviews are part of the project plan, too.

#### initial assessment of tools and techniques

At the end of the first phase, the project also performs an initial assessment of tools and techniques. Here, you select, for example, a data mining tool that supports various methods for different stages of the process. It is important to assess tools and techniques early in the process, since the selection of tools and techniques may influence the entire project.


### ANSWER

#### Project Plan

1. Business Understanding  
This initial phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.  
duration: 1 day  
inputs: task background  
outputs: business objectives, situation assessment, data mining goals, project plan  
dependencies: problem background
2. Data Understanding  
The data understanding phase starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets to form hypotheses for hidden information. Here we load the .xlsx table and profile the data.  
duration: 1 day  
inputs: related data  
outputs: data description, findings and hypotheses about the data, data quality verification 
3. Data Preparation  
The data preparation phase covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data. Tasks include table, record and attribute selection as well as transformation and cleaning of data for modeling tools. Here we clean the data and derive new attributes.  
duration: 1 day  
inputs: related data, findings and hypotheses about the data  
outputs: cleaned data, derived data, formatted data
4. Modelling  
In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values. We will use the K-means algorithm to cluster customers.  
duration: 1 day  
inputs: prepared data  
outputs: test design, built model, model assessment 
5. Evaluation  
A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.  
duration: 1 day  
inputs: prepared data, model  
outputs: evaluation of the model's results, process review, decision about the next steps
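The data preparation phase of this plan (cleaning the data and deriving new attributes) could look roughly like the pandas sketch below. It is one possible choice, not the project's fixed design: it assumes the transactions are summarized per customer as RFM features (Recency, Frequency, Monetary), a common basis for customer segmentation, and that invoice numbers starting with "C" mark cancelled transactions, as stated in the public description of the Online Retail dataset. The function names and sample rows are hypothetical.

```python
import pandas as pd


def clean_transactions(df):
    """Keep only rows that describe real purchases by known customers."""
    # Assumption: a "C" prefix on InvoiceNo marks a cancellation.
    is_cancelled = df["InvoiceNo"].astype(str).str.upper().str.startswith("C")
    keep = (
        df["CustomerID"].notna()
        & ~is_cancelled
        & (df["Quantity"] > 0)
        & (df["UnitPrice"] > 0)
    )
    # Derived attribute: revenue of each invoice line.
    return df[keep].assign(TotalPrice=lambda d: d["Quantity"] * d["UnitPrice"])


def rfm_features(df, snapshot_date=None):
    """One row per customer: days since last purchase, number of invoices, total spend."""
    if snapshot_date is None:
        # Measure recency from the day after the last recorded purchase.
        snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    grouped = df.groupby("CustomerID")
    return pd.DataFrame({
        "Recency": (snapshot_date - grouped["InvoiceDate"].max()).dt.days,
        "Frequency": grouped["InvoiceNo"].nunique(),
        "Monetary": grouped["TotalPrice"].sum(),
    })


# Made-up transactions: one cancellation and one row without a customer.
transactions = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "C1002", "1003", "1004", "1005"],
    "Quantity": [2, 1, -1, 5, 3, 1],
    "UnitPrice": [3.0, 2.0, 3.0, 1.5, 1.0, 4.0],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01", "2011-12-01", "2011-12-02",
        "2011-11-21", "2011-12-05", "2011-11-30",
    ]),
    "CustomerID": [1.0, 1.0, 1.0, 2.0, None, 1.0],
})
cleaned = clean_transactions(transactions)
rfm = rfm_features(cleaned)
print(rfm)
```

The resulting table, one row per customer with numeric columns, is the kind of input the K-means step of the plan expects; the features would still be standardized before clustering.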

## note/questions
-------------

### determine business objectives

### assess situation

### determine data mining goals

### produce project plan
