# 1) Business Understanding
## Business Objectives

*The first objective of the data analyst is to thoroughly understand, from a business perspective, what the
customer really wants to accomplish. Often the customer has many competing objectives and constraints
that must be properly balanced. The analyst’s goal is to uncover important factors, at the beginning, that
can influence the outcome of the project. A possible consequence of neglecting this step is to expend a
great deal of effort producing the right answers to the wrong questions.*

### Background

*Record the information that is known about the organization’s business situation at the beginning of the project.*

Based on measurements taken with a polar meter, a data set can be compiled that contains, for each day, the amount of rainfall and the maximum, minimum, and average temperatures.

### Business Objectives

*Describe the customer’s primary objective, from a business perspective. In addition to the primary business
objective, there are typically other related business questions that the customer would like to address. For
example, the primary business goal might be to keep current customers by predicting when they are prone
to move to a competitor. Examples of related business questions are “How does the primary channel used
(e.g., ATM, branch visit, Internet) affect whether customers stay or go?” or “Will lower ATM fees significantly
reduce the number of high-value customers who leave?”*

The primary objective is to better understand weather patterns and to predict the expected high, low, and average temperatures one week into the future.

### Success Criteria

*Describe the criteria for a successful or useful outcome to the project from the business point of view. This
might be quite specific and able to be measured objectively, for example, reduction of customer churn to a
certain level, or it might be general and subjective, such as “give useful insights into the relationships.” In
the latter case, it should be indicated who makes the subjective judgment.*

The project is successful when we can predict, with 95% accuracy and based on historical data, what the high, low, and average temperatures will be 7 days into the future.
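
The text does not define how "95% accuracy" is measured for a temperature forecast. One possible reading, shown here as a minimal sketch, is the fraction of predictions that fall within a fixed tolerance of the observed value. The function name, the sample values, and the tolerance of 1.0 degree are assumptions for illustration, not part of the original criteria.

```python
def fraction_within_tolerance(actual, predicted, tolerance=1.0):
    """Share of predictions whose absolute error is at most `tolerance` degrees.

    The tolerance is a hypothetical choice; the project text only states "95% accuracy".
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up example values (degrees Celsius), not real measurements.
actual = [5.0, 6.5, 7.0, 4.0]
predicted = [5.4, 6.0, 9.0, 4.9]
score = fraction_within_tolerance(actual, predicted, tolerance=1.0)
print(score)
```

Under this reading, the success criterion would be met when the score reaches 0.95 on held-out days.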

## Assess Situation
### Inventory of resources

*List the resources available to the project:*
- *Personnel (business experts, data experts, technical
support, data mining experts)*
- *Data (fixed extracts, access to live, warehoused, or operational data)*
- *computing resources (hardware platforms)*
- *software (data mining tools, other relevant software).*

A CSV dataset in the form of a text file, containing the following features:
- stationID
- date
- temperature avg
- temperature high
- temperature low
- amount of rainfall
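
As a rough sketch of how such a file could be read with pandas: the column names below (`temp_avg`, `temp_high`, `temp_low`, `rainfall`) and the sample rows are assumptions derived from the feature list, since the actual header of the file is not shown here.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real text file; the header names
# and values are illustrative assumptions, not the actual data.
SAMPLE_CSV = """stationID,date,temp_avg,temp_high,temp_low,rainfall
260,2020-01-01,3.1,5.4,0.8,0.0
260,2020-01-02,4.0,6.2,1.5,2.3
260,2020-01-03,5.2,7.9,2.7,0.4
"""


def load_weather(source):
    """Read the CSV and parse the date column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])


df = load_weather(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

For the real data set, `load_weather` would be called with the path to the text file instead of the in-memory sample.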

### Requirements, Assumptions and constraints

*List all requirements of the project, including schedule of completion, comprehensibility and quality of
results, and security, as well as legal issues. As part of this output, make sure that you are allowed to use
the data.
List the assumptions made by the project. These may be assumptions about the data that can be verified
during data mining, but may also include non-verifiable assumptions about the business related to the
project. It is particularly important to list the latter if it will affect the validity of the results.
List the constraints on the project. These may be constraints on the availability of resources, but may also
include technological constraints such as the size of dataset that it is practical to use for modeling.*

Functional requirements:
- A dataset that comprises multi-year readings, so that seasonality can be determined

Assumptions:
- The dataset provided by KNMI is accurate
- The dataset provided by KNMI is without errors
- The dataset provided by KNMI does not contain null values
- The dataset provided by KNMI has been accurately recorded and no dates are missing
- Temperatures are recorded in degrees Celsius
- Values are grouped per day
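
Two of these assumptions (no null values, no missing dates) can be verified during data mining. The following is a minimal sketch of such a check, assuming a single station, a `date` column already parsed as datetimes, and a hypothetical `temp_avg` column; the sample frame is made up to show both kinds of gap.

```python
import pandas as pd


def check_assumptions(df):
    """Return null counts per column and the calendar days absent from the data."""
    null_counts = df.isna().sum()
    # Every calendar day between the first and last observation.
    full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
    missing_dates = full_range.difference(pd.DatetimeIndex(df["date"]))
    return null_counts, missing_dates


# Made-up example: one null temperature and one skipped day (2020-01-03).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
    "temp_avg": [3.1, None, 5.2],
})
null_counts, missing_dates = check_assumptions(df)
print(null_counts)
print(missing_dates)
```

With several stations in one file, the same check would need to be applied per `stationID`.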

Constraints:
- Data is limited to the dates provided by the KNMI

### Risks and contingencies
*List the risks or events that might delay the project or cause it to fail. List the corresponding contingency
plans, what action will be taken if these risks or events take place.*

### Terminology

*Compile a glossary of terminology relevant to the project. This may include two components:
(1) A glossary of relevant business terminology, which forms part of the business understanding
available to the project. Constructing this glossary is a useful “knowledge elicitation” and
education exercise.
(2) A glossary of data mining terminology, illustrated with examples relevant to the business
problem in question*

- rainfall refers to the amount of rainfall in millimeters
- min refers to the minimum temperature
- max refers to the maximum temperature
- date refers to a single day

### Costs and benefits
*Construct a cost-benefit analysis for the project, which compares the costs of the project with the potential
benefits to the business if it is successful. The comparison should be as specific as possible. For example,
use monetary measures in a commercial situation.*

Costs:
- time
- money

Benefits:
- Being able to accurately predict temperatures can lead to better decisions regarding agriculture
- Being able to accurately predict rainfall can help sustain a workable model for maintaining dykes

## Determine Data mining goals

*A business goal states objectives in business terminology. A data mining goal states project objectives in
technical terms. For example, the business goal might be “Increase catalog sales to existing customers.” A
data mining goal might be “Predict how many widgets a customer will buy, given their purchases over the
past three years, demographic information (age, salary, city, etc.), and the price of the item.”*

### Data mining goals

*Describe the intended outputs of the project that enable the achievement of the business objectives.*


- Obtain a multi-year data set
- The data contains the features minimum, average, and maximum temperature, plus rainfall, per station and per day

### Data mining success criteria

*Define the criteria for a successful outcome to the project in technical terms—for example, a certain level
of predictive accuracy or a propensity-to-purchase profile with a given degree of “lift.” As with business
success criteria, it may be necessary to describe these in subjective terms, in which case the person or
persons making the subjective judgment should be identified.*

- We have a multi-year data set
- The data set is complete and without missing values
- The data contains the features minimum, average, and maximum temperature, plus rainfall, per station and per day
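
These criteria could be checked programmatically along the following lines. This is only a sketch: the column names in `REQUIRED_COLUMNS` and the `min_years=2` threshold for "multi-year" are assumptions, not values given by the project.

```python
import pandas as pd

# Hypothetical column names for the required features.
REQUIRED_COLUMNS = {"stationID", "date", "temp_avg", "temp_high", "temp_low", "rainfall"}


def meets_data_criteria(df, min_years=2):
    """True if all features exist, none are null, and the data spans enough years."""
    if not REQUIRED_COLUMNS.issubset(df.columns):
        return False
    spans_years = df["date"].dt.year.nunique() >= min_years
    complete = not df[sorted(REQUIRED_COLUMNS)].isna().any().any()
    return bool(spans_years and complete)


# Made-up rows covering two calendar years.
df = pd.DataFrame({
    "stationID": [260, 260],
    "date": pd.to_datetime(["2019-06-01", "2020-06-01"]),
    "temp_avg": [15.0, 16.2],
    "temp_high": [20.1, 21.4],
    "temp_low": [10.3, 11.0],
    "rainfall": [0.0, 1.2],
})
print(meets_data_criteria(df))
```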

## Project plan
### Produce Project plan

*Describe the intended plan for achieving the data mining goals and thereby achieving the business goals.
The plan should specify the steps to be performed during the rest of the project, including the initial
selection of tools and techniques.*

*List the stages to be executed in the project, together with their duration, resources required, inputs,
outputs, and dependencies. Where possible, make explicit the large-scale iterations in the data mining
process—for example, repetitions of the modeling and evaluation phases.
As part of the project plan, it is also important to analyze dependencies between time schedule and risks.
Mark results of these analyses explicitly in the project plan, ideally with actions and recommendations if
the risks are manifested.*


1) Gather data
2) Perform textual exploratory analysis
3) Perform visualized exploratory analysis
4) Create initial model
5) Test effectiveness of the model
6) Repeat steps 2-5 until the goal criteria are achieved
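
The plan does not name a model. As an illustration of steps 4 and 5 only, the sketch below uses a naive persistence baseline (the forecast for a day is the value observed 7 days earlier) and scores it with the mean absolute error. Both the baseline and the metric are assumptions chosen for simplicity, and the temperature values are made up.

```python
import pandas as pd


def persistence_forecast(series, horizon=7):
    """Forecast each day as the value observed `horizon` days earlier."""
    return series.shift(horizon)


def mean_absolute_error(actual, predicted):
    """Average absolute error over the days that have a forecast."""
    errors = (actual - predicted).abs().dropna()
    return errors.mean()


# Made-up daily maximum temperatures for ten consecutive days.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
temp_high = pd.Series([5.0, 6.0, 7.0, 6.0, 5.0, 4.0, 3.0, 6.0, 8.0, 7.0], index=dates)

forecast = persistence_forecast(temp_high)
mae = mean_absolute_error(temp_high, forecast)
print(mae)
```

A baseline like this gives later models a reference point: a candidate model is only worth iterating on if it beats the naive forecast.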

### Tools and techniques

*At the end of the first phase, an initial assessment of tools and techniques should be performed. Here, for
example, you select a data mining tool that supports various methods for different stages of the process.
It is important to assess tools and techniques early in the process since the selection of tools and techniques
may influence the entire project.*


<div>
<table class="tg">
<thead>
  <tr>
    <th>Application Domain</th>
    <th>Data Mining Problem Type</th>
    <th>Technical Aspect</th>
    <th>Tool &amp; Technique</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Data gathering</td>
    <td>Reading and importing data</td>
    <td>reading csv files</td>
    <td>pandas</td>
  </tr>
  <tr>
    <td>Response Modeling</td>
    <td>Description &amp; Summarization</td>
    <td>Missing Values</td>
    <td>pandas</td>
  </tr>
    <tr>
    <td>Response Modeling</td>
    <td>Description &amp; Summarization</td>
    <td>statistical overview</td>
    <td>pandas</td>
  </tr>
    <tr>
    <td>Response Modeling</td>
    <td>Description &amp; Summarization</td>
    <td>Visualization</td>
    <td>matplotlib</td>
  </tr>
</tbody>
</table>
</div>
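
The "statistical overview" row of the table could look roughly like this in pandas. The column names and values are hypothetical placeholders for the real data set.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the actual file header.
df = pd.DataFrame({
    "temp_avg": [3.1, 4.0, 5.2],
    "temp_high": [5.4, 6.2, 7.9],
    "temp_low": [0.8, 1.5, 2.7],
    "rainfall": [0.0, 2.3, 0.4],
})

# count, mean, std, min, quartiles, and max for each numeric column.
summary = df[["temp_avg", "temp_high", "temp_low", "rainfall"]].describe()
print(summary)
```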