# 1. Business Understanding

### 1.1 Business Objectives

The objective of this project is to analyze the PISA database, a dataset from the education sector, and to find correlations that help explain selected student behaviors.

From a list of suggested questions, we chose to start with the following:

1) Profile of repeating students in PISA. What variables contribute to explaining their performance in mathematics? How has it evolved over the various assessment cycles?
2) What pedagogical strategies developed by teachers seem to promote better reading performance (PISA 2018 - teacher context data)? And what pedagogical strategies, according to the students' perspective, seem to promote better reading performance (PISA 2018 - student context data)? Do teachers and students have the same perceptions?
3) Profile of students in vocational and professional courses in PISA. What variables contribute to explaining their performance in mathematics? How has it evolved over the various assessment cycles?

**Business Success Criteria**

The success of this project will be measured by the following criteria:

1) **Identification of Key Data Features**: Identify and document the key data features associated with grade repetition and with the mathematics performance of repeating students. This includes understanding how these variables have evolved over the various assessment cycles.
- **Success Indicator**: A comprehensive report detailing the identified variables and their impact on student performance, reviewed and validated by educational experts.

2) **Evaluation of Pedagogical Strategies**: Analyze and compare the pedagogical strategies developed by teachers and perceived by students that promote better reading performance. Determine if there are any discrepancies between teacher and student perceptions.
- **Success Indicator**: A detailed analysis report highlighting effective pedagogical strategies with recommendations for educators. This report should be peer-reviewed.

3) **Profile Analysis of Vocational and Professional Students**: Develop a detailed profile of students in vocational and professional courses based on their questionnaire answers, identifying the variables that contribute to their mathematics performance and tracking how these have evolved over the various assessment cycles.
- **Success Indicator**: A thorough profile report that provides insights into the performance of vocational and professional students, validated by educational stakeholders.

4) **Actionable Insights**: Provide actionable insights and recommendations based on the findings to help improve educational strategies and student performance.
- **Success Indicator**: A set of actionable recommendations presented to educational authorities and stakeholders, with feedback indicating their usefulness and potential for implementation.

5) **Stakeholder Satisfaction**: Ensure that the findings and recommendations are deemed useful and insightful by the stakeholders, including educators, policymakers, academic researchers, data analysts, and machine learning developers.
- **Success Indicator**: Positive feedback from stakeholders through surveys or interviews, indicating that the project has provided valuable insights and practical recommendations.



### 1.2 Assess Situation  TODO

**Task Assess Situation**

This task involves more detailed fact-finding about all of the resources,
constraints, assumptions, and other factors that should be considered in
determining the data analysis goal and project plan. In the previous task,
the objective was to quickly get to the crux of the situation. Here, you
want to flesh out the details.

**Outputs Inventory of Resources**

List the resources available to the project, including: personnel (business
experts, data experts, technical support, data mining personnel), data
(fixed extracts, access to live warehoused or operational data), computing
resources (hardware platforms), software (data mining tools, other
relevant software).

**Requirements, Assumptions, and Constraints**

List all requirements of the project including schedule of completion,
comprehensibility and quality of results, and security as well as legal
issues. As part of this output, make sure that you are allowed to use the
data.

List the assumptions made by the project. These may be assumptions
about the data which can be checked during data mining, but may also
include non-checkable assumptions about the business upon which the
project rests. It is particularly important to list the latter if they form
conditions on the validity of the results.

List the constraints on the project. These may be constraints on the
availability of resources, but may also include technological constraints
such as the size of data which it is practical to use for modeling.

**Risks and Contingencies**

List the risks, that is, events that might delay the project or
cause it to fail. List the corresponding contingency plans: what action
will be taken if the risks materialize.

**Terminology**

A glossary of terminology relevant to the project. This may include two
components:
(1) A glossary of relevant business terminology, which forms part of the
business understanding available to the project. Constructing this
glossary is a useful "knowledge elicitation" and education exercise.
(2) A glossary of data mining terminology, illustrated with examples
relevant to the business problem in question.

**Costs and Benefits**

A cost-benefit analysis for the project; compare the costs of the project
with the potential benefit to the business if it is successful. The
comparison should be as specific as possible, for example using
monetary measures in a commercial situation.

### 1.3 Determine Data Mining Goals  TODO

**Task Determine Data Mining Goals**

A business goal states objectives in business terminology. A data mining
goal states project objectives in technical terms. For example, the
business goal might be "Increase catalog sales to existing customers"
while a data mining goal might be "Predict how many widgets a customer
will buy, given their purchases over the past three years, demographic
information (age, salary, city, etc.), and the price of the item".

**Outputs Data Mining Goals**

Describe the intended outputs of the project which will enable the
achievement of the business objectives.

**Data Mining Success Criteria**

Define the criteria for a successful outcome to the project in technical
terms, for example a certain level of predictive accuracy, or a propensity
to purchase profile with a given degree of "lift". As with business success
criteria, it may be necessary to describe these in subjective terms, in
which case the person or persons making the subjective judgment should
be identified.

### 1.4 Produce Project Plan TODO

**Task Produce Project Plan**

Describe the intended plan for achieving the data mining goals, and
thereby achieving the business goals. The plan should specify the
anticipated set of steps to be performed during the rest of the project
including an initial selection of tools and techniques.

**Outputs Project Plan**

List the stages to be executed in the project, together with duration,
resources required, inputs, outputs and dependencies. Where possible
make explicit the large-scale iterations in the data mining process, for
example repetitions of the modeling and evaluation phases.

As part of the project plan, it is also important to analyse dependencies
between time schedule and risks. Mark results of these analyses explicitly
in the project plan, ideally with actions and recommendations if the risks
appear.

Note, the project plan contains detailed plans for each phase. For
example, decide at this point which evaluation strategy will be used in
the evaluation phase.

The project plan is a dynamic document in the sense that at the end of
each phase a review of progress and achievements is necessary and an
update of the project plan accordingly is recommended. Specific review
points for these reviews are part of the project plan, too.

**Initial Assessment of Tools and Techniques**

At the end of the first phase, the project also performs an initial
assessment of tools and techniques. Here, for example, you select a data
mining tool that supports various methods for different stages of the
process. It is important to assess tools and techniques early in the
process, since their selection may influence the entire project.

### Data Information

The PISA data consist of responses to multiple questionnaires completed in the previous year. The questionnaires aimed at students were administered to 15-year-olds.

### Technical Information

1. Some variables in the datasets do not appear in the questionnaires. These variables are derived from a subset of the questionnaire items and measure indices such as "Creative Activities outside of school" (CREATOOS). They are computed using WLE (weighted likelihood estimates).

2. The student dataset contains information from several questionnaires:
- ST codes are responses to the Student questionnaire (computer- or paper-based)
- FL codes are responses to the Financial Literacy questionnaire
- IC codes are responses to the ICT questionnaire
- WB codes are responses to the Well-being questionnaire
- PA codes are responses to the Parent questionnaire
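
The prefix convention listed above can be used to sort column names by the questionnaire they come from. The following is a minimal sketch, not the project's actual loading code. It assumes that item columns consist of the two-letter prefix followed by three digits (so that derived indices that merely start with the same letters are not misclassified). The column names in `example_columns` are illustrative, and the helper `group_columns_by_questionnaire` is hypothetical.

```python
import re
from collections import defaultdict

# Prefix -> questionnaire, following the list above.
QUESTIONNAIRES = {
    "ST": "Student",
    "FL": "Financial Literacy",
    "IC": "ICT",
    "WB": "Well-being",
    "PA": "Parent",
}

# Assumption: an item column is "<prefix><three digits>...", e.g. "ST004D01T".
ITEM_PATTERN = re.compile(r"^(ST|FL|IC|WB|PA)\d{3}")


def group_columns_by_questionnaire(columns):
    """Group column names by source questionnaire; unmatched names go to 'Other'."""
    groups = defaultdict(list)
    for name in columns:
        match = ITEM_PATTERN.match(name)
        key = QUESTIONNAIRES[match.group(1)] if match else "Other"
        groups[key].append(name)
    return dict(groups)


# Illustrative column names only; check them against the real dataset.
example_columns = [
    "ST004D01T", "IC001Q01TA", "WB150Q01HA",
    "PA003Q01TA", "FL150Q01TA", "CREATOOS", "REPEAT",
]
grouped = group_columns_by_questionnaire(example_columns)
for questionnaire, names in grouped.items():
    print(questionnaire, names)
```

Columns that land in the "Other" group are candidates for derived indices and identifiers, and are worth checking against the codebook.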


3. REPEAT parameter: a value of 1 indicates that the student has repeated a grade at least once.
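
As an illustration of how the REPEAT flag might be used to isolate repeating students, here is a minimal sketch on hand-made records. It assumes that a value of 0 means the student never repeated and that anything else (for example a missing value) is unknown; only the meaning of 1 is stated above. The `student_id` field and the `split_by_repeat` helper are hypothetical, and the resulting share is unweighted.

```python
def split_by_repeat(records):
    """Split student records into repeaters, non-repeaters, and unknown."""
    repeaters, non_repeaters, unknown = [], [], []
    for row in records:
        value = row.get("REPEAT")
        if value == 1:
            repeaters.append(row)       # repeated a grade at least once
        elif value == 0:
            non_repeaters.append(row)   # assumption: 0 means never repeated
        else:
            unknown.append(row)         # missing or unexpected code
    return repeaters, non_repeaters, unknown


# Hand-made records for illustration; not real PISA data.
students = [
    {"student_id": 1, "REPEAT": 1},
    {"student_id": 2, "REPEAT": 0},
    {"student_id": 3, "REPEAT": None},
    {"student_id": 4, "REPEAT": 1},
]

repeaters, non_repeaters, unknown = split_by_repeat(students)

# Unweighted share of repeaters among students with a known REPEAT value.
known = len(repeaters) + len(non_repeaters)
repeat_share = len(repeaters) / known if known else float("nan")
print(f"repeaters: {len(repeaters)}, share among known: {repeat_share:.2f}")
```

Keeping the unknown group separate avoids silently counting students with a missing flag as non-repeaters.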
