# PISA Data Exploration
## by Anna Pedroni



From the [PISA Data Visualization Contest](http://www.oecd.org/pisa/pisaproducts/datavisualizationcontest.htm) webpage:
> PISA is a worldwide study developed by the Organisation for Economic Co-operation and Development (OECD) which examines the skills of 15-year-old school students around the world. The study assesses students’ mathematics, science, and reading skills and contains a wealth of information on students’ background, their school and the organisation of education systems. For most countries, the sample is around 5,000 students, but in some countries the number is even higher. In total, the PISA 2012 dataset contains data on 485 490 pupils.

A detailed description of the methodology of the PISA surveys can be found in the [PISA 2012 Technical Report](http://www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf).


## A few points of interest

### PISA:

- is an age-based survey, assessing 15-year-old students in school in grade 7 or higher. These students are approaching the end of compulsory schooling in most participating countries, and school enrolment at this level is close to universal in almost all OECD countries;
- takes a literacy perspective, which focuses on the extent to which students can apply the knowledge and skills they have learned and practised at school when confronted with situations and challenges for which that knowledge may be relevant;
- allows for the assessment of additional cross-curricular competencies [...]. For 2012 a computer-delivered assessment of mathematics and problem solving was added, along with an assessment of financial literacy;
- uses Student Questionnaires to collect information from students on various aspects of their home, family and school background;
- uses School Questionnaires to collect information from schools about various aspects of organisation and educational provision in schools;
- uses Parent Questionnaires administered to the parents of the students participating in PISA (in 11 countries for the 2012 survey).

## Focus and Participation

PISA 2012, the fifth PISA survey, covered reading, mathematics, science, problem solving and financial literacy, with a primary focus on mathematics.

It was conducted in 34 OECD countries and 31 partner countries/economies.
All 65 countries/economies completed the paper-based tests, with assessments lasting a total of two hours for each student.

An additional 40 minutes were devoted to the computer-based assessment of:
- problem solving, in 44 countries/economies;
- mathematics and reading, in 32 countries/economies;
- financial literacy, in 18 countries/economies.

The full list of participants can be found [here](http://www.oecd.org/pisa/aboutpisa/pisa-2012-participants.htm).

Which participants also took part in the additional computer-based assessments is listed in the [Technical Report](http://www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf) on pp. 23-24.

### According to the PISA Technical Report, analyses that link the contextual information from the Student, Parent and School Questionnaires with student achievement could address:


- differences between countries in the relationships between student-level factors (such as gender and socio-economic background) and achievement;
- differences in the relationships between school-level factors and achievement across countries;
- differences in the proportion of variation in achievement between (rather than within) schools, and differences in this value across countries;
- differences between countries in the extent to which schools moderate or increase the effects of individual-level student factors and student achievement;
- differences in education systems and national context that are related to differences in student achievement across countries; and
- through links to PISA 2000, PISA 2003, PISA 2006 and PISA 2009, changes in any or all of these relationships over time.
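
The third point, the share of variation in achievement that lies between rather than within schools, can be made concrete with a variance decomposition. The sketch below is a simplified, unweighted version: it assumes a student-level DataFrame with a score column named `PV1MATH` and a school identifier `SCHOOLID` (names assumed from the PISA 2012 codebook), and it uses a single plausible value, whereas the Technical Report's methodology combines all plausible values and applies sampling weights. The function name is hypothetical.

```python
import pandas as pd


def between_school_share(df, score="PV1MATH", school="SCHOOLID"):
    """Share of the total score variance that lies between schools.

    Unweighted sketch: the population variance of `score` splits exactly into
    a between-school part (school means around the grand mean) and a
    within-school part (students around their own school mean).
    """
    # Drop students with a missing score or school identifier
    data = df[[school, score]].dropna()
    scores = data[score].astype(float)
    grand_mean = scores.mean()
    # Mean score of each student's school, aligned to the student rows
    school_mean = scores.groupby(data[school]).transform("mean")
    between = ((school_mean - grand_mean) ** 2).mean()
    within = ((scores - school_mean) ** 2).mean()
    return between / (between + within)


# Two schools whose students differ mostly between schools (hypothetical data)
toy = pd.DataFrame({"SCHOOLID": [1, 1, 2, 2], "PV1MATH": [10, 12, 20, 22]})
print(between_school_share(toy))  # 25 / 26, about 0.96
```

Because school identifiers are assumed to be unique only within a country, the share would be computed country by country, for example `{cnt: between_school_share(g) for cnt, g in students.groupby("CNT")}`, where `students` is a hypothetical DataFrame holding the student file.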

## Preliminary Wrangling

The links to two files were provided with the [Udacity description of the databases](https://video.udacity-data.com/topher/2019/April/5ca78b26_dataset-project-communicate-data-findings/dataset-project-communicate-data-findings.pdf) for the project:

- PISA Data: `pisa2012.csv` (zipped), the file with the data on the 485 490 pupils;
- PISA Data Dictionary: `pisa.dict2012.csv`, the names of the columns and their descriptions.

I unzipped the PISA 2012 zip file manually, because I first wanted to have a look at it in Excel.
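
Loading both files with pandas could look like the sketch below. It assumes the files sit next to the notebook under the names listed above and that they are Latin-1 encoded (the default UTF-8 decoding may fail on accented characters; change `encoding` if it does). Reading only the needed columns through `usecols` keeps memory use manageable on a file of this size. The dictionary loader assumes the first column holds the variable names and the second their descriptions; both function names are hypothetical.

```python
import io

import pandas as pd


def load_pisa(source, usecols=None, encoding="latin-1"):
    """Read the PISA 2012 student file, optionally keeping only some columns."""
    # low_memory=False makes pandas infer each column's dtype from the whole
    # file instead of chunk by chunk, avoiding mixed-type warnings
    return pd.read_csv(source, usecols=usecols, encoding=encoding, low_memory=False)


def load_dictionary(source, encoding="latin-1"):
    """Read the data dictionary as a Series mapping column name -> description."""
    # Assumption: column 0 = variable name, column 1 = description
    return pd.read_csv(source, index_col=0, encoding=encoding).iloc[:, 0]


# In-memory stand-ins for the real files (hypothetical content)
student_csv = io.BytesIO("CNT,ST04Q01,PV1MATH\nFrance,Female,495.0\n".encode("latin-1"))
dict_csv = io.BytesIO("name,description\nPV1MATH,Plausible value 1 in mathematics\n".encode("latin-1"))

students = load_pisa(student_csv, usecols=["CNT", "PV1MATH"])
dictionary = load_dictionary(dict_csv)
print(students.shape)  # (1, 2)
print(dictionary["PV1MATH"])  # Plausible value 1 in mathematics
```

On the real data the calls would be, for instance, `load_pisa("pisa2012.csv", usecols=["CNT", "ST04Q01", "PV1MATH"])` and `load_dictionary("pisa.dict2012.csv")`.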

```python
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline
```

> Load in your dataset and describe its properties through the questions below.
Try and motivate your exploration goals through this section.
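
A quick structural overview (size, column types, missingness, memory) is a natural starting point for answering the questions below. A minimal sketch, assuming the student data has already been loaded into a DataFrame; the function name and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd


def summarize_structure(df, top_n=10):
    """Collect basic structural facts about a DataFrame in one dict."""
    # Fraction of missing values per column, highest first
    missing_share = df.isna().mean().sort_values(ascending=False)
    return {
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        "most_missing": missing_share.head(top_n).to_dict(),
        "memory_mb": df.memory_usage(deep=True).sum() / 1024**2,
    }


# Hypothetical three-student sample
sample = pd.DataFrame({
    "CNT": ["Italy", "Italy", "Brazil"],
    "PV1MATH": [500.0, np.nan, 391.0],
    "AGE": [15.3, 15.8, 15.5],
})
summary = summarize_structure(sample, top_n=1)
print(summary["n_rows"], summary["n_columns"])  # 3 3
print(summary["most_missing"])
```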

### What is the structure of your dataset?

> Your answer here!

### What is/are the main feature(s) of interest in your dataset?

> Your answer here!

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> Your answer here!

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.

> Make sure that, after every plot or related series of plots, that you
include a Markdown cell with comments about what you observed, and what
you plan on investigating next.
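
One way to look for the unusual points mentioned above is Tukey's fences: values more than 1.5 interquartile ranges below the first quartile or above the third are flagged for a closer look. This is a common rule of thumb, not a PISA-specific procedure. The sketch assumes a numeric pandas Series such as a score column; the function name is hypothetical, and flagged rows would still need to be inspected (and plotted, for example with `plt.hist`) before deciding whether they are errors.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Missing values compare as False, so they are never flagged.
    """
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


values = pd.Series([1, 2, 3, 4, 5, 100])  # hypothetical scores
mask = iqr_outliers(values)
print(values[mask].tolist())  # [100]
```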

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).
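
For a categorical feature paired with a score, group statistics are a compact complement to box or violin plots; for two numeric features, a correlation coefficient summarises the scatter plot. A minimal sketch, assuming the PISA 2012 column names `ST04Q01` (gender), `PV1MATH` and `PV1READ`, which should be checked against the data dictionary before use; the function name and the sample values are hypothetical:

```python
import pandas as pd


def score_by_group(df, score="PV1MATH", group="ST04Q01"):
    """Count, mean and standard deviation of `score` per category of `group`."""
    return (
        df.groupby(group)[score]
        .agg(["count", "mean", "std"])
        .sort_values("mean", ascending=False)
    )


# Hypothetical four-student sample
sample = pd.DataFrame({
    "ST04Q01": ["Female", "Male", "Female", "Male"],
    "PV1MATH": [480.0, 520.0, 500.0, 540.0],
    "PV1READ": [470.0, 515.0, 505.0, 540.0],
})
print(score_by_group(sample))
# Pearson correlation between two numeric features
print(sample["PV1MATH"].corr(sample["PV1READ"]))
```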

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.
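
A simple three-variable view, in line with the first question listed from the Technical Report, is the mean score by country and gender, plus the resulting gender gap per country. The sketch assumes the columns `CNT`, `ST04Q01` (with the labels `"Female"` and `"Male"`) and `PV1MATH`; it is unweighted and uses one plausible value, so it is only an exploratory approximation. The function name and sample data are hypothetical. The resulting table could then be drawn, for example, with `sb.heatmap`.

```python
import pandas as pd


def gender_gap_by_country(df, score="PV1MATH", country="CNT", gender="ST04Q01"):
    """Mean score per country and gender, plus the male-minus-female gap."""
    table = df.pivot_table(index=country, columns=gender, values=score, aggfunc="mean")
    table["gap_male_minus_female"] = table["Male"] - table["Female"]
    # Countries where girls lead most appear first
    return table.sort_values("gap_male_minus_female")


sample = pd.DataFrame({
    "CNT": ["A", "A", "B", "B"],  # hypothetical country codes
    "ST04Q01": ["Female", "Male", "Female", "Male"],
    "PV1MATH": [500.0, 510.0, 520.0, 500.0],
})
print(gender_gap_by_country(sample))
```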

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

> At the end of your report, make sure that you export the notebook as an
html file from the `File > Download as... > HTML` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!