# Reshaping your Data

Since we want

* Each variable as a separate column
* Each row as a separate observation

we need to reshape a table like this:

|Account|Checking|Savings|
|:------|:-------|:------|
|"12456543"|8500|8900|
|"12283942"|6410|8020|
|"12839485"|78000|92000|

Into a table that looks more like:

|Account|Account Type|Amount|
|:------|:-----------|:-----|
|"12456543"|"Checking"|8500|
|"12456543"|"Savings"|8900|
|"12283942"|"Checking"|6410|
|"12283942"|"Savings"|8020|
|"12839485"|"Checking"|78000|
|"12839485"|"Savings"|92000|

We can use `pd.melt()` to do this transformation. `pd.melt()` takes a DataFrame and the columns to unpack:

`pd.melt(frame=df, id_vars="Account", value_vars=["Checking","Savings"], value_name="Amount", var_name="Account Type")`

The parameters you provide are:

* `frame`: the DataFrame you want to melt
* `id_vars`: the column(s) of the old DataFrame to preserve
* `value_vars`: the column(s) of the old DataFrame that you want to unpack into rows
* `value_name`: what to call the column of the new DataFrame that stores the values
* `var_name`: what to call the column of the new DataFrame that stores the names of the unpacked columns
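
As a minimal sketch of these parameters in action, the block below rebuilds the small accounts table from the example above and melts it. The variable names `accounts` and `long_accounts` are illustrative choices, not part of the lesson.

```python
import pandas as pd

# Wide table from the example above: one column per account type.
accounts = pd.DataFrame({
    "Account": ["12456543", "12283942", "12839485"],
    "Checking": [8500, 6410, 78000],
    "Savings": [8900, 8020, 92000],
})

# Unpack the Checking and Savings columns into rows.
long_accounts = pd.melt(
    frame=accounts,
    id_vars="Account",                   # column to preserve
    value_vars=["Checking", "Savings"],  # columns to unpack
    var_name="Account Type",             # holds the former column names
    value_name="Amount",                 # holds the former cell values
)

print(long_accounts)
```

Note that `pd.melt()` stacks the unpacked columns one after another, so all of the `"Checking"` rows come before the `"Savings"` rows rather than being interleaved by account.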

The default names (`variable` and `value`) may work in certain situations, but it is best to always have data that is self-explanatory. If you did not set `var_name` and `value_name`, you can rename the columns after melting by assigning a list of names to the `.columns` attribute:

`df.columns = ["Account", "Account Type", "Amount"]`
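
Renaming after a melt can be sketched as follows. Note that `columns` is an attribute you assign to, not a method you call. The two-row `accounts` table and the names `melted` and `renamed` are assumptions made for illustration; `DataFrame.rename()` is shown as an alternative that does not depend on column order.

```python
import pandas as pd

accounts = pd.DataFrame({
    "Account": ["12456543", "12283942"],
    "Checking": [8500, 6410],
    "Savings": [8900, 8020],
})

# Without var_name/value_name, melt falls back to "variable" and "value".
melted = pd.melt(accounts, id_vars="Account", value_vars=["Checking", "Savings"])
print(list(melted.columns))

# Option 1: assign a complete list of names, in column order.
melted.columns = ["Account", "Account Type", "Amount"]

# Option 2: rename only the columns you name, leaving the rest untouched.
renamed = pd.melt(
    accounts, id_vars="Account", value_vars=["Checking", "Savings"]
).rename(columns={"variable": "Account Type", "value": "Amount"})

print(list(renamed.columns))
```

Passing `var_name` and `value_name` directly to `pd.melt()` avoids the extra step altogether.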

In [1]:
import pandas as pd
from students import students

print(students)

             full_name gender_age fractions probability       grade
0       Moses Kirckman        M14       69%         89%  11th grade
1      Timofei Strowan        M18       63%         76%  11th grade
2         Silvain Poll        M18       69%         77%   9th grade
3       Lezley Pinxton        M18       NaN         72%  11th grade
4    Bernadene Saunper        F17       72%         84%  11th grade
..                 ...        ...       ...         ...         ...
995     Wilie Stillert        F14       72%         69%   9th grade
996     Gertie Flicker        F15       NaN         86%  11th grade
997       Yettie Labes        F14       81%         82%  12th grade
998     Lock McGuinley        M18       NaN         84%  10th grade
999       Bebe Lebbern        F15       66%         91%  12th grade

[1000 rows x 5 columns]


1. There is a column for the scores on the `fractions` exam, and a column for the scores on the `probability` exam.

    We want to make each row an observation, so we want to transform this table to look like:

    |full_name|exam|score|gender_age|grade|
    |:--------|:---|:----|:---------|:----|
    |"First Student"|"Fractions"|score%|…|…|
    |"First Student"|"Probabilities"|score%|…|…|
    |"Second Student"|"Fractions"|score%|…|…|
    |"Second Student"|"Probabilities"|score%|…|…|
    |…|…|…| | |

    Use `pd.melt()` to create a new table (still called `students`) that follows this structure.

In [2]:
students = pd.melt(
  frame=students,
  id_vars=["full_name", "gender_age", "grade"],
  value_vars=["fractions", "probability"],
  value_name="score",
  var_name="exam"
)

students

| |full_name|gender_age|grade|exam|score|
|:-|:--------|:---------|:----|:---|:----|
|0|Moses Kirckman|M14|11th grade|fractions|69%|
|1|Timofei Strowan|M18|11th grade|fractions|63%|
|2|Silvain Poll|M18|9th grade|fractions|69%|
|3|Lezley Pinxton|M18|11th grade|fractions|NaN|
|4|Bernadene Saunper|F17|11th grade|fractions|72%|
|...|...|...|...|...|...|
|1995|Wilie Stillert|F14|9th grade|probability|69%|
|1996|Gertie Flicker|F15|11th grade|probability|86%|
|1997|Yettie Labes|F14|12th grade|probability|82%|
|1998|Lock McGuinley|M18|10th grade|probability|84%|


2. Print out the `.value_counts()` of the column `exam`.

In [3]:
students["exam"].value_counts()

probability    1000
fractions      1000
Name: exam, dtype: int64
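
The counts above are a useful sanity check: melting two columns should produce exactly two rows per original row. The sketch below shows the same check on a hypothetical three-student stand-in (the `students` module from the lesson is not reproduced here, and the names and scores are made up for illustration).

```python
import pandas as pd

# Hypothetical stand-in for the students table, with one missing score.
students_small = pd.DataFrame({
    "full_name": ["Student A", "Student B", "Student C"],
    "fractions": ["69%", None, "72%"],
    "probability": ["89%", "76%", "84%"],
})

melted = pd.melt(
    frame=students_small,
    id_vars=["full_name"],
    value_vars=["fractions", "probability"],
    var_name="exam",
    value_name="score",
)

# Each melted column contributes one row per student.
counts = melted["exam"].value_counts()
print(counts)

# Rows with a missing score are still counted, because value_counts()
# is tallying the exam labels, not the scores.
print(melted["score"].isna().sum())
```

This is why both exams show the same count in the output above even though some `fractions` scores are `NaN`.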