# Extract a function

While developing a model to predict college graduation, you wrote the code below to compute the z-scores of students' yearly GPAs (a z-score is the number of standard deviations a value lies from the mean). Now you're ready to turn the code into a production-quality system, so you need to do something about the repetition. Extracting the calculation into a function would remove it.

```python
# Standardize the GPAs for each year
df['y1_z'] = (df.y1_gpa - df.y1_gpa.mean()) / df.y1_gpa.std()
df['y2_z'] = (df.y2_gpa - df.y2_gpa.mean()) / df.y2_gpa.std()
df['y3_z'] = (df.y3_gpa - df.y3_gpa.mean()) / df.y3_gpa.std()
df['y4_z'] = (df.y4_gpa - df.y4_gpa.mean()) / df.y4_gpa.std()
```

Note: `df` is a pandas DataFrame in which each row is a student, with four columns of yearly GPAs: `y1_gpa`, `y2_gpa`, `y3_gpa`, `y4_gpa`.

* Finish the function so that it returns the z-scores of a column.
* Use the function to calculate the z-scores for each year (`df['y1_z']`, `df['y2_z']`, etc.) from the raw GPA scores (`df.y1_gpa`, `df.y2_gpa`, etc.).
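To make the z-score definition concrete, here is a small worked example on a toy Series. The three GPA values are invented for illustration and are not taken from the student data. Note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, which is what the original code relies on.

```python
import pandas as pd

# Hypothetical GPAs, invented for illustration only
gpas = pd.Series([2.0, 3.0, 4.0])

mean = gpas.mean()  # 3.0
std = gpas.std()    # sample standard deviation (ddof=1), pandas' default: 1.0

# Subtract the mean, then divide by the standard deviation
z = (gpas - mean) / std
print(z.tolist())   # [-1.0, 0.0, 1.0]
```

A value equal to the mean gets a z-score of 0, and a value one standard deviation above the mean gets a z-score of 1.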

In [14]:
# Import pandas
import pandas as pd

# Load the CSV into the df variable
df = pd.read_csv('z://Students.csv', sep=';')

# The file uses decimal commas, so convert each GPA column to float
gpa_columns = ['y1_gpa', 'y2_gpa', 'y3_gpa', 'y4_gpa']
for col in gpa_columns:
    df[col] = df[col].str.replace(',', '.').astype(float)
df.head()

Unnamed: 0,y1_gpa,y2_gpa,y3_gpa,y4_gpa
0,4.907,1.872,1.344,3.255
1,3.827,4.855,2.101,3.067
2,1.508,2.984,1.625,1.009
3,0.865,1.761,1.138,0.622
4,4.122,4.84,4.074,0.63


In [15]:
def standardize(column):
  """Standardize the values in a column.

  Args:
    column (pandas Series): The data to standardize.

  Returns:
    pandas Series: the values as z-scores
  """
  # Compute the z-scores from the Series that was passed in,
  # rather than looking the column up in the global df
  z_score = (column - column.mean()) / column.std()
  return z_score

# Use the standardize() function to calculate the z-scores
df['y1_z'] = standardize(df.y1_gpa)
df['y2_z'] = standardize(df.y2_gpa)
df['y3_z'] = standardize(df.y3_gpa)
df['y4_z'] = standardize(df.y4_gpa)
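Once the calculation lives in a function, the four assignments can also be collapsed into a loop. The sketch below is one possible way to do that, assuming a `standardize()` that accepts a pandas Series, as the docstring describes. The `toy` DataFrame and its values are hypothetical stand-ins for the student data.

```python
import pandas as pd

def standardize(column):
    """Return the z-scores of a pandas Series."""
    return (column - column.mean()) / column.std()

# Hypothetical stand-in for the student data
toy = pd.DataFrame({
    'y1_gpa': [4.0, 3.0, 2.0],
    'y2_gpa': [1.5, 2.5, 3.5],
})

# Derive each output name from the input name: 'y1_gpa' -> 'y1_z'
for col in ['y1_gpa', 'y2_gpa']:
    toy[col.replace('_gpa', '_z')] = standardize(toy[col])

print(toy)
```

Because the function depends only on its argument, it works on any DataFrame, not just one named `df`.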

In [16]:
print(df['y1_z'], " ",df['y2_z'], " ",df['y3_z'], " ",df['y4_z'])

0     1.430649
1     0.715963
2    -0.818626
3    -1.244129
4     0.911178
        ...   
95   -1.209056
96    1.022351
97    0.573026
98    0.071422
99   -1.488313
Name: y1_z, Length: 100, dtype: float64   0    -0.546037
1     1.488244
2     0.212300
3    -0.621734
4     1.478014
        ...   
95   -1.255273
96    0.628977
97   -1.050004
98   -0.020247
99   -0.057755
Name: y2_z, Length: 100, dtype: float64   0    -0.840052
1    -0.317354
2    -0.646026
3    -0.982292
4     1.044975
        ...   
95    0.975926
96   -0.623240
97    1.528315
98   -0.320116
99   -1.293701
Name: y3_z, Length: 100, dtype: float64   0     0.479704
1     0.355370
2    -1.005690
3    -1.261633
4    -1.256342
        ...   
95   -0.235877
96    1.131135
97   -0.137997
98   -1.570484
99   -1.370094
Name: y4_z, Length: 100, dtype: float64
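Reading 400 numbers by eye is not a reliable check. A quicker sanity test is that a standardized column should have a mean of roughly 0 and a standard deviation of roughly 1. The sketch below shows that check on randomly generated data. The seed, the value range, and the Series itself are assumptions made for illustration, not the contents of `Students.csv`.

```python
import numpy as np
import pandas as pd

def standardize(column):
    """Return the z-scores of a pandas Series."""
    return (column - column.mean()) / column.std()

# Hypothetical data: 100 random GPAs between 0 and 5, with a fixed seed
rng = np.random.default_rng(0)
gpa = pd.Series(rng.uniform(0, 5, size=100))

z = standardize(gpa)

# Compare with a tolerance to allow for floating-point rounding
mean_ok = np.isclose(z.mean(), 0.0, atol=1e-9)
std_ok = np.isclose(z.std(), 1.0)
print(mean_ok, std_ok)
```

The same two comparisons could be run on `df['y1_z']` through `df['y4_z']` in place of the full printout.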
