# Activity: Modifying & Replacing Values

## Introduction

In this activity you will practice modifying and replacing values in a DataFrame using the various methods that pandas offers.
This activity will cover the following, not necessarily in this order:
- Checking for anomalous values
- Using `.isnumeric()`
- Using `min()` and `max()` methods
- Using `.loc[]` to replace values
- Using `isnull()` and `notnull()` methods


In [21]:
import pandas as pd

#### Question 1

Create a `DataFrame` called `df` from the given CSV file `employee_data.csv`, and then create a Boolean mask called `valid_names` that is `True` for every row whose `Name` value is not numeric.


In [22]:
# Your code here
df = pd.read_csv('employee_data.csv')
valid_names = ~df.Name.str.isnumeric()
df[valid_names]

Unnamed: 0,Name,Years of Employment,Weeks of Vacation,Position
0,Jennifer Jackson,9,4.0,Engineer
1,Michael Johnson,9,6.0,Analyst
2,Robert Lee,13,3.0,Engineer
3,Linda Jones,3,6.0,Manager
4,Karen Thomas,14,2.0,Intern
...,...,...,...,...
70,Michael Jones,6,5.0,Manager
71,John White,1,3.0,Analyst
72,Jennifer Harris,1,4.0,Intern
73,Emily White,16,6.0,Intern


In [23]:
# Question 1 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'
assert isinstance(valid_names, pd.Series), 'Have you created a Series named valid_names?'
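
One thing worth knowing about the mask above: `.str.isnumeric()` does not return `True` or `False` for a missing name, and on an object column the result is a missing value, which makes the `~` operator fail. The sketch below shows one possible way to guard against that. It uses a small hypothetical Series (not the contents of `employee_data.csv`), and the decision to treat missing names as non-numeric is an assumption.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
names = pd.Series(["Jennifer Jackson", "12345", None, "Robert Lee"])

# Comparing the result with True turns any missing entry into False,
# so the mask is a plain Boolean Series that can be inverted safely.
is_numeric = names.str.isnumeric().eq(True)

# True for every row whose name is not purely numeric
# (the missing name is treated as non-numeric here, by assumption)
valid_toy_names = ~is_numeric
print(valid_toy_names.tolist())
```

Whether a missing name should count as valid depends on the data set; if it should not, combine the mask with `names.notna()`.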


#### Question 2

Using the `valid_names` mask from the previous question, replace all values in the `Name` column that are numeric with the string `Unknown`.


In [24]:
# Your code here
df.loc[~valid_names,'Name'] = "Unknown"
df

Unnamed: 0,Name,Years of Employment,Weeks of Vacation,Position
0,Jennifer Jackson,9,4.0,Engineer
1,Michael Johnson,9,6.0,Analyst
2,Robert Lee,13,3.0,Engineer
3,Linda Jones,3,6.0,Manager
4,Karen Thomas,14,2.0,Intern
...,...,...,...,...
78,Unknown,0,49.0,Unknown
79,Unknown,0,47.0,Unknown
80,Unknown,-5,46.0,Unknown
81,Unknown,-4,52.0,Unknown


In [25]:
# Question 2 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'
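
The replacement in Question 2 relies on passing both the row mask and the column label to a single `.loc[]` call. Chained indexing such as `df[mask]['Name'] = ...` may write to a temporary copy instead of the original DataFrame. As a minimal sketch on hypothetical toy data (the names `toy` and `valid_toy` are illustrative only), the pattern and a quick follow-up check could look like this:

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Name": ["Linda Jones", "98765", "Karen Thomas", "42"]})

# True where the name is not purely numeric
valid_toy = ~toy["Name"].str.isnumeric()

# Invert the mask to select the numeric names, and assign in one .loc[] call
toy.loc[~valid_toy, "Name"] = "Unknown"

# Follow-up check: count how many purely numeric names remain
remaining_numeric = toy["Name"].str.isnumeric().sum()
print(toy["Name"].tolist(), remaining_numeric)
```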

#### Question 3

Using the original `DataFrame` `df`, create a mask called `unknown_position` that checks the `Position` column for any values that are equal to the string `Unknown`. Then, replace all such values with `Engineer`.


In [26]:
# Your code here
unknown_position = df.loc[:,'Position'] == "Unknown"
df.loc[unknown_position,'Position'] = "Engineer"
df

Unnamed: 0,Name,Years of Employment,Weeks of Vacation,Position
0,Jennifer Jackson,9,4.0,Engineer
1,Michael Johnson,9,6.0,Analyst
2,Robert Lee,13,3.0,Engineer
3,Linda Jones,3,6.0,Manager
4,Karen Thomas,14,2.0,Intern
...,...,...,...,...
78,Unknown,0,49.0,Engineer
79,Unknown,0,47.0,Engineer
80,Unknown,-5,46.0,Engineer
81,Unknown,-4,52.0,Engineer


In [27]:
# Question 3 Grading Checks

assert isinstance(unknown_position, pd.Series), 'Have you created a Series named unknown_position?'
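
When the condition is a simple equality, as in Question 3, `Series.replace()` can produce the same result as the mask-and-`.loc[]` approach. The following sketch compares the two on hypothetical toy data; it is an alternative for illustration, not the approach the exercise asks for.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Position": ["Engineer", "Unknown", "Analyst", "Unknown"]})

# Approach 1: Boolean mask plus .loc[], as in the exercise
unknown_toy = toy["Position"] == "Unknown"
via_mask = toy["Position"].copy()
via_mask.loc[unknown_toy] = "Engineer"

# Approach 2: Series.replace() returns a new Series with the value swapped
via_replace = toy["Position"].replace("Unknown", "Engineer")

print(via_mask.equals(via_replace))
```

The mask version is more flexible, since the same mask can be reused to inspect or count the affected rows before changing them.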


#### Question 4

Using the original `DataFrame` `df`, replace all values in the `Years of Employment` column that are negative with their absolute value. Then, check that the minimum value in the `Years of Employment` column is no longer negative and assign it to the variable `min_years_after`.


In [47]:
# Your code here
# Mask of rows with a negative number of years
neg_yrs = df['Years of Employment'] < 0
# Replace only those entries with their absolute value
df.loc[neg_yrs, 'Years of Employment'] = df.loc[neg_yrs, 'Years of Employment'].abs()
# The minimum should now be non-negative
min_years_after = df['Years of Employment'].min()
min_years_after

0

In [48]:
# Question 4 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'
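
For Question 4, the right-hand side of the assignment should select only the `Years of Employment` column, because arithmetic on the whole filtered DataFrame would also be applied to the other columns. A minimal sketch of the idea on hypothetical toy data, using `Series.abs()`:

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Years of Employment": [9, -5, 0],
})

# Mask of rows with a negative value
neg = toy["Years of Employment"] < 0

# Select the same rows and the same single column on both sides
toy.loc[neg, "Years of Employment"] = toy.loc[neg, "Years of Employment"].abs()

# Confirm that no negative values remain
min_after = toy["Years of Employment"].min()
print(toy["Years of Employment"].tolist(), min_after)
```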



#### Question 5

Using the original `DataFrame` `df`, create a mask called `invalid_vacation` that checks the `Weeks of Vacation` column for any values that are null or missing. Then, use that mask to assign the value 0 to them.


In [44]:
# Your code here
invalid_vacation = df['Weeks of Vacation'].isna()
df.loc[invalid_vacation,'Weeks of Vacation'] = 0


In [45]:
# Question 5 Grading Checks

assert isinstance(invalid_vacation, pd.Series), 'Have you created a Series named invalid_vacation?'
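
In pandas, `isnull()` is an alias of `isna()` (and `notnull()` of `notna()`), so either name yields the same mask. The sketch below, on hypothetical toy data, also shows that `fillna(0)` gives the same values as the mask-based assignment from Question 5. It is offered as a comparison, not as a replacement for the exercise.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Weeks of Vacation": [4.0, None, 6.0, float("nan")]})

# isna() and isnull() produce identical masks
missing = toy["Weeks of Vacation"].isna()
same_mask = toy["Weeks of Vacation"].isnull().equals(missing)

# Approach 1: mask plus .loc[], as in the exercise
via_mask = toy["Weeks of Vacation"].copy()
via_mask.loc[missing] = 0

# Approach 2: fillna() returns a new Series with missing values replaced
via_fillna = toy["Weeks of Vacation"].fillna(0)

print(same_mask, via_mask.equals(via_fillna))
```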



#### Question 6

Using the original `DataFrame` `df`, find the maximum value in the `Weeks of Vacation` column and assign it to the variable `max_vac_before`. Then, replace all values in the `Weeks of Vacation` column that are greater than 6 with 6.


In [50]:
# Your code here
# Record the maximum before any values are capped
max_vac_before = df['Weeks of Vacation'].max()
# Cap every value above 6 at 6
gr8_than_6 = df['Weeks of Vacation'] > 6
df.loc[gr8_than_6, 'Weeks of Vacation'] = 6



In [51]:
# Question 6 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'
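
Capping values at an upper bound, as in Question 6, can also be expressed with `Series.clip(upper=...)`. The sketch below compares it with the mask-based version on hypothetical toy data; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Weeks of Vacation": [4.0, 49.0, 6.0, 52.0]})

# Record the maximum before capping
max_before = toy["Weeks of Vacation"].max()

# Approach 1: mask plus .loc[], as in the exercise
over = toy["Weeks of Vacation"] > 6
via_mask = toy["Weeks of Vacation"].copy()
via_mask.loc[over] = 6

# Approach 2: clip() returns a new Series with values above 6 set to 6
via_clip = toy["Weeks of Vacation"].clip(upper=6)

print(max_before, via_clip.max(), via_mask.equals(via_clip))
```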

