# Activity: Modifying & Replacing Values

## Introduction

In this activity, you will practice modifying and replacing values in a DataFrame using the various methods that pandas has to offer.
This activity will cover the following, not necessarily in this order:
- Checking for anomalous values
- Using `.str.isnumeric()`
- Using the `.min()` and `.max()` methods
- Using `.loc[]` to replace values
- Using the `.isnull()` and `.notnull()` methods
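
As a warm-up, the pattern shared by most of the questions can be sketched on a small made-up DataFrame. The `toy` frame and its `score` column are hypothetical and unrelated to `employee_data.csv`. The idea is to build a boolean mask, then pass it to `.loc[]` together with the column name so that only the matching rows are overwritten.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"score": [3.0, -1.0, None, 8.0]})

# isnull() and notnull() produce complementary boolean masks
missing = toy["score"].isnull()
present = toy["score"].notnull()

# min() and max() skip missing values by default
lowest = toy["score"].min()
highest = toy["score"].max()

# Overwrite only the rows selected by the mask
toy.loc[missing, "score"] = 0.0
```

Inspecting a mask before assigning through it is a cheap way to confirm that it selects the rows you intend.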


In [1]:
import pandas as pd

#### Question 1

Create a `DataFrame` called `df` from the given CSV file `employee_data.csv`, and then create a mask called `valid_names` that checks the `Name` column for any non-numeric values.


In [2]:
# Your code here

df = pd.read_csv("employee_data.csv")

df

| Unnamed: 0 | Name | Years of Employment | Weeks of Vacation | Position |
|---|---|---|---|---|
| 0 | Jennifer Jackson | 9 | 4.0 | Engineer |
| 1 | Michael Johnson | 9 | 6.0 | Analyst |
| 2 | Robert Lee | 13 | 3.0 | Engineer |
| 3 | Linda Jones | 3 | 6.0 | Manager |
| 4 | Karen Thomas | 14 | 2.0 | Intern |
| ... | ... | ... | ... | ... |
| 78 | 1 | 0 | 49.0 | Unknown |
| 79 | 1 | 0 | 47.0 | Unknown |
| 80 | 1 | -5 | 46.0 | Unknown |
| 81 | 1 | -4 | 52.0 | Unknown |


In [3]:
df.Name.unique()

array(['Jennifer Jackson', 'Michael Johnson', 'Robert Lee', 'Linda Jones',
       'Karen Thomas', 'Sarah Smith', 'David Moore', 'Emily Taylor',
       'Mary Jones', 'Mary Anderson', 'Sarah Harris', 'Michael Wilson',
       'John Lee', 'Jennifer Lee', 'Mary Davis', 'David Lee',
       'Emily Wilson', 'Susan Jones', 'John Harris', 'Emily Smith',
       'James Wilson', 'Karen Taylor', 'Sarah Jackson', 'Linda Taylor',
       'Michael Lee', 'Jennifer Johnson', 'Robert Smith',
       'Christopher Davis', 'William Jackson', 'Linda Johnson',
       'William Johnson', 'James Brown', 'Christopher Harris',
       'Jane Davis', 'Emily Miller', 'Sarah Lee', 'James Jackson',
       'David Johnson', 'Jane Jackson', 'Michael Jones',
       'Jennifer Harris', 'James Moore', 'Linda Jackson',
       'Susan Anderson', 'Robert Taylor', 'Jane Jones', 'Emily Brown',
       'Robert Wilson', 'Jane Thomas', 'Linda Thomas', 'Emily Jackson',
       'Robert Jones', 'Michael Miller', 'Emily Johnson', 'Mary Smith',


In [4]:
# Note: despite its name, this mask is True where Name is numeric (an invalid name)
valid_names = df['Name'].str.isnumeric()
valid_names

0     False
1     False
2     False
3     False
4     False
      ...  
78     True
79     True
80     True
81     True
82     True
Name: Name, Length: 83, dtype: bool
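
To make the direction of this mask explicit, here is a minimal sketch on a hypothetical `samples` Series (the values are invented to show edge cases). `.str.isnumeric()` returns `True` only when every character of a string is numeric, so a sign or a decimal point makes the result `False`.

```python
import pandas as pd

# Hypothetical sample values, chosen to show the edge cases
samples = pd.Series(["Linda Jones", "1", "42", "-5", "4.0"])

# True only when every character in the string is numeric
numeric_mask = samples.str.isnumeric()
print(numeric_mask.tolist())
# [False, True, True, False, False]
```

Because the mask is `True` for the numeric strings, indexing with the mask selects those entries, while indexing with `~numeric_mask` selects everything else.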

In [5]:
# Question 1 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'
assert isinstance(valid_names, pd.Series), 'Have you created a Series named valid_names?'


#### Question 2

Using the `valid_names` mask from the previous question, replace all values in the `Name` column that are numeric with the string `Unknown`.


In [6]:
# Your code here

# valid_names is True where Name is numeric, so select those rows directly (no ~)
df.loc[valid_names, 'Name'] = 'Unknown'

In [7]:
# Question 2 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'

#### Question 3

Using the original `DataFrame` `df`, create a mask called `unknown_position` that checks the `Position` column for any values that are equal to the string `Unknown`. Then, replace all such values with `Engineer`.


In [8]:
# Your code here

unknown_position = df['Position'] == 'Unknown'

unknown_position

0     False
1     False
2     False
3     False
4     False
      ...  
78     True
79     True
80     True
81     True
82     True
Name: Position, Length: 83, dtype: bool

In [9]:
df.loc[unknown_position, 'Position'] = 'Engineer'
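
For comparison, the same kind of substitution can be written with `Series.replace`. The sketch below uses a hypothetical `toy` frame rather than the activity data and checks that both approaches give the same result. The question asks for a mask, so this is an alternative to be aware of, not a replacement for the solution above.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Position": ["Engineer", "Unknown", "Analyst", "Unknown"]})

# Approach 1: boolean mask with .loc[]
via_loc = toy.copy()
mask = via_loc["Position"] == "Unknown"
via_loc.loc[mask, "Position"] = "Engineer"

# Approach 2: Series.replace, which matches whole values for scalar arguments
via_replace = toy.copy()
via_replace["Position"] = via_replace["Position"].replace("Unknown", "Engineer")

print(via_loc.equals(via_replace))
# True
```

The mask version has the advantage that `mask.sum()` tells you how many rows were affected.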

In [10]:
# Question 3 Grading Checks

assert isinstance(unknown_position, pd.Series), 'Have you created a Series named unknown_position?'


#### Question 4

Using the original `DataFrame` `df`, replace all values in the `Years of Employment` column that are negative with their absolute value. Then, check that the minimum value in the `Years of Employment` column is no longer negative and assign it to the variable `min_years_after`.


In [11]:
# Your code here
df["Years of Employment"].value_counts()

 1     7
 15    7
 11    7
 8     7
 16    6
 14    5
 9     5
 6     4
 18    4
 0     3
 19    3
 17    3
 13    3
 2     3
 3     2
-5     2
 10    2
 20    2
 5     2
 4     1
 12    1
 7     1
-3     1
-4     1
-1     1
Name: Years of Employment, dtype: int64

In [12]:
df['Years of Employment'] = df['Years of Employment'].abs()

In [13]:
min_years_after = df['Years of Employment'].min()
min_years_after

0
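
The solution above applies `.abs()` to the whole column, which leaves non-negative values unchanged. A mask-based version that touches only the negative rows could look like the following sketch. The `toy` frame is hypothetical and the two approaches are checked against each other.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Years of Employment": [9, -5, 0, -1, 14]})

# Approach 1: flip the sign of the negative rows only
masked = toy.copy()
negative = masked["Years of Employment"] < 0
masked.loc[negative, "Years of Employment"] = -masked.loc[negative, "Years of Employment"]

# Approach 2: take the absolute value of the whole column
whole_column = toy.copy()
whole_column["Years of Employment"] = whole_column["Years of Employment"].abs()

print(masked.equals(whole_column))
# True
```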

In [14]:
# Question 4 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'



#### Question 5

Using the original `DataFrame` `df`, create a mask called `invalid_vacation` that checks the `Weeks of Vacation` column for any values that are null or missing. Then, use that mask to assign the value 0 to them.


In [15]:
# Your code here

invalid_vacation = df['Weeks of Vacation'].isnull()

In [16]:
df.loc[invalid_vacation, 'Weeks of Vacation'] = 0
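
As a cross-check, filling missing values through a mask should match what `Series.fillna` produces. The sketch below assumes a hypothetical `toy` frame with two missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Weeks of Vacation": [4.0, np.nan, 2.0, np.nan]})

# Approach 1: isnull() mask with .loc[]
via_loc = toy.copy()
invalid = via_loc["Weeks of Vacation"].isnull()
via_loc.loc[invalid, "Weeks of Vacation"] = 0

# Approach 2: fillna on the column
via_fillna = toy.copy()
via_fillna["Weeks of Vacation"] = via_fillna["Weeks of Vacation"].fillna(0)

print(via_loc.equals(via_fillna))
# True
```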

In [17]:
# Question 5 Grading Checks

assert isinstance(invalid_vacation, pd.Series), 'Have you created a Series named invalid_vacation?'



#### Question 6

Using the original `DataFrame` `df`, find the maximum value in the `Weeks of Vacation` column and assign it to the variable `max_vac_before`. Then, replace all values in the `Weeks of Vacation` column that are greater than 6 with 6.


In [18]:
# Your code here

max_vac_before = df['Weeks of Vacation'].max()

In [19]:
df.loc[df['Weeks of Vacation'] > 6, 'Weeks of Vacation'] = 6
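
Capping values at an upper bound can also be expressed with `Series.clip`. The following sketch, on a hypothetical `toy` frame, records the maximum before capping and confirms that the mask-based assignment and `clip(upper=6)` agree.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee_data.csv
toy = pd.DataFrame({"Weeks of Vacation": [4.0, 49.0, 6.0, 52.0]})
max_before = toy["Weeks of Vacation"].max()

# Approach 1: boolean mask with .loc[]
via_loc = toy.copy()
via_loc.loc[via_loc["Weeks of Vacation"] > 6, "Weeks of Vacation"] = 6

# Approach 2: clip the column at the upper bound
via_clip = toy.copy()
via_clip["Weeks of Vacation"] = via_clip["Weeks of Vacation"].clip(upper=6)

print(max_before, via_loc["Weeks of Vacation"].max())
# 52.0 6.0
```

Comparing the maximum before and after the replacement is a quick way to confirm that the cap took effect.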

In [20]:
# Question 6 Grading Checks

assert isinstance(df, pd.DataFrame), 'Have you created a DataFrame named df?'

