# Modifying data within data frames

Here we'll focus on modifying the data frame itself, which mostly means updating its rows and columns.

In [5]:
'''
+ Ex. 1: Updating values with .loc
'''

import pandas as pd
users = {
  'first': ["Kevin", "Abby", "John"],
  'last': ["Nguyen", "Wendell", "Nguyen"],
  'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com", "JNguyen@gmail.com"]
}
df = pd.DataFrame(users)

# Update the 'last' and 'email' values of the row with index label 1
df.loc[1, ['last', 'email']] = ["Wendell", "abbyWendell@outlook.com"]

# Everyone with the last name 'Nguyen' gets their first name changed to 'Aleister'.
# Assigning through .loc writes directly into df, so no reassignment is needed.
myFilter = (df["last"] == "Nguyen")
df.loc[myFilter, "first"] = "Aleister"

'''
+ Ex. 2: Changing every value in a given column. This is a common task, and pandas
provides vectorized methods (such as those under the .str accessor) for these kinds of operations.
'''

# .str.lower() returns a new series, so reassign it to lowercase every email in the column
df["email"] = df["email"].str.lower()


      first     last                    email
0  Aleister   Nguyen      knguyen44@gmail.com
1      Abby  Wendell  abbyWendell@outlook.com
2  Aleister   Nguyen        JNguyen@gmail.com
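
A quick way to see which operations change df in place: assigning through .loc writes into the data frame, while a string method like .str.upper() returns a new series that only takes effect once you reassign it. This is a minimal sketch on a trimmed-down copy of the example data; upper_last is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Kevin", "Abby", "John"],
    "last": ["Nguyen", "Wendel", "Nguyen"],
})

# Assigning through .loc writes into df itself; no reassignment needed
df.loc[df["last"] == "Nguyen", "first"] = "Aleister"
print(df["first"].tolist())  # ['Aleister', 'Abby', 'Aleister']

# A .str method returns a NEW series; df is unchanged until we reassign
upper_last = df["last"].str.upper()
print(df["last"].tolist())   # ['Nguyen', 'Wendel', 'Nguyen']

df["last"] = upper_last
print(df["last"].tolist())   # ['NGUYEN', 'WENDEL', 'NGUYEN']
```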


## Apply, Map, and ApplyMap

### Apply
Applies a function along an axis (rows or columns) of a data frame, or to the values of a series. On a series, the function is called once for each value. On a data frame, the function is called once for each column (or, with axis="columns", once for each row), and it receives that whole column or row as a series rather than the individual values.
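
To see what the function actually receives, here is a small sketch on a made-up numeric data frame (the column names a and b are arbitrary). On a series the lambda sees single numbers; on a data frame it sees whole columns, or whole rows with axis="columns".

```python
import pandas as pd

scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# On a series, the function is called once per value
doubled = scores["a"].apply(lambda x: x * 2)
print(doubled.tolist())   # [2, 4, 6]

# On a data frame (default axis=0), each call receives one whole column
col_sums = scores.apply(lambda col: col.sum())
print(col_sums.tolist())  # [6, 60]

# axis="columns" passes each row as a series instead
row_sums = scores.apply(lambda row: row.sum(), axis="columns")
print(row_sums.tolist())  # [11, 22, 33]
```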

### Map
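Applies a function or a 'mapping' to each element of a series (on data frames, the element-wise equivalent is applymap, below). Given a dictionary, it substitutes each value with the value it maps to; any value that isn't a key in the dictionary becomes NaN.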
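
A short sketch of both forms of Series.map on the first names from this notebook; the nickname dictionary is made up for illustration.

```python
import pandas as pd

first = pd.Series(["Kevin", "Abby", "John"])

# Dict form: each value is looked up as a key; "Abby" has no key, so it becomes NaN
nicknames = first.map({"Kevin": "Kev", "John": "Johnny"})
print(nicknames.tolist())  # ['Kev', nan, 'Johnny']

# Function form: map calls the function on every element
lengths = first.map(len)
print(lengths.tolist())    # [5, 4, 4]
```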

### ApplyMap
Applies a function to each element of a data frame. This only works on data frames. In pandas 2.1 and later, applymap is deprecated and the same operation is available as DataFrame.map.
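
Since applymap is deprecated in newer pandas, this sketch uses whichever method the installed version provides (elementwise is just a local alias). The function runs once per cell, and the result keeps the data frame's shape.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Kevin", "Abby", "John"],
    "last": ["Nguyen", "Wendel", "Nguyen"],
})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# len() is called on every single cell, giving a data frame of lengths
lengths = elementwise(len)
print(lengths.values.tolist())  # [[5, 6], [4, 6], [4, 6]]
```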

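### Replace
Given a series (or a data frame) and a dictionary, replaces each value that matches a key with the corresponding value. Unlike map, values that aren't in the dictionary are left unchanged.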
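
A minimal sketch of replace on a series and on a whole data frame. The nickname and the 'Y'/'N' codes are made up for illustration, and the 'Hobbyist' column only mirrors the survey data used later.

```python
import pandas as pd

first = pd.Series(["Kevin", "Abby", "John"])

# Only values that match a key change; "Abby" and "John" are kept as-is
replaced = first.replace({"Kevin": "Kev"})
print(replaced.tolist())  # ['Kev', 'Abby', 'John']

# replace also works across every column of a data frame
answers = pd.DataFrame({"Hobbyist": ["Yes", "No", "Yes"]})
short = answers.replace({"Yes": "Y", "No": "N"})
print(short["Hobbyist"].tolist())  # ['Y', 'N', 'Y']
```

Like the other methods here, replace returns a new object; answers itself still holds 'Yes' and 'No' until you reassign.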

In [6]:
import pandas as pd

df = pd.DataFrame({
  'first': ["Kevin", "Abby", "John"],
  'last': ["Nguyen", "Wendel", "Nguyen"],
  'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com", "JNguyen@gmail.com"]
})


def updateEmail(email):
  return email.upper()

'''
+ Ex. 1: Apply
1. Here it's using the len() function on each value in
the 'email' column/series, and it returns a series with the lengths of those emails.

2. In this one, we're applying the 'updateEmail' function to every value in the 'email' series/column.
We get back a series with those upper-cased emails, and we save the changes to our data frame
by reassigning the 'email' column.

3. Finally, we use a lambda function to get a series of the lower-cased emails (again reassigned).

4. Using .apply with a data frame: it runs the function on each column (or row) of the data frame as a whole,
rather than on the individual values. Here it applies len() to each column, which tells us that
each column has a length of 3, the number of rows.

5. With axis="columns", len() is applied to each row instead. The result is still 3 for every row,
because each row has 3 columns.

6. Let's do something cool and get the minimum value from each column: the minimum of the first, last,
and email columns. Since we're working with strings, the comparison is lexicographic, based on each
character's code point (ASCII values for these characters), so uppercase letters sort before lowercase ones.
'''
df["email"].apply(len)
df['email'] = df["email"].apply(updateEmail)
df["email"] = df["email"].apply(lambda email: email.lower())


df.apply(len)  # calls len() on each column series: [len(df["first"]), len(df["last"]), len(df["email"])]

df.apply(len, axis="columns")  # now calls len() on each row (horizontal) series

df.apply(pd.Series.min)

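The lexicographic comparison behind df.apply(pd.Series.min) is easy to check on the same data. In the email column, 'JNguyen@gmail.com' wins because uppercase 'J' has a lower code point than lowercase 'a' or 'k'. df.min() is the built-in shortcut for the same column-wise reduction.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Kevin", "Abby", "John"],
    "last": ["Nguyen", "Wendel", "Nguyen"],
    "email": ["knguyen44@gmail.com", "abbyWendel@outlook.com", "JNguyen@gmail.com"],
})

# pd.Series.min is called once per column; strings compare by code point
col_min = df.apply(pd.Series.min)
print(col_min.tolist())  # ['Abby', 'Nguyen', 'JNguyen@gmail.com']

# Uppercase letters come before lowercase ones, which decides the email column
print(ord("J"), ord("a"), ord("k"))  # 74 97 107

# df.min() performs the same column-wise reduction without apply
print(df.min().tolist())  # ['Abby', 'Nguyen', 'JNguyen@gmail.com']
```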

In [None]:
import pandas as pd
'''
+ Ex. 2: ApplyMap, Map, and Replace

1. Using applymap on a data frame. It returns a data frame containing the result of calling
the callback function on every element!

2. Searches the 'first' series for the names 'Corey' and 'Jane' and replaces them with their
counterparts. Any value in the series that isn't 'Corey' or 'Jane' is replaced
with NaN. What if you don't want the non-matches to become NaN? Use the replace method instead. Again, this returns a
series with the results, so to save the changes we have to reassign the column.
(None of the names in this data frame match, so here map returns all NaN and replace leaves the column unchanged.)
'''

df = pd.DataFrame({
  'first': ["Kevin", "Abby", "John"],
  'last': ["Nguyen", "Wendel", "Nguyen"],
  'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com", "JNguyen@gmail.com"]
})


df.applymap(len)  # on pandas >= 2.1 this emits a FutureWarning; df.map(len) is the new name

df["first"].map({'Corey': 'Christ', 'Jane': "Mary"})
df['first'] = df["first"].replace({'Corey': 'Christ', 'Jane': "Mary"})

In [None]:
'''
+ Back to the Stack Overflow data

1. Rename 'ConvertedComp' to 'SalaryUSD'. After seeing that it's correctly changed, we
can set inplace=True to save the change to the original data frame. It's always a good idea to check
that a change is correct before making it permanent.

2. The 'Hobbyist' column has 'Yes' and 'No' values. Let's turn these into boolean values, which we can
   easily do with map or replace. Here we use map, so any value that isn't a key in our dictionary will
   be converted to NaN. In the hypothetical case where you wanted to keep values that weren't covered by your
   dictionary, you'd use replace.
'''
import pandas as pd
csvPath = "../data/survey_results_public.csv"
df = pd.read_csv(csvPath, index_col="Respondent")

# 1
df.rename(columns={"ConvertedComp": "SalaryUSD"}, inplace=True)

# 2 
df["Hobbyist"] = df["Hobbyist"].map({"Yes": True, "No": False})
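
Since the survey CSV isn't bundled here, this sketch uses a tiny hypothetical stand-in with the same column names ('Respondent', 'ConvertedComp', 'Hobbyist'); the values are made up. It follows the check-first advice: preview the rename without inplace, keep it once it looks right, and list the distinct answers before mapping so unexpected ones don't silently turn into NaN.

```python
import pandas as pd

# Hypothetical stand-in for the survey data (made-up values, real column names)
df = pd.DataFrame(
    {"ConvertedComp": [61000.0, 95000.0, None], "Hobbyist": ["Yes", "No", "Yes"]},
    index=pd.Index([1, 2, 3], name="Respondent"),
)

# Preview: without inplace, rename returns a new data frame and leaves df alone
preview = df.rename(columns={"ConvertedComp": "SalaryUSD"})
print(preview.columns.tolist())  # ['SalaryUSD', 'Hobbyist']
print(df.columns.tolist())       # ['ConvertedComp', 'Hobbyist']

# Looks right, so keep it (reassignment has the same effect as inplace=True)
df = preview

# Check every distinct answer first; anything besides Yes/No would become NaN
print(sorted(df["Hobbyist"].unique()))  # ['No', 'Yes']

df["Hobbyist"] = df["Hobbyist"].map({"Yes": True, "No": False})
print(df["Hobbyist"].tolist())   # [True, False, True]
```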




