# Exercises from [pandaspractice.com](https://pandaspractice.com/)

In [54]:
import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'], 
                   'age': [30, 56, 8]})
df

Unnamed: 0,name,age
0,Jeff,30
1,Esha,56
2,Jia,8


### Select column

Now, you want to select just the name column from the DataFrame.

Complete the function, `name_column(df)`, by having it return only the name column.

In [2]:
def name_column(df):
    return df['name']
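
As a side note, here is a minimal sketch of how single and double brackets differ, using the same three-row DataFrame as above. The variable names `as_series` and `as_frame` are illustrative only and are not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'],
                   'age': [30, 56, 8]})

# Single brackets return a Series (one column of values).
as_series = df['name']

# Double brackets (a list of column names) return a one-column DataFrame.
as_frame = df[['name']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```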

### Select rows by label (.loc)

You can select the rows of a DataFrame one of two ways. One way is with `df.loc[...]`. `df.loc[...]` selects rows based on their index value. To select a single row, you can do `df.loc[index_value]`, for example, `df.loc[156]`. To select multiple rows, you can do `df.loc[[index_value1, index_value2]]`, for example, `df.loc[[132, 156]]`.

Suppose you constructed a DataFrame by

Complete the function, `select_Jia_row(df)`, by having it return the row with Jia in it. This should involve hard-coding the index value associated with Jia.

In [3]:
def select_Jia_row(df):
    return df.loc[27]
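
The DataFrame definition for this exercise is not shown here, so the sketch below assumes one: the index labels `132`, `156`, and `27` are hypothetical, chosen only so that Jia's label is `27` and the `df.loc[...]` examples from the text have something to select.

```python
import pandas as pd

# Hypothetical index labels; the exercise's real index is not shown in this document.
df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'],
                   'age': [30, 56, 8]},
                  index=[132, 156, 27])

row = df.loc[27]            # a single label returns one row as a Series
rows = df.loc[[132, 156]]   # a list of labels returns a DataFrame

print(row['name'])          # Jia
print(list(rows['name']))   # ['Jeff', 'Esha']
```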

### Select rows by position (.iloc)

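The other way is with `df.iloc[...]`, which selects rows based on their integer position. To select a single row, you can do `df.iloc[position]`, for example, `df.iloc[1]` selects the row with Esha in the DataFrame above. To select multiple rows, you can do `df.iloc[[position1, position2]]`, for example, `df.iloc[[0, 2]]`.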

Complete the function, `select_first_row(df)`, by having it return the first row in the DataFrame.

In [4]:
def select_first_row(df):
    return df.iloc[0]
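
To make the difference between position and label concrete, here is a small sketch. It assumes a DataFrame whose index labels (`10`, `20`, `30`) are hypothetical and deliberately different from the positions `0`, `1`, `2`.

```python
import pandas as pd

# Hypothetical index labels that differ from the row positions.
df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'],
                   'age': [30, 56, 8]},
                  index=[10, 20, 30])

first = df.iloc[0]       # first row, by position
ends = df.iloc[[0, 2]]   # rows at positions 0 and 2
same = df.loc[10]        # the same first row, this time by label

print(first['name'])         # Jeff
print(list(ends['name']))    # ['Jeff', 'Jia']
```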

### Aggregate a column

What is the minimum value of age? Define a variable `min_age` with the value that `df.age.min()` would output.

In [5]:
min_age = df.age.min()
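
A brief sketch of a few related aggregations on the same column. The variable names other than `min_age` are illustrative, and the DataFrame is the three-row one defined at the top.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'],
                   'age': [30, 56, 8]})

min_age = df['age'].min()              # smallest value in the column
max_age = df['age'].max()              # largest value in the column
youngest_label = df['age'].idxmin()    # index label of the row holding the minimum
youngest_name = df.loc[youngest_label, 'name']

print(min_age, max_age, youngest_name)  # 8 56 Jia
```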

### Creating a new column

Write a function, `add_can_drive(df)`, which takes in the DataFrame and returns a new DataFrame with a *column `can_drive` which is True if the person is 16 or older and False otherwise*.

In [6]:
def add_can_drive(df):
  df['can_drive'] = df['age'] >= 16
  return df

Another variant:

In [7]:
def add_can_drive(df):
  df['can_drive'] = [x >= 16 for x in df['age']]
  return df
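
Both functions above add the column to the DataFrame that was passed in. If a genuinely new DataFrame is wanted, one possible sketch uses `DataFrame.assign`, which returns a copy with the extra column. The function name `add_can_drive_copy` is hypothetical.

```python
import pandas as pd

def add_can_drive_copy(df):
    # assign() returns a new DataFrame and leaves the input untouched.
    return df.assign(can_drive=df['age'] >= 16)

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'],
                   'age': [30, 56, 8]})
result = add_can_drive_copy(df)

print(list(result['can_drive']))  # [True, True, False]
print(list(df.columns))           # ['name', 'age']
```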

### Modify a column

Write a function, `age_in_days(df)`, which takes in the DataFrame and returns a new DataFrame with the column `age` modified to be in terms of days instead of years.

In [9]:
def age_in_days(df):
  df['age'] = [x * 365 for x in df['age']] 
  return df

In [10]:
age_in_days(df)

Unnamed: 0,name,age
0,Jeff,10950
1,Esha,20440
2,Jia,2920


Second variant:

In [16]:
def age_in_days2(df):
  df['age'] *= 365
  return df

In [17]:
age_in_days2(df)

Unnamed: 0,name,age
0,Jeff,10950
1,Esha,20440
2,Jia,2920
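
Both versions above overwrite the `age` column of the DataFrame they receive, so running either one twice on the same `df` multiplies by 365 twice. A minimal sketch of a version that leaves the input unchanged follows; the name `age_in_days_copy` and the `days_per_year` parameter are hypothetical.

```python
import pandas as pd

def age_in_days_copy(df, days_per_year=365):
    # Vectorized multiplication; assign() returns a new DataFrame.
    return df.assign(age=df['age'] * days_per_year)

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia'],
                   'age': [30, 56, 8]})
in_days = age_in_days_copy(df)

print(list(in_days['age']))  # [10950, 20440, 2920]
print(list(df['age']))       # [30, 56, 8]
```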


### Filter rows

Write a function, `only_names_that_start_with_j(df)`, which takes in the DataFrame and returns a new DataFrame with rows of names that start with 'J'. Hint: the pandas `.str` functions would be helpful here.

In [51]:
def only_names_that_start_with_j(df):
  df = df[df.name.str.startswith('J')]
  return df

In [53]:
only_names_that_start_with_j(df)

Unnamed: 0,name,age
0,Jeff,30
2,Jia,8


Second variant:

In [58]:
def only_names_that_start_with_j2(df):
  filt1 = df['name'].str.startswith('J')
  df = df.loc[filt1]
  return df
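
The `.str` accessor can also cope with messier data. The sketch below assumes a hypothetical DataFrame with a missing name and a lowercase name, neither of which appears in the exercise, and shows one way to match case-insensitively while treating missing values as non-matches.

```python
import pandas as pd

# Hypothetical data: one missing name and one lowercase name.
df = pd.DataFrame({'name': ['Jeff', None, 'jia', 'Esha'],
                   'age': [30, 41, 8, 56]})

# Lowercase first so 'jia' matches; na=False turns missing names into False.
mask = df['name'].str.lower().str.startswith('j', na=False)
result = df[mask]

print(list(result['name']))  # ['Jeff', 'jia']
```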

### Select rows by boolean expressions

Complete the function, `select_target_audience(df)`, by having it return the rows with people who are **between 21 and 40** years old (inclusive) and live in **New York or Tokyo**.

In [121]:
df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'], 
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

In [122]:
def select_target_audience(df):
    # Group the city test in its own parentheses: & binds more tightly than |.
    df = df[(df.age >= 21) & (df.age <= 40) & ((df.city == 'New York') | (df.city == 'Tokyo'))]
    return df

In [124]:
select_target_audience(df)

Unnamed: 0,name,age,city
0,Jeff,30,New York
3,Hatori,38,Tokyo


You can make complex boolean (true/false) expressions to select or filter rows in your DataFrame. For example, if our DataFrame had a column `age`, we could select people who are 16 or older by doing `df[df.age >= 16]`. But suppose we wanted people who are 16 or older and live in New York. Then we could select them by doing `df[(df.age >= 16) & (df.city == 'New York')]`. The `&` in that expression means "and", `|` means "or", and `~` means "not". Unfortunately, you can't use the Python operators `and`, `or`, and `not` with Pandas objects.

One gotcha about using `&` (and) and `|` (or) to make boolean expressions in Pandas is that you need to wrap each piece between a `&` or `|` in parentheses. This is an unfortunate issue due to the way Python evaluates operators: `&` and `|` have higher precedence than comparison operators such as `>=` and `==`, so `df.age >= 16 & df.city == 'New York'` is parsed as `df.age >= (16 & df.city) == 'New York'`. In either case, remember to wrap each piece in parentheses!
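
Grouping also matters when `&` and `|` are mixed, because `&` binds more tightly than `|`. The sketch below uses the exercise's data plus one hypothetical extra row (Ken, 70, Tokyo) that makes the difference visible; the intermediate mask names are illustrative.

```python
import pandas as pd

# The exercise's data plus one hypothetical row (Ken) that exposes the difference.
df = pd.DataFrame({
    'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley', 'Ken'],
    'age': [30, 56, 8, 38, 20, 70],
    'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York', 'Tokyo'],
})

in_age_range = (df['age'] >= 21) & (df['age'] <= 40)
in_new_york = df['city'] == 'New York'
in_tokyo = df['city'] == 'Tokyo'

# & binds more tightly than |, so this reads as (age range AND New York) OR Tokyo.
ungrouped = df[in_age_range & in_new_york | in_tokyo]

# Grouping the city test gives: age range AND (New York OR Tokyo).
grouped = df[in_age_range & (in_new_york | in_tokyo)]

# An equivalent spelling with between() (inclusive by default) and isin().
alternative = df[df['age'].between(21, 40) & df['city'].isin(['New York', 'Tokyo'])]

print(list(ungrouped['name']))  # ['Jeff', 'Hatori', 'Ken']
print(list(grouped['name']))    # ['Jeff', 'Hatori']
```

The `between`/`isin` form avoids the precedence question entirely, which can be easier to read when the list of cities grows.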

### Conditional column update

Suppose we realize after collecting a bunch of data that our process incorrectly set the age of people in New York and Atlanta one year less than it was supposed to. Complete the function, `correct_age_in_error_cities(df)`, by having it increment the age of people living in New York or Atlanta by one year.

In [147]:
df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'], 
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

In [148]:
def correct_age_in_error_cities(df):
    # Rows whose city is one of the two affected by the error.
    in_error_city = (df['city'] == 'New York') | (df['city'] == 'Atlanta')
    df.loc[in_error_city, 'age'] += 1
    return df

In [149]:
correct_age_in_error_cities(df)

Unnamed: 0,name,age,city
0,Jeff,31,New York
1,Esha,57,Atlanta
2,Jia,8,Shanghai
3,Hatori,38,Tokyo
4,Ashley,21,New York
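
For a longer list of cities, a sketch along the following lines may be easier to maintain. It assumes a hypothetical helper, `increment_age_in(df, cities)`, that works on a copy so the original DataFrame is left as it was.

```python
import pandas as pd

def increment_age_in(df, cities):
    # Hypothetical helper: add one year to the age of everyone in `cities`.
    out = df.copy()
    out.loc[out['city'].isin(cities), 'age'] += 1
    return out

df = pd.DataFrame({'name': ['Jeff', 'Esha', 'Jia', 'Hatori', 'Ashley'],
                   'age': [30, 56, 8, 38, 20],
                   'city': ['New York', 'Atlanta', 'Shanghai', 'Tokyo', 'New York']})

corrected = increment_age_in(df, ['New York', 'Atlanta'])

print(list(corrected['age']))  # [31, 57, 8, 38, 21]
print(list(df['age']))         # [30, 56, 8, 38, 20]
```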
