# **Lab 4: Transforming data: Intro to dplyr**

## **dplyr verbs (functions)**
dplyr utilities handle the vast majority of your data manipulation needs:

*   filter() - for picking observations by their values,
*   select() - for picking variables by their names,
*   arrange() - for reordering the rows,
*   mutate() - for creating new variables with functions on existing variables,
*   summarise() - for collapsing many values down to a single summary.




## **The structure of dplyr functions**

All verbs work similarly:


*   The first argument is a tibble (or data frame)
*   The subsequent ones describe what to do, using the variable names
*   The result is a new tibble
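
A minimal sketch of this shared structure, using a small made-up tibble (`toy` and its columns are illustrative and not part of the lab data):

```r
library(dplyr)

# Hypothetical toy data, not taken from movies.csv
toy <- tibble(title = c("A", "B", "C"), score = c(7.1, 8.4, 6.2))

# First argument: the tibble; later arguments: what to do, using bare column names
high   <- filter(toy, score > 7)
sorted <- arrange(toy, desc(score))

# Each call returns a new tibble; `toy` itself is unchanged
nrow(high)      # 2
sorted$title    # "B" "A" "C"
nrow(toy)       # 3
```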



## **The movie industry dataset**

`movies.csv` contains information on the last three decades of movies.
The data was scraped from the IMDb website and is available from a [GitHub repo](https://raw.githubusercontent.com/Juanets/movie-stats/master/movies.csv).

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
url <- "https://raw.githubusercontent.com/Juanets/movie-stats/master/movies.csv"
movies <- read_csv(url)
locale <- Sys.setlocale(category = "LC_ALL", locale = "C")
movies

Parsed with column specification:
cols(
  budget = col_double(),
  company = col_character(),
  country = col_character(),
  director = col_character(),
  genre = col_character(),
  gross = col_double(),
  name = col_character(),
  rating = col_character(),
  released = col_character(),
  runtime = col_double(),
  score = col_double(),
  star = col_character(),
  votes = col_double(),
  writer = col_character(),
  year = col_double()
)



budget,company,country,director,genre,gross,name,rating,released,runtime,score,star,votes,writer,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>
8000000,Columbia Pictures Corporation,USA,Rob Reiner,Adventure,52287414,Stand by Me,R,1986-08-22,89,8.1,Wil Wheaton,299174,Stephen King,1986
6000000,Paramount Pictures,USA,John Hughes,Comedy,70136369,Ferris Bueller's Day Off,PG-13,1986-06-11,103,7.8,Matthew Broderick,264740,John Hughes,1986
15000000,Paramount Pictures,USA,Tony Scott,Action,179800601,Top Gun,PG,1986-05-16,110,6.9,Tom Cruise,236909,Jim Cash,1986
18500000,Twentieth Century Fox Film Corporation,USA,James Cameron,Action,85160248,Aliens,R,1986-07-18,137,8.4,Sigourney Weaver,540152,James Cameron,1986
9000000,Walt Disney Pictures,USA,Randal Kleiser,Adventure,18564613,Flight of the Navigator,PG,1986-08-01,90,6.9,Joey Cramer,36636,Mark H. Baker,1986
6000000,Hemdale,UK,Oliver Stone,Drama,138530565,Platoon,R,1987-02-06,120,8.1,Charlie Sheen,317585,Oliver Stone,1986
25000000,Henson Associates (HA),UK,Jim Henson,Adventure,12729917,Labyrinth,PG,1986-06-27,101,7.4,David Bowie,102879,Dennis Lee,1986
6000000,De Laurentiis Entertainment Group (DEG),USA,David Lynch,Drama,8551228,Blue Velvet,R,1986-10-23,120,7.8,Isabella Rossellini,146768,David Lynch,1986
9000000,Paramount Pictures,USA,Howard Deutch,Comedy,40471663,Pretty in Pink,PG-13,1986-02-28,96,6.8,Molly Ringwald,60565,John Hughes,1986
15000000,SLM Production Group,USA,David Cronenberg,Drama,40456565,The Fly,R,1986-08-15,96,7.5,Jeff Goldblum,129698,George Langelaan,1986


## **filter(): retain rows that match criteria**

filter() allows you to subset observations based on their values.

In [3]:
# Note: a comma between conditions means AND, the same as "&"
filter(movies, genre == "Comedy", director == "Woody Allen")

budget,company,country,director,genre,gross,name,rating,released,runtime,score,star,votes,writer,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>
6400000,Orion Pictures,USA,Woody Allen,Comedy,40084041,Hannah and Her Sisters,PG-13,1986-03-14,107,8.0,Mia Farrow,56988,Woody Allen,1986
16000000,Orion Pictures,USA,Woody Allen,Comedy,14792779,Radio Days,PG,1987-01-30,88,7.6,Mia Farrow,26467,Woody Allen,1987
19000000,Jack Rollins & Charles H. Joffe Productions,USA,Woody Allen,Comedy,18254702,Crimes and Misdemeanors,PG-13,1989-11-03,104,8.0,Martin Landau,46269,Woody Allen,1989
15000000,Touchstone Pictures,USA,Woody Allen,Comedy,10763469,New York Stories,PG,1989-03-10,124,6.4,Woody Allen,14652,Richard Price,1989
12000000,Orion Pictures,USA,Woody Allen,Comedy,7331647,Alice,PG-13,1991-01-10,106,6.6,Mia Farrow,11331,Woody Allen,1990
14000000,Orion Pictures,USA,Woody Allen,Comedy,2735731,Shadows and Fog,PG-13,1992-03-20,85,6.8,Woody Allen,14024,Woody Allen,1991
20000000,TriStar Pictures,USA,Woody Allen,Comedy,10555619,Husbands and Wives,R,1992-09-18,103,7.6,Woody Allen,23108,Woody Allen,1992
13500000,TriStar Pictures,USA,Woody Allen,Comedy,11285588,Manhattan Murder Mystery,PG,1993-08-18,104,7.4,Woody Allen,30925,Woody Allen,1993
20000000,Miramax,USA,Woody Allen,Comedy,13383737,Bullets Over Broadway,R,1995-02-24,98,7.5,John Cusack,31237,Woody Allen,1994
15000000,Sweetland Films,USA,Woody Allen,Comedy,6700000,Mighty Aphrodite,R,1995-11-10,95,7.1,Woody Allen,33697,Woody Allen,1995


dplyr executes the filtering and returns a new data frame. It never modifies the original one.
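
Because the original is left untouched, the result has to be assigned to a name if it is needed later. A small sketch of that behaviour, using a made-up tibble (`films` is a hypothetical stand-in for `movies`):

```r
library(dplyr)

# Hypothetical toy data standing in for `movies`
films <- tibble(
  name  = c("X", "Y", "Z"),
  genre = c("Comedy", "Drama", "Comedy")
)

filter(films, genre == "Comedy")   # prints the result, but nothing is saved
nrow(films)                        # still 3

# To keep the filtered rows, assign the result to a name
comedies <- filter(films, genre == "Comedy")
nrow(comedies)                     # 2
```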

## **Logical operators**

In [4]:
# Using AND operator
filter(movies, country == "USA", budget > 2.5e8)
# same as filter(movies, country == "USA" & budget > 2.5e8)

budget,company,country,director,genre,gross,name,rating,released,runtime,score,star,votes,writer,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>
300000000.0,Walt Disney Pictures,USA,Gore Verbinski,Action,309420425,Pirates of the Caribbean: At World's End,PG-13,2007-05-25,169,7.1,Johnny Depp,514191,Ted Elliott,2007
258000000.0,Columbia Pictures,USA,Sam Raimi,Action,336530303,Spider-Man 3,PG-13,2007-05-04,139,6.2,Tobey Maguire,416842,Sam Raimi,2007
260000000.0,Walt Disney Animation Studios,USA,Nathan Greno,Animation,200821936,Tangled,PG,2010-11-24,100,7.8,Mandy Moore,325621,Dan Fogelman,2010


In [5]:
# Using OR operator
filter(movies, country == "USA" | budget > 2.5e8)

budget,company,country,director,genre,gross,name,rating,released,runtime,score,star,votes,writer,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>
8000000,Columbia Pictures Corporation,USA,Rob Reiner,Adventure,52287414,Stand by Me,R,1986-08-22,89,8.1,Wil Wheaton,299174,Stephen King,1986
6000000,Paramount Pictures,USA,John Hughes,Comedy,70136369,Ferris Bueller's Day Off,PG-13,1986-06-11,103,7.8,Matthew Broderick,264740,John Hughes,1986
15000000,Paramount Pictures,USA,Tony Scott,Action,179800601,Top Gun,PG,1986-05-16,110,6.9,Tom Cruise,236909,Jim Cash,1986
18500000,Twentieth Century Fox Film Corporation,USA,James Cameron,Action,85160248,Aliens,R,1986-07-18,137,8.4,Sigourney Weaver,540152,James Cameron,1986
9000000,Walt Disney Pictures,USA,Randal Kleiser,Adventure,18564613,Flight of the Navigator,PG,1986-08-01,90,6.9,Joey Cramer,36636,Mark H. Baker,1986
6000000,De Laurentiis Entertainment Group (DEG),USA,David Lynch,Drama,8551228,Blue Velvet,R,1986-10-23,120,7.8,Isabella Rossellini,146768,David Lynch,1986
9000000,Paramount Pictures,USA,Howard Deutch,Comedy,40471663,Pretty in Pink,PG-13,1986-02-28,96,6.8,Molly Ringwald,60565,John Hughes,1986
15000000,SLM Production Group,USA,David Cronenberg,Drama,40456565,The Fly,R,1986-08-15,96,7.5,Jeff Goldblum,129698,George Langelaan,1986
6000000,Twentieth Century Fox Film Corporation,USA,David Seltzer,Comedy,8200000,Lucas,PG-13,1986-03-28,100,6.8,Corey Haim,12228,David Seltzer,1986
25000000,Twentieth Century Fox Film Corporation,USA,John Carpenter,Action,11100000,Big Trouble in Little China,PG-13,1986-07-02,99,7.3,Kurt Russell,101678,Gary Goldman,1986


In [6]:
# Using xor(): elementwise exclusive OR (exactly one of the two conditions is TRUE)
filter(movies, xor(score > 9, budget > 2.5e8))

budget,company,country,director,genre,gross,name,rating,released,runtime,score,star,votes,writer,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>
25000000.0,Castle Rock Entertainment,USA,Frank Darabont,Crime,28341469,The Shawshank Redemption,R,1994-10-14,142,9.3,Tim Robbins,1861666,Stephen King,1994
300000000.0,Walt Disney Pictures,USA,Gore Verbinski,Action,309420425,Pirates of the Caribbean: At World's End,PG-13,2007-05-25,169,7.1,Johnny Depp,514191,Ted Elliott,2007
258000000.0,Columbia Pictures,USA,Sam Raimi,Action,336530303,Spider-Man 3,PG-13,2007-05-04,139,6.2,Tobey Maguire,416842,Sam Raimi,2007
260000000.0,Walt Disney Animation Studios,USA,Nathan Greno,Animation,200821936,Tangled,PG,2010-11-24,100,7.8,Mandy Moore,325621,Dan Fogelman,2010


In [7]:
# you can also use %in% operator
filter(movies, country %in% c("Peru", "Colombia", "Chile"))

budget,company,country,director,genre,gross,name,rating,released,runtime,score,star,votes,writer,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>
0.0,Concorde-New Horizons,Peru,Augusto Tamayo San Román,Action,410880,Ultra Warrior,R,1990-03-16,100,1.6,Dack Rambo,661,Len Jenkin,1990
45000000.0,Warner Bros.,Peru,Luis Llosa,Action,57362581,The Specialist,R,1994-10-07,110,5.5,Sylvester Stallone,53868,John Shirley,1994
3000000.0,HBO Films,Colombia,Joshua Marston,Crime,6517198,Maria Full of Grace,R,2004-08-06,101,7.5,Catalina Sandino Moreno,31546,Joshua Marston,2004
0.0,Participant Media,Chile,Pablo Larraín,Drama,2343664,No,R,2012-11-09,118,7.4,Gael García Bernal,19935,Pedro Peirano,2012
26000000.0,Alcon Entertainment,Chile,Patricia Riggen,Biography,12188642,Los 33,PG-13,2015-11-13,127,6.9,Antonio Banderas,27925,Mikko Alanne,2015
1400000.0,Buffalo Films,Colombia,Ciro Guerra,Adventure,1329249,Embrace of the Serpent,NOT RATED,2015-05-25,125,7.9,Nilbio Torres,13698,Ciro Guerra,2015
9000000.0,Fox Searchlight Pictures,Chile,Pablo Larraín,Biography,13958679,Jackie,R,2016-12-02,100,6.8,Natalie Portman,50652,Noah Oppenheim,2016
0.0,AZ Films,Chile,Pablo Larraín,Biography,938875,Neruda,R,2017-03-10,107,7.0,Gael García Bernal,5680,Guillermo Calderón,2016
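
As a rough illustration, `%in%` can be read as shorthand for a chain of `==` tests joined by `|`. The sketch below checks this on a made-up tibble (`places` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data
places <- tibble(
  name    = c("a", "b", "c", "d"),
  country = c("Peru", "USA", "Chile", NA)
)

with_in <- filter(places, country %in% c("Peru", "Chile"))
with_or <- filter(places, country == "Peru" | country == "Chile")

# Both approaches keep the same rows; the row with a missing country is dropped
with_in$name                              # "a" "c"
identical(with_in$name, with_or$name)     # TRUE
```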


To test whether a value is missing in R, use the is.na() function. In particular, do not check for equality with NA:

In [8]:
x <- 1

In [9]:
x == NA

In [10]:
is.na(x)

Similarly, never put an equality condition with NA in your dplyr filter() statements.
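
The sketch below shows why, under the assumption of a small made-up tibble (`df_na` is a hypothetical name): a comparison with NA yields NA rather than TRUE or FALSE, and filter() only keeps rows where the condition is TRUE.

```r
library(dplyr)

x <- 1
x == NA      # NA, not FALSE: any comparison with NA is itself NA
is.na(x)     # FALSE

df_na <- tibble(x = c(1, NA, 3))   # illustrative tibble

# `x == NA` evaluates to NA for every row, and filter() drops rows whose
# condition is NA, so this returns zero rows
nrow(filter(df_na, x == NA))    # 0

# is.na() returns TRUE/FALSE, so it selects the missing row as intended
nrow(filter(df_na, is.na(x)))   # 1
```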

In [11]:
# create a data frame
df <- tibble(x = c(1, NA, 3))
print(df)

# A tibble: 3 x 1
      x
  <dbl>
1     1
2    NA
3     3


In [12]:
filter(df, x > 1)

x
<dbl>
3


In [13]:
filter(df, is.na(x) | x > 1) # filter() drops NA rows unless you ask for them explicitly

x
<dbl>
NA
3


## **Exercise 1:**


1.   Write code using filter that will allow you to output movies with `country` USA or UK and `genre` Action or Drama.
2.   Write code using filter that will allow you to output movies with `released` later than 2014-12-01. (hint: `movies$released <- as.Date(movies$released)`)


## **select(): pick columns by name**

select() lets you choose a subset of variables, specified by name.
Note that column names do not need quotation marks in dplyr:

In [14]:
# select 4 columns
select(movies, name, country, year, genre)

name,country,year,genre
<chr>,<chr>,<dbl>,<chr>
Stand by Me,USA,1986,Adventure
Ferris Bueller's Day Off,USA,1986,Comedy
Top Gun,USA,1986,Action
Aliens,USA,1986,Action
Flight of the Navigator,USA,1986,Adventure
Platoon,UK,1986,Drama
Labyrinth,UK,1986,Adventure
Blue Velvet,USA,1986,Drama
Pretty in Pink,USA,1986,Comedy
The Fly,USA,1986,Drama


In [15]:
select(movies, name, genre:score) # use a colon to select contiguous columns

name,genre,gross,rating,released,runtime,score
<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>,<dbl>
Stand by Me,Adventure,52287414,R,1986-08-22,89,8.1
Ferris Bueller's Day Off,Comedy,70136369,PG-13,1986-06-11,103,7.8
Top Gun,Action,179800601,PG,1986-05-16,110,6.9
Aliens,Action,85160248,R,1986-07-18,137,8.4
Flight of the Navigator,Adventure,18564613,PG,1986-08-01,90,6.9
Platoon,Drama,138530565,R,1987-02-06,120,8.1
Labyrinth,Adventure,12729917,PG,1986-06-27,101,7.4
Blue Velvet,Drama,8551228,R,1986-10-23,120,7.8
Pretty in Pink,Comedy,40471663,PG-13,1986-02-28,96,6.8
The Fly,Drama,40456565,R,1986-08-15,96,7.5


In [16]:
select(movies, -(star:writer)) # To drop columns use a minus, "-"

budget,company,country,director,genre,gross,name,rating,released,runtime,score,year
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
8000000,Columbia Pictures Corporation,USA,Rob Reiner,Adventure,52287414,Stand by Me,R,1986-08-22,89,8.1,1986
6000000,Paramount Pictures,USA,John Hughes,Comedy,70136369,Ferris Bueller's Day Off,PG-13,1986-06-11,103,7.8,1986
15000000,Paramount Pictures,USA,Tony Scott,Action,179800601,Top Gun,PG,1986-05-16,110,6.9,1986
18500000,Twentieth Century Fox Film Corporation,USA,James Cameron,Action,85160248,Aliens,R,1986-07-18,137,8.4,1986
9000000,Walt Disney Pictures,USA,Randal Kleiser,Adventure,18564613,Flight of the Navigator,PG,1986-08-01,90,6.9,1986
6000000,Hemdale,UK,Oliver Stone,Drama,138530565,Platoon,R,1987-02-06,120,8.1,1986
25000000,Henson Associates (HA),UK,Jim Henson,Adventure,12729917,Labyrinth,PG,1986-06-27,101,7.4,1986
6000000,De Laurentiis Entertainment Group (DEG),USA,David Lynch,Drama,8551228,Blue Velvet,R,1986-10-23,120,7.8,1986
9000000,Paramount Pictures,USA,Howard Deutch,Comedy,40471663,Pretty in Pink,PG-13,1986-02-28,96,6.8,1986
15000000,SLM Production Group,USA,David Cronenberg,Drama,40456565,The Fly,R,1986-08-15,96,7.5,1986


## **select() helpers**
You can use the following functions to help select the columns:


*   starts_with()
*   ends_with()
*   contains()
*   matches() (matches a regular expression)
*   num_range("x", 1:4): picks variables x1, x2, x3, x4

Example:



In [17]:
select(movies, starts_with("r"))
select(movies, ends_with("e"))
select(movies, contains("re"))

rating,released,runtime
<chr>,<chr>,<dbl>
R,1986-08-22,89
PG-13,1986-06-11,103
PG,1986-05-16,110
R,1986-07-18,137
PG,1986-08-01,90
R,1987-02-06,120
PG,1986-06-27,101
R,1986-10-23,120
PG-13,1986-02-28,96
R,1986-08-15,96


genre,name,runtime,score
<chr>,<chr>,<dbl>,<dbl>
Adventure,Stand by Me,89,8.1
Comedy,Ferris Bueller's Day Off,103,7.8
Action,Top Gun,110,6.9
Action,Aliens,137,8.4
Adventure,Flight of the Navigator,90,6.9
Drama,Platoon,120,8.1
Adventure,Labyrinth,101,7.4
Drama,Blue Velvet,120,7.8
Comedy,Pretty in Pink,96,6.8
Drama,The Fly,96,7.5


director,genre,released,score
<chr>,<chr>,<chr>,<dbl>
Rob Reiner,Adventure,1986-08-22,8.1
John Hughes,Comedy,1986-06-11,7.8
Tony Scott,Action,1986-05-16,6.9
James Cameron,Action,1986-07-18,8.4
Randal Kleiser,Adventure,1986-08-01,6.9
Oliver Stone,Drama,1987-02-06,8.1
Jim Henson,Adventure,1986-06-27,7.4
David Lynch,Drama,1986-10-23,7.8
Howard Deutch,Comedy,1986-02-28,6.8
David Cronenberg,Drama,1986-08-15,7.5
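
The `movies` columns have no numbered names, so `num_range()` and `matches()` are easier to see on a made-up tibble. The following is a sketch only; `wide` and its column names are hypothetical:

```r
library(dplyr)

# Hypothetical wide tibble with numbered columns
wide <- tibble(
  id = 1:2,
  x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 6),
  y_total = c(9, 12)
)

names(select(wide, num_range("x", 1:2)))      # "x1" "x2"
names(select(wide, matches("^x[0-9]$")))      # "x1" "x2" "x3"
names(select(wide, id, ends_with("total")))   # "id" "y_total"
```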


## **Exercise 2:**

Write code that will have company as the first column and the columns starting with the letter 'g' as the following columns. Output the first 20 rows of such a dataset.

## **arrange(): reorder rows**

arrange() takes a data frame and a set of column names to order by.
For descending order, use the function desc() around the column name.

In [18]:
print(arrange(movies, runtime), n = 4)

# A tibble: 6,820 x 15
  budget company country director genre  gross name  rating released runtime
   <dbl> <chr>   <chr>   <chr>    <chr>  <dbl> <chr> <chr>  <chr>      <dbl>
1 0.     Iwerks~ France  Jean-Ja~ Adve~ 1.51e7 "Win~ G      1996-09~      50
2 1.25e7 Univer~ USA     Don Blu~ Anim~ 4.81e7 "The~ G      1988-11~      69
3 6.00e3 Next W~ UK      Christo~ Crime 4.85e4 "Fol~ R      1999-11~      69
4 0.     Hyperi~ USA     Bruce W~ Anim~ 8.44e6 "B\x~ PG-13  1992-07~      70
# ... with 6,816 more rows, and 5 more variables: score <dbl>, star <chr>,
#   votes <dbl>, writer <chr>, year <dbl>

In [19]:
# use `desc` for descending
print(arrange(movies, desc(budget)), n = 4)

# A tibble: 6,820 x 15
  budget company country director genre  gross name  rating released runtime
   <dbl> <chr>   <chr>   <chr>    <chr>  <dbl> <chr> <chr>  <chr>      <dbl>
1 3.00e8 Walt D~ USA     Gore Ve~ Acti~ 3.09e8 Pira~ PG-13  2007-05~     169
2 2.60e8 Walt D~ USA     Nathan ~ Anim~ 2.01e8 Tang~ PG     2010-11~     100
3 2.58e8 Columb~ USA     Sam Rai~ Acti~ 3.37e8 Spid~ PG-13  2007-05~     139
4 2.50e8 Warner~ UK      David Y~ Adve~ 3.02e8 Harr~ PG     2009-07~     153
# ... with 6,816 more rows, and 5 more variables: score <dbl>, star <chr>,
#   votes <dbl>, writer <chr>, year <dbl>

Missing values are always sorted at the end:

In [20]:
df <- tibble(x = c(5, NA, 2))
arrange(df, x)

x
<dbl>
2
5
NA


In [21]:
arrange(df, desc(x))

x
<dbl>
5
2
NA
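
Two related patterns, sketched on a made-up tibble (`toy` and its values are hypothetical): passing several columns to arrange() uses the later ones to break ties, and sorting on `is.na()` in descending order is one way to bring missing values to the top instead.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  genre = c("Drama", "Action", "Drama", "Action"),
  score = c(7.5, NA, 8.1, 6.9)
)

# Later columns break ties in earlier ones: genre A-Z, then highest score first
by_genre <- arrange(toy, genre, desc(score))
by_genre$score    # 6.9 NA 8.1 7.5

# To list missing values first, sort on is.na() in descending order
na_first <- arrange(toy, desc(is.na(score)), score)
na_first$score    # NA 6.9 7.5 8.1
```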


## **Exercise 3:**

Use arrange to sort the `movies` dataset by ascending order of the product of the budget and score variables. Output the first 20 rows of the new dataset.


## **mutate(): add new variables**

mutate() adds new columns that are functions of the existing ones.

In [22]:
movies <- mutate(movies, profit = gross - budget)
select(movies, name, gross, budget, profit)

name,gross,budget,profit
<chr>,<dbl>,<dbl>,<dbl>
Stand by Me,52287414,8000000,44287414
Ferris Bueller's Day Off,70136369,6000000,64136369
Top Gun,179800601,15000000,164800601
Aliens,85160248,18500000,66660248
Flight of the Navigator,18564613,9000000,9564613
Platoon,138530565,6000000,132530565
Labyrinth,12729917,25000000,-12270083
Blue Velvet,8551228,6000000,2551228
Pretty in Pink,40471663,9000000,31471663
The Fly,40456565,15000000,25456565


To discard old variables, use transmute() instead of mutate().
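
A short sketch of the difference, assuming a made-up tibble that reuses the lab's column names (`toy` is hypothetical): mutate() keeps every existing column, while transmute() keeps only the columns named in the call.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the lab's example
toy <- tibble(name = c("A", "B"), gross = c(50, 10), budget = c(20, 25))

kept    <- mutate(toy, profit = gross - budget)
trimmed <- transmute(toy, name, profit = gross - budget)

names(kept)      # "name" "gross" "budget" "profit"
names(trimmed)   # "name" "profit"
trimmed$profit   # 30 -15
```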

In [23]:
# Generating multiple new variables
movies <- mutate(
  movies,
  profit = gross - budget,
  gross_in_mil = gross / 10^6,
  budget_in_mil = budget / 10^6,
  profit_in_mil = profit / 10^6
)
select(movies, name, year, country, contains("_in_mil"), profit)

name,year,country,gross_in_mil,budget_in_mil,profit_in_mil,profit
<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>
Stand by Me,1986,USA,52.287414,8.0,44.287414,44287414
Ferris Bueller's Day Off,1986,USA,70.136369,6.0,64.136369,64136369
Top Gun,1986,USA,179.800601,15.0,164.800601,164800601
Aliens,1986,USA,85.160248,18.5,66.660248,66660248
Flight of the Navigator,1986,USA,18.564613,9.0,9.564613,9564613
Platoon,1986,UK,138.530565,6.0,132.530565,132530565
Labyrinth,1986,UK,12.729917,25.0,-12.270083,-12270083
Blue Velvet,1986,USA,8.551228,6.0,2.551228,2551228
Pretty in Pink,1986,USA,40.471663,9.0,31.471663,31471663
The Fly,1986,USA,40.456565,15.0,25.456565,25456565


Any vectorized function can be used with mutate(), including:


*   arithmetic operators (`+`, `-`, `*`, `/`, `^`) and modular arithmetic (`%/%`, `%%`),
*   logical comparisons (`<`, `<=`, `>`, `>=`, `==`, `!=`),
*   logarithmic and exponential transformations (log, log10, exp),
*   offsets (lead, lag),
*   cumulative and rolling aggregates (cumsum, cumprod, cummin, cummax),
*   ranking (min_rank, percent_rank).
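
To make a few of these concrete, here is a sketch on a made-up tibble of yearly totals (`sales` and its values are hypothetical). Each function takes a whole column and returns a vector of the same length, which is what mutate() requires.

```r
library(dplyr)

# Hypothetical yearly totals
sales <- tibble(year = 2001:2004, total = c(10, 30, 20, 40))

out <- mutate(
  sales,
  log_total  = log10(total),           # transformation
  prev_total = lag(total),             # offset: previous row's value
  change     = total - lag(total),     # arithmetic on an offset
  running    = cumsum(total),          # cumulative aggregate
  rank_desc  = min_rank(desc(total))   # ranking, 1 = largest
)

out$change      # NA 20 -10 20
out$running     # 10 40 60 100
out$rank_desc   # 4 2 3 1
```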

