title | description | lessons | |||||
---|---|---|---|---|---|---|---|
Batch processing |
Most shell commands will process many files at once. This chapter shows you how to make your own pipelines do that. Along the way, you will see how the shell uses variables to store information. |
|
type: MultipleChoiceExercise
key: e4d5f4adea
xp: 50
Like other programs, the shell stores information in variables. Some of these, called environment variables, are available all the time. Environment variables' names are conventionally written in upper case, and a few of the more commonly-used ones are shown below.
Variable | Purpose | Value |
---|---|---|
HOME |
User's home directory | /home/repl |
PWD |
Present working directory | Same as pwd command |
SHELL |
Which shell program is being used | /bin/bash |
USER |
User's ID | repl |
To get a complete list (which is quite long),
you can type set
in the shell.
Use set
and grep
with a pipe to display the value of HISTFILESIZE
,
which determines how many old commands are stored in your command history.
What is its value?
@possible_answers
- 10
- 500
- [2000]
- The variable is not there.
@hint
Use set | grep HISTFILESIZE
to get the line you need.
@pre_exercise_code
@sct
err1 = "No: the shell records more history than that."
err2 = "No: the shell records more history than that."
correct3 = "Correct: the shell saves 2000 old commands by default on this system."
err4 = "No: the variable `HISTFILESIZE` is there."
Ex().has_chosen(3, [err1, err2, correct3, err4])
type: ConsoleExercise
key: afae0f33a7
xp: 100
A simpler way to find a variable's value is to use a command called echo
, which prints its arguments. Typing
echo hello DataCamp!
prints
hello DataCamp!
If you try to use it to print a variable's value like this:
echo USER
it will print the variable's name, USER
.
To get the variable's value, you must put a dollar sign $
in front of it. Typing
echo $USER
prints
repl
This is true everywhere:
to get the value of a variable called X
,
you must write $X
.
(This is so that the shell can tell whether you mean "a file named X"
or "the value of a variable named X".)
@instructions
The variable OSTYPE
holds the name of the kind of operating system you are using.
Display its value using echo
.
@hint
Call echo
with the variable OSTYPE
prepended by $
.
@pre_exercise_code
@solution
echo $OSTYPE
@sct
Ex().multi(
has_cwd('/home/repl'),
check_correct(
has_expr_output(strict = True),
multi(
has_code('echo', incorrect_msg="Did you call `echo`?"),
has_code('OSTYPE', incorrect_msg="Did you print the `OSTYPE` environment variable?"),
has_code(r'\$OSTYPE', incorrect_msg="Make sure to prepend `OSTYPE` by a `$`.")
)
)
)
Ex().success_msg("Excellent echoing of environment variables! You're off to a good start. Let's carry on!")
type: BulletConsoleExercise
key: e925da48e4
xp: 100
The other kind of variable is called a shell variable, which is like a local variable in a programming language.
To create a shell variable, you simply assign a value to a name:
training=seasonal/summer.csv
without any spaces before or after the =
sign.
Once you have done this,
you can check the variable's value with:
echo $training
seasonal/summer.csv
@pre_exercise_code
type: ConsoleExercise
key: 78f7fd446f
xp: 50
@instructions
Define a variable called testing
with the value seasonal/winter.csv
.
@hint
There should not be spaces between the variable's name and its value.
@solution
testing=seasonal/winter.csv
@sct
# For some reason, testing the shell variable directly always passes, so we can't do the following.
# Ex().multi(
# has_cwd('/home/repl'),
# has_expr_output(
# expr='echo $testing',
# output='seasonal/winter.csv',
# incorrect_msg="Have you used `testing=seasonal/winter.csv` to define the `testing` variable?"
# )
# )
Ex().multi(
has_cwd('/home/repl'),
multi(
has_code('testing', incorrect_msg='Did you define a shell variable named `testing`?'),
has_code('testing=', incorrect_msg='Did you write `=` directly after testing, with no spaces?'),
has_code('=seasonal/winter\.csv', incorrect_msg='Did you set the value of `testing` to `seasonal/winter.csv`?')
)
)
type: ConsoleExercise
key: d5e7224f55
xp: 50
@instructions
Use head -n 1 SOMETHING
to get the first line from seasonal/winter.csv
using the value of the variable testing
instead of the name of the file.
@hint
Remember to use $testing
rather than just testing
(the $
is needed to get the value of the variable).
@solution
# We need to re-set the variable for testing purposes for this exercise
# you should only run "head -n 1 $testing"
testing=seasonal/winter.csv
head -n 1 $testing
@sct
Ex().multi(
has_cwd('/home/repl'),
has_code(r'\$testing', incorrect_msg="Did you reference the shell variable using `$testing`?"),
check_correct(
has_output('^Date,Tooth\s*$'),
multi(
has_code('head', incorrect_msg="Did you call `head`?"),
has_code('-n', incorrect_msg="Did you limit the number of lines with `-n`?"),
has_code(r'-n\s+1', incorrect_msg="Did you elect to keep 1 line with `-n 1`?")
)
)
)
Ex().success_msg("Stellar! Let's see how you can repeat commands easily.")
type: ConsoleExercise
key: 920d1887e3
xp: 100
Shell variables are also used in loops, which repeat commands many times. If we run this command:
for filetype in gif jpg png; do echo $filetype; done
it produces:
gif
jpg
png
Notice these things about the loop:
- The structure is
for
...variable...in
...list...; do
...body...; done
- The list of things the loop is to process (in our case, the words
gif
,jpg
, andpng
). - The variable that keeps track of which thing the loop is currently processing (in our case,
filetype
). - The body of the loop that does the processing (in our case,
echo $filetype
).
Notice that the body uses $filetype
to get the variable's value instead of just filetype
,
just like it does with any other shell variable.
Also notice where the semi-colons go:
the first one comes between the list and the keyword do
,
and the second comes between the body and the keyword done
.
@instructions
Modify the loop so that it prints:
docx
odt
pdf
Please use filetype
as the name of the loop variable.
@hint
Use the code structure in the introductory text, swapping the image file types for document file types.
@pre_exercise_code
@solution
for filetype in docx odt pdf; do echo $filetype; done
@sct
Ex().multi(
has_cwd('/home/repl'),
check_correct(
has_expr_output(),
multi(
has_code('for', incorrect_msg='Did you call `for`?'),
has_code('filetype', incorrect_msg='Did you use `filetype` as the loop variable?'),
has_code('in', incorrect_msg='Did you use `in` before the list of file types?'),
has_code('docx odt pdf', incorrect_msg='Did you loop over `docx`, `odt` and `pdf` in that order?'),
has_code(r'pdf\s*;', incorrect_msg='Did you put a semi-colon after the last loop element?'),
has_code(r';\s*do', incorrect_msg='Did you use `do` after the first semi-colon?'),
has_code('echo', incorrect_msg='Did you call `echo`?'),
has_code(r'\$filetype', incorrect_msg='Did you echo `$filetype`?'),
has_code(r'filetype\s*;', incorrect_msg='Did you put a semi-colon after the loop body?'),
has_code('; done', incorrect_msg='Did you finish with `done`?')
)
)
)
Ex().success_msg("First-rate for looping! Loops are brilliant if you want to do the same thing hundreds or thousands of times.")
type: ConsoleExercise
key: 8468b70a71
xp: 100
You can always type in the names of the files you want to process when writing the loop, but it's usually better to use wildcards. Try running this loop in the console:
for filename in seasonal/*.csv; do echo $filename; done
It prints:
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv
because the shell expands seasonal/*.csv
to be a list of four filenames
before it runs the loop.
@instructions
Modify the wildcard expression to people/*
so that the loop prints the names of the files in the people
directory
regardless of what suffix they do or don't have.
Please use filename
as the name of your loop variable.
@hint
@pre_exercise_code
@solution
for filename in people/*; do echo $filename; done
@sct
Ex().multi(
has_cwd('/home/repl'),
check_correct(
has_expr_output(),
multi(
has_code('for', incorrect_msg='Did you call `for`?'),
has_code('filename', incorrect_msg='Did you use `filename` as the loop variable?'),
has_code('in', incorrect_msg='Did you use `in` before the list of file types?'),
has_code('people/\*', incorrect_msg='Did you specify a list of files with `people/*`?'),
has_code(r'people/\*\s*;', incorrect_msg='Did you put a semi-colon after the list of files?'),
has_code(r';\s*do', incorrect_msg='Did you use `do` after the first semi-colon?'),
has_code('echo', incorrect_msg='Did you call `echo`?'),
has_code(r'\$filename', incorrect_msg='Did you echo `$filename`?'),
has_code(r'filename\s*;', incorrect_msg='Did you put a semi-colon after the loop body?'),
has_code('; done', incorrect_msg='Did you finish with `done`?')
)
)
)
Ex().success_msg("Loopy looping! Wildcards and loops make a powerful combination.")
type: MultipleChoiceExercise
key: 153ca10317
xp: 50
People often set a variable using a wildcard expression to record a list of filenames.
For example,
if you define datasets
like this:
datasets=seasonal/*.csv
you can display the files' names later using:
for filename in $datasets; do echo $filename; done
This saves typing and makes errors less likely.
If you run these two commands in your home directory, how many lines of output will they print?
files=seasonal/*.csv
for f in $files; do echo $f; done
@possible_answers
- None: since
files
is defined on a separate line, it has no value in the second line. - One: the word "files".
- Four: the names of all four seasonal data files.
@hint
Remember that X
on its own is just "X", while $X
is the value of the variable X
.
@pre_exercise_code
@sct
err1 = "No: you do not have to define a variable on the same line you use it."
err2 = "No: this example defines and uses the variable `files` in the same shell."
correct3 = "Correct. The command is equivalent to `for f in seasonal/*.csv; do echo $f; done`."
Ex().has_chosen(3, [err1, err2, correct3])
type: PureMultipleChoiceExercise
key: 4fcfb63c4f
xp: 50
A common mistake is to forget to use $
before the name of a variable.
When you do this,
the shell uses the name you have typed
rather than the value of that variable.
A more common mistake for experienced users is to mis-type the variable's name.
For example,
if you define datasets
like this:
datasets=seasonal/*.csv
and then type:
echo $datsets
the shell doesn't print anything,
because datsets
(without the second "a") isn't defined.
If you were to run these two commands in your home directory, what output would be printed?
files=seasonal/*.csv
for f in files; do echo $f; done
(Read the first part of the loop carefully before answering.)
@hint
Remember that X
on its own is just "X", while $X
is the value of the variable X
.
@possible_answers
- [One line: the word "files".]
- Four lines: the names of all four seasonal data files.
- Four blank lines: the variable
f
isn't assigned a value.
@feedback
- Correct: the loop uses
files
instead of$files
, so the list consists of the word "files". - No: the loop uses
files
instead of$files
, so the list consists of the word "files" rather than the expansion offiles
. - No: the variable
f
is defined automatically by thefor
loop.
type: ConsoleExercise
key: 39b5dcf81a
xp: 100
Printing filenames is useful for debugging, but the real purpose of loops is to do things with multiple files. This loop prints the second line of each data file:
for file in seasonal/*.csv; do head -n 2 $file | tail -n 1; done
It has the same structure as the other loops you have already seen: all that's different is that its body is a pipeline of two commands instead of a single command.
@instructions
Write a loop that prints the last entry from July 2017 (2017-07
) in every seasonal file. It should produce a similar output to:
grep 2017-07 seasonal/winter.csv | tail -n 1
but for each seasonal file separately. Please use file
as the name of the loop variable, and remember to loop through the list of files seasonal/*.csv
(instead of 'seasonal/winter.csv' as in the example).
@hint
The loop body is the grep command shown in the instructions, with seasonal/winter.csv
replaced by $file
.
@pre_exercise_code
@solution
for file in seasonal/*.csv; do grep 2017-07 $file | tail -n 1; done
@sct
Ex().multi(
has_cwd('/home/repl'),
# Enforce use of for loop, so students can't just use grep -h 2017-07 seasonal/*.csv
has_code('for', incorrect_msg='Did you call `for`?'),
check_correct(
has_expr_output(),
multi(
has_code('file', incorrect_msg='Did you use `file` as the loop variable?'),
has_code('in', incorrect_msg='Did you use `in` before the list of files?'),
has_code('seasonal/\*', incorrect_msg='Did you specify a list of files with `seasonal/*`?'),
has_code(r'seasonal\/\*\.csv\s*;', incorrect_msg='Did you put a semi-colon after the list of files?'),
has_code(r';\s*do', incorrect_msg='Did you use `do` after the first semi-colon?'),
has_code('grep', incorrect_msg='Did you call `grep`?'),
has_code('2017-07', incorrect_msg='Did you match on `2017-07`?'),
has_code(r'\$file', incorrect_msg='Did you use `$file` as the name of the loop variable?'),
has_code(r'file\s*|', incorrect_msg='Did you use a pipe to connect your second command?'),
has_code(r'tail\s*-n\s*1', incorrect_msg='Did you use `tail -n 1` to print the last entry of each search in your second command?'),
has_code('; done', incorrect_msg='Did you finish with `done`?')
)
)
)
Ex().success_msg("Loopy looping! Wildcards and loops make a powerful combination.")
type: PureMultipleChoiceExercise
key: b974b7f45a
xp: 50
It's easy and sensible to give files multi-word names like July 2017.csv
when you are using a graphical file explorer.
However,
this causes problems when you are working in the shell.
For example,
suppose you wanted to rename July 2017.csv
to be 2017 July data.csv
.
You cannot type:
mv July 2017.csv 2017 July data.csv
because it looks to the shell as though you are trying to move
four files called July
, 2017.csv
, 2017
, and July
(again)
into a directory called data.csv
.
Instead,
you have to quote the files' names
so that the shell treats each one as a single parameter:
mv 'July 2017.csv' '2017 July data.csv'
If you have two files called current.csv
and last year.csv
(with a space in its name)
and you type:
rm current.csv last year.csv
what will happen:
@hint
What would you think was going to happen if someone showed you the command and you didn't know what files existed?
@possible_answers
- The shell will print an error message because
last
andyear.csv
do not exist. - The shell will delete
current.csv
. - [Both of the above.]
- Nothing.
@feedback
- Yes, but that's not all.
- Yes, but that's not all.
- Correct. You can use single quotes,
'
, or double quotes,"
, around the file names. - Unfortunately not.
type: MultipleChoiceExercise
key: f6d0530991
xp: 50
The loops you have seen so far all have a single command or pipeline in their body, but a loop can contain any number of commands. To tell the shell where one ends and the next begins, you must separate them with semi-colons:
for f in seasonal/*.csv; do echo $f; head -n 2 $f | tail -n 1; done
seasonal/autumn.csv
2017-01-05,canine
seasonal/spring.csv
2017-01-25,wisdom
seasonal/summer.csv
2017-01-11,canine
seasonal/winter.csv
2017-01-03,bicuspid
Suppose you forget the semi-colon between the echo
and head
commands in the previous loop,
so that you ask the shell to run:
for f in seasonal/*.csv; do echo $f head -n 2 $f | tail -n 1; done
What will the shell do?
@possible_answers
- Print an error message.
- Print one line for each of the four files.
- Print one line for
autumn.csv
(the first file). - Print the last line of each file.
@hint
You can pipe the output of echo
to tail
.
@pre_exercise_code
@sct
err1 = "No: the loop will run, it just won't do something sensible."
correct2 = "Yes: `echo` produces one line that includes the filename twice, which `tail` then copies."
err3 = "No: the loop runs one for each of the four filenames."
err4 = "No: the input of `tail` is the output of `echo` for each filename."
Ex().has_chosen(2, [err1, correct2, err3, err4])