Values

Effective Perl programs depend on the accurate representation and manipulation of values.

Computer programs contain variables: containers which hold values. Values are the actual data the programs manipulate. While it's easy to explain what that data might be--your aunt's name and address, the distance between your office and a golf course on the moon, or the weight of all cookies you've eaten in the past year--the rules regarding the format of that data are often strict. Writing an effective program often means understanding the best (simplest, fastest, most compact, or easiest) way of representing that data.

While the structure of a program depends heavily on the means by which you model your data with appropriate variables, these variables would be meaningless if they couldn't accurately contain the data itself--the values.

Strings

A string is a piece of data with no particular formatting, no particular contents, and no semantic meaning beyond the fact that it's a string. It could be your name. It could be the contents of an image file read from your hard drive. It could be the Perl program itself. A string has no meaning to the program until you give it meaning. A string is a fixed amount of data delineated by quotes of some form. Most strings use either single or double quotes:
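A minimal sketch of both quoting styles follows; the variable names and contents are illustrative only:

```perl
use strict;
use warnings;

# The same kind of data, delimited by single quotes or by double quotes.
my $name    = 'Aunt Agatha';
my $address = "12 Crater Lane, Tranquility Base";
```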

Characters in a single-quoted string represent themselves literally, with one exception. You may embed a single quote inside a single-quoted string by escaping the quote with a leading backslash:
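For example (the variable name is illustrative):

```perl
use strict;
use warnings;

# The backslash keeps the embedded quote from ending the string.
my $reminder = 'Don\'t forget to escape the single quote!';
```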

A double-quoted string has more complex (and often, more useful) behavior. You may escape control and non-printable characters in the string:
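A short sketch showing a few common escapes; the variable names are illustrative:

```perl
use strict;
use warnings;

my $tab     = "\t";   # horizontal tab
my $newline = "\n";   # newline
my $escape  = "\e";   # the escape character
my $literal = '\t';   # single quotes: a backslash followed by the letter t
```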

You may also interpolate the value of a scalar variable or the values of an array within a double-quoted string directly:
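A minimal sketch, using illustrative variable names:

```perl
use strict;
use warnings;

my $user  = 'Alice';
my @items = ('apples', 'pears');

my $greeting = "Hello, $user!";      # scalar interpolation
my $basket   = "Basket: @items";     # array elements, separated by a space by default
```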

You may include a literal double-quote inside a double-quoted string by escaping it with a leading backslash:
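For example (the sentence itself is only illustrative):

```perl
use strict;
use warnings;

# Every embedded double quote needs its own backslash.
my $quote = "\"Ouch,\" he cried. \"That hurt!\"";
```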

If you find that hideously ugly, you may use an alternate quoting operator. The q operator performs single quoting, while the qq operator performs double quoting. In each case, you may choose your own delimiter for the string. The character immediately following the operator determines the beginning and end of the string. If the character is the opening character of a balanced pair--such as opening and closing braces--the closing character will be the final delimiter. Otherwise, the character itself will be both the starting and ending delimiter.
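The sketch below shows both operators with a balanced pair of braces and with a single repeated delimiter; the variable names and text are illustrative:

```perl
use strict;
use warnings;

my $speaker = 'he';

# qq behaves like double quotes: $speaker interpolates, and the embedded
# double quotes need no escaping.
my $quote = qq{"Ouch," $speaker said. "That hurt!"};

# q behaves like single quotes. Here ^ both starts and ends the string.
my $reminder = q^Didn't need to escape the single quote!^;

# With a balanced pair, the closing brace ends the string.
my $complaint = q{It's too early to be awake.};
```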

Even though you can declare a complex string with a series of embedded escape characters, sometimes it's easier to declare a multi-line string on multiple lines. The heredoc syntax lets you assign one or more lines of a string with a different syntax:
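A minimal sketch follows; the text of the string is illustrative, and END_BLURB is simply the identifier chosen here:

```perl
use strict;
use warnings;

# Single-quoted heredoc: neither $5 nor \n receives special treatment.
my $blurb = <<'END_BLURB';
Tickets cost $5 each.
No escapes such as \n are processed here.
END_BLURB
```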

The <<'END_BLURB' syntax has three parts. The double angle-brackets introduce the heredoc. The quotes determine whether the heredoc obeys single-quoted or double-quoted behavior with regard to variable and escape character interpolation. They're optional; the default behavior is double-quoted interpolation. The END_BLURB itself is an arbitrary identifier which the Perl 5 parser uses as the ending delimiter.

Be careful; regardless of the indentation of the heredoc declaration itself, the ending delimiter must begin in the first column of the program.

Unicode and Strings

Unicode is a system for representing characters in the world's written languages. While most English text uses a character set of only 128 characters (which requires seven bits of storage and fits nicely into eight-bit bytes), it's naïve to believe that you won't someday need an umlaut, for example.

Perl 5 handles Unicode internally. For the most part, you can ignore this, except that there are several representations for how to store Unicode. You don't have to understand what happens after you have the data within your program, but you have to know what kind of data you receive from the external world and what kind of data you want to emit to the external world. As long as you do not assume that a character in a Perl 5 string requires a byte of storage, all of the Perl 5 string operations perform correctly on Unicode and ASCII strings.

For far more detail about what Unicode is, how encodings work, and how to manage incoming and outgoing data in a Unicode world, see perldoc perluniintro.

You may include Unicode characters in your programs in three ways. The easiest is to use the utf8 pragma (pragmas), which tells the Perl parser to interpret the rest of the source code file with the UTF-8 encoding (an arrangement and representation of Unicode characters). This allows you to use Unicode characters in strings as well as in identifiers:
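A small sketch, assuming the file is saved as UTF-8; the variable name is illustrative:

```perl
use strict;
use warnings;
use utf8;   # the source file itself must be saved as UTF-8

# A non-ASCII letter appears both in the identifier and in the string.
my $naïve  = 'naïve';
my $length = length $naïve;   # counts characters, not bytes
```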

You will need to use a text editor which understands the UTF-8 encoding, and you will need to save your file appropriately.

You may also use the Unicode escape sequence to represent characters. The syntax \x{} represents a single character; place the hex form of the character's Unicode number within the curly brackets:
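For example (the variable name is illustrative):

```perl
use strict;
use warnings;

# U+00FE is LATIN SMALL LETTER THORN.
my $escaped_thorn = "\x{00FE}";
```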

Some Unicode characters have names. Though these are more verbose, they can be clearer to read than Unicode numbers. You must use the charnames pragma to enable them. Use the \N{} escape to refer to them:
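A minimal sketch comparing the named form with the numeric form; the variable names are illustrative:

```perl
use strict;
use warnings;
use charnames ':full';   # enables full Unicode character names

my $escaped_thorn = "\x{00FE}";
my $named_thorn   = "\N{LATIN SMALL LETTER THORN}";
```

Both variables hold the same single character.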

You may use the \x{} and \N{} forms within regular expressions as well as anywhere else you may legitimately use a string or a character.

Numbers

Perl also supports numbers, both integers and floating-point values. They support scientific notation as well as binary, octal, and hexadecimal representations:
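The sketch below shows one literal of each kind; the variable names and values are illustrative:

```perl
use strict;
use warnings;

my $integer   = 42;
my $float     = 0.007;
my $sci_float = 1.02e14;     # scientific notation
my $binary    = 0b101010;    # 0b prefix: binary
my $octal     = 052;         # leading 0: octal
my $hex       = 0x20;        # 0x prefix: hexadecimal
```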

The 0b, 0, and 0x prefixes select binary, octal, and hex notation respectively. Be aware that a leading zero on an integer literal always indicates octal mode; this can occasionally produce unanticipated confusion.

You may not use commas to separate thousands in numeric literals because the parser will interpret the commas as comma operators. You can use underscores in other places within the number, however. The parser will treat them as invisible characters. Your readers may not. These are equivalent:
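For instance (the variable names are illustrative):

```perl
use strict;
use warnings;

# All three literals denote one billion.
my $billion         = 1000000000;
my $grouped_billion = 1_000_000_000;
my $odd_billion     = 10_0_00_00_0_0_0;
```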

Consider the most readable alternative, however.

Because of coercion (coercion), Perl programmers rarely have to worry about converting text read from outside the program to numbers. Perl will treat anything which looks like a number as a number in numeric contexts. Even though it almost always does so correctly, occasionally it's useful to know if something really does look like a number. The core module Scalar::Util contains a function named looks_like_number which returns true if Perl will consider the given argument numeric.
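A brief sketch of looks_like_number in use; the candidate strings are illustrative:

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

my @candidates = ('42', '-1.5e3', '0x2A', 'forty-two');

# Keep only the strings Perl would treat as numbers. Hex notation is
# recognized in source code literals, not in strings, so '0x2A' fails.
my @numeric = grep { looks_like_number($_) } @candidates;
```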

The Regexp::Common module from the CPAN also provides several well-tested regular expressions to identify valid types (whole number, integer, floating-point value) of numeric values.

Undef

Perl 5 has a value which represents an unassigned, undefined, and unknown value: undef. Declared but undefined scalar variables contain undef:
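For example (the variable names are illustrative):

```perl
use strict;
use warnings;

my $name = undef;   # explicit, but unnecessary
my $rank;           # declared without a value, so it also contains undef
```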

undef evaluates to false in boolean context. Be aware that an array containing a single element which is itself undef evaluates to true in a boolean context. Interpolating undef into a string--or evaluating it in a string context--produces an uninitialized value warning, if you have warnings enabled.
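The difference between undef and an array holding undef can be sketched as follows; the variable names are illustrative:

```perl
use strict;
use warnings;

my $nothing;
my @one_undef = (undef);

# undef itself is false in boolean context.
my $scalar_is_true = $nothing ? 1 : 0;

# An array in boolean context evaluates to its element count, which is 1 here.
my $array_is_true = @one_undef ? 1 : 0;
```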

The Empty List

When used on the right-hand side of an assignment, the () construct represents an empty list. When evaluated in scalar context, this evaluates to undef. In list context, it is effectively an empty list.

When used on the left-hand side of an assignment, the () construct enforces list context. To count the number of elements returned from an expression in list context without using a temporary variable, use this idiom (idioms):
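A minimal sketch of the idiom; the body of get_all_clown_hats() here is a hypothetical stand-in that returns a fixed list:

```perl
use strict;
use warnings;

# Hypothetical stand-in: any function returning a list would do.
sub get_all_clown_hats { return ('bowler', 'top hat', 'fez') }

# The empty list forces list context on the call; the scalar assignment
# then receives the number of elements on the right-hand side.
my $count = () = get_all_clown_hats();
```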

Because of the right associativity (associativity) of the assignment operator, Perl first evaluates the second assignment by calling get_all_clown_hats() in list context. This produces a list. Even though that list never gets assigned to anything, the return value of the second assignment is that list. Perl next evaluates that list in scalar context, due to the scalar assignment to $count in the first assignment. As a result, $count contains the number of elements in the list returned from get_all_clown_hats().

You don't have to understand all of the implications of this code right now, but it does demonstrate how a few of Perl's fundamental design features can combine to produce interesting and useful behavior.

Lists

Lists and arrays are not interchangeable in Perl. You may store a list in an array and you may coerce an array to a list, but they are separate entities. Lists may occur verbatim in source code as values:
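For example (the array name is illustrative):

```perl
use strict;
use warnings;

# A literal list stored in an array.
my @first_fibs = (1, 1, 2, 3, 5, 8, 13, 21);
```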

... as targets of assignments:
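For example, with an illustrative date string as the source of values:

```perl
use strict;
use warnings;

# The list of three variables on the left receives the list from split.
my ($year, $month, $day) = split /-/, '2011-10-31';
```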

... or as lists of expressions:
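For example, where name() and age() are hypothetical helper functions defined only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helpers, each returning a single value.
sub name { return 'Alice' }
sub age  { return 30 }

# The arguments after the separator form a list of three expressions.
my $line = join '', name(), ' => ', age();
```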

Note that you do not need parentheses to create lists; where present, the parentheses in these examples group expressions to change the precedence of those expressions (precedence).

You may use the range operator to create lists of literals in a compact form:
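For example (the array names are illustrative):

```perl
use strict;
use warnings;

my @chars = 'a' .. 'z';   # the 26 lowercase ASCII letters
my @count = 13 .. 27;     # the integers 13 through 27 inclusive
```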

... and you may use the qw() operator to split a literal string on whitespace to produce a list of strings:
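For example (the names are illustrative):

```perl
use strict;
use warnings;

# Equivalent to ('Larry', 'Curly', 'Moe', 'Shemp', 'Joey', 'Kenny').
my @stooges = qw( Larry Curly Moe Shemp Joey Kenny );
```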

Lists can (and often do) occur as the results of expressions, but these lists do not appear literally in source code.
