Data Types

tiago edited this page Nov 12, 2019 · 10 revisions

Variables and Data types - Concept

Variables and data types are the building blocks of all programs.

To manipulate them well, two concepts must be kept in mind:

  • State: The current value held by a data type. Ex: True or False; 1, 2, 1000; 1.4, 1.2.
  • Mutation: The process of altering the state of a data type.

Then, we can define:

  • Data type: A way in which data can be arranged. When a value of a data type is created, a "copy" of it (an object) is created in memory. Some data types are mutable (susceptible to change) whereas others are immutable. Ex: boolean, integer, string, etc.
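
The difference between mutable and immutable types can be sketched as follows. This is only an illustration: it uses a list (a mutable type discussed later) and the names numbers, alias, count and original are made up for the example.

```python
# A list is mutable: its state changes in place, the object stays the same.
numbers = [1, 2, 3]
alias = numbers            # second name for the same object
numbers.append(4)          # mutation
print(alias)               # [1, 2, 3, 4]
print(numbers is alias)    # True

# An integer is immutable: "changing" it binds the name to a new object.
count = 1000
original = count
count += 1
print(count)               # 1001
print(original)            # 1000
print(count is original)   # False
```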

Numbers

In general there are two ways of representing numbers: integers and floats. You can create these variables easily in the interpreter:

a=3   # integer
b=4.0 # float

We can use operators to transform variables:

| Name of Operation | Symbol of Operation |
| --- | --- |
| Addition | `+` |
| Subtraction | `-` |
| Multiplication | `*` |
| Division | `/` |
| Integer Division | `//` |
| Remainder | `%` |
| Power | `**` |

3 + 1
a * 2
3.0 * 2.3
5 / 7
10 % 2
15.3 // 7.96
3.4 % 1.2
4 ** 5
4.0 ** 0.5
a = 2e5 # same as a = 2 * 10**5, stored as the float 200000.0
a+=1  # same as a=a+1
a*=2  # same as a=a*2

Some notes to take into account:

  • When dividing two numbers with / the result is always a float.

  • When using the other operators between integers the result is an integer (one exception: ** with a negative exponent gives a float).

  • When using operators between floats the result is a float.

  • When using operators between floats and integers the result is a float.

    print(3.0 * 2) # 6.0
    print(1 / 1) # 1.0
  • Python takes operator priority into account (parentheses have the highest priority, then **, then *, /, // and %, then + and -).
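
The notes above can be checked directly in the interpreter. The snippet below is a small sketch; the variable names are chosen only for this example.

```python
int_sum = 3 + 4            # int and int -> int
true_div = 8 / 4           # / always gives a float
floor_div = 7 // 2         # integer division of ints -> int
remainder = 7 % 2          # remainder of ints -> int
mixed = 3 * 2.0            # int and float -> float
negative_power = 2 ** -1   # ** with a negative exponent gives a float

print(int_sum, type(int_sum).__name__)     # 7 int
print(true_div, type(true_div).__name__)   # 2.0 float
print(floor_div, remainder)                # 3 1
print(mixed, negative_power)               # 6.0 0.5

# Priority: ** first, then *, then +. Parentheses override this order.
no_parens = 2 + 3 * 2 ** 2
with_parens = ((2 + 3) * 2) ** 2
print(no_parens, with_parens)              # 14 100
```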

Booleans

The boolean type is used to store the values True or False. Any value can be converted to a boolean by using bool():

bool("") # False
bool("qwerty") # True

In general, an object that is considered to be empty (Ex: "", [ ], 0) converts to False, while an object that contains anything converts to True (Ex: 5, 0.1).

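| Converts to True | Converts to False |
| --- | --- |
| `True` | `False` |
| `"0"` | `""` |
| `[1, 2, "asd"]` | `[]` |
| `{4}` | `{}` |
| `4` | `0` |
| `0.01` | `0.0` |
| | `None` |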
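
As a quick check, the values listed above can be passed through bool() one by one. The names values and results are only used for this sketch.

```python
values = ["0", "", [1, 2, "asd"], [], {4}, {}, 4, 0, 0.01, 0.0, None]

results = []
for value in values:
    results.append(bool(value))
    print(repr(value), "->", bool(value))

# Note that "0" is True: it is a non-empty string, not the number 0.
```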

None

The value None (of type NoneType) is assigned with the intent of tagging a variable as not containing any value. It is normally used for exceptional cases, such as a result that is missing or not yet known.
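
A short sketch of how None behaves. The variable names are illustrative; the comparison with is is the usual way to test for None.

```python
result = None                # placeholder: no value yet
print(result is None)        # True
print(bool(result))          # False

# A function call that does not return anything evaluates to None.
returned = print("hello")    # prints hello
print(returned is None)      # True
print(type(None).__name__)   # NoneType
```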

Data Type Creation

As mentioned in the previous chapter, assigning a new variable can sometimes create new objects in memory.

Think of a factory of integers. This factory bases its production on a specific model (a class, defined by python) which is used to build new replicas (instances). When python creates a new integer it asks for a new integer from the factory. As such, the new object is considered to be a new instance from the class integer.

Actually, almost everything we use in python is an object (even functions such as print()!).

This concept will help us better understand strings and string methods.
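
The factory analogy can be observed with the built-in functions type() and isinstance(). This is a minimal sketch; number and text are example names.

```python
number = 7
text = "IEEE"

print(type(number))               # <class 'int'>
print(type(text))                 # <class 'str'>
print(isinstance(number, int))    # True: number is an instance of int
print(isinstance(text, int))      # False

# Functions are objects too.
print(isinstance(print, object))  # True
```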

Strings

The string data type stores text, that is, a sequence of characters. We can define a string by surrounding text with ', " or """.

a="this is a string"

To define a quote inside a string without confusing the python interpreter, an escape sequence such as \" can be used.

print("\"This is a quote\"")

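List of the most common escape sequences:

| Escape Sequence | Output |
| --- | --- |
| `\\` | Backslash |
| `\n` | Newline |
| `\"` | Double quote |
| `\'` | Single quote |
| `\uxxxx` | Unicode character with the 4-digit hex value xxxx |
| `\ooo` | Character with the octal value ooo (Ex: `\101`) |
| `\xhh` | Character with the hex value hh (Ex: `\xFA`) |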
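
The following sketch shows a few of these escape sequences in use. The variable names and the sample values are chosen only for illustration.

```python
quote = "\"This is a quote\""
two_lines = "first\nsecond"
path = "C:\\temp"
letter_hex = "\x41"      # hex 41 is the letter A
letter_octal = "\101"    # octal 101 is also the letter A
e_acute = "\u00e9"       # Unicode code point U+00E9

print(quote)         # "This is a quote"
print(two_lines)     # printed on two lines
print(path)          # C:\temp
print(len(path))     # 7, the escaped backslash is a single character
print(letter_hex, letter_octal)  # A A
```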

By convention, the indexing of the characters in a string starts at 0.

List indexing (Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.)

You can use indexing to:

  • Access the character at index n with str[n] (the first character is str[0]).
  • Retrieve the last character of a string with str[-1].
  • Access the nth-last character with str[-n].
  • Reverse the string with str[::-1].
a="IEEE"
print(a[0])    # "I"
print(a[-2])   # "E"
print(a[::-1]) # "EEEI"

Indexing also allows the slicing of a string. When slicing, a new string is created. The starting and ending positions of the substring can be specified, and either of them can be omitted. A string can be sliced using the syntax str[start:end]:

my_str="A beautiful morning"
my_substr=my_str[2:11]
print(my_substr) # beautiful

Notice how the first integer refers to the character 'b' (inclusive) and 11 corresponds to the whitespace after 'l' (exclusive).

However, the use of two integers isn't obligatory. If a single integer is placed before the colon, a substring is created from the character of that integer to the end of the string. An integer after the colon creates a substring until the character of that integer.

print("A beautiful morning"[12:]) # morning
print("A beautiful morning"[:12]) # "A beautiful " (the space at index 11 is included)
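
A few more slicing variations, as a sketch on the same example string. Negative indices and the optional third value (the step, as in str[::-1]) work in slices too.

```python
my_str = "A beautiful morning"

word = my_str[2:11]         # start inclusive, end exclusive
last_word = my_str[-7:]     # the last seven characters
full_copy = my_str[:]       # both ends omitted: the whole string
every_second = my_str[::2]  # step of 2

print(word)          # beautiful
print(last_word)     # morning
print(full_copy)     # A beautiful morning
print(every_second)  # Abatflmrig
```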

String Methods

One particularity of strings is that they are immutable, and as such, cannot be changed.

a="str"
a[0]="t" # ILLEGAL! - TypeError: 'str' object does not support item assignment

As a consequence, when trying to change the content of a string we are forced to create a new one. So as to make this process easier the str class has a selection of methods that take an input string and create a new object in memory (which is usually a modified version of that string).
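
A minimal sketch of this idea: the assignment fails, so a new string is built from pieces of the old one. The try/except block is only there so the example keeps running after the error.

```python
a = "str"

try:
    a[0] = "t"             # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

b = "t" + a[1:]            # build a new string instead
c = a.upper()              # methods also return a new string

print(a)  # str  (the original is unchanged)
print(b)  # ttr
print(c)  # STR
```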

A method can be called using str.method(args). Many methods require additional arguments and sometimes they have to be of a specific type. Some arguments are optional; in the signatures below these are surrounded by square brackets.

The command help(str.method), for example help(str.count), can be used to see the official documentation, which has information on how to use the method and what it does. The official documentation can also be accessed by this link.

The following list contains the most common string methods. All of their descriptions are either adapted or copied from the official python documentation.

  • str.capitalize() - Returns a copy of the string with the first character in upper case and the rest in lower case.

    "ferNANdo caRVALho".capitalize() # Fernando carvalho
  • str.count(sub[, start[, end]]) - Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

    "ananas".count("an") # 2
    "ananas".count("an", 2) # 1
  • str.endswith(suffix) - Returns True if the string ends with the specified suffix. False otherwise.

    "test.pdf".endswith("pdf") # True
  • str.find(sub[, start[, end]]) - Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.

    "Dory_Nemo".find("Nemo") # 5

    Note: Use find() only when you need the position of sub. To check whether sub is a substring, use the in operator instead.

    "Nemo" in "Dory_Nemo" # True
  • str.isdigit() - Return true if all characters in the string are digits and there is at least one character, false otherwise.

    "1234".isdigit() # True
    "123f".isdigit() # False
  • str.isupper() - Returns True if all cased characters of the string are uppercase and there is at least one cased character. False otherwise.

    "QWERTY_".isupper() # True
    "QWeRTY_".isupper() # False
  • str.join(iter) - Returns a string made by concatenating the strings in iter, with str placed between them.

    ", ".join("123") # "1, 2, 3"
    "_".join(["1", "2", "3"]) # 1_2_3

    Note: The last example passes a list, which we will discuss later.

  • str.lower() - Return a copy of the string with all the cased characters converted to lowercase.

    "FeRNaNDo".lower() # "fernando"
  • str.replace(old, new[, count]) - Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

    "I3E".replace("3", "EE") # IEEE
    "IEEE".replace("E", "_", 2) # I__E
  • str.split(sep=None, maxsplit=-1) - Return a list of the words in the string, using sep as the delimiter string. If sep is omitted, the string is split on whitespace. If maxsplit is given, at most maxsplit splits are done.

    "1, 2, 3".split(", ") # ["1", "2", "3"]
    "1,2,3".split(',', maxsplit=1) # ["1", "2,3"]
  • str.strip([chars]) - Returns a copy of the string with the leading and trailing characters removed. The chars argument specifies the set of characters to be removed from the beginning and end of the string. Characters are removed from each end until reaching a character that is not contained in chars. If chars is omitted, only whitespace is removed.

    "   big and spacious     ".strip() # "big and spacious"
    "www.archlinux.org".strip("gw.or") # "archlinux"

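Because every method returns a new string, methods can be combined step by step. The snippet below is a sketch that cleans up a made-up input line; the text in raw and all variable names are assumptions for the example.

```python
raw = "  Name: fernando CARVALHO ; Club: IEEE  "

fields = raw.strip().split(" ; ")
print(fields)   # ['Name: fernando CARVALHO', 'Club: IEEE']

name = fields[0].replace("Name: ", "").lower()
first, last = name.split()
full_name = " ".join([first.capitalize(), last.capitalize()])
club = fields[1].replace("Club: ", "")

print(full_name)        # Fernando Carvalho
print(club)             # IEEE
print(club.isupper())   # True
print(raw.count("e"))   # 2 (the original string is unchanged)
```
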
String Format

The format method is one of the most extensive methods in the str class. It is designed to help with string output formatting. It works on strings that contain {} (braces) placeholders, which are replaced by the arguments given to format.

```python
name="Fernando"
print("My name is {}".format(name))
```

It is possible to specify the order in which the arguments are substituted
by inserting an integer (the position of the argument, starting at 0) into the braces.

```python
print("1st:{3};2nd:{0};3rd:{1};4th:{2}".format("Second", "Third", "Fourth", "First"))
# 1st:First;2nd:Second;3rd:Third;4th:Fourth
```
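
An index can also be used more than once, which the sketch below illustrates. The variable names and the sample text are made up for the example.

```python
# The same argument can be referenced several times by its index.
repeated = "{0}-{1}-{0}".format("a", "b")
greeting = "{1}, {0}! Is that you, {1}?".format("morning", "Fernando")

print(repeated)  # a-b-a
print(greeting)  # Fernando, morning! Is that you, Fernando?
```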

There is a whole range of different options that can be used inside braces
to format strings. Due to time restrictions we will only mention a few. For more
information be sure to check [**the official documentation**](https://docs.python.org/3/library/string.html#formatstrings).

These are the most common formatting modifiers:
  • align - Specifies the alignment of the string. A width can be given to define the minimum field width. If width isn't given, the field width will be determined by the content:

    1. < - Left align.
    2. > - Right align.
    3. = - Forces the padding to be placed after the sign (if any) but before the digits. Only valid for numeric types.
    4. ^ - Centered
    print("{:>30}".format("IEEE"))
    #                IEEE
  • sign - Specifies the sign of numerical data.

    1. + - A sign is used for both positive and negative numbers.
    2. - - Only negative numbers have a minus sign (the default).
    3. space - A leading space is used on positive numbers and a minus sign on negative numbers.
  • type - Specifies how numbers are displayed. For integers:

    1. b - Binary.
    2. c - Unicode character.
    3. d - Decimal integer.
    4. o - Octal format.
    5. x - Hex Format. Lower-case for a-f.
    6. X - Hex Format. Upper-case for A-F.
    print("{:b}\n{:x}\n{:X}\n".format(8, 13, 13))
    # 1000
    # d
    # D

    For floats:

    1. e - Exponent notation.
    2. g - General format. Can be given a specific precision. Default precision is 6.
    3. % - Percentage format. Multiplies the number by 100 and displays it in fixed-point format, followed by a percent sign.
    4. .precision - Sets the number precision. Must be inserted before the format type (if specified).
    print("{:.3e}\n{:.2g}\n{:%}".format(3.14, 1.26, 0.666))
    # 3.140e+00
    # 1.3
    # 66.600000%
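
The modifiers above can be combined inside a single pair of braces, in the order align, sign, width, .precision, type. The sketch below shows a few combinations; the brackets around the first three results are only there to make the padding visible.

```python
left = "[{:<8}]".format("IEEE")
center = "[{:^8}]".format("IEEE")
right = "[{:>8}]".format("IEEE")
print(left)     # [IEEE    ]
print(center)   # [  IEEE  ]
print(right)    # [    IEEE]

signed = "{:+d} {:+d}".format(5, -5)
spaced = "{: d}".format(5)
padded = "{:=+6d}".format(42)        # padding goes between sign and digits
print(signed)   # +5 -5
print(padded)   # +   42

binary = "{:>8b}".format(5)          # right aligned, width 8, binary
exponent = "{:.2e}".format(12345.678)
percent = "{:.1%}".format(0.666)
print(binary)   #      101
print(exponent) # 1.23e+04
print(percent)  # 66.6%
```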
