# Primitive Data Types in Python

[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/berniehogan/introducingpython/main?filepath=chapters%2FCh.02.DataTypes.ipynb)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/berniehogan/introducingpython/blob/main/chapters/Ch.02.DataTypes.ipynb)

There are two basic types of data in Python, __primitive__ data types and __object__ data types. Primitives are the basic building blocks of more complex data structures, much like how letters are the building blocks of words and digits the building blocks of numbers. For example, each letter is a primitive data type in Python called a __character__. An ordered list of characters is called a __string__ object. 

There are (as I understand it) five primitive types in Python: 

- `int` for integer or whole numbers.
- `float` for floating point numbers. These are numbers with decimals in them.
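- `char` for characters. Python has no separate `char` type of its own: a single character is stored as a `str` of length one.
- `byte` for a kind of character interpreted by the computer, for example for storing image data. In Python the actual type is called `bytes`.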
- `bool` for Boolean, namely `True` or `False`
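
A quick way to check what type a value has is the built-in `type()` function. The sketch below uses made-up example values; note that Python reports a single character as `str` and byte data as `bytes`.

```python
# Each value below is a literal; type() reports the class Python assigns to it.
examples = [42, 3.14, "a", "Hello", b"\x89PNG", True]

for value in examples:
    print(repr(value), "->", type(value).__name__)

# A single character is simply a string of length one.
single_char = "a"
print(type(single_char) is str, len(single_char))
```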

Usually you do not want to type out some primitive data every time. Instead you would use a label to represent them. This label is called a __variable__. You assign a value to a variable and then you can use that variable to 'represent' the value. See how this works below:

In [None]:
new_string = 'Hello world!'
print(new_string)

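Variable names in Python start with a letter (`a-z`, `A-Z`) or an underscore, and can then include numbers and underscores. There are some reserved words in Python, such as `for` or `if`. If you type them in Jupyter they usually show up in green. If you try to use one of these as a variable name, Python raises a `SyntaxError`. The names of built-in functions such as `print` are not reserved, so Python lets you reassign them, but if you do (like saying `print = 4`), you might have some unexpected consequences. 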

Don't worry, you won't 'break Python', but you might mess up that specific instance of Python. In which case, you can always start again by restarting the kernel from up in the menu. So I encourage you to toy around with the code, get a feel for some errors or ways to tinker. Then if you feel it's messed up, you can always restart the kernel. Of course, as you get further into production or academic level code you will not so easily want to restart the kernel, but by then you will likely have developed other approaches to tinkering with your code. 
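
If you are unsure whether a name is safe to use, the standard library can tell you. This is a minimal sketch: the names in `candidate_names` are made up for illustration, and the checks use `str.isidentifier()`, the `keyword` module and the `builtins` module.

```python
import builtins
import keyword

# Hypothetical names to test; none of them come from real code.
candidate_names = ["new_string", "_private", "2fast", "for", "print"]

for name in candidate_names:
    follows_rules = name.isidentifier()      # letters, digits, underscores; no leading digit
    is_reserved = keyword.iskeyword(name)    # reserved word such as 'for'
    is_builtin = hasattr(builtins, name)     # built-in name such as 'print'
    print(name, "| follows rules:", follows_rules,
          "| reserved:", is_reserved, "| built-in:", is_builtin)
```

A name is safe when it follows the rules and is neither reserved nor a built-in. Note that `isidentifier()` alone says `for` is fine, which is why the `keyword` check is needed as well.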

# Characters 

The first primitive data type is the character. We don't really interact with characters directly but instead through their collection as a 'string' or `str` object. You saw above the string `Hello world!`. 

Characters become a string when encased in quotes. There are three types of quotes: the single tick, the double quote, and the triple tick.

***Quote 1***. The single tick. 

~~~ Python
print('One small tick for strings') 
~~~

***Quote 2***. The double quote. This is not two ticks, but the 'double-quote' character, ". It looks like two vertical ticks, but closer together than two ticks (e.g., " vs ''). Be careful with this character. Some programs like Microsoft Word like to replace the generic " with stylized quote characters that are different at the beginning and end of a quote such as “these”. Python doesn't like stylized quotes and prefers the generic ".  

~~~ python 
print("Python doesn't mind the tick here")
~~~

The advantage of using double-quotes is that you can write phrases like "don't bring me down" without the program being confused when the string stops. If you then wrote `"society", man`, inside of a string, as in: 

~~~ python
print("We all live in a "society", man")
~~~

Then it will assume the string ends when the next double quote appears, which is not what was intended.  

***Quote 3***. The triple tick. This is indeed three ticks in a row. Python will evaluate everything inside of the three ticks literally. So if you want to have a string break across a line you can just type that in between three ticks and it will not throw an error. 

~~~ python 
print('''I have no problems breaking 
across the lines!''')
~~~

In [None]:
print('One small tick for strings') 

print() # this just prints an empty line.

print("Python doesn't mind the tick here")

print()

print('''I have no problems breaking 
across the lines!''')

## When syntax issues arise in special characters 

So what happens if you want to use a quotation mark in your text? If you just insert a quote then Python will think it is the end of the string. It will then `raise` an error. Here is what an error, specifically a `SyntaxError`, looks like: 

In [None]:
print("We all live in a "society", man")

Decoding errors is a bit of an art that will develop over time. Here, we can see indeed, the syntax is invalid. But Python is not great at explaining why. This is where experience comes in. Over time you will get better at deducing errors and cleaning them up. 

Some pointers to help with errors: 

- The bottom part is closest to your code. The error might have been triggered at many different layers of abstraction, and so the output can look long and intimidating. But the last entries give the _line number_ of the code that was running when the error was raised, along with an indication of where on that line the problem was detected;
- Try to `print()` out at many points to get feedback (then remove these print statements from working code);
- Break the problem down: make the smallest possible changes and see if it affects the code; 
- Examine online forums that received a similar error. But be mindful of what you throw into a search engine. Your variable names might be noise or might be research data. Be cautious of online sources, know that the most popular is not always the best (often it is biased by age of comment). Often Stack Overflow comments are popular because of _how_ they explain the content. Students sometimes rush to paste a code snippet without understanding it, but the snippet does not work because it is an example with slightly different details than in the students' code. Be patient and read the explanation rather than immediately copy-paste-hope. 

In the case of the error above, the program used a caret character (`^`) to indicate that there should not be an `s` directly after a closing quote. But that's not actually a closing quote is it? It's those darned inverted commas used by skeptical academics everywhere. So in order to preserve those inverted commas we need to "escape" them. 

We use the backslash character to escape, so we should see a string that looks more like: 

~~~ python 
print("We all live in a \"society\", man")
~~~

Notice that the text colour is also a hint of when things are amiss. Observe the correctly formatted print statements below: 

In [None]:
print("We all live in a \"society\", man, but at least we can escape the quotes.")

So there are a number of 'special characters' that are escaped:

- `'`: To escape the single tick, as in `print('say it ain\'t so?')`.
- `"`: To escape the double quote, as in `print("Okay then, \"It ain't so\"")`.
- `\`: To escape the backslash in case you want to literally print it, as in `print('scanning C:\\temp folder')`. 
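
The three escapes listed above can be checked directly. This short sketch reuses the example strings from the list and shows that the backslash is only an instruction to Python: it is not stored in the string itself.

```python
single = 'say it ain\'t so?'
double = "Okay then, \"It ain't so\""
path = 'scanning C:\\temp folder'

print(single)
print(double)
print(path)

# Each escape sequence produces exactly one character.
print(len('\''), len("\""), len('\\'))
```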

### Not all characters are visible. 

Sometimes we want to add some spaces to our text, maybe a tab or maybe a new line. Up until now we have just used `print()` to add an extra line. We can, however, do that right in the text. These sorts of characters are called 'whitespace' characters. There are two particularly relevant ones:  

- `\n` is the new line character;
- `\t` prints a tab. This is nice when you're printing tables as it counts spaces from the left-hand side and moves in multiples of 4 or 8 (depending on settings). Below I demonstrate the use of tab characters in a Haiku. 

See newline and tab in action below: 

In [None]:
print("\"A \\n\" (A New Line)")
print("By Bernie Hogan\n")
print("Autumn students learn:") 
print("\"A tab that provides \tstructure")
print("\t\t\tmay not provide space\"")

We will continue to explore features of characters when we return to collections, since a string is a collection of characters. 

# `float` and `int` as the two basic types of numbers 

For numbers, there are two basic primitive data types:

- __Integers__, which refer to whole numbers such as 1, 42 or 1812; 
- __Floating point numbers__, which refer to real numbers that can be approximated by digits using a decimal point, such as 0.5, 12.345 and 0.333333333.

In [None]:
# An integer
x = 7

# A floating point number. Still a whole number, but the .0 makes it a float rather than an integer.
y = 4.0

print ( type(x) )
print ( type(y) )

z = x + y 

# See how z inherits the floating point number even though the value could be an integer? 
print (type(z), z)

## Casting numbers 

Above I printed `type(<some_object>)`. The result of `type(x)` was `<class 'int'>` and the result of `type(y)` was `<class 'float'>`. But when we add these two together, we have to have comparable primitive data types, so the integer gets recast as a float. 

Data have a type, like `float` or `int` but we can transform data from one type to another by "casting" it. This does not always work as planned. But if it does not, then we have learned a valuable lesson and should seek a different way to convert the data if possible. 

If you convert an integer `7` to a float, that's easy. It's going to look the same, except it will have a decimal point and be ready to receive floating point precision, such as `7.0`. If you try to convert the string `"7.0"` to a float it should also work. But if you try to convert `"seven"` to a float the program will throw an error. 

Below are some casting operations that work. Generally, it should be easy to find a way to turn a number into a string. It just ends up being that number, but as characters. Turning a string into a number, on the other hand, is really challenging given all the possible ways a number can be written in different languages, with different spellings, etc. It would require some real understanding and is beyond the scope of simple Python operations. Below I will cast some ~~spells~~ data in different classes. 

In [None]:
# Start and end with Int: 
var_int = 7
print(var_int, type(var_int))

var_float = float(var_int)
print(var_float, type(var_float))

var_str = str(var_float)
print(var_str, type(var_str))

So far, so good. We went from an int `7`&rarr;float `7.0`&rarr;str `"7.0"`. But now we have a problem if we want to go back to int. It will throw a `ValueError`. If you run it, then it will state `ValueError: invalid literal for int() with base 10: '7.0'`. The problem here is that you might think that `.0` means nothing. But it's actually something out of nothing. Python is not fussy that the value after the decimal point is zero or some long string of digits. The matter is that it is something after the decimal and that particular conversion does not like it. 

In [None]:
var_int = int(var_str)
print(var_int, type(var_int))
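
One way around this `ValueError`, sketched here as a suggestion rather than the only approach, is to convert the string to a float first and then to an int. The `try`/`except` block is there only to show the failure without stopping the program.

```python
var_str = "7.0"

# int() rejects text containing a decimal point, so fall back to float first.
try:
    as_int = int(var_str)
except ValueError as err:
    print("int() failed:", err)
    as_int = int(float(var_str))

print(as_int, type(as_int))

# Be aware that int() cuts off the decimal part rather than rounding.
print(int(float("7.9")), int(float("-7.9")))
```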

It is often said that a language which requires you to specify the type of a variable ahead of time is a "statically typed" language. For example, in Java you cannot write `x=5` unless you have declared `x` as an integer, as in `int x = 5`. 

Python is a "dynamically typed" language, meaning that a variable is not tied to one data type: Python does not check the type before assigning an object to that variable. So in one line you could code `x=0`, making `x` an integer number. On the next line, you can code `x="fabulous"` and Python does not have a problem with that.
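
A short sketch of this flexibility, reusing the `x` from the paragraph above. The last part is an addition of mine: it shows that although a variable can change type freely, Python still checks types when an operation actually runs.

```python
x = 0
print(x, type(x))

# The same name can later point to a value of a different type.
x = "fabulous"
print(x, type(x))

# Mixing types inside an operation is still refused.
try:
    result = "7" + 9
except TypeError as err:
    result = None
    print("TypeError:", err)
```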

## Basic number operations.

You will remember some basic number operations from arithmetic, such as addition and subtraction. Python implements these and a few others worth remembering. Let's have a look at several of these. You should pay attention to whether the result includes digits after the decimal point or not. We can do operations on both integers and floating points. When in doubt Python uses the type of number that gives more precision. So `1 + 2.5` will not round up or down, it will return `3.5`. Here is a list of common number operations:

- Addition: `X + Y`
- Subtraction: `X - Y`
- Multiplication: `X * Y`
- Exponent (i.e., raising X to the power of Y): `X ** Y` 
- Floating point division: `X / Y` 
- Integer division: `X // Y`
- Modulo (i.e., the remainder from integer division): `X % Y`

In [None]:
x = 9
y = 4
print("x = ",x)
print("y = ", y)
print("x + y = ", x + y)
print("x - y = ", x - y)
print("x * y = ", x * y)
print("x ** y = ", x ** y)
print("x / y = ", x / y)
print("x // y = ", x // y)
print("x % y = ", x % y)
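
The last three operators are closely related. As a small sketch using the same `x` and `y`, integer division and modulo together rebuild the original number, and the built-in `divmod()` returns both at once.

```python
x = 9
y = 4

quotient = x // y    # how many whole times y fits into x
remainder = x % y    # what is left over

# Integer division and modulo together reconstruct the original number.
print(quotient * y + remainder == x)

# divmod() returns the quotient and the remainder as a pair.
print(divmod(x, y))

# The / operator always returns a float, even when the result is whole.
print(8 / 4, type(8 / 4))
```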

## Floating point numbers mean floating point precision 

Floating point numbers are finite. Some numbers are not. For example, if you divide 1 by 3 the answer will be approximated by `0.33333` but really we would say that it goes on infinitely. So there is a limited level of precision using these primitive data types. Astrophysicists and others who need obscenely large levels of precision might find that floating point precision is not enough. For them there are specialist methods. You might never encounter the limits of floating point precision. Or rather, not until now. 

Observe this quirk below about what happens when you add some fractions together in Python. 

In [None]:
# Example 1 
x1 = 1/3
y1 = 1/3
z1 = 1/3

print("x1 is ", x1)
print("y1 is ", y1)
print("z1 is ", z1)
print() 

# Shouldn't this add up to 0.99999999999?
print("Then why is x + y + z = ", x1 + y1 + z1)
print("and not 0.9999999999999999?")

In [None]:
# Example 2 
x2  = 0.333333333333333  # <- Notice one digit short
x2a = 0.3333333333333333
y2  = 0.6666666666666666

print("If x2 + y2 is 0.9999999999999996:\n", x2 + y2)

print("Will x2a + y2 be 0.9999999999999999:\n", x2a + y2)

In [None]:
# Example 3 
x3 = 16/9 
y3 = 7/9
print("Won't 16/9 minus 7/9 equal 9/9, which is the same as 1?\n", x3 - y3)

This is why we say floating point numbers are approximations of real numbers. $\pi$ is a real number, but its digits go on forever without repeating. The computer then cannot store the full value of $\pi$ (as it does not have infinite memory), so it stores an approximation. 

When we calculate things in Python, we are accepting a certain loss of precision. It does not do fractional math. In `Example 3`, the variable `x3` was first calculated as `1.7777777777777777` and stored as such. 

We rarely encounter that level of precision, but I think it's nice to get a sense of our limits. 
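
If these small errors matter for your work, there are two common ways to cope, sketched here using the numbers from `Example 3`. Neither appears elsewhere in this chapter: `math.isclose()` compares floats within a tolerance, and the standard library's `fractions.Fraction` does exact fractional arithmetic.

```python
import math
from fractions import Fraction

x3 = 16 / 9
y3 = 7 / 9
difference = x3 - y3

# Exact equality fails because of rounding, so compare with a tolerance instead.
print(difference == 1)
print(math.isclose(difference, 1))

# Fraction stores the numerator and denominator, so the arithmetic stays exact.
exact = Fraction(16, 9) - Fraction(7, 9)
print(exact, exact == 1)
```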

# From characters to strings

Strings, as we have now seen, are collections of characters. So we can now tell that the following is a string, built from 16 characters (spaces included):

`"This is a string"`

There is a huge amount of work in programming that is really just dealing with strings in some form or another. Strings become the basis of more complex file types, like `csv` for tables, or `xml` for data. Getting good at programming for computational social science will often mean being confident in transforming strings from one form to another and being able to select part of a string as having some sort of pattern or structure. 

Whether it is text collected via communications, surveys, comments, or reviews, it will be a string and it will probably need formatting. We might want to detect the presence of words or remove certain characters because they do not print well. 

## Some example string methods
To use these you would have a string variable. I let `<var>` stand in for that here. 

- **`<var>.upper()`**: This returns a version of the string in all upper case. 
- **`<var>.lower()`**: This returns a version of the string in all lower case.
- **`<var>.find(<substring>)`**: This returns the numerical index of the first complete mention of the substring. This returns `-1` if the substring is not found. 
- **`<var>.isalpha()`**: This returns `True` if the string is all alphabetical characters and `False` otherwise.  
- **`<var>.replace(<old_string>,<new_string>)`**: This takes two arguments, `<old_string>` which is what you want to find and `<new_string>` which is what you want to replace it with.   
- **`<var>.strip()`**: This is a useful command to remove whitespace from the beginning and the end of a string. To remove only from the beginning use `<var>.lstrip()`. To remove only from the right, use `<var>.rstrip()`.
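
As a small sketch of `strip()`, `find()` and `isalpha()` from the list above, here is a made-up survey comment (`raw_comment` is purely illustrative) being cleaned and searched.

```python
raw_comment = "   Great product, would buy again!  \n"

# strip() removes whitespace (spaces, tabs, newlines) from both ends.
clean_comment = raw_comment.strip()
print(repr(clean_comment))

# find() gives the index of the first match, counting from zero.
print(clean_comment.find("product"))
print(clean_comment.find("refund"))   # -1, since 'refund' is absent

# isalpha() is False here because spaces and punctuation are not letters.
print(clean_comment.isalpha())
```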

If you want to find out details about a method, you can check the help. There are several ways to do that in Jupyter. The first is to run `help(<var>.<method>)`. But don't include the `()` at the end of the method or it will first run the method and then query what was returned for help. 

In [None]:
help("str".find)

The second way in Jupyter is to create a new tab (via the Launcher) and instead of selecting a notebook or Terminal, notice that in the lower right corner is a type of file called "Show contextual help". This will give realtime help on what command you are using. It's a second tab so drag it around JupyterLab until you find a spot where it is visible but not obtrusive. Finally, if you are in a code cell and you place your cursor inside a method and hit `Shift`+`Tab`, it should bring up the help for that method as a tooltip.  

## How to use a string method
Methods and functions are the 'verbs' of Python. A method is basically a function except you invoke a method on an object. For now, it's okay to use the terms interchangeably, but the difference will be clear (and important) when we start building our own functions later. 

So if we have a string object, we can attach methods directly to the string:

~~~ python
"This is an object"
~~~

And we can attach the ```upper()``` method like so: 

~~~ python
"This is an object".upper()
~~~

It will then print:
~~~ python 
> THIS IS AN OBJECT
~~~ 

But more commonly we first assign a string to a variable and then use our method on the variable.

~~~ python
new_string = "This is a new string"
print ( new_string.upper() )
> THIS IS A NEW STRING
~~~

Try it below:

In [None]:
example_string = "The quick brown fox jumps over the lazy dog"

print("The original string:",example_string)

print("\nTo upper case:",example_string.upper())

print("\nTo lower case:",example_string.lower())

print("\nIs the string just alphabetic characters?",
      example_string.isalpha())

print("\nReplacing 'o' with 'you':", 
      example_string.replace("o","you"))

## Strings are really a special kind of a `list`. 

We will learn more about lists in the next chapter. But strings behave a lot like a special kind of list, one with only characters in the same encoding. This means we can do many of the things with strings that we can with a list, like sort the characters, or ask for elements $3$ through $10$. One difference is that a string cannot be changed in place, whereas a list can. Above we acted on the string as a collection of characters, but we can convert it to a list in two ways. Observe the difference between them:

In [None]:
new_str = "An example string to use"

print("1. The string itself:\n",new_str)
print("2. The string split by spaces:\n",new_str.split())
print("3. The string recast as a list:\n",list(new_str))

The method `new_str.split()` splits on whitespace by default (any run of spaces, tabs, or newlines), so here the program split at each space. You can also split by a word or combination of characters. Also, while the program splits at every instance of that pattern by default, you can ask it to split only once or $n$ times. See below:   

In [None]:
new_str2 = "Singing, dancing, grooving"

print("1. The string itself:\n", new_str2)
print("2. Splitting at every 'ing' value:\n", new_str2.split("ing"))
print("3. Splitting only at the first instance:\n", new_str2.split("ing",1))

The splitting on `ing` produced 5 elements: `'S'`, `''`, `', danc'`, `', groov'`, and `''`. Two of these are empty. The first empty string appears because we used `Singing`, and so the program split in between the two `ing`s, even though there was nothing between them. Similarly, at the end it split between the final `ing` (from `grooving`) and the end of the string, so it produced a second empty string. In the last example we only split at the first `ing`, so that we have two elements: `'S'` and the text from the other side of the first `ing`, `'ing, dancing, grooving'`. 
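
A note on the default separator: calling `split()` with no argument is not quite the same as calling `split(" ")`. The sketch below uses a made-up string (`spaced`) with doubled spaces, a tab and a newline to show the difference.

```python
spaced = "An  example\tstring \n to use"

# With no argument, any run of whitespace counts as one separator.
print(spaced.split())

# With an explicit " ", every single space is a separator,
# so consecutive spaces produce empty strings and tabs are left alone.
print(spaced.split(" "))
```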

`replace`, like `split`, has the same feature of being able to act on every instance or only the first few. 

In [None]:
new_str2 = "Singing, dancing, grooving"

print("1. The string itself:\n", new_str2)
print("2. Replacing all instances with ***:\n", new_str2.replace("ing","***"))
print("3. Replacing only the first instance with ***:\n", new_str2.replace("ing","***",1))


The second way we turned the string into a list was to take every character and make it its own element in a list. So that's why it displayed as `['A', 'n', ' ', 'e', ...`. In a sense, this is similar to a string in that it has just as many elements. But when you print the list versus the string, you can see that a list is geared towards thinking of the characters as elements in a collection, whereas the string is geared towards thinking of the characters as constituting a single "string" object. 
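
To make the comparison with lists concrete, here is a brief sketch of what a string shares with a list (indexing, slicing, sorting) and where it differs (it cannot be changed in place). It reuses `new_str` from the earlier cell.

```python
new_str = "An example string to use"

# Indexing and slicing work on strings just as they do on lists.
print(new_str[0])      # first character
print(new_str[3:10])   # positions 3 up to, but not including, 10

# sorted() accepts a string but returns a list of its characters.
print(sorted("string"))

# Unlike a list, a string cannot be changed in place.
try:
    new_str[0] = "a"
except TypeError as err:
    print("TypeError:", err)

# "".join() turns a list of characters back into a string.
print("".join(list(new_str)) == new_str)
```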

# Combining strings 

String formatting is complicated enough that it warrants its own section in this chapter. This will only be a cursory look at string formatting, but it will demonstrate the power of this approach. Usually we want to format strings because we have some variable that we want to include. So for example, if we want to print a greeting based on the day, we would make a generic greeting and then have a place to insert the day. 

Below we will insert some numbers into a string. You will notice that there are ways of formatting these numbers. They get a bit tricky and so here I'm just giving the basic syntax. 

## String concatenation

There are many ways to format a string but they tend to refer to either string concatenation or string insertion. To concatenate is to bring together. So if you have `"blue"` and `"berry"` you can concatenate them with `"blue" + "berry"`. This means that the plus symbol is __overloaded__. It means different things in different contexts. Watch us use the plus symbol to concatenate a string as well as add some numbers below: 

In [None]:
print("blue" + "berry")
print(7 + 9)
print("7" + "9")

So if you have a variable, `name` and you want to insert it into a greeting, you can concatenate with `print("Hello " + name)`. 

## Combining strings with f-insertions

One of the challenges with concatenation is how to deal with variables of different kinds. Like imagine you calculate a test score and then want to print the score and a string. Since the `+` symbol is overloaded it will not like trying to determine if it should add some numbers or concatenate some strings. See the error below: 

In [None]:
score = 17

print("Your score was " + score + "out of 30, or " + score/30 + "percent")
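
The cell above raises a `TypeError` because `+` cannot join a string and a number. A minimal sketch of the manual fix is to wrap each number in `str()` before concatenating:

```python
score = 17

# str() converts each number to text so that + can concatenate it.
message = "Your score was " + str(score) + " out of 30, or " + str(score / 30)
print(message)
```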

So we can format the number as a string ahead of time, but that would be unnecessary. Instead we can format the string itself. There's the classic way of doing this using the `format()` method. Then there's the new way (which I adore) using f-insertions. Let's see them both since you will encounter both in the wild. 

In both cases they use `{}` inside of string quotes to create a marker for where the variable should go in the string. So for the example above it would be like: `"Your score was {} out of 30, or {}."`. See below (you will see why I left out "percent" in a bit):

In [None]:
score = 17

print("Your score was {} out of 30 or {}.".format(score, score/30))

print(f"Your score was {score} out of 30 or {score/30}.")

Did you see how with the second approach, we inserted the variable right inside where it was supposed to be? It also made the colour of the variable stand out. In the second approach we had a different kind of string, called an f-insertion (you will more often see it called an f-string). This is like a regular string except we put `f` before the quote. It then knows that `{}` inside of a string means a variable is coming. 

### Tips for f-insertions

__Tip 1__. What if you want to print `{` literally inside of an f-insertion? Unlike other escape codes, you do not use `\` here. Instead you double the brace, writing `{{` for a literal `{` and `}}` for a literal `}`.

__Tip 2__. If you are using a dictionary inside your f-insertion, like `f"I like eating {food['sweet']}."` the dictionary should use different quotes to the statement. Here I used a single tick inside double quotes. 
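
A sketch of both tips in action. Literal braces are written by doubling them, and the dictionary key uses single quotes inside a double-quoted string. The `food` dictionary is hypothetical and only there for illustration.

```python
# 'food' is a made-up dictionary used only for this example.
food = {"sweet": "cake", "savoury": "crisps"}

# Tip 1: double the braces to print them literally.
literal = f"Use {{}} as a placeholder, e.g. {{food}}"
print(literal)

# Tip 2: single quotes for the key inside a double-quoted f-insertion.
sentence = f"I like eating {food['sweet']}."
print(sentence)
```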

## Formatting strings nicely

You might have seen how the result of 17/30 was very long and a number between 0 and 1 rather than a percentage? We can fix these things. After the variable name inside the `{}` we can put some formatting codes. I rarely remember them all. Instead, I tend to look to [pyformat.info](https://pyformat.info/) which has clear examples of these. But here are two important ones: 

In [None]:
print(f"Out of 30 with two decimal places: {score/30:0.2f}")

In [None]:
print(f"Out of 30 as percent with one decimal place: {score/30:0.1%}")

In the first instance we used the code `0.2f` to mean `0` padding at the front, two digits after the decimal point, and fixed-point notation (the `f`, which suits a `float`). The second instance `0.1%` changed it to one digit after the decimal point, but understood the number as a percent so it was `56.7%` rather than `0.57`. When printing the results of analysis, Python will report as much precision as it has, but that might make the number look noisy or hard to grasp. Reporting a meaningful number of digits thus makes it easier for people to see the number for what it means. 
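
A few variations on these codes, as a sketch. The width of `8` in the last two lines is an arbitrary choice for illustration; it shows what the padding part of the code does once a width is given.

```python
score = 17
ratio = score / 30

print(f"{ratio:.2f}")     # two digits after the decimal point
print(f"{ratio:.1%}")     # multiplied by 100, one digit after the point, with a % sign
print(f"{ratio:8.2f}|")   # padded with spaces to a total width of 8
print(f"{ratio:08.2f}|")  # padded with zeros to a total width of 8
```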

# Conclusion

In this chapter we went from the most primitive data types (`int`, `float`, `char`) towards more meaningful data types. I showed how to convert data types, how to consider errors, and how to do some string manipulation. These get more interesting when we have many strings, numbers, or calculations. Then we can put these in a collection and start to ask questions of the collection itself. That is where we are headed next. 