# Video: How Python Represents Numbers

This video shows off the capabilities and limitations of Python's built-in number types.


Script:
* If you play around with Python integers, you'll find that you can make them really big.
* Ask for ten to the hundredth power, a googol, and Python will happily calculate it.

In [None]:
10 ** 100

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Script:
* 10 to the thousandth power works too.

In [None]:
10 ** 1000

1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Script:
* If you try 10 ** 10000, you will get an error.


In [None]:
10 ** 10000

ValueError: Exceeds the limit (4300) for integer string conversion; use sys.set_int_max_str_digits() to increase the limit
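
A minimal sketch for inspecting a number like this without displaying it. It assumes a CPython version that has the digit limit; the `hasattr` guard is there because older releases lack `sys.set_int_max_str_digits`. The variable names are illustrative only.

```python
import sys

big = 10 ** 10000               # the exponentiation itself succeeds

# Properties that do not need a decimal string still work.
bits = big.bit_length()         # number of binary digits
is_bigger = big > 10 ** 9999    # comparisons work too

# Assumption: the get/set functions exist only on newer CPython releases,
# so guard the calls and restore the old limit afterwards.
if hasattr(sys, "set_int_max_str_digits"):
    old_limit = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(20000)
    try:
        digit_count = len(str(big))
    finally:
        sys.set_int_max_str_digits(old_limit)
else:
    digit_count = len(str(big))

print(bits, is_bigger, digit_count)
```

Restoring the old limit keeps the rest of the notebook behaving as shown.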

Script:
* But if you look closely, the error is not about calculating the number.
* PAUSE AND HIGHLIGHT THE EXCEPTION TEXT
* The error asks if you really want to print a number more than 4300 digits long.
* It will also give you a pointer to change the setting if you really want to look at ten thousand zeros.
* So you can make an integer as big as you want in Python, but Python might protest printing it out.
* Well, technically, that depends on whether you have enough memory to store that integer.
* And in this course, and for large models, that can come up.
* So let's look at that now.
* My laptop at home has 16 gigabytes of RAM.
* One byte is roughly enough memory to store one character in this notebook.
* A gigabyte is about a billion bytes.
* I will calculate the real number of bytes of RAM in my laptop now.

In [None]:
16 * 2**30

17179869184

Script:
* Let's add some commas to make that more legible.

In [None]:
"{:,d}".format(16 * 2**30)

'17,179,869,184'

Script:
* We will come back to that arcane syntax later.
* For now, let's say that my laptop has about 17 billion bytes of memory.
* Some of that is used by the operating system, and if we need a little more, the system will temporarily store some data on disk, but that's a rough idea of how much data I can work with comfortably on my laptop.
* In a cloud environment, like we are using for this class, it'll probably be closer to 2 billion.
* So how much does one of these integers take in Python?
* The built-in sys module of Python has a function to check this.

In [None]:
import sys

In [None]:
sys.getsizeof(0)

24

Script:
* 24 bytes in this run; the exact figure varies a little between Python versions.
* Is that a lot for saying "nothing"?

In [None]:
sys.getsizeof(1)

28

Script:
* Storing one takes 28 bytes, hardly more than zero did.
* Let's try some bigger numbers.

In [None]:
sys.getsizeof(2**10)

28

Script:
* Still 28 bytes.

In [None]:
sys.getsizeof(2**29)

28

In [None]:
sys.getsizeof(2**30)

32

Script:
* It got bigger.

In [None]:
sys.getsizeof(2**30-1)

28

Script:
* So the size increases at two to the thirtieth power, if this is consistent.

In [None]:
sys.getsizeof(2**60-1)

32

In [None]:
sys.getsizeof(2**60)

36

Script:
* And again at two to the sixtieth power.

In [None]:
sys.getsizeof(2**300)

68
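
The measurements so far can be collected in one loop. This is a sketch that assumes a typical 64-bit CPython build, where the sizes step up every 30 bits as seen above; `size_steps` is a made-up helper name for illustration.

```python
import sys

def size_steps(max_chunks=10):
    """Byte growth of 2**(30*n) over 2**(30*(n-1)) for n = 2..max_chunks."""
    sizes = [sys.getsizeof(2 ** (30 * n)) for n in range(1, max_chunks + 1)]
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]

# Reprint the sizes around the boundaries probed one at a time above.
for exponent in (29, 30, 59, 60, 300):
    print(exponent, sys.getsizeof(2 ** exponent))

# Each extra factor of 2**30 should add the same number of bytes.
print(size_steps())
```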

Script:
* So at least for powers of two, it looks like the size is 28 bytes plus another 4 bytes for every additional factor of two to the thirtieth.
* So how many integers around a million can we store?
* Let's use 2 gigabytes to match a light cloud setup.

In [None]:
(2 * 2**30) // sys.getsizeof(1_000_000)

76695844

Script:
* Not quite 77 million integers.
* That's not enough for generative AI, but plenty for a lot of more modestly sized tasks.
* But remember later that it needs to cover both your input data and your model.
* What about if we are using real numbers?
* Most of our data is real numbers, not integers.

In [None]:
sys.getsizeof(1.1)

24

Script:
* 24 bytes.
* That is no bigger than the smallest integer size.

In [None]:
sys.getsizeof(2.0 ** 30)

24

Script:
* Still the same size, at the point that we saw integers get bigger.

In [None]:
sys.getsizeof(2.0 ** 300)

24

Script:
* Still the same size.

In [None]:
sys.getsizeof(2.0 ** 3000)

OverflowError: (34, 'Numerical result out of range')

Script:
* But there is a limit on how big we can get using this floating point type.

In [None]:
sys.getsizeof(2.0 ** 1023)

24

Script:
* But up to that size limit, the amount of space does not go up.
* How does it get so big? What are the tradeoffs?
* BEGIN VIDEO? LIGHTBOARD?
* The floating point representation is like the scientific notation we learn in middle school.
* In that notation, we write a number like 1.3 x 10^5.
* "1.3" is called the mantissa, "10" is the base, and "5" is the exponent.
* Python will understand a similar notation, using e in place of "times 10 to the power of".


In [None]:
1.3e10

13000000000.0

Script:
* BEGIN VIDEO? LIGHTBOARD OR SHOW SLIDE SHOWING BINARY NUMBERS.
* We humans usually use ten for the base in science, but computers usually use two for the base internally since they are based on binary logic.
* So their internal floating point representation looks like 1.0110 times 2 to the 10th power.
* That "1.0110" is now supposed to be binary digits, just zero or one, not zero through nine.
* To store one of these floating point numbers, the computer needs to save the mantissa and the exponent, but not the base since that is fixed at two.
* The most commonly used size floating point number takes 64 bits, or 8 bytes.
* That gives 1 bit for the sign, 11 bits for the exponent, and 52 bits after the binary point.
* That is what Python is using now.
* But didn't the size check say 24 bytes before?
* END VIDEO

In [None]:
sys.getsizeof(1.0)

24
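
The 8-byte layout described in the video segment can be inspected with the standard struct module. A minimal sketch; `float_fields` is a hypothetical helper written for this illustration, not part of Python.

```python
import struct

def float_fields(x):
    """Split a float into its (sign, stored exponent, fraction) bit fields."""
    # Pack as a big-endian 64-bit float, then reread the bytes as an integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, stored with an offset of 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits after the binary point
    return sign, exponent, fraction

print(len(struct.pack(">d", 1.0)))      # raw payload size in bytes
print(float_fields(1.0))
print(float_fields(-2.0))

# The example from the narration: binary 1.0110 times 2 to the 10th is 1408.
sign, exponent, fraction = float_fields(1408.0)
print(sign, exponent - 1023, format(fraction >> 48, "04b"))
```

The last line prints the true exponent and the first four fraction bits, which should match the "1.0110 times 2 to the 10th" example.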

Script:
* The difference is overhead in how Python keeps track of each object.
* For example, that this is a floating point number and not some other type.
* We will look into ways to reduce that overhead later - who wants to pay triple cost?
* For now, let's look at what we can do with these floating point numbers.

In [None]:
2.0 ** 1024

OverflowError: (34, 'Numerical result out of range')

In [None]:
2.0 ** 1023

8.98846567431158e+307

In [None]:
-2.0 ** 1023

-8.98846567431158e+307

Script:
* We can express numbers up to just under two to the 1024th power, positive or negative.
* We can also express some very small numbers.

In [None]:
2.0 ** -1074

5e-324

Script:
* But if they get too small, they get rounded down to zero.

In [None]:
2.0 ** -1075

0.0
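
The limits probed above are also available from `sys.float_info`. A short sketch, assuming the usual 64-bit floating point type; the variable names are illustrative.

```python
import sys

largest = sys.float_info.max     # largest finite float
smallest = 2.0 ** -1074          # smallest positive float, as seen above

print(largest)
# The largest float sits just under 2**1024.
print(largest == (2 - 2.0 ** -52) * 2.0 ** 1023)
# Halving the smallest positive float rounds to zero.
print(smallest / 2)
# Multiplication past the top gives inf instead of raising an error.
print(largest * 2)
```

Note that `**` raised OverflowError earlier, while plain multiplication quietly returns inf, so overflow is not always signaled the same way.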

Script:
* So that is something to watch out for with very small numbers.
* A more common issue though, is that since the computer uses a binary representation internally, it doesn't perfectly match numbers that we write in decimal.
* Here is an example.

In [None]:
1.1

1.1

In [None]:
2.2

2.2

In [None]:
1.1 + 2.2

3.3000000000000003

Script:
* That is not quite right.
* But we can just say 3.3 and get the right answer.

In [None]:
3.3

3.3

Script:
* And we can confirm another way that they are different and it is not some weird printing issue.

In [None]:
(1.1 + 2.2) - 3.3

4.440892098500626e-16

Script:
* Whenever you work with real numbers, tiny errors like this creep in and can accumulate.
* Most of the time, they are immaterial, but their presence means that everything is approximate.
* We can't say things are exactly equal even though we can prove that they should be.

In [None]:
1.1 + 2.2 == 3.3

False
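
To see where the mismatch comes from, the standard decimal and fractions modules can show the exact values that are stored for these literals. This is a sketch for illustration; the variable names are not from the original notebook.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts exactly, so this prints the value really stored.
for literal in ("1.1", "2.2", "3.3"):
    stored = Decimal(float(literal))
    print(literal, "is stored as", stored)

# The stored 1.1 is slightly above 1.1, the stored 3.3 slightly below 3.3.
above_1_1 = Decimal(1.1) > Decimal("1.1")
below_3_3 = Decimal(3.3) < Decimal("3.3")

# Even with exact arithmetic, the stored values do not add up to the stored 3.3.
exact_sum_matches = Fraction(1.1) + Fraction(2.2) == Fraction(3.3)
print(above_1_1, below_3_3, exact_sum_matches)
```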

Script:
* Normally we would think those should be equal.
* Double equals there is checking that those are the same numbers, by the way.
* Another way these approximations manifest is when you add big numbers to small numbers.

In [None]:
1e15 + 1

1000000000000001.0

In [None]:
1e16 + 1

1e+16

Script:
* In the last addition, the extra one was too small compared to the total, and it was lost.
* The same can happen when adding very small numbers.

In [None]:
1 + 1e-15

1.000000000000001

In [None]:
1 + 1e-16

1.0
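
One way to quantify this is the gap between a float and the next larger one. A minimal sketch, assuming Python 3.9 or later, where `math.ulp` is available.

```python
import math
import sys

# math.ulp(x) is the spacing between x and the next larger float.
for x in (1.0, 1e15, 1e16):
    print(x, math.ulp(x))

# Adding less than half the gap leaves the number unchanged.
# Exactly half the gap is a tie, which can also round back.
print(1e15 + 1 == 1e15)      # the gap at 1e15 is below 1, so the 1 survives
print(1e16 + 1 == 1e16)      # the gap at 1e16 is 2, so the 1 is lost
print(math.ulp(1.0) == sys.float_info.epsilon)
```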

Script:
* Again, because ten to the negative sixteenth power is so small compared to one, it was lost by the approximation.
* This floating point representation just does not have enough resolution to add that small a number.
* So you should be aware of these tiny differences, but don't worry about them most of the time.
* They're like little rounding steps after each math operation, so the floating point representation does not get huge like the integers did.
* If you do need to compare numbers and be robust to these errors, the math module provides a function called isclose to compare numbers while allowing small differences.


In [None]:
import math

Script:
* Here is the built-in documentation for math dot isclose.

In [None]:
print(math.isclose.__doc__)

Determine whether two floating point numbers are close in value.

  rel_tol
    maximum difference for being considered "close", relative to the
    magnitude of the input values
  abs_tol
    maximum difference for being considered "close", regardless of the
    magnitude of the input values

Return True if a is close in value to b, and False otherwise.

For the values to be considered close, the difference between them
must be smaller than at least one of the tolerances.

-inf, inf and NaN behave similarly to the IEEE 754 Standard.  That
is, NaN is not close to anything, even itself.  inf and -inf are
only close to themselves.
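
A small sketch of how the two tolerances in the docstring behave. The specific values are chosen only for illustration.

```python
import math

# Default relative tolerance (1e-09) absorbs ordinary rounding error.
close_default = math.isclose(1.1 + 2.2, 3.3)

# A relative tolerance scales with the inputs, so nothing nonzero is
# "close" to zero unless an absolute tolerance is given.
near_zero_default = math.isclose(1e-12, 0.0)
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

# Loosening rel_tol accepts larger relative differences.
loose = math.isclose(100.0, 100.1, rel_tol=1e-2)
strict = math.isclose(100.0, 100.1)

print(close_default, near_zero_default, near_zero_abs, loose, strict)
```

Comparisons against zero are the usual case where `abs_tol` is needed.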


Script:
* The web documentation has more detail about the tolerances.
* There is a link in the references section of Blackboard.
* Let's use it for the previous example now.

In [None]:
1.1 + 2.2 == 3.3

False

Script:
* This check still fails.

In [None]:
math.isclose(1.1 + 2.2, 3.3)

True

Script:
* But this one using math dot isclose works.
* Wrapping up, this floating point representation was designed to be a fast practical representation, not a perfectly precise but slow one.
* For the most part, it has served us well for about 45 years.