# Data types in mathematics and code: discrete versus continuous data

In this notebook, we will learn about the difference between discrete and continuous data types in mathematics and code, using `altair` to visualize that difference.

We will also learn about the difference between discrete and continuous probability distributions.

<!--
We will also learn about the difference between discrete and continuous random variables.

We will also learn about the difference between discrete and continuous probability mass functions. -->



## Discrete data types

Discrete data types can only take on separate, countable values, with gaps between neighboring values. The set of possible values may be finite or countably infinite (like the whole numbers).

For example, the number of students in a class is a discrete data type because it can only be a whole number.

The number of students in a class can be 0, 1, 2, 3, and so on, but it cannot be 0.5, 1.5, or 2.5.

(Ed. note: actually, the number of students in a class has to be represented in fractional ways sometimes -- for example, for country-wide averages. Again, who gets to decide? "The birth rate for U.S. in 2022 was 12.012 births per 1000 people, a 0.09% increase from 2021.")
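
The editor's note above can be illustrated with a small sketch: each class size is a whole number, yet their average need not be. The class sizes below are made-up values for illustration, not real data.

```python
from statistics import mean

# Hypothetical class sizes (made-up numbers): each one is a whole number.
class_sizes = [28, 31, 26]

# Every individual count is discrete ...
all_whole = all(isinstance(n, int) for n in class_sizes)

# ... but a summary statistic such as the mean can be fractional.
average_size = mean(class_sizes)

print(all_whole)
print(average_size)
```

Whether the quantity of interest is the count itself or a summary of many counts is part of what decides which data type fits.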

Another example of a discrete data type is the number of cars in a parking lot.

Example visualization using Altair and Python:
  
  ```python
  import altair as alt
  import pandas as pd

  df = pd.DataFrame({
      'number of students in a class': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
      'number of cars in a parking lot': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  })

  alt.Chart(df).mark_point().encode(
      x='number of students in a class',
      y='number of cars in a parking lot'
  )
  ```


In [1]:
import altair as alt
import pandas as pd

df = pd.DataFrame({
    'number of students in a class': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'number of cars in a parking lot': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})

alt.Chart(df).mark_point().encode(
    x='number of students in a class',
    y='number of cars in a parking lot'
)

## Continuous data types

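Continuous data types can take on any value within a range: between any two values there is always another one, so the possible values are infinitely many and cannot be listed one by one.

For example, the height of a person is a continuous data type because, between any two heights, there is always another possible height.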
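
One way to see why there are infinitely many possible values is that between any two heights there is always another one. The sketch below keeps halving the gap, using Python's `fractions.Fraction` for exact arithmetic. The starting heights of 170 and 171 are arbitrary illustrative values, and `midpoint` is a helper name introduced here.

```python
from fractions import Fraction

def midpoint(a, b):
    """Return the value exactly halfway between a and b."""
    return (a + b) / 2

# Arbitrary illustrative heights (for instance, in centimeters).
low, high = Fraction(170), Fraction(171)

# Each step finds a new height strictly between low and the previous high.
for _ in range(5):
    high = midpoint(low, high)
    print(float(high))
```

The loop could run forever without the two values ever meeting, which is not possible for counts such as the number of students.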

Example visualization of continuous data types using Altair and Python:

  ```python
  import altair as alt
  import pandas as pd

  # A finite sample of heights from 0.0 to 2.0 in steps of 0.1
  df = pd.DataFrame({
      'height of a person': [round(0.1 * i, 1) for i in range(21)]
  })

  alt.Chart(df).mark_point().encode(
      x='height of a person'
  )
  ```

In [3]:
import altair as alt
import pandas as pd

df = pd.DataFrame({
    'height of a person': [round(0.1 * i, 1) for i in range(21)]
})

alt.Chart(df).mark_point().encode(
    x='height of a person'
)

## Aside: how do you store data?

How do you store data in a computer?

You can store data in a computer using a data structure called an array.

An array is a data structure that stores a finite number of values.
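
As a minimal sketch of this idea, Python's standard-library `array` module stores a fixed number of values, each in a fixed number of bytes. The heights below are made-up sample values, and the type code `'d'` asks for double-precision floats.

```python
from array import array

# Hypothetical sample of heights; 'd' means each item is a double-precision float.
heights = array('d', [1.62, 1.75, 1.80])

print(len(heights))             # number of stored values
print(heights.itemsize)         # bytes used per value
print(len(heights.tobytes()))   # total bytes used by the stored values
```

Both the number of values and the space given to each value are finite, no matter what kind of data the array holds.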

### Why only a finite number of values? Don't continuous data types have infinitely many?

A continuous data type has infinitely many possible values, but a computer has finite memory, so it can only store finitely many of them, each with a finite number of digits.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
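
The limit on stored values can also be seen directly in code. This is a small sketch, assuming Python's `float` is a 64-bit IEEE 754 double (as it is on common platforms): there is a smallest step above 1.0, and anything that falls inside that step is rounded away.

```python
import sys

# Gap between 1.0 and the next larger representable float.
eps = sys.float_info.epsilon

bigger = 1.0 + eps       # representable, distinct from 1.0
lost = 1.0 + eps / 2     # falls inside the gap and rounds back to 1.0

print(bigger > 1.0)   # True
print(lost == 1.0)    # True
```

So a "continuous" value stored as a float is really one of a very large, but finite, set of discrete values.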

## Examples of numbers with infinite decimal expansions

The number 1/3 has an infinite decimal expansion: the digit 3 repeats forever.

$$\frac{1}{3} = 0.333333333333333\ldots$$
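
As a hedged illustration, the sketch below compares Python's floating-point result for `1 / 3` with an exact rational from the standard-library `fractions` module. The variable names are introduced here for illustration.

```python
from fractions import Fraction

exact_third = Fraction(1, 3)   # exact rational number one third
float_third = 1 / 3            # nearest 64-bit float to one third

print(float_third)                             # 0.3333333333333333
print(Fraction(float_third) == exact_third)    # False: the float is only close
print(exact_third * 3 == 1)                    # True: the fraction is exact
```

The float stops after a finite number of digits, while the fraction avoids the problem by storing the numerator and denominator instead of a decimal expansion.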

The number $\pi$ also has an infinite decimal expansion, one that never settles into a repeating pattern:

$$\pi = 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679...$$

In Python, `math.pi` stores only a finite approximation of $\pi$:

```python
import math
math.pi
```


In [4]:
import math

In [5]:
math.pi

3.141592653589793
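
To get a rough sense of how far the stored value is from $\pi$, the sketch below compares `math.pi` with the first 50 decimal places of the expansion quoted above, using the standard-library `decimal` module. The constant name `PI_50_DIGITS` is introduced here for illustration.

```python
import math
from decimal import Decimal

# First 50 decimal places of pi, copied from the expansion quoted above.
PI_50_DIGITS = Decimal("3.14159265358979323846264338327950288419716939937510")

# Decimal(float) converts the stored binary value exactly, without rounding.
stored_pi = Decimal(math.pi)

# Size of the gap between the stored value and the 50-digit reference.
error = abs(stored_pi - PI_50_DIGITS)

print(stored_pi)
print(error)
```

The error is tiny but not zero, which is the price of squeezing an infinite expansion into a fixed number of bits.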

## Already, we are stuck with needing to be prescriptive: do we use Python? Do we store 10 decimal places or more?

It depends on the research question, on the stakeholders, on societal incentives, on who is involved, and on who makes a "map" of where such approximations matter and where they don't.

One of our goals is to learn the skills to decide when an approximation matters for a given research question, and when it does not.
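
As one hypothetical illustration of how the number of decimal places can matter, the sketch below computes a circumference with $\pi$ rounded to 2 and to 10 decimal places. The radius is an approximate mean radius of the Earth, chosen only as an example, and `circumference` is a helper name introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean radius, used only for illustration

def circumference(radius, pi_value):
    """Circumference of a circle, using the given approximation of pi."""
    return 2 * pi_value * radius

# Use the full float value of pi as the reference.
reference = circumference(EARTH_RADIUS_KM, math.pi)

error_2_places = abs(circumference(EARTH_RADIUS_KM, round(math.pi, 2)) - reference)
error_10_places = abs(circumference(EARTH_RADIUS_KM, round(math.pi, 10)) - reference)

print(f"pi to 2 places: off by about {error_2_places:.1f} km")
print(f"pi to 10 places: off by about {error_10_places * 1e6:.2f} mm")
```

Whether an error of kilometers or of a fraction of a millimeter is acceptable is not something the code can answer; it depends on the question being asked.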