## CMPINF2100 Week 02 | Calculating Averages

### Overview:

We have learned how to store values in lists, iterate, apply functions and methods, and define our own functions.

Let's now use all of those skills to tackle an important and common data analysis task!

#### Important Modules:

We will need the `random` module, but we won't use it right away.

In [1]:
import random

#### Average:

How do we calculate the average of a collection of integers?

For example, what is the average of the following 3 numbers?

In [2]:
[1, 2, 3]

[1, 2, 3]

Calculating the average can be summarized in three steps:

- sum the values in the collection
- determine the number of values in the collection
- divide the sum by the number of values to obtain the average

Step 1:

In [3]:
1+2+3

6

In [4]:
[1,2,3]

[1, 2, 3]

In [5]:
a_sum = 0
for a_value in [1,2,3]:
    a_sum = a_value + a_sum

In [6]:
a_sum

6

In [7]:
sum([1,2,3])

6

Step 2:

In [8]:
len([1,2,3])

3

Step 3:

In [9]:
sum([1,2,3]) / len([1,2,3])

2.0

In [10]:
6/3

2.0

In [11]:
x_a = [1,2,3,4,5,6,7,8]

In [12]:
sum(x_a)/len(x_a)

4.5

In [13]:
x_b = [-1,10,-2,3.14,5.5,300,-200,-11,1034]

In [14]:
sum(x_b)/len(x_b)

126.51555555555554

Let's define our own custom function for calculating the average of the values contained in a list.

In [16]:
def my_avg(x):
    """
    This is a docstring. It serves as a multiline comment

    Provide useful description and discussion of the input arguments and operations of the function

    This function accepts x as an input argument. Let's assume x is a list of integers/floats

    The function returns the average of the list
    """

    return sum(x) / len(x)

In [17]:
%whos

Variable   Type        Data/Info
--------------------------------
a_sum      int         6
a_value    int         3
my_avg     function    <function my_avg at 0x0000022CEE677AF0>
random     module      <module 'random' from 'C:<...>inf2100\\lib\\random.py'>
x_a        list        n=8
x_b        list        n=9


In [18]:
my_avg

<function __main__.my_avg(x)>

In [19]:
my_avg()

TypeError: my_avg() missing 1 required positional argument: 'x'

In [20]:
my_avg([1,2,3])

2.0

In [21]:
my_avg(x_a)

4.5

In [22]:
my_avg(x_b)

126.51555555555554
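As a sanity check, the result can be compared against the standard library. The sketch below is not part of the original notebook: it assumes Python 3.8 or later, where `statistics.fmean` is available, and it redefines `my_avg` only so the block runs on its own.

```python
import math
import statistics


def my_avg(x):
    """Return the average of a list of integers/floats."""
    return sum(x) / len(x)


x_a = [1, 2, 3, 4, 5, 6, 7, 8]
x_b = [-1, 10, -2, 3.14, 5.5, 300, -200, -11, 1034]

# fmean converts the values to floats and returns their average.
# Floating-point results can differ in the last digits, so the
# comparison uses math.isclose instead of ==.
print(math.isclose(my_avg(x_a), statistics.fmean(x_a)))
print(math.isclose(my_avg(x_b), statistics.fmean(x_b)))
```

Both comparisons should agree, which suggests `my_avg()` behaves like the library function for lists of numbers.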

Be careful: Our function `my_avg()` can **only** work with a single list that contains integers and floats. The function will not work if the list contains a character/string data type.

In [23]:
my_avg([1],[2],[3])

TypeError: my_avg() takes 1 positional argument but 3 were given
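The call above fails because three separate lists were passed instead of one. The string case mentioned in the warning fails for a different reason: `sum()` cannot add an integer and a string. Here is a minimal sketch of that case, plus the empty-list case (an additional edge case not covered in the original notes), with `my_avg` redefined so the block runs on its own:

```python
def my_avg(x):
    """Return the average of a list of integers/floats."""
    return sum(x) / len(x)


# A string inside the list: sum() starts at 0 and cannot add a str to an int.
try:
    my_avg([1, 2, "3"])
except TypeError as err:
    print("TypeError:", err)

# An empty list: sum([]) is 0 and len([]) is 0, so the division fails.
try:
    my_avg([])
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```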

In [24]:
help(my_avg)

Help on function my_avg in module __main__:

my_avg(x)
    This is a docstring. It serves as a multiline comment
    
    Provide useful description and discussion of the input arguments and operations of the function
    
    This function accepts x as an input argument. Let's assume x is a list of integers/floats
    
    The function returns the average of the list



#### Applying Our Function:

We learned how to **generate** a list of random numbers.

Let's make a list of random numbers now and use our function to calculate the average of those random numbers.

In [25]:
random.seed(2100)

xu = [ random.random() for _ in range(11) ]


In [26]:
len(xu)

11

In [27]:
type(xu)

list

In [28]:
%whos

Variable   Type        Data/Info
--------------------------------
a_sum      int         6
a_value    int         3
my_avg     function    <function my_avg at 0x0000022CEE677AF0>
random     module      <module 'random' from 'C:<...>inf2100\\lib\\random.py'>
x_a        list        n=8
x_b        list        n=9
xu         list        n=11


In [29]:
xu

[0.764021138054951,
 0.4692037101382426,
 0.09010549233469012,
 0.08980190601697957,
 0.5561956463771123,
 0.41097303757112746,
 0.6979324821652212,
 0.9157019989699348,
 0.06734286989910154,
 0.7015550970478079,
 0.11078753610950587]

What is the average, or **mean**, of the 11 values?

In [30]:
my_avg(xu)

0.4430564467895158

In [31]:
my_avg([random.random() for _ in range(1000) ] )

0.5130247005855082
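`random.random()` draws values uniformly from the interval [0, 1), so averages of larger samples tend to land closer to 0.5. The sketch below illustrates this. The sample sizes (11, 1,000, and 100,000) are illustrative choices, and the exact numbers printed depend on the seed and do not appear in the original notes.

```python
import random


def my_avg(x):
    """Return the average of a list of integers/floats."""
    return sum(x) / len(x)


random.seed(2100)

# Draw samples of increasing size and print each sample's average.
for n in [11, 1000, 100000]:
    sample = [random.random() for _ in range(n)]
    print(n, my_avg(sample))
```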

What happens if we generate 11 random values, 3 different times?

This can happen if we collect observations on different days, in different months, or at different locations. The main point is that the data change from one collection to the next.

For example, we might measure the heights of the students in a classroom.

In [32]:
random.seed(2100)

yu = [random.random() for _ in range(11)]

zu = [random.random() for _ in range(11)]

qu = [random.random() for _ in range(11)]

In [33]:
%whos

Variable   Type        Data/Info
--------------------------------
a_sum      int         6
a_value    int         3
my_avg     function    <function my_avg at 0x0000022CEE677AF0>
qu         list        n=11
random     module      <module 'random' from 'C:<...>inf2100\\lib\\random.py'>
x_a        list        n=8
x_b        list        n=9
xu         list        n=11
yu         list        n=11
zu         list        n=11


In [43]:
qu

[0.6669762270729921,
 0.4942428713031397,
 0.7956962831040792,
 0.552986632512965,
 0.4893355644568754,
 0.4753411061753585,
 0.956484014417971,
 0.5808057706555326,
 0.6972309683740164,
 0.6530137967372368,
 0.4843336405398452]

In [44]:
yu

[0.764021138054951,
 0.4692037101382426,
 0.09010549233469012,
 0.08980190601697957,
 0.5561956463771123,
 0.41097303757112746,
 0.6979324821652212,
 0.9157019989699348,
 0.06734286989910154,
 0.7015550970478079,
 0.11078753610950587]

In [45]:
zu

[0.9379189789375049,
 0.2987506098933619,
 0.7342115932077558,
 0.8939978976343719,
 0.36690797563632527,
 0.7840449290430325,
 0.37956664277465757,
 0.6333253705111195,
 0.3605664070992006,
 0.7231215087463677,
 0.07281302170746007]

In [46]:
yu == zu 

False

In [47]:
zu == qu

False

In [48]:
yu == qu

False
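Notice that `yu` matches the earlier list `xu`, while `zu` and `qu` do not. The seed was set once, just before `yu`, so `yu` restarts the random sequence and the later lists continue it. A minimal sketch of this behavior, using the hypothetical variable names `first`, `second`, and `again`:

```python
import random

random.seed(2100)
first = [random.random() for _ in range(11)]
second = [random.random() for _ in range(11)]  # continues the same sequence

random.seed(2100)  # resetting the seed restarts the sequence
again = [random.random() for _ in range(11)]

print(first == again)   # same seed, same position in the sequence
print(first == second)  # same seed, later position in the sequence
```

The first comparison is `True` and the second is `False`: reseeding reproduces the values, while drawing again without reseeding gives new ones.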

The three collections have different values, so what about their averages?

In [49]:
my_avg(yu)

0.4430564467895158

In [50]:
my_avg(zu)

0.5622931759264689

In [51]:
my_avg(qu)

0.6224042613954558
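With several collections, the same function can be applied in a loop. The sketch below regenerates three lists with the same seed and stores them in a dictionary, then reports how far apart the largest and smallest averages are. The names `samples`, `averages`, and `spread` are illustrative and not from the original notes.

```python
import random


def my_avg(x):
    """Return the average of a list of integers/floats."""
    return sum(x) / len(x)


random.seed(2100)

# Generate the three lists in order and key them by name.
samples = {name: [random.random() for _ in range(11)] for name in ["yu", "zu", "qu"]}

# Apply the same function to every collection.
averages = {name: my_avg(values) for name, values in samples.items()}

for name, avg in averages.items():
    print(name, avg)

# Difference between the largest and smallest average.
spread = max(averages.values()) - min(averages.values())
print("spread:", spread)
```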

Which of the above averages is correct, given that all of them were calculated the same way?