### Uniform Distribution
Lets start with generating some fake random data. You can get a random number between 0 and 1 using the python random module as follow:

In [1]:
import random
x=random.random()
print("The Value of x is", x)

The Value of x is 0.28134802625758737


Everytime you call random, you will get a new number.

*Exercise 1:* Using random, write a function `generate_uniform(N, mymin, mymax)`, that returns a python list containing N random numbers between specified minimum and maximum value. Note that you may want to quickly work out on paper how to turn numbers between 0 and 1 to between other values. 

In [2]:
# Skeleton
def generate_uniform(N,x_min,x_max):
    out = []
    ### BEGIN SOLUTION
    for _ in range(N):
        u = random.random()
        x = u * (x_max - x_min) + x_min
        out.append(x)
    ### END SOLUTION
    return out

In [3]:
# Test your solution here
data=generate_uniform(1000,-10,10)
print ("Data Type:", type(data))  #should print class 'list'
print ("Data Length:", len(data))
if len(data)>0: 
    print ("Type of Data Contents:", type(data[0])) #should print 'float'
    print ("Data Minimum:", min(data)) #close to -10
    print ("Data Maximum:", max(data)) #close to 10

Data Type: <class 'list'>
Data Length: 1000
Type of Data Contents: <class 'float'>
Data Minimum: -9.946413668247878
Data Maximum: 9.994724983321984


*Exercise 2a:* 
Write a function that computes the mean of values in a list. Recall the equation for the mean of a random variable $\bf{x}$ computed on a data set of $n$ values $\{ x_i \} = \{x_1, x_2, ..., x_n\}$  is ${\bf\bar{x}} = \frac{1}{n} \sum_i^n x_i$.

In [4]:
# Skeleton
def mean(Data):
    m=0.
    ### BEGIN SOLUTION
    if len(Data) == 0:
        return 0
    total = sum(Data) #summing values into the data
    m = total/len(Data) #compute the mean 
  
    ### END SOLUTION
    return m

In [5]:
# Test your solution here
print ("Mean of Data:", mean(data))

Mean of Data: 0.02628670908785162


*Exercise 2b:* 
Write a function that computes the variance of values in a list. Recall the equation for the variance of a random variable $\bf{x}$ computed on a data set of $n$ values $\{ x_i \} = \{x_1, x_2, ..., x_n\}$  is ${\bf\langle x \rangle} = \frac{1}{n} \sum_i^n (x_i - {\bf\bar{x}})$.

In [6]:
# Skeleton
def variance(Data):
    m=0.
    #BEGIN SOLUTION
    
    if len(Data) == 0:
        return 0
    mean_value = sum(Data) / len(Data) #mean
    diff_sum = sum((x - mean_value) ** 2 for x in Data) #variance by summing sq diff
    m = diff_sum / len(Data)
    
    ### END SOLUTION
    return m

In [7]:
# Test your solution here
print ("Variance of Data:", variance(data))

Variance of Data: 33.62603554206999


## Histogramming

*Exercise 3:* Write a function that bins the data so that you can create a histogram. An example of how to implement histogramming is the following logic:

* User inputs a list of values `x` and optionally `n_bins` which defaults to 10.
* If not supplied, find the minimum and maximum (`x_min`,`x_max`) of the values in x.
* Determine the bin size (`bin_size`) by dividing the range of the function by the number of bins.
* Create an empty list of zeros of size `n_bins`, call it `hist`.
* Loop over the values in `x`
    * Loop over the values in `hist` with index `i`:
        * If x is between `x_min+i*bin_size` and `x_min+(i+1)*bin_size`, increment `hist[i].` 
        * For efficiency, try to use continue to goto the next bin and data point.
* Return `hist` and the list corresponding of the bin edges (i.e. of `x_min+i*bin_size`).    

In [8]:
# Solution
def histogram(x,n_bins=10,x_min=None,x_max=None):
    ### BEGIN SOLUTION
    if x_min is None:
        x_min = min(x)
    if x_max is None:
        x_max = max(x)
   #binsize
    bsize = (x_max-x_min) / n_bins
    #list of 0s
    hist = [0] *  n_bins
    #list of bin edges
    bin_edges = [x_min + i * bsize for i in range(n_bins + 1)]
    
    #loop over x vlaues
    for value in x:
        #to find bin
        for i in range(n_bins):
            #check if value falls within the bin range
            if x_min + i *bsize <= value < x_min +(i +1) * bsize:
                hist[i] += 1
                break #moving onto next value
        
    
    ### END SOLUTION

    return hist,bin_edges

In [9]:
# Test your solution here
h,b=histogram(data,100)  #100 bins
print(h)

[14, 15, 10, 10, 16, 5, 11, 12, 12, 8, 10, 9, 7, 11, 10, 14, 10, 11, 9, 7, 11, 9, 8, 10, 7, 5, 5, 16, 8, 6, 3, 11, 4, 14, 11, 14, 11, 7, 7, 7, 9, 17, 8, 10, 13, 4, 6, 12, 8, 11, 7, 14, 14, 11, 10, 6, 7, 16, 11, 16, 16, 12, 11, 14, 5, 12, 11, 12, 6, 8, 13, 7, 12, 12, 6, 10, 8, 11, 8, 12, 9, 10, 8, 12, 12, 9, 10, 11, 7, 12, 14, 8, 9, 10, 12, 4, 11, 14, 13, 3]


*Exercise 4:* Write a function that uses the histogram function in the previous exercise to create a text-based "graph". For example the output could look like the following:
```
[  0,  1] : ######
[  1,  2] : #####
[  2,  3] : ######
[  3,  4] : ####
[  4,  5] : ####
[  5,  6] : ######
[  6,  7] : #####
[  7,  8] : ######
[  8,  9] : ####
[  9, 10] : #####
```

Where each line corresponds to a bin and the number of `#`'s are proportional to the value of the data in the bin. 

In [10]:
# Solution
def draw_histogram(x,n_bins,x_min=None,x_max=None,character="#",max_character_per_line=20):
    ### BEGIN SOLUTION
    hist, bin_edges = histogram(x,n_bins,x_min,x_max)
    #calculate the mac count to scale the graph
    maxcount = max(hist)
    #create the graph
    for i in range(n_bins):
        bin_start = bin_edges[i]
        bin_end = bin_edges[i+1]
        count = hist[i]
        # Scale the number of `character` symbols based on the bin count
        num_chars = int((count / maxcount) * max_character_per_line)
        print(f"[{bin_start:.2f}, {bin_end:.2f}] : {character * num_chars}")
    
    ### END SOLUTION

    return hist,bin_edges

In [11]:
# Test your solution here
h,b=draw_histogram(data,20)


[-9.95, -8.95] : ####################
[-8.95, -7.95] : ##############
[-7.95, -6.96] : ##############
[-6.96, -5.96] : ###############
[-5.96, -4.96] : #############
[-4.96, -3.96] : ############
[-3.96, -2.97] : #############
[-2.97, -1.97] : ##############
[-1.97, -0.97] : #################
[-0.97, 0.02] : ############
[0.02, 1.02] : #################
[1.02, 2.02] : #################
[2.02, 3.02] : #################
[3.02, 4.01] : ###############
[4.01, 5.01] : ###############
[5.01, 6.01] : ###############
[6.01, 7.00] : ###############
[7.00, 8.00] : ###############
[8.00, 9.00] : ################
[9.00, 9.99] : #############


## Functional Programming

*Exercise 5:* Write a function the applies a booling function (that returns true/false) to every element in data, and return a list of indices of elements where the result was true. Use this function to find the indices of entries greater than 0.5. 

In [12]:
def where(mylist,myfunc):
    out= []
    
    ### BEGIN SOLUTION
    for index, value in enumerate(mylist):
        if myfunc(value): #boolean control check
            out.append(index)
    
    ### END SOLUTION
    
    return out

In [13]:
# Test your solution here
indices = where(data, lambda x: x > 0.5)  # Find indices of entries greater than 0.5
print("Indices of entries greater than 0.5:", indices)

Indices of entries greater than 0.5: [0, 5, 6, 7, 9, 11, 15, 16, 18, 19, 20, 21, 22, 24, 25, 27, 28, 32, 35, 37, 40, 41, 43, 47, 48, 49, 51, 54, 55, 56, 57, 60, 62, 63, 64, 65, 70, 71, 72, 77, 80, 81, 83, 84, 88, 90, 91, 92, 93, 94, 95, 97, 98, 100, 101, 102, 105, 107, 108, 110, 111, 114, 116, 120, 122, 124, 136, 137, 138, 139, 142, 145, 147, 150, 153, 157, 158, 159, 161, 165, 166, 169, 171, 172, 173, 175, 177, 179, 181, 183, 188, 189, 193, 194, 197, 201, 204, 205, 206, 210, 213, 215, 217, 218, 219, 220, 222, 224, 231, 232, 233, 238, 239, 240, 245, 248, 250, 251, 256, 257, 260, 261, 263, 264, 268, 269, 270, 272, 275, 278, 279, 283, 288, 289, 291, 295, 298, 300, 304, 306, 307, 308, 309, 312, 315, 316, 321, 322, 323, 324, 325, 326, 327, 328, 330, 332, 333, 337, 338, 339, 347, 348, 349, 350, 353, 354, 356, 357, 358, 360, 363, 365, 366, 367, 368, 372, 373, 375, 376, 378, 379, 380, 382, 386, 387, 390, 391, 393, 395, 396, 398, 399, 401, 402, 403, 404, 406, 410, 412, 415, 416, 419, 423, 424, 

*Exercise 6:* The `inrange(mymin,mymax)` function below returns a function that tests if it's input is between the specified values. Write corresponding functions that test:
* Even
* Odd
* Greater than
* Less than
* Equal
* Divisible by

In [14]:
def in_range(mymin,mymax):
    def testrange(x):
        return x<mymax and x>=mymin
    return testrange

# Examples:
F1=in_range(0,10)
F2=in_range(10,20)

# Test of in_range
print (F1(0), F1(1), F1(10), F1(15), F1(20))
print (F2(0), F2(1), F2(10), F2(15), F2(20))

print ("Number of Entries passing F1:", len(where(data,F1)))
print ("Number of Entries passing F2:", len(where(data,F2)))

True True False False False
False False True True False
Number of Entries passing F1: 519
Number of Entries passing F2: 0


In [15]:
### BEGIN SOLUTION
def is_even():
    def test_even(x):
        return x % 2 == 0
    return test_even

def is_odd():
    def test_odd(x):
        return x % 2 != 0
    return test_odd

def greater_than(myval):
    def test_greater(x):
        return x > myval
    return test_greater

def less_than(myval):
    def test_less(x):
        return x < myval
    return test_less

def equal_to(myval):
    def test_equal(x):
        return x == myval
    return test_equal

def divisible_by(myval):
    def test_divisible(x):
        return x % myval == 0
    return test_divisible
    ###### Examples:
F3 = is_even()
F4 = is_odd()
F5 = greater_than(8)
F6 = less_than(4)
F7 = equal_to(10)
F8 = divisible_by(3)
    
### END SOLUTION

In [18]:
# Test your solution
print(F3(4), F3(3))  
print(F4(2), F4(3))  
print(F5(7), F5(5))  
print(F6(18), F6(15), F6(16))  
print(F7(10), F7(9), F7(11))  
print(F8(9), F8(10), F8(12))  

True False
False True
False False
False False False
True False False
True False True


*Exercise 7:* Repeat the previous exercise using `lambda` and the built-in python functions sum and map instead of your solution above. 

In [19]:
### BEGIN SOLUTION
# map is used to apply each lamda func over a dataset
def in_range(mymin, mymax):
    return lambda x: mymin <= x < mymax

def is_even():
    return lambda x: x % 2 == 0

def is_odd():
    return lambda x: x % 2 != 0

def greater_than(myval):
    return lambda x: x > myval

def less_than(myval):
    return lambda x: x < myval

def equal_to(myval):
    return lambda x: x == myval

def divisible_by(myval):
    return lambda x: x % myval == 0

######## Examples:
F1 = in_range(5, 20)
F2 = in_range(12, 20)
F3 = is_even()
F4 = is_odd()
F5 = greater_than(4)
F6 = less_than(14)
F7 = equal_to(10)
F8 = divisible_by(3)

# Test of in_range
print(F1(0), F1(1), F1(10), F1(15), F1(20))  # Test in_range
print(F2(0), F2(1), F2(10), F2(15), F2(20))  # Test in_range


# Test of other functions using sum and map
print("Even count:", sum(map(F3, data)))
print("Odd count:", sum(map(F4, data)))
print("Greater than 5 count:", sum(map(F5, data)))
print("Less than 15 count:", sum(map(F6, data)))
print("Equal to 10 count:", sum(map(F7, data)))
print("Divisible by 3 count:", sum(map(F8, data)))

### END SOLUTION

False False True True False
False False False True False
Even count: 0
Odd count: 1000
Greater than 5 count: 298
Less than 15 count: 1000
Equal to 10 count: 0
Divisible by 3 count: 0


## Monte Carlo

*Exercise 7:* Write a "generator" function called `generate_function(func,x_min,x_max,N)`, that instead of generating a flat distribution, generates a distribution with functional form coded in `func`. Note that `func` will always be > 0.  

Use the test function below and your histogramming functions above to demonstrate that your generator is working properly.

Hint: A simple, but slow, solution is to a draw random number `test_x` within the specified range and another number `p` between the `min` and `max` of the function (which you will have to determine). If `p<=function(test_x)`, then place `test_x` on the output. If not, repeat the process, drawing two new numbers. Repeat until you have the specified number of generated numbers, `N`. For this problem, it's OK to determine the `min` and `max` by numerically sampling the function.  

In [23]:
def generate_function(func,x_min,x_max,N=1000):
    out = list()
    ### BEGIN SOLUTION
    x_samples = [x_min + (x_max - x_min) * i / 999 for i in range(1000)]
    y_samples = [func(x) for x in x_samples]
    y_min = min(y_samples)
    y_max = max(y_samples)
    
    while len(out) < N:
        #genrate random x within the range
        test_x = random.uniform(x_min, x_max)
        #generate a random p withing the range
        p = random.uniform(y_min, y_max)

        #if p is less than or equal to the fucn val at test_c
        if p <= func(test_x):
            out.append(test_x)
    
    ### END SOLUTION
    
    return out

In [26]:
# A test function
def test_func(x,a=1,b=1):
    return abs(a*x+b)
x_min = -10
x_max = 10
N = 1000
samps = generate_function(test_func, x_min, x_max, N)
##use the filtering fucntions from histogramming
odd_count = sum(map(is_odd(), samps))
greater_than_five_count = sum(map(greater_than(5), samps))
equal_to_ten_count = sum(map(equal_to(10), samps))

print("Count of odd samples:", odd_count)
print("Count of samples greater than 5:", greater_than_five_count)
print("Count of samples equal to 10:", equal_to_ten_count)

Count of odd samples: 1000
Count of samples greater than 5: 436
Count of samples equal to 10: 0


*Exercise 8:* Use your function to generate 1000 numbers that are normal distributed, using the `gaussian` function below. Confirm the mean and variance of the data is close to the mean and variance you specify when building the Gaussian. Histogram the data. 

In [27]:
import math

def gaussian(mean, sigma):
    def f(x):
        return math.exp(-((x-mean)**2)/(2*sigma**2))/math.sqrt(math.pi*sigma)
    return f

# Example Instantiation
g1=gaussian(0,1)
g2=gaussian(10,3)

In [28]:
# Generate 1000 samples from the first Gaussian
samples = generate_function(g1, -5, 5, 1000) #xmin-5 and xmax5

mean_sample = sum(samples) / len(samples)
variance_sample = sum((x - mean_sample) ** 2 for x in samples) / len(samples)

print("Generated Samples Mean:", mean_sample)
print("Generated Samples Variance:", variance_sample)

Generated Samples Mean: 0.05917767638240581
Generated Samples Variance: 0.9311144883151685


*Exercise 9:* Combine your `generate_function`, `where`, and `in_range` functions above to create an integrate function. Use your integrate function to show that approximately 68% of Normal distribution is within one variance.

In [31]:
def integrate(func, x_min, x_max, n_points=1000):
    samples = generate_function(func, x_min, x_max, n_points)
    # Define the range within one stdev
    condition = in_range(mean - sigma, mean + sigma)
    
    count_within_range = len(where(samples, condition))
    # Approximate the integral (area under the curve)
    integral = count_within_range / len(samples)
    
    return integral

In [32]:
mean = 0
sigma = 1
g1 = gaussian(mean, sigma)
# Perform integration 
area = integrate(g1, mean - 4 * sigma, mean + 4 * sigma)  # Wider range to ensure coverage

print("Approximate area under the curve within one standard deviation:", area)

Approximate area under the curve within one standard deviation: 0.681


In [33]:
### About 68% of the normal distribution falls within 1stdev from mean. 