## HackerRank - 10 days of Statistics

```python
# import required libraries
import numpy as np
import statistics
from statistics import *
```

### Day 0: Mean, Median, and Mode
#### Objective
In this challenge, we practice calculating the mean, median, and mode. Check out the Tutorial tab for learning materials and an instructional video!

#### Task
Given an array, X, of N integers, calculate and print the respective mean, median, and mode on separate lines. If your array contains more than one modal value, choose the numerically smallest one.

Note: Other than the modal value (which will always be an integer), your answers should be in decimal form, rounded to a scale of 1 decimal place (i.e., 12.3, 7.0 format).

#### Example
N = 6
X = [1,2,3,4,5,5]

The mean is 20/6 = 3.3.
The median is (3+4)/2 = 3.5.
The mode is 5 because 5 occurs most frequently.
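As a cross-check of the example, the same three statistics can be computed with Python's standard `statistics` module. This is a sketch, not the NumPy solution used further down; `statistics.multimode` (Python 3.8+) returns every modal value, so taking `min` of it picks the numerically smallest mode as the task requires. The helper name `summarize` is hypothetical.

```python
from statistics import mean, median, multimode

def summarize(xs):
    """Return (mean, median, smallest mode) formatted as the task asks."""
    # mean and median to one decimal place; the mode stays an integer
    return f"{mean(xs):.1f}", f"{median(xs):.1f}", min(multimode(xs))

print(*summarize([1, 2, 3, 4, 5, 5]), sep="\n")
```

For the example array this prints 3.3, 3.5 and 5 on separate lines.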

#### Input Format

The first line contains an integer, N, the number of elements in the array. The second line contains N space-separated integers that describe the array's elements.

#### Constraints:

![image.png](attachment:image.png)

#### Output Format

Print 3 lines of output in the following order:

 1. Print the mean on the first line to a scale of 1 decimal place (i.e., 12.3, 7.0).
 2. Print the median on a new line, to a scale of 1 decimal place (i.e., 12.3, 7.0).
 3. Print the mode on a new line. If more than one such value exists, print the numerically smallest one.

#### Sample Input:
10

64630 11735 14216 99233 14470 4978 73429 38120 51135 67060

#### Sample Output

43900.6

44627.5

4978

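```python
import numpy as np

# Sample input for reference:
# n = 10
# x = [64630, 11735, 14216, 99233, 14470, 4978, 73429, 38120, 51135, 67060]

def calculate_mean(data):
    # the mean does not depend on order, so no sorting is needed
    return float(np.mean(data))

def calculate_median(data):
    # np.median sorts internally and averages the two middle values when N is even
    return float(np.median(data))

def calculate_mode(data):
    # np.bincount tallies each non-negative integer value;
    # argmax returns the first maximum, i.e. the numerically smallest modal value
    counts = np.bincount(data)
    return int(np.argmax(counts))

N = int(input())                       # number of elements
X = [int(i) for i in input().split()]  # the N integers

print(f"{calculate_mean(X):.1f}")
print(f"{calculate_median(X):.1f}")
print(calculate_mode(X))
```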


43900.6
44627.5
4978


### Day 0: Weighted Mean
#### Objective
In the previous challenge, we calculated a mean. In this challenge, we practice calculating a weighted mean. Check out the Tutorial tab for learning materials and an instructional video!

#### Task
Given an array, X, of N integers and an array, W, representing the respective weights of X's elements, calculate and print the weighted mean of X's elements. Your answer should be rounded to a scale of 1 decimal place (i.e., 12.3 format).

#### Example
X = [1,2,3]
W = [5,6,7]

The array of values X[i] * W[i] = [5,12,21]. Their sum is 38. The sum of W = 18. The weighted mean is 38/18 = 2.11111. Print 2.1 and return.

#### Function Description
Complete the weightedMean function in the editor below.

weightedMean has the following parameters:
- int X[N]: an array of values
- int W[N]: an array of weights

#### Prints
- float: the weighted mean to one decimal place

#### Input Format

The first line contains an integer, N, the number of elements in arrays X and W.
The second line contains N space-separated integers that describe the elements of array X.
The third line contains N space-separated integers that describe the elements of array W.

#### Constraints

![image.png](attachment:image.png)

#### Output Format

Print the weighted mean on a new line. Your answer should be rounded to a scale of 1 decimal place (i.e., 12.3 format).

#### Sample Input

STDIN           Function
-----           --------
5               X[] and W[] size n = 5
10 40 30 50 20  X = [10, 40, 30, 50, 20]  
1 2 3 4 5       W = [1, 2, 3, 4, 5]

#### Sample Output

32.0

#### Explanation

We use the following formula to calculate the weighted mean:

![image-2.png](attachment:image-2.png)

And then print our result to a scale of 1 decimal place (32.0) on a new line.
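Written out for the sample input, the formula gives (10·1 + 40·2 + 30·3 + 50·4 + 20·5) / (1 + 2 + 3 + 4 + 5) = 480 / 15 = 32.0. NumPy's `np.average` accepts a `weights` argument and computes the same weighted sum divided by the sum of weights, so a quick cross-check (sample values hard-coded) might look like:

```python
import numpy as np

X = [10, 40, 30, 50, 20]
W = [1, 2, 3, 4, 5]

# np.average with weights computes sum(X[i] * W[i]) / sum(W)
weighted = np.average(X, weights=W)
print(f"{weighted:.1f}")
```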

```python
#
# Complete the 'weightedMean' function below.
#
# The function accepts following parameters:
#  1. INTEGER_ARRAY X, e.g. [1, 2, 3]
#  2. INTEGER_ARRAY W, e.g. [5, 6, 7]
#

def weightedMean(X, W):
    if len(X) != len(W):
        raise ValueError("X and W must contain the same number of elements")

    total_sum = sum(x * w for x, w in zip(X, W))  # sum of X[i] * W[i]
    total_weight = sum(W)

    # print with exactly one decimal place (12.3 format)
    print(f"{total_sum / total_weight:.1f}")

if __name__ == '__main__':
    n = int(input().strip())
    vals = list(map(int, input().rstrip().split()))
    weights = list(map(int, input().rstrip().split()))
    weightedMean(vals, weights)
```

2.1


### Day 1: Quartiles
#### Objective
In this challenge, we practice calculating quartiles. Check out the Tutorial tab for learning materials and an instructional video!

#### Task
Given an array, arr, of n integers, calculate the respective first quartile (Q1), second quartile (Q2), and third quartile (Q3). It is guaranteed that Q1, Q2, and Q3 are integers.

#### Example
arr = [9,5,7,1,3]

The sorted array is [1,3,5,7,9], which has an odd number of elements. The lower half consists of [1,3], and its median is (1+3)/2 = 2. The middle element is 5 and represents the second quartile. The upper half is [7,9] and its median is (7+9)/2 = 8. Return [2,5,8].

arr = [1,3,5,7]
The array is already sorted. The lower half is [1,3] with a median of (1+3)/2 = 2. The median of the entire array is (3+5)/2 = 4, and the median of the upper half is (5+7)/2 = 6. Return [2,4,6].
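Python's `statistics.quantiles` does not use this halves-based definition: it interpolates between order statistics, with `method='exclusive'` as the default. For odd-length data the exclusive method happens to agree with the medians of the halves, but for even-length data it generally does not. A small comparison on the two examples above (Python 3.8+):

```python
from statistics import quantiles

odd = [1, 3, 5, 7, 9]
even = [1, 3, 5, 7]

# Expected answers from the examples: [2, 5, 8] and [2, 4, 6]
print(quantiles(odd))                       # [2.0, 5.0, 8.0] -> matches
print(quantiles(even))                      # [1.5, 4.0, 6.5] -> Q1 and Q3 differ
print(quantiles(even, method="inclusive"))  # [2.5, 4.0, 5.5] -> also differs
```

So when this challenge's definition is required, computing the medians of the lower and upper halves directly is the safer route.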

#### Function Description

Complete the quartiles function in the editor below.

quartiles has the following parameters:

 - int arr[n]: the values to segregate

#### Returns

 - int[3]: the medians of the left half of arr, arr in total, and the right half of arr.

#### Input Format

The first line contains an integer, n, the number of elements in arr.
The second line contains n space-separated integers, each an arr[i].

#### Constraints
![image.png](attachment:image.png)

#### Sample Input

STDIN                   Function
-----                   --------    
9                       arr[] size n = 9 
3 7 8 5 12 14 21 13 18  arr = [3, 7, 8, 5, 12, 14, 21, 13,18]

#### Sample Output

6
12
16

#### Explanation
![image-2.png](attachment:image-2.png)

There is an odd number of elements, and the middle element, the median, is 12.

As there are an odd number of data points, we do not include the median (the central value in the ordered list) in either half:

Lower half (L): 3, 5, 7, 8

Upper half (U): 13, 14, 18, 21

Now find the quartiles:

![image-3.png](attachment:image-3.png)

```python
from statistics import median

#
# Complete the 'quartiles' function below.
#
# The function is expected to return an INTEGER_ARRAY.
# The function accepts INTEGER_ARRAY arr as parameter.
#

def quartiles(arr):
    data = sorted(arr)
    half = len(data) // 2
    lower = data[:half]                  # elements below the median
    upper = data[half + len(data) % 2:]  # elements above it; skips the middle element when n is odd
    # the task guarantees that Q1, Q2 and Q3 are integers
    return [int(median(lower)), int(median(data)), int(median(upper))]

if __name__ == '__main__':
    n = int(input().strip())
    data = list(map(int, input().rstrip().split()))
    res = quartiles(data)
    print(res)
```


[2, 4, 6]


### Day 1: Interquartile Range
#### Objective
In this challenge, we practice calculating the interquartile range. We recommend you complete the Quartiles challenge before attempting this problem.

#### Task
The interquartile range of an array is the difference between its first (Q1) and third (Q3) quartiles (i.e., Q3-Q1).

Given an array, values, of n integers and an array, freqs, representing the respective frequencies of values' elements, construct a data set, S, where each values[i] occurs at frequency freqs[i]. Then calculate and print S's interquartile range, rounded to a scale of 1 decimal place (i.e., 12.3 format).

Tip: Be careful to not use integer division when averaging the middle two elements for a data set with an even number of elements, and be sure to not include the median in your upper and lower data sets.

#### Example
values = [1,2,3]

freqs = [3,2,1]

Apply the frequencies to the values to get the expanded array S = [1,1,1,2,2,3]. Here left = [1,1,1], right = [2,2,3]. The median of the left half, Q1 = 1.0, the middle element. For the right half, Q3 = 2.0. Print the difference to one decimal place: Q3 - Q1 = 2.0 - 1.0 = 1, so print 1.0.

#### Function Description

Complete the interQuartile function in the editor below.

interQuartile has the following parameters:
- int values[n]: an array of integers
- int freqs[n]: values[i] occurs freqs[i] times in the array to analyze

#### Prints
 - float: the interquartile range to 1 place after the decimal

#### Input Format

The first line contains an integer, n, the number of elements in arrays values and freqs.
The second line contains n space-separated integers describing the elements of array values.
The third line contains n space-separated integers describing the elements of array freqs.

#### Constraints
![image.png](attachment:image.png)

#### Output Format

Print the interquartile range for the expanded data set on a new line. Round the answer to a scale of 1 decimal place (i.e., 12.3 format).

#### Sample Input

STDIN           Function
-----           --------
6               arrays size n = 6
6 12 8 10 20 16 values = [6, 12, 8, 10, 20, 16]
5 4 3 2 1 5     freqs = [5, 4, 3, 2, 1, 5]

#### Sample Output

9.0

#### Explanation

The given data is:

![image-2.png](attachment:image-2.png)

First, we create data set S containing the data from set values at the respective frequencies specified by freqs:

![image-3.png](attachment:image-3.png)

As there are an even number of data points in the original ordered data set, we will split this data set exactly in half:

Lower half (L): 6, 6, 6, 6, 6, 8, 8, 8, 10, 10

Upper half (U): 12, 12, 12, 12, 16, 16, 16, 16, 16, 20

Next, we find Q1. There are 10 elements in the lower half, so Q1 is the average of the middle two elements: 6 and 8. Thus, Q1 = (6+8)/2 = 7.0.

Next, we find Q3. There are 10 elements in the upper half, so Q3 is the average of the middle two elements: 16 and 16. Thus, Q3 = (16+16)/2 = 16.0.

From this, we calculate the interquartile range as Q3 - Q1 = 16.0 - 7.0 = 9.0 and print 9.0 as our answer.

```python
#!/bin/python3

from statistics import median

#
# Complete the 'interQuartile' function below.
#
# The function accepts following parameters:
#  1. INTEGER_ARRAY values
#  2. INTEGER_ARRAY freqs
#

def interQuartile(values, freqs):
    # build the data set S: values[i] repeated freqs[i] times, then sort it
    sample = sorted(v for v, f in zip(values, freqs) for _ in range(f))
    half = len(sample) // 2
    lower = sample[:half]
    upper = sample[half + len(sample) % 2:]  # exclude the median when the size is odd
    iqr = median(upper) - median(lower)      # Q3 - Q1
    print(f"{iqr:.1f}")

if __name__ == '__main__':
    n = int(input().strip())
    val = list(map(int, input().rstrip().split()))
    freq = list(map(int, input().rstrip().split()))
    interQuartile(val, freq)
```

9.0


### Day 1: Standard Deviation
#### Objective
In this challenge, we practice calculating standard deviation. Check out the Tutorial tab for learning materials and an instructional video!

#### Task
Given an array, arr, of n integers, calculate and print the standard deviation. Your answer should be in decimal form, rounded to a scale of 1 decimal place (i.e., 12.3 format). An error margin of +-0.1 will be tolerated for the standard deviation.

#### Example
arr = [2,5,2,7,4]

The sum of the array values is 20 and there are 5 elements. The mean is 4.0.

Subtract the mean from each element, square each result, and take their sum.

![image.png](attachment:image.png)
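Carrying the example through to the end: the squared deviations are 4, 1, 4, 9 and 0, which sum to 18, so the population standard deviation is sqrt(18/5) ≈ 1.9. A minimal sketch that does this by hand and checks it against the standard library, assuming (as the sample output 14.1 implies) that the challenge wants the population standard deviation, which divides by n:

```python
import math
import statistics

arr = [2, 5, 2, 7, 4]
mu = sum(arr) / len(arr)                    # 4.0
squared = [(x - mu) ** 2 for x in arr]      # [4.0, 1.0, 4.0, 9.0, 0.0]
sigma = math.sqrt(sum(squared) / len(arr))  # population SD: divide by n

print(f"{sigma:.1f}")
print(math.isclose(sigma, statistics.pstdev(arr)))
print(f"{statistics.stdev(arr):.3f}")       # sample SD, divides by n - 1 instead
```

For the sample input, dividing by n - 1 would give sqrt(1000/4) ≈ 15.8 instead of the expected 14.1.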

#### Function Description

Complete the stdDev function in the editor below.

stdDev has the following parameters:
- int arr[n]: an array of integers

#### Prints
- float: the standard deviation to 1 place after the decimal

#### Input Format

The first line contains an integer, n, denoting the size of arr.
The second line contains n space-separated integers that describe arr.

#### Constraints
![image-2.png](attachment:image-2.png)

#### Output Format

Print the standard deviation on a new line, rounded to a scale of 1 decimal place (i.e., 12.3 format).

#### Sample Input

STDIN           Function
-----           --------
5               arr[] size n = 5
10 40 30 50 20  arr =[10, 40, 30, 50, 20]

#### Sample Output

14.1

#### Explanation

First, find the mean:

![image-3.png](attachment:image-3.png)

```python
import statistics

#
# Complete the 'stdDev' function below.
#
# The function accepts INTEGER_ARRAY arr as parameter.
#

def stdDev(arr):
    # population standard deviation (divide by n), as in the worked example
    print(f"{statistics.pstdev(arr):.1f}")

if __name__ == '__main__':
    n = int(input().strip())
    vals = list(map(int, input().rstrip().split()))
    stdDev(vals)
```

14.1


### Day 2: Basic Probability
#### Objective
In this challenge, we practice calculating probability. Check out the Tutorial tab for a breakdown of probability fundamentals!

#### Task
In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that their sum will be at most 9.

![image.png](attachment:image.png)
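Counting the complement gives the same result as enumerating all 36 outcomes: only sums of 10, 11 and 12 exceed 9, and they arise from 3, 2 and 1 outcomes respectively ((4,6), (5,5), (6,4); (5,6), (6,5); (6,6)). Writing S for the sum of the two dice:

$$P(S \le 9) = 1 - P(S \ge 10) = 1 - \frac{3 + 2 + 1}{36} = \frac{30}{36} = \frac{5}{6}$$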

```python
from itertools import product
from fractions import Fraction

def get_probability_at_most(dice1: list, dice2: list, max_sum: int) -> None:
    outcomes = list(product(dice1, dice2))  # all ordered outcomes
    print(outcomes)
    # keep outcomes whose sum is at most max_sum
    filtered = [el for el in outcomes if sum(el) <= max_sum]
    print(filtered)

    print(len(filtered))
    print(len(outcomes))
    print(Fraction(len(filtered), len(outcomes)))

dice1 = list(range(1, 7))
dice2 = list(range(1, 7))
get_probability_at_most(dice1=dice1, dice2=dice2, max_sum=9)
```

[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4), (6, 1), (6, 2), (6, 3)]
30
36
5/6


### Day 2: More Dice
#### Objective
In this challenge, we practice calculating probability. We recommend you review the previous challenge's Tutorial before attempting this problem.

#### Task
In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that the values rolled by each die will be different and the two dice have a sum of 6.

![image.png](attachment:image.png)

```python
from itertools import product
from fractions import Fraction

def get_probability(dice1: list, dice2: list, dice_sum: int) -> None:
    len_product = len(dice1) * len(dice2)
    print(list(product(dice1, dice2)))

    # keep outcomes whose faces differ and add up to dice_sum
    filtered = [(d1, d2) for d1, d2 in product(dice1, dice2)
                if d1 + d2 == dice_sum and d1 != d2]
    len_filtered = len(filtered)
    print(filtered)
    print(len_filtered)
    print(len_product)
    print(Fraction(len_filtered, len_product))

dice1 = list(range(1, 7))
dice2 = list(range(1, 7))

get_probability(dice1=dice1, dice2=dice2, dice_sum=6)
```

[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
[(1, 5), (2, 4), (4, 2), (5, 1)]
4
36
1/9


### Day 2: Compound Event Probability
#### Objective
In this challenge, we practice calculating the probability of a compound event. We recommend you review today's Probability Tutorial before attempting this challenge.

#### Task
There are 3 urns labeled X, Y, and Z.

Urn X contains 4 red balls and 3 black balls.

Urn Y contains 5 red balls and 4 black balls.

Urn Z contains 4 red balls and 4 black balls.

One ball is drawn from each of the 3 urns. What is the probability that, of the 3 balls drawn, 2 are red and 1 is black?

![image.png](attachment:image.png)

#### Solution:
#### Urn X:
P(R) = 4/7 & P(B) = 3/7

#### Urn Y:
P(R) = 5/9 & P(B) = 4/9

#### Urn Z:
P(R) = 1/2 & P(B) = 1/2

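Exactly one of the three balls must be black, so there are three mutually exclusive cases for urns (X, Y, Z): (R, R, B), (R, B, R) and (B, R, R). Adding their probabilities:

P_2R_1B = ((4/7)x(5/9)x(1/2)) + ((4/7)x(4/9)x(1/2)) + ((3/7)x(5/9)x(1/2))

P_2R_1B = (20/126) + (16/126) + (15/126)

P_2R_1B = 51/126

P_2R_1B = 17/42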
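The case analysis can be checked by enumerating every colour pattern for the three draws and adding the probabilities of those with exactly two reds. A minimal sketch; the urn contents are hard-coded from the task and `fractions.Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction
from itertools import product

# (red, black) counts for urns X, Y, Z
urns = [(4, 3), (5, 4), (4, 4)]

total = Fraction(0)
# each draw is 'R' or 'B'; enumerate all 2**3 colour patterns
for pattern in product("RB", repeat=3):
    if pattern.count("R") != 2:
        continue
    p = Fraction(1)
    for colour, (red, black) in zip(pattern, urns):
        p *= Fraction(red if colour == "R" else black, red + black)
    total += p

print(total)
```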

### Day 3: Conditional Probability
#### Objective
In this challenge, we get started with conditional probability. Check out the Tutorial tab for learning materials!

#### Task
Suppose a family has 2 children, one of which is a boy. What is the probability that both children are boys?

![image.png](attachment:image.png)

#### Solution:
The family has 2 children, at least one of whom is a boy, so the sample space is:

S = {(B,B), (B,G), (G,B)}

P(BB) = 1/3

```python
from itertools import product
from fractions import Fraction

# The family has 2 children and at least one is a boy, but we are not told
# whether it is the first-born or the second-born. Enumerate all ordered
# (first, second) pairs and keep those that contain at least one boy.
children = ['B', 'G']

sample_space = [pair for pair in product(children, repeat=2) if 'B' in pair]
sample_length = len(sample_space)

# favourable outcomes: both children are boys
filtered = [pair for pair in sample_space if pair == ('B', 'B')]
filtered_length = len(filtered)

print(filtered)
print(filtered_length)
print(sample_length)
print(Fraction(filtered_length, sample_length))
```

[('B', 'B')]
1
3
1/3


### Day 3: Cards of the Same Suit
#### Objective
In this challenge, we're getting started with combinations and permutations. Check out the Tutorial tab for learning materials!

#### Task
You draw 2 cards from a standard 52-card deck without replacing them. What is the probability that both cards are of the same suit?

![image.png](attachment:image.png)

#### Solution:
Let's say that the first card drawn is of hearts, so its probability will be:

P(1H) = 13/52

Since the first card is not replaced, the probability that the next card is also a heart is:

P(2H) = 12/51

A standard 52-card deck has 4 suits, each with the same probability, so the answer is:

answer = {(13/52) * (12/51)} +  {(13/52) * (12/51)} + {(13/52) * (12/51)} + {(13/52) * (12/51)}

answer = { (13/52) * (12/51) } *  4

answer = {(1/4) * (12/51)} * 4

answer = 12/51 or 4/17

```python
from itertools import combinations
from fractions import Fraction

# one entry per card; only the suit matters: h(earts), s(pades), c(lubs), d(iamonds)
cards = list(13 * 'h' + 13 * 's' + 13 * 'c' + 13 * 'd')

# every unordered pair of distinct cards (combinations works on positions,
# so two hearts at different positions count as different cards)
all_pairs = list(combinations(cards, 2))
same_suit = [pair for pair in all_pairs if pair[0] == pair[1]]

print(len(same_suit))
print(len(all_pairs))
print(Fraction(len(same_suit), len(all_pairs)))
```

312
1326
4/17

### Day 3: Drawing Marbles
#### Objective
In this challenge, we're reinforcing what we've learned today. In case you've missed them, today's tutorials are on Conditional Probability and Combinations and Permutations.

#### Task
A bag contains 3 red marbles and 4 blue marbles. Then, 2 marbles are drawn from the bag, at random, without replacement. If the first marble drawn is red, what is the probability that the second marble is blue?

![image.png](attachment:image.png)

#### Solution:
Let R1 be the event that the first marble drawn is red and B2 the event that the second marble drawn is blue.

P(R1) = 3/7

Once a red marble has been removed, the bag holds 2 red and 4 blue marbles, so

P(B2|R1) = 4/6 = 2/3

The same result follows from the definition of conditional probability, counting ordered draws (7 * 6 = 42 in total):

P(R1 ∩ B2) = (3 * 4) / 42 = 12/42 = 2/7

P(B2|R1) = P(R1 ∩ B2) / P(R1) = (2/7) / (3/7) = 2/3
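The answer can be checked by brute force: list every ordered draw of 2 marbles without replacement and take the fraction of red-first draws whose second marble is blue. A sketch using `itertools.permutations`, which treats each list position as a distinct marble even when colours repeat:

```python
from fractions import Fraction
from itertools import permutations

# 3 red and 4 blue marbles
bag = ["R"] * 3 + ["B"] * 4
draws = list(permutations(bag, 2))  # ordered draws without replacement: 7 * 6

first_red = [d for d in draws if d[0] == "R"]
first_red_then_blue = [d for d in first_red if d[1] == "B"]

answer = Fraction(len(first_red_then_blue), len(first_red))
print(len(first_red_then_blue), len(first_red), answer)
```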

### Day 4: Binomial Distribution I:
#### Objective
In this challenge, we learn about binomial distributions. Check out the Tutorial tab for learning materials!

#### Task
The ratio of boys to girls for babies born in Russia is 1.09:1. If there is 1 child born per birth, what proportion of Russian families with exactly 6 children will have at least 3 boys?

Write a program to compute the answer using the above parameters. Then print your result, rounded to a scale of 3 decimal places (i.e., 1.234 format).
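In formula form: with a boys-to-girls ratio of 1.09 : 1, the probability that a given birth is a boy is p = 1.09 / 2.09 ≈ 0.522. Assuming births are independent, the number of boys X among 6 children is binomial, so

$$P(X \ge 3) = \sum_{x=3}^{6} \binom{6}{x} p^{x} (1-p)^{6-x} \approx 0.696, \qquad p = \frac{1.09}{1.09 + 1}$$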

#### Input Format

A single line containing the following values:

1.09 1

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print a single line denoting the answer, rounded to a scale of 3 decimal places (i.e., 1.234 format).

```python
from math import comb

def binomial_probability(boys: float, girls: float, n: int = 6) -> None:
    # probability that a single birth is a boy; not rounded, so no precision is lost
    p = boys / (boys + girls)
    q = 1 - p

    def binomial(x):
        # P(X = x) for X ~ Binomial(n, p)
        return comb(n, x) * (p ** x) * (q ** (n - x))

    # at least 3 boys out of n children: x = 3, 4, ..., n
    print(f"{sum(binomial(x) for x in range(3, n + 1)):.3f}")

if __name__ == "__main__":
    # the input is a single line, e.g. "1.09 1"
    boys, girls = map(float, input().split())
    binomial_probability(boys=boys, girls=girls)
```

0.696


```python
from math import factorial as f

b, g = map(float, input().split())
p = b / (b + g)  # probability of a boy
q = 1 - p
n = 6

def comb(n, r):
    # binomial coefficient via factorials
    return f(n) / (f(r) * f(n - r))

print(format(sum(comb(n, r) * (p ** r) * (q ** (n - r)) for r in range(3, 7)), '.3f'))
```

0.696


### Day 4: Binomial Distribution II
#### Objective
In this challenge, we go further with binomial distributions. We recommend reviewing the previous challenge's Tutorial before attempting this problem.

#### Task
A manufacturer of metal pistons finds that, on average, 12% of the pistons they manufacture are rejected because they are incorrectly sized. What is the probability that a batch of 10 pistons will contain:
 1. No more than 2 rejects?
 2. At least 2 rejects?
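The second question can also be answered through its complement: "at least 2 rejects" fails only when there are 0 or 1 rejects, so just two binomial terms are needed. A minimal sketch with the sample values (12% rejects, batch of 10) hard-coded; `binomial_pmf` is a hypothetical helper name:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.12
at_most_two = sum(binomial_pmf(x, n, p) for x in range(3))
# "at least 2" is the complement of "at most 1"
at_least_two = 1 - sum(binomial_pmf(x, n, p) for x in range(2))

print(f"{at_most_two:.3f}")
print(f"{at_least_two:.3f}")
```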

#### Input Format

A single line containing the following values (denoting the respective percentage of defective pistons and the size of the current batch of pistons):

12 10

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print the answer to each question on its own line:

 1. The first line should contain the probability that a batch of 10 pistons will contain no more than 2 rejects.
 2. The second line should contain the probability that a batch of 10 pistons will contain at least 2 rejects.

Round both of your answers to a scale of 3 decimal places (i.e., 1.234 format).

```python
from math import comb

def binomial(x, n, p):
    # P(X = x) for X ~ Binomial(n, p)
    return comb(n, x) * (p ** x) * ((1 - p) ** (n - x))

p, n = map(int, input().split())  # percentage of rejects, batch size

print(f"{sum(binomial(i, n, p / 100) for i in range(0, 3)):.3f}")      # no more than 2 rejects
print(f"{sum(binomial(i, n, p / 100) for i in range(2, n + 1)):.3f}")  # at least 2 rejects
```

0.891
0.342


### Day 4: Geometric Distribution I
#### Objective
In this challenge, we learn about geometric distributions. Check out the Tutorial tab for learning materials!

#### Task
The probability that a machine produces a defective product is 1/3. What is the probability that the 1st defect occurs on the 5th item produced?
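Assuming each item is defective independently with probability p, the first defect lands on item n exactly when the first n - 1 items are good and item n is defective, which gives q^(n-1) · p with q = 1 - p. Using `fractions.Fraction` keeps the result exact for the sample values (p = 1/3, n = 5, hard-coded here):

```python
from fractions import Fraction

p = Fraction(1, 3)  # probability of a defect
q = 1 - p
n = 5               # first defect on the 5th item

# geometric pmf: n - 1 good items, then one defect
prob = q ** (n - 1) * p
print(prob, f"{float(prob):.3f}")
```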

#### Input Format

The first line contains the respective space-separated numerator and denominator for the probability of a defect, and the second line contains the inspection we want the probability of being the first defect for:

1 3

5

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print a single line denoting the answer, rounded to a scale of 3 decimal places (i.e., 1.234 format).

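```python
numerator, denominator = map(int, input().split())
n = int(input())

p = numerator / denominator  # probability of a defect
q = 1 - p

# first defect on the n-th item: n - 1 good items, then a defect
answer = q ** (n - 1) * p
print(f"{answer:.3f}")
```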

0.066


### Day 4: Geometric Distribution II
#### Objective
In this challenge, we go further with geometric distributions. We recommend reviewing the Geometric Distribution tutorial before attempting this challenge.

#### Task
The probability that a machine produces a defective product is 1/3. What is the probability that the 1st defect is found during the first 5 inspections?
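Rather than summing the geometric probabilities for inspections 1 through n, the complement gives a closed form: the first defect is found within n inspections unless all n items are good, so the probability is 1 - q^n. A sketch with the sample values hard-coded, checking that both routes agree exactly:

```python
from fractions import Fraction

p = Fraction(1, 3)
q = 1 - p
n = 5

# P(first defect within n inspections) = 1 - P(no defect in n inspections)
closed_form = 1 - q ** n
summed = sum(q ** (k - 1) * p for k in range(1, n + 1))

print(closed_form, closed_form == summed, f"{float(closed_form):.3f}")
```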

#### Input Format

The first line contains the respective space-separated numerator and denominator for the probability of a defect, and the second line contains the inspection we want the probability of the first defect being discovered by:

1 3

5

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print a single line denoting the answer, rounded to a scale of 3 decimal places (i.e., 1.234 format).

```python
numerator, denominator = map(int, input().split())
n = int(input())

p = numerator / denominator
q = 1 - p

def geo_prob(n, p, q):
    # probability that the first defect occurs on inspection n
    return q ** (n - 1) * p

# first defect during the first n inspections: sum over inspections 1..n
print(f"{sum(geo_prob(i, p, q) for i in range(1, n + 1)):.3f}")
```

0.868


### Day 5: Poisson Distribution I
#### Objective
In this challenge, we learn about Poisson distributions. Check out the Tutorial tab for learning materials!

#### Task
A random variable, X, follows Poisson distribution with mean of 2.5. Find the probability with which the random variable X is equal to 5.

#### Input Format

The first line contains X's mean. The second line contains the value we want the probability for:

2.5

5

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print a single line denoting the answer, rounded to a scale of 3 decimal places (i.e., 1.234 format).

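```python
import math

L = float(input())  # mean (lambda)
X = int(input())    # value whose probability we want

# Poisson pmf: lambda^x * e^(-lambda) / x!  (math.exp instead of a truncated value of e)
P_XL = (L ** X) * math.exp(-L) / math.factorial(X)

print(f"{P_XL:.3f}")
```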

0.067


### Day 5: Poisson Distribution II
#### Objective
In this challenge, we go further with Poisson distributions. We recommend reviewing the previous challenge's Tutorial before attempting this problem.

#### Task
The manager of an industrial plant is planning to buy a machine of either type A or type B. For each day’s operation:

 - The number of repairs, X, that machine A needs is a Poisson random variable with mean 0.88. The daily cost of operating A is C(A) = 160 + 40X^2.
 - The number of repairs, Y, that machine B needs is a Poisson random variable with mean 1.55. The daily cost of operating B is C(B) = 128 + 40Y^2.

Assume that the repairs take a negligible amount of time and the machines are maintained nightly to ensure that they operate like new at the start of each day. Find and print the expected daily cost for each machine.
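The expected cost depends on E[X^2], not on E[X]. For a Poisson variable with mean λ, Var(X) = λ, so E[X^2] = Var(X) + E[X]^2 = λ + λ^2, which gives E[C(A)] = 160 + 40(λ_A + λ_A^2) and E[C(B)] = 128 + 40(λ_B + λ_B^2). A quick numerical sketch of that identity sums k^2 times the Poisson pmf; the cutoff `kmax=100` is an arbitrary choice where the remaining tail is negligible for these means, and `poisson_second_moment` is a hypothetical helper name:

```python
import math

def poisson_second_moment(lam, kmax=100):
    """Approximate E[X^2] for X ~ Poisson(lam) by summing k^2 * P(X = k)."""
    pmf = math.exp(-lam)  # P(X = 0)
    total = 0.0
    for k in range(kmax + 1):
        total += k * k * pmf
        pmf *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    return total

for lam, base in [(0.88, 160), (1.55, 128)]:
    second_moment = poisson_second_moment(lam)
    print(f"{second_moment:.6f} {lam + lam ** 2:.6f} cost={base + 40 * second_moment:.3f}")
```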

#### Input Format

A single line comprised of 2 space-separated values denoting the respective means for A and B:

0.88 1.55

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

There are two lines of output. Your answers must be rounded to a scale of 3 decimal places (i.e., 1.234 format):

 - On the first line, print the expected daily cost of machine A.
 - On the second line, print the expected daily cost of machine B.

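```python
a, b = map(float, input().split())

# E[X^2] = Var(X) + E[X]^2 = lambda + lambda^2 for a Poisson variable
costA = 160 + 40 * (a + a ** 2)
costB = 128 + 40 * (b + b ** 2)

# format to exactly 3 decimal places (1.234 format)
print(f"{costA:.3f}")
print(f"{costB:.3f}")
```

226.176
286.100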


### Day 5: Normal Distribution I
#### Objective
In this challenge, we learn about normal distributions. Check out the Tutorial tab for learning materials!

#### Task
In a certain plant, the time taken to assemble a car is a random variable, X, having a normal distribution with a mean of 20 hours and a standard deviation of 2 hours. What is the probability that a car can be assembled at this plant in:

 - Less than 19.5 hours?
 - Between 20 and 22 hours?
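As an alternative to writing the erf formula by hand, Python 3.8+ provides `statistics.NormalDist`, whose `cdf` method returns P(X ≤ x). A minimal sketch for the two questions, with the parameters hard-coded rather than read from stdin:

```python
from statistics import NormalDist

X = NormalDist(mu=20, sigma=2)

less_than = X.cdf(19.5)          # P(X < 19.5)
between = X.cdf(22) - X.cdf(20)  # P(20 < X < 22)

print(f"{less_than:.3f}")
print(f"{between:.3f}")
```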

#### Input Format

There are 3 lines of input (shown below):

20 2

19.5

20 22

The first line contains 2 space-separated values denoting the respective mean and standard deviation for X. The second line contains the number associated with question 1. The third line contains 2 space-separated values describing the respective lower and upper range boundaries for question 2.

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

There are two lines of output. Your answers must be rounded to a scale of 3 decimal places (i.e., 1.234 format):

 - On the first line, print the answer to question 1 (i.e., the probability that a car can be assembled in less than 19.5 hours).
 - On the second line, print the answer to question 2 (i.e., the probability that a car can be assembled in between 20 and 22 hours).

```python
import math

def normal_cdf(x, mean, std):
    # P(X <= x) for X ~ N(mean, std^2)
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

m, std = map(float, input().split())
a = float(input())
b, c = map(float, input().split())

print(f"{normal_cdf(a, m, std):.3f}")                          # P(X < a)
print(f"{normal_cdf(c, m, std) - normal_cdf(b, m, std):.3f}")  # P(b < X < c)
```

0.401
0.341


### Day 5: Normal Distribution II
#### Objective
In this challenge, we go further with normal distributions. We recommend reviewing the previous challenge's Tutorial before attempting this problem.

#### Task
The final grades for a Physics exam taken by a large group of students have a mean of 70 and a standard deviation of 10. If we can approximate the distribution of these grades by a normal distribution, what percentage of the students:

 - Scored higher than 80 (i.e., have a grade > 80 )?
 - Passed the test (i.e., have a grade >= 60)?
 - Failed the test (i.e., have a grade < 60)?

Find and print the answer to each question on a new line, rounded to a scale of 2 decimal places.

#### Input Format

There are 3 lines of input (shown below):

70 10

80

60

The first line contains 2 space-separated values denoting the respective mean and standard deviation for the exam. The second line contains the number associated with question 1. The third line contains the pass/fail threshold number associated with questions 2 and 3.

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

There are three lines of output. Your answers must be rounded to a scale of 2 decimal places (i.e., 1.23 format):

 - On the first line, print the answer to question 1 (i.e., the percentage of students having grade > 80).
 - On the second line, print the answer to question 2 (i.e., the percentage of students having grade >= 60).
 - On the third line, print the answer to question 3 (i.e., the percentage of students having grade < 60).

```python
import math

def calculate_percent(mean, std_dev, x):
    # percentage of grades below x, i.e. 100 * P(X < x)
    p = (1/2 * (1 + (math.erf((x - mean) / (std_dev * math.sqrt(2)))))) * 100
    return p

if __name__ == "__main__":
    # take inputs
    mean, std_dev = map(float, input().split())
    x1 = float(input())
    x2 = float(input())

    p1 = 100 - calculate_percent(mean, std_dev, x1)  # grade > 80
    p2 = 100 - calculate_percent(mean, std_dev, x2)  # grade >= 60
    p3 = calculate_percent(mean, std_dev, x2)        # grade < 60

    print(f"{p1:.2f}")
    print(f"{p2:.2f}")
    print(f"{p3:.2f}")
```

15.87
84.13
15.87


### Day 6: The Central Limit Theorem I
#### Objective
In this challenge, we practice solving problems based on the Central Limit Theorem. Check out the Tutorial tab for learning materials!

#### Task
A large elevator can transport a maximum of 9800 pounds. Suppose a load of cargo containing 49 boxes must be transported via the elevator. The box weight of this type of cargo follows a distribution with a mean of 205 pounds and a standard deviation of 15 pounds. Based on this information, what is the probability that all 49 boxes can be safely loaded into the freight elevator and transported?

#### Input Format

There are 4 lines of input (shown below):

9800

49

205

15

The first line contains the maximum weight the elevator can transport. The second line contains the number of boxes in the cargo. The third line contains the mean weight of a cargo box, and the fourth line contains its standard deviation.

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print the probability that the elevator can successfully transport all 49 boxes, rounded to a scale of 4 decimal places (i.e., 1.2345 format).

In [3]:
import math

def calculate_probability(mean, std_dev, x):
    p = 1/2 * (1 + (math.erf((x-mean)/(std_dev * math.sqrt(2)))))
    return p

if __name__ == "__main__":
    X = int(input())
    N = int(input())
    mean_weight = int(input())
    standard_dev = int(input())
    
    mean = mean_weight * N
    std_dev = standard_dev * math.sqrt(N)
    
    p = calculate_probability(mean=mean, std_dev=std_dev, x=X)
    
    print(round(p,4))

0.0098
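The CLT says the total weight of 49 boxes is approximately normal whatever the shape of the individual box-weight distribution. The problem only gives its mean and standard deviation, so the Monte Carlo sketch below has to **assume** a shape: here each box weight is uniform on an interval with mean 205 and standard deviation 15 (a uniform on `[mu - h, mu + h]` has standard deviation `h / sqrt(3)`). The function name and the uniform choice are illustrative assumptions. The estimate should land close to the normal-approximation answer of 0.0098.

```python
import math
import random

def simulate_total_weight_probability(limit, n, mu, sigma, trials=20_000, seed=0):
    """Estimate P(sum of n box weights <= limit) by simulation.

    ASSUMPTION: each box weight is uniform with the given mean and standard
    deviation. The original problem does not specify the distribution's shape.
    """
    rng = random.Random(seed)               # fixed seed for a reproducible estimate
    half_width = sigma * math.sqrt(3)       # uniform on [mu - h, mu + h] has std h / sqrt(3)
    low, high = mu - half_width, mu + half_width
    hits = 0
    for _ in range(trials):
        total = sum(rng.uniform(low, high) for _ in range(n))
        if total <= limit:
            hits += 1
    return hits / trials

estimate = simulate_total_weight_probability(limit=9800, n=49, mu=205, sigma=15)
print(round(estimate, 4))  # close to 0.0098; the exact value depends on the seed
```

Swapping in a different (assumed) box-weight distribution with the same mean and standard deviation should give a similar estimate, which is the point of the CLT.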


### Day 6: The Central Limit Theorem II
#### Objective
In this challenge, we practice solving problems based on the Central Limit Theorem. We recommend reviewing the Central Limit Theorem Tutorial before attempting this challenge.

#### Task
The number of tickets purchased by each student for the University X vs. University Y football game follows a distribution that has a mean of 2.4 and a standard deviation of 2.0.

A few hours before the game starts, 100 eager students line up to purchase last-minute tickets. If there are only 250 tickets left, what is the probability that all 100 students will be able to purchase tickets?

#### Input Format

There are 4 lines of input (shown below):

250

100

2.4

2.0

The first line contains the number of last-minute tickets available at the box office. The second line contains the number of students waiting to buy tickets. The third line contains the mean number of purchased tickets, and the fourth line contains the standard deviation.

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print the probability that 100 students can successfully purchase the remaining 250 tickets, rounded to a scale of 4 decimal places (i.e., 1.2345 format).

In [4]:
import math

def calculate_probability(mean, std_dev, x):
    p = 1/2 * (1 + (math.erf((x-mean)/(std_dev * math.sqrt(2)))))
    return p

if __name__ == "__main__":
    last_minute_tickets = int(input())
    no_of_students = int(input())
    mean_purchased_tickets = float(input())
    standard_dev = float(input())
    
    mean = mean_purchased_tickets * no_of_students
    std_dev = standard_dev * math.sqrt(no_of_students)
    
    p = calculate_probability(mean=mean, std_dev=std_dev, x=last_minute_tickets)
    
    print(round(p,4))

0.6915
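CLT I and CLT II are the same computation: a sum of `n` independent draws has mean `n * mu` and standard deviation `sigma * sqrt(n)`, so the answer is the standard normal CDF at `z = (limit - n * mu) / (sigma * sqrt(n))`. A minimal sketch with one hypothetical helper, `prob_total_at_most`, that solves both problems and also returns the z-score:

```python
import math

def prob_total_at_most(limit, n, mu, sigma):
    """CLT (normal) approximation to P(X_1 + ... + X_n <= limit)."""
    total_mean = n * mu                  # mean of the sum
    total_sd = sigma * math.sqrt(n)      # standard deviation of the sum
    z = (limit - total_mean) / total_sd
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    return z, p

z_tickets, p_tickets = prob_total_at_most(limit=250, n=100, mu=2.4, sigma=2.0)
z_boxes, p_boxes = prob_total_at_most(limit=9800, n=49, mu=205, sigma=15)
print(round(z_tickets, 4), round(p_tickets, 4))  # 0.5 0.6915
print(round(z_boxes, 4), round(p_boxes, 4))      # -2.3333 0.0098
```

Seeing the z-scores makes the answers intuitive: 250 tickets is half a standard deviation above the expected 240, while 9800 pounds is about 2.33 standard deviations below the expected 10045.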


### Day 6: The Central Limit Theorem III
#### Objective
In this challenge, we practice solving problems based on the Central Limit Theorem. We recommend reviewing the Central Limit Theorem Tutorial before attempting this challenge.

#### Task
You have a sample of 100 values from a population with mean 500 and with standard deviation 80. Compute the interval that covers the middle 95% of the distribution of the sample mean; in other words, compute A and B such that P(A < x̄ < B) = 0.95, where x̄ is the sample mean. Use the value of z = 1.96. Note that z is the z-score.

#### Input Format

There are five lines of input (shown below):

100

500

80

.95

1.96

The first line contains the sample size. The second and third lines contain the respective mean (μ) and standard deviation (σ). The fourth line contains the distribution percentage we want to cover (as a decimal), and the fifth line contains the value of z.

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print the following two lines of output, rounded to a scale of 2 decimal places (i.e., 1.23 format):

 - On the first line, print the value of A.
 - On the second line, print the value of B.

In [5]:
import math

def lower_bound(mean, z, std_dev, n):
    # A = mean - z * (std_dev / sqrt(n)); std_dev / sqrt(n) is the standard error of the sample mean
    return mean - z * (std_dev / math.sqrt(n))

def upper_bound(mean, z, std_dev, n):
    # B = mean + z * (std_dev / sqrt(n))
    return mean + z * (std_dev / math.sqrt(n))

if __name__ == "__main__":
    sample_size = int(input())
    population_mean = float(input())
    population_stddev = float(input())
    dist_percent = float(input())  # coverage (0.95); read to consume the input, z is given directly
    z_score = float(input())
    
    A = lower_bound(mean=population_mean, z=z_score, std_dev=population_stddev, n=sample_size)
    print(round(A,2))
    
    B = upper_bound(mean=population_mean, z=z_score, std_dev=population_stddev, n=sample_size)
    print(round(B,2))

484.32
515.68
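The problem hands us z = 1.96, but that value follows from the coverage on the fourth input line: the middle 95% leaves 2.5% in each tail, so z is the standard normal quantile at 0.975. A sketch, using a hypothetical `middle_interval` helper, that derives z with `statistics.NormalDist().inv_cdf` (Python 3.8+) instead of taking it as input:

```python
from statistics import NormalDist

def middle_interval(mu, sigma, n, coverage):
    """Interval (A, B) holding the middle `coverage` fraction of the sample-mean distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # two-sided critical value
    standard_error = sigma / n ** 0.5             # std of the sample mean
    return z, mu - z * standard_error, mu + z * standard_error

z, A, B = middle_interval(mu=500, sigma=80, n=100, coverage=0.95)
print(round(z, 2), round(A, 2), round(B, 2))  # 1.96 484.32 515.68
```

The exact quantile is about 1.95996, so the interval agrees with the hard-coded z = 1.96 to two decimal places.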


### Day 7: Pearson Correlation Coefficient I
#### Objective
In this challenge, we practice calculating the Pearson correlation coefficient. Check out the Tutorial tab for learning materials!

#### Task
Given two n-element data sets, X and Y, calculate the value of the Pearson correlation coefficient.

#### Input Format
The first line contains an integer, n, denoting the size of data sets X and Y.

The second line contains n space-separated real numbers (scaled to at most one decimal place), defining data set X.

The third line contains n space-separated real numbers (scaled to at most one decimal place), defining data set Y.

#### Constraints

![image.png](attachment:image.png)

#### Output Format

Print the value of the Pearson correlation coefficient, rounded to a scale of 3 decimal places.

#### Sample Input

10

10 9.8 8 7.8 7.7 7 6 5 4 2

200 44 32 24 22 17 15 12 8 4

#### Sample Output

0.612

#### Explanation

![image-2.png](attachment:image-2.png)


In [11]:
import statistics
import math

def calculate_pearson_corr_coeff(N, X, Y):
    mean_x = statistics.mean(X)
    mean_y = statistics.mean(Y)
    std_x = statistics.pstdev(X)
    std_y = statistics.pstdev(Y)
    COVxy = 0
    for i in range(N):
        COVxy += ((X[i] - mean_x) * (Y[i] - mean_y))
    # divide once, after the covariance sum is complete
    pearson = COVxy/(N*std_x*std_y)
    print(round(pearson,3))

if __name__ == "__main__":
    n = int(input())
    x = list(map(float, input().split()))
    y = list(map(float, input().split()))
    
    calculate_pearson_corr_coeff(N=n, X=x, Y=y)
    

0.612
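An equivalent "computational" form of the coefficient uses only running sums, with no separate mean or standard-deviation pass: r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)). The sketch below is a dependency-free alternative, not the notebook's method. On Python 3.10+, `statistics.correlation(x, y)` should give the same value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via the sum-based (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
print(round(pearson_r(x, y), 3))  # 0.612
```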


### Day 7: Spearman's Rank Correlation Coefficient
#### Objective
In this challenge, we practice calculating Spearman's rank correlation coefficient. Check out the Tutorial tab for learning materials!

#### Task
Given two n-element data sets, X and Y, calculate the value of Spearman's rank correlation coefficient.

#### Input Format
The first line contains an integer, n, denoting the number of values in data sets X and Y.

The second line contains n space-separated real numbers (scaled to at most one decimal place) denoting data set X.

The third line contains n space-separated real numbers (scaled to at most one decimal place) denoting data set Y.

#### Constraints

![image.png](attachment:image.png)

#### Output Format

Print the value of Spearman's rank correlation coefficient, rounded to a scale of 3 decimal places.

#### Sample Input

10

10 9.8 8 7.8 7.7 1.7 6 5 1.4 2

200 44 32 24 22 17 15 12 8 4

#### Sample Output

0.903

#### Explanation

We know that data sets X and Y both contain unique values, so the rank of each value in each data set is unique. Because of this property, we can use the following formula to calculate the value of Spearman's rank correlation coefficient:

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

In [16]:
def calculate_spearman_corr_coeff(N, X, Y):
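    # 1-based rank of each value; a simple lookup works because all values are unique
    def rank(arr):
        positions = {value: i + 1 for i, value in enumerate(sorted(arr))}
        return [positions[value] for value in arr]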
    rank_x = rank(X)
    rank_y = rank(Y)
    
    di = 0
    
    for i in range(N):
        di+= (rank_x[i] - rank_y[i])**2
    
    spearman = 1 - ((6 * di) / (N * ((N**2) - 1)))
    
    print(round(spearman,3))
    
if __name__ == "__main__":
    n = int(input())
    x = list(map(float,input().split()))
    y = list(map(float,input().split()))
    
    calculate_spearman_corr_coeff(N=n, X=x, Y=y)

0.903
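The 1 − 6Σd²/(n(n² − 1)) shortcut relies on the values being unique. A more general way to compute Spearman's coefficient is to assign tied values the average of the positions they occupy, then take the Pearson correlation of the two rank lists; with unique values this reduces to the shortcut formula. A sketch under that approach (helper names are illustrative):

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

x = [10, 9.8, 8, 7.8, 7.7, 1.7, 6, 5, 1.4, 2]
y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
print(round(spearman_rho(x, y), 3))  # 0.903
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```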


### Day 8: Least Square Regression Line
#### Objective
In this challenge, we practice using linear regression techniques. Check out the Tutorial tab for learning materials!

#### Task
A group of five students enrolls in Statistics immediately after taking a Math aptitude test. Each student's Math aptitude test score, x, and Statistics course grade, y, can be expressed as the following list of (x,y) points:
 1. (95, 85)
 2. (85, 95)
 3. (80, 70)
 4. (70, 65)
 5. (60, 70)

If a student scored an 80 on the Math aptitude test, what grade would we expect them to achieve in Statistics? Determine the equation of the best-fit line using the least squares method, then compute and print the value of y when x = 80.

#### Input Format

There are five lines of input; each line contains two space-separated integers describing a student's respective x and y grades:

95 85

85 95

80 70

70 65

60 70

If you do not wish to read this information from stdin, you can hard-code it into your program.

#### Output Format

Print a single line denoting the answer, rounded to a scale of 3 decimal places (i.e., 1.234 format).

In [4]:
import math
import statistics

def calculate_pearson_corr_coeff(N, X, Y):
    mean_x = statistics.mean(X)
    mean_y = statistics.mean(Y)
    std_x = statistics.pstdev(X)
    std_y = statistics.pstdev(Y)
    COVxy = 0
    for i in range(N):
        COVxy += ((X[i] - mean_x) * (Y[i] - mean_y))
    # divide once, after the covariance sum is complete
    pearson = COVxy/(N*std_x*std_y)
    return pearson, mean_x, mean_y, std_x, std_y

if __name__ == "__main__":
    x1,y1 = map(int, input().split())
    x2,y2 = map(int, input().split())
    x3,y3 = map(int, input().split())
    x4,y4 = map(int, input().split())
    x5,y5 = map(int, input().split())
    
    x = [x1, x2, x3, x4, x5]
    y = [y1, y2, y3, y4, y5]
    
    N = len(x)
    
    pearson, mean_x, mean_y, std_x, std_y = calculate_pearson_corr_coeff(N=N, X=x, Y=y)
    
    b = pearson * (std_y/std_x)
    
    a = mean_y - (b * mean_x)
    
    x_test = 80
    
    y_hat = a + (b * x_test)
    
    print(round(y_hat,3))

78.288
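The slope b = r · (σy/σx) used above simplifies to b = Sxy / Sxx, where Sxy = Σ(x − x̄)(y − ȳ) and Sxx = Σ(x − x̄)², so the standard deviations cancel and the fit can be written without computing r at all. A minimal sketch with a hypothetical `fit_line` helper (on Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept):

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # equals r * (std_y / std_x)
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

a, b = fit_line([95, 85, 80, 70, 60], [85, 95, 70, 65, 70])
y_hat = a + b * 80
print(round(y_hat, 3))  # 78.288
```

For this data Sxy = 470 and Sxx = 730, so b = 47/73 ≈ 0.644.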


### Day 8: Pearson Correlation Coefficient II

The regression line of y on x is 3x + 4y + 8 = 0, and the regression line of x on y is 4x + 3y + 7 = 0. What is the value of the Pearson correlation coefficient?

Note: If you haven't seen it already, you may find our Pearson Correlation Coefficient Tutorial helpful in answering this question.

![image.png](attachment:image.png)

#### Solution:
i) Regression line of y on x is 3x + 4y + 8 = 0

4y = -8 - 3x

y = -2 -(3/4)x

b1 = -3/4

ii) Regression line of x on y is 4x + 3y + 7 = 0

4x = -7 -3y

x = -7/4 - (3/4)y

b2 = -3/4

b1 = p * (std_y/std_x)    (III)

b2 = p * (std_x/std_y)    (IV)

Multiplying (III) and (IV):

b1 * b2 = p^2

p^2 = (-3/4) * (-3/4) = 9/16

p = ± 3/4

In general, straight lines have slopes that are positive, negative, or zero. If we were to examine our least-squares regression lines and compare the corresponding values of r, we would notice that whenever our data has a negative correlation coefficient, the slope of the regression line is negative. Similarly, whenever we have a positive correlation coefficient, the slope of the regression line is positive.

The formula for the slope b of the regression line is:

b = r(sy/sx)

The calculation of a standard deviation involves taking the positive square root of a nonnegative number. As a result, both standard deviations in the formula for the slope must be nonnegative. If we assume that there is some variation in our data, we can disregard the possibility that either of these standard deviations is zero. Therefore, the sign of the correlation coefficient is the same as the sign of the slope of the regression line.

Both slopes, b1 and b2, are negative; therefore, p = -3/4.
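The derivation can be checked mechanically: read each slope off its line, multiply them to get p², and take the sign from the slopes. The sketch below writes each line as coefficients (cx, cy, c0) of cx·x + cy·y + c0 = 0; the function name and that encoding are assumptions made for illustration. It also rejects inputs where the slopes disagree in sign or their product exceeds 1, since p² can never exceed 1; that check is what flags two regression lines given in the wrong roles.

```python
import math

def r_from_regression_lines(y_on_x, x_on_y):
    """Each line is (cx, cy, c0), meaning cx*x + cy*y + c0 = 0."""
    cx1, cy1, _ = y_on_x
    cx2, cy2, _ = x_on_y
    slope_y_on_x = -cx1 / cy1  # solve for y: y = -(cx1/cy1) x - c0/cy1
    slope_x_on_y = -cy2 / cx2  # solve for x: x = -(cy2/cx2) y - c0/cx2
    product = slope_y_on_x * slope_x_on_y  # equals p^2
    if product < 0 or product > 1:
        raise ValueError("slopes must share a sign and their product must not exceed 1")
    # p takes the common sign of the two regression slopes
    return math.copysign(math.sqrt(product), slope_y_on_x)

print(r_from_regression_lines((3, 4, 8), (4, 3, 7)))  # -0.75
```

Swapping the two lines, i.e. treating 4x + 3y + 7 = 0 as the regression of y on x, gives a slope product of 16/9 > 1, so that assignment is impossible.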

### Day 9: Multiple Linear Regression
#### Objective
In this challenge, we practice using multiple linear regression. Check out the Tutorial tab for learning materials!

#### Task
Andrea has a simple equation:

Y = a + b1 . f1 + b2 . f2 + ... + bm . fm

for (m + 1) real constants (a, b1, b2, ..., bm). We can say that the value of Y depends on m features. Andrea studies this equation for n different feature sets (f1, f2, f3, ..., fm) and records each respective value of Y. If she has q new feature sets, can you help Andrea find the value of Y for each of the sets?

Note: You are not expected to account for bias and variance trade-offs.

#### Input Format

The first line contains 2 space-separated integers, m (the number of observed features) and n (the number of feature sets Andrea studied), respectively. Each of the n subsequent lines contains (m + 1) space-separated decimals; the first m elements are features (f1, f2, f3, ..., fm), and the last element is the value of Y for the line's feature set. The next line contains a single integer, q, denoting the number of feature sets Andrea wants to query for. Each of the q subsequent lines contains m space-separated decimals describing the feature sets.

#### Constraints

![image.png](attachment:image.png)

#### Scoring
For each feature set in one test case, we will compute the following:

![image-2.png](attachment:image-2.png)

#### Output Format

For each of the q feature sets, print the value of Y on a new line (i.e., you must print a total of q lines).

#### Sample Input

2 7

0.18 0.89 109.85

1.0 0.26 155.72

0.92 0.11 137.66

0.07 0.37 76.17

0.85 0.16 139.75

0.99 0.41 162.6

0.87 0.47 151.77

4

0.49 0.18

0.57 0.83

0.56 0.64

0.76 0.18

#### Sample Output

105.22

142.68

132.94

129.71

#### Explanation

![image-3.png](attachment:image-3.png)

In [2]:
import numpy as np

m,n = map(int, input().split())
x = []
y = []

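for i in range(n):
    # prepend a 1 so the first coefficient acts as the intercept a
    x.append([1] + [float(a) for a in input().split()])
    # the last value on each line is Y; move it out of the feature row
    y.append(x[i].pop(-1))
    
X = np.array(x)
Y = np.array(y)

# normal equation: B = (X^T X)^(-1) X^T Y gives the least-squares coefficients [a, b1, ..., bm]
B = np.dot(np.dot(np.linalg.inv(np.dot(np.transpose(X),X)), np.transpose(X)), Y)

q = int(input())

for i in range(q):
    # predict Y = a + b1*f1 + ... + bm*fm for each query feature set
    print(round(np.dot(np.array([1] + list(map(float, input().split()))), B),2))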

105.21
142.67
132.94
129.7
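Explicitly inverting XᵀX works for this small, well-conditioned problem, but `np.linalg.lstsq` solves the same least-squares problem without forming the inverse, which is generally more numerically robust. A sketch of that alternative (the helper name is illustrative) using the sample data:

```python
import numpy as np

def fit_and_predict(features, targets, queries):
    """Least-squares fit of Y = a + b1*f1 + ... + bm*fm, then predict for the queries."""
    features = np.asarray(features, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # a leading column of ones lets the first coefficient act as the intercept a
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, np.asarray(targets, dtype=float), rcond=None)
    Q = np.column_stack([np.ones(len(queries)), queries])
    return Q @ coef

train = [[0.18, 0.89], [1.0, 0.26], [0.92, 0.11], [0.07, 0.37],
         [0.85, 0.16], [0.99, 0.41], [0.87, 0.47]]
y_train = [109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77]
queries = [[0.49, 0.18], [0.57, 0.83], [0.56, 0.64], [0.76, 0.18]]

for value in fit_and_predict(train, y_train, queries):
    print(f"{value:.2f}")
```

This should agree with the normal-equation results above to within rounding; the `:.2f` format keeps a trailing zero (e.g. `129.70`) that `round` drops.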
