# Statistical User-Defined Functions

Creating statistical user-defined functions in Python means writing custom functions that perform statistical calculations on lists of numbers. This notebook covers the following measures:

Mean (Average),
Median,
Mode,
Variance,
Standard Deviation,
Range,
Covariance,
Correlation,
Skewness,
Kurtosis.


# Mean
Explanation: The mean (average) is calculated by summing all the numbers in the list and dividing by the number of elements in the list.

In [3]:
def cal_mean(num):
    if len(num) == 0:
        return 0                           # Return 0 for an empty list

    # Sum all the values and divide by the number of elements
    mean = sum(num) / len(num)

    return mean

# Example
num = [1,2,3,4,5]
mean = cal_mean(num)
print("The mean of the list is:", mean)


The mean of the list is: 3.0


# Median
Explanation: The median is the middle value in a sorted list. If the list length is even, it's the average of the two middle values.

In [2]:
def cal_median(num):
    if len(num) == 0:
        return 0                            # Return 0 for an empty list

    sort_num = sorted(num)
    n = len(sort_num)
    mid = n // 2

    if n % 2 == 0:
        # If even, average the two middle numbers (indices mid - 1 and mid)
        med = (sort_num[mid - 1] + sort_num[mid]) / 2
    else:
        # If odd, take the middle number
        med = sort_num[mid]

    return med

# Example usage
num = [10, 20, 30, 40, 50]
median = cal_median(num)
print("The median of the list is: ", median)

The median of the list is:  30
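
The example above uses an odd-length list. As a minimal sketch of the even-length case, the snippet below averages the two middle values and compares the result with `statistics.median` from the standard library. The helper name `median_of` is hypothetical and not part of the original notebook, and it assumes a non-empty list.

```python
import statistics

def median_of(values):
    """Hypothetical helper: median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # Even length: the two middle values sit at indices mid - 1 and mid
        return (ordered[mid - 1] + ordered[mid]) / 2
    # Odd length: the middle value sits at index mid
    return ordered[mid]

even_list = [10, 20, 30, 40]
print(median_of(even_list))           # 25.0
print(statistics.median(even_list))   # 25.0
```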



# Mode
Explanation: The mode is the number that appears most frequently in the list. If there's a tie, all the most frequent numbers are returned.

In [6]:
def cal_mode(numb):
   
    if len(numb) == 0:
        return 0                                   # Return 0 for an empty list

    # Create a dictionary to count the occurrences of each number
    freq = {}
    for number in numb:
        if number in freq:
            freq[number] += 1
        else:
            freq[number] = 1

    # Find the maximum frequency
    max_count = max(freq.values())

    # Identify the modes
    modes = [num for num, count in freq.items() if count == max_count]

    # If there is only one mode, return it, otherwise return the list of modes
    return modes[0] if len(modes) == 1 else modes

# Example usage
numb = [10, 20, 20, 30, 40, 50, 50]
mode = cal_mode(numb)
print("The mode of the list is: ",mode)


The mode of the list is:  [20, 50]
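
The frequency dictionary built by hand above can also be produced with `collections.Counter` from the standard library. The following is a sketch under that assumption, not a replacement for `cal_mode`. The helper name `modes_of` is hypothetical, and unlike `cal_mode` it always returns a list, so callers do not have to check the return type.

```python
from collections import Counter

def modes_of(values):
    """Hypothetical helper: list of all most-frequent values."""
    counts = Counter(values)          # maps each value to its frequency
    if not counts:
        return []                     # no mode for an empty list
    top = max(counts.values())        # highest frequency
    return [value for value, count in counts.items() if count == top]

print(modes_of([10, 20, 20, 30, 40, 50, 50]))   # [20, 50]
print(modes_of([1, 1, 2]))                      # [1]
```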


# Variance
Explanation: Variance measures how spread out the numbers are. It's calculated as the average of the squared differences from the mean.

In [4]:
def cal_var(num):
    
    if len(num) == 0:
        return 0                               # Return 0 for an empty list
   
    # Calculate the mean
    mean = sum(num) / len(num)
    
    # Calculate the squared differences from the mean
    squared_diffs = [(x - mean) ** 2 for x in num]
    
    # Calculate the variance
    var = sum(squared_diffs) / len(num)
    
    return var

# Example usage
num = [10, 20, 30, 40, 50]
var = cal_var(num)
print("The variance of the list is: ",var)


The variance of the list is:  200.0
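
`cal_var` divides by the number of elements, which gives the population variance. When the list is a sample drawn from a larger population, the sum of squared differences is commonly divided by n - 1 instead (the sample variance). The snippet below is a minimal sketch comparing the two and checking them against `statistics.pvariance` and `statistics.variance` from the standard library. The variable names are illustrative only.

```python
import statistics

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Sum of squared differences from the mean
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq / n             # population variance: divide by n
sample_var = sum_sq / (n - 1)    # sample variance: divide by n - 1

print(pop_var == statistics.pvariance(data))     # True
print(sample_var == statistics.variance(data))   # True
```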


# Standard Deviation
Explanation: The standard deviation is the square root of the variance. It indicates how much the numbers in the list deviate from the mean.

In [7]:
import math

def cal_SD(numb):
   
    if len(numb) == 0:
        return 0                                    # Return 0 for an empty list

    # Calculate the mean
    mean = sum(numb) / len(numb)
    
    # Calculate the squared differences from the mean
    squared_diffs = [(x - mean) ** 2 for x in numb]
    
    # Calculate the variance
    var = sum(squared_diffs) / len(numb)
    
    # Calculate the standard deviation
    SD = math.sqrt(var)
    
    return SD

# Example usage
numb = [10, 20, 30, 40, 50]
SD = cal_SD(numb)
print("The standard deviation of the list is: ",SD)


The standard deviation of the list is:  14.142135623730951


# Range
Explanation: The range is the difference between the largest and smallest values in the list.

In [8]:
def cal_range(numb):

    if len(numb) == 0:
        return 0                                 # Return 0 for an empty list

    # Calculate the range
    range_value = max(numb) - min(numb)
    
    return range_value

# Example usage
numb = [10, 20, 30, 40, 50]
range_value = cal_range(numb)
print("The range of the list is: ",range_value)


The range of the list is:  40


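# Covariance
Explanation: Covariance measures how two lists of numbers vary together. It's calculated as the average of the products of each pair's differences from their respective means. A positive value means the two lists tend to increase together, and a negative value means one tends to decrease as the other increases.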

In [12]:
def cal_covar(x, y):
    
    # Return 0 if the lists differ in length or are empty
    if len(x) != len(y) or len(x) == 0:
        return 0

    n = len(x)
    
    # Calculate the means of x and y
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Calculate the covariance
    covar = sum((x[i] - mean_x) * (y[i] - mean_y) for i in range(n)) / n
    
    return covar

# Example usage
x = [10, 20, 30, 40, 50]
y = [15, 25, 35, 45, 55]
covar = cal_covar(x, y)
print("The covariance of the lists is: ",covar)


The covariance of the lists is:  200.0
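
The example above shows two lists that rise together. As a sketch of how the sign changes when one list falls while the other rises, the snippet below uses a hypothetical helper `covariance_of` (not part of the original notebook) that assumes two non-empty lists of equal length and pairs the values with `zip`.

```python
def covariance_of(x, y):
    """Hypothetical helper: population covariance of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Average of the products of paired differences from the means
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

x = [10, 20, 30, 40, 50]
rising = [15, 25, 35, 45, 55]
falling = [55, 45, 35, 25, 15]

print(covariance_of(x, rising))    # 200.0
print(covariance_of(x, falling))   # -200.0
```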


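# Correlation
Explanation: The correlation coefficient (Pearson's r) measures the strength and direction of the linear relationship between two lists of numbers. It's the covariance divided by the product of the two standard deviations, so it always lies between -1 and 1.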

In [13]:
import math

def cal_corr(x, y):
    # Return 0 if the lists differ in length or are empty
    if len(x) != len(y) or len(x) == 0:
        return 0

    n = len(x)
    
    # Calculate the means of x and y
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Calculate the sums of squares
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)

    # Calculate the sum of products
    sum_products = sum((x[i] - mean_x) * (y[i] - mean_y) for i in range(n))

    # Calculate the correlation coefficient
    corr = sum_products / (math.sqrt(sum_sq_x) * math.sqrt(sum_sq_y))
    
    return corr

# Example usage
x = [10, 20, 30, 40, 50]
y = [15, 25, 35, 45, 55]
corr = cal_corr(x, y)
print("The correlation coefficient of the lists is: ",corr)


The correlation coefficient of the lists is:  1.0
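
The correlation coefficient is undefined when either list is constant, because the denominator becomes zero. The snippet below is a minimal sketch of one way to handle that case. The helper name `correlation_of` and the choice to return `None` for undefined results are assumptions made for illustration, not part of the original notebook.

```python
import math

def correlation_of(x, y):
    """Hypothetical helper: Pearson's r, or None when it is undefined."""
    if len(x) != len(y) or len(x) == 0:
        return None
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)
    if sum_sq_x == 0 or sum_sq_y == 0:
        return None                   # a constant list has no correlation
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

print(correlation_of([10, 20, 30, 40, 50], [55, 45, 35, 25, 15]))   # -1.0
print(correlation_of([1, 2, 3], [7, 7, 7]))                         # None
```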


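# Skewness
Explanation: Skewness measures how asymmetric the data is around the mean. It's calculated as the average of the cubed differences from the mean, divided by the cube of the standard deviation.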

In [16]:
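def skewness(data):
    n = len(data)
    if n == 0:
        return 0                                 # Return 0 for an empty list

    mean = sum(data) / n

    # Calculate the variance and standard deviation
    var = sum((x - mean) ** 2 for x in data) / n
    sd = var ** 0.5

    # Average cubed difference from the mean, divided by sd cubed
    skew = (sum((x - mean) ** 3 for x in data) / n) / sd ** 3
    return skew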

# Example usage
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
skew = skewness(data)
print("The skewness of the data is: ",skew)


The skewness of the data is:  0.0
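
The evenly spaced data above is symmetric, so its skewness is zero. As a sketch of the asymmetric case, the snippet below applies the same formula to a list with one large value on the right and to its mirror image. The helper name `skewness_of` and the sample lists are illustrative assumptions, and the helper expects non-empty, non-constant data.

```python
def skewness_of(data):
    """Hypothetical helper: population skewness of non-constant data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    sd = var ** 0.5
    # Third central moment divided by the cube of the standard deviation
    return (sum((x - mean) ** 3 for x in data) / n) / sd ** 3

right_tail = [1, 2, 3, 4, 100]       # one large value pulls the tail right
left_tail = [-100, -4, -3, -2, -1]   # mirror image of right_tail

print(skewness_of(right_tail) > 0)   # True: positive skew
print(skewness_of(left_tail) < 0)    # True: negative skew
```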


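# Kurtosis
Explanation: Kurtosis measures how heavy the tails of the data are. This function returns the excess kurtosis: the average of the fourth powers of the differences from the mean, divided by the fourth power of the standard deviation, minus 3 (the kurtosis of a normal distribution).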

In [18]:
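def kurtosis(data):
    n = len(data)
    if n == 0:
        return 0                                 # Return 0 for an empty list

    mean = sum(data) / n

    # Calculate the variance and standard deviation
    var = sum((x - mean) ** 2 for x in data) / n
    sd = var ** 0.5

    # Average fourth-power difference divided by sd ** 4, minus 3
    kurt = (sum((x - mean) ** 4 for x in data) / n) / sd ** 4 - 3
    return kurt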

# Example usage
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
kurt = kurtosis(data)
print("The kurtosis of the data is: ",kurt)


The kurtosis of the data is:  -1.2242424242424241
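
The evenly spaced data above has no outliers, which gives a negative excess kurtosis. As a sketch of the opposite case, the snippet below applies the same formula to a list where most values sit at the center and two sit far out. The helper name `excess_kurtosis_of` and the sample lists are illustrative assumptions. The helper divides by the squared variance, which is mathematically the same as the fourth power of the standard deviation.

```python
def excess_kurtosis_of(data):
    """Hypothetical helper: population excess kurtosis of non-constant data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    # var ** 2 equals sd ** 4; subtracting 3 gives the excess over a normal
    return fourth_moment / var ** 2 - 3

flat = [1, 2, 3, 4, 5]               # evenly spread, no outliers
heavy_tails = [0] * 8 + [-10, 10]    # mostly central, two far-out values

print(excess_kurtosis_of(flat) < 0)          # True: lighter tails
print(excess_kurtosis_of(heavy_tails) > 0)   # True: heavier tails
```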
