## Central Tendencies

In [31]:
x=[1,2,3,4,5,6,7,8,9,10]

In [32]:
# mean (or average)
def mean(x):
    """finds the average of data"""
    return sum(x)/len(x)
    

In [33]:
mean(x)

5.5

In [34]:
# median
def median(v):
    """finds the 'middle-most' value of v"""
    n = len(v)
    sorted_v = sorted(v)
    midpoint = n // 2
    if n % 2 == 1:
        # if odd, return the middle value
        return sorted_v[midpoint]
    else:
        # if even, return the average of the two middle values
        lo = midpoint - 1
        hi = midpoint
        return (sorted_v[lo] + sorted_v[hi]) / 2

In [35]:
median(x)

5.5
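The odd-length branch is not exercised by `x`, so here is a minimal sketch that covers it and also shows why the median is often preferred when outliers are present. The functions are re-declared so the cell runs on its own; the lists `odd_data` and `with_outlier` are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 1, 4, 1, 5]        # illustrative data; sorted: [1, 1, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # illustrative data with one extreme value

print(median(odd_data))       # 3
print(mean(with_outlier))     # 22.0
print(median(with_outlier))   # 3
```

The single value 100 pulls the mean up to 22.0, while the median stays at 3.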

In [36]:
# quantile
def quantile(x, p):
    """returns the value at the p-th quantile of x, where p is a fraction in [0, 1)"""
    p_index = int(p * len(x))
    return sorted(x)[p_index]


In [37]:
quantile(x, 0.10)

2

In [38]:
quantile(x, 0.50)

6
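Note that `quantile(x, 0.50)` returns 6 while `median(x)` returns 5.5: the index rule picks a single element and never averages two neighbours. The same rule also runs off the end of the list when `p` is exactly 1.0. The sketch below illustrates both points; `quantile_clamped` is a hypothetical variant introduced here for illustration, not part of the original notebook.

```python
def quantile(x, p):
    """Index-based quantile, as defined above."""
    p_index = int(p * len(x))
    return sorted(x)[p_index]

def quantile_clamped(x, p):
    """Hypothetical variant: same rule, but the index is capped at the last element."""
    ordered = sorted(x)
    p_index = min(int(p * len(ordered)), len(ordered) - 1)
    return ordered[p_index]

data = list(range(1, 11))  # 1..10, same values as x

print(quantile(data, 0.50))          # 6 (no averaging, unlike the median)
print(quantile_clamped(data, 1.0))   # 10

try:
    quantile(data, 1.0)              # int(1.0 * 10) == 10 is past the last index
except IndexError:
    print("quantile(data, 1.0) raises IndexError")
```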

In [39]:
y=[1,2,4,3,6,2,6,2,7,2,75,0,4,5]

In [40]:
from collections import Counter

# mode, or most-common value[s]
def mode(x):
    """returns a list, since there might be more than one mode"""
    counts = Counter(x)
    max_count = max(counts.values())
    return [x_i for x_i, count in counts.items() if count == max_count]


In [41]:
mode(y)

[2]
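The list `y` has a single mode, so the "more than one mode" case never shows up. A small sketch with made-up lists shows it. It assumes Python 3.7 or later, where `Counter` keeps keys in first-seen order, so the modes come back in the order they first appear in the data.

```python
from collections import Counter

def mode(x):
    """Returns every value that occurs the maximum number of times."""
    counts = Counter(x)
    max_count = max(counts.values())
    return [x_i for x_i, count in counts.items() if count == max_count]

print(mode([1, 1, 2, 2, 3]))  # [1, 2]    two values tie for most common
print(mode([7, 8, 9]))        # [7, 8, 9] every value occurs once, so all are modes
```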

## Dispersion
Dispersion refers to measures of how spread out our data is. Typically these are statistics
for which values near zero signify data that is not spread out at all, and for which large values
(whatever that means) signify data that is very spread out.

In [44]:
def data_range(x):
    """returns the difference between the largest and smallest values in x"""
    return max(x) - min(x)


In [45]:
data_range(x)

9

In [46]:
def de_mean(x):
    """translate x by subtracting its mean (so the result has mean 0)"""
    x_bar = mean(x)
    return [x_i - x_bar for x_i in x]

In [47]:
de_mean(x)

[-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]

In [48]:
mean(de_mean(x))

0.0


Variance quantifies the spread of a set of data points: it is (roughly) the average squared deviation of the points from their mean. The function below divides the sum of squared deviations by n - 1 rather than n, which gives the sample variance.

In [51]:
def sum_of_squares(x):
    """Returns the sum of squares of the elements in x."""
    return sum(xi ** 2 for xi in x)

def variance(x):
    """Assumes x has at least two elements."""
    n = len(x)
    deviations = de_mean(x)
    return sum_of_squares(deviations) / (n - 1)

In [52]:
variance(x)

9.166666666666666
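As a sanity check, the same number can be reproduced step by step and compared against Python's standard `statistics` module, which is swapped in here only for verification. The sketch also shows the population variance (dividing by n) to make the effect of the n - 1 denominator visible.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # same values as x
n = len(data)
data_mean = sum(data) / n
squared_deviations = [(value - data_mean) ** 2 for value in data]

sample_variance = sum(squared_deviations) / (n - 1)  # divides by n - 1
population_variance = sum(squared_deviations) / n    # divides by n

print(sample_variance)      # about 9.17, matching variance(x)
print(population_variance)  # 8.25

# Cross-check against the standard library
print(math.isclose(sample_variance, statistics.variance(data)))       # True
print(math.isclose(population_variance, statistics.pvariance(data)))  # True
```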

Standard deviation is the square root of the variance. Because variance is expressed in squared units, taking the square root gives a measure of how spread out the data points are around the mean in the same units as the data itself.

In [55]:
import math

def standard_deviation(x):
    """returns the square root of the sample variance of x"""
    return math.sqrt(variance(x))

In [56]:
standard_deviation(x)

3.0276503540974917
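One way to see the "same units" point is to rescale the data. In this sketch, which uses the standard library's `statistics.stdev` and `statistics.variance` in place of the notebook's functions, multiplying every value by 10 multiplies the standard deviation by 10 but the variance by 100.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # same values as x
scaled = [10 * value for value in data]   # e.g. a change of units

stdev_original = statistics.stdev(data)
stdev_scaled = statistics.stdev(scaled)
variance_original = statistics.variance(data)
variance_scaled = statistics.variance(scaled)

# Standard deviation scales linearly with the data, variance quadratically
print(math.isclose(stdev_scaled, 10 * stdev_original))         # True
print(math.isclose(variance_scaled, 100 * variance_original))  # True
```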

Covariance is a measure of how two random variables change together. If the greater values of one variable correspond to the greater values of another variable, and the same holds for the lesser values (i.e., both variables tend to move in the same direction), the covariance is positive. Conversely, if the greater values of one variable correspond to the lesser values of another variable (i.e., the variables tend to move in opposite directions), the covariance is negative.

Covariance is calculated by taking the product of the deviations of each pair of corresponding values from their respective means and then averaging these products (the function below divides by n - 1, as the variance function does).

In [59]:
def covariance(x, y):
    """Calculates the sample covariance between two lists of numbers."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same number of elements")
    # Compute each mean once instead of once per element
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


In [58]:
# Example usage
x = [10, 20, 30, 40, 50]
y = [8, 18, 28, 38, 48]
print("Covariance:", covariance(x, y))

Covariance: 250.0
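To see where 250.0 comes from, the calculation can be unrolled into its intermediate steps. This is only a worked sketch of the same arithmetic; the names `xs`, `ys`, and `products` are chosen here for illustration.

```python
xs = [10, 20, 30, 40, 50]
ys = [8, 18, 28, 38, 48]

x_bar = sum(xs) / len(xs)  # 30.0
y_bar = sum(ys) / len(ys)  # 28.0

# Product of the paired deviations from each mean
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)]
print(products)            # [400.0, 100.0, 0.0, 100.0, 400.0]

# Sum of products divided by n - 1
sample_covariance = sum(products) / (len(xs) - 1)
print(sample_covariance)   # 250.0
```

Every product is non-negative because each pair sits on the same side of its mean, which is why the covariance is positive.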


Correlation is a statistical measure that describes the strength and direction of a relationship between two variables. It quantifies how much one variable changes in relation to changes in another variable. The most common measure of correlation is the Pearson correlation coefficient, which ranges from -1 to 1.

- **1** indicates a perfect positive correlation, meaning as one variable increases, the other variable also increases proportionally.
- **-1** indicates a perfect negative correlation, meaning as one variable increases, the other variable decreases proportionally.
- **0** indicates no correlation, meaning there is no linear relationship between the variables.

In simple terms, correlation is a measure of how two things move in relation to each other. If two things tend to increase or decrease together, they have a positive correlation. If one tends to increase when the other decreases, they have a negative correlation. If they don't seem to move together at all, their correlation is close to zero.

Example:
- Positive Correlation: Height and weight often have a positive correlation because as height increases, weight tends to increase too.
- Negative Correlation: The amount of time spent watching TV and grades might have a negative correlation because more TV time might be associated with lower grades.
- No Correlation: The number of books in a library and the temperature outside likely have no correlation because they don't affect each other.

In [61]:
def correlation(x, y):
    """Calculates the Pearson correlation coefficient between two lists of numbers."""
    stdev_x = standard_deviation(x)
    stdev_y = standard_deviation(y)
    if stdev_x > 0 and stdev_y > 0:
        return covariance(x, y) / (stdev_x * stdev_y)
    else:
        return 0  # If there is no variation, correlation is zero

# Example usage
x = [10, 20, 30, 40, 50]
y = [100, 88, 77, 2, 1]
print("Correlation:", correlation(x, y))

Correlation: -0.9306124863271186


In [63]:
# Example usage
books = [1000, 1500, 2000, 2500, 3000]
temperature = [22, 15, 27, 18, 20]
print("Correlation:", correlation(books, temperature))

Correlation: -0.035093120317179836
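A practical difference between the two measures is that covariance depends on the units of the data while correlation does not. The sketch below re-declares compact versions of the functions so it runs on its own, and assumes both inputs have some variation (it omits the zero-variation guard used above). The lists `xs`, `ys`, and `ys_scaled` are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation; assumes neither input is constant."""
    stdev_x = math.sqrt(covariance(x, x))  # covariance of x with itself is its variance
    stdev_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (stdev_x * stdev_y)

xs = [10, 20, 30, 40, 50]
ys = [8, 18, 28, 38, 48]
ys_scaled = [100 * yi for yi in ys]  # same data in units 100 times smaller

# Covariance grows with the scale of the data...
print(covariance(xs, ys), covariance(xs, ys_scaled))

# ...but correlation is unchanged (and is 1 here, up to rounding, since ys = xs - 2)
print(math.isclose(correlation(xs, ys), correlation(xs, ys_scaled)))  # True
print(math.isclose(correlation(xs, ys), 1.0))                         # True
```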
