In [1]:
##  Explain the different types of data (qualitative and quantitative) and provide examples of each. Discuss nominal, ordinal, interval, and ratio scales.

# Data can be broadly classified into two main types: **qualitative** (or categorical) data and **quantitative** (or numerical) data. Each type has unique characteristics and uses.

# 1. Qualitative (Categorical) Data

# Qualitative data represent categories or labels and describe attributes or characteristics.
# This type of data is not measured in numbers and instead classifies things into distinct groups.

# Examples of Qualitative Data:
# - **Color of cars** (e.g., red, blue, green)
# - **Type of cuisine** (e.g., Italian, Mexican, Chinese)
# - **Gender** (e.g., male, female, non-binary)

# Qualitative data can further be divided into two subcategories: **nominal** and **ordinal** scales.

# Nominal Scale
# - **Description**: The nominal scale categorizes data without any quantitative value or order. Items on a nominal scale are simply labeled.
# - **Example**: Eye color (blue, brown, green), types of pets (dog, cat, bird).
# - **Properties**: No ordering or ranking is involved; only labels are provided. It’s a purely qualitative scale.

# Ordinal Scale
# - **Description**: The ordinal scale provides a rank order among categories. The intervals between ranks are not uniform or measurable, but the order is meaningful.
# - **Example**: Education levels (high school, bachelor’s, master’s, PhD), customer satisfaction (satisfied, neutral, dissatisfied).
# - **Properties**: There’s a clear order, but differences between values aren’t consistent or meaningful.

# 2. Quantitative (Numerical) Data

# Quantitative data represent measurable quantities, typically expressed in numbers. 
# This type of data can be analyzed mathematically, such as by averaging or measuring variation.

# Examples of Quantitative Data:
# - **Temperature** (in degrees Celsius or Fahrenheit)
# - **Weight** (in kilograms or pounds)
# - **Age** (in years)

# Quantitative data are further divided into two subcategories: **interval** and **ratio** scales.

# Interval Scale
# - **Description**: The interval scale has a meaningful order, and the difference between values is consistent. However, it lacks an absolute zero, meaning zero does not indicate the absence of the quantity.
# - **Example**: Temperature in Celsius or Fahrenheit, where 0 does not mean "no temperature."
# - **Properties**: Allows for addition and subtraction, but not meaningful multiplication or division.

# Ratio Scale
# - **Description**: The ratio scale is the most informative scale. It has an absolute zero point, allowing all arithmetic operations, including meaningful multiplication and division.
# - **Example**: Height, weight, distance, and age.
# - **Properties**: Ratios are meaningful (e.g., twice as heavy, half as tall) because the zero value signifies an absence of the quantity.
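
# A minimal sketch of the interval-versus-ratio distinction described above. The two temperature readings are made-up illustrative values; the only outside fact used is the standard Celsius-to-Kelvin offset of 273.15.

```python
# Interval scale: Celsius has an arbitrary zero, so only differences are meaningful.
temp_a_c = 10.0  # hypothetical reading in degrees Celsius
temp_b_c = 20.0  # hypothetical reading in degrees Celsius

difference_c = temp_b_c - temp_a_c  # a 10-degree difference is meaningful

# Ratio scale: Kelvin has a true zero, so ratios are meaningful.
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15

naive_ratio = temp_b_c / temp_a_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = temp_b_k / temp_a_k  # about 1.035

print("Difference (°C):", difference_c)
print("Naive Celsius ratio:", naive_ratio)
print("Kelvin ratio:", round(kelvin_ratio, 3))
```

# The Celsius ratio changes if the same temperatures are expressed in Fahrenheit, which is why ratios on an interval scale carry no physical meaning.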

In [2]:
## What are the measures of central tendency, and when should you use each? Discuss the mean, median, and mode with examples and situations where each is appropriate.

# Measures of central tendency are statistical values that describe the center or typical value of a dataset.
# The three primary measures are the mean, median, and mode.
# Each of these measures provides insights into the data’s distribution and is appropriate in different situations.

# 1. Mean (Average)

# The **mean** is the sum of all values in a dataset divided by the number of values.
# It is the most commonly used measure of central tendency.

# Example:
# Consider the following set of exam scores: 80, 85, 90, 95, 100.

# Mean = (80 + 85 + 90 + 95 + 100)/5 = 90

# When to Use the Mean:
# - **Symmetrical data**: The mean is most informative when the dataset is normally distributed (bell-shaped and symmetrical).
# - **Continuous data**: The mean is suitable for continuous, quantitative data where values vary consistently.
  
# Limitations:
# - **Sensitive to outliers**: The mean can be heavily influenced by extreme values (outliers).
# For example, in a set of income values, a very high income can skew the mean upward, making it less representative of typical incomes in the group.

# 2. Median

# The median is the middle value in an ordered dataset.
# If the dataset has an odd number of values, the median is the exact middle; if even, it is the average of the two middle values.

# Example:
# Using the dataset: 80, 85, 90, 95, 100, the median is 90 (the middle value).

# If we add an additional value (say 105), making it 80, 85, 90, 95, 100, 105, the median becomes (90 + 95)/2 = 92.5

# When to Use the Median:
# - **Skewed data**: The median is more appropriate than the mean when data are skewed (asymmetrical).
# For instance, if one income in a dataset is disproportionately high, the median provides a central value unaffected by this outlier.
# - **Ordinal data**: The median can also be useful for ordinal data (data with a meaningful order but not equidistant, such as rankings) as it identifies the midpoint without requiring equal intervals.

# 3. Mode

# The **mode** is the most frequently occurring value in a dataset.
# A dataset may have one mode (unimodal), more than one mode (bimodal or multimodal), or no mode at all if all values are unique.

# Example:
# In the dataset: 80, 85, 85, 90, 100, the mode is 85, as it appears most frequently.

# When to Use the Mode:
# - **Categorical data**: The mode is best for categorical or nominal data, where it can indicate the most common category or value.
# - **Multimodal distributions**: When a dataset has multiple peaks, the mode can provide insights into the most frequently occurring values, which is useful for identifying trends in multi-modal datasets (e.g., peak ages for different career paths).
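
# The three measures can be checked with Python's standard `statistics` module. The exam scores are the ones used in the examples above; the income list is a hypothetical dataset added only to illustrate outlier sensitivity.

```python
import statistics

scores = [80, 85, 90, 95, 100]              # dataset from the mean and median examples
scores_with_repeat = [80, 85, 85, 90, 100]  # dataset from the mode example

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores_with_repeat)

# Hypothetical incomes (in thousands) with one extreme value.
incomes = [30, 32, 35, 38, 40, 500]
mean_income = statistics.mean(incomes)      # pulled upward by the 500
median_income = statistics.median(incomes)  # stays near the typical values

print("Mean score:", mean_score)
print("Median score:", median_score)
print("Mode score:", mode_score)
print("Mean income:", mean_income, "| Median income:", median_income)
```

# With the outlier present, the mean income (112.5) is far above every value except the outlier, while the median (36.5) still describes a typical income.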

In [3]:
## Explain the concept of dispersion. How do variance and standard deviation measure the spread of data?

# Dispersion refers to the spread of data values in a dataset, indicating how much the values differ from the central tendency (such as the mean).
# A dataset with low dispersion has values close to the mean, while a high dispersion dataset has values spread out over a wider range.
# Dispersion is crucial because it helps us understand the variability within the data, giving us a fuller picture than central tendency measures alone.

# The two primary measures of dispersion are variance and standard deviation, which describe how data points differ from the mean.

# 1. Variance

# Variance measures the average squared deviation of each data point from the mean.
# It is calculated by taking each data point's deviation from the mean, squaring it (to avoid negative values), and then averaging those squared deviations.

# 2. Standard Deviation

# Standard deviation is the square root of the variance.
# It brings the units back to the original scale of the data, making it easier to interpret.
# Standard deviation represents the typical distance of data points from the mean (strictly, the root-mean-square deviation), giving a sense of how spread out or clustered the values are.

# Variance and Standard Deviation: When to Use
# Variance is more commonly used in theoretical and computational contexts. It is helpful in statistical models where squared deviations are needed (e.g., calculating the sum of squares in regression analysis).
# Standard Deviation is often preferred for interpretation since it is in the same units as the original data, making it more intuitive for practical analysis.
# Both variance and standard deviation are foundational in statistics, helping us understand the variability in data beyond just the central tendency.
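
# A short worked calculation of both measures, reusing the exam scores 80, 85, 90, 95, 100 from the earlier section. One assumption worth stating: "averaging" the squared deviations means dividing by n (population variance); when the data are a sample, the usual convention is to divide by n - 1 instead. Both versions are shown.

```python
import math
import statistics

scores = [80, 85, 90, 95, 100]
mean_score = sum(scores) / len(scores)

# Squared deviation of each data point from the mean.
squared_deviations = [(x - mean_score) ** 2 for x in scores]

population_variance = sum(squared_deviations) / len(scores)    # divide by n
sample_variance = sum(squared_deviations) / (len(scores) - 1)  # divide by n - 1

# Standard deviation is the square root of the variance.
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
print("Population std:", round(population_std, 4))
print("Sample std:", round(sample_std, 4))

# Cross-check against the standard library implementations.
print(statistics.pvariance(scores), statistics.variance(scores))
```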

In [4]:
## What is a box plot, and what can it tell you about the distribution of data?

# A box plot is a graphical representation of a dataset that shows its central tendency, spread, and skewness.
# It provides a visual summary of the distribution, highlighting key summary statistics such as the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.

# Structure of a Box Plot

# A typical box plot consists of the following parts:

# 1. Box: The box in the middle represents the interquartile range (IQR), which is the range between the first quartile (Q1) and the third quartile (Q3).
#   Lower Edge (Q1): Represents the 25th percentile, or the point below which 25% of the data lie.
#   Upper Edge (Q3): Represents the 75th percentile, or the point below which 75% of the data lie.
#   Length of the Box: Shows the IQR (Q3 - Q1), which contains the middle 50% of the data points.

# 2. Median (Q2): A line inside the box represents the median, or the 50th percentile. This shows the central point of the data, with half the values above and half below.

# 3. Whiskers: Lines extending from the box indicate variability outside the upper and lower quartiles.
#   Lower Whisker: Extends from Q1 down to the smallest data value that lies within 1.5 × IQR below Q1.
#   Upper Whisker: Extends from Q3 up to the largest data value that lies within 1.5 × IQR above Q3.

# 4. Outliers: Data points that lie beyond the whiskers (1.5 × IQR from Q1 or Q3) are plotted as individual points and are considered outliers. These represent unusually high or low values in the dataset.

# Interpretation of a Box Plot

# A box plot reveals several key aspects of a dataset's distribution:

# 1. Center and Spread:
#   The position of the median line within the box shows where the central point of the data lies.
#   The IQR (length of the box) indicates the data's spread. A larger box suggests greater variability in the middle 50% of values.

# 2. Symmetry and Skewness:
#   Symmetric Distribution: If the median is centered in the box, and the whiskers are of approximately equal length, the data distribution is likely symmetric.
#   Left-Skewed Distribution: If the median is closer to Q3, and the left whisker is longer, the data is left-skewed (more values on the higher end).
#   Right-Skewed Distribution: If the median is closer to Q1, and the right whisker is longer, the data is right-skewed (more values on the lower end).

# 3. Range and Outliers:
#   Range: The span from the end of the lower whisker to the end of the upper whisker shows the range of the data excluding outliers; the full range also includes any outlier points.
#   Outliers: Points outside the whiskers suggest possible anomalies or outliers, showing values that fall outside the expected range.
#   These may need further investigation to understand why they differ so much from the rest of the data.
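
# The numbers a box plot draws can be computed directly. This is a sketch on a hypothetical dataset with one unusually large value. It uses `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions, so plotting libraries may place the box edges slightly differently.

```python
import statistics

# Hypothetical dataset with one unusually large value (45).
data = sorted([5, 7, 8, 12, 13, 14, 18, 20, 45])

# Box edges and median line.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data values inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
lower_whisker = min(inside)
upper_whisker = max(inside)

# Anything outside the fences is drawn as an individual outlier point.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("Whiskers:", lower_whisker, upper_whisker)
print("Outliers:", outliers)
```

# Here the upper whisker ends at 20 rather than at the upper fence (33), because whiskers end at actual data values.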

In [5]:
##  Discuss the role of random sampling in making inferences about populations

# Random sampling is a fundamental method in statistics that allows researchers to make reliable inferences about a larger population based on a smaller, representative subset.
# In simple random sampling, each member of the population has an equal chance of being included in the sample, which makes the sample likely to reflect the characteristics of the entire population.

# Importance of Random Sampling in Inference

# 1. Representativeness:
#   - Random sampling increases the likelihood that the sample will be representative of the population, capturing its diversity in terms of characteristics and variability.
#     This representativeness is crucial because it allows researchers to generalize findings from the sample to the population, minimizing bias.

# 2. Reducing Bias:
#   - By giving each individual an equal chance of selection, random sampling reduces **selection bias**, which occurs when certain individuals or groups are more likely to be chosen than others.
#     A well-conducted random sample avoids over- or under-representing certain population segments, leading to more valid inferences.

# 3. Allows for Statistical Inference:
#   - Random sampling underlies many statistical tests and formulas. With a random sample, we can apply probability theory and calculate confidence intervals and significance tests to assess the reliability of estimates.
#     This means that random sampling enables the calculation of error margins and probabilities, which are essential for drawing meaningful inferences.

# 4. Generalizability:
#   - Because a random sample is more likely to mirror the population's structure, the results from analyzing the sample can be generalized to the population. 
# For example, if a survey randomly samples a group of people about their voting intentions, the findings can be extended to predict the preferences of the entire voting population.

# 5. Enables Estimation of Population Parameters:
#   - Random sampling supports unbiased estimation of population parameters such as the mean or a proportion.
#     With a simple random sample, the sample mean and sample proportion are unbiased estimators of the corresponding population values: they are correct on average across repeated samples. Sampling error is not eliminated, but it can be quantified and it shrinks as the sample size grows.

# Example of Random Sampling in Inference

# Suppose a university wants to study the average study time per week among its students. Sampling the entire student body might be impractical, so a random sample of students is selected. 

# - Step 1: A random sample of 200 students is chosen.
# - Step 2: Researchers calculate the average study time and observe variability within this sample.
# - Step 3: Using this sample, the researchers calculate a confidence interval around the sample mean, allowing them to estimate the likely range of the true average study time for the whole student body.
# - Conclusion: Because of the random sampling method, this estimate is considered reliable and generalizable to all students.
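
# The three steps above can be sketched in code. Everything about the population here is an assumption made for illustration: 10,000 simulated students whose weekly study hours are drawn from a normal distribution (mean 15, standard deviation 5, floored at 0). The 95% interval uses the normal critical value 1.96, a common large-sample approximation.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly study hours for 10,000 students.
population = [max(0.0, random.gauss(15, 5)) for _ in range(10_000)]

# Step 1: draw a simple random sample of 200 students (without replacement).
sample = random.sample(population, 200)

# Step 2: sample mean and sample standard deviation.
sample_mean = statistics.mean(sample)
sample_std = statistics.stdev(sample)

# Step 3: approximate 95% confidence interval for the population mean.
standard_error = sample_std / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"Sample mean: {sample_mean:.2f} hours")
print(f"Approximate 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Population mean (known only because it is simulated): {statistics.mean(population):.2f}")
```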

In [6]:
## Explain the concept of skewness and its types. How does skewness affect the interpretation of data?


# Skewness refers to the asymmetry of a dataset’s distribution around its mean.
# In a perfectly symmetrical distribution, like a normal distribution, the data are evenly distributed around the mean, resulting in a skewness of zero. However, many real-world datasets are asymmetrical, or skewed, which affects the interpretation of data, especially measures of central tendency.

# Types of Skewness

# 1. Positive Skew (Right Skew):
#   - In a positively skewed distribution, the tail on the right side (higher values) is longer or fatter than the left tail.
#   - Most of the data points cluster on the left side of the distribution, but a few higher values pull the mean to the right, making it greater than the median.
#   - Example: Income distribution in a population, where most people earn moderate salaries, but a few high earners create a long right tail.

#   Typical relationship among mean, median, and mode (a rule of thumb that can fail for some distributions):
#   - Mean > Median > Mode
   
# 2. Negative Skew (Left Skew):
#   - In a negatively skewed distribution, the tail on the left side (lower values) is longer or fatter than the right tail.
#   - Most of the data points cluster on the right side, but a few lower values pull the mean to the left, making it less than the median.
#   - Example: Age at retirement, where most people retire around a common age, but a few retire very early, creating a left tail.

#   Typical relationship among mean, median, and mode (a rule of thumb that can fail for some distributions):
#   - Mean < Median < Mode
   
# 3. Zero Skew (Symmetrical Distribution):
#   - In a symmetric distribution, the data are evenly spread around the mean, with the left and right sides mirroring each other.
#   - Example: Heights or weights in a large population sample often approximate symmetry.
   
#   Relationships among Mean, Median, and Mode:
#   - Mean ≈ Median ≈ Mode

# How Skewness Affects Interpretation of Data

# 1. Central Tendency:
#   - Skewness can significantly affect measures of central tendency, especially the mean. In skewed distributions, the mean is "pulled" in the direction of the skew (right for positive skew, left for negative skew). 
#   - The median and mode, however, are less affected by extreme values. Thus, in skewed data, the **median** is often a better measure of central tendency than the mean.

# 2. Spread of Data:
#   - Skewness affects the interpretation of the dataset's spread. In a right-skewed distribution, for instance, the majority of data points are below the mean, creating an impression that the data are more concentrated than they are.
#   - This makes measures like standard deviation and variance less informative on their own, as they may not accurately reflect the spread in asymmetrical distributions.

# 3. Interpretation of Probabilities:
#   - In skewed distributions, probability calculations based on mean values may be misleading. For example, in income data with positive skew, the average income will be higher than most people’s income, so the "typical" income may be closer to the median than the mean.
#   - Skewed distributions require analysts to think carefully about which central measure best represents "typical" values.

# 4. Decision-Making:
#   - Skewness is particularly important in fields like finance, insurance, and quality control, where understanding extreme values is crucial.
#     For instance, in risk assessment, a positively skewed distribution of losses may indicate a few rare but costly events, impacting how risk is managed.
#   - In quality control, a negatively skewed distribution in product defects might suggest occasional severe defects that could need targeted attention.

# 5. Choice of Statistical Tests:
#   - Many statistical tests assume a normal distribution. When data are skewed, these tests may yield inaccurate results. For skewed data, non-parametric tests or transformations are often recommended.

In [7]:
## What is the interquartile range (IQR), and how is it used to detect outliers?


# The interquartile range (IQR) is a measure of statistical dispersion that indicates the range within which the middle 50% of a dataset falls.
# It represents the spread of the data between the first quartile (Q1) and the third quartile (Q3).
# By focusing on the central portion of the data, the IQR is less affected by outliers and extreme values than the overall range.

# How to Calculate the IQR

# To calculate the IQR:
# 1. Order the data: Arrange the data points in ascending order.
# 2. Calculate Q1 and Q3:
#   - Q1 (First Quartile): The 25th percentile, or the point below which 25% of the data lie.
#   - Q3 (Third Quartile): The 75th percentile, or the point below which 75% of the data lie.
# 3. Compute the IQR:
#   IQR = Q3 - Q1

# Example:
# Consider the dataset: 5, 7, 8, 12, 13, 14, 18, 20, 21.
# - Q1 (25th percentile) is 8.
# - Q3 (75th percentile) is 18.
# - IQR = 18 - 8 = 10.

# The IQR of this dataset is 10, which shows the spread of the central 50% of the values.

# Using the IQR to Detect Outliers

# The IQR is commonly used to detect outliers: values that are unusually high or low compared to the rest of the dataset.
# Outliers are typically identified as values that fall significantly beyond the typical range of the data. A commonly used rule for outlier detection is the 1.5 × IQR rule:

# 1. Calculate the lower and upper bounds:
#   - Lower Bound: ( Q1 - 1.5 * IQR )
#   - Upper Bound: ( Q3 + 1.5 * IQR)

# 2. Identify outliers:
#   - Values below the lower bound or above the upper bound are considered outliers.

# Example (continued):
# Using the previous dataset where Q1 = 8 and Q3 = 18:
# IQR = 10.
# Lower Bound = 8 - (1.5 * 10) = 8 - 15 = -7
# Upper Bound = 18 + (1.5 * 10) = 18 + 15 = 33

# Any data points below -7 or above 33 would be classified as outliers.
# In this dataset, there are no values outside this range, so there are no outliers.
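
# The 1.5 × IQR rule is easy to wrap in a small helper. `iqr_outliers` is a hypothetical function name, and `statistics.quantiles(..., method="inclusive")` is one quartile convention among several; it happens to reproduce Q1 = 8 and Q3 = 18 for the dataset above. The extra value 40 in the second call is made up to show a flagged point.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

data = [5, 7, 8, 12, 13, 14, 18, 20, 21]
lower_bound, upper_bound, outliers = iqr_outliers(data)
print("Bounds:", lower_bound, upper_bound)
print("Outliers:", outliers)

# Adding a hypothetical extreme value changes the quartiles slightly and gets flagged.
_, _, outliers_with_extra = iqr_outliers(data + [40])
print("Outliers after adding 40:", outliers_with_extra)
```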

In [8]:
## Discuss the conditions under which the binomial distribution is used.

# The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes.
# It’s commonly used when we’re interested in the probability of a specific number of successful outcomes over several attempts, such as flipping a coin multiple times or testing products for defects.

# Conditions for Using the Binomial Distribution

# To use the binomial distribution, the following conditions must be met:

# 1. Fixed Number of Trials (n):
#   - There is a set number of trials, (n), which means the experiment or process is repeated a specific number of times.
#   - Example: Flipping a coin 10 times or conducting 20 quality control tests.

# 2. Binary Outcomes (Success or Failure):
#   - Each trial has exactly two possible outcomes, often labeled as "success" and "failure."
#   - Example: In a coin toss, heads might be considered a success and tails a failure; in a product test, a functioning product is a success, and a defective product is a failure.

# 3. Constant Probability of Success (p):
#   - The probability of success, (p), remains the same for each trial.
#   - Example: If a coin is fair, the probability of getting heads (success) remains 0.5 for each flip.
#     In testing products for defects, the probability of a product being defective should remain constant across all tests.

# 4. Independence of Trials:
#   - Each trial is independent of the others, meaning that the outcome of one trial does not affect the outcome of any other trial.
#   - Example: In flipping a coin multiple times, each flip is independent, so getting heads on one flip does not influence the outcome of the next flip.

# If these four conditions are satisfied, then the number of successes across the trials follows a binomial distribution.
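
# When the four conditions hold, the probability of exactly k successes is given by the standard binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). A minimal sketch, using the 10 fair coin flips mentioned above; `binomial_pmf` is a hypothetical helper name.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 flips of a fair coin.
prob_five_heads = binomial_pmf(5, 10, 0.5)
print("P(X = 5):", prob_five_heads)

# The probabilities over all possible counts (0 through 10) sum to 1.
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print("Sum over k = 0..10:", total_probability)
```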

In [9]:
##  Explain the properties of the normal distribution and the empirical rule (68-95-99.7 rule)

# The normal distribution, often called the Gaussian distribution, is a continuous probability distribution that is symmetric and bell-shaped.
# It is one of the most important probability distributions in statistics because many natural and human-made phenomena tend to follow it under certain conditions.

# Properties of the Normal Distribution

# 1. Symmetry:
#   - The normal distribution is perfectly symmetrical about its mean. This symmetry implies that the left and right sides of the distribution are mirror images of each other.

# 2. Bell Shape:
#   - The distribution has a bell-shaped curve, with the majority of values clustering around the mean. This shape reflects how values closer to the mean are more common, while extreme values are less frequent.

# 3. Mean, Median, and Mode Are Equal:
#   - In a normal distribution, the mean, median, and mode all coincide at the center of the distribution. This is a direct result of its symmetry.

# 4. Asymptotic Nature:
#   - The tails of the normal distribution extend infinitely in both directions and never actually touch the horizontal axis. This means there is a small, nonzero probability of observing extreme values far from the mean.

# 5. Defined by Mean (μ) and Standard Deviation (σ):
#   - The shape and location of a normal distribution are determined by its mean (μ) and standard deviation (σ).
#   - The mean (μ) defines the center of the distribution, while the standard deviation (σ) determines the spread, or how "wide" or "narrow" the distribution is.

# 6. Total Area Equals 1:
#   - The total area under the normal distribution curve is equal to 1, representing the entire probability space. Any probability calculation within the distribution involves calculating the area under the curve for a given interval.

# The Empirical Rule (68-95-99.7 Rule)

# The empirical rule, also known as the 68-95-99.7 rule, is a key property of the normal distribution that describes how data are distributed around the mean. It provides a quick way to understand the spread of data points in a normal distribution based on their distance from the mean.

# 1. Approximately 68% of Data within 1 Standard Deviation (μ ± σ):
#   - About 68% of the values in a normal distribution fall within one standard deviation of the mean. This means that if you were to randomly pick a data point from a normally distributed dataset, there’s a 68% chance that it would be within this range.

# 2. Approximately 95% of Data within 2 Standard Deviations (μ ± 2σ):
#   - About 95% of the values fall within two standard deviations of the mean. This interval captures nearly all values for many practical purposes, making it useful for identifying values that deviate from the mean.

# 3. Approximately 99.7% of Data within 3 Standard Deviations (μ ± 3σ):
#   - About 99.7% of the values fall within three standard deviations of the mean. Data points beyond this range are rare and are often considered outliers.

# These intervals provide insight into the distribution and allow statisticians to make quick probability estimates without complex calculations. 
# For example, if we know a dataset is normally distributed, we can say that any data point lying more than three standard deviations from the mean is highly unusual.
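
# The 68-95-99.7 figures can be reproduced from the normal distribution itself. For any normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)), where erf is the error function from Python's `math` module. `prob_within` is a hypothetical helper name.

```python
import math

def prob_within(k):
    """P(|X - mu| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {prob_within(k):.4f}")
```

# The results (about 0.6827, 0.9545, and 0.9973) show that 68, 95, and 99.7 are rounded values.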

In [10]:
##  Provide a real-life example of a Poisson process and calculate the probability for a specific event.

# The Poisson process is often used to model events that occur randomly over a fixed interval of time or space, where these events happen independently and at a constant average rate.
# It’s particularly useful for counting occurrences of rare or random events, such as accidents, phone calls to a call center, or emails received.

# Real-Life Example: Hospital Emergency Room Arrivals

# Let’s say a hospital’s emergency room receives an average of 5 patient arrivals per hour.
# This situation can be modeled with a Poisson process since:
# 1. Arrivals are random and independent.
# 2. They occur at a constant average rate (5 per hour).

# We can use the Poisson distribution to calculate the probability of receiving a specific number of patients within an hour.

# Poisson Probability Formula

# The probability of observing exactly \( k \) events in a fixed interval, given an average rate \( \lambda \), is calculated using the **Poisson probability formula**:

# P(X = k) = (lambda^k * e^(-lambda)) / k!

# Example Calculation

# Let’s calculate the probability that exactly 8 patients arrive in one hour, given an average rate of 5 arrivals per hour.

# Given:
# ( lambda = 5 ) (mean number of arrivals per hour),
# ( k = 8 ) (number of arrivals we want to find the probability for).

# Using the formula:

# P(X = 8) = ((5^8) * e^(-5))/8!


# 1. Calculate 5^8:
#   5^8 = 390625


# 2. Calculate e^(-5):
#   e^(-5) ≈ 0.006738

# 3. Calculate 8! (factorial of 8):
#   8! = 40320

# 4. Plug these values into the formula:

#   P(X = 8) = (390625 * 0.006738) / 40320
#   P(X = 8) ≈ 0.0653
# So, the probability that exactly 8 patients arrive in the emergency room in one hour is approximately 0.0653, or about 6.5%.
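
# The same calculation can be done in code, which avoids rounding intermediate values. With the unrounded e^(-5) the result is about 0.0653; rounding e^(-5) to 0.0067 before multiplying gives a slightly smaller figure. `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate per interval is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Exactly 8 arrivals in an hour when the average is 5 per hour.
prob_eight = poisson_pmf(8, 5)
print(f"P(X = 8) = {prob_eight:.4f}")

# Probability of at most 8 arrivals, summing the PMF from 0 to 8.
prob_at_most_eight = sum(poisson_pmf(k, 5) for k in range(9))
print(f"P(X <= 8) = {prob_at_most_eight:.4f}")
```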

In [11]:
## Explain what a random variable is and differentiate between discrete and continuous random variables.

# A random variable is a numerical outcome of a random phenomenon.
# It is a function that assigns a real number to each possible outcome of a random experiment. 
# Random variables are used in statistics and probability theory to quantify outcomes and to perform calculations about them.

# Types of Random Variables

# Random variables can be categorized into two main types: discrete random variables and continuous random variables.

# 1. Discrete Random Variables

# - Definition: A discrete random variable can take on a finite or countably infinite number of distinct values.
# The values can often be counted, and there are gaps between them.
# - Examples:
#   - Number of students in a classroom: This variable can take values like 0, 1, 2, ..., up to some maximum number, but it cannot take fractional values.
#   - Number of heads when flipping a coin three times: Possible outcomes are 0, 1, 2, or 3 heads.
#   - Roll of a die: The outcomes can be 1, 2, 3, 4, 5, or 6.

# - Probability Mass Function (PMF): The probabilities associated with a discrete random variable can be described using a probability mass function, which assigns a probability to each possible value of the variable.
#  For example, if \( X \) is the number of heads in three coin flips, we could represent \( P(X = k) \) for \( k = 0, 1, 2, 3 \).

# 2. Continuous Random Variables

# - Definition: A continuous random variable can take on any value within a given range or interval. The set of possible values is uncountably infinite and includes fractions and decimals.
# - Examples:
#  - Height of students: A person's height can vary continuously (e.g., 150.5 cm, 160.2 cm, etc.).
#  - Temperature readings: Temperature can take any value in a range, such as between -10°C and 40°C.
#  - Time taken to complete a task: This could be any positive value, such as 2.5 seconds or 10.7 minutes.

# - Probability Density Function (PDF): For continuous random variables, probabilities are described using a probability density function, which gives the relative density of probability around each value. The probability of any single exact value is zero.
# The probability of a continuous random variable falling within a specific interval is found by integrating the PDF over that interval.
# For example, to find the probability that a random variable \( Y \) falls between values \( a \) and \( b \), you would compute:

# \[ P(a < Y < b) = \int_{a}^{b} f(y) \, dy \]

# where \( f(y) \) is the PDF of the random variable.
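
# Both ideas can be illustrated briefly. The first part builds the PMF for the number of heads in three fair coin flips by enumerating all outcomes. The second part approximates P(a < Y < b) as the area under a PDF using a simple midpoint rule; the choice of a standard normal PDF and the interval (-1, 1) is an assumption made for illustration, and `integrate` is a hypothetical helper.

```python
import math
from fractions import Fraction
from itertools import product

# Discrete: PMF of X = number of heads in three fair coin flips.
outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, 0) + Fraction(1, len(outcomes))

for k in sorted(pmf):
    print(f"P(X = {k}) = {pmf[k]}")

# Continuous: probability as the area under a PDF over an interval.
def standard_normal_pdf(y):
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

prob_between = integrate(standard_normal_pdf, -1.0, 1.0)
print(f"P(-1 < Y < 1) = {prob_between:.4f}")
```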

In [12]:
##  Provide an example dataset, calculate both covariance and correlation, and interpret the results.

# Scenario: Let's consider a simple dataset that tracks daily ice cream sales and the average daily temperature.
# We want to see if there's a relationship between these two variables.

# Dataset:

# Day    Ice Cream Sales   Temperature (°C)
#  1            20               25
#  2            25               30
#  3            18               22
#  4            32               35
#  5            28               32


# calculating covariance and correlation
import numpy as np

# Create arrays for ice cream sales and temperature
sales = np.array([20, 25, 18, 32, 28])
temperature = np.array([25, 30, 22, 35, 32])

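# Calculate the sample covariance (np.cov divides by n - 1 by default)
covariance = np.cov(sales, temperature)[0, 1]
print("Covariance:", covariance)

# Calculate the Pearson correlation coefficient
correlation = np.corrcoef(sales, temperature)[0, 1]
print("Correlation:", correlation)

# Interpretation:
# - The covariance is positive (29.9), so sales and temperature tend to rise and fall together. Its size depends on the units of both variables.
# - The correlation (about 0.99) is unit-free and close to +1, indicating a very strong positive linear relationship in this small sample. It does not by itself show that temperature causes sales.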

Covariance: 29.9
Correlation: 0.9919605070859466
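
# The NumPy results can be cross-checked from the definitions using plain Python. This sketch assumes the sample versions of the formulas (dividing by n - 1), which is what `np.cov` uses by default.

```python
import math

sales = [20, 25, 18, 32, 28]
temperature = [25, 30, 22, 35, 32]

n = len(sales)
mean_sales = sum(sales) / n
mean_temp = sum(temperature) / n

# Sample covariance: sum of products of paired deviations, divided by n - 1.
cross_products = sum((s - mean_sales) * (t - mean_temp) for s, t in zip(sales, temperature))
covariance = cross_products / (n - 1)

# Sample standard deviations of each variable.
std_sales = math.sqrt(sum((s - mean_sales) ** 2 for s in sales) / (n - 1))
std_temp = math.sqrt(sum((t - mean_temp) ** 2 for t in temperature) / (n - 1))

# Pearson correlation: covariance scaled by both standard deviations.
correlation = covariance / (std_sales * std_temp)

print("Covariance:", round(covariance, 4))
print("Correlation:", round(correlation, 4))
```

# Dividing the covariance by both standard deviations removes the units, which is why the correlation always lies between -1 and 1.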
