<a href="https://colab.research.google.com/github/MithunSR/Scipy-Tutorial/blob/main/Scipy_Statistical_Analysis_and_Hypothesis_Testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#1. Introduction
The code in this notebook shows how to use SciPy, a library for scientific computing, together with NumPy to perform statistical analysis and hypothesis testing. Statistical analysis is central to extracting insights from data and making data-driven decisions. The code covers four areas: descriptive statistics, hypothesis testing (t-test and chi-square test), analysis of variance (ANOVA), and correlation analysis.

Descriptive statistics summarize a dataset by capturing characteristics such as central tendency, variability, and range. In the code, the mean, median, standard deviation, minimum, and maximum are computed with NumPy functions. These statistics give a concise picture of the dataset's overall properties.

Hypothesis testing is a fundamental component of statistical analysis, allowing us to draw inferences and make decisions based on sample data. The code presents two common hypothesis testing techniques: the t-test and the chi-square test. The t-test examines whether there is a significant difference between the means of two independent samples, while the chi-square test assesses whether two categorical variables are related. The resulting test statistics and p-values provide the evidence for rejecting, or failing to reject, the null hypothesis; a test never "accepts" the null hypothesis.

Analysis of variance (ANOVA) extends the comparison of means to more than two groups. The code demonstrates one-way ANOVA, in which three groups are compared to determine whether at least one group mean differs from the others. The F-statistic compares the variation between groups with the variation within groups, and the p-value indicates whether the observed differences are statistically significant.

Correlation analysis investigates the association between two variables. The code calculates the Pearson correlation coefficient, which measures the strength and direction of a linear relationship between two variables. Additionally, the associated p-value evaluates the statistical significance of the correlation coefficient. Correlation analysis is useful in understanding the interdependence between variables and identifying potential patterns or trends.

Overall, the code demonstrates how SciPy's statistical functions can be used to summarize data, test hypotheses, and measure relationships between variables. The examples use synthetic data, but the same function calls apply to real-world datasets.

#2. Worked Example
##2.1 Import Libraries
First, we import the necessary libraries, numpy and scipy.stats.

In [2]:
import numpy as np
from scipy import stats

##2.2 Generate Random Data
We generate random data using np.random.normal(). The loc parameter specifies the mean, scale specifies the standard deviation, and size determines the number of samples. Calling np.random.seed(0) first makes the generated values reproducible.

In [3]:
# Generate random data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=100)
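
As a side note, the same kind of data can be drawn with NumPy's newer Generator interface. The sketch below is an alternative and not what this notebook uses; the name `data_rng` is introduced only for illustration, and the values it produces differ from those of `np.random.seed(0)`, so the outputs shown in the following sections would not match.

```python
import numpy as np

# Alternative to np.random.seed(): a local, seeded Generator object.
# Note: this produces a different random stream than the legacy API,
# so the numbers will not match the outputs shown in this notebook.
rng = np.random.default_rng(0)

# Same distribution parameters as above: mean 0, standard deviation 1.
data_rng = rng.normal(loc=0, scale=1, size=100)

print(data_rng.shape)
```

A local generator avoids modifying NumPy's global random state, which can matter when several parts of a program draw random numbers.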

##2.3 Descriptive Statistics:

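We calculate the mean, median, standard deviation, minimum value, and maximum value using the np.mean(), np.median(), np.std(), np.min(), and np.max() functions respectively. Note that np.std() uses ddof=0 by default, so it returns the population standard deviation rather than the sample standard deviation.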

In [4]:
# Descriptive statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
min_val = np.min(data)
max_val = np.max(data)

In [5]:
print("Descriptive Statistics:")
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)
print("Minimum Value:", min_val)
print("Maximum Value:", max_val)

Descriptive Statistics:
Mean: 0.059808015534485
Median: 0.09409611943799814
Standard Deviation: 1.0078822447165796
Minimum Value: -2.5529898158340787
Maximum Value: 2.2697546239876076
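
The standard deviation above is the population form (ddof=0). As a minimal sketch, the snippet below regenerates the same data and compares it with the sample standard deviation (ddof=1). It also calls stats.describe(), which returns several summary statistics at once and uses ddof=1 for its variance by default. The variable names `pop_std`, `sample_std`, and `summary` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=100)

pop_std = np.std(data)             # divides by n (ddof=0, NumPy's default)
sample_std = np.std(data, ddof=1)  # divides by n - 1

# describe() returns nobs, minmax, mean, variance, skewness, and kurtosis.
summary = stats.describe(data)

print("Population std:", pop_std)
print("Sample std:", sample_std)
print("Sample variance from describe():", summary.variance)
print("Min/Max from describe():", summary.minmax)
```

For 100 observations the two standard deviations differ only by a factor of sqrt(100/99), but the gap grows as the sample gets smaller.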


##2.4 Hypothesis Testing (t-test):

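We create two samples, sample1 and sample2, using np.random.normal(), with true means of 0 and 1 respectively.
We perform an independent two-sample t-test using stats.ttest_ind() to compare the means of the two samples. The null hypothesis is that the two population means are equal, and by default the test assumes equal population variances.
The resulting t-statistic and p-value are stored in the t_statistic and p_value variables.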

In [6]:
# Hypothesis testing (t-test)
sample1 = np.random.normal(loc=0, scale=1, size=50)
sample2 = np.random.normal(loc=1, scale=1, size=50)
t_statistic, p_value = stats.ttest_ind(sample1, sample2)

In [7]:
print("\nHypothesis Testing (t-test):")
print("T-Statistic:", t_statistic)
print("P-Value:", p_value)


Hypothesis Testing (t-test):
T-Statistic: -3.159380359002485
P-Value: 0.002102937568007988
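
The p-value above is well below the conventional 0.05 level, so the null hypothesis of equal means would be rejected. When equal variances cannot be assumed, stats.ttest_ind() accepts equal_var=False, which performs Welch's t-test instead. The sketch below uses hypothetical data with unequal spreads and sizes (the seed, the sample parameters, and the 0.05 threshold are all assumptions made for illustration) and checks the statistic against its textbook definition.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with different variances and sizes.
rng = np.random.default_rng(0)
a = rng.normal(loc=0, scale=1, size=50)
b = rng.normal(loc=1, scale=3, size=80)

# Welch's t-test: does not assume equal population variances.
welch = stats.ttest_ind(a, b, equal_var=False)

# The same statistic from its definition:
# difference in means divided by the unpooled standard error.
se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
t_manual = (a.mean() - b.mean()) / se

alpha = 0.05  # assumed significance level
decision = "reject H0" if welch.pvalue < alpha else "fail to reject H0"

print("Welch t:", welch.statistic, "manual t:", t_manual)
print("p-value:", welch.pvalue, "->", decision)
```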


##2.5 Hypothesis Testing (chi-square test):

We create a 2x3 contingency table, observed, holding the observed frequencies for two categorical variables.
We perform a chi-square test of independence using stats.chi2_contingency() to test whether there is a relationship between the two variables. The null hypothesis is that the variables are independent.
The resulting chi-square statistic, p-value, degrees of freedom, and expected frequencies are stored in the chi2, p, dof, and expected variables.

In [8]:
# Hypothesis testing (chi-square test)
observed = np.array([[10, 20, 30], [6, 15, 25]])
chi2, p, dof, expected = stats.chi2_contingency(observed)

In [9]:
print("\nHypothesis Testing (chi-square test):")
print("Chi-square:", chi2)
print("P-Value:", p)


Hypothesis Testing (chi-square test):
Chi-square: 0.32545172219085255
P-Value: 0.8498241263395327
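
The large p-value gives no evidence of a relationship between the two variables. The dof and expected values returned by the call are not printed above. As a sketch, they can be reproduced by hand: the expected count in each cell is (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected over all cells. The names ending in `_manual` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

observed = np.array([[10, 20, 30], [6, 15, 25]])
chi2, p, dof, expected = stats.chi2_contingency(observed)

# Expected counts under independence.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic and its p-value from the chi-square distribution.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, df=dof_manual)

print("Degrees of freedom:", dof, dof_manual)
print("Expected frequencies:\n", expected)
print("Chi-square:", chi2, chi2_manual)
print("P-Value:", p, p_manual)
```

The manual and library results agree here because the table has two degrees of freedom; for a 2x2 table, chi2_contingency() applies Yates' continuity correction by default, and the uncorrected formula would give a different statistic.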


##2.6 Analysis of Variance (ANOVA):

We create three groups, group1, group2, and group3, using np.random.normal(), with true means of 0, 1, and 2.
We perform a one-way ANOVA using stats.f_oneway() to test whether there are significant differences among the means of the groups.
The resulting F-statistic and p-value are stored in the f_statistic and p_value variables; p_value overwrites the value stored by the earlier t-test.

In [10]:
# Analysis of Variance (ANOVA)
group1 = np.random.normal(loc=0, scale=1, size=30)
group2 = np.random.normal(loc=1, scale=1, size=30)
group3 = np.random.normal(loc=2, scale=1, size=30)
f_statistic, p_value = stats.f_oneway(group1, group2, group3)

In [11]:
print("\nAnalysis of Variance (ANOVA):")
print("F-Statistic:", f_statistic)
print("P-Value:", p_value)


Analysis of Variance (ANOVA):
F-Statistic: 34.602351122736934
P-Value: 8.777968000007025e-12
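
The F-statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below recomputes it from that definition on freshly generated groups and compares the result with stats.f_oneway(). The seed and the names `g1`, `g2`, `g3`, `f_manual`, and `p_manual` are assumptions made for illustration, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

# Hypothetical groups with the same parameters as in the example above.
rng = np.random.default_rng(0)
g1 = rng.normal(loc=0, scale=1, size=30)
g2 = rng.normal(loc=1, scale=1, size=30)
g3 = rng.normal(loc=2, scale=1, size=30)
groups = [g1, g2, g3]

f_scipy, p_scipy = stats.f_oneway(*groups)

# Sums of squares between and within groups.
grand_mean = np.concatenate(groups).mean()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Degrees of freedom: k - 1 between groups, N - k within groups.
df_between = len(groups) - 1
df_within = sum(g.size for g in groups) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

print("F (scipy):", f_scipy, "F (manual):", f_manual)
print("p (scipy):", p_scipy, "p (manual):", p_manual)
```

A significant result only says that at least one group mean differs; it does not identify which groups differ from each other.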


##2.7 Correlation Analysis:

We generate two random variables, x and y, using np.random.normal(). Because they are drawn independently, we expect a correlation close to zero.
We calculate the Pearson correlation coefficient and the associated p-value using stats.pearsonr().
The correlation coefficient and p-value are stored in the correlation_coefficient and p_value variables; p_value again overwrites the previous value.

In [12]:
# Correlation analysis
x = np.random.normal(loc=0, scale=1, size=100)
y = np.random.normal(loc=0, scale=1, size=100)
correlation_coefficient, p_value = stats.pearsonr(x, y)

In [13]:
print("\nCorrelation Analysis:")
print("Correlation Coefficient:", correlation_coefficient)
print("P-Value:", p_value)



Correlation Analysis:
Correlation Coefficient: -0.015900684268378675
P-Value: 0.8752309433436853
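
As expected for independent variables, the coefficient is near zero and the p-value is large. For contrast, the sketch below builds a variable that depends linearly on x plus noise and reports both the Pearson coefficient and the rank-based Spearman coefficient from stats.spearmanr(). The seed, the slope of 2, the noise scale of 0.5, and the name `y_related` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0, scale=1, size=100)

# Hypothetical dependent variable: linear in x with added noise.
y_related = 2 * x + rng.normal(loc=0, scale=0.5, size=100)

# Pearson measures linear association.
r, p_pearson = stats.pearsonr(x, y_related)

# Spearman measures monotonic association using ranks.
rho, p_spearman = stats.spearmanr(x, y_related)

print("Pearson r:", r, "p-value:", p_pearson)
print("Spearman rho:", rho, "p-value:", p_spearman)
```

Both coefficients should be strongly positive for data built this way. A significant correlation indicates association only and does not by itself show that one variable causes the other.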
