# Introduction

Statistics refers to the mathematics and techniques with which we understand data.

### Descriptive statistics
Descriptive statistics summarize data in a meaningful way so that patterns emerge from it.

Measures such as mean, median, mode, standard deviation, variance, and quartiles fall under this category.
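As a rough illustration of these measures, the sketch below computes each of them with NumPy and the standard library. The `scores` array is made-up data, not taken from any real study.

```python
import statistics

import numpy as np

# Hypothetical exam scores, invented for illustration only
scores = np.array([55, 60, 60, 65, 70, 75, 80, 85, 90])

mean = np.mean(scores)                            # arithmetic average
median = np.median(scores)                        # middle value of the sorted data
mode = statistics.mode(scores.tolist())           # most frequent value
variance = np.var(scores, ddof=1)                 # sample variance (divides by n - 1)
std_dev = np.std(scores, ddof=1)                  # sample standard deviation
q1, q2, q3 = np.percentile(scores, [25, 50, 75])  # quartiles

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Variance:", variance)
print("Standard deviation:", std_dev)
print("Quartiles:", q1, q2, q3)
```

Setting `ddof=1` treats the data as a sample; with `ddof=0` the same functions return the population variance and standard deviation.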


### Inferential statistics
Inferential statistics use a limited set of data to describe the larger picture and derive conclusions about it.

Techniques such as hypothesis testing, confidence intervals, and regression analysis are used for this purpose.

### Descriptive versus Inferential statistics
**Descriptive statistics** allow you to describe a data set, while **Inferential statistics** allow you to make inferences based on a data set.

#### Descriptive statistics
Using descriptive statistics, you can report characteristics of your data:
- The **distribution** concerns the frequency of each value.
- The **central tendency** concerns the averages of the values.
- The **variability** concerns how spread out the values are.
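
The three characteristics in the list above can be sketched with plain Python. The `responses` list is hypothetical data (number of siblings reported in a survey), and the range is used only as the simplest possible measure of variability.

```python
from collections import Counter

# Hypothetical survey responses (number of siblings), invented for illustration
responses = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

frequency = Counter(responses)                  # distribution: count of each value
n = len(responses)
mean = sum(responses) / n                       # central tendency
value_range = max(responses) - min(responses)   # variability (simplest measure)

for value in sorted(frequency):
    share = frequency[value] / n
    print(f"{value}: {frequency[value]} responses ({share:.0%})")
print("Mean:", mean)
print("Range:", value_range)
```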

In descriptive statistics, there is no uncertainty – the statistics precisely describe the data that you collected. If you collect data from an entire population, you can directly compare these descriptive statistics to those from other populations.

**An Example of Descriptive statistics**

You collect data on the SAT scores of all 11th graders in a school for three years. You can use descriptive statistics to get a quick overview of the school’s scores in those years. You can then directly compare the mean SAT score with the mean scores of other schools.

#### Inferential statistics
Most of the time, you can only acquire data from samples, because it is too difficult or expensive to collect data from the whole population that you’re interested in. While descriptive statistics can only summarize a sample’s characteristics, inferential statistics use your sample to make reasonable guesses about the larger population.

With inferential statistics, it’s important to use random and unbiased sampling methods. If your sample isn’t representative of your population, then you can’t make valid statistical inferences or generalize.

**An Example of Inferential statistics**

You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics. You can use inferential statistics to make estimates and test hypotheses about the whole population of 11th graders in the state based on your sample data.

*Sampling error in inferential statistics*

Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).


Sampling error arises any time you use a sample, even if your sample is random and unbiased. For this reason, there is always some uncertainty in inferential statistics. However, using probability sampling methods reduces this uncertainty.
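
To make sampling error concrete, the following sketch simulates a population, draws a simple random sample from it, and compares the sample mean (a statistic) with the population mean (a parameter). The population is simulated with assumed values (mean 500, standard deviation 100), so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Simulated population of 100,000 hypothetical test scores
population = rng.normal(loc=500, scale=100, size=100_000)
population_mean = population.mean()   # parameter

# Simple random sample of 200 scores, drawn without replacement
sample = rng.choice(population, size=200, replace=False)
sample_mean = sample.mean()           # statistic

sampling_error = sample_mean - population_mean
print("Population mean:", population_mean)
print("Sample mean:", sample_mean)
print("Sampling error:", sampling_error)
```

Even though the sample is random and unbiased, the sampling error is not zero; rerunning with a different seed gives a different error.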



#### Key steps involved in inferential statistics

##### 1. Formulating a Hypothesis
A hypothesis is a statement or assumption about a population parameter, such as the mean or proportion.

**Null Hypothesis (H0)** - represents the status quo.

**Alternative Hypothesis (Ha)** - proposes a specific claim or difference.

Example:

**H0**: There is no significant difference in the mean test scores between students who receive tutoring and those who do not.

**Ha**: There is a significant difference in mean test scores between students who receive tutoring and those who do not.

##### 2. Selecting a Sample
A representative sample is selected from the population of interest using appropriate sampling techniques.

##### 3. Collecting Data
Data is collected from the selected sample using various methods such as surveys, experiments or observations.
The data should be collected in a manner that minimizes bias and ensures its reliability and validity.

##### 4. Analyzing the Data
Statistical techniques are applied to the sample data to estimate population parameters and assess the strength of evidence for or against the null hypothesis.
Common techniques include: Hypothesis testing, Confidence Intervals and Regression Analysis.

##### 5. Drawing Conclusions
Based on the analysis of the sample data, conclusions are drawn about the population of interest.
These conclusions may involve rejecting or failing to reject the null hypothesis, estimating population parameters, or making predictions.


#### Common Techniques used in inferential statistics
These include:

##### Hypothesis testing
Hypothesis testing assesses whether the evidence provided by the sample data is strong enough to reject the null hypothesis in favor of the alternative hypothesis.
This is typically done by calculating a test statistic and comparing it to a critical value or calculating a p-value.
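
As a minimal sketch of this procedure, the example below runs a two-sample t-test on the tutoring scenario using `scipy.stats.ttest_ind`. The scores are simulated rather than real, and the group means, group sizes, and the 0.05 significance level are assumptions chosen for illustration. Welch's variant (`equal_var=False`) is used here as one reasonable choice, not the only one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Simulated (not real) test scores for two hypothetical groups
tutored = rng.normal(loc=75, scale=10, size=40)
not_tutored = rng.normal(loc=70, scale=10, size=40)

# Welch's two-sample t-test: does not assume equal variances
result = stats.ttest_ind(tutored, not_tutored, equal_var=False)
t_statistic, p_value = result.statistic, result.pvalue

alpha = 0.05  # significance level, chosen before looking at the data
if p_value < alpha:
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print("t statistic:", t_statistic)
print("p-value:", p_value)
print("Decision:", decision)
```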

##### Confidence Intervals
A confidence interval provides a range of values within which the population parameter is estimated to lie, with a certain level of confidence.
For example, a 95% confidence interval for the population mean is built by a procedure that, over repeated sampling, captures the true population mean about 95% of the time. This is often summarized as being 95% confident that the true mean falls within the interval.
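
One way to read a 95% confidence level is through repeated sampling. The simulation below is a sketch under assumed values (true mean 50, standard deviation 10, samples of size 30, 2000 repetitions): it builds a t-based interval from each sample and counts how often the interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Assumed values for the simulation, not taken from real data
true_mean, true_std, n, trials = 50, 10, 30, 2000
t_critical = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, true_std, size=n)
    margin = t_critical * sample.std(ddof=1) / np.sqrt(n)
    # Check whether this sample's interval contains the true mean
    if sample.mean() - margin <= true_mean <= sample.mean() + margin:
        covered += 1

coverage = covered / trials
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed share should land close to 95%, which is what the confidence level describes: a property of the interval-building procedure across many samples.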

##### Regression Analysis
Regression analysis is used to model and analyze the relationship between one or more independent variables and a dependent variable.
It helps you understand the impact of the independent variables on the dependent variable and supports predictions or explanations based on the model.

![Alt text](image-20.png)
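
A simple linear regression with one independent variable can be sketched with `scipy.stats.linregress`. The data below is simulated: the variable names (`hours`, `score`) and the true slope and intercept are assumptions made up for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)

# Simulated data: hypothetical hours studied vs. exam score
hours = rng.uniform(0, 10, size=50)
score = 50 + 4 * hours + rng.normal(0, 5, size=50)  # true slope 4, intercept 50

# Fit score = intercept + slope * hours by least squares
fit = stats.linregress(hours, score)

# Use the fitted line to predict the score for 6 hours of study
predicted = fit.intercept + fit.slope * 6

print("Slope:", fit.slope)
print("Intercept:", fit.intercept)
print("R squared:", fit.rvalue ** 2)
print("Predicted score for 6 hours:", predicted)
```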


##### Correlation tests
**Correlation tests** determine the extent to which two variables are associated.

Although **Pearson’s r** is the most statistically powerful test, **Spearman’s r** is appropriate for interval and ratio variables when the data doesn’t follow a normal distribution.

The chi square test of independence is the only test that can be used with nominal variables.

![Alt text](image-19.png)
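
The three tests mentioned above are all available in `scipy.stats`. The sketch below uses simulated numeric data for the two correlation coefficients and a hypothetical 2x2 contingency table for the chi-square test of independence; none of the numbers come from a real study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)

# Simulated interval/ratio variables with a positive linear association
x = rng.normal(0, 1, size=100)
y = 0.6 * x + rng.normal(0, 1, size=100)

pearson_r, pearson_p = stats.pearsonr(x, y)      # linear association
spearman_r, spearman_p = stats.spearmanr(x, y)   # rank-based association

# Hypothetical contingency table for two nominal variables
# rows: group A / group B, columns: prefers X / prefers Y
observed = np.array([[30, 10],
                     [20, 40]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(observed)

print("Pearson's r:", pearson_r, "p-value:", pearson_p)
print("Spearman's r:", spearman_r, "p-value:", spearman_p)
print("Chi-square:", chi2, "p-value:", chi2_p, "dof:", dof)
```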

##### Comparison tests
**Comparison tests** assess whether there are differences in means, medians or rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.

Means can only be found for interval or ratio data, while medians and rankings are more appropriate measures for **ordinal data**.

Ordinal data is classified into categories within a variable that have a natural rank order.

![Alt text](image-21.png)

**Types of comparison tests:**

![Alt text](image-22.png)
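
As one illustration of choosing between a parametric and a nonparametric comparison test, the sketch below compares three simulated groups with a one-way ANOVA (compares means) and a Kruskal-Wallis test (compares ranks). The three "teaching method" groups and their parameters are assumptions invented for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)

# Simulated scores for three hypothetical teaching methods
group_a = rng.normal(70, 8, size=30)
group_b = rng.normal(74, 8, size=30)
group_c = rng.normal(78, 8, size=30)

# Parametric: one-way ANOVA compares the group means
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)

# Nonparametric: Kruskal-Wallis compares the groups using ranks
h_stat, kruskal_p = stats.kruskal(group_a, group_b, group_c)

print("ANOVA F:", f_stat, "p-value:", anova_p)
print("Kruskal-Wallis H:", h_stat, "p-value:", kruskal_p)
```

The rank-based test is the usual fallback when the data is ordinal or does not meet the conditions for a parametric test.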


# Practical examples

### Confidence Interval 

```python
# Import libraries
import numpy as np
import scipy.stats as stats

# Generate a sample dataset
dataset = np.random.normal(loc=50, scale=10, size=100)

# Calculate the sample mean and standard deviation
sample_mean = np.mean(dataset)
sample_std = np.std(dataset, ddof=1)  # Set ddof=1 for sample standard deviation

# Set the desired confidence level
confidence_level = 0.95

# Calculate the critical value based on the confidence level
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * (sample_std / np.sqrt(len(dataset)))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the results
print("Sample Mean:", sample_mean)
print("Confidence Interval:", confidence_interval)
```

Output:

```
Sample Mean: 47.669047711024724
Confidence Interval: (45.72993542441857, 49.60815999763088)
```


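**Explanation** <br>
*np.random.normal* - generates a dataset drawn from a normal distribution with a mean of 50 and a standard deviation of 10.<br>
*np.mean* - calculates the sample mean of the dataset.<br>
*np.std* - calculates the sample standard deviation of the dataset (with `ddof=1`).<br>
*stats.norm.ppf* - returns the critical z-value for the chosen confidence level.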
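
The example above uses a critical value from the normal distribution, which is a common choice for a sample of 100. For smaller samples, a critical value from the t-distribution is often used instead. The sketch below assumes a simulated sample of 15 values and compares the two critical values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)

# Simulated small sample (assumed mean 50, standard deviation 10)
small_sample = rng.normal(loc=50, scale=10, size=15)

n = len(small_sample)
sample_mean = small_sample.mean()
standard_error = small_sample.std(ddof=1) / np.sqrt(n)

# Two-sided 95% critical values
t_critical = stats.t.ppf(0.975, df=n - 1)  # t-distribution, n - 1 degrees of freedom
z_critical = stats.norm.ppf(0.975)         # normal distribution

t_interval = (sample_mean - t_critical * standard_error,
              sample_mean + t_critical * standard_error)

print("t critical:", t_critical, "z critical:", z_critical)
print("t-based 95% confidence interval:", t_interval)
```

The t critical value is larger than the z critical value, so the t-based interval is wider, reflecting the extra uncertainty of estimating the standard deviation from few observations.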

### Conclusion
Inferential statistics allow researchers and data scientists to make generalizations and draw conclusions beyond the specific data collected in a study.
By using appropriate statistical techniques, they can estimate population parameters, make predictions, test hypotheses, and gain insights into the larger population of interest.