
## **Introduction to Statistics**

Statistics is a branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data. It is widely used in various fields to make informed decisions based on data.



## **Two Types of Statistics**

Statistics can generally be divided into two main categories:

- Descriptive Statistics
- Inferential Statistics

## **1. Descriptive Statistics**

Descriptive statistics are methods for summarizing and organizing data. They describe the basic features of the data in a study and present quantitative descriptions in a manageable form. The key elements of descriptive statistics include:


- **Measures of Central Tendency**:
  - Mean (average)
  - Median (middle value)
  - Mode (most frequent value)
  
- **Measures of Dispersion**:
  - Range (difference between the highest and lowest values)
  - Variance (average squared deviation from the mean)
  - Standard deviation (square root of the variance)

- **Frequency Distributions**: Organizing data into tables or graphs to show the number of occurrences of different values.

- **Graphs**:
  - Histograms
  - Bar charts
  - Pie charts
  - Box plots
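
The measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch; the `data` list is a hypothetical data set chosen only for illustration, and the population formulas (`pvariance`, `pstdev`) are used to match the "average squared deviation" definition of variance.

```python
import statistics
from collections import Counter

# Hypothetical data set, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean_value = statistics.mean(data)      # arithmetic average
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

# Measures of dispersion (population formulas: divide by n)
value_range = max(data) - min(data)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Frequency distribution: each value mapped to its number of occurrences
frequencies = Counter(data)

print("Mean:", mean_value, "Median:", median_value, "Mode:", mode_value)
print("Range:", value_range, "Variance:", variance, "Std dev:", std_dev)
print("Frequencies:", dict(frequencies))
```

For this data set the mean is 5, the median 4.5, the mode 4, the range 7, the variance 4, and the standard deviation 2. If the data were a sample rather than a whole population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would usually be the better choice.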

## **2. Inferential Statistics**

Inferential statistics involves drawing conclusions or making inferences about a population based on sample data. It uses probability theory to make predictions or generalizations. Key techniques in inferential statistics include:

- **Hypothesis Testing**: Testing assumptions about a population based on sample data.
  
- **Confidence Intervals**: Estimating a range within which a population parameter is likely to fall.

- **Regression Analysis**: Analyzing the relationship between dependent and independent variables.

- **Analysis of Variance (ANOVA)**: Comparing means across multiple groups to see if there are significant differences.

- **Sampling**: Using a sample to infer conclusions about a larger population.
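
As one illustration of the techniques above, a confidence interval for a population mean can be sketched with the standard library. The `sample` values are hypothetical, and the normal approximation used here is a simplification: for a sample this small, a t-distribution critical value would usually be preferred.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of measurements (illustrative values only)
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]

confidence = 0.95
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(len(sample))

# Two-sided critical value from the standard normal distribution
# (assumption: normal approximation instead of the t-distribution)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Sample mean: {sample_mean:.3f}")
print(f"{confidence:.0%} confidence interval: ({lower:.3f}, {upper:.3f})")
```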


In summary, descriptive statistics are used to summarize and describe data, while inferential statistics go beyond the data to make predictions or test hypotheses.


### **Sampling Techniques**

Sampling is a fundamental process in research and data collection, where a subset of individuals or observations (called a **sample**) is selected from a larger group or population. The goal of sampling is to gather data that can represent the whole population, allowing researchers to draw conclusions, make inferences, or generalize findings without studying every individual.

---

 **Why Use Sampling?**
1. **Cost-Effectiveness**: Studying the entire population can be expensive; sampling reduces costs.
2. **Time Efficiency**: Collecting and analyzing data from a sample takes less time than from the whole population.
3. **Feasibility**: Accessing every member of a population may not be possible (e.g., populations that are dispersed or hidden).
4. **Accuracy and Manageability**: A smaller, well-chosen sample can be measured and checked more carefully, which reduces data-collection and processing errors.

---

 **Key Concepts in Sampling**
- **Population**: The entire group you want to study or make conclusions about (e.g., all voters in a country).
- **Sample**: A subset of the population selected for the study.
- **Sampling Frame**: A list or database containing all members of the population from which the sample is drawn.
- **Sampling Error**: The difference between the characteristics of the sample and the actual population due to sampling.
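
Sampling error can be made concrete with a small simulation. The sketch below assumes a population of the integers 1 to 100 and compares the average absolute gap between the sample mean and the population mean for two sample sizes; the helper name `mean_abs_sampling_error`, the seed, and the number of repeats are illustrative choices.

```python
import random
import statistics

rng = random.Random(0)  # seeded only so repeated runs behave the same
population = list(range(1, 101))
population_mean = statistics.mean(population)

def mean_abs_sampling_error(sample_size, repeats=500):
    """Average |sample mean - population mean| over many random samples."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.mean(sample) - population_mean))
    return statistics.mean(errors)

error_small = mean_abs_sampling_error(10)
error_large = mean_abs_sampling_error(50)

print("Population mean:", population_mean)
print("Average sampling error (n=10):", round(error_small, 2))
print("Average sampling error (n=50):", round(error_large, 2))
```

The larger sample should show a noticeably smaller average error, which is why sample size matters when generalizing to the population.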

---

 **Types of Sampling**

 1. **Probability Sampling**
- Every individual has a known, non-zero chance of being selected.
- **Examples**:
  - Simple random sampling
  - Systematic sampling
  - Stratified sampling
  - Cluster sampling
- **Advantages**: Results tend to be more representative, and because selection probabilities are known, findings can be generalized to the population.

 2. **Non-Probability Sampling**
- Selection is based on non-random criteria, often relying on the researcher’s judgment or convenience.
- **Examples**:
  - Convenience sampling
  - Judgment sampling
  - Quota sampling
  - Snowball sampling
- **Advantages**: Faster and easier, especially for exploratory or preliminary research.

---

 **Importance of Sampling**
- **Generalization**: Enables conclusions about a population from a manageable group.
- **Practicality**: Reduces the logistical and financial constraints of studying an entire population.
- **Insightful Data**: When done correctly, provides accurate and reliable data for decision-making.

---

By understanding sampling techniques and their appropriate applications, researchers can ensure the reliability and validity of their findings.



### **1. Simple Random Sampling**
This method selects elements randomly from a population.

In [7]:
import random  # Importing the random module to use its random sampling functions

# Define the population
# Creating a list of integers from 1 to 100 to represent the population.
population = list(range(1, 101))  # Population of integers 1 to 100

# Define the sample size
# Setting the number of samples we want to randomly select.
sample_size = 10  # Specify the sample size

# Generate a random sample
# Using the `random.sample` function to select 'sample_size' unique elements from the population.
# This ensures that each element has an equal chance of being selected.
simple_random_sample = random.sample(population, sample_size)

# Print the generated sample
# Displaying the randomly selected elements from the population.
print("Simple Random Sample:", simple_random_sample)


Simple Random Sample: [59, 63, 41, 50, 34, 35, 43, 74, 6, 97]


### **2. Systematic Sampling**
This method selects every k-th element from the population.

In [8]:
# Define the population
# Creating a list of integers from 1 to 100 to represent the population.
population = list(range(1, 101))  # Population of integers 1 to 100

# Define the sample size
# Setting the desired number of samples to 20.
sample_size = 20

# Calculate the sampling interval
# Dividing the population size by the sample size to determine the interval (k).
# This means we will pick every k-th element.
k = len(population) // sample_size  # Sampling interval

# Generate a systematic sample
# Using a list comprehension to select every k-th element from the population.
# Starting at index 0 and advancing by k to collect elements.
systematic_sample = [population[i] for i in range(0, len(population), k)]

# Print the first 'sample_size' elements of the systematic sample
# Ensuring that we only display the requested number of elements from the systematic sample.
print("Systematic Sample:", systematic_sample[:sample_size])


Systematic Sample: [1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76, 81, 86, 91, 96]
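
The cell above always starts at the first element. In practice the starting point is usually chosen at random within the first interval, so that every element has a chance of selection. A minimal sketch of that variant follows; the function name `systematic_sample` and its `rng` parameter are illustrative assumptions.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Pick every k-th element, starting from a random offset in [0, k)."""
    k = len(population) // sample_size  # sampling interval
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = rng.randrange(k)  # random starting index within the first interval
    return population[start::k][:sample_size]

population = list(range(1, 101))
sample = systematic_sample(population, 20)
print("Systematic Sample (random start):", sample)
```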


### **3. Stratified Sampling**
This method divides the population into strata and samples from each.


In [10]:
import random  # This imports a tool that helps us pick random items.

# Define the population (grouped by strata)
# Imagine we have two groups of people: Group A and Group B.
# Group A has numbers from 1 to 50, and Group B has numbers from 51 to 100.
population = {'Group A': list(range(1, 51)), 'Group B': list(range(51, 101))}

# Define the sample size per stratum
# We want to pick 5 random numbers from each group (A and B).
sample_size_per_stratum = 5

# Generate a stratified sample
# For each group (A and B), we randomly pick 5 numbers.
# This creates a smaller set that still represents each group.
stratified_sample = {
    stratum: random.sample(population[stratum], sample_size_per_stratum)  # Pick 5 random numbers from the group
    for stratum in population  # Go through each group one by one
}

# Print the result
# Show the selected numbers from both groups (A and B).
print("Stratified Sample:", stratified_sample)


Stratified Sample: {'Group A': [23, 28, 46, 16, 20], 'Group B': [58, 80, 70, 66, 92]}
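
The example above takes the same number of elements from each stratum. When strata differ in size, a common alternative is proportional allocation, where each stratum contributes in proportion to its share of the population. The sketch below assumes two hypothetical strata of 70 and 30 members; with other sizes, rounding may make the parts add up to slightly more or less than the requested total.

```python
import random

# Hypothetical strata of unequal size (70 and 30 members)
strata = {'Group A': list(range(1, 71)), 'Group B': list(range(71, 101))}
total_sample_size = 10
population_size = sum(len(members) for members in strata.values())

proportional_sample = {}
for name, members in strata.items():
    # Allocate sample slots in proportion to the stratum's size
    n = round(total_sample_size * len(members) / population_size)
    proportional_sample[name] = random.sample(members, n)

print("Proportional Stratified Sample:", proportional_sample)
```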


### **4. Cluster Sampling**

This method selects entire clusters (subgroups) randomly.

In [11]:
import random  # This imports a tool that lets us pick random items or groups.

# Define the population divided into clusters
# We divide the population into smaller groups called clusters.
# Each cluster contains numbers in a specific range:
clusters = [
    list(range(1, 21)),  # Cluster 1: Numbers from 1 to 20
    list(range(21, 41)),  # Cluster 2: Numbers from 21 to 40
    list(range(41, 61)),  # Cluster 3: Numbers from 41 to 60
    list(range(61, 81)),  # Cluster 4: Numbers from 61 to 80
    list(range(81, 101))  # Cluster 5: Numbers from 81 to 100
]

# Randomly select clusters
# We randomly pick 2 of the clusters to work with.
selected_clusters = random.sample(clusters, 2)  # Choose 2 random groups

# Combine selected clusters into a sample
# After picking the clusters, we combine all the numbers in those clusters into one list.
# This gives us the final sample to work with.
cluster_sample = [item for cluster in selected_clusters for item in cluster]

# Print the result
# Show all the numbers selected from the chosen clusters.
print("Cluster Sample:", cluster_sample)


Cluster Sample: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]



### **5. Convenience Sampling**
This method selects the most easily accessible elements.

In [21]:
# Define the population
# This creates a list of numbers from 1 to 100. Imagine this as a group of 100 people or items.
population = list(range(1, 101))  # Create a population of numbers from 1 to 100.

# Simulate a convenience sample
# Convenience sampling means selecting a group of people or items that are easiest to access.

# Take the first 10 numbers from the population (1 to 10)
convenience_sample1 = population[:10]  # Take the first 10 numbers (1 to 10).

# Take everything except the last 10 numbers (1 to 90)
convenience_sample2 = population[:-10]  # Take everything except the last 10 numbers (1 to 90).

# Take only the last 10 numbers (91 to 100)
convenience_sample3 = population[-10:]  # Take the last 10 numbers (91 to 100).

# Take everything except the first 10 numbers (11 to 100)
convenience_sample4 = population[10:]  # Take everything except the first 10 numbers (11 to 100).

# Print the results of the convenience samples
# Showing the output of each sample type, explaining what it represents.

print("Convenience Sample (First 10):", convenience_sample1)  # Prints the first 10 elements (1 to 10)
print("Convenience Sample (Excluding Last 10):", convenience_sample2)  # Prints everything except the last 10 elements (1 to 90)
print("Convenience Sample (Last 10):", convenience_sample3)  # Prints the last 10 elements (91 to 100)
print("Convenience Sample (Excluding First 10):", convenience_sample4)  # Prints everything except the first 10 elements (11 to 100)


Convenience Sample (First 10): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Convenience Sample (Excluding Last 10): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
Convenience Sample (Last 10): [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
Convenience Sample (Excluding First 10): [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]


### **6. Consecutive Sampling**
This method enrolls every available element that meets the inclusion criteria, in the order encountered, until the desired sample size is reached.

In [26]:
# Define the population
population = list(range(1, 101))  # Population of integers 1 to 100

# Define two sample sizes to compare
sample_size1 = 10
sample_size2 = 30

# Select consecutive elements
# Take elements in the order they appear until each sample size is reached.
consecutive_sample1 = population[:sample_size1]  # First 10 elements
consecutive_sample2 = population[:sample_size2]  # First 30 elements

# Print both consecutive samples
print("Consecutive Sample:", consecutive_sample1)

print("Consecutive Sample:", consecutive_sample2)


Consecutive Sample: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Consecutive Sample: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
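
The cell above takes the first elements without applying any criterion. A sketch that also filters by an inclusion criterion is shown below; the `is_eligible` rule (even IDs only) and the `arrivals` list are hypothetical stand-ins for a real eligibility check and a real arrival order.

```python
from itertools import islice

# Hypothetical arrival order of candidate participants
arrivals = list(range(1, 101))

def is_eligible(candidate):
    """Hypothetical inclusion criterion: accept even IDs only."""
    return candidate % 2 == 0

sample_size = 10

# Enroll eligible candidates in arrival order until the sample is full
eligible_in_order = (c for c in arrivals if is_eligible(c))
consecutive_sample = list(islice(eligible_in_order, sample_size))

print("Consecutive Sample (eligible only):", consecutive_sample)
```

With this criterion the sample is the first ten even numbers, 2 through 20.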


### **7. Quota Sampling**
This method selects elements non-randomly until a preset quota for each subgroup is filled.

In [29]:
import random  # Import the random module to allow random selection of items from the groups.

# Define the population grouped by categories
# Here, the population is divided into two groups: 'Male' and 'Female'.
# 'Male' group contains numbers from 1 to 50, and 'Female' group contains numbers from 51 to 100.
population = {'Male': list(range(1, 51)), 'Female': list(range(51, 101))}

# Define quotas
# We want to select 5 samples from each group ('Male' and 'Female').
quota = {'Male': 5, 'Female': 5}

# Generate a quota sample
# For each group ('Male' and 'Female'), we pick the number of samples defined by the quota (5 in each group).
# Note: real quota sampling fills each quota non-randomly (e.g., with the first people available);
# `random.sample()` is used here only as a simple stand-in for those picks.
quota_sample = {
    group: random.sample(population[group], quota[group])  # Pick 5 random numbers from each group
    for group in quota  # Go through each group ('Male' and 'Female')
}

# Print the result of the quota sample
# This will display the 5 random numbers selected from the 'Male' and 'Female' groups.
print("Quota Sample:", quota_sample)


Quota Sample: {'Male': [44, 40, 16, 41, 50], 'Female': [96, 58, 54, 67, 97]}
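
The cell above draws randomly within each group, which makes it behave like stratified sampling. A sketch of non-random quota filling is shown below: respondents are taken in the order they are encountered until each quota is full. The `respondents` stream, including its rule for assigning groups, is a hypothetical construction for illustration.

```python
# Hypothetical stream of respondents in the order they are encountered: (id, group)
respondents = [(i, 'Male' if i % 3 else 'Female') for i in range(1, 101)]

quota = {'Male': 5, 'Female': 5}
quota_sample = {group: [] for group in quota}

for person_id, group in respondents:
    # Accept the respondent only if their group's quota is not yet full
    if len(quota_sample[group]) < quota[group]:
        quota_sample[group].append(person_id)
    # Stop as soon as every quota has been filled
    if all(len(quota_sample[g]) == quota[g] for g in quota):
        break

print("Quota Sample (first available):", quota_sample)
```

Because no randomization is involved, the result depends entirely on who happens to be encountered first, which is the main source of bias in quota sampling.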


### **8. Judgment Sampling**
This method relies on the researcher’s judgment to select the sample.

In [32]:
# Define the population
# The population is a list of numbers from 1 to 100, representing a group of 100 items or people.
population = list(range(1, 101))  # Population of integers 1 to 100

# Use researcher judgment to select specific elements
# In judgment sampling, the researcher selects specific elements based on their own knowledge or criteria.
# Here, we manually choose the numbers 2, 4, 6, 8, and 10 as the sample.
judgment_sample = [2, 4, 6, 8, 10]  # Hand-picked by the researcher (here: the first five even numbers).

# Print the result of the judgment sample
# This will display the sample selected based on the researcher's judgment.
print("Judgment Sample:", judgment_sample)


Judgment Sample: [2, 4, 6, 8, 10]


### **9. Snowball Sampling**
This method involves participants recruiting others into the sample.

In [37]:
# Initial participants
# We start with one initial participant (represented by the number 1).
initial_sample = [1]

# Simulate snowball effect
# In snowball sampling, we begin with a small group and recruit new participants through referrals from the existing ones.
# The process continues in waves where each new participant can refer others.

snowball_sample = list(initial_sample)  # Copy the initial sample so the original list is not modified.
for _ in range(90):  # Run 90 recruitment waves; each wave adds one new participant (91 in total).
    # Generate the next participants by taking the last participant in the list and adding 1 to create a new participant.
    next_participants = [x + 1 for x in snowball_sample[-1:]]  # Add 1 to the last participant's number to create the new participant.

    # Add the new participants to the snowball sample.
    snowball_sample.extend(next_participants)

# Print the result of the snowball sample
# Display the participants recruited through the snowball effect after 90 waves.
print("Snowball Sample:", snowball_sample)


Snowball Sample: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91]
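
The loop above simply counts upward, so it does not show referrals. A sketch closer to the definition is given below, assuming a hypothetical `referrals` dictionary that maps each participant to the people they can recruit. Recruitment proceeds in waves, and nobody is recruited twice.

```python
# Hypothetical referral network: participant -> people they can refer
referrals = {1: [2, 3], 2: [4], 3: [4, 5], 4: [6], 5: [], 6: []}

def snowball_sample(seeds, referrals, max_waves):
    """Recruit participants wave by wave through referrals."""
    sample = list(seeds)
    recruited = set(seeds)
    current_wave = list(seeds)
    for _ in range(max_waves):
        next_wave = []
        for person in current_wave:
            for contact in referrals.get(person, []):
                if contact not in recruited:  # skip people already in the sample
                    recruited.add(contact)
                    next_wave.append(contact)
        if not next_wave:  # no new referrals, so recruitment stops early
            break
        sample.extend(next_wave)
        current_wave = next_wave
    return sample

result = snowball_sample([1], referrals, max_waves=2)
print("Snowball Sample (2 waves):", result)
```

Starting from participant 1, the first wave recruits 2 and 3, and the second wave recruits 4 and 5, giving `[1, 2, 3, 4, 5]`.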
