<a href="https://colab.research.google.com/github/Siddhu290/Machine_Learning/blob/main/2024-07-04/Sampling_Population.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Sampling Techniques**
---
## Introduction to Sampling
The introduction emphasizes the critical nature of selecting a sample of individuals for research. The way participants are chosen determines the population to which the research findings can be generalized. Random sampling is highlighted as a method to avoid bias in treatment groups, making the groups comparable, on average, on both known and unknown factors.

## Distinguishing Between a Sample and a Population
This section defines key terms:
- **Population**: All members meeting a set of specifications.
- **Element**: A single member of a population.
- **Sample**: Some elements selected from a population.
- **Census**: All elements included from a population.

Examples are provided to illustrate these concepts.

## Simple Random Sampling
1. **Defining the Population**: Identify the population of interest.
2. **Constructing a List**: Create a list of all members of the population.
3. **Drawing the Sample**: Randomly select individuals from the list.
4. **Contacting Members of the Sample**: Reach out to the selected individuals for participation.
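
Steps 1 to 3 above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the sampling frame `frame` (member IDs 0 to 99) and the helper name `simple_random_sample` are hypothetical, and the seed is there only to make the demo repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members from the list, each equally likely to be chosen."""
    rng = random.Random(seed)          # seeded only so the demo is repeatable
    return rng.sample(list(population), n)

# Hypothetical sampling frame: a list of 100 member IDs
frame = list(range(100))
sample = simple_random_sample(frame, n=10, seed=42)
print(sample)
```

`random.sample` draws without replacement, so no member can appear twice in the sample.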

## Stratified Random Sampling
This method involves dividing the population into subgroups (strata) and randomly selecting individuals from each stratum. This technique increases the representativeness of the sample and allows for comparisons among subgroups.
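
A rough sketch of the idea: group the units by stratum, then draw a simple random sample inside each group. The data (`units`), the stratum labels, and the per-stratum sizes below are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Randomly sample n_per_stratum[name] units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                       # split the population into strata
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical population: (id, residence) pairs, 10 urban and 20 rural
units = [(i, "urban" if i % 3 == 0 else "rural") for i in range(30)]
picked = stratified_sample(units, lambda u: u[1], {"urban": 3, "rural": 6}, seed=1)
print(picked)
```

Because every stratum contributes its own random draw, each subgroup is guaranteed to appear in the sample.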

## Convenience Sampling
Convenience sampling involves selecting individuals who are readily available. It is quick and inexpensive but generally less representative than random sampling, which can lead to biased results.

## Quota Sampling
Quota sampling involves selecting a fixed percentage (quota) of individuals from specified subgroups of the population, without random selection within each subgroup. This method is often used by polling organizations when the population is large and lists of members are not available.

## Sample Size
Determining the appropriate sample size depends on various factors, including population variability, statistical considerations, economic factors, and the availability of participants.

## Sampling Error
Sampling error is the degree of error in a sample's representation of a population. Larger samples are more likely to accurately represent the population, reducing sampling error.
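
The effect of sample size can be checked with a small simulation. The sketch below assumes a synthetic population (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10) and measures sampling error as the average absolute gap between the sample mean and the population mean; both choices are illustrative assumptions, not part of the original notes.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=200, seed=0):
    """Average |sample mean - population mean| over repeated random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu)
              for _ in range(trials)]
    return statistics.fmean(errors)

# Synthetic population (an assumption made for this demo)
gen = random.Random(1)
population = [gen.gauss(50, 10) for _ in range(10_000)]

small = mean_abs_sampling_error(population, n=10)
large = mean_abs_sampling_error(population, n=1000)
print(f"n=10: {small:.3f}   n=1000: {large:.3f}")
```

The error for `n=1000` should come out clearly smaller than for `n=10`.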

## Evaluating Information from Samples
Evaluating the specific method by which a sample was drawn is crucial. Information from self-selected samples, which often represent a narrow subgroup, can be misleading.

## Case Analysis
A case study is presented where a psychologist interviews Columbine students to understand the factors behind the 1999 school shooting. The case highlights the limitations of using non-representative samples to draw conclusions.

## General Summary
The summary reiterates that research often involves drawing conclusions about a population based on a sample. The sample must be representative, and probability sampling is the best method for achieving this. Non-probability sampling, such as convenience sampling, is less reliable.

## Detailed Summary
- **Sampling**: Selecting some elements from a population for a study.
- **Population**: All individuals with a specific characteristic of interest.
- **Probability Sampling**: Each element has a known probability of being included.
- **Nonprobability Sampling**: No way of estimating the inclusion probability.
- **Simple Random Sampling**: Equal chance of selection for each individual.
- **Stratified Random Sampling**: Random selection from specified subgroups.
- **Convenience Sampling**: Quick and inexpensive but less representative.
- **Quota Sampling**: Selection of percentages from specified subgroups.


# Steps to Perform Simple Random Sampling

i) **Determine the population size (N).**

ii) **Determine the sample size (n).**

iii) **Number each member of the population under investigation in serial order.**
- For example, if there are 100 members, number them from 00 to 99.

iv) **Determine the starting point for selecting the sample by randomly picking a page from the random number tables and dropping your finger on the page blindly.**

v) **Choose the direction in which you want to read the numbers (from left to right, right to left, down, or up).**

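vi) **Select the first ‘n’ numbers whose X digits fall between 0 and N − 1 (the range of labels assigned in step iii).**
- X is the number of digits needed to write the largest label, N − 1.
- If N = 100 (labels 00 to 99), then X would be 2.
- If N = 1000 (labels 000 to 999), then X would be 3, and so on.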

vii) **Once a number is chosen, do not use it again.**

viii) **If you reach the endpoint of the table before obtaining ‘n’ numbers:**
- Pick another starting point and read in a different direction.
- Then read the first X digits of each table entry instead of the last X digits, and continue until the desired sample is selected.
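
The table-reading procedure can be mimicked in code. The sketch below makes several simplifying assumptions: the "table" is a plain string of digits (the one shown is just an example string), it is read left to right in consecutive X-digit groups, and members are labelled 0 to N − 1 so that X is the number of digits in N − 1. The function name `sample_from_digit_table` is hypothetical.

```python
def sample_from_digit_table(digits, N, n):
    """Pick n distinct labels in 0..N-1 by reading X-digit groups from a digit string."""
    X = len(str(N - 1))                  # digits needed for the largest label
    chosen = []
    for start in range(0, len(digits) - X + 1, X):
        number = int(digits[start:start + X])
        # keep the number only if it is a valid label and has not been used yet
        if number < N and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            return chosen
    raise ValueError("table exhausted before n numbers were found")

table = "0347437386369647"               # example digit string standing in for a table
print(sample_from_digit_table(table, N=100, n=5))
print(sample_from_digit_table(table, N=50, n=4))
```

With N = 50, the groups 73, 86 and 96 are skipped because they are not valid labels, and the second 47 is skipped because it was already chosen.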


# Stratified Random Sampling

## Overview
Stratified random sampling is used when a population is heterogeneous, composed of distinct subgroups (strata) such as male/female, rural/urban, literate/illiterate, etc. This method divides the population into homogeneous groups and randomly samples from each stratum, ensuring each subgroup is represented.

## Proportional Stratified Sample
In proportional stratified sampling, the sample size from each stratum is proportional to the stratum's size in the population.

### Example
For a population with 65% male, 35% female, 30% urban, and 70% rural, and assuming sex and residence are independent so that the shares multiply:
- Urban Male: \(0.30 \times 0.65 = 0.195\)
- Urban Female: \(0.30 \times 0.35 = 0.105\)
- Rural Male: \(0.70 \times 0.65 = 0.455\)
- Rural Female: \(0.70 \times 0.35 = 0.245\)

If a sample size of 1000 is needed:
- Urban Male: \(0.195 \times 1000 = 195\)
- Urban Female: \(0.105 \times 1000 = 105\)
- Rural Male: \(0.455 \times 1000 = 455\)
- Rural Female: \(0.245 \times 1000 = 245\)
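
The arithmetic above can be reproduced as follows. This is a small sketch; the helper `proportional_allocation` is a made-up name, and it assumes the residence and sex shares can simply be multiplied to get each stratum's share.

```python
def proportional_allocation(shares, total):
    """Allocate `total` sample units in proportion to each stratum's population share."""
    # Rounding to whole units; in general the rounded counts may need a small
    # adjustment so that they add up exactly to `total`.
    return {stratum: round(share * total) for stratum, share in shares.items()}

urban, rural = 0.30, 0.70
male, female = 0.65, 0.35
shares = {
    "Urban Male": urban * male,
    "Urban Female": urban * female,
    "Rural Male": rural * male,
    "Rural Female": rural * female,
}
allocation = proportional_allocation(shares, 1000)
print(allocation)
```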

## Disproportional Stratified Sample
In disproportional stratified sampling, the sample size from each stratum is based on analytical considerations like variance, stratum population, and resource constraints.

### Example
For a population with variances:
- Urban Male: 3.0
- Urban Female: 5.5
- Rural Male: 2.5
- Rural Female: 1.75

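Using these variances, together with the stratum shares from the proportional example, to allocate a sample size of 1000 (the rounded figures below total 999):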
- Urban Male: 207
- Urban Female: 151
- Rural Male: 442
- Rural Female: 199
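
The notes do not say which allocation rule produced these numbers. One rule that reproduces them is Neyman allocation, where each stratum's sample size is proportional to its population share times its standard deviation, \(n_h \propto W_h \sigma_h\). Treat the sketch below as an assumption about the method; the shares are taken from the proportional example and the function name is hypothetical.

```python
import math

def neyman_allocation(shares, variances, total):
    """Allocate `total` units with n_h proportional to share_h * sqrt(variance_h)."""
    products = {h: shares[h] * math.sqrt(variances[h]) for h in shares}
    denom = sum(products.values())
    return {h: total * p / denom for h, p in products.items()}   # unrounded sizes

shares = {"Urban Male": 0.195, "Urban Female": 0.105,
          "Rural Male": 0.455, "Rural Female": 0.245}
variances = {"Urban Male": 3.0, "Urban Female": 5.5,
             "Rural Male": 2.5, "Rural Female": 1.75}

exact = neyman_allocation(shares, variances, 1000)
rounded_down = {h: math.floor(v) for h, v in exact.items()}
print(rounded_down)
```

Rounding each value down gives 207, 151, 442 and 199, which is why the listed figures sum to 999 rather than 1000.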

## Advantages
- More representative of the population.
- More precise and reduces bias.
- Saves time and resources.

## Limitations
- Requires detailed knowledge of the population's attributes.
- A stratified list can be difficult to prepare and may not always be available.
---

# **Descriptive Statistics**

## 1. Mean (Arithmetic Average)
- **Definition**: The sum of all values divided by the number of values.
- **Formula**:
  \[
  \bar{x} = \frac{\sum_{i=1}^n x_i}{n}
  \]
  Where \(\bar{x}\) is the sample mean, \(x_i\) are the individual values, and \(n\) is the number of values.
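
As a quick check of the formula, here is a direct translation next to the standard library's `statistics.mean`. The data values are made up.

```python
import statistics

def sample_mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

values = [4, 8, 15, 16, 23, 42]          # made-up data
print(sample_mean(values))               # 108 / 6
print(statistics.mean(values))           # library equivalent
```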

## 2. Median
- **Definition**: The middle value in a data set when ordered from least to greatest. If the number of observations is even, it is the average of the two middle numbers.
- **Calculation**: For a dataset sorted in ascending order, \(x_1 \le x_2 \le \dots \le x_n\):
  - If \( n \) is odd: \( \text{Median} = x_{\frac{n+1}{2}} \)
  - If \( n \) is even: \( \text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2} + 1}}{2} \)
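
The two cases translate directly into code once the data are sorted. Note that Python lists are 0-indexed, so the 1-based position \((n+1)/2\) becomes index `n // 2`. The function below is an illustrative sketch; `statistics.median` does the same job.

```python
import statistics

def median(xs):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd n: single middle value
    return (s[mid - 1] + s[mid]) / 2     # even n: average the two middle values

print(median([7, 1, 5]))                 # odd number of values
print(median([7, 1, 5, 3]))              # even number of values
```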

## 3. Mode
- **Definition**: The value that appears most frequently in a data set.
- **Note**: A dataset may have one mode, more than one mode, or no mode at all.
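
The three cases in the note can be illustrated with a frequency count. The sketch assumes one common convention, namely that a dataset has no mode when every value occurs exactly once; other texts treat that case differently.

```python
from collections import Counter

def modes(xs):
    """Return all most-frequent values, or [] if no value repeats."""
    if not xs:
        return []
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:                         # assumed convention: nothing repeats -> no mode
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3]))               # one mode
print(modes([1, 1, 2, 2, 3]))            # two modes
print(modes([1, 2, 3]))                  # no mode
```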

## 4. Range
- **Definition**: The difference between the highest and lowest values.
- **Formula**:
  \[
  R = \text{Max} - \text{Min}
  \]

## 5. Standard Deviation
- **Definition**: A measure of the amount of variation or dispersion of a set of values.
- **Formula**:
  \[
  s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}
  \]
  Where \( s \) is the sample standard deviation, \( x_i \) are the values, and \( \bar{x} \) is the sample mean.

## 6. Variance
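- **Definition**: The average of the squared differences from the mean. For a sample, the sum of squared differences is divided by \(n-1\) rather than \(n\), as in the formula below.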
- **Formula**:
  \[
  s^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}
  \]
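
The sample variance and standard deviation formulas can be written out directly and compared with `statistics.variance` and `statistics.stdev`, which also use the \(n-1\) divisor. The data values are made up.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

values = [2, 4, 4, 4, 5, 5, 7, 9]        # made-up data; mean is 5, squared deviations sum to 32
print(sample_variance(values))           # 32 / 7
print(sample_std(values))
```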

## 7. Skewness
- **Definition**: A measure of the asymmetry of the probability distribution of a real-valued random variable.
- **Interpretation**:
  - Skewness > 0: Right skew (positive skew)
  - Skewness < 0: Left skew (negative skew)
  - Skewness = 0: Symmetrical distribution
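
The notes give no formula for skewness. One common choice, assumed here, is the moment-based coefficient \(m_3 / m_2^{3/2}\), where \(m_k\) is the average of \((x_i - \bar{x})^k\). Libraries may apply different bias corrections, so their values can differ slightly.

```python
def skewness(xs):
    """Moment-based skewness m3 / m2**1.5 (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]          # long tail to the right
left_tailed = [-x for x in right_tailed] # mirror image: long tail to the left

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```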

## 8. Kurtosis
- **Definition**: A measure of the "tailedness" of the probability distribution.
- **Interpretation**:
  - High kurtosis: Heavy tails (more extreme values); often, but not necessarily, a sharp peak
  - Low kurtosis: Light tails (fewer extreme values); often, but not necessarily, a flat peak
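
As with skewness, no formula is given here. A common moment-based definition, assumed for this sketch, is \(m_4 / m_2^2\); subtracting 3 gives "excess" kurtosis, which is 0 for a normal distribution. The two small datasets are made up to show a light-tailed and a heavy-tailed case.

```python
import math

def kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2 (no bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def excess_kurtosis(xs):
    """Kurtosis relative to the normal distribution's value of 3."""
    return kurtosis(xs) - 3

light = [1, 2, 3, 4, 5]                          # evenly spread, no extreme values
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]        # mostly central, two extreme values

print(excess_kurtosis(light), excess_kurtosis(heavy))
```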

## Graphical Representation
- **Qualitative Data**: Pie charts, bar graphs
- **Quantitative Data**: Histograms, bar graphs

## Data Types
- **Qualitative Data**: Non-numerical data (e.g., education, marital status)
- **Quantitative Data**: Numerical data, which can be discrete or continuous

## Frequency Distribution
- **Definition**: A summary of how often each value or range of values occurs in a dataset.
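
Both kinds of summary, counts per value and counts per range, can be built with `collections.Counter`. The helper names, data, and bin width below are illustrative; the binned version assumes equal-width bins of the form [lower, lower + width).

```python
from collections import Counter

def frequency_table(values):
    """Count how often each distinct value occurs, sorted by value."""
    return dict(sorted(Counter(values).items()))

def binned_frequencies(values, width):
    """Count values falling in equal-width bins [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    return {(lower, lower + width): c for lower, c in sorted(counts.items())}

print(frequency_table([3, 1, 2, 3, 3, 1]))
print(binned_frequencies([1, 4, 5, 9, 10, 14], width=5))
```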

## Summary
Descriptive statistics are used to describe the basic features of data in a study. They provide simple summaries about the sample and the measures. These summaries can be either quantitative (e.g., measures of central tendency and variability) or visual (e.g., graphs and charts).
