# Sampling Methods
## Bad ways to sample
**Convenience sample:** The researcher chooses a sample that is readily available in some non-random way.

**Example—** A researcher polls people as they walk by on the street.

Why it's probably biased: The location and time of day and other factors may produce a biased sample of people.

**Voluntary response sample:** The researcher puts out a request for members of a population to join the sample, and people decide whether or not to be in the sample.

**Example—** A TV show host asks his viewers to visit his website and respond to an online poll.

Why it's probably biased: People who take the time to respond tend to hold stronger opinions than the rest of the population.

## Good ways to sample
**Simple random sample:** Every member and set of members has an equal chance of being included in the sample. Technology, random number generators, or some other sort of chance process is needed to get a simple random sample.

**Example—** A teacher puts students' names in a hat and chooses without looking to get a sample of students.

Why it's good: Random samples are usually fairly representative since they don't favor certain members.
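The names-in-a-hat procedure can be sketched in Python with the standard library. This is a minimal illustration, not a prescribed method: the roster, the helper name `simple_random_sample`, and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every subset of size k is equally likely."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(list(population), k)

# Hypothetical class roster of 30 students.
students = [f"student_{i}" for i in range(1, 31)]
sample = simple_random_sample(students, 5, seed=42)
print(sample)
```

`random.sample` draws without replacement, which matches pulling names from a hat without putting them back.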

**Stratified random sample:** The population is first split into groups. The overall sample consists of some members from every group. The members from each group are chosen randomly.

**Example—** A student council surveys 100 students by getting random samples of 25 freshmen, 25 sophomores, 25 juniors, and 25 seniors.

Why it's good: A stratified sample guarantees that members from each group will be represented in the sample, so this sampling method is good when we want some members from every group.
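A stratified sample like the student council survey could be drawn roughly as follows. The group sizes (200 students per grade), the helper name `stratified_sample`, and the seed are assumptions for illustration only.

```python
import random

def stratified_sample(strata, per_group, seed=None):
    """Draw a random sample of per_group members from every group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group)
            for name, members in strata.items()}

# Hypothetical school: 200 students in each grade.
grades = ("freshman", "sophomore", "junior", "senior")
school = {g: [f"{g}_{i}" for i in range(200)] for g in grades}

survey = stratified_sample(school, 25, seed=0)
total = sum(len(members) for members in survey.values())
print(total)
```

Every group contributes members by construction, which is the guarantee a stratified sample is meant to provide.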

**Cluster random sample:** The population is first split into groups. The overall sample consists of every member from some of the groups. The groups are selected at random.

**Example—** An airline company wants to survey its customers one day, so they randomly select 5 flights that day and survey every passenger on those flights.

Why it's good: A cluster sample gets every member from some of the groups, so it's good when each group reflects the population as a whole.
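The airline example might look like this in code. The flight data (20 flights with 3 passengers each) and the helper name `cluster_sample` are made up for the sketch.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole groups at random, then keep every member of each chosen group."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical day of 20 flights with 3 passengers each.
flights = {f"flight_{i}": [f"f{i}_passenger_{j}" for j in range(3)]
           for i in range(20)}

chosen_flights, passengers = cluster_sample(flights, 5, seed=1)
print(chosen_flights, len(passengers))
```

Note the contrast with stratified sampling: here the randomness selects groups, not the individuals inside them.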

**Systematic random sample:** Members of the population are put in some order. A starting point is selected at random, and every $n^{\text{th}}$ member is selected to be in the sample.
 
**Example—** A principal takes an alphabetized list of student names and picks a random starting point. Every $20^{\text{th}}$ student is selected to take a survey.
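A systematic sample can be sketched with list slicing. This version assumes the random starting point is chosen among the first `step` positions, which is one common convention rather than the only one. The student list and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(ordered, step, seed=None):
    """Choose a random start within the first `step` items, then take every step-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return ordered[start::step]

# Hypothetical alphabetized list of 200 students.
names = sorted(f"student_{i:03d}" for i in range(200))
picked = systematic_sample(names, 20, seed=7)
print(picked)
```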
 

# Types of statistical tests
The decision of which statistical test to use depends on **the research design, the distribution of the data, and the type of variable.** In general, if the data is normally distributed, parametric tests should be used. If the data is non-normal, non-parametric tests should be used.

**Correlational:** these tests look for an association between variables.

**Pearson Correlation:** Tests for the strength of the association between two continuous variables.

**Spearman Correlation:** Tests for the strength of the association between two ordinal variables (does not rely on the assumption of normally distributed data).

**Chi-Square:** Tests for an association between two categorical variables.
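To make the difference between Pearson and Spearman concrete, the sketch below computes both coefficients by hand. It only produces the coefficients, not the accompanying significance tests, and the data are invented. Spearman is implemented here as Pearson applied to ranks, with tied values sharing their average rank.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: the relationship is monotone but not linear.
hours = [1, 2, 3, 4, 5]
scores = [1, 4, 9, 16, 25]
r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(round(r, 3), round(rho, 3))
```

Because Spearman looks only at ordering, it reports a perfect association for any strictly increasing relationship, while Pearson drops below 1 when the relationship is not a straight line.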

**Comparison of Means:** these tests look for the difference between the means of variables.

**Paired T-Test:** Tests for the difference between two variables from the same population (e.g., a pre- and posttest score).

**Independent T-Test:** Tests for the difference between the same variable from different populations (e.g., comparing boys to girls).

**ANOVA:** Tests for the difference between the means of two or more groups. When other sources of variance in the outcome variable are accounted for first (e.g., controlling for sex, income, or age), the analysis is usually called ANCOVA.
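As an illustration of the paired t-test, the sketch below computes the test statistic from pre- and posttest scores. The scores are invented, and the function name `paired_t_statistic` is an assumption. Converting the statistic to a p-value needs the t distribution, which the Python standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1

# Made-up pre- and posttest scores for five students.
pre = [70, 65, 80, 75, 60]
post = [75, 70, 82, 80, 66]
t, df = paired_t_statistic(pre, post)
print(round(t, 3), df)
```

The test works on the per-person differences, which is why both lists must come from the same individuals in the same order.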

**Regression:** these tests assess if change in one variable predicts change in another variable.

**Simple Regression:** Tests how change in the predictor variable predicts the level of change in the outcome variable.

**Multiple Regression:** Tests how changes in the combination of two or more predictor variables predict the level of change in the outcome variable.
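A simple regression with one predictor can be fitted with the ordinary least squares formulas. This is a minimal sketch on invented data. It estimates only the slope and intercept, not the significance tests usually reported alongside them.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Made-up data that lie exactly on the line y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_regression(x, y)
print(slope, intercept)
```

The slope is the predicted change in the outcome for a one-unit change in the predictor.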

**Non-Parametric:** these tests are used when the data does not meet the assumptions required for parametric tests.

**Wilcoxon Rank-Sum Test:** Tests for the difference between two independent samples; takes into account magnitude and direction of difference.

**Wilcoxon Signed-Rank Test:** Tests for the difference between two related (paired) samples; takes into account the magnitude and direction of difference.

**Sign Test:** Tests if two related variables are different; ignores the magnitude of change—only takes into account direction.
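The sign test is simple enough to compute exactly with the standard library. The sketch below assumes a two-sided test and drops tied pairs, a common convention. The scores and the function name `sign_test_p` are invented for the example.

```python
from math import comb

def sign_test_p(before, after):
    """Exact two-sided sign test p-value; pairs with no change are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Under the null hypothesis each sign is + or - with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Made-up paired scores: 7 of 8 students improved.
before = [70, 65, 80, 75, 60, 72, 68, 77]
after = [75, 70, 82, 80, 66, 71, 73, 79]
p = sign_test_p(before, after)
print(p)
```

Only the direction of each change enters the calculation, so a one-point drop counts as much as a six-point gain.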



# What Is P-Value?

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.

**Calculation:** P-values are calculated from the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic, with a greater difference between the two values corresponding to a lower p-value.
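As one concrete instance of this calculation, the sketch below computes a two-sided p-value for a z-test, where the statistic is assumed to follow a standard normal distribution under the null hypothesis. The numbers (reference value 100, known standard deviation 15, sample size 100) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_p(sample_mean, reference, sigma, n):
    """z statistic and two-sided p-value for a mean with known sigma."""
    z = (sample_mean - reference) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical study: reference mean 100, known sigma 15, 100 observations.
z_near, p_near = two_sided_z_p(103, 100, 15, 100)
z_far, p_far = two_sided_z_p(106, 100, 15, 100)
print(round(z_near, 2), round(p_near, 4))
print(round(z_far, 2), p_far)
```

The observed mean that lies farther from the reference value gives the smaller p-value, as the paragraph above describes.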

### Key takeaways
1. A p-value measures the probability that a difference at least as large as the one observed could have occurred by random chance alone if the null hypothesis were true.

2. The lower the p-value, the greater the statistical significance of the observed difference.

3. The p-value can be used as an alternative to, or in addition to, preselected confidence levels for hypothesis testing.



# Predictive analytics Example

Predictive analytics in healthcare helps medical institutions identify people who are at risk of developing chronic conditions and give them preventive care before the disease progresses. This type of analytics assigns scores to patients based on a variety of factors, including demographics, disabilities, age, and past patterns of care.

Diabetes Care published a study demonstrating that predictive analytics models for healthcare can estimate five- to ten-year life expectancy for older adults with diabetes, enabling doctors to craft customized treatment plans.

## Identifying public health trends with predictive analytics

Additionally, predictive analytics in the healthcare industry helps identify potential population health trends. The Lancet Public Health journal published a study that used predictive analytics to uncover health trends. This study found that unless alcohol consumption patterns change in the US, alcohol-related liver diseases will rise, causing deaths.

## Detecting disease outbreaks with predictive healthcare analytics

When speaking of outbreak predictions, one can’t help but ask, “Could predictive analytics have foreseen the COVID-19 pandemic?” The answer is yes. BlueDot, a Canadian company building predictive analytics and AI solutions, issued a warning about the rise of unfamiliar pneumonia cases in Wuhan on December 30, 2019. Only nine days later, the World Health Organization released an official statement announcing the emergence of the novel coronavirus.

To this day, predictive analytics in healthcare helps authorities and ordinary people keep track of the pandemic. For instance, a research team at the University of Texas Health Science Center at Houston (UTHealth) developed a predictive analytics-based tool for COVID-19 tracking. This program produces and maintains a public health dashboard that shows current and future trends of the virus.

Shreela Sharma, Ph.D., a member of the UTHealth research team, outlined the benefits of this predictive analytics tool: “The dashboard identifies the current hot spots, predicts future spread both at the state and county level, and houses relevant public health resources. It can effectively inform decision-makers across Texas to help mitigate the spread of COVID-19.”

## Probability Examples

### Grocery Store Staffing

Grocery stores often use probability to determine how many workers they should schedule to work on a given day.

For example, a grocery store may use a model that tells them there is a 75% chance that they’ll have more than 800 customers come into the store on a given day.

Based on this probability, they’ll schedule a certain number of workers to be at the store on that day to handle that many customers.
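A probability like the 75% figure above could come from a model of daily customer counts. The sketch below assumes, purely for illustration, that daily traffic is roughly normal with a mean of 850 and a standard deviation of 74, and that one worker can serve 50 customers per day. None of these numbers come from a real store.

```python
from math import ceil
from statistics import NormalDist

# Hypothetical model of daily customer counts (mean and spread are assumptions).
daily_customers = NormalDist(mu=850, sigma=74)

# Probability of more than 800 customers on a given day.
p_busy = 1 - daily_customers.cdf(800)

# Hypothetical staffing rule: one worker per 50 customers.
CUSTOMERS_PER_WORKER = 50
workers_needed = ceil(800 / CUSTOMERS_PER_WORKER)

print(f"P(more than 800 customers) = {p_busy:.2f}")
print(f"Workers to schedule for 800 customers: {workers_needed}")
```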

### Natural Disasters

The environmental departments of countries often use probability to determine how likely it is that a natural disaster like a hurricane, tornado, earthquake, etc. will strike the country in a given year.

If the probability is quite high, then the department will make decisions about housing, resource allocation, etc. that will minimize the damage caused by the natural disaster.

### Sales Forecasting

Many retail companies use probability to predict the chances that they’ll sell a certain number of goods in a given day, week, or month.

This allows the companies to predict how much inventory they’ll need. For example, a company might use a forecasting model that tells them the probability of selling at least 100 products on a certain day is 90%.

This means they’ll need to make sure they have at least 100 products on hand to sell (or preferably more) so they don’t run out.
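The same kind of model can be turned around to ask how much stock to hold. The sketch below assumes daily demand is roughly normal with a mean of 120 and a standard deviation of 15.6. These values are invented, chosen so that the chance of selling at least 100 products is about 90%. The 95% target used for the stock level is also an assumption.

```python
from math import ceil
from statistics import NormalDist

# Hypothetical forecast of daily demand (mean and spread are assumptions).
daily_demand = NormalDist(mu=120, sigma=15.6)

# Probability of selling at least 100 products in a day.
p_at_least_100 = 1 - daily_demand.cdf(100)

# Stock level that covers demand on about 95% of days under this model.
stock_for_95 = ceil(daily_demand.inv_cdf(0.95))

print(f"P(demand >= 100) = {p_at_least_100:.2f}")
print(f"Units to stock for 95% coverage: {stock_for_95}")
```

Holding only 100 units would leave the store short on most days under this model, which is why the text recommends stocking more than the threshold.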