### What is Statistics?

Statistics is the branch of mathematics concerned with collecting, organizing, analyzing, interpreting, and presenting data. It provides methods for quantifying uncertainty and for understanding the world around us through data.

Statistics is used in a variety of fields, including business, finance, healthcare, social sciences, natural sciences, and engineering. Some of the key concepts in statistics include probability, hypothesis testing, regression analysis, and sampling techniques.

Statisticians use various tools and techniques to analyze and interpret data, such as graphs, tables, summary statistics, and statistical models. They also use computer software and programming languages to automate and streamline data analysis processes.

Statistics plays an important role in decision-making processes, as it helps to identify patterns and trends in data, make predictions and forecasts, and draw conclusions based on empirical evidence.

Statistics is a vast field that can be broadly divided into two types:

- Descriptive Statistics:

Descriptive statistics involve methods of organizing, summarizing, and presenting data in a meaningful way. These methods include measures of central tendency (such as mean, median, and mode) and measures of dispersion (such as range, variance, and standard deviation). Descriptive statistics are used to describe the main features of a dataset, such as its distribution, shape, and variability.

- Inferential Statistics:

Inferential statistics involve methods of using sample data to make inferences or draw conclusions about a larger population. These methods include hypothesis testing, confidence intervals, and regression analysis. Inferential statistics are used to make predictions or generalizations about a population based on a sample of data.
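To make the distinction concrete, here is a minimal sketch using NumPy and SciPy (the libraries used later in these notes). The ten exam scores are hypothetical illustration data. The first part only describes the sample (mean, range, variance, standard deviation); the second part makes an inference about the wider population by computing a 95% confidence interval for the population mean, assuming the scores are a random sample from a roughly normal population.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 10 exam scores (made-up illustration data)
sample = np.array([72, 85, 90, 66, 78, 88, 95, 70, 81, 75])

# --- Descriptive statistics: summarize the sample itself ---
mean = sample.mean()
data_range = sample.max() - sample.min()
variance = sample.var(ddof=1)  # sample variance (divides by n - 1)
std_dev = sample.std(ddof=1)   # sample standard deviation

# --- Inferential statistics: estimate the population mean ---
n = len(sample)
sem = stats.sem(sample)  # standard error of the mean (ddof=1 by default)
# 95% confidence interval based on the t-distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

print(f"Mean: {mean:.1f}, range: {data_range}, variance: {variance:.2f}, std: {std_dev:.2f}")
print(f"95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```

The descriptive numbers are exact facts about these ten scores; the interval is an estimate whose validity rests on the sampling assumptions stated above.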

### What are population and sample in statistics, with examples?

In statistics, a population refers to the entire group of individuals, objects, or events that we are interested in studying. For example, if we want to study the heights of all adults in the United States, then the population would be all adult residents of the United States.

On the other hand, a sample is a smaller subset of the population that is selected for study. The purpose of selecting a sample is to make inferences about the population as a whole. For example, if we want to estimate the average height of all adults in the United States, we could select a random sample of adult residents and measure their heights. We could then use the data from the sample to make an inference about the average height of the entire population.

In summary, the population refers to the entire group of individuals, objects, or events that we want to study, while a sample is a smaller subset of the population that is selected for study.

Example: Suppose we want to study the average age of students in a university. The population in this case would be all the students in the university. However, it may not be practical or feasible to collect data from all students in the university. Therefore, we could select a sample of, say, 500 students, and collect data on their ages. We could then use the data from the sample to estimate the average age of all students in the university.
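The university example can be simulated with Python's standard library. This is a sketch under loudly stated assumptions: the population of 20,000 student ages is synthetic (random integers from 18 to 30), not real data, and the sample is a simple random sample drawn without replacement. In practice we only see the sample; the simulation lets us compare the sample mean (a statistic) with the population mean (a parameter).

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 20,000 students (synthetic, not real data)
population = [random.randint(18, 30) for _ in range(20_000)]

# Simple random sample of 500 students, drawn without replacement
sample = random.sample(population, k=500)

population_mean = statistics.mean(population)  # parameter: usually unknown in practice
sample_mean = statistics.mean(sample)          # statistic: computed from the sample

print(f"Population mean age: {population_mean:.2f}")
print(f"Sample mean age:     {sample_mean:.2f}")
```

With a random sample of this size, the sample mean typically lands close to the population mean, which is what justifies using it as an estimate.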

### What are the types of data in Statistics?

In statistics, there are two main types of data:

Quantitative data: This type of data consists of numerical values that can be measured or counted. It can be further divided into two sub-types:

a. Continuous data: This type of data can take on any numerical value within a given range. For example, the height of a person, the weight of an object, or the temperature of a room.

b. Discrete data: This type of data can only take on specific numerical values. For example, the number of children in a family, the number of cars in a parking lot, or the number of coins in a piggy bank.

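Qualitative data: This type of data consists of non-numerical categories or labels that describe a quality rather than a measured quantity (although the number of observations in each category can still be counted). It can be further divided into two sub-types: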

a. Nominal data: This type of data consists of categories that are not ordered. For example, the color of a car, the type of animal in a zoo, or the gender of a person.

b. Ordinal data: This type of data consists of categories that are ordered. For example, the education level of a person, the rating of a movie, or the size of a T-shirt.

It is important to identify the type of data in a study, as this can help determine which statistical methods and techniques are appropriate for analyzing the data.
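As an illustration of why the data type matters, the sketch below uses pandas categorical types to encode hypothetical nominal data (car colors) and ordinal data (T-shirt sizes). Frequency counts are valid for both, but order-based operations such as "largest" or "greater than" only make sense for ordinal data, and pandas enforces this by raising a `TypeError` for unordered categories.

```python
import pandas as pd

# Nominal data: categories with no natural order (hypothetical car colors)
colors = pd.Series(["red", "blue", "red", "green"], dtype="category")

# Ordinal data: categories with a meaningful order (hypothetical T-shirt sizes)
size_order = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
sizes = pd.Series(["M", "S", "XL", "L", "M"], dtype=size_order)

# Counting how often each category occurs is valid for both kinds
print(colors.value_counts())
print(sizes.value_counts())

# Order-based operations make sense only for ordinal data
print("Largest size:", sizes.max())
print("Sizes above M:", list(sizes[sizes > "M"]))

# Asking for the "largest" color is meaningless, and pandas refuses
try:
    colors.max()
except TypeError as err:
    print("Nominal data has no order:", err)
```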

### What are Interval & Ratio data in statistics?

Interval and ratio are two subtypes of quantitative data in statistics.

Interval data is quantitative data that has numerical values where the difference between any two values is meaningful. In other words, interval data has a scale that allows for the comparison of the sizes of differences between values. An example of interval data is temperature, where the difference between 20 and 30 degrees Celsius is the same as the difference between 30 and 40 degrees Celsius.

Ratio data, on the other hand, is a type of quantitative data that has numerical values where the ratio between any two values is meaningful. Ratio data has a scale that allows for the comparison of the sizes of differences and the ratios between values. An example of ratio data is weight, where 10 kg is twice as heavy as 5 kg.

The main difference between interval and ratio data is the presence or absence of a true zero point. Ratio data has a true zero point, which represents the complete absence of the attribute being measured. Interval data, on the other hand, does not have a true zero point, meaning that zero does not indicate the complete absence of the attribute being measured.
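A quick numerical check shows what the missing true zero means in practice. The sketch below (plain Python; the helper names `celsius_to_fahrenheit` and `celsius_to_kelvin` are hypothetical, written just for this example) converts three Celsius temperatures to other scales. Ratios of the raw values change with the scale, so "20 degrees Celsius is twice as hot as 10 degrees Celsius" is not meaningful, while ratios of differences stay the same, which is exactly the property interval data guarantees. Kelvin, whose zero is absolute zero, behaves like a ratio scale.

```python
def celsius_to_fahrenheit(c):
    """Convert Celsius to Fahrenheit: changes both the unit and the zero point."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert Celsius to kelvin: moves the zero point to absolute zero."""
    return c + 273.15


a, b, c = 10.0, 20.0, 30.0  # three temperatures in degrees Celsius

# Ratios of raw values depend on where each scale puts its zero
ratio_c = b / a                                                # 2.0
ratio_f = celsius_to_fahrenheit(b) / celsius_to_fahrenheit(a)  # 68 / 50 = 1.36
ratio_k = celsius_to_kelvin(b) / celsius_to_kelvin(a)          # about 1.035

# Ratios of differences survive the change of scale:
# the step from 20 to 30 is the same size as the step from 10 to 20
diff_ratio_c = (c - b) / (b - a)                               # 1.0
diff_ratio_f = (celsius_to_fahrenheit(c) - celsius_to_fahrenheit(b)) / (
    celsius_to_fahrenheit(b) - celsius_to_fahrenheit(a)
)                                                              # 1.0

print(ratio_c, ratio_f, round(ratio_k, 3))
print(diff_ratio_c, diff_ratio_f)
```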

In summary, interval and ratio data are two types of quantitative data in statistics. Interval data has equal, meaningful differences between values but no true zero point, while ratio data also has a true zero point, so the ratios between values are meaningful as well.

Some examples of interval and ratio data:

Interval data:

 - Temperature (in Celsius or Fahrenheit)
 - IQ score
 - Time of day (clock time)
 - pH level
 - Calendar dates (days, months, years)
 - Longitude and latitude coordinates

Ratio data:

 - Height
 - Weight
 - Length
 - Distance
 - Elapsed time or duration (in seconds, minutes, or hours)
 - Volume
 - Age
 - Income
 - Number of items or units
 - Blood pressure
 - Speed

It is important to note that the same quantity can be interval or ratio data depending on the measurement scale. For example, temperature in Celsius or Fahrenheit is interval data, because 0° is an arbitrary point on those scales, but temperature in kelvin is ratio data, because 0 K is absolute zero, the lowest possible temperature, so ratios of kelvin temperatures are meaningful.

### What are the Measures of central tendency in statistics with their formulas, examples & limitations?

Measures of central tendency are used to describe the center of a distribution or dataset. The three most common measures of central tendency are the mean, median, and mode.

 - Mean: The mean is the most commonly used measure of central tendency, and it is calculated by adding up all the values in a dataset and dividing by the total number of values. 
 
The formula for the mean of $n$ values $x_1, x_2, \dots, x_n$ is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\text{sum of all values}}{\text{number of values}}$$

Example: Consider the following dataset: 5, 8, 12, 16, 20. The mean is calculated as: (5 + 8 + 12 + 16 + 20) / 5 = 12.2.

 - Limitations: The mean is sensitive to extreme values or outliers, which can distort its value.

 - Median: The median is the middle value in a dataset when the values are arranged in order from lowest to highest. If there is an even number of values, then the median is the average of the two middle values. 
 
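The formula for the median, with the values sorted as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, is:

$$\text{median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even} \end{cases}$$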

Example: Consider the following dataset: 5, 8, 12, 16, 20. The median is 12.

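 - Limitations: The median is far less sensitive to extreme values or outliers than the mean, but it ignores the actual magnitudes of all values other than the middle one(s), so it uses less of the information in the data and may not reflect important changes in the tails of the distribution.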

 - Mode: The mode is the value that appears most frequently in a dataset. A dataset can have one mode, more than one mode, or no mode at all. 
 
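The formula for the mode is:

$$\text{mode} = \underset{x}{\arg\max}\; f(x)$$

where $f(x)$ is the frequency of the value $x$ (the number of times it appears); if several values share the highest frequency, each of them is a mode.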

Example: Consider the following dataset: 5, 8, 12, 16, 20, 20. The mode is 20.

 - Limitations: The mode may not be unique, or it may not exist if no value appears more than once in the dataset.

Limitations of measures of central tendency: It is important to note that while measures of central tendency provide useful information about the center of a distribution, they do not provide any information about the spread or variability of the data. Additionally, measures of central tendency can be influenced by outliers or skewed distributions, so it is important to consider other measures, such as the range or standard deviation, to gain a more complete understanding of the data.
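The limitations above can be checked with Python's built-in `statistics` module. In this sketch the value 200 is a hypothetical outlier substituted for the last value of the example dataset, and `narrow` and `wide` are made-up datasets chosen to share the same center. It shows the mean moving toward the outlier while the median stays put, the even-count median rule, and the fact that central tendency alone says nothing about spread.

```python
import statistics

data = [5, 8, 12, 16, 20]
with_outlier = [5, 8, 12, 16, 200]  # last value replaced by a hypothetical outlier

# The mean is pulled toward the outlier, while the median does not move
print("Means:  ", statistics.mean(data), statistics.mean(with_outlier))      # 12.2 48.2
print("Medians:", statistics.median(data), statistics.median(with_outlier))  # 12 12

# With an even number of values, the median averages the two middle values
print("Even-n median:", statistics.median([5, 8, 12, 16, 20, 20]))  # 14.0

# Same mean and median, very different spread (hypothetical datasets)
narrow = [11, 12, 13]
wide = [2, 12, 22]
print(statistics.mean(narrow), statistics.mean(wide))    # 12 12
print(statistics.stdev(narrow), statistics.stdev(wide))  # 1.0 10.0
```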

### How to calculate the mean, median, and mode using Python, with examples?

```python
import numpy as np
from scipy import stats

# create a dataset
data = [5, 8, 12, 16, 20]

# calculate the mean
mean = np.mean(data)

# print the mean
print("Mean:", mean)

# calculate the median
median = np.median(data)

# print the median
print("Median:", median)
```

Output:

```
Mean: 12.2
Median: 12.0
```


```python
from scipy import stats

# create a dataset
data = [5, 8, 12, 16, 20, 20]

# calculate the mode
mode = stats.mode(data, keepdims=True)

# print the mode
print("Mode:", mode.mode)

# stats.mode() returns a ModeResult object with two attributes: .mode (the modal
# value) and .count (how many times it occurs), so we access .mode to get the
# value itself. If several values tie for the highest count, only one of them
# is returned.
```

Output:

```
Mode: [20]
```
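Because `scipy.stats.mode` reports a single value, a dataset with several modes needs a different tool. A minimal sketch, assuming Python 3.8 or later (where `statistics.multimode` was added) and a hypothetical dataset in which 8 and 20 each appear twice:

```python
import statistics
from scipy import stats

# Hypothetical dataset in which 8 and 20 are tied as most frequent (twice each)
tied = [5, 8, 8, 12, 16, 20, 20]

# scipy.stats.mode reports a single value and its count, even when there is a tie
result = stats.mode(tied, keepdims=False)
print("scipy mode:", result.mode, "count:", result.count)

# statistics.multimode returns every value that has the highest count,
# in the order in which they first appear
print("All modes:", statistics.multimode(tied))  # [8, 20]
```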
