# What is Statistics?
- Statistics is a scientific method for collecting, organizing, summarizing, analyzing, and presenting sample data, and for drawing conclusions about the population from that sample.

## What is Population?
- All individuals or items under investigation are collectively known as the population.

## What is Sample?
- A small but representative part of the population that has been drawn or selected for study is called a sample.
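To make the distinction concrete, here is a minimal sketch using a made-up population of 1,000 numbered items (the values and the sample size of 50 are illustrative assumptions). A simple random sample is drawn without replacement with the standard library's `random.sample`, and the sample mean serves as an estimate of the population mean.

```python
import random
import statistics

# Hypothetical population: the values 1..1000 (illustrative only)
population = list(range(1, 1001))

# Fixed seed so the draw is reproducible
rng = random.Random(42)

# Simple random sample of 50 items, drawn without replacement
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print("Population size:", len(population))
print("Sample size:", len(sample))
print("Population mean:", population_mean)
print("Sample mean (estimate):", sample_mean)
```

A different seed gives a different sample and therefore a slightly different estimate; that variation is what inferential statistics tries to quantify.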

## Write down the types of Data.
- Source: https://intellspot.com/wp-content/uploads/2018/08/Types-of-Data-Infographic.png

<img src="https://intellspot.com/wp-content/uploads/2018/08/Types-of-Data-Infographic.png" alt="Types of data infographic" />
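As a hedged sketch, assuming the common classification of data into qualitative (nominal, ordinal) and quantitative (discrete, continuous), the snippet below shows how each kind might be represented in pandas. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical records illustrating four common kinds of data
df_types = pd.DataFrame({
    # Qualitative, nominal: categories with no natural order
    "Country": pd.Categorical(["India", "Germany", "India", "Brazil"]),
    # Qualitative, ordinal: categories with a meaningful order
    "EdLevel": pd.Categorical(
        ["Bachelor", "Master", "PhD", "Bachelor"],
        categories=["Bachelor", "Master", "PhD"],
        ordered=True,
    ),
    # Quantitative, discrete: countable whole numbers
    "YearsCode": [3, 10, 7, 1],
    # Quantitative, continuous: measurements on a continuous scale
    "HeightCm": [172.4, 180.1, 165.0, 158.7],
})

print(df_types.dtypes)

# Ordered categories support order comparisons
print(df_types["EdLevel"] > "Bachelor")
```

Marking `EdLevel` as ordered is what allows the `>` comparison; the same comparison on the unordered `Country` column raises a `TypeError` in pandas.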

## What is Mean?
- The mean, also known as the average, is a measure of central tendency in statistics. It is calculated by summing up all the values in a dataset and then dividing the sum by the total number of values.

- Mean = $ \frac{\text{Sum of all values}}{\text{Number of values}} = \frac{\sum x_i}{n} $

- CSV Data Link: https://github.com/User-zwj/Pandas-Practice/blob/master/survey_results_public.csv

```python
import pandas as pd
import numpy as np

pd.set_option("display.max_columns", None)
df = pd.read_csv("https://raw.githubusercontent.com/User-zwj/Pandas-Practice/master/survey_results_public.csv")

# Annual compensation column from the survey
salary = df['ConvertedComp']
mean = salary.mean()
print("Mean:", round(mean, 2))
```

```
Mean: 94466.81
```
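The formula can be checked by hand on a small made-up list. The sketch below computes $\sum x_i / n$ directly, compares it with `statistics.mean` from the standard library, and shows that pandas' `Series.mean()` skips missing values (`NaN`) by default, so for the survey column $n$ counts only the non-missing salaries.

```python
import statistics

import pandas as pd

# Small illustrative dataset (made-up values)
values = [12, 15, 11, 18, 14]

# Mean = (sum of all values) / (number of values)
total = sum(values)   # 70
n = len(values)       # 5
manual_mean = total / n
stdlib_mean = statistics.mean(values)

# pandas skips NaN by default, so only the two non-missing values count
nan_mean = pd.Series([1.0, 2.0, None]).mean()

print("Manual mean:", manual_mean)        # 14.0
print("statistics.mean:", stdlib_mean)
print("pandas mean with NaN:", nan_mean)  # 1.5
```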


## What is Median?
- The median is the middle value of a dataset after it has been sorted in ascending (or descending) order; with an even number of values, it is the average of the two middle values. It is a measure of central tendency that separates the higher half of the data from the lower half.
- The median is particularly useful because it is robust to extreme values (outliers), which can pull the mean far away from the typical value.

```python
median = salary.median()
print("Median:", median)
```

```
Median: 59030.0
```
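As a sketch of the definition, including the even-count case, the hypothetical helper `median_of` below computes the median from scratch on made-up numbers and contrasts it with the mean when a single extreme value is added.

```python
def median_of(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [1000]

print(median_of(incomes))                     # 40
print(median_of(with_outlier))                # 42.5
print(sum(incomes) / len(incomes))            # 40.0
print(sum(with_outlier) / len(with_outlier))  # 200.0
```

One outlier moves the mean from 40 to 200 but the median only from 40 to 42.5. A gap like the one between the survey's mean and median salary is typical of right-skewed data.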


## What is Mode?
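- The mode is the value that appears most frequently in a dataset. A dataset can have more than one mode, so pandas' `Series.mode()` returns a Series of all of them in sorted order; the code below takes the first one with `[0]`.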

```python
mode = df['Age'].mode()[0]
print("Mode:", mode)
```

```
Mode: 22.0
```
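To handle datasets with several modes explicitly, a minimal sketch using `collections.Counter` returns every most-frequent value; the helper name `modes` and the data are hypothetical. The standard library's `statistics.multimode` (Python 3.8+) offers similar behaviour.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([22, 25, 22, 30, 25, 22]))  # [22]
print(modes([1, 2, 2, 3, 3]))           # [2, 3] (bimodal)
```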


## What is Weighted Mean?
- The weighted mean is a type of mean in which each value is multiplied by its weight (its relative importance), the products are summed, and that sum is divided by the sum of the weights.
- $ \text{Weighted Mean} = \frac{\sum_{i=1}^{n} (x_i \times w_i)}{\sum_{i=1}^{n} w_i} $
- Weighted Mean CGPA = $ \frac{(3.5 \times 3) + (3.8 \times 4) + (4.0 \times 3) + (3.2 \times 3) + (3.9 \times 4)}{3 + 4 + 3 + 3 + 4} = \frac{62.9}{17} = 3.7 $

```python
# Each course's grade point (x_i) and its credit hours (w_i)
grade_data = {
    'CGPA': [3.5, 3.8, 4.0, 3.2, 3.9],  # x_i values
    'Credit': [3, 4, 3, 3, 4],          # w_i values (weights)
}

cgpa_data = pd.DataFrame(grade_data)
# Sum of (value * weight) divided by the sum of the weights
weighted_mean = (cgpa_data['CGPA'] * cgpa_data['Credit']).sum() / cgpa_data['Credit'].sum()
print("Weighted Mean (CGPA):", weighted_mean)
```

```
Weighted Mean (CGPA): 3.7
```
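NumPy can compute the same quantity directly: `numpy.average` accepts a `weights` argument and divides $\sum x_i w_i$ by $\sum w_i$. The sketch below reuses the CGPA and credit values from above, checks them against a term-by-term version, and contrasts the result with the unweighted mean.

```python
import numpy as np

cgpa = [3.5, 3.8, 4.0, 3.2, 3.9]   # x_i: grade points
credit_hours = [3, 4, 3, 3, 4]     # w_i: credit hours used as weights

# numpy.average divides sum(x_i * w_i) by sum(w_i) when weights are given
weighted = np.average(cgpa, weights=credit_hours)

# The same computation written out term by term
manual = sum(x * w for x, w in zip(cgpa, credit_hours)) / sum(credit_hours)

print(round(weighted, 2))        # 3.7
print(round(manual, 2))          # 3.7

# An unweighted mean ignores credit hours and gives a different answer
print(round(np.mean(cgpa), 2))   # 3.68
```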


## What is Trimmed Mean?
- The trimmed mean, also known as the truncated mean, is a statistical measure of central tendency that involves calculating the mean after discarding a certain percentage of the highest and lowest values in a dataset.

```python
# Define the percentage of data to trim from each end
trim_percentage = 20  # Trim 20% from each end

# Drop missing salaries first: sort_values() places NaN at the end,
# so leaving them in would trim NaNs instead of the highest salaries
sorted_salary = salary.dropna().sort_values()

# Calculate the number of data points to trim from each end
n = len(sorted_salary)
trim_count = int(n * (trim_percentage / 100))
print("Trim Count is:", trim_count)

# Trim the data by removing the specified number of points from each end
# (slicing to n - trim_count also works when trim_count is 0, unlike -0)
trimmed_salary_list = sorted_salary.iloc[trim_count:n - trim_count]

# Calculate the trimmed salary mean
trimmed_salary_mean = trimmed_salary_list.mean()
print("Trimmed Salary Mean:", round(trimmed_salary_mean, 2))
```
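The same procedure can be packaged as a small reusable function. This is a sketch; the name `trimmed_mean` and the example data are hypothetical. It removes `int(proportion * n)` values from each end, which is the same rounding-down rule used by SciPy's `scipy.stats.trim_mean(a, proportiontocut)`.

```python
def trimmed_mean(values, proportion):
    """Mean of a non-empty sequence after dropping int(proportion * n) values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * proportion)   # number of values cut from each end
    kept = ordered[k:n - k]   # never empty because k < n / 2
    return sum(kept) / len(kept)


data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 100]
print(trimmed_mean(data, 0.1))  # 7.5  (drops 2 and 100)
print(trimmed_mean(data, 0.0))  # 16.2 (ordinary mean)
```

For the survey column, `trimmed_mean(salary.dropna(), 0.2)` would apply the same 20% trim to the non-missing salaries.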
