# **2. Estimates of Location**

# **2.1. Definition of Location**
A fundamental task in many statistical analyses is to estimate a location parameter for the distribution; i.e., to find a typical or central value that best describes the data.


### 1. **Mean**

The mean is the sum of the data points divided by the number of data points. That is,

$$\bar{Y} = \sum_{i=1}^{N} Y_i / N$$

The mean is the value most commonly referred to as the average. We will use the term average as a synonym for the mean, and the term typical value to refer generically to measures of location.
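As a minimal sketch of the formula above, the mean can be computed directly from its definition and compared with NumPy's `np.mean`. The sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample, chosen only for illustration
y = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

# Mean from the definition: sum of the points divided by their count
mean_by_definition = y.sum() / len(y)

# The same quantity using NumPy's built-in function
mean_numpy = np.mean(y)

print(mean_by_definition, mean_numpy)
```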

### 2. **Median** 

The median is the value of the point which has half the data smaller than that point and half the data larger than that point. That is, if $Y_1, Y_2, \ldots, Y_N$ is a random sample sorted from smallest value to largest value, then the median is defined as:

$$\tilde{Y} = Y_{(N+1)/2} \quad \text{if } N \text{ is odd}$$

$$\tilde{Y} = \left(Y_{N/2} + Y_{(N/2)+1}\right)/2 \quad \text{if } N \text{ is even}$$
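The two cases can be sketched in a few lines of Python. The helper name `median_by_definition` and the sample values are assumptions made for this illustration; the only subtlety is that the definition uses 1-based positions in the sorted sample while Python indexes from 0.

```python
import numpy as np

def median_by_definition(values):
    """Median from the sorted-sample definition (hypothetical helper)."""
    y = np.sort(np.asarray(values, dtype=float))
    n = len(y)
    if n % 2 == 1:
        # N odd: the middle point Y_{(N+1)/2}, i.e. 0-based index (n - 1) // 2
        return y[(n - 1) // 2]
    # N even: average of Y_{N/2} and Y_{(N/2)+1}, i.e. 0-based indices n//2 - 1 and n//2
    return (y[n // 2 - 1] + y[n // 2]) / 2

odd_sample = [7, 1, 3, 9, 5]        # sorted: 1 3 5 7 9
even_sample = [7, 1, 3, 9, 5, 11]   # sorted: 1 3 5 7 9 11

print(median_by_definition(odd_sample))    # 5.0
print(median_by_definition(even_sample))   # 6.0
```

Both results agree with `np.median`, which applies the same rule.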

### 3. **Mode** 

The mode is the value of the random sample that occurs with the greatest frequency. It is not necessarily unique. The mode is typically used in a qualitative fashion. For example, there may be a single dominant hump in the data, or perhaps two or more smaller humps. This is usually evident from a histogram of the data.
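Because the mode need not be unique, a small sketch that returns every most-frequent value may help. The helper name `modes` and the inputs are hypothetical; this approach suits discrete or rounded data, whereas for continuous data a histogram is the more usual tool.

```python
import numpy as np

def modes(values):
    """Return every value that attains the highest frequency (hypothetical helper)."""
    unique_values, counts = np.unique(values, return_counts=True)
    return unique_values[counts == counts.max()]

print(modes([1, 2, 2, 3, 3, 3, 4]))   # [3]
print(modes([1, 1, 2, 2, 3]))         # [1 2]
```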

# **2.2. Why Different Measures?**

### 1. **Normal Distribution**

The first histogram is a sample from a normal distribution. The mean is 0.005, the median is -0.010, and the mode is -0.144 (the mode is computed as the midpoint of the histogram interval with the highest peak).

### 2. **Exponential Distribution**

The second histogram is a sample from an exponential distribution. The mean is 1.001, the median is 0.684, and the mode is 0.254 (the mode is computed as the midpoint of the histogram interval with the highest peak).

### 3. **Cauchy Distribution** 

The third histogram is a sample from a Cauchy distribution. The mean is 3.70, the median is -0.016, and the mode is -0.362 (the mode is computed as the midpoint of the histogram interval with the highest peak).

### 4. **Lognormal Distribution**	

The fourth histogram is a sample from a lognormal distribution. The mean is 1.677, the median is 0.989, and the mode is 0.680 (the mode is computed as the midpoint of the histogram interval with the highest peak).
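A comparison of this kind can be reproduced approximately with simulated data. The sketch below is not the original computation: the seed, the sample size, the distribution parameters (standard normal, exponential with scale 1, standard Cauchy, lognormal with `mean=0`, `sigma=1`), the bin count, and the choice to build the histogram over the 1st to 99th percentile range are all assumptions, so the printed values will differ from those quoted above.

```python
import numpy as np

def histogram_mode(x, bins=30):
    """Midpoint of the histogram bin with the highest count.

    Assumption: the histogram spans the 1st to 99th percentile so that a few
    extreme points (e.g. in the Cauchy sample) do not stretch the bins.
    """
    low, high = np.percentile(x, [1, 99])
    counts, edges = np.histogram(x, bins=bins, range=(low, high))
    peak = np.argmax(counts)
    return (edges[peak] + edges[peak + 1]) / 2

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 10_000                       # assumed sample size
samples = {
    "normal": rng.standard_normal(n),
    "exponential": rng.exponential(scale=1.0, size=n),
    "Cauchy": rng.standard_cauchy(n),
    "lognormal": rng.lognormal(mean=0.0, sigma=1.0, size=n),
}

for name, x in samples.items():
    print(f"{name:12s} mean={np.mean(x):8.3f}  "
          f"median={np.median(x):7.3f}  mode={histogram_mode(x):7.3f}")
```

For the symmetric normal sample the three measures should roughly agree, for the skewed exponential and lognormal samples the mean is pulled toward the long tail, and for the Cauchy sample the mean can land far from the center because of a few extreme points.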

# **2.3. Robustness**

There are various alternatives to the mean and median for measuring location. These alternatives were developed to address non-normal data, since the mean is an optimal estimator when the data are in fact normal but can perform poorly otherwise.
Tukey and Mosteller defined two types of robustness, where robustness is a lack of susceptibility to the effects of non-normality.

### 1. **Robustness of validity** 

Robustness of validity means that the confidence intervals for the population location have a 95% chance of covering the population location, regardless of what the underlying distribution is.

### 2. **Robustness of efficiency** 

Robustness of efficiency refers to high effectiveness in the face of non-normal tails. That is, confidence intervals for the population location tend to be almost as narrow as the best that could be done if we knew the true shape of the distribution.

# **2.4. Alternative Measures of Location**

A few of the more common alternative location measures are:

### 1. **Mid-Mean** 

The mid-mean computes a mean using only the data between the 25th and 75th percentiles.

### 2. **Trimmed Mean** 

The trimmed mean is similar to the mid-mean, except that different percentile values are used. A common choice is to trim 5% of the points in both the lower and upper tails, i.e., to calculate the mean for data between the 5th and 95th percentiles.

### 3. **Winsorized Mean** 

The Winsorized mean is similar to the trimmed mean. However, instead of trimming the points, they are set to the lowest (or highest) retained value. For example, all data below the 5th percentile are set equal to the value of the 5th percentile, and all data greater than the 95th percentile are set equal to the value of the 95th percentile.
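The three measures can be sketched with NumPy percentiles as follows. The function names and the test data are hypothetical, and the percentile cut-offs use NumPy's default linear interpolation, which is only one of several conventions; other tools may give slightly different values on small samples.

```python
import numpy as np

def trimmed_mean(x, lower_pct=5, upper_pct=95):
    """Mean of the points lying between two percentiles (inclusive)."""
    x = np.asarray(x, dtype=float)
    low, high = np.percentile(x, [lower_pct, upper_pct])
    return x[(x >= low) & (x <= high)].mean()

def mid_mean(x):
    """Trimmed mean restricted to the 25th-75th percentile range."""
    return trimmed_mean(x, 25, 75)

def winsorized_mean(x, lower_pct=5, upper_pct=95):
    """Mean after setting the tails equal to the percentile values."""
    x = np.asarray(x, dtype=float)
    low, high = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, low, high).mean()

# Hypothetical data: the integers 1..19 plus one gross outlier
data = np.append(np.arange(1, 20), 1000.0)

print("mean           ", data.mean())
print("mid-mean       ", mid_mean(data))
print("trimmed mean   ", trimmed_mean(data))
print("winsorized mean", winsorized_mean(data))
```

On this sample the single outlier drags the ordinary mean far above the bulk of the data, while the three alternatives stay close to the center.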

# EXERCISE
The Iris dataset includes three iris species with 50 samples each, as well as some properties of each flower. The columns in this dataset are:
Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species.


### 1) Find the mean Sepal Width of each class

### 2) Find the median Petal Length of each class

### 3) Find the mode of Petal Width of each class

In [None]:
# Load libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
data=pd.read_csv("/content/drive/MyDrive/Stastics-1/Iris.csv")
data.head()

In [None]:
# Shorten the measurement column names, then split the data by species
data.rename(columns={'SepalLengthCm': 'SL', 'SepalWidthCm': 'SW',
                     'PetalLengthCm': 'PL', 'PetalWidthCm': 'PW'}, inplace=True)
iris_setosa = data[data['Species'] == 'Iris-setosa']
iris_versicolor = data[data['Species'] == 'Iris-versicolor']
iris_virginica = data[data['Species'] == 'Iris-virginica']

## QUES-2
If the average man is 175 cm tall with a standard deviation of 6 cm, what is the
probability that a man selected at random will be 183 cm tall?


## QUES-3

If cans are assumed to have a standard deviation of 4 g, what does the average
weight need to be in order to ensure that 99 % of all cans have a weight of at least
250 g?

## QUES-4
If the average man is 175 cm tall with a standard deviation of 6 cm, and the
average woman is 168 cm tall with a standard deviation of 3 cm, what is the
probability that a randomly selected man will be shorter than a randomly selected
woman?


## QUES-5
Generate and plot the Probability Density Function (PDF) of a normal distribution, with a mean of 5 and a standard deviation of 3.

## QUES-6

Generate 1000 random data points from the normal distribution in QUES-5 (mean 5, standard deviation 3).

### 1) Calculate the standard error of the mean of these data.
(Correct answer: ca. 0.096.)

### 2) Plot the histogram of these data.

### 3) From the PDF, calculate the interval containing 95 % of these data.
(Correct answer: [-0.88, 10.88].)