# Descriptive Statistics - Measures of Central Tendency and Variability
Perform the following operations on any open-source dataset (e.g., data.csv):
1. Provide summary statistics (mean, median, minimum, maximum, standard deviation) for the
numeric variables of a dataset (age, income, etc.), grouped by one of its qualitative
(categorical) variables. For example, if the categorical variable is age group and the quantitative
variable is income, provide summary statistics of income for each age group.
Create a list that contains a numeric value for each response to the categorical variable.
2. Write a Python program to display basic statistical details (percentiles, mean,
standard deviation, etc.) for the species 'Iris-setosa', 'Iris-versicolor' and 'Iris-virginica' in the
iris.csv dataset.
Provide the code with its outputs and explain each step.

In [7]:
import pandas as pd 
import numpy as np
import seaborn as sns

In [10]:
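# Synthetic employee data; no random seed is set, so the values differ on every run.
employee_data = {
    "emp_id" : range(1,101),
    "age" : np.random.randint(20,60,100),            # integers in [20, 60), i.e. 20..59
    "no_of_projects" : np.random.randint(1,5,100),   # integers in [1, 5), i.e. 1..4
    "salary" : np.random.normal(2000,500,100)        # normal with mean 2000, std 500
}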
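
The cell above draws fresh random values each time it runs, so the outputs shown in this notebook cannot be reproduced exactly. A minimal sketch of a seeded variant follows; the helper name `make_employee_data` and the seed value 42 are assumptions introduced here, and the numbers it produces will not match the outputs printed in this notebook.

```python
import numpy as np
import pandas as pd


def make_employee_data(seed: int = 42, n: int = 100) -> pd.DataFrame:
    """Build the synthetic employee table with a fixed seed (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "emp_id": np.arange(1, n + 1),
        "age": rng.integers(20, 60, size=n),            # upper bound is exclusive
        "no_of_projects": rng.integers(1, 5, size=n),   # upper bound is exclusive
        "salary": rng.normal(2000, 500, size=n),
    })


a = make_employee_data()
b = make_employee_data()
print(a.equals(b))  # same seed, same table
```

Calling the helper twice with the same seed returns identical tables, which makes the grouped statistics stable between runs.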

In [11]:
df = pd.DataFrame(employee_data)
df.head()

   emp_id  age  no_of_projects       salary
0       1   32               2  2158.987493
1       2   53               4  1074.432163
2       3   26               2  2723.474772
3       4   44               4  1945.300524
4       5   43               2  2466.368238


In [12]:
def categorize_age(age):
    if 20 <= age < 30:
        return '20-30'
    elif 30 <= age < 40:
        return '30-40'
    elif 40 <= age < 50:
        return '40-50'
    elif 50 <= age < 60:
        return '50-60'
    else:
        return 'Unknown'
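
The same binning can probably be expressed with `pd.cut`, which avoids the hand-written `if`/`elif` chain. This is a sketch rather than the notebook's method, and the sample ages are made up for illustration. One difference to keep in mind: ages outside 20 to 59 become `NaN` here instead of `'Unknown'`.

```python
import pandas as pd

# Hypothetical sample ages chosen to hit the bin edges.
ages = pd.Series([20, 29, 30, 44, 59], name="age")

# right=False makes each bin closed on the left and open on the right:
# [20, 30), [30, 40), [40, 50), [50, 60), matching categorize_age.
age_group = pd.cut(
    ages,
    bins=[20, 30, 40, 50, 60],
    labels=["20-30", "30-40", "40-50", "50-60"],
    right=False,
)
print(age_group.tolist())
```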

In [13]:
df['AgeGroup'] = df['age'].apply(categorize_age)
df.head()

   emp_id  age  no_of_projects       salary  AgeGroup
0       1   32               2  2158.987493     30-40
1       2   53               4  1074.432163     50-60
2       3   26               2  2723.474772     20-30
3       4   44               4  1945.300524     40-50
4       5   43               2  2466.368238     40-50


In [14]:
statistics = df.groupby('AgeGroup')['salary'].describe()

In [15]:
print("summary statistics for salary grouped by the age group:\n", statistics)

summary statistics for salary grouped by the age group:
           count         mean         std          min          25%  \
AgeGroup                                                             
20-30      31.0  1979.824255  528.752128  1239.776534  1606.393230   
30-40      23.0  2052.768624  630.305541   835.051722  1661.635775   
40-50      26.0  1992.134763  449.395753  1009.722261  1739.623881   
50-60      20.0  2026.646332  550.093439  1074.432163  1637.363347   

                  50%          75%          max  
AgeGroup                                         
20-30     1798.994066  2331.419778  3339.175963  
30-40     2077.579727  2350.320144  3405.904034  
40-50     2071.657647  2316.250998  2616.905225  
50-60     1978.951397  2416.349376  2998.935227  
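
In the `describe()` output the median appears as the `50%` column. If only the five statistics named in the task are wanted, `agg` with a list of function names is one way to get them. The sketch below uses a small made-up frame so that the results are easy to check by hand; the column names follow the notebook.

```python
import pandas as pd

# Tiny hypothetical dataset, two salaries per age group.
demo = pd.DataFrame({
    "AgeGroup": ["20-30", "20-30", "30-40", "30-40"],
    "salary": [1000.0, 3000.0, 2000.0, 2500.0],
})

# One row per age group, one column per requested statistic.
summary = demo.groupby("AgeGroup")["salary"].agg(
    ["mean", "median", "min", "max", "std"]
)
print(summary)
```

Note that `std` here is the sample standard deviation (divisor n - 1), the same convention `describe()` uses.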


In [22]:
max_sal = df.groupby('AgeGroup')['salary'].max()
max_sal

AgeGroup
20-30    3339.175963
30-40    3405.904034
40-50    2616.905225
50-60    2998.935227
Name: salary, dtype: float64

In [23]:
min_sal = df.groupby('AgeGroup')['salary'].min()
min_sal

AgeGroup
20-30    1239.776534
30-40     835.051722
40-50    1009.722261
50-60    1074.432163
Name: salary, dtype: float64

In [24]:
# Named salary_range so that Python's built-in range() is not shadowed.
salary_range = max_sal - min_sal

In [25]:
salary_range

AgeGroup
20-30    2099.399429
30-40    2570.852312
40-50    1607.182964
50-60    1924.503064
Name: salary, dtype: float64
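
The task also asks for a list that contains a numeric value for each response to the categorical variable, which the cells above do not produce. One reading of that requirement is an integer code per age group. The sketch below assumes that reading and uses the five `AgeGroup` values from the `df.head()` output; the ordering of `categories` is a choice made here.

```python
import pandas as pd

# AgeGroup values of the first five rows shown earlier in the notebook.
demo = pd.DataFrame({"AgeGroup": ["30-40", "50-60", "20-30", "40-50", "40-50"]})

# Fix the category order explicitly so the codes are 0, 1, 2, 3 from youngest to oldest.
categories = ["20-30", "30-40", "40-50", "50-60"]
codes = pd.Categorical(
    demo["AgeGroup"], categories=categories, ordered=True
).codes.tolist()
print(codes)  # [1, 3, 0, 2, 2]
```

A value that is not in `categories` would be coded as -1, so it is worth checking for that before using the list.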

In [29]:
df = pd.read_csv('iris.csv')
df.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
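
With the data loaded, the per-species statistics requested in the second task could be obtained by grouping on `Species` and calling `describe()`. The following is a sketch under that assumption: `species_summary` is a hypothetical helper name, and the demo frame contains only the five rows shown above rather than the full iris.csv file.

```python
import pandas as pd


def species_summary(iris: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, std, min, quartiles and max per species (hypothetical helper)."""
    # Id is a row identifier, not a measurement, so it is dropped first.
    measurements = iris.drop(columns=["Id"])
    return measurements.groupby("Species").describe()


# The five rows printed by df.head(); the real file would be passed in instead.
head_rows = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 5,
})

summary = species_summary(head_rows)
print(summary["SepalLengthCm"])
```

Applied to the full DataFrame, `species_summary(df)` should return one row per species, with the `25%`, `50%` and `75%` columns giving the percentiles.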
