Descriptive statistics is the branch of statistics that deals with summarizing and describing data. In data science, descriptive statistics give a quick summary of a dataset's characteristics through measures of central tendency (mean, median, mode), measures of dispersion (range, variance, standard deviation), and measures of distribution shape (skewness, kurtosis).

Some of the commonly used descriptive statistics in data science are:

Mean: It is the arithmetic average of a dataset, that is, the sum of the values divided by the number of values.

Median: It is the middle value of a dataset when it is sorted in ascending or descending order. When the dataset has an even number of values, it is the average of the two middle values.

Mode: It is the value that occurs most frequently in a dataset.

Range: It is the difference between the largest and smallest values in a dataset.

Variance: It is the average of the squared deviations from the mean, and it measures how spread out the data is around the mean.

Standard Deviation: It is the square root of the variance and measures the spread of the data in the same units as the data itself.

Percentile: It is the value below which a specified percentage of the data falls. For example, 25% of the data falls below the 25th percentile.

Skewness: It is a measure of the asymmetry of a dataset about its mean.

Kurtosis: It is a measure of how heavy the tails of a dataset's distribution are compared with a normal distribution. It is often loosely described as peakedness.

These descriptive statistics provide valuable information about the data that can be used to make informed decisions and drive further analysis.
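Several of the statistics listed above can be obtained in a single call. The following is a minimal sketch, assuming pandas is installed and using a small made-up dataset. `describe()` reports the count, mean, sample standard deviation, minimum, quartiles, and maximum; it does not report the mode, variance, skewness, or kurtosis.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], name='Value')

# describe() returns a Series indexed by statistic name
summary = values.describe()
print(summary)

# Individual statistics can be read by their label
print('Mean:', summary['mean'])
print('Median (50th percentile):', summary['50%'])
```

The remaining statistics each have their own method, as the sections below show.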

### Mean

In [1]:
import pandas as pd

# Create a sample dataset
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Create a dataframe from the dataset
df = pd.DataFrame(data, columns=['Value'])

# Calculate the mean of the 'Value' column
mean = df['Value'].mean()

# Print the mean
print('Mean:', mean)


Mean: 5.5


### Median

In [2]:
import pandas as pd

# Create a sample dataset
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Create a dataframe from the dataset
df = pd.DataFrame(data, columns=['Value'])

# Calculate the median of the 'Value' column
median = df['Value'].median()

# Print the median
print('Median:', median)

Median: 5.5
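The result 5.5 does not appear in the dataset because the dataset has an even number of values, so the median is the average of the two middle values. The sketch below spells out that rule by hand; `median_by_sorting` is a hypothetical helper written for illustration, not part of pandas.

```python
import pandas as pd


def median_by_sorting(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


even_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
odd_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print('Even count:', median_by_sorting(even_data))
print('Odd count:', median_by_sorting(odd_data))

# The hand-written rule agrees with pandas
print(median_by_sorting(even_data) == pd.Series(even_data).median())
```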


### Mode

In [5]:
import pandas as pd

# create sample data
data = [1, 2, 2, 3, 4, 4, 4, 5, 6]

# create pandas data frame from data
df = pd.DataFrame(data, columns=['Value'])

# calculate the mode; mode() returns a Series because a dataset can have more than one mode
mode = df['Value'].mode()

# print the first mode
print("Mode:", mode[0])


Mode: 4
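A dataset can have more than one mode, which is why `mode()` returns a Series rather than a single number. The sketch below uses a made-up dataset in which two values tie for the highest frequency; indexing with `[0]` would report only the first of them.

```python
import pandas as pd

# Hypothetical data: 1 and 3 both occur twice
values = pd.Series([1, 1, 2, 3, 3])

# mode() returns every value tied for the highest frequency, in sorted order
modes = values.mode()
print('All modes:', modes.tolist())
print('First mode only:', modes[0])
```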


### Range

In [6]:
import pandas as pd

# create sample data
data = [1, 2, 2, 3, 4, 4, 4, 5, 6]

# create pandas data frame from data
df = pd.DataFrame(data, columns=['Value'])

# calculate the range
data_range = df['Value'].max() - df['Value'].min()

# print the range
print("The range of the data set is:", data_range)


The range of the data set is: 5


### Variance

In [7]:
import pandas as pd

# create sample data
data = [1, 2, 2, 3, 4, 4, 4, 5, 6]

# create pandas data frame from data
df = pd.DataFrame(data, columns=['Value'])

# calculate the mean
mean = df['Value'].mean()

# calculate the deviations
deviations = df['Value'] - mean

# calculate the squared deviations
squared_deviations = deviations**2

# calculate the sample variance (divide by n - 1 rather than n)
variance = squared_deviations.sum() / (len(df) - 1)

# print the variance
print("The variance of the data set is:", variance)

The variance of the data set is: 2.5277777777777777
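The calculation above divides by `len(df) - 1`, which gives the sample variance. The sketch below, on the same data, shows that pandas' built-in `var()` uses the same convention by default (`ddof=1`), and that passing `ddof=0` gives the population variance, which divides by the number of values instead.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 4, 4, 4, 5, 6])

# Sample variance: divides by n - 1 (pandas default, ddof=1)
sample_variance = values.var()

# Population variance: divides by n
population_variance = values.var(ddof=0)

print('Sample variance:', sample_variance)
print('Population variance:', population_variance)
```

The sample variance is always the larger of the two, and the gap shrinks as the dataset grows.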


### Standard Deviation

In [8]:
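import pandas as pd

# create sample data
data = [4, 5, 7, 3, 6, 2, 9, 8, 1, 10]
df = pd.DataFrame(data, columns=["Value"])

# calculate the sample standard deviation (pandas divides by n - 1 by default)
std = df['Value'].std()
print(std)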


3.0276503540974917
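As a quick check of the definition, the sketch below confirms on the same data that the standard deviation reported by pandas equals the square root of the variance. It relies on `std()` and `var()` both defaulting to the sample convention (`ddof=1`).

```python
import math

import pandas as pd

values = pd.Series([4, 5, 7, 3, 6, 2, 9, 8, 1, 10])

std = values.std()
sqrt_of_variance = math.sqrt(values.var())

print('Standard deviation:', std)
print('Square root of variance:', sqrt_of_variance)
print('Equal within floating-point tolerance:', math.isclose(std, sqrt_of_variance))
```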


### Percentiles

In [9]:
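import pandas as pd

# create sample data
data = [4, 5, 7, 3, 6, 2, 9, 8, 1, 10]
df = pd.DataFrame(data, columns=["Value"])

# calculate the 25th, 50th, and 75th percentiles (pandas interpolates linearly by default)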
p25 = df['Value'].quantile(0.25)
p50 = df['Value'].quantile(0.50)
p75 = df['Value'].quantile(0.75)
print("25th percentile: ", p25)
print("50th percentile (median): ", p50)
print("75th percentile: ", p75)


25th percentile:  3.25
50th percentile (median):  5.5
75th percentile:  7.75
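Values such as 3.25 do not appear in the dataset because `quantile()` interpolates linearly between neighboring data points by default. The sketch below, on the same data, uses the `interpolation` parameter to pick an actual data point instead.

```python
import pandas as pd

values = pd.Series([4, 5, 7, 3, 6, 2, 9, 8, 1, 10])

# Default: linear interpolation between the two nearest data points
linear = values.quantile(0.25)

# Take the nearest data point below or above the requested position
lower = values.quantile(0.25, interpolation='lower')
higher = values.quantile(0.25, interpolation='higher')

print('linear:', linear)
print('lower:', lower)
print('higher:', higher)
```

Here the 25th percentile falls between the sorted values 3 and 4, so `'lower'` returns 3, `'higher'` returns 4, and the default returns a value in between.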


### Skewness

In [10]:
import pandas as pd

# create sample data that is symmetric about its mean
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = pd.DataFrame(data, columns=['Value'])

# calculate the sample skewness; 0 indicates a symmetric dataset
skewness = df['Value'].skew()
print(skewness)


0.0
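A skewness of 0 reflects the symmetry of the dataset above. For contrast, the sketch below uses a made-up dataset with one large value that stretches the right tail, which gives positive skewness; mirroring the data stretches the left tail and flips the sign.

```python
import pandas as pd

# Hypothetical data with a long right tail
right_skewed = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])

# Negating the values mirrors the distribution, giving a long left tail
left_skewed = -right_skewed

print('Right-skewed:', right_skewed.skew())
print('Left-skewed:', left_skewed.skew())
```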


### Kurtosis

In [11]:
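import pandas as pd

# create sample data (evenly spaced values)
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df = pd.DataFrame(data, columns=['Value'])

# calculate the excess kurtosis (Fisher's definition, where a normal distribution scores 0)
kurtosis = df['Value'].kurtosis()
print(kurtosis)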


-1.2000000000000002
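The negative result reflects that pandas reports excess kurtosis, measured relative to a normal distribution (which scores 0), and that evenly spaced values have lighter tails than a normal distribution. For contrast, the sketch below uses a made-up dataset with two extreme values and compares it with the evenly spaced data.

```python
import pandas as pd

# Evenly spaced values: light tails
flat = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Hypothetical data: most values cluster near 50, with two extreme values
heavy_tailed = pd.Series([48, 49, 50, 50, 50, 50, 51, 52, 0, 100])

print('Evenly spaced:', flat.kurtosis())
print('With extreme values:', heavy_tailed.kurtosis())
```

The dataset with extreme values yields a positive excess kurtosis, indicating heavier tails than a normal distribution.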
