Pandas offers a variety of methods for statistical analysis, making it easy to compute descriptive statistics on datasets. Here’s a breakdown of some common statistical methods available in Pandas:

### 1. **Descriptive Statistics**
These functions help summarize data and describe the dataset’s features.

- **`mean()`**: Calculates the mean (average) of a Series or DataFrame.
- **`median()`**: Computes the median (middle value) of a Series or DataFrame.
- **`mode()`**: Returns the mode (most frequent value) of a Series or DataFrame; if several values tie, all of them are returned.
- **`sum()`**: Returns the sum of values along an axis.
- **`count()`**: Returns the number of non-null values.
- **`std()`**: Calculates the sample standard deviation of a Series or DataFrame (`ddof=1` by default).
- **`var()`**: Computes the sample variance (`ddof=1` by default).
- **`min()` / `max()`**: Returns the minimum or maximum value.
- **`quantile()`**: Calculates the quantiles (e.g., 0.25 for 25th percentile).
- **`cumsum()`**: Cumulative sum of values.
- **`cumprod()`**: Cumulative product of values.

### 2. **Correlations and Covariance**
Pandas provides functions for measuring relationships between variables.

- **`corr()`**: Computes pairwise correlation of columns, using methods like Pearson, Kendall, and Spearman.
- **`cov()`**: Returns covariance between Series or columns in a DataFrame.
- **`corrwith()`**: Computes the correlation between two DataFrames or between a DataFrame and Series.

### 3. **Rank and Sorting**
Pandas allows you to rank and sort data.

- **`rank()`**: Computes rank of each value, optionally using different ranking methods.
- **`sort_values()`**: Sorts values in ascending or descending order.
- **`sort_index()`**: Sorts the index of a DataFrame.

### 4. **Rolling and Expanding Windows**
These are used to perform calculations over a rolling or expanding window of data.

- **`rolling()`**: Provides rolling window calculations like mean, sum, etc.
- **`expanding()`**: Provides expanding window calculations that include all values up to a point.
- **`ewm()`**: Exponentially weighted window functions for smoothing and forecasting.

### 5. **Other Functions**
- **`describe()`**: Generates a summary of statistics, including count, mean, std, min, max, and quartiles.
- **`skew()`**: Measures the skewness (asymmetry) of the data distribution.
- **`kurt()`**: Measures the excess kurtosis (tailedness) of the data distribution, using Fisher's definition in which a normal distribution scores 0.
- **`pct_change()`**: Computes percentage change between consecutive elements.
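- **`mad()`**: Returned the mean absolute deviation of values. It was deprecated in pandas 1.5 and removed in pandas 2.0, so on current versions compute it as `(s - s.mean()).abs().mean()`.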
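
Since `mad()` is not available in pandas 2.0 and later, the mean absolute deviation has to be computed by hand. The sketch below is one way to do that; it uses the first five `income` values that appear in this notebook's output rather than the full `salaries.csv` file.

```python
import pandas as pd

# First five income values shown in this notebook's output
income = pd.Series([50500.75, 60400.85, 48100.00, 81000.25, 75200.75], name="income")

# Mean absolute deviation: average distance of each value from the mean
mad = (income - income.mean()).abs().mean()
print(mad)
```

The same expression works column by column on a DataFrame, because `mean()` and `abs()` are applied per column.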

These are methods on Series and DataFrame objects, which makes Pandas a convenient library for statistical analysis in Python.

In [1]:
import pandas as pd

salaries = pd.read_csv('salaries.csv')

## 1. Descriptive Statistics
### mean()
Calculate the average of each column.

In [2]:
mean_values = salaries.mean()
mean_values

id             50.5000
age            32.2900
income      66354.8610
expenses    27560.8275
savings     22816.7110
dtype: float64

### median()
Find the median of each column.

In [4]:
median_values = salaries.median()
median_values

id             50.500
age            32.000
income      65000.375
expenses    27000.000
savings     22500.500
dtype: float64

### mode()
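Find the most frequent value in each column. A column can have several modes, so the result is a DataFrame with one row per mode; columns with fewer modes are padded with NaN. Every `id` is unique, so every `id` value counts as a mode.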

In [5]:
mode_values = salaries.mode()
mode_values


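|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| 0 | 1 | 29.0 | 78500.5 | 26500.0 | 19500.0 |
| 1 | 2 | 30.0 | NaN | 29000.5 | 20500.0 |
| 2 | 3 | 31.0 | NaN | 30000.0 | 21000.0 |
| 3 | 4 | NaN | NaN | 33500.5 | 23500.0 |
| 4 | 5 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... |
| 95 | 96 | NaN | NaN | NaN | NaN |
| 96 | 97 | NaN | NaN | NaN | NaN |
| 97 | 98 | NaN | NaN | NaN | NaN |
| 98 | 99 | NaN | NaN | NaN | NaN |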


### sum()
Calculate the sum of all values in each column.

In [6]:
total_sum = salaries.sum()
total_sum

id             5050.00
age            3229.00
income      6635486.10
expenses    2756082.75
savings     2281671.10
dtype: float64

### count()
Count the number of non-null values in each column.

In [7]:
non_null_count = salaries.count()
non_null_count

id          100
age         100
income      100
expenses    100
savings     100
dtype: int64

### std()
Calculate the standard deviation of each column.

In [8]:
std_dev = salaries.std()
std_dev

id             29.011492
age             5.031808
income      10630.116550
expenses     5082.489939
savings      5061.465243
dtype: float64

### var()
Calculate the variance of each column.

In [9]:
variance = salaries.var()
variance

id          8.416667e+02
age         2.531909e+01
income      1.129994e+08
expenses    2.583170e+07
savings     2.561843e+07
dtype: float64

### min() and max()
Find the minimum and maximum values in each column.

In [11]:
min_values = salaries.min()
min_values

id              1.00
age            22.00
income      45000.00
expenses    17000.25
savings     11000.00
dtype: float64

In [12]:
max_values = salaries.max()
max_values

id            100.00
age            42.00
income      83500.50
expenses    37000.50
savings     31500.75
dtype: float64

### quantile()
Calculate the 25th percentile (1st quartile).

In [13]:
quantile_25 = salaries.quantile(0.25)
quantile_25

id             25.7500
age            28.7500
income      59875.1875
expenses    24800.5625
savings     20037.6250
Name: 0.25, dtype: float64
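
`quantile()` also accepts a list of probabilities, which returns one row per requested quantile. A minimal sketch, using the first five rows shown in this notebook's output as sample data:

```python
import pandas as pd

# First five rows of age and income as shown in this notebook's output
df = pd.DataFrame({
    "age": [25, 30, 22, 40, 35],
    "income": [50500.75, 60400.85, 48100.00, 81000.25, 75200.75],
})

# One row per quantile, one column per numeric column
quartiles = df.quantile([0.25, 0.5, 0.75])
print(quartiles)
```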

### cumsum()
Compute the cumulative sum of each column.

In [14]:
cumulative_sum = salaries.cumsum()
cumulative_sum

Unnamed: 0,id,age,income,expenses,savings
0,1,25,50500.75,20050.25,15200.50
1,3,55,110901.60,45251.00,35250.50
2,6,77,159001.60,63251.50,47300.60
3,10,117,240001.85,98551.50,77551.10
4,15,152,315202.60,128651.50,102751.10
...,...,...,...,...,...
95,4656,3110,6387485.10,2655082.25,2199170.60
96,4753,3141,6450985.60,2681082.25,2221170.85
97,4851,3176,6525485.85,2711082.25,2246170.85
98,4950,3204,6585485.85,2736082.25,2266171.10


### cumprod()
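Compute the cumulative product of each column. With 100 rows the running product overflows: the integer columns (`id`, `age`) silently wrap around in 64-bit arithmetic and end up at 0, while the float columns grow to `inf`. The line printed before the table appears to be the tail of the overflow warning raised during this computation.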

In [16]:
cumulative_prod = salaries.cumprod()
cumulative_prod

  return bound(*args, **kwds)


|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| 0 | 1 | 25 | 5.050075e+04 | 2.005025e+04 | 1.520050e+04 |
| 1 | 2 | 750 | 3.050288e+09 | 5.052813e+08 | 3.047700e+08 |
| 2 | 6 | 16500 | 1.467189e+14 | 9.095317e+12 | 3.672509e+12 |
| 3 | 24 | 660000 | 1.188426e+19 | 3.210647e+17 | 1.110952e+17 |
| 4 | 120 | 23100000 | 8.937056e+23 | 9.664047e+21 | 2.799600e+21 |
| ... | ... | ... | ... | ... | ... |
| 95 | 0 | 0 | inf | inf | inf |
| 96 | 0 | 0 | inf | inf | inf |
| 97 | 0 | 0 | inf | inf | inf |
| 98 | 0 | 0 | inf | inf | inf |
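
The zeros and `inf` values in this output come from numeric overflow. The sketch below reproduces the effect for the `id` column, assuming it holds the integers 1 to 100 as the earlier outputs suggest, and shows two possible workarounds: converting to float first, or summing logarithms instead of multiplying.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the id column: integers 1..100
ids = pd.Series(np.arange(1, 101), dtype="int64")

wrapped = ids.cumprod()                      # int64 wraps around silently on overflow
as_float = ids.astype("float64").cumprod()   # 100! is about 9.3e157, which fits in float64
log_prod = np.log(ids).cumsum()              # log of the running product, stays small

print(wrapped.iloc[-1], as_float.iloc[-1], log_prod.iloc[-1])
```

Working in log space is the more robust option for columns such as `income`, where even a float64 product overflows to `inf` after a few dozen rows.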


## 2. Correlations and Covariance

### corr()
Calculate the correlation between columns.

In [17]:
correlation = salaries.corr()
correlation

|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| id | 1.0 | 0.034355 | 0.039697 | 0.053692 | 0.073731 |
| age | 0.034355 | 1.0 | 0.974423 | 0.977348 | 0.967789 |
| income | 0.039697 | 0.974423 | 1.0 | 0.988673 | 0.983218 |
| expenses | 0.053692 | 0.977348 | 0.988673 | 1.0 | 0.994247 |
| savings | 0.073731 | 0.967789 | 0.983218 | 0.994247 | 1.0 |
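
`corr()` uses Pearson correlation unless told otherwise. The sketch below compares it with the rank-based Spearman method on the first five rows shown in this notebook's output; in that small sample the three columns happen to share the same ordering.

```python
import pandas as pd

# First five rows of three columns as shown in this notebook's output
df = pd.DataFrame({
    "age": [25, 30, 22, 40, 35],
    "income": [50500.75, 60400.85, 48100.00, 81000.25, 75200.75],
    "savings": [15200.50, 20050.00, 12050.10, 30250.50, 25200.00],
})

pearson = df.corr()                     # default: method="pearson" (linear relationship)
spearman = df.corr(method="spearman")   # correlation of the ranks (monotonic relationship)

print(pearson)
print(spearman)
```

Spearman depends only on the ordering of the values, so it is less sensitive to outliers than Pearson.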


### cov()
Calculate the covariance between columns.

In [19]:
covariance = salaries.cov()
covariance

|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| id | 841.666667 | 5.015152 | 12242.47 | 7916.932 | 10826.65 |
| age | 5.015152 | 25.319091 | 52120.61 | 24994.8 | 24647.95 |
| income | 12242.469192 | 52120.612939 | 112999400.0 | 53415470.0 | 52901010.0 |
| expenses | 7916.931566 | 24994.804066 | 53415470.0 | 25831700.0 | 25576850.0 |
| savings | 10826.651515 | 24647.948293 | 52901010.0 | 25576850.0 | 25618430.0 |
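
Covariance and Pearson correlation are linked: dividing the covariance of two columns by the product of their standard deviations gives their correlation. With the numbers above, 52120.61 / (5.031808 × 10630.11655) ≈ 0.9744, which matches the `age`/`income` entry of `corr()`. The sketch below checks the same identity on the first five rows shown in this notebook's output.

```python
import pandas as pd

# First five rows of age and income as shown in this notebook's output
df = pd.DataFrame({
    "age": [25, 30, 22, 40, 35],
    "income": [50500.75, 60400.85, 48100.00, 81000.25, 75200.75],
})

cov = df.cov()
corr = df.corr()

# Pearson correlation = covariance / (std_x * std_y)
manual_corr = cov.loc["age", "income"] / (df["age"].std() * df["income"].std())

print(manual_corr, corr.loc["age", "income"])
```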


### corrwith()
Compute correlation with another Series or DataFrame. For example, let’s calculate correlation of all columns with income.

In [22]:
correlation_with_income = salaries.corrwith(salaries['income'])
correlation_with_income

id          0.039697
age         0.974423
income      1.000000
expenses    0.988673
savings     0.983218
dtype: float64

## 3. Rank and Sorting
### rank()
Rank the values of the income column.

In [23]:
income_rank = salaries['income'].rank()
income_rank

0     12.0
1     29.0
2      5.0
3     96.0
4     72.0
      ... 
95    63.0
96    46.0
97    70.0
98    26.0
99    10.0
Name: income, Length: 100, dtype: float64
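
By default `rank()` assigns tied values the average of their ranks, which is why the result is a float. The sketch below shows how the `method` and `ascending` arguments change the outcome, using the five lowest incomes from this notebook's sorted output, two of which are tied.

```python
import pandas as pd

# Five lowest incomes from the sorted output in this notebook (two are tied)
income = pd.Series([45000.00, 45000.00, 45000.75, 46000.00, 48100.00])

average_rank = income.rank()                     # ties share the mean rank: 1.5, 1.5, 3, 4, 5
min_rank = income.rank(method="min")             # ties take the lowest rank: 1, 1, 3, 4, 5
dense_rank = income.rank(method="dense")         # like "min", but no gaps: 1, 1, 2, 3, 4
descending_rank = income.rank(ascending=False)   # largest value gets rank 1

print(pd.DataFrame({
    "average": average_rank,
    "min": min_rank,
    "dense": dense_rank,
    "descending": descending_rank,
}))
```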

### sort_values()
Sort the rows by income in ascending order.

In [24]:
sorted_by_income = salaries.sort_values(by='income')
sorted_by_income

|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| 56 | 57 | 24 | 45000.00 | 17500.00 | 11500.00 |
| 10 | 11 | 24 | 45000.00 | 17000.25 | 11000.00 |
| 78 | 79 | 24 | 45000.75 | 18000.00 | 12000.00 |
| 30 | 31 | 24 | 46000.00 | 17500.25 | 12000.00 |
| 2 | 3 | 22 | 48100.00 | 18000.50 | 12050.10 |
| ... | ... | ... | ... | ... | ... |
| 3 | 4 | 40 | 81000.25 | 35300.00 | 30250.50 |
| 16 | 17 | 39 | 81250.75 | 36000.50 | 30500.00 |
| 76 | 77 | 41 | 82500.00 | 36000.25 | 31500.00 |
| 20 | 21 | 42 | 82500.75 | 37000.25 | 31500.50 |
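
`sort_values()` can also sort in descending order or by several columns at once. A short sketch, using the five lowest-income rows from the output above as sample data:

```python
import pandas as pd

# Five lowest-income rows from the sorted output in this notebook
df = pd.DataFrame({
    "id": [57, 11, 79, 31, 3],
    "age": [24, 24, 24, 24, 22],
    "income": [45000.00, 45000.00, 45000.75, 46000.00, 48100.00],
})

# Highest income first
by_income_desc = df.sort_values(by="income", ascending=False)

# Sort by age, then break ties within each age by income
by_age_then_income = df.sort_values(by=["age", "income"])

print(by_income_desc)
print(by_age_then_income)
```

`sort_values()` returns a new DataFrame and keeps the original index labels, which is why the index in the output above is out of order.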


### sort_index()
Sort the DataFrame by the index.

In [25]:
sorted_by_index = salaries.sort_index()
sorted_by_index

|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| 0 | 1 | 25 | 50500.75 | 20050.25 | 15200.50 |
| 1 | 2 | 30 | 60400.85 | 25200.75 | 20050.00 |
| 2 | 3 | 22 | 48100.00 | 18000.50 | 12050.10 |
| 3 | 4 | 40 | 81000.25 | 35300.00 | 30250.50 |
| 4 | 5 | 35 | 75200.75 | 30100.00 | 25200.00 |
| ... | ... | ... | ... | ... | ... |
| 95 | 96 | 34 | 72500.00 | 29000.50 | 24000.75 |
| 96 | 97 | 31 | 63500.50 | 26000.00 | 22000.25 |
| 97 | 98 | 35 | 74500.25 | 30000.00 | 25000.00 |
| 98 | 99 | 28 | 60000.00 | 25000.00 | 20000.25 |


## 4. Rolling and Expanding Windows
### rolling()
Compute the rolling mean of the income column with a window size of 2.

In [26]:
rolling_mean = salaries['income'].rolling(window=2).mean()
rolling_mean

0           NaN
1     55450.800
2     54250.425
3     64550.125
4     78100.500
        ...    
95    66250.250
96    68000.250
97    69000.375
98    67250.125
99    55000.125
Name: income, Length: 100, dtype: float64
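
The first value is `NaN` because a window of 2 needs two observations before it can produce a result. The sketch below shows how `min_periods` relaxes that requirement, using the first five `income` values from this notebook's output.

```python
import pandas as pd

# First five income values shown in this notebook's output
income = pd.Series([50500.75, 60400.85, 48100.00, 81000.25, 75200.75], name="income")

strict = income.rolling(window=2).mean()                   # NaN until the window is full
relaxed = income.rolling(window=2, min_periods=1).mean()   # uses whatever is available

print(pd.DataFrame({"strict": strict, "relaxed": relaxed}))
```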

### expanding()
Compute the expanding mean of the expenses column.

In [27]:
expanding_mean = salaries['expenses'].expanding().mean()
expanding_mean

0     20050.250000
1     22625.500000
2     21083.833333
3     24637.875000
4     25730.300000
          ...     
95    27657.106771
96    27640.023196
97    27664.104592
98    27637.194444
99    27560.827500
Name: expenses, Length: 100, dtype: float64

### ewm()
Compute the exponentially weighted mean of the savings column.

In [28]:
ewm_mean = salaries['savings'].ewm(span=2).mean()
ewm_mean

0     15200.500000
1     18837.625000
2     14138.569231
3     25014.122500
4     25138.552893
          ...     
95    23806.507479
96    22602.335826
97    24200.778609
98    21400.426203
99    17466.808734
Name: savings, Length: 100, dtype: float64
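
To see where these numbers come from: `span=2` corresponds to a smoothing factor `alpha = 2 / (span + 1) = 2/3`, and with the default `adjust=True` each result is a weighted average in which an observation `i` steps back gets weight `(1 - alpha)**i`. The sketch below recomputes the second value by hand from the first `savings` values shown in this notebook's output.

```python
import pandas as pd

# First three savings values shown in this notebook's output
savings = pd.Series([15200.50, 20050.00, 12050.10], name="savings")

span = 2
alpha = 2 / (span + 1)   # smoothing factor implied by span
decay = 1 - alpha        # weight multiplier per step back in time

ewm_mean = savings.ewm(span=span).mean()

# Second value by hand: weights 1 (current) and decay (previous), then normalize
manual_second = (savings.iloc[1] + decay * savings.iloc[0]) / (1 + decay)

print(ewm_mean.iloc[1], manual_second)
```

Both values come out at 18837.625, matching the second row of the output above.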

## 5. Other Functions
### describe()
Generate descriptive statistics summary.

In [29]:
description = salaries.describe()
description

|  | id | age | income | expenses | savings |
|---|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 50.5 | 32.29 | 66354.861 | 27560.8275 | 22816.711 |
| std | 29.011492 | 5.031808 | 10630.11655 | 5082.489939 | 5061.465243 |
| min | 1.0 | 22.0 | 45000.0 | 17000.25 | 11000.0 |
| 25% | 25.75 | 28.75 | 59875.1875 | 24800.5625 | 20037.625 |
| 50% | 50.5 | 32.0 | 65000.375 | 27000.0 | 22500.5 |
| 75% | 75.25 | 36.0 | 77125.625 | 32000.5625 | 27000.0625 |
| max | 100.0 | 42.0 | 83500.5 | 37000.5 | 31500.75 |
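
The quartile rows can be swapped for other percentiles through the `percentiles` argument. A minimal sketch on the first five `income` values from this notebook's output:

```python
import pandas as pd

# First five income values shown in this notebook's output
income = pd.Series([50500.75, 60400.85, 48100.00, 81000.25, 75200.75], name="income")

# Request the 10th and 90th percentiles instead of the default quartiles
summary = income.describe(percentiles=[0.1, 0.9])
print(summary)
```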


### skew()
Measure the skewness of each column.

In [30]:
skewness = salaries.skew()
skewness

id          0.000000
age         0.149981
income     -0.176893
expenses   -0.046982
savings    -0.262731
dtype: float64

### kurt()
Measure the kurtosis of each column.

In [31]:
kurtosis = salaries.kurt()
kurtosis

id         -1.200000
age        -0.903137
income     -0.965888
expenses   -0.638256
savings    -0.321100
dtype: float64
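
pandas reports excess kurtosis, so a normal distribution scores 0 and flatter distributions score below it. The `id` column is a good sanity check: evenly spaced values behave like a uniform distribution, whose excess kurtosis is -1.2. The sketch below reproduces that result, assuming `id` holds the integers 1 to 100 as the earlier outputs suggest.

```python
import pandas as pd

# Assumed stand-in for the id column: integers 1..100
ids = pd.Series(range(1, 101), name="id")

skewness = ids.skew()   # symmetric data, so this is 0 up to rounding
kurtosis = ids.kurt()   # excess kurtosis; about -1.2 for evenly spaced values

print(skewness, kurtosis)
```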

### pct_change()
Compute the percentage change between consecutive rows for the income column.

In [32]:
pct_change_income = salaries['income'].pct_change()
pct_change_income

0          NaN
1     0.196039
2    -0.203654
3     0.683997
4    -0.071599
        ...   
95    0.208323
96   -0.124131
97    0.173223
98   -0.194634
99   -0.166663
Name: income, Length: 100, dtype: float64
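
Each value is the fractional change relative to the previous row, `current / previous - 1`, so 0.196039 means an increase of about 19.6%. The sketch below checks this against a manual computation with `shift()`, using the first five `income` values from this notebook's output.

```python
import pandas as pd

# First five income values shown in this notebook's output
income = pd.Series([50500.75, 60400.85, 48100.00, 81000.25, 75200.75], name="income")

pct = income.pct_change()

# Same computation by hand: divide each row by the previous row, subtract 1
manual = income / income.shift(1) - 1

print(pd.DataFrame({"pct_change": pct, "manual": manual}))
```

The first row is `NaN` because it has no previous row to compare against.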