# Quickly Calculate Descriptive Stats for a Dataset

This notebook provides an example of how to calculate a quick set of descriptive statistics for a dataset.<br>
<br>

**Reference Books:**<br>
Dive Into Data Science by Bradford Tuckfield <br>
Website: https://nostarch.com/dive-data-science <br>
<br>

Pandas 1.x Cookbook, 2nd Edition, by M. Harrison and T. Petrou<br>
Website: https://www.packtpub.com/product/pandas-1x-cookbook-second-edition/9781839213106<br>
<br>

**Dataset Provider:**<br>
Capital Bikeshare <br>
The data was compiled and augmented by Hadi Fanaee-T and Joao Gama and posted online by Mark Kaghazgarian.<br>
Website: https://ride.capitalbikeshare.com/system-data <br>
<br>
**Python IDE:**<br>
Google Colab<br>
https://colab.research.google.com<br>


## 1.0 Import typical data science libraries into Python
<br>
NumPy brings the computational power of languages like C and Fortran to Python for scientific computing, while keeping Python's ease of learning and use.<br>
Website: https://numpy.org  <br>
<br>

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.<br>
Website: https://pandas.pydata.org <br>
<br>

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.<br>
Website: https://matplotlib.org <br>
<br>
Run the code block below.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 2.0 Import data from Desktop to Google Colab
<br>
You may need to repeat steps b through d each time you open the notebook, because uploaded files do not persist between Colab sessions.<br>
<br>
a. Go to the author's website and download the hour.csv data file to your desktop<br>
https://bradfordtuckfield.com/hour.csv<br>
b. Click on the file folder icon in Google Colab (center left of screen)<br>
c. Drag the hour.csv file from your desktop into Colab and drop it next to the `sample_data` folder.<br>
d. Run the code block below to see the first five lines of the dataset<br>


In [4]:
df = pd.read_csv('hour.csv')
df.head()

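| | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |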


Another way to examine the first five lines of the dataset

In [7]:
print(df.head())

   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0   
1        2  2011-01-01       1   0     1   1        0        6           0   
2        3  2011-01-01       1   0     1   2        0        6           0   
3        4  2011-01-01       1   0     1   3        0        6           0   
4        5  2011-01-01       1   0     1   4        0        6           0   

   weathersit  temp   atemp   hum  windspeed  casual  registered  count  
0           1  0.24  0.2879  0.81        0.0       3          13     16  
1           1  0.22  0.2727  0.80        0.0       8          32     40  
2           1  0.22  0.2727  0.80        0.0       5          27     32  
3           1  0.24  0.2879  0.75        0.0       3          10     13  
4           1  0.24  0.2879  0.75        0.0       0           1      1  
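
Before calculating statistics, it can help to check the structure of the data as well as its first rows. The following is a minimal sketch, not part of the original notebook: it rebuilds the five rows shown above from an in-memory string (the names `sample_csv` and `sample` are illustrative) so that it runs without the hour.csv file. With the full file loaded, the same calls would apply to `df`.

```python
import io
import pandas as pd

# First five rows of hour.csv, copied from the output above.
# This small stand-in is an assumption made so the sketch runs on its own.
sample_csv = io.StringIO(
    "instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,"
    "weathersit,temp,atemp,hum,windspeed,casual,registered,count\n"
    "1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16\n"
    "2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40\n"
    "3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32\n"
    "4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13\n"
    "5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)         # (5, 17): rows, columns
print(sample.dtypes)        # data type of each column
print(sample.isna().sum())  # number of missing values per column

# The column is named "count", which is also the name of a DataFrame method,
# so bracket access (sample["count"]) is needed to reach the column.
print(callable(sample.count))       # True: this is the method, not the column
print(sample["count"].tolist())     # [16, 40, 32, 13, 1]
```

The last two lines illustrate why the notebook writes `df['count']` rather than `df.count` throughout.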


## 3.0 Manually Calculate Summary Statistics/Descriptive Statistics

Two different ways to look at the mean value of the count column in the dataset

In [8]:
print(df['count'].mean())

189.46308763450142


In [10]:
round(df['count'].mean(),2)

189.46
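
To make clear what `.mean()` computes, here is a small sketch that calculates the mean by hand as the sum divided by the number of values. It uses only the five `count` values from the first rows shown earlier, so its result differs from the full-dataset mean above; the names `counts` and `manual_mean` are illustrative.

```python
import pandas as pd

# count values from the first five rows of the dataset (illustrative subset only)
counts = pd.Series([16, 40, 32, 13, 1], name="count")

# Mean by hand: total of the values divided by how many there are (102 / 5)
manual_mean = counts.sum() / len(counts)
print(manual_mean)               # 20.4

# The built-in method gives the same value
print(round(counts.mean(), 2))   # 20.4
```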

Different ways of manually looking at the median, standard deviation, min, and max values for different columns (count, registered) in the dataset

In [11]:
print(df['count'].median())
print(df['count'].std()) 
print(df['registered'].min()) 
print(df['registered'].max())

142.0
181.38759909186473
0
886


In [18]:
df['count'].median()

142.0

In [19]:
df['count'].std()

181.38759909186473

In [20]:
df['registered'].min()

0

In [21]:
df['registered'].max()

886
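
One detail worth knowing about `.std()`: pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1), while NumPy's `np.std` defaults to the population version (`ddof=0`, dividing by n). The sketch below shows the difference on the five `count` values from the first rows of the dataset; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# count values from the first five rows of the dataset (illustrative subset only)
counts = pd.Series([16, 40, 32, 13, 1], name="count")

sample_std = counts.std()               # pandas default: ddof=1 (divide by n - 1)
population_std = counts.std(ddof=0)     # divide by n instead
numpy_std = np.std(counts.to_numpy())   # NumPy default: ddof=0

print(round(sample_std, 2))
print(round(population_std, 2))
print(round(numpy_std, 2))              # matches population_std, not sample_std
```

On a dataset as large as hour.csv (17,379 rows) the two versions are nearly identical, but on small samples the gap is noticeable.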

In [16]:
print(df['count'].median(), ", median value of the count column")
print(round(df['count'].std(),2), ", standard deviation of the count column") 
print(df['registered'].min(), ", minimum value of the registered column") 
print(df['registered'].max(), ", maximum value of the registered column")

142.0 , median value of the count column
181.39 , standard deviation of the count column
0 , minimum value of the registered column
886 , maximum value of the registered column
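
As an alternative to calling each method separately, several statistics can be requested in one step with `DataFrame.agg`, and f-strings give tidier labels than comma-separated `print` arguments. This is a sketch under the assumption that a five-row stand-in (`sample`, built from the first rows of the dataset) is acceptable for illustration; with the full file, `df` would take its place.

```python
import pandas as pd

# Values from the first five rows of the dataset (illustrative subset only)
sample = pd.DataFrame({
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# Ask for different statistics per column; cells that were not requested are NaN
summary = sample.agg({"count": ["median", "std"], "registered": ["min", "max"]})
print(summary)

# f-strings place the value and its label in one formatted string
print(f"{sample['count'].median()} is the median of the count column")
print(f"{sample['count'].std():.2f} is the standard deviation of the count column")
print(f"{sample['registered'].min()} is the minimum of the registered column")
print(f"{sample['registered'].max()} is the maximum of the registered column")
```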


## 4.0 Quickly Calculate Summary Statistics/Descriptive Statistics

In [22]:
print(df.describe())

          instant        season            yr          mnth            hr  \
count  17379.0000  17379.000000  17379.000000  17379.000000  17379.000000   
mean    8690.0000      2.501640      0.502561      6.537775     11.546752   
std     5017.0295      1.106918      0.500008      3.438776      6.914405   
min        1.0000      1.000000      0.000000      1.000000      0.000000   
25%     4345.5000      2.000000      0.000000      4.000000      6.000000   
50%     8690.0000      3.000000      1.000000      7.000000     12.000000   
75%    13034.5000      3.000000      1.000000     10.000000     18.000000   
max    17379.0000      4.000000      1.000000     12.000000     23.000000   

            holiday       weekday    workingday    weathersit          temp  \
count  17379.000000  17379.000000  17379.000000  17379.000000  17379.000000   
mean       0.028770      3.003683      0.682721      1.425283      0.496987   
std        0.167165      2.005771      0.465431      0.639357      0.
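
The trailing backslashes in the printed output mark where pandas wrapped a wide table into several blocks. As a minimal sketch, the display options `display.width` and `display.max_columns` can be raised temporarily with `pd.option_context` so that the table prints on one set of lines. The frame `wide` below is a hypothetical stand-in with made-up values, used only so the example runs without hour.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-column frame standing in for a wide dataset
wide = pd.DataFrame(
    np.arange(60).reshape(5, 12),
    columns=[f"col{i}" for i in range(12)],
)

# Inside the context, pandas may use up to 1000 characters per line and show
# every column; the previous settings are restored when the block ends.
with pd.option_context("display.width", 1000, "display.max_columns", None):
    unwrapped = repr(wide.describe())

print(unwrapped)
```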

In [23]:
df.describe()

| | instant | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 |
| mean | 8690.0 | 2.50164 | 0.502561 | 6.537775 | 11.546752 | 0.02877 | 3.003683 | 0.682721 | 1.425283 | 0.496987 | 0.475775 | 0.627229 | 0.190098 | 35.676218 | 153.786869 | 189.463088 |
| std | 5017.0295 | 1.106918 | 0.500008 | 3.438776 | 6.914405 | 0.167165 | 2.005771 | 0.465431 | 0.639357 | 0.192556 | 0.17185 | 0.19293 | 0.12234 | 49.30503 | 151.357286 | 181.387599 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 4345.5 | 2.0 | 0.0 | 4.0 | 6.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.34 | 0.3333 | 0.48 | 0.1045 | 4.0 | 34.0 | 40.0 |
| 50% | 8690.0 | 3.0 | 1.0 | 7.0 | 12.0 | 0.0 | 3.0 | 1.0 | 1.0 | 0.5 | 0.4848 | 0.63 | 0.194 | 17.0 | 115.0 | 142.0 |
| 75% | 13034.5 | 3.0 | 1.0 | 10.0 | 18.0 | 0.0 | 5.0 | 1.0 | 2.0 | 0.66 | 0.6212 | 0.78 | 0.2537 | 48.0 | 220.0 | 281.0 |
| max | 17379.0 | 4.0 | 1.0 | 12.0 | 23.0 | 1.0 | 6.0 | 1.0 | 4.0 | 1.0 | 1.0 | 1.0 | 0.8507 | 367.0 | 886.0 | 977.0 |
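
When only a few columns matter, `describe()` can be applied to a subset and transposed so that each column becomes a row, which is often easier to read. The sketch below is an illustration rather than part of the original notebook: it uses a five-row stand-in (`sample`) built from the first rows of the dataset, and it also derives the interquartile range from the 25% and 75% values.

```python
import pandas as pd

# Values from the first five rows of the dataset (illustrative subset only)
sample = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# Summarize selected columns and transpose: one row per column of the data.
# Note that the result has a statistic named "count" (number of observations)
# as well as a row named "count" (the bike rental column).
stats = sample[["casual", "registered", "count"]].describe().T
print(stats.round(2))

# Interquartile range: the spread of the middle half of the values
iqr = stats["75%"] - stats["25%"]
print(iqr)
```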
