# Introduction to Statistics

__Purpose:__ The purpose of this lecture is to offer a brief overview of the field of Statistics, which is central to performing Data Science. We will cover the most important introductory topics of Statistics, starting with the types of statistics.

__At the end of this lecture you will be able to:__
> 1. Understand and define Statistics as well as differentiate between Descriptive and Inferential Statistics 

## 1.1 Introduction to Statistics 

### 1.1.1 What is Statistics? 

__Overview:__ 

- __[Statistics](https://en.wikipedia.org/wiki/Statistics):__ Statistics is a branch of Mathematics that deals with collecting, organizing, analyzing, and interpreting data.
- Statistics is used in many fields, including Social Science, Finance, Healthcare, and Sports.
- The goal of Statistics is often to study a population and describe its characteristics. However, it is usually difficult or impractical to collect data on an entire population, so instead we work with a subset of the population known as a sample.
- Examples of areas in day-to-day life where Statistics plays a large role:
> 1. Census Data 
> 2. Sports Box Scores
> 3. Weather Forecasts 
> 4. Political Campaigning 
> 5. Stock Market 
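To make the population/sample distinction concrete, here is a minimal sketch that simulates a hypothetical population of 100,000 heights (randomly generated for illustration, not real data), draws a simple random sample of 500 from it, and compares the two means. The sample mean is typically close to, but not exactly equal to, the population mean.

```python
import numpy as np

# Hypothetical population: 100,000 simulated heights in cm (illustrative only)
rng = np.random.default_rng(seed=42)
population = rng.normal(loc=170, scale=10, size=100_000)

# In practice we rarely observe the whole population, so we draw a
# simple random sample (without replacement) and work with that instead
sample = rng.choice(population, size=500, replace=False)

population_mean = population.mean()
sample_mean = sample.mean()

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
```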

### 1.1.2 How is Statistics used in Data Science? 

__Overview:__ 
- Statistics is one of the main disciplines that make up the field of Data Science
- Statistics provides tools for Data Scientists such as: 
> 1. Determining how much of your result can be attributed to "Signal" and how much can be attributed to "Noise"
> 2. Assessing the quality of a data set and its collection methods before any analysis is performed on the data
> 3. Summarizing a data set in terms of descriptive statistics as well as plots and other metrics 
> 4. Testing a hypothesis one may have about the data (e.g. one variable influences another, two groups do not differ, etc.)
> 5. Characterizing data as following one of the common distributions and then using that distribution for prediction
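As a small illustration of item 5, the sketch below generates hypothetical data (simulated, not real measurements), fits a normal distribution to it with `scipy.stats.norm.fit`, and then uses the fitted distribution to estimate a probability. Assuming the data is roughly normal is the key modeling choice here; with real data that assumption should be checked before relying on the prediction.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 1,000 simulated measurements (illustrative only)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=170, scale=10, size=1_000)

# Characterize the data: maximum-likelihood estimates of the
# mean (loc) and standard deviation (scale) of a normal distribution
mu, sigma = stats.norm.fit(data)

# Use the fitted distribution for prediction: estimated probability
# that a new observation exceeds 190 (the survival function is 1 - CDF)
p_above_190 = stats.norm.sf(190, loc=mu, scale=sigma)

print(f"Fitted mean = {mu:.2f}, fitted std = {sigma:.2f}")
print(f"Estimated P(X > 190) = {p_above_190:.3f}")
```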

### 1.1.3 Statistics in Python: 

__Overview:__ 
- Python has a wide range of useful functions for performing statistical routines. These functions are found mainly in the following two packages:
> 1. __[`scipy.stats`](https://docs.scipy.org/doc/scipy-0.18.1/reference/stats.html):__ The `stats` module in the SciPy Package offers many Statistical functions, such as summary statistics (`describe`), z-scores (`zscore`), and correlation (`pearsonr`), as well as a large collection of Continuous, Multivariate, and Discrete probability Distributions
> 2. __[`numpy`](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.statistics.html):__ The `numpy` package itself offers many Statistical Functions such as Order Statistics, Averages and Variances, Correlating, and Histograms

__Helpful Points:__
1. It is not necessary to access a sub-package within the `numpy` package for these functions, as we did with `numpy.linalg`. Instead, we can call the function directly from NumPy: `np.func_name`
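The sketch below calls one or two NumPy functions from each category listed above (Order Statistics, Averages and Variances, Correlating, Histograms) directly as `np.func_name`. The small array is made up purely for illustration.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # small illustrative data set

# Order statistics
value_range = np.ptp(data)            # max - min -> 7
p75 = np.percentile(data, 75)         # 75th percentile (linear interpolation) -> 5.5

# Averages and variances
mean = np.mean(data)                  # -> 5.0
median = np.median(data)              # -> 4.5
variance = np.var(data)               # population variance (ddof=0) -> 4.0
std_dev = np.std(data)                # population standard deviation -> 2.0

# Correlating: y is an exact linear function of data, so r = 1
y = 2 * data + 1
r = np.corrcoef(data, y)[0, 1]        # -> 1.0 (up to floating-point rounding)

# Histograms: counts per bin and the bin edges
counts, edges = np.histogram(data, bins=4)   # counts -> [1, 5, 1, 1]
```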

__Practice:__ Import the Statistics Modules in Python 

### Example 1 (Importing Statistics Packages):

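```python
import numpy as np               # arrays and basic statistical functions
import pandas as pd              # tabular data (Series and DataFrames)
from scipy import stats          # statistical functions and probability distributions
import seaborn as sns            # statistical plotting built on matplotlib
import math                      # standard-library math functions
import random                    # standard-library random number generation
import matplotlib.pyplot as plt  # general-purpose plotting
# Jupyter magic: display plots inside the notebook
%matplotlib inline
```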

```python
# Jupyter magic: list the variables and imported modules in the current namespace
%whos
```

### Example 2 (Testing Statistics Packages):

```python
stats.describe([1, 2, 3, 4, 5])  # using SciPy: count, min/max, mean, variance, skewness, kurtosis
```

```python
np.mean([1, 2, 3, 4, 5])  # using NumPy
```
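One detail worth knowing when comparing the two packages: `stats.describe` returns a named result whose `variance` field is the *sample* variance (divisor n - 1, i.e. `ddof=1`), whereas `np.var` defaults to the *population* variance (divisor n, `ddof=0`). The sketch below reads the individual fields of the result and shows how to reconcile the two.

```python
import numpy as np
from scipy import stats

values = [1, 2, 3, 4, 5]

result = stats.describe(values)
print(result.nobs)      # number of observations -> 5
print(result.minmax)    # (minimum, maximum) -> 1 and 5
print(result.mean)      # -> 3.0
print(result.variance)  # sample variance, ddof=1 -> 2.5

# NumPy's default divides by n instead of n - 1
print(np.var(values))          # -> 2.0
print(np.var(values, ddof=1))  # matches stats.describe -> 2.5
```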

### 1.1.4 Types of Statistics:

__Overview:__ 
- There are two main types of statistics:
> 1. __[Descriptive Statistics](https://en.wikipedia.org/wiki/Descriptive_statistics):__ The purpose of Descriptive Statistics is to provide a summary of the data and its properties 
>> - Descriptive Statistics can take the form of simple metrics known as summary statistics (e.g. number of observations, minimum value, maximum value, mean, variance, etc.) or the form of visualizations (e.g. histograms, scatterplots, pie charts, box plots, etc.)
>> - Examples of Descriptive Statistics:<br>
>> >__a.__ 75th percentile of the height of men in the United States<br>
>> >__b.__ Mean Field Goal Percentage of a professional basketball player in the NBA<br>
>> >__c.__ Median salary of Data Scientists across every major metropolitan area in the United States <br>
> 2. __[Inferential Statistics](https://en.wikipedia.org/wiki/Statistical_inference):__ The purpose of Inferential Statistics is to make inferences about a population using a subset of the population known as a sample 
>> - Inferential Statistics typically begin with a hypothesis about the population; the sample is then used to find evidence for or against the hypothesis, effectively inferring something about the population
>> - Examples of Inferential Statistics: <br>
>> >__a.__ A survey is sent out to 1,000 residents of Chicago asking about their political views. The survey designers are interested in knowing whether residents of different areas of Chicago hold more or less conservative views. <br>
>> >__b.__ A clinician develops a new drug to help patients relieve anxiety. The clinician collects a sample of individuals, gives half of them the new drug and the other half a placebo, and compares the two groups to infer whether the drug would help the broader population of patients.
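Descriptive example (a) above can be sketched with pandas. The heights below are a handful of hypothetical values chosen for illustration; `Series.describe` and `Series.quantile` only summarize these eight observations and say nothing, on their own, about all men in the United States.

```python
import pandas as pd

# Hypothetical heights (cm) for a handful of people; illustrative values only
heights = pd.Series([165, 170, 172, 175, 178, 180, 183, 190], name="height_cm")

# Summary statistics: count, mean, std, min, quartiles, max
summary = heights.describe()
print(summary)

# A single descriptive metric: the 75th percentile (linear interpolation)
p75 = heights.quantile(0.75)
print(f"75th percentile: {p75} cm")
```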

__Helpful Points:__ 
1. It is important to understand that the purpose of Descriptive Statistics is simply to summarize the data that was collected, and nothing more. It does NOT involve any generalization beyond the data at hand
2. In an experiment, it is common to use both Descriptive AND Inferential Statistics
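The clinician example above can be sketched as a combination of both types. The group means are descriptive; the two-sample t-test (`scipy.stats.ttest_ind`, here with `equal_var=False` for Welch's version) is inferential, because its p-value is used to judge whether the difference seen in the sample is likely to reflect a real difference in the wider population. The helper name `compare_groups` and the anxiety scores are hypothetical, and the t-test is just one common choice; the appropriate test depends on the study design.

```python
import numpy as np
from scipy import stats

def compare_groups(treatment, placebo):
    """Hypothetical helper: describe two groups, then test whether their means differ."""
    treatment = np.asarray(treatment, dtype=float)
    placebo = np.asarray(placebo, dtype=float)

    # Descriptive: summarize what was observed in each sample
    summary = {
        "treatment_mean": treatment.mean(),
        "placebo_mean": placebo.mean(),
    }

    # Inferential: Welch's two-sample t-test (does not assume equal variances)
    result = stats.ttest_ind(treatment, placebo, equal_var=False)
    summary["t_statistic"] = result.statistic
    summary["p_value"] = result.pvalue
    return summary

# Hypothetical anxiety scores (lower = less anxious); illustrative values only
drug_scores = [12, 14, 15, 16, 18]
placebo_scores = [22, 24, 25, 26, 28]

print(compare_groups(drug_scores, placebo_scores))
```

A small p-value is evidence against the hypothesis that the two population means are equal; it does not prove that the drug works.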