## Intro to BI

Business Intelligence is an application of data analysis that aids companies in making data-driven decisions. Using Business Intelligence technologies, we can provide insights into the past and present performance of a business, as well as make predictions or recommendations for the future. A key difference between Business Intelligence and data analysis is that BI insights are meant to be actionable and geared towards helping a business make decisions.

- Business Intelligence tools are designed to make sense of the huge quantities of data that organizations accumulate over time. BI tools analyze this data and present it in a form that can guide decision making.

- Business Intelligence software is a large, heterogeneous category, and not all tools in it can be meaningfully compared to each other. Of the several types of BI tools, the most substantial are full-stack Business Intelligence tools and data visualization tools.



### Job Market Perspective

Data Analyst (DA) and Business Intelligence (BI) positions usually differ:
* DA: more technical skills (might require Python or R, and deeper knowledge of statistics).
* BI: more business-related skills (usually uses drag-and-drop tools and Excel, and will hardly ever require ML).

### KPIs

Key Performance Indicators are metrics used by companies to give information on how well the business is doing. It is a crucial part of an analyst's job to use the most appropriate metrics to measure if the company is reaching certain objectives.
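As a rough illustration of what computing a KPI can look like, here is a minimal sketch using pandas. The table, its column names (`visits`, `orders`, `revenue`) and all of its values are made up for the example, and the two metrics shown (conversion rate and average order value) are just common choices, not ones prescribed by this lesson.

```python
import pandas as pd

# Hypothetical daily figures for an online store (illustrative values only)
daily = pd.DataFrame({
    "visits": [1200, 1500, 1100, 1700],
    "orders": [36, 60, 22, 68],
    "revenue": [1800.0, 3300.0, 990.0, 3740.0],
})

# Conversion rate: share of visits that ended in an order
conversion_rate = daily["orders"].sum() / daily["visits"].sum()

# Average order value: revenue earned per order
average_order_value = daily["revenue"].sum() / daily["orders"].sum()

print(f"Conversion rate: {conversion_rate:.2%}")
print(f"Average order value: {average_order_value:.2f}")
```

Which metric is "most appropriate" depends on the objective: a store trying to attract more buyers would likely watch the conversion rate, while one trying to sell more per purchase would watch the average order value.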

### Tableau

Tableau is one of the most popular and intuitive applications for business intelligence and visual data exploration today. The application makes it easy to connect to an existing data source, create interactive visualizations, and perform basic analysis to discover insights. In this lesson, we will learn about the different features of Tableau and how it can be a valuable addition to your analytics toolkit. We will also cover how to load data into Tableau, from both files and hosted data sources.

Tableau has several features that make it a great tool for data analysis. In this section, we will highlight some of the most important ones:

- `Drag & Drop Interface`: One of the things that makes Tableau so popular is its ease of use. It allows you to intuitively drag and drop fields where you want them and move things around to get exactly the visualization you want.

- `Variety of Visualizations`: Tableau has several types of visualizations you can use, from bar charts, line charts, and scatter plots to area charts, bubble charts, and treemaps. Speaking of maps, it also allows you to visualize your data geographically.

- `Dashboards and Stories`: You can combine several visualizations into dashboards and stories that focus on communicating interesting insights with the stakeholders that need the information.

- `Data Exploration`: Perhaps one of Tableau's most useful features is how easy it is to explore data with it. You can start with an aggregated view of the data, filter to zoom in on something that looks interesting, add dimensions to get more well-rounded perspectives, and look at the underlying records to get to the bottom of what you are trying to investigate.



## AB Testing

An A/B test (aka split test) is an experiment to determine which of two or more variations of an online experience performs better.

The different versions are presented to the users at random and the result is analyzed after a period of time, to verify which version performed better.
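One simple way to present versions at random is to shuffle the list of users and split it in half. The sketch below assumes 10,000 hypothetical customer numbers and a fixed seed (used only to make the example reproducible); real experiments often assign users as they arrive instead.

```python
import numpy as np

# Fixed seed only so that the sketch gives the same split every time
rng = np.random.default_rng(seed=42)

# Hypothetical customer numbers 1..10000
user_ids = np.arange(1, 10_001)

# Shuffle the users, then cut the shuffled array in half
# so both groups have the same size and no user is in both
shuffled = rng.permutation(user_ids)
group_a, group_b = shuffled[:5000], shuffled[5000:]

print(len(group_a), len(group_b))
```

Because the assignment is random, any systematic difference between the groups at the end of the test can be attributed to the variation rather than to who ended up in which group.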

AB testing can be performed with p-value...

### Mini Case Study

There are two equally divided groups of customers from an online store, A and B.

Group B users received an email with "Offer ends this Saturday. Use code ###!". Group A users did not receive the email.

Was sending the email a good idea?

In [1]:
import pandas as pd

data = pd.read_csv('AB.csv')
data

Unnamed: 0,CustomerNo,Website View,Test Group
0,1,88,A
1,2,56,B
2,3,80,B
3,4,49,A
4,5,61,A
...,...,...,...
9995,9996,50,B
9996,9997,55,A
9997,9998,55,A
9998,9999,53,B


In [2]:
A = data[data['Test Group'] == 'A']['Website View']
B = data[data['Test Group'] == 'B']['Website View']
A

0       88
3       49
4       61
5       21
6       48
        ..
9991    69
9993    68
9994    33
9996    55
9997    55
Name: Website View, Length: 5000, dtype: int64

In [6]:
len(A) == len(B)

True

In [7]:
sum(B) - sum(A)

10359

In [8]:
import numpy as np
np.mean(A), np.mean(B)

(50.045, 52.1168)
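The raw sums and means above can be turned into an absolute and a relative difference per user. This sketch uses only the figures printed by the cells above (group size, the two means, and the difference of sums); the variable names are chosen for the example.

```python
# Figures copied from the notebook outputs above
n = 5000                      # users per group
mean_a, mean_b = 50.045, 52.1168
total_diff = 10359            # sum(B) - sum(A)

# Absolute difference: extra website views per user in group B
abs_diff = mean_b - mean_a

# Relative lift: the same difference expressed as a share of group A's mean
rel_lift = abs_diff / mean_a

# Sanity check: the difference of sums divided by n is the difference of means
assert abs(total_diff / n - abs_diff) < 1e-9

print(f"Absolute difference: {abs_diff:.4f}")
print(f"Relative lift: {rel_lift:.2%}")
```

These numbers describe how large the difference is, but not yet whether it could be explained by chance alone.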

In [None]:
A1 -> 2500 users -> avg1
A2 -> 2500 users -> avg2

A -> no email
B -> 2 days before
C -> 1 day before

AB -> p sig -> tstat = 10
AC -> p sig -> tstat = 20
BC -> not sig

A/B testing is an application of statistical hypothesis testing, where we have an alternative hypothesis (h1) and a null hypothesis (h0).

- h0 = results for A and B are not different. mean(A) = mean(B)
- h1 = results for A and B are different

Once we have set our hypotheses, we choose a confidence level. Let's say we choose 95%, which corresponds to a significance level of 0.05: if the null hypothesis is actually true and we repeat the test over and over again, we will wrongly reject it in about 5% of cases.

For a confidence level of 95%, our threshold for comparison with the p-value is 0.05.

If the p-value is lower than our 0.05 threshold, then we can reject the null hypothesis.

A p-value is the probability of observing a difference at least as large as the one we measured purely by random chance, assuming the null hypothesis is true.
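To make the idea of "by random chance" concrete, here is a minimal sketch of a permutation test. It is not the method used in this notebook (which relies on a t-test), and the two groups are synthetic stand-ins generated with NumPy, not the `AB.csv` data. The idea: if the group labels do not matter, shuffling them should produce differences as large as the observed one fairly often.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two groups (assumed values, not AB.csv)
a = rng.normal(loc=50, scale=20, size=1000)
b = rng.normal(loc=56, scale=20, size=1000)

# Observed absolute difference between the group means
observed = abs(b.mean() - a.mean())

# Under the null hypothesis the labels are interchangeable,
# so we shuffle the pooled values and re-split them many times
pooled = np.concatenate([a, b])
n_perm = 2000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[1000:].mean() - shuffled[:1000].mean())
    if diff >= observed:
        count += 1

# Share of shuffles at least as extreme as the real split
# (the +1 terms avoid reporting a p-value of exactly zero)
p_value = (count + 1) / (n_perm + 1)
print("permutation p-value:", p_value)
```

A small value here means that random relabelling almost never reproduces a gap as large as the one observed, which is the same reading we give to the p-value of a t-test.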

In [3]:
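# Two-sample t-test for independent groups (a z-test is an alternative for large samples)
from scipy.stats import ttest_ind

# ttest_ind returns (statistic, pvalue); index 1 is the p-value
p_value = ttest_ind(A, B)[1]

# Compare the p-value against the 0.05 threshold
if p_value < 0.05:
    print("Rejected the null hypothesis! p_value =", p_value)
else:
    print("Failed to reject null hypothesis.")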

Rejected the null hypothesis! p_value = 3.936341519171799e-08


In [4]:
ttest_ind(A, B)

Ttest_indResult(statistic=-5.497970336287157, pvalue=3.936341519171799e-08)

The null hypothesis is rejected, which means there is a statistically significant difference between the means of groups A and B.

We still need to check whether that difference has the direction and size we expected.
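One way to look at direction and size is a confidence interval for the difference in means. The helper below, `diff_in_means_ci`, is a hypothetical function written for this sketch. It uses Welch's approximation (unequal variances), which differs slightly from the default pooled-variance test run by `ttest_ind` above, and it is demonstrated on two tiny made-up samples rather than on `AB.csv`.

```python
import numpy as np
from scipy import stats

def diff_in_means_ci(a, b, confidence=0.95):
    """Return (diff, low, high) for mean(b) - mean(a) using Welch's approximation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = b.mean() - a.mean()

    # Variance of each sample mean
    var_a = a.var(ddof=1) / len(a)
    var_b = b.var(ddof=1) / len(b)
    se = np.sqrt(var_a + var_b)

    # Welch-Satterthwaite degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )

    # Two-sided critical value for the requested confidence level
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    return diff, diff - t_crit * se, diff + t_crit * se

# Tiny made-up samples, only to show how the helper is called
diff, low, high = diff_in_means_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"difference = {diff:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With the notebook's data the call would be `diff_in_means_ci(A, B)`. A positive difference with an interval that stays above zero would suggest that group B, which received the email, viewed the website more; an interval that includes zero, as in the tiny example here, would not support that conclusion.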

Failing to reject the null hypothesis is not the same as accepting it. A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. It might exist, but your study missed it. 

**That's not all folks!** We'll talk more about p-values in future classes.

For a dense read connecting A/B testing with the statistical concepts we've learned so far, see [here](https://conversionsciences.com/ab-testing-statistics/).