# Probability

Probability is the study of randomness and uncertainty. The probability of an event quantifies how likely the event is to occur, expressed as a number between 0 and 1, where 0 indicates an impossible event and 1 a certain event.

### Key Concepts:

- Probability of an event (P): The chance that a specific event will occur.

Formula (when all outcomes are equally likely):
$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$

Example: The probability of rolling a 3 on a fair 6-sided die is
$P(3) = \frac{1}{6}$


- Complementary Probability: The probability that an event does not happen.

$P(\text{not } A) = 1 - P(A)$

- Conditional Probability: The probability of event A happening given that event B has already occurred.

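Formula (defined when $P(B) > 0$):
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Example: The probability that a card drawn from a standard 52-card deck is an ace, given that it is a heart: $P(\text{ace} \mid \text{heart}) = \frac{1/52}{13/52} = \frac{1}{13}$.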
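
The three rules above can be checked by direct enumeration. The following is a minimal sketch, assuming a fair die and a standard 52-card deck; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Classical probability: favorable outcomes / total outcomes (fair 6-sided die).
die = range(1, 7)
p_three = Fraction(sum(1 for face in die if face == 3), len(die))

# Complement rule: P(not A) = 1 - P(A).
p_not_three = 1 - p_three

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

# Conditional probability: P(ace | heart) = P(ace and heart) / P(heart).
p_heart = Fraction(sum(1 for _, suit in deck if suit == "hearts"), len(deck))
p_ace_and_heart = Fraction(
    sum(1 for rank, suit in deck if rank == "A" and suit == "hearts"), len(deck)
)
p_ace_given_heart = p_ace_and_heart / p_heart

print(p_three, p_not_three, p_ace_given_heart)  # 1/6 5/6 1/13
```

Using `Fraction` keeps the results exact, which makes them easy to compare with the hand-computed values.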

## Combinatorics
Combinatorics is a branch of mathematics that studies the counting, arrangement, and combination of objects. It is closely related to probability because it often helps determine the number of possible outcomes.

### Key Concepts:

- Permutations: The number of ways to arrange objects where the order matters.

Formula for arranging $n$ distinct objects:
$P(n) = n!$

Example: The number of ways to arrange 3 letters (A, B, C) is $3! = 6$.

- Combinations: The number of ways to choose objects where the order does not matter.

Formula:
$C(n, k) = \frac{n!}{k!\,(n-k)!}$

Example: The number of ways to choose 2 cards from a deck of 52:
$C(52, 2) = \frac{52 \cdot 51}{2} = 1326$
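
Both counting formulas can be verified against brute-force enumeration. This is a small sketch using only the Python standard library; the deck is represented by the integers 0 to 51 purely for illustration.

```python
from itertools import combinations, permutations
from math import comb, factorial

# Permutations: every ordering of 3 distinct letters.
letters = ["A", "B", "C"]
arrangements = list(permutations(letters))
num_arrangements = len(arrangements)  # should equal 3!

# Combinations: unordered choices of 2 cards out of 52.
n, k = 52, 2
by_formula = factorial(n) // (factorial(k) * factorial(n - k))
by_enumeration = sum(1 for _ in combinations(range(n), k))

print(num_arrangements, factorial(3))         # 6 6
print(by_formula, comb(n, k), by_enumeration)  # 1326 1326 1326
```

Enumeration is only practical for small cases; for larger values, `math.comb` and `math.factorial` compute the counts directly.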

## Bayesian Inference
Bayesian inference is a method of statistical inference that applies Bayes' theorem to update the probability estimate for a hypothesis as more evidence or information becomes available. It is central to the Bayesian approach to statistics, which contrasts with the frequentist approach.

### Key Concepts:

- Bayes' Theorem: Used to find the probability of a hypothesis based on prior knowledge and new evidence.

Formula:
$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$

- $P(H \mid E)$: Posterior probability (the probability of the hypothesis given the evidence)
- $P(H)$: Prior probability (the initial probability of the hypothesis)
- $P(E \mid H)$: Likelihood (the probability of the evidence given the hypothesis)
- $P(E)$: Marginal probability of the evidence (the overall probability of observing the evidence)

Example: Updating the probability of having a disease after receiving a positive test result.
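
The disease-test example can be worked through numerically. The sketch below uses made-up numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) that are assumptions for illustration, not values from any dataset; the function name `posterior_given_positive` is likewise hypothetical.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem."""
    # P(E): total probability of a positive test, over sick and healthy people.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # P(H | E) = P(E | H) * P(H) / P(E)
    return sensitivity * prior / p_positive


# Assumed, illustrative inputs (not taken from real data).
prior = 0.01                # P(H): prevalence of the disease
sensitivity = 0.95          # P(E | H): positive test given disease
false_positive_rate = 0.05  # P(E | not H): positive test given no disease

posterior = posterior_given_positive(prior, sensitivity, false_positive_rate)
print(f"P(disease | positive) = {posterior:.4f}")  # 0.1610
```

Under these assumed numbers the posterior stays well below the test's sensitivity, because the low prior means most positive results come from the much larger healthy group.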

## Distributions

In probability and statistics, a distribution describes how the values of a random variable are spread or distributed. Different types of distributions are used to model different kinds of random events.

### Key Types of Distributions:

- Normal Distribution: Also known as the bell curve, it describes data that clusters symmetrically around the mean. Many real-world phenomena (e.g., heights, IQ scores) approximately follow a normal distribution.

Properties: Symmetrical, mean = median = mode.

- Binomial Distribution: Models the number of successes in a fixed number of independent trials, where each trial has two possible outcomes and the same success probability $p$ (e.g., flipping a coin).

Formula:
$P(X = k) = C(n, k) \cdot p^k \cdot (1-p)^{n-k}$

- Poisson Distribution: Models the number of events occurring within a fixed interval of time or space, assuming events occur independently at a constant average rate.

Example: The number of emails received in an hour.
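
The binomial formula above can be written out directly, alongside the standard Poisson probability mass function $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$. This is a minimal standard-library sketch; the function names and the example rate of 2 emails per hour are assumptions chosen for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with average rate lam."""
    return lam**k * exp(-lam) / factorial(k)


# Exactly 2 heads in 4 fair coin flips.
p_two_heads = binomial_pmf(2, 4, 0.5)
print(f"{p_two_heads:.4f}")  # 0.3750

# Exactly 3 emails in an hour, assuming an average of 2 emails per hour.
p_three_emails = poisson_pmf(3, 2.0)
print(f"{p_three_emails:.4f}")  # 0.1804
```

In practice, library routines such as `scipy.stats.binom.pmf` and `scipy.stats.poisson.pmf` compute the same quantities.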

## Probability in Other Fields
Probability plays a role in various disciplines and fields:

- Economics and Finance: Probability is used to model uncertain events like stock market returns, economic fluctuations, or interest rate movements. Tools like stochastic processes and Markov chains help predict such behaviors.
- Machine Learning: Probability is at the heart of algorithms like Naive Bayes, Hidden Markov Models, and Gaussian Mixture Models. Bayesian networks are used to represent probabilistic relationships between variables.
- Physics: In quantum mechanics, probability is used to describe the likelihood of particle states and events. For example, the probability distribution of a particle's position is fundamental to the theory.
- Genetics: In genetics, probability is used to predict inheritance patterns, mutations, and evolutionary changes. Tools like Punnett squares help visualize genetic outcomes.

In summary:

- Probability quantifies uncertainty.
- Combinatorics helps calculate possible outcomes.
- Bayesian Inference updates beliefs based on evidence.
- Distributions model the likelihood of various outcomes.
- Probability in other fields extends the concept to finance, physics, machine learning, and more.

In [6]:
import pandas as pd
import numpy as np
from scipy.stats import binom, poisson
from math import comb, factorial

In [10]:
# Load the dataset into a pandas DataFrame
df = pd.read_csv("marketing_company_data.csv")
df

| Unnamed: 0 | Campaign ID | Budget (USD) | Clicks | Impressions | Conversions | Region | Date |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 8270 | 1943 | 92747 | 386 | South | 2023-01-01 |
| 1 | 2 | 1860 | 588 | 63752 | 460 | South | 2023-01-02 |
| 2 | 3 | 6390 | 3076 | 57573 | 347 | East | 2023-01-03 |
| 3 | 4 | 6191 | 2059 | 60101 | 189 | East | 2023-01-04 |
| 4 | 5 | 6734 | 2485 | 27646 | 190 | East | 2023-01-05 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 196 | 7546 | 4648 | 14116 | 251 | North | 2023-07-15 |
| 196 | 197 | 2986 | 3059 | 26470 | 225 | North | 2023-07-16 |
| 197 | 198 | 9338 | 2365 | 43344 | 293 | South | 2023-07-17 |
| 198 | 199 | 3911 | 2579 | 43918 | 305 | North | 2023-07-18 |


In [16]:
# Convert date column to datetime for easier handling
df['Date'] = pd.to_datetime(df['Date'])

In [18]:
# 1. Basic Probability: Probability of a campaign having more than 400 conversions
total_campaigns = len(df)
campaigns_with_more_than_400_conversions = df[df['Conversions'] > 400].shape[0]
P_more_than_400_conversions = campaigns_with_more_than_400_conversions / total_campaigns

print(f"1. P(Conversions > 400): {P_more_than_400_conversions:.4f}")



1. P(Conversions > 400): 0.1950


In [20]:
# 2. Complementary Probability: Probability of a campaign having less than or equal to 400 conversions
P_less_equal_400_conversions = 1 - P_more_than_400_conversions

print(f"2. P(Conversions <= 400): {P_less_equal_400_conversions:.4f}")


2. P(Conversions <= 400): 0.8050


In [22]:
# 3. Conditional Probability: Probability of more than 300 conversions given the campaign is in the South region
south_campaigns = df[df['Region'] == 'South']
south_campaigns_with_more_than_300_conversions = south_campaigns[south_campaigns['Conversions'] > 300].shape[0]
P_more_than_300_given_south = south_campaigns_with_more_than_300_conversions / len(south_campaigns)

print(f"3. P(Conversions > 300 | South): {P_more_than_300_given_south:.4f}")


3. P(Conversions > 300 | South): 0.4259


In [24]:
# 4. Binomial Distribution: Probability of getting exactly 3 successful campaigns out of 5 (success is defined as more than 200 conversions)
successful_campaigns = df[df['Conversions'] > 200].shape[0]
P_success = successful_campaigns / total_campaigns  # Probability of success (Conversions > 200)

n = 5  # Number of trials
k = 3  # Number of successes
P_binom_3_successes = binom.pmf(k, n, P_success)

print(f"4. P(exactly 3 successful campaigns out of 5): {P_binom_3_successes:.4f}")


4. P(exactly 3 successful campaigns out of 5): 0.3441


In [26]:
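# 5. Poisson Distribution: Probability of observing exactly 5 campaigns with more than 1000 clicks on a single day
# Note: filtering before grouping drops every day that has no qualifying campaign,
# so the mean below is taken only over days with at least one such campaign.
campaigns_with_more_than_1000_clicks_per_day = df[df['Clicks'] > 1000].groupby(df['Date'].dt.date).size()
lambda_clicks = campaigns_with_more_than_1000_clicks_per_day.mean()  # Average number of campaigns with > 1000 clicks per day (over days with at least one)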
k_clicks = 5
P_poisson_5_clicks = poisson.pmf(k_clicks, lambda_clicks)

print(f"5. P(exactly 5 campaigns with more than 1000 clicks per day): {P_poisson_5_clicks:.4f}")


5. P(exactly 5 campaigns with more than 1000 clicks per day): 0.0031


In [28]:
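# 6. Bayesian Inference: Update probability of success given that a campaign from the South had 500 conversions
prior_P_success_south = 0.70  # Prior probability (given)
# Estimate the likelihood and the marginal probability from the South campaigns,
# so that both are conditioned on the same region as the prior.
# Success is defined as more than 200 conversions, as in step 4.
south_successes = south_campaigns[south_campaigns['Conversions'] > 200]
likelihood_500_given_success = (south_successes['Conversions'] == 500).mean()
P_500 = (south_campaigns['Conversions'] == 500).mean()  # Probability of observing exactly 500 conversions in the South

# Guard against division by zero: the posterior is undefined if the evidence never occurs in the data
if P_500 == 0:
    print("6. Updated P(Success | 500 conversions): undefined (no South campaign has exactly 500 conversions)")
else:
    posterior_P_success_given_500 = (likelihood_500_given_success * prior_P_success_south) / P_500
    print(f"6. Updated P(Success | 500 conversions): {posterior_P_success_given_500:.4f}")


6. Updated P(Success | 500 conversions): undefined (no South campaign has exactly 500 conversions)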

In [30]:
# 7. Combinatorics: Number of ways to choose 5 campaigns from the dataset
n_campaigns = total_campaigns
k_choice = 5
num_ways_to_choose_5_campaigns = comb(n_campaigns, k_choice)

print(f"7. Number of ways to choose 5 campaigns: {num_ways_to_choose_5_campaigns}")


7. Number of ways to choose 5 campaigns: 2535650040


In [32]:
# 8. Distribution Fitting: Estimate normal distribution parameters (mean and sample standard deviation) for the 'Conversions' column
mean_conversions = df['Conversions'].mean()
std_conversions = df['Conversions'].std()

print(f"8. Conversions Distribution - Mean: {mean_conversions:.4f}, Std Dev: {std_conversions:.4f}")

8. Conversions Distribution - Mean: 250.2700, Std Dev: 139.3054
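
One way to use the estimated parameters is to compute a tail probability under the normal model and compare it with the empirical proportion from step 1. The sketch below copies the mean, standard deviation, and empirical value from the printed outputs above; treating `Conversions` as normally distributed is an assumption, not something the data has been shown to satisfy.

```python
from statistics import NormalDist

# Parameters copied from the printed output of step 8.
mean_conversions = 250.27
std_conversions = 139.3054

# Normal model with the estimated parameters.
fitted = NormalDist(mu=mean_conversions, sigma=std_conversions)

# Tail probability under the model: P(Conversions > 400).
p_model = 1 - fitted.cdf(400)

# Empirical proportion copied from the printed output of step 1.
p_empirical = 0.1950

print(f"Normal model P(Conversions > 400): {p_model:.4f}")
print(f"Empirical    P(Conversions > 400): {p_empirical:.4f}")
```

A noticeable gap between the two numbers would suggest that the normal curve is only a rough description of this column, in which case a histogram or a formal goodness-of-fit check would be a sensible next step.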
