In [46]:
import pandas as pd
import scipy.stats as stats
from scipy.stats import pearsonr
from scipy.stats import spearmanr

In [6]:
data = pd.read_csv('marketing_company_data.csv')
data

Unnamed: 0,Campaign ID,Budget (USD),Clicks,Impressions,Conversions,Region,Date
0,1,8270,1943,92747,386,South,2023-01-01
1,2,1860,588,63752,460,South,2023-01-02
2,3,6390,3076,57573,347,East,2023-01-03
3,4,6191,2059,60101,189,East,2023-01-04
4,5,6734,2485,27646,190,East,2023-01-05
...,...,...,...,...,...,...,...
195,196,7546,4648,14116,251,North,2023-07-15
196,197,2986,3059,26470,225,North,2023-07-16
197,198,9338,2365,43344,293,South,2023-07-17
198,199,3911,2579,43918,305,North,2023-07-18
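
The `Unnamed: 0` column in the output above suggests the CSV was saved together with its row index. A minimal sketch, assuming that is the case, of loading such a file so the first column becomes the index and `Date` is parsed as a date. The in-memory `csv_text` below only reuses the first five rows shown above; the name `sample` is introduced here for illustration.

```python
import io
import pandas as pd

# First five rows copied from the output above; the notebook itself reads
# 'marketing_company_data.csv' instead of this in-memory sample.
csv_text = """,Campaign ID,Budget (USD),Clicks,Impressions,Conversions,Region,Date
0,1,8270,1943,92747,386,South,2023-01-01
1,2,1860,588,63752,460,South,2023-01-02
2,3,6390,3076,57573,347,East,2023-01-03
3,4,6191,2059,60101,189,East,2023-01-04
4,5,6734,2485,27646,190,East,2023-01-05
"""

# index_col=0 uses the unnamed first column as the row index;
# parse_dates converts the Date strings to datetime64 values.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["Date"])

print(sample.dtypes)
print(sample["Region"].value_counts())
```

Checking `value_counts()` on `Region` before filtering is a quick way to confirm that every region you plan to compare actually has rows.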


## 1. T-tests
A T-test checks whether the means of two groups differ by more than random variation would explain. In Python, scipy.stats.ttest_ind performs the independent-samples T-test; by default it assumes the two groups have equal variances.

##### Example:
Suppose you want to test if the average Clicks are significantly different between the East and West regions.

##### Filter data for the two regions

In [8]:
east_clicks = data[data['Region'] == 'East']['Clicks']
west_clicks = data[data['Region'] == 'West']['Clicks']

##### Perform an independent T-test

In [10]:
t_stat, p_value = stats.ttest_ind(east_clicks, west_clicks)
print(f"T-statistic: {t_stat}, P-value: {p_value}")

T-statistic: -0.8244695015088567, P-value: 0.41180530825950423


##### Interpret the results:

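If p_value < 0.05, you reject the null hypothesis that the average Clicks are equal in the two regions and conclude that they differ.
If p_value >= 0.05, you fail to reject the null hypothesis. Here the P-value is about 0.41, so the data show no significant difference in average Clicks between East and West.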
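
When the two groups may have different variances or sizes, Welch's T-test is a common alternative to the default pooled-variance test. The sketch below compares the two on synthetic data; `group_a` and `group_b` are made-up stand-ins for `east_clicks` and `west_clicks`, not values from the notebook's dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT the notebook's data): same mean,
# but different spread and different sample size.
group_a = rng.normal(loc=2500, scale=400, size=40)
group_b = rng.normal(loc=2500, scale=1200, size=60)

# Default call: Student's T-test, which pools the two variances.
t_student, p_student = stats.ttest_ind(group_a, group_b)

# equal_var=False: Welch's T-test, which does not assume equal variances.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: T-statistic: {t_student:.4f}, P-value: {p_student:.4f}")
print(f"Welch:   T-statistic: {t_welch:.4f}, P-value: {p_welch:.4f}")
```

In the notebook, the same option would be passed as `stats.ttest_ind(east_clicks, west_clicks, equal_var=False)`.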

## 2. Chi-square Test
A Chi-square test examines the relationship between two categorical variables. In Python, you can use scipy.stats.chi2_contingency.

##### Example:
Testing if there is a relationship between Region and whether Conversions are above or below the median.

##### Create a categorical variable for Conversions:

In [13]:
data['High_Conversions'] = data['Conversions'] > data['Conversions'].median()

##### Create a contingency table

In [15]:
contingency_table = pd.crosstab(data['Region'], data['High_Conversions'])

##### Perform the Chi-square test:

In [17]:
chi2_stat, p, dof, expected = stats.chi2_contingency(contingency_table)
print(f"Chi-square statistic: {chi2_stat}, P-value: {p}")

Chi-square statistic: 1.6065342798016067, P-value: 0.6579091248139446


##### Interpret the results:

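If p < 0.05, you reject the null hypothesis of independence: Region and High_Conversions are associated.
If p >= 0.05, you fail to reject independence. Here the P-value is about 0.66, so the data show no significant association between Region and High_Conversions.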
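
Besides the statistic and P-value, `chi2_contingency` also returns the degrees of freedom and the expected counts under independence, which the print statement above discards. A small sketch of inspecting them, using a hypothetical table whose counts are invented for illustration and are not the notebook's actual `contingency_table`:

```python
import pandas as pd
from scipy import stats

# Hypothetical campaign counts per region, split by High_Conversions
# (invented for illustration; not the notebook's table).
table = pd.DataFrame(
    {False: [30, 22, 25, 23], True: [20, 28, 25, 27]},
    index=["East", "North", "South", "West"],
)

chi2_stat, p, dof, expected = stats.chi2_contingency(table)

# Degrees of freedom are (rows - 1) * (columns - 1) = 3 * 1.
print(f"Chi-square statistic: {chi2_stat:.2f}, P-value: {p:.4f}, dof: {dof}")

# Expected counts if Region and High_Conversions were independent.
expected_df = pd.DataFrame(expected, index=table.index, columns=table.columns)
print(expected_df)

# A common rule of thumb asks for expected counts of at least 5 per cell.
print(f"Smallest expected count: {expected.min():.1f}")
```

Because every row and column total is equal in this made-up table, each expected count is 25; with real data the expected counts vary by cell and are worth checking before trusting the P-value.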

## 3. ANOVA (Analysis of Variance)
ANOVA is used to compare the means of more than two groups. In Python, use scipy.stats.f_oneway.

##### Example:
Testing if there is a difference in mean Impressions across different regions.

##### Filter the data for each region:

In [22]:
south_impressions = data[data['Region'] == 'South']['Impressions']
north_impressions = data[data['Region'] == 'North']['Impressions']
east_impressions = data[data['Region'] == 'East']['Impressions']
west_impressions = data[data['Region'] == 'West']['Impressions']

##### Perform the ANOVA test:

In [24]:
f_stat, p_value = stats.f_oneway(south_impressions, north_impressions, east_impressions, west_impressions)
print(f"F-statistic: {f_stat}, P-value: {p_value}")


F-statistic: 2.49256151042264, P-value: 0.06134359015485362


##### Interpret the results:

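If p_value < 0.05, at least one group mean differs significantly from the others; ANOVA alone does not say which one.
If p_value >= 0.05, you fail to reject the null hypothesis that all group means are equal. Here the P-value is about 0.061, just above the 0.05 threshold, so the differences in mean Impressions across regions are not significant at that level.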
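
Filtering each region by hand works for four regions but is easy to get wrong when the set of regions changes. One possible alternative, sketched here on a synthetic DataFrame (`demo` is made up and only mimics the `Region` and `Impressions` columns), builds the groups with `groupby` and unpacks them into `f_oneway`:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-in for the notebook's DataFrame (values are made up).
demo = pd.DataFrame({
    "Region": rng.choice(["East", "North", "South", "West"], size=200),
    "Impressions": rng.integers(10_000, 100_000, size=200),
})

# One array of Impressions per region, without hard-coding region names.
groups = [g["Impressions"].to_numpy() for _, g in demo.groupby("Region")]

# f_oneway takes each group as a separate positional argument.
f_stat, p_value = stats.f_oneway(*groups)

# Group means and sizes give context for the F-statistic.
print(demo.groupby("Region")["Impressions"].agg(["mean", "count"]))
print(f"F-statistic: {f_stat:.4f}, P-value: {p_value:.4f}")
```

The same pattern applies to the notebook's data by replacing `demo` with `data`.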


## 1. Pearson Correlation
Pearson correlation measures the linear relationship between two variables. Use scipy.stats.pearsonr to calculate the correlation coefficient.

##### Example:
Find the correlation between Budget and Conversions.

In [38]:

correlation, p_value = pearsonr(data['Budget (USD)'], data['Conversions'])
print(f"Pearson Correlation: {correlation}, P-value: {p_value}")

Pearson Correlation: 0.06073416629740151, P-value: 0.3929298642553684


##### Interpret the results:


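If p_value < 0.05, the correlation is statistically significant.
The correlation coefficient lies between -1 (perfect negative) and 1 (perfect positive), with 0 meaning no linear relationship. Here the coefficient is about 0.06 with a P-value of about 0.39, so there is no significant linear relationship between Budget and Conversions.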
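
To make the coefficient less of a black box, the sketch below computes Pearson's r directly from its definition (the covariance of the two variables divided by the product of their standard deviations) and compares it with `pearsonr`. The arrays `x` and `y` are synthetic and unrelated to the marketing dataset.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

# Synthetic data with a moderate linear relationship plus noise.
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

# Center both variables, then divide the sum of cross-products
# by the square root of the product of the sums of squares.
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

r_scipy, p_value = pearsonr(x, y)

print(f"Manual r: {r_manual:.6f}")
print(f"Pearson Correlation: {r_scipy:.6f}, P-value: {p_value:.4g}")
```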

## 2. Spearman Correlation
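Spearman correlation measures how well the relationship between two variables can be described by a monotonic function. It is computed on the ranks of the values rather than the values themselves, so it does not require the relationship to be linear. Use scipy.stats.spearmanr.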

##### Example:
Find the Spearman correlation between Clicks and Impressions.

In [44]:

correlation, p_value = spearmanr(data['Clicks'], data['Impressions'])
print(f"Spearman Correlation: {correlation}, P-value: {p_value}")

Spearman Correlation: -0.0455502754079164, P-value: 0.521865148062953
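
The difference between the two coefficients is easiest to see on a relationship that is monotonic but not linear. In this sketch on synthetic data (`x` and `y` are made up), Spearman reports a perfect rank relationship while Pearson does not, and Spearman is shown to equal Pearson applied to the ranks:

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# y increases whenever x increases, but far from linearly.
x = np.linspace(0, 5, 50)
y = np.exp(x)

rho, _ = spearmanr(x, y)
r, _ = pearsonr(x, y)

# Spearman's coefficient is Pearson's coefficient computed on ranks.
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(f"Spearman Correlation: {rho:.4f}")
print(f"Pearson Correlation: {r:.4f}")
print(f"Pearson on ranks: {r_on_ranks:.4f}")
```

A large gap between the two coefficients on real data is a hint that the relationship is monotonic but curved, or that outliers are influencing Pearson's r.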


## Tasks

- T-tests: Comparing Conversions in different regions.
- Chi-square tests: Analyzing the relationship between Region and high/low Impressions.
- ANOVA: Testing for differences in Budget (USD) across regions.
- Correlation analysis: Exploring relationships between Impressions, Clicks, and Conversions.