# Project Business Statistics: E-news Express

By: Mohammad Sharaf

Date: 12 Nov 2022

## Define Problem Statement and Objectives

The advent of e-news (electronic news) portals has given us a great opportunity to get quick updates on day-to-day events occurring globally. The information on these portals is retrieved electronically from online databases, processed using a variety of software, and then transmitted to the users. Transmitting news electronically has several advantages, such as faster access to the content and the ability to use audio, graphics, video, and other interactive elements that are either not used or not yet common in traditional newspapers.

E-news Express, an online news portal, aims to expand its business by acquiring new subscribers. With every visitor to the website taking certain actions based on their interest, the company plans to analyze these actions to understand user interests and determine how to drive better engagement. The executives at E-news Express are of the opinion that there has been a decline in new monthly subscribers compared to the past year because the current webpage is not designed well enough in terms of the outline & recommended content to keep customers engaged long enough to make a decision to subscribe.

[Companies often analyze user responses to two variants of a product to decide which of the two variants is more effective. This experimental technique, known as A/B testing, is used to determine whether a new feature attracts users based on a chosen metric.]

### Objective
The company's design team has researched and created a new landing page that has a new outline and shows more relevant content than the old page. To test the effectiveness of the new landing page in gathering new subscribers, the Data Science team conducted an experiment by randomly selecting 100 users and dividing them equally into two groups. The existing landing page was served to the first group (control group) and the new landing page to the second group (treatment group). Data on how users in both groups interacted with the two versions of the landing page was collected. As a data scientist at E-news Express, you have been asked to explore the data and perform a statistical analysis (at a significance level of 5%) to determine the effectiveness of the new landing page in gathering new subscribers for the news portal by answering the following questions:

- Do the users spend more time on the new landing page than on the existing landing page?
- Is the conversion rate (the proportion of users who visit the landing page and get converted) for the new page greater than the conversion rate for the old page?
- Does the converted status depend on the preferred language?
- Is the time spent on the new page the same for the different language users?


### Data Dictionary
The data contains information regarding the interaction of users in both groups with the two versions of the landing page.

- user_id - Unique user ID of the person visiting the website
- group - Whether the user belongs to the first group (control) or the second group (treatment)
- landing_page - Whether the landing page is new or old
- time_spent_on_the_page - Time (in minutes) spent by the user on the landing page
- converted - Whether the user gets converted to a subscriber of the news portal or not
- language_preferred - Language chosen by the user to view the landing page
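
The dictionary above can double as a lightweight schema check once the data is loaded. The sketch below is one way to do that; the names `EXPECTED_COLUMNS` and `check_columns` are illustrative assumptions and not part of the original notebook.

```python
import pandas as pd

# Column names copied from the data dictionary above.
EXPECTED_COLUMNS = [
    "user_id",
    "group",
    "landing_page",
    "time_spent_on_the_page",
    "converted",
    "language_preferred",
]


def check_columns(df: pd.DataFrame) -> list:
    """Return the expected columns missing from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]
```

Calling `check_columns(df)` should return an empty list when the file matches the dictionary.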


## Import all the necessary libraries

In [None]:
import numpy as np  # arrays and numerical operations
import pandas as pd  # data manipulation and analysis
import matplotlib.pyplot as plt  # plots and visualisations
import seaborn as sns  # statistical visualisations
import scipy.stats as stats  # probability distributions and statistical functions

sns.set_theme(style="darkgrid")
%matplotlib inline

## Reading the Data into a DataFrame

---



In [None]:
# Read the data from a shared Google Drive link
url = 'https://drive.google.com/file/d/1RfJgcBGtOJmgKCMMXQfAXbQAH6LzJ0JQ/view?usp=share_link'
file_id = url.split('/')[-2]  # the file ID is the second-to-last path segment
dwn_url = 'https://drive.google.com/uc?id=' + file_id  # direct-download form of the link
df = pd.read_csv(dwn_url)

## Explore the dataset and extract insights using Exploratory Data Analysis

- Data Overview
  - Viewing the first and last few rows of the dataset
  - Checking the shape of the dataset
  - Getting the statistical summary for the variables
- Check for missing values
- Check for duplicates
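
For the statistical-summary item in the list above, a minimal sketch could look like the following. The helper name `summarize` is an assumption made for illustration.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Statistical summary of every column (hypothetical helper name)."""
    # include="all" adds count/unique/top/freq for the text columns
    # alongside mean/std/quartiles for the numeric ones.
    # The transpose puts one row per variable, which is easier to read.
    return df.describe(include="all").T
```

`summarize(df)` returns one row per variable; statistics that do not apply to a column's type appear as `NaN`.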

In [None]:
# Viewing the first and last few rows of the dataset
print('First 5 rows :')
print(df.head(5))
print('________________________________________________')
print('Last 5 rows :')
print(df.tail(5))
print('________________________________________________')

# Checking the shape of the dataset
print('Num of rows / Num of columns : ', df.shape)
print('________________________________________________')
print('Missing Values')
# isnull().sum() counts the missing values per column; this dataset has none.
print(df.isnull().sum())
print('________________________________________________')
print('Duplicate Values')
# duplicated().sum() counts the rows that repeat an earlier row.
print(df.duplicated().sum())

First 5 rows :
   user_id      group landing_page  time_spent_on_the_page converted  \
0   546592    control          old                    3.48        no   
1   546468  treatment          new                    7.13       yes   
2   546462  treatment          new                    4.40        no   
3   546567    control          old                    3.02        no   
4   546459  treatment          new                    4.75       yes   

  language_preferred  
0            Spanish  
1            English  
2            Spanish  
3             French  
4            Spanish  
________________________________________________
Last 5 rows :
    user_id      group landing_page  time_spent_on_the_page converted  \
95   546446  treatment          new                    5.15        no   
96   546544    control          old                    6.52       yes   
97   546472  treatment          new                    7.07       yes   
98   546481  treatment          new                    6.20

### Univariate Analysis
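
One possible starting point for the univariate view is the share of each category in the four categorical columns; the numeric column can be inspected separately with `df["time_spent_on_the_page"].describe()`. The names `CATEGORICAL_COLUMNS` and `category_shares` below are assumptions for this sketch.

```python
import pandas as pd

# Categorical columns named in the data dictionary.
CATEGORICAL_COLUMNS = ["group", "landing_page", "converted", "language_preferred"]


def category_shares(df: pd.DataFrame) -> dict:
    """Map each categorical column to its value shares (hypothetical helper)."""
    return {
        col: df[col].value_counts(normalize=True)
        for col in CATEGORICAL_COLUMNS
    }
```

For example, `category_shares(df)["language_preferred"]` gives the fraction of users per preferred language.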

### Bivariate Analysis
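
A bivariate look at this data could compare each outcome against the landing page. The sketch below summarises time spent per page and the conversion share per page; both helper names are hypothetical.

```python
import pandas as pd


def time_by_page(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, median, and std of time spent per landing page."""
    return df.groupby("landing_page")["time_spent_on_the_page"].agg(
        ["count", "mean", "median", "std"]
    )


def conversion_by_page(df: pd.DataFrame) -> pd.DataFrame:
    """Share of converted and non-converted users within each landing page."""
    # normalize="index" makes every row (landing page) sum to 1.
    return pd.crosstab(df["landing_page"], df["converted"], normalize="index")
```

The same pattern extends to other pairs, for instance replacing `landing_page` with `language_preferred`.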

## 1. Do the users spend more time on the new landing page than on the existing landing page?

### Perform Visual Analysis
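
A side-by-side box plot is one common way to compare the two pages visually. The following is a sketch using plain matplotlib; the function name `plot_time_by_page` is an assumption, and the choice of a box plot (rather than, say, overlaid histograms) is a judgment call.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_time_by_page(df: pd.DataFrame):
    """Box plot of time spent, one box per landing page (hypothetical helper)."""
    pages = sorted(df["landing_page"].unique())
    data = [
        df.loc[df["landing_page"] == page, "time_spent_on_the_page"].to_numpy()
        for page in pages
    ]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.boxplot(data)
    # Boxes are drawn at positions 1..n, so label those positions explicitly.
    ax.set_xticks(range(1, len(pages) + 1))
    ax.set_xticklabels(pages)
    ax.set_xlabel("landing_page")
    ax.set_ylabel("time_spent_on_the_page (minutes)")
    return ax
```

Calling `plot_time_by_page(df)` in a notebook cell renders the figure inline.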

### Step 1: Define the null and alternate hypotheses
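
Writing $\mu_{new}$ and $\mu_{old}$ for the mean time (in minutes) spent on the new and existing landing pages (symbols introduced here for illustration), the question can be expressed as a one-sided pair of hypotheses:

$$
H_0: \mu_{new} = \mu_{old} \qquad \text{versus} \qquad H_a: \mu_{new} > \mu_{old}
$$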

### Step 2: Select Appropriate test

### Step 3: Decide the significance level

### Step 4: Collect and prepare data
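
Preparing the data for this question amounts to separating the time spent by landing page. A minimal sketch, with `split_times` as a hypothetical helper name:

```python
import pandas as pd


def split_times(df: pd.DataFrame):
    """Return (new_page_times, old_page_times) as two Series."""
    times = df["time_spent_on_the_page"]
    new_times = times[df["landing_page"] == "new"]
    old_times = times[df["landing_page"] == "old"]
    return new_times, old_times
```

Comparing the sample sizes and standard deviations of the two returned Series is a reasonable check before choosing a test variant.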

### Step 5: Calculate the p-value
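
One way to obtain the p-value is a one-sided two-sample t-test with `scipy.stats.ttest_ind`. The notebook does not state which t-test variant is intended, so the use of Welch's version (`equal_var=False`) below is an assumption, as is the helper name `time_spent_p_value`.

```python
import pandas as pd
from scipy import stats


def time_spent_p_value(df: pd.DataFrame) -> float:
    """One-sided Welch t-test p-value for H_a: mean(new) > mean(old)."""
    new_times = df.loc[df["landing_page"] == "new", "time_spent_on_the_page"]
    old_times = df.loc[df["landing_page"] == "old", "time_spent_on_the_page"]
    # equal_var=False selects Welch's t-test (an assumption of this sketch);
    # alternative="greater" tests whether the first sample's mean is larger.
    result = stats.ttest_ind(
        new_times, old_times, equal_var=False, alternative="greater"
    )
    return float(result.pvalue)
```

The `alternative` argument requires SciPy 1.6 or later.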

### Step 6: Compare the p-value with $\alpha$
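
The comparison follows the usual decision rule: reject the null hypothesis when the p-value is below the significance level, which the problem statement fixes at 5%. The helper name `decide` is hypothetical.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the standard decision rule at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"
```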

### Step 7: Draw inference

**A similar approach can be followed to answer the other questions.**

## 2. Is the conversion rate (the proportion of users who visit the landing page and get converted) for the new page greater than the conversion rate for the old page?
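
A one-sided two-proportion z-test is a natural candidate here. The sketch below computes it directly with NumPy and SciPy rather than through a dedicated library routine; the helper name `conversion_p_value` is an assumption, and the code treats every row whose `landing_page` is not `new` as the old page.

```python
import numpy as np
import pandas as pd
from scipy import stats


def conversion_p_value(df: pd.DataFrame) -> float:
    """One-sided two-proportion z-test p-value for H_a: p_new > p_old."""
    converted = df["converted"] == "yes"
    is_new = df["landing_page"] == "new"
    n_new, n_old = is_new.sum(), (~is_new).sum()
    x_new, x_old = (converted & is_new).sum(), (converted & ~is_new).sum()
    # Pooled proportion under H0 (both pages convert at the same rate).
    pooled = (x_new + x_old) / (n_new + n_old)
    std_err = np.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (x_new / n_new - x_old / n_old) / std_err
    # Upper-tail probability of the standard normal distribution.
    return float(stats.norm.sf(z))
```

The normal approximation is only reasonable when each group has enough converted and non-converted users, and the function is undefined if nobody or everybody converted.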

## 3. Are conversion and preferred language independent or related?
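
With two categorical variables, a chi-square test of independence on their contingency table is a standard choice. The following is a sketch under that assumption; `independence_p_value` is a hypothetical helper name.

```python
import pandas as pd
from scipy import stats


def independence_p_value(df: pd.DataFrame) -> float:
    """Chi-square test of independence between conversion and language."""
    table = pd.crosstab(df["converted"], df["language_preferred"])
    # chi2_contingency returns the statistic, p-value, degrees of freedom,
    # and the expected counts under independence.
    chi2, p_value, dof, expected = stats.chi2_contingency(table)
    return float(p_value)
```

A small p-value would suggest that conversion status and preferred language are related; the expected counts should be checked for very small cells before relying on the result.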

## 4. Is the time spent on the new page the same for users of different languages?
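
Comparing a mean across more than two groups is commonly done with a one-way ANOVA. The sketch below restricts the data to the new page and passes one sample per language to `scipy.stats.f_oneway`; the helper name `language_anova_p_value` is an assumption. ANOVA assumes roughly normal data and similar variances, which could be checked beforehand with `scipy.stats.shapiro` and `scipy.stats.levene`.

```python
import pandas as pd
from scipy import stats


def language_anova_p_value(df: pd.DataFrame) -> float:
    """One-way ANOVA p-value for time on the new page across languages."""
    new_page = df[df["landing_page"] == "new"]
    # One sample of times per preferred language.
    samples = [
        group["time_spent_on_the_page"].to_numpy()
        for _, group in new_page.groupby("language_preferred")
    ]
    result = stats.f_oneway(*samples)
    return float(result.pvalue)
```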

## Conclusion and Business Recommendations

___