# Basic Statistics Case Study

# Business Problem-4

## BACKGROUND:

Software development projects typically follow six basic phases: requirements, design, implementation (and integration), testing (validation), deployment (installation), and maintenance. First, general requirements are gathered and the scope of the functionality is defined. Then, alternative scenarios for the required functionality are developed and evaluated. Implementation, usually 50% or more of the development time, is the phase in which the design is translated into programs and integrated with other parts of the software; this is when software engineers actually develop the code. During the final phases, programs are tested, software is put into use, and faults or performance issues are addressed.

ApDudes, a developer of applications for tablet computers, was having difficulty meeting project deadlines: only 10% of its projects had been completed within budget and on time last year, and that was starting to hurt business. The group’s project manager was tasked with studying problems within the implementation phase. He found that software engineers were having difficulty prioritizing their work and that they often became overwhelmed by the magnitude of the projects. As a result, two changes were made: each project was broken down into smaller, distinct tasks, or jobs, and each job was assigned a priority. The project manager believes that this classification and prioritization system will speed the completion of high priority jobs and thus lower overall project completion time.

## BUSINESS PROBLEM:

We will focus on the prioritization system. If the system is working, then high priority jobs, on average, should be completed more quickly than medium priority jobs, and medium priority jobs should be completed more quickly than low priority jobs. Use the data provided to determine whether this is, in fact, occurring.

## DATA AVAILABLE:

The data set contains a random sample of 642 jobs completed over the last six months. The variables in the data set are:

    Days: The number of days it took to complete the job
    Priority: The priority level assigned to that job (High, Medium, or Low)
    
#### Import Libraries    

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

%matplotlib inline

In [2]:
priority_data=pd.read_csv('Priority_Assessment.csv')
priority_data

Unnamed: 0,Days,Priority
0,3.3,High
1,7.9,Medium
2,0.3,High
3,0.7,Medium
4,8.6,Medium
...,...,...
637,2.5,Low
638,0.3,High
639,0.3,Medium
640,1.3,Medium


In [4]:
priority_data.isnull().sum()

Days        0
Priority    0
dtype: int64

In [5]:
days = 'Days'

In [6]:
# Days-to-complete series for each priority level
low = priority_data.loc[priority_data.Priority == 'Low', days]
medium = priority_data.loc[priority_data.Priority == 'Medium', days]
high = priority_data.loc[priority_data.Priority == 'High', days]

In [11]:
print( 'mean High_Priority:', high.mean(), '| mean Medium_Priority:', medium.mean(), 
      '| mean Low_Priority:', low.mean() )

mean High_Priority: 3.023619631901845 | mean Medium_Priority: 2.5000000000000004 | mean Low_Priority: 4.228358208955225
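
The three means above can also be produced in a single grouped summary, which makes it easier to see group sizes and medians next to the means. The sketch below is illustrative only: `toy` is a hypothetical frame built from the six rows visible in the preview above, not the full data set, so its numbers will not match the full-sample means.

```python
import pandas as pd

# Hypothetical toy frame: only the rows shown in the preview above,
# NOT the full Priority_Assessment.csv file
toy = pd.DataFrame({
    "Days": [3.3, 7.9, 0.3, 0.7, 8.6, 2.5],
    "Priority": ["High", "Medium", "High", "Medium", "Medium", "Low"],
})

# Count, mean and median of Days for each priority level
summary = toy.groupby("Priority")["Days"].agg(["count", "mean", "median"])
print(summary)
```

On the real data, the same call would be applied to `priority_data` instead of `toy`.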


In [7]:
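def check_normality(data):
    """Run the Shapiro-Wilk test on `data` and print the decision at the 0.05 level."""
    # H0 of the Shapiro-Wilk test: the sample comes from a normal distribution
    test_stat_normality, p_value_normality = stats.shapiro(data)
    print("p value:%.4f" % p_value_normality)
    if p_value_normality < 0.05:
        print("Reject null hypothesis >> The data is not normally distributed")
    else:
        # Failing to reject does not prove normality; it only finds no evidence against it
        print("Fail to reject null hypothesis >> The data is normally distributed")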
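
To see how the Shapiro-Wilk test behaves, the following minimal sketch runs it on two synthetic samples: one drawn from a normal distribution and one from a right-skewed exponential distribution. The samples, seed, and variable names are assumptions made for illustration and are unrelated to the case-study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic samples (NOT the case-study data)
normal_sample = rng.normal(loc=5.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=3.0, size=200)

results = {}
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    # stats.shapiro returns the W statistic and the p-value
    w_stat, p_value = stats.shapiro(sample)
    results[name] = p_value
    print(f"{name}: W={w_stat:.3f}, p={p_value:.4f}")
```

Completion times are bounded below by zero and often right-skewed, so a rejection like the one for the exponential sample would not be surprising for data of this kind.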

### 1. Defining Hypothesis

To determine whether the prioritization system is working, we test whether the mean completion time differs across the three priority levels (high, medium, and low).

    H₀: μ₁=μ₂=μ₃ or The mean completion time is the same for all three priority levels. No difference in the prioritization system.
    H₁: At least one of them is different. There is a difference in the prioritization system.

The hypothesis test is carried out at a 0.05 significance level.

### 2. Assumption Check

    H₀: The data is normally distributed.
    H₁: The data is not normally distributed.

    H₀: The variances of the samples are the same.
    H₁: The variances of the samples are different.    

In [8]:
check_normality(low)
check_normality(medium)
check_normality(high)

p value:0.0000
Reject null hypothesis >> The data is not normally distributed
p value:0.0000
Reject null hypothesis >> The data is not normally distributed
p value:0.0000
Reject null hypothesis >> The data is not normally distributed


In [9]:
# Levene's test: H0 is that all groups have equal variances
stat, pvalue_levene = stats.levene(low, medium, high)

print("p value:%.4f" % pvalue_levene)
if pvalue_levene < 0.05:
    print("Reject null hypothesis >> The variances of the samples are different.")
else:
    print("Fail to reject null hypothesis >> The variances of the samples are the same.")

p value:0.1993
Fail to reject null hypothesis >> The variances of the samples are the same.
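
One detail worth knowing about the call above: SciPy's `levene` uses `center='median'` by default, which is the variant SciPy recommends for skewed data. The sketch below compares the default with `center='mean'` on synthetic right-skewed samples. The data and names are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed for repeatability

# Three synthetic right-skewed samples with the same spread (NOT the case-study data)
a = rng.exponential(scale=3.0, size=150)
b = rng.exponential(scale=3.0, size=150)
c = rng.exponential(scale=3.0, size=150)

# Median-centered version (SciPy's default) and mean-centered version
stat_median, p_median = stats.levene(a, b, c, center="median")
stat_mean, p_mean = stats.levene(a, b, c, center="mean")

print("center='median': p value:%.4f" % p_median)
print("center='mean':   p value:%.4f" % p_mean)
```

Because the default is already median-centered, the notebook's call needs no change for skewed completion times.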


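### 3. Selecting the Proper Test

Three independent groups are compared, so a one-way ANOVA (`stats.f_oneway`) is applied. The Shapiro-Wilk test rejected normality for all three groups, so this choice relies on ANOVA's robustness to non-normality in large samples rather than on the normality assumption being met.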

In [10]:
F, p_value = stats.f_oneway(low, medium, high)
print("p value:%.6f" % p_value)
if p_value <0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")

p value:0.164115
Fail to reject null hypothesis
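
Since normality was rejected for every group, a rank-based test is a natural cross-check on the ANOVA result. The sketch below shows the Kruskal-Wallis H-test (`stats.kruskal`) on synthetic right-skewed samples. It is a different test from the one the notebook ran, and the data, seed, and variable names are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed for a repeatable illustration

# Synthetic right-skewed "days" samples, NOT the case-study data.
# The third group uses a much larger scale, so a real difference exists.
group_a = rng.exponential(scale=1.0, size=200)
group_b = rng.exponential(scale=1.0, size=200)
group_c = rng.exponential(scale=10.0, size=200)

# Kruskal-Wallis compares the groups using ranks instead of raw values
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)
print("Kruskal-Wallis H=%.3f, p value:%.6f" % (h_stat, p_kw))
if p_kw < 0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")
```

In the notebook the equivalent call would be `stats.kruskal(low, medium, high)`. Its result is not reported here because it was not part of the original analysis.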


### 4. Decision and Conclusion

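There are no significant differences in the average completion time for the three priority levels (at a significance level of 0.05): the one-way ANOVA p-value of 0.164 exceeds 0.05. There is neither a statistical nor a real-world reason to conclude that high priority jobs are being completed faster than medium priority jobs on average; in this sample, the mean for high priority jobs (3.02 days) is in fact higher than the mean for medium priority jobs (2.50 days).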
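
The business question is directional (high faster than medium, medium faster than low), whereas ANOVA only tests whether any difference exists. One possible follow-up, sketched here under assumptions, is a pair of one-sided Mann-Whitney U tests with a Bonferroni-adjusted threshold. This was not part of the original analysis, and the samples below are synthetic data ordered by design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # fixed seed for repeatability

# Synthetic samples, NOT the case-study data: completion times ordered by design
high_demo = rng.exponential(scale=1.0, size=150)
medium_demo = rng.exponential(scale=3.0, size=150)
low_demo = rng.exponential(scale=9.0, size=150)

alpha = 0.05
comparisons = [
    ("High < Medium", high_demo, medium_demo),
    ("Medium < Low", medium_demo, low_demo),
]
adjusted_alpha = alpha / len(comparisons)  # Bonferroni correction for two tests

results = {}
for label, faster, slower in comparisons:
    # alternative="less": the first sample tends to take smaller values than the second
    u_stat, p_val = stats.mannwhitneyu(faster, slower, alternative="less")
    results[label] = p_val
    print(f"{label}: U={u_stat:.1f}, p={p_val:.6f}, significant={p_val < adjusted_alpha}")
```

With the notebook's variables, the pairs would be `(high, medium)` and `(medium, low)`.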