<a href="https://colab.research.google.com/github/cpaniaguam/CSC104/blob/main/CSC104MiniProject3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Selecting people to perform physically demanding tasks
Even in today's high-tech world, many jobs require significant physical strength to be performed properly, especially in construction, maintenance and repair work, law enforcement, highway maintenance, and [many more](https://stacker.com/stories/3222/50-most-physical-jobs-america).



## Question: How would you choose a person to perform a physically demanding job?

One natural way is to take the candidate to the job site and have them demonstrate adequate proficiency and strength at the task. However, this can be time-consuming when selecting a large number of candidates from an equally large applicant pool. Besides, applicants risk injury if they are not strong enough to perform the tasks.

One could instead try an indirect approach: use a measure of physical strength that is simple to apply, carries a low risk of injury, and can be associated with how well a person does a job.

This mini-project will use data collected for a [study](https://tinyurl.com/y7evtzgy) in which such indirect approaches were tested against the actual strength and job performance of real workers on the job. You will
1. apply scatterplots for preliminary exploration of the potential relationships between variables of interest
2. use and test correlation coefficients to assess the plausibility and strength of a linear association between pairs of variables
3. verify necessary conditions for implementing a linear model
4. construct and interpret simple linear regression models
5. do inference on the regression estimates
6. make predictions using linear models and give appropriate interpretations



## The data
The dataset you will be using is below. A description of the variables follows the two code cells.

In [None]:
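# %precision 2
import pandas as pd

# Load the physical-strength dataset from the course repository
df = pd.read_csv('https://raw.githubusercontent.com/cpaniaguam/CSC104/main/phystrength.csv')

# Summary statistics, transposed so each row is one variable
description = df.describe().T

# Display counts as integers and every other statistic with two decimals
description.style.format({'count': '{:.0f}', 'mean': '{:.2f}',
                          'std': '{:.2f}', 'min': '{:.2f}', '25%': '{:.2f}',
                          '50%': '{:.2f}', '75%': '{:.2f}', 'max': '{:.2f}'})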

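| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| GRIP | 147 | 110.23 | 23.63 | 29.0 | 94.0 | 111.0 | 124.5 | 189.0 |
| ARM | 147 | 78.75 | 21.11 | 19.0 | 64.5 | 81.5 | 94.0 | 132.0 |
| RATINGS | 147 | 41.01 | 8.52 | 21.6 | 34.8 | 41.3 | 47.7 | 57.2 |
| SIMS | 147 | 0.2 | 1.68 | -4.17 | -0.96 | 0.16 | 1.07 | 5.17 |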


In [None]:
df

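| | GRIP | ARM | RATINGS | SIMS |
|---|---|---|---|---|
| 0 | 105.5 | 80.5 | 31.8 | 1.18 |
| 1 | 106.5 | 93.0 | 39.8 | 0.94 |
| 2 | 94.0 | 81.0 | 46.8 | 0.84 |
| 3 | 90.5 | 33.5 | 52.2 | -2.45 |
| 4 | 104.0 | 47.5 | 31.2 | 1.00 |
| ... | ... | ... | ... | ... |
| 142 | 147.0 | 71.0 | 57.2 | 0.53 |
| 143 | 109.5 | 86.5 | 43.0 | 2.89 |
| 144 | 54.0 | 67.5 | 41.7 | -1.38 |
| 145 | 126.0 | 63.5 | 37.0 | 1.33 |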


# Data description


As you can gather from the output above, the data consist of 147 cases and 4 variables. Each case represents a worker who performs physically demanding tasks ([lineworkers](https://en.wikipedia.org/wiki/Lineworker), [mechanics](https://en.wikipedia.org/wiki/Mechanic), [electricians](https://en.wikipedia.org/wiki/Electrician), and [construction and maintenance workers](https://en.wikipedia.org/wiki/Construction_worker)). The first two variables (`GRIP` for grip strength and `ARM` for arm strength) are strength measurements (in lbs.) for each worker. These measurements were gathered using a machine called the [Jackson Evaluation System (JES)](https://lafayetteevaluation.com/products/jackson-strength-system). The last two variables (`RATINGS` and `SIMS`) are job performance measurements. `RATINGS` is a rating given to each worker by their supervisor (the higher the better). `SIMS` is an artificial variable based on a simulation that required workers to exert force on an artificial wrench while standing and kneeling. Larger scores indicate better performance.

### Task 1
a. Look at the distribution of each variable. In a text cell, describe its shape (center, symmetry, modes, skewness, etc.). Answer questions like the following to start getting a feel for the data: What proportion of workers exerted a force greater than 100 lbs.? Were the workers in the sample mostly 'good'?

b. As analysts, we like it when variables in our data follow well-known distributions, such as the normal distribution. Do any of the variables appear to be normally distributed? Conduct normality checks using the [*empirical rule*](https://learn.zybooks.com/zybook/SALVECSC104PaniaguaSpring2021/chapter/4/section/4?content_resource_id=48842584) and *qq plots* or *normal probability plots*. Do these methods agree?
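
As a starting point for part b, here is a minimal sketch of both normality checks. It runs on synthetic data rather than the project dataset, and the helper name `empirical_rule_shares` is made up for this illustration. It assumes NumPy and SciPy are available, as they are in Colab.

```python
import numpy as np
from scipy import stats

def empirical_rule_shares(values):
    """Share of observations within 1, 2, and 3 sample standard deviations of the mean."""
    x = np.asarray(values, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)
    return {k: float(np.mean(np.abs(x - mean) <= k * sd)) for k in (1, 2, 3)}

# Synthetic, roughly normal data standing in for one column (an assumption for illustration)
rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=20, size=1000)

shares = empirical_rule_shares(sample)
print(shares)  # compare with about 0.68, 0.95, and 0.997

# Quantities behind a normal probability plot; r near 1 means the points
# lie close to a straight line, which is what normal data would produce
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"probability-plot correlation: {r:.4f}")
```

To try this on the real data, pass a column such as `df['GRIP']` in place of `sample`. Passing `plot=plt` (with `import matplotlib.pyplot as plt`) to `stats.probplot` draws the plot itself.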

In [None]:
# Your code for task 1 goes here

## Task 2: Exploring relationships among the variables
We want to use `GRIP` and `ARM` as predictors of `RATINGS` and `SIMS`. Are these variables linearly associated?
1. Construct scatterplots of `ARM` and `GRIP` against `RATINGS` and `SIMS`. Do they seem to be correlated? Judging by these plots, how strong would you say the relationships are?
2. Obtain the respective correlation coefficients and coefficients of determination. Verify your predictions.
3. Write down your conclusions.
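
The following sketch shows one way to obtain a correlation coefficient and the matching coefficient of determination with pandas. The data and the column names `predictor` and `response` are synthetic placeholders, not part of the project dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data with a built-in positive linear trend (an assumption for illustration)
rng = np.random.default_rng(1)
x = rng.normal(100, 20, size=200)
y = 0.05 * x + rng.normal(0, 1, size=200)
toy = pd.DataFrame({"predictor": x, "response": y})

# Pearson correlation is the default method of Series.corr
r = toy["predictor"].corr(toy["response"])

# In simple linear regression, the coefficient of determination is r squared
r_squared = r ** 2
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")

# A scatterplot of the same pair (requires matplotlib):
# toy.plot.scatter(x="predictor", y="response")
```

With the project data, the same calls would use pairs such as `df["GRIP"]` and `df["SIMS"]`; `df.corr()` gives all pairwise coefficients at once.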

In [None]:
# Your code for task 2 goes here

[Your conclusions for task 2 go here.]

# Inferential Analysis

## Task 3: Are these correlation coefficients significant?

Conduct the four possible tests for significance at the $0.05$ significance level. For each test, write the null and alternative hypotheses, the type of alternative hypothesis, and the p-value. Write down your conclusions.
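
One possible way to run such a test is `scipy.stats.pearsonr`, sketched below on synthetic data with a two-sided alternative. The manual $t$ statistic is included only to show where the p-value comes from. Whether a two-sided alternative is the right choice for the project is left for you to decide.

```python
import numpy as np
from scipy import stats

# Synthetic data with a modest linear trend (an assumption for illustration)
rng = np.random.default_rng(2)
n = 50
x = rng.normal(100, 20, size=n)
y = 0.02 * x + rng.normal(0, 1, size=n)

# H0: rho = 0 (no linear association)    Ha: rho != 0 (two-sided)
r, p_value = stats.pearsonr(x, y)

# The same test by hand: t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"r = {r:.3f}, t = {t_stat:.3f}, p-value = {p_value:.4g} -> {decision}")
```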



In [None]:
# Your code for task 3 goes here

[Your conclusions for task 3 go here]

## Task 4: Looking at the residuals and checking assumptions
Before we can make predictions with a linear regression model, we must check that the model's assumptions are met. Otherwise, any predictions we make would be unreliable.

1. Obtain the estimates for the slope and intercept for each regression model.
2. Construct residual plots. Do the zero mean, constant variance, normality, and independence assumptions reasonably check out? Write your conclusions in the text cell below.
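
A minimal sketch of these two steps, assuming `scipy.stats.linregress` for the fit and synthetic data in place of the project dataset. The split-in-halves comparison and the Shapiro-Wilk test are rough numerical companions to the residual plots, not replacements for them.

```python
import numpy as np
from scipy import stats

# Synthetic data from a known line plus noise (an assumption for illustration)
rng = np.random.default_rng(3)
x = rng.normal(100, 20, size=200)
y = 2.0 + 0.05 * x + rng.normal(0, 1, size=200)

# Step 1: least-squares estimates of the slope and intercept
fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.4f}, intercept = {fit.intercept:.4f}")

# Step 2: residuals = observed minus fitted values
fitted = fit.intercept + fit.slope * x
residuals = y - fitted

# With an intercept in the model the residuals average to zero by construction,
# so the informative check is whether they show a pattern against the fitted values
print(f"mean residual = {residuals.mean():.2e}")

# Rough constant-variance check: residual spread for low vs. high fitted values
order = np.argsort(fitted)
low_sd = residuals[order[:100]].std(ddof=1)
high_sd = residuals[order[100:]].std(ddof=1)
print(f"residual sd (low half) = {low_sd:.3f}, (high half) = {high_sd:.3f}")

# Normality of residuals: Shapiro-Wilk test (a small p-value suggests non-normality)
shapiro_stat, shapiro_p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value = {shapiro_p:.3f}")
```

A residual plot is a scatterplot of `fitted` against `residuals`; a horizontal band with no curvature or funnel shape is what the assumptions predict.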

In [None]:
# Code for task 4 goes here

[Your conclusions for task 4 go here]

## Task 5: Make some predictions (if everything worked out!)

1. A worker with 125 lbs of `GRIP` and 95 lbs of `ARM` applies for one of these jobs. Using the appropriate model, obtain $95\%$ confidence intervals for `RATINGS` and `SIMS`. Would you recommend that a manager hire such a candidate?
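
The sketch below computes interval estimates at a given predictor value from the standard simple-linear-regression formulas. The function name `intervals_at` and the data are hypothetical. It returns both a confidence interval for the mean response and the wider prediction interval for a single new individual, since the two answer different questions.

```python
import numpy as np
from scipy import stats

def intervals_at(x, y, x0, level=0.95):
    """Point estimate, CI for the mean response, and prediction interval at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    fit = stats.linregress(x, y)
    y_hat = fit.intercept + fit.slope * x0

    # Residual standard error with n - 2 degrees of freedom
    resid = y - (fit.intercept + fit.slope * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)
    t_star = stats.t.ppf(0.5 + level / 2, df=n - 2)

    # Standard errors for the mean response and for one new observation
    se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
    se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)

    ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pi = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)
    return y_hat, ci, pi

# Synthetic data from a known line plus noise (an assumption for illustration)
rng = np.random.default_rng(4)
x = rng.normal(100, 20, size=200)
y = 2.0 + 0.05 * x + rng.normal(0, 1, size=200)

y_hat, ci, pi = intervals_at(x, y, x0=125)
print(f"estimate = {y_hat:.2f}")
print(f"95% CI for the mean response: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% prediction interval:      ({pi[0]:.2f}, {pi[1]:.2f})")
```

With the project data, `x` would be the predictor column of whichever model passed the checks in Task 4, and `x0` the applicant's measurement on that predictor.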

In [None]:
# Your code for this part goes here

[Your recommendation for the manager goes here]