## Background

Uncertainty is a fascinating concept. It is extremely vast and applies to almost all aspects of life. You could spend your entire life learning about it and still not understand all of it.
Uncertainty is everywhere, in everything we do. For example, uncertainty affects the economy in many ways. People's attitudes to spending are usually based on their level of confidence in the economy. Uncertainty about our future climate drives us to recycle more, burn more efficient fuel, reduce our emissions and respect the environment. In a sense, it is almost motivational: once we are aware of something, we typically strive to improve it.
In science, we frequently measure or observe things so that we can understand them better. Putting a measure on something allows us to describe it, visualise it or comprehend it. We design an experiment or build a model to test or measure a phenomenon, and uncertainty is the measure of the doubt that exists about the result or prediction.

## Project - Dataset

This project will focus on the field of dimensional metrology and, more specifically, measurement systems. A measurement system might be described as a system encompassing the measuring instrument, the software, firmware, people/users, environment, method and the articles that they inspect. When designing a measurement system many aspects must be considered, but ultimately the system should have an acceptable accuracy and variance for what it is designed to inspect. In conjunction with this, each measurement result should be accompanied by an uncertainty value, since no measurement is ever perfect. When a measurement is taken, many phenomena can influence the result and, generally speaking, that influence can be captured in the overall uncertainty. Estimating this uncertainty makes it possible to mitigate the risk of misclassification based on the results or observations made.

## Scenario

You are asked to provide consultation to an aerospace company that manufactures a component. Post manufacture it inspects each component dimensionally to ensure it meets design intent/criteria. The component itself has 4 geometric features that must be verified through inspection, each having a nominal size and an associated tolerance. A measurement system is used to inspect the components to meet volume requirements. There is an issue with this component when it comes to final assembly. It is not fitting with other components at the final assembly plant located in another country. The equipment was installed correctly, has all the necessary certified paperwork and is passing its daily qualification checks. What is going on?

The brief of the project is to create a synthetic dataset. This dataset will attempt to capture what the above situation might look like (in a fairly simplistic approach with assumptions) if it were a real investigation of the measurement phenomenon and try to answer some additional questions along the way.

## Investigation

The investigation starts with the component. It has four geometric features that are inspected. They are relatively simple in terms of geometric complexity. Two features are lengths, one is a width and the other is a diameter. There are no complex surfaces or advanced geometric constraints to be inspected.

The features are:

- length_1 is 14.010 mm +/- 0.020 mm
- length_2 is 10.050 mm +/- 0.050 mm
- width_1 is 50.200 mm +/- 0.100 mm
- dia_1 is 30.000 mm +/- 0.070 mm
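
The specification above can be captured in a small data structure so that later checks reuse the same limits. This is a minimal sketch only: the names `SPEC`, `limits` and `in_tolerance` are illustrative and are not part of the original notebook, and the tolerances are assumed to be symmetric and in millimetres.

```python
# Nominal size and symmetric tolerance (mm) for each feature, taken from the list above.
SPEC = {
    "length_1": (14.010, 0.020),
    "length_2": (10.050, 0.050),
    "width_1": (50.200, 0.100),
    "dia_1": (30.000, 0.070),
}

def limits(feature):
    """Return the (lower, upper) tolerance limits in mm for a feature."""
    nominal, tol = SPEC[feature]
    return nominal - tol, nominal + tol

def in_tolerance(feature, value):
    """Return True if a measured value (mm) lies inside the tolerance band."""
    lower, upper = limits(feature)
    return lower <= value <= upper

# Print the acceptance band for every feature.
for name in SPEC:
    lower, upper = limits(name)
    print(f"{name}: {lower:.3f} mm to {upper:.3f} mm")
```

Keeping the limits in one place means the same function can be applied to raw readings and to readings corrected for bias or temperature.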

Instruments:

It was found that the instrument is a coordinate measuring machine, often referred to as a 'CMM'. For now, let's call it instrument_1.
It has a calibration certificate stating the equipment's maximum permissible error for error of length indication. After speaking with the OEM, it is understood that this is essentially an accuracy statement for the equipment when measuring length-type features.

It is stated in linear form: $MPE_E = X + L/K$, where X and K are constants (X in um, L in mm, K has no units).

instrument_1 = 2.8 + L/1000 (result is in microns)
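
To make the formula concrete, the sketch below evaluates it at the nominal size of each feature. The function name `mpe_um` and the dictionary `nominals_mm` are illustrative. Applying the length-indication MPE to the diameter as well is an assumption, since the certificate statement covers length-type features.

```python
def mpe_um(length_mm, x_um=2.8, k=1000.0):
    """Maximum permissible error of length indication in microns: X + L/K."""
    return x_um + length_mm / k

# Nominal sizes (mm) of the four inspected features.
nominals_mm = {
    "length_1": 14.010,
    "length_2": 10.050,
    "width_1": 50.200,
    "dia_1": 30.000,
}

# Evaluate the MPE at each nominal size.
for name, nominal in nominals_mm.items():
    print(f"{name}: MPE = {mpe_um(nominal):.3f} um")
```

Because the features are short, the length-dependent term L/K contributes very little and the constant 2.8 um dominates.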



The certificate shows that instrument_1 was calibrated and serviced 2 months ago. It is noted that instrument_1 has a positive bias of 0.5 um. The instrument does not exhibit any other systematic errors.
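
A known bias can be removed by subtracting it from each reading. The following is a minimal sketch, assuming the bias is constant across the measuring range; `compensate_bias` and the example readings are hypothetical and are not taken from real inspection data.

```python
import numpy as np

BIAS_MM = 0.5e-3  # 0.5 um positive bias from the certificate, expressed in mm

def compensate_bias(readings_mm, bias_mm=BIAS_MM):
    """Subtract a known positive bias from raw readings (mm)."""
    return np.asarray(readings_mm, dtype=float) - bias_mm

# Illustrative readings only, not real data.
raw = [14.0275, 14.0305]
print(compensate_bias(raw))
```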


Users:

Looking at the users of the measurement system, it was discovered that one operator was responsible for running the system.

Mary ran instrument_1 and was quite experienced. A standardised method for use of the equipment was also in place. 


Environment:

instrument_1 was located in a temperature-controlled room certified at 19-21 deg C. However, the climate control was malfunctioning and the temperature was found to be at a static 25 deg C.
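
Dimensional measurements are conventionally referenced to 20 deg C, so a part measured at 25 deg C will read larger than its reference size. The sketch below estimates that growth with the first-order relation growth = alpha x L x (T - 20). The component material is not stated, so the expansion coefficient used here (23e-6 per deg C, typical of an aluminium alloy) is purely an assumption. The sketch also ignores expansion of the CMM scales and any temperature compensation in the software.

```python
REFERENCE_TEMP_C = 20.0  # standard reference temperature for dimensional measurement
ROOM_TEMP_C = 25.0       # temperature found during the investigation
ALPHA_PER_C = 23e-6      # ASSUMED coefficient (typical aluminium alloy); material is not given

def thermal_growth_um(length_mm, temp_c=ROOM_TEMP_C, alpha=ALPHA_PER_C):
    """First-order thermal growth of a length, in microns, relative to 20 deg C."""
    return alpha * length_mm * (temp_c - REFERENCE_TEMP_C) * 1000.0

# Nominal sizes (mm) of the four inspected features.
for name, nominal in [("length_1", 14.010), ("length_2", 10.050),
                      ("width_1", 50.200), ("dia_1", 30.000)]:
    print(f"{name}: growth = {thermal_growth_um(nominal):.2f} um")
```

Under this assumed coefficient the growth of the larger features is of the same order as the instrument's MPE, which is why temperature deserves its own step in the investigation.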


## Summary of Investigation

After further investigation I decide the following:

1. I need to pull some previous data on measurements and review the results for each feature.
2. I need to calculate what the likely instrument error is for measurements of each feature.
3. I need to compensate for bias in the equipment.
4. I need to investigate the effects of temperature.
5. I need to estimate any likely variance.
6. I need to estimate what the overall uncertainty might be.
7. I need to analyse the data and generate any conclusions.
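
One common way to approach the overall uncertainty estimate is to convert each contribution to a standard uncertainty, combine them by root-sum-of-squares and multiply by a coverage factor. The sketch below follows that pattern under stated assumptions: the MPE is treated as the half-width of a rectangular distribution, the contributions are assumed independent, and the repeatability figure is a hypothetical placeholder. The function names are illustrative and this is not necessarily the method used later in the project.

```python
import math

def standard_uncertainty_from_mpe(mpe_um):
    """Treat the MPE as the half-width of a rectangular distribution."""
    return mpe_um / math.sqrt(3)

def expanded_uncertainty(standard_uncertainties_um, k=2.0):
    """Combine independent standard uncertainties (RSS) and apply coverage factor k."""
    combined = math.sqrt(sum(u ** 2 for u in standard_uncertainties_um))
    return k * combined

# Instrument contribution for length_1, using the MPE formula from the certificate.
mpe_length_1_um = 2.8 + 14.010 / 1000
u_instrument = standard_uncertainty_from_mpe(mpe_length_1_um)

# HYPOTHETICAL repeatability contribution in um, for illustration only.
u_repeatability = 0.5

U = expanded_uncertainty([u_instrument, u_repeatability])
print(f"Expanded uncertainty (k=2): {U:.2f} um")
```

Further contributions, such as a temperature term, can be appended to the list once they have been estimated.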

## Simulating the results from 100 components

I would expect the results to be normally distributed for each feature. I'm going to make a few assumptions here. Having reviewed the measurement results, I found that length_1 and width_1 were manufactured close to the upper tolerance limit. The other features were relatively centred on nominal. All data should be floating point numbers and should be positive values based on the specifications above.

In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Simulate 100 measurements per feature; each feature has its own mean and standard deviation (mm)
n_samples = 100
length_1 = np.random.normal(14.027, 0.005, n_samples)
length_2 = np.random.normal(10.050, 0.001, n_samples)
width_1 = np.random.normal(50.228, 0.003, n_samples)
dia_1 = np.random.normal(30.000, 0.0001, n_samples)

# Build the DataFrame from the arrays above so each feature is sampled only once
features = pd.DataFrame({'length_1': length_1, 'length_2': length_2, 'width_1': width_1, 'dia_1': dia_1})

# Show the first 21 rows; use features.head() for a shorter preview
features.head(21)

Unnamed: 0,length_1,length_2,width_1,dia_1
0,14.019296,10.050061,50.227714,30.000052
1,14.020608,10.0518,50.225594,30.000101
2,14.02436,10.051394,50.228696,29.999852
3,14.02951,10.048584,50.231483,29.999997
4,14.017988,10.050769,50.227817,29.999902
5,14.029914,10.048846,50.227994,29.999962
6,14.030224,10.048965,50.227674,29.999943
7,14.025415,10.047568,50.229888,30.000103
8,14.014414,10.049784,50.228913,29.999926
9,14.021438,10.051785,50.226428,29.999888
