##### SETUP THE LOOK AND FEEL OF OUR NOTEBOOK FOR MATPMD1 #####

# Call out to HTML to set our look and feel. We need the IRdisplay library for this.
library("IRdisplay")

ChangeDisplaySettings <- TRUE
if (ChangeDisplaySettings) {

    # This command will change the size of R plots. Adjust width and height to suit.
    options(repr.plot.width=8,repr.plot.height=6)

    # The following changes font size and colour for notebook pages

    display_html("
        <style>
            body {background-color: grey;
                  color: black;
                  font-family: Calibri, sans-serif;
                  font-size: 100%;
            }
            h1 {color: black;
                font-size: 200%;
            }
            h2 {color: black;
                font-size: 150%;
            }
            h3 {color: black;
                font-size: 100%;
            }
            p {padding: 10px 0px 10px;
               text-align: justify;
            }
            li {line-height: 100%;
                padding-top: 1%;
                padding-bottom: 1%;
                text-align: justify;
            }
            strong {font-weight: bold;
            }
            /* types of lists */
            ul.nobull {
                 list-style-type: none;
            }
            ol.i {
                list-style-type: lower-roman;
            }
            ol.a {
                list-style-type: lower-alpha;
            }
            ol.A {
                list-style-type: upper-alpha;
            }
            /* this is to make text justified in paragraphs and lists*/
            .text_cell_render p {
                text-align: justify;
                text-justify: inter-word;
            }
            .text_cell_render li {
                text-align: justify;
                text-justify: inter-word;
            }
            .rendered_html table, .rendered_html td, .rendered_html th {font-size: 100%;
            }
            .container {
                width: 80% !important;
            }
        </style>
    ")
}

<html>
<head>
    <h2>University of Stirling</h2>
    <h2>Computing Science and Mathematics</h2>
    <h2>MATPMD1 Statistics for Data Science</h2>
    <h1>Chapter 1 General Background
    </h1>
</head>

<body>
    <p> Two important branches of applied mathematics are <strong>Probability Theory</strong> and <strong>Statistics</strong>.
    </p>
    <p>Both the theory and the application of these areas have expanded rapidly over the last 100 years, particularly since the advent of powerful computers.
    </p>
</body>

<body>    
    <p>Many statistical techniques have been developed as a result. In this module we will investigate some of them. We are going to:
    </p>
    <ul>
        <li>consider different types of data and how to visualise them
        </li>
        <li>conduct statistical tests to better understand a set of data
        </li>
        <li>reflect on the appropriate choice of data collection method and application of statistical tests
        </li>
    </ul>
</body>

<body>
    <p>We will be using the statistical programming language R to visualise data and conduct tests.
    </p>
    <p>Code snippets illustrating how to perform these statistical procedures in R can be found throughout the following chapters.
    </p>
</body>

<body>
    <h2> 1.2 Definition of Statistics
    </h2>
    <p>The term Statistics can be described as:
    </p>
        <ul>
            <li>A collection of data, originally about the state of the nation, e.g. the size of the population or levels of trade and unemployment. The dictionary definition is a (large) collection of numerical facts or figures.
            </li>
            <li>Alternatively, Statistics is 'the Science concerned with the collection, classification and interpretation of data'.
            </li>
        </ul>
</body>

<body>
    <h2> 1.3 Statistical methods
    </h2>
    <p>In this module we distinguish between two types of problem, each drawing on a different element of statistics. We label these:
    </p>
    <ol class="A">
        <li>Descriptive Statistics</li>
        <li>Statistical Inference</li>
    </ol>
</body>

<body>
    <h3>1.3.1 Descriptive Statistics
    </h3>
    <p>Descriptive Statistics aid understanding of a set of data by providing measures that can be used to summarise the characteristics of the data.
    </p>
    <p>These measures can give an overview of, for example, the typical value of the data and how variable (spread out) the data is.
    </p>
</body>
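
<body>
    <p>As a minimal sketch of what such summary measures look like in R, the snippet below computes some common ones using base R functions. The exam marks and the variable names are invented purely for illustration and do not come from the module data.
    </p>
</body>

```r
# Hypothetical data: exam marks (out of 100) for ten students,
# invented for illustration only
marks <- c(52, 61, 47, 73, 68, 55, 80, 66, 59, 70)

# Measures of location (a "typical" value)
mean_mark   <- mean(marks)
median_mark <- median(marks)

# Measures of spread (how variable the data is)
sd_mark    <- sd(marks)      # sample standard deviation
range_mark <- range(marks)   # smallest and largest values
iqr_mark   <- IQR(marks)     # interquartile range

# A compact overview: minimum, quartiles, median, mean and maximum
summary(marks)
```

<body>
    <p>None of these measures says anything about students outside this group of ten; they only describe the data in hand.
    </p>
</body>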

<body>
    <h3>1.3.2 Statistical Inference
    </h3>
    <p>In many cases we will want to extrapolate conclusions from a <strong>sample</strong> and apply them to a wider <strong>population</strong>.
    </p>
    <p>In other cases we may wish to draw general conclusions on the basis of a limited amount of data.
    </p>
</body>

<body>
    <p>In both cases, because they are based on a subset of the whole population, the conclusions will be subject to uncertainty. 
    </p>
    <p>Inference is the branch of Statistics which attempts to quantify the uncertainty using probability and related measures. This will involve model testing and estimation.
    </p>
</body>
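
<body>
    <p>One way to picture "quantifying the uncertainty" is a confidence interval for a population mean. The sketch below is illustrative only: the heights are simulated rather than real, the variable names are hypothetical, and <code>t.test</code> is used here simply because it reports a 95% confidence interval alongside the estimate (its accompanying test of a zero mean is of no interest in this example).
    </p>
</body>

```r
# Fix the random seed so the simulated sample is reproducible
set.seed(1)

# Hypothetical sample: 30 heights (cm) simulated from a normal
# distribution; these are not real measurements
heights <- rnorm(30, mean = 170, sd = 8)

# One-sample t procedure: estimates the population mean and
# reports a 95% confidence interval by default
result <- t.test(heights)

estimate <- unname(result$estimate)  # the sample mean (point estimate)
ci       <- result$conf.int          # lower and upper interval limits

estimate
ci
```

<body>
    <p>The point estimate is a single number, while the interval expresses how far from that number the population mean could plausibly lie given only 30 observations.
    </p>
</body>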

<body>
    <h2>1.4 Population sampling
    </h2>
    <p>One very important aspect of statistics is the choice of the sample.
    </p>
    <p>When a population is too large to gather data on every one of its members, data is instead collected from a subset of the population, known as a sample.
    </p>
</body> 

<body>
    <p>For example, an opinion poll or a survey of television viewing would usually be conducted on only a subset of the population.
    </p>
    <p>This type of study is very common: it is usually not feasible to examine the entire population of interest, so information is collected on a representative sample and the sample results are used to infer results for the population.
    </p>
</body>
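
<body>
    <p>The idea of using a sample to stand in for a population can be sketched in R with the base function <code>sample</code>. Everything in this example is an assumption made for illustration: the "population" of weekly television viewing hours is simulated, and the population size, sample size and variable names are arbitrary.
    </p>
</body>

```r
# Fix the random seed so the example is reproducible
set.seed(42)

# Hypothetical population: weekly TV viewing hours for 10,000 people,
# simulated from a gamma distribution (not real survey data)
population <- rgamma(10000, shape = 4, scale = 4)

# Simple random sample of 200 people, drawn without replacement
survey <- sample(population, size = 200)

# In practice only the sample mean would be available to us;
# here we can also compute the population mean for comparison
pop_mean    <- mean(population)
sample_mean <- mean(survey)

c(population = pop_mean, sample = sample_mean)
```

<body>
    <p>Re-running the sampling step with a different seed gives a slightly different sample mean each time, which is exactly the uncertainty that statistical inference sets out to quantify.
    </p>
</body>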

<body>
    <h2>1.5 Methodologies: Descriptive and Inferential
    </h2>
    <p>Statisticians are concerned with the design of appropriate methods of data collection. The design of the experiment is of crucial importance if the analysis of the data is to yield as much correct and accurate information as possible.
    </p>
    <p>However, if we do not wish to extrapolate our conclusions to a wider population and only want to describe the data already collected, we are, technically, using descriptive statistics.
    </p>
</body>

<body>
    <p>Therefore, we can clearly identify the two different methodologies that we will follow in practice (A <strong>Descriptive Statistics</strong> and B <strong>Statistical Inference</strong>):
    </p>
    <img src="MATPMD1Chapter1Fig1.png" style="width:75%" >
</body>

<body>
    <h2>1.6 Statistics everywhere
    </h2>
    <p>The science of <strong>Statistics</strong> is applied in a vast array of disciplines and a wide variety of contexts, e.g. the Social Sciences.
    </p>
</body>