# "Tutorial 02: Basic Terminologies In Statistics"
> "Some basic terms that you will encounter in statistics"

- toc: true 
- badges: true
- comments: true
- categories: [basic-stats]
- sticky_rank: 2
- hide: true
- search_exclude: true

# Data, Information and Knowledge

* **Data**:
    
    * Data is a collection of raw text, numbers and symbols that has no meaning on its own.
    
    * It, therefore, has to be processed or provided with a context to make it meaningful.
    
    * Example:
    
        * 161.2, 175.3, 166.4, 164.7, 169.3 (units in cm).
        
        * Cat, dog, gerbil, rabbit, cockatoo.

* **Information**:

    * Information is the result of processing data. It enables the processed data to be used in a context and have a meaning.
    
    * In simpler words, information is data that has meaning.

    * If we put information into an equation with data, it will look like this: *Data + Meaning = Information*
    
    * Example:
    
        * 161.2, 175.3, 166.4, 164.7, 169.3 are the heights (in cm) of the five tallest 15-year-old students in a class.

        * Cat, dog, gerbil, rabbit, cockatoo is a list of household pets.

* **Knowledge**:
    
    * Knowledge is what you know or learn by applying the given information.
    
    * If we put knowledge into an equation with information, it will look like this: *Information + Application or Use = Knowledge*
    
    * Example:
        
        * The tallest student is 175.3 cm.
        
        * A lion is not a household pet as it is not in the list, and it lives in the wild.

* So, to conclude: *Data* is a collection of facts. *Information* is how you understand those facts in context. *Knowledge* is learning something from the given information.
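
The progression from data to information to knowledge can be sketched in a few lines of Python using the height example above. This is only an illustration: the dictionary keys (`measurement`, `unit`, `values`) are names chosen for this sketch, not a standard schema.

```python
# Data: raw numbers with no context attached.
data = [161.2, 175.3, 166.4, 164.7, 169.3]

# Information: the same numbers plus the meaning given in the text.
# The dictionary keys are illustrative names chosen for this sketch.
information = {
    "measurement": "height of the five tallest 15-year-old students in a class",
    "unit": "cm",
    "values": data,
}

# Knowledge: something learned by using the information.
tallest = max(information["values"])
knowledge = f"The tallest student is {tallest} {information['unit']}."
print(knowledge)
```

Without the `measurement` and `unit` entries, `max(data)` would still return a number, but there would be no way to say what that number describes.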

# Individuals and Variables

* Individuals are people or objects included in a study.
    
    * e.g. five individuals could be five people, five records or five reports.
    
    
* A variable is a characteristic of the individual to be measured or observed.
    
    * e.g. age, time etc.
    
    
* Example: Millions of Americans rely on caffeine to get them up in the morning. The table below shows the nutritional content of some popular drinks at Ben's Beans coffee shop.

```python
#hide_input
import pandas as pd
from IPython.display import HTML


def multi_table(table_list):
    """Return an HTML table that shows the given tables side by side, one per cell.

    Each item must provide `_repr_html_()`, e.g. a pandas DataFrame or Styler.
    """
    return HTML(
        '<table><tr style="background-color:#121212;">'
        + ''.join('<td>' + table._repr_html_() + '</td>' for table in table_list)
        + '</tr></table>'
    )


df = pd.DataFrame(
    {
        "Drink": ["Brewed Coffee", "Caffe Latte", "Caffe Mocha", "Cappuccino", "Iced Brewed Coffee", "Chai Latte"],
        "Type": ["Hot", "Hot", "Hot", "Hot", "Cold", "Hot"],
        "Calories": [4, 100, 170, 60, 60, 120],
        "Sugar (g)": [0, 14, 27, 8, 15, 25],
        "Caffeine (mg)": [260, 75, 95, 75, 120, 60]
    }
)

# Eight empty, index-free tables act as padding cells before the drinks table.
# `Styler.hide(axis="index")` replaces `hide_index()`, which was removed in pandas 2.0.
padding = [pd.DataFrame().style.hide(axis="index") for _ in range(8)]
multi_table(padding + [df])
```

| Drink | Type | Calories | Sugar (g) | Caffeine (mg) |
|---|---|---|---|---|
| Brewed Coffee | Hot | 4 | 0 | 260 |
| Caffe Latte | Hot | 100 | 14 | 75 |
| Caffe Mocha | Hot | 170 | 27 | 95 |
| Cappuccino | Hot | 60 | 8 | 75 |
| Iced Brewed Coffee | Cold | 60 | 15 | 120 |
| Chai Latte | Hot | 120 | 25 | 60 |


1. What are the individuals in the data set?
    
   * Answer: The Ben's Beans drinks listed in the table (one drink per row).
   
   
2. What are the variables in the data set?
   
   * Answer: Type, Calories, Sugar (g) and Caffeine (mg).
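
As a rough sketch of the same idea in code, each drink can be stored as one record: the records are the individuals and the fields measured on each record are the variables. The example uses plain Python dictionaries rather than pandas so that it runs without extra packages; treating `Drink` as a label that identifies the individual (rather than as a variable) follows the answer above.

```python
# One dictionary per individual (drink); values copied from the Ben's Beans table.
drinks = [
    {"Drink": "Brewed Coffee", "Type": "Hot", "Calories": 4, "Sugar (g)": 0, "Caffeine (mg)": 260},
    {"Drink": "Caffe Latte", "Type": "Hot", "Calories": 100, "Sugar (g)": 14, "Caffeine (mg)": 75},
    {"Drink": "Caffe Mocha", "Type": "Hot", "Calories": 170, "Sugar (g)": 27, "Caffeine (mg)": 95},
    {"Drink": "Cappuccino", "Type": "Hot", "Calories": 60, "Sugar (g)": 8, "Caffeine (mg)": 75},
    {"Drink": "Iced Brewed Coffee", "Type": "Cold", "Calories": 60, "Sugar (g)": 15, "Caffeine (mg)": 120},
    {"Drink": "Chai Latte", "Type": "Hot", "Calories": 120, "Sugar (g)": 25, "Caffeine (mg)": 60},
]

# Individuals: the objects being studied, here identified by the drink name.
individuals = [row["Drink"] for row in drinks]

# Variables: the characteristics recorded for every individual.
# "Drink" is skipped because it only names the individual.
variables = [name for name in drinks[0] if name != "Drink"]

print(len(individuals), "individuals:", individuals)
print(len(variables), "variables:", variables)
```

In table terms, the individuals correspond to the rows and the variables to the columns.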

[freeCodeCamp.org](https://www.youtube.com/watch?v=xxpc-HPKN28&t=13s), [Khan Academy](https://www.khanacademy.org/math/ap-statistics/analyzing-categorical-ap/analyzing-one-categorical-variable/v/identifying-individuals-variables-and-categorical-variables-in-a-data-set)

Question 01:

* **Data**: The number 40 000 is a piece of data, as is the name Iqbal Ahmed. Without anything else to help us, these two items of data are meaningless.

* **Information**: If we now say that "Iqbal Ahmed is a teacher" and "$40 000 is a teacher's salary", the data is given meaning or context, and makes more sense to us.

* **Knowledge**: Knowledge builds on the information. Knowledge is "Iqbal Ahmed is a teacher and he earns $40 000 per year".



Question 02: 5, 10, 15, 20 are items of data. Explain how these could become information and what knowledge could be gained from them.