# DSCI 100 Notes
**This document contains my notes for my DSCI 100 course, complete with code breakdowns and explanations.**

```r
suppressPackageStartupMessages({
    library(tidyverse)
    library(repr)
    library(tidymodels)
})

set.seed(123)
```

## General DSCI Info

Data science is the use of reproducible and auditable processes to get insight from data.

**Problem Types**

1. Classification: Predict a class/category for a new observation.
2. Regression: Predict a numerical value for a new observation.
3. Clustering: Find previously unknown unlabelled subgroups in data.
4. Inference: Estimate a population average or proportion from a representative sample.

**Question Types**

1. Descriptive: asks about summarized characteristics of data, no interpretation.
    - Ex: How many people live in each province in Canada?
2. Exploratory: asks whether there are patterns, trends or relationships within a single data set; often to develop a hypothesis for further studies.
    - Ex: Does political party voting change with indicators of wealth in a set of 2,000 people in Canada?
3. Predictive: a question that asks about predicting a *measurement* / *label* for observations. Focus on what things predict what outcome, NOT what causes the outcome.
    - Ex: What political party will a person vote for in the next election?
4. Inferential: A question that looks for patterns, trends or relationships in a single data set **and** a quantification of how applicable these findings are to the wider population. 
    - Ex: Does political party voting change with indicators of wealth for *all people* in Canada?


### Loading Data into R
- Load library: 
    - `library(tidyverse)`
- Use `read_csv()` with only the file path string as its argument if the file:
    - has a header row
    - uses `,` as the delimiter
    - does not have row names
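
As a minimal sketch of the default case, the snippet below writes a small comma-separated file with a header row to a temporary path and reads it back with `read_csv()`. The file contents and the names `tmp_path` and `provinces` are made up for illustration; `readr` is loaded directly here, but it is also attached by `library(tidyverse)`.

```r
library(readr)  # also attached by library(tidyverse)

# Hypothetical file: header row, comma delimiter, no row names
tmp_path <- tempfile(fileext = ".csv")
writeLines(c("province,population",
             "A,100",
             "B,250"), tmp_path)

# Only the file path is needed for a file in this shape
provinces <- read_csv(tmp_path, show_col_types = FALSE)
provinces
```

`show_col_types = FALSE` only silences the column-type message; it can be dropped without changing the result.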

### Filter & Select 
- Filter: filter rows based on values.
    - `filter(tbl, logicalStatementThatReturnsTrueOrFalse)`
- Select: choose columns by name.
    - `select(tbl, col_name1, col_name2)`
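
A short sketch of both verbs on a made-up table (the table `tbl` and its column names are hypothetical, chosen only for illustration):

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical example table
tbl <- tibble(
  name  = c("a", "b", "c", "d"),
  score = c(10, 55, 80, 42),
  group = c("x", "y", "x", "y")
)

# filter(): keep the rows where the condition is TRUE
high_scores <- filter(tbl, score > 50)

# select(): keep only the named columns
names_and_scores <- select(tbl, name, score)
```

Here `filter()` works on rows through a logical condition, while `select()` works on columns through their names.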

### Arrange & Slice
- Arrange: order observations up/down based on values in a column.
    - `arrange(tbl, desc(col_name))`
    - Ascending is the default; above we specify descending order with `desc()`.
- Slice: selects rows based on row number.
    - `slice(tbl, 1:10)`
    - Above, rows 1 through 10 inclusive were selected.
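
Combining the two is a common way to get the "top n" rows. The sketch below uses a made-up table (`tbl`, `id`, and `value` are hypothetical names):

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical example table
tbl <- tibble(
  id    = 1:5,
  value = c(3.2, 9.1, 0.4, 7.7, 5.0)
)

# Sort from largest to smallest value
sorted_desc <- arrange(tbl, desc(value))

# Keep rows 1 through 3 of the sorted table (the three largest values)
top_three <- slice(sorted_desc, 1:3)
```

Because `slice()` works on row positions, the order produced by `arrange()` determines which rows are kept.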

```r
random_data <- tibble(
  Col1 = 1:20,                                 # Row index 1 to 20
  Col2 = runif(20),                            # Uniform on [0, 1]
  Col3 = sample(letters, 20, replace = TRUE),  # Random lowercase letters
  Col4 = rpois(20, lambda = 2),                # Poisson with mean 2
  Col5 = rnorm(20, mean = 5, sd = 2),          # Normal with mean 5 and sd 2
  Col6 = runif(20, min = 10, max = 20),        # Uniform on [10, 20]
  Col7 = sample(1:100, 20, replace = TRUE)     # Random integers between 1 and 100
)

random_data
```



Col1,Col2,Col3,Col4,Col5,Col6,Col7
<int>,<dbl>,<chr>,<int>,<dbl>,<dbl>,<int>
1,0.28757752,y,1,6.790251,14.82902,23
2,0.78830514,y,1,6.756267,18.9035,79
3,0.40897692,i,1,6.643162,19.14438,85
4,0.8830174,c,2,6.377281,16.08735,37
5,0.94046728,h,1,6.107835,14.1069,8
6,0.0455565,z,4,4.876177,11.47095,51
7,0.52810549,g,0,4.388075,19.353,74
8,0.89241904,j,2,4.239058,13.01229,50
9,0.55143501,i,3,3.610586,10.60721,98
10,0.45661474,s,0,4.584165,19.47727,74
