# Table of Contents

---

- What are some characteristics of x data?  
- What should I disregard?
- What tools (libraries) exist to help analyze the data w/ code?
- Patterns/Components of TS Data
- References

---

---

### 1. What are some characteristics of x data?

- **Mean** : **average**; the sum of the values divided by the number of values

- **Median** : **middle** value when ordered from least to greatest

- **Mode** : the value that occurs **most** often

- **Range** : largest value - smallest value; returns a scalar

- **Outliers** : not normal; very different from what's already there. An anomaly

- **Standard deviation** : $ \sigma $; the typical distance of the values from the mean
    - **variance** (spread) : $ \sigma^2 $; the standard deviation squared

- **Types of standard/common distributions** 
    - [5.2a | P 20/27, 5.2b | P 621/647, 5.2c | P 559/287, 5.2d | P 437/442, 5.2e | P A1/622, 5.2f | P 373/398]
    
- **Lookahead** : any knowledge of the future, i.e., finding out something about the future earlier than you ought to know it. Should be prevented. [5.1a | P 28/46, 5.1a | P 67/85]

- **Upsampling** : increase the frequency of data [5.1a | P 52/70]
    - Think : Low --> High; when I play pool, I first rack the balls, which increases the "frequency"/number of balls on the table
- **Downsampling** : reduce the frequency of data [5.1a | P 52/70]
    - Think : High --> Low; when I play pool, I hit the balls to reduce the "frequency"/number of balls on the table
    
- **Periodogram** : an estimate of how strongly each frequency is present in a TS; used to find its dominant periods
    - [5.1a | P 246/264]
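
The summary statistics listed at the start of this section (mean, median, mode, range, variance, standard deviation) can be computed with Python's built-in `statistics` module. This is a minimal sketch: the `values` list is made up, and the "more than 2 standard deviations from the mean" outlier rule is an assumption for illustration, not a rule taken from the sources.

```python
import statistics

# Hypothetical measurements; 40 is included as an obvious outlier.
values = [2, 4, 4, 5, 7, 9, 40]

mean = statistics.mean(values)            # sum of values / number of values
median = statistics.median(values)        # middle value after sorting
mode = statistics.mode(values)            # most frequent value
value_range = max(values) - min(values)   # largest - smallest, a scalar
variance = statistics.pvariance(values)   # population variance (sigma squared)
std_dev = statistics.pstdev(values)       # population standard deviation (sigma)

# Assumed rule of thumb: flag values more than 2 standard deviations from the mean.
outliers = [v for v in values if abs(v - mean) > 2 * std_dev]

print(mean, median, mode, value_range, variance, std_dev, outliers)
```

Note how the single outlier pulls the mean well above the median, which is one reason to look at both.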

---

#### 1.1 Table Headings for various datasets
- **Sensor(s)** : the instrument(s) (hardware, software, or device(s)) used to capture the measurements of some TS phenomenon

- **Dataset** : a collection of TS data - phenomena, saved measurements with timestamps
    
- **Observation** : a single point along both the x-axis (time) and y-axis (measurement), hence each observation contains a timestamp and some numerical value
    - Observations : more than one **observation**; typically a list (of entire dataset)
- **Phenomena** : the observed variable(s)
- **Timestamps** : the time at which the sensor captured a measurement of the phenomenon
- **Data Quality** : the condition in which the measurements are presented. Are there errors (e.g., misreadings, NaN, negative #s when all should be positive)?
    - How much data cleaning should we do?
    - Dataset Quality : the condition in which the dataset itself is presented. What file formats can we load with?
        
- **Measurement** : the numerical value of the observed phenomenon which is a real # (y-axis)
    - **Real #** : any possible # which includes
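        - **Natural** : counting #s; e.g., 1, 882, 4797, N, so only positive whole #s
        - **Integers** : **Natural** (or positive) #s along w/ 0 and the negative whole #s; e.g., -N, -4797, -882, -1, 0, 1, 882, 4797, N
        - **Rational** : can be written as a fraction of two integers, so fractions and terminating/repeating decimals
        - **Irrational** : cannot be written as a fraction of two integers; e.g., $ \pi $, $ \sqrt{2} $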

    - **Continuous** : 
        - A phenomenon whose value is **obtained by measuring** - can measure at any time
        - Any numeric value - **real #s**
            - *can be divided into smaller increments*
        - There are an **infinite number of possible values between any two values** [5.1g | P 24/28]
            - has no end
        - Ex :
            - temperature
            - weight
            - height
            - age
            - blood sugar level when checking diabetes
            - stock value/price per share
            - grade in course
        - Think : "If I was to count the #, how long would it take?"

    - **Discrete** : 
        - The phenomenon has a **countable number** of possible values - can count the specific #
        - Counts are nonnegative integers (so ONLY 0 and the **natural #s**)
            - *cannot be divided into smaller increments*
        - There are a **finite number of possible values between any two values** [5.1g | P 29/33]
            - has an end
        - Ex :
            - #people in a room : can't have 5.7 people in a room
            - #cars : can't have 3.8 cars
            - #calls in call center
            - #buildings on campus
            - #retail/sales transactions
            - #email advertisements from WeBuyBlack
            - #earthquakes in SoCal
            - #stocks
            - #heart beats when exercising
            - #students in class
            - #students in research lab
        - Think : "If I was to count the #, how long would it take?" and "number of..."
            
- **Frequency** : 
    - Refers to the time interval between the observations of a TS [5.1d]
    - ~Rate at which something happens or is repeated [5.1h]~
    - Returns a numerical value with a unit of time
    - Ex [5.1d]
    
- **Periodicity** : 
    - Observations occur at what unit of time (secs, mins, hrs, etc)
    - Returns the unit of time
    - Ex on [5.1a | P 240/258]

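- **Sampling** : recording a measurement of the phenomenon every so often (at some frequency)

- **Percent** : $ \% = (f / n) \times 100 $, where $ f $ is the count for one category and $ n $ is the total count over all categories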

- **Pattern** : see below
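
Several of the terms above (timestamps, frequency, data quality, percent) can be checked directly on a dataset. The sketch below uses pandas on a made-up sensor series; the readings, the 10-minute spacing, and the choice to treat NaN and negative values as "flagged" observations are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical sensor readings that should all be positive.
timestamps = pd.date_range("2024-01-01 00:00", periods=6, freq="10min")
dataset = pd.Series([21.5, 21.7, float("nan"), -3.0, 22.1, 22.0], index=timestamps)

# Frequency: the time interval between consecutive observations.
intervals = dataset.index.to_series().diff().dropna()
interval = intervals.iloc[0]                  # a numerical value with a unit of time
is_regular = (intervals == interval).all()    # True when every gap is the same

# Data quality: count missing and negative measurements.
n = len(dataset)
missing = int(dataset.isna().sum())
negative = int((dataset < 0).sum())

# Percent = (f / n) x 100, with f taken here as the count of flagged observations.
f = missing + negative
percent_flagged = f / n * 100

print(interval, is_regular, missing, negative, round(percent_flagged, 1))
```

A high `percent_flagged` is one rough signal for how much data cleaning the dataset needs.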

---

---
### 2. What should I disregard?
---

---

### 3. What tools (libraries) exist to help analyze the data w/ code?

1. [Python](https://docs.python.org/3/contents.html) : high-level, general-purpose programming language.
2. [Numpy](https://numpy.org/doc/stable/user/index.html)
3. [Scipy](https://scipy.github.io/devdocs/tutorial/stats.html)
4. [Pandas](https://pandas.pydata.org/docs/user_guide/index.html) : A data frame analysis package in Python. The name refers to “panel data”, which is what social scientists call time series data. Based on tables of data with row and column indices. Can index by time period, downsample, etc. [5.1a | P 31/49]
5. [Scikit-learn](https://scikit-learn.org/stable/user_guide.html)
6. [Matplotlib](https://matplotlib.org/stable/users/index.html)
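
As a rough illustration of the Pandas entry above (index by time period, downsample) and of upsampling/downsampling from section 1, here is a minimal sketch using `Series.resample`. The 15-minute series is made up, and filling the upsampled gaps by linear interpolation is just one possible choice.

```python
import pandas as pd

# Hypothetical readings taken every 15 minutes for 2 hours.
index = pd.date_range("2024-01-01 00:00", periods=8, freq="15min")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=index)

# Index by time period: select the first hour by slicing on timestamps.
first_hour = series.loc["2024-01-01 00:00":"2024-01-01 00:59"]

# Downsample (high -> low frequency): one mean value per hour.
hourly = series.resample("60min").mean()

# Upsample (low -> high frequency): one value every 5 minutes;
# the new timestamps are filled by linear interpolation.
five_min = series.resample("5min").interpolate(method="linear")

print(first_hour)
print(hourly)
print(five_min.head())
```

Downsampling needs an aggregation (here the mean), while upsampling needs a rule for inventing the values in between, so the two are not symmetric.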

---

---

### 4. Patterns/Components of TS Data

- [See mathematical representation](https://github.com/Brinkley97/time_series_analysis_basics/blob/main/overview.ipynb)

---

1. ***T*rend** : *T*ypical
    1. Exists when there is a long-term increase or decrease in the data & does not have to be linear. [5.1b]
    2. Shows the variation of data with time or the frequency of data. Can see how data increases or decreases over time as well as if it's stable. [5.1c]
    3. When a TS exhibits an upward or downward movement in the long run, it is said to have a general trend [5.1e | P 22/34] 
    4. Observed when there is an increasing or decreasing slope in the time series [5.1f | 5. Patterns in a TS]
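
A trend can be estimated in more than one way. The sketch below, on a made-up series with a known slope of 0.5, shows two common options: a least-squares line (for a linear trend) and a moving average (which also works when the trend is not linear). The series, the window of 12, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(60)
# Hypothetical series: upward trend (slope 0.5) plus random noise.
series = 5.0 + 0.5 * t + rng.normal(scale=1.0, size=t.size)

# Linear trend estimate: least-squares line through the data.
slope, intercept = np.polyfit(t, series, deg=1)

# Moving average: element i is the mean of the 12 observations
# series[i] ... series[i + 11], so each value uses only data up to
# its own window end (no lookahead past that point).
window = 12
kernel = np.ones(window) / window
moving_avg = np.convolve(series, kernel, mode="valid")  # 60 - 12 + 1 = 49 values

print(round(slope, 2), moving_avg[0] < moving_avg[-1])
```

A positive fitted slope and a rising moving average both point to an upward trend in this example.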
    
$\newline$

2. **Seasonal** : patterns are of fixed calendar based frequencies
    1. Occurs when a TS is affected by **seasonal factors** (ie : time of the year or the day of the week). Always of a fixed and known period. [5.1b]
    2. Used to find the variations which occur at **regular intervals** of time. [5.1c]
    3. Also see [5.1a | P 63/81]
    4. Manifests as **repetitive** and periodic variations in a TS [5.1e | P 26/38]
        - Might have a fixed period of variations
        - Observed within the same year and corresponds to annual divisions of time such as seasons, quarters, and periods of festivity and holidays
            - e.g., peaks and troughs in the monthly sales volume of seasonal goods such as Christmas gifts or seasonal clothing
        - The average period of seasonal changes is shorter than that of cyclical changes bc each seasonal period spans fewer observations
    5. Observed when there is a **distinct repeated pattern observed between regular intervals due to seasonal factors** [5.1f | 5. Patterns in a TS]
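
Because a seasonal pattern has a fixed and known period, a periodogram (see section 1) is one way to check what that period is. This is a minimal sketch with NumPy's FFT on a made-up series of 120 observations with a built-in period of 12; the data and variable names are assumptions, and real data usually needs detrending first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120                       # e.g., 10 years of monthly observations
t = np.arange(n)
# Hypothetical series: fixed period of 12 observations plus random noise.
series = 10.0 + 3.0 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=n)

# Periodogram: squared magnitude of the discrete Fourier transform
# of the de-meaned series, one value per frequency.
demeaned = series - series.mean()
power = np.abs(np.fft.rfft(demeaned)) ** 2 / n
freqs = np.fft.rfftfreq(n, d=1.0)   # cycles per observation

# Skip the zero frequency, then convert the strongest frequency to a period.
peak = np.argmax(power[1:]) + 1
dominant_period = 1.0 / freqs[peak]

print(round(dominant_period, 2))
```

For a cyclic pattern without a fixed period, the periodogram would typically show power smeared over a band of frequencies rather than one sharp peak.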

$\newline$

3. **Cyclic** : patterns are not of fixed calendar based frequencies [5.1f | 5. Patterns in a TS]
    1. Occurs when the data exhibit rises and falls that are **not of a fixed frequency** [5.1b]
    2. Oscillations (Os) in TS which last for more than a year. [5.1c]
        -  (Os) : Movement back and forth at a regular speed.
    3. Also see [5.1a | P 63/81]
    4. Are movements observed after every few units of time, but they occur less frequently than seasonal fluctuations [5.1e | P31/43]
        - Might not have a fixed period of variations
        - The average period of cyclical changes is longer (most commonly in years) than that of seasonal changes bc each cycle spans more observations
        - Manifests as repetitive crests and troughs
            - e.g., economic and business data often show cyclical changes that correspond to the usual business and macroeconomic cycles, such as periods of recession followed by periods of boom, separated by a few years
    5. Happens when the rise and fall pattern in the series does **not happen in fixed calendar-based intervals** [5.1f | 5. Patterns in a TS] 

$\newline$

4. **Irregularity** : Outliers
    1. Purely random and usually caused by unforeseeable circumstances. [5.1c]
    2. See [5.1a | P54/72]
    3. Unexpected variations : Are stochastic and cannot be framed in a mathematical model for a definitive future prediction [5.1e | P32/44]
        - This type of error is due to a lack of information about explanatory variables that can model these variations or due to the presence of random noise
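
One way to look at the irregular component is to remove an estimated trend and seasonal pattern and inspect what is left. The sketch below uses a simple additive decomposition (linear trend plus per-position seasonal means), which is an assumption chosen for illustration and not a method taken from the sources. The series is made up, with one irregular event injected at position 57, and the 3-standard-deviation threshold is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
period, n = 12, 120
t = np.arange(n)
# Hypothetical series: trend + seasonal pattern + small random noise.
series = 0.2 * t + 2.0 * np.sin(2 * np.pi * t / period) + rng.normal(scale=0.2, size=n)
series[57] += 6.0   # injected irregular event (hypothetical)

# 1. Trend: least-squares line.
slope, intercept = np.polyfit(t, series, deg=1)
trend = intercept + slope * t

# 2. Seasonal: mean of the detrended values at each position in the cycle.
detrended = series - trend
seasonal_means = np.array([detrended[i::period].mean() for i in range(period)])
seasonal = np.tile(seasonal_means, n // period)

# 3. Irregular: what remains after removing trend and seasonality.
irregular = series - trend - seasonal

# Flag residuals more than 3 standard deviations from zero (assumed threshold).
threshold = 3 * irregular.std()
outlier_positions = np.flatnonzero(np.abs(irregular) > threshold)

print(outlier_positions)
```

The flagged positions are candidates for a closer look, not proof of an error in the data.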

$\newline$

5. ***S*tationary** : *S*ame.  
    1. Statistical properties (e.g., mean and variance) remain the same anywhere in the series. [5.1c]
    2. See **ergodic** [5.1a | P 242/260]
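
A quick, informal way to see what "same anywhere in the series" means is to compare the mean and variance of the first and second half of a series. This is only a rough heuristic, not a formal statistical test; the helper `half_stats` and both series are made up for illustration.

```python
import numpy as np

def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    first, second = np.array_split(np.asarray(series, dtype=float), 2)
    return (first.mean(), first.var()), (second.mean(), second.var())

rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=400)   # same mean and variance throughout
trending = noise + 0.05 * np.arange(400)           # the mean drifts upward over time

(noise_m1, noise_v1), (noise_m2, noise_v2) = half_stats(noise)
(trend_m1, trend_v1), (trend_m2, trend_v2) = half_stats(trending)

# A small shift suggests a stable mean; a large shift suggests non-stationarity.
noise_mean_shift = abs(noise_m2 - noise_m1)
trend_mean_shift = abs(trend_m2 - trend_m1)

print(round(noise_mean_shift, 2), round(trend_mean_shift, 2))
```

The noise series shows almost no shift in its mean between halves, while the trending series shows a large one, so only the first behaves like a stationary series here.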
    
---

---

### 5. References

----
#### 1. Sources

a. Book : [Practical TSA - Prediction with Statistics and Machine Learning](https://www.oreilly.com/library/view/practical-time-series/9781492041641/ch01.html#:~:text=One%20of%20the%20pioneers%20of,how%20and%20when%20to%20trade.) by Aileen Nielsen

b. Book : [Forecasting: Principles and Practice 3rd Ed](https://otexts.com/fpp3/) by Rob J Hyndman and George Athanasopoulos

c. Article : [Understanding TSA in Python](https://www.simplilearn.com/tutorials/python-tutorial/time-series-analysis-in-python#what_are_the_different_components_of_time_series_analysis) by Simplilearn

d. Online Glossary : [Glossary of Statistical Terms : PERIODICITY - IMF](https://stats.oecd.org/glossary/detail.asp?ID=2041) by OECD

e. Book : [Practical TSA](https://www.packtpub.com/product/practical-time-series-analysis/9781788290227) by Dr. Avishek Pal, Dr. PKS Prakash

f. Website : [TSA in Python – A Comprehensive Guide with Examples](https://www.machinelearningplus.com/time-series/time-series-analysis-python/) by Selva Prabhakaran

g. Book : [Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries](https://statisticsbyjim.com/basics/introduction-statistics-intuitive-guide/) by Jim Frost

h. Online Glossary : [Glossary of Statistical Terms : FREQUENCY](https://stats.oecd.org/glossary/detail.asp?ID=3655) by OECD

---

---
#### 2. Further Explore

a. Book : Statistical Prediction Analysis by J. Aitchison & I.R. Dunsmore

b. Book : Statistical Inference 2nd Ed by George Casella & Roger L. Berger

c. Book : Statistical Decision Theory and Bayesian Analysis 2nd Ed by James O. Berger

d. Book : Mathematical Statistics with Applications 8th Ed by Irwin Miller & Marylees Miller

e. Book : Mathematical Statistics and Data Analysis 3rd Ed by John A. Rice

f. Book : Bayesian Statistics An Introduction by Peter M. Lee

g. Paper : [On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer's Sunspot Numbers](https://www.jstor.org/stable/91170) by G. Udny Yule
   - Udny Yule’s seminal paper, one of the first applications of autoregressive moving average analysis to real data, illustrates a way to remove the assumption of periodicity from analysis of a putatively periodic phenomenon. [5.1a | P 14/32]

h. Paper : [Understanding the Lomb-Scargle Periodogram](https://arxiv.org/abs/1703.09824) by Jacob T. VanderPlas
   - This expansive article provides an intuitive understanding of periodicity estimators generally and the Lomb-Scargle method in particular. [5.1a | P 258/276]

i. Book : [5.1b]
   - Practical examples of how to use dynamic regression to supplement traditional statistical forecasting models with a series of alternative models for seasonality when SARIMA is not expected to be a good fit—usually because the periodicity is too complex or the periods are too long relative to the amount of data or computing resources available. [5.1a | P 400/418]