# Statistics and Mathematics in Data Science

**Prepared by:** Timothy Jonah E. Borromeo<br>
<b><u><i>Introduction to Data Science</i></u></b><br><br>

## Vital Role of Statistics and Mathematics in Data Science

Data Science is a multifaceted discipline, encompassing aspects of statistics, computer science, and domain knowledge. Statistics and Mathematics serve as the foundational pillars of Data Science, providing a rigorous framework for making sense of data, ensuring that insights are derived systematically and accurately. <br>

They enable data scientists to quantify uncertainty and variability through `probability theory` and `statistical inference`, allowing for rigorous `hypothesis testing` and robust model building. Mathematical principles such as `calculus` and `linear algebra` are essential for developing `optimization` algorithms and understanding the inner workings of `machine learning` models. Moreover, `descriptive` and `inferential statistics` facilitate data summarization, visualization, and the reduction of complex, high-dimensional data into more manageable forms.

## What types of <i>math</i> do Data Scientists need to know?

Data Scientists utilize mathematics to:
- Understand and use Machine Learning algorithms
- Analyze datasets from various sources
- Identify patterns in data
- Forecast trends and growth

> Data Scientists utilize mathematical functions to perform <u>data analysis</u> and apply <u>machine learning</u> techniques such as clustering, regression, and classification.

<!-- Data Scientists use three main types of maths — <b><u>Linear Algebra, Calculus, and Statistics</u></b> -->
## Key Concepts

### Core Statistical Foundations
- **Descriptive Statistics**
  * Descriptive statistics summarize and describe the main features of a dataset. They provide a "snapshot" of the data’s distribution, central tendency, and variability.
    - > **Measures of Central Tendency:**<br>
        Mean: Average value. Sensitive to outliers.<br>
        Median: Middle value when data is ordered. Robust to outliers.<br>
        Mode: Most frequent value. Useful for categorical data.
      
- **Inferential Statistics**
  * Inferential statistics generalize conclusions from a sample to a population, quantifying uncertainty through probability.
    - > Central Limit Theorem (CLT)<br>
      > Confidence Intervals<br>
      > Hypothesis Testing
- **Regression Analysis**
  * Regression models relationships between a dependent variable (y) and one or more independent variables (X).
    - > *Linear Regression*<br>
      > *Logistic Regression*<br>
      > *Regularization*
- **Bayesian Statistics**
  * Bayesian statistics updates beliefs (probabilities) as new evidence is observed, using Bayes’ theorem.
    - > *Prior Distribution*: Initial belief about a parameter (e.g., "30% of emails are spam").<br>
        *Likelihood*: Probability of observing the data given the parameter, $P(\text{data} \mid \theta)$.<br>
        *Posterior Distribution*: Updated belief after observing data, $P(\theta \mid \text{data})$.<br>
        *Markov Chain Monte Carlo (MCMC)*: Computational method to approximate complex posteriors.
- **Time Series Analysis**
  * Time series analysis models data points collected sequentially over time.
    - > *Trend*: Long-term upward/downward movement (e.g., rising global temperatures).<br>
        *Seasonality*: Regular patterns (e.g., holiday sales spikes).<br>
        *Stationarity*: Mean and variance constant over time (assumed by ARMA models; ARIMA reaches it by differencing the series).<br>
        *Autocorrelation*: Correlation of a series with its lagged values.

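The measures of central tendency listed under Descriptive Statistics can be illustrated with a minimal sketch using Python's standard `statistics` module. The salary figures are hypothetical and chosen only to show how a single outlier pulls the mean while leaving the median and mode unchanged.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is a deliberate outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

mean_salary = statistics.mean(salaries)      # average, pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle value of the sorted data
mode_salary = statistics.mode(salaries)      # most frequent value
std_salary = statistics.stdev(salaries)      # sample standard deviation (variability)

print(f"mean={mean_salary:.2f}  median={median_salary}  mode={mode_salary}")
print(f"sample standard deviation={std_salary:.2f}")
```

Here the mean lands well above every typical salary in the list, which is why the median is often the safer summary for skewed data.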

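The prior, likelihood, and posterior described under Bayesian Statistics can be tied together with a small worked sketch. The 30% spam prior comes from the example above; the two likelihood values (how often a given word appears in spam and in non-spam email) are assumptions made up for illustration.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' theorem for a binary hypothesis: P(H | data)."""
    # Total probability of observing the data under both hypotheses.
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

prior_spam = 0.30          # prior belief: 30% of emails are spam
p_word_given_spam = 0.60   # assumed: P(word appears | spam)
p_word_given_ham = 0.05    # assumed: P(word appears | not spam)

posterior_spam = posterior(prior_spam, p_word_given_spam, p_word_given_ham)
print(f"P(spam | word) = {posterior_spam:.3f}")  # prints 0.837
```

Under these assumed numbers, seeing the word raises the belief that an email is spam from 30% to roughly 84%.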
### Core Mathematical Foundations
- **Probability**
  * Probability quantifies uncertainty by modeling the likelihood of events or outcomes. It provides the language to describe randomness in data.
    - > Distributions (normal, binomial, Poisson), Bayes’ theorem, random variables, expectation, variance, covariance. 
- **Linear Algebra**
  * Linear algebra deals with vectors, matrices, and linear transformations. It underpins how data is structured and manipulated computationally.
    - > Vectors, matrices, eigenvalues, eigenvectors, matrix factorization (e.g., SVD, which underlies PCA).
- **Calculus**
  * Calculus studies rates of change (derivatives) and accumulation (integrals). It is essential for optimizing models and understanding how variables interact.
    - > Derivatives, integrals, gradients, optimization (e.g., gradient descent).
- **Discrete Mathematics**
  * Discrete math deals with countable, non-continuous structures. It is foundational for algorithm design and combinatorial analysis.
    - > Graph theory, combinatorics, logic.

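Gradient descent, mentioned under Calculus, can be sketched in a few lines. The loss function $f(w) = (w - 3)^2$, the learning rate, and the step count are all assumptions chosen for illustration; the point is only that repeatedly stepping against the derivative moves `w` toward the minimizer.

```python
def gradient(w):
    """Derivative of the hypothetical loss f(w) = (w - 3)**2."""
    return 2 * (w - 3)

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step in the direction opposite to the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_min = gradient_descent(start=0.0)
print(f"estimated minimizer: {w_min:.6f}")
```

The true minimizer of this loss is `w = 3`, so the estimate can be checked by hand. Real models apply the same update to many parameters at once, using a gradient vector in place of a single derivative.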
# Machine Learning Fundamentals

Machine Learning lies at the core of AI, enabling systems to learn from data and improve their predictive and classification performance over time. Building a strong foundation in mathematics and statistics gives you a deeper understanding of how data science and machine learning methods work.

## How they work together

Mathematics provides the language and tools for representing and manipulating data, while statistics provides the methods for analyzing and interpreting that data. In machine learning, these two areas are combined to build models that can learn from data and make predictions.

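As one possible illustration of this combination, the sketch below fits a straight line by ordinary least squares (the mathematical part) and then uses $R^2$ to judge how well the line explains the data (the statistical part). The data points are hypothetical, and plain Python is used so that each step stays visible.

```python
# Hypothetical observations: x could be years of experience, y a measured outcome.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares estimates for the line y = intercept + slope * x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R^2: share of the variance in y that the fitted line explains.
predictions = [intercept + slope * x for x in xs]
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f}  intercept={intercept:.2f}  R^2={r_squared:.4f}")
print(f"prediction for x=6: {intercept + slope * 6:.2f}")
```

The fitted line can then be used to predict unseen values, while $R^2$ indicates how much trust the fit deserves on the observed data.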
# Activity (Individual)

???

# References

[Mathematics and Data Science: Its Role and Relevance - pickl.ai](https://www.pickl.ai/blog/mathematics-and-data-science-its-role-and-relevance/)<br>
[The Vital Role of Statistics and Mathematics in Data Science and AI Engineering](https://www.linkedin.com/pulse/vital-role-statistics-mathematics-data-science-ai-abdul-qadir-4swzc/)<br>
[How much math is involved in Data Science?](https://www.multiverse.io/en-GB/blog/how-much-math-data-science)<br>
[Introduction to Mathematics and Statistics for Data Science and machine learning - medium.com](https://medium.com/@avikumart_/introduction-to-mathematics-and-statistics-for-data-science-and-machine-learning-2560403baf31)