# Introduction to Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

It combines elements of statistics, computer science, and domain expertise to solve real-world problems.

This notebook covers the fundamentals of data science in a structured, slide-based format:

- What data science is
- Why it matters
- How it is used in practice

$$new_page$$

## The Data Science Lifecycle

The data science process typically follows a structured lifecycle:

1. Problem Definition
2. Data Collection
3. Data Cleaning
4. Exploratory Data Analysis (EDA)
5. Modeling
6. Evaluation
7. Deployment

The process is iterative: new insights at any stage may require revisiting earlier steps.

This lifecycle helps ensure that insights are reliable, reproducible, and actionable.
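The lifecycle above can be sketched end to end on a toy problem. This is a minimal illustration, not a prescribed workflow: the data is synthetic, the column names `hours` and `score` are hypothetical, and a straight-line fit stands in for whatever model a real project would use. Problem definition and deployment appear only as comments because they are not code steps here.

```python
import numpy as np
import pandas as pd

# 1. Problem definition (assumed for illustration): predict a score from hours studied.

# 2. Data collection: synthetic data stands in for a real source.
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=50)
score = 5.0 * hours + 40.0 + rng.normal(0, 2.0, size=50)
raw = pd.DataFrame({"hours": hours, "score": score})
raw.loc[3, "score"] = np.nan  # simulate one missing measurement

# 3. Data cleaning: drop rows with missing values.
clean = raw.dropna().reset_index(drop=True)

# 4. EDA: check how strongly the two columns move together.
correlation = clean["hours"].corr(clean["score"])

# 5. Modeling: fit a straight line by least squares on the first 40 rows.
train, test = clean.iloc[:40], clean.iloc[40:]
slope, intercept = np.polyfit(train["hours"].to_numpy(), train["score"].to_numpy(), deg=1)

# 6. Evaluation: mean absolute error on the held-out rows.
predicted = slope * test["hours"] + intercept
mae = (predicted - test["score"]).abs().mean()

# 7. Deployment is out of scope for this sketch.
print(f"rows kept: {len(clean)}, correlation: {correlation:.2f}")
print(f"fitted line: score = {slope:.2f} * hours + {intercept:.2f}, MAE = {mae:.2f}")
```

If the evaluation step showed a poor fit, the natural next move would be to return to cleaning or EDA, which is the iteration the lifecycle describes.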

$$new_page$$

## Data Types and Sources

Data can be broadly categorized into:

- **Structured data** (tables, databases, spreadsheets)
- **Semi-structured data** (JSON, XML, logs)
- **Unstructured data** (text, images, audio, video)

Common data sources include:
- Databases
- APIs
- Web scraping
- Sensors and IoT devices
- User-generated content

Understand where your data comes from before analysis begins: the source shapes its format, quality, and potential biases.
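To make the three categories concrete, the sketch below loads one small example of each. The inline CSV and JSON strings, the field names, and the sample sentence are all made up for illustration; in practice the data would come from one of the sources listed above.

```python
import io
import json

import pandas as pd

# Structured: CSV text maps directly onto rows and columns.
csv_text = "id,name,age\n1,Ada,36\n2,Grace,45\n"
structured = pd.read_csv(io.StringIO(csv_text))

# Semi-structured: JSON records can be nested and may omit fields.
json_text = (
    '[{"id": 1, "user": {"name": "Ada"}, "tags": ["admin"]},'
    ' {"id": 2, "user": {"name": "Grace"}}]'
)
records = json.loads(json_text)
semi_structured = pd.json_normalize(records)  # flattens "user" into "user.name"

# Unstructured: free text has no schema, so any structure must be derived.
text = "Data science combines statistics and computing."
word_count = len(text.split())

print(structured)
print(semi_structured)
print(f"word count: {word_count}")
```

Note that the second JSON record has no `tags` field, so the flattened table holds a missing value there. Gaps like this are typical of semi-structured sources.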

$$new_page$$

## Exploratory Data Analysis (EDA)

EDA is the process of analyzing datasets to summarize their main characteristics.

Typical EDA tasks include:
- Understanding distributions
- Identifying missing values
- Detecting outliers
- Exploring relationships between variables
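The tasks above map onto a few pandas calls. The sketch below uses a small hypothetical dataset (the column names and values are invented), and it flags outliers with the common 1.5 × IQR rule, which is one convention among several.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one missing value and one extreme height.
df = pd.DataFrame({
    "height_cm": [160.0, 165.0, 170.0, 175.0, 180.0, 250.0],
    "weight_kg": [55.0, 60.0, np.nan, 72.0, 80.0, 85.0],
})

# Distributions: count, mean, standard deviation, and quartiles per column.
summary = df.describe()

# Missing values: number of NaN entries per column.
missing = df.isna().sum()

# Outliers: values more than 1.5 * IQR outside the quartiles.
q1 = df["height_cm"].quantile(0.25)
q3 = df["height_cm"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["height_cm"] < q1 - 1.5 * iqr) | (df["height_cm"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Relationships: pairwise Pearson correlation between numeric columns.
corr = df.corr()

print(summary)
print(missing)
print(outliers)
print(corr)
```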

EDA often uses visualizations such as:
- Histograms
- Box plots
- Scatter plots

This step guides modeling decisions and helps catch faulty assumptions before they reach a model.
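The three plot types mentioned above can be drawn side by side with Matplotlib. This is a minimal sketch on synthetic data; the variables `x` and `y` are invented, and the choice of Matplotlib is an assumption, since the notebook does not name a plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=50.0, scale=10.0, size=200)
y = 2.0 * x + rng.normal(scale=15.0, size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: shape of a single variable's distribution.
axes[0].hist(x, bins=10)
axes[0].set_title("Histogram of x")

# Box plot: median, quartiles, and points beyond the whiskers.
axes[1].boxplot(x)
axes[1].set_title("Box plot of x")

# Scatter plot: relationship between two variables.
axes[2].scatter(x, y, s=10)
axes[2].set_title("x vs. y")

fig.tight_layout()
```

In a notebook the figure renders inline; in a script, `fig.savefig(...)` or `plt.show()` would be needed to see it.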

$$new_page$$

## Example: Simple Python Code

Below is a basic Python example demonstrating data loading and inspection.

```python
import pandas as pd

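# Load the dataset ("data.csv" is a placeholder path)
df = pd.read_csv("data.csv")

# Inspect the first five rows, then summary statistics for the numeric columns
print(df.head())
print(df.describe())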