# Lesson 1: Introduction to Python Libraries for Data Analytics

This lesson introduces Python libraries and explains why they are essential
for working with real-world data. We will compare built-in libraries with
external libraries and understand how data analysis becomes easier using
specialized tools.

## Lesson Objectives

By the end of this lesson, you should be able to:

- Understand what Python libraries are
- Distinguish between built-in and external libraries
- Explain why libraries are critical for data analytics
- Load and inspect a dataset using Pandas


## What is a Python Library?

A Python library is a collection of reusable code (functions, classes, and
constants) that you import into a program instead of writing it yourself.

Instead of writing complex logic from scratch, libraries allow us to focus
on solving the actual problem. In data analytics, libraries handle tasks like
reading datasets, cleaning data, performing calculations, and creating plots.

Without libraries, analyzing large or messy datasets would be impractical.


## Built-in vs External Libraries

### Built-in Libraries
These libraries make up the Python standard library. They ship with Python
and can be imported without installing anything.

Examples:
- `math` – mathematical operations
- `csv` – reading and writing CSV files
- `statistics` – basic statistical calculations
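
As a small illustration of the built-in modules listed above, the sketch below uses `statistics` on a made-up list of salaries. The values are hypothetical and are not taken from any dataset used in this lesson.

```python
import statistics

# Hypothetical salary figures, used only for illustration.
salaries = [52000, 61000, 58000, 75000, 61000]

mean_salary = statistics.mean(salaries)      # arithmetic average
median_salary = statistics.median(salaries)  # middle value once sorted
mode_salary = statistics.mode(salaries)      # most frequent value

print(mean_salary, median_salary, mode_salary)
```

No installation step is needed here because `statistics` is part of the standard library.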

### External Libraries
External (third-party) libraries are not bundled with Python and must be
installed separately, typically with pip. In return they provide powerful
functionality for data analysis.

Examples:
- `numpy` – numerical computing
- `pandas` – data manipulation and analysis
- `matplotlib` – data visualization
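
Because external libraries are not guaranteed to be present, it can help to check what the current environment can import. The following is a minimal sketch that uses the standard-library function `importlib.util.find_spec`; the helper name `is_installed` is our own and is not part of any library.

```python
import importlib.util

# Packages named in this lesson; `math` is built in, the rest are external.
packages = ["math", "numpy", "pandas", "matplotlib"]


def is_installed(name):
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


for name in packages:
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

The output depends on your environment: `math` should always be reported as available, while the external packages appear as available only after they have been installed.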


In [1]:
import math

numbers = [4, 9, 16, 25]

# math.sqrt works on one number at a time, so we loop with a list comprehension
square_roots = [math.sqrt(n) for n in numbers]
square_roots


[2.0, 3.0, 4.0, 5.0]

Built-in libraries are useful, but they are limited when working with large
datasets. Reading files, filtering rows, or summarizing data quickly becomes
difficult using only built-in tools.

This is where external libraries like Pandas become essential.


In [2]:
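import csv

# newline='' lets the csv module handle line endings itself, as its docs recommend
with open('../data/sample_jobs.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # each row is a list of strings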


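FileNotFoundError: [Errno 2] No such file or directory: '../data/sample_jobs.csv'

This error means the relative path did not resolve from the notebook's
working directory. Point the path at wherever `sample_jobs.csv` is stored
before running the cell.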

### Limitations of using `csv`

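- Each row is returned as a plain list of strings, so numbers must be converted by hand
- Columns are not handled automatically; with `csv.reader`, values are accessed by position
- Filtering or summarizing requires writing explicit loops

For real data analytics tasks, this approach does not scale.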
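
To make these limitations concrete, here is a minimal sketch of filtering rows with only the `csv` module. It reads from an in-memory string rather than `sample_jobs.csv`, and the column names (`title`, `company`, `salary`) and values are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for a jobs file; the column names are assumptions.
raw = io.StringIO(
    "title,company,salary\n"
    "Data Analyst,Acme,55000\n"
    "Data Engineer,Globex,72000\n"
    "BI Developer,Initech,\n"
)

reader = csv.DictReader(raw)  # maps each row to a dict keyed by the header
high_paying = []
for row in reader:
    if not row["salary"]:  # missing values arrive as empty strings
        continue
    salary = int(row["salary"])  # every field is a string until converted
    if salary > 60000:
        high_paying.append(row["title"])

print(high_paying)
```

Even with `csv.DictReader` providing column names, the missing-value check, the type conversion, and the filter all have to be written by hand.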


In [None]:
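import pandas as pd

# Read the CSV into a DataFrame; the header row becomes the column names
df = pd.read_csv('../data/sample_jobs.csv')
df.head()  # preview the first five rows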


## Why Pandas is Widely Used in Data Analytics

Pandas provides a tabular structure called a DataFrame, which makes data
easy to inspect, filter, and manipulate.

With Pandas, we can:
- Access columns directly
- Handle missing values
- Perform aggregations
- Prepare data for visualization or modeling
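
The operations in this list can be sketched on a small hand-built DataFrame. The data below is hypothetical (the column names and values are assumptions, not the contents of `sample_jobs.csv`), and it is only one of several ways to perform each step.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
jobs = pd.DataFrame({
    "title": ["Data Analyst", "Data Engineer", "BI Developer", "Data Analyst"],
    "company": ["Acme", "Globex", "Initech", "Globex"],
    "salary": [55000, 72000, None, 61000],
})

titles = jobs["title"]                      # access a column directly
missing = int(jobs["salary"].isna().sum())  # count missing values
complete = jobs.dropna(subset=["salary"])   # drop rows without a salary
avg_by_title = complete.groupby("title")["salary"].mean()  # aggregate per title

print(missing)
print(avg_by_title)
```

Here the missing salary is dropped before aggregating; depending on the analysis, filling it with `fillna` may be more appropriate.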


In [None]:
df.info()  # column names, non-null counts, dtypes, and memory usage


## Common Python Libraries for Data Analytics

| Library | Primary Use |
|-------|------------|
| NumPy | Numerical and array operations |
| Pandas | Data cleaning and manipulation |
| Matplotlib | Basic plotting |
| Seaborn | Statistical visualizations |
| SciPy | Scientific computations |


## Installing External Libraries

External libraries can be installed using pip:

```bash
pip install pandas numpy matplotlib seaborn
```


### End of Lesson 1

In the next lesson, we will explore how Object-Oriented Programming (OOP)
concepts can be applied to organize and manage data-related code effectively.
