# Pandas: DataFrame and Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
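As a minimal sketch of the two structures described above, the snippet below builds one Series and one DataFrame. The names and values (`prices`, `product`, `units_sold`, and so on) are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels (hypothetical data)
prices = pd.Series([101.5, 102.0, 99.8], index=["Mon", "Tue", "Wed"], name="price")

# A DataFrame: columns can hold different dtypes and share one row index
df = pd.DataFrame(
    {"product": ["pen", "notebook"], "units_sold": [120, 45], "in_stock": [True, False]}
)

print(prices["Tue"])  # look up a value by its label
print(df.dtypes)      # one dtype per column

# Size-mutable: a new column can be added in place
df["unit_price"] = [1.5, 3.0]
print(df.shape)

# Each column of a DataFrame is itself a Series
print(type(df["units_sold"]))
```

Selecting a single column returns a Series, which is why most Series operations carry over to DataFrame columns.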

- Key Features  
    a. Works seamlessly with structured data formats like CSV and Excel  
    b. Handles missing values easily  
    c. Built on NumPy for fast computation
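
The features listed above can be sketched in a few lines. The CSV text here is invented for illustration and is read from memory rather than from a file; filling with the column mean is only one possible way to treat a missing value.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content; the second row has an empty "score" field
csv_text = "name,score\nAsha,90\nBen,\nChen,70\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df["score"].isna().sum())  # number of missing scores
print(df["score"].mean())        # missing values are skipped by default

# Fill the gap with the column mean (one possible strategy, not the only one)
filled = df["score"].fillna(df["score"].mean())

# The column's values are available as a NumPy array
arr = filled.to_numpy()
print(type(arr), arr.dtype)
```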
  
- Why use Pandas?  
    a. Performance: Vectorized operations handle millions of rows efficiently when the data fits in memory  
    b. Ease of Use: Beginner-friendly syntax for cleaning and transforming data  
    c. Integration: Works well with libraries like Matplotlib and scikit-learn  

- Real life examples of Pandas in Action  
    a. Finance - Analyzing time-series data like stock prices to identify market trends  
    b. Retail - Tracking inventory and finding the most sold products in a store  
    c. Healthcare - Analyzing patient records and outcomes from clinical trials

Overlap between NumPy and Pandas  
NumPy: ndarray → can be 1D (vector), 2D (matrix), or nD (tensor).  
Pandas:  
Series → a 1D labeled array.  
DataFrame → a 2D labeled array (comparable to a matrix with named rows and columns).  
So why use Pandas instead of sticking to NumPy?

1. It adds labels (row and column names, not just integer positions).
2. Handles heterogeneous data (different types per column).
3. Provides high-level operations (filter, group, merge, pivot) that NumPy doesn’t.
    - NumPy = low-level numerical operations (fast, vectorized).
    - Pandas = high-level data analysis operations (built on top of NumPy).
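
The three points above can be illustrated with a small sketch. The sales records and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame(
    {
        "store": ["north", "south", "north", "south"],
        "product": ["pen", "pen", "notebook", "notebook"],
        "units": [10, 4, 7, 12],
    }
)

# 1. Labels: select a column by name rather than by position
units = df["units"]

# 2. Heterogeneous data: text and integer columns side by side
print(df.dtypes)

# 3. High-level operations: total units per store in one call
totals = df.groupby("store")["units"].sum()
print(totals)

# The numeric data is still reachable as a NumPy ndarray
print(isinstance(units.to_numpy(), np.ndarray))
```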

You can implement filtering, merging, and pivot-like reshaping in NumPy, but Pandas makes these operations more expressive, less error-prone, and directly tied to column and row labels.
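
To make the comparison concrete, here is the same filter written both ways. The array contents and the column names `product_id` and `units_sold` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 0 is a product id, column 1 is units sold
arr = np.array([[1, 120], [2, 45], [3, 300]])

# NumPy: the reader must remember that column 1 means "units sold"
np_result = arr[arr[:, 1] > 100]

# Pandas: the same filter, expressed through column names
df = pd.DataFrame(arr, columns=["product_id", "units_sold"])
pd_result = df[df["units_sold"] > 100]

print(np_result)
print(pd_result)
```

Both versions keep the same rows. The Pandas version also keeps the original row labels, so each surviving row can be traced back to its position in the source data.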

Key Differences between Data Manipulation and Data Analysis

In [4]:
# Run this cell to print a table comparing data manipulation and data analysis
import pandas as pd
from tabulate import tabulate  # third-party package, installed separately from pandas

# Each dict is one row of the table; its keys become the column names
data = [
    {'Data Manipulation': 'Preparing and Cleaning Data', 'Data Analysis': 'Extracting insights from prepared data'},
    {'Data Manipulation': 'Organize and structure raw data', 'Data Analysis': 'Find Patterns, trends and solve problems'},
    {'Data Manipulation': 'Fixing errors in student grade sheet', 'Data Analysis': 'Analyzing which student scored the highest'},
]
# Label the rows instead of using the default 0, 1, 2 index
df = pd.DataFrame(data, index=['Focus', 'Goal', 'Example'])
print(tabulate(df, headers="keys", tablefmt="fancy_grid"))

╒═════════╤══════════════════════════════════════╤════════════════════════════════════════════╕
│         │ Data Manipulation                    │ Data Analysis                              │
╞═════════╪══════════════════════════════════════╪════════════════════════════════════════════╡
│ Focus   │ Preparing and Cleaning Data          │ Extracting insights from prepared data     │
├─────────┼──────────────────────────────────────┼────────────────────────────────────────────┤
│ Goal    │ Organize and structure raw data      │ Find Patterns, trends and solve problems   │
├─────────┼──────────────────────────────────────┼────────────────────────────────────────────┤
│ Example │ Fixing errors in student grade sheet │ Analyzing which student scored the highest │
╘═════════╧══════════════════════════════════════╧════════════════════════════════════════════╛
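
The "Example" row of the table can be sketched in code. The grade sheet below is hypothetical, and dropping the row with no score is just one possible cleaning choice.

```python
import pandas as pd

# Hypothetical grade sheet with two problems:
# one score is missing and one is stored as text
grades = pd.DataFrame(
    {"student": ["Asha", "Ben", "Chen"], "score": [88, None, "93"]}
)

# Data manipulation: convert scores to numbers and drop the row with no score
grades["score"] = pd.to_numeric(grades["score"])
clean = grades.dropna(subset=["score"])

# Data analysis: find the student with the highest score
top = clean.loc[clean["score"].idxmax(), "student"]
print(top)
```

The first two steps only prepare the data. The insight comes from the final lookup, which would give a wrong or failing result if it ran on the uncleaned column.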
