# What is Pandas?

Pandas is a powerful open-source Python library used for data manipulation, analysis, and cleaning. It provides two primary data structures:

1. **Series**: One-dimensional labeled array.
2. **DataFrame**: Two-dimensional labeled data structure (like a table in a database or an Excel spreadsheet).

It is widely used in data science and analytics for tasks like:

1. Importing data from various sources (CSV and Excel files, SQL databases, etc.)
2. Cleaning and transforming data
3. Grouping and summarizing
4. Merging and joining datasets
5. Time series analysis
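
A few of these tasks can be sketched on a small in-memory table. The column names and values below are made up for illustration; real data would usually come from a reader such as `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sales records; one quantity is missing on purpose.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "qty": [3, None, 5, 2],
})
# Hypothetical lookup table used for the join.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Delhi"]})

# Cleaning: replace the missing quantity with 0.
sales["qty"] = sales["qty"].fillna(0)

# Grouping and summarizing: total quantity per store.
totals = sales.groupby("store", as_index=False)["qty"].sum()

# Merging: attach the city of each store.
report = totals.merge(stores, on="store")
print(report)
```

Each step returns a new labeled object, so the steps can be chained or inspected one at a time.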

---

# What is NumPy?

NumPy (Numerical Python) is a foundational Python library for numerical computing. It provides:

1. **ndarray**: N-dimensional array object
2. Fast operations on arrays, including element-wise operations, linear algebra, statistical operations, etc.

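It is known for speed and memory efficiency, especially with large arrays and matrices, because an `ndarray` holds elements of a single type and runs its operations in compiled code rather than in Python loops.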
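
As a minimal sketch of the operations listed above (the array values are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

total = a + b      # element-wise addition
scaled = a * 2     # the scalar is applied to every element
dot = a @ b        # inner product (linear algebra)
mean = b.mean()    # statistical reduction

# A 2-D array: every element shares one dtype.
m = np.array([[1, 2], [3, 4]])
print(total, scaled, dot, mean)
print(m.shape, m.dtype)
```

None of these lines needs an explicit Python loop; the looping happens inside NumPy.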

---

## **Key Differences Between Pandas and NumPy**

| Feature                   | **NumPy**                                       | **Pandas**                                          |
|---------------------------|--------------------------------------------------|-----------------------------------------------------|
| **Main Data Structure**   | `ndarray` (multi-dimensional array)             | `Series` (1D), `DataFrame` (2D)                     |
| **Use Case**              | Numerical computations                          | Data analysis and manipulation                      |
| **Data Types Supported**  | Homogeneous (all elements must be same type)    | Heterogeneous (each column can be a different type) |
| **Indexing**              | Positional (integers, slices, boolean masks)    | Label-based (`loc`) and positional (`iloc`)         |
| **Ease of Use**           | Low-level; requires more code for data handling | High-level; easier data wrangling and analysis      |
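| **Speed**                 | Faster for numerical operations                 | Slower for raw numerical work due to index and dtype overhead |
| **Missing Data Support**  | Limited (`np.nan` is a float; integer arrays have no missing-value marker) | Built-in support (e.g., `NaN` handling via `isna`, `fillna`, `dropna`) |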
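
Two rows of the table, data types and missing data, can be checked with a short sketch. The sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# NumPy: mixing ints and a string converts everything to one string dtype.
arr = np.array([1, 2, "three"])
print(arr.dtype.kind)   # 'U' means a Unicode string dtype

# Pandas: each column keeps its own dtype.
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})
print(df.dtypes)

# Missing data: pandas reductions skip NaN by default, NumPy's do not.
values = [1.0, np.nan, 3.0]
numpy_sum = np.array(values).sum()
pandas_sum = pd.Series(values).sum()
print(numpy_sum, pandas_sum)
```

NumPy does offer NaN-aware functions such as `np.nansum`, but they have to be chosen explicitly.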

---

### ***When to Use What?***

- **Use NumPy for**: Fast numerical operations, matrix algebra, or when working with homogeneous numeric data.
- **Use Pandas for**: Real-world data analysis involving labeled, heterogeneous, or missing data.
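
The two libraries are often combined rather than chosen between. The sketch below, on made-up data, hands the numeric columns of a DataFrame to NumPy with `to_numpy()` and stores the result back under a label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [3.0, 6.0], "y": [4.0, 8.0]})  # illustrative data

# Drop the labels and work on the raw 2-D array.
arr = df[["x", "y"]].to_numpy()
lengths = np.linalg.norm(arr, axis=1)   # row-wise Euclidean norm

# Put the result back under a column label.
df["length"] = lengths
print(df)
```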

---

## **The Primary Data Structures in Pandas Are:**

### 👉 Series

- A one-dimensional labeled array.
- Can hold any data type (integers, strings, floats, Python objects, etc.).
- Has both values and an index.
    ```python
    import pandas as pd

    s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
    print(s)
    ```
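
As a further sketch of the points above, a Series can also be built from a dict, and it accepts mixed Python objects. The names `prices` and `mixed` are illustrative.

```python
import pandas as pd

# A dict becomes a Series: keys are the index, values are the data.
prices = pd.Series({"pen": 1.5, "book": 12.0, "bag": 30.0})
print(prices.index.tolist())   # ['pen', 'book', 'bag']
print(prices["book"])          # 12.0

# Mixed Python objects are allowed; the dtype falls back to `object`.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)             # object
```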

### 👉 DataFrame

- A two-dimensional labeled data structure (like a table with rows and columns).
- Can contain different data types in different columns.
- Has both row and column indexing.
    ```python
    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]
    }
    df = pd.DataFrame(data)
    print(df)
    ```
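
Row and column indexing on the same table might look like this; it is a minimal sketch that reuses the data from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

ages = df["Age"]            # one column comes back as a Series
first_row = df.iloc[0]      # row by integer position
bob_age = df.loc[1, "Age"]  # row label 1, column label "Age"

print(df.shape)             # (3, 2)
print(df.columns.tolist())  # ['Name', 'Age']
print(bob_age)              # 30
```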

```python
import pandas as pd

# Create a Series; with no explicit index, pandas assigns a RangeIndex
sr = pd.Series([10, 20, 30, 40, 50])
sr
```

```text
0    10
1    20
2    30
3    40
4    50
dtype: int64
```

```python
sr.dtype    # dtype('int64')
sr.values   # array([10, 20, 30, 40, 50])
sr.index    # RangeIndex(start=0, stop=5, step=1)

sr.name               # None by default, so nothing is displayed
sr.name = 'calories'
sr.name               # 'calories'
```
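
One place the name matters: when a Series is turned into a DataFrame, the name becomes the column label. A small self-contained sketch:

```python
import pandas as pd

sr = pd.Series([10, 20, 30], name="calories")

# The Series name becomes the column label.
frame = sr.to_frame()
print(frame.columns.tolist())   # ['calories']

# rename() with a scalar returns a copy with a new name; sr is unchanged.
renamed = sr.rename("kcal")
print(renamed.name, sr.name)    # kcal calories
```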

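```python
# Indexing: with the default RangeIndex, sr[0] looks up the label 0
sr[0]   # np.int64(10)

# An integer slice is positional; the stop value is excluded
sr[0:2]
```

```text
0    10
1    20
Name: calories, dtype: int64
```

```python
sr[3]        # np.int64(40)

# iloc -> integer-position-based indexing
sr.iloc[3]   # np.int64(40)

# A list of positions returns a Series
sr.iloc[[1, 3, 4]]
```

```text
1    20
3    40
4    50
Name: calories, dtype: int64
```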
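
Because `iloc` works on positions, it follows the usual Python sequence rules. A short sketch on the same values:

```python
import pandas as pd

sr = pd.Series([10, 20, 30, 40, 50], name="calories")

last = sr.iloc[-1]          # negative positions count from the end
middle = sr.iloc[1:4]       # positions 1, 2, 3; the stop position is excluded
every_other = sr.iloc[::2]  # positions 0, 2, 4

print(last)                  # 50
print(middle.tolist())       # [20, 30, 40]
print(every_other.tolist())  # [10, 30, 50]
```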

```python
# Replace the default index with string labels
index = ['apple', 'banana', 'grapes', 'orange', 'strawberry']
sr.index = index
sr
```

```text
apple         10
banana        20
grapes        30
orange        40
strawberry    50
Name: calories, dtype: int64
```

```python
sr['grapes']       # np.int64(30)

# sr.iloc['grapes'] raises TypeError: iloc accepts only integer positions
sr.iloc[2]         # np.int64(30)

# loc -> label-based indexing
sr.loc['grapes']   # np.int64(30)
```
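
The difference between `loc` and `iloc` is easiest to see when the labels are themselves integers. The Series below is hypothetical and exists only to show the two lookups disagreeing.

```python
import pandas as pd

# Hypothetical Series whose labels are integers in a non-default order.
s = pd.Series(["x", "y", "z"], index=[2, 0, 1])

by_label = s.loc[0]      # the row labelled 0
by_position = s.iloc[0]  # the first row

print(by_label)      # y
print(by_position)   # x
```

Using `loc` or `iloc` explicitly avoids having to guess which interpretation plain `s[0]` will take.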

```python
# A list of labels selects exactly those labels
sr.loc[['apple', 'banana']]
```

```text
apple     10
banana    20
Name: calories, dtype: int64
```

```python
# In a label-based slice, both the start and stop labels are included
sr['apple':'orange']
```

```text
apple     10
banana    20
grapes    30
orange    40
Name: calories, dtype: int64
```
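
Label slices and position slices treat the stop value differently. A minimal side-by-side sketch, rebuilding the same Series so it runs on its own:

```python
import pandas as pd

sr = pd.Series(
    [10, 20, 30, 40, 50],
    index=["apple", "banana", "grapes", "orange", "strawberry"],
)

by_label = sr.loc["apple":"grapes"]  # the stop label is included
by_position = sr.iloc[0:2]           # the stop position is excluded

print(by_label.index.tolist())     # ['apple', 'banana', 'grapes']
print(by_position.index.tolist())  # ['apple', 'banana']
```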

```python
# The selections above return new objects; sr itself is unchanged
sr
```

```text
apple         10
banana        20
grapes        30
orange        40
strawberry    50
Name: calories, dtype: int64
```