## Introduction to Pandas
- **Pandas** is a Python library used for data manipulation and analysis.
- It provides two primary data structures:
  - **Series**: One-dimensional labeled array.
  - **DataFrame**: Two-dimensional labeled data structure (like a table).
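
As a rough illustration of the two structures listed above, the sketch below builds one of each. The weekday labels and weather values are made up for the example and are not part of the original notebook.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
temperatures = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"])

# A DataFrame: a table whose columns share a single row index.
weather = pd.DataFrame(
    {"temp_c": [21.5, 19.0, 23.2], "rain_mm": [0.0, 4.2, 0.5]},
    index=["Mon", "Tue", "Wed"],
)

print(temperatures.ndim)        # 1
print(weather.ndim)             # 2
print(weather.shape)            # (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
print(type(weather["temp_c"]))
```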

---

## Key Features of Pandas
1. **Data Manipulation:**
   - Easy handling of missing data.
   - Supports operations like filtering, grouping, merging, and reshaping data.


2. **Data Cleaning:**
   - Tools to handle missing or inconsistent data effectively.


3. **Data Analysis:**
   - Powerful aggregation and summary statistics.


4. **Integration:**
   - Works well with other Python libraries like NumPy, Matplotlib, and Scikit-learn.
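
The features listed above can be sketched on a small table. This is only a minimal example under assumed data: the `orders` and `managers` tables, their column names, and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical order data with one missing amount.
orders = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100.0, np.nan, 50.0, 200.0],
})

# Data cleaning: replace the missing amount with 0.
cleaned = orders.fillna({"amount": 0.0})

# Filtering: keep only rows whose amount is above 60.
large = cleaned[cleaned["amount"] > 60]

# Grouping and aggregation: total amount per region.
totals = cleaned.groupby("region")["amount"].sum()

# Merging: attach a (hypothetical) manager name to each order by region.
managers = pd.DataFrame({"region": ["East", "West"], "manager": ["Ana", "Ben"]})
merged = cleaned.merge(managers, on="region")

# Integration: hand the column to NumPy as a plain array.
amounts = cleaned["amount"].to_numpy()

print(totals)
print(merged)
```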

---

In [1]:
import numpy as np
import pandas as pd

In [2]:
pd.__version__

'1.4.2'

# Series in Pandas

## Definition:
- A **Series** in Pandas is a one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.).
- Each element in a Series is associated with an **index** that allows for easy access and manipulation of the data.

---

## Key Features of a Series:
1. **Labeled Index:** Each data value is associated with an index, which can be customized.
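2. **Homogeneous Data:** A Series has a single data type (`dtype`) that applies to all of its elements.
3. **Supports Multiple Data Types:** The dtype can be integer, float, string, or the generic `object` dtype, which pandas uses when values of mixed types are stored together.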
4. **Efficient Operations:** Built-in functions allow for efficient and fast operations on the data.
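
A short sketch of these features, reusing the country data that appears later in this notebook. The variable names `founded`, `mixed`, and `age_in_2000` are chosen only for this illustration.

```python
import pandas as pd

# Labeled index: each value is paired with a custom label.
founded = pd.Series([1776, 1867, 1821], index=["USA", "Canada", "Mexico"])

# Single dtype: all three values are stored as integers.
print(founded.dtype)

# Mixed Python types fall back to the generic object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object

# Efficient operations: arithmetic and comparisons apply to every element at once.
age_in_2000 = 2000 - founded
print(age_in_2000["USA"])       # 224
print(founded[founded > 1800])  # keeps Canada and Mexico
print(founded.idxmin())         # USA
```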


In [3]:
myindex = ['USA', 'Canada', 'Mexico']
mydata = [1776, 1867, 1821]

In [4]:
myser = pd.Series(data=mydata)  # With no index argument, pandas assigns a default integer index (0, 1, 2, ...)

In [5]:
myser

0    1776
1    1867
2    1821
dtype: int64

In [6]:
type(myser)

pandas.core.series.Series

In [7]:
myseries = pd.Series(data=mydata, index=myindex)
print(myseries)

USA       1776
Canada    1867
Mexico    1821
dtype: int64


In [8]:
type(myseries)

pandas.core.series.Series

## Creating a Series from a Dictionary

In [9]:
ages = {'Luffy': 20,
        'Zoro': 21,
        'Sanji': 21}

- Pandas automatically converts the dictionary keys into the index of the Series.
- The dictionary values become the data of the Series.

In [10]:
myser = pd.Series(ages)

In [11]:
myser

Luffy    20
Zoro     21
Sanji    21
dtype: int64

In [12]:
myser['Zoro']

21
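
Besides square-bracket access, a Series offers a few other lookup styles. The sketch below rebuilds the same `ages` data as a Series; the name `'Nami'` is a hypothetical missing label used only to show what happens when a label is absent.

```python
import pandas as pd

ages = pd.Series({"Luffy": 20, "Zoro": 21, "Sanji": 21})

# Label-based lookup, equivalent to ages["Zoro"].
print(ages.loc["Zoro"])      # 21

# Position-based lookup: the first element, regardless of its label.
print(ages.iloc[0])          # 20

# The `in` operator tests the index labels, not the values.
print("Nami" in ages)        # False

# .get() returns a default instead of raising KeyError for a missing label.
print(ages.get("Nami", -1))  # -1
```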

In [13]:
q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brail': 100, 'China': 500, 'India': 210, 'USA': 260}

In [14]:
sales_q1 = pd.Series(q1)
sales_q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [15]:
sales_q2 = pd.Series(q2)
sales_q2

Brail    100
China    500
India    210
USA      260
dtype: int64

In [16]:
# Access a value in the sales_q1 Series by its index label
sales_q1['Japan']

80

In [17]:
sales_q2.keys()

Index(['Brail', 'China', 'India', 'USA'], dtype='object')

## Performing Operations on Two Series

`result = sales_q1 + sales_q2`

- This operation performs an element-wise addition between the two Series.
- Pandas aligns values by index label. If a label exists in one Series but not in the other, the result for that label is NaN (Not a Number).
- Brail and Japan each appear in only one of the two Series, so their results are NaN.
- Because NaN is a floating-point value, the result has dtype float64 even though both inputs are int64.
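
One way to see in advance which labels will not align is to compare the two indexes directly. This is a minimal sketch, not part of the original notebook; it rebuilds the two sales Series so that it runs on its own, and the names `total` and `unmatched` are chosen for the illustration.

```python
import pandas as pd

sales_q1 = pd.Series({"Japan": 80, "China": 450, "India": 200, "USA": 250})
sales_q2 = pd.Series({"Brail": 100, "China": 500, "India": 210, "USA": 260})

# Labels that appear in exactly one of the two Series.
unmatched = sales_q1.index.symmetric_difference(sales_q2.index)
print(list(unmatched))     # ['Brail', 'Japan']

# After the addition, those same labels hold NaN.
total = sales_q1 + sales_q2
print(total.isna().sum())  # 2

# dropna() keeps only the labels that were present in both Series.
print(total.dropna())
```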

In [18]:
sales_q1 + sales_q2

Brail      NaN
China    950.0
India    410.0
Japan      NaN
USA      510.0
dtype: float64

`sales_q1.add(sales_q2, fill_value = 0)`

- The `.add()` method accepts a `fill_value` parameter for handling labels that are missing from one of the Series.
- If a label is missing from one Series, `fill_value` (here, 0) is used in its place before the addition.
- For the unmatched labels:
    - Japan: 80 + 0 = 80.0 (Japan is missing from sales_q2).
    - Brail: 0 + 100 = 100.0 (Brail is missing from sales_q1).


In [19]:
sales_q1.add(sales_q2, fill_value = 0)

Brail    100.0
China    950.0
India    410.0
Japan     80.0
USA      510.0
dtype: float64
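
The other arithmetic methods on a Series, such as `.sub()`, take the same `fill_value` parameter as `.add()`. The sketch below is an illustrative extension using the same sales data; the names `change` and `total` are chosen for the example.

```python
import pandas as pd

sales_q1 = pd.Series({"Japan": 80, "China": 450, "India": 200, "USA": 250})
sales_q2 = pd.Series({"Brail": 100, "China": 500, "India": 210, "USA": 260})

# Quarter-over-quarter change, treating a missing quarter as 0 sales.
change = sales_q2.sub(sales_q1, fill_value=0)
print(change["China"])  # 50.0
print(change["Japan"])  # -80.0

# With fill_value no NaN remains, so the total can be cast back to integers.
total = sales_q1.add(sales_q2, fill_value=0).astype("int64")
print(total)
```

Whether 0 is the right substitute depends on the data: it suits sales totals, where a missing label plausibly means no sales, but it could be misleading for measurements where a missing value is simply unknown.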