### Introduction
Pandas is an open-source Python library designed for data manipulation and analysis. The name is often read as shorthand for "Python data analysis".

- Revolves around two primary data structures: `Series` (1D) and `DataFrame` (2D).
- Built on top of NumPy; efficiently manages large datasets and offers tools for data cleaning, transformation, and analysis.
- Includes tools for working with time series data, such as date range generation and frequency conversion. For example, date or time columns can be converted
  to the pandas datetime type with `pd.to_datetime()`, or parsed while loading a CSV by passing `parse_dates` (for example, a list of column names) to `pd.read_csv()`.
- Seamlessly integrates with other Python libraries like NumPy, Matplotlib, and scikit-learn.
- Provides methods like `.dropna()` and `.fillna()` to handle missing values.
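
The date-handling and missing-value features listed above can be sketched as follows. This is a minimal illustration, not a recommended workflow: the CSV text and the column names `date` and `temp` are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only for illustration.
csv_text = """date,temp
2024-01-01,21.5
2024-01-02,
2024-01-03,19.0
"""

# Option 1: parse the date column while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Option 2: convert an existing string column after loading.
raw = pd.read_csv(io.StringIO(csv_text))
raw["date"] = pd.to_datetime(raw["date"])

# Missing values: drop incomplete rows, or fill them with a chosen value
# (here the column mean, which is just one possible choice).
dropped = df.dropna()
filled = df.fillna({"temp": df["temp"].mean()})

print(df.dtypes)
print(dropped)
print(filled)
```

Whether to drop or fill depends on the data; filling with the mean is only one of several reasonable strategies.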

Important facts to know:

- **DataFrame**: A two-dimensional data structure made of rows and columns, similar to an Excel spreadsheet.
- **pandas**: The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.
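
To make the rows-and-columns description concrete, here is a small sketch of building a `DataFrame` from a dictionary. The column names and values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column.
df = pd.DataFrame({
    "name": ["raj", "kumar", "malyala"],
    "age": [31, 28, 35],
})

print(df)
print(df.shape)         # (number of rows, number of columns)
print(type(df["age"]))  # selecting a single column yields a Series
```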

`What is Pandas Used for?`
- Reading and writing data in formats and sources such as CSV files, Excel files, and SQL databases.
- Cleaning and preparing data (handling missing values, filtering, removing duplicates).
- Merging, joining, and reshaping datasets.
- Performing statistical analysis and descriptive statistics.
- Visualizing data quickly.
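
Several of the uses above can be combined in one short sketch. The two tables, their column names, and the choice to fill a missing amount with `0.0` are all assumptions made for illustration. Plotting is left out so the example depends only on pandas.

```python
import io

import pandas as pd

# Hypothetical tables, for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer_id": [10, 11, 11, 12],
    "amount": [250.0, 100.0, 100.0, None],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Pune", "Delhi", "Chennai"],
})

# Cleaning: remove duplicate rows, then fill the missing amount with 0.
clean = orders.drop_duplicates().fillna({"amount": 0.0})

# Merging: attach each customer's city to their orders.
merged = clean.merge(customers, on="customer_id", how="left")

# Descriptive statistics for a numeric column.
stats = merged["amount"].describe()

# Writing and reading: round-trip through CSV text held in memory.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(merged)
print(stats)
```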

#### Pandas Basics

`Pandas Introduction`

Pandas is an open-source Python library used for data manipulation, analysis and cleaning. It provides fast and flexible tools to work with tabular data, similar to spreadsheets or SQL tables.

In [2]:
# installation
!pip install -q pandas

In [3]:
# import the library
import pandas as pd

In [8]:
# Data Structures in Pandas
# 1. Pandas Series
"""
A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).
The axis labels are collectively called the index. A Series can be built from in-memory data (a list, dict, or NumPy array) or taken as a
column of data loaded from existing storage such as a SQL database, a CSV file, or an Excel file.
"""
import pandas as pd
import numpy as np

s = pd.Series()
print('pandas series', s)
data = np.array(['raj', 'kumar', 'malyala'])
print('pandas series data:\n', pd.Series(data))

pandas series Series([], dtype: object)
pandas series data:
 0        raj
1      kumar
2    malyala
dtype: object


Key Features of Pandas Series:
- Supports integer-based and label-based indexing.
- Can hold any data type; values of mixed types are stored under the generic `object` dtype.
- Offers a variety of built-in methods for data manipulation and analysis.
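
The features above can be checked with a short sketch. The values and the labels `a`, `b`, `c` are arbitrary examples.

```python
import pandas as pd

# Mixed Python types fall back to the generic `object` dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)

# Homogeneous numeric data gets a numeric dtype instead.
nums = pd.Series([4, 8, 15], index=["a", "b", "c"])

print(nums.iloc[0])   # integer-position access
print(nums.loc["c"])  # label access
print(nums.sum(), nums.mean())  # two of the built-in methods
```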

In [9]:
# Creating a Pandas Series
import pandas as pd

data = [1, 2, 3, 4]
 
ser = pd.Series(data)
print(ser)

0    1
1    2
2    3
3    4
dtype: int64
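
Besides a plain list, a Series can also be created from a dictionary or given explicit labels. The sketch below uses made-up labels and the hypothetical name `values`.

```python
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a list with explicit labels and an optional name.
labeled = pd.Series([1, 2, 3, 4], index=["w", "x", "y", "z"], name="values")

print(from_dict)
print(labeled)
```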


In [14]:
# Accessing elements of Series

"""
- Position-based indexing: uses numerical positions, similar to lists in Python.
- Label-based indexing: uses the custom index labels assigned to elements.
"""
# 1. Position-based Indexing

import pandas as pd
import numpy as np

data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data)
print('original data series:\n', ser)
print('Position-based Indexing:\n', ser.iloc[:5])

original data series:
 0     g
1     e
2     e
3     k
4     s
5     f
6     o
7     r
8     g
9     e
10    e
11    k
12    s
dtype: object
Position-based Indexing:
 0    g
1    e
2    e
3    k
4    s
dtype: object


In [19]:
# label-based indexing

import pandas as pd
import numpy as np

data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data,index=[10,11,12,13,14,15,16,17,18,19,20,21,22])
print('Actual data: \n', ser)
print(ser.loc[16])  # look up the element whose label is 16

Actual data: 
 10    g
11    e
12    e
13    k
14    s
15    f
16    o
17    r
18    g
19    e
20    e
21    k
22    s
dtype: object
o
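
When the index labels are themselves integers, positions and labels are easy to confuse. A minimal sketch of keeping them apart with `.loc` (labels) and `.iloc` (positions), using a shortened version of the data above:

```python
import pandas as pd

ser = pd.Series(list("geeks"), index=[10, 11, 12, 13, 14])

by_label = ser.loc[12]        # element whose label is 12
by_position = ser.iloc[2]     # third element, counting from 0

label_slice = ser.loc[11:13]  # label slices include both endpoints
pos_slice = ser.iloc[1:3]     # position slices exclude the end

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```

Here `ser.loc[11:13]` returns three elements while `ser.iloc[1:3]` returns two, because label slices are inclusive of the end label.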
