#**Python Pandas - Introduction**
Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its own data structures.

Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze.

*Python with Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.*

#Key Features of Pandas
1.   Fast and efficient DataFrame object with default and customized indexing.
2.   Tools for loading data into in-memory data objects from different file formats.
3.   Data alignment and integrated handling of missing data.
4.   Reshaping and pivoting of data sets.
5.   Label-based slicing, indexing and subsetting of large data sets.
6.   Columns from a data structure can be deleted or inserted.
7.   Group by data for aggregation and transformations.
8.   High performance merging and joining of data.
9.   Time Series functionality.
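
A few of these features can be sketched in a handful of lines. The example below is only an illustration: the `sales` and `managers` tables, their column names, and their values are invented here and do not come from any real data set. It shows missing-data handling, group-by aggregation, and a merge.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, None, 250.0, 80.0],  # None is stored as NaN (missing)
})

# Hypothetical lookup table to join against
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Asha", "Ben"],
})

# Group by region and aggregate; sum() skips missing values by default
totals = sales.groupby("region")["amount"].sum()
print(totals)

# Merge (join) the two tables on the shared "region" column
merged = sales.merge(managers, on="region")
print(merged)
```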

#Python Pandas - Environment Setup

The standard Python distribution doesn't come bundled with the Pandas module. A lightweight way to get it is to install Pandas with the popular Python package installer, ***pip.***

**pip install pandas**

If you install the Anaconda Python distribution, Pandas is installed by default.
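
One simple way to confirm that the installation worked is to import the library and print its version. This is a minimal check, not an official verification procedure; the version string you see depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed; print which version was found
print(pd.__version__)
```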

#Introduction to Data Structures

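Pandas has dealt with the following three data structures:

1. Series
2. DataFrame
3. Panel (deprecated in pandas 0.20 and removed in 0.25, so current releases provide only Series and DataFrame)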

These data structures are built on top of NumPy arrays, which makes them fast.

Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
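
To make the relationship between labels and data concrete, the sketch below builds a small Series and inspects its parts. The values and the labels `x`, `y`, `z` are arbitrary choices made for this illustration.

```python
import pandas as pd

# A Series pairs each value with an axis label
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(s.index)   # the axis labels, collectively the index
print(s.values)  # the underlying data as a NumPy array
print(s.dtype)   # the single data type shared by all values
```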

DataFrame is a two-dimensional labeled structure whose columns can hold different data types.
As an example, picture a table holding the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.
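
A sales-team table like the one described could be built as follows. This is a minimal sketch: the column names, people, ages, and ratings are all invented for illustration and are not the original table's contents.

```python
import pandas as pd

# Hypothetical sales-team data; every value here is invented for illustration
data = {
    "Name": ["Asha", "Ben", "Chen"],
    "Age": [32, 28, 41],
    "Rating": [4.2, 3.8, 4.6],
}
df = pd.DataFrame(data)

print(df)         # one row per person, one column per attribute
print(df.dtypes)  # each column keeps its own data type
print(df.shape)   # (number of rows, number of columns)
```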

In [1]:
import pandas as pd
import numpy as np

In [14]:
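# Signature: pandas.Series(data, index, dtype, copy)
# Create an empty Series; dtype is given explicitly because the default
# for an empty Series differs between pandas versions
s = pd.Series(dtype='float64')
print(s)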

Series([], dtype: float64)




In [15]:
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
dtype: object


In [7]:
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

100    a
101    b
102    c
103    d
dtype: object


Create a Series from dict:
A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (in insertion order on pandas 0.23+ with Python 3.6+; older versions sorted the keys). If an index is passed, the values in data corresponding to the labels in the index are pulled out, and labels with no matching key get NaN.

In [8]:
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64


Observe − Dictionary keys are used to construct index.
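
The order in which the keys end up in the index depends on the pandas and Python versions in use. A quick way to see what your installation does is to pass keys that are deliberately out of alphabetical order, as in the sketch below. On recent releases (pandas 0.23+ with Python 3.6+) the index follows the dict's insertion order.

```python
import pandas as pd

# Keys deliberately not in alphabetical order
data = {"b": 1.0, "c": 2.0, "a": 0.0}
s = pd.Series(data)

# Shows the order in which the keys were placed in the index
print(list(s.index))
```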

In [9]:
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
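
The label `d` has no matching key in the dict, so its value is NaN, which is how Pandas marks missing data. The sketch below shows a few common ways to detect or handle such entries; which one is appropriate depends on the analysis, and the fill value `0.0` is just an example.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(data, index=["b", "c", "d", "a"])

print(s.isna())       # True only where the value is missing
print(s.dropna())     # a copy without the missing entry
print(s.fillna(0.0))  # a copy with the missing entry replaced
```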


In [10]:
#Create a Series from Scalar
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

0    5
1    5
2    5
3    5
dtype: int64


In [16]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the element with label 'a' (the first element)
print(s['a'])

1


In [13]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the first three elements
print(s[:3])

a    1
b    2
c    3
dtype: int64
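
Square brackets accept both labels and positional slices, which can be ambiguous when the index itself contains integers. As a hedged sketch, the explicit accessors `.loc` (label-based) and `.iloc` (position-based) make the intent clear:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print(s.loc["a"])          # select by label
print(s.iloc[0])           # select by integer position
print(s.iloc[:3])          # first three elements by position
print(s.loc["a":"c"])      # label slice; both endpoints are included
print(s[["a", "c", "e"]])  # several labels at once
```

Note that a label slice such as `"a":"c"` includes the end label, unlike a positional slice, which stops before the end position.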
