# Background

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.

In 2008, developer Wes McKinney started developing Pandas because he needed a high-performance, flexible tool for data analysis.

Before Pandas, Python was mainly used for data munging and preparation and contributed very little to data analysis. Pandas solved this problem. With Pandas, we can accomplish five typical steps in processing and analyzing data, regardless of its origin: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.

## Key Features of Pandas

- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High-performance merging and joining of data.
- Time series functionality.
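Several of these features can be seen together in a short sketch: missing-data handling, group-by aggregation, and merging. The `sales` and `managers` tables, their columns, and their values are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical sales data; None marks a missing amount (stored as NaN)
sales = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south'],
    'amount': [100.0, None, 250.0, 80.0],
})

# Integrated handling of missing data: replace NaN with 0
sales['amount'] = sales['amount'].fillna(0)

# Group by for aggregation: total amount per region
totals = sales.groupby('region')['amount'].sum()

# Merging: attach a (hypothetical) manager to each region
managers = pd.DataFrame({'region': ['north', 'south'], 'manager': ['Ali', 'Noor']})
merged = totals.reset_index().merge(managers, on='region')
print(merged)
```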

## Pandas Data Structures

Pandas deals with the following three data structures:

- Series
- DataFrame
- Panel

These data structures are built on top of NumPy arrays, which makes them fast.

#### Dimension & Description
The best way to think of these data structures is that a higher-dimensional data structure is a container of its lower-dimensional data structure. For example, a DataFrame is a container of Series, and a Panel is a container of DataFrames.

|Data Structure|Dimensions|Description|
|---|---|---|
|Series|1|1D labeled homogeneous array, size-immutable.|
|DataFrame|2|General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.|
|Panel|3|General 3D labeled, size-mutable array.|

Building and handling arrays of two or more dimensions is tedious, because the burden is on the user to consider the orientation of the data set when writing functions. Pandas data structures reduce this mental effort: their axes carry labels, so data can be addressed by name rather than by position.
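A minimal sketch of the container relationship: selecting a column from a DataFrame returns a Series, and the column is addressed by its label rather than by an axis position. The `Name` and `Age` columns are illustrative values.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Noor'], 'Age': [10, 45]})

# Selecting one column from a DataFrame returns a Series
age = df['Age']
print(type(age))

# Operations refer to the column by label, not by array orientation
print(df['Age'].mean())

# (rows, columns)
print(df.shape)
```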

#### Mutability
All Pandas data structures are value-mutable (their values can be changed), and all except Series are size-mutable. Series is size-immutable.
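A minimal sketch of the two kinds of mutability: changing a value in place, and inserting or deleting a DataFrame column, which changes the DataFrame's size. The column names are hypothetical.

```python
import pandas as pd

s = pd.Series([10, 23, 56])
s.loc[0] = 99                  # value-mutable: the element labelled 0 changes

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df['c'] = df['a'] + df['b']    # size-mutable: a new column is inserted
del df['a']                    # ... and an existing column is deleted

print(s.tolist())
print(df.columns.tolist())
```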

Note: DataFrame is the most widely used and one of the most important data structures. Panel is used much less; it was deprecated in pandas 0.20.0 and removed in pandas 0.25.0.

# Series
Series is a one-dimensional, array-like structure with homogeneous data. For example, the following Series is a collection of the integers 10, 23, 56, …

`10  23  56  17  52  61  73  90  26  72`

#### Key Points
- Homogeneous data
- Size-immutable
- Values of data are mutable
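Homogeneity means a Series has a single `dtype` shared by all of its values. The sketch below reuses the integers from the example above and shows which dtype pandas picks when the inputs are mixed.

```python
import pandas as pd

nums = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])
print(nums.dtype)    # one dtype for every value

mixed = pd.Series([10, 2.5])
print(mixed.dtype)   # the integer is upcast so both values share a float dtype

objs = pd.Series([10, 'a'])
print(objs.dtype)    # no common numeric type, so the generic object dtype is used
```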

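```python
import pandas as pd

# An empty Series. The dtype is passed explicitly because the default
# for an empty Series changed from float64 to object in pandas 2.0
s = pd.Series(dtype=float)
s
```

```
Series([], dtype: float64)
```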

## Creating a Series from an ndarray


```python
import pandas as pd
import numpy as np

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
# By default the index is range(n), where n = len(data): labels 0 to n-1
s
```

```
0    a
1    b
2    c
3    d
dtype: object
```
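The default index can be inspected through the `index` attribute. A minimal sketch: for a Series built without an explicit index, it is a `RangeIndex` covering positions 0 to n-1.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)

print(s.index)                                   # a RangeIndex from 0 to 3
print(list(s.index) == list(range(len(data))))   # the labels equal range(n)
```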

```python
# Passing the same array with a custom index
series2 = pd.Series(data, index=['a', 'b', 'c', 'd'])
series2
```

```
a    a
b    b
c    c
d    d
dtype: object
```

## Creating a Series from a Dictionary

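```python
dic = {'pen': 10, 'book': 200, 'copy': 50}
# Dictionary keys become the index labels; the values become the data
series3 = pd.Series(dic)
# dtype=str stores the values as strings instead of integers
d = pd.Series(dic, dtype=str)
series3
```

```
pen      10
book    200
copy     50
dtype: int64
```

#### Checking data type

```python
type(series3)
```

```
pandas.core.series.Series
```

```python
# A single element is a NumPy scalar, not a Python int
type(series3['book'])
```

```
numpy.int64
```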
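When an `index` is passed together with a dictionary, pandas takes the values for those keys in that order, and a key missing from the dictionary gets `NaN` (which turns the values into floats). The key `'pencil'` below is hypothetical, chosen only to show the missing case.

```python
import pandas as pd

dic = {'pen': 10, 'book': 200, 'copy': 50}
# 'pencil' is not a key in dic, so its value is missing (NaN)
series_sel = pd.Series(dic, index=['pen', 'copy', 'pencil'])
print(series_sel)
```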

## Creating a Series from a Scalar


```python
series4 = pd.Series(1)
series4
```

```
0    1
dtype: int64
```

```python
# The scalar is repeated for every label in the index
series5 = pd.Series(4, index=[0, 1, 2, 3])
series5
```

```
0    4
1    4
2    4
3    4
dtype: int64
```

### Accessing Data from a Series by Position

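```python
# iloc selects by integer position; s[0] also returns 'a' here
# because the default index labels equal the positions
s.iloc[0]
```

```
'a'
```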

Retrieving a range of elements by position:

```python
# Select the first two elements (positions 0 and 1); the stop position 2 is excluded
s[:2]
```

```
0    a
1    b
dtype: object
```

```python
s[:3]
```

```
0    a
1    b
2    c
dtype: object
```

```python
# s[:] returns all elements in the Series, the same as s
s[:]
```

```
0    a
1    b
2    c
3    d
dtype: object
```

```python
# Returns all elements from position 0 to the end
s[0:]
```

```
0    a
1    b
2    c
3    d
dtype: object
```

### Retrieving from the End with Negative Positions

```python
# Negative positions count from the end (-1 is the last element);
# s[-2:] returns the last two elements in their original order
s[-2:]
```

```
2    c
3    d
dtype: object
```
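The slice `s[-2:]` keeps the original order. To reverse the Series itself, a positional slice with a step of -1 can be used; a minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array(['a', 'b', 'c', 'd']))
# A step of -1 walks from the last element to the first;
# each value keeps its original index label
rev = s.iloc[::-1]
print(rev)
```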

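```python
# s[-1] raises a KeyError: with an integer index, [] treats -1 as a label,
# and there is no label -1. Use iloc to select the last element by position
s.iloc[-1]
```

```
'd'
```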
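The difference between labels and positions matters most when the index consists of integers that do not match the positions. A minimal sketch with a hypothetical index of 10, 20, 30, using `.loc` for labels and `.iloc` for positions:

```python
import pandas as pd

s = pd.Series(['x', 'y', 'z'], index=[10, 20, 30])

print(s.loc[10])     # label-based: the element labelled 10
print(s.iloc[0])     # position-based: the first element
print(s.iloc[-1])    # negative positions work with iloc
# s.loc[0] would raise a KeyError because no label 0 exists
```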

### Selecting by Label


```python
s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
s1['a']
```

```
1
```

```python
# Select multiple elements by passing a list of labels (hence the double brackets)
s1[['a', 'c', 'd']]
```

```
a    1
c    3
d    4
dtype: int64
```
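Labels can also be sliced. Unlike a position slice, a label slice includes its end label; a minimal sketch comparing the two on the same Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label slice: 'b' through 'd', both ends included
by_label = s1.loc['b':'d']
# Position slice: positions 1 and 2; position 3 is excluded
by_position = s1.iloc[1:3]

print(by_label.tolist())
print(by_position.tolist())
```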

### Dropping Elements from a Series

```python
s1.drop('a')
# drop() returns a new Series; the original s1 still contains 'a'
s1
```

```
a    1
b    2
c    3
d    4
e    5
dtype: int64
```

```python
s2 = s1.drop('a')
# drop() returns a new Series without the specified label
s2
```

```
b    2
c    3
d    4
e    5
dtype: int64
```
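`drop` also accepts a list of labels. A minimal sketch that drops two labels at once and keeps the change by reassigning the result:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# Drop several labels at once and keep the result
s1 = s1.drop(['a', 'e'])
print(s1)
```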

```python
# np.nan marks a missing value, so the integers are stored as float64
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
```

```
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
```
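Missing values are usually located and handled explicitly. A short sketch using standard Series methods on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

missing = s.isna().sum()   # number of missing values
cleaned = s.dropna()       # the Series without the NaN
filled = s.fillna(0)       # NaN replaced by 0
avg = s.mean()             # NaN is skipped by default

print(missing, cleaned.tolist(), filled.tolist(), avg)
```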

# DataFrame
A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns.

### Features of DataFrame
- Columns can be of different types
- Size-mutable
- Labeled axes (rows and columns)
- Can perform arithmetic operations on rows and columns
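A minimal sketch of arithmetic on columns and rows. The subject columns and student labels are hypothetical; `axis=1` reduces across each row, while the default `axis=0` reduces down each column.

```python
import pandas as pd

df = pd.DataFrame({'math': [80, 90], 'physics': [70, 60]}, index=['stu1', 'stu2'])

# Column arithmetic: element-wise, aligned on the row labels
df['total'] = df['math'] + df['physics']

# Row-wise reduction across the two subject columns
row_mean = df[['math', 'physics']].mean(axis=1)

# Column-wise reduction (axis=0 is the default)
col_max = df[['math', 'physics']].max()

print(df)
print(row_mean)
print(col_max)
```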

### Create DataFrame
A pandas DataFrame can be created from various inputs, such as:

- Lists
- dict
- Series
- NumPy ndarrays
- Another DataFrame
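Two of these inputs, a NumPy ndarray and another DataFrame, can be sketched briefly. The column names `x` and `y` are hypothetical; when an existing DataFrame is passed with `columns`, those columns are selected from it.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])
# A 2D ndarray becomes rows and columns; column names are supplied here
df_arr = pd.DataFrame(arr, columns=['x', 'y'])

# Passing an existing DataFrame builds a new one; columns= selects column 'y'
df_copy = pd.DataFrame(df_arr, columns=['y'])

print(df_arr)
print(df_copy)
```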

### Creating an Empty DataFrame

```python
df = pd.DataFrame()
df
```

```
Empty DataFrame
Columns: []
Index: []
```

```python
type(df)
```

```
pandas.core.frame.DataFrame
```

## DataFrame Creation from Lists

```python
ls = [2, 4, 6, 8]
# A flat list becomes a single column named 0
df = pd.DataFrame(ls)
df
```

| |0|
|---|---|
|0|2|
|1|4|
|2|6|
|3|8|


```python
ls = [['Ali', 10], ['Malik', 20], ['Hassan', 30]]
# Each inner list becomes one row
df = pd.DataFrame(ls, columns=['Name', 'Age'])
df
```

| |Name|Age|
|---|---|---|
|0|Ali|10|
|1|Malik|20|
|2|Hassan|30|


```python
type(df)
```

```
pandas.core.frame.DataFrame
```

```python
type(df['Name'])
```

```
pandas.core.series.Series
```

```python
type(df['Age'])
```

```
pandas.core.series.Series
```

```python
type(df['Age'][1])
```

```
numpy.int64
```

```python
type(df['Name'][1])
```

```
str
```

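```python
# Convert only the numeric column: recent pandas versions raise an error
# when dtype=float is passed and the Name strings cannot be converted
df = pd.DataFrame(ls, columns=['Name', 'Age']).astype({'Age': float})
df
```

| |Name|Age|
|---|---|---|
|0|Ali|10.0|
|1|Malik|20.0|
|2|Hassan|30.0|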


```python
type(df['Age'][1])
```

```
numpy.float64
```

```python
type(df['Name'][1])
```

```
str
```
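Instead of checking the type of single elements, `DataFrame.dtypes` reports the dtype of every column at once. A minimal sketch with the same data:

```python
import pandas as pd

ls = [['Ali', 10], ['Malik', 20], ['Hassan', 30]]
df = pd.DataFrame(ls, columns=['Name', 'Age']).astype({'Age': float})

# One entry per column, indexed by column name
print(df.dtypes)
```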

## Creating a DataFrame from Dict of ndarrays / Lists

```python
dic = {'Name': ['Ali', 'Noor', 'Hassan'], 'Age': [10, 45, 23]}
df = pd.DataFrame(dic)
df
```

| |Name|Age|
|---|---|---|
|0|Ali|10|
|1|Noor|45|
|2|Hassan|23|


```python
# Using a custom index
df = pd.DataFrame(dic, index=['stu1', 'stu2', 'stu3'])
df
```

| |Name|Age|
|---|---|---|
|stu1|Ali|10|
|stu2|Noor|45|
|stu3|Hassan|23|
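With a custom index, a row can be selected by its label through `.loc`, and a single cell by row and column label. A minimal sketch using the same data:

```python
import pandas as pd

dic = {'Name': ['Ali', 'Noor', 'Hassan'], 'Age': [10, 45, 23]}
df = pd.DataFrame(dic, index=['stu1', 'stu2', 'stu3'])

row = df.loc['stu2']           # one row, returned as a Series
age = df.loc['stu2', 'Age']    # a single value
print(row)
print(age)
```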


## Creating a DataFrame from a List of Dictionaries

```python
ls = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
d = pd.DataFrame(ls)
d
```

| |a|b|c|
|---|---|---|---|
|0|1|2|NaN|
|1|5|10|20.0|

Note: NaN (Not a Number) is inserted wherever a dictionary has no value for a column, so column `c` is stored as floats.


```python
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# Select two columns whose names match dictionary keys; key 'c' is left out
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
df1
```

| |a|b|
|---|---|---|
|first|1|2|
|second|5|10|


```python
# Column 'b1' is not a dictionary key, so it is filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
df2
```

| |a|b1|
|---|---|---|
|first|1|NaN|
|second|5|NaN|


## Creating a DataFrame from a Dict of Series

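A minimal sketch, with hypothetical column names `one` and `two`: each Series becomes a column, and the row index is the union of the Series' indexes. A label missing from one Series becomes `NaN` in that column, which is then stored as floats.

```python
import pandas as pd

d = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
# The rows are aligned on the union of the two indexes: a, b, c, d
df = pd.DataFrame(d)
print(df)
```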