## HISTORY of Pandas:

### Definition:
**pandas** is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.


------------------



pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name pandas is derived from "panel data", an econometrics term for multidimensional data sets.

In 2008, developer Wes McKinney started developing pandas because he needed a high-performance, flexible tool for data analysis.

pandas is an open-source, BSD-licensed Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
    
### USE of Pandas:
pandas is commonly used for data munging and preparation.

Industries that use pandas:
1.	Finance
2.	Economics
3.	Statistics
4.	Analytics
 
### Key Features of Pandas
1.	Fast and efficient DataFrame object with default and customized indexing.
2.	Tools for loading data into in-memory data objects from different file formats.
3.	Data alignment and integrated handling of missing data.
4.	Reshaping and pivoting of data sets.
5.	Label-based slicing, indexing and subsetting of large data sets.
6.	Columns from a data structure can be deleted or inserted.
7.	Group by data for aggregation and transformations.
8.	High performance merging and joining of data.
9.	Time Series functionality.
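
A few of the features listed above (missing-data handling, group by, and merging) can be sketched in a handful of lines. The data, column names, and variable names below are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data, made up for this example
sales = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "units": [10, np.nan, 7, 3],
})
regions = pd.DataFrame({
    "city": ["Pune", "Delhi"],
    "region": ["West", "North"],
})

# Handling of missing data: replace NaN with 0
sales["units"] = sales["units"].fillna(0)

# Group by a column and aggregate
units_per_city = sales.groupby("city")["units"].sum()

# Merge (join) two DataFrames on a shared column
merged = sales.merge(regions, on="city", how="left")

print(units_per_city)
print(merged)
```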



### ENVIRONMENT:

- pip install pandas
- conda install pandas


No further environment setup is needed.
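
After installing with either command, a quick check like the following should confirm that the library can be imported. The version printed depends on what is installed on your machine.

```python
import pandas as pd

# Print the installed version to confirm the import works
print(pd.__version__)
```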



### pandas deals with the following data structures:
- Series
- DataFrame
- Panel



## Input and Output

- Series:

A Series is a one-dimensional, array-like structure with homogeneous data. For example, the following Series holds ten integers:

10	23	56	17	52	61	73	90	26	72


- DataFrame

DataFrame is a two-dimensional array with heterogeneous data. For example,

| Name | Age | Gender | Rating |
|------|-----|--------|--------|
| Steve | 32 | Male | 3.45 |
| Lia | 28 | Female | 4.6 |
| Vin | 45 | Male | 3.9 |
| Katie | 38 | Female | 2.78 |

- Panel

A Panel is a three-dimensional data structure with heterogeneous data. It is hard to draw, but it can be thought of as a container of DataFrames.

[panel image link](https://i.stack.imgur.com/7Bj9w.jpg)


# Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

### SYNTAX:
pandas.Series(data, index, dtype, copy)

where
- data: takes various forms such as an ndarray, a list, or a constant.
- index: the index labels; defaults to np.arange(n) if no index is passed.
- dtype: the data type of the values; inferred from the data if not given.
- copy: whether to copy the input data.


A series can be created using various inputs like −
1.	Array
2.	Dict
3.	Scalar value or constant
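
The cells below cover arrays and dicts; the scalar case can be sketched as follows. The value `5` and the index labels are arbitrary choices for illustration.

```python
import pandas as pd

# With a scalar, the index determines the length:
# the value is repeated once for every index label
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```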



In [2]:
# Create an Empty Series

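# import the pandas library, aliased as pd
import pandas as pd
# pass dtype explicitly: recent pandas versions no longer default an empty Series to float64
s = pd.Series(dtype='float64')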
s

Series([], dtype: float64)

In [3]:
# Series from ndarray

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)

s

0    a
1    b
2    c
3    d
dtype: object

In [4]:
s1 = pd.Series(data,index=[100,101,102,103])
s1

100    a
101    b
102    c
103    d
dtype: object

In [5]:
# Series from dict
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
s

a    0.0
b    1.0
c    2.0
dtype: float64

In [9]:
# Accessing Data from Series

import pandas as pd
Acs = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first element by position
# (plain Acs[0] on a label index is deprecated in recent pandas versions)
Acs.iloc[0]

1

In [11]:
# retrieve the first three elements
Acs[:3]

a    1
b    2
c    3
dtype: int64

In [12]:
# retrieve the last three elements
Acs[-3:]

c    3
d    4
e    5
dtype: int64

In [13]:
# retrieve a single element by index label
Acs['a']

1

In [15]:
#retrieve multiple elements
Acs[['a','c','d']]

a    1
c    3
d    4
dtype: int64
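
The access patterns above mix positions and labels. A minimal sketch of the explicit accessors, `.iloc` for positions and `.loc` for labels, using the same `Acs` Series:

```python
import pandas as pd

Acs = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first = Acs.iloc[0]             # by position
by_label = Acs.loc['a']         # by label
label_slice = Acs.loc['b':'d']  # label slices include both endpoints
pos_slice = Acs.iloc[1:3]       # positional slices exclude the end

print(label_slice.tolist())  # [2, 3, 4]
print(pos_slice.tolist())    # [2, 3]
```

Being explicit about `.loc` versus `.iloc` avoids ambiguity when the index labels are themselves integers.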

# DataFrame

A DataFrame is a two-dimensional data structure in which data is arranged in rows and columns. For example:

| NAME | ID | MARK |
|------|----|------|
| John | 12001 | 80% |
| Jose | 12002 | 85% |
| Ram | 12003 | 70% |

### SYNTAX

pandas.DataFrame(data, index, columns, dtype, copy)

where
- data: takes various forms such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
- index: the row labels; defaults to np.arange(n) if no index is passed.
- columns: the column labels; defaults to np.arange(n) if no column labels are passed.
- dtype: the data type of each column.
- copy: whether to copy the input data.


### A DataFrame can be created using various inputs, such as:

1.	Lists
2.	Dict
3.	Series
4.	NumPy ndarrays
5.	Another DataFrame
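
The cells that follow cover lists, dicts, and Series. As a minimal sketch of the remaining two inputs, here is a DataFrame built from a NumPy ndarray and then from another DataFrame. The array values and the column names `x` and `y` are arbitrary.

```python
import numpy as np
import pandas as pd

# DataFrame from a 2D NumPy ndarray
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])

# DataFrame from another DataFrame;
# copy=True so that changes to the new frame do not affect the original
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc[0, 'x'] = 100

print(df_from_array)
print(df_copy)
```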



In [18]:
# Empty DataFrame

#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [19]:
# DataFrame from Lists

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
df

   0
0  1
1  2
2  3
3  4
4  5


In [20]:
data_ind = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data_ind,columns=['Name','Age'])
print(df)

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13


In [21]:
###  DataFrame from Dict

import pandas as pd
data_ = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data_)
print(df)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42


In [23]:
# change the index
df = pd.DataFrame(data_, index=['rank1','rank2','rank3','rank4'])
print(df)

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42


In [32]:
data_value= {'Name':'RAM','Age': 39}
df_ = pd.DataFrame(data_value,index=[0,1,2])
print(df_)

  Name  Age
0  RAM   39
1  RAM   39
2  RAM   39


In [33]:
# DataFrame from List of Dicts

dataset = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(dataset)
print(df)

# Keys missing from a dict are filled with NaN (Not a Number)

   a   b     c
0  1   2   NaN
1  5  10  20.0


In [37]:
dataset2 = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df_ = pd.DataFrame(dataset2, index=['first', 'second'])
print(df_)

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [38]:
# Different columns

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

# Two columns whose names match dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

# Two columns, one of which ('b1') matches no key and is therefore filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN


In [39]:
## DataFrame from Dict of Series

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4


In [43]:
# Selection by integer location

import pandas as pd

d1 = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d1)
print(df.iloc[2])

one    3.0
two    3.0
Name: c, dtype: float64


In [45]:
# Slice Rows
df = pd.DataFrame(d1)
print(df.iloc[1:2])

   one  two
b  2.0    2
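
Besides integer-location selection, rows can be selected by label, and columns can be selected, added, and deleted. A minimal sketch using the same data as above; the column name `three` is made up for illustration.

```python
import pandas as pd

d1 = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d1)

row_b = df.loc['b']                  # select a row by label
col_two = df['two']                  # select a column
df['three'] = df['one'] + df['two']  # add a column; row 'd' stays NaN
df = df.drop(columns=['one'])        # delete a column

print(df)
```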


In [51]:
# Append the rows of one DataFrame to another

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])
df

   a  b
0  1  2
1  3  4
0  5  6
1  7  8


In [52]:
# Drop rows
# Drop every row with label 0 (two rows here, because the label is duplicated)
df = df.drop(0)
df

   a  b
1  3  4
1  7  8


In [48]:
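# Multiply element-wise; rows are aligned on their index labels.
# This cell was run on the four-row combined frame (index 0, 1, 0, 1), before rows were dropped.
df_data = df * df2
df_data

    a   b
0   5  12
0  25  36
1  21  32
1  49  64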


In [47]:
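# Add element-wise; rows are aligned on their index labels.
# This cell was run on the four-row combined frame (index 0, 1, 0, 1), before rows were dropped.
df_data = df + df2
df_data

    a   b
0   6   8
0  10  12
1  10  12
1  14  16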
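
Arithmetic between DataFrames aligns on index and column labels rather than on position. The sketch below uses two small frames with partly overlapping indexes (the names `left` and `right` and the index values are chosen only for illustration) to show what happens to labels found in just one frame.

```python
import pandas as pd

left = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=[0, 1])
right = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'], index=[1, 2])

# Labels present in only one frame produce NaN
plain_sum = left + right

# fill_value=0 treats the missing entries as 0 instead
filled_sum = left.add(right, fill_value=0)

print(plain_sum)
print(filled_sum)
```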


# Panel

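A Panel is a 3D container of data. Note that `Panel` was deprecated in pandas 0.20 and removed in pandas 0.25, so the examples in this section run only on older versions.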

### Syntax
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

where
- data: takes various forms such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
- items: axis=0; each item corresponds to a DataFrame contained inside.
- major_axis: axis=1; the index (rows) of each of the DataFrames.
- minor_axis: axis=2; the columns of each of the DataFrames.
- dtype: the data type of each column.
- copy: whether to copy the input data. Defaults to false.

A Panel can be created in multiple ways, such as:

- From ndarrays
- From dict of DataFrames


In [54]:
#creating an empty panel
import pandas as pd
p = pd.Panel()
print(p)

<class 'pandas.core.panel.Panel'>
Dimensions: 0 (items) x 0 (major_axis) x 0 (minor_axis)
Items axis: None
Major_axis axis: None
Minor_axis axis: None


Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/.
Pandas provides a `.to_xarray()` method to help automate this conversion.
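
The warning recommends a MultiIndex on a DataFrame as the replacement. A minimal sketch of that idea on current pandas, assuming the same 2 x 4 x 5 shape as the ndarray example in this section; `np.arange` is used instead of random data so the result is reproducible, and the level names `item` and `row` are made up for illustration.

```python
import numpy as np
import pandas as pd

# 2 items x 4 rows x 5 columns
data = np.arange(40).reshape(2, 4, 5)

# Stack the two 4x5 slices into one DataFrame with an (item, row) MultiIndex
frames = {item: pd.DataFrame(data[item]) for item in range(2)}
stacked = pd.concat(frames, names=['item', 'row'])

print(stacked.shape)               # (8, 5)
print(stacked.loc[1])              # the 4x5 DataFrame for item 1
print(stacked.xs(1, level='row'))  # row 1 of every item
```

Here `stacked.loc[1]` plays the role of selecting one item, and `stacked.xs(1, level='row')` is roughly comparable to a cross-section along the major axis.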



In [53]:
# Create a Panel from a 3D ndarray
import pandas as pd
import numpy as np

data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4




In [55]:
# Using major_axis
# Create a Panel from a dict of DataFrames and take a cross-section along the major axis
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)), 
   'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.major_xs(1))

      Item1     Item2
0 -1.649097  1.930826
1  0.097310 -1.002785
2  0.779116       NaN




In [56]:
# Using minor_axis
# Create a Panel from a dict of DataFrames and take a cross-section along the minor axis
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)), 
   'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.minor_xs(1))

      Item1     Item2
0  0.269986  0.175688
1  0.752451  0.966065
2 -0.449551  0.711768
3 -0.534607 -0.945365


