### Introduction
Data processing is an important part of data analysis, because data is not always available in the desired format.
Various processing steps, such as cleaning, restructuring and merging, are required before the data can be analyzed.
NumPy, SciPy, Cython and Pandas are Python tools that can be used for fast data processing. Pandas itself is built on
top of NumPy.

### Data structures
Pandas provides two very useful data structures for processing data: `Series` and `DataFrame`.

### Series
A Series is a one-dimensional array that can store values of any data type, including a mix of types. The row labels
of a Series are called the index. Any list, tuple or dictionary can be converted into a Series with the `pd.Series`
constructor, as shown below:

In [None]:
import pandas as pd

In [14]:
import platform
print(platform.python_version())
print(pd.__version__)

3.9.12
1.4.2


In [None]:
import pandas as pd

h = ('Python', '3.10', '06-12-2000')  # quoted: the float literal 3.10 would be stored and shown as 3.1
s = pd.Series(h, name='data')
print(s)
print(type(s))

In [None]:
s = pd.Series([2,-1,3,5], name='columns_name')
print(s)

In [None]:
dic1 = {'name' : 'IBM', 'date' : '2010-09-08', 'shares' : 100, 'price' : 15000}
ds = pd.Series(dic1)
print(ds)
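The examples above mix strings and numbers in one Series, which works because pandas infers a dtype for each Series. A minimal sketch of that inference (the values are illustrative): when every value is an integer pandas picks an integer dtype, and for mixed values it falls back to the generic `object` dtype.

```python
import pandas as pd

# All-integer values: pandas infers an integer dtype (typically int64)
nums = pd.Series([2, -1, 3, 5])

# Mixed values (str, float, int): the Series falls back to the object dtype
mixed = pd.Series(['IBM', 15000.5, 100])

print(nums.dtype)   # an integer dtype, e.g. int64
print(mixed.dtype)  # object
```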

In [None]:
t = ("python", '3.10', 1991)  # quoted: the float literal 3.10 would be stored and shown as 3.1
td = pd.Series(t, index=['name', 'version', 'year'])
print(td)
print(type(td))

In [None]:
td['year']      # access by label
# td.iloc[0]    # access the first item ('name') by position

In [None]:
td[['name', 'version']]


In [None]:
s = pd.Series([2,-2,4,8], name='data')
s

In [None]:
s + [1000,2000,3000,4000]

In [None]:
s + 2000

In [None]:
s < 0

In [None]:
s
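As the last cell shows, the arithmetic and comparisons above return new Series and leave `s` unchanged. The boolean Series produced by `s < 0` can also be used as a mask to select rows. A short sketch reusing the same data:

```python
import pandas as pd

s = pd.Series([2, -2, 4, 8], name='data')

# Arithmetic returns a new Series; s itself is not modified
shifted = s + 2000

# A boolean Series (mask) keeps only the rows where it is True
negatives = s[s < 0]

# Masks combine with & (and), | (or), ~ (not); the parentheses are required
mid_range = s[(s > 0) & (s < 8)]

print(negatives)   # the row labelled 1 (value -2)
print(mid_range)   # the rows labelled 0 and 2 (values 2 and 4)
```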

### Index labels
Each item in a Series object has an identifier called the index label. By default, the label is simply the position of the item in the Series (starting at 0), but you can also set the index labels manually:

In [None]:
s2 = pd.Series([68, 83, 112, 68], index=["alice", "bob", "charles", "darwin"])
print(s2)
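To see what the default labels look like, the sketch below inspects the index of a Series created without one. It also shows that pandas does not enforce uniqueness of labels: a repeated label selects every matching row. The data is illustrative.

```python
import pandas as pd

# Without an explicit index, pandas assigns a RangeIndex: 0, 1, 2, ...
s = pd.Series([68, 83, 112])
print(s.index)              # RangeIndex(start=0, stop=3, step=1)

# Labels may repeat; selecting a repeated label returns all matching rows
dup = pd.Series([1, 2, 3], index=['a', 'b', 'a'])
print(dup['a'])             # a Series holding the two rows labelled 'a'
print(dup.index.is_unique)  # False
```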

You can then use the `Series` just like a `dict`:

In [None]:
s2['bob']
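The dictionary analogy goes further than `[]` lookup. A small sketch with the same `s2` data: `in` tests the index labels (like dict keys), `get()` returns a default for a missing label instead of raising `KeyError`, and `items()` yields `(label, value)` pairs. The label `'zoe'` is a made-up missing key.

```python
import pandas as pd

s2 = pd.Series([68, 83, 112, 68], index=["alice", "bob", "charles", "darwin"])

# `in` checks the index labels, not the values
print('bob' in s2)        # True
print(83 in s2)           # False: 83 is a value, not a label

# get() returns the default for a missing label ('zoe' is hypothetical)
print(s2.get('zoe', 0))   # 0

# items() iterates over (label, value) pairs, like dict.items()
for name, value in s2.items():
    print(name, value)
```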

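You can still access the items by integer location, like in a regular array. (This works in the pandas version used here, but pandas 2.1 and later emit a `FutureWarning` for this positional fallback on a label index.)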

In [None]:
s2[1]

To make it clear when you are accessing by label or by integer location, it is recommended to always use the `loc` attribute when accessing by label, and the `iloc` attribute when accessing by integer location:

In [None]:
s2.loc['alice']

In [None]:
s2.iloc[1]

Slicing a `Series` also slices the index labels:

In [None]:
s2.iloc[1:3]
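One difference between the two indexers matters when slicing: `iloc` follows Python's convention and excludes the stop position, whereas `loc` includes the stop label. A sketch with the same `s2` data:

```python
import pandas as pd

s2 = pd.Series([68, 83, 112, 68], index=["alice", "bob", "charles", "darwin"])

# iloc slices by position and EXCLUDES the stop position (like a list)
by_position = s2.iloc[1:3]          # bob, charles

# loc slices by label and INCLUDES the stop label
by_label = s2.loc['bob':'darwin']   # bob, charles, darwin

print(by_position)
print(by_label)
```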

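When two Series are combined, their values are aligned by index label rather than by position. Below, `s5` lists the same labels in reverse order, so the values sharing a label are added together:

In [13]: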
s4 = pd.Series([1, 2, 3], index=[0, 1, 2])
s5 = pd.Series([1, 2, 3], index=[2, 1, 0])
s6 = s4 + s5
s6

0    4
1    4
2    4
dtype: int64
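Alignment also decides what happens when the labels do not fully overlap: a label present in only one operand yields `NaN`. The `add()` method with `fill_value` treats the missing side as a given value instead. A sketch with illustrative labels:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])   # no label 'x'

# Labels present in only one operand produce NaN
total = a + b                     # x: NaN, y: 12.0, z: 23.0

# fill_value=0 treats the missing 'x' in b as 0
filled = a.add(b, fill_value=0)   # x: 1.0, y: 12.0, z: 23.0

print(total)
print(filled)
```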

### Series name
A `Series` can have a `name`:

In [None]:
s6 = pd.Series([83, 68, 55], index=["bob", "alice", 'bil'], name="weights")
s6
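The name is more than a label for display: it travels with the data. For example, it becomes the column label when the Series is turned into a one-column DataFrame with `to_frame()`, and `rename()` with a scalar returns a copy under a new name. The new name `"kg"` is only illustrative.

```python
import pandas as pd

weights = pd.Series([83, 68, 55], index=["bob", "alice", "bil"], name="weights")

# The Series name becomes the column label of the resulting DataFrame
frame = weights.to_frame()
print(frame.columns.tolist())   # ['weights']

# rename() with a scalar returns a copy with a new name; weights is unchanged
renamed = weights.rename("kg")
print(renamed.name)             # kg
```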

### DataFrame
**`DataFrame` is the most widely used data structure in Pandas. Whereas a `Series` works with one-dimensional
arrays, a `DataFrame` works with two-dimensional (tabular) data. A `DataFrame` has two indexes: a row index and a
column index.**

In [1]:
import pandas as pd
data = {'name': ['Python', 'Django', 'Pandas'],
        'year': ['1991', '1995', '2000'],
        'shares': [100, 300, 900],
        'price': [120.3, 1000.3, 3200.2]}

df = pd.DataFrame(data)
print(type(df))

<class 'pandas.core.frame.DataFrame'>
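The two indexes of a `DataFrame` can be inspected directly: `index` holds the row labels and `columns` holds the column labels. A sketch rebuilding the same `data` dictionary:

```python
import pandas as pd

data = {'name': ['Python', 'Django', 'Pandas'],
        'year': ['1991', '1995', '2000'],
        'shares': [100, 300, 900],
        'price': [120.3, 1000.3, 3200.2]}
df = pd.DataFrame(data)

# Row index: a default RangeIndex 0..n-1
print(df.index)
# Column index: taken from the dictionary keys, in insertion order
print(df.columns.tolist())   # ['name', 'year', 'shares', 'price']
# Shape is (rows, columns)
print(df.shape)              # (3, 4)
# Each column keeps its own dtype
print(df.dtypes)
```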


In [2]:
df

|   | name | year | shares | price |
|---|---|---|---|---|
| 0 | Python | 1991 | 100 | 120.3 |
| 1 | Django | 1995 | 300 | 1000.3 |
| 2 | Pandas | 2000 | 900 | 3200.2 |


In [3]:
df[['name', 'year']]

|   | name | year |
|---|---|---|
| 0 | Python | 1991 |
| 1 | Django | 1995 |
| 2 | Pandas | 2000 |
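The double brackets above pass a list of column labels, which always returns a `DataFrame`. A single label returns one column as a `Series`. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Python', 'Django', 'Pandas'],
                   'year': ['1991', '1995', '2000']})

# A single column label returns a Series named after the column
col = df['name']
# A list of labels returns a DataFrame, even when it holds one column
sub = df[['name']]

print(type(col).__name__)   # Series
print(type(sub).__name__)   # DataFrame
```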


**Additional columns can be added after a `DataFrame` has been defined:**

In [4]:
df['owner'] = 'Unknown'

In [5]:
df

|   | name | year | shares | price | owner |
|---|---|---|---|---|---|
| 0 | Python | 1991 | 100 | 120.3 | Unknown |
| 1 | Django | 1995 | 300 | 1000.3 | Unknown |
| 2 | Pandas | 2000 | 900 | 3200.2 | Unknown |
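Assigning the scalar `'Unknown'` broadcasts it to every row. A sketch of two other common forms: a list with exactly one value per row, and a column computed element-wise from existing columns. The `rank` column and its values are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'shares': [100, 300, 900],
                   'price': [120.3, 1000.3, 3200.2]})

# A scalar is broadcast to every row
df['owner'] = 'Unknown'
# A list must provide exactly one value per row (hypothetical values)
df['rank'] = [3, 2, 1]
# A derived column is computed element-wise from existing columns
df['value'] = df['shares'] * df['price']

print(df)
```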


Currently, the row index labels are 0, 1 and 2. They can be changed using the `index` attribute:

In [6]:
df.index = ['one', 'two', 'three']
df

|   | name | year | shares | price | owner |
|---|---|---|---|---|---|
| one | Python | 1991 | 100 | 120.3 | Unknown |
| two | Django | 1995 | 300 | 1000.3 | Unknown |
| three | Pandas | 2000 | 900 | 3200.2 | Unknown |
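Assigning to `index` replaces all row labels at once, so the new list must have one label per row; a length mismatch raises `ValueError` and leaves the index unchanged. To change only some labels, `rename()` takes a mapping and returns a new DataFrame. A sketch (the label `'first'` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'shares': [100, 300, 900]})

# Replace all row labels; the length must match the number of rows
df.index = ['one', 'two', 'three']

mismatch_error = None
try:
    df.index = ['one', 'two']   # only 2 labels for 3 rows
except ValueError as err:
    mismatch_error = err
    print('ValueError:', err)

# rename() changes selected labels and returns a new DataFrame
df2 = df.rename(index={'one': 'first'})
print(df2.index.tolist())   # ['first', 'two', 'three']
```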


Any column of the `DataFrame` can be set as the index using the `set_index()` method:

In [7]:
df = df.set_index(['name'])

In [8]:
df

| name | year | shares | price | owner |
|---|---|---|---|---|
| Python | 1991 | 100 | 120.3 | Unknown |
| Django | 1995 | 300 | 1000.3 | Unknown |
| Pandas | 2000 | 900 | 3200.2 | Unknown |
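Note that `set_index()` returns a new DataFrame, which is why the result is assigned back to `df`; with the default `append=False`, the previous row labels (here `'one'`, `'two'`, `'three'`) are discarded. The operation can be reversed with `reset_index()`, which moves the index back into an ordinary column. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Python', 'Django', 'Pandas'],
                   'year': ['1991', '1995', '2000']})

# set_index() returns a NEW DataFrame; df itself is unchanged
indexed = df.set_index('name')
print(df.columns.tolist())             # ['name', 'year']
print(indexed.index.name)              # name

# The new labels can be used with .loc
print(indexed.loc['Django', 'year'])   # 1995

# reset_index() turns the index back into a regular column
restored = indexed.reset_index()
print(restored.columns.tolist())       # ['name', 'year']
```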


Data can be accessed in two ways, i.e. by `column` index or by `row` index:

In [9]:
# access data using column-index
df['year']

name
Python    1991
Django    1995
Pandas    2000
Name: year, dtype: object

In [None]:
# access data using row-index (row label)
df.loc['Python']
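Row access follows the same `loc`/`iloc` rules as for a Series: `loc` selects by row label and `iloc` by position, and `loc` also accepts a column label to pick out a single cell. A sketch with a reduced copy of the data:

```python
import pandas as pd

df = pd.DataFrame({'year': ['1991', '1995', '2000'],
                   'shares': [100, 300, 900]},
                  index=pd.Index(['Python', 'Django', 'Pandas'], name='name'))

# A row by label: a Series indexed by the column names
print(df.loc['Django'])
# A row by integer position
print(df.iloc[0])
# A single cell by (row label, column label)
print(df.loc['Pandas', 'shares'])    # 900
# Several rows and columns at once
print(df.loc[['Python', 'Pandas'], ['shares']])
```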

Any column can be deleted using the `del` statement or the `drop()` method:

In [10]:
del df['owner']
df

| name | year | shares | price |
|---|---|---|---|
| Python | 1991 | 100 | 120.3 |
| Django | 1995 | 300 | 1000.3 |
| Pandas | 2000 | 900 | 3200.2 |


In [11]:
df = df.drop(columns=['price'])
df

| name | year | shares |
|---|---|---|
| Python | 1991 | 100 |
| Django | 1995 | 300 |
| Pandas | 2000 | 900 |
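The two approaches differ in one important way: `del` modifies the DataFrame in place, while `drop()` returns a new DataFrame (hence the reassignment `df = df.drop(...)`). `drop()` can also remove rows by label via `index=`, and `pop()` removes a column in place while returning it as a Series. A sketch on a fresh copy of the data:

```python
import pandas as pd

df = pd.DataFrame({'year': ['1991', '1995', '2000'],
                   'shares': [100, 300, 900],
                   'price': [120.3, 1000.3, 3200.2]},
                  index=['Python', 'Django', 'Pandas'])

# drop() returns a new DataFrame and leaves df untouched
no_price = df.drop(columns=['price'])
# Rows are dropped by label with the index= keyword
no_django = df.drop(index=['Django'])
# pop() removes the column from df in place and returns it
shares = df.pop('shares')

print(no_price.columns.tolist())   # ['year', 'shares']
print(no_django.index.tolist())    # ['Python', 'Pandas']
print(df.columns.tolist())         # ['year', 'price']
print(shares.tolist())             # [100, 300, 900]
```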
