# Data Structures in Pandas

>pandas provides two main data structures for processing data, Series and DataFrame:

# <u> Series :</u>

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

*s = pd.Series(data, index=index)*

In [81]:
import pandas as pd 
import numpy as np

x = [3,4,5,6,7,8]

var = pd.Series(x,index=['a>','b>','c>','d>','e>','f>'],dtype="float",name="Series")

print(var)
print(type(var))
print(var.iloc[2])  # positional access; var[2] on a label index is deprecated in recent pandas

a>    3.0
b>    4.0
c>    5.0
d>    6.0
e>    7.0
f>    8.0
Name: Series, dtype: float64
<class 'pandas.core.series.Series'>
5.0


In [82]:
dic = {"name":['Python','C','C++','Java','SQL','UX/UI'],"pop":[10,9,6,8,4,7.5],"rank":[1,2,3,4,5,6]}
var1 = pd.Series(dic)
print(var1)

name    [Python, C, C++, Java, SQL, UX/UI]
pop                  [10, 9, 6, 8, 4, 7.5]
rank                    [1, 2, 3, 4, 5, 6]
dtype: object


In [83]:
s = pd.Series(12,index=[1,2,3,4,5,6,7,8,9])
print(s)
print(type(s))

1    12
2    12
3    12
4    12
5    12
6    12
7    12
8    12
9    12
dtype: int64
<class 'pandas.core.series.Series'>


In [84]:
s1 = pd.Series(12,index=[1,2,3,4,5,6,7,8,9])
s2 = pd.Series(12,index=[1,2,3,4,5])
print(s1+s2)

1    24.0
2    24.0
3    24.0
4    24.0
5    24.0
6     NaN
7     NaN
8     NaN
9     NaN
dtype: float64
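
The NaN values above come from label alignment: `+` matches the two Series on their index labels, and labels 6 to 9 exist only in `s1`. The sketch below reuses the same data and shows one possible way to avoid the NaN values, using `Series.add` with `fill_value`; the names `plain` and `filled` are illustrative only.

```python
import pandas as pd

# Same data as the cell above: partially overlapping integer labels.
s1 = pd.Series(12, index=[1, 2, 3, 4, 5, 6, 7, 8, 9])
s2 = pd.Series(12, index=[1, 2, 3, 4, 5])

# Plain + aligns on labels; labels present in only one Series become NaN.
plain = s1 + s2

# Series.add with fill_value substitutes 0 for a label missing on one side.
filled = s1.add(s2, fill_value=0)

print(plain.isna().sum())  # count of labels that had no partner
print(filled)
```

Whether filling with 0 is appropriate depends on the data; for some analyses the NaN is the more honest result.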


In [85]:
a = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
a

a   -0.541231
b   -0.926319
c   -0.259605
d   -1.641534
e    0.972045
dtype: float64

_<u>Note:</u>_

_pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time._
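
A minimal sketch of the note above, assuming a small Series named `dup` (an illustrative name) with a repeated label. Label selection still works and returns every matching row, while `reindex`, which needs unique labels, raises at the point it is called.

```python
import pandas as pd

# The label "a" appears twice; construction succeeds.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])

print(dup.index.is_unique)

# Selecting a duplicated label returns all matching rows as a Series.
selected = dup.loc["a"]
print(selected)

# reindex does not support duplicate labels and raises ValueError.
try:
    dup.reindex(["a", "b", "c"])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("reindex failed:", exc)
```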

# Series is ndarray-like

**Series** behaves much like an **ndarray** and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index, so the labels stay attached to their values.

In [86]:
s.to_numpy()

array([12, 12, 12, 12, 12, 12, 12, 12, 12], dtype=int64)
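
As a small illustration of the ndarray-like behaviour described above, the sketch below applies a NumPy function, a slice, and a boolean mask to a Series. The Series `t` and the result names are made up for this example.

```python
import numpy as np
import pandas as pd

t = pd.Series([1.0, 4.0, 9.0, 16.0], index=["a", "b", "c", "d"])

# A NumPy ufunc accepts the Series and returns a Series with the same index.
roots = np.sqrt(t)

# Positional slicing keeps the matching labels ("a" and "b").
head = t.iloc[:2]

# Boolean masks work as with ndarrays; the surviving labels are preserved.
above = t[t > t.median()]

print(roots)
print(head)
print(above)
```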

# <u> DataFrame :</u>

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:

>Dict of 1D ndarrays, lists, dicts, or Series

>2-D numpy.ndarray

>Structured or record ndarray

>A Series

>Another DataFrame
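
The input kinds listed above can be sketched as follows. All variable names, column names, and values here are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Dict of lists: keys become column names.
from_dict = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# 2-D ndarray: column labels are supplied explicitly.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# Structured ndarray: field names become column names.
records = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("score", "f8")])
from_records = pd.DataFrame(records)

# A Series: its name becomes the single column label.
from_series = pd.DataFrame(pd.Series([10, 20], name="value"))

# Another DataFrame.
from_frame = pd.DataFrame(from_dict)

for frame in (from_dict, from_array, from_records, from_series, from_frame):
    print(frame, end="\n\n")
```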

Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. If you pass an index and / or columns, you are guaranteeing the index and / or columns of the resulting DataFrame. Thus, a dict of Series plus a specific index will discard all data not matching up to the passed index.

If axis labels are not passed, they will be constructed from the input data based on common sense rules.
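
To make the effect of an explicit `index` and `columns` concrete, here is a minimal sketch built on a dict of two Series. The names `subset` and `picked`, and the column label `"three"`, are chosen for illustration.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# Only the requested row labels survive; row "c" is discarded,
# and "one" has no value for "d", so that cell is NaN.
subset = pd.DataFrame(d, index=["d", "b", "a"])

# Requesting a column that is not in the dict yields an all-NaN column,
# and the unrequested column "one" is dropped.
picked = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])

print(subset)
print(picked)
```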

In [87]:
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

v = pd.DataFrame(d)
print(type(v))
print(v)

<class 'pandas.core.frame.DataFrame'>
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


# <u>Arithmetic Operations:</u>

In [88]:
v1 = pd.DataFrame({"A":[1, 2, 3, 4], "B":[5, 6, 7, 8]})
v1

   A  B
0  1  5
1  2  6
2  3  7
3  4  8


In [89]:
v1["C"] = v1["A"]**v1["B"]
v1


   A  B      C
0  1  5      1
1  2  6     64
2  3  7   2187
3  4  8  65536


In [90]:
v1["C"] = v1["A"]*v1["B"]
v1

   A  B   C
0  1  5   5
1  2  6  12
2  3  7  21
3  4  8  32


In [91]:
v1["C"] = v1["A"]+v1["B"]
v1

   A  B   C
0  1  5   6
1  2  6   8
2  3  7  10
3  4  8  12


In [92]:
v1["C"] = v1["A"]-v1["B"]
v1

   A  B  C
0  1  5 -4
1  2  6 -4
2  3  7 -4
3  4  8 -4


In [93]:
v1["C"] = v1["A"]/v1["B"]
v1

   A  B         C
0  1  5  0.200000
1  2  6  0.333333
2  3  7  0.428571
3  4  8  0.500000


In [94]:
v1["C"] = v1["A"]%v1["B"]
v1

   A  B  C
0  1  5  1
1  2  6  2
2  3  7  3
3  4  8  4


In [95]:
v2 = pd.DataFrame({"A":[10, 20, 30, 40], "B":[15, 16, 17, 18]})
v2["Python"] = v2["A"]<=33
v2["Python_1"] = v2["A"]>=16.5
v2

    A   B  Python  Python_1
0  10  15    True     False
1  20  16    True      True
2  30  17    True      True
3  40  18   False      True


# <u>Delete and Insert Functions</u>:

## Insert :-

In [96]:
v3 = pd.DataFrame({"A":[10, 20, 30, 40, 90], "B":[15, 16, 17, 18, 23]})
v3

    A   B
0  10  15
1  20  16
2  30  17
3  40  18
4  90  23


In [97]:
v3.insert(2,"Inserted",v3["A"]+v3["B"])
v3

    A   B  Inserted
0  10  15        25
1  20  16        36
2  30  17        47
3  40  18        58
4  90  23       113


In [98]:
v3['Copied'] = v3["A"][:3]
v3

    A   B  Inserted  Copied
0  10  15        25    10.0
1  20  16        36    20.0
2  30  17        47    30.0
3  40  18        58     NaN
4  90  23       113     NaN


## Delete :-

In [99]:
v3.pop("Copied")
v3

    A   B  Inserted
0  10  15        25
1  20  16        36
2  30  17        47
3  40  18        58
4  90  23       113


In [100]:
del v3["B"]
v3

    A  Inserted
0  10        25
1  20        36
2  30        47
3  40        58
4  90       113
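
Both `pop` and `del` modify the DataFrame in place. As a hedged aside, the sketch below contrasts that with `DataFrame.drop`, which returns a new frame and leaves the original untouched. The frame and the column `"C"` are invented for this example.

```python
import pandas as pd

v3 = pd.DataFrame({"A": [10, 20, 30], "B": [15, 16, 17], "C": [1, 2, 3]})

# pop removes the column in place and returns it as a Series.
removed = v3.pop("C")

# drop returns a new DataFrame; v3 still has columns A and B afterwards.
trimmed = v3.drop(columns=["B"])

print(removed)
print(v3)
print(trimmed)
```

`drop` can be the safer choice when the original frame is still needed later in the notebook.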


# <u>Write CSV File</u>:

In [109]:
v4 = pd.DataFrame({"A":[10, 20, 30, 40, 90], "B":[15, 16, 17, 18, 23]})
v4.insert(2,"Inserted",v4["A"]+v4["B"])
v4['Copied'] = v4["A"][:3]
v4.to_csv("test.csv",index=False)
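
To inspect what is written, the following sketch rebuilds the same frame and calls `to_csv` without a path, which returns the CSV text as a string instead of creating a file. Reading it back through `io.StringIO` is an assumption made here to keep the example self-contained; `pd.read_csv("test.csv")` would be the file-based equivalent.

```python
import io
import pandas as pd

v4 = pd.DataFrame({"A": [10, 20, 30, 40, 90], "B": [15, 16, 17, 18, 23]})
v4.insert(2, "Inserted", v4["A"] + v4["B"])
v4["Copied"] = v4["A"][:3]

# With no path argument, to_csv returns the CSV content as a string.
# index=False omits the row labels, as in the cell above.
csv_text = v4.to_csv(index=False)
print(csv_text)

# Missing values are written as empty fields and read back as NaN.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```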