## Series - One-Dimensional List or 1D Array


The first main pandas data type we will look at is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [5]:
import pandas as pd
import numpy as np

s = pd.Series([1,2,3,4,5])    

print(s)
print(s[3])

0    1
1    2
2    3
3    4
4    5
dtype: int64
4


In [8]:
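s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'ca', 'd', 'e'])
# index assigns a label to each row
print(s)
print(s.iloc[3])  # access by integer position (plain s[3] is deprecated when the index holds labels)


print(s['ca'])    # access by label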

a     1
b     2
ca    3
d     4
e     5
dtype: int64
4
3
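
The two access styles above can be made explicit with `.loc` (labels) and `.iloc` (positions). The following is a minimal sketch with made-up values; the Series `s` and `mixed` here are illustrative and not part of the original notebook.

```python
import pandas as pd

# Hypothetical example data: a Series with string labels
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

by_label = s.loc['y']      # label-based lookup
by_position = s.iloc[1]    # position-based lookup (second element)

# A Series can also hold arbitrary Python objects; its dtype is then object
mixed = pd.Series(['text', 3.5, [1, 2]])

print(by_label, by_position, mixed.dtype)
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about whether a key is meant as a label or as a position.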


## DataFrame
A DataFrame is a 2D data structure that stores data in tabular form, with labelled rows and columns.


In [12]:
df = pd.DataFrame(np.random.randint(2,6,(2,4)), columns=['a','b','c','d'])
df

   a  b  c  d
0  3  3  5  5
1  5  3  3  2


In [13]:
df = pd.DataFrame(np.random.randint(2,6,(2,4)), columns=['a','b','c','d'], index = ['q','w'])
df

   a  b  c  d
q  4  2  5  4
w  2  2  5  4


In [15]:
df['a']

q    4
w    2
Name: a, dtype: int32

In [18]:
df.iloc[0]

a    4
b    2
c    5
d    4
Name: q, dtype: int32
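
Column and row selection can be combined in a few ways. The sketch below uses fixed values (copied from the output above, since the random data changes on every run) and shows column access, row access by position and by label, and single-cell access. The variable names are illustrative only.

```python
import pandas as pd

# Fixed values standing in for the random integers generated above
df = pd.DataFrame([[4, 2, 5, 4], [2, 2, 5, 4]],
                  columns=['a', 'b', 'c', 'd'], index=['q', 'w'])

col = df['a']            # one column, returned as a Series
row_pos = df.iloc[0]     # first row, selected by position
row_lab = df.loc['q']    # the same row, selected by label
cell = df.loc['w', 'c']  # single value by row label and column label

print(col, row_pos, cell, sep='\n')
```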

In [22]:
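# initializing from a 2D NumPy array built from a list of lists
# (note: np.array coerces mixed values to one type, so the ages become strings)
data = np.array([['amar', 10], ['ankit', 25], ['rahul', 15]])
print(data)

# creating the DataFrame
df = pd.DataFrame(data, columns=['name', 'age'])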


[['amar' '10']
 ['ankit' '25']
 ['rahul' '15']]


In [24]:
df

    name  age
0   amar   10
1  ankit   25
2  rahul   15
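
Because a NumPy array stores a single dtype, the ages in the array above were converted to strings. The following sketch shows two possible ways around this: passing the nested list directly, or converting the column afterwards. The names `rows`, `df_list`, and `df_arr` are illustrative.

```python
import numpy as np
import pandas as pd

rows = [['amar', 10], ['ankit', 25], ['rahul', 15]]

# Option 1: pass the nested list directly so pandas infers a dtype per column
df_list = pd.DataFrame(rows, columns=['name', 'age'])

# Option 2: keep the NumPy array, then convert the text column back to integers
df_arr = pd.DataFrame(np.array(rows), columns=['name', 'age'])
df_arr['age'] = df_arr['age'].astype(int)

print(df_list.dtypes)
print(df_arr.dtypes)
```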


In [25]:
# initializing from a dictionary of lists (keys become column names)

data = {'Name':['tom','jerry','sp'],'age':[15,20,25]}
df=pd.DataFrame(data)
df

    Name  age
0    tom   15
1  jerry   20
2     sp   25


In [26]:
# initializing from a dict of Series (rows are aligned on the index labels)

d = {'one':pd.Series([10,20,30,40], index=['a','b','c','d']),
     'two':pd.Series([10,20,30,40], index=['a','b','c','e'])}

df=pd.DataFrame(d)
df

    one   two
a  10.0  10.0
b  20.0  20.0
c  30.0  30.0
d  40.0   NaN
e   NaN  40.0
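
The empty cells come from index alignment: the resulting row index is the union of both Series' labels, and any label missing from one Series is filled with NaN (which also turns the column into floats). A minimal sketch of inspecting and, as one possible choice, filling those gaps:

```python
import pandas as pd

one = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
two = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'e'])
df = pd.DataFrame({'one': one, 'two': two})

# The row index is the union of the labels of both Series
print(list(df.index))

# Count the missing values in each column
print(df.isna().sum())

# One possible way to handle the gaps: fill with 0 and restore an integer dtype
filled = df.fillna(0).astype(int)
print(filled)
```

Filling with 0 is only an assumption for illustration; whether that is appropriate depends on what the data represents.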


In [27]:
# initializing from a list of dicts; keys missing from a dict become NaN
d = [{'b': 3, 'c': 4}, {'a': 1, 'b': 2, 'c': 3}]
df = pd.DataFrame(d, index=['one', 'two'])
df

       a  b  c
one  NaN  3  4
two  1.0  2  3


In [32]:
name = ['ankit', 'ram', 'raj']
age = [15, 25, 35]
# build a list of tuples by pairing the two lists with zip

listoftuples = list(zip(name, age))
listoftuples

# convert the list of tuples to a DataFrame

df = pd.DataFrame(listoftuples, columns=['name', 'age'], index=list('abc'))  # index is ['a', 'b', 'c']
df


    name  age
a  ankit   15
b    ram   25
c    raj   35


In [35]:
df = pd.DataFrame(np.random.randn(15,4), columns=['A','B','C','D'])
df

          A         B         C         D
0 -0.243424  0.020938  0.846085  1.037755
1  1.298942 -0.994713 -1.852830  0.259055
2 -0.864800 -0.228652 -0.465934  0.959612
3 -0.178981 -0.193575  3.005254 -0.749252
4  1.138283 -0.237546 -0.640991  1.355240
5  0.194088 -1.371822  0.497645  2.157433
6  1.077661 -0.290530  0.525890 -0.004230
7  1.780902  1.181523 -0.479603 -1.464199
8  1.598723  1.507339  0.930278 -1.835974
9 -0.371421 -0.028686  0.838330 -1.556220
(rows 10 to 14 not shown)


In [41]:
df['A'].idxmax()  # index label of the largest value in column 'A' (older pandas used the now-deprecated Series.argmax for this)

7
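
`idxmax` returns an index label, while NumPy's `argmax` returns an integer position. With the default 0..n-1 index the two coincide, so the sketch below uses made-up values with a string index to show the difference.

```python
import pandas as pd

# Hypothetical values with a non-default index so label and position differ
s = pd.Series([0.5, 1.8, -0.3], index=['r1', 'r2', 'r3'])

label_of_max = s.idxmax()                # label of the largest value
position_of_max = s.to_numpy().argmax()  # integer position of the largest value

print(label_of_max, position_of_max)
```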

In [42]:
df.head()

          A         B         C         D
0 -0.243424  0.020938  0.846085  1.037755
1  1.298942 -0.994713 -1.852830  0.259055
2 -0.864800 -0.228652 -0.465934  0.959612
3 -0.178981 -0.193575  3.005254 -0.749252
4  1.138283 -0.237546 -0.640991  1.355240


In [43]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 4 columns):
A    15 non-null float64
B    15 non-null float64
C    15 non-null float64
D    15 non-null float64
dtypes: float64(4)
memory usage: 560.0 bytes


In [45]:
df = df.rename({'A': 'Ankit'}, axis=1)  # rename returns a new DataFrame; reassign it (or pass inplace=True) to keep the change
df

      Ankit         B         C         D
0 -0.243424  0.020938  0.846085  1.037755
1  1.298942 -0.994713 -1.852830  0.259055
2 -0.864800 -0.228652 -0.465934  0.959612
3 -0.178981 -0.193575  3.005254 -0.749252
4  1.138283 -0.237546 -0.640991  1.355240
5  0.194088 -1.371822  0.497645  2.157433
6  1.077661 -0.290530  0.525890 -0.004230
7  1.780902  1.181523 -0.479603 -1.464199
8  1.598723  1.507339  0.930278 -1.835974
9 -0.371421 -0.028686  0.838330 -1.556220
(rows 10 to 14 not shown)
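
By default `rename` leaves the original DataFrame untouched and returns a renamed copy, so the result has to be assigned (or `inplace=True` passed) for the new name to stick. A minimal sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})  # hypothetical small frame

renamed = df.rename(columns={'A': 'Ankit'})  # returns a new DataFrame

print(list(df.columns))       # the original is unchanged
print(list(renamed.columns))  # the copy carries the new column name
```

`columns={...}` is equivalent to passing the mapping with `axis=1`.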


In [49]:
df= pd.read_csv('C:/Users/ANKIT/preprocess.csv')
df

   Country   Age   Salary Purchased
0   France  58.0  48000.0         Y
1    Spain  62.0  54400.0         N
2  Germany  43.0  61000.0         N
3    Spain  62.0  48000.0         N
4  Germany  67.0      NaN         Y
5   France   NaN  45000.0         Y
6    Spain  83.0  56000.0         N
7   France  60.0  65000.0         Y
8  Germany  48.0  72000.0         N
9   France  44.0  92000.0         Y


In [None]:
# other inspection methods worth trying: df.head(), df.tail(), df.info(), df.describe()
df.describe()
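
The inspection methods listed above can be tried without the CSV file. The sketch below rebuilds the first six rows of the output shown earlier as a stand-in for `preprocess.csv`; the inline data and variable names are for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV: the first six rows copied from the output above
df = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain', 'Germany', 'France'],
    'Age': [58.0, 62.0, 43.0, 62.0, 67.0, np.nan],
    'Salary': [48000.0, 54400.0, 61000.0, 48000.0, np.nan, 45000.0],
    'Purchased': ['Y', 'N', 'N', 'N', 'Y', 'Y'],
})

first_rows = df.head(3)   # first 3 rows
last_rows = df.tail(2)    # last 2 rows
df.info()                 # prints column dtypes and non-null counts
summary = df.describe()   # summary statistics for the numeric columns
print(summary)
```

Note that `describe` skips the text columns by default and that its `count` row excludes missing values.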