# Series

A Series is very similar to a NumPy array; in fact, it is built on top of the NumPy array object. What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [4]:
my_list = [10,20,30]

In [5]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [None]:
labels = ['a','b','c']
my_list = [10,20,30]
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64
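
Once a Series has labels, a value can be reached either by its label or by its integer position. The following is a minimal sketch of that distinction, reusing the labels and values from the cell above; the variable names `s`, `by_label` and `by_position` are introduced here only for illustration.

```python
import pandas as pd

labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
s = pd.Series(data=my_list, index=labels)

by_label = s.loc['b']      # look up by index label
by_position = s.iloc[1]    # look up by integer position
print(by_label, by_position)
```

Using `.loc` and `.iloc` explicitly keeps it clear which kind of lookup is intended, which matters most when the labels themselves are integers.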

In [None]:
sr_no = ['a','b','c','d']
my_list = [10,20,30,40]
pd.Series(data=my_list,index=sr_no)

a    10
b    20
c    30
d    40
dtype: int64

In [None]:
arr = np.array([10,20,30])
pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [None]:
d = {'a':10,'b':20,'c':30}
pd.Series(d)

a    10
b    20
c    30
dtype: int64

In [None]:
lst = {'a':10,'b':20,'c':40}
pd.Series(lst)

a    10
b    20
c    40
dtype: int64
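
The introduction notes that a Series is not limited to numeric data. As a small sketch of that point (the values below are arbitrary and chosen only for illustration), a Series built from mixed Python objects falls back to the generic `object` dtype:

```python
import pandas as pd

# Illustrative values: a built-in function, a string and a float.
mixed = pd.Series([len, 'text', 3.14])

print(mixed.dtype)          # generic dtype used for arbitrary Python objects
print(mixed[0]('pandas'))   # the stored function is still callable
```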

In [None]:
# series with duplicate index
a = pd.Series(data=[1,2,3],index=[20,20,40])
print(a[20])

20    1
20    2
dtype: int64


In [None]:
x = pd.Series(data=[1,2,3,4],index=['a','b','b','d'])
print(x['b'])

b    2
b    3
dtype: int64
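
Because a repeated label returns every matching row, it can help to check for duplicates before relying on label lookups. A minimal sketch, assuming the same Series `x` as above:

```python
import pandas as pd

x = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'b', 'd'])

print(x.index.is_unique)      # False: the label 'b' appears twice
print(x.index.duplicated())   # marks every repeat after the first occurrence

matches = x.loc['b']          # a Series containing both 'b' rows
print(len(matches))
```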


# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let's use pandas to explore this topic!

In [None]:
df = pd.DataFrame(data={'maths':[98,99,82,91,78],'chemistry':[97,99,87,72,65],'physics':[97,82,98,68,55]})

In [None]:
print(df)

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


In [None]:
df['maths'] 

0    98
1    99
2    82
3    91
4    78
Name: maths, dtype: int64

In [None]:
type(df['maths'])

pandas.core.series.Series
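
Since each column is a Series, the reverse direction also works: several Series can be combined into a DataFrame. The sketch below is one way to do this, using `pd.concat`; the shortened three-row data and the name `df_small` are assumptions made for illustration.

```python
import pandas as pd

maths = pd.Series([98, 99, 82], name='maths')
physics = pd.Series([97, 82, 98], name='physics')

# Concatenating along axis=1 lines the Series up on their shared index (0, 1, 2);
# each Series name becomes a column name.
df_small = pd.concat([maths, physics], axis=1)
print(df_small)
```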

## Selection and Indexing

Let's learn the various methods to grab data from a DataFrame.

In [None]:
# select single column
df['maths']

0    98
1    99
2    82
3    91
4    78
Name: maths, dtype: int64

In [None]:
# Pass a list of column names
df[['maths','physics']]

   maths  physics
0     98       97
1     99       82
2     82       98
3     91       68
4     78       55


**Create a new column:**

In [None]:
df['total_marks'] = df['maths'] + df['chemistry']+df['physics']

In [None]:
print(df)

   maths  chemistry  physics  total_marks
0     98         97       97          292
1     99         99       82          280
2     82         87       98          267
3     91         72       68          231
4     78         65       55          198
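
Adding the columns one by one works well for three subjects. As an alternative sketch, the same totals can be computed by summing across the columns with `sum(axis=1)`, which may scale better when there are many columns:

```python
import pandas as pd

df = pd.DataFrame(data={'maths': [98, 99, 82, 91, 78],
                        'chemistry': [97, 99, 87, 72, 65],
                        'physics': [97, 82, 98, 68, 55]})

# axis=1 sums across the selected columns for each row.
df['total_marks'] = df[['maths', 'chemistry', 'physics']].sum(axis=1)
print(df['total_marks'].tolist())
```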


**Remove a column:**

In [None]:
df.drop('total_marks', axis=1)

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


In [None]:
# Not inplace unless specified!
df

   maths  chemistry  physics  total_marks
0     98         97       97          292
1     99         99       82          280
2     82         87       98          267
3     91         72       68          231
4     78         65       55          198


In [None]:
df.drop('total_marks',axis=1,inplace=True)

In [None]:
df

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55
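
An alternative to `inplace=True` is to assign the result of `drop` back to the variable. The sketch below uses a shortened, illustrative DataFrame and the `columns=` keyword, which is equivalent to passing `axis=1`:

```python
import pandas as pd

# Shortened illustrative data, not the full table from above.
df = pd.DataFrame(data={'maths': [98, 99, 82],
                        'total_marks': [292, 280, 267]})

# drop() returns a new DataFrame; reassigning keeps the change.
df = df.drop(columns='total_marks')
print(df.columns.tolist())
```

Reassignment makes the change visible at the point where it happens, which can be easier to follow when cells are re-run out of order.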


Rows can be dropped the same way with axis=0:

In [None]:
df.drop(4,axis=0)

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68


In [None]:
df[df>60]

   maths  chemistry  physics
0     98         97     97.0
1     99         99     82.0
2     82         87     98.0
3     91         72     68.0
4     78         65      NaN


**Select rows:**

In [None]:
df.iloc[2]

maths        82
chemistry    87
physics      98
Name: 2, dtype: int64
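
`iloc` selects by integer position, while `loc` selects by index label. With the default 0-based index the two coincide, as this minimal sketch shows (the names `row_by_position`, `row_by_label` and `first_two` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(data={'maths': [98, 99, 82, 91, 78],
                        'chemistry': [97, 99, 87, 72, 65],
                        'physics': [97, 82, 98, 68, 55]})

row_by_position = df.iloc[2]   # third row, counted from 0
row_by_label = df.loc[2]       # row whose index label is 2
first_two = df.iloc[0:2]       # positional slice; the end point is excluded

print(row_by_position.equals(row_by_label))
print(first_two)
```

The two stop agreeing as soon as the index is no longer the default range, for example after setting a column of names as the index.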

### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:

In [None]:
df

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


In [None]:
df>90

   maths  chemistry  physics
0   True       True     True
1   True       True    False
2  False      False     True
3   True      False    False
4  False      False    False


In [None]:
df[df>90]  

   maths  chemistry  physics
0   98.0       97.0     97.0
1   99.0       99.0      NaN
2    NaN        NaN     98.0
3   91.0        NaN      NaN
4    NaN        NaN      NaN
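
Masking the whole DataFrame leaves NaN wherever the condition fails. A common variation, sketched below, is to filter whole rows by applying the condition to a single column, and to combine conditions with `&` (each condition wrapped in parentheses). The names `high_maths` and `high_both` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(data={'maths': [98, 99, 82, 91, 78],
                        'chemistry': [97, 99, 87, 72, 65],
                        'physics': [97, 82, 98, 68, 55]})

# Keep only the rows where maths is above 90.
high_maths = df[df['maths'] > 90]

# Combine two conditions; the parentheses are required around each one.
high_both = df[(df['maths'] > 90) & (df['chemistry'] > 90)]

print(high_maths)
print(high_both)
```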


## More Index Details

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!

In [None]:
x = 'Amit Smith Allen Kathy John'.split()

In [None]:
df['names'] = x

In [None]:
df

   maths  chemistry  physics  names
0     98         97       97   Amit
1     99         99       82  Smith
2     82         87       98  Allen
3     91         72       68  Kathy
4     78         65       55   John


In [None]:
df.set_index('names')

       maths  chemistry  physics
names
Amit      98         97       97
Smith     99         99       82
Allen     82         87       98
Kathy     91         72       68
John      78         65       55


In [None]:
df.set_index('names',inplace=True)

In [None]:
df

       maths  chemistry  physics
names
Amit      98         97       97
Smith     99         99       82
Allen     82         87       98
Kathy     91         72       68
John      78         65       55
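
With the names as the index, rows can be looked up by name, and `reset_index` moves the index back into an ordinary column. A minimal sketch, assuming a shortened two-column version of the table above; `by_name` and `restored` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(data={'maths': [98, 99, 82, 91, 78],
                        'physics': [97, 82, 98, 68, 55]})
df['names'] = 'Amit Smith Allen Kathy John'.split()

# Reassignment is used here instead of inplace=True.
df = df.set_index('names')
by_name = df.loc['Allen', 'maths']   # label-based lookup by student name

# reset_index() restores the default 0..n-1 index and returns 'names' as a column.
restored = df.reset_index()
print(by_name)
print(restored.columns.tolist())
```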
