# Series

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [2]:
my_list = [10,20,30]

In [3]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

In [5]:
labels = ['a','b','c']
my_list = [10,20,30]
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64

In [6]:
arr = np.array([10,20,30])
pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [7]:
d = {'a':10,'b':20,'c':30}
pd.Series(d)

a    10
b    20
c    30
dtype: int64

In [9]:
# Series with a duplicate index: a label lookup returns every matching entry
a = pd.Series(data=[1,2,3],index=[20,20,40])
print(a[20])

20    1
20    2
dtype: int64
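
To tie the Series examples together, here is a minimal sketch of label-based versus position-based access, non-numeric data, and duplicate labels. It uses `.loc` and `.iloc` explicitly, and the variable names are illustrative only.

```python
import pandas as pd

# A Series with string labels; .loc looks up by label, .iloc by position
s = pd.Series(data=[10, 20, 30], index=['a', 'b', 'c'])
by_label = s.loc['b']       # value stored under label 'b'
by_position = s.iloc[1]     # value at position 1 (the second element)

# A Series can hold arbitrary Python objects; the dtype is then object
mixed = pd.Series(['text', 3.5, len])

# With duplicate labels, a label lookup returns every matching entry
dup = pd.Series(data=[1, 2, 3], index=[20, 20, 40])
matches = dup.loc[20]       # a Series with two rows
single = dup.loc[40]        # a scalar, because label 40 is unique
```

Checking `dup.index.is_unique` before relying on a label lookup is one way to know whether a scalar or a Series will come back.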


# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a collection of Series objects put together to share the same index. Let's use pandas to explore this topic!
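
The "Series sharing one index" picture can be sketched directly. This assumes two small Series with hypothetical student labels (`s1`, `s2`, `s3`) that are not part of the data used below.

```python
import pandas as pd

# Two Series with the same index (hypothetical marks for three students)
maths = pd.Series([98, 99, 82], index=['s1', 's2', 's3'])
physics = pd.Series([97, 82, 98], index=['s1', 's2', 's3'])

# A dict of Series becomes a DataFrame: keys are the column names,
# and the shared index becomes the row index
marks = pd.DataFrame({'maths': maths, 'physics': physics})

# Each column is again a Series carrying the same index
col = marks['maths']
```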

In [10]:
df = pd.DataFrame(data={'maths':[98,99,82,91,78],'chemistry':[97,99,87,72,65],'physics':[97,82,98,68,55]})

In [11]:
print(df)

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


In [12]:
df['maths'] 

0    98
1    99
2    82
3    91
4    78
Name: maths, dtype: int64

In [13]:
type(df['maths'])

pandas.core.series.Series

## Selection and Indexing

Let's learn the various ways to grab data from a DataFrame:

In [14]:
# select single column
df['maths']

0    98
1    99
2    82
3    91
4    78
Name: maths, dtype: int64

In [15]:
# Pass a list of column names
df[['maths','physics']]

   maths  physics
0     98       97
1     99       82
2     82       98
3     91       68
4     78       55


**Create a new column:**

In [16]:
df['total_marks'] = df['maths'] + df['chemistry']+df['physics']

In [17]:
print(df)

   maths  chemistry  physics  total_marks
0     98         97       97          292
1     99         99       82          280
2     82         87       98          267
3     91         72       68          231
4     78         65       55          198


**Remove a column:**

In [19]:
df.drop('total_marks', axis=1)

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


In [20]:
# drop() returns a new DataFrame; df itself is unchanged unless inplace=True
df

   maths  chemistry  physics  total_marks
0     98         97       97          292
1     99         99       82          280
2     82         87       98          267
3     91         72       68          231
4     78         65       55          198


In [21]:
df.drop('total_marks',axis=1,inplace=True)

In [22]:
df

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55
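
As an alternative to `inplace=True`, the result of `drop` can be assigned back to the same name. The sketch below assumes a small stand-in DataFrame rather than the one above, and uses the `columns=` keyword, which is equivalent to passing `axis=1`.

```python
import pandas as pd

# Small stand-in DataFrame (illustrative values only)
df = pd.DataFrame({'maths': [98, 99],
                   'chemistry': [97, 99],
                   'total_marks': [195, 198]})

# drop returns a new DataFrame; the original still has all three columns
without_total = df.drop(columns='total_marks')
still_has_total = 'total_marks' in df.columns

# Reassigning the name has the same visible effect as inplace=True
df = df.drop(columns='total_marks')
```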


You can also drop rows this way:

In [23]:
df.drop(4,axis=0)

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68


In [24]:
df

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


**Select rows:**

In [25]:
df.iloc[2]

maths        82
chemistry    87
physics      98
Name: 2, dtype: int64
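
`iloc` selects by integer position, while `loc` selects by index label. The two only look the same when the index is the default 0, 1, 2, and so on. A minimal sketch, assuming a DataFrame with hypothetical string labels `x`, `y`, `z`:

```python
import pandas as pd

df = pd.DataFrame({'maths': [98, 99, 82], 'physics': [97, 82, 98]},
                  index=['x', 'y', 'z'])

row_by_position = df.iloc[2]      # third row, regardless of labels
row_by_label = df.loc['z']        # row whose index label is 'z'
cell = df.loc['y', 'physics']     # single value: row 'y', column 'physics'
subset = df.iloc[0:2]             # first two rows (end position excluded)
```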

### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:

In [28]:
df

   maths  chemistry  physics
0     98         97       97
1     99         99       82
2     82         87       98
3     91         72       68
4     78         65       55


In [29]:
df>90

   maths  chemistry  physics
0   True       True     True
1   True       True    False
2  False      False     True
3   True      False    False
4  False      False    False


In [30]:
df[df>90]  

   maths  chemistry  physics
0   98.0       97.0     97.0
1   99.0       99.0      NaN
2    NaN        NaN     98.0
3   91.0        NaN      NaN
4    NaN        NaN      NaN
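
Masking the whole DataFrame keeps its shape and fills non-matching cells with NaN, which is why the integers above appear as floats. More often you want to keep whole rows based on a condition on one or more columns. A minimal sketch using the same marks data:

```python
import pandas as pd

df = pd.DataFrame({'maths': [98, 99, 82, 91, 78],
                   'chemistry': [97, 99, 87, 72, 65],
                   'physics': [97, 82, 98, 68, 55]})

# Keep only the rows where maths is above 90
high_maths = df[df['maths'] > 90]

# Combine conditions with & (and) or | (or); each condition needs parentheses
high_both = df[(df['maths'] > 90) & (df['physics'] > 90)]
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why `&` and `|` are used here.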


## More Index Details

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!

In [31]:
x = 'Amit Smith Allen Kathy John'.split()

In [32]:
df['names'] = x

In [33]:
df

   maths  chemistry  physics  names
0     98         97       97   Amit
1     99         99       82  Smith
2     82         87       98  Allen
3     91         72       68  Kathy
4     78         65       55   John


In [34]:
df.set_index('names')

       maths  chemistry  physics
names
Amit      98         97       97
Smith     99         99       82
Allen     82         87       98
Kathy     91         72       68
John      78         65       55


In [35]:
df.set_index('names',inplace=True)

In [36]:
df

       maths  chemistry  physics
names
Amit      98         97       97
Smith     99         99       82
Allen     82         87       98
Kathy     91         72       68
John      78         65       55
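
Resetting the index is the reverse operation. A minimal sketch, assuming a two-row stand-in DataFrame: `reset_index` moves the current index back into an ordinary column and restores the default integer index.

```python
import pandas as pd

df = pd.DataFrame({'maths': [98, 99], 'names': ['Amit', 'Smith']})
indexed = df.set_index('names')     # 'names' becomes the row index

# reset_index moves the index back into a column and restores 0..n-1
restored = indexed.reset_index()
```

Like `drop` and `set_index`, `reset_index` returns a new DataFrame by default and leaves `indexed` unchanged.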
