A Series is one of the core data structures in Pandas. You can think of it as a cross between a list and a dictionary.

In [21]:
import pandas as pd
import numpy as np

## Creating a series

In [2]:
pd.Series(['Akhil','Theerhtala'])

0         Akhil
1    Theerhtala
dtype: object

- Pandas tries to automatically identify the data type of the inputs. In this case, the strings given as input are stored with the `object` dtype.
- Any series is, by default, indexed with integers starting from 0. This can be changed by specifying a list of index labels.

In [3]:
num = [1,2,3,4,5,6,7,8,9]
pd.Series(num)

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
dtype: int64

There are two null values that we should know about:
- NaN - Not a Number, a special floating-point value
- None - Python's null object

In [4]:
num = [1,2,3,4,5,6,7,8,None]
pd.Series(num)

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
8    NaN
dtype: float64

NaN is internally classified as a float, so pandas automatically converts the integers to floats.

**Note:** NaN is **NOT** None. It is a numeric value and is treated differently for efficiency reasons.
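
The difference between the two can be checked directly. The following is a minimal sketch, assuming a small made-up series; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# NaN is a float, while None is Python's null object.
nan_type = type(np.nan)
none_type = type(None)

# NaN never compares equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)

# In a numeric Series, None is converted to NaN and the dtype becomes float64.
s = pd.Series([1, 2, None])

# Use isna() to find missing values instead of comparing with ==.
missing_mask = s.isna()

print(nan_type, none_type)
print(nan_equals_itself)
print(s.dtype)
print(missing_mask.tolist())
```

Because NaN does not compare equal to itself, `isna()` is the safer way to test for missing entries.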

In [7]:
grads = {'s1':61, 's2':64, 's3':65}
ser = pd.Series(grads)
ser

s1    61
s2    64
s3    65
dtype: int64

In [8]:
ser.index

Index(['s1', 's2', 's3'], dtype='object')

Let us now try tuples.

In [10]:
students = [('Kcik','Buttowski'),('Conan', 'Edogawa'), ('Mugiwara','Luffy')]
pd.Series(students)

0    (Kcik, Buttowski)
1     (Conan, Edogawa)
2    (Mugiwara, Luffy)
dtype: object

In [12]:
idx = ['P','C','M']
pd.Series([30,30,30], index =idx)

P    30
C    30
M    30
dtype: int64

In [13]:
# named `d` to avoid shadowing the built-in `dict`
d = {'a':2,'b':3,'c':5}
pd.Series(d, index=['a','b','d'])

a    2.0
b    3.0
d    NaN
dtype: float64

Notice that in the above cell the key 'c' is dropped because it is not in the index we provided, while 'd' is added with the value NaN because it is in the index but has no entry in the dictionary.
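
One way to read the cell above is as "build the series, then keep only the requested labels". The sketch below assumes that reading and compares it with `reindex`; the name `prices` is a hypothetical stand-in for the dictionary.

```python
import pandas as pd

prices = {'a': 2, 'b': 3, 'c': 5}   # hypothetical example data

# Only labels listed in `index` are kept; 'd' has no value in the dict, so it becomes NaN.
s = pd.Series(prices, index=['a', 'b', 'd'])

# Building the full Series first and then reindexing gives the same result.
t = pd.Series(prices).reindex(['a', 'b', 'd'])

print(s.equals(t))
print(s.isna().tolist())
```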

## Querying series

### loc vs iloc

A series can be queried by index position or by index label. We use the `.iloc` attribute to query by index position, i.e., integer values, and the `.loc` attribute to query by index label.

In [15]:
# let us bring back a series that we have created above.
grade = pd.Series(grads)

In [16]:
grade.iloc[0]

61

In [17]:
grade.loc['s1']

61

Pandas also offers a 'smart' syntax with plain square brackets, where it tries to work out which of the two attributes you meant. Let us look at that.

In [18]:
grade[0]

61

In [19]:
grade['s1']

61

It is recommended to be explicit by naming the attribute you want, as there might be cases where the index of the series is itself a list of integers.
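
To make the ambiguity concrete, here is a small sketch using a hypothetical series (`scores`) whose labels are integers that do not start at 0. Newer pandas releases also appear to warn when an integer key is treated as a position on a non-integer index, which is another reason to prefer the explicit forms.

```python
import pandas as pd

# Hypothetical series whose index labels are integers that do not start at 0.
scores = pd.Series([61, 64, 65], index=[10, 20, 30])

by_label = scores.loc[10]      # look up the label 10
by_position = scores.iloc[0]   # look up the first element

# With an integer index, plain [] is treated as a label lookup,
# so scores[0] fails because there is no label 0.
try:
    scores[0]
    plain_lookup_failed = False
except KeyError:
    plain_lookup_failed = True

print(by_label, by_position, plain_lookup_failed)
```

Here `.loc[10]` and `.iloc[0]` return the same element, while `scores[0]` raises a `KeyError`.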

Now, let us see how we work with data. Let us first create a random series.

In [39]:
numbers = pd.Series(np.random.randint(0,1000,10000))
numbers.head()

0    290
1    302
2    712
3    164
4    683
dtype: int32

In [40]:
%%timeit -n 150
total=0
for num in numbers:
    total+=num

total/len(numbers)

1.2 ms ± 53.2 µs per loop (mean ± std. dev. of 7 runs, 150 loops each)


In [41]:
%%timeit -n 150
total = np.sum(numbers)
total/len(numbers)

58.9 µs ± 3.86 µs per loop (mean ± std. dev. of 7 runs, 150 loops each)


#### Broadcasting

In [42]:
numbers.head()

0    290
1    302
2    712
3    164
4    683
dtype: int32

In [43]:
%%timeit -n 10
# .items() replaces .iteritems(), which was removed in pandas 2.0
for label, value in numbers.items():
    numbers.loc[label] = value + 2

292 ms ± 6.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [48]:
%%timeit -n 10
numbers = pd.Series(np.random.randint(0,1000,10000))
numbers+=2

239 µs ± 52.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
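
The two timed cells above do the same work in different ways. The sketch below checks that they agree, assuming a smaller, seeded series so that it runs quickly; `looped` and `broadcast` are illustrative names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)            # seeded only so the sketch is repeatable
numbers = pd.Series(rng.integers(0, 1000, 1000))

# Loop version: visit every label and write the new value one at a time.
looped = numbers.copy()
for label, value in numbers.items():
    looped.loc[label] = value + 2

# Broadcast version: the scalar 2 is applied to every element in one call.
broadcast = numbers + 2

print(looped.equals(broadcast))
```

Both versions produce the same values; the broadcast form simply hands the whole operation to pandas at once instead of looping in Python.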


Note: The `.loc` attribute also lets us add new data to the series.

In [49]:
grade

s1    61
s2    64
s3    65
dtype: int64

In [50]:
grade.loc['s4'] = 64

In [51]:
grade

s1    61
s2    64
s3    65
s4    64
dtype: int64

Note that index labels can be of any type, and different types can be mixed in the same index.

In [52]:
grade.loc[35] = 64

In [53]:
grade

s1    61
s2    64
s3    65
s4    64
35    64
dtype: int64

#### The Append method

To append a series to another series, we have the `append` method. The application is straightforward. 

In [54]:
grade

s1    61
s2    64
s3    65
s4    64
35    64
dtype: int64

In [56]:
d2 = {'s5':85, 's6':64, 's7':68}
grade2 = pd.Series(d2)
grade2

s5    85
s6    64
s7    68
dtype: int64

In [57]:
grade.append(grade2)

s1    61
s2    64
s3    65
s4    64
35    64
s5    85
s6    64
s7    68
dtype: int64

One thing to remember is that the method doesn't change the existing `grade` series; it returns a new series instead.

In [58]:
grade

s1    61
s2    64
s3    65
s4    64
35    64
dtype: int64

This is common to most pandas methods. To use the new series, we need to store it in a variable.

In [60]:
grade_final = grade.append(grade2)
grade

s1    61
s2    64
s3    65
s4    64
35    64
dtype: int64

The `.append()` method was deprecated in pandas 1.4 and removed in pandas 2.0, so let us look at the alternative provided, `pd.concat()`.

In [68]:
pd.concat([grade,grade2],axis=0)

s1    61
s2    64
s3    65
s4    64
35    64
s5    85
s6    64
s7    68
dtype: int64

The `axis` argument in the above cell determines the axis along which the two series or dataframes are concatenated. Here, `axis=0` denotes the index, so the second series is stacked below the first. When we pass `axis=1`, each series becomes its own column and the rows are aligned on the index labels. Labels that appear in only one series get null values in the other column. Let us see this in our case:

In [69]:
pd.concat([grade,grade2],axis=1)

       0     1
s1  61.0   NaN
s2  64.0   NaN
s3  65.0   NaN
s4  64.0   NaN
35  64.0   NaN
s5   NaN  85.0
s6   NaN  64.0
s7   NaN  68.0


As we have no common indices, NaNs are used to fill the empty entries. Observe that the columns are also automatically named with integers starting from 0. We will see more about them when we discuss querying in dataframes.
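
For contrast, here is a minimal sketch of `axis=1` when some labels are shared. The two series (`term1`, `term2`) are hypothetical, and the `keys` argument is used to name the columns instead of the default 0 and 1.

```python
import pandas as pd

# Hypothetical grades for two terms; only 's2' and 's3' appear in both.
term1 = pd.Series({'s1': 61, 's2': 64, 's3': 65})
term2 = pd.Series({'s2': 70, 's3': 72, 's4': 80})

# axis=1 places each Series in its own column and aligns rows on the index labels.
# `keys` supplies the column names.
combined = pd.concat([term1, term2], axis=1, keys=['term1', 'term2'])

print(combined)
```

Rows for `s2` and `s3` have values in both columns, while `s1` and `s4` each have a NaN in the column of the series that lacks them.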