# The Series Data Structure

In [1]:
import pandas as pd
pd.Series?

In [2]:
animals = ['Tiger', 'Bear', 'Cheetah']
pd.Series(animals)

0      Tiger
1       Bear
2    Cheetah
dtype: object

In [3]:
numbers = [1,2,3]
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

In [4]:
animals = ['Tiger', 'Bear', None]
pd.Series(animals)

0    Tiger
1     Bear
2     None
dtype: object

In [5]:
numbers = [1,2,None]
pd.Series(numbers)

0    1.0
1    2.0
2    NaN
dtype: float64

In [6]:
import numpy as np
np.nan == None

False

In [7]:
np.nan == np.nan

False

In [10]:
np.isnan(np.nan)

True
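Because `NaN` never compares equal to anything, including itself, missing values have to be detected with a dedicated function rather than `==`. The following is a minimal sketch of how that looks on a Series; the variable names `values` and `missing_mask` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A Series mixing a real value with both kinds of "missing" marker.
values = pd.Series([1.0, np.nan, None])

# In a float Series, None is stored as NaN, so both entries count as missing.
missing_mask = values.isna()
print(missing_mask.tolist())  # [False, True, True]

# pd.isna also accepts scalars, including None (np.isnan(None) raises TypeError).
print(pd.isna(None), pd.isna(np.nan))  # True True
```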

In [11]:
sports = {'Archery': 'Bhutan',
         'Golf': 'Scotland',
         'Sumo': 'Japan',
         'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [12]:
s.index

Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')

In [13]:
s = pd.Series(['Tiger', 'cheetah', 'Bear'], index = ['India', 'Africa', 'Bhutan'])
s

India       Tiger
Africa    cheetah
Bhutan       Bear
dtype: object

In [15]:
sports = {'Archery': 'Bhutan',
         'Golf': 'Scotland',
         'Sumo': 'Japan',
         'Taekwondo': 'South Korea'}
s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])
s

Golf      Scotland
Sumo         Japan
Hockey         NaN
dtype: object
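When an explicit `index` is passed alongside a dictionary, the index decides what ends up in the Series: keys that are not listed are dropped, and listed labels with no matching key get a missing value. A small sketch of how one might inspect that result, assuming the same data as above (`missing_labels` is an illustrative name):

```python
import pandas as pd

sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])

# Labels requested in the index that had no matching dictionary key.
missing_labels = s[s.isna()].index.tolist()
print(missing_labels)  # ['Hockey']

# Dictionary keys that were not listed in the index are left out entirely.
print('Archery' in s.index)  # False
```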

# Querying Series

This section uses national sports data from Wikipedia as an example.

In [19]:
sports = {'Archery': 'Bhutan',
         'Golf': 'Scotland',
         'Sumo': 'Japan',
         'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

`iloc` gets rows (or columns) at particular positions in the index (so it only takes integers).

In [20]:
s.iloc[3]

'South Korea'

`loc` gets rows (or columns) with particular labels from the index.

In [21]:
s.loc['Golf']

'Scotland'
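Both accessors also take lists and slices. The sketch below, which rebuilds the same Series so it can run on its own, shows one difference worth keeping in mind: an `iloc` slice excludes its end position, while a `loc` slice includes its end label.

```python
import pandas as pd

s = pd.Series({'Archery': 'Bhutan',
               'Golf': 'Scotland',
               'Sumo': 'Japan',
               'Taekwondo': 'South Korea'})

# A list of positions and the matching list of labels select the same entries.
by_position = s.iloc[[0, 2]]
by_label = s.loc[['Archery', 'Sumo']]
print(by_position.equals(by_label))  # True

# Slices differ: iloc stops before position 2, loc includes the label 'Sumo'.
print(s.iloc[0:2].tolist())              # ['Bhutan', 'Scotland']
print(s.loc['Archery':'Sumo'].tolist())  # ['Bhutan', 'Scotland', 'Japan']
```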

In [22]:
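s[3]  # falls back to position 3 because the labels are strings; deprecated in recent pandas, prefer s.iloc[3]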

'South Korea'

In [24]:
s['Golf']

'Scotland'

Here is another example, this time with integer labels:

In [25]:
sports = {99: 'Bhutan',
          100: 'Scotland',
          101: 'Japan',
          102: 'South Korea'}
s = pd.Series(sports)

In [26]:
s[0]  # This does not call s.iloc[0] as one might expect; 0 is looked up as a label, which raises an error

KeyError: 0
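With integer labels, plain square brackets are ambiguous, so it is safer to state the intent explicitly. A minimal sketch using the same data (the names `first_by_position` and `first_by_label` are illustrative):

```python
import pandas as pd

# Integer labels that do not start at 0.
s = pd.Series({99: 'Bhutan',
               100: 'Scotland',
               101: 'Japan',
               102: 'South Korea'})

first_by_position = s.iloc[0]  # position 0, whatever the labels are
first_by_label = s.loc[99]     # the entry labelled 99

print(first_by_position, first_by_label)  # Bhutan Bhutan
print(0 in s.index)  # False, which is why s[0] raises KeyError
```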

In [27]:
s = pd.Series([100.00, 120.00, 101.00, 3.00])
s

0    100.0
1    120.0
2    101.0
3      3.0
dtype: float64

Next, let's perform an operation on the data. We can get the total price by iterating over all the items.

In [30]:
total = 0
for items in s:
    total += items
print(total)

324.0


Using NumPy's `sum` function makes this summation faster and more concise.

In [32]:
import numpy as np
total = np.sum(s)
print(total)

324.0


In [44]:
s = pd.Series(np.random.randint(0,1000,10000))
s.head()  # show the first 5 elements of the series

0    477
1     52
2    533
3    945
4    637
dtype: int32

As a quick check, `len` confirms the length of the series.

In [34]:
len(s)

10000

We are going to use cell magic functions. These start with two percent signs (`%%`).

The `%%timeit` magic runs the code in a cell several times to determine how long it takes on average.

In [39]:
%%timeit -n 100  # -n 100: execute the cell body 100 times per run
summary = 0
for items in s:
    summary+=items

1.12 ms ± 53.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


With NumPy, the same sum runs much faster.

In [40]:
%%timeit -n 100
summary = np.sum(s)

70.6 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
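The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a similar comparison can be sketched with the standard library's `timeit` module. The helper `loop_sum` is a hypothetical name introduced here for illustration, and the timings will vary by machine, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 1000, 10000))


def loop_sum(series):
    """Sum the series one item at a time, as in the loop above."""
    total = 0
    for item in series:
        total += item
    return total


# Run each approach 100 times and report the average time per call.
loop_time = timeit.timeit(lambda: loop_sum(s), number=100)
numpy_time = timeit.timeit(lambda: np.sum(s), number=100)

print(f"loop:   {loop_time / 100 * 1e6:.1f} µs per call")
print(f"np.sum: {numpy_time / 100 * 1e6:.1f} µs per call")
```

Both approaches return the same total; only the time taken differs.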


In [45]:
s += 2  # adds 2 to each item in s using broadcasting
s.head()

0    479
1     54
2    535
3    947
4    639
dtype: int32

For comparison, here is the same update done one item at a time, assigning through `.loc` inside a loop:

In [46]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
for label, value in s.items():  # iteritems() was removed in pandas 2.0; items() is the replacement
    s.loc[label]= value+2

490 ms ± 18.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


And here is the same update using a vectorized (broadcast) operation:

In [48]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
s+=2

296 µs ± 60.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


`loc` not only lets you look up data but also lets you add new values.

In [49]:
s = pd.Series([1,2,3])
s.loc['Animal'] = 'Bear'
s

0            1
1            2
2            3
Animal    Bear
dtype: object
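Note how the `dtype` in the output changed. Assigning to a label that does not yet exist appends a new entry, and because the new value is a string, the whole Series is converted to `object`. The sketch below makes that change explicit; `dtype_before` and `dtype_after` are illustrative names.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
dtype_before = s.dtype    # integer dtype

# 'Animal' is not an existing label, so this adds a fourth entry.
s.loc['Animal'] = 'Bear'
dtype_after = s.dtype     # object, since integers and a string are now mixed

print(dtype_before, '->', dtype_after)
print(len(s))  # 4
```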

Now let's look at an example where the `index` values are not unique.

In [53]:
original_sports = pd.Series({'Archery': 'Bhutan',
                             'Golf': 'Scotland',
                             'Sumo': 'Japan',
                             'Taekwondo': 'South Korea'})
cricket_loving_countries = pd.Series(['Australia',
                                     'India',
                                     'Pakistan',
                                     'England'],index=['Cricket',
                                                         'Cricket',
                                                         'Cricket',
                                                         'Cricket'])
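# Series.append was removed in pandas 2.0; pd.concat returns a new Series and leaves the inputs unchanged
all_countries = pd.concat([original_sports, cricket_loving_countries])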

In [54]:
original_sports

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [55]:
cricket_loving_countries

Cricket    Australia
Cricket        India
Cricket     Pakistan
Cricket      England
dtype: object

In [56]:
all_countries

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
Cricket        Australia
Cricket            India
Cricket         Pakistan
Cricket          England
dtype: object

In [57]:
all_countries.loc['Cricket']

Cricket    Australia
Cricket        India
Cricket     Pakistan
Cricket      England
dtype: object
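A consequence of a non-unique index is that the return type of `loc` depends on the label: a label that appears once gives back a single value, while a repeated label gives back a Series. The following sketch uses a shortened version of the data above to show this, along with `Index.is_unique` as a quick check.

```python
import pandas as pd

all_countries = pd.Series(
    ['Bhutan', 'Australia', 'India'],
    index=['Archery', 'Cricket', 'Cricket'],
)

print(all_countries.index.is_unique)  # False

single = all_countries.loc['Archery']   # label appears once: a scalar
several = all_countries.loc['Cricket']  # label is repeated: a Series

print(type(single).__name__, type(several).__name__)  # str Series
print(several.tolist())  # ['Australia', 'India']
```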