# Coding Exercise #0104

### 1. Pandas Series:

In [1]:
import pandas as pd
import numpy as np

#### 1.1. Creating Series objects:

In [2]:
my_index = ['a','b','c']
my_values = [222,333,444]
my_dict = {'a':222, 'b':333, 'c':444}
my_arr = np.array(my_values)

In [3]:
# Series with values.
pd.Series(my_values)

0    222
1    333
2    444
dtype: int64

In [4]:
# Series with values and index labels.
pd.Series(data=my_values, index=my_index)

a    222
b    333
c    444
dtype: int64

In [5]:
# Series from a NumPy array.
pd.Series(my_arr)

0    222
1    333
2    444
dtype: int32

In [6]:
# Series from a NumPy array with index labels.
pd.Series(my_arr, index=my_index)

a    222
b    333
c    444
dtype: int32
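The int32 dtype above comes from NumPy, not pandas. A NumPy array built from a Python list uses the platform's default integer type, which is 32-bit on Windows with NumPy releases before 2.0. The list-based Series are int64. This is a minimal sketch, assuming you want the same dtype on every platform. It passes the dtype explicitly, either to the array or to the Series constructor:

```python
import numpy as np
import pandas as pd

my_index = ['a', 'b', 'c']
my_arr = np.array([222, 333, 444])    # dtype depends on the platform default integer

# Option 1: fix the dtype when building the array.
arr64 = np.array([222, 333, 444], dtype=np.int64)
s_from_arr = pd.Series(arr64, index=my_index)

# Option 2: let the Series constructor cast the array.
s_cast = pd.Series(my_arr, index=my_index, dtype='int64')

print(s_from_arr.dtype, s_cast.dtype)   # int64 int64
```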

In [7]:
# Series from a dictionary.
pd.Series(my_dict)

a    222
b    333
c    444
dtype: int64

#### 1.2. Series attributes and indexing:

In [8]:
s = pd.Series(data=[111, 222, 333, 444], index=['a', 'b', 'c', 'd'], name='MySeries')
s

a    111
b    222
c    333
d    444
Name: MySeries, dtype: int64

Series attributes:

In [9]:
s.index                                   # Index labels.

Index(['a', 'b', 'c', 'd'], dtype='object')

In [10]:
s.name                                    # Name attribute.

'MySeries'

In [11]:
s.dtype                                   # Data type.

dtype('int64')

In [12]:
s.values                                  # The values as NumPy array.

array([111, 222, 333, 444], dtype=int64)
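A few other attributes are also handy for inspecting a Series. This sketch shows `shape`, `size` and `to_numpy()`. The pandas documentation recommends `to_numpy()` over `.values` when you need a NumPy array:

```python
import pandas as pd

s = pd.Series(data=[111, 222, 333, 444], index=['a', 'b', 'c', 'd'], name='MySeries')

print(s.shape)        # (4,): one-dimensional, four elements
print(s.size)         # 4
print(s.to_numpy())   # [111 222 333 444]
print(list(s.index))  # ['a', 'b', 'c', 'd']
```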

Series indexing and slicing:

In [13]:
s.iloc[1]                                 # Position-based: the second element.

222

In [14]:
s['a']                                    # Label-based.

111

In [15]:
s[2:4]                                    # Positions 2 and 3 (end excluded).

c    333
d    444
Name: MySeries, dtype: int64

In [16]:
s[['a','d']]                              # A list of labels.

a    111
d    444
Name: MySeries, dtype: int64
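Plain `s[...]` accepts both positions and labels, and that can be ambiguous. The `.iloc` accessor (position-based) and the `.loc` accessor (label-based) make the intent explicit. A label slice with `.loc` includes its end label, while a positional slice excludes its end position. A boolean mask selects elements by condition:

```python
import pandas as pd

s = pd.Series(data=[111, 222, 333, 444], index=['a', 'b', 'c', 'd'], name='MySeries')

by_position = s.iloc[1]        # second element -> 222
by_label = s.loc['a']          # element labelled 'a' -> 111
pos_slice = s.iloc[2:4]        # positions 2 and 3 -> labels 'c', 'd'
label_slice = s.loc['b':'c']   # labels 'b' through 'c', end label included
above_200 = s[s > 200]         # elements whose value exceeds 200

print(by_position, by_label)          # 222 111
print(pos_slice.index.tolist())       # ['c', 'd']
print(label_slice.index.tolist())     # ['b', 'c']
print(above_200.index.tolist())       # ['b', 'c', 'd']
```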

#### 1.3. Series operations:

In [17]:
s1 = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])
s2 = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'd', 'e'])

Note that in Series-to-Series operations, elements are matched by index label, not by position. When a label appears in only one of the two Series, the result for that label is **NaN**.

In [18]:
s1 + s2

a    5.0
b    4.0
c    NaN
d    4.0
e    NaN
dtype: float64

In [19]:
s1 - s2

a    3.0
b    0.0
c    NaN
d   -2.0
e    NaN
dtype: float64

In [20]:
s1 * s2

a    4.0
b    4.0
c    NaN
d    3.0
e    NaN
dtype: float64

In [21]:
s1/s2

a    4.000000
b    1.000000
c         NaN
d    0.333333
e         NaN
dtype: float64
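You may prefer to treat a missing label as 0 rather than get NaN. In that case, the arithmetic methods (`add`, `sub`, `mul`, `div`) accept a `fill_value` argument. Here is a short sketch using the same `s1` and `s2`:

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])
s2 = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'd', 'e'])

# A label missing from one side is treated as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
# Labels present in both Series ('a', 'b', 'd') give the same values as s1 + s2;
# 'c' and 'e' now give 3.0 and 4.0 instead of NaN.
```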

In [22]:
2*s1

d    2
b    4
c    6
a    8
dtype: int64

#### 1.4. Series methods:

In [23]:
s1.sum()

10

In [24]:
s1.mean()

2.5

In [25]:
s1.median()

2.5

In [26]:
s1.max()

4

In [27]:
s1.std()

1.2909944487358056
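By default, `std()` returns the sample standard deviation (`ddof=1`), which divides by n - 1. The sketch below reproduces 1.2909944487358056 by hand. It also shows `ddof=0`, which gives the population version (dividing by n):

```python
import math
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])

n = s1.count()                                  # number of non-missing values: 4
squared_dev = ((s1 - s1.mean()) ** 2).sum()     # 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 = 5.0

sample_std = math.sqrt(squared_dev / (n - 1))   # divide by n - 1
population_std = math.sqrt(squared_dev / n)     # divide by n

print(sample_std, s1.std())             # both about 1.2910
print(population_std, s1.std(ddof=0))   # both about 1.1180
```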

In [28]:
s1.sort_values()

d    1
b    2
c    3
a    4
dtype: int64

In [29]:
s1.sort_index()

a    4
b    2
c    3
d    1
dtype: int64
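Both sorting methods return a new Series and leave `s1` unchanged. Pass `ascending=False` to reverse the order:

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])

desc_values = s1.sort_values(ascending=False)   # largest value first
desc_labels = s1.sort_index(ascending=False)    # labels in reverse alphabetical order

print(desc_values.index.tolist())   # ['a', 'c', 'b', 'd']
print(desc_labels.index.tolist())   # ['d', 'c', 'b', 'a']
print(s1.index.tolist())            # ['d', 'b', 'c', 'a'], original order kept
```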

apply() method:

In [30]:
ser_height = pd.Series([165.3, 170.1, 175.0, 182.1, 168.0, 162.0, 155.2, 176.9, 178.5, 176.1,
                        167.1, 180.0, 162.2, 176.1, 158.2, 168.6, 169.2],name='height')
ser_height

0     165.3
1     170.1
2     175.0
3     182.1
4     168.0
5     162.0
6     155.2
7     176.9
8     178.5
9     176.1
10    167.1
11    180.0
12    162.2
13    176.1
14    158.2
15    168.6
16    169.2
Name: height, dtype: float64

In [31]:
ser_height.apply(lambda x: x/100)

0     1.653
1     1.701
2     1.750
3     1.821
4     1.680
5     1.620
6     1.552
7     1.769
8     1.785
9     1.761
10    1.671
11    1.800
12    1.622
13    1.761
14    1.582
15    1.686
16    1.692
Name: height, dtype: float64
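For simple arithmetic like this, the vectorized expression `ser_height / 100` gives the same result and is usually faster. `apply()` is more useful when the function has no vectorized form, such as mapping each value to a label. In the sketch below, the helper `height_class` and its 175 cm cut-off are hypothetical and serve only as an illustration:

```python
import pandas as pd

ser_height = pd.Series([165.3, 170.1, 175.0, 182.1, 168.0, 162.0, 155.2, 176.9, 178.5, 176.1,
                        167.1, 180.0, 162.2, 176.1, 158.2, 168.6, 169.2], name='height')

# Vectorized division: same values as ser_height.apply(lambda x: x/100).
meters = ser_height / 100
pd.testing.assert_series_equal(meters, ser_height.apply(lambda x: x / 100))

# apply() with a plain Python function (hypothetical helper and threshold).
def height_class(cm):
    return 'tall' if cm >= 175.0 else 'not tall'

labels = ser_height.apply(height_class)
print(labels.value_counts())
```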

Create a Series object from a Python dictionary.

In [33]:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj1 = pd.Series(sdata)
obj1

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

When a Series is created from a dictionary alone, the dictionary keys become the index, in the dictionary's insertion order. You can also pass your own index to choose which keys to use and in what order.

In [34]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj2 = pd.Series(sdata, index=states)
obj2

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In this example, only three values from sdata appear in the result: the ones whose keys are listed in states. sdata has no entry for 'California', so its value is NaN (not a number), which pandas treats as a missing (NA) value. 'Utah' is not in states, so it is excluded from the result.
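Two details of this result are easy to miss. First, the index follows the order of `states`, not the order of `sdata`. Second, the dtype becomes float64 because NaN is a floating-point value. The `in` operator tests index labels, so it offers a quick way to confirm that 'Utah' was dropped:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj2 = pd.Series(sdata, index=states)

print(obj2.index.tolist())   # ['California', 'Ohio', 'Oregon', 'Texas']
print(obj2.dtype)            # float64, because of the NaN for 'California'
print('Utah' in obj2)        # False: `in` checks index labels, not values
print('Ohio' in obj2)        # True
```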

The isnull and notnull functions are used to detect missing data.

In [35]:
pd.isnull(obj2)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [36]:
pd.notnull(obj2)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

The same checks are also available as Series instance methods.

In [37]:
obj2.isnull()

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
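Once missing values are detected, they are often counted, dropped or replaced. This short sketch uses `isna()`, an alias of `isnull()`, together with `dropna()` and `fillna()`. Filling with 0 is only an example choice:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj2 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

n_missing = obj2.isna().sum()     # True counts as 1 -> 1 missing value
complete = obj2.dropna()          # new Series without the 'California' entry
filled = obj2.fillna(0)           # new Series with NaN replaced by 0.0

print(n_missing)                  # 1
print(complete.index.tolist())    # ['Ohio', 'Oregon', 'Texas']
print(filled['California'])       # 0.0
```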

A useful feature of Series is that arithmetic operations automatically align the data by index label.

In [38]:
obj1

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [39]:
obj2

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [40]:
obj1 + obj2

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
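The index of the sum is the union of both indexes (sorted here). A label gets a number only if it appears in both operands. You can check this directly:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj1 = pd.Series(sdata)
obj2 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

result = obj1 + obj2
shared = obj1.index.intersection(obj2.index)   # labels present in both Series

print(result.index.tolist())            # ['California', 'Ohio', 'Oregon', 'Texas', 'Utah']
print(sorted(shared))                   # ['Ohio', 'Oregon', 'Texas']
print(result.dropna().index.tolist())   # the same three labels
```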

Both a Series object and its index have a name attribute. Other pandas operations carry these names along, for instance when a Series becomes a column of a DataFrame.

In [41]:
obj2.name = 'population'
obj2.index.name = 'state'
obj2

state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
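The names carry over when the Series is turned into a DataFrame. `to_frame()` uses the Series name as the column label, and `reset_index()` turns the named index into a regular column:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj2 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])
obj2.name = 'population'
obj2.index.name = 'state'

frame = obj2.to_frame()
print(frame.columns.tolist())   # ['population']
print(frame.index.name)         # state

flat = obj2.reset_index()
print(flat.columns.tolist())    # ['state', 'population']
```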

The index of a Series can be replaced by assigning a new list of labels to its index attribute.

In [42]:
obj1

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [43]:
obj1.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj1

Bob      35000
Steve    71000
Jeff     16000
Ryan      5000
dtype: int64
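The new list must have exactly as many labels as the Series has elements, or pandas raises a ValueError. If you want a relabeled copy instead of changing the original, `rename()` with a dictionary returns a new Series. In the sketch below, the label 'Robert' is hypothetical:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj1 = pd.Series(sdata)
obj1.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

# Assigning an index of the wrong length fails and leaves obj1 unchanged.
length_error = False
try:
    obj1.index = ['Bob', 'Steve']
except ValueError as err:
    length_error = True
    print('ValueError:', err)

# rename() returns a relabeled copy; obj1 keeps its labels.
renamed = obj1.rename(index={'Bob': 'Robert'})
print(renamed.index.tolist())   # ['Robert', 'Steve', 'Jeff', 'Ryan']
print(obj1.index.tolist())      # ['Bob', 'Steve', 'Jeff', 'Ryan']
```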