# Pandas Series

The main differences between a pandas Series and a NumPy array:

* A Series can hold elements of different data types.
* A Series can have an index label for each element.

A Series has two parts: the **data** and the **index**. The data can be heterogeneous.
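
As a rough illustration of the first point, the sketch below stores the same mixed list in a NumPy array and in a Series. The variable names (`mixed`, `arr`, `ser`) are only illustrative, and the comparison assumes NumPy's default dtype inference; an array created with an explicit `dtype=object` would behave differently.

```python
import numpy as np
import pandas as pd

mixed = [5, "No", 3.6]

# By default NumPy picks one dtype for the whole array;
# with a string in the list, every element is stored as a string.
arr = np.array(mixed)
print(arr.dtype.kind)   # 'U' (unicode string)

# A Series keeps each Python object as it is, under the generic object dtype.
ser = pd.Series(mixed, index=["eggs", "bread", "milk"])
print(ser.dtype)        # object
print(type(ser["eggs"]), type(ser["milk"]))
```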

In [1]:
import pandas as pd

In [3]:
groceries = pd.Series(data=[5, "No", 3.6, 'Yes'], index=['eggs', 'bread', 'milk', 'cabbage'])
groceries

eggs         5
bread       No
milk       3.6
cabbage    Yes
dtype: object

In [43]:
# positional arguments: data first, then index
fruits = pd.Series([10, 20, 30],['apples','oranges','bananas'])
fruits

apples     10
oranges    20
bananas    30
dtype: int64

In [4]:
groceries.shape

(4,)

In [5]:
groceries.size

4

In [6]:
groceries.ndim

1

In [7]:
groceries.index

Index(['eggs', 'bread', 'milk', 'cabbage'], dtype='object')

In [9]:
# groceries.data raises an error; use .values to get the underlying array
groceries.values

array([5, 'No', 3.6, 'Yes'], dtype=object)

In [10]:
'eggs' in groceries.index

True

In [12]:
'banana' not in groceries.index

True

### Accessing and deleting Series elements

In [20]:
groceries['eggs']
# groceries['banana'] # key error

5

In [21]:
# notice double square brackets for multiple keys
groceries[['eggs', 'milk']]

eggs      5
milk    3.6
dtype: object

In [22]:
# integer positions also work here (deprecated in recent pandas versions; prefer .iloc)
groceries[0]

5

In [23]:
groceries[-1]

'Yes'

In [24]:
groceries[[0,1]]

eggs      5
bread    No
dtype: object

**Explicit indexing**

* by integer position: .iloc (integer location)
* by index label: .loc

In [25]:
groceries.iloc[[2, 3]]

milk       3.6
cabbage    Yes
dtype: object

In [26]:
groceries.loc[['eggs','milk']]

eggs      5
milk    3.6
dtype: object
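
The distinction matters most when the index labels are themselves integers. The sketch below uses a hypothetical Series (`stock`) whose labels do not match the positions, so `.loc` and `.iloc` give different answers for the same number.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match the positions.
stock = pd.Series(["eggs", "bread", "milk"], index=[10, 20, 30])

print(stock.loc[10])    # label 10 -> 'eggs'
print(stock.iloc[0])    # position 0 -> 'eggs'
print(stock.iloc[-1])   # last position -> 'milk'

# There is no position 10, so .iloc raises IndexError.
try:
    stock.iloc[10]
except IndexError as err:
    print("IndexError:", err)
```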

**Mutable - like numpy arrays**

In [30]:
groceries.iloc[0] = 26  # assign by position
groceries

eggs        26
bread       No
milk       3.6
cabbage    Yes
dtype: object

**Deleting elements**

* Out of place: .drop returns a new Series and leaves the original unchanged.
* In place: pass inplace=True to modify the original Series.

In [31]:
groceries = pd.Series(data=[5, "No", 3.6, 'Yes'], index=['eggs', 'bread', 'milk', 'cabbage'])
# note the parentheses ( ): drop is a method
groceries.drop('milk')

eggs         5
bread       No
cabbage    Yes
dtype: object

In [39]:
groceries

eggs         5
bread       No
milk       3.6
cabbage    Yes
dtype: object

In [41]:
groceries = pd.Series(data=[5, "No", 3.6, 'Yes'], index=['eggs', 'bread', 'milk', 'cabbage'])
groceries.drop('milk', inplace=True)

In [37]:
groceries

eggs         5
bread       No
cabbage    Yes
dtype: object
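
A minimal sketch of the out-of-place behaviour, checked explicitly: the original Series still contains the dropped label until the name is rebound. The name `without_milk` is illustrative, and rebinding the result is shown only as one common alternative to `inplace=True`.

```python
import pandas as pd

groceries = pd.Series([5, "No", 3.6, "Yes"],
                      index=["eggs", "bread", "milk", "cabbage"])

# drop returns a new Series; the original keeps all four labels.
without_milk = groceries.drop("milk")
print(list(without_milk.index))   # ['eggs', 'bread', 'cabbage']
print("milk" in groceries.index)  # True

# Rebinding the name has the same visible effect as inplace=True.
# A list of labels drops several elements at once.
groceries = groceries.drop(["milk", "cabbage"])
print(list(groceries.index))      # ['eggs', 'bread']
```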

### Arithmetic operations

In [44]:
fruits = pd.Series([10, 20, 30],['apples','oranges','bananas'])
fruits

apples     10
oranges    20
bananas    30
dtype: int64

In [45]:
fruits + 2

apples     12
oranges    22
bananas    32
dtype: int64

In [46]:
fruits - 2

apples      8
oranges    18
bananas    28
dtype: int64

In [47]:
fruits * 2

apples     20
oranges    40
bananas    60
dtype: int64

In [48]:
fruits / 2

apples      5.0
oranges    10.0
bananas    15.0
dtype: float64

**Use numpy operations on pandas Series**

In [49]:
import numpy as np
np.sqrt(fruits)

apples     3.162278
oranges    4.472136
bananas    5.477226
dtype: float64

In [51]:
np.exp(fruits)

apples     2.202647e+04
oranges    4.851652e+08
bananas    1.068647e+13
dtype: float64

**Arithmetic on individual elements**

In [54]:
fruits['apples'] + 23

33

In [56]:
fruits.iloc[1] + 55

75

In [57]:
fruits.loc[["apples","bananas"]] * 100

apples     1000
bananas    3000
dtype: int64

**Arithmetic on mixed elements**

In [58]:
groceries = pd.Series(data=[5, "No", 3.6, 'Yes'], index=['eggs', 'bread', 'milk', 'cabbage'])
groceries

eggs         5
bread       No
milk       3.6
cabbage    Yes
dtype: object

Multiplication is defined for both strings and numbers, so groceries * 2 runs without error (each string is repeated). Division is not defined for strings, so groceries / 2 raises a TypeError.

In [59]:
groceries * 2

eggs           10
bread        NoNo
milk          7.2
cabbage    YesYes
dtype: object

In [60]:
groceries / 2

TypeError: unsupported operand type(s) for /: 'str' and 'int'
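
If only the numeric entries need dividing, one possible workaround (an assumption about intent, not part of the original notebook) is to coerce the non-numeric values to NaN with `pd.to_numeric` and drop them first. The name `numeric` is illustrative.

```python
import pandas as pd

groceries = pd.Series([5, "No", 3.6, "Yes"],
                      index=["eggs", "bread", "milk", "cabbage"])

# Division fails as soon as it reaches a string element.
try:
    groceries / 2
except TypeError as err:
    print("TypeError:", err)

# Coerce non-numeric entries to NaN, then drop them before dividing.
numeric = pd.to_numeric(groceries, errors="coerce").dropna()
halved = numeric / 2
print(halved)
```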

In [78]:
# Quiz:
# Create a Pandas Series that contains the distance of some planets from the Sun.
# Use the name of the planets as the index to your Pandas Series, and the distance
# from the Sun as your data. The distance from the Sun is in units of 10^6 km

distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]

planets = ['Earth','Saturn', 'Mars','Venus', 'Jupiter']

# Create a Pandas Series using the above data, with the name of the planets as
# the index and the distance from the Sun as your data.
dist_planets = pd.Series(distance_from_sun, planets)

# Calculate the number of minutes it takes sunlight to reach each planet. You can
# do this by dividing the distance from the Sun for each planet by the speed of light.
# Since in the data above the distance from the Sun is in units of 10^6 km, you can
# use a value for the speed of light of c = 18, since light travels 18 x 10^6 km/minute.
time_light = dist_planets / 18

# Use Boolean indexing to select only those planets for which sunlight takes less
# than 40 minutes to reach them.
close_planets = time_light[time_light < 40]
close_planets

Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64
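
To see what Boolean indexing does in the quiz, it can help to look at the mask on its own. The sketch below reuses the quiz data; the name `mask` and the extra lower bound of 7 minutes are illustrative additions.

```python
import pandas as pd

dist_planets = pd.Series([149.6, 1433.5, 227.9, 108.2, 778.6],
                         index=["Earth", "Saturn", "Mars", "Venus", "Jupiter"])
time_light = dist_planets / 18

# The comparison yields a Boolean Series with the same index.
mask = time_light < 40
print(mask)

# Indexing with the mask keeps only the labels where it is True.
print(list(time_light[mask].index))   # ['Earth', 'Mars', 'Venus']

# Masks combine with & and |; each condition needs its own parentheses.
between = time_light[(time_light > 7) & (time_light < 40)]
print(list(between.index))            # ['Earth', 'Mars']
```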