![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/75165824-badf4680-5701-11ea-9c5b-5475b0a33abf.png"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Pandas - Series


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [2]:
import pandas as pd
import numpy as np

## Pandas Series

We'll start by analyzing "[The Group of Seven](https://en.wikipedia.org/wiki/Group_of_Seven)", an intergovernmental political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom and the United States. We'll begin with population, and for that we'll use a `pandas.Series` object.

In [3]:
# In millions
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

In [4]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

Someone reading this might not know that we're representing population in millions of inhabitants. A Series can have a `name` that documents its purpose:

In [5]:
g7_pop.name = 'G7 Population in millions'

In [6]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

Series are pretty similar to numpy arrays:

In [8]:
g7_pop.dtype

dtype('float64')

In [9]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])

They're actually backed by numpy arrays:

In [10]:
type(g7_pop.values)

numpy.ndarray
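The pandas documentation recommends `Series.to_numpy()` over `.values` when you explicitly want a NumPy array back. A minimal sketch, rebuilding `g7_pop` so the cell runs on its own:

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   name='G7 Population in millions')

# to_numpy() returns a numpy.ndarray holding the Series' values
arr = g7_pop.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.dtype)  # float64

# Same values as .values, without the index or the name
print(np.array_equal(arr, g7_pop.values))  # True
```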

They _look_ like simple Python lists or NumPy arrays, but they're actually more similar to Python `dict`s.

A Series has an `index`. By default, it's similar to the automatic positional index of a Python list:

In [11]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

In [12]:
g7_pop[0]

np.float64(35.467)

In [13]:
g7_pop[1]

np.float64(63.951)

In [14]:
g7_pop.index

RangeIndex(start=0, stop=7, step=1)

But, in contrast to lists, we can explicitly define the index:

In [15]:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]

In [16]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

Compare it with the [following table](https://docs.google.com/spreadsheets/d/1IlorV2-Oh9Da1JAZ7weVw86PQrQydSMp-ydVMH135iI/edit?usp=sharing): 

<img width="350" src="https://user-images.githubusercontent.com/872296/38149656-b5ce9816-3431-11e8-88e4-195756e25355.png" />

We can say that Series look like "ordered dictionaries". We can actually create Series out of dictionaries:

In [17]:
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
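Because a Series behaves like an ordered dictionary keyed by its index, several dict-style operations work on it too. A minimal sketch (the `'Spain'` lookup is only there to show a missing label):

```python
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')

# `in` checks the index labels, like `key in dict`
print('Canada' in g7_pop)   # True
print(35.467 in g7_pop)     # False: values are not searched

# .get() returns a default instead of raising KeyError for a missing label
print(g7_pop.get('Spain', 0.0))  # 0.0

# .items() yields (label, value) pairs, like dict.items()
for country, population in g7_pop.items():
    if population > 100:
        print(country, population)
```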

In [19]:
pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
           'United States'],
    name='G7 Population in millions')

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

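You can also create a Series from another Series, keeping only the labels you pass as `index`. Labels that don't exist in the original Series (here, `'Spain'`) get `NaN`: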

In [20]:
pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])

France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
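The same selection can be written with `Series.reindex`, which should give the same result and also accepts a `fill_value` for labels that are missing. A minimal sketch:

```python
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
                          'United Kingdom', 'United States'],
                   name='G7 Population in millions')

# Labels not found in g7_pop ('Spain') become NaN
subset = g7_pop.reindex(['France', 'Germany', 'Italy', 'Spain'])
print(subset)

# fill_value puts a default in place of NaN for missing labels
filled = g7_pop.reindex(['France', 'Spain'], fill_value=0.0)
print(filled)
```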

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing

Indexing works similarly to lists and dictionaries: you use the **index** label of the element you're looking for:

In [21]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [22]:
g7_pop['Canada']

np.float64(35.467)

In [23]:
g7_pop['Japan']

np.float64(127.061)

Numeric positions can also be used, with the `iloc` attribute:

In [24]:
g7_pop.iloc[0]

np.float64(35.467)

In [25]:
g7_pop.iloc[-1]

np.float64(318.523)
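The label-based counterpart of `iloc` is `loc`. Using the explicit accessors avoids any ambiguity about whether a key is meant as a label or as a position. A minimal sketch:

```python
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
                          'United Kingdom', 'United States'],
                   name='G7 Population in millions')

# .loc always selects by label, .iloc always by integer position
print(g7_pop.loc['Canada'])   # 35.467
print(g7_pop.iloc[0])         # 35.467

# Both accessors also accept lists
print(g7_pop.loc[['Japan', 'Italy']])
print(g7_pop.iloc[[4, 3]])
```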

Selecting multiple elements at once:

In [26]:
g7_pop[['Italy', 'France']]

Italy     60.665
France    63.951
Name: G7 Population in millions, dtype: float64

_(The result is another Series)_

In [27]:
g7_pop.iloc[[0, 1]]

Canada    35.467
France    63.951
Name: G7 Population in millions, dtype: float64

Slicing also works, but **important**: when slicing by label, the upper limit is also included (unlike Python list slicing):

In [28]:
g7_pop['Canada': 'Italy']

Canada     35.467
France     63.951
Germany    80.940
Italy      60.665
Name: G7 Population in millions, dtype: float64
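The difference shows up clearly when you compare a label slice with a positional slice that ends at the same country. A minimal sketch:

```python
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
                          'United Kingdom', 'United States'],
                   name='G7 Population in millions')

# Label-based slice: both ends included
by_label = g7_pop.loc['Canada':'Italy']
# Position-based slice: end excluded, like Python lists (position 3 is Italy)
by_position = g7_pop.iloc[0:3]

print(by_label.index.tolist())     # ['Canada', 'France', 'Germany', 'Italy']
print(by_position.index.tolist())  # ['Canada', 'France', 'Germany']
```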

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Conditional selection (boolean arrays)

The same boolean array techniques we saw applied to numpy arrays can be used for Pandas `Series`:

In [29]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [30]:
g7_pop > 70

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [31]:
g7_pop[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [32]:
g7_pop.mean()

np.float64(107.30257142857144)

In [33]:
g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [34]:
g7_pop.std()

np.float64(97.24996987121581)

In [None]:
# Boolean operators used to combine masks:
# ~  not
# |  or
# &  and

In [35]:
g7_pop[(g7_pop < g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]

Canada            35.467
United States    318.523
Name: G7 Population in millions, dtype: float64
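The other two operators, `&` and `~`, combine masks the same way. Each comparison needs its own parentheses because `&`, `|` and `~` bind more tightly than `<` or `>` in Python. A minimal sketch selecting the countries within half a standard deviation of the mean, and then their complement:

```python
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
                          'United Kingdom', 'United States'],
                   name='G7 Population in millions')

mean = g7_pop.mean()
half_std = g7_pop.std() / 2

# & (and): values inside the band [mean - std/2, mean + std/2]
within = (g7_pop >= mean - half_std) & (g7_pop <= mean + half_std)
print(g7_pop[within])

# ~ (not): invert the mask to get the values outside the band
print(g7_pop[~within])

# Series.between builds the same mask (both ends inclusive by default)
print((g7_pop.between(mean - half_std, mean + half_std) == within).all())  # True
```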

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Operations and methods
Series also support the same vectorized operations and aggregation functions as NumPy:

In [36]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [37]:
g7_pop * 1_000_000

Canada             35467000.0
France             63951000.0
Germany            80940000.0
Italy              60665000.0
Japan             127061000.0
United Kingdom     64511000.0
United States     318523000.0
Name: G7 Population in millions, dtype: float64

In [38]:
g7_pop.mean()

np.float64(107.30257142857144)

In [39]:
np.log(g7_pop)

Canada            3.568603
France            4.158117
Germany           4.393708
Italy             4.105367
Japan             4.844667
United Kingdom    4.166836
United States     5.763695
Name: G7 Population in millions, dtype: float64

In [40]:
g7_pop['France': 'Italy'].mean()

np.float64(68.51866666666666)
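One important difference from NumPy: an operation between two Series matches values by index label, not by position. The sketch below uses two small hypothetical Series (`a` and `b`, not part of the G7 data) whose labels only partially overlap:

```python
import pandas as pd

# Hypothetical toy Series with partially overlapping labels
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['z', 'y', 'w'])

# 'y' is added to 'y' and 'z' to 'z'; labels present in only one Series give NaN
total = a + b
print(total)

# .add() with fill_value treats a missing label as 0 instead of producing NaN
filled_total = a.add(b, fill_value=0)
print(filled_total)
```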

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
Boolean arrays (also called masks) work the same way as in NumPy:

In [41]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [42]:
g7_pop > 80

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [43]:
g7_pop[g7_pop > 80]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [44]:
g7_pop[(g7_pop > 80) | (g7_pop < 40)]

Canada            35.467
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [45]:
g7_pop[(g7_pop > 80) & (g7_pop < 200)]

Germany     80.940
Japan      127.061
Name: G7 Population in millions, dtype: float64
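A mask is itself a Series of booleans, so it can be aggregated: `True` counts as 1 and `False` as 0. A minimal sketch counting how many countries match the last condition:

```python
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
                          'United Kingdom', 'United States'],
                   name='G7 Population in millions')

mask = (g7_pop > 80) & (g7_pop < 200)

# sum() counts the matching entries
print(mask.sum())   # 2
# mean() gives the fraction of matching entries
print(mask.mean())
```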

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Modifying series

Values can be modified by label, by position with `iloc`, or through a boolean mask:

In [46]:
g7_pop['Canada'] = 40.5

In [47]:
g7_pop

Canada             40.500
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [48]:
g7_pop.iloc[-1] = 500

In [49]:
g7_pop

Canada             40.500
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     500.000
Name: G7 Population in millions, dtype: float64

In [50]:
g7_pop[g7_pop < 70]

Canada            40.500
France            63.951
Italy             60.665
United Kingdom    64.511
Name: G7 Population in millions, dtype: float64

In [51]:
g7_pop[g7_pop < 70] = 99.99

In [52]:
g7_pop

Canada             99.990
France             99.990
Germany            80.940
Italy              99.990
Japan             127.061
United Kingdom     99.990
United States     500.000
Name: G7 Population in millions, dtype: float64
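The assignments above modify `g7_pop` in place. When you want to keep the original, `Series.mask` and `Series.where` return a modified copy instead. A minimal sketch, starting again from the original values:

```python
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
                          'United Kingdom', 'United States'],
                   name='G7 Population in millions')

# mask(cond, other): replace values where cond is True
capped = g7_pop.mask(g7_pop < 70, 99.99)

# where(cond, other) is the inverse: keep values where cond is True
same = g7_pop.where(g7_pop >= 70, 99.99)

print(capped)
print(g7_pop['Canada'])  # 35.467: the original Series is untouched
```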

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


# Pandas Series exercises


In [1]:
# Import the numpy package under the name np
import numpy as np

# Import the pandas package under the name pd
import pandas as pd

# Print the pandas version
print(pd.__version__)

2.3.3


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Series creation

### Create an empty pandas Series

In [2]:
# your code goes here
pd.Series()

Series([], dtype: object)

In [None]:
pd.Series()
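With no data, pandas has nothing to infer a type from, so the empty Series gets `dtype: object`. If you already know what it will hold, you can pass `dtype` explicitly. A minimal sketch:

```python
import pandas as pd

# Empty Series with an explicit float dtype instead of object
empty_float = pd.Series(dtype='float64')
print(empty_float)        # Series([], dtype: float64)
print(empty_float.empty)  # True
```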

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X Python list, convert it to a Y pandas Series

In [8]:
# your code goes here
X = ['A','B','C','D']
X = pd.Series(X)
X

0    A
1    B
2    C
3    D
dtype: object

In [None]:
X = ['A','B','C']
print(X, type(X))

Y = pd.Series(X)
print(Y, type(Y)) # different type

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, name it 'My letters'

In [9]:
# your code goes here
X.name = "My letters"
X

0    A
1    B
2    C
3    D
Name: My letters, dtype: object

In [None]:
X = pd.Series(['A','B','C'])

X.name = 'My letters'
X

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show its values


In [10]:
# your code goes here
X.values

array(['A', 'B', 'C', 'D'], dtype=object)

In [None]:
X = pd.Series(['A','B','C'])

X.values

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Series indexation

### Assign index names to the given X pandas Series


In [14]:
# your code goes here
X.index = ['alpha', 'beta', 'gamma', 'delta']
X

alpha    A
beta     B
gamma    C
delta    D
Name: My letters, dtype: object

In [None]:
X = pd.Series(['A','B','C'])
index_names = ['first', 'second', 'third']

X.index = index_names
X
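The new index must have exactly one label per value; otherwise pandas raises a `ValueError` and keeps the old index. A minimal sketch:

```python
import pandas as pd

X = pd.Series(['A', 'B', 'C'])

raised = False
try:
    # Only two labels for three values
    X.index = ['first', 'second']
except ValueError as err:
    raised = True
    print('ValueError:', err)

# The failed assignment leaves the original index in place
print(X.index.tolist())  # [0, 1, 2]
```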

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show its first element


In [15]:
# your code goes here
X.iloc[0]

'A'

In [None]:
X = pd.Series(['A','B','C'], index=['first', 'second', 'third'])

#X.iloc[0] # by position (prefer this over X[0], whose positional fallback is deprecated in recent pandas)
X['first'] # by label

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show its last element


In [16]:
# your code goes here
X.iloc[-1]

'D'

In [None]:
X = pd.Series(['A','B','C'], index=['first', 'second', 'third'])

#X.iloc[-1] # by position (prefer this over X[-1], whose positional fallback is deprecated in recent pandas)
X['third'] # by label

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show all middle elements


In [17]:
# your code goes here
X.iloc[1:-1]

beta     B
gamma    C
Name: My letters, dtype: object

In [None]:
X = pd.Series(['A','B','C','D','E'],
              index=['first','second','third','forth','fifth'])

#X[['second', 'third', 'forth']]
#X.iloc[1:-1] # by position
X[1:-1] # by position

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show the elements in reverse position


In [18]:
# your code goes here
X[::-1]

delta    D
gamma    C
beta     B
alpha    A
Name: My letters, dtype: object

In [None]:
X = pd.Series(['A','B','C','D','E'],
              index=['first','second','third','forth','fifth'])

#X.iloc[::-1]
X[::-1]

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show the first and last elements


In [20]:
# your code goes here
X.iloc[[0, -1]]

alpha    A
delta    D
Name: My letters, dtype: object

In [24]:
X[['alpha', 'delta']]

alpha    A
delta    D
Name: My letters, dtype: object

In [None]:
X = pd.Series(['A','B','C','D','E'],
              index=['first','second','third','forth','fifth'])

#X[['first', 'fifth']] # by label
X.iloc[[0, -1]] # by position

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Series manipulation

### Convert the given integer pandas Series to float


In [27]:
# your code goes here
X = pd.Series([1,2,3,4,5],
              index=['first','second','third','forth','fifth'])
pd.Series(X,dtype=np.float32)

first     1.0
second    2.0
third     3.0
forth     4.0
fifth     5.0
dtype: float32

In [None]:
X = pd.Series([1,2,3,4,5],
              index=['first','second','third','forth','fifth'])

pd.Series(X, dtype=float) # np.float was removed in NumPy 1.24; use the built-in float
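Another common way to convert is `Series.astype`, which returns a converted copy and leaves the original Series unchanged. A minimal sketch:

```python
import pandas as pd

X = pd.Series([1, 2, 3, 4, 5],
              index=['first', 'second', 'third', 'forth', 'fifth'])

# astype() returns a new Series; X keeps its integer dtype
Y = X.astype('float64')
print(Y.dtype)  # float64
print(Y)
```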

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Reverse the given pandas Series (first element becomes last)

In [None]:
# your code goes here


In [None]:
X = pd.Series([1,2,3,4,5],
              index=['first','second','third','forth','fifth'])

X[::-1]
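Reversing keeps every value attached to its label. If you also want a fresh 0-based index, `reset_index(drop=True)` discards the old labels. A minimal sketch using the explicit positional accessor:

```python
import pandas as pd

X = pd.Series([1, 2, 3, 4, 5],
              index=['first', 'second', 'third', 'forth', 'fifth'])

# Reverse by position; labels move together with their values
reversed_x = X.iloc[::-1]
print(reversed_x.index.tolist())  # ['fifth', 'forth', 'third', 'second', 'first']

# Drop the labels and renumber from 0
renumbered = reversed_x.reset_index(drop=True)
print(renumbered.tolist())  # [5, 4, 3, 2, 1]
```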

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Order (sort) the given pandas Series


In [28]:
# your code goes here


In [29]:
X = pd.Series([4,2,5,1,3],
              index=['forth','second','fifth','first','third'])

X = X.sort_values()
X

first     1
second    2
third     3
forth     4
fifth     5
dtype: int64
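`sort_values` also sorts in descending order with `ascending=False`, and `sort_index` sorts by the labels instead of the values. A minimal sketch:

```python
import pandas as pd

X = pd.Series([4, 2, 5, 1, 3],
              index=['forth', 'second', 'fifth', 'first', 'third'])

# Largest value first
print(X.sort_values(ascending=False).tolist())  # [5, 4, 3, 2, 1]

# Sort by index label (alphabetical for strings)
print(X.sort_index().index.tolist())  # ['fifth', 'first', 'forth', 'second', 'third']
```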

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, set the fifth element equal to 10


In [32]:
# your code goes here
X['fifth']=10
X

first      1
second     2
third      3
forth      4
fifth     10
dtype: int64

In [None]:
X = pd.Series([1,2,3,4,5],
              index=['A','B','C','D','E'])

X.iloc[4] = 10 # by position; X['E'] = 10 sets the same element by label
X

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, change all the middle elements to 0


In [None]:
# your code goes here


In [33]:
X = pd.Series([1,2,3,4,5],
              index=['A','B','C','D','E'])

X[1:-1] = 0
X

A    1
B    0
C    0
D    0
E    5
dtype: int64

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, add 5 to every element


In [None]:
# your code goes here


In [34]:
X = pd.Series([1,2,3,4,5])

X + 5

0     6
1     7
2     8
3     9
4    10
dtype: int64

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Series boolean arrays (also called masks)

### Given the X pandas Series, make a mask showing negative elements


In [None]:
# your code goes here


In [35]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

mask = X < 0
mask

0     True
1    False
2    False
3     True
4    False
5    False
6    False
7    False
8     True
9    False
dtype: bool

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, get the negative elements


In [None]:
# your code goes here


In [36]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

mask = X < 0
X[mask]

0   -1
3   -4
8   -9
dtype: int64

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, get numbers higher than 5


In [None]:
# your code goes here


In [37]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

mask = X > 5
X[mask]

5     6
9    10
dtype: int64

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, get numbers higher than the mean of its elements

In [38]:
# your code goes here
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])
mask = X > X.mean()
X[mask]

1     2
4     5
5     6
9    10
dtype: int64

In [39]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

mask = X > X.mean()
X[mask]

1     2
4     5
5     6
9    10
dtype: int64

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, get numbers equal to 2 or 10


In [40]:
# your code goes here


In [41]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

mask = (X == 2) | (X == 10)
X[mask]

1     2
9    10
dtype: int64
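When the list of accepted values grows, chaining `|` gets verbose. `Series.isin` builds the same mask from a list of values. A minimal sketch:

```python
import pandas as pd

X = pd.Series([-1, 2, 0, -4, 5, 6, 0, 0, -9, 10])

# True wherever the element is one of the listed values
mask = X.isin([2, 10])
print(X[mask])
```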

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Logic functions

### Given the X pandas Series, return True if none of its elements is zero

In [42]:
# your code goes here


In [43]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

X.all()

np.False_
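`X.all()` works here because pandas (like NumPy) treats 0 as `False` and any other number as `True`. An explicit comparison states the intent more directly and should give the same answer. A minimal sketch:

```python
import pandas as pd

X = pd.Series([-1, 2, 0, -4, 5, 6, 0, 0, -9, 10])

# Explicitly ask "is every element different from zero?"
no_zeros = (X != 0).all()
print(no_zeros)             # False
print(no_zeros == X.all())  # True
```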

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, return True if any of its elements is zero


In [44]:
# your code goes here


In [45]:
X = pd.Series([-1,2,0,-4,5,6,0,0,-9,10])

(X == 0).any()

np.True_
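Note that `Series.any()` without a comparison tests whether any element is *non-zero*, which is a different question from "is any element zero?". Both return `True` for a Series mixing zeros and non-zeros, but they diverge on, for example, a Series of only zeros. A minimal sketch:

```python
import pandas as pd

Z = pd.Series([0, 0, 0])

print(Z.any())         # False: no element is non-zero
print((Z == 0).any())  # True: at least one element is zero
```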

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Summary statistics

### Given the X pandas Series, show the sum of its elements


In [46]:
# your code goes here


In [47]:
X = pd.Series([3,5,6,7,2,3,4,9,4])

#np.sum(X)
X.sum()

np.int64(43)

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show the mean value of its elements

In [48]:
# your code goes here


In [49]:
X = pd.Series([1,2,0,4,5,6,0,0,9,10])

#np.mean(X)
X.mean()

np.float64(3.7)

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the X pandas Series, show the max value of its elements

In [50]:
# your code goes here


In [51]:
X = pd.Series([1,2,0,4,5,6,0,0,9,10])

#np.max(X)
X.max()

np.int64(10)
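To see several summary statistics at once, `Series.describe()` returns a Series with the count, mean, standard deviation, minimum, quartiles and maximum. A minimal sketch:

```python
import pandas as pd

X = pd.Series([1, 2, 0, 4, 5, 6, 0, 0, 9, 10])

summary = X.describe()
print(summary)
print(summary['max'])  # 10.0
```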

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)