
# Pandas - Series

A Pandas Series is a one-dimensional array-like object that can hold various types of data, including integers, floats, Python objects, etc.

1. Homogeneous data: a Series has a single dtype shared by all its elements (mixed Python objects fall back to the generic `object` dtype).
2. Labels: each element is associated with a label (its index), which can be used to access it.
3. Array-like: a Series supports operations similar to those of NumPy arrays, such as slicing and vectorized operations.
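To make the three properties concrete, here is a minimal sketch using a small made-up Series (the values and the labels 'a', 'b', 'c' are illustrative only):

```python
import pandas as pd

# Illustrative data: three integers labelled 'a', 'b', 'c'
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# 1. Homogeneous data: the whole Series shares a single dtype
print(s.dtype)

# 2. Labels: access an element through its index label
print(s['b'])  # 20

# 3. Array-like: positional slicing and vectorized arithmetic
print(s.iloc[1:])  # the elements labelled 'b' and 'c'
print(s * 2)       # every element doubled, labels preserved

# Mixing Python types falls back to the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)  # object
```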

In [2]:
# Import libraries

import pandas as pd
import numpy as np

We'll start by analyzing "The Group of Seven" (G7), an intergovernmental political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom and the United States. We'll begin with population, and for that we'll use a pandas.Series object.

In [3]:
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

In [4]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

In [5]:
# Series can have a name:

g7_pop.name = 'G7 Population in millions'

In [6]:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

In [8]:
# Series are very similar to NumPy arrays:

g7_pop.dtype

dtype('float64')

In [9]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])

In [10]:
type(g7_pop.values)

numpy.ndarray
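`.values` works, but the pandas documentation recommends `Series.to_numpy()` when you need a NumPy array, since `.values` can return a pandas extension array for some dtypes (for example, categoricals). A minimal sketch with the same population data:

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   name='G7 Population in millions')

# to_numpy() returns the Series data as a NumPy array
arr = g7_pop.to_numpy()
print(type(arr))   # <class 'numpy.ndarray'>
print(arr.dtype)   # float64
print(np.array_equal(arr, g7_pop.values))  # True: same data as .values
```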

In [11]:
# A Series has an index, similar to the automatic positional index of Python lists:
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

In [12]:
# See the first element
g7_pop[0]

35.467

In [13]:
# See the second element
g7_pop[1]

63.951

In [14]:
# See the index of the Series

g7_pop.index

RangeIndex(start=0, stop=7, step=1)

In [15]:
# We can explicitly define the index:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]

In [16]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

We can also create a Series from a dictionary: the keys become the index and the values become the data.

In [17]:
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [18]:
pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
       'United States'],
    name='G7 Population in millions')

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [20]:
# We can also create a Series from another Series, keeping only the given index labels; labels that don't exist (like 'Spain') get NaN:

pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])

France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
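The same result can be obtained with `Series.reindex()`, which returns a new Series conforming to the given labels. A minimal sketch, rebuilding `g7_pop` from the dictionary shown earlier; the `fill_value` argument is optional and is shown only to illustrate replacing the NaN:

```python
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

# Keep the requested labels in the requested order; 'Spain' is missing -> NaN
subset = g7_pop.reindex(['France', 'Germany', 'Italy', 'Spain'])
print(subset)

# fill_value replaces NaN for labels not present in g7_pop
filled = g7_pop.reindex(['France', 'Spain'], fill_value=0.0)
print(filled)
```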

# Indexing

Indexing works similarly to lists and dictionaries: you use the index (label) of the element you're looking for.

In [21]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [22]:
g7_pop['Canada']

35.467

In [23]:
# See Japan:
g7_pop['Japan']

127.061

In [25]:
# Numerical positions can also be used, with the iloc attribute:

g7_pop.iloc[0] #Canada

35.467

In [26]:
g7_pop.iloc[-1] #United States

318.523
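Being explicit about label vs. position avoids ambiguity: `.loc` always uses labels and `.iloc` always uses integer positions. In recent pandas versions, passing integers to plain `[]` on a Series with a non-integer index is deprecated in favour of `.iloc`. A minimal sketch:

```python
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

# .loc selects by label, .iloc by integer position
print(g7_pop.loc['Canada'])  # 35.467
print(g7_pop.iloc[0])        # 35.467

# Both accept lists and return a new Series
print(g7_pop.loc[['Japan', 'Italy']])
print(g7_pop.iloc[[4, 3]])   # the same two countries, by position
```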

In [28]:
# Selecting multiple elements at once:

g7_pop[['Italy', 'France']] #result is another Series

Italy     60.665
France    63.951
Name: G7 Population in millions, dtype: float64

In [29]:
g7_pop.iloc[[0, 1]]  # Canada and France (positions, so use iloc)

Canada    35.467
France    63.951
Name: G7 Population in millions, dtype: float64

In [31]:
# Slicing also works, but note: when slicing by label, the upper limit is also included:

g7_pop['Canada': 'Italy']

Canada     35.467
France     63.951
Germany    80.940
Italy      60.665
Name: G7 Population in millions, dtype: float64
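The inclusive upper limit applies only to label-based slicing. Positional slicing with `.iloc` follows the usual Python rule and excludes the stop position. A small sketch comparing the two:

```python
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

# Label-based slice: both endpoints are included
by_label = g7_pop.loc['Canada':'Italy']
# Position-based slice: the stop position is excluded, like Python lists
by_position = g7_pop.iloc[0:3]

print(list(by_label.index))     # ['Canada', 'France', 'Germany', 'Italy']
print(list(by_position.index))  # ['Canada', 'France', 'Germany']
```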

# Conditional selection (boolean arrays)

The same boolean array techniques we saw applied to NumPy arrays can be used with Pandas Series:

In [32]:
g7_pop


Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [33]:
g7_pop > 70

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [34]:
g7_pop[g7_pop > 70]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
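Under the hood, `g7_pop > 70` produces a boolean Series with the same index, and indexing with it keeps only the labels where the value is True. The sketch below checks this against a plain-Python dictionary comprehension (used here only for comparison):

```python
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

mask = g7_pop > 70   # boolean Series, same index as g7_pop
print(mask.dtype)    # bool
print(mask.sum())    # 3 (each True counts as 1)

# Filtering keeps only the labels where the mask is True,
# which matches this plain-Python equivalent
manual = {country: pop for country, pop in g7_pop.items() if pop > 70}
print(g7_pop[mask].to_dict() == manual)  # True
```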

In [35]:
# Mean population (in millions)
g7_pop.mean()



107.30257142857144

In [36]:
# Countries whose population is above the mean

g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [37]:
# Standard deviation
g7_pop.std()

97.24996987121581
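One detail worth knowing when comparing with NumPy: `Series.std()` computes the sample standard deviation (`ddof=1`) by default, while `np.std()` on a plain array uses `ddof=0`. A minimal sketch showing the difference:

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')
values = g7_pop.to_numpy()

print(g7_pop.std())             # sample std (ddof=1), the pandas default
print(np.std(values))           # population std (ddof=0), the NumPy default
print(np.std(values, ddof=1))   # matches g7_pop.std()
print(np.isclose(g7_pop.std(), np.std(values, ddof=1)))  # True
```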

In [38]:
# Boolean operators for combining conditions (wrap each condition in parentheses):
# ~  not
# |  or
# &  and

In [39]:
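# Countries above mean - std/2. Any value above mean + std/2 also passes the
# first test, so the second condition never changes the result.
g7_pop[(g7_pop > g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]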

France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
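In the cell above, any value greater than mean + std/2 is also greater than mean - std/2, so the selection is simply every country above mean - std/2. If the goal was to find countries far from the mean on either side (an assumption about the intent), the lower bound needs `<`. This sketch also uses `~` to select the complement:

```python
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

mean, half_std = g7_pop.mean(), g7_pop.std() / 2

# Countries outside the band [mean - std/2, mean + std/2]
far = (g7_pop < mean - half_std) | (g7_pop > mean + half_std)
print(g7_pop[far])          # Canada and United States

# ~ negates the mask: countries inside the band
print(list(g7_pop[~far].index))
```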

# Operations and methods

Series also support vectorized operations and aggregation functions, just like NumPy arrays:

In [40]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [41]:
g7_pop * 1_000_000

Canada             35467000.0
France             63951000.0
Germany            80940000.0
Italy              60665000.0
Japan             127061000.0
United Kingdom     64511000.0
United States     318523000.0
Name: G7 Population in millions, dtype: float64

In [42]:
g7_pop.mean()

107.30257142857144

In [43]:
# Natural logarithm, applied element-wise

np.log(g7_pop)

Canada            3.568603
France            4.158117
Germany           4.393708
Italy             4.105367
Japan             4.844667
United Kingdom    4.166836
United States     5.763695
Name: G7 Population in millions, dtype: float64
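NumPy functions such as `np.log` are applied element by element and return a Series that keeps the original index and name. A short check, using `np.exp` to undo the logarithm:

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

logs = np.log(g7_pop)                   # element-wise natural log
print(type(logs).__name__)              # Series
print(logs.index.equals(g7_pop.index))  # True: labels are preserved

# Exponentiating recovers the original values (up to floating-point error)
print(np.allclose(np.exp(logs), g7_pop))  # True
```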

In [44]:
g7_pop['France': 'Italy'].mean()

68.51866666666666

# Boolean arrays

In [45]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [46]:
g7_pop > 80

Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool

In [47]:
g7_pop[g7_pop > 80]

Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [48]:
# Countries with a population above 80 OR below 40 (millions)
g7_pop[(g7_pop > 80) | (g7_pop < 40)]

Canada            35.467
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64

In [49]:
# Countries with a population above 80 AND below 200 (millions)
g7_pop[(g7_pop > 80) & (g7_pop < 200)]

Germany     80.940
Japan      127.061
Name: G7 Population in millions, dtype: float64
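For a range condition like the one above, `Series.between()` is a shorthand. A minimal sketch, assuming a recent pandas version where `inclusive` accepts the string `'neither'` (both bounds excluded, matching `>` and `<`):

```python
import pandas as pd

g7_pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Germany': 80.94,
                    'Italy': 60.665, 'Japan': 127.061,
                    'United Kingdom': 64.511, 'United States': 318.523},
                   name='G7 Population in millions')

# Same mask as (g7_pop > 80) & (g7_pop < 200)
mask = g7_pop.between(80, 200, inclusive='neither')
print(g7_pop[mask])  # Germany and Japan
```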

# Modifying Series