# Series

- A Series is a one-dimensional array of indexed data.

- In other words, it is an __array of data__ associated with an __array of labels, called the index__.

----------------------

### Importing the Necessary Modules

In [3]:
import numpy as np
import pandas as pd
from string import ascii_uppercase      # To import letters
from random import choices, seed         # To generate a series of random numbers

### Creating Pandas Series 

#### Creating an empty Series

In [4]:
ser_0 = pd.Series(dtype = 'int32')
ser_0

Series([], dtype: int32)

#### Check the Type

In [5]:
type(ser_0)

pandas.core.series.Series

#### 1. Creating a Series from a list

In [6]:
seed(234)
my_data = choices(range(30), k = 5)

In [7]:
ser_1 = pd.Series(my_data)
ser_1

0    10
1    25
2    25
3    27
4    15
dtype: int64

> We can see that the data is displayed on the right and the indices on the left (the default index is integers starting from **zero**).

> We can change the index by providing our own, in this case __labels__.

In [8]:
labels = list(ascii_uppercase[:5])

In [9]:
ser_2 = pd.Series(my_data, index= labels)
ser_2

A    10
B    25
C    25
D    27
E    15
dtype: int64
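
One caveat when passing a custom index: it has to contain exactly one label per data point. The sketch below (the names `values` and `short_index` are made up for illustration) shows what happens when the lengths disagree.

```python
import pandas as pd

values = [10, 25, 25]
short_index = ['A', 'B']              # hypothetical index, one label too few

try:
    pd.Series(values, index=short_index)
    length_error = None
except ValueError as exc:
    length_error = exc                # pandas refuses mismatched lengths

print(type(length_error).__name__)    # ValueError
```

Catching the exception here is only for demonstration; in a notebook the error would simply be displayed.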

#### 2. Creating a Series from a NumPy Array

In [10]:
arr = np.random.randint(1, 30, size = 5)
arr

array([23, 14,  4, 16,  8])

> It is preferable to have a __Series__ with an index identifying each data point. We can do that by providing array-like data and passing the labels to the __index argument__.

In [11]:
ser_3 = pd.Series(arr, index = labels)
ser_3

A    23
B    14
C     4
D    16
E     8
dtype: int64

In [12]:
# If we don't provide an index, the Series gets the default one (0 up to length - 1)
pd.Series(arr)

0    23
1    14
2     4
3    16
4     8
dtype: int64

#### 3. Creating a Series from a Dictionary

---

   - We will create a dictionary of random values using a dictionary comprehension
   
   
   - A Series created from a dictionary automatically takes the __keys__ as the index and the __values__ as the data

In [13]:
labels = list(ascii_uppercase[:5])

In [14]:
# Caution: labels.pop(0) removes each label as it is used, so `labels` is empty after this cell
my_dict = {labels.pop(0):x for x in choices(range(30), k = 5)}
my_dict

{'A': 14, 'B': 27, 'C': 28, 'D': 29, 'E': 10}

In [15]:
ser_4 = pd.Series(my_dict)
ser_4

A    14
B    27
C    28
D    29
E    10
dtype: int64
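
The comprehension above consumes `labels` through `pop(0)`. A possible alternative, sketched here with `zip`, pairs labels and values without modifying the label list. The names `keys`, `vals`, `zipped_dict` and `zipped_ser` are illustrative, and the seed is arbitrary.

```python
import pandas as pd
from string import ascii_uppercase
from random import choices, seed

seed(0)                                  # arbitrary seed, only for repeatability
keys = list(ascii_uppercase[:5])
vals = choices(range(30), k=5)

zipped_dict = dict(zip(keys, vals))      # pair each label with one value
zipped_ser = pd.Series(zipped_dict)

print(zipped_ser)
print(keys)                              # still holds all five labels
```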

### Creating a Series from a Scalar

  If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [16]:
labels = list(ascii_uppercase[:5])      # Rebuild the labels: pop(0) in In [14] emptied the list
scalar_ser = pd.Series(data = 10, index = labels)
scalar_ser

A    10
B    10
C    10
D    10
E    10
dtype: int64

## Necessary Series Attributes

 - **values**: The __values__ attribute of a Series object returns the underlying data as an array.
 
 
 - **index**: We can retrieve the index labels using the __index__ attribute.
 
 - **dtype**: The __dtype__ attribute gives the data type of the underlying data of the Series object.

In [17]:
# The values of series 01
ser_1.values

array([10, 25, 25, 27, 15])

In [18]:
# The index of ser_1 (it uses the pandas default index)
ser_1.index

RangeIndex(start=0, stop=5, step=1)

In [19]:
## The series 04  values 
ser_4.values

array([14, 27, 28, 29, 10])

In [20]:
## The index of ser_4
ser_4.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

In [21]:
## The Data type of series
ser_1.dtype

dtype('int64')

In [22]:
ser_4.dtype

dtype('int64')
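
As a compact recap of the three attributes, here is a small sketch on a throwaway Series (`demo` is an illustrative name). It also prints `shape` and `size`, two further attributes that are not covered above.

```python
import numpy as np
import pandas as pd

demo = pd.Series([14, 27, 28], index=['A', 'B', 'C'])

print(type(demo.values))      # the data comes back as a NumPy array
print(list(demo.index))       # ['A', 'B', 'C']
print(demo.dtype)             # an integer dtype (int64 on most platforms)
print(demo.shape, demo.size)  # (3,) 3
```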

#### Practice Creating a Series from a dictionary

In [23]:
dict_data = {'Germany': 2500, 'Poland': 1500, 'Russia': 4300, 'Sweden': 1200}
pd.Series(dict_data)

Germany    2500
Poland     1500
Russia     4300
Sweden     1200
dtype: int64

### Series Data Types 

  Pandas Series objects can hold any data type (integers, strings, floating point numbers, Python objects, etc.).

In [24]:
names = ['Ahmed', 'Nassim', 'Idris']
names_ser = pd.Series(names, index = ['name1', 'name2', 'name3'])
names_ser

name1     Ahmed
name2    Nassim
name3     Idris
dtype: object

In [25]:
names_ser.dtype

dtype('O')

The object data type is how pandas represents strings here; it is also the dtype used for arbitrary Python objects.

In [26]:
# Creating the Series
s = pd.Series([100, 500, 750, 900]) 
# Print the series
print(s)

0    100
1    500
2    750
3    900
dtype: int64


In [27]:
s.dtype

dtype('int64')

> **Question: Can a Series hold more than one data type at once?**

In [28]:
hyb = [1, 'a', True, 2.3, sum]

In [29]:
hyb_ser = pd.Series(hyb)
hyb_ser

0                          1
1                          a
2                       True
3                        2.3
4    <built-in function sum>
dtype: object

In [30]:
hyb_ser.dtype

dtype('O')
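
So the answer appears to be yes, with a caveat: the Series as a whole reports a single dtype, `object`, while each element keeps its own Python type. A minimal sketch to check this (`mixed` and `element_types` are illustrative names):

```python
import pandas as pd

mixed = pd.Series([1, 'a', True, 2.3])

# Iterating over an object Series yields the original Python objects
element_types = [type(item).__name__ for item in mixed]

print(mixed.dtype)       # object
print(element_types)     # ['int', 'str', 'bool', 'float']
```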

In [31]:
## Another Example:
pd.Series(data = [2., 3, 4, 5], index = ['b', 'e', 'f', 'w'])

b    2.0
e    3.0
f    4.0
w    5.0
dtype: float64

#### Creating a Series with a specific data type

In [32]:
pd.Series(data = [2.1, 3.4, 4.7, 5.8], index = ['b', 'e', 'f', 'w'], dtype = 'float32')

b    2.1
e    3.4
f    4.7
w    5.8
dtype: float32

> If we give a data type that is not compatible with the data, pandas will raise an error

In [33]:
pd.Series(data = [2.1, 3.4, 4.7, 5.8], index = ['b', 'e', 'f', 'w'], dtype = 'int32')

ValueError: Trying to coerce float values to integers

> Coercion to strings, by contrast, works; note that the resulting dtype is displayed as object

In [34]:
pd.Series(data = [2.1, 3.4, 4.7, 5.8], index = ['b', 'e', 'f', 'w'], dtype = 'str')

b    2.1
e    3.4
f    4.7
w    5.8
dtype: object
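
If dropping the fractional part is actually intended, one option is to build the Series as floats first and convert explicitly with `astype`. This is a sketch of that approach, not something the cells above do; `floats` and `as_ints` are illustrative names.

```python
import pandas as pd

floats = pd.Series([2.1, 3.4, 4.7, 5.8], index=['b', 'e', 'f', 'w'])

# astype performs the cast explicitly; fractional parts are truncated
as_ints = floats.astype('int32')

print(as_ints)
```

The conversion truncates rather than rounds, so 4.7 becomes 4 and 5.8 becomes 5.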

#### A fun example: a Series of function objects

In [35]:
func_ser = pd.Series(data = [sum, min, max, sorted], index = ['func1', 'func2', 'func3', 'func4'])
func_ser

func1       <built-in function sum>
func2       <built-in function min>
func3       <built-in function max>
func4    <built-in function sorted>
dtype: object

## Naming a Series

  - The Series object and its index can have a __name__ attribute
  
#### 1. Naming the Series Object

   We can name a series object using the __name__ attribute

In [36]:
cities = pd.Series([42, 120, 9], index = ['Algeria', 'Egypt', 'Tunisia'])
cities.name = 'Population'
cities

Algeria     42
Egypt      120
Tunisia      9
Name: Population, dtype: int64

In [37]:
cities.name

'Population'

#### 2. Naming the index

 The index is named the same way as the Series object, through its own __name__ attribute

In [38]:
cities.index.name = 'Countries'
cities

Countries
Algeria     42
Egypt      120
Tunisia      9
Name: Population, dtype: int64
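
Both names can also be supplied when the Series is created. The sketch below assumes the same data as above and uses the `name` parameter of `pd.Series` and of `pd.Index`; the variable `population` is illustrative.

```python
import pandas as pd

countries = pd.Index(['Algeria', 'Egypt', 'Tunisia'], name='Countries')
population = pd.Series([42, 120, 9], index=countries, name='Population')

print(population.name)         # Population
print(population.index.name)   # Countries
```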

## Series Reindexing

   We can reindex a __Series__ object whenever we need to. For example, when a dictionary is passed without an index, its keys become the index automatically, but we can override that by providing our own index.

In [39]:
p_dict = {'Anas': 24, 'Walid': 28, 'Youcef': 21, 'Malik': 19}
people = pd.Series(p_dict)
people

Anas      24
Walid     28
Youcef    21
Malik     19
dtype: int64

In [40]:
names = ['Ahmed', 'Anas', 'Walid', 'Youcef', 'Malik']
people = pd.Series(p_dict, index = names)
people

Ahmed      NaN
Anas      24.0
Walid     28.0
Youcef    21.0
Malik     19.0
dtype: float64

> We see a new value associated with 'Ahmed': __NaN__ (**Not a Number**). The data points were placed at their matching labels, but __Ahmed__ has no value in the dictionary, so it is replaced with __NaN__, which is how pandas represents a __missing value__. Because NaN is a floating-point value, the dtype also changes from int64 to float64.
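
Missing values can be located or dropped with `isna` and `dropna`. The following is a small sketch on a reduced version of the data (`ages` and `missing_mask` are illustrative names):

```python
import pandas as pd

ages = pd.Series({'Anas': 24, 'Walid': 28}, index=['Ahmed', 'Anas', 'Walid'])

missing_mask = ages.isna()     # True where the value is NaN
print(missing_mask)
print(ages.dropna())           # a new Series without the missing entry
```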

**Another nice feature is that the index can be altered in place by assignment, as shown in the example below.**


In [41]:
people.index = ['Andro', 'Thomas', 'Peter', 'Donald', 'Harvey']
people

Andro      NaN
Thomas    24.0
Peter     28.0
Donald    21.0
Harvey    19.0
dtype: float64

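### <u>Note:</u> 

  > __reindex__ is a method, and it does not rename labels: it matches the requested labels against the existing index and returns a new Series. Here it returns the same result only because __people__ already carries these labels.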

In [42]:
people.reindex(['Andro', 'Thomas', 'Peter', 'Donald', 'Harvey'])

Andro      NaN
Thomas    24.0
Peter     28.0
Donald    21.0
Harvey    19.0
dtype: float64
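
To make the difference between assigning to `index` and calling `reindex` concrete, here is a minimal sketch on a two-element Series (`ages`, `relabelled` and `realigned` are illustrative names):

```python
import pandas as pd

ages = pd.Series({'Anas': 24, 'Walid': 28})

# Assigning to .index keeps the data and replaces the labels positionally
relabelled = ages.copy()
relabelled.index = ['Thomas', 'Peter']

# reindex looks each requested label up in the existing index;
# labels that are not found get NaN
realigned = ages.reindex(['Thomas', 'Anas'])

print(relabelled)
print(realigned)
```

In `relabelled` both values survive under new names, whereas `realigned` contains NaN for 'Thomas' and keeps 24 for 'Anas'.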

> We still have a NaN; we can assign a new value to it as follows

In [43]:
people['Andro'] = 26
people

Andro     26.0
Thomas    24.0
Peter     28.0
Donald    21.0
Harvey    19.0
dtype: float64

> We can change existing values as well by assigning a new value to the associated index label

In [44]:
people['Peter'] = 33
people

Andro     26.0
Thomas    24.0
Peter     33.0
Donald    21.0
Harvey    19.0
dtype: float64

### Renaming an Index:

   One way to change an index label is to use the __rename__ method, as shown below

In [45]:
people.rename(index={'Peter':'Frank'},inplace=True)
people

Andro     26.0
Thomas    24.0
Frank     33.0
Donald    21.0
Harvey    19.0
dtype: float64
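
Without `inplace=True`, `rename` leaves the original Series untouched and returns a renamed copy. A short sketch under that assumption (`ages` and `renamed` are illustrative names):

```python
import pandas as pd

ages = pd.Series({'Peter': 33, 'Donald': 21})

renamed = ages.rename(index={'Peter': 'Frank'})   # returns a new Series

print(list(renamed.index))   # ['Frank', 'Donald']
print(list(ages.index))      # ['Peter', 'Donald']
```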

## Series Indexing (Lookup)

   Subsetting a __Series__ is similar to a lookup in a Python dictionary: we can use the values in the index to select a single value or a set of values.
   
   - It suffices to put the index label in square brackets, e.g. ['A']
 

In [46]:
## Select the value with index A
ser_4['A']

14

Selecting a set of values requires double square brackets, e.g. [['A', 'B']]

In [47]:
ser_4[['A', 'C', 'E']]

A    14
C    28
E    10
dtype: int64
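
Besides plain square brackets, pandas offers the explicit accessors `loc` (label-based) and `iloc` (position-based). They are not used in the cells above; the sketch below rebuilds the same values under the illustrative name `letters`.

```python
import pandas as pd

letters = pd.Series({'A': 14, 'B': 27, 'C': 28, 'D': 29, 'E': 10})

by_label = letters.loc['B']           # lookup by index label
by_position = letters.iloc[1]         # lookup by integer position

label_slice = letters.loc['B':'D']    # label slices include both endpoints
position_slice = letters.iloc[1:3]    # position slices exclude the stop

print(by_label, by_position)          # 27 27
print(list(label_slice.index))        # ['B', 'C', 'D']
print(list(position_slice.index))     # ['B', 'C']
```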

### <u>Note:</u>

> A Series can be thought of as a fixed-length, ordered dictionary, as it is a mapping of index values to data values. This lets us check whether a label exists in the index using the __in__ operator

In [48]:
'G' in ser_4

False

In [49]:
'D' in ser_4

True
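
As with a dictionary, `in` tests the labels, not the data. To test for a value, one possibility is to check against `.values`. A small sketch (`letters` is an illustrative name):

```python
import pandas as pd

letters = pd.Series({'A': 14, 'B': 27})

label_found = 'A' in letters            # True: 'A' is a label
value_as_label = 14 in letters          # False: 14 is data, not a label
value_found = 14 in letters.values      # True: search the data array instead

print(label_found, value_as_label, value_found)
```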

## Operations with Series

   - NumPy array operations, such as filtering with a boolean array, scalar multiplication or applying math functions, preserve the __index-value link__. 
   
   
   - When two Series are combined and a label is present in only one of them, a __NaN__ is generated for that label.
   

In [50]:
products_1 = pd.Series({'Coffee': 1200, 'Sugar': 2300, 'Tea': 12000})
products_1

Coffee     1200
Sugar      2300
Tea       12000
dtype: int64

In [51]:
products_2 = pd.Series({'tomato': 970,'Coffee': 1200, 'Sugar': 2300, 'Tea': 12000})
products_2

tomato      970
Coffee     1200
Sugar      2300
Tea       12000
dtype: int64

> #### Addition

In [52]:
products_1 + products_2

Coffee     2400.0
Sugar      4600.0
Tea       24000.0
tomato        NaN
dtype: float64
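
If a label missing from one Series should count as zero rather than produce NaN, the `add` method with `fill_value` is one way to do it. The sketch below repeats the two Series under the illustrative names `stock_a` and `stock_b`.

```python
import pandas as pd

stock_a = pd.Series({'Coffee': 1200, 'Sugar': 2300, 'Tea': 12000})
stock_b = pd.Series({'tomato': 970, 'Coffee': 1200, 'Sugar': 2300, 'Tea': 12000})

# A label present in only one Series is treated as 0 in the other
total = stock_a.add(stock_b, fill_value=0)

print(total)
```

With this call 'tomato' ends up as 970.0 instead of NaN, while the other totals are unchanged.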

> #### Multiplication

In [53]:
products_1 * 5

Coffee     6000
Sugar     11500
Tea       60000
dtype: int64

> #### Boolean Selection

In [54]:
products_2[products_2 > 1000]

Coffee     1200
Sugar      2300
Tea       12000
dtype: int64
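
Several conditions can be combined with `&` (and) or `|` (or); each condition needs its own parentheses because of operator precedence. A minimal sketch, with `stock` and `mid_range` as illustrative names:

```python
import pandas as pd

stock = pd.Series({'tomato': 970, 'Coffee': 1200, 'Sugar': 2300, 'Tea': 12000})

# Keep only the entries strictly between 1000 and 5000
mid_range = stock[(stock > 1000) & (stock < 5000)]

print(mid_range)
```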