# Series

In this section we will tackle Pandas Series, the Pandas equivalent of a column of data, and cover their basic properties, creation, manipulation, and useful functions for analysis.
Series are Pandas data structures built on top of NumPy arrays.
* Series also contain an Index and an optional name, in addition to the array of data.
* They can be created from other data types, but are usually imported from external sources.
* Two or more Series grouped together form a Pandas DataFrame.
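
As a minimal sketch of the points above, the snippet below builds a Series from a NumPy array and from a dict, then combines two Series into a DataFrame. The variable names and values (`prices`, `stock`, `inventory`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series built from a NumPy array gets a default 0..n-1 index.
prices = pd.Series(np.array([2.5, 0.4, 1.8]), name='price')

# A Series built from a dict uses the dict keys as its index.
stock = pd.Series({'coffee': 12, 'bananas': 40, 'tea': 7}, name='stock')

# Give both Series the same index, then combine them into a DataFrame;
# each Series becomes one column, named after the Series.
prices.index = stock.index
inventory = pd.concat([prices, stock], axis=1)
print(inventory)
```

Using `pd.concat` with `axis=1` is only one way to group Series into a DataFrame; it is chosen here because it keeps each Series name as the column name.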


In [1]:
import numpy as np
import pandas as pd  # pd is the standard alias for the Pandas library.

In [2]:
# pd.Series() converts Python lists and NumPy arrays into Pandas Series.
# The name argument lets you give the Series a name.

sales = [0,5,155,0,518,0,1827,616,317,325]
sales_series = pd.Series(sales, name='Sales')
sales_series

# pd.Series() stores the data together with an index that, by default, counts from 0 up to the last item.
# The index can be replaced with custom labels.

0       0
1       5
2     155
3       0
4     518
5       0
6    1827
7     616
8     317
9     325
Name: Sales, dtype: int64

Pandas Series have these key properties:
* values - the data array in the series.
* index - the index array in the series.
* name - the optional name for the series (useful for accessing columns in a DataFrame).
* dtype - the data type of the elements in the value array.

In [3]:
sales_series.values

array([   0,    5,  155,    0,  518,    0, 1827,  616,  317,  325])

In [4]:
sales_series.index

RangeIndex(start=0, stop=10, step=1)

In [5]:
sales_series.name

'Sales'

In [6]:
sales_series.dtype

dtype('int64')

In [8]:
series = sales_series
series.mean()

376.3

In [9]:
series.values.mean()

376.3

In [10]:
series.index

RangeIndex(start=0, stop=10, step=1)

In [13]:
series.index = [0,10,20,30,40,50,60,70,80,90]
series

0        0
10       5
20     155
30       0
40     518
50       0
60    1827
70     616
80     317
90     325
Name: Sales, dtype: int64
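
One consequence of replacing the index, sketched here on a small made-up Series (`demo` is a hypothetical name): once the index holds integers, a square-bracket lookup with an integer uses those labels, not positions.

```python
import pandas as pd

demo = pd.Series([0, 5, 155], index=[0, 10, 20], name='Sales')

# 10 is a label in the index, so this returns the second value.
second = demo[10]

# 1 is not a label in the index, so the lookup fails with KeyError.
try:
    demo[1]
    missing_label = False
except KeyError:
    missing_label = True

print(second, missing_label)
```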

In [14]:
series.name = 'special series'
series

0        0
10       5
20     155
30       0
40     518
50       0
60    1827
70     616
80     317
90     325
Name: special series, dtype: int64
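
Note that `series = sales_series` does not copy the data: both names refer to the same Series object, so the index and name changes above also apply to `sales_series`. The sketch below shows the difference, using `.copy()` to get an independent Series (the names `original`, `alias`, and `independent` are illustrative).

```python
import pandas as pd

original = pd.Series([0, 5, 155], name='Sales')

alias = original               # same object under a second name
independent = original.copy()  # a separate Series with its own name and index

alias.name = 'special series'
independent.name = 'copy'

print(original.name)     # changed through the alias
print(independent.name)  # unaffected by the change above
```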

# Pandas Data Types

Pandas data types mostly expand on their base Python and NumPy equivalents.
Numeric Data Types:

NumPy Data types
* bool - boolean True/False
* int64 - Whole numbers
* float64 - Decimal numbers

Pandas Data types
* boolean - Nullable boolean True/False
* Int64 - Nullable whole numbers
* Float64 - Nullable decimal numbers

Object/Text data types:

NumPy
* object - Any Python object

Pandas
* string - Only contains strings or text
* category - Maps categorical data to a numeric array for efficiency.

Time Series:

NumPy
* datetime64 - A single moment in time
* timedelta64 - The duration between two dates and times

Pandas
* period - A span of time
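
The snippet below is a small sketch of how some of these data types behave when a value is missing. It assumes a reasonably recent pandas release in which the nullable integer dtype is requested by the name `'Int64'` (capital I); the sample values are invented.

```python
import pandas as pd

# With the default NumPy-backed dtype, a missing value forces integers to float64.
numpy_backed = pd.Series([1, None, 3])

# The nullable pandas dtype keeps whole numbers and stores the gap as <NA>.
nullable = pd.Series([1, None, 3], dtype='Int64')

# category stores each distinct label once and keeps an integer code per row.
sizes = pd.Series(['small', 'large', 'small'], dtype='category')

print(numpy_backed.dtype, nullable.dtype, sizes.cat.codes.tolist())
```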

# Type Conversion

You can convert the data type of a Pandas Series by calling the **.astype()** method with the desired data type, provided the values are compatible with it.

In [18]:
series.astype('float')  # Casts the int64 data type to float64.

0        0.0
10       5.0
20     155.0
30       0.0
40     518.0
50       0.0
60    1827.0
70     616.0
80     317.0
90     325.0
Name: special series, dtype: float64

In [19]:
series.astype('bool')  # Casts the int64 data type to boolean: 0 becomes False, any other value True.

0     False
10     True
20     True
30    False
40     True
50    False
60     True
70     True
80     True
90     True
Name: special series, dtype: bool

In [20]:
series.astype('datetime64')  # Raises an error: the 'datetime64' dtype was given without a unit (pandas asks for 'datetime64[ns]').

ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.
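
To illustrate what "if compatible" means, here is a hedged sketch with invented data. Casting text that does not look like a number raises an error, whereas `pd.to_numeric` with `errors='coerce'` turns the bad entry into a missing value. For dates, `pd.to_datetime` with an explicit `unit` is one way to say how integers should be interpreted; treating them as seconds since 1970-01-01 is an assumption made for this example.

```python
import pandas as pd

raw = pd.Series(['1', '2', 'x'])

# 'x' cannot be parsed as an integer, so astype raises ValueError.
try:
    raw.astype('int64')
    cast_failed = False
except ValueError:
    cast_failed = True

# errors='coerce' replaces unparseable entries with NaN instead of raising.
cleaned = pd.to_numeric(raw, errors='coerce')

# Interpret integers as seconds since the Unix epoch (an assumption for this example).
seconds = pd.Series([0, 86400])
timestamps = pd.to_datetime(seconds, unit='s')

print(cast_failed)
print(cleaned)
print(timestamps)
```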

# The Series Index
The index lets you easily access rows in a Pandas Series or DataFrame.  
You can slice and index Series like other sequence data types, but there is a more efficient method to be learnt later.

In [21]:
sales = [0,5,155,0,518]
sales_series = pd.Series(sales, name='Sales')
sales_series

0      0
1      5
2    155
3      0
4    518
Name: Sales, dtype: int64

In [22]:
sales_series[2]

155

In [23]:
sales_series[2:4]  # The stop position (4) is excluded.

2    155
3      0
Name: Sales, dtype: int64

In some cases it is more useful to access rows through a custom index.

In [24]:
sales

[0, 5, 155, 0, 518]

In [26]:
items = ['coffee', 'bananas', 'tea', 'coconut', 'sugar']
sales_series = pd.Series(sales, index=items, name='sales')
sales_series

coffee       0
bananas      5
tea        155
coconut      0
sugar      518
Name: sales, dtype: int64

To use a list as a custom index, the list must have the same length as the Series data.
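
A quick sketch of what happens when the lengths differ, using a deliberately short list of labels (the names `values`, `labels`, and `length_error` are illustrative):

```python
import pandas as pd

values = [0, 5, 155]
labels = ['coffee', 'bananas']  # one label short on purpose

# pandas refuses to build the Series and raises ValueError.
try:
    pd.Series(values, index=labels)
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error)
```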

In [28]:
sales_series['tea']

155

In [31]:
sales_series['bananas':'coconut']  # When slicing with custom labels, the stop label is included,
    # unlike slicing by position, where the stop is excluded.

bananas      5
tea        155
coconut      0
Name: sales, dtype: int64
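
The difference between the two slicing styles can be seen side by side in this sketch. It rebuilds the Series from the cells above and uses the `.iloc` accessor for the positional slice; `.iloc` is not used elsewhere in this section and is brought in here only to make the positional intent explicit.

```python
import pandas as pd

sales_series = pd.Series(
    [0, 5, 155, 0, 518],
    index=['coffee', 'bananas', 'tea', 'coconut', 'sugar'],
    name='sales',
)

by_label = sales_series['bananas':'coconut']  # stop label 'coconut' is included
by_position = sales_series.iloc[1:3]          # stop position 3 is excluded

print(list(by_label.index))
print(list(by_position.index))
```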