
___

<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright Pierian Data</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Series
The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

## Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [2]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

# 1. labels = ['a', 'b', 'c']
This line creates a list called labels, containing the strings 'a', 'b', and 'c'.

    labels = ['a', 'b', 'c']

List: A list is a mutable, ordered sequence of elements in Python. It can contain various data types (e.g., strings, integers, etc.).

In this case, the list contains three strings: 'a', 'b', and 'c'.

--------------------------
# 2. my_list = [10, 20, 30]
This line creates another list called my_list, containing the integers 10, 20, and 30.

    my_list = [10, 20, 30]

This is another Python list, but this time it holds integers instead of strings.

The list contains the numbers 10, 20, and 30 in sequence.

---------------------------------
# 3. arr = np.array([10, 20, 30])
This line creates a NumPy array called arr from the list [10, 20, 30].

    import numpy as np
    arr = np.array([10, 20, 30])

NumPy array:

A NumPy array is a powerful data structure provided by the NumPy library. It is more efficient for numerical computations than a Python list because it stores elements of a single type in a contiguous block of memory and applies element-wise and matrix operations in compiled code.

The array contains the same values as my_list, but it is now a NumPy array instead of a regular Python list. This allows for more efficient mathematical operations.
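
As a small sketch of what "element-wise" means in practice, the snippet below applies `*` to both structures. The names `doubled_list` and `doubled_arr` are illustrative and not part of the original notebook.

```python
import numpy as np

my_list = [10, 20, 30]
arr = np.array(my_list)

# On a list, * repeats the sequence rather than multiplying each element
doubled_list = my_list * 2
print(doubled_list)  # [10, 20, 30, 10, 20, 30]

# On a NumPy array, * is applied to every element
doubled_arr = arr * 2
print(doubled_arr)  # [20 40 60]
```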

------------------------------
# 4. d = {'a': 10, 'b': 20, 'c': 30}
This line creates a dictionary called d, with key-value pairs.

    d = {'a': 10, 'b': 20, 'c': 30}

Dictionary:

A dictionary is a collection of key-value pairs in Python, indexed by unique keys. Since Python 3.7 it preserves insertion order. You can access values using the keys.

In this case, the dictionary d contains three key-value pairs:

'a' maps to 10.

'b' maps to 20.

'c' maps to 30.
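
A minimal sketch of how such a dictionary is typically read; the variable names `value_b`, `missing`, and `keys` are illustrative only.

```python
d = {'a': 10, 'b': 20, 'c': 30}

# Look up a value by its key
value_b = d['b']
print(value_b)  # 20

# .get() returns a default instead of raising KeyError for a missing key
missing = d.get('z', 0)
print(missing)  # 0

# Keys come back in insertion order (guaranteed since Python 3.7)
keys = list(d)
print(keys)  # ['a', 'b', 'c']
```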

--------------------
# Summary of the Code
labels is a list of string labels: ['a', 'b', 'c'].

my_list is a list of integers: [10, 20, 30].

arr is a NumPy array holding the same integers as my_list: [10, 20, 30].

d is a dictionary where the labels 'a', 'b', and 'c' are keys, and their corresponding values are 10, 20, and 30.

Each of these data structures serves a different purpose and is used in different contexts in Python:

Lists are useful for general-purpose collections.

NumPy arrays are great for numerical and scientific computing.

Dictionaries are used for mapping or associating data via keys and values.

# Lists vs. Series vs. Arrays
## 1. Python Lists

### Basic Structure:

A Python list is a built-in data structure that can hold multiple elements. It is ordered and mutable (elements can be changed).

### Data Types:

Lists can contain elements of different data types, such as integers, strings, floats, or even other lists.

### Indexing:

Lists are indexed using integers, starting from 0.

### Operations:

Lists support basic operations like appending, slicing, and removing elements, but they do not support element-wise arithmetic the way NumPy arrays do; you need a loop or a comprehension instead.

Example:

    my_list = [10, "a", 3.14, [1, 2, 3]]
    print(my_list[0])  # Output: 10
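
The basic list operations mentioned above can be sketched as follows; `first_two` is an illustrative name.

```python
my_list = [10, "a", 3.14, [1, 2, 3]]

my_list.append("new")      # add an element at the end
first_two = my_list[:2]    # slicing returns a new list
my_list.remove("a")        # remove the first matching element

print(first_two)  # [10, 'a']
print(my_list)    # [10, 3.14, [1, 2, 3], 'new']
```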

----------------------------
## 2. NumPy Arrays
### Basic Structure:

A NumPy array is a more efficient structure provided by the NumPy library, designed specifically for numerical computations.

### Data Types:

Unlike lists, a NumPy array is homogeneous: all of its elements share a single data type (e.g., all integers or all floats).
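
One way to see this single-type rule is to mix an integer and a float in the input: NumPy converts everything to a common type. A short sketch, with the illustrative names `ints` and `mixed`:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float among the integers

# 'i' marks an integer dtype; the exact width can depend on the platform
print(ints.dtype.kind)   # i

# The integers were upcast so that every element is a float
print(mixed.dtype)       # float64
```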

### Indexing:

Like lists, NumPy arrays use integer-based indexing starting at 0, but they can be multi-dimensional (1D, 2D, etc.).

### Operations:

NumPy arrays are highly optimized for mathematical operations.

You can perform element-wise operations (e.g., addition, multiplication) across the entire array easily.

Example:

    import numpy as np
    arr = np.array([10, 20, 30])
    print(arr * 2)  # Output: [20 40 60]
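
Multi-dimensional indexing, mentioned above, could look like this for a small 2D array; `matrix` is an illustrative name.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)   # (2, 3)
print(matrix[1, 2])   # 6  (row 1, column 2)
print(matrix[:, 0])   # [1 4]  (first column)
```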

------------------------
## 3. Pandas Series
### Basic Structure:

A Pandas Series is a one-dimensional labeled array provided by the Pandas library.

It combines the functionality of both a list and a NumPy array, with labels (index) and efficient numerical operations.

### Data Types:

A Pandas Series can hold any data type, like lists, but it is often used for numerical data. The key feature is that it has an index (which can be custom labels or default integers).

### Indexing:

You can access elements by label (`series['a']` or `series.loc['a']`) or by integer position (`series.iloc[0]`).
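
A brief sketch of the two access styles, using the explicit `.loc` (label) and `.iloc` (position) accessors; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = series.loc['b']     # label-based lookup
by_position = series.iloc[1]   # position-based lookup (0-indexed)

print(by_label, by_position)   # 20 20
```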

### Operations:

Like NumPy arrays, you can perform element-wise operations on a Series, and it automatically aligns data by the index for operations like addition.

Example:

    import pandas as pd
    series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
    print(series['a'])  # Output: 10
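
Index alignment can be sketched with two Series that share labels but store them in a different order. The names `s1`, `s2`, and `total` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'b', 'a'])  # same labels, different order

total = s1 + s2   # values are matched by label, not by position

print(total['a'])  # 31
print(total['c'])  # 13
```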

----------------------------
# Key Differences

| Feature | Python List | NumPy Array | Pandas Series |
|---|---|---|---|
| Structure | General-purpose collection | Efficient, homogeneous array | Labeled 1D array |
| Data Types | Mixed data types allowed | Homogeneous (single type) | Mixed data types allowed |
| Indexing | Integer-based (0, 1, 2, ...) | Integer-based (0, 1, 2, ...) | Custom index or integer-based |
| Operations | Not efficient for math | Efficient for math (element-wise) | Efficient and label-aware |
| Mutability | Mutable | Mutable | Mutable |
| Performance | Slower for large data | Fast for numerical computations | Good for labeled data |
| Main Use | General-purpose lists | Fast mathematical calculations | Labeled data, especially in data analysis |

----------------------
# Summary
Python lists are general-purpose, can hold different data types, and are useful for basic collection needs.

NumPy arrays are specialized for efficient numerical computations, especially with large datasets. They are faster than lists for operations like addition, multiplication, etc.

Pandas Series are like 1D labeled arrays. They combine the benefits of lists and NumPy arrays, providing labeled data structures with efficient math operations and automatic alignment by index.

Each structure serves different needs, so choosing the right one depends on your use case.

In [6]:
arr = np.array([10, 20, 30])
arr

array([10, 20, 30])

### Using Lists

In [7]:
pd.Series(data=my_list)

0    10
1    20
2    30


In [3]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30


In [5]:
pd.Series(my_list,labels)

a    10
b    20
c    30
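
The positional form `pd.Series(my_list, labels)` passes the data first and the index second, and the two must have the same length. A small sketch of what happens otherwise; `lengths_ok` is an illustrative name.

```python
import pandas as pd

my_list = [10, 20, 30]

try:
    pd.Series(my_list, ['a', 'b'])   # 3 values but only 2 labels
    lengths_ok = True
except ValueError:
    # pandas refuses to build a Series whose index length differs from the data
    lengths_ok = False

print(lengths_ok)  # False
```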


### Using NumPy Arrays

In [8]:
pd.Series(arr)

0    10
1    20
2    30


In [9]:
pd.Series(arr,labels)

a    10
b    20
c    30


### Using Dictionaries

In [10]:
pd.Series(d)

a    10
b    20
c    30
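
When a dictionary is combined with an explicit `index`, pandas selects and orders the values by those labels, and a label that is not a key becomes NaN. A sketch, assuming a hypothetical label `'z'` that is absent from `d`:

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# 'z' is not a key in d, so its value is missing (NaN)
ser = pd.Series(d, index=['c', 'a', 'z'])

print(list(ser.index))     # ['c', 'a', 'z']
print(ser['c'])            # 30.0 (the NaN forces a float dtype)
print(pd.isna(ser['z']))   # True
```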


### Data in a Series

A pandas Series can hold a variety of object types:

In [11]:
pd.Series(data=labels)

0    a
1    b
2    c


In [12]:
# Even functions (although unlikely that you will use this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
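
A Series of arbitrary Python objects is stored with the generic `object` dtype, and each element remains an ordinary Python object. A short sketch; `funcs` and `length` are illustrative names.

```python
import pandas as pd

funcs = pd.Series([sum, print, len])

print(funcs.dtype)   # object

# The stored element is still the built-in function and can be called
length = funcs.iloc[2]([1, 2, 3])
print(length)        # 3
```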


## Using an Index

The key to using a Series is understanding its index. Pandas uses these index labels (names or numbers) for fast lookups, much like a hash table or dictionary.
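
The dictionary analogy extends to membership tests: `in` checks the index labels, not the values. A minimal sketch with an illustrative Series named `sales`:

```python
import pandas as pd

sales = pd.Series([250, 450], index=['USA', 'China'])

print(list(sales.index))   # ['USA', 'China']
print('USA' in sales)      # True  (membership tests the index labels)
print(250 in sales)        # False (values are not searched)
```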

Let's see some examples of how to grab information from a Series. Let us create two Series, sales_Q1 and sales_Q2:

In [13]:
sales_Q1 = pd.Series(data=[250,450,200,150],index = ['USA', 'China','India', 'Brazil'])

In [14]:
sales_Q1

USA       250
China     450
India     200
Brazil    150


In [15]:
sales_Q2 = pd.Series([260,500,210,100],index = ['USA', 'China','India', 'Japan'])

In [17]:
sales_Q2

USA      260
China    500
India    210
Japan    100


In [16]:
sales_Q1['USA']

250

In [18]:
# KEY ERROR!
# sales_Q1['Russia'] # wrong name!
# sales_Q1['USA '] # wrong string spacing!
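
If a label might be missing, one option is to catch the `KeyError`, and another is `Series.get`, which returns a default instead. A sketch, where `found` and `fallback` are illustrative names:

```python
import pandas as pd

sales_Q1 = pd.Series([250, 450, 200, 150],
                     index=['USA', 'China', 'India', 'Brazil'])

# Option 1: catch the KeyError raised for an unknown label
try:
    sales_Q1['Russia']
    found = True
except KeyError:
    found = False
print(found)  # False

# Option 2: .get() returns the given default when the label is absent
fallback = sales_Q1.get('Russia', 0)
print(fallback)  # 0
```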

Operations between Series are also aligned on the index; a label that appears in only one Series yields a missing value (NaN) in the result:

In [19]:
# We'll explore how to deal with this later on!
sales_Q1 + sales_Q2

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
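
One possible way to avoid the missing values, shown here only as a preview, is `Series.add` with `fill_value=0`, which treats a label missing from one side as zero. Whether zero is the right substitute depends on the data.

```python
import pandas as pd

sales_Q1 = pd.Series([250, 450, 200, 150],
                     index=['USA', 'China', 'India', 'Brazil'])
sales_Q2 = pd.Series([260, 500, 210, 100],
                     index=['USA', 'China', 'India', 'Japan'])

# Labels present in only one Series are paired with 0 instead of NaN
total = sales_Q1.add(sales_Q2, fill_value=0)

print(total['Brazil'])  # 150.0
print(total['USA'])     # 510.0
```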


Let's stop here for now and move on to DataFrames, which will expand on the concept of Series!
# Great Job!