



# Python Lists


# Data Dimensions and Python Lists

Before discussing the details of NumPy arrays, let us briefly think about the dimensions of data. Typically, we represent and store data as tables. Consider the following table, which stores the sales amounts of three products: A, B, and C.

| Product |	Sales|
| --- | --- |
| A |	10|
| B |	20|
| C |	30|


Let us assume that we want to perform some operations on the sales column, such as finding the average sales. In Python, we can represent the sales column as a one-dimensional list. A list is created by placing its elements inside square brackets [ ], separated by commas. A list can have any number of elements, and they may be of different types (integer, float, string, etc.).


A list named Sales that holds the sales of the products would be written as follows:

In [0]:
Sales=[10, 20, 30]
print (Sales)
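
As a quick illustration of the operation mentioned above, the average can be computed with Python's built-in sum and len functions. This is a minimal sketch; the variable name average_sales is chosen here for illustration only.

```python
# Sales of products A, B, and C stored as a one-dimensional list
Sales = [10, 20, 30]

# Average = total of all elements divided by the number of elements
average_sales = sum(Sales) / len(Sales)
print(average_sales)  # 20.0
```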

But, what happens to the product column? Do we need the product column? Although we can store the products as a list of strings, the products are implicitly represented by the index of the Sales list. We can consider the index of the list as the row number of the table. As long as we know that index 0 represents Product A, index 1 represents Product B, and index 2 Product C, we do not need to store the products explicitly.

In [0]:
print(Sales[0])

In this case, our data is one-dimensional because we can map the products to the sales values using a single index, i.e., 0→A, 1→B, 2→C.

| 0 |	1| 2|
| --- | --- |---|
| 10 |	20| 30 |
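
If we prefer to keep the product names, one option is a second list that shares the same index as Sales. The sketch below assumes a hypothetical Products list (it is not part of the original table code) and shows that the index is what ties the two lists together.

```python
# Hypothetical list of product labels; position i matches Sales[i]
Products = ["A", "B", "C"]
Sales = [10, 20, 30]

# enumerate yields (index, label) pairs, so we can print the table row by row
for index, product in enumerate(Products):
    print(index, product, Sales[index])

# Look up the sales of Product B through its index
sales_of_b = Sales[Products.index("B")]
print(sales_of_b)  # 20
```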



Now let us consider the following table which breaks down the sales of the products over two regions, East and West. 

| Product | Region | Sales |
|---------|--------|-------|
| A       | East   | 5     |
| A       | West   | 5     |
| B       | East   | 8     |
| B       | West   | 12    |
| C       | East   | 20    |
| C       | West   | 10    |

Can we represent the data using a one-dimensional Python list? Certainly, we can use a simple list as follows:




In [0]:
Sales=[5, 5, 8, 12, 20, 10]

However, a one-dimensional list is not an effective way of storing the data in this case. Suppose that we want to find the total sales in Region East. To do so, we need to know which indexes correspond to Region East. In this small table, this is not a problem, but consider a large table with an arbitrary order of rows.
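
To make the difficulty concrete, the sketch below totals the Region East sales from the flat list. The east_indexes list is an assumption introduced for illustration: it has to be written by hand from the table, which is exactly the bookkeeping that becomes error-prone for large tables.

```python
# Flat list in table order: A-East, A-West, B-East, B-West, C-East, C-West
Sales = [5, 5, 8, 12, 20, 10]

# Positions of the East rows, worked out manually from the table
east_indexes = [0, 2, 4]

east_total = sum(Sales[i] for i in east_indexes)
print(east_total)  # 33
```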

We have a better option to organize the data. We can use a two-dimensional matrix where the rows represent products, and the columns the regions as follows.   

| Product | East | West |
|---------|------|------|
| A       | 5    | 5    |
| B       | 8    | 12   |
| C       | 20   | 10   |

To represent the matrix above, we use a two-dimensional array (or a nested list in Python) as follows:

In [0]:
Sales=[[5, 5],[8,12],[20, 10]]

Let us try the following statement:

In [0]:
Sales=[[5, 5],[8,12],[20, 10]]
print(Sales[0])

The output will be [5, 5]. The output may look a bit strange, but it is in fact very useful. In this case, Sales[0] stores the list [5, 5], not a single value. In other words, Sales[0] refers to the first row of the matrix.

What if we wish to access an individual sales value, for example, the sales of Product B in Region East? The index of Product B's row is 1. Hence, Sales[1] corresponds to the sales of Product B over the regions. Since the **index of the column** for Region East is 0, we can access the Product B-Region East sales as:

In [0]:
print(Sales[1][0])
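
With the nested list, totals by region or by product no longer require a hand-written list of positions. The following is a minimal sketch; the names EAST and WEST are illustrative constants for the column indexes, not something defined elsewhere in this notebook.

```python
# Rows are products A, B, C; columns are regions East, West
Sales = [[5, 5], [8, 12], [20, 10]]

EAST, WEST = 0, 1  # illustrative names for the column indexes

# Total for a region: take the same column from every row
east_total = sum(row[EAST] for row in Sales)
print(east_total)  # 33

# Total for a product: sum one row (Product B is row 1)
product_b_total = sum(Sales[1])
print(product_b_total)  # 20
```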

Now let us add another dimension to the sales data by considering seasons (the Season column).

| Product | Region | Season | Sales |
|---------|--------|--------|-------|
| A       | East   | Fall   | 2     |
| A       | East   | Spring | 3     |
| A       | West   | Fall   | 3     |
| A       | West   | Spring | 2     |
| B       | East   | Fall   | 5     |
| B       | East   | Spring | 3     |
| B       | West   | Fall   | 7     |
| B       | West   | Spring | 5     |
| C       | East   | Fall   | 15    |
| C       | East   | Spring | 5     |
| C       | West   | Fall   | 7     |
| C       | West   | Spring | 3     |


In the matrix form, we can store the same data as follows:

| A    |      |        |  | B    |      |        |   | C    |      |        |
|------|------|--------|---|------|------|--------|---|------|------|--------|
|      | Fall | Spring |  |      | Fall | Spring |  |      | Fall | Spring |
| East | 2    | 3      |  | East | 5    | 3      |   | East | 15   | 5      |
| West | 3    | 2      |  | West | 7    | 5      |     | West | 7    | 3      |   

In Python, we need to use a nested list with three layers (a three-dimensional array) as follows:

In [0]:
Sales=[ [[2,3],[3,2]], [[5,3],[7,5]], [[15,5],[7,3]] ]

We can access the matrices of products A, B, and C using Sales[0], Sales[1], and Sales[2], respectively. We can access the sales values of Product A in Region West using Sales[0][1]. We can access the sales of Product A in Region West in the Fall season using Sales[0][1][0]. To access the individual data values, we need to use three indexes as follows:

Sales[*Product_Index*][*Region_Index*][*Season_Index*]
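
The three-index pattern can be exercised with nested loops. The sketch below assumes three hypothetical label lists (Products, Regions, Seasons) to translate indexes back into names; it rebuilds the rows of the long table and totals the Fall sales.

```python
Sales = [[[2, 3], [3, 2]], [[5, 3], [7, 5]], [[15, 5], [7, 3]]]

# Hypothetical label lists; position i matches index i on that dimension
Products = ["A", "B", "C"]
Regions = ["East", "West"]
Seasons = ["Fall", "Spring"]

# One loop per dimension reproduces the rows of the long table
rows = []
for p, product in enumerate(Products):
    for r, region in enumerate(Regions):
        for s, season in enumerate(Seasons):
            rows.append((product, region, season, Sales[p][r][s]))

print(rows[2])  # ('A', 'West', 'Fall', 3)

# Total Fall sales: fix the season index at 0 and vary the other two
fall_total = sum(Sales[p][r][0] for p in range(3) for r in range(2))
print(fall_total)  # 39
```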

# NumPy Shape Function
NumPy's shape function returns the shape of an array, that is, the size of each of its dimensions. To use the shape function, we first need to import NumPy.

In [0]:
import numpy as np
Sales=[ [[2,3],[3,2]], [[5,3],[7,5]], [[15,5],[7,3]] ]
np.shape(Sales)

(3, 2, 2)

The output is not surprising. The shape (3, 2, 2) means that the Sales list has three dimensions. The size of the first dimension is three, meaning that the first index can take the values 0, 1, and 2. The second and third dimensions each have a size of two.

Let us try the following statement:

In [0]:
np.shape(Sales[0])

The output, (2, 2), shows that Sales[0] is a 2 x 2 matrix.

Now let us run the following statement:

In [0]:
np.shape(Sales[0][1])

(2,)

Here, the shape (2,) means that Sales[0][1] is a one-dimensional list whose single index runs from 0 to 1. We will discuss this function further in a later section.
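
One way to see where these numbers come from is to compare np.shape with the built-in len function applied at each nesting level. This is a minimal sketch under the assumption that every sublist at a given level has the same length, as is the case for the Sales list.

```python
import numpy as np

Sales = [[[2, 3], [3, 2]], [[5, 3], [7, 5]], [[15, 5], [7, 3]]]

# Length of the list at each nesting level
sizes = (len(Sales), len(Sales[0]), len(Sales[0][0]))
print(sizes)            # (3, 2, 2)
print(np.shape(Sales))  # (3, 2, 2)

# Each additional index removes the leading dimension from the shape
print(np.shape(Sales[0]))     # (2, 2)
print(np.shape(Sales[0][1]))  # (2,)
```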

# The index slicing operator (:)
Python's slicing operator (:) allows us to access and manipulate a range of items in a list by specifying where to start and where to end the range.

Let us start with a list named A. We can retrieve the first four elements of the list A as shown.


In [0]:
A = [0,1,2,3,4,5,6,7]
print(A[0:4])

[0, 1, 2, 3]


0:4 is called a slice. The slice 0:4 starts at index 0 (included) and ends at index 4 (not included).
Since 0 is the first index, we can omit 0 in the slice 0:4 and write it as follows:


In [0]:
print (A[:4])

We can also take a slice within the middle of a list. For example,

In [0]:
print (A[2:5])

If we omit the end of a slice, all items from the indicated start index onward will be returned.
For example, the following slice will return all items from index 2 through the end of the list.


In [0]:
print (A[2:])

[2, 3, 4, 5, 6, 7]
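
The slices discussed above can be collected into one short sketch. The variable names first_four, middle, and tail are illustrative only. Note that a slice start:end contains end - start items when both indexes lie within the list.

```python
A = [0, 1, 2, 3, 4, 5, 6, 7]

first_four = A[:4]  # same as A[0:4]; index 4 is not included
middle = A[2:5]     # indexes 2, 3, and 4
tail = A[2:]        # index 2 through the last item

print(first_four)   # [0, 1, 2, 3]
print(middle)       # [2, 3, 4]
print(tail)         # [2, 3, 4, 5, 6, 7]

# The number of items in A[2:5] equals 5 - 2
print(len(middle))  # 3
```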
