<img src = "images/Right logo Transparent Big.png" width = 300, align = "center">

<h1 align=center><font size = 5>ARRAYS IN PYTHON</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
<li><a href="#ref0">About the Dataset</a></li>
<li><a href="#ref1">Multidimensional Lists</a></li>
<li><a href="#ref3"> Numpy Arrays </a></li>
<li><a href="#ref4">Using Logical Conditions to Subset Arrays</a></li>
</div>

----------------

<a id="ref0"></a>
<h1 align=center><font size = 5>About the Dataset</font></h1>

Imagine you got many album recommendations from your friends and compiled all of the recommendations in a table, with specific info about each album.

The table has one row for each album and the following columns:

- **artist** - Name of the artist
- **album** - Name of the album
- **released_year** - Year the album was released
- **length_min_sec** - Length of the album (hours,minutes,seconds)
- **genre** - Genre of the album
- **music_recording_sales_millions** - Music recording sales (millions in USD)
- **claimed_sales_millions** - Album's claimed sales (millions in USD)
- **date_released** - Date on which the album was released
- **soundtrack** - Indicates if the album is the movie soundtrack (Y) or (N)
- **rating_of_friends** - Indicates the rating from your friends from 1 to 10
<br>

You can see the dataset below:

In [1]:
import pandas as pd
music_df = pd.read_csv("dataset/music_dataset.csv")  # load the album table
music_df.head()  # show the first five rows

Unnamed: 0,artist,album,released_year,length_min_sec,genre,music_recording_sales_millions,claimed_sales_millions,date_released,soundtrack,rating_of_friends
0,Michael Jackson,Thriller,1982,42:19:00,"Pop, rock, R&B",46.0,65,30/11/82,N,10.0
1,AC/DC,Back in Black,1980,42:11:00,Hard rock,26.1,50,25/07/80,N,8.5
2,Pink Floyd,The Dark Side of the Moon,1973,42:49:00,Prigressive rock,24.2,45,01/03/73,N,9.5
3,Whtney Houston,The Bodyguard,1992,57:44:00,"R&B, soul, pop",27.4,44,17/11/92,Y,7.5
4,Meat Loaf,Bat Out of Hell,1977,46:33:00,"Hard rock, progressive rock",20.6,43,21/10/77,N,7.0


#### What is an Array?

An array is an ordered collection of elements. In Python, the simplest array-like structure is the built-in `list`.

<a id="ref1"></a>
<h2 align=center> Multidimensional Arrays </h2>
<br>
You can construct multidimensional arrays (multidimensional lists) by nesting lists inside a list, for example a 2 x 2 table with 2 rows and 2 columns. We can even create a 2 x 2 x 2 array.
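As a minimal sketch of this idea, nested lists can model both a 2 x 2 table and a 2 x 2 x 2 block. The names `table_2x2` and `cube_2x2x2` and the numbers in them are made up for this illustration.

```python
# A 2 x 2 table: a list of 2 rows, each row a list of 2 values
table_2x2 = [[1, 2],
             [3, 4]]

# A 2 x 2 x 2 block: a list of 2 tables, each table 2 x 2
cube_2x2x2 = [[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]]

# Index one level at a time: [row][column] or [table][row][column]
print(table_2x2[1][0])      # second row, first column
print(cube_2x2x2[1][0][1])  # second table, first row, second column
```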

#### Let's create a 1 x 8 array and then a 4 x 2 array (4 rows, 2 columns)
The example below is a list of 8 album names.

In [2]:
album_names = ["Thriller", "Back in Black",
               "The Dark side of the Moon", "The BodyGuard",
               "Bat Out of Hell", "Their Greatest Hits(1971-1975)",
               "Saturday Night Fever", "Rumours"]
album_names

['Thriller',
 'Back in Black',
 'The Dark side of the Moon',
 'The BodyGuard',
 'Bat Out of Hell',
 'Their Greatest Hits(1971-1975)',
 'Saturday Night Fever',
 'Rumours']

We would like to convert the 1 x 8 list into the following 4 x 2 list:

 <a ><img src = "https://ibm.box.com/shared/static/jhfg95h18uzqkpm45xxwjfxtqp3x2jan.png" width = 700, align = "center"></a>
  <h4 align=center> Figure 1: An example of a rectangular array 
  </h4>
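One possible way to do this conversion with plain lists is to cut the flat list into consecutive slices of 2 names. This is only a sketch: it assumes the table is filled row by row, and the names `n_cols` and `album_grid` are introduced here for illustration.

```python
album_names = ["Thriller", "Back in Black",
               "The Dark side of the Moon", "The BodyGuard",
               "Bat Out of Hell", "Their Greatest Hits(1971-1975)",
               "Saturday Night Fever", "Rumours"]

n_cols = 2  # number of columns in the target table

# Take consecutive slices of length n_cols; each slice becomes one row
album_grid = [album_names[i:i + n_cols]
              for i in range(0, len(album_names), n_cols)]

for row in album_grid:
    print(row)
```

The result is a list of 4 inner lists with 2 names each, so `album_grid[2][1]` is the second name in the third row.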

 <a id="ref3"></a>
<h2 align=center> Numpy Arrays 
 </h2>

Many applications require numerical operations. The Python library `numpy` provides a data structure called a NumPy array. This array is used by many other Python libraries and is useful in its own right. First we must import `numpy`. Here we import it as `np` to save keystrokes.


In [3]:
import numpy as np

First, create a multidimensional list with 4 rows and 2 columns:

In [4]:
album_list = [["Thriller", "Back in Black"],
              ["The Dark side of the Moon", "The BodyGuard"],
              ["Bat Out of Hell", "Their Greatest Hits(1971-1975)"],
              ["Saturday Night Fever", "Rumours"]]

We then "cast" the list to a NumPy array:

In [5]:
album_array = np.array(album_list)
album_array                        

array([['Thriller', 'Back in Black'],
       ['The Dark side of the Moon', 'The BodyGuard'],
       ['Bat Out of Hell', 'Their Greatest Hits(1971-1975)'],
       ['Saturday Night Fever', 'Rumours']], dtype='|S30')

The call `np.array()` casts the list `album_list` to a NumPy array.

Note that **each inner list becomes one row of the array**. That is why the 8 album names, grouped into 4 inner lists of 2 names each, produce a 4 x 2 array.
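To check how NumPy lays out elements, the following sketch reshapes the flat list of 8 names into 4 rows and 2 columns. By default `reshape` fills the result in row-major order, so consecutive names end up next to each other in the same row. The name `reshaped` is introduced here for illustration.

```python
import numpy as np

album_names = ["Thriller", "Back in Black",
               "The Dark side of the Moon", "The BodyGuard",
               "Bat Out of Hell", "Their Greatest Hits(1971-1975)",
               "Saturday Night Fever", "Rumours"]

# Turn the flat list into a 1-D array, then view it as 4 rows x 2 columns
reshaped = np.array(album_names).reshape(4, 2)

print(reshaped)
print(reshaped[0])  # the first row holds the first two names of the list
```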

Let's look at our array again:

In [6]:
album_array  

array([['Thriller', 'Back in Black'],
       ['The Dark side of the Moon', 'The BodyGuard'],
       ['Bat Out of Hell', 'Their Greatest Hits(1971-1975)'],
       ['Saturday Night Fever', 'Rumours']], dtype='|S30')

To access an element of an array, pass **[row, column]** with the row and column index of that element. Indexes start at 0.  
For example, here we retrieve **Their Greatest Hits(1971-1975)**, which sits in the third row and second column, using row index 2 and column index 1:

In [7]:
album_array[2,1] #[row, column]

'Their Greatest Hits(1971-1975)'
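For comparison, the sketch below reads the same element from the nested list and from the NumPy array. It rebuilds `album_list` and `album_array` so that it runs on its own.

```python
import numpy as np

album_list = [["Thriller", "Back in Black"],
              ["The Dark side of the Moon", "The BodyGuard"],
              ["Bat Out of Hell", "Their Greatest Hits(1971-1975)"],
              ["Saturday Night Fever", "Rumours"]]
album_array = np.array(album_list)

print(album_list[2][1])    # nested list: chain two indexes
print(album_array[2, 1])   # NumPy array: one [row, column] pair
print(album_array[-1, 0])  # negative indexes count from the end
```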

 We can visualize this as the yellow element in the rectangular array:

 <a ><img src = "https://ibm.box.com/shared/static/6aott4bv1lfxwx74azg2e1x3gy53uj7s.png" width = 700, align = "center"></a>
  <h4 align=center> Figure 3: An array with the element at row index 2 and column index 1 in yellow
  </h4>

To display all the elements of the first row, put `0` in the row position and nothing in the column position. The trailing comma after the `0` is optional: `album_array[0,]` and `album_array[0]` return the same row.

In [8]:
album_array[0,]

array(['Thriller', 'Back in Black'], dtype='|S30')

An example of the zeroth row of an array in yellow:

 <a ><img src = "https://ibm.box.com/shared/static/dyqzk21fm0yw715e0lbjocdwvwlhadrr.png" width = 700, align = "center"></a>
  <h4 align=center> Figure 4: An array with the zeroth row in yellow 
  </h4>

Likewise, you can get all the elements of a column by putting `:` in the row position and the column index in the column position, as shown below.

In [9]:
album_array[:,1]

array(['Back in Black', 'The BodyGuard', 'Their Greatest Hits(1971-1975)',
       'Rumours'], dtype='|S30')
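The same `start:stop` slice notation used for lists also works in each position of the index. A short sketch, rebuilding `album_array` so that it runs on its own:

```python
import numpy as np

album_array = np.array([["Thriller", "Back in Black"],
                        ["The Dark side of the Moon", "The BodyGuard"],
                        ["Bat Out of Hell", "Their Greatest Hits(1971-1975)"],
                        ["Saturday Night Fever", "Rumours"]])

first_column = album_array[:, 0]   # every row, column 0
middle_rows = album_array[1:3, :]  # rows 1 and 2 (stop index 3 is excluded), every column

print(first_column)
print(middle_rows)
```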

 We can color column 1 in yellow:

 <a ><img src = "https://ibm.box.com/shared/static/m5k2iat4hx8gnyzuewueh54t9ctmm2gp.png" width = 700, align = "center"></a>
  <h4 align=center> Figure 5: An array with column 1 in yellow 
  </h4>

To get the shape of the array, use the **shape** attribute.

In [10]:
album_array.shape

(4, 2)
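The shape is an ordinary tuple, so it can be unpacked into separate variables. The sketch below also shows two related attributes, `ndim` and `size`. The names `n_rows` and `n_cols` are introduced here for illustration.

```python
import numpy as np

album_array = np.array([["Thriller", "Back in Black"],
                        ["The Dark side of the Moon", "The BodyGuard"],
                        ["Bat Out of Hell", "Their Greatest Hits(1971-1975)"],
                        ["Saturday Night Fever", "Rumours"]])

n_rows, n_cols = album_array.shape  # unpack (rows, columns)
print(n_rows, n_cols)

print(album_array.ndim)  # number of dimensions
print(album_array.size)  # total number of elements (rows * columns)
```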

The following figure may be helpful for visualizing the size of the array:

 <a ><img src = "https://ibm.box.com/shared/static/6bjzygrnpyxkgjfubt0wbgcp1txi68gc.png" width = 900, align = "center"></a>
  <h4 align=center> Figure 6: An array with dimensions superimposed on side 
  </h4>

To get the data type of the elements of an array, use the **dtype** attribute.

In [11]:
album_array.dtype

dtype('S30')
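The `S30` above denotes byte strings of at most 30 characters, which is what Python 2 reports. Under Python 3 the same code is expected to report a Unicode string type instead (typically shown as `<U30`). The sketch below therefore checks the general kind of the data type rather than its exact name. The `ratings` array is an assumption for illustration: it reuses the first five `rating_of_friends` values from the table above.

```python
import numpy as np

album_array = np.array([["Thriller", "Back in Black"],
                        ["The Dark side of the Moon", "The BodyGuard"],
                        ["Bat Out of Hell", "Their Greatest Hits(1971-1975)"],
                        ["Saturday Night Fever", "Rumours"]])

# kind is a one-letter code: 'S' for byte strings, 'U' for Unicode strings
print(album_array.dtype.kind)

# An array built from floating-point numbers gets a numeric dtype instead
ratings = np.array([10.0, 8.5, 9.5, 7.5, 7.0])
print(ratings.dtype)
```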

<a id="ref4"></a>
<h2 align=center> Using Logical Conditions to Subset Arrays </h2>

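Now let us explore how to use logical conditions to subset arrays.

Comparing an array with a value produces an array of True and False values of the same shape. Using this array of True and False values, we can subset the array of album names. Here is our array again: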

In [12]:
album_array 

array([['Thriller', 'Back in Black'],
       ['The Dark side of the Moon', 'The BodyGuard'],
       ['Bat Out of Hell', 'Their Greatest Hits(1971-1975)'],
       ['Saturday Night Fever', 'Rumours']], dtype='|S30')

In [13]:
album_array =='Thriller'

array([[ True, False],
       [False, False],
       [False, False],
       [False, False]])

You can create a subset of the array containing only the elements whose corresponding value is True. For example, you can create a new array that contains only the element Thriller:

In [14]:
Thriller_array=album_array[album_array =='Thriller']
Thriller_array

array(['Thriller'], dtype='|S30')
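The condition does not have to come from the array being subset. As a sketch, the example below builds a True/False array from ratings and uses it to select album names. The arrays `albums` and `ratings` are assumptions for illustration: they reuse the album titles and `rating_of_friends` values of the first five rows of the table above.

```python
import numpy as np

albums = np.array(["Thriller", "Back in Black", "The Dark Side of the Moon",
                   "The Bodyguard", "Bat Out of Hell"])
ratings = np.array([10.0, 8.5, 9.5, 7.5, 7.0])

# One True/False value per album: is its rating at least 8.5?
high_rated = ratings >= 8.5
print(high_rated)

# Keep only the album names where the condition is True
print(albums[high_rated])
```

Both arrays must have the same length so that each True or False value lines up with exactly one album.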

If no element matches the condition, the result contains no elements. For example, say you want to find the album “The College Dropout”:


In [15]:
album_array[album_array =='The College Dropout']

array([], dtype='|S30')

The operation returns an empty array.
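In a program, it is often useful to test for this case explicitly. A minimal sketch, assuming you want to print a message when nothing matches; the names `matches` and `has_rumours` are introduced here for illustration.

```python
import numpy as np

album_array = np.array([["Thriller", "Back in Black"],
                        ["The Dark side of the Moon", "The BodyGuard"],
                        ["Bat Out of Hell", "Their Greatest Hits(1971-1975)"],
                        ["Saturday Night Fever", "Rumours"]])

matches = album_array[album_array == "The College Dropout"]

# An empty result has size 0
if matches.size == 0:
    print("No matching album found")

# np.any reports whether at least one element satisfies the condition
has_rumours = bool(np.any(album_array == "Rumours"))
print(has_rumours)
```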

---