# Init

Initialize Google Drive access:

In [0]:
!pip install -U -q PyDrive
 
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
 
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

List all files and file IDs in a specific folder using its Google Drive folder ID:

*The folder ID below can be copied from the URL of a folder in Google Drive. (If the files are located at the top level of Google Drive, replace <FOLDER ID> with 'root'.)*

In [2]:
# List contents of the folder 'My Drive/__Projects__/Code/Notebooks/Python/test_data'
file_list = drive.ListFile({'q': "'1t3Izh8czGgM--HRp2Zm8PxtL6HObJy1b' in parents and trashed=false"}).GetList()
for file1 in file_list:
  print('title: %s, id: %s' % (file1['title'], file1['id']))

title: write_test.txt, id: 1pcIk1OU9s6oPQr_v2mhIkESNVDj0pSXG
title: string_print.txt, id: 1xFlxpk1ylR6QwCkiiMSzUAt-v80ueQiM
title: world_alcohol.csv, id: 1TFkaAOJ8tNoiZcsoCfgViCf2QCIqO-Th
title: my_data.csv, id: 16ezEoAaH7nWhVXAgCiN0s_xOX9Wh78jK
title: nfl.csv, id: 1er8UzTVmqKzJSvtZhUhkyXOMYjOHfVXJ
title: dummy.txt, id: 1sstgxpv_kEVDKeqGBcgaAXYxxXNCjuPf
title: askreddit_2015.csv, id: 16kHujyvU9HPosl8LEc0Gzz6T1h1D_qDz
title: legislators.csv, id: 1zV3pz5rwbDBXAaShtc96r20iib-y7GUn
title: open_close_test.txt, id: 19qpWOHaywbTo4ro8nosfjOMYMNK4e23v
title: dummy.csv, id: 1spjySxuaY5S2VZPr7mVQociLaYuTq4Lc


Import a specific file into the local environment using its ID, and give it a local file name:

In [0]:
# import file by GDrive id
dataset_world_alcohol = drive.CreateFile({'id': '1TFkaAOJ8tNoiZcsoCfgViCf2QCIqO-Th'})
# name the file for local workspace
dataset_world_alcohol.GetContentFile('desired_name.csv')

After this point, the imported file can be referred to by the name assigned to it (e.g., 'desired_name.csv'), as it is now in the same folder as this Jupyter notebook:

In [4]:
import pandas as pd
my_data = pd.read_csv('desired_name.csv')
my_data.head()

Unnamed: 0,Year,WHO region,Country,Beverage Types,Display Value
0,1986,Western Pacific,Viet Nam,Wine,0.0
1,1986,Americas,Uruguay,Other,0.5
2,1985,Africa,Cte d'Ivoire,Wine,1.62
3,1986,Americas,Colombia,Beer,4.27
4,1987,Americas,Saint Kitts and Nevis,Beer,1.98


Import necessary datasets for this notebook:

In [0]:
# import file by GDrive id
dataset_world_alcohol = drive.CreateFile({'id': '1TFkaAOJ8tNoiZcsoCfgViCf2QCIqO-Th'})
# name the file for local workspace
dataset_world_alcohol.GetContentFile('world_alcohol.csv')

# NUMPY 

Also see: [DataQuest takeaways for NumPy](https://drive.google.com/uc?id=17UMatoy2kLkIFhf6z48DIy0rDiKvDOiK).

In [0]:
import numpy as np

## N-dimensional Arrays in Numpy

Each list in Python can be considered a one-dimensional array. NumPy adds more dimensions through its own object type, the ***n-dimensional*** (as in 2- or 3-dimensional) ***array***, or ***ndarray***. A two-dimensional ndarray is the equivalent of a list of lists, constructed as a NumPy object.

Arrays with different dimensions, and their names:

<img src="https://drive.google.com/uc?id=1d2RT3JyddpO13RecgxGS8swaPbR856Zq" height="500" width="500"> 

*Image source: dataquest.io*

## Creating Arrays

Create ndarrays:

In [0]:
# one-dimensional array
my_vector = np.array([10, 20, 30])

# two-dimensional array (with an axis 0 that has only one row)
my_matrix_1 = np.array([[10, 20, 30]])

# two-dimensional array
my_matrix_2 = np.array([[1, 2, 3], [10, 20, 30]])

In [11]:
my_vector

array([10, 20, 30])

In [12]:
my_matrix_1

array([[10, 20, 30]])

In [13]:
my_matrix_2

array([[ 1,  2,  3],
       [10, 20, 30]])

Create an ndarray from a list:

In [54]:
my_nested_list = [[1, 2, 3],['a', 'b', 'c']]
np.array(my_nested_list)

array([['1', '2', '3'],
       ['a', 'b', 'c']], dtype='<U21')
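
The output above shows the integers turned into strings, because an array holds a single type. A minimal sketch of how this coercion plays out for a few mixes of values (the variable names are illustrative, not from the original notebook):

```python
import numpy as np

# Integers only: the array keeps an integer dtype.
ints_only = np.array([1, 2, 3])

# Integers mixed with a float: everything is upcast to float.
ints_and_float = np.array([1, 2, 3.5])

# Numbers mixed with strings: everything is converted to Unicode strings.
numbers_and_strings = np.array([1, 2, 'a'])

print(ints_only.dtype.kind)            # 'i' (signed integer)
print(ints_and_float.dtype.kind)       # 'f' (floating point)
print(numbers_and_strings.dtype.kind)  # 'U' (Unicode string)
print(numbers_and_strings.tolist())    # ['1', '2', 'a']
```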

## Basic Queries on a NumPy Array

### Array Dimensions
**.shape**

Returns the dimensions of an ndarray as a tuple with one entry per dimension: "(no_of_rows, no_of_columns)" for a two-dimensional array, "(no_of_items,)" for a one-dimensional one.

In [0]:
my_vector = np.array([10, 20, 30])
my_matrix = np.array([[1, 2, 3], [10, 20, 30]])

print(my_vector.shape)
print(my_matrix.shape)

(3,)
(2, 3)
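
As a small illustrative sketch, the shape tuple can be unpacked directly, and it relates to two other attributes, `.ndim` and `.size` (the names `n_rows` and `n_columns` are assumptions made for this example):

```python
import numpy as np

my_vector = np.array([10, 20, 30])
my_matrix = np.array([[1, 2, 3], [10, 20, 30]])

# .shape has one entry per dimension, so its length equals .ndim
print(my_vector.shape, my_vector.ndim)  # (3,) 1
print(my_matrix.shape, my_matrix.ndim)  # (2, 3) 2

# Unpack the rows and columns of a two-dimensional array
n_rows, n_columns = my_matrix.shape
print(n_rows, n_columns)  # 2 3

# .size is the total number of elements (rows * columns here)
print(my_matrix.size)  # 6
```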


### Array Type
**.dtype**

Every value in a NumPy array is of the same type.

Examples of array types:
- **'float64'**   : 64-bit floating-point number
- **'uint32'**    : 32-bit unsigned integer
- **'U75'**       : Unicode string of up to 75 characters
- **(str, 35)**   : string of up to 35 characters
- **('U', 10)**   : Unicode string of up to 10 characters

In [0]:
np.array(['a', 'b', 'c']).dtype

dtype('<U1')

In [0]:
np.array([1, 2, 3]).dtype

dtype('int64')
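
Since an array has a single type, changing that type means converting the whole array. A minimal sketch using `astype()`, with made-up values:

```python
import numpy as np

numbers_as_text = np.array(['1', '2.5', '3'])
print(numbers_as_text.dtype.kind)  # 'U' (Unicode string)

# Convert every element from string to float
numbers = numbers_as_text.astype(float)
print(numbers.dtype)     # float64
print(numbers.tolist())  # [1.0, 2.5, 3.0]

# Converting floats to integers truncates toward zero
whole_numbers = numbers.astype(int)
print(whole_numbers.tolist())  # [1, 2, 3]
```

`astype()` returns a new array and leaves the original unchanged.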

## Indexing and Slicing numpy Arrays
**[x,y] | [:, y] | [a:b, x:y] | [:,x:y] | [[a,b,c,d], y]** <br>
<br>
**[row]** <br>
**[row, column]** <br>
**: = every location** <br>
**a:b = range** <br>

*For another comparison of indexing with numpy and native Python lists, see [Indexing with Python](https://colab.research.google.com/drive/1g0JAnKc-PL20gstCDLQSdRNQK7RTjtcv#scrollTo=L6OKMpMI1ZrV) section in Python.ipynb.*

A comparison of list-of-lists methods and NumPy methods for selecting and slicing:

<img src="https://drive.google.com/uc?id=1KylJ9k_KYXehldrofYvARXoNg0nGwUFW" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=12RXkzq9WmTj9HVFBpQaMkdvDOrf1WZ1-" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=1RxnOzIDWSXoF4nW3lxnuSfxz03oG00sg" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=1gwAMBb48nDjnFwKHJQNCApLPXWf_wz87" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=1izuCNUXNvZtoZ_3_FbQXR8rM3Sz0XiLV" height="500" width="500"> 

*Image sources: dataquest.io*

### Examples

In [0]:
my_ndarray = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [0]:
my_list = [
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
    ['g', 'h' ,'i']
]

**First row's third column**:

In [16]:
# for ndarray
row_1_column_3_N = my_ndarray[0,2]

# for list
row_1_column_3_L = my_list[0][2]


print(row_1_column_3_N, '\n', row_1_column_3_L)

3 
 c


**First row**:

In [17]:
# for ndarray
row_1_N = my_ndarray[0]
row_1_N_formal = my_ndarray[0,:]  # more formal notation

# for list

row_1_L = my_list[0]


print(row_1_N, '\n\n', row_1_N_formal, '\n\n', row_1_L )

[1 2 3] 

 [1 2 3] 

 ['a', 'b', 'c']


**First two rows:**

In [18]:
# for ndarray
rows_1_to_2_N = my_ndarray[:2]

# for list
rows_1_to_2_L = my_list[:2]


print(rows_1_to_2_N, '\n\n', rows_1_to_2_L)

[[1 2 3]
 [4 5 6]] 

 [['a', 'b', 'c'], ['d', 'e', 'f']]


**First column**:

In [19]:
# for ndarray
column_1_N = my_ndarray[:,0]

# for list
column_1_L = []
for each_row in my_list:
  column_1_L.append(each_row[0])

  
print(column_1_N, '\n\n', column_1_L)

[1 4 7] 

 ['a', 'd', 'g']


**First two columns**:

In [20]:
# for ndarray
columns_1_to_2_N = my_ndarray[:,:2]

# for list
columns_1_to_2_L = []
for each_row in my_list:
  columns_1_to_2_L.append(each_row[:2])

  
print(columns_1_to_2_N, '\n\n', columns_1_to_2_L)

[[1 2]
 [4 5]
 [7 8]] 

 [['a', 'b'], ['d', 'e'], ['g', 'h']]


**Rows 2 and 3 of the third column**:

In [25]:
# for ndarray
rows_2_to_3_column_3_N = my_ndarray[1:3,2]

# for list
rows_2_to_3_column_3_L = []
rows_2_to_3_column_3_L.append(my_list[1][2])
rows_2_to_3_column_3_L.append(my_list[2][2])


print(rows_2_to_3_column_3_N, '\n\n', rows_2_to_3_column_3_L)

[6 9] 

 ['f', 'i']


**Rows 1 and 3 of the 1st column**:

In [26]:
# for ndarray
rows_1_and_3_column_1 = my_ndarray[[0,2],0]

print(rows_1_and_3_column_1)

[1 7]


**Items 2, 5, 7, and 9 of a vector (1-dimensional numpy array)**:

In [56]:
vector = np.array([1,2,3,4,5,6,7,8,9])
vector[[1,4,6,8]]

array([2, 5, 7, 9])

## Sorting Arrays

NumPy can sort arrays numerically (for integers and floats) and alphabetically (for strings). Sorting is done in ascending order; NumPy's sorting functions have no descending option at the time of this writing, so descending order is typically obtained by reversing an ascending result.

<img src="https://drive.google.com/uc?id=1fyOzTre5JcUY0zjw1Cxp37eRxxIhpYJe" height="500"> 

*Image source: clokman*
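
One common workaround for the missing descending option is to sort in ascending order and then reverse the result with a `[::-1]` slice. A minimal sketch with illustrative data:

```python
import numpy as np

values = np.array([4, 3, 1, 7, 6, 2, 8, 9, 5])

ascending = np.sort(values)   # np.sort returns a sorted copy
descending = ascending[::-1]  # reverse the sorted copy

names = np.array(['pear', 'apple', 'fig'])
names_descending = np.sort(names)[::-1]

print(descending.tolist())        # [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(names_descending.tolist())  # ['pear', 'fig', 'apple']
```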

### Sorting 1-Dimensional Arrays (Vectors)

In [0]:
my_1d_array = np.array([4,3,1,7,6,2,8,9,5])

**Get the *sort vector***:
`argsort()` returns the indices that would sort the array. Each entry of the sort vector is the position, in the original array, of the item that belongs at that place in the sorted order: the first entry is the index of the smallest item, the second entry is the index of the next smallest, and so on.

(Note that this command does not sort the array; it only returns the indices that can be used to do so.)

In [98]:
sort_vector = my_1d_array.argsort()
sort_vector

array([2, 5, 1, 0, 8, 4, 3, 6, 7])

**Use the sort vector to sort the array**:

In [99]:
sorted_1d_array = my_1d_array[sort_vector]
sorted_1d_array

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

### Sorting 2-Dimensional Arrays (Matrices)

In [0]:
my_2d_array = np.array([
    [4, 5, 1],
    [7, 6, 3],
    [8, 9, 2]
])

**Extract the column of interest:** 

In [0]:
column_to_sort_by = my_2d_array[:,2]
column_to_sort_by

**Create the *sort vector* of the extracted column:**

In [0]:
sort_vector = column_to_sort_by.argsort()
sort_vector

**Sort the extracted column using the *sort vector*:**

*(to see if sorting is being done correctly)*

In [92]:
sorted_column = column_to_sort_by[sort_vector]
sorted_column

array([1, 2, 3])

**Sort the entire array using the *sort vector***:

*Essentially, the sort vector is used to **request a version of the array in which the order of the rows is altered according to the sort vector** (which itself reflects the ascending order of the values in column_to_sort_by).*

In [93]:
sorted_array = my_2d_array[sort_vector]
sorted_array

array([[4, 5, 1],
       [8, 9, 2],
       [7, 6, 3]])
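
The steps above can be collected into one sketch that also shows the intermediate values. Reversing the sort vector to get a descending row order is an addition for illustration, not part of the original walkthrough:

```python
import numpy as np

my_2d_array = np.array([
    [4, 5, 1],
    [7, 6, 3],
    [8, 9, 2]
])

column_to_sort_by = my_2d_array[:, 2]      # third column: [1, 3, 2]
sort_vector = column_to_sort_by.argsort()  # row indices in ascending order: [0, 2, 1]

# Reorder the rows so that the third column is ascending
sorted_ascending = my_2d_array[sort_vector]

# Reverse the sort vector to order the rows by the same column, descending
sorted_descending = my_2d_array[sort_vector[::-1]]

print(sorted_ascending.tolist())   # [[4, 5, 1], [8, 9, 2], [7, 6, 3]]
print(sorted_descending.tolist())  # [[7, 6, 3], [8, 9, 2], [4, 5, 1]]
```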

## Vectorized Operations
***Vector Arithmetics***

A list of arithmetic operations that can be used for numpy vectors:

*Where possible, the default Python arithmetic operators (e.g., '+',  '/') can also be used, and they would produce the same effect.*

<table class="longtable docutils" border="1">
<colgroup>
<col width="10%">
<col width="90%">
</colgroup>
<tbody valign="top">
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.add.html#numpy.add" title="numpy.add"><code class="xref py py-obj docutils literal"><span class="pre">add</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;order,&nbsp;...])</td>
<td>Add arguments element-wise.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.reciprocal.html#numpy.reciprocal" title="numpy.reciprocal"><code class="xref py py-obj docutils literal"><span class="pre">reciprocal</span></code></a>(x,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>Return the reciprocal of the argument, element-wise.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.positive.html#numpy.positive" title="numpy.positive"><code class="xref py py-obj docutils literal"><span class="pre">positive</span></code></a>(x,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;order,&nbsp;...])</td>
<td>Numerical positive, element-wise.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.negative.html#numpy.negative" title="numpy.negative"><code class="xref py py-obj docutils literal"><span class="pre">negative</span></code></a>(x,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;order,&nbsp;...])</td>
<td>Numerical negative, element-wise.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.multiply.html#numpy.multiply" title="numpy.multiply"><code class="xref py py-obj docutils literal"><span class="pre">multiply</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>Multiply arguments element-wise.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.divide.html#numpy.divide" title="numpy.divide"><code class="xref py py-obj docutils literal"><span class="pre">divide</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>Divide arguments element-wise.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.power.html#numpy.power" title="numpy.power"><code class="xref py py-obj docutils literal"><span class="pre">power</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>First array elements raised to powers from second array, element-wise.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.subtract.html#numpy.subtract" title="numpy.subtract"><code class="xref py py-obj docutils literal"><span class="pre">subtract</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>Subtract arguments, element-wise.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.true_divide.html#numpy.true_divide" title="numpy.true_divide"><code class="xref py py-obj docutils literal"><span class="pre">true_divide</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;...])</td>
<td>Returns a true division of the inputs, element-wise.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.floor_divide.html#numpy.floor_divide" title="numpy.floor_divide"><code class="xref py py-obj docutils literal"><span class="pre">floor_divide</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;...])</td>
<td>Return the largest integer smaller or equal to the division of the inputs.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.float_power.html#numpy.float_power" title="numpy.float_power"><code class="xref py py-obj docutils literal"><span class="pre">float_power</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;...])</td>
<td>First array elements raised to powers from second array, element-wise.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.fmod.html#numpy.fmod" title="numpy.fmod"><code class="xref py py-obj docutils literal"><span class="pre">fmod</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>Return the element-wise remainder of division.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.mod.html#numpy.mod" title="numpy.mod"><code class="xref py py-obj docutils literal"><span class="pre">mod</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;order,&nbsp;...])</td>
<td>Return element-wise remainder of division.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.modf.html#numpy.modf" title="numpy.modf"><code class="xref py py-obj docutils literal"><span class="pre">modf</span></code></a>(x[,&nbsp;out1,&nbsp;out2],&nbsp;/&nbsp;[[,&nbsp;out,&nbsp;where,&nbsp;...])</td>
<td>Return the fractional and integral parts of an array, element-wise.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="generated/numpy.remainder.html#numpy.remainder" title="numpy.remainder"><code class="xref py py-obj docutils literal"><span class="pre">remainder</span></code></a>(x1,&nbsp;x2,&nbsp;/[,&nbsp;out,&nbsp;where,&nbsp;casting,&nbsp;...])</td>
<td>Return element-wise remainder of division.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="generated/numpy.divmod.html#numpy.divmod" title="numpy.divmod"><code class="xref py py-obj docutils literal"><span class="pre">divmod</span></code></a>(x1,&nbsp;x2[,&nbsp;out1,&nbsp;out2],&nbsp;/&nbsp;[[,&nbsp;out,&nbsp;...])</td>
<td>Return element-wise quotient and remainder simultaneously.</td>
</tr>
</tbody>
</table>


*(For more functions that may be applicable in vector operations, see the table source: https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html#arithmetic-operations)*

For all array operators and methods, see: https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.ndarray.html#calculation
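
A short sketch that checks the equivalence between a few of the functions in the table and the corresponding Python operators. The arrays `a` and `b` are made up for the example:

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([1, 2, 3])

# Each comparison prints True: the operator and the function give the same array
print(np.array_equal(np.add(a, b), a + b))
print(np.array_equal(np.subtract(a, b), a - b))
print(np.array_equal(np.multiply(a, b), a * b))
print(np.array_equal(np.divide(a, b), a / b))
print(np.array_equal(np.floor_divide(a, 4), a // 4))
print(np.array_equal(np.mod(a, 4), a % 4))
print(np.array_equal(np.power(b, 2), b ** 2))

# divmod returns the quotient and the remainder in one call
quotient, remainder = np.divmod(a, 4)
print(quotient.tolist(), remainder.tolist())  # [2, 5, 7] [2, 0, 2]
```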

### Function and Method Representation of Numpy Operations

Most NumPy calculations can be written either as a method or as a function, but not all of them. For example, the median can only be calculated with the function `np.median()`:

<table>
<thead>
<tr>
<th>Calculation</th>
<th>Function Representation</th>
<th>Method Representation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Calculate the minimum value of <code>trip_mph</code></td>
<td><code>np.min(trip_mph)</code></td>
<td><code>trip_mph.min()</code></td>
</tr>
<tr>
<td>Calculate the maximum value of <code>trip_mph</code></td>
<td><code>np.max(trip_mph)</code></td>
<td><code>trip_mph.max()</code></td>
</tr>
<tr>
<td>Calculate the <a target="_blank" href="https://en.wikipedia.org/wiki/Mean">mean average</a> value of <code>trip_mph</code></td>
<td><code>np.mean(trip_mph)</code></td>
<td><code>trip_mph.mean()</code></td>
</tr>
<tr>
<td>Calculate the <a target="_blank" href="https://en.wikipedia.org/wiki/Median">median average</a> value of <code>trip_mph</code></td>
<td><code>np.median(trip_mph)</code></td>
<td>There is no ndarray median method</td>
</tr>
</tbody>
</table>
*Source: Dataquest.io*

**Method representation example**:

In [0]:
my_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [35]:
my_array.max()

9

**Function representation example**:

In [36]:
np.max(my_array)  # same as my_array.max(); this is simply its function representation

9

### Native Functionality vs Numpy Methods

Native Python functions sometimes mirror NumPy functionality (e.g., Python's built-in min() function vs. NumPy's np.min()). It is important to note that such native functions do not take advantage of vectorization.

This does not apply to Python arithmetic operators such as '/' or '+': when applied to NumPy arrays, they perform the same operations as the NumPy functions (i.e., they utilize vectorization).
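
Beyond speed, the built-in functions also behave differently on arrays with more than one dimension. A minimal sketch of the difference for `min` (the flag `builtin_failed` is a name introduced only for this example):

```python
import numpy as np

vector = np.array([4, 1, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# On a one-dimensional array both give the same answer, but the built-in
# walks through the elements one by one in Python.
print(min(vector) == vector.min())  # True

# On a two-dimensional array the built-in iterates over rows and tries to
# compare whole rows with each other, which raises a ValueError.
try:
    min(matrix)
    builtin_failed = False
except ValueError:
    builtin_failed = True

print(builtin_failed)  # True
print(matrix.min())    # 1
print(np.min(matrix))  # 1
```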

### Examples

In [0]:
my_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

**Vector division**:

In [38]:
my_array / 2

array([[0.5, 1. , 1.5],
       [2. , 2.5, 3. ],
       [3.5, 4. , 4.5]])

**Statistics**:


In [39]:
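min_value = my_array.min()
max_value = my_array.max()
mean_value = my_array.mean()
median_value = np.median(my_array)
total = my_array.sum()

# names such as min, max, and sum are avoided so that Python's built-ins are not shadowed
print(' min: %d \n max: %d \n median: %d \n mean: %d \n sum: %d' % (min_value, max_value, median_value, mean_value, total))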

 min: 1 
 max: 9 
 median: 5 
 mean: 5 
 sum: 45


### Applying Methods to Columns and Rows
*(Column sums and row sums)*

<img src="https://drive.google.com/uc?id=1E9lbCglVYUZ-zP6bH3bzD57toAAkr44R" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=1r8DN_9D8FPGQj1x5KkjIsDY3Rf13KCpV" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=1XSul3v8EMSS9cgo7-3lRZTOH9rGGRMxP" height="500" width="500"> 
<img src="https://drive.google.com/uc?id=1RlH_ek0T1VxEaHq4oqM_sVFtB5oy4-af" height="500" width="500"> 

Image source: dataquest.io

#### Examples

In [0]:
my_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [49]:
table_sum = my_array.sum()
table_sum

45

In [50]:
col_sums = my_array.sum(axis=0)  # axis=0 sums down the rows: one sum per column
col_sums

array([12, 15, 18])

In [51]:
row_sums = my_array.sum(axis=1)  # axis=1 sums across the columns: one sum per row
row_sums

array([ 6, 15, 24])
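
The `axis` argument works the same way for other calculation methods: `axis=0` gives one result per column and `axis=1` gives one result per row. A short sketch with `max` and `mean` on the same array:

```python
import numpy as np

my_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# axis=0 collapses the rows: one result per column
col_max = my_array.max(axis=0)
col_mean = my_array.mean(axis=0)
print(col_max.tolist())   # [7, 8, 9]
print(col_mean.tolist())  # [4.0, 5.0, 6.0]

# axis=1 collapses the columns: one result per row
row_max = my_array.max(axis=1)
row_mean = my_array.mean(axis=1)
print(row_max.tolist())   # [3, 6, 9]
print(row_mean.tolist())  # [2.0, 5.0, 8.0]
```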

## Array Concatenation

### Ensuring Dimensionality Match between Arrays

In [0]:
my_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])  # 2-dimensional array (i.e., a matrix)

zeros_array = np.array([0,0,0]) # 1-dimensional array (i.e., a vector)

Concatenation in NumPy is accomplished by joining two arrays along an axis. The critical rule is that both arrays must have the same number of dimensions. That is, a one-dimensional array cannot be concatenated with a two-dimensional array.

Trying to concatenate these two arrays gives an error, because they have a different number of dimensions:

In [53]:
try:
    np.concatenate([my_array, zeros_array], axis=0)  # 'axis=0' adds as rows
except ValueError as error:
    print('Error caught: ', error)


Error caught:  all the input arrays must have same number of dimensions


To add a one-dimensional array (e.g., a single column or row) to a two-dimensional array (e.g., a dataset), the one-dimensional array must first be converted to a two-dimensional array.

In [45]:
zeros_array_two_dimensional = np.expand_dims(zeros_array, axis=0)
np.concatenate([my_array,zeros_array_two_dimensional], axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [0, 0, 0]])
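
`np.expand_dims` is one of several ways to turn a vector into a one-row matrix. The sketch below shows two equivalent spellings, plus `np.vstack`, which does the promotion itself. These are offered as alternatives, not as the approach this notebook relies on:

```python
import numpy as np

my_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
zeros_array = np.array([0, 0, 0])

# Three ways to turn the vector into a matrix with a single row
as_row_1 = np.expand_dims(zeros_array, axis=0)
as_row_2 = zeros_array.reshape(1, -1)   # -1 lets NumPy infer the column count
as_row_3 = zeros_array[np.newaxis, :]

print(as_row_1.shape, as_row_2.shape, as_row_3.shape)  # (1, 3) (1, 3) (1, 3)

# np.vstack treats one-dimensional inputs as rows before stacking
stacked = np.vstack([my_array, zeros_array])
print(stacked.shape)  # (4, 3)
```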

### Adding New Elements (to 1-dimensional arrays/series)



In [46]:
one_dimensional_array = np.array(['a', 'b', 'c'])  
# Is  different from   "np.array([['a', 'b', 'c']])", which is a two-dimensional 
# array with one row

new_elements = np.array(['d', 'e', 'f'])  # a one-dimensional array (~series)

np.concatenate([one_dimensional_array, new_elements])  # no need to specify an axis

array(['a', 'b', 'c', 'd', 'e', 'f'], dtype='<U1')

### Adding New Rows (to 2-dimensional arrays/matrices)

In [47]:
two_dimensional_array_with_one_row = np.array([['a', 'b', 'c']]) 
# Is different from                 "np.array(['a', 'b', 'c'])", which is a one-
# dimensional array (~series)

new_row = np.array(['d', 'e', 'f'])
new_row = np.expand_dims(new_row, axis=0)  # 'axis=0' for row-wise (vertical) dimension


np.concatenate([two_dimensional_array_with_one_row, new_row], axis=0) # 'axis=0' 
# ... adds new array as a row

array([['a', 'b', 'c'],
       ['d', 'e', 'f']], dtype='<U1')

Alternative method:

In [69]:
two_dimensional_array_with_one_row = np.array([['a', 'b', 'c']]) 

new_row = np.array([['d', 'e', 'f']])  # extra brackets specify that new_row is 
                                       # a two-dimensional array (so no need to 
                                       # expand dimensions later)

np.concatenate([two_dimensional_array_with_one_row, new_row], axis=0)

array([['a', 'b', 'c'],
       ['d', 'e', 'f']], dtype='<U1')

### Adding New Columns

In [70]:
letters_array = np.array([
    ['A', 'B', 'C'],
    ['D', 'E', 'F']
])

new_column = np.array(['X', 'Y'])
new_column = np.expand_dims(new_column, axis=1)

np.concatenate([letters_array, new_column], axis=1)

array([['A', 'B', 'C', 'X'],
       ['D', 'E', 'F', 'Y']], dtype='<U1')
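
To make the role of `axis=1` in `expand_dims` visible, the sketch below prints the shapes before and after, and shows `np.column_stack` as an alternative that treats one-dimensional inputs as columns on its own:

```python
import numpy as np

letters_array = np.array([
    ['A', 'B', 'C'],
    ['D', 'E', 'F']
])
new_column = np.array(['X', 'Y'])

# A vector of length 2 becomes a matrix with 2 rows and 1 column
as_column = np.expand_dims(new_column, axis=1)
print(new_column.shape)  # (2,)
print(as_column.shape)   # (2, 1)

# np.column_stack converts one-dimensional inputs into columns itself
with_column = np.column_stack([letters_array, new_column])
print(with_column.tolist())  # [['A', 'B', 'C', 'X'], ['D', 'E', 'F', 'Y']]
```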

# TODO: REVISE AND COMPLETE THE BOOLEAN INDEXING SECTION FROM CODEQUEST
## Boolean Indexing

### Boolean Indexing for One-Dimensional Arrays (Vectors)

<img src="https://drive.google.com/uc?id=1sZVbSihboCuqTELfXy_lN6ryOW3bbFtA" height="500" width="500">

<img src="https://drive.google.com/uc?id=1WreKK8IoNpeSxuspso1g02FrfKUe5HjU" height="500" width="500">
<img src="https://drive.google.com/uc?id=1MJkACvyz7klog_LwOkpqocuD_PUulnrJ" height="500" width="500">



Image source: dataquest.io

**Create a 1-dimensional array (vector)**:

In [129]:
vector_array = np.array([1,2,3,4,5])
vector_array

array([1, 2, 3, 4, 5])

**Index the vector array for elements that match a criterion**:

In [127]:
bool_vector_array = vector_array < 4
bool_vector_array

array([ True,  True,  True, False, False])

**Use the boolean index for generating a filtered version of the vector array**:

In [0]:
filtered_vector_array = vector_array[bool_vector_array]
filtered_vector_array
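
A minimal sketch of the same filter written in one step, followed by combined criteria. The operators `&` (and), `|` (or), and `~` (not) work element-wise on boolean arrays, and each comparison needs its own parentheses:

```python
import numpy as np

vector_array = np.array([1, 2, 3, 4, 5])

# The comparison and the filtering can be written in a single expression
below_four = vector_array[vector_array < 4]
print(below_four.tolist())  # [1, 2, 3]

# Combining criteria
between = vector_array[(vector_array > 1) & (vector_array < 5)]
extremes = vector_array[(vector_array == 1) | (vector_array == 5)]
not_below_four = vector_array[~(vector_array < 4)]

print(between.tolist())         # [2, 3, 4]
print(extremes.tolist())        # [1, 5]
print(not_below_four.tolist())  # [4, 5]
```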

### Boolean Indexing for Two-Dimensional Arrays (Matrices)

<img src="https://drive.google.com/uc?id=118MuPnXQf6dbMz7m50dshxyzCVtUeloH" height="500" width="500">

In [0]:
matrix_array = np.array([
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 4, 6, 8],
    [2, 4, 6, 8]
])  # 2-dimensional array (i.e., a matrix)


In [164]:
bool_array = matrix_array > 5
bool_array

array([[False, False, False,  True],
       [False, False, False,  True],
       [False, False,  True,  True],
       [False, False,  True,  True]])

In [166]:
matrix_array[bool_array] = 100
matrix_array

array([[  1,   3,   5, 100],
       [  1,   3,   5, 100],
       [  2,   4, 100, 100],
       [  2,   4, 100, 100]])
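
A boolean vector built from one column can also be used to select whole rows, or to assign to a single column of the matching rows. The following is a sketch under that assumption, reusing the matrix from above with an illustrative name for the boolean vector:

```python
import numpy as np

matrix_array = np.array([
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 4, 6, 8],
    [2, 4, 6, 8]
])

# Boolean vector with one entry per row, based on the first column
first_column_is_two = matrix_array[:, 0] == 2
print(first_column_is_two.tolist())  # [False, False, True, True]

# Select the whole rows where the criterion holds (this returns a copy)
selected_rows = matrix_array[first_column_is_two]
print(selected_rows.tolist())  # [[2, 4, 6, 8], [2, 4, 6, 8]]

# Assign to the last column of the matching rows only
matrix_array[first_column_is_two, 3] = 0
print(matrix_array[:, 3].tolist())  # [7, 7, 0, 0]
```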

## Importing Data from a CSV File into a NumPy Array


**Import data from CSV**:

A NumPy array can only contain one type of value (e.g., str or int).

By default, `np.genfromtxt` assumes that the data consists of floating point values (float), so any field that cannot be converted to a float (including the header row) is read as `nan`.

In [112]:
my_data = np.genfromtxt("world_alcohol.csv", delimiter = ",")  # file name and delimiter are the two fundamental arguments for NumPy CSV import
print(my_data)

[[      nan       nan       nan       nan       nan]
 [1.986e+03       nan       nan       nan 0.000e+00]
 [1.986e+03       nan       nan       nan 5.000e-01]
 ...
 [1.986e+03       nan       nan       nan 2.540e+00]
 [1.987e+03       nan       nan       nan 0.000e+00]
 [1.986e+03       nan       nan       nan 5.150e+00]]


If the data contains strings (or other types than float), it could be beneficial to **specify what the data type (dtype)** should be:


In [113]:
my_data = np.genfromtxt("world_alcohol.csv", delimiter = "," , dtype="U75")
print(my_data)

[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...
 ['1986' 'Europe' 'Switzerland' 'Spirits' '2.54']
 ['1987' 'Western Pacific' 'Papua New Guinea' 'Other' '0']
 ['1986' 'Africa' 'Swaziland' 'Other' '5.15']]


It may sometimes be desirable to omit the header row (e.g., if the data is needed in numerical form):

In [116]:
# in this case, headers are not desirable
my_data = np.genfromtxt("world_alcohol.csv", delimiter = "," , dtype="U75")
print(my_data[:,4])


['Display Value' '0' '0.5' ... '2.54' '0' '5.15']


In [117]:

# read CSV without headers
my_data = np.genfromtxt("world_alcohol.csv", delimiter = "," , dtype="U75", skip_header=1)  # skip_header is the number of lines to skip
print(my_data[:,4])

['0' '0.5' '1.62' ... '2.54' '0' '5.15']
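
Once the header is skipped, a column that was read as text can be converted to numbers with `astype`. The sketch below is self-contained: instead of the real file it reads a few rows, copied from the outputs shown earlier, from an in-memory string (`csv_text` and `sample` are names made up for this example):

```python
import io
import numpy as np

# A tiny stand-in for world_alcohol.csv
csv_text = (
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
    "1986,Americas,Colombia,Beer,4.27\n"
)

# genfromtxt accepts a file-like object as well as a file name
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="U75", skip_header=1)
print(sample.shape)  # (3, 5)

# Convert the last column from strings to floats
display_values = sample[:, 4].astype(float)
print(display_values.tolist())  # [0.0, 0.5, 4.27]
print(display_values.max())     # 4.27
```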


#TODO FOR NUMPY

## Generating Data in NumPy

In [0]:
# create a new column filled with `0`
# (assumes `taxi` is an existing two-dimensional array, defined elsewhere)
zeros = np.zeros([taxi.shape[0], 1])
taxi_modified = np.concatenate([taxi, zeros], axis=1)
print(taxi_modified)
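
Because `taxi` is not defined in this notebook, here is a runnable sketch of the same idea with a small hypothetical stand-in array, together with a few other NumPy functions that generate data:

```python
import numpy as np

# Hypothetical stand-in for the `taxi` dataset (3 rows, 2 columns)
taxi = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

zeros = np.zeros([taxi.shape[0], 1])  # one column of zeros, one entry per row
ones = np.ones((2, 3))                # 2 x 3 array filled with 1.0
sevens = np.full((2, 2), 7)           # 2 x 2 array filled with 7
counting = np.arange(5)               # the integers 0 to 4

taxi_modified = np.concatenate([taxi, zeros], axis=1)
print(taxi_modified.shape)     # (3, 3)
print(taxi_modified.tolist())  # [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0], [5.0, 6.0, 0.0]]
print(counting.tolist())       # [0, 1, 2, 3, 4]
```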

# PANDAS

Prep:

In [0]:
import os
os.getcwd()

'C:\\Users\\Clokman\\Google Drive\\__Projects__\\Code\\Notebooks\\Python'

In [0]:
import sys, os, re

working_directory = os.getcwd()
if re.search('\\\\Notebooks\\\\Python$', working_directory):
    kfir_directory = re.sub('\\\\Notebooks\\\\Python$', '\\\\KFIR', working_directory)  # a backslash in the replacement must itself be escaped
    sys.path.append(kfir_directory)
    
sys.path

['',
 'C:\\ProgramData\\Anaconda3\\python36.zip',
 'C:\\ProgramData\\Anaconda3\\DLLs',
 'C:\\ProgramData\\Anaconda3\\lib',
 'C:\\ProgramData\\Anaconda3',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\Sphinx-1.5.1-py3.6.egg',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\win32',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\win32\\lib',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\Pythonwin',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\setuptools-27.2.0-py3.6.egg',
 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\extensions',
 'C:\\Users\\Clokman\\.ipython',
 'C:\\Users\\Clokman\\Google Drive\\__Projects__\\Code',
 'C:\\Users\\Clokman\\Google Drive\\__Projects__\\Code\\KFIR',
 'C:\\Users\\Clokman\\Google Drive\\__Projects__\\Code\\KFIR',
 'C:\\Users\\Clokman\\Google Drive\\__Projects__\\Code\\KFIR']

## Data Structures

### DataFrame

Prep:

In [0]:
import pandas
import numpy
from numpy import array
from pandas import Series, DataFrame

Create empty dataframe:

In [0]:
df1 = DataFrame(columns=['Column A', 'Column B', 'Column C'])
df1

Unnamed: 0,Column A,Column B,Column C


Append data to specific row:

In [0]:
df2 = DataFrame( {'Column A' : Series([100], index=['Row 1'])} )
df1 = pandas.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
df1


Unnamed: 0,Column A,Column B,Column C
Row 1,100.0,,


Alternative way to add or overwrite a specific row with .loc[]:

In [0]:
df1.loc['Row 1'] = (1,2,3)
df1.loc['Row 2'] = (3,4,5)
df1

Unnamed: 0,Column A,Column B,Column C
Row 1,1.0,2,3
Row 2,3.0,4,5


Append to end:

In [0]:
df1 = DataFrame(array([[2, 3, 4]]), columns=['Column A', 'Column B', 'Column C'])
df2 = DataFrame(array([[5, 6, 7]]), columns=['Column A', 'Column B', 'Column C'])
pandas.concat([df1, df2], ignore_index=True)  # replaces df1.append(df2, ignore_index=True), removed in pandas 2.0

Unnamed: 0,Column A,Column B,Column C
0,2,3,4
1,5,6,7


Create dataframe from data:

In [0]:
data = {'Column one' : Series([1, 2, 3],    index=['Row 1', 'Row 2', 'Row 3']),
        'Column two' : Series([1, 2, 3, 4], index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])}

DataFrame(data)

Unnamed: 0,Column one,Column two
Row 1,1.0,1
Row 2,2.0,2
Row 3,3.0,3
Row 4,,4
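
The missing value in the output above comes from index alignment: pandas matches the Series by row label, and 'Row 4' exists in only one of them. A short sketch that inspects the result (the name `df` is introduced here for illustration):

```python
import pandas as pd

data = {
    'Column one': pd.Series([1, 2, 3], index=['Row 1', 'Row 2', 'Row 3']),
    'Column two': pd.Series([1, 2, 3, 4], index=['Row 1', 'Row 2', 'Row 3', 'Row 4']),
}
df = pd.DataFrame(data)

# The frame has the union of the row labels
print(df.shape)           # (4, 2)
print(df.index.tolist())  # ['Row 1', 'Row 2', 'Row 3', 'Row 4']

# 'Column one' has no value for 'Row 4', so it is filled with NaN
print(df['Column one'].isna().tolist())  # [False, False, False, True]

# NaN is a float, so 'Column one' becomes float while 'Column two' stays integer
print(df.dtypes)
```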
